Internet Archaeol 1. Beardah and Baxter. 2 Univariate KDEs

2 Univariate KDEs - a non-technical introduction

2.1 The basic univariate KDE

The idea underlying a univariate KDE is simply illustrated. Figure 1 is based on a set of measurements for the diameters of sixty Bronze Age cups from Italy, given in Baxter (1994) and based on a more extensive data set from Lukesh and Howe (1978). About each data point, located on the horizontal axis, a symmetrical "bump" may be placed. One such bump is shown in the figure at about 16 cm. At each point on the horizontal axis (not just the data points) the heights of the bumps are summed to get the one-dimensional KDE. In the present example there are two clear peaks, at about 10 and 20 cm, and a somewhat smaller one at 30 cm. As far as rim diameter goes there are two main size classes, with a somewhat smaller number of large cups.

Figure 1: One-dimensional KDE construction
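The construction just described can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce Figure 1: the function names, the window-width `h = 1.5` and the data are all assumptions. In particular, the data are synthetic values generated to have three groups near 10, 20 and 30, and are not the cup measurements.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal pdf, used here as the symmetric 'bump'."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(grid, data, h):
    """Sum one bump per data point at every grid position.

    grid : positions on the horizontal axis where the estimate is wanted
    data : observed values (one bump is centred on each)
    h    : window-width controlling the spread of each bump
    """
    u = (grid[:, None] - data[None, :]) / h   # one row per grid point, one column per bump
    bumps = gaussian_kernel(u) / (len(data) * h)
    return bumps.sum(axis=1)                  # heights of the bumps summed at each grid point

# Synthetic stand-in data (NOT the Bronze Age cup diameters).
rng = np.random.default_rng(0)
data = np.concatenate([
    rng.normal(10.0, 1.5, 25),
    rng.normal(20.0, 2.0, 25),
    rng.normal(30.0, 1.5, 10),
])

grid = np.linspace(0.0, 40.0, 801)
density = kde(grid, data, h=1.5)

# A density estimate should enclose unit area; check with a simple Riemann sum.
area = float(density.sum() * (grid[1] - grid[0]))
```

Because each bump is a pdf scaled by 1/n, the summed curve is itself a pdf, which is why `area` should come out close to one.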

For this particular data set a histogram, with sensibly chosen interval widths, would show as much in an unambiguous fashion, though we note that the appearance of the KDE has greater aesthetic appeal. The example does raise a number of issues that need to be dealt with in this and other sections. The "bumps" or kernels used are usually defined mathematically as symmetric probability density functions (pdfs). The final shape of the KDE is, however, surprisingly insensitive to the particular choice of density function and in this paper - unless otherwise specified - the normal pdf has been used. The spread of the bump, determined by its window- or band-width, has a much more critical effect on the appearance of the KDE. If the spread is too large then the final KDE will be too smooth and important detail may be lost; if the spread is too small then there may be too much spurious detail in the final result. The choice of window-width is important and this will be discussed and illustrated.
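The effect of the window-width can be made concrete by counting the peaks of the estimate for several widths. The sketch below is illustrative only: the synthetic data, the three widths and the helper `count_modes` are assumptions chosen for the example, and the normal pdf is used as the kernel.

```python
import numpy as np

def kde(grid, data, h):
    """Basic KDE with a normal kernel and window-width h."""
    u = (grid[:, None] - data[None, :]) / h
    bumps = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return bumps.sum(axis=1) / (len(data) * h)

def count_modes(density):
    """Count interior local maxima of a curve sampled on a grid."""
    inner = density[1:-1]
    return int(np.sum((inner > density[:-2]) & (inner >= density[2:])))

# Synthetic data with three groups (hypothetical, for illustration only).
rng = np.random.default_rng(1)
data = np.concatenate([
    rng.normal(10.0, 1.5, 25),
    rng.normal(20.0, 2.0, 25),
    rng.normal(30.0, 1.5, 10),
])

grid = np.linspace(-20.0, 60.0, 1601)

# Small, moderate and large window-widths (arbitrary illustrative values).
modes = {h: count_modes(kde(grid, data, h)) for h in (0.2, 1.5, 8.0)}
for h, n_modes in modes.items():
    print(f"h = {h}: {n_modes} peak(s)")
```

A very small width tends to give a peak for almost every isolated observation, which is the spurious detail referred to above, while a very large width merges the groups into a single smooth hump.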

2.2 Variations on the basic KDE

The small number of large diameters to the right of the figure might be regarded as outliers. Such points can have an undue influence on the KDE, causing it to be biased. One way of dealing with this is to omit the offending points. A second approach is to use an adaptive kernel density estimate, analogous to the use of intervals of varying widths in a histogram, which down-weights the influence of extreme points.
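One common way of constructing an adaptive estimate is sketched below, on the assumption that each point's window-width is widened where a fixed-width pilot estimate is low. The scaling rule (pilot density relative to its geometric mean, raised to the power `-alpha` with `alpha = 0.5`) is a widely used choice and is an assumption here, not necessarily the form used elsewhere in this paper. The data are synthetic.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def fixed_kde(grid, data, h):
    """Basic KDE with a single window-width h."""
    u = (grid[:, None] - data[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (len(data) * h)

def adaptive_kde(grid, data, h, alpha=0.5):
    """Adaptive KDE: each data point gets its own window-width.

    Returns the estimate on the grid and the per-point window-widths.
    """
    pilot = fixed_kde(data, data, h)          # pilot estimate at the data points
    g = np.exp(np.mean(np.log(pilot)))        # geometric mean of the pilot values
    local_h = h * (pilot / g) ** (-alpha)     # wider bumps where the pilot is low
    u = (grid[:, None] - data[None, :]) / local_h[None, :]
    bumps = gaussian_kernel(u) / local_h[None, :]
    return bumps.sum(axis=1) / len(data), local_h

# Synthetic data: one main group plus two isolated large values (hypothetical).
rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(10.0, 1.5, 50), [28.0, 31.0]])

grid = np.linspace(-20.0, 80.0, 2001)
density, local_h = adaptive_kde(grid, data, h=1.5)

area = float(density.sum() * (grid[1] - grid[0]))
```

The isolated points receive wide, low bumps, so they contribute a gentle tail rather than sharp separate peaks, while points in the dense group keep narrow bumps and fine detail is preserved there.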

Another circumstance in which it is useful to vary the form of kernel arises with bounded data. Such data are common with artefact compositional data where, for example, the measurement of the percentage presence of an oxide is bounded below by zero. Use of the basic KDE will result in an estimate that assigns non-zero density to negative values, which are impossible for such data. To avoid this problem a boundary KDE may be used in which non-symmetric kernels are associated with points near a boundary.
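The size of the problem can be checked numerically. The sketch below measures how much probability the basic KDE places below zero for non-negative data, and then applies a simple reflection correction. Note that reflection is a different and simpler device than the non-symmetric boundary kernels described above; it is used here only because it is short to state. The data are synthetic, and the window-width and helper names are assumptions.

```python
import numpy as np

def kde(grid, data, h):
    """Basic KDE with a normal kernel and window-width h."""
    u = (grid[:, None] - data[None, :]) / h
    bumps = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return bumps.sum(axis=1) / (len(data) * h)

def reflected_kde(grid, data, h):
    """Reflection correction at zero: fold the mass that falls below
    zero back onto the non-negative axis, and set the estimate to zero
    for negative values."""
    folded = kde(grid, data, h) + kde(grid, -data, h)
    return np.where(grid >= 0.0, folded, 0.0)

def trapezoid(y, dx):
    """Trapezoidal rule on an equally spaced grid."""
    return float(dx * (y.sum() - 0.5 * (y[0] + y[-1])))

# Synthetic non-negative data piled up near zero (hypothetical percentages).
rng = np.random.default_rng(3)
data = rng.exponential(0.5, 80)
h = 0.2
dx = 0.005

neg_grid = np.linspace(-3.0, 0.0, 601)    # spacing 0.005
pos_grid = np.linspace(0.0, 10.0, 2001)   # spacing 0.005

leaked = trapezoid(kde(neg_grid, data, h), dx)           # mass the basic KDE puts below zero
kept = trapezoid(reflected_kde(pos_grid, data, h), dx)   # mass of the corrected estimate on [0, 10]

print(f"mass below zero (basic KDE): {leaked:.3f}")
print(f"mass on [0, 10] (reflected): {kept:.3f}")
```

With many observations close to the boundary a noticeable share of the basic estimate lies below zero, whereas the corrected estimate keeps essentially all of its mass on the admissible range.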

These possibilities, and the use of KDEs for comparative purposes, are illustrated in the section on examples of univariate KDEs.
