5.6 Similarity and dissimilarity

Multivariate classification techniques first require a variable that expresses the degree of (dis)similarity between two sites. A large number of these so-called (dis)similarity coefficients exist. In the present situation, where we have chosen to convert raw numbers of artefacts per type into progressive classes, an ordinal correlation coefficient such as Spearman's rs or Kendall's tau seems at first sight called for. However, the modified data are not actual rankings and contain a great many equal values (ties), which makes these coefficients less applicable. Nor are the data counts, so in the end we have chosen to treat them as nominal data and to select a correlation coefficient based on Chi2.

To illustrate this, the progressive class values of sites 52B-168 and 52E-150 are displayed side by side in a cross table (Table 12). This table yields a Chi2 value of 4.375. Statistically, this value is not high enough, and too many cells have a low expected value, to indicate a significantly different artefact composition. In principle, an increasing Chi2 value should indicate increasing dissimilarity. However, the Chi2 value itself is unsuitable as a dissimilarity variable, because it depends on the absolute numbers in the cross table: comparing two large sites yields a high Chi2 value more easily than comparing two small sites. A workable variable is obtained by dividing the Chi2 value by the number of observations (n) and taking the square root. This variable, Phi, is defined as:

Phi = √(Chi2 / n)

In this situation, with an r × 2 cross table, Phi equals Cramér's V (Shennan 1988). Its minimum value of 0 is reached when both sites are similar, and its maximum value of 1 when both sites are completely dissimilar. The absence of an artefact type from both sites does not affect the result. In this example Phi has a value of 0.468, indicating a reasonable degree of similarity. Although there are far fewer finds on site 52B-168, the pattern of the artefact types that are present is more or less similar to that of site 52E-150. Types that are relatively abundant on the 'rich' site 52E-150, such as production debris and scrapers, are precisely the types that occur on the 'poor' site 52B-168. The types that occur only in limited amounts on 52E-150 are those absent from 52B-168.
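The behaviour of Chi2 and Phi described above can be checked numerically. The following Python sketch is an illustration only: the function name chi2_and_phi and the four small tables are hypothetical and are not taken from the survey data. It assumes an r × 2 table in which both sites have at least one find.

```python
from math import sqrt

def chi2_and_phi(table):
    """Return (Chi2, Phi) for an r x 2 cross table.

    `table` is a list of (count_site_a, count_site_b) rows.
    Hypothetical helper written for this illustration.
    """
    col_a = sum(a for a, _ in table)
    col_b = sum(b for _, b in table)
    if col_a == 0 or col_b == 0:
        raise ValueError("both sites need at least one observation")
    n = col_a + col_b
    chi2 = 0.0
    for a, b in table:
        row_total = a + b
        if row_total == 0:
            continue  # type absent from both sites: contributes nothing
        exp_a = row_total * col_a / n  # expected value, site A
        exp_b = row_total * col_b / n  # expected value, site B
        chi2 += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return chi2, sqrt(chi2 / n)

# Hypothetical tables, chosen only to show the properties of Phi.
examples = {
    "base": [(3, 1), (1, 3)],
    "doubled": [(6, 2), (2, 6)],       # same pattern, twice the counts
    "proportional": [(2, 4), (1, 2)],  # identical relative composition
    "disjoint": [(5, 0), (0, 3)],      # no artefact type in common
}

results = {name: chi2_and_phi(rows) for name, rows in examples.items()}
for name, (chi2, phi) in results.items():
    print(f"{name:<12} Chi2 = {chi2:.3f}  Phi = {phi:.3f}")
```

Under these assumptions, doubling all counts doubles Chi2 (from 2.000 to 4.000) while Phi stays at 0.500; the proportional table gives Phi = 0 and the disjoint table gives Phi = 1.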

artefact type                             52B-168             52E-150             Total
                                          observed  expected  observed  expected
triangular invasive retouched arrowhead   0         0.2       1         0.8       1
teardrop arrowhead                        0         -         0         -         0
leaf-like arrowhead                       0         -         0         -         0
pointed blade                             0         -         0         -         0
macrolithic artefact                      1         0.4       1         1.6       2
scraper                                   1         0.6       2         2.4       3
borer                                     0         0.2       1         0.8       1
notch                                     0         0.2       1         0.8       1
retouched blade/flake                     0         0.4       2         1.6       2
axe                                       0         -         0         -         0
pottery                                   0         -         0         -         0
grinding stone                            0         0.4       2         1.6       2
hammer stone                              0         -         0         -         0
core                                      0         0.4       2         1.6       2
other artefacts/debris                    2         1.2       4         4.8       6
Total                                     4         -         16        -         20
Table 12. Cross table of two Michelsberg sites, used to calculate the dissimilarity variable Phi.
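The figures quoted for Table 12 can be reproduced with a short calculation. The Python sketch below is one possible way to do so and is not the procedure used in the original analysis; the variable names are chosen for illustration. Expected values are computed as row total × column total / n, and rows with a total of zero are skipped.

```python
from math import sqrt

# Observed progressive class values from Table 12: (52B-168, 52E-150).
observed = {
    "triangular invasive retouched arrowhead": (0, 1),
    "teardrop arrowhead": (0, 0),
    "leaf-like arrowhead": (0, 0),
    "pointed blade": (0, 0),
    "macrolithic artefact": (1, 1),
    "scraper": (1, 2),
    "borer": (0, 1),
    "notch": (0, 1),
    "retouched blade/flake": (0, 2),
    "axe": (0, 0),
    "pottery": (0, 0),
    "grinding stone": (0, 2),
    "hammer stone": (0, 0),
    "core": (0, 2),
    "other artefacts/debris": (2, 4),
}

# Column totals per site and the overall number of observations.
col_totals = [sum(row[i] for row in observed.values()) for i in (0, 1)]
n = sum(col_totals)

chi2 = 0.0
expected = {}
for name, row in observed.items():
    row_total = sum(row)
    if row_total == 0:
        continue  # type absent from both sites: no expected value, no contribution
    expected[name] = tuple(row_total * c / n for c in col_totals)
    chi2 += sum((o - e) ** 2 / e for o, e in zip(row, expected[name]))

phi = sqrt(chi2 / n)
print(f"n = {n}, Chi2 = {chi2:.3f}, Phi = {phi:.3f}")
# prints: n = 20, Chi2 = 4.375, Phi = 0.468
```

The expected values stored in the dictionary match the "expected" columns of the table, for example 0.6 and 2.4 for scrapers.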


