Internet Archaeol 4. Wansleeben & Verhart. Multivariate statistics

4.3 Multivariate statistics

Largely at the instigation of the New Archaeology, a large number of statistical techniques have been used to arrange archaeological data. Given the nature of the problem, in which various artefact types together determine the character of a site, the most useful techniques appear to be cluster analysis, principal component analysis, factor analysis, correspondence analysis and other multivariate techniques. One of the preconditions for these 'advanced' techniques could in principle be met by expressing the artefact composition of each site in percentages. Standardized numerical data would then be available, with all values falling within a single range (0-100%).
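The conversion to percentages described above can be sketched as follows. This is a minimal illustration only: the site names, artefact types and counts are invented, and the helper `to_percentages` is a hypothetical name, not something taken from the original study.

```python
# Hypothetical artefact counts per site (invented for illustration only).
counts = {
    "site_A": {"scrapers": 12, "blades": 30, "points": 18},
    "site_B": {"scrapers": 2, "blades": 5, "points": 3},
    "site_C": {"scrapers": 40, "blades": 5, "points": 5},
}


def to_percentages(site_counts):
    """Express the artefact counts of one site as percentages of the site total."""
    total = sum(site_counts.values())
    if total == 0:
        raise ValueError("site has no artefacts; percentages are undefined")
    return {artefact: 100.0 * n / total for artefact, n in site_counts.items()}


# Every value now lies in the range 0-100, whatever the size of the site.
percentages = {site: to_percentages(c) for site, c in counts.items()}

for site, profile in percentages.items():
    print(site, profile)
```

In this invented example site_A (60 artefacts) and site_B (10 artefacts) end up with identical percentage profiles, which shows how the conversion removes differences in assemblage size and keeps only the composition.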

Multivariate analyses provide the opportunity to display the variation in the data not only in one, but also in two, three or more dimensions. The data are combined mathematically, as optimally as possible, into groups or components. By way of tree diagrams or principal component plots, the structure in the data can be displayed visually. Observations and/or variables that are highly similar are placed close together, and large differences are translated into larger distances in the plots.
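The idea that similar observations lie close together and dissimilar ones far apart can be made concrete with a small sketch. It assumes Euclidean distance between percentage profiles, which is only one of many possible measures and is not specified in the text; the site compositions are again invented. The pair with the smallest distance is the one an agglomerative cluster analysis would join first in its tree diagram.

```python
import math

# Hypothetical percentage compositions per site (invented for illustration only).
# Each list holds the percentages of the same three artefact types.
sites = {
    "site_A": [20.0, 50.0, 30.0],
    "site_B": [25.0, 45.0, 30.0],
    "site_C": [80.0, 10.0, 10.0],
}


def euclidean(p, q):
    """Euclidean distance between two percentage profiles (an assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Distance for every unordered pair of sites.
names = sorted(sites)
distances = {
    (a, b): euclidean(sites[a], sites[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

# The most similar pair: the first merge in an agglomerative clustering.
closest_pair = min(distances, key=distances.get)

for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(d, 2))
print("closest pair:", closest_pair)
```

A different distance measure or a different linkage rule can produce a different grouping from the same data, which is part of the reason the choice of technique matters.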

There are many ways of executing a cluster analysis (arranging observations by types), a factor analysis (arranging variables by components) or a correspondence analysis (arranging both observations and variables). A thorough knowledge of the applicability, preconditions and limitations of the various techniques is necessary to be able to make a balanced decision and obtain relevant archaeological results. The risk is that the data will be run through a number of different techniques in an attempt to produce a result which may fit the data but is archaeologically irrelevant. The relative ease with which this can be done with modern software certainly contributes to this risk. On the one hand this could mean that an archaeologist has insufficient insight into the data to be able to judge whether the statistical results are archaeologically meaningful. For example, the correlation coefficient is the basis for the other calculations in a principal components analysis. When the correlation coefficient is not a good numerical reflection of the relationship between two variables, however, the computer will still spew out a multivariate classification, one that has no archaeological value whatsoever.

Fig. 37 The calculated correlation coefficient (R) is mathematically correct, but does not give a useful summary of the relationship between these two variables
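One way in which a correlation coefficient can be mathematically correct yet uninformative is sketched below. This is not necessarily the situation shown in Fig. 37: the data are invented, and the function `pearson_r` is a hypothetical helper written out for the illustration. Nine observations show no relationship at all between the two variables, and a single extreme observation is then added.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented data: a 3 x 3 grid of points in which x and y are unrelated.
x = [1, 2, 3, 1, 2, 3, 1, 2, 3]
y = [1, 1, 1, 2, 2, 2, 3, 3, 3]
r_without_outlier = pearson_r(x, y)

# The same data plus one extreme observation (also invented).
r_with_outlier = pearson_r(x + [20], y + [20])

print("R without the extreme observation:", round(r_without_outlier, 3))
print("R with the extreme observation:   ", round(r_with_outlier, 3))
```

The second coefficient is close to 1 although nine of the ten observations show no relationship whatsoever. A principal components analysis built on such a coefficient would inherit the distortion, which is why a scatter plot of each pair of variables is worth inspecting first.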

On the other hand, this testing of a large number of techniques means that we basically gain nothing. In the end we opt for those results which best fit our picture of the structure in the data. This picture had probably already formed in our heads even before the multivariate analysis, as a result of collecting the data, our archaeological experience, and simple counts and graphs.

"Analytical methods such as spatial autocorrelation and the mapping of principal components have not been able to provide understandable answers to the questions asked of them." (Savage 1990: 331).