There have been several excellent accounts of the modelling process, concentrating on computer simulation models (e.g. Hamond 1978; Aldenderfer 1981; Freeman 1988; Lake 2001). The following summary draws on these, but attempts to broaden them to cover the field of mathematical models in general.

*Step 1:* define the problem, and if appropriate the
hypothesis.

What is the archaeological question that we are asking? What data are
available to us that might be relevant to the question? Do we need
more? If so, how do we acquire them?

*Step 2:* construct a conceptual model.

In this stage we simplify the problem to make it amenable to a
mathematical approach. This involves deciding which variables are, and
are not, relevant to the problem in hand, how they behave (e.g. what
statistical distributions may be used to describe them) and what their
limits are. The relationships between them (correlations or
associations) must also be built into the model. There is always a
trade-off between usefulness and verisimilitude: 'In general the most
useful models are the simplest that include all relevant aspects of
the real world system' (Lake 2001,
725).

*Step 3:* choose the appropriate type of model in which
to implement the conceptual model.

Is it to be a deterministic model, a stochastic model or a simulation
model, for example? Here Freeman's fundamental question '0. Must I
simulate?' (Freeman 1988, 140) is of
prime importance. Simulation is a major theoretical and practical
undertaking, and should be seen as a last resort rather than the
option of choice. If an algebraic or statistical solution can be
found, it is usually preferable.
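
The contrast between an algebraic solution and a simulation can be sketched with a deliberately simple, hypothetical problem that is not taken from the accounts cited above: if each of `n` sites is independently detected by a survey with probability `p`, what is the chance that at least one is found? The function names, the values of `p` and `n`, the number of trials and the seed are all assumptions made for illustration.

```python
import random


def p_at_least_one_exact(p, n):
    """Algebraic solution: 1 minus the chance that all n sites are missed."""
    return 1 - (1 - p) ** n


def p_at_least_one_simulated(p, n, trials=20000, seed=42):
    """Monte Carlo estimate of the same probability (illustrative only)."""
    rng = random.Random(seed)  # seeded so that the run is repeatable
    hits = 0
    for _ in range(trials):
        # One simulated survey: is any of the n sites detected?
        if any(rng.random() < p for _ in range(n)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("exact:    ", p_at_least_one_exact(0.2, 5))
    print("simulated:", p_at_least_one_simulated(0.2, 5))
```

The algebraic version is a single line and gives an exact answer, while the simulated version needs a random number generator, a choice of trial count and a check on its own sampling error, which is the kind of overhead that makes simulation a last resort here.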

*Step 4:* implementation.

This step may be anything from writing down a simple equation to
several months' work constructing a computer program. Clearly, the
simplest route must be sought. Even when a direct mathematical
solution is possible, it may be useful to implement it in a general
computer package such as *Excel*, so that the effects of
varying the values of the independent variables can be studied
rapidly. Indeed, it has been suggested that *Excel* is
preferable to some more specialised simulation packages for
implementing simulation models (Sermier, *pers comm*). Freeman
(1988, 142) has issued a warning
about the use of random number generators provided with some
commercial packages.
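
The idea of varying the independent variables of a directly soluble model can be sketched in a few lines of code as well as in a spreadsheet. The example below is only an illustration under stated assumptions: the logistic growth equation stands in for whatever equation a real study would use, and the function names and parameter values (`n0`, `r`, `k`, `t`) are hypothetical.

```python
import math


def logistic_population(t, n0, r, k):
    """Closed-form logistic growth: population size at time t.

    n0 = starting population, r = growth rate, k = carrying capacity.
    """
    return k / (1 + ((k - n0) / n0) * math.exp(-r * t))


def sweep_growth_rate(rates, t=100, n0=50, k=1000):
    """Evaluate the model at time t for each candidate growth rate."""
    return {r: logistic_population(t, n0, r, k) for r in rates}


if __name__ == "__main__":
    # Tabulate how the outcome responds to one independent variable.
    for r, n in sweep_growth_rate([0.01, 0.02, 0.05, 0.1]).items():
        print(f"r = {r:.2f}   N(100) = {n:8.1f}")
```

Each row of the printed table corresponds to one recalculated spreadsheet cell, so the effect of changing a single variable can be read off directly.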

*Step 5:* validation.

Different types of model require different types of validation (Lake 2001, 725). In many cases, what will
matter is whether the outcome of the model matches the relevant data
sufficiently closely. There is a potential pitfall here, in that
if the details of the model are based on data, a good fit to those same data
is only to be expected. It may be necessary to use only some of the
data initially, while retaining some to test the model (a procedure
known as split-sample validation). This is especially important if
parameters (e.g. regression coefficients) are to be estimated from
data (e.g. in a predictive model), since it raises the possibility of
over-fitting, that is, of obtaining estimates of the parameters which
fit a particular dataset better than the 'true' values of the
parameters would. This in turn can give rise to the commonly observed
phenomenon that a model never fits later datasets quite as well as it did the
first time around.
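
Split-sample validation and over-fitting can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for the example: the straight-line model, the 'true' slope and intercept, the noise level, the seed and the function names. The point is only that least-squares estimates always fit the sample they were estimated from at least as well as the true parameters do, so the honest test is the retained half.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares estimates of slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mse(xs, ys, slope, intercept):
    """Mean squared difference between the data and the line."""
    return sum((y - (slope * x + intercept)) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def split_sample_demo(seed=1, n=40, true_slope=2.0, true_intercept=1.0,
                      noise_sd=3.0):
    """Fit on one half of a synthetic dataset and test on the other."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [true_slope * x + true_intercept + rng.gauss(0, noise_sd)
          for x in xs]
    half = n // 2
    train_x, train_y = xs[:half], ys[:half]   # used to estimate parameters
    test_x, test_y = xs[half:], ys[half:]     # retained to test the model
    slope, intercept = fit_line(train_x, train_y)
    return {
        "fitted": (slope, intercept),
        "train_mse_fitted": mse(train_x, train_y, slope, intercept),
        "train_mse_true": mse(train_x, train_y, true_slope, true_intercept),
        "test_mse_fitted": mse(test_x, test_y, slope, intercept),
    }


if __name__ == "__main__":
    for name, value in split_sample_demo().items():
        print(name, value)
```

Comparing `train_mse_fitted` with `train_mse_true` shows the over-fitting effect described above, while `test_mse_fitted` gives the less flattering but more realistic measure of how the model performs on data it has not seen.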

*Step 6:* interpretation.

At the end of the cycle we have to return to the archaeology of the
problem. What does it mean that a particular model fits, or does not
fit, a particular dataset? In a sense, 'failure to fit' is the more
creative outcome, because it is open-ended and points towards
refinement or even complete re-casting of the model, whereas 'fit', while
undoubtedly satisfying, is a 'closed' conclusion. The statisticians'
saying 'always examine the residuals' is very relevant here: it is
the difference between the data and the model that may really shed
light on what is happening. One might even say that the fitted model
represents a 'processual' conclusion to an archaeological problem,
while failure to fit, particularly to a model which has 'succeeded' in
other circumstances, may represent the more individualist outcome
favoured by a post-processual approach. We might then distinguish between
the model (the 'processual' part of the data) and the residuals (the
'extra-processual' part of the data).
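
The advice to examine the residuals can be made concrete with a small sketch. The data are invented for the purpose (values that actually follow a curve, to which a straight line is fitted), and the function names are hypothetical; the example only shows how a pattern in the residuals can point to what the model has left out.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(xs, ys, slope, intercept):
    """Observed value minus the value the model predicts, point by point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    # Invented data: y is really a curve (x squared), not a straight line.
    xs = list(range(11))
    ys = [x ** 2 for x in xs]
    slope, intercept = fit_line(xs, ys)
    for x, e in zip(xs, residuals(xs, ys, slope, intercept)):
        print(f"x = {x:2d}   residual = {e:6.1f}")
```

The residuals are positive at both ends of the range and negative in the middle. A systematic pattern of this kind, rather than the overall goodness of fit, is what suggests that the straight-line model should be re-cast.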

© Internet Archaeology
URL: http://intarch.ac.uk/journal/issue15/6/co5.html

Last updated: Wed 28 Jan 2004