Re-discovering Archaeological Discoveries. Experiments with reproducing archaeological survey analysis

Néhémie Strupler

Cite this as: Strupler, N. 2021 Re-discovering Archaeological Discoveries. Experiments with reproducing archaeological survey analysis, Internet Archaeology 56.


This article describes an attempt to reproduce the published analyses from three archaeological field-walking surveys, using datasets collected between 1990 and 2005 that are publicly available in digital format. The descriptions of the exact methodologies used to produce the analyses (diagrams, statistical analysis, maps, etc.) are often incomplete, leaving a gap between the dataset and the published report. By using the published descriptions to reconstruct how the outputs were produced, I expected to reproduce and corroborate the results. While these experiments highlight some successes, they also point to significant problems in reproducing an analysis at various stages, from reading the data to plotting the results. Consequently, this article proposes some guidance on how to increase the reproducibility of data in order to assist aspirations of refining results or methodology. Without a stronger emphasis on reproducibility, the published datasets may not be sufficient to confirm published results, and the scientific process of self-correction is at risk.

Corresponding author: Néhémie Strupler
McDonald Institute for Archaeological Research, Cambridge & Institut Français d'Études Anatoliennes, Istanbul.

Figure 1: An attempt to reproduce a figure from the Boeotia Survey book showing discrepancies in the ID numbers: Figure 1a: the reproduced map; Figure 1b: figure 1.4 from Bintliff et al. (2007). ID divergence can be seen at the top of the figure where transects 501 and 502 should be 177 and 178.

Figure 2: A second attempt to reproduce a figure from the Boeotia Survey book, which also reveals discrepancies in the ID numbers. Figure 2a: the reproduced map; Figure 2b: figure 1.6 from Bintliff et al. (2007).

Figure 3: Pottery density map generated by the author on the basis of the data. It shows a random distribution that does not match the reported results, which show clearly delimited sites. The legend (top right) presents the colour for each class as well as the associated interval, from low density in blue (between 0 and 214 sherds) to high density in red (between 8839 and 11288 sherds).

Figure 4: Map of the sherd density per unit (Sydney Cyprus Survey Project) on a raster background derived from Copernicus data (see Strupler 2018).

Figure 5: Plot of linear regression of 'Ground Visibility' and 'Adjusted Visibility' showing an almost perfect linear relation (Sydney Cyprus Survey Project).

Figure 6: Plot of the published 'Ground Visibility' and 'Adjusted Visibility' data, as well as a reproduced 'Adjusted Visibility' ('bgc' stands for 'background confusion') (Sydney Cyprus Survey Project).

Figure 7: Plot of the adjusted pottery count (Sydney Cyprus Survey Project). Figure 7a: as published; Figure 7b: as reproduced.

Figure 8: Plot of the visibility percentage by units (Pyla-Koutsopetria Archaeological Project) as published in the book (Figure 8a) and as reproduced (Figure 8b). The two figures show a strong similarity, even if the shapes of the units are not identical: while some units are oriented to the north and have a regular shape, many units were adapted to the terrain, and their individual shapes cannot be exactly emulated with the information provided.

Figure 9: Plot of the Late Bronze Age artefacts (Pyla-Koutsopetria Archaeological Project) as published in the book (Figure 9a) and as reproduced (Figure 9b). The two figures show that the units with points are the same but the number of (visible) points displayed (i.e. artefacts) differs considerably.

Table 1: Screenshot of the tabular data from the CD-ROM in Bintliff et al. (2007).

Table 2: Head of the tabular data of the units file (Sydney Cyprus Survey Project). An explanation of the variable names is provided in the text files of the project archive.

Table 3: Table published in the monograph (Caraher et al. 2014, 203). It is divided into four main periods, each subdivided into three or four chronotypes.

Table 4: The reproduced analysis (Pyla-Koutsopetria Archaeological Project) showing the individual numbers as well as sums for each period, confirming that the same subset of 205 sherds (7+6+19+173) is being used as in the monograph publication.
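
The arithmetic behind the Table 4 comparison can be sketched as a small consistency check. This is only an illustration, not the workflow used in the article: the period labels are placeholders (the caption does not name the periods), and the helper name `subtotals_match` is hypothetical. Only the four subtotals and the total of 205 come from the caption.

```python
# Period subtotals as quoted in the Table 4 caption. The labels are
# placeholders (assumption): the caption does not name the four periods.
reproduced_subtotals = {
    "period_1": 7,
    "period_2": 6,
    "period_3": 19,
    "period_4": 173,
}

# Total number of sherds stated in the caption.
PUBLISHED_TOTAL = 205


def subtotals_match(subtotals, expected_total):
    """Return True when the reproduced subtotals add up to the published total."""
    return sum(subtotals.values()) == expected_total


if __name__ == "__main__":
    print(subtotals_match(reproduced_subtotals, PUBLISHED_TOTAL))
```

A check of this kind only shows that the same number of sherds is involved; it does not by itself show that the individual records are identical.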

