4. Data Publication and Archiving

The issue of the availability and access to data on plant macrofossils is key to the continued significance of archaeobotany, and is a problem raised across science (Costello et al. 2013). Plant macrofossil data consist of a list of different taxonomic groupings, plant items and levels of identification, with either full or semi-quantification of each category, typically in terms of absolute number, minimum number of individuals, a semi-quantitative scale or weight (Popper 1988). The usefulness of an archaeobotanical dataset for reuse is dependent on the region, period and site type in relation to the previously available archaeobotanical resource.

For instance, in Roman Britain, where the range of crops grown has been established, a baseline of 30 samples per site phase has been suggested (Van der Veen et al. 2007). In contrast, in regions where little archaeobotanical data is available, any dataset is a vital contribution. Plant macrofossil data is typically displayed in lengthy tables, often confined to the appendices of archaeological reports, a CD-ROM, microfiche, or the electronic supplementary tables of journal articles. A key issue with archaeobotanical data tables is that they are largely unintelligible to the majority of non-archaeobotanists. In comparison, a basic zooarchaeological data table has far fewer categories, the names of which are intelligible (sheep/goat, cattle, pig, dog, cat, deer etc.). Barker (2001) raised this issue in the original Meaning and Purpose… TAG session, arguing that if only 20 people cared about the data then these tables would not need to feature in the final publication. The use of terminology to refer to the reliability of an identification, such as cf. (confer or compares to), and of broad categories to indicate when an item could fall within several groups (e.g. cereal/Poaceae), provides vital information about the quality of preservation and sample taphonomy. This is important because it is necessary to understand the caveats of specialist data before utilising it in new analysis. However, given the wide range of options available for data archiving (see below), a system of in-text summary tables with measures of frequency or sum per site phase or area, accompanied by full data tables in an appendix or online, would ensure that the main patterns in the plant macrofossils reported are intelligible to other archaeologists. An example would be a recent Çatalhöyük site report, where an in-text table contains summary statistics for presence, ubiquity, sum and max. per sample for the major plant categories (Bogaard et al. 2013, table 7.2, 97), while the full sample data is presented on a CD-ROM (table 7.12).
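
The kind of in-text summary described above can be derived directly from sample-level counts. The following is a minimal sketch in Python, not the method used in any of the cited reports: the sample identifiers, plant categories and counts are invented for illustration, and ubiquity is assumed here to mean the percentage of samples in which a category is present.

```python
from collections import defaultdict

# Hypothetical sample-level counts: (sample_id, plant_category, count).
# All values are invented for illustration only.
RECORDS = [
    ("S1", "cereal grain", 12),
    ("S1", "chaff", 3),
    ("S2", "cereal grain", 0),
    ("S2", "chaff", 7),
    ("S3", "cereal grain", 5),
    ("S3", "wild/weed seed", 20),
]


def summarise(records):
    """Return presence, ubiquity (%), sum and max. per sample for each category."""
    # Every sample counts towards ubiquity, even if a category is absent from it.
    samples = {sample for sample, _, _ in records}
    counts = defaultdict(dict)
    for sample, category, count in records:
        counts[category][sample] = counts[category].get(sample, 0) + count

    summary = {}
    for category, per_sample in counts.items():
        # Presence: number of samples with at least one item of this category.
        present = sum(1 for c in per_sample.values() if c > 0)
        summary[category] = {
            "presence": present,
            "ubiquity_pct": round(100 * present / len(samples), 1),
            "sum": sum(per_sample.values()),
            "max_per_sample": max(per_sample.values()),
        }
    return summary


SUMMARY = summarise(RECORDS)
for category, stats in sorted(SUMMARY.items()):
    print(category, stats)
```

A table of this kind is only a reading aid: the per-sample records from which it is computed still need to be archived in full.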

Of course, the publication of summary data tables only works if the full archaeobotanical data are also made available. Complete sample-level data are vital for evaluating the conclusions of studies and for reusing the data in meta-analysis. Some of the most successful examples of meta-analysis in archaeobotany indicate the importance of sample-level data availability, such as the identification of Neolithic farming practice (Bogaard 2004) and pre-domestication cultivation (Colledge 2002). However, access to raw data is currently a problem in archaeobotany, both for grey literature deriving from developer-funded excavation and for academic research (Lodwick forthcoming). The former issue stems from a range of report formats that are deposited in HERs and are difficult to access remotely (Van der Veen et al. 2007).

The move towards online archiving of such reports through OASIS and the ADS means that such reports are now much more easily available (Evans 2015), while large research projects, such as the Rural Settlement of Roman Britain project, have made older reports available online (Allen et al. 2018). The availability of so-called grey literature reports online as PDFs is great progress, although the problem of having to transcribe the data remains (Evans 2015; Evans and Moore 2014). Publication of full environmental archaeology data is recommended in the current Historic England guidelines (Campbell et al. 2011, 8), but a further problem arises when developer-funded reports are published in county journals, which are increasingly resistant to publishing specialist data tables. An example would be a recent article presenting a significant 1st-century AD assemblage of malting waste in Kent. The final article was produced without consulting the archaeobotanist and contained absolutely no tabulated data (Helm and Carruthers 2011).

Data availability is also a major problem in academic research. One aspect is the publication of research articles in non-open access journals, which have a paywall for anyone without university affiliation. This problem is particularly pertinent when research articles synthesising data derived from developer-funded excavations cannot be accessed by those who produced the data (Evans 2015). Two open-access journals publish archaeobotanical articles: Acta Palaeobotanica and the Journal of Ethnobiology Letters. The problem is compounded when journal articles, or monographs, do not contain the full data tables. The use of electronic supplementary material enables archaeobotanical data to be published as an .xls or .csv file, and has been advocated by some journals, such as the Journal of Archaeological Science (Torrence et al. 2015). However, supplementary files bring further problems: they are not peer-reviewed, they are not curated in the same way as the articles themselves, and they are subject to link decay (Warinner and d'Alpoim Guedes 2014, 155; Whitlock 2011; Costello et al. 2013). The publication of data tables as PDFs on websites related to a traditional monograph publication is one option, such as the Danebury Environs Roman Programme. However, this dataset has no DOI, and again it is much more time-consuming to extract data from a PDF table.

The current consensus within archaeobotany has been one of data sharing, and the collation of individual databases. Archaeobotanists have long been concerned with how to synthesise the constant stream of new archaeobotanical data. Reid's (1899) Origin of the British Flora presented the first overview of palaeo- and archaeobotanical records in Britain, the database of which became the foundation for Godwin's (1975) The History of the British Flora. One of the aims of the foundation of the IWGP was the compilation of archaeobotanical data (Van Zeist et al. 1991). An article listing new archaeobotanical data by taxa was published regularly in Vegetation History and Archaeobotany by Kroll (e.g. 1997). The Archaeobotanical Computer Database was published in 1996, in the first issue of Internet Archaeology (Tomlinson and Hall 1996), containing much of the available data on plant macrofossils from Britain at the time, largely on an individual sample basis, with fields also listing sample quality, specialist, site name, period and reliability. The database was compiled between 1989 and 1994, in advance of comparable databases.

The large compilation of archaeobotanical data has made substantial contributions to successful synthetic projects in Britain (Van der Veen et al. 2008; 2013). However, the database could only be centrally updated, with the online resource remaining a static version, lacking much of the new data produced after PPG16. Further centrally updated archaeobotanical databases include that of Riehl (2009), covering data from 533 eastern Mediterranean and Near Eastern sites and Kroll's (2005) online version of Vegetation History and Archaeobotany articles. These databases have all given the role of data archiving to data consumers rather than data producers, which is an unsustainable situation.

The main advance in the last two decades has been the development of the ArboDat database in Europe. ArboDat uses standardised taxa names, terms for preservation, feature types and periods, allowing easy integration of datasets produced by different archaeobotanists. The database was developed by Angela Kreuz at the Kommission für Archäologische Landesforschung in Hessen (Kreuz and Schäfer 2002), and is now used by researchers in several countries including Germany, the Czech Republic, France, and recently in England.
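
The benefit of standardised taxon names for integrating datasets can be illustrated with a short Python sketch. This is not ArboDat's schema or code: the vocabulary, the labels and the counts below are hypothetical, and the only point shown is that records from different specialists become directly comparable once each label maps onto one agreed name.

```python
# Hypothetical shared vocabulary: each specialist's label maps to one agreed name.
# This lookup is invented for illustration and does not reproduce ArboDat's lists.
STANDARD_NAMES = {
    "triticum spelta": "Triticum spelta",
    "spelt wheat": "Triticum spelta",
    "t. spelta": "Triticum spelta",
    "hordeum vulgare": "Hordeum vulgare",
    "barley": "Hordeum vulgare",
}


def harmonise(rows):
    """Merge (label, count) rows under standard names; collect labels with no match."""
    merged = {}
    unmatched = []
    for label, count in rows:
        # Normalise case and whitespace before looking the label up.
        key = " ".join(label.lower().split())
        standard = STANDARD_NAMES.get(key)
        if standard is None:
            # Unknown labels are reported rather than silently dropped or guessed.
            unmatched.append(label)
            continue
        merged[standard] = merged.get(standard, 0) + count
    return merged, unmatched


# Two invented datasets recorded with different naming habits.
DATASET_A = [("Spelt wheat", 14), ("Barley", 6)]
DATASET_B = [("T. spelta", 9), ("Hordeum vulgare", 2), ("Avena sp.", 4)]

MERGED, UNMATCHED = harmonise(DATASET_A + DATASET_B)
print(MERGED)
print(UNMATCHED)
```

When recording uses a shared vocabulary from the outset, this reconciliation step is unnecessary, which is the advantage of standardisation at the point of data entry.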

Data sharing enabled by ArboDat has facilitated research on Neolithic agriculture in Austria, Germany and Bulgaria (Kreuz et al. 2005), and Bronze Age agriculture in Europe (Stika and Heiss 2012). The use of ArboDat is clearly progress, particularly with regard to integrating datasets on contemporary past societies that fall into different modern nation states. However, the database has not advanced accessibility to data, but relies on data sharing between specialists. While the majority of archaeobotanists are very willing to share their data, the reliance on such informal links means that data can be lost over time, the process is inefficient and time-consuming, and data preservation issues are not dealt with (Kansa and Kansa 2013).

Far preferable is the archiving of plant macrofossil datasets with the Archaeology Data Service (ADS), such as from excavations by MOLA at 1 Poultry, London (Museum of London Archaeology 2013), Vaihingen by Bogaard (2011), and the synthetic dataset produced by the EUROEVOL project at UCL (Colledge 2016). One scenario is the online open-access publication of a synthesis of the excavations at Heybridge (Atkinson and Preston 2015b), accompanied by a digital archive of plant macrofossil datasets (although as PDFs) on the ADS (Essex County Council 2016), and a print monograph (Atkinson and Preston 2015a). Beyond the ADS, a wide range of archives is now available for archiving archaeobotanical data, from institutional to subject-specific archives. The discipline lacks agreement on the use of a central archive, but options include PANGAEA and DRYAD (Warinner and d'Alpoim Guedes 2014). Data publication has also been advocated within archaeology, and more widely within science (Costello et al. 2013; Atici et al. 2013; Kansa and Kansa 2013), enabling peer review of data, quality checks, and increased credit for the author. As well as Internet Archaeology, the Journal of Open Archaeology Data provides one venue for the publication of datasets; see for instance datasets published on Neolithic Europe (Colledge 2016) and Bronze Age Ireland (Johnston 2014). Data publication has clear benefits for the author, such as increased citation and improved non-academic dissemination, and also for the discipline, namely greater survivability of data (Costello et al. 2013; Tennant et al. 2016).

Alongside publication or archiving of raw data, it is also important that metadata is supplied so that the data are easily discoverable (Kansa and Kansa 2013), and that information making archaeobotanical data comparable is included (contextual information, sampling procedure, recovery information, identification resources, flora used). While data publication and archiving are vital, it is equally important that the methods used to analyse the data, plus the underlying datasets used, are included in the publication to ensure research transparency and to allow the reuse of data in meta-analysis. An optimal example would be McKerracher (2016), where all publications used are referenced and the methodology for data reduction is clear. Other recent syntheses have not cited any underlying data, meaning any replication or reuse of the dataset is impossible. In contrast, a recent zooarchaeological meta-analysis cites all data, and the underlying code and dataset are made available in an institutional repository (Orton et al. 2016). Radically improving the data archiving and publication practices within archaeobotany would enable successful meta-analysis to take place much more easily, and 'reduce the Balkanization of the field' (Marston et al. 2014b, 12).
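
One way to make sure the comparability information listed above travels with a dataset is to store it in a small machine-readable record alongside the data file and to check it for gaps before deposition. The Python sketch below is an illustration under stated assumptions: the field names are taken from the list in the preceding paragraph, but the record layout, the example values and the check itself are hypothetical and do not follow any repository's required schema.

```python
import json

# Field names follow the comparability information listed in the text;
# treating them as a fixed checklist is an assumption made for this sketch.
REQUIRED_FIELDS = (
    "contextual_information",
    "sampling_procedure",
    "recovery_information",
    "identification_resources",
    "flora_used",
)


def missing_fields(metadata):
    """Return the required fields that are absent or left blank."""
    return [f for f in REQUIRED_FIELDS if not str(metadata.get(f, "")).strip()]


# Invented example record; one field is deliberately left empty.
EXAMPLE = {
    "contextual_information": "pit fill, site phase 2",
    "sampling_procedure": "judgement sampling, 40 litres per context",
    "recovery_information": "machine flotation, 0.25 mm mesh",
    "identification_resources": "",
    "flora_used": "placeholder: name the flora followed",
}

MISSING = missing_fields(EXAMPLE)
# The JSON text could be saved next to the data table as a sidecar file.
print(json.dumps({"metadata": EXAMPLE, "missing": MISSING}, indent=2))
```

A check of this kind only flags empty fields; it cannot judge whether the descriptions given are adequate for reuse.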

Guidelines on data archiving currently vary between the most common journals in which archaeobotanical studies are published, from recommending PANGAEA in Vegetation History and Archaeobotany to publishing supplemental material via Figshare in Environmental Archaeology.

Greater awareness among the archaeobotany community of the benefits and challenges of reusing published datasets, and of the need for accessible content, should lead to greater advocacy for online submission of data. Enforcement of data archiving or publication by journal editors would ensure that a large volume of archaeobotanical data was openly available. However, ensuring that relevant datasets can be found quickly and easily is also imperative, both for researchers and for commercial specialists. In Britain, regional resource assessments have played this role, such as for the East Midlands (Monckton 2006) or the review of plant macrofossils from Northern England (Hall and Huntley 2007), but they become outdated very quickly, and do not have the capacity for spatial searching. The East Midlands regional database provides a good option, whereby the research document is online and updatable (East Midlands Heritage 2016), ensuring the growing quantities of archaeobotanical data available can still be easily navigated.


Cite this as: Lodwick, L. 2019 Agendas for Archaeobotany in the 21st Century: data, dissemination and new directions, Internet Archaeology 53.

Internet Archaeology is an open access journal. Except where otherwise noted, content from this work may be used under the terms of the Creative Commons Attribution 3.0 (CC BY) Unported licence, which permits unrestricted use, distribution, and reproduction in any medium, provided that attribution to the author(s), the title of the work, the Internet Archaeology journal and the relevant URL/DOI are given.
