Digital archiving

Internet technologies have developed to the point where they offer considerable potential for the dissemination of archaeological results, but if we are to exploit this potential effectively it is important that the Internet should not become a dumping ground for raw and undigested data of doubtful value, and that we do not begin a massive programme of digitisation of project paper records for the sake of the digital bandwagon. It becomes essential to define exactly what we mean by a digital archive and to consider what additional work it will involve.

One of the basic principles established in the first edition of the Archaeology Data Service's Excavation and Fieldwork Guide to Good Practice (Brown et al. 1999) was that for information not already held in digital format it was unnecessary to embark upon a massive digitisation programme. As more and more data are collected in digital format, and as computers begin to pervade all the activities of archaeological contractors from project management to post-excavation analysis, the proportion of primary data available digitally will inevitably grow. Digitisation of paper records may also be desirable to increase access, or as part of the preservation of rare and fragile documentary sources, but it should not be done for 'its own sake'. There has also been a recognition that different archaeological projects may deserve different levels of digital archive. Returning to the idea of preservation for a purpose, it is not clear that all data should be automatically preserved, since a re-use value has to be demonstrated. For instance, although an animal bones database will have clear value for comparative purposes, it is less obvious that the automatically logged total station coordinates from an earthwork survey of a minor site should be preserved indefinitely. The end product of the computer terrain model or contour plot may be important, but not necessarily the raw data points used to produce it.

Not all archaeological investigations make the same contribution to knowledge, and not all merit the same level of digitisation. It is not always possible to judge this at the time, however, so a proper assessment procedure is needed to determine their significance. MAP2 (English Heritage 1991) drew upon the Frere (1975) and Cunliffe (1983) Reports, especially the latter, and tried to clarify the mechanisms required for the iterative reviews which Cunliffe envisaged. This was assisted by a refined terminology, which distinguished between the site archive, the research archive, summary publication, and full publication. The second edition of the ADS Excavation and Fieldwork Guide to Good Practice also adopts the idea of phases through which projects should pass, and the outputs required at each stage, in an attempt to categorise the equivalent levels of digital archive (Richards and Robinson 2000). Four levels of archive are identified; in ascending order of complexity these are:

  1. Index level,
  2. Assessment level,
  3. Research level, and
  4. Integrated archive and electronic publication.

Many desk-top assessments or evaluations carried out as part of the planning process produce little archaeology. In these cases it may simply be appropriate to create the minimal digital archive: a single metadata index record which identifies the site and its location, those responsible for the work, the components of the paper and physical record, and where these can be found. The OASIS project aims to provide such an index level record for all archaeological investigations in England. Where a report has been submitted, this will normally have been prepared in electronic format and it makes sense to archive this as well, ideally linked from the index record, so that it can be accessed more easily than if it were simply lodged in a filing cabinet in the local Sites and Monuments Record.
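As an illustration only, the kind of index level record described above might be sketched as a simple data structure. The field names here are assumptions chosen to mirror the elements listed in the text (site, location, those responsible, record components, where held, and an optional linked report); they are not the OASIS schema, and the example values are invented placeholders.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class IndexRecord:
    """Hypothetical index level record for a minor project."""

    site_name: str                      # identifies the site
    location: str                       # e.g. a grid reference or parish
    responsible_body: str               # those responsible for the work
    record_components: List[str] = field(default_factory=list)  # paper and physical record
    held_at: str = ""                   # where those components can be found
    report_file: Optional[str] = None   # linked electronic report, if one was submitted


def to_json(record: IndexRecord) -> str:
    """Serialise the record to JSON so it can be stored or exchanged."""
    return json.dumps(asdict(record), indent=2)


# Placeholder values, invented purely for illustration.
example = IndexRecord(
    site_name="Example Farm evaluation",
    location="placeholder grid reference",
    responsible_body="Example Archaeological Unit",
    record_components=["context sheets", "site plans", "finds boxes"],
    held_at="Example Sites and Monuments Record",
)
print(to_json(example))
```

A record of this kind is deliberately small: it points to the paper and physical archive rather than reproducing it, and the optional report link is filled in only where an electronic report exists.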

For more significant projects that may warrant further analysis, the MAP procedure proposes the production of an Assessment Report which identifies the key findings and datasets available for further study. For projects that reach this stage, this too should be archived, along with such details of the stratigraphic and structural sequence as have been digitised, as well as specialist reports and databases used in the production of the Assessment Report.

The majority of important projects will proceed beyond the Assessment Phase to full post-excavation analysis and the production of a monograph or journal article. Large quantities of digital data will be created as part of this process, including detailed specialist reports, databases, spreadsheets, CAD phase plans and so forth. It is recommended that these should all be archived, along with the final text of the full report, to create a Research Level archive. The English Heritage-sponsored DAPPER project (Digital Archive Pilot Project for Excavation Records), conducted jointly by ADS, the Museum of London Archaeology Service and Oxford Archaeological Unit, created two research level archives, for the Royal Opera House site and Eynsham Abbey. Neither had been planned for digital archiving and dissemination at the outset and both were to be published by means of traditional hard copy monographs. Nonetheless the digital archives are extremely valuable additions to the dissemination of the sites and each has had high levels of re-use. Within the first 24 months there were over 10,000 visits to the Eynsham Abbey archive, and individual files, which include CAD and ArcView plans, were downloaded on up to 30 occasions. ADS has received enquiries from as far away as the United States, where a class from New York State University were engaged in post-excavation study of the development of Anglo-Saxon London through the DAPPER archive. Generally, however, it is difficult to assess how online archives are used, and download figures may indicate the novelty value of such resources for casual browsers rather than serious research. Web access statistics are notoriously difficult to interpret (Kilbride and Winters 2001).

In the case of research level archives it must also be recognised that we are really only making available the digital residues left over from the post-excavation analysis, reflecting the post-excavation practices of each archaeological contractor. Where digital dissemination can be planned into a project from the outset, then it is possible to produce a much more integrated and exciting product which may fully exploit the potential of the Internet for archaeological publication.

The ADS has developed the concept of the integrated archive, linking archive and publication and allowing users to pursue ideas found in the synthetic publication into the evidence which has been used to support these interpretations. In many cases the digital archive may simply provide a much more effective and accessible equivalent of microfiche, allowing dissemination of detailed tables, plans, photographs, and supporting text, normally judged too expensive to publish. The Online Archive provides a much more usable alternative to fiche, and may also allow authors to include far more supplementary material. The Fyfield and Overton Down Project was published at a number of levels. A popular book, The Land of Lettice Sweetapple, publishes the main results for the general reader (Fowler and Blackwell 1998). The Society of Antiquaries monograph (Fowler 2000) includes the archaeological evidence from the key sites for the academic reader, whilst in the project digital archive Peter Fowler makes available not only the text of four further monograph-length reports, but also some 100 Fyfield and Overton working papers, including specialist reports, draft texts, and background documentation. The traditional monograph includes the URL for the archive and basic instructions on how to use it.

Where the archive can be developed alongside the publication then it is possible to develop a more complete set of links, publishing URLs for specific files so that readers can more easily locate the precise information they require. It is planned that the forthcoming publication of a major site in London, No.1 Poultry, will explore methods of detailed linkage, as well as providing a database search mechanism for the archive and a clickable map-base to allow the user to explore the archive spatially.
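The idea of publishing URLs for specific files can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how any existing archive is organised: the base address, the reference labels, and the file paths are all hypothetical placeholders.

```python
from urllib.parse import quote, urljoin

# Hypothetical base address for a project archive (placeholder domain).
ARCHIVE_BASE = "https://archive.example.org/hypothetical-project/"

# Hypothetical lookup from a reference printed in the report
# to the file in the archive that supports it.
ARCHIVE_FILES = {
    "fig-12": "plans/phase2_plan.dxf",
    "table-4": "databases/animal_bone.csv",
    "appendix-b": "reports/pottery_specialist_report.pdf",
}


def archive_url(reference: str) -> str:
    """Return the full URL of the archive file behind a printed reference."""
    if reference not in ARCHIVE_FILES:
        raise KeyError(f"No archive file registered for {reference!r}")
    # quote() escapes unsafe characters but leaves '/' path separators intact.
    return urljoin(ARCHIVE_BASE, quote(ARCHIVE_FILES[reference]))


for ref in ARCHIVE_FILES:
    print(ref, "->", archive_url(ref))
```

Keeping the lookup in one place means that a printed report need only cite a short, stable reference, while the archive remains free to reorganise its files so long as the table is kept up to date.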

Where a site is also published electronically then the full potential of live hypertext links can be explored, allowing the user to move between publication and archive at will. A tentative and preliminary model of some of the possibilities is explored in the layered electronic publication of an Anglian and Anglo-Scandinavian farmstead at Cottam in Internet Archaeology (Richards 2001) and the simultaneous release of the archive on the ADS site. At the top level the electronic publication follows the familiar model of a traditional printed report through introduction, methodology, results and discussion. Within the electronic version, however, readers can follow hypertext links to pull up illustrative material such as plans and photographs, and can also read more detail of the archaeological findings. They can also search an online database of the finds from the excavation, field-walking, and metal-detecting. Furthermore, by clicking on further links they can move seamlessly into the archive, reading the specialist reports and the detailed stratigraphic evidence, and downloading context and finds databases, raw geophysics surveys, and CAD files.

Although Cottam is a relatively small site, the project demonstrates some of the potential of integrated Online Archives. This is not just a case of dumping raw undigested material on the web. It is essential that the electronic publication must still 'tell a story' and answer the research agenda set by the investigator. In the archive, however, resides the material necessary for the reader to explore alternative interpretations and multiple narratives. Nonetheless, we have to be aware that data in an archive are not value-free. Data are recorded observations and are theory-laden. Therefore the supporting documentation must include information not just about what was recorded, but how and why it was done. Traditional archives are generally very bad at this. We tend to be very good at filling in context sheets in meticulous detail, but may not include information about why a decision was taken to define a discrete context at that point, for example. What archaeologists at the time may have taken for granted may seem less obvious to someone trying to make sense of the data at a later date. If we are to create a culture of data re-use it will be necessary for archaeologists to learn to make the context of their observations explicit, or as Hodder (1989) has argued, to reintegrate description and interpretation. We may be able to democratise archaeological knowledge, but as Huggett (1995) observed, this will require clarification of issues of context, access, and ownership. We must also plan for the archive at the outset of a project, not just regard it as an afterthought once the final publication proofs have been checked, and we must train future generations of archaeologists to apply skills of source criticism in their use of archival sources. If the next generation of archaeologists are to be encouraged and trained to use online archives it is essential that their use is integrated within the University curriculum. 

With funding from the JISC the ADS has established the PATOIS project (Publications and Archives in Teaching: Online Information Sources) to develop four web-based tutorials based on its resources. Each tutorial aims to provide an introduction to using different aspects of archaeological or historical archives or electronic publications as part of the core syllabus taught by archaeology departments in the British Isles (Kilbride et al. 2002). However, where archives are re-used in a less-structured environment, outside the classroom, we are still very unclear about what users do with them. Research into Users and their Uses of HEIRs, conducted by the Cultural Heritage Consortium on behalf of HEIRNET, has indicated the need for a major assessment of user needs and activity, and for the development of evaluation and feedback methods, although it does reveal a significant preference for online access as opposed to CD-ROM (Cultural Heritage Consortium 2002, 21). Another conclusion of their report is that, although in order to develop an effective information resource it is necessary to establish who will be using it, and for what purpose, 'it is also clear that regardless of actual target audiences, online resources will always attract users from outside the target groups.' (Cultural Heritage Consortium 2002, 17)

