4 Problems in Compiling Archives

Several common problems arose during the compilation of the archives used for the ARENA project; these were discussed at various meetings and presented in a workshop at the Computer Applications in Archaeology (CAA) conference in 2003 (Kenny and Kilbride 2003). Long-term data collection was one problem: the data were held in several different software types and formats and had been collected by different people who archived in different ways. For example, the Dankirke and Vorbasse archives were derived from long-term excavations, and the plans had been digitised in early vector-based programmes. Fortunately it was possible to convert the digitised files into a standard vector format. Archives that were collected over a long period were also fragmented and had storage problems or inconsistencies in documentation. The Hofstaðir archive was collected over ten years; some of the data had relatively good documentation in a digital format, but in some instances the digital form had to be cross-referenced with the original research archive to obtain the correct information.

Many of the archives were never meant to be presented online. In a number of cases archaeologists had the foresight to produce digital data archives but had not considered all the issues connected with access and preservation. The Newham archive, part of which is now hosted and presented by the ADS, was deposited for emergency retrieval and archive preparation after the Newham Museum Archaeological Service was closed down in 1998. The archive arrived in the form of 230 floppy discs holding over 6,000 files amounting to 130MB, much of which could not be recovered. Similarly, the Hofstaðir archive maintains the archaeological methods of retrieval and post-excavation in its presentation. Conveniently, the first archaeological work by the Institute of Archaeology in Iceland in 1992, a topographic survey over the long-house at Hofstaðir using a dumpy level, was transferred into a digital form, which enabled the data to be interpolated in 2002 to produce a surface model. The systematic collection of data and its transfer into digital format have been a consistent theme in the project since 1992.

Data migration, the transfer of data from one (sometimes proprietary) software format and version to another, is a problem for long-term projects (e.g. Hofstaðir) and also for digital archivists confronted by archives that have been stored and left untouched for many years (e.g. Newham). For those archives whose data had not been migrated to new formats and versions, online presentation was a substantial problem. Not only does unmigrated data lengthen the process of preparing data for archiving and presentation, it also increases the risk of losing data. The intention of the ARENA project was therefore to ensure that good data management practice extended to all the partners and to the archives that they were preparing and presenting. For example, the Ager Tarraconensis archive represents data from a survey conducted between 1985 and 1990 in the territory of Tarragona in Spain. The archive was presented to the ADS with database information and a set of plans showing the transects and fields that were walked. The ARENA presentation allows navigation via the scans of the plans, with some hyperlinks to the database information giving detail about the survey in any particular field. The original Ager Tarraconensis transect and plan data were in a proprietary format that is no longer supported and were unusable. The files had to be re-created by scanning the publication proofs (thankfully still available). The preparation of this archive is an excellent example of the problems that obsolete software formats pose for both long-term 'live' and 'static' archives, and of the vulnerability of digital archives in general.

Dynamic or 'live' archives present a challenge. Core data should be presented, but the data associated with the publication process are updated in order to give the user correct and up-to-date information. In the case of the Hofstaðir archive, much of the material available for download is being worked on for the monograph publication of the site. As a result, a publication area was established that allowed snippets of the publication and the post-excavation processes to be given to users. Once the post-excavation process is completed for an area or structure, the material is uploaded to the publication area. This is an intermittent process, but one that will gain momentum prior to the publication in early 2006.

Assessing how the ARENA archives are used by the communities that engage with them is difficult but important. Such assessment should help determine how the archives are presented and the amount of work that goes into making them presentable. Each partner presented their archives at a level that would encourage the user to access and download data. Creating an interface between the research archive and the publication is one method that meets an expected usage for dynamic archives. For those excavations already published it was important to create a link between the publication and the online archive, giving access either to more information or to new data beyond the publication; this was done for the Kowalewko and Danebury archives.


