1. Introduction

The advantages of making research data available have become widely recognised in recent years. In archaeology, however, this aim has been difficult to achieve at a detailed data level. Repositories of curated excavation datasets are emerging, such as Arachne in Germany, the Archaeology Data Service in the UK, DANS e-depot for Dutch Archaeology, and tDAR in the USA, but cross search across different organisational data structures remains difficult. Increasingly, such digital libraries include the 'grey literature' of excavation reports not formally published. However, these reports are not meaningfully connected with other online data.

Meaningful cross search is elusive. Problems of semantic interoperability include differing terminology and database schemas, while excavation methodologies and recording practices may also differ. Different archaeological teams may use different terms to mean the same thing and, conversely, the same term may be used for different things. In addition, database structure varies and similar entities may not have the same names and field structure, making like-for-like searches difficult.

There is a need to map an element in one schema to a corresponding element in another schema, in order to compare the same underlying data items. Relying on string comparison of field labels is insufficient because field labels suffer from the same terminology problem. Indeed, a data element in one schema may correspond to a combination of elements in another schema. Matching free text grey literature reports with datasets poses even greater challenges.

For example, a grey literature report, or an excavation database, may refer to postholes, while another excavation database may refer to post-holes. One database may refer to 100-200AD while another refers to Roman Period. One excavation database's finds dimension table might contain columns headed diameter, width and length, while another might contain a name column with diameter, width and length as possible values. Archaeological databases also vary in how they represent stratigraphic and other relationships.
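The three mismatches just described can be made concrete with a minimal Python sketch. It is an illustration only, not the method used by the projects reported here: the function names, the `find_id` field, the term lookup and the period boundaries (AD 43 to 410 for the Roman period in Britain) are all assumptions introduced for the example.

```python
import re

# Hypothetical lookup of preferred terms; a real system would draw on a thesaurus.
PREFERRED_TERMS = {"posthole": "posthole"}

# Illustrative period boundaries in years AD (an assumption, not from the article).
PERIODS = {"Roman Period": (43, 410)}


def normalise_term(term):
    """Reduce spelling variants such as 'post-holes' to one comparison key."""
    key = re.sub(r"[\s\-]+", "", term.strip().lower())
    # Strip a plural 's' only when the singular is a known preferred term.
    if key.endswith("s") and key[:-1] in PREFERRED_TERMS:
        key = key[:-1]
    return PREFERRED_TERMS.get(key, key)


def periods_for(text):
    """Return period names matching a period label or an 'N-MAD' date range."""
    if text in PERIODS:
        return [text]
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*AD\s*", text)
    if match is None:
        return []
    start, end = int(match.group(1)), int(match.group(2))
    # A range matches every period it overlaps.
    return [name for name, (lo, hi) in PERIODS.items() if start <= hi and end >= lo]


def wide_to_long(row, measures=("diameter", "width", "length")):
    """Convert one row with a column per dimension into name/value records."""
    return [
        {"find_id": row["find_id"], "name": m, "value": row[m]}
        for m in measures
        if row.get(m) is not None
    ]


if __name__ == "__main__":
    print(normalise_term("postholes") == normalise_term("post-holes"))
    print(periods_for("100-200AD"))
    print(wide_to_long({"find_id": 7, "diameter": 12.0, "width": None, "length": 30.5}))
```

Once both databases are reduced to the same terms, periods and name/value layout, like can be compared with like. The hand-built lookups stand in for the shared terminology resources and conceptual framework that a real integration would need.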

Conventional search techniques are unable to cross search different archaeological databases, even when these are made available on the web. Similarly, conventional techniques are unable to cross search between archaeological datasets and web-based grey literature. This article discusses how semantic techniques can offer solutions and reports results from two projects (STAR and STELLAR) that have addressed these issues.

The remainder of this section introduces the research projects, gives some background and outlines the high-level conceptual framework (an archaeological extension of the CIDOC CRM ontology) used to integrate the various components. Section 2 outlines the various methods and terminology resources employed to create the semantic database (a triple store), automatically extracting information from excavation datasets and also grey literature reports. Section 3 describes a research Demonstrator, which cross searches five different databases and an extract of excavation reports from the OASIS index of grey literature, provided by the Archaeology Data Service (ADS). The Demonstrator is available online. Detailed use scenarios (and a screen capture video) provide a basis for the discussion of key issues in section 4, including an examination of cost-benefits. Conclusions are presented in section 5.


© Internet Archaeology/Author(s)
File last updated: Mon July 18 2011