Connecting Archaeological Data and Grey Literature via Semantic Cross Search

Douglas Tudhope1, Keith May2, Ceri Binding1 and Andreas Vlachidis1

1. Hypermedia Research Unit, University of Glamorgan, Pontypridd CF37 1DL, Wales, UK.
2. Strategic Digital Information Archaeologist, English Heritage

Cite this as: D. Tudhope et al. 2011 'Connecting Archaeological Data and Grey Literature via Semantic Cross Search', Internet Archaeology 30.


Differing terminology and database structures hinder meaningful cross search of excavation datasets. Matching free text grey literature reports with datasets poses yet more challenges. Conventional search techniques are unable to cross search between archaeological datasets and Web-based grey literature.

Results are reported from two AHRC-funded research projects that investigated the use of semantic techniques to link digital archive databases, vocabularies and associated grey literature. STAR (Semantic Technologies for Archaeological Resources) was a collaboration between the Hypermedia Research Unit at the University of Glamorgan and English Heritage (EH). The main outcome is a research Demonstrator (available online), which cross searches over excavation datasets from different database schemas, including Raunds Roman, Raunds Prehistoric, Museum of London, Silchester Roman and Stanwick sampling. The system additionally cross searches over an extract of excavation reports from the OASIS index of grey literature, operated by the Archaeology Data Service (ADS).

A conceptual framework provided by the CIDOC Conceptual Reference Model (CRM) integrates the different database structures and the metadata automatically generated from the OASIS reports by natural language processing techniques. The methods employed for extracting semantic RDF representations from the datasets and for extracting information from the grey literature are described. The STELLAR project provides freely available tools to reduce the costs of mapping and extracting data to semantic search systems such as the Demonstrator and to linked data representation generally. Detailed use scenarios (and a screen capture video) provide a basis for a discussion of key issues, including cost-benefits, ontology modelling, mapping, terminology control, semantic implementation and information extraction.

The scenarios show that semantic interoperability can be achieved by mapping and extracting different datasets and key concepts from OASIS reports to a central RDF-based triple store. It is not necessary to expose the full detail of the ontological model; the Demonstrator shows that user interfaces for retrieval (or mapping) systems can be expressed using familiar archaeological concepts. Working with the CRM-EH archaeological extension of the CIDOC CRM ontology allows specific archaeological queries, while permitting interoperability at the more general CRM level, potentially extending to other areas of cultural heritage. The ability to connect published datasets with the hitherto under-utilised grey literature holds potential for meta studies, where aggregate patterns can be compared and hypotheses for future detailed investigation uncovered. Connecting the interpretation with the underlying context data via the semantic model facilitates the revisiting of previous interpretations by third parties, the possibility of juxtaposing parallel interpretations, or exposing the data to new research questions.
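The idea of mapping differently structured datasets to one triple store, and then querying at either a specific or a more general class level, can be illustrated with a minimal sketch. It is not the Demonstrator's implementation: the `ex:` class and property names, the column names, the record values and the helper functions are all hypothetical, and a plain Python list stands in for an RDF triple store.

```python
# Illustrative sketch only. All identifiers below (ex:Find, ex:hasType,
# dataset_a, find_type, ...) are hypothetical; they are not taken from the
# STAR datasets, the CIDOC CRM or the CRM-EH extension.

# Hypothetical subclass links: specific archaeological class -> general class.
SUBCLASS_OF = {
    "ex:Find": "ex:PhysicalObject",
}

# Per-source mappings from local column names to shared properties.
MAPPINGS = {
    "dataset_a": {"find_type": "ex:hasType", "ctx": "ex:foundIn"},
    "dataset_b": {"ObjectName": "ex:hasType", "ContextNo": "ex:foundIn"},
}


def to_triples(source, record_id, record, rdf_class):
    """Convert one source record into (subject, predicate, object) triples."""
    subject = f"{source}/{record_id}"
    triples = [(subject, "rdf:type", rdf_class)]
    for column, value in record.items():
        prop = MAPPINGS[source].get(column)
        if prop is not None:  # unmapped columns are simply skipped
            triples.append((subject, prop, value))
    return triples


def is_a(cls, target):
    """True if cls equals target or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def search(store, target_class, prop, value):
    """Return subjects of the given class (or a subclass) with prop == value."""
    typed = {s for s, p, o in store if p == "rdf:type" and is_a(o, target_class)}
    return sorted(s for s, p, o in store if s in typed and p == prop and o == value)


# Two sources with different column names feed the same store.
store = []
store += to_triples("dataset_a", "1", {"find_type": "coin", "ctx": "c100"}, "ex:Find")
store += to_triples("dataset_b", "7", {"ObjectName": "coin", "ContextNo": "88"}, "ex:Find")
store += to_triples("dataset_b", "8", {"ObjectName": "brooch", "ContextNo": "88"}, "ex:Find")

specific = search(store, "ex:Find", "ex:hasType", "coin")
general = search(store, "ex:PhysicalObject", "ex:hasType", "coin")
print(specific)
print(general)
```

In this sketch the same query succeeds across both sources because their local columns were mapped to a shared property, and the query phrased against the general class still finds records typed with the more specific one.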

The STAR and STELLAR projects were supported by the Arts and Humanities Research Council [grant numbers AH/D001528/1, AH/H037357/1].




© Internet Archaeology/Author(s)
File last updated: Mon July 18 2011