4.6 General evaluation

Having discussed technical implementation and NLP issues, we turn to broader evaluation and cost-benefit considerations. The Demonstrator shows that semantic cross search across datasets from different organisations and across grey literature is achievable. As a basic point of comparison, the methods have delivered a solution not previously possible using the various research or commercially operated recording database systems encountered during the course of the project. This cross search capability between different database systems and grey literature also serves to demonstrate potential technological methods for bridging the research outputs of the two cultures of practice, academic and commercial, highlighted by Bradley (2006).

The system is a research demonstrator rather than an operational system. If future resources permitted and a larger range of data sources were included, a user study could investigate a range of pragmatic issues in an operational setting, with users performing realistic tasks. As discussed in section 4.4, different user interfaces may suit different contexts. However, we can also consider whether the objectives might have been achieved in a simpler way. The CRM-EH is a fairly complex ontology. Could there be simpler alternatives?


We might, for example, envisage searching a free text index derived from the different databases, perhaps enhanced by some of the terminological support discussed in section 4.3. However, even if terminological differences were overcome, such a system would still be subject to the data schema mismatch problems discussed in the Introduction. A large number of false matches would be likely, and the problem would be compounded when cross searching for combinations of concepts.

We could consider simply mapping data elements to a flat list of upper concepts. This might yield the equivalent of a fielded database search. However, lacking the connecting properties and events of the CRM, it would not easily permit different relationships between time periods and finds, say, or allow other classifications or interpretations to be provided by different agents. Indeed, the effort that resulted in the CIDOC CRM had its origins in a simpler relational CIDOC data model for museum information interchange. In 1996, the CIDOC Documentation Standards Working Group decided to investigate an object-oriented approach 'in order to benefit from its expressive power and extensibility for dealing with the necessary diversity and complexity of data structures in the domain' (CIDOC CRM SIG), resulting in the first complete version of the CIDOC CRM in 1999.
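The limitation of a flat mapping can be illustrated with a minimal sketch. The record contents, field names and function names below are hypothetical and are not drawn from the project datasets or from the CRM; the sketch only contrasts a fielded search, where each field is tested independently, with a search over entities that each carry their own dating.

```python
# Hypothetical site record flattened to upper concepts: it mentions a
# Roman pit and a medieval coin, but the links between them are lost.
flat_record = {
    "period": {"Roman", "Medieval"},
    "find_type": {"coin"},
    "feature_type": {"pit"},
}


def flat_match(record, period, find_type):
    """Fielded search: each field is tested independently of the others."""
    return period in record["period"] and find_type in record["find_type"]


# The same (hypothetical) information, keeping the dating attached to
# the entity it actually describes.
linked_record = [
    {"kind": "feature", "type": "pit", "period": "Roman"},
    {"kind": "find", "type": "coin", "period": "Medieval"},
]


def linked_match(entities, period, find_type):
    """Search that requires the period to belong to the find itself."""
    return any(
        e["kind"] == "find" and e["type"] == find_type and e["period"] == period
        for e in entities
    )


# A query for "Roman coin" is a false match on the flat record,
# but is correctly rejected when the relationships are retained.
flat_hit = flat_match(flat_record, "Roman", "coin")
linked_hit = linked_match(linked_record, "Roman", "coin")
```

In this toy case the flat record matches a query for a Roman coin although the only coin is medieval. The connecting properties and events of the CRM address this kind of mismatch in a far more general way than the simple per-entity dating assumed here.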

It is possible to imagine a simpler specialisation of the CRM than that offered by CRM-EH (perhaps just the basic CRM), by modelling only major finds and monuments (or dwellings), omitting the context data and much of the grouping/phasing information. For example, this might go straight to the main finds with their dating and interpretations of human activity and settlements, omitting the complexity of the underlying data. Do the benefits outweigh the costs inherent in the additional complexity of modelling and extracting archaeological excavation data at the context level?

The issue of how much specificity to model and provide in a scholarly search system at a particular time involves a triangulation between judgements of the likelihood of sufficient data being made available, user (re)search interest at this granularity of detail and the ability to provide technical solutions.

Recent developments in semantic technologies suggest that technical solutions may be possible; some performance issues remain to be explored but we may expect that progress can be made. Particularly important is the emergence of standards for the representation of vocabularies, conceptual models and also data, as emphasised by Richards and Hardman (2008). Standards are a prerequisite for interoperability.

The intention of (loosely) basing the Demonstrator scenarios on a published academic article is to make a case that the scenarios correspond to plausible and realistic research interests, assuming that sufficient data were available. Generalising beyond the immediate scenarios in the current research demonstrator, the issue converges with broader digital archaeology concerns and the move to make primary research data in eScience available more widely.

Part of the cost-benefit equation is whether a critical mass of data could be made available. There needs to be a sufficient body of data for cross search to be useful. For grey literature, it might be argued that a critical mass already exists and constitutes a valuable resource in the aggregate. Here the issue is whether robust NLP methods can be developed to extract rich metadata at an appropriate level of detail. On the excavation data side, there will inevitably be overheads in extracting data at a detailed conceptual level. The STELLAR tools aim to facilitate this process to some extent.

To motivate detailed data extraction, however, benefits must be anticipated. Although the particular technological context is new, the issue has been a recurring one for digital archaeology. A tradition dating back to Pitt Rivers has seen archaeological publication as ideally combining the interpretation with the underlying data. Various efforts in the last few decades have attempted to balance the pragmatic difficulties of operating with limited resources against the desire to publish as much data as possible (Richards and Hardman 2008, who also advocate the potential of detailed semantic mapping). Current computing technologies offer new possibilities, as explored in the issues of this journal. For example, Williams (2008) discusses various issues for computer supported interpretation, based on a detailed case study of Sultan Kala (Merv). Semantic technologies offer another reason for undertaking the additional work: the potential for meaningful cross search at a detailed level, as the scenarios discussed in section 3.1 have attempted to illustrate over a combination of excavation datasets and an extract from the archaeological grey literature.

This is a strong argument for modelling at the level of detail made possible by the CRM-EH. It makes possible detailed search over the semantic relationships, as suggested by the stratigraphic scenarios. Looking to the future, it opens the prospect, for example, of being able to query whether certain types of archaeological feature often appear stratigraphically above (or below) other associated contexts (or are related by physical relationships if we extend the model). In addition to meta research possibilities, understanding and identifying such patterns in the stratigraphic sequence would help confirm the nature of uncertain contexts, by comparison with similar sequences on other sites.
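As a rough illustration of the kind of pattern query envisaged here, the following sketch counts how often one feature type lies directly above another. The context identifiers, feature types and the `above` relation are hypothetical examples, not data from the project, and only direct stratigraphic relationships are considered rather than the full sequence.

```python
from collections import Counter

# Hypothetical contexts and their interpreted feature types.
context_type = {
    "c1": "floor",
    "c2": "pit",
    "c3": "ditch",
    "c4": "floor",
    "c5": "pit",
}

# Hypothetical direct stratigraphic relationships as (upper, lower) pairs:
# the first context is stratigraphically above the second.
above = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]


def type_pair_counts(types, relations):
    """Count (upper type, lower type) pairs over direct relationships."""
    counts = Counter()
    for upper, lower in relations:
        counts[(types[upper], types[lower])] += 1
    return counts


counts = type_pair_counts(context_type, above)
# counts[("floor", "pit")] gives the number of floors lying directly over pits.
```

Over a large body of aligned data drawn from several sites, frequent pairs of this kind might suggest a likely interpretation for an uncertain context, which is the comparison with similar sequences described above.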

Connecting the interpretation with the underlying context data via the semantic model facilitates critical (re)interpretation by third parties, the possibility of juxtaposing parallel interpretations, or exposing the data to new research questions. This is one of the aims of the Çatalhöyük excavation database. The database does support various elements discussed in sections 4.1 and 4.3, including physical relationships and typologies combined with detailed free text interpretive notes. However, it does not support the explicit modelling of the assignment of types and interpretations (nor search over relationship patterns, as discussed above). The aims of projects such as Çatalhöyük could be further advanced by extending STAR's NLP processing to encompass free text interpretive comments and diaries, in order to suggest links via the semantic typologies.

Since the CRM-EH is an event based ontology, it allows multiple type assignments or dating of a specific find, or multiple interpretations and alternate groupings for a set of contexts. It is possible with the current model to record via a type assignment event that the allocation of controlled types was an intellectual semantic alignment by a project team member (as discussed in section 4.3) and to give the date. The potential (if desired) to model explicitly the event of assigning a classification (or date or grouping) with the date and person associated with that event facilitates some key reflexive recording practices (Hodder 1999) and provides an opportunity for holding multiple interpretations.
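The event based approach can be sketched in a few lines. This is an illustrative data structure only: the class name, field names, find identifier, types, agents and dates are all hypothetical and do not reproduce the CRM-EH classes or any project data. The point is that each classification is stored as its own event, with an agent and a date, so that a later assignment sits alongside an earlier one rather than overwriting it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TypeAssignment:
    """Hypothetical record of one act of classifying a find."""
    find_id: str
    assigned_type: str
    agent: str
    assigned_on: date


# Two hypothetical, competing classifications of the same find.
assignments = [
    TypeAssignment("find-001", "buckle", "project team member", date(2010, 3, 2)),
    TypeAssignment("find-001", "brooch", "excavator", date(1998, 7, 14)),
]


def interpretations(find_id, events):
    """Return every type assignment for a find, oldest first."""
    matching = (e for e in events if e.find_id == find_id)
    return sorted(matching, key=lambda e: e.assigned_on)


history = interpretations("find-001", assignments)
```

Because no assignment replaces another, a query can return the full history of interpretations for a find, or be restricted to those made by a particular agent or after a particular date.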


© Internet Archaeology/Author(s)
File last updated: Mon July 18 2011