4.5 Information extraction evaluation issues

A large-scale evaluation of the information extraction performance of an enhanced version of the pipeline is currently underway. As with most automatic indexing, there tends to be a trade-off between recall and precision, and some false positives are likely unless results are subject to expert human inspection. Initial analysis from the evaluation pilot suggests that recall and precision rates adequate for operational use are achieved. The enhancements (not reflected in the current demonstrator) include improved negation detection, which is intended to handle false positives such as the well examples. Other enhancements include improved identification of rich index phrases, informed by a bottom-up analysis of the corpus.
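The trade-off described above can be made concrete with a minimal sketch of how precision and recall might be computed against a manually annotated gold standard. The annotation tuples, report identifiers and function names below are hypothetical and do not come from the evaluation itself. The weighted F-measure is a standard way of summarising the two rates and is included only as an illustration, not as the measure used in the pilot.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted annotations against a gold standard."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta > 1 favours recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical annotations: (report id, entity type, matched phrase).
gold = {
    ("r1", "context", "pit"),
    ("r1", "find", "pottery"),
    ("r1", "period", "Roman"),
    ("r2", "context", "ditch"),
    ("r2", "find", "brick"),
}
extracted = {
    ("r1", "context", "pit"),
    ("r1", "find", "pottery"),
    ("r1", "context", "well"),  # hypothetical false positive
    ("r2", "context", "ditch"),
}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f_beta(p, r):.2f}")
```

In this toy case three of the four extracted annotations are correct and three of the five gold annotations are found, so precision is higher than recall.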

It should be remembered that excavation reports (especially summaries) tend to be at a higher level of generality than excavation datasets. Consequently, the information extraction work has in some cases been based on a slightly higher-level interpretation of the CRM-EH model than that used for the datasets. For example, group and context have been taken as a combined entity for NLP purposes. A tentative finding from the evaluation pilot is that it might also be helpful not to make a modelling distinction between material and object entities for NLP indexing and retrieval purposes (e.g. brick or pottery as material vs object). Another consequence of the broader generality is that CRM-EH events connecting ontology classes, such as Context Find Deposition, can be taken as implicit in the reports (see the example in Figure 3). The event is not mentioned explicitly but can be inferred, bearing in mind that the text is known to be an archaeological report.
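As a rough illustration of the generalisation described above, the following sketch collapses fine-grained entity types into combined ones and then infers an implicit deposition event wherever a find and a context occur in the same phrase. All type labels, class names and the pairing rule are assumptions made for this example; they are not the official CRM-EH identifiers or the rules used in the pipeline.

```python
from dataclasses import dataclass

# Hypothetical labels for illustration only; not official CRM-EH identifiers.
MERGED_TYPES = {
    "Group": "ContextOrGroup",
    "Context": "ContextOrGroup",
    "Material": "FindOrMaterial",
    "Object": "FindOrMaterial",
}


@dataclass(frozen=True)
class Mention:
    text: str
    entity_type: str


def generalise(mention):
    """Map a fine-grained entity type onto its combined NLP-level type."""
    merged = MERGED_TYPES.get(mention.entity_type, mention.entity_type)
    return Mention(mention.text, merged)


def implied_depositions(mentions):
    """Pair each find with each context in a phrase as an implicit deposition event."""
    merged = [generalise(m) for m in mentions]
    finds = [m for m in merged if m.entity_type == "FindOrMaterial"]
    contexts = [m for m in merged if m.entity_type == "ContextOrGroup"]
    return [(f.text, "ContextFindDeposition", c.text) for f in finds for c in contexts]


# Mentions as they might be annotated in a phrase such as "pottery from the pit".
phrase = [Mention("pottery", "Material"), Mention("pit", "Context")]
events = implied_depositions(phrase)
print(events)
```

Pairing every find with every context in a phrase is deliberately naive; a real rule would need to take syntax and negation into account.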

While some of the information extraction pipeline components are domain specific, much of the pipeline should be applicable to related areas where the CIDOC CRM is an appropriate upper ontology. Work is underway to generalise the methods so that they apply to other subject areas within the cultural heritage domain. Future work includes extending coverage to the full OASIS grey literature library and including further elements of the model, such as place names. On the modelling and visualisation side, further consideration will be given to how best to indicate the provenance of RDF statements resulting from NLP methods, which inevitably need to be viewed with more caution than the results of data extraction. Developments in 'named graphs' with RDF may be helpful here. Another issue is the appropriate balance between recall and precision, although to some extent this can be configured when setting up a retrieval system. It might, for example, be considered desirable to favour recall in a broad research study looking to identify every possible occurrence of a conceptual pattern, for subsequent intellectual examination.
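One way the named-graph idea might work is sketched below: each statement carries a fourth element naming the graph it belongs to, so that statements derived by NLP can be kept apart from those derived from data extraction and treated with more caution. The graph names, URIs and predicate are hypothetical, and the sketch uses plain Python rather than any particular RDF library. The serialisation follows the N-Quads line layout of subject, predicate, object and graph label.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "subject predicate obj graph")

# Hypothetical URIs for illustration only.
EX = "http://example.org/"
DATA_GRAPH = EX + "graph/data-extraction"
NLP_GRAPH = EX + "graph/nlp"

quads = [
    Quad(EX + "find1", EX + "depositedIn", EX + "context7", DATA_GRAPH),
    Quad(EX + "find2", EX + "depositedIn", EX + "context9", NLP_GRAPH),
]


def statements_from(all_quads, graph):
    """Select only the statements asserted in the given named graph."""
    return [q for q in all_quads if q.graph == graph]


def to_nquad(quad):
    """Serialise one statement as an N-Quads line (all four terms are IRIs here)."""
    return " ".join(f"<{term}>" for term in quad) + " ."


# A retrieval system could, for instance, flag or down-weight NLP-derived statements.
nlp_statements = statements_from(quads, NLP_GRAPH)
for q in nlp_statements:
    print(to_nquad(q))
```

Keeping provenance at the level of the graph rather than the individual statement means that a whole batch of NLP output can be included, excluded or visually distinguished in one step.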


© Internet Archaeology/Author(s)
File last updated: Mon July 18 2011