
The Needle in the Haystack

As has frequently been noted, finding exactly what you want on the Web is not always straightforward. Searching on a couple of keywords may well yield several thousand 'hits', most of which will be of little or no interest to you. What is needed is a more reliable and efficient way of pinpointing the documents in which you are interested. Metadata can help a suitably designed search engine to determine which documents are genuinely relevant to your query. Paul Miller has some useful insights into the value of metadata on the Web [Ariadne 5, 'Metadata for the Masses']:

"In such an environment, there is an obvious requirement for metadata, but this metadata must be of a form suitable for interpretation both by the search engines and by human beings, and it must also be simple to create so that any web page author may easily describe the contents of their page and make it immediately both more accessible and more useful. As such, compromises must be made in order to provide as much useful information as possible to the searcher while leaving the technique simple enough to be used by the maximum number of people with the minimum degree of inconvenience."

He goes on briefly to mention a variety of metadata approaches, followed by a more detailed discussion of the Dublin Core. This is a widely used document metadata format which can specify information such as author, title, subject, publisher, data format, language and spatial and temporal coverage. Dublin Core metadata is a multi-disciplinary standard and is not designed to include information specific to archaeology, though there are standards for the representation of archaeological information within Dublin Core.
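As an illustration of the kind of record described above, the following is a minimal sketch that renders a handful of Dublin Core elements as HTML meta tags, using the common "DC." prefix convention for embedding Dublin Core in a web page. The record values, the `record` dictionary and the `dublin_core_meta` helper are all hypothetical placeholders invented for this example; note that Dublin Core names the author element `creator`.

```python
from html import escape

# Hypothetical record: every value here is a placeholder, not real data.
# The keys are Dublin Core element names ("creator" is the author element).
record = {
    "title": "Example excavation report",
    "creator": "A. N. Author",
    "subject": "archaeology; excavation",
    "publisher": "Example Press",
    "format": "text/html",
    "language": "en",
    "coverage": "Example region; Iron Age to medieval",
}


def dublin_core_meta(record):
    """Render each element as an HTML meta tag with a 'DC.' name prefix."""
    return [
        f'<meta name="DC.{escape(name, quote=True)}" '
        f'content="{escape(value, quote=True)}">'
        for name, value in record.items()
    ]


meta_tags = dublin_core_meta(record)
for tag in meta_tags:
    print(tag)
```

A record like this captures bibliographic information about a report, but it has no obvious place for the features found on the site itself.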

In addition to searching for a report based on bibliographic information, it would also be valuable to search for archaeological site reports based on the nature and contents of the site. For example, a researcher may be interested in finding all sites which contain both Iron Age defensive structures and medieval churches. A keyword list cannot answer this query exactly (it may, for example, return every site with defensive structures and churches regardless of dating), nor can a full-text search (which may well yield sites adjacent to churches, or sites previously presumed to include a church of which no trace was found). What is needed to resolve these problems is a more precise way to 'highlight' and describe the salient features of a site. This is not, perhaps, metadata in the true sense of the word (i.e. it does not describe the data in the report); we suggest that 'structured abstract' or 'structured site description' (SSD) would be a more appropriate term.
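The difference between a keyword list and a structured description can be sketched in a few lines. The example below is only an illustration under stated assumptions: the `Feature` and `SiteDescription` types, the two sample sites and both matching functions are hypothetical, and are not the data structure proposed in this article. The point is simply that pairing each feature with its period lets the Iron Age defences and medieval church query be answered exactly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feature:
    kind: str    # e.g. "defensive structure", "church"
    period: str  # e.g. "Iron Age", "medieval"


@dataclass
class SiteDescription:
    name: str
    keywords: set[str]                       # flat keyword list
    features: list[Feature] = field(default_factory=list)  # structured part


def keyword_match(site, wanted):
    """True if every wanted keyword appears somewhere in the keyword list."""
    return wanted <= site.keywords


def structured_match(site, wanted):
    """True if every wanted (kind, period) pairing is recorded for the site."""
    return all(feature in site.features for feature in wanted)


# Hypothetical sample sites. Both carry the same four keywords, but only
# Site A pairs the defensive structure with the Iron Age.
SITES = [
    SiteDescription(
        name="Site A",
        keywords={"defensive structure", "church", "Iron Age", "medieval"},
        features=[
            Feature("defensive structure", "Iron Age"),
            Feature("church", "medieval"),
        ],
    ),
    SiteDescription(
        name="Site B",
        keywords={"defensive structure", "church", "Iron Age", "medieval"},
        features=[
            Feature("defensive structure", "medieval"),
            Feature("church", "medieval"),
            Feature("pit", "Iron Age"),
        ],
    ),
]

KEYWORDS = {"defensive structure", "church", "Iron Age", "medieval"}
QUERY = [
    Feature("defensive structure", "Iron Age"),
    Feature("church", "medieval"),
]

keyword_hits = [s.name for s in SITES if keyword_match(s, KEYWORDS)]
structured_hits = [s.name for s in SITES if structured_match(s, QUERY)]

print(keyword_hits)     # both sites match on keywords alone
print(structured_hits)  # only Site A matches the structured query
```

Here the keyword search cannot tell that the Iron Age keyword on Site B belongs to a pit rather than to the defences, whereas the structured query rejects it.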

We should stress that SSDs are not a substitute for comprehensive database systems recording individual finds and contexts. Instead, they represent a compromise between depth of detail on the one hand, and succinctness on the other. We believe it is a compromise worth making. The decision on what level of detail to include is down to the report author, and our proposed data structure does not prevent the inclusion of a very full description of the contents of a site.



© Internet Archaeology URL: http://intarch.ac.uk/journal/issue7/gray/gray2.html
Last updated: Mon Sept 6 1999