Internet Archaeol 2. Wise and Miller.

1. So what is metadata?

It's a funny word, metadata. All it means is 'data about data', or the information needed to communicate sensibly about information (in the same way that metalanguage is the discourse linguists use to communicate about language). Metadata is everywhere, but you probably don't recognise most of it as metadata. For example, if you locate a book in a library card catalog and write down its title and classmark (call number, for those of you who speak American English), you have just recorded two pieces of information about information. Voilà, metadata!

2. What does metadata allow you to do?

Metadata has three main purposes. First, it allows the nature of a body of information to be assessed without having to access the data themselves. For example, you might want to know what questions were asked in the 1991 UK census without actually looking at the millions of response forms. Second, it allows a user to locate a piece of information. Going back to the library card catalog example, the classmark allows potential readers to wander off and actually find the books that interest them, whereas without it they could conceivably wander aimlessly through the library forever. Third, it allows similar bodies of information to be grouped or linked together. The Dewey Decimal system is an example of a type of metadata that allows library books to be grouped by subject.

Thus, the information about information communicated through metadata is generally:

An important point to remember is that metadata actually does something as well. A person can follow a particular interest by using computers (like those hooked up to the Internet) to search for information automatically. Thus the clever thing about metadata is that people can understand it, and it can be expressed numerically so machines can understand it too.

3. Metadata and digital information

Whilst reading the previous sections of this document, you have perhaps been thinking that library card catalogs went out of vogue with the dinosaurs. These days many libraries have computerised catalogs of their holdings, and even of the holdings of other libraries. Have you ever wondered how those library catalogs interact? Well, the answer, in a sense, is metadata and, specifically, the MAchine Readable Cataloging (MARC) metadata scheme (Network Development and MARC Standards Office 1997), which has evolved into one of the most comprehensive and widely adopted metadata schemes worldwide.

By using similar cataloging terms from a scheme like MARC across library collections, it becomes relatively easy to make computer systems search more than one catalog in response to a user's query. The very cataloging terms used in the search are then a form of metadata: they give a basic description of the data, record its location, and allow similar information to be discovered. A catalog entry may vary in complexity from the equivalent of a library record identifying title, author, publisher, date of publication and shelving details, to a keyword-indexed abstract that enables a thorough search and an assessment of the results of that search.
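The idea of searching several catalogs through shared terms can be sketched in a few lines of Python. This is a minimal illustration only: the `CatalogRecord` fields, the `search_catalogs` function, and the two libraries with their holdings are all invented for the example, and real MARC records use numbered field tags and are far richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRecord:
    """One catalog entry described with a small set of shared field names."""
    title: str
    author: str
    publisher: str
    year: int
    shelfmark: str


def search_catalogs(catalogs, field, term):
    """Return (library, record) pairs whose `field` contains `term`, ignoring case."""
    needle = term.lower()
    hits = []
    for library, records in catalogs.items():
        for record in records:
            if needle in str(getattr(record, field)).lower():
                hits.append((library, record))
    return hits


# Invented holdings for two hypothetical libraries.
catalogs = {
    "Library A": [
        CatalogRecord("Mounds of Southern England", "A. Author", "Example Press", 1990, "936.2 AUT"),
        CatalogRecord("Roman Pottery", "B. Writer", "Sample Books", 1985, "936.4 WRI"),
    ],
    "Library B": [
        CatalogRecord("Prehistoric Mounds of the Mississippi", "C. Scholar", "Example Press", 1992, "973.1 SCH"),
    ],
}

# One query runs against every catalog because both use the same field names.
for library, record in search_catalogs(catalogs, "title", "mounds"):
    print(f"{library}: {record.title} ({record.shelfmark})")
```

The search only works across both collections because each library describes its books with the same field names; if one catalog called the field 'title' and the other 'name', the query would need translating first.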

Metadata can also document everything you need to know to decide if you can actually use a resource. For example, charges levied by a library or copyright restrictions imposed by a publisher can be two pieces of metadata that describe the book you are interested in, and which potentially affect whether or not you really want to spend time physically retrieving the book. The language of a resource is also an issue suitable for metadata cataloging; that book you have found may well be just what you need, but if it's in Portuguese and you only read English and Ki-Swahili, it's useless to you.

Our examples do not have to be limited to books, because metadata documentation works for complex digital data as well. Information about satellite images might include the date of collection, the type of sensors used, spatial coverage, amount of cloud cover, resolution, costs, and copyright information - in short, everything that a data user might wish/need to know in order to use the information contained within the dataset.

Behind all this apparently seamless information description and discovery lies a complex suite of technical problems, which include the speed, accuracy, precision, and completeness of the results. The answer to many of these technical problems is standardised metadata entries. If all pieces of information have an author or group of authors (in the sense that texts are written, photographs are taken, maps are digitised, and databases are constructed, usually by an identifiable person or people), and this metadata is presented in standardised, machine-searchable ways, everyone can find the information they want better and faster.

4. Who needs metadata?

Well, we all do. Anyone who makes it their business to produce, use, or keep track of information is aided in their Herculean task by metadata.

5. What are archaeological data?

The short answer would be that archaeological data are all available evidence about the past as well as all interpretations of that evidence. The long answer is that archaeological data consist of all sorts of information types: artifacts, ecofacts, features, the contextual relationships between artifacts/ecofacts/features, aerial photographs, cultural norms and values, digital photographs, near infra-red photographs, any other kind of photograph, CAD layers, GIS databases, drawings/plans/sections, maps (digital and paper), contour surveys, typologies, geophysics printouts and displays, sites, landscapes, recording forms, indexes, gazetteers, anecdotes, historical texts, grimy smudged field notebooks, PCR gels, spectrometer printouts, books, digital texts, web pages, electronic journals, email, paper correspondence... etc! That's enough for you to get the picture. Despite that huge, exhausting list, there are plenty of types of archaeological data not listed.

6. What is the biggest problem with archaeological data?

The greatest problem with all forms of archaeological data is that they are, by their nature, dispersed around the world, whether physically in situ (e.g. Stonehenge) or languishing inside archives, libraries, and museums. It is a daunting undertaking to embark upon a new piece of research, because one may end up having to travel over large distances to visit sites, museums, libraries, and archives just to discover what sorts of information already exist. This time has to be subtracted from the total available to study those holdings in any detail. The inaccessibility of archaeological data is also one of the foremost barriers to effective, creative, and accurate syntheses of data across regions and countries and time periods.

The key to faster, better discovery of archaeological information is metadata which can be quickly and thoroughly searched by computers and presented in an understandable form to users. Metadata can be used to summarize the content of archives, libraries, museums, and even publications, so you can scan their holdings relatively easily and either download the information directly or, at least, plan more carefully the itinerary you will need to gather the relevant pieces of information.

7. What problems do we have describing archaeological data?

To describe archaeological data adequately (with the goal of making it faster and easier for other people to discover it) we have to understand what people will want to know about it. What people will want to know determines what types of metadata will be important, and what people will want to know about the data will change as they become more familiar with what is available.

For example, say you were thinking about a nice place to do future fieldwork. For some reason, you decided that you weren't too particular about the type of archaeology in this new region, but that the average temperature during northern hemisphere summer needed to be around 25 °C and the local landscape needed to support vineyards. You would want a metadata index of site locations that led you quickly to information about the archaeology of Burgundy, France, or the Sonoma Valley in California.

To take a more serious example, say that future fieldwork was intended to fit in with your developed expertise in the prehistoric ritual mounds of southern England. You would want a metadata index that quickly took you to information about sites in other parts of Northwest Europe, the Americas, or anywhere which included prehistoric ritual mounds.

Now, in either of the above examples, after locating a region with information pertaining to your research interests/requirements you would want more detailed information. Perhaps you would like to know what types of artifacts to expect from mounds in Mississippi, or what satellite imagery was available for Burgundy. The metadata to tell you about this increased level of detail should still be general enough that you're not having to search each piece of data individually to find out if it's appropriate for you. In other words, once you know there's a library section on 'Archaeology' you shouldn't have to walk to each individual book, take it off the shelf, and glance through the table of contents to see if it's right for you.

The site or region's location is certainly not the only important starting point for most archaeologists. Also critical is the temporal affiliation of the site. Was it occupied in the Bronze Age, the Middle Woodland Period, the Minoan Period, the Tang Dynasty, or from 3000 BP? Metadata entries should cover all the basic information you need to decide if a resource is right for you.
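Searching on place and time together might look something like the following Python sketch. Everything in it is an assumption made for illustration: the `SiteRecord` fields, the site names, and especially the year ranges, which are rough placeholders rather than authoritative chronologies. Years are stored as signed calendar years (negative for BC), and a BP date is converted by counting back from 1950, ignoring the absence of a year zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteRecord:
    """High-level description of a site: where it is and when it was in use."""
    name: str
    region: str
    period: str
    start_year: int  # signed calendar year, negative = BC (placeholder values)
    end_year: int


def bp_to_year(years_bp):
    """Convert 'years before present' to a signed calendar year (present = 1950)."""
    return 1950 - years_bp


def find_sites(records, region=None, year_range=None):
    """Return records matching the region (if given) and overlapping the year range (if given)."""
    matches = []
    for record in records:
        if region is not None and record.region.lower() != region.lower():
            continue
        if year_range is not None:
            earliest, latest = year_range
            # Two ranges overlap unless one ends before the other starts.
            if record.end_year < earliest or record.start_year > latest:
                continue
        matches.append(record)
    return matches


# Invented records; the dates are illustrative placeholders only.
records = [
    SiteRecord("Example barrow", "Southern England", "Bronze Age", -2200, -800),
    SiteRecord("Example mound", "Mississippi", "Middle Woodland Period", -200, 500),
    SiteRecord("Example tomb", "Southern England", "Neolithic", -4000, -2200),
]

target = bp_to_year(3000)
for site in find_sites(records, year_range=(target, target)):
    print(f"{site.name} ({site.region}, {site.period})")
```

Storing a numeric range alongside the period name is a design choice: a label such as 'Bronze Age' means different dates in different regions, whereas numbers can be compared directly by a machine.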

At this most basic level (called, confusingly, 'high-level' metadata) you would find the following types of information about each piece of information:

This basic level of metadata is being defined by broad international groups working on what they term 'core description'. The metadata fields mentioned above seem to describe, at the most basic level, just about any type of information that a person might want to access.

Are you worried that this metadata stuff sounds very hierarchical? Well, in a way it is. The technology currently available to enable a variety of users to find different sorts of information using metadata requires these entries to be arranged hierarchically.

The good thing is that we don't have to stop at this basic level of description: we can have metadata indices which describe information in the ways that archaeologists need to have it described. This project, defining the archaeology-specific metadata, is underway right this very moment. These metadata discussions lurk under the most confusing variety of names: cataloging, data standards, metadata description, resource discovery and retrieval, and more!

8. Metadata of use to archaeologists

There are a large number of metadata initiatives of value to archaeologists, whether developed for a wider community or for archaeology specifically. These initiatives range from extremely detailed and specific metadata systems such as the Federal Geographic Data Committee's (FGDC) Content Guidelines for Digital Geospatial Metadata to the much simpler and more generalised Dublin Core. The range of initiatives, as well as the large number of ways in which each is used, makes it impractical to cover many in detail here. Instead, we shall outline a few of the most interesting or relevant.

8.1 The Dublin Core

Of these metadata systems, the Dublin Core (Miller 1996; Miller and Gill 1997; Weibel 1995) is by far the most user-friendly. It is also the most widely applicable, because its essential (or 'core') fields are designed to describe any source of digital information, including images, electronic texts, HTML pages, geophysical surveys, etc. Other metadata systems are designed especially for one subject or data type (e.g. library books or satellite photographs). There are currently 15 metadata fields in the Dublin Core, although the Core is still being developed and this number may be subject to slight changes during 1997 (see Weibel and Miller 1997 for the definitive list of elements at any time). Discipline-specific metadata fields can be added to the core fields through the Warwick Framework (Dempsey and Weibel 1996). The Dublin Core was originally created by those primarily interested in describing electronic texts and library collections, so the field headings are very text-based (e.g. 'author', 'publisher'). Archaeologists interested in the Dublin Core should therefore try to look past the field headings: instead of 'author', think 'creator', 'investigator', or 'excavator'.

The Dublin Core itself consists of fifteen core elements, each of which may be further extended by the use of SCHEME and TYPE qualifiers:

Element Name	Element Description
Title	The name of the resource
Author or Creator	The person(s) primarily responsible for the intellectual content of the resource
Subject and Keywords	The topic addressed by the resource being described
Description	A text-based description of the resource (e.g. an abstract)
Publisher	The agent or agency responsible for making the resource available in its current form
Other Contributors	The person(s), such as editors and transcribers, who have made other significant intellectual contributions to the resource
Date	The date of publication
Resource Type	The genre of the object, such as a novel, poem, or dictionary
Format	The data format in which the resource is available (e.g. Postscript, HTML, etc.)
Resource Identifier	String or number used to identify the resource uniquely
Relation	Relationship between this resource and other resources
Source Resources	Resources, either print or electronic, from which this resource is derived
Language	Language of the intellectual content
Coverage	The spatial location and temporal duration characteristic of the object
Rights Management	Who holds copyright on the material, which organisation distributes the material, and any restrictions on use of the data

Table 1: The fields of the Dublin Core Metadata Element Set

Tools are available to help in the creation of Dublin Core metadata. One example is the Dublin Core Metadata creator for web pages available through the Archaeology Data Service project metadata pages.
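To give a rough idea of what a creation tool of this kind does, here is a minimal Python sketch that turns a handful of element values into HTML `<meta>` tags. It is not the Archaeology Data Service tool: the function name `dublin_core_meta_tags` is invented, the `DC.` prefix on each name is an assumed naming convention, and the subject, language, and format values are purely illustrative.

```python
from html import escape


def dublin_core_meta_tags(record):
    """Render a mapping of element name -> value (or list of values) as <meta> tags."""
    lines = []
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]  # treat a single value like a one-item list
        for value in values:
            # escape() guards against quotes and ampersands breaking the attribute.
            lines.append(
                f'<meta name="DC.{escape(element, quote=True)}" '
                f'content="{escape(value, quote=True)}">'
            )
    return "\n".join(lines)


# Illustrative values; element names use the assumed "DC." convention.
record = {
    "title": "Why Metadata Matters in Archaeology",
    "creator": ["Wise", "Miller"],
    "subject": ["metadata", "archaeology"],
    "language": "en",
    "format": "HTML",
}

print(dublin_core_meta_tags(record))
```

Repeating a tag for each value, rather than packing several names into one `content` string, keeps every creator or keyword separately searchable.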

If you would like to see metadata right this very moment you can. Metadata is embedded in this web page, but it's invisible. You can see the Dublin Core metadata if you look at the source code for this web page (just pull down 'View' then 'Document Source' from the toolbar if you are using Netscape).
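A program can read the same hidden metadata. The sketch below uses Python's standard `html.parser` module to pull out any `<meta>` tags whose names begin with `DC.`; the sample page, the class name `DublinCoreExtractor`, and the `DC.` prefix convention are assumptions for illustration and may not match how any particular page is actually marked up.

```python
from html.parser import HTMLParser


class DublinCoreExtractor(HTMLParser):
    """Collect (element, value) pairs from <meta name="DC.xxx" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Keep only tags that follow the assumed "DC." naming convention.
        if name.lower().startswith("dc.") and content is not None:
            self.elements.append((name[3:], content))


# A made-up page: the metadata sits in the head and is never displayed.
page = """<html><head>
<title>Example page</title>
<meta name="DC.title" content="Example page">
<meta name="DC.creator" content="A. N. Author">
<meta name="keywords" content="not Dublin Core">
</head><body><p>Visible text.</p></body></html>"""

extractor = DublinCoreExtractor()
extractor.feed(page)
for element, value in extractor.elements:
    print(f"{element}: {value}")
```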

8.2 Federal Geographic Data Committee (FGDC) Digital Geospatial Metadata

The FGDC metadata system is much more complex than the Dublin Core. The FGDC is primarily concerned with spatial information (Federal Geographic Data Committee 1994), thus its metadata system contains fields to identify the dataset, data quality, data format, location, vector or raster nature of data, the coordinate system used to georeference the data, attribute information for spatial coordinates, how to cite the dataset, temporal coverage of dataset, the agency/individual which created the data, and who to contact for more information. FGDC metadata can be compiled and validated with tools currently available over the World Wide Web (Schweitzer 1997) and output can be created in SGML.

8.3 Global Change Master Directory (GCMD) Directory Interchange Format (DIF)

DIF is designed for the exchange of information relating to global environment change (Global Change Master Directory 1996). It is thus particularly useful for climatologists, oceanographers, palynologists, palaeogeographers, environmental archaeologists, etc. There are a total of 33 fields in the DIF system, but only 7 of these are required. Fields record information such as the project title, investigator's name, discipline (e.g. 'archaeology'), context of data, kinds of measurements included (e.g. ice cores), keywords, temporal and spatial coverages, resolution, quality, how to access information, restrictions on use of information, storage medium, bibliographic references, and a short text summary. Also recorded in the DIF system is information about the metadata itself. For example, the name of the person who authored the metadata, revision dates, and upcoming review dates can be incorporated.
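The split between required and optional fields lends itself to a simple automatic check, sketched here in Python. The seven names in `REQUIRED_FIELDS` are placeholders chosen from the kinds of information mentioned above; which seven fields DIF actually requires is not stated here, so treat the list, the function name, and the draft record as assumptions.

```python
# Placeholder names only: 7 of the 33 fields are required, but this
# particular selection is an assumption made for illustration.
REQUIRED_FIELDS = (
    "title",
    "investigator",
    "discipline",
    "keywords",
    "temporal_coverage",
    "spatial_coverage",
    "summary",
)


def missing_required(record, required=REQUIRED_FIELDS):
    """Return the required field names that are absent or blank in `record`."""
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# An invented, half-finished entry.
draft = {
    "title": "Example pollen core dataset",
    "investigator": "A. N. Investigator",
    "discipline": "archaeology",
    "keywords": "pollen, palynology",
    "summary": "   ",  # whitespace only, so it counts as missing
}

print(missing_required(draft))
```

A check like this can run every time an entry is saved, so incomplete records are caught before they are exchanged with anyone else.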

Why Metadata Matters in Archaeology