Internet Archaeol. 21. Stoyanovich. Introduction

1. Introduction

In this article we present the Faceted Query Engine, a system developed at Columbia University under the aegis of the interdisciplinary project Computational Tools for Modeling, Visualizing and Analyzing Historic and Archaeological Sites. Our system is based on novel database systems research that has been published in computer science venues (Ross and Janevski 2004; Ross et al. 2005). The goal of this article is to introduce our system to its intended users: the archaeology community.

Commercial database systems today are robust and scalable. Large quantities of data can be stored, accessed and updated in a distributed and concurrent manner. Despite these achievements, database systems remain difficult for untrained users. A scientist may know a great deal about his or her data without having the skills (or inclination) to pose queries in a database language such as SQL.

What are the alternatives? One can settle for very limited kinds of queries, such as queries based on keyword search in text documents. Alternatively, a handful of common and useful queries can be canned by the database administrator so that users can get access to these predefined queries. Yet such an approach leaves no room for a user to pose exploratory queries that could not be anticipated in advance.

In our implementation of the Faceted Query Engine we strike a compromise between the expressiveness of SQL and the limitations of simplistic query formalisms by identifying a class of queries that express the needs of the user without overwhelming the user with technical detail. Our query language is compositional: the results of one query can be reused directly as input to another, allowing the user to build sophisticated queries from simple parts.
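To make the idea of composition concrete, the following minimal Python sketch may help. It is not the Faceted Query Engine's actual interface: the `Find` record, its attribute names, the `select` function and the four sample records are all hypothetical and invented purely for illustration. The only point it shows is that a query returns a set of finds, which can in turn be queried again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Find:
    """Hypothetical record for one find; attribute names are assumptions."""
    find_id: int
    material: str
    category: str


def select(finds, **criteria):
    """Return the finds whose attributes equal every given criterion.

    The result is an ordinary list of finds, so it can be passed
    straight back into select() as the input of a further query.
    """
    return [
        f for f in finds
        if all(getattr(f, name) == value for name, value in criteria.items())
    ]


# Invented sample data, not taken from the Thulamela or Memphis datasets.
finds = [
    Find(1, "iron", "tool"),
    Find(2, "gold", "adornment"),
    Find(3, "iron", "weapon"),
    Find(4, "clay", "pottery"),
]

# First query: all iron finds.
iron = select(finds, material="iron")

# Second query, posed against the result of the first: iron weapons.
iron_weapons = select(iron, category="weapon")

# Composing the two steps gives the same answer as one combined query.
combined = select(finds, material="iron", category="weapon")
```

In this sketch, a user who has already narrowed the collection to iron finds can refine that result further without restating the earlier condition, which is the sense in which simple queries serve as building blocks for more sophisticated ones.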

We demonstrate the use of the Faceted Query Engine on a previously unpublished dataset: the Thulamela (South Africa) collection. This dataset comprises Iron Age finds from the Thulamela site in the Kruger National Park. Our project is the first to compile and classify this dataset systematically. We also use a larger dataset, a collection of ancient Egyptian artifacts from the Memphis site (Giddy 1999), to demonstrate some of the features of our system.

Our system and the Thulamela dataset are freely available for download: http://www.cs.columbia.edu/~kar/facet/

2. The Datasets

We work with two archaeological datasets, Memphis and Thulamela, and we present each of them in turn.

Thulamela is a collection of 92 Iron Age finds from the Kruger National Park site in South Africa. The finds include tools, weapons, ceremonial items, personal adornments, pottery, faunal remains and metallurgical products. Our system stores detailed information about these finds, including their physical properties, location, interpretation, and visual resources: photographs for all of the finds and QuickTime VR (QTVR) views for some of them.

Memphis is a rich collection of ancient Egyptian artifacts discovered during a six-season excavation of the settlement remains at Kom Rabi'a (Giddy 1999). It includes personal adornments and tools, pottery and other household items, architectural elements, statuary, musical instruments, game pieces and other finds. The dataset includes detailed information about the physical properties, location and interpretation of the finds, and about parallels between them. We classified and imported about 2500 artifacts, a large portion of the Memphis dataset, into our system.