3. Why Facet?

Data classification and analysis are central to archaeological research. Relational systems, the current de facto standard in data management, were developed over the past 30 years, primarily in response to the data management needs of the business community. Relational technology is mature: commercial systems are robust, reliable, and capable of storing and efficiently querying vast amounts of data. However, it is increasingly recognized that one size does not fit all: a model that is well suited to the data representation and analysis needs of one domain may not be as appropriate for another.

Consistently determining categorizations and sub-categorizations of finds, cultures, time periods, and other elements of cultural and archaeological context is an important yet non-trivial task. As soon as sub-categories are introduced into a classification, one is faced with hierarchical data organization. In this sense, archaeological data is inherently hierarchical.

Relational systems provide support for sophisticated data analysis, but the underlying model does not naturally support hierarchies. Archaeo-Browser, FacetMap, and Flamenco are recent systems that implement search over faceted hierarchies. However, these systems do not provide any data analysis capabilities beyond search. The goal of our work is to support both the construction of hierarchies using facets and sophisticated queries against such hierarchies. We elaborate on these points in the remainder of this section.

3.1 The Relational Model

The central concept of the relational model is that of a relation, or simply, a two-dimensional table. Relations can describe real-world entities (e.g. archaeological finds) or relationships between entities (e.g. finds and their contexts). Columns in relations represent attributes (e.g. find identifier, weight, state of preservation), while rows correspond to individual records (e.g. find 1, find 2, etc.).

Relational algebra defines ways to manipulate relations: it consists of operators that act on relations and produce other relations as a result. The Structured Query Language (SQL) is based on the relational algebra and includes operators such as selection, projection, and join, as well as some set operators. An important property of the relational algebra is compositionality: the output of an operator is always a relation, so it can be used as input to another operator.
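To make compositionality concrete, the following is a minimal sketch in Python, not SQL and not part of any system discussed here. It assumes a relation is modelled as a list of dictionaries (one per row); the function names, attribute names, and sample data are all invented for illustration.

```python
# Hypothetical model: a relation is a list of dicts, one dict per row.

def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Projection: keep only the named attributes, dropping duplicate rows."""
    result = []
    for row in relation:
        reduced = {name: row[name] for name in attributes}
        if reduced not in result:
            result.append(reduced)
    return result

def join(left, right):
    """Natural join: combine rows that agree on every shared attribute."""
    result = []
    for left_row in left:
        for right_row in right:
            shared = left_row.keys() & right_row.keys()
            if all(left_row[name] == right_row[name] for name in shared):
                result.append({**left_row, **right_row})
    return result

# Invented sample data: finds and the contexts they were recovered from.
finds = [
    {"find_id": 1, "weight_g": 120, "context_id": 10},
    {"find_id": 2, "weight_g": 45, "context_id": 11},
    {"find_id": 3, "weight_g": 300, "context_id": 10},
]
contexts = [
    {"context_id": 10, "period": "Bronze Age"},
    {"context_id": 11, "period": "Iron Age"},
]

# Each operator returns a relation, so the calls nest freely.
heavy_finds = project(
    select(join(finds, contexts), lambda row: row["weight_g"] > 100),
    ["find_id", "period"],
)
```

Because every operator both consumes and produces a list of rows, the join result feeds the selection, whose result feeds the projection, without any intermediate conversion.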

3.2 Relational Representation of Hierarchies

As mentioned earlier, archaeological domains are inherently hierarchical, which calls for support for hierarchies in the data model and query language. Relational tables are two-dimensional ("flat"), and there is no direct way of representing hierarchies, only ways to simulate them. Consider, for example, how one would represent the following artifacts relationally (attributes appear inside square brackets):

Figure 1: Simple hierarchy

Two reasonable ways to proceed are:
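As a hedged illustration of what simulating a hierarchy in flat tables can involve, the sketch below uses a parent-pointer table, in which each row stores a reference to its parent, together with a recursive query that walks the tree. This is one common technique and is not necessarily either of the approaches considered here; the table, column, and category names are invented and are not those of Figure 1. The example uses Python's standard sqlite3 module.

```python
import sqlite3

# In-memory database holding one hypothetical table: each category row
# points at its parent row; the root has no parent (NULL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES category(id)
);
INSERT INTO category VALUES
    (1, 'artifact', NULL),
    (2, 'pottery',  1),
    (3, 'tool',     1),
    (4, 'amphora',  2),
    (5, 'bowl',     2);
""")

def subtree_names(conn, root_name):
    """Return the names of root_name and all of its descendants, sorted."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id, name) AS (
            SELECT id, name FROM category WHERE name = ?
            UNION ALL
            SELECT c.id, c.name
            FROM category AS c
            JOIN subtree AS s ON c.parent_id = s.id
        )
        SELECT name FROM subtree ORDER BY name
        """,
        (root_name,),
    )
    return [name for (name,) in rows]

pottery_subtree = subtree_names(conn, "pottery")
```

Even for this small tree, retrieving everything under one category requires a recursive query, which suggests how much query-language knowledge the flat representation demands of the user.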

3.3 Relational Databases for Non-Technical Users

When choosing an appropriate data management technology for an application, one must consider not only the representational power of the model, but also the conceptual complexity that this power brings. The relational data model, despite its lack of direct support for hierarchies, is powerful enough to represent complex relationships and pose sophisticated queries (e.g. joins with cyclic hypergraphs) against the data. This complexity presents a problem not only in terms of efficient processing, but also, and perhaps more importantly for the application domain at hand, in terms of usability by non-technical users.

To utilize the full power of relational algebra and analyze (often hierarchical) archaeological data, users would need advanced knowledge of both the database schema and the SQL query language. We believe that catering only to SQL experts among archaeologists, and expecting our users to have intimate knowledge of the schema, is unrealistic and should, if possible, be avoided.

There are numerous application-specific systems that hide the complexity of SQL behind a carefully designed graphical user interface (e.g. an HTML form). These systems shield the user from direct interactions with the database engine, but also limit the user's data analysis capabilities. There are other general-purpose systems (e.g. Microsoft Access) that allow for interactions with the database engine through a user-friendly interface. However, even while interacting with the database through an interface, the user is still restricted to a fundamentally relational view of the data.

Our faceted data model and query language were designed with these issues in mind. We describe our approach in the following sections.


© Internet Archaeology URL:
Last updated: Mon April 30 2007