Internet Archaeol. 21. Stoyanovich. Why Facet

3. Why Facet?

Data classification and analysis are central to archaeological research. Relational systems, the current de facto standard in data management, were developed over the past 30 years, primarily in response to the data management needs of the business community. Relational technology is mature: commercial systems are robust, reliable, and capable of storing and efficiently querying vast amounts of data. However, it is increasingly recognized that one size does not fit all: a model that is well suited to the data representation and analysis needs of one domain may not be as appropriate for another.

Analysis and classification are an integral part of archaeological research. Consistently determining categorizations and sub-categorizations of finds, cultures, time periods, and other elements of cultural and archaeological context is an important yet non-trivial task. As soon as sub-categories are introduced into a classification, one is faced with a hierarchical organization of the data. In this sense, archaeological data is inherently hierarchical.

Relational systems provide support for sophisticated data analysis, but the underlying model does not naturally support hierarchies. Archaeo-Browser, FacetMap, and Flamenco are some recent systems that implement search over faceted hierarchies. However, these systems do not provide any data analysis capabilities beyond search. The goal of our work is to support both construction of hierarchies using facets, and sophisticated queries against such hierarchies. We elaborate on these points in the remainder of this section.

3.1 The Relational Model

The central concept of the relational model is that of a relation, or simply, a two-dimensional table. Relations can describe real-world entities (e.g. archaeological finds) or relationships between entities (e.g. finds and their contexts). Columns in relations represent attributes (e.g. find identifier, weight, state of preservation), while rows correspond to individual records (e.g. find 1, find 2, etc.).
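
As a rough illustration of this vocabulary, the sketch below models a relation as a collection of rows that share the same named attributes. The relation name `Find`, its field names, and the sample values are hypothetical and chosen only to mirror the examples in the text. A Python list is used for readability, although a relation is strictly a set of rows.

```python
from typing import NamedTuple


class Find(NamedTuple):
    """One row of a hypothetical FIND relation; the field names are its attributes."""
    find_id: int
    weight_g: float
    preservation: str


# The relation: a collection of rows that all share the same attributes.
# The values below are invented for illustration.
finds = [
    Find(1, 120.5, "good"),
    Find(2, 48.0, "fragmentary"),
]

attributes = Find._fields   # the columns of the table
first_record = finds[0]     # one row, i.e. one individual find
```

Here `attributes` plays the role of the table's columns and each element of `finds` the role of a row.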

Relational algebra defines ways to manipulate relations: it consists of operators that act on relations and produce other relations as a result. The Structured Query Language (SQL) is based on the relational algebra, and contains operators such as selection, projection, and join, as well as some set operators. An important property of the relational algebra is compositionality: the output of an operator is always a relation, and it can be used as input to another operator.
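
Compositionality can be made concrete with a minimal sketch, assuming relations are represented as lists of dictionaries. The functions `select`, `project`, and `join` below are simplified stand-ins written for this illustration, not the operators of any particular database engine, and the sample data is invented.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Relation = List[Row]


def select(rel: Relation, pred: Callable[[Row], bool]) -> Relation:
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in rel if pred(row)]


def project(rel: Relation, attrs: List[str]) -> Relation:
    """Projection: keep only the named attributes, dropping duplicate rows."""
    seen = set()
    out: Relation = []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def join(left: Relation, right: Relation, attr: str) -> Relation:
    """Join: pair up rows of both relations that agree on one shared attribute."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


# Hypothetical sample data: finds and the contexts they were recovered from.
finds = [
    {"find_id": 1, "weight": 120.5, "context_id": 10},
    {"find_id": 2, "weight": 48.0, "context_id": 11},
    {"find_id": 3, "weight": 300.0, "context_id": 10},
]
contexts = [
    {"context_id": 10, "period": "Bronze Age"},
    {"context_id": 11, "period": "Iron Age"},
]

# Each operator returns a relation, so the calls nest freely.
heavy_finds = project(
    select(join(finds, contexts, "context_id"), lambda r: r["weight"] > 100),
    ["find_id", "period"],
)
```

Because every function takes relations and returns a relation, the join feeds the selection, which in turn feeds the projection, without any intermediate conversion.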

3.2 Relational Representation of Hierarchies

As mentioned earlier, archaeological domains are inherently hierarchical, which calls for support for hierarchies in the data model and query language. Relational tables are two-dimensional ("flat"), and there is no direct way of representing hierarchies, only ways to simulate them. Consider, for example, how one would represent the following artifacts relationally (attributes appear inside square brackets):

Figure 1: Simple hierarchy (image: pot_and_kiln)

Two reasonable ways to proceed are:

1. Create a single table, ARTIFACT, that stores the attributes id, type, volume and temperature, assigning values to the attribute volume if type=pot, and to the attribute temperature if type=kiln. The advantage of this approach is that all artifacts in the collection reside in a single table and can be retrieved by referring to that table alone (i.e. no reconstruction of records is needed). However, a serious drawback is that there is no simple way to enforce which attributes are valid for which type. Additionally, if artifacts of a new type were added to the collection, the table would have to be redefined to include columns for the attributes of the new type.
2. Create three tables, one for each class (ARTIFACT, POT and KILN), with the corresponding attributes. In this case, the model represents the data in a more intuitive manner and avoids the problems of the first option. However, the drawback is that a join of the ARTIFACT and POT tables on the id attribute is required in order to reconstruct the pot object (for example, for display purposes). The number of joins grows with the depth of the hierarchy, which presents a serious problem if the hierarchy is more than a few levels deep, as is often the case in real-life archaeological schemas.
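
The two options can be compared side by side in a minimal sketch, here using Python's built-in sqlite3 module with an in-memory database. The table name `artifact_wide`, the column types, the choice to keep `type` in the ARTIFACT table of the second design, and all sample values are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
cur = conn.cursor()

# Option 1: a single wide table; attributes that do not apply are left NULL.
cur.execute("""
    CREATE TABLE artifact_wide (
        id          INTEGER PRIMARY KEY,
        type        TEXT NOT NULL,
        volume      REAL,   -- intended only for type = 'pot'
        temperature REAL    -- intended only for type = 'kiln'
    )""")
cur.executemany(
    "INSERT INTO artifact_wide VALUES (?, ?, ?, ?)",
    [(1, "pot", 2.5, None), (2, "kiln", None, 900.0)],
)
# This plain definition also accepts a pot that carries a temperature.
cur.execute("INSERT INTO artifact_wide VALUES (3, 'pot', 1.0, 750.0)")

# Option 2: one table per class, linked through the shared id.
cur.execute("CREATE TABLE artifact (id INTEGER PRIMARY KEY, type TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE pot (id INTEGER PRIMARY KEY REFERENCES artifact(id), volume REAL)"
)
cur.execute(
    "CREATE TABLE kiln (id INTEGER PRIMARY KEY REFERENCES artifact(id), temperature REAL)"
)
cur.executemany("INSERT INTO artifact VALUES (?, ?)", [(1, "pot"), (2, "kiln")])
cur.execute("INSERT INTO pot VALUES (1, 2.5)")
cur.execute("INSERT INTO kiln VALUES (2, 900.0)")

# Option 1 reads pots from one table; option 2 needs a join to rebuild them.
wide_pots = cur.execute(
    "SELECT id, type, volume FROM artifact_wide WHERE type = 'pot' ORDER BY id"
).fetchall()
pots = cur.execute(
    "SELECT a.id, a.type, p.volume FROM artifact AS a JOIN pot AS p ON p.id = a.id"
).fetchall()
```

In the first design the pot with id 3 shows how a value can end up in a column that does not apply to its type. In the second design each additional level of the hierarchy would add one more join to the query that reconstructs a complete object.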

3.3 Relational Databases for Non-Technical Users

When choosing an appropriate data management technology for an application, one must consider not only the representational power of the model, but also the conceptual complexity that this power brings. The relational data model, despite its lack of direct support for hierarchies, is powerful enough to represent complex relationships and pose sophisticated queries (e.g. joins with cyclic hypergraphs) against the data. This complexity presents a problem not only in terms of efficient processing, but also, and perhaps more importantly for the application domain at hand, in terms of usability by non-technical users.

In order to utilize the full power of relational algebra and analyze (often hierarchical) archaeological data, users would need advanced knowledge of the database schema and of the SQL query language. We believe that catering only to SQL experts among archaeologists, and expecting our users to have intimate knowledge of the schema, is unrealistic and should, if possible, be avoided.

There are numerous application-specific systems that hide the complexity of SQL behind a carefully designed graphical user interface (e.g. an HTML form). These systems shield the user from direct interactions with the database engine, but also limit the user's data analysis capabilities. There are other general-purpose systems (e.g. Microsoft Access) that allow for interactions with the database engine through a user-friendly interface. However, even while interacting with the database through an interface, the user is still restricted to a fundamentally relational view of the data.

Our faceted data model and query language were designed with these issues in mind. We describe our approach in the following sections.