4. The Faceted Data Model

Faceted classification was introduced by the Indian librarian and classificationist S.R. Ranganathan, who first used it in his Colon Classification in the early 1930s. Faceted classification treats entities or groups of entities as collections of clearly defined, mutually exclusive, and collectively exhaustive aspects, properties, or characteristics. Such aspects, properties, or characteristics are called facets (Wynar 1992).

For example, in an archaeology domain, we may classify artifacts along multiple orthogonal dimensions: by time period, by culture, by material, by geographic location, etc. Each of these dimensions is a facet. Recent work (FacetMap; Flamenco) describes search facilities for hierarchical data using faceted hierarchies. We extend this idea in Ross and Janevski (2004), where we develop a data model and a query language appropriate for such domains. Our primary goal was to create a data model and a query language that can be understood by non-technical users, while still allowing the composition of complex queries from simple pieces.

Our data model is particularly well suited for application domains that require modelling of complex entities within classification hierarchies. For many of these domains, the hierarchy and the entity structure are the main sources of complexity, while other features of the domain, such as relationships between entities, are relatively simple. In such domains it is natural to make the set of entities the basic conceptual structure.

Entity sets – possibly heterogeneous collections of entities – are organised into an inheritance hierarchy, with multiple inheritance (two or more parent entity sets) allowed. Entity sets inherit attributes, and may define new ones. We refer to entity sets that have objects stored in them as classes. Every entity assigned to a class must specify a value for each attribute of that class.
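The inheritance rules described above can be illustrated with a minimal Python sketch. It is not the implementation from Ross and Janevski (2004); the class EntitySet, its methods, and all entity set and attribute names are hypothetical, chosen only to show how attributes accumulate along multiple parents and how a class can check that an entity supplies a value for every attribute.

```python
from dataclasses import dataclass, field


@dataclass
class EntitySet:
    """One node of the hierarchy. All names used here are hypothetical."""
    name: str
    parents: list = field(default_factory=list)        # parent EntitySet objects
    own_attributes: set = field(default_factory=set)   # attributes defined at this node

    def attributes(self) -> set:
        """Return this set's own attributes plus those inherited from every parent."""
        result = set(self.own_attributes)
        for parent in self.parents:
            result |= parent.attributes()
        return result

    def missing_attributes(self, entity: dict) -> set:
        """Attributes of this class for which the entity gives no value."""
        return self.attributes() - set(entity)


# Hypothetical schema fragment with multiple inheritance (two parents).
artifacts = EntitySet("Artifacts", own_attributes={"site"})
metal = EntitySet("Metal Objects", [artifacts], {"alloy"})
ornaments = EntitySet("Ornaments", [artifacts], {"worn_on"})
metal_ornaments = EntitySet("Metal Ornaments", [metal, ornaments], {"weight_g"})

# An entity is modelled as a plain attribute-to-value mapping (an assumption).
bracelet = {"site": "Site A", "alloy": "gold", "worn_on": "wrist"}
print(sorted(metal_ornaments.attributes()))
print(metal_ornaments.missing_attributes(bracelet))
```

In this sketch the bracelet cannot yet be stored in Metal Ornaments, because it gives no value for the attribute weight_g that the class defines in addition to the three it inherits.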

The complete hierarchy (schema) of the Thulamela dataset is presented in Figure 2. This hierarchy was created by the archaeologists on our team, with the help of schema design guidelines for faceted hierarchies (see Section 6).

Figure 2: Faceted hierarchy for the Thulamela dataset

So far, our data model appears very similar to another well-known modelling paradigm: the object-oriented model. There, real-world entities are represented as objects; objects have attributes that describe their composition (what the object is), and methods that describe their behaviour (what the object does). This model supports inheritance hierarchies, but these hierarchies are monolithic rather than faceted. We demonstrate the difference between faceted and monolithic hierarchies with the following example.

Suppose that we are tasked with classifying a collection of archaeological finds that contains Ceremonial Entities: entities used for ceremonial purposes. Suppose that the collection also contains Metal Implements, some of which were used for ceremonial purposes. To classify such entities appropriately in an object-oriented schema, one needs to create a class Ceremonial Metal Implements that inherits from both Ceremonial Entities and Metal Implements, and place these entities into that class. In the object-oriented model, all valid combinations of classes must be enumerated explicitly during schema creation, and thus have to be anticipated. The faceted model has no such requirement: all combinations of classes are valid by default. So, to place a Ceremonial Metal Implement in the hierarchy appropriately, we simply add it to both relevant entity sets.
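The faceted side of this example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the query language of Ross and Janevski (2004): the class FacetedStore, its methods, and the find identifiers are all hypothetical. An entity is simply added to every entity set that applies, and the combination is recovered at query time by intersecting the sets, so no Ceremonial Metal Implements class is ever declared.

```python
from collections import defaultdict


class FacetedStore:
    """Hypothetical store mapping entity set names to their member entities."""

    def __init__(self):
        self._members = defaultdict(set)

    def add(self, entity_id, *entity_sets):
        """Place one entity in every entity set that applies to it."""
        for name in entity_sets:
            self._members[name].add(entity_id)

    def in_all(self, *entity_sets):
        """Return the entities that belong to all of the named entity sets."""
        groups = [self._members.get(name, set()) for name in entity_sets]
        return set.intersection(*groups) if groups else set()


store = FacetedStore()
store.add("find-001", "Ceremonial Entities")
store.add("find-002", "Metal Implements")
# A ceremonial metal implement goes into both sets; no combined class is needed.
store.add("find-003", "Ceremonial Entities", "Metal Implements")

ceremonial_metal = store.in_all("Ceremonial Entities", "Metal Implements")
print(ceremonial_metal)
```

In an object-oriented schema the same query would require that a class inheriting from both parents had been declared before find-003 was recorded.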

Because valid class combinations are not enumerated in advance, but are composed on the fly from orthogonal dimensions, the schema designer is not forced to make decisions about which classes to place higher or lower in the hierarchy. A class placed higher implicitly carries more importance, and if this decision is made arbitrarily (as it would be with orthogonal dimensions), unnecessary bias is introduced during schema creation. Section 6 outlines the guidelines for the design of faceted schemas, and elaborates on this point further.

Based on these considerations, faceted hierarchies are best suited for domains that can be faithfully described using orthogonal, uncorrelated dimensions. Because all combinations of facets are valid by default, any invalid combinations must be explicitly excluded from the schema. In this sense, faceted hierarchies are the dual of monolithic hierarchies: in faceted hierarchies invalid combinations of properties must be explicitly excluded, while in monolithic (e.g. object-oriented) hierarchies valid combinations must be explicitly enumerated.
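The duality can be made concrete with a small Python sketch of the faceted side. It assumes, purely for illustration, that exclusions are declared between pairs of entity sets; the text does not specify how exclusions are expressed, and the class FacetedSchema, its methods, and the Stone Tools entity set are hypothetical.

```python
from itertools import combinations


class FacetedSchema:
    """Hypothetical schema: every combination is valid unless excluded."""

    def __init__(self, entity_sets):
        self.entity_sets = set(entity_sets)
        self.excluded = set()  # each exclusion is a frozenset of two set names

    def exclude(self, first, second):
        """Declare that no entity may belong to both entity sets."""
        self.excluded.add(frozenset((first, second)))

    def is_valid(self, assigned):
        """Check a proposed assignment of one entity to several entity sets."""
        assigned = set(assigned)
        if not assigned <= self.entity_sets:
            return False  # refers to an entity set the schema does not define
        return not any(
            frozenset(pair) in self.excluded
            for pair in combinations(assigned, 2)
        )


schema = FacetedSchema(["Ceremonial Entities", "Metal Implements", "Stone Tools"])
# Assumed exclusion: an implement is not both a metal implement and a stone tool.
schema.exclude("Metal Implements", "Stone Tools")

print(schema.is_valid(["Ceremonial Entities", "Metal Implements"]))
print(schema.is_valid(["Metal Implements", "Stone Tools"]))
```

A monolithic schema would invert this check: it would hold a list of permitted combinations and reject any assignment not found in that list.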


© Internet Archaeology URL:
Last updated: Mon April 30 2007