Internet Archaeol. 21. Stoyanovich. Schema Design Guidelines

6. Schema Design Guidelines

This section outlines some general guidelines for designing a faceted hierarchy for your own dataset. These guidelines were used by the archaeologists on our team while designing the Thulamela (Section 4) and Memphis hierarchies. As an example, let us consider how we would classify a collection that contains ancient and modern artifacts. Some of these artifacts are made of metal, and others are made of wood.

The main principle of faceted design that emerges immediately is that of orthogonality. In our example, there are three orthogonal aspects: age (ancient or modern), material (metal or wood) and function. The resulting faceted hierarchy appears as follows:

Figure 30 Schema design: proper orthogonal schema

If the principle of orthogonality is not observed, we may end up with a hierarchy that places one aspect, e.g. age, higher in the hierarchy, thus biasing the model: aspects that appear higher in the hierarchy implicitly carry more importance. A hierarchy that makes this (bad) choice would look like this:

Figure 31 Schema design: violation of the principle of orthogonality

From this simple example we observe that properly designed faceted hierarchies have two desirable properties:

No arbitrary bias.
Modest size, as measured by the number of classes. Note that the orthogonal hierarchy needs 10 classes to represent the dataset, while the non-orthogonal hierarchy requires a total of 15 classes to represent the same dataset. The size of the orthogonal hierarchy is linear in the number of aspects, while the size of the non-orthogonal hierarchy is exponential in the number of aspects. As the number of classification aspects increases, the difference in size between the two representations becomes even more striking.
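The size comparison can be checked with a short calculation. The sketch below is illustrative only: it assumes one counting convention (a single root class, one class per aspect, and two values per aspect) that happens to reproduce the counts of 10 and 15 quoted above. The function names are hypothetical and are not part of the system described in this article.

```python
def orthogonal_size(values_per_aspect):
    """Classes in an orthogonal hierarchy (assumed counting convention):
    one root, one class per aspect, one class per value of each aspect."""
    return 1 + len(values_per_aspect) + sum(values_per_aspect)


def nested_size(values_per_aspect):
    """Classes in a hierarchy that nests each aspect under the previous one:
    one root, then every level multiplies the level above it."""
    total, level = 1, 1
    for n in values_per_aspect:
        level *= n      # every class on the previous level is split n ways
        total += level
    return total


# Three aspects (age, material, function), two values each.
print(orthogonal_size([2, 2, 2]))  # 10
print(nested_size([2, 2, 2]))      # 15

# Five aspects with two values each: the gap widens quickly.
print(orthogonal_size([2] * 5))    # 16
print(nested_size([2] * 5))        # 63
```

Under this convention the orthogonal count grows by a fixed amount with each added aspect, while the nested count roughly doubles.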

The schema designer should use his judgement and domain knowledge when applying the principle of orthogonality, so as not to over-complicate the schema needlessly. Consider the classification of arrowheads in the Thulamela schema (Section 4): the class Arrowheads is a descendant of Metal Implements. Thulamela is a collection of Iron Age finds, where arrowheads made of materials other than metal are uncommon. The designer deliberately chose to position Arrowheads below Metal Implements, thus encoding a constraint that all arrowheads are also made of metal. If our schema were designed to accommodate both Stone Age and Iron Age finds, a better design decision would have been to create three classes: Metal Implements, Stone Implements and Arrowheads. Arrowheads made of metal would then be placed into the Metal Implements and Arrowheads classes, while arrowheads made of stone would be placed into the Stone Implements and Arrowheads classes.
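The alternative design, in which an arrowhead belongs to two independent classes at once, can be sketched as follows. This is a minimal illustration and not the system's actual data model: the find identifiers and the `members` helper are hypothetical.

```python
# Hypothetical finds; each is assigned to every class that applies to it.
FINDS = {
    "arrowhead-1": {"Metal Implements", "Arrowheads"},
    "arrowhead-2": {"Stone Implements", "Arrowheads"},
    "hoe-1": {"Metal Implements"},
}


def members(*classes):
    """Return the finds that belong to all of the given classes."""
    wanted = set(classes)
    return sorted(name for name, assigned in FINDS.items() if wanted <= assigned)


print(members("Arrowheads"))                      # ['arrowhead-1', 'arrowhead-2']
print(members("Arrowheads", "Stone Implements"))  # ['arrowhead-2']
print(members("Metal Implements"))                # ['arrowhead-1', 'hoe-1']
```

Because material and object type are kept independent, adding a further material class would not require any change to Arrowheads.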

The second important design principle is choosing the right level of granularity during facet creation. Our system automatically calculates which attributes are available for querying in different entity sets. For this calculation to give meaningful results, we must ensure that an object that is assigned to a class specifies values for all attributes of that class.

For example, consider the task of classifying ceramic vessels, some (but not all) of which are inscribed. Suppose also that we wish to store the inscription for the vessels that are inscribed. The way to classify our data in accordance with the principle of granularity is to create two classes, Ceramic Vessels and Objects with Inscription, and add the attribute inscription to the latter. We then place all ceramic vessels in the class Ceramic Vessels, and also place all vessels with an inscription in the class Objects with Inscription. An alternative, and incorrect, schema design would have been to add the attribute inscription directly to the class Ceramic Vessels, and leave the value unspecified for the vessels that do not have an inscription. There is an additional benefit to creating a class Objects with Inscription: we can deal with inscriptions on many kinds of artifact without having to create a separate class for each.
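The granularity rule can be expressed as a simple check: every object must specify a value for each attribute declared by the classes it is assigned to. The sketch below is an illustration under stated assumptions, not the system's implementation. The dictionary layout, the object identifiers and the helper names are all hypothetical, and the inscription text is a placeholder.

```python
# Hypothetical object records: the classes each object is assigned to and
# the attribute values it actually specifies.
OBJECTS = {
    "vessel-1": {"classes": {"Ceramic Vessels", "Objects with Inscription"},
                 "values": {"inscription": "(text of inscription)"}},
    "vessel-2": {"classes": {"Ceramic Vessels"}, "values": {}},
}

# Recommended design: the attribute lives on its own class.
GOOD_SCHEMA = {
    "Ceramic Vessels": set(),
    "Objects with Inscription": {"inscription"},
}

# Discouraged design: the attribute is attached to Ceramic Vessels directly.
BAD_SCHEMA = {
    "Ceramic Vessels": {"inscription"},
    "Objects with Inscription": set(),
}


def required_attributes(schema, classes):
    """Union of the attributes declared by the given classes."""
    required = set()
    for name in classes:
        required |= schema[name]
    return required


def unspecified(schema, objects):
    """Map each object id to the attributes its classes require but it lacks."""
    problems = {}
    for oid, record in objects.items():
        missing = required_attributes(schema, record["classes"]) - set(record["values"])
        if missing:
            problems[oid] = sorted(missing)
    return problems


print(unspecified(GOOD_SCHEMA, OBJECTS))  # {}
print(unspecified(BAD_SCHEMA, OBJECTS))   # {'vessel-2': ['inscription']}
```

With the recommended schema no object is left with an unspecified value; with the discouraged schema the uninscribed vessel violates the rule.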

Finally, the schema designer needs to be aware that, in the faceted data model, class names have global scope: once assigned, a name has semantic effect over the entire schema. Consider, for example, a dataset of finds that contains personal adornment plaques, as well as wall plaques used for interior decoration. A possible faceted schema, created in accordance with the principles of orthogonality and granularity, is presented below.

Figure 32 Schema design: entity set naming considerations

If the designer considers adornment plaques and decorative plaques to have more in common than just the name, he may choose to create a common class Plaque into which to place all plaques. He may then create two additional classes, and assign to them distinct names, for example: Adornment Plaque and Decorative Plaque. Even if no class Plaque existed in the schema, the classes Adornment Plaque and Decorative Plaque would still need to be named differently, to resolve ambiguity.
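The effect of global name scope can be illustrated with a small registry that refuses to register the same class name twice, wherever in the hierarchy it is placed. This is a sketch under assumptions, not the article's system: the `Schema` class is hypothetical, and the parent class names Personal Adornment and Interior Decoration are invented for the example.

```python
class Schema:
    """Toy class registry in which names are unique across the whole schema."""

    def __init__(self):
        self._parents = {}  # class name -> set of parent class names

    def add_class(self, name, parents=()):
        if name in self._parents:
            raise ValueError(f"class name already in use: {name!r}")
        unknown = [p for p in parents if p not in self._parents]
        if unknown:
            raise ValueError(f"unknown parent class(es): {unknown}")
        self._parents[name] = set(parents)

    def parents(self, name):
        return sorted(self._parents[name])


schema = Schema()
schema.add_class("Personal Adornment")   # hypothetical class name
schema.add_class("Interior Decoration")  # hypothetical class name
schema.add_class("Plaque")               # common class for all plaques

# Distinct names keep the two kinds of plaque unambiguous.
schema.add_class("Adornment Plaque", parents=["Plaque", "Personal Adornment"])
schema.add_class("Decorative Plaque", parents=["Plaque", "Interior Decoration"])

# Reusing a name elsewhere in the hierarchy is rejected.
try:
    schema.add_class("Plaque", parents=["Interior Decoration"])
except ValueError as err:
    print(err)  # class name already in use: 'Plaque'
```

Rejecting the duplicate at registration time is one way to force the designer to choose distinct names before any ambiguity can arise.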