
Section 5: Towards a 'Real World' Application

5.2 The markup of grey literature reports

There are a number of practical considerations to be addressed if the discipline of archaeology is to implement a strategy for grey literature report markup. These include who is to carry out the encoding, when this encoding should be applied and what level of encoding would be appropriate. The following summary discusses these issues.

5.2.1 Who should encode?

The question of who, or perhaps what, should encode is debatable; different sectors of the profession may have different ideas on the subject. From the author's own experiences and those of other projects (see 3.7), it is evident that human input is required to evaluate the original text before applying markup, and that it would be difficult for a computer alone to perform such a task. It has not been within the scope of this study to investigate mechanical or automated means of inserting encoding, nor to evaluate the range of XML editors currently available. However, it is possible that software could be developed and used to assist in the markup process. Walsh (2001), for example, has developed a Java program to perform markup, but has found that the results have to be edited manually to ensure accuracy.

A variety of organisations are presently involved in handling and curating grey literature and there are options as to who might apply encoding, separate from the question of who might then disseminate it. Bosak (2001), for example, sees that it is authors themselves who should apply markup. The author of a report is the person most familiar with the report content and structure and will be able to add in, or amend, data if omissions become apparent during markup. For the OASIS Project, this is the approach being taken; it is the organisations undertaking archaeological projects that are being targeted to complete the data collection forms. An alternative approach, however, would be for the time taken to complete an OASIS form to be redirected towards encoding the report itself. This could add value to the whole document, not just the content relating to the OASIS dataset (see 3.2). Potentially, this would enable not only an OASIS record to be populated from encoded data within the report, but also other datasets. As noted elsewhere in this section, the development of a standardised approach to markup, and a template for so doing, alongside use of an XML editor would greatly facilitate such a process, as would the availability of appropriate import and export scripts.
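As an illustration of the kind of export script referred to above, the following is a minimal sketch in Python of how a summary record might be populated from content-level markup. It is not the OASIS data model or the author's own scripts: the sample report, the type values ("site", "monument", "period") and the record field names are all assumptions made for the example, and the markup is a simplified, TEI-like fragment without namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of an encoded report, invented for illustration.
# The type values ("site", "monument", "period") are assumptions, not an
# agreed archaeological tag set.
REPORT_XML = """
<TEI.2>
  <text>
    <body>
      <div type="summary">
        <p>An evaluation at <name type="site">Example Farm</name>,
        <placeName>Exampleshire</placeName>, took place in
        <date value="2003-05">May 2003</date>. A
        <term type="monument">ditch</term> and a
        <term type="monument">pit</term> of
        <term type="period">Roman</term> date were recorded.</p>
      </div>
    </body>
  </text>
</TEI.2>
"""


def clean(element):
    """Return an element's text with runs of whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def terms_of_type(root, wanted):
    """List the text of every <term> whose type attribute matches."""
    return [clean(t) for t in root.iter("term") if t.get("type") == wanted]


def build_record(xml_string):
    """Collect tagged content into a flat record (field names are hypothetical)."""
    root = ET.fromstring(xml_string)
    site = root.find(".//name[@type='site']")
    place = root.find(".//placeName")
    date = root.find(".//date")
    return {
        "site_name": clean(site) if site is not None else None,
        "place": clean(place) if place is not None else None,
        # Read the normalised value attribute rather than the free-text date.
        "date": date.get("value") if date is not None else None,
        "monument_types": terms_of_type(root, "monument"),
        "periods": terms_of_type(root, "period"),
    }


record = build_record(REPORT_XML)
print(record)
```

The same encoded report could feed several such functions, one per target dataset, which is the sense in which a single act of encoding might populate more than one record.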

However, Bosak (2001) notes that the application of consistent markup may impose on authors a mode of working they may find uncongenial. The pressures of producing archaeological reports to tight deadlines with even tighter budgets may also prove restrictive. Wolle (2002) further notes that whilst an electronic publication is relatively easy to create for those with the skills and knowledge to do so, it still requires far greater effort and expertise to prepare than a conventional hard-copy report.

Meckseper (2001) found that archaeological contractors are reluctant to shift from traditional to electronic publication, even though most are aware of the potential benefits of doing so. Some saw it as not their responsibility, but as that of the national or local HER (see Meckseper 2001: 2.5.4). There are already backlogs of data awaiting entry into local HERs and, in the author's experience, it is not feasible at present for HERs to apply encoding without the provision of significant additional resources. Nevertheless, there is strong potential for HER staff to assist in the process of facilitating access to, and dissemination of, digital reports, for example through the hyperlinking of digital documents to online HERs (see 2.1.2 and 2.3.2). Some may see encoding as relevant only at the end of the process of report creation, rather than at the beginning, and look to a digital archive repository to facilitate its application. However, in this instance, similar resourcing issues to those referred to above for HERs will apply.

5.2.2 When to encode?

To a large extent, the question of when to encode is allied to that of who should encode. Encoding new reports at source, as they are written, is the most cost-effective method and has advantages for all subsequent users. Electronic publication and archiving must be considered and planned from the outset of any project, and not regarded merely as an afterthought.

Encoding can be applied successfully by someone remote from the creation of the original document, years after a report has been written; indeed, this has been the situation for the present case study. The majority of other examples of encoding reviewed by the author have also been retrospective (see 3.7). However, the benefits of encoding so long after the original archaeological event are reduced. Encoding at source can aid the population and currency of heritage databases such as OASIS, AIP Project, NMR and HER records, thereby reducing duplication of effort (see 4.4.3.2).

In terms of the substantial back catalogue of the thousands of grey literature reports already deposited within local HERs, many of which may no longer be available in electronic format, it would take significant expenditure of resources to create encoded, digital XML documents from them, and this would need to be justified in terms of reuse potential (Richards and Robinson 2000). For this backlog, it would seem that the most practical approach would be to identify which reports do survive in electronic format, ensure these are appropriately archived and curated, for example in the ADS Library of Unpublished Fieldwork Reports, and link them digitally to existing online resources. Worcestershire County Council, for example, is seeking to obtain Adobe PDF files from all their reports and link these digitally to the online version of the NMR Excavations Index (J.D. Richards, pers. comm. June 2004; Atkin 2002). This approach accords with that promoted by the new Environmental Information Regulations 2004 (see 2.3.2).

5.2.3 What to encode?

The more markup added to a document, the wider use that can potentially be made of it. However, whilst anything and everything within a report may be encoded, the level of markup needs to be realistic in terms of the resources required to apply it. The markup should also meet a nationally agreed set of aims and standards, and be justifiable in terms of the uses to which the encoded data will be put.

On the basis of the results of the present case-study, the XML version of the TEI Guidelines and the associated TEI DTD could be used successfully for archaeological purposes. There is also scope for the development of new tag sets specific to archaeology and archives within new versions of TEI Guidelines (see 3.3). To achieve similar report transformations to those demonstrated in Section 4, both basic structure and content would need to be encoded, along with the TEI Header (see 4.3.6). The TEI Header could be used, as in the Oxford Text Archive, for the creation of a reports catalogue, as well as a source of structured metadata and key project information (Morrison et al. 2002). Ultimately the question of what to encode will relate to the needs of users. For document content, a minimum standard relevant to OASIS Project and HER records could be encoded, in a similar way to the approach adopted by the author (see 4.4.3.2). In future, should additional needs be identified, additional markup could be applied – for example, there is potential for export of data for interchange with libraries, museums and archives. Particular specialisms may also want to highlight particular content through encoding (see 4.2.2 and 4.3.4). For the benefits of markup to be maximised, especially for the purposes of interoperability, an agreed and consistent approach should be taken using controlled vocabularies and agreed metadata (Butler 2001).
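To illustrate how TEI Headers might serve as the basis of a reports catalogue, the following Python sketch reads a small set of headers and builds a sorted listing. It is offered as an assumption-laden example rather than a description of the Oxford Text Archive's practice: the two headers are invented and much simplified, the scheme name is hypothetical, and the catalogue field names are chosen only for the example.

```python
import xml.etree.ElementTree as ET

# Two hypothetical, much-simplified headers; titles, names and dates are invented.
HEADERS = [
    """<teiHeader>
      <fileDesc>
        <titleStmt><title>Watching Brief at Sample Lane</title><author>B. Writer</author></titleStmt>
        <publicationStmt><publisher>Example Archaeology Unit</publisher><date>2004</date></publicationStmt>
        <sourceDesc><p>Born-digital report</p></sourceDesc>
      </fileDesc>
      <profileDesc><textClass>
        <keywords scheme="hypothetical-thesaurus"><term>watching brief</term><term>medieval</term></keywords>
      </textClass></profileDesc>
    </teiHeader>""",
    """<teiHeader>
      <fileDesc>
        <titleStmt><title>Evaluation at Example Farm</title><author>A. Author</author></titleStmt>
        <publicationStmt><publisher>Example Archaeology Unit</publisher><date>2003</date></publicationStmt>
        <sourceDesc><p>Born-digital report</p></sourceDesc>
      </fileDesc>
      <profileDesc><textClass>
        <keywords scheme="hypothetical-thesaurus"><term>evaluation</term><term>Roman</term></keywords>
      </textClass></profileDesc>
    </teiHeader>""",
]


def catalogue_entry(header_xml):
    """Reduce one header to a flat catalogue entry (field names are hypothetical)."""
    header = ET.fromstring(header_xml)
    return {
        "title": header.findtext("fileDesc/titleStmt/title"),
        "author": header.findtext("fileDesc/titleStmt/author"),
        "publisher": header.findtext("fileDesc/publicationStmt/publisher"),
        "year": header.findtext("fileDesc/publicationStmt/date"),
        "keywords": [term.text for term in header.iter("term")],
    }


# Order the catalogue by year, then by title.
catalogue = sorted(
    (catalogue_entry(h) for h in HEADERS),
    key=lambda entry: (entry["year"], entry["title"]),
)

for entry in catalogue:
    keywords = ", ".join(entry["keywords"])
    print(f'{entry["year"]}  {entry["title"]} ({entry["author"]}): {keywords}')
```

Because the header is held within each report, a catalogue of this kind could be regenerated whenever reports are added or amended, rather than being maintained as a separate dataset.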

The approach taken by the University of Virginia Electronic Text Center in providing guidelines for their in-house users of the TEI Guidelines would provide an excellent model for the development of a user guide for archaeological markup (Seaman 1995). It may also be possible to develop an XML template for use when authoring reports that would greatly facilitate the process, as identified above. Whilst the present article focuses on archaeological grey literature, encoding could also be applied more widely to digital versions of archaeological reports and articles in general.
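By way of illustration only, an authoring template of the kind suggested above might be generated along the following lines. This Python sketch assumes a hypothetical list of report sections and placeholder header values; a working template would need to follow whatever guidelines and tag sets were nationally agreed, and to validate against the chosen DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical section list; a real template would follow agreed guidelines.
SECTIONS = ["summary", "introduction", "methodology", "results", "conclusions"]


def make_template(sections):
    """Build a skeleton report with an empty header and one div per section."""
    root = ET.Element("TEI.2")

    # Header with placeholder values for the author to overwrite.
    file_desc = ET.SubElement(ET.SubElement(root, "teiHeader"), "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = "REPORT TITLE"
    ET.SubElement(title_stmt, "author").text = "AUTHOR"
    publication = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(publication, "p").text = "PUBLICATION DETAILS"
    source = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source, "p").text = "SOURCE DETAILS"

    # Body with one typed division, heading and empty paragraph per section.
    body = ET.SubElement(ET.SubElement(root, "text"), "body")
    for section in sections:
        div = ET.SubElement(body, "div", {"type": section})
        ET.SubElement(div, "head").text = section.capitalize()
        ET.SubElement(div, "p")

    return ET.tostring(root, encoding="unicode")


template = make_template(SECTIONS)
print(template)
```

An author would then open the resulting file in an XML editor and write into the empty divisions, so that the basic structure is encoded from the outset.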

5.2.4 Costs of markup

No exact figure can be given for the actual cost of encoding an archaeological report. This will depend upon a variety of factors, principally the time required to apply the encoding, which will be influenced by the prior knowledge, experience and training of the encoder, the level of detail to which the text is to be encoded and the length of the report.

Bosak (2001) has found that the application of markup adds considerably to the costs of writing a text, in addition to the costs of training and tools. Livingood (1996) also found that the preparation of an electronic document took significant investment in equipment, expertise and time. However, the process of applying XML encoding is not dissimilar to that of applying (X)HTML markup, and can be done with a text editor designed for such a purpose to speed up the process (see 3.3.2). Within archaeological practice, the Archaeology Data Service's Frequently Asked Questions about digital archiving note that the creation of a long-term digital archive of an archaeological project should cost between two and three per cent of the total project budget. If electronic publication were to become a requirement of national and local planning policy, report encoding could be funded legitimately as part of the overall project costs borne by a developer.

The costs of marking up an original report could, potentially, be recouped in the long term by the repurposing of encoded data and the related saving of time, effort and resources further down the line; that is, by reducing the need for the manual re-keying of data for entry into, for example, OASIS Project and HER records, bibliographic records and project summaries.



© Internet Archaeology URL: http://intarch.ac.uk/journal/issue17/5/gf5-2to5-2-4.html
Last updated: Wed Apr 6 2005