4.2 The nature of dissemination

Despite the clear biases displayed, this is not to say that synthesis and interpretation of the data produced through commercial (and indeed other) sources of investigation are invalid; clearly, the results from past and present studies are essential (see Bradley 2007). However, the spectre of sampling bias should not be ignored but rather met head-on, and the role of the new generation of academic syntheses should be to engage with the gaps to identify what we don't know, as well as what we do. These gaps are arguably the final frontier for new research undertaken by the academic and community sectors. However, in order for this to happen it is imperative that the results of these syntheses and analyses, including lacunae, are disseminated as widely as possible: a rallying call to challenge the potential known unknowns within the landscape (cf. Garwood 2007). The potential paradox in the increased re-use of grey literature by academia is that freely available information (i.e. fieldwork reports) is collated and absorbed solely into traditional hard-copy formats that are themselves not widely accessible. For example, the landmark works on later prehistory and historical periods that leant heavily on commercial archaeology, and were cited by Aitchison (2010) as examples of successful re-use, were both published in this form (Bradley 2007; Newman et al. 2001). The extent to which these (commercial) outputs are readily accessible to those without access to a well-stocked university library is debatable (specifically curators and units responsible for the original data on which these syntheses are based, or locally based groups wishing to engage in research in their geographic region).

This is not a direct criticism of these particular projects or the publication strategy itself; the incentive for Higher Education-based researchers is to publish, especially in monograph form in order to demonstrate research excellence and impact (OAPEN-UK 2014). Open access journals are of course one method, but are currently not widespread within the major period-based journals within the UK. In fact, the journal format may not be conducive to the publication of national period-based overviews that encompass large volumes of text and graphics (cf. Bradley 2007). Although there are steps in Higher Education publishing towards open access monographs, this is neither widespread nor yet a requirement within UK Higher Education funding bodies (Crossick 2015). Thus, traditional publication creates a potentially paradoxical situation: when academic synthesis is undertaken, it is in danger of becoming relatively 'grey' itself and not widely accessible outside of particular institutional collections and the culture of academia. It has also been noted that publications, unpublished reports and data from academics are often missing (i.e. not submitted) from HERs, thus perpetuating the cultural and methodological schism (Brookes and Pearce 2003; Evans 2013). Conversely, if academia does not collate, analyse, synthesise and publish then these tasks – large infrastructure schemes such as Terminal 5 or CTRL aside – are not carried out within the typical commercial framework (Morrison et al. 2014).

The answer to this potential impasse appears in the online strategies of two recent projects. The first is the Roman rural landscape project, whose publication strategy incorporates access to the research dataset, itself a research-driven appraisal of the value of unpublished fieldwork reports (Smith pers. comm.). The second solution appears to lie within the new generation of Research Frameworks as outlined by English Heritage (now Historic England) (Miles 2013) and exemplified in the recent East Midlands document being made available online as an updatable wiki (East Midlands Heritage 2015). With flexible, free-to-access documents there is the real potential for regions (and periods) to have up-to-date and comprehensive research priorities that could be used to inform and influence the nature of planning/developer-led fieldwork. This could potentially put the emphasis back on research in the curatorial and commercial sectors, enabling developer funding to be used to help fill the research gaps instead of solely having to rely on academics taking an interest and securing the budgets to do so. An extension to this strategy could be to link these flexible regional and thematic documents to online fieldwork reports as they are created, thus enabling contractors to identify key 'unpublished' works that relate to a particular research theme.

Thus an intrinsic part of our 'grey literature' future is the online dissemination of fieldwork reports, bypassing traditional problems with accessibility and potentially offering new avenues for linking (predominantly) commercially funded outputs with research. However, although the strengths of this are clear (Moore and Evans 2013), perhaps less so are the potential weaknesses. The first issue is the disparity in the current geographical distribution of online reports; the unevenness of the location of commercially-led investigations is, to a large extent, mirrored by the online corpus (Figure 10). In fact, it is even more skewed towards the densely investigated central/eastern region. Compare this with the scant information from the northern counties, as well as gaps in densely investigated parts of the south, and there seems to be a serious imbalance in online information. Of course this reflects the relative levels of fieldwork, but also relates to the cultural working practices and the use of OASIS by relevant curators and units since the piloting of the system in the east of England (Smith et al. 2012). It may also be suggested that this disparity and use is influenced to some extent by the logistical and financial capabilities of HERs to process unpublished reports and OASIS backlogs at a time of serious cuts to staff and resources (Rescue 2013). In addition to this, what is striking is the under-representation of research projects undertaken by community groups, academic institutions and individual researchers (Figure 10). As noted above, the sector outside commercially-funded works is not inconsiderable, but apparently does not submit unpublished reports through OASIS. This is perhaps unsurprising for projects that do not generate any such reports, or those with a more traditional publication strategy, but still suggests a cultural imbalance in the online availability of information.

Figure 10: Reports for excavations, evaluations and watching briefs recorded in OASIS and disseminated via the Archaeology Data Service: for planning-led events (n=13978), investigations by community groups (n=169) and other 'research' projects (n=61). All records also displayed as kernel density (20km). Data from [Last accessed: 22 Oct 2014].

It may also be argued that as the number of online reports grows, the nature of accessibility has changed. As Huggett (2014) has recently outlined, there is a raft of practical and theoretical issues concerning the limitations of large-scale data dissemination. To this we may add the discoverability and usability of the report files themselves. Outside of archaeology, a recent survey of reports held by the World Bank shows that simple accumulation of PDFs with basic bibliographic metadata is not conducive to re-use, but does in fact create vast online 'graveyards' of unused information that is accessible in name only (Doemeland and Trevino 2014). It should be stated that this is clearly not the case for archaeology, as access statistics and objective assessment clearly show the use and impact of online reports (Beagrie and Houghton 2013). However, looking forward there is a need to revisit and explore new notions of access and discoverability, given that the online corpus is growing at an increasing rate, and not simply rely on the fallacy that more data equals more knowledge. As the recent projects using unpublished reports for research into the Roman period have shown, a great deal of time can be spent establishing not only the extent of the resource, but also its usefulness (Holbrook and Morton 2008; Fulford and Holbrook 2011). For example, at the time of writing the ADS library holds in excess of 30,000 records, and a search for 'Medieval England' returns over 6000 reports, 'Iron Age' returns over 1100 and 'Post Medieval' almost 10,000, with no further way to refine these queries except on the subjective recording of monument and artefact terms. It may be argued that this is enough but, while commendable, as yet it fails to meet the vision of online publishing and communication forecast at the beginning of the current century:

'Whereas previous generations would have taken years to gather all the references to research at a given location and would have congratulated themselves on the achievement of such a laborious task, in the future (and increasingly in the present) this will be the work of a single day or less. Comprehensive referencing will not be a virtue to which scholars aspire, it will be a sine qua non. It will no longer be the first stage of a doctoral thesis; it is likely to be the first stop in an undergraduate paper. The days of the descriptive index submitted without interpretive scrutiny are surely numbered' (Clarke et al. 2003).

It may be argued that for meaningful and innovative searches to be undertaken, underlying systems have to be advanced lest we return to the simple accumulation of data criticised in the past (Thomas 1991). One answer to more intelligent searching of reports could well be machine-based, as demonstrated by recent research into the generation of rich metadata via text recognition (Vlachidis et al. 2013). Although increased levels of metadata can be advantageous for locating specific monuments or artefacts, it may be of limited use to know that a report contains reference to 'animal remains' or 'sherd'; it may be of greater significance to know whether these are discussed in detail (and indeed by whom), and their relevance to a particular period or area of research. Archaeological reports are not databases or inventories, which, as noted elsewhere, may well be overlooked facets of the digital archive (Evans and Moore 2014), and their value is undoubtedly as literature no matter how 'grey' this is perceived. In tandem with technical innovation there is also a need to remain grounded in the archaeological context of these reports; not only 'who-what-where' but also 'why' and 'how'. At the time of writing, detailed metadata about the context of the event, such as prompt, land use, development type, investigation size and methods employed, are all recorded in the OASIS system but not replicated within the index of reports. The power of this potential metadata cannot be overstated, and with the future redevelopment of the OASIS system and the ADS library it is imperative that these data are both retained and utilised.

It is also important that the accessibility and discoverability of reports is not confined to systems such as the ADS Library, no matter how successful this endeavour. The aforementioned Research Frameworks are of course one method, but there are also a growing number of online databases – for example the Bibliographies of Medieval Pottery and Roman Pottery – that offer the potential for specialist appraisal of the contents of a report (MPRG 2011; SGRP 2014). Large-scale syntheses such as the Rural Settlement of Roman Britain (Allen et al. 2015) also offer in-depth analyses of particular facets of monuments and sites. Last, but by no means least, there are also the HERs: the first port of call for monuments in England. Although somewhat optimistic, it may be envisaged that these often disparate systems can be aligned, incorporating the detail recorded within HERs of events and sources with additional metadata recorded by finds specialists and academics. Such a vision of 'linked data' is of course nothing new (see Tudhope et al. 2011), but it is now arguable that with the growing number of online resources, and even with the limitations discussed above, online research must advance beyond a simple 'type and hope' approach, thus truly making 'grey literature' an integrated part of the information landscape.