4. The Collection of Reference Data for Archaeological Research Outputs

In order to map the discipline of archaeology, we need to collect aggregated reference data for genuine archaeological research outputs. Two major subscription-based online search engines, Web of Science (hereafter WoS) and Scopus, and three open-access search engines (PubMed, CiteSeer and Google Scholar) allow the collation of reference data on archaeological research publications. Of the subscription-based search engines, WoS, started in 1997 and now managed by Thomson Reuters, is the older. It is based on the original Science Citation Index, Social Sciences Citation Index and Arts and Humanities Citation Index started by Eugene Garfield at the Institute for Scientific Information in the 1960s and 1970s. WoS contains a full set of data on academic publications and their citations back to 1945, with an increasing body of data on publications back to 1900. The other subscription-based search engine, Scopus, was created by Elsevier in 2004 and is still maintained by them. Scopus has built up a full record of data on academic publications back to 1966, covering the physical, life and social sciences and the arts and humanities. Both WoS and Scopus allow subscribers to download data on multiple documents. While there is significant overlap between the WoS and Scopus databases, there are recognised differences (see Falagas et al. 2007; Norris and Oppenheim 2007; Mongeon and Paul-Hus 2016 for reviews). Since the purpose of this analysis is to present a visualised overview of the nature of archaeological research, data from WoS were used in preference to Scopus owing to its ease of use for data download (described below), its broader compatibility with bibliometric analysis software and its greater coverage across the social sciences and sciences.

For this study, a search was performed on the WoS Core Collection (which includes the three WoS flagship citation indices: the Science Citation Index Expanded, the Social Sciences Citation Index and the Arts and Humanities Citation Index) for archaeological documents published between 2004 and 2013 inclusive, by running a search for the string 'archaeol*' OR 'archeol*' as a 'topic'. Topic searches within WoS search the titles, abstracts, author-designated keywords and assigned keywords of all document records. This search retrieved data on 24,954 separate documents that are broadly, but not exclusively, classified into document types: articles, reviews, book reviews, editorial material and so forth (see Table 1). This long list was then refined by excluding book reviews, reviews, letters, editorial material etc., which are not usually considered to be original research outputs or knowledge claims. The final, refined list contains data on 20,339 documents published from the beginning of 2004 to the end of 2013.
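The logic of the topic search and the subsequent refinement can be sketched in Python for records that have already been exported from WoS. This is an illustration only, not the WoS search engine itself: the field names (title, abstract, author_keywords, assigned_keywords, year, doc_types), the sample records and the rule that a record is dropped if any of its document types is on the excluded list are all assumptions made for the example.

```python
import re

# Right-truncated wildcard terms 'archaeol*' OR 'archeol*', matched at word start.
TOPIC_PATTERN = re.compile(r"\b(?:archaeol|archeol)\w*", re.IGNORECASE)

# Document types removed when refining the long list (hypothetical labels;
# the study excludes book reviews, reviews, letters, editorial material etc.).
EXCLUDED_TYPES = {"Review", "Book Review", "Letter", "Editorial Material"}


def matches_topic(record):
    """True if any topic field (title, abstract, keywords) contains a search term."""
    fields = [
        record.get("title", ""),
        record.get("abstract", ""),
        *record.get("author_keywords", []),
        *record.get("assigned_keywords", []),
    ]
    return any(TOPIC_PATTERN.search(text) for text in fields)


def in_period(record, start=2004, end=2013):
    """True if the publication year falls within the inclusive study period."""
    return start <= record["year"] <= end


def long_list(records):
    """All records matching the topic search within the study period."""
    return [r for r in records if matches_topic(r) and in_period(r)]


def refined_list(records):
    """Long list minus records carrying an excluded document type (assumed rule)."""
    return [r for r in long_list(records) if not set(r["doc_types"]) & EXCLUDED_TYPES]


# Hypothetical records, invented purely to exercise the functions.
SAMPLE_RECORDS = [
    {"title": "Archaeological survey of a river valley", "year": 2010,
     "doc_types": ["Article"]},
    {"title": "A new handbook", "author_keywords": ["archeology"], "year": 2006,
     "doc_types": ["Book Review"]},
    {"title": "Stable isotopes in cattle teeth",
     "abstract": "Samples from an archaeological site were analysed.",
     "year": 2012, "doc_types": ["Article", "Proceedings Paper"]},
    {"title": "Archaeology of the recent past", "year": 2001,
     "doc_types": ["Article"]},
    {"title": "Protein folding dynamics", "year": 2010, "doc_types": ["Article"]},
]

print(len(long_list(SAMPLE_RECORDS)), len(refined_list(SAMPLE_RECORDS)))
```

In this sketch the third record is retrieved through its abstract alone, which mirrors how a topic search can capture documents whose titles do not mention archaeology.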

Table 1: Document types located in Web of Science Core Collection when topic searching for 'archaeol*' OR 'archeol*'. (Document types marked 'Yes' under 'In Refined List' are those defined in this study as research outputs for archaeology; all rows together make up the long list.)

Document Type          No. in Search    In Refined List
Research Articles      16793            Yes
Proceedings Papers     3038             Yes
Book Chapters          1420             Yes
Books                  226              Yes
Reviews                1284             No
Book Reviews           3301             No
Editorial Material     982              No
News Items             160              No
Meeting Abstracts      154              No
Biographical Items     86               No
Letters                57               No
Corrections            29               No
Art Exhibit Reviews    16               No
Poetry                 10               No
Biography              6                No
Software Review        1                No
Film Review            1                No
Excerpt                1                No
Chronology             1                No
Total                  24954

There is considerable discussion within the scientometrics literature about what data to include and exclude when mapping disciplines, particularly when trying to explore issues of inter-, multi-, and trans-disciplinary research activity (e.g. van der Besselar and Heimeriks 2001; Morillo et al. 2001; Leydesdorff and Schank 2008; Porter and Rafols 2009; Wagner et al. 2011). Some discipline-focused studies restrict their searches to documents that have been published in a pre-determined list of 'discipline-specific' journals (e.g. Goldstone and Leydesdorff 2006; Dolfsma and Leydesdorff 2010; Yuan et al. 2014). This is made easier by the fact that WoS, like Scopus, uses an algorithm to assign one or more subject categories of science (from a list of 250 categories in total) to individual journals, based on the citing and cited behaviour of the documents published within them (Wang and Waltman 2016, 348-9). While the classification system for journals is remarkably successful (Wang and Waltman 2016), individual documents will not be classified as 'archaeology', for example, unless they were published in journals that are themselves classified in that way. Restricting a search for the research outputs of a discipline to a pre-classified set of journals is therefore recognised as problematic for disciplines that address complex systems (especially biological, ecological or sociological), where the evidence examined takes many different forms and can be investigated with many different approaches, and where the potential timescales for analysis and interpretation exist at different levels (Vugteveen et al. 2014). Interdisciplinary and multidisciplinary research fields often publish their knowledge claims in journals or books that are assigned to another discipline.

This problem can be seen clearly for archaeology. The refined list of 20,339 documents, discussed above, is considerably larger than the number of documents within it that are described as 'Archaeology' according to the WoS list of Subject Categories or Research Areas: just 8931 publications. Even though the classification methodology and range of subject categories used by WoS have developed over time (see Leydesdorff et al. 2013; Wang and Waltman 2016), there remains a clear discrepancy between the documents classified as 'Archaeology' by WoS and those that an archaeological specialist would recognise as such, because of the diversity of materials and topics addressed in archaeological research. WoS has a limited set of subject categories and research areas for the Social Sciences, and especially the Humanities, by comparison with its classifications of Science (see Leydesdorff et al. 2011). Furthermore, WoS (like Scopus) assigns subject categories and research area labels at the level of the journal, not the individual document (Wang and Waltman 2016). If the maps presented here used information from only those documents classified as 'Archaeology', data from many documents that a domain specialist would recognise as outputs of archaeological research would be excluded. For example, among the articles within the refined list that have been classified as Zoology by category (a list of 143 documents), there are many that are clearly archaeological in terms of the materials or sites examined, the period covered or the behaviour discussed (Askeyev et al. 2013; Christensen and Weisler 2013; Lin and Chang 2013; Monchot et al. 2013; Piper et al. 2013; Salque et al. 2012; Upex and Dobney 2012). If we look slightly further 'off-discipline' and focus on documents classified as 'Public, Environmental and Occupational Health', this category also includes documents that are clearly archaeological (for example Warmlander et al. 2011; Aldenderfer 2011; Timmann and Meyer 2010; Montgomery 2010; van der Geest 2004). The use of 'archaeol*' OR 'archeol*' as a topic for the initial search, therefore, seems to be more effective at extracting a full range of potential archaeological research outputs for mapping than extracting a set of documents from the list of journals already categorised as 'archaeology'.
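The size of the discrepancy can be made explicit with a short calculation. The sketch below uses only the two totals reported in this section; the helper for tallying journal-level categories, its field name (journal_categories) and the sample records are hypothetical and are included only to show how such a tally could be made on exported records.

```python
from collections import Counter

# Totals reported in the text.
REFINED_TOTAL = 20339             # documents in the refined list
CLASSIFIED_AS_ARCHAEOLOGY = 8931  # of these, labelled 'Archaeology' by WoS

# Share of the refined list that a journal-category filter alone would keep.
share = CLASSIFIED_AS_ARCHAEOLOGY / REFINED_TOTAL
# Documents found by the topic search but not labelled 'Archaeology'.
missed = REFINED_TOTAL - CLASSIFIED_AS_ARCHAEOLOGY

print(f"Classified as 'Archaeology': {share:.1%}")
print(f"Not classified as 'Archaeology': {missed}")


def count_by_category(records):
    """Tally records per journal-level category (a record may carry several)."""
    counts = Counter()
    for record in records:
        counts.update(set(record["journal_categories"]))
    return counts


# Hypothetical records, invented purely to exercise the helper.
SAMPLE_RECORDS = [
    {"journal_categories": ["Archaeology"]},
    {"journal_categories": ["Zoology"]},
    {"journal_categories": ["Archaeology", "Anthropology"]},
    {"journal_categories": ["Public, Environmental and Occupational Health"]},
]

sample_counts = count_by_category(SAMPLE_RECORDS)
print(sample_counts["Archaeology"], sample_counts["Zoology"])
```

On the reported totals, fewer than half of the documents in the refined list (about 43.9%) carry the 'Archaeology' label, so a journal-category filter would omit 11,408 of the documents retrieved by the topic search.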

Although a very substantial number of documents have been identified for mapping, it is still necessary to consider the recognisable limitations of this dataset. Such limitations are likely to relate to the nature of communication practices in archaeology as a discipline and to the fact that certain forms of document are poorly represented in the citation index databases upon which this study is based. There are also particular limitations of WoS in comparison with other possible databases such as Scopus. Unsurprisingly, there is a considerable and long-standing literature within the information science community exploring these issues. It arises from the intention to use data extracted from the major citation indices to assess the research performance of institutions, departments and individuals, in place of other forms of assessment such as peer review (for example Archambault et al. 2006; 2009; Hicks 2005; Meho and Yang 2007; Nederhof 2006; Nederhof et al. 1989; Norris and Oppenheim 2007; among many others). Three significant sources of omission of archaeological outputs need to be noted, relating to the form, language and media of publication.


