Internet Archaeol. 42. Sinclair. Mapping disciplines using VOSviewer

Visualising the relationships between nodes in large bibliometric maps in two-dimensional space is not easy. The bibliometric data needed to map research areas, disciplines and anything larger in scale will usually be derived from hundreds of documents, and can easily grow to thousands, tens of thousands or more. It is difficult to represent this information visually in a way that still lets the relationships between so many nodes be seen. Any interpretation of these relationships depends on the observable structure of the network, that is, the placement of the nodes on the map and the way they cluster, since this structure helps to identify the main topics or research fields within a discipline and how they relate to each other.

For this study, the program VOSviewer (version 1.60) was used to map the collated bibliometric data on archaeological research outputs. VOSviewer is a mapping and clustering program for network data developed by Ludo Waltman and Nees Jan van Eck at the Centre for Science and Technology Studies, Leiden University (http://www.vosviewer.com). It was designed primarily for the analysis and visualisation of bibliometric networks (van Eck and Waltman 2010) and can create bibliometric maps based on co-citation or bibliographic coupling relationships (see Box 4) between authors, author institutions or the sources of documents, with a high degree of user interactivity. VOSviewer can also construct a network of discipline terms from a corpus of terms extracted from document titles and/or abstracts, using a natural language parsing function that identifies terms as sequences of nouns and adjectives ending with a noun. This corpus represents the vocabulary of a discipline, which must be understood in order to use its literature. VOSviewer integrates well with WoS data because it reads the original ISI format files directly, extracting information on authors (first author only), sources and document titles and the relationships between them. Windows and Java versions of the program can be downloaded and used without cost, and the program can also be run directly online through a web browser (http://www.vosviewer.com), which works best with Internet Explorer. The effectiveness of VOSviewer for bibliometric mapping and clustering has led a growing number of specialists to use it to create maps of science, domain maps and term maps (see the growing list of publications using VOSviewer at http://www.vosviewer.com/Publications).
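
The term-identification rule described above can be illustrated with a minimal Python sketch. It assumes that the words of a title or abstract have already been part-of-speech tagged; the tag labels "ADJ" and "NOUN" and the function name `extract_terms` are hypothetical. The sketch keeps maximal runs of adjectives and nouns and trims each run so that it ends with a noun. VOSviewer's own parser is more elaborate, so this illustrates the rule rather than reproducing the program.

```python
from typing import List, Tuple

# Hypothetical tag labels; a real pipeline would take them from a POS tagger.
ADJ, NOUN = "ADJ", "NOUN"


def extract_terms(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return candidate terms: maximal runs of adjectives and nouns,
    trimmed so that every term ends with a noun."""
    terms: List[str] = []
    run: List[Tuple[str, str]] = []

    def flush() -> None:
        # Drop trailing adjectives so the candidate term ends with a noun.
        while run and run[-1][1] != NOUN:
            run.pop()
        if run:
            terms.append(" ".join(word.lower() for word, _ in run))
        run.clear()

    for word, tag in tagged:
        if tag in (ADJ, NOUN):
            run.append((word, tag))
        else:
            flush()  # any other word class ends the current run
    flush()
    return terms


title = [("Stable", "ADJ"), ("oxygen", "NOUN"), ("isotope", "NOUN"),
         ("analysis", "NOUN"), ("of", "ADP"), ("ancient", "ADJ"),
         ("bone", "NOUN")]
print(extract_terms(title))  # ['stable oxygen isotope analysis', 'ancient bone']
```

A lone adjective such as "new" yields no term, because nothing remains once trailing adjectives are trimmed.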

Before any form of mapping can be undertaken, some pre-processing of the collected bibliometric data is essential. Author names and source titles are recorded in the citation indices in a variety of ways, and in some cases publisher names, or even page numbers, have been recorded in place of source titles or authors (Börner et al. 2003). Pre-processing allows such variations to be identified and then either eliminated or reconciled to a single form, as appropriate. Table 2 gives examples of the variability in bibliometric data that had to be pre-processed before the maps of archaeological research presented here could be generated. Without pre-processing, bibliometric software will treat every variant as a different author or source title and calculate citation-based links for each variant separately, producing a map in which a single author or source is fragmented into several nodes, each carrying only part of its relationships. Likewise, when creating a map of terms, pre-processing is required to eliminate generic terms that occur frequently in the narrative of academic titles and especially abstracts (author, as in 'the author shows that...'; firstly, secondly, concern, idea, need, letter, new way, recommendation, etc.), as well as publishers' names, technical terms (copyright, copyright material) and similar items that would otherwise be mapped as discipline terms. In addition to these generic terms, the corpus used for mapping archaeological domain terms also contained both acronyms and their full expressions (ois and oxygen isotope stage; msa and middle stone age; naa and neutron activation analysis), as well as spelling variants (e.g. artifact and artefact), all of which need to be reconciled to a consistent form before mapping.
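
The effect of removing generic terms and reconciling acronyms and spelling variants can be sketched as follows. The short lists below are illustrative assumptions drawn from the examples in the text, not the stop list or thesaurus used for this study, and `clean_terms` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# Illustrative only: examples taken from the text, not a complete stop list.
GENERIC_TERMS: Set[str] = {"author", "concern", "idea", "need", "new way",
                           "recommendation", "copyright", "copyright material"}

# Illustrative only: acronyms and spelling variants mapped to one form.
CANONICAL: Dict[str, str] = {
    "ois": "oxygen isotope stage",
    "msa": "middle stone age",
    "naa": "neutron activation analysis",
    "artifact": "artefact",
}


def clean_terms(terms: Iterable[str]) -> Counter:
    """Count terms after merging variants and dropping generic terms."""
    counts: Counter = Counter()
    for term in terms:
        term = term.lower().strip()
        term = CANONICAL.get(term, term)  # reconcile to a single form
        if term in GENERIC_TERMS:
            continue  # generic narrative term, not a discipline term
        counts[term] += 1
    return counts


raw = ["MSA", "middle stone age", "artifact", "artefact", "copyright", "idea"]
print(dict(clean_terms(raw)))  # {'middle stone age': 2, 'artefact': 2}
```

Counting the raw labels directly would give six separate entries with one occurrence each, which is the fragmentation described above. Note that this sketch matches whole terms only, so a variant inside a longer term (such as "stone artifact") would need its own entry.
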
For the maps presented here, VOSviewer supported this pre-processing by reading a 'thesaurus file', which lists labels (author names, source titles, etc.) that should either be replaced by another, specified label or excluded altogether before the network data files are created and mapped. For this study, an iterative procedure was used to build a series of thesaurus files specifically for archaeological authors, sources and terms. Bibliometric maps were created that progressively included more nodes; each map was inspected for nodes that should be eliminated or replaced by a chosen variant, and the thesaurus file was amended before the next-generation map was created. Given the number of author names, source titles and terms, variants included, in the final document list (268,942 authors; 273,824 source titles; 288,487 terms), it is likely that a small number of these pre-processing issues remain in the maps presented here.
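
How a thesaurus file replaces or excludes labels can be sketched in Python. The layout assumed here, a tab-separated text file with a 'label' column and a 'replace by' column in which an empty cell means the label is excluded, is our understanding of the format described in the VOSviewer manual; it may differ in detail from the files used with version 1.60, and the helper names are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional


def read_thesaurus(text: str) -> Dict[str, Optional[str]]:
    """Parse an assumed tab-separated thesaurus with 'label' and 'replace by'
    columns. An empty 'replace by' cell is read as 'exclude' (None)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    thesaurus: Dict[str, Optional[str]] = {}
    for row in reader:
        replacement = (row.get("replace by") or "").strip()
        thesaurus[row["label"].strip()] = replacement or None
    return thesaurus


def apply_thesaurus(labels: Iterable[str],
                    thesaurus: Dict[str, Optional[str]]) -> List[str]:
    """Replace or drop labels; labels not in the thesaurus pass through."""
    result: List[str] = []
    for label in labels:
        if label in thesaurus:
            replacement = thesaurus[label]
            if replacement is None:
                continue  # excluded, e.g. a publisher or a page number
            result.append(replacement)
        else:
            result.append(label)
    return result


# Hypothetical thesaurus built from variants listed in Table 2.
THESAURUS = (
    "label\treplace by\n"
    "Binford, lr\tBinford, l\n"
    "Binford, Lewis r\tBinford, l\n"
    "Viking fund publicat\t\n"
    "P125\t\n"
)
cited = ["Binford, l", "Binford, lr", "Binford, Lewis r",
         "Viking fund publicat", "P125", "Hodder, i"]
print(apply_thesaurus(cited, read_thesaurus(THESAURUS)))
# ['Binford, l', 'Binford, l', 'Binford, l', 'Hodder, i']
```

In the iterative procedure described above, each inspection round would simply add rows to such a file before the next map is generated.
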

Table 2: Examples of variations in author and source names in Web of Science entries for reconciliation in the pre-processing of publication data

Authors
Standard form	Variants recorded in Web of Science
Lewis Binford	Binford, l; Binford, lr; Binford, lw; Binford, Lewis r
Elizabeth Brumfiel	Brumfiel, e; Brumfiel, em; Brumfiel, Elisabeth m
Meg Conkey	Conkey, m; Conkey, mw
Ian Hodder	Hodder, i; Hodder, Ian
Matthew Johnson	Johnson, m; Johnson, mh
Colin Renfrew	Renfrew, ac; Renfrew, c; Renfrew, Colin
Bruce Trigger	Trigger, b; Trigger, bg; Trigger, Bruce; Trigger, Bruce g

Source titles
Type	Standard form	Variants recorded in Web of Science
Journals	African Archaeological Review	Afr archaeol rev; African archaeol rev
Journals	American Journal of Physical Anthropology	Am j phys anthrop – ne; Am j phys anthropol; Am j phys anthr s
Journals	Bulletin de la Societe Prehistorique Francaise	Bspf; B soc prehist fr; Bull soc prehist fr; B soc prehistorique
Books	Animals in Archaeology	Animals archaeol; Animals archaeology
Publishers	(not a source title)	Viking; Viking fund publicat; Yale U publications
Page numbers	(not a source title)	P1; P125; P135

To create a map such as the network map of co-cited sources presented here (Figure 2), VOSviewer first reads the cited-reference entries in the ISI format file of bibliographic data collected for each archaeological research output, applies the substitutions and deletions set out in the relevant thesaurus files, and generates a co-occurrence matrix of co-cited sources. It then normalises this co-occurrence matrix to produce a similarity matrix. In VOSviewer, the similarity, or strength of association, between two items (here, cited sources) is calculated by dividing the number of times they are cited together by the product of the number of times each is cited. Working as a distance-based mapping program, VOSviewer then uses the VOS (visualisation of similarities) technique to place the cited sources as nodes in a two-dimensional space, so that the distance between any two nodes reflects their similarity as accurately as possible: the more similar two sources are, the closer together they are placed. Van Eck and Waltman (2010, 531) note that 'the idea of the VOS mapping technique is to minimise a weighted sum of the squared Euclidean distances between all pairs of items. The higher the similarity between two items, the higher the weight of their squared distance in the summation. To avoid trivial maps in which all items have the same location, the constraint is imposed that the average distance between two items must be equal to 1'. As a final step, VOSviewer translates, rotates and reflects the mapping solution so that it always produces a consistent result. VOSviewer also clusters the nodes into colour-coded groups, using a weighted and parameterised variant of modularity-based clustering that rests on the same association strength normalisation as the mapping (Waltman et al. 2010).
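
The counting and normalisation steps described above can be sketched in Python. Following the text, the similarity of two cited sources i and j is taken as c_ij / (w_i w_j), where c_ij is the number of documents citing both and w_i is the number of documents citing i. The function names and source labels are hypothetical, each source is counted at most once per citing document (an assumption), and this is not VOSviewer's own code.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def co_citation_counts(reference_lists: Iterable[List[str]]) -> Tuple[Counter, Counter]:
    """Count how often each source is cited (w_i) and how often each pair
    of sources is cited by the same document (c_ij)."""
    occurrences: Counter = Counter()
    co_occurrences: Counter = Counter()
    for refs in reference_lists:
        sources = sorted(set(refs))  # once per citing document; sorted pair keys
        occurrences.update(sources)
        co_occurrences.update(combinations(sources, 2))
    return occurrences, co_occurrences


def association_strength(occurrences: Counter,
                         co_occurrences: Counter) -> Dict[Pair, float]:
    """Normalise co-citation counts: s_ij = c_ij / (w_i * w_j)."""
    return {(i, j): c_ij / (occurrences[i] * occurrences[j])
            for (i, j), c_ij in co_occurrences.items()}


# Hypothetical reference lists of six citing documents; D is cited by all.
docs = [["A", "B", "D"], ["A", "B", "D"], ["C", "D"],
        ["C", "D"], ["C", "D"], ["A", "D"]]
occ, co = co_citation_counts(docs)
s = association_strength(occ, co)
print({pair: round(v, 3) for pair, v in s.items()})
# {('A', 'B'): 0.333, ('A', 'D'): 0.167, ('B', 'D'): 0.167, ('C', 'D'): 0.167}
```

In this example source D is cited by every document, so it is co-cited with A more often than B is (3 times against 2), yet after normalisation A is more strongly associated with B (1/3) than with D (1/6). This is why a heavily cited source does not automatically sit next to everything on the map.
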
A detailed description of VOSviewer and its processes can be found in van Eck and Waltman (2010), van Eck et al. (2010) and Waltman et al. (2010), along with specific mathematical formulae for determination of similarities and mapping.
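
In symbols, writing c_ij for the number of co-citations of items i and j, w_i for the number of citations of item i and x_i for the position of item i on the map, the similarity measure and the VOS mapping problem quoted above can be written as follows (our notation; see van Eck and Waltman 2010 for the authoritative statement):

$$ s_{ij} = \frac{c_{ij}}{w_i \, w_j} $$

$$ \min_{\mathbf{x}_1,\ldots,\mathbf{x}_n} \; V(\mathbf{x}_1,\ldots,\mathbf{x}_n) = \sum_{i<j} s_{ij} \, \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2 \quad \text{subject to} \quad \frac{2}{n(n-1)} \sum_{i<j} \lVert \mathbf{x}_i - \mathbf{x}_j \rVert = 1 $$

The constraint fixes the average distance between pairs of items at 1, which rules out the trivial solution in which all items share one location; pairs with high similarity carry a large weight, so they are pulled close together.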

A significant advantage of using VOSviewer to map bibliometric networks is the degree of control it gives users over the visualisation process. Users can set a minimum number of citations that a node (author, source or document) must receive before it is included in a map, and node sizes can be scaled to reflect the number of citations received. The clustering resolution can be adjusted to increase or reduce the number of clusters identified, and a minimum cluster size (number of nodes) can also be set; van Eck and Waltman (2010) urge users to adjust the clustering resolution until the final network map contains a stable set of clusters that make sense to a specialist in the knowledge domain. The viewer controls in VOSviewer also allow users to specify how many edges between nodes are drawn and whether these edges represent normalised links. Normalisation takes into account the potentially enormous differences in the numbers of citations received by sources containing many documents (e.g. major journals and the articles published in them) and by sources containing few. Maps can be zoomed in or out to inspect a particular part of the map, and users can also search the map to find the position and relationships of any particular node of interest.
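
The citation threshold, node scaling and edge limit described above can be mimicked in a short Python sketch. The parameter names (`min_citations`, `max_links`), the helper names and the linear size scaling are assumptions for illustration, not VOSviewer's settings or its actual sizing rule.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def filter_map(citations: Dict[str, int], links: Dict[Pair, float],
               min_citations: int,
               max_links: int) -> Tuple[Dict[str, int], List[Tuple[Pair, float]]]:
    """Keep nodes cited at least min_citations times, then keep only the
    max_links strongest links between the remaining nodes."""
    nodes = {n: c for n, c in citations.items() if c >= min_citations}
    kept = [(pair, w) for pair, w in links.items()
            if pair[0] in nodes and pair[1] in nodes]
    kept.sort(key=lambda item: item[1], reverse=True)  # strongest first
    return nodes, kept[:max_links]


def node_sizes(nodes: Dict[str, int], min_size: float = 1.0,
               max_size: float = 10.0) -> Dict[str, float]:
    """Hypothetical linear scaling of node size with citation count."""
    if not nodes:
        return {}
    lo, hi = min(nodes.values()), max(nodes.values())
    if hi == lo:
        return {n: max_size for n in nodes}
    return {n: min_size + (c - lo) / (hi - lo) * (max_size - min_size)
            for n, c in nodes.items()}


citations = {"A": 120, "B": 45, "C": 8, "D": 30}
links = {("A", "B"): 0.9, ("A", "C"): 0.7, ("A", "D"): 0.4, ("B", "D"): 0.6}
nodes, kept = filter_map(citations, links, min_citations=20, max_links=2)
print(sorted(nodes), kept)
# ['A', 'B', 'D'] [(('A', 'B'), 0.9), (('B', 'D'), 0.6)]
```

Source C falls below the threshold, so its link to A disappears with it, even though that link is stronger than either link that involves D.
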

Internet Archaeology is an open access journal based in the Department of Archaeology, University of York. Except where otherwise noted, content from this work may be used under the terms of the Creative Commons Attribution 3.0 (CC BY) Unported licence, which permits unrestricted use, distribution, and reproduction in any medium, provided that attribution to the author(s), the title of the work, the Internet Archaeology journal and the relevant URL/DOI are given.

5.1 Mapping disciplines using VOSviewer