Internet Archaeology Issue One - Editorial

Alan Vince

Cite this as: Vince, A. 1996 Editorial, Internet Archaeology 1.

1 Drowning in Data

The publication of the results of archaeological research has occupied a great deal of my working life as an author, an editor and as a user of archaeological publications. It has always seemed to me that this is the weakest part of our discipline. We spend our time revealing how our ancestors lived, a topic of clear interest to every man, woman and child, and yet our professional publications are widely acknowledged to be daunting and dull, even to other professionals. The reason, in my opinion, is that we are drowning in our own data.

2 Why Publish?

Archaeological research is painstaking. One of society's stereotypes of the archaeologist is of someone at work with a paintbrush gently brushing away the sands of time. Perhaps in the background of this image might be other people with clipboards, theodolites, tapes and so forth. There is more than a little truth in this image, certainly more than in the other standard image - that of Indiana Jones risking life and limb to fill western museums with the artistic treasures of past societies. The real drama of archaeology unfolds slowly, both in the field and afterwards in the laboratory and study. It is generated by the application of archaeological theory to a set of observations and is rarely conclusive. How we arrive at our conclusions is as important as what those conclusions might be. It is what separates us from those for whom Stonehenge was a landing stage for UFOs or the civilisations of Mesoamerica were founded by colonists from Ancient Egypt. We need, therefore, to describe how our data were collected and how we approached their study. We also need to make our data available to others. The remains of past societies are a dwindling resource and we cannot assume that it will always be possible to advance our knowledge through the systematic destruction of the evidence. There are numerous sites which have been "dug out", where the only chance of furthering our knowledge of them comes from reappraisal of previous workers' records and publications. Inevitably, in future a greater and greater contribution to our discipline will come from this secondary research. There are therefore two main reasons for publishing our research. Firstly, to present new knowledge and, secondly, to facilitate future research.

3 Presenting New Knowledge

3.1 Previous Research

To advance knowledge we must have assimilated previous research. This need not involve an exhaustive survey of past work but it is an insult to our predecessors to ignore their work or, worse still, claim credit for ourselves which rightly belongs to others. Publication of research must therefore involve a limited amount of repetition of already-published information to set the scene. In print this is wasteful but if the author refers readers to another work instead then this work must be generally available. Such has been the rate of change in archaeological knowledge that there are many areas of the discipline where there are no up-to-date textbooks which can be used to set the work in context.

3.2 Background

It is also useful to know why the research was undertaken. In some cases this is implicit. For example, a site might be excavated because it was to be destroyed. More often, however, the reasons are more complex. The researcher might be testing a theory or be constrained by external factors (usually lack of funds). As the research progresses, however, it tends to generate its own momentum. Discoveries lead to further questions or even complete changes in direction. This background information is important not only to allow other professionals to judge the results but because it helps to make the activities of archaeologists more accessible to the general public. Nevertheless, much of this background information will inevitably be repetitive and dull.

3.3 Methodology

The methodology employed in a piece of research must also be described or reference made to a published description of the methodology. If the methodology is not standard then readers need to know why it was adopted. Other professionals may well want to know the authors' conclusions about the methodology. Most researchers end up wishing they could start the whole project again - and this time do it properly! Several archaeological laboratories have published their own methodologies, or follow other published standards. The less "scientific" archaeological organisations in the main have not made their work practices available in print and the commercialisation of archaeology seen on both sides of the Atlantic in the past decade in fact inhibits the sharing of information which might be seen to have a commercial value. Nevertheless, I know of at least one case where an author was unfairly criticised by a reviewer for not following best practice where that author had actually adopted current methods but had not made this explicit in the publication. Dull it may be, but information about methodology, like other background detail, must be accessible to the reader of a publication.

3.4 Results

In a piece of academic writing the author will always have to satisfy two very different potential readers. At one extreme will be the casual reader who just wants to know conclusions and at the other will be the fellow professional who will want to know exactly what the justification is for a particular statement. Trying to keep both these readers - and all those in between - reading a single piece of text is impossible. One or other will be disappointed. Successful archaeological authors steer a course between these two extremes but the more "popular" a piece of prose becomes the more professionals will criticise it. One solution, the most common, is to split the work into two: a "popular" publication and the academic report. To my mind, whilst there are undoubtedly cases where this can be justified, in the main this tendency acts to the detriment of academic publication, since the authors can effectively abandon the non-professional and address their work only to their colleagues.

It seems, therefore, that the author of even the simplest archaeological publication is faced with decisions which could lead to a cautious, even turgid result. Furthermore, the same professional critic who would damn a piece of work for omitting to set the scene, describe the background or whatever will himself or herself complain about the length or lack of excitement in the same work. Clearly, the demands placed upon the author of an archaeological publication have themselves been partly responsible for the present state of publication within the discipline.

4 Facilitating Future Research

These days, most archaeological research is computer-based. Whereas in the past a researcher would have written out index cards when faced with a catalogue or table, a researcher's first inclination today will be to rekey this data into a database, spreadsheet or GIS (Geographical Information System). However, this data almost certainly originated in a computer in the first place. Assuming that the researcher wants others to benefit from his or her research (whilst getting as much credit as he or she can from the work first), it makes sense to publish this data in a computer-readable form. Finding a satisfactory means of disseminating archaeological data has occupied a number of archaeologists for several years and leads me to a consideration of the split between archive and publication.

4.1 Archive and Publication

This is not the place to recap the history of the development of the concept of archaeological archives. Suffice to say that the recognition of the potential of the site records from archaeological fieldwork has grown considerably in the past few decades although it would be fair to say that because of the inadequacy of most existing archaeological archives there are very few which actually fulfil their purpose adequately. It has been standard practice, in Europe at least, to attach these archives to museums, since artefacts are one of the main records left from an excavation or piece of fieldwork, and it has been the general experience of European museums within recent years to be both starved of funds in general and to have shifted funds away from collection management towards display and education. Unfortunately, archaeologists have at the same time been making greater and greater demands of museums. This is a result of advances in methodology which generate more records and lead to the retention of more finds and samples.

The notion that archaeological publications were, or should be, hierarchical led to the development of ideas of the appropriate level for data of certain types. Professor Frere, for example, chaired a committee to consider the publication of archaeological data in the UK which devised a four-tier system:

Level I
The Raw Data.
Level II
Site Records. To be ordered, indexed and then deposited in an archaeological archive.
Level III
Full Reports. To be deposited in an archaeological archive but perhaps disseminated through multiple copies.
Level IV
Synthetic Reports. To be published.

Several modifications of this scheme were made over the years of which the definition of a research archive is the only one that need concern us here. In a report published by English Heritage, The Management of Archaeological Projects, 2nd Edn, Gill Andrews made a distinction between the original site archive (site being used here to include any piece of archaeological fieldwork, not just excavation) and the additions to that archive made as a result of post-excavation analysis. It was recognised that it was not possible for researchers unfamiliar with a project to use the site archive whereas the research archive should be accessible. It was also recognised that site archives would not transform themselves into research archives and that funds would be needed to make them accessible (plus further funds to curate the archive afterwards).

Attempts to disseminate archaeological data therefore have had to be fitted into this conceptual framework. The earliest solution was to disseminate data in printed form, either as photocopied "grey literature" or through the use of microfiche. This was followed by attempts to make electronic data a part of site archives. This attempt has not had any noticeable success, mainly because of the difficulties in curating electronic data. The third method proposed was to disseminate electronic archives on disk or CD-ROM. This is at the time of writing the preferred option of most archaeologists in the UK. However, as one might expect from the editorial of an electronic journal on the World Wide Web, I believe there is a better way.

5 Archaeological Publication on the World Wide Web

Such is the pace of change with the development of the World Wide Web that its history must be recounted in months rather than years. Rather than attempt to chart the impact of the WWW on archaeology (which I hope will be the subject of forthcoming papers in this journal) I will give here an account of my own conversion to the cause, the thinking behind the choice and realisation of papers in Internet Archaeology and - a hostage to fortune - my current ideas of how the journal will develop and how archaeological publication will be affected by these technological changes.

5.1 The Early Months - Lynx and NCSA HTTPD

From 1988 until 1995 I was employed by the City of Lincoln Archaeology Unit, initially to produce an ordered archive from the records of twenty years' excavations in the city and then to oversee the production of a series of related studies based on the analysis of the resulting research archive. Computers were a fundamental part of these projects and by 1995 the unit had a network of PCs running Windows for Workgroups linked to a Unix fileserver running Interactive Unix. Connectivity was achieved using the Samba package.

From 1991 onwards, a modem was attached to the system, initially to connect this network to another Unix box based elsewhere in the city. Using a series of programs written in C and Bourne shell scripts it was possible for any user on the network to call up any data relating to the excavations but these scripts had no front end. They were all called up from a C-shell command line and required a considerable amount of knowledge on the part of the user since, like most Unix programs, they suffered from "galloping featurism" and impenetrable documentation. What we needed was a front end which would run on a Unix fileserver's console, in a Unix window on a Windows PC and directly from Windows.

At that time, my main advisor on the development of the system was Paul Tyers and I had given him a general brief to trawl through the world's computer networks and find public domain software to implement this front end. Several systems were tried during the early 1990s including an early version of lynx, which together with the NCSA HTTPD and a hypertext tutorial written in HTML should have been enough to show the way. However, at the same time we were given our first introduction to GIS, in the form of Dominic Powlesland's G-Sys software which ran on MS-DOS. The idea of using a map-based system as the front end of our electronic archive was very tempting and for some time therefore attention was taken up with the development of a GIS interface whilst the development of the Unix filesystem's front end was shelved. In the main the reason for my not seeing the potential of the Web was that lynx was not graphical and gave no obvious advantage over the home-grown scripts when used on a Unix console or a Unix window on a Windows PC. The breakthrough came with the arrival of the first graphical browsers at the unit (for which thanks are due to Armin Schmidt and Paul Cheetham at the University of Bradford).

5.2 Experiments at CLAU

Like most Web converts, it was the glossy froth which first caught my imagination when using the early Mosaic and Netscape browsers. The ability to control fonts and to interleave images with text gave the system immediate impact and a much greater impetus to make the necessary changes to the scripts to make them work with the Web. The next breakthrough was the introduction of Perl and the finding of an archive of Perl scripts which included one to translate from troff codes, used in coding up the CLAU draft publications, to HTML. The latter script was used as the basis of one which took a standard Lincoln archaeological site report and a series of text files which described the detail and the availability of archive records of each context group and produced from this input a coherent series of Web pages (albeit in hundreds of separate files and impossible to maintain). A text-only version of one site was mounted on the University of York Web server (thanks to Julian Richards) in March 1995 and although it attracted no comments was sufficient to convince me that the Web was undoubtedly the best answer to date for how to make complex and detailed archaeological data available in a way which was both user-friendly and infinitely adaptable to the requirements of the reader. All the detail was present, right down to the archive numbers of individual site records, but it was unobtrusive. The casual reader would not be put off by the fact that certain key words were highlighted and formed the jumping off point for anyone who wanted to be immersed in the archive. Four months later I was given the opportunity to put some of the many ideas flying through my head as a result of these experiences into practice when I was offered the post of Managing Editor of Internet Archaeology.

5.3 Guaranteeing Quality

One of the features which attracted me to the Internet Archaeology project was that it had been set up so as to ensure that the content of the journal was of a consistent, high quality. As any user of the Web in the mid-1990s will know, the Web is not at present known for the quality of its content. Search engines can very quickly search through millions of records for any term submitted but much of what is thrown up is clearly of little value. With archaeological information there is another, related, problem. What, in the last resort, tells users that they can trust the information they are being given? In the world of print, we have a distinction between academic publication, which operates its own series of rules, and the rest. This does not guarantee that the information is correct - after all, there are many cases in archaeology where new data or new paradigms effectively demolish cherished beliefs - but does give an official seal of approval. Internet Archaeology was set up to buck the Web trend of instantaneous, uncontrolled publication. It set up a series of advisory panels, including an Editorial Board, and the members of these panels use the Internet, both through email and the Web, to deliberate over whether a paper should be published in the journal and to appoint referees, who themselves must have, or obtain, Web access in order to view each paper. There is a libertarian viewpoint, which is that publication on the Web should be completely uncensored, and that peer review is in effect censorship. However, I myself have no qualms over acting as such a censor. After all, we are not barring other work from the Web, merely refusing to give it an air of authority by including it within our journal.

6 Setting up Internet Archaeology

For six years I was on the Editorial Board of an academic journal, Medieval Archaeology, published by the Society for Medieval Archaeology. From this experience I knew that it is one thing to have editorial policies and grand ideas but quite another to actually get the authors you want to both offer a paper and then to produce it. The first few months in post were therefore spent both setting up the software and hardware needed to publish on the Web and in actively hunting for suitable authors. My experiences at Lincoln had convinced me that this medium was the solution to some of the main problems of archaeological publication but without any examples of what I wanted it was difficult to persuade others. Furthermore, the advantages of Web publication would only really become evident when the amount of information available online had reached a critical mass so that one could seamlessly jump from a site report to the study of the environmental data from that site then on to related papers then back to the site report and so on. Large projects which would themselves contain the sort of complexity I was after could not be expected to immediately drop their publication plans in favour of a new and untested medium whilst small ones were likely to elicit the response that we were using a sledgehammer to crack a nut. Two major projects were quick to take up the offer of having Internet Archaeology as an element of their publication strategy - the Souks excavations at Beirut in the Lebanon and the Heslerton Parish project in the Vale of Pickering, Yorkshire. However, neither of these projects had even finished the excavation stage at that time. Reluctantly, therefore, I set aside the idea of putting the first part of a major excavation project into the first issue of the journal even though I was, and remain, convinced that this in the long run is the area of archaeological research which will benefit most from the Web.

Once again, I turned to Paul Tyers, who had produced for his own amusement a Web version of his forthcoming book on Roman Pottery in Britain which was very impressive, and asked if he would agree to place a part of this document in the journal. We agreed that the best plan would be to take a single chapter and rework it so that it would be self-contained. The first version of this work was placed on the Internet Archaeology web server in October 1995 and was well received, being publicised through various email lists, by word of mouth and, gradually, by the inclusion of links to the paper, or the journal's Home Pages, in Web hotlists. In many ways this paper set the tone for the rest of the issue. Prospective authors looked at this paper and if there were sufficient similarities to their own work, or if their imaginations could make the leap to translate Roman amphoras into clay pipes or metallurgical analyses, then they offered papers. The visual style of Issue One too was heavily influenced by the Amphoras paper. Once there was a paper to react to, potential authors could either accept its style or try to improve upon it. Guidelines to authors were initially very loose, so as not to inhibit innovative ideas, but a series of guidelines emerged. We have tried to steer a middle course so as to produce work which could be downloaded and read off-line but where the added value comes from the possibilities presented by the Web.

6.1 Hypertext

To a newcomer to the Web one of the most exciting aspects is the ability to jump from one paper to another. One of the things I remember from my own first experiences was the wonder I felt at being able to travel virtually from the UK to the USA then off to Australia then to Germany and so on. Undoubtedly, this revelling in the possibility of using hypertext is a stage which all new Web authors go through and it is only gradually that this wears off and is replaced with a more mature attitude. I have tried to think of the remote reader waiting for several seconds or even minutes for a new page to download and to ask myself whether this reader would think that the link was worth waiting for. On the other hand, since most readers will be reading the papers from a computer screen and since most of these screens will have 20 or so lines of text visible at any time I have not hesitated to break long blocks of text into smaller sections so that the pages will download quickly. My aim, rarely achieved in practice, has been to make the papers hierarchical to allow readers to scan the whole content in a few minutes then to delve deeper where their interest is aroused and finally to jump right into the detail when they are certain that this is what they want. Just as importantly, however, I have wanted readers to be able to find their way back to the top of a paper. To do this we have used navigation bars.

6.1.1 Navigation Bars and Icons

Go forward to next page
Go back to previous page
Go to more detailed page
Go up to less detailed page
Go to this paper's Table of Contents
Go to the Internet Archaeology Home Page
Submit comments

We have put navigation bars at the bottom, or sometimes top and bottom, of each file. These contain a variety of icons (or words, if using a non-graphical browser). Clicking on the "forward" or "back" buttons takes the reader through the paper in a logical manner. With long and complex papers there may be parallel series of routes, for example to look at all the tables or figures one after the other. Where there is a possibility of branching off to look at a topic in more detail this is indicated by the "down" button and, once in a page of detail, the "up" button takes the reader back to the main thread. Each paper also has a link in the navigation bar to the paper's Table of Contents and to the Internet Archaeology Home Page. I think readers soon tire of treating the Web as an adventure game or maze and simply want to get to the information they hope or believe is there as quickly as possible. If the first issue of Internet Archaeology does not fulfil this promise then we have a mechanism - email - whereby they can make their views known.
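
The navigation bar described above can be sketched in a few lines. The following is only an illustration, written in Python rather than in whatever tools the journal actually used. The link labels are those listed above; the dictionary keys, the function name and the file names are hypothetical.

```python
from html import escape

# Labels taken from the list of navigation links above. The keys
# ("next", "prev", ...) are hypothetical names chosen for this sketch.
LABELS = {
    "next": "Go forward to next page",
    "prev": "Go back to previous page",
    "down": "Go to more detailed page",
    "up": "Go up to less detailed page",
    "toc": "Go to this paper's Table of Contents",
    "home": "Go to the Internet Archaeology Home Page",
    "comments": "Submit comments",
}


def navigation_bar(links):
    """Return an HTML fragment with one text link per available target.

    `links` maps a key from LABELS to a URL. Keys that are absent are
    simply left out, so a page with no more detailed page has no
    "down" link.
    """
    parts = []
    for key, label in LABELS.items():
        url = links.get(key)
        if url:
            parts.append(
                '<a href="%s">%s</a>'
                % (escape(url, quote=True), escape(label, quote=False))
            )
    return "<p>" + " | ".join(parts) + "</p>"


# Hypothetical file names, for illustration only.
print(navigation_bar({"prev": "page2.html", "next": "page4.html", "toc": "toc.html"}))
```

Text labels are used here because they work in graphical and non-graphical browsers alike; icons could be substituted by wrapping an image in each link.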

6.2 CGI Scripts

Following my experimental work at CLAU I was keen to see papers in Internet Archaeology access archaeological data directly, rather than dead copies of this data, as happens in print publication. Pressures of space, and a lack of knowledge as to what readers expect from a publication, lead in most cases to the final publication of an archaeological project containing only a small fraction of the analytical results. Granted, this does lead to a sharpening of focus on the part of the author who has to consider exactly what points he or she wants to make but in the main the decision is taken on grounds of cost - both cost of production and cost to the final users. There are several cases in the UK of valuable work being published but at such a high price that even academic libraries think twice before purchasing copies. This type of information - which can conveniently be summarised as being catalogues, diagrams, maps and tables - is ideally suited to Web publication. One of the features of the first issue of Internet Archaeology of which I am very proud is the series of programs which serve up data according to criteria supplied by the reader. They are by no means perfect - even with today's technology there is much more that we could have done - but they allow readers to follow their own lines of enquiry, for example to find what source of information was used to create the distribution maps of Roman amphoras or tobacco pipe kilns, while not delaying the downloading of this data so much that less inquisitive readers will be put off.
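
The idea of serving data according to criteria supplied by the reader can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the journal's own code. The records, field names and function name are invented, and the only assumption about the request is that the reader's criteria arrive as a CGI-style query string.

```python
from urllib.parse import parse_qs

# Hypothetical records. The field names and values are invented for
# illustration and do not come from the journal's databases.
RECORDS = [
    {"site": "Site A", "type": "amphora", "source": "published report"},
    {"site": "Site B", "type": "amphora", "source": "museum archive"},
    {"site": "Site C", "type": "pipe kiln", "source": "published report"},
]


def select_records(query_string, records=RECORDS):
    """Return the records that match every criterion in the query string.

    A criterion such as "type=amphora" keeps only records whose "type"
    field is "amphora". An empty query string returns every record.
    """
    criteria = parse_qs(query_string)  # e.g. {"type": ["amphora"]}
    matches = []
    for record in records:
        if all(record.get(field) in wanted for field, wanted in criteria.items()):
            matches.append(record)
    return matches


# Which sources lie behind the amphora entries?
for record in select_records("type=amphora"):
    print(record["site"], "-", record["source"])
```

Because the selection is made on the server at the moment of the request, the reader always sees the current data rather than a fixed copy of it.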

One consequence of using CGI scripts rather than pre-prepared HTML and images is that it is not possible to use these aspects of the papers off-line. The text itself can be downloaded, saved on disk or printed out. It can even, for personal use, be imported into databases or other computer systems but the data accessed by the CGI scripts is not accessible to the reader. Whether this is a useful feature - encouraging on-line access to the data - or an impediment to the use and re-use of the data remains to be seen.

6.3 Intranets and the Internet

Although at the time I did not know it, what was created at CLAU was an example of the Intranet - the use of Web software on a local area network. It is clear that in the next year or so all major software producers will have made their products Intranet-friendly, either by writing viewers which allow remote users to use text or data files in their native format (as with Microsoft's Office range) or by producing add-ins which allow users to export their data in formats currently supported by the Web (again, as with Microsoft). Access to these Intranets will depend mainly on the policy of their owners and on the cost of linking the Intranet to the Internet. The organisations which will hold archaeological data in this form are in the main going to be government-funded institutions, museums and commercial archaeological units. Individual researchers too may be able to publish their own work, depending on the pricing policies of the ISPs. At the time of writing, for example, it seems very likely that several megabytes of webspace will be available free to most users of the Web. What effect will these two developments have on the use of the Web for archaeological publishing? Firstly, it seems clear that a lot of the background information I have just described will be available on-line. It is bound to be ephemeral, however. As an individual, I could publish my own PhD thesis on the Web using the webspace provided by my ISP. As soon as I fail to keep up my subscription to that provider, however, that resource would disappear. Even commercial organisations have a finite life, especially so in archaeology. Whereas grey literature produced by a commercial organisation might survive in copyright libraries or other archives, Web-based documents would not. One answer to this would be for these individuals and organisations to lodge their data with a databank, such as the Archaeology Data Service. Internet Archaeology itself uses this service so that even if the journal itself ceased to function the existing issues would be available.

Until the long-term future of other web-based information is secured by such means Internet Archaeology has a policy of not making hypertext links to these external sources but instead of asking for permission for copies of the external data to be mounted on the Internet Archaeology Web server. There will come a time, perhaps within a few years, when such sources are stable and at that point I would expect papers in Internet Archaeology to refer to those databanks rather than make dead copies of the data. Indeed, if papers are submitted to the journal now where the authors have lodged their data with this data service then the journal will make links to that data directly. One would also expect permanent archaeological archives - such as those maintained by museums - to develop on-line services quite quickly and similarly, once they have demonstrated their stability, the journal will link to them. This seems to me to be the ideal we should be striving to achieve. Fieldwork and research are carried out using computers for data storage and manipulation. That data is then deposited as an on-line resource with either an independent databank or with a data service run by an archaeological repository and journals such as Internet Archaeology then present the results of analysing this data, linking these results back to the live data. In time I am sure that a number of such services will exist but whether they will be state-funded, subsumed into libraries, run as commercial enterprises or survive by some other means altogether remains to be seen.

6.4 Java applets

A development which arrived too late for the first issue of Internet Archaeology was Java support, now built in to the leading Web browsers. Java applets run on the user's machine, not on the remote server, and there are several cases where data which at present we serve through CGI scripts or HTML tables would be better served through Java applets. In the case of small databases the entire database could be downloaded as part of the applet and queried on the user's own computer. If we desired it, it would be possible to download the class files to the user's hard disk for permanent local use. The larger databases, such as the ABCD and the Roman Amphoras in Britain distribution data, can occupy several megabytes of disk space. To give an indication, that for the ABCD occupies approximately 3.5Mb whilst that for the Amphoras distribution data is only 157Kb. Unless bandwidth is radically increased or much higher compression rates achieved there seems to be no alternative to housing such databases on the server. However, more complex queries could be constructed and validated locally using Java applets and after that the request sent back to the Web server.
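
A rough calculation shows why database size matters here. The sketch below reads the two figures quoted above as megabytes and kilobytes of disk space, and assumes a 28.8 kbit/s dial-up modem with no protocol overhead and no compression. The modem speed is an assumption made only for this illustration and does not come from the text.

```python
# Sizes quoted in the text, read here as megabytes and kilobytes.
ABCD_BYTES = 3.5 * 1024 * 1024
AMPHORAS_BYTES = 157 * 1024

# Assumption: a 28.8 kbit/s dial-up modem, ignoring protocol overhead
# and any compression.
MODEM_BITS_PER_SECOND = 28_800


def download_seconds(size_bytes, bits_per_second=MODEM_BITS_PER_SECOND):
    """Return the idealised transfer time in seconds for size_bytes."""
    return size_bytes * 8 / bits_per_second


abcd_seconds = download_seconds(ABCD_BYTES)
amphoras_seconds = download_seconds(AMPHORAS_BYTES)

print("ABCD: about %.0f minutes" % (abcd_seconds / 60))
print("Amphoras: about %.0f seconds" % amphoras_seconds)
```

Under these assumptions the smaller database would arrive in about 45 seconds, which a reader might tolerate, whereas the larger would take about 17 minutes, which is the practical argument for keeping it on the server.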

6.5 Portable Document Format Files

Another possible development would be to serve the Internet Archaeology papers as PDF files, using Adobe's Acrobat viewer, which most Web users will already have on their computers. Since the cost of PDF writing software is rumoured to be tumbling this too is a development which may well be implemented, at least on a trial basis, in a future issue of the journal. Here too we are trying to steer a middle course between two extreme (and equally defensible) positions. On the one hand, we want to treat our papers as structured data which we sometimes decide to store and serve as text files and sometimes through scripts. We leave the user to deal with the way in which that data is assimilated (on-line, off-line, using a browser, printed out, reformatted or whatever). On the other hand, it is clear that as readers/consumers we are all heavily influenced by the context in which we come across information. Particular page layouts and typefaces themselves convey messages, and as publishers and authors we may want more control over the appearance of the paper.

The markup language, HTML, is itself evolving through the addition of new tags and the discarding of unsuccessful ones. Once features such as frames become a standard feature of all Web browsers we will begin to accept papers which use them. In the same way, the ability to control the colour of text and the use of multi-column formats might have a place (although personally I would take some convincing that they do really help readers to get the most out of the Web).

6.6 CD-ROM

It would be possible to put copies of Internet Archaeology onto CD-ROM. They would then be available for users with no access to the Internet (or no wish to spend large sums of money on telephone bills). However, as stated above, it would not be possible to get the CGI scripts (which are written in Perl 5) to run from a CD-ROM without giving the users advice on how to set up a local web server, how to obtain and install Perl and so on. Rewriting the entire first issue in Java is a possibility - though probably one requiring extra funding, and therefore a demonstrable need and market.

Here, I believe, the journal will be market-led. If there is a high demand for the text, images and data on CD-ROM then the technicalities can be sorted out. If readers are happy in the main to use the journal on-line then this will continue to be our policy.

7 Conclusion

In this editorial I have given a personal view of why a journal like Internet Archaeology is needed and how it might develop in the future. As with all aspects of the futurology of computing, the truth is that we have no real idea of what lies in store for us or what the pace of change will be. The next few years are going to be exciting but nerve-wracking times.


© Internet Archaeology URL:
Last updated: Wed Sep 11 1996