2. Web services, XML and Archaeology

2.1 What is a web service?

The term 'Web services' refers to a particular set of standardised, XML-based technologies (Glossary) used to build Service Orientated Architectures (SOA) (Glossary). It is not the purpose of this article to explain the technical nature of these components in depth, but rather to illustrate the virtues of an SOA and how we as archaeologists can make use of this technology.

As mentioned above, distributed computing isn't new. Traditionally, developers have used Remote Procedure Calls (RPCs) (Glossary) to perform various actions on distributed objects, using technologies such as COM, CORBA and RMI. Although a powerful concept, this involves several rather complicated steps: a request to create an object, a request to invoke a method on the instance of that object, the receipt of a response based on a number of these exchanges and, finally, a request to release the instance of the object (Vogels 2003).

More often used within academic information systems, Z39.50 is another protocol that can be used to communicate data between remote systems (Miller 1997). However, this too is a cumbersome and relatively static technology. Z39.50 requires a dedicated Z39.50 server acting as a host to a number of Z-targets. Each of these targets needs its database and related connections aligned to the Z39.50 protocol by means of what are known as 'profiles', which are held on the server. A Z-client or gateway can then hook into the server, submit a query and await results. Z39.50 is notoriously complicated to implement correctly (as experienced first hand when working on the ARENA implementation). In addition, the use of specific server-side profiles to link the server to its data sources creates a somewhat tight coupling between the various remote systems.

A web service is not simply a web-based implementation of the Remote Procedure Call technologies mentioned above, nor is it a technology designed as a replacement for the dated Z39.50 protocol. It does, however, offer a much simpler and more flexible means of achieving the same goal – the retrieval of information from a remote system.

Fundamentally, a web service is a standards-based, document-centric messaging system that allows discovery and dissemination of information via, among other things, the Hypertext Transfer Protocol (HTTP). To achieve this, the Simple Object Access Protocol (SOAP) (Glossary) – a messaging protocol defined in XML – is used. SOAP is essentially a technology that creates a wrapper for information, enveloping the message content within a well-defined XML structure. Various SOAP-formatted headers are also included within the structure to allow further control of how the message should be handled. By communicating through this text-based document model, platform independence and a loose coupling between all parties involved in the various transactions are largely achieved.
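The envelope structure described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration rather than a working service: the envelope namespace is that of SOAP 1.1, while the application namespace, the `RequestId` header and the `FindMonuments` query are hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/archaeology"  # hypothetical application namespace


def build_envelope(site_name: str) -> str:
    """Wrap a (hypothetical) monument query in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")

    # Optional header block: controls how the message should be handled.
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    request_id = ET.SubElement(header, f"{{{APP_NS}}}RequestId")  # assumed header entry
    request_id.text = "demo-001"

    # Body: the actual message content, itself well-formed XML.
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    query = ET.SubElement(body, f"{{{APP_NS}}}FindMonuments")  # assumed operation
    name = ET.SubElement(query, f"{{{APP_NS}}}SiteName")
    name.text = site_name

    return ET.tostring(envelope, encoding="unicode")


message = build_envelope("Star Carr")
print(message)
```

Because the result is plain text, it could in principle be produced and consumed by any platform able to read XML, which is the source of the loose coupling noted above.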

Unlike Z39.50, where a profile needs to be explicitly created on the server, one of the key concepts behind web services is that of discovery. To achieve this, a description of the service is written using the Web Services Description Language (WSDL) (Glossary) – another XML-based language. The WSDL descriptor states what operations the service provides, and what input and output messages are associated with those operations. This document can then be made available to the consumer either directly or by publishing it to a Universal Description, Discovery and Integration (UDDI) service (Glossary).
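To give a sense of how a consumer might read such a description, the sketch below parses a heavily trimmed WSDL 1.1 document and lists each operation with its input and output messages. The service, operation and message names are hypothetical, and the `types`, `binding` and `service` sections that a real descriptor would contain are omitted for brevity.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Hypothetical, trimmed descriptor: only messages and one portType are shown.
WSDL_DOC = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:tns="http://example.org/archaeology"
             targetNamespace="http://example.org/archaeology"
             name="MonumentService">
  <message name="FindMonumentsRequest"/>
  <message name="FindMonumentsResponse"/>
  <portType name="MonumentPortType">
    <operation name="FindMonuments">
      <input message="tns:FindMonumentsRequest"/>
      <output message="tns:FindMonumentsResponse"/>
    </operation>
  </portType>
</definitions>"""


def list_operations(wsdl_text: str) -> dict:
    """Map each operation name to its (input message, output message) pair."""
    ns = {"wsdl": WSDL_NS}
    root = ET.fromstring(wsdl_text)
    operations = {}
    for op in root.findall("wsdl:portType/wsdl:operation", ns):
        input_el = op.find("wsdl:input", ns)
        output_el = op.find("wsdl:output", ns)
        operations[op.get("name")] = (
            input_el.get("message") if input_el is not None else None,
            output_el.get("message") if output_el is not None else None,
        )
    return operations


operations = list_operations(WSDL_DOC)
print(operations)
```

A consumer that has never seen the service before can thus learn what to send and what to expect in return from the descriptor alone.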

2.2 What type of information can be found?

The content of a SOAP message is normally itself a well-formed XML structure. RPC-style directives (such as calls to various methods on a host system), large datasets returned in response to a request, or even complete Java classes realised in XML are all valid content for a SOAP message. However, it is the ability to communicate XML-formatted datasets that is of primary interest to archaeologists. Indeed, it is these datasets, either as a subset or in their entirety, that are sought by the majority of archaeologists conducting online research.
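As an illustration of a dataset carried inside a SOAP message, the following sketch extracts records from the body of a response. The response document, its `arch` namespace, the element names and the two monument records are all invented for the example; a real service would define its own structure.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/archaeology"  # hypothetical application namespace

# Invented sample response carrying a small dataset in the SOAP body.
RESPONSE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:arch="http://example.org/archaeology">
  <soap:Body>
    <arch:FindMonumentsResponse>
      <arch:Monument><arch:Name>Barrow A</arch:Name><arch:Period>Bronze Age</arch:Period></arch:Monument>
      <arch:Monument><arch:Name>Villa B</arch:Name><arch:Period>Roman</arch:Period></arch:Monument>
    </arch:FindMonumentsResponse>
  </soap:Body>
</soap:Envelope>"""


def extract_records(soap_text: str) -> list:
    """Return the monument records found in the SOAP body as dictionaries."""
    ns = {"soap": SOAP_NS, "arch": APP_NS}
    root = ET.fromstring(soap_text)
    records = []
    for monument in root.findall("soap:Body/arch:FindMonumentsResponse/arch:Monument", ns):
        records.append({
            "name": monument.findtext("arch:Name", namespaces=ns),
            "period": monument.findtext("arch:Period", namespaces=ns),
        })
    return records


records = extract_records(RESPONSE)
for record in records:
    print(record["name"], "-", record["period"])
```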

For an SOA system to be truly effective, the use of XML schemas is considered necessary to ensure that all parties can design their systems to be interoperable. An XML schema is a way of dictating the structure of an XML document. Documents written to comply with a specific schema can thus be validated to ensure data integrity and coherence without the need for human interaction. The use of XML to present structured data is becoming an increasingly popular option within the field of archaeology. Indeed, recent studies, such as Falkingham's (2005) investigation of how archaeological grey literature could benefit from being encoded as XML, serve to illustrate the potential flexibility of this format. Written to comply with archaeology-specific schemas, such as MidasXML as proposed by the Forum on Information Standards in Heritage (FISH), archaeological data can be prepared in a way that lends itself directly to the concepts that drive the SOA model.
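The idea of automated validation can be illustrated with a deliberately simplified stand-in. The sketch below is not XML Schema validation and has no connection to MidasXML: it merely checks that a document is well-formed and that each record contains a set of required child elements. The rule set and element names are hypothetical; a real system would validate against a published schema using a schema-aware library.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: every <monument> must contain these child elements.
REQUIRED_CHILDREN = {"monument": ["name", "period"]}


def validate(xml_text: str) -> list:
    """Return a list of problems; an empty list means the document passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # A document that is not well-formed cannot be checked any further.
        return [f"not well-formed: {exc}"]

    problems = []
    for tag, children in REQUIRED_CHILDREN.items():
        for index, element in enumerate(root.iter(tag), start=1):
            for child in children:
                if element.find(child) is None:
                    problems.append(f"{tag} #{index} is missing <{child}>")
    return problems


good = "<monuments><monument><name>Barrow A</name><period>Bronze Age</period></monument></monuments>"
bad = "<monuments><monument><name>Villa B</name></monument></monuments>"

print(validate(good))
print(validate(bad))
```

Even in this reduced form, the check runs without human interaction, which is the property that makes schema-governed data suitable for exchange between independently built systems.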


© Internet Archaeology/Author(s)
File last updated: Tue Sep 27 2005