The extensibility of XML allows those encoding material to define their own elements and define document structures according to their specific needs (Ross et al. 2004, 63). XML is an international open standard for data exchange that is independent of vendor, application and platform. As XML uses Unicode, it is also language independent. Users are not tied to the use of specific hardware or proprietary software (Geser 2003, 30).
The extensibility and structured nature of XML allow it to be used for communication and interoperability between different systems that would otherwise be unable to communicate (Daly 2004). XML defines the content of a document separately from its formatting, so that documents can be stored in one format in one place, then reformatted, restructured and distributed through a variety of channels with minimal effort according to different needs and uses. Presentation of text can be automatically adapted to the capabilities of different publishing media. The same content can be transformed from XML into plain text, HTML and other formats, such as PDF, and also tailored for the Web, e-mail, text, handheld and wireless devices, and print (Daly 2004). XML can also be used to create new languages, and a number of industries and disciplines have developed their own, for example, the Wireless Markup Language (WML) used to encode Internet applications for handheld devices, the Geography Markup Language (GML) and the Electronic Thesis and Dissertation Markup Language (ETD-ML).
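The separation of content from formatting can be illustrated with a minimal sketch in Python, using only the standard library. The record, its element names (`find`, `name`, `period`) and the two helper functions are invented for illustration and are not drawn from any of the sources cited here; in practice this kind of transformation is often done with a stylesheet language such as XSLT.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical record: the element names are assumptions made for this sketch.
SOURCE = """<find>
  <name>Bronze brooch</name>
  <period>Roman</period>
</find>"""


def to_plain_text(xml_text: str) -> str:
    """Render each child element as a 'label: value' line of plain text."""
    root = ET.fromstring(xml_text)
    return "\n".join(f"{child.tag}: {child.text or ''}" for child in root)


def to_html(xml_text: str) -> str:
    """Render the same content as an HTML definition list."""
    root = ET.fromstring(xml_text)
    rows = "".join(
        f"<dt>{escape(child.tag)}</dt><dd>{escape(child.text or '')}</dd>"
        for child in root
    )
    return f"<dl>{rows}</dl>"


if __name__ == "__main__":
    print(to_plain_text(SOURCE))
    print(to_html(SOURCE))
```

The stored XML is never altered; each output channel simply reads the same source and applies its own presentation.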
The neutrality, expressiveness and plain text format of XML have advantages for digital archiving and preservation, as XML content representation and management 'will enable heritage institutions to make effective long-term and varied use of their information assets' (Ross et al. 2004, 43). However, the XML format alone is not a complete preservation strategy. Media will still need to be migrated and application systems transformed in the future, and the data remain dependent on associated stylesheets or scripts to extract, output and transfer them, and on a processor and parser to transform and present them (Ross et al. 2004, 61). The issue of XML and preservation has been the subject of a number of detailed analyses and reports, such as those by the Dutch Digital Preservation Testbed Project (2002), ERPANET (2002) and NINCH (2002).
XML is often referred to as 'self-describing'; although a computer does not know the meaning of a specific element in a document without the aid of a processing programme, the element tags are human-friendly as they are meaningful, readable text as opposed to code. Whilst these tags describe the data they enclose, they do not constrain how the data will be interpreted; however, how material is encoded does constrain how it may be used (Ross 2003, 8). As they comprise plain text, XML documents can be written in any simple text editor, such as Microsoft Windows Notepad (see 3.3.2).
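The distinction between readable tags and machine interpretation can be sketched as follows. This is an illustrative example only: the document, its element names and the `unit` attribute are assumptions, not taken from the cited sources. The parser reports the tags as plain labels; it is the processing programme that decides the content of `height` should be treated as a number.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are invented for this sketch.
DOC = "<site><name>Example Hill</name><height unit='m'>212</height></site>"

root = ET.fromstring(DOC)

# To the parser, tags are just labels, returned here in document order.
tags = [element.tag for element in root.iter()]

# The interpretation (this text is a measurement) comes from the programme.
height = root.find("height")
height_value = float(height.text)
height_unit = height.get("unit")

if __name__ == "__main__":
    print(tags)
    print(height_value, height_unit)
```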
Whereas two to three years ago XML technology was an emerging and little-used standard, it is now gaining acceptance, and the cultural heritage community is developing a range of applications that depend upon it (Geser 2003, 30). Ross et al. (2004, 42) observe that XML has become a major force in the world of information management, but note that it 'is not a solution in itself, but a new way of approaching content structuring and reuse'; 'the uses to which XML can be put in the reuse and repurposing of content are substantial'.
XML improves the functionality of Web technologies through the use of a flexible and adaptable means to identify information. In discussing the benefits of the Semantic Web, Ross (2003, 8) sees that 'by adding descriptive information to content and resources, and representing both the descriptive information and the content in well-defined, consistent, and structured ways, "mechanised agents" could be enabled to use Web information "intelligently"'. XML files can be searched across, combined, and manipulated in far more powerful ways than HTML permits. Because of XML's very strict syntax, the Web browser can be smaller and faster as there is no need for tolerance code; this is good for smaller handheld devices.
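The strictness of XML syntax can be demonstrated with a short sketch. The helper name `is_well_formed` is hypothetical; the example assumes only Python's standard `xml.etree.ElementTree` parser, which rejects a document outright rather than attempting to repair it as a tolerant HTML browser would.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # No recovery is attempted: a single unclosed tag rejects the document.
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<p>one <b>two</b></p>"))  # properly nested
    print(is_well_formed("<p>one <b>two</p>"))      # <b> is never closed
```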
However, the use of XML requires skilled staff, and inconsistent markup will return misleading results. The more information added to a document, the wider the uses to which it can be put. However, added markup requires additional resources and may need to be justified in terms of added value. Encoding a document has its related costs, as does the underlying analysis required to make this happen. Furthermore, whilst this encoding may result in significant advantages for producers and users, these benefits may be long-term rather than immediate (Ross 2003, 9). At present, encoding cannot come from a computer alone; human input is essential (Bosak 2001; Walsh 2002). As Castro (2001, 11) notes, 'whilst XML demands a bit more attention at the start, it returns a much larger dividend in the end. In short, HTML lets everyone do the same things, but XML lets some people do practically anything'.
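How inconsistent markup produces misleading results can be shown with a small, purely illustrative example. The records and element names below are invented: three finds are all Roman, but one encoder used a lower-case value and another used a different element name, so a straightforward search reports only one.

```python
import xml.etree.ElementTree as ET

# Hypothetical records encoded inconsistently by different hands.
RECORDS = """<finds>
  <find><period>Roman</period></find>
  <find><period>roman</period></find>
  <find><date>Roman</date></find>
</finds>"""

root = ET.fromstring(RECORDS)


def count_period(tree_root: ET.Element, value: str) -> int:
    """Count <period> elements whose text exactly matches the given value."""
    return sum(1 for element in tree_root.iter("period") if element.text == value)


total_finds = len(root.findall("find"))
roman_hits = count_period(root, "Roman")

if __name__ == "__main__":
    print(total_finds, roman_hits)
```

The query itself is correct; the shortfall comes entirely from the encoding, which is why consistent markup practice and skilled staff matter.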
Some have expressed caution in the use of XML; they see it 'surrounded with hype and ambitious publicity' (Warwick and Pritchard 2000). A particular disadvantage at present is the limited support for XML technologies in Web browsers. Although all the latest versions of Internet browsers support XML, not all do so to the same extent; only some support XSLT, and many computers still use older versions of browsers (Ross et al. 2004, 64, see also 3.4).
© Internet Archaeology
URL: http://intarch.ac.uk/journal/issue17/5/gf3-2.html
Last updated: Wed Apr 6 2005