Internet Archaeol 15. Nick Ryan. WWW database access

3.1 WWW database access

Client-side methods using scripting languages such as JavaScript (ECMAScript) that are embedded in a web page and executed by the client browser do not normally provide any direct support for database access. However, they may be used in database applications, for example, to provide input validation before requests are submitted to the server.

Direct access to a database from the browser is possible using Java Applets. These are self-contained programs written in the Java programming language that are normally stored on the web server and delivered to the client browser on demand. The browser executes the program locally and may send requests to remote database servers over the network. Database access from a Java program normally uses JDBC, an OO programming interface for interaction with any relational or OR DBMS for which a driver module is available. Applets are typically much larger than simple HTML pages so take longer to load. To improve performance, they are often cached by the browser. Web servers that deliver applets or other executable programs are often referred to as Application Servers.

This approach has several benefits, not least that it shifts some of the processing load from the server to the client machine where spare capacity is often available and so reduces the load on the server. However, differences in browser capabilities and the rapid evolution of the Java language mean that many users may not have a sufficiently up-to-date browser to handle all applets. For some this is not a problem and is an incentive to upgrade their browser. Other users, particularly those in institutional environments, may not have sufficient control of their machines to make such upgrades.

Most web-based database access techniques use server-side approaches where the application elements that interact with the database are located on the same machine as the web server. The database server may, of course, be on yet another machine; such systems are often said to have a 3-tier architecture (Fig. 6). The term 'middleware' is often used to describe the components that lie between the web client and database servers ³.

Figure 6: Outline architecture of a 3-tier Web-based data access system.

Figure 6: Outline architecture of a 3-tier Web-based data access system

The first approach to gain widespread acceptance was the Common Gateway Interface, or CGI (Gundavaram 1996). This is a specification that defines how a web server can invoke a program and pass parameters to it. HTML forms typically include a number of entry fields that the user can fill in, and a 'submit' button used to pass the content of these fields to the server. Pressing the button causes the browser to send a request to a specified URL (which indicates the required program) with the field values encoded either by appending them to the URL or as part of the message body. When the web server receives the request, it extracts the program name and parameters then starts the program with the supplied parameters.

A CGI program generates an HTML page by writing text to its output. The web server collects this output and returns it to the client browser. In a database application, the program would typically perform various queries against the database and format the results as an HTML page. CGI programs can be written in any programming language, although Perl ⁴ has proved particularly popular and there are many add-on libraries to perform database access to different DBMS. Where the higher performance of a compiled language has been required, C and C++ have also been widely used. In these cases, database connectivity has typically required ODBC or DBMS-specific libraries.

CGI provides an effective, if somewhat primitive and inefficient, method of allowing a web server to execute programs and interact with databases. Each time the server receives a request for a particular CGI program, it must start a new instance of the program. If it is a database application, each instance of the program needs to make a new connection to the database server. These operations, starting a program and connecting to a DBMS, are both very time consuming and, for simple queries, may take much longer than executing the query and returning its results.

Java servlets represent a more recent approach that overcomes these limitations. Servlets are executable Java programs that can be invoked by the web server in much the same way as a CGI program. Database applications use JDBC to connect to the database server and servlets typically output results as an HTML page to be returned to the client. At the first request for a servlet, the web server starts several instances which then continue to run and service requests as they arrive. Because they are always running, servlets can maintain open connections to the DBMS, so they avoid the start-up delays of CGI programs. Unlike CGI programs, however, servlets can only be written in the Java language.

Both CGI and servlet approaches can deliver wholly dynamic web pages. In other words, no content need be stored on the web server, but each page can be generated on-the-fly by the invoked program. In practice, common elements of web pages may be stored as HTML fragments in local files loaded by the programs, but the URL requested by the client points to the executable program rather than a static web page.

Embedded, or server-side, scripting languages offer a different approach. Here, something like a conventional HTML page is stored on the web server. These pages contain additional program code that is interpreted by the server or, more often, by an optional plug-in component. This code, when executed, either writes HTML to its output or replaces itself by inserting additional HTML into the page before it is delivered to the client. Microsoft's Active Server Pages (ASP) provides an environment within which such scripts can be written in Visual Basic or other scripting languages and ODBC can be used for accessing databases. A popular and more portable solution is offered by PHP, a scripting language that can be used with most web servers. It is widely considered to be easy to learn and supports access to a large number of DBMS.

Although well suited to delivering relatively simple dynamic web pages, the server-side scripting approach can become unwieldy with increasing application complexity. The main reason for this is the way in which HTML content and executable code are mixed in a single source page. The problem is particularly acute when content and code may be written and maintained by different people, probably with quite different skills. Both may have to deal with unfamiliar material mixed in with their parts of the page, and there is a real danger of confusion and that one person may damage the other's work.

Several more recent frameworks attempt to overcome this problem by a clear separation of these different components into separate source files that may be maintained independently, possibly by people with quite different skills. Important examples are JavaServer Pages (JSP)⁵ and Extensible Server Pages (XSP).⁶ Both are intended to be used in a Java servlet environment and, effectively, by embedding code fragments in the pages, allow the programmer to concentrate on core behaviour rather than needing to write complete, self-contained, servlets.

A key feature of these recent approaches is their extensive use of the Extensible Markup Language (XML) . Unlike HTML in which the meaning of each tag or element is predefined, XML is a meta-language in that it provides a way of writing special-purpose markup languages. Indeed the proposed future version of HTML, called XHTML, is one such language. The many applications of XML range from system configuration files, through data wrapping to text markup. Whilst all of these applications are likely to become increasing important, data wrapping is of particular significance to data-intensive systems. As an example, an XML-based language might be defined for the exchange of excavation data. Using the earlier example database schema, information might be exchanged in something like the following form:

<?xml version="1.0"?>
   <site id="XYZ2001">
    <layer id="123">
     <description>Pit fill</description>
      <find id="1022">
       <type>Pot</type>
        …
      </find>
     …
    </layer>
    …
   </site>

This is, of course, only a simple example to illustrate the typical appearance and format of XML data. Many other arrangements of the same data are possible. In this example finds are nested within layers whereas another approach would be to separate them and provide the <find> element with a cross-reference to its layer in much the same way that a relational model uses foreign keys.

Modern publishing frameworks such as JSP and XSP can also use the Extensible Stylesheet Language (XSL) and XSL Transformations (XSLT) , a language for transforming XML documents. With these, it is possible, to store a single skeleton page complete with data access components as an XML file and to apply different transformations to generate output suited to a range of clients or applications. For example, a JSP page might retrieve data from our notional site database and present it, in a form similar to the above XML example, to an XSL processor. The appropriate transformation might be determined by the capabilities of the client browser (desktop or handheld computer, WAP phone, etc.) or by application needs (HTML web page, text, PDF, or other format).⁷