Internet Archaeology 7. Terras. A Virtual Tomb for Kelvingrove: Testing and Evaluation

11. Testing and Evaluation

Problems in developing a testing strategy

There are many problems involved in evaluating a virtual reality electronic display, mostly because 'there is a distinct lack of methodologies that can be readily applied to the development of virtual environments for museums' (Mitchell 1997). Very little research into the evaluation of such systems for educational worth has been carried out, leaving little framework upon which to base a comprehensive testing strategy.

The testing of such a device requires much observation of the user, ideally in the environment in which the final system is to be placed. Care has to be taken that test volunteers have a variety of computing experience to take account of the fact that a large proportion of museum visitors will have little or no computing knowledge. The person collecting the data has to be as unobtrusive as possible to allow for the reality that museum visitors do not often operate or understand an exhibit in the way those responsible for developing the display wanted them to.

Because of these basic facts, it was decided to carry out a series of formative evaluations on the virtual tomb of Sen-nedjem. By using systematic observation of a small sample of potential museum visitors, clarifying any issues raised and collating data, formative testing brings empirical evidence to the design process, testing the developer's ideas and informing the developer of the visitors' opinions (Economou 1997, 124). It was felt, with the lack of any proven methodologies for testing virtual environments, that formative evaluation would provide the means necessary for evaluating the system.

Aims of testing

The Virtual Tomb of Sen-nedjem aimed to present a virtual archaeological reconstruction, allowing the form and function of such a structure to be understood and interpreted by a wide audience. The model, however, did not carry any information regarding the tomb and the period of Egyptian history to which it belongs. Also, the means of viewing and exploring the tomb was already understood to be quite complex due to the nature of viewing virtual worlds constructed in VRML (see section 9). It was thought that a period of formative testing was necessary to:

Test the construction of the model
Test the navigational tools used to explore the model (provided by CosmoPlayer)
Test the navigational functions, i.e. animation, available to explore the model
Observe the behaviour and needs of users
Provide feedback about the user's understanding of major aspects of the model
Provide feedback about the user's understanding of the procedures required to view the model, i.e. hyperlinks, load time, instructions given on different screens etc.
Assess the overall impact of the model.

Method of testing

The model itself was thoroughly technically tested as each part was implemented, to ensure that the VRML was free from bugs and that each part functioned according to plan. This meant that when the model was fully rendered, each section was known to be complete and working, so any glitches could be attributed to the file structure or the way each part was called. The model, then, was continually tested from its creation to ensure that it functioned fully. The remaining tests would concentrate on how the user would be able to view the model.

It was not possible to carry out the evaluation of the user interface in the actual setting where the model would eventually be installed. To compensate for this, the user was supplied with some information regarding the possible use of the system, to try to give some context to the display.

The first phase of testing involved 'think aloud protocols' (where users discuss exactly what they are doing and why) with three different volunteers from varying backgrounds. The subjects were observed whilst using the interface, and an informal discussion took place during and after their use of the interface. Care was taken to ensure that the users knew it was the interface that was being tested, not themselves, and the subjects were urged to be honest about their views on the model and user interface. They were then asked to explore the model using the three different navigational methods, and were given no help from the observer.

Further testing, building on these findings, involved the use of a semi-structured interview after viewing the model. Another five test subjects were shown the model, and encouraged to discuss their views.

More informal testing was carried out by showing the model to friends, computer specialists and archaeologists. Any comments were recorded and used to re-evaluate the design of the model and user interface further.

Problems with the testing

Unfortunately, it was felt that those testing the interface did not give a fair representation of the average museum goer. Although not all had experience with virtual environments, most had extensive experience with computers, were used to using the mouse, and were familiar with terms like 'load' and 'home'. To test the final version of the interface, it was acknowledged that a broader spectrum of people would be required, including those from different age groups and social strata. However, for initial testing of the user interface, it was thought that the users chosen would spot major flaws in the electronic display. They came from a wide variety of backgrounds, having experience in primary education, Egyptology, archaeology, computer graphics, three-dimensional computer gaming and the development of multimedia interfaces for museums.

Results: first phase

The purpose of the first three 'think aloud' protocols was to indicate the ease of use of the user interface, and to compare how effectively the different options enabled the user to navigate around the model. This was done by asking each of the three users to describe exactly what they were doing (and what they understood they were doing) at each stage. Each was asked to start with a different version of the model, to evaluate what the user learnt from each. Results of these first tests showed that whilst the model itself was interesting and functioning to a high standard, the user interface explaining how to use the tools provided and allowing the user to switch between models was far from ideal.

One problem was the frame mechanism used. Although this allowed the user to alternate between instructions and the model, and provided some introductory information, every time the model was accessed it took five minutes to render. If any of the navigational functions were used before the model had finished loading the resulting views were stilted and jumpy. There was no way of telling exactly when the model had finished loading, unless Netscape's toolbar was retained on the screen, but this gave the added problem of providing the user with another row of buttons, some of which would crash the model if pressed. The use of Netscape also meant its basic menu bar at the top was displayed, and the mail preferences in the lower right hand corner were active. These, too, could be accessed by the user and crash the model.

Another problem with the frame mechanism was that the instructions required to start the animated fly-through or use the viewpoints were given on the page before the model. With the five-minute delay, the user had forgotten the complex instructions before the model had even fully rendered.

A final problem with the frame mechanism was that, using Netscape v3.01, every time the user pressed the 'home' button to return to the start a new browser was initiated. With two browsers running, the model did not function smoothly and with three or more the model crashed because of the extra pressure placed on the processor.

Once the model was fully loaded, there were other problems. The toolbar provided by CosmoPlayer was found to be wholly unsatisfactory for users both unfamiliar and familiar with the use of virtual reality. All three found it non-intuitive, with one commenting 'The worse thing about it is the toolbar'. Only rudimentary help is given, and it is not clear that the buttons require the user to click and drag to instigate movement. That said, after all three users grasped the basic concept, they were quick to use the tools effectively, or as effectively as they can be used. One user noted that whilst the model was in three dimensions, the tools only allowed travel in two, meaning that the user has to switch between all three tools to move, for example, to the top left of a space. Movement was deemed to be very clumsy and difficult. It was noted that the experience of the model was greatly affected by the poor tools provided. The inability to disable the link to the CosmoPlayer site and help site was also noted; these brought up separate sites which filled the screen, giving no indication of how to return to the model.

Due to the problems with the toolbar, it was found that the version giving no navigational help through viewpoints or animation proved impossible for even the most experienced user to navigate. Without any help the user did not know what they were trying to do, where they were trying to go, or why they were trying to get there. Although the animated slabs gave some indication of where to go next, the tools provided made it impossible to get there, and after a few seconds all three users gave up. One found it 'irritating', another 'unplayable', and the third commented that it was 'no fun'.

The version using viewpoints also proved tricky: first the user had to understand the concept of viewpoints, then understand how to use them, then be prepared to follow the list of viewpoints in order. Because the viewpoint mechanism links two specified points by the easiest path, if the user chose a viewpoint not consecutive to the current position there was often the sensation of passing through walls and glimpsing the partially rendered model outside the field of allowed vision. There was no way to disable this unwanted, disorientating effect. The viewpoints often did not give the feeling of a fluid transition from one to the next, merely jumping between them, meaning that areas of interest were not seen by the user, and the experience of understanding the structure of the model was lost. Again, two users left the model without seeing the most important part, the painted tomb. This version was deemed 'too difficult to use' and 'disorientating' by those tested.

The animated fly-through was thought to be the most successful of the three options by all three users. This combined animation with solo exploration by taking users to the inner tomb door, and then allowing them to explore the tomb chamber. 'The animated fly-through was the most enjoyable. The fly-through wins', commented one user; another said it was 'cinematic', and the most advanced they had seen; the third merely said 'wow'. It was generally agreed that the combination of guidance followed by the opportunity to explore the model yourself gave the user some context before having to experience the CosmoPlayer toolbar, and the use of the images on the moving door made them try harder to use the tools to see the rest of the model.

However, it was noted that the model gave no information about what it contained, why it was of interest and what users were expected to draw from the experience. One of the three believed that as there was no interactivity (in the sense of providing information) the model would be better presented as a continual fly-through that included the inner tomb chamber in the animation. This could be on continuous rotation in the museum, or even shown on a video and television screen: 'it's not worth being interactive as there's nothing to discover'. The other two believed that having the opportunity to explore the model yourself was paramount to the learning experience.

The actual model was found to be of high quality, and very advanced for its type. The inner tomb chamber was especially commented on for its realistic appearance and the real feeling of being there the model imparted. There were a few areas noticed which could be tidied a little, but no drastic flaws were spotted.

Conclusions drawn from this initial testing were:

The model itself is fully functioning
The animated fly-through is fully functioning
The tools provided by CosmoPlayer are very hard to manipulate, and whilst this is not the fault of the model developer it affects the impression people have of the display
Of the three versions of the model, the animated fly-through is the best and, indeed, the only workable option
The frame mechanism hinders the display of the model because of the time taken to render it every time it is accessed. The model should be rendered once and reused
The frame mechanism does not help in explaining to the user how to use the toolbar provided
The whole model needs to be displayed in such a way that the Netscape functions are not shown
Information should be given during use of the model, on the same screen
Some contextual information needs to be provided regarding the tomb and the wall paintings.

Results: phase two

Testing with the five others confirmed the above findings. The model was very well received in general, aside from the technical problems of the frame system and instructions. There were various positive comments: 'I like it', 'It's like Doom!', 'makes you want to see the real thing', 'fascinating to look at', and 'cool as hell!' One volunteer, a former primary school teacher, commented that 'kids would love finding things and getting around on their own, especially if there was some puzzles or items to find'. Another, who had actually visited the tombs in Deir el-Medina and the surrounding district, observed that it was 'as realistic as you are going to get, especially if you never get to Egypt'. He also commented on the fact that the real tombs are now in such bad repair because of the number of tourists visiting them that in such sites the wall paintings are protected by sheets of opaque plastic and very dimly lit: 'it is actually possible to see more detail in the model than in the real thing', highlighting the conservation aspect of virtual reality.

However, there was a general feeling of wanting to know more, and most asked 'is that it?' One volunteer commented that it 'looks quite cool but doesn't tell you anything about it', another said 'yes, it was very pretty...' All were questioned on what they had learned; most had no concept of the time scale of the model, how it looks today, what the purpose of the structure was or if they had 'completed the task'. A general conclusion was that some information needs to be integrated into the model to tell the viewer exactly what they have been seeing.

Various suggestions to improve the model were made, including the use of a continuous animated loop, the use of Java pop-up windows, the addition of a voice-over to the animated tour, the use of mouse-overs, the use of a compass so users could orientate themselves within the model, the presence of a small map so users could tell their present position, and the use of interactivity to impart knowledge about the tomb paintings.

Conclusion

The model itself was thought to be of very high quality, showing that it is possible to build a realistic archaeological reconstruction using VRML and commonly available resources and software packages. However, it is acknowledged that the first design of the user interface is unsuitable for installation in the museum, for the reasons specified above. Testing has shown that a complete redesign of the user interface is needed to incorporate the required information and tools into the model in such a way that the user will understand how to use the model and the purpose of the installation. Unfortunately, the second phase of the design was not implemented in time to be tested. It is hoped that this design will incorporate the characteristics needed for users to view the model easily.