Archiving of Archaeological Digital Datasets in Slovenia: historic context and current practice

Summary
This article presents the archiving of archaeological digital datasets in Slovenia in its historic context. The datasets discussed have been separated into three categories: non-reproducible datasets, reproducible datasets, and registries. Several reproducible datasets created by ZRC SAZU have been freely available online since the early 2000s, but the number of users is small and those benefiting often do not adhere to clearly stated copyright limitations. There is a large discrepancy between the stated interest in and the actual usage of reproducible, let alone non-reproducible, online datasets disseminated as open access. In addition, adherence to fair use cannot be expected unless enforced. The key outcome of this study is that it has exposed a total absence of systemic archiving practice for non-reproducible digital datasets. The article concludes with recommendations and next steps that could be taken to address these issues in future. First and foremost, a systemic approach to digital archiving is urgently needed if irreversible damage to decades' worth of born-digital non-reproducible data is to be averted.


Introduction
The aim of this article is to present the current practices of archaeological digital data archiving in Slovenia. This is the author's subjective view, written partially from an outsider's perspective (non-reproducible datasets) and in part from a single-institution perspective (reproducible datasets). The information pertaining to all but ZRC SAZU's own datasets has been acquired through personal interviews, the few publications available, and personal experience. To provide a historical context, archiving practices for analogue datasets are also presented, starting from 1948.
Archaeological datasets, as understood in Slovenian archaeology, can be broadly categorised as non-reproducible and reproducible datasets. The former are data recorded during any archaeological research that cannot be reproduced, e.g. an archaeological excavation. The latter can be reproduced either because they stem from some form of processing or cleaning of non-reproducible datasets, e.g. transcription of a field diary, or because the research can be repeated, e.g. pottery analysis. Registries are described separately as a third category.

Non-reproducible Datasets
As mentioned previously, the term non-reproducible datasets describes all data derived from a research activity that cannot be repeated. The foremost example, and the source of the vast majority of archaeological data, is archaeological excavation.

Historic context: 1948 to 2002
A brief overview of archaeological practices regarding the documentation systems for the period between 1948 and 2002 is needed to provide historical context. This is necessary as the end goal of any archiving in the 21st century must also be to digitise born-analogue data. Methodological articles describing the archaeological practices of Slovenian archaeologists were few and far between in the 20th century (Berce 1951; Šribar 1969; Šribar 1974; Grosman 1991; Novaković and Turk 1991). Therefore, this brief overview is based on the analysis of selected archetypal excavations carried out by the National Museum of Slovenia (NMS) and the Museum of Gorenjska (MG).
Archaeological excavation of an early medieval cemetery at Bled (Blejska Pristava) from 1948 to 1951 was executed by a young and eager team inspired by the post-World War II zeitgeist. In the tradition of the time, the centrepiece of the documentation system was the archaeological diary. Although it was written in essay form, it included explicit descriptions of individual burials. However, owing to the presence of a trained geodesist, Rudolf Berce, x-y-z locations of individual graves in a relative coordinate system were measured and noted in diaries. In addition, this was one of the first excavations in Slovenia where an attempt to photograph individual burials systematically was made using black-and-white film, although the effort was hampered by a lack of film. The entire archive of excavation (i.e. artefacts, diaries, photographs, and plan drawings produced during post-excavation) is kept at the NMS (Berce 1951; cf. Knific 2008; Pleterski 2008, 27-28).
The same system was adhered to at the 1953 excavation by the same team from NMS in Kranj (Župna cerkev in Kranj). However, when the team from MG re-started the excavation in 1964, archaeological plan drawings created in situ were added to the documentation. Albeit somewhat rudimentary at first and at 1:20 scale, these are among the earliest such drawings in Slovenia. During a further resumption of the excavation, this time implemented by a new generation of young archaeologists, the in situ archaeological drawings were much more precise at 1:10 scale (Štular and Štuhec 2015, 34-42). The documentation produced by this excavation, i.e. the archaeological diary, in situ drawings at 1:10 or 1:20 scale and photographs (contact copies and developed negatives on black-and-white film), remained the standard in Slovenian archaeology until the mid-1990s. Although from the 1970s some individuals introduced printed single-context sheets and photogrammetric documentation of plans, these were never widespread or adopted as standard.
In the mid-1990s, colour photographs became a standard addition to black-and-white. More importantly, with the introduction of stratigraphic excavation at this time, printed single-context sheets made their way into widespread use. This remained the standard until about 2002. An early exception to this analogue documentation was digital photography, but for several years this was only used to supplement the analogue photography.
Throughout this period a system was in place whereby the excavating legal body held portable finds, i.e., artefacts, until 'the analysis was completed'. In practice, the analysis could, and often did, last indefinitely. The documentation, i.e., analogue archaeological non-reproducible datasets, remained permanently in the custody of the excavating legal body. However, the excavating legal body was always a public body, including the Institute for the Protection of Cultural Heritage of Slovenia (IPCHS), the Research Centre of the Slovenian Academy of Sciences and Arts (ZRC SAZU), Institute for Archaeology, the Archaeology Department at the University of Ljubljana or various museums. The museums were more active until the 1970s and the three institutions thereafter.
In practice, therefore, these institutions still hold much of the archaeological non-reproducible data from the pre-2002 period. Lately, there is a trend to move these archives to the museums holding the respective artefacts, and at the same time to digitise the archives (e.g. Štular and Belak 2012a; 2012b; Belak 2013; Štular and Belak 2013; Belak 2014; Sagadin 2014). This is a slow process, however, and while it is in the spirit of the current legislation, it is neither supported nor enforced by it.

Born-digital: 2002 to 2008
Well-funded archaeological excavations of large areas, habitually exceeding 1000 ha, occurred within national highway building projects in the late 1990s and early 2000s, spearheading methodological development, including documentation. From c. 2002, the vast majority of non-reproducible datasets in Slovenian archaeology became born-digital data stemming from archaeological excavations. Since both the form and content of these datasets are a direct result of archaeological practice, the practice of digital recording will be briefly described.
In preparation for several big excavations taking place in 2002 in Krško Polje, the backbone of the born-digital documentation system of archaeological excavations in Slovenia was created, with major development taking place up to around 2008. Whereas the initial impetus and development came from the Archaeology Department at the University of Ljubljana, the development and implementation of a functioning workflow occurred within two commercial units, Arhej d.o.o. and Tica sistemi d.o.o. (for the latter see Butina et al. 2007). Each company developed its own workflow, based on proprietary software solutions built on top of Autodesk CAD for geomatics and custom database solutions within the MS Access environment. Both workflows were developed simultaneously and in close (intellectual and physical) proximity to each other and hence provide very similar solutions. The commercial environment at the time financed the excavation only, but very little data analysis and even less data archiving. This resulted in systems focused on data entry rather than data analysis. In addition, the complete absence of any archiving standards was noteworthy (cf. Novaković et al. 2007, 57).
In this period, most excavations were subcontracted to commercial companies and the documentation, as well as the artefacts, was held by them for long periods. Now, most documents and artefacts have been handed over to the local museums.

Current practice
Until recently, all digital documentation used in archaeological excavations in Slovenia used either the system initially developed within Tica sistemi d.o.o. (courtesy of the Autodesk CAD add-on being made freeware) or a derivative of one of the two initial systems. As a result, the practice of digital recording gradually became more homogeneous than that reported in 2007.
To introduce the recent practice of archiving of archaeological digital datasets in Slovenia, it is important to briefly introduce the cultural heritage legislation passed in 2008 (Cultural Heritage Protection Act, Official Gazette of Republic of Slovenia, nos 16/08, with amendments; further 'Act') and implemented in practice five years later (Rules on Archaeological Research, Official Gazette of Republic of Slovenia, no. 3/13; further 'Rules'). This legislation, among others, is intended to address the shortcomings noted by Novaković et al. (2007).
It is by no means the intention of this brief overview to analyse the 2008 legislation and its implementation, as much more in-depth knowledge and analysis would be required. Rather, it is to present a subjective view of the practice created by this legislation from an outsider's perspective, i.e., from the perspective of the data user rather than data creator.
The major changes in the 2008 legislation (60 pages), regarding archaeological practice, were:
• a specialised public institution centre for development-led archaeology at IPCHS (CPA) maintains a database of all on-going archaeological field research;
• the Ministry of Culture issues consent for each invasive and semi-invasive field research intervention;
• the conductor of the preliminary/preventive archaeological research intervention can be a non-public legal body;
• archaeological documentation should be submitted to IPCHS within 6 months;
• the entire 'archaeological archive' (finds/artefacts and samples, field documentation, and digital datasets) must be submitted on a compact disc to a museum (the one appointed in the consent) within five years;
• the agent executing the archaeological research must deliver the final report within two years (exceptionally up to five years);
• the sanctions for noncompliance can be loosely translated as a temporary loss of the ability to obtain licences for further archaeological research.
Whereas this legislation is in many ways a notable step forward, it has two major shortcomings from the perspective of archiving: (1) insufficient standardisation of the digital data archive and (2) museums that are currently not equipped to curate digital data. The 'Archive…' must be delivered in printed form (the report only) and on compact disc to the appointed museum, the responsible supervisor (archaeologist-conservator at IPCHS), CPA and the INDOK centre (central repository of data on cultural heritage at the Ministry of Culture). The instructions, however, do not prescribe archival methods for digital assets beyond a directive that all data must be appropriately archived.
The latter, not dwelling on technical details of digital archiving, is in line with any legislative text that aims to be more durable than the technology of the time. What is lacking, though, is a means (e.g. an expert body) of preparing and maintaining the technical details of digital archiving.
The above top-down overview does not, however, provide much insight into archaeological practice. To this end, using a bottom-up approach, a selected internal standard is presented. Specifically, the digital recording system developed and used by a commercial excavation unit, STIK, will be briefly discussed; it is an intellectual successor of the previously described digital recording system developed within the now defunct Tica sistemi d.o.o. (cf. Butina et al. 2007).
Digital archives produced by this recording system are based on a permanent folder structure. This simplicity ensures that the system can be learnt quickly, is robust and is OS independent. The recording system is described in broad strokes in an internal white paper. The folder structure consists of seven top-level folders loosely translated as: Documentation, Forms, Photoarchive, Plans, Reports, Miscellaneous and Temporal.
Within the Documentation folder, one finds the Access database holding the description of single stratigraphic contexts, samples, and artefacts as well as other small 'databases' in the form of MS Excel sheets, e.g. the description of each individual photograph.
The Forms folder is where templates of paper forms that are to be printed and used in the field are kept.
Photoarchive is where the photographs are kept. It is divided into eight subfolders (Documentary, Orthophotos, Finds, etc.). Each photo in the Documentary folder is individually described in the above-mentioned MS Excel sheet 'database'. This is an example of an input-centric rather than archive-centric system. The reason MS Excel worksheets are used, rather than an MS Access database or even EXIF data, is the ease with which MS Excel handles multiple similar or identical inputs.
All of the geomatic data is held within the Plans folder. The folder is subdivided into ten categories that correspond either to different stages of the AutoCAD-based workflow or different types of data. Spatial databases are based on Autodesk AutoCAD (and its derivatives) and a proprietary add-on MiniExplorer (Butina et al. 2007).
Lately, 3D data derived through photogrammetry workflows have been incorporated.
The ease with which 3D workflows have been implemented testifies to the robustness of the recording system on the one hand and demonstrates the advantage of a true born-digital system on the other. This contrasts with most other current practices in Europe, which can be described as using digital solutions for an underlying analogue recording system, as exemplified by the use of manual in situ plan drawings.
The remaining folders, Reports, Miscellaneous and Temporal, are self-explanatory.
The internal standard described above may appear to be very rudimentary and basic. Since rudimentary and basic also translates into low maintenance, low cost, and robust, it serves the purpose of its creators. Insufficient supporting documentation, especially pertaining to the file formats and file format versions, is an area in need of improvement.
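The kind of supporting documentation found lacking here could, in principle, be generated automatically. The following is a minimal sketch in Python, not a description of the STIK system: it walks an archive folder and lists each file with its extension, size and checksum, and reports which top-level folders are absent. The English folder names are the loose translations used in this text and the function names are hypothetical. File extensions are only a rough proxy for file formats; identifying format versions would require a dedicated format-identification tool.

```python
import hashlib
from collections import Counter
from pathlib import Path

# Loosely translated top-level folder names from the text (assumed, not the real names).
TOP_LEVEL = ["Documentation", "Forms", "Photoarchive", "Plans",
             "Reports", "Miscellaneous", "Temporal"]


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_root):
    """List every file under archive_root with its path, extension, size and checksum."""
    root = Path(archive_root)
    entries = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        entries.append({
            "path": rel.as_posix(),
            # Files lying directly in the root have no top-level folder.
            "top_level": rel.parts[0] if len(rel.parts) > 1 else "",
            "extension": path.suffix.lower(),
            "size_bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return entries


def summarise_formats(entries):
    """Count files per (top-level folder, extension) pair."""
    return Counter((e["top_level"], e["extension"]) for e in entries)


def missing_folders(archive_root):
    """Return the expected top-level folders that are absent from the archive."""
    root = Path(archive_root)
    return [name for name in TOP_LEVEL if not (root / name).is_dir()]
```

A manifest of this kind, stored alongside the archive, would give a receiving institution a first inventory of the file types it needs to be able to open, and a way to verify that files have not changed since deposit.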
Museums are currently not equipped to curate digital data. The major weakness for archiving archaeological digital datasets at a national level, however, lies at the receiving end: the museums receiving 'the archive of the archaeological site' are often (1) ill-equipped to archive any digital data, (2) lacking the appropriate software, hardware and know-how to even access the data, let alone (3) to transcribe the data into archival formats. Clearly, this is an oversimplification aimed at exposing the underlying problem: the complete absence of systemic archiving of non-reproducible digital datasets in Slovenian archaeology.
Recently, this systemic failure has been recognised by the policy-making institution, namely, the Ministry of Culture. Although this is at the development stage, there is a desire to build a digital repository on top of the modernised 'Sites and Monuments' registry (Register nepremične dediščine). The repository will, among other things, archive the final reports including appendices. Since the latter incorporate single-context databases and some CAD/GIS data, this is a promising step in the right direction. However, fully-fledged digital archives with born-digital non-reproducible datasets do not yet seem to be on the agenda.

Reproducible Datasets
As mentioned, the term reproducible datasets is used here to describe the archaeological data derived from repeatable field practices (e.g., geophysical surveys), analysis of non-reproducible datasets (e.g. analysis of pottery and other artefacts or animal bones and other ecofacts) or multipurpose data (e.g. airborne LiDAR and satellite data).
Some archaeological field practice is only partially repeatable. For example, a field survey with a total collection of finds: the same artefacts cannot be collected again, but the distribution maps can in most cases be replicated, at least to some extent. There are also some research practices that are conditionally repeatable, such as radiometric dating or any other invasive laboratory sample analysis. The process can be repeated if backup samples have been archived. For the purposes of this article, such data types are considered reproducible datasets.

Filtered and/or recombined datasets
The first type of dataset is derived by analysing, filtering and/or recombining non-reproducible datasets with a specific research-orientated goal. In some instances, these are similar to registries but are considered reproducible datasets, as they were created for specific use in archaeological research. All such datasets in Slovenia are curated by ZRC SAZU, the Institute of Archaeology.
Arguably, the most important dataset in the country is the Archaeological Cadastre of Slovenia (ARheološki KAtaster Slovenije; ARKAS). ARKAS is an up-to-date Slovenian sites and monuments database, comprising four subject areas: the first defines the archaeological sites according to place, content and length of time protected, the second describes the level of research activity and protection, the third includes the sources of information, and the fourth comprises the selected documentation kept by the Institute of Archaeology. The back end was designed as a relational database in 1993, and its structure remains unchanged. From 2004, it includes an online GIS and database front end.
The second most extensive dataset is Zbiva (zbiva.zrc-sazu.si), a research database for the archaeology of the eastern Alps and its surrounding regions in the Early Middle Ages. Its inception in the early 1980s was deeply rooted in the scientific research context of the time. The trilingual (Slovenian, English, German) database is assembled from three parts: site database, grave database and artefact database. It is closely connected with LIBERA, a bibliographic database for Early Medieval Archaeology. The back ends of Zbiva and LIBERA are relational databases designed in the mid-1980s, and since 2001 both also have an online front end. In 2016, the front end was migrated to the 'Zbiva web application' based on an open source Arches 3.0 platform. This GIS-enabled web application is focused on catering for the needs of highly invested researchers (Štular 2019).
There is also a dataset stemming from a project from the 1990s, which holds data on 6th- and 7th-century grave goods, Merowingerzeitliche Grabfunde Mitteleuropas.
The databases described above have a venerable online open access presence and therefore ZRC SAZU has noteworthy experience with sharing reproducible datasets. The subject matter and trilingual design of Zbiva and LIBERA address the international public, i.e. archaeologists interested in the medieval archaeology of Slovenia, Croatia, (northern) Italy, Austria, Czech Republic, Slovakia and (southern) Germany. The online access of the LIBERA database was tracked from 2000 to 2007 (after that, the tracking option was no longer available). Initially, for several years, there was very limited access from outside ZRC SAZU, apart from the reactions to individual endorsements and mentions, i.e., mailing lists, lectures and hosting interested researchers. Later on, a small but diligent user community of about twenty regular users developed, based on the persistent personal endorsement of LIBERA's author, Prof. A. Pleterski.
In comparison, the Merowingerzeitliche Grabfunde Mitteleuropas had no such endorsement. The result is virtually no access to the database. It seems that archaeologists prefer to spend days and weeks in the library rather than half an hour on a computer.
For this reason, a different approach was taken recently for Župna cerkev in Kranj, the largest Early Medieval cemetery in the eastern Alps and its surrounding regions. Non-reproducible datasets, comprising an archive of archaeological excavation, are published in the form of six digital-only monograph books (Štular and Belak 2012a; 2012b; 2013; Belak 2013; Sagadin 2014; Belak 2014). The publications were designed as commented facsimiles, including transcriptions, as modern PDF client software enables an advanced user to access each of these publications as a simple database, in particular as a database of graves. Graves can be searched according to grave number; individual artefacts can be located throughout the cemetery, etc.
There are no concrete usage statistics available, but anecdotally the reception has been better than that of the Merowingerzeitliche Grabfunde Mitteleuropas and it seems this audience still prefers information to be delivered in book format.
A somewhat different experience is offered by the ARKAS database that is focused on a Slovenian-only audience. In the initial years of its existence, ARKAS shared the destiny of LIBERA, i.e., it was struggling to generate any usage outside ZRC SAZU. In spite of its dated online presence, in recent years its use has surged. This can be attributed to a very specific reason: with changes in legislation, the content of ARKAS gained commercial value. Namely, before any archaeological excavation occurs, a site's biography must be created. Thus, just as archaeological excavation became a subject of commercial archaeology, so did creating the site's biography. However, the ARKAS site clearly states in its copyright licence that it is to be used for non-commercial purposes only. This non-approved use continues, but, mostly as a result of personal endorsement, at least certain public institutions have begun to cite their use of ARKAS.
ZRC SAZU's long-term experience drawn from these examples is that great caution must be taken to distinguish between the stated interest in and the actual usage of reproducible, let alone non-reproducible, datasets disseminated as open access online datasets. The former is at least an order of magnitude higher than the latter. This might be brushed off as just a domain and/or location specific (Early Medieval archaeology in Central Europe) experience. In addition, it might be argued that the services described were simply ahead of their time, the notion of the internet as a serious research tool having only just emerged in the last 5-10 years. However, it is a cautionary tale nonetheless, especially in view of the fact that it is grey literature rather than datasets that is currently named as the most accessed type of data at big repositories such as Archaeology Data Service (UK) and Data Archiving and Networked Services, KNAW (Holland).
An additional cautionary tale is that fair use cannot be expected unless enforced.

'True' reproducible datasets
In this section, datasets that are derivatives of non-reproducible datasets are described. One such dataset was made available by ZRC SAZU in 1998, giving online open access. This is a database of the graves and artefacts of Altenerding, a Bavarian Early Middle Age cemetery, one of the biggest and most important sites of its type, which has been investigated and reinvestigated for a century and a half. Although the database is in the Slovenian language only, the content is based on a controlled vocabulary and can now be easily translated using online translation methods. However, since 1998, there have been just two downloads of the dataset and no documented use.
ZRC SAZU, the Institute of Archaeology, holds analogue archives with parts of the Digital images of inscriptions from Slovenia archive incorporated into the EAGLE Portal. All archives are published online and are publicly available.
ZRC SAZU also attempted to set up an archive of reproducible digital datasets, the e-archive of the Institute of Archaeology at the ZRC SAZU. The system comprises a database (MS Access) that records the files uploaded to the archive and the archive itself. The latter is based on a rigid system of folders and subfolders. However, the system never took off beyond the initial upload of three projects.
More recently, CPA took advantage of being established anew and has set up a similar archive that is fully operational. Since CPA is equally involved in producing non-reproducible (mainly field-based archaeological assessments, field surveys and excavations) and reproducible data (desk-based archaeological assessments), the system is a hybrid between the two. However, the archive is not publicly available nor is it planned to become so in the near future.
Based on the above, the reasons why reproducible archaeological datasets in Slovenia are not shared can most likely be attributed to the three main challenges identified by the recent ARIADNE survey (Selhofer and Geser 2015):
• the perceived lack of professional recognition and reward for sharing the data;
• the work effort required to prepare data for deposit in a repository;
• a lack of suitable available repositories.
Pleterski, the pioneer behind most of the early attempts in dataset sharing in Slovenia, such as Zbiva, LIBERA, and Altenerding, has recently conducted several interviews with archaeologists in Central Europe who have been sharing their reproducible datasets for years, or even decades (Pleterski, pers. comm.). Based on this survey, another reason for the low use of datasets shared online is researchers' lack of know-how in using such datasets, which remains an important obstacle. The experience is that every single user must be trained individually.

Registries
There is one online registry of archaeological digital datasets, a Registry of unmovable cultural heritage in Slovenia (Register nepremične kulturne dediščine Republike Slovenije; RNKD). The development of the registry began in 1991 and in 1996 the first beta version was tested. In 1997, the system got a web-GIS front end, one of the first in Europe. Two major upgrades in 2002 and 2009 were mostly content-based in response to the changing legislation (Kastelic 2015, 2; Ministrstvo za kulturo 2020). The major limitation of the registry at that point was the fact that the web-GIS, database viewer and the data pertaining to cultural heritage management were spread between three different web addresses. The registry is in the process of rebuilding the entire back and front end, and a new web-GIS tool has been available since 2019 (https://gisportal.gov.si/portal/apps/webappviewer/index.html).
There are two local registries that also need to be mentioned. The first is the registry of non-reproducible archaeological research (i.e. excavation, fieldwork, etc.), including final reports. It is created and maintained by the above-mentioned CPA. It is an Access database connected to an ESRI ArcGIS spatial database, presumably supported by an archive of final reports in PDF format. The desire to make this database publicly available has been expressed, but lack of funds was given as the major obstacle. As it is very likely that this registry will be deployed within the planned repository, one is hopeful that it will indeed become publicly available.
The second registry is actually a set of local registries used by Slovenian museums to register artefacts and other museum objects. The history of this began in 1990 when the Ministry of Culture established (the predecessor of) the Service for Movable Legacy and Museums. However, the goal of having an interconnected database of all Slovenian museums was never achieved.
In its stead is a database that was designed to be used as a local database by individual museums; it was never intended either to have a public interface or to enable cross-searching between the museums. The database has been developed and is maintained by a commercial company. The museums have been encouraged by the Ministry of Culture to adopt this system, for which they pay a yearly fee. A serious drawback arose when some of the smaller museums wanted to opt out but, reportedly, could not obtain (export) their own data. In essence this means that, using public funding, public property is being transferred to a private company and is no longer available to the public.

Conclusion
This article has described the current practices of archaeological digital data archiving and usage in Slovenia, including its historic context. The latter presents us with two conclusions. Firstly, the lack of literature describing the actual practices continues to hinder methodological development. Secondly, archaeological practice, considered as a longue durée process, reveals an extreme dependency on factors external to archaeology, predominantly technical development and legislation.
An overview of current practice has led to several cautionary conclusions that are worth repeating. First, there is a large difference between the declared interest and the actual use of archaeological datasets disseminated as open access online datasets; the former is at least an order of magnitude higher than the latter. Second, fair use (e.g. citing and non-commercial use) cannot be expected if it is not enforced. Third, there is a complete absence of systemic archiving of non-reproducible digital datasets in Slovenian archaeology.
The first two conclusions have already had a negative impact on archaeological practice. Namely, there is no incentive to invest in dissemination platforms within the research community. Investment is only made in internal research tools, some of which happen to be suitable for dissemination, for example, Zbiva 3.0 (Štular 2019).
It is the third conclusion that is downright disastrous. We are already a decade beyond the predicted shelf-life of non-curated born-digital data from the early 2000s. The damage has already occurred, but we are not even able to monitor and quantify it. If a systemic solution is not found and the transition of the data to a fully fledged digital archive does not begin within a few years, the damage will be catastrophic and irreversible. As already mentioned, the late 1990s and 2000s saw the biggest excavation projects in the country's history, and the digital data recorded there are in grave danger.
As has been repeatedly stressed by the representatives of the Ministry of Culture, they are, according to the current law, the sole institution that can and must organise a systemic solution. The only viable solution that can divert us from the current course towards a digital dark age (Wright 2020) is to build a fully fledged digital archive for (at the very least) born-digital non-reproducible datasets. In practice, this can only happen on top of the modernised RNKD registry.
Insufficient standardisation of the digital data archive. The top-level standardisation of the digital data archive is provided in the Appendices of the 'Rules' within the description of what is termed 'archive of the archaeological site'. The key element of the 'archive…' is the so-called 'Final report' including attachments that consist of all born-digital documentation other than photographs and full CAD/GIS data. The types of data in the attachments are described on the top level (e.g. work-plan, field diary); in parts the instructions are specific to the level of file types (e.g. .mdb, .xls or .xlsx for databases).