
1 Introduction

Cuneiform tablets are clay objects bearing textual information; they are generally ancient, dating very roughly from the late fourth millennium BCE to the second century CE. As such, they represent a special type of pottery in which the most valuable information is rendered on the surface in a rather subtle manner and is easily damaged, not only because of the age of the artifacts but also through improper handling. The layer that carries the most information on the past is also the most vulnerable one.

The collection consists of ca. 400 cuneiform tablets from the Old Assyrian period (level II) excavated in Kültepe (ancient Kanesh, Turkey) by Bedřich Hrozný. It is almost entirely homogeneous in the sense that it comes from one location and a rather narrow chronological layer (ca. 20th–19th century BCE). In terms of content, the majority of the tablets represent correspondence among members of ancient Assyrian society connected with the Old Assyrian trading network [1, 2]. Personal memoranda (contracts) among members of the society represent another important genre; in this case the tablets may exhibit additional features, namely an envelope and seals (Footnote 1). The textual edition of the collection is accessible in [4] (Footnote 2); 2D photographs are available at https://cdli.ucla.edu/collections/prague/prague_en.html. The collection is part of the Inscriptions cunéiformes du Kultépé (ICK) series; altogether, the number of Old Assyrian tablets reaches ca. 23,000.

The primary aim of the virtualization is to preserve the physical collection in digital form so that direct contact with the tablets, which might cause wear, is reduced to an absolute minimum. The second aim is to offer a powerful tool for both the administration and the further analysis of the collection as a whole and of its internal and external relations, and to enhance research by making the collection available to the public and introducing elements of the digital humanities approach. This tool should help both in preserving the collection and in positioning it within a broader context. The task can thus be understood as an effort to bring together and interconnect as much relevant information as possible on the collection and its position in a wider frame.

In comparison with other initiatives in the digitization of cuneiform tablets (Footnote 3), our focus is not only on the creation of digital models or of a textual corpus, but also on offering data useful for the maintenance of the collection, for the study of its position in Old Assyrian society, and for the creation of links to other similar collections. The binding of the database to the material artifacts is also a pivotal difference from other types of databases used in the humanities, such as D-Place [8] or the Seshat databank [9]; another difference lies in the focus on the concentration of various types of information in the tablets and on making this data as explicit as possible. In this sense, the project aims at both museological and research use of the resulting database, and its basic aim is to offer a complex rendering of the tablets with new technologies in order to help preserve the collection and open it to (further) research. Another important aim is to provide metainformation on the tablets, on their position in Old Assyrian society, and on the position of that society in today’s study of the Ancient Near East. The resulting system is intended to enable the administration of the collection, enhance its preservation, and open access to further exploitation of the collection in a variety of ways by non-invasive methods. A basic overview of the architecture of the database is available in [10].

2 Domains of Description

The basic unit of analysis is the tablet itself, as the constituting element of the collection and as a material object with many possible information layers connected to it. The data consists of several layers of description, in which information from various types of research is brought together. As the information on the past is necessarily fragmentary (the objects are ca. 4000 years old) and the tablets are very often damaged (Footnote 4), the combination of various types of data is intended to overcome at least some of the problems of incompleteness. This also creates the need for a comment field on each data item in the database, and for a strict distinction between acquired data and interpretations.

The database represents an interplay of several domains of the analysis of the collection. Although the division into domains is somewhat artificial from the point of view of the final shape of the database (in which there should be no significant obstacles for combinations of various types of information from various domains), in the analytical stage, it reflects roles assigned to the teams within the project. The basic domain includes data that identify the object together with the description of its physical properties. The second domain offers the digital models of the object and can be seen as the principal level enabling the digitization of the collection, providing graphical information on the object. The next domain comprises the description of the layer provided by textual/visual information; in case of the cuneiform tablets, this means the level of the text, sometimes extended by seal impression(s) contained on the tablet’s surface. This layer can be compared to e.g. collections of coins, inscriptions or manuscripts, but it can also cover various types of decorations on the objects, etc. The last domain brings the information on the position of the object within the society that created the artifact, its purpose, function, etc.

2.1 Domain of Basic Information and Physical Properties

The first domain in our list deals with the information typically required in museological practice. Its role is twofold. The first task is the unique identification of the object within the sample, and possibly also within a greater set of similar collections. The second task is the recording of the basic properties of the object, such as weight, dimensions, colour and volume. Another part of this domain involves methods of non-destructive analysis, such as optical microscopy or X-ray fluorescence; both methods have further uses in the analysis of the collection.

As opposed to traditional types of such analyses of other collections of cuneiform tablets, we follow rather strict procedures that are supposed to ensure that the resulting data is reliable and transferable to other databases. The selected measurement procedures are supposed to meet high standards of precision within reasonable effort. Such methods are briefly described below.

Colour characterization. The usual subjective judgement based on some colour scheme was replaced by a more precise colour definition obtained by a spectrophotometric method. We use a mobile spectrophotometer (X-Rite RM200QC) that measures the object under eight different light sources and one UV LED source with nine wave-bands, which gives higher precision than devices working only with RGB. In order to obtain a sufficient amount of data, 6–8 or 8–12 measurements are taken, depending on the tablet size. The RAL K7 Classic colour guide has been used as a base, and the results of the measurement are expressed by the values of ∆L, ∆a, ∆b (deviation from the colour standard); the average values are visualized according to the CIE 1976 standard. This data is then stored in the database, both in the form of a chromaticity diagram and as the base values (L, a, b). Although the collection is supposed to be rather homogeneous (correspondence and contracts within limited time and space), the variation of colour within the collection is rather high. The measurements completed so far identify seven colour groups, consisting of various shades of grey, beige and yellow.
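The step from individual readings to a single stored deviation per tablet can be illustrated with a short sketch. It assumes, hypothetically, that each reading is a (∆L, ∆a, ∆b) triple relative to the chosen reference colour; the function names and the sample values are illustrative and do not describe the project's own software. The overall deviation is computed with the standard CIE 1976 colour-difference formula, i.e. the Euclidean length of the (∆L, ∆a, ∆b) vector.

```python
from math import sqrt
from statistics import mean


def average_deviation(readings):
    """Average a list of (dL, da, db) readings component-wise."""
    d_l, d_a, d_b = zip(*readings)
    return (mean(d_l), mean(d_a), mean(d_b))


def delta_e_cie76(d_l, d_a, d_b):
    """CIE 1976 colour difference: Euclidean distance in L*a*b* space."""
    return sqrt(d_l ** 2 + d_a ** 2 + d_b ** 2)


# Hypothetical readings for one tablet (not measured data).
readings = [(2.0, 1.0, -2.0), (4.0, 3.0, -2.0), (3.0, 2.0, -2.0)]
avg = average_deviation(readings)
print(avg, delta_e_cie76(*avg))
```

Storing the averaged triple together with the individual readings keeps the acquired data separate from any later grouping into colour classes.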

Composition of the tablets. The application of XRF methods to reveal the elemental composition of the objects yields interesting results even on a rather small sample (ca. one quarter) of the collection. We use a micro X-ray fluorescence spectrometer Artax 400 (by Bruker), a mobile µXRF spectrometer for in situ measurements that places no limitation on the size of the object. The analyzed spot can be targeted by means of a CCD camera and a laser beam. The results are evaluated by means of several methods: mathematical/statistical analysis (multivariate analysis, cluster analysis, discriminant analysis [11, 12]) and correspondence analysis (analysis of two variables arranged in a contingency table, resulting in a correspondence map representing the relations of the variables [13]). Several proprietary programs have been produced by team members for processing the results [14]. The comparison of data for individual tablets shows that the collection consists of several groups characterized by common properties of clay composition (Fig. 1).

Fig. 1

Scheme of the colour correspondence of tablets (labelled with T) and their envelopes (E). In the sample of 16 tablets, the tablet and its envelope differ in colour in 9 cases, although a strong correlation would be expected here.

These methods are intended to reveal the elemental composition of the clay and constitute an independent branch of investigation in their own right. The results can be used in many ways, from purely museological purposes, such as storage requirements, to various types of analysis (e.g., links among clay types, locality, scribe, actors).
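How tablets might be grouped by clay composition can be sketched in a few lines. The example below is a deliberately simple stand-in for the multivariate and cluster analyses cited above, not a reproduction of them: it standardizes each element column and then applies single-linkage grouping at a fixed cut distance. The element values, the choice of three elements and the cut distance are all hypothetical.

```python
from math import dist
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so that no single element dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]


def single_linkage_groups(points, cut):
    """Group items connected by a chain of pairwise distances <= cut."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= cut:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical relative intensities of three elements for four tablets.
samples = [
    (10.0, 5.0, 0.10),
    (10.5, 5.2, 0.11),
    (20.0, 2.0, 0.30),
    (19.5, 2.1, 0.29),
]
groups = single_linkage_groups(standardize(samples), cut=1.0)
print(groups)
```

With these invented values the first two and the last two tablets fall into separate groups; on real measurements the cut distance would have to be chosen and justified by the analyst.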

2.2 Domain of Digital Models

The production of digital models is nowadays very widespread. The models produced within this project include several types of the rendering of graphical information. It can be said that without this type of information, the current way of digitization of collections would not exist, and it has additional importance in case of the cuneiform tablets, where proper visualization is crucial.

The photographs of the tablets (available also within the CDLI project, https://cdli.ucla.edu/collections/prague/prague_en.html) offer the basic, traditional type of graphical information. Another type of graphical information is the autograph, i.e. a drawing of the tablet surface produced by a specialist (in our case, the autographs come from [4: Tafel I–CLX]). While the digital photos of the tablets can be completely superseded by the higher types of models (2.5D, 3D), the autographs retain their importance because they also carry an interpretational level: the opinion of an expert who had the tablets physically available.

Within the project, new models are produced, all of which carry spatial information. These models can be divided into two types. The first type comprises models based on laser triangulation, photometric stereo (PS) and Structure from Motion (SfM), i.e. models sometimes described as 2.5D. True 3D models (i.e., including the image of the inside of the object) are obtained by means of fine-grained computed tomography (CT) methods with extreme resolution (Footnote 5); all of these models (2.5D and 3D) can be combined. The PS and SfM models give a smoother and more detailed picture of the surface together with colour information, while the CT models offer information on the inner structure of the object (tablet). This scanner can be used in the digitization of other similar collections (Fig. 2).

Fig. 2

Digital model of the surface topography of a cuneiform tablet, separated from its colour and rendered under simulated illumination that enhances the shapes of the signs (left part of the figure). The 3D model was acquired by the photometric stereo technique. The same area of the same tablet, captured by focus stacking of digital photographs, is shown for comparison on the right. Such a visualization can substitute for direct contact with the tablet.

It is worth mentioning that the CT-based models do not function as mere digital objects used in the representation of the collection but are very useful for both collection maintenance and research. True 3D modelling reveals the inner structure of the tablets and enables a better choice of measures for the correct conservation of the artifacts within the collection. The research purposes include especially palaeographic use, where the interaction of the shape of the tablet’s surface (at any magnification) with variable simulated lighting of the object offers new possibilities in the analysis of cuneiform writing; e.g., the sequence of the individual wedges within a sign can be traced.
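The idea of variable simulated lighting can be illustrated on a height map, i.e. a grid of surface elevations such as a 2.5D model provides. The following is a minimal Lambertian sketch under stated assumptions: heights are given in the same unit as the pixel spacing, the light is a single distant source, and the function names and the sample profile are hypothetical. It is not the renderer used in the project.

```python
from math import cos, radians, sin, sqrt


def light_vector(azimuth_deg, elevation_deg):
    """Unit vector pointing from the surface towards the light."""
    az, el = radians(azimuth_deg), radians(elevation_deg)
    return (cos(el) * cos(az), cos(el) * sin(az), sin(el))


def shade(height, light):
    """Lambertian shading of a height map (list of rows) under one light."""
    rows, cols = len(height), len(height[0])
    image = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            # Finite differences, one-sided at the borders.
            x0, x1 = max(x - 1, 0), min(x + 1, cols - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, rows - 1)
            dzdx = (height[y][x1] - height[y][x0]) / max(x1 - x0, 1)
            dzdy = (height[y1][x] - height[y0][x]) / max(y1 - y0, 1)
            # Surface normal of z = h(x, y) is proportional to (-dz/dx, -dz/dy, 1).
            norm = sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            normal = (-dzdx / norm, -dzdy / norm, 1.0 / norm)
            dot = sum(n * l for n, l in zip(normal, light))
            out_row.append(max(0.0, dot))  # faces turned away stay dark
        image.append(out_row)
    return image


# Hypothetical cross-section of a wedge impression, repeated over three rows.
wedge = [[0.0, -1.0, -2.0, -1.0, 0.0]] * 3
raking = shade(wedge, light_vector(azimuth_deg=0, elevation_deg=45))
print([round(v, 2) for v in raking[1]])
```

Re-running `shade` with a different azimuth moves the bright flank of the impression to the other side, which is the effect exploited when the direction of the simulated light is varied to bring out individual wedges.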

The possibility of the re-arrangement of the graphical information in a variety of ways adds significantly to the use of this information in the research, as the combination of various types of representation can significantly help with various types of tasks.

2.3 Domain of Linguistic and Philological Description

The textual information available on the tablets is dealt with basically within the tradition of cuneiform studies and corpus linguistics. The importance of philology in case of ancient languages is crucial, as the data is available only directly from the texts and no feedback from native speakers is possible. This requires space for various types of notes and alternative analyses, especially in damaged contexts. The aim of the analysis is to meet the demands of the discipline, but also to provide the texts in various formats used by today’s linguists and philologists.

The intention is to provide an accessible representation of the text in all its details. The representation must be readable by both humans and machines and must allow for a detailed (linguistic and philological) analysis of the text and of the graphemic system used, for the reconstruction of damaged parts (both on the level of a single grapheme and of the whole text), and for an analysis of the content of the texts (Fig. 3).

Fig. 3

Example of the sign-oriented architecture of the linguistic description. The sign is the basic element at which the description starts and at which the graphical information is most usable, together with the degree of damage and the link to the palaeographic database. Joining signs into words creates a new level of description. For example, in the word for ‘silver’, the values of the individual signs are KÙ and BABBAR; brought together, they form a word with a different “phonetic” value (kaspu). Standard linguistic description starts only at this level.

The focal point of this domain is a text written on a tablet. The text is encoded in the Unicode standard (code points U+12000–U+123FF for cuneiform, U+12400–U+1247F for cuneiform numbers and punctuation) and in transliteration into Latin script according to the ORACC project [http://oracc.museum.upenn.edu/]. Although the two renderings of the text (cuneiform and transliteration) should be equivalent, the sign-oriented approach is preferred for the purposes of the analysis. In this sense, parallels with Chinese or Japanese corpus linguistics can be drawn [17,18,19]. As the texts are mostly unique (correspondence and contracts that were issued in one or two copies), any damage to the text is difficult to recover from, in addition to the fact that the texts are written in an ancient and fragmentarily attested language. This means that substantial portions of the texts can be damaged (partially or completely missing), and some parts of the existing texts can be understood differently by today’s scholars. As a result, the transfer from the cuneiform script to current systems must offer the possibility of alternative interpretations.
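A sign-oriented record that allows for damage and alternative readings might look as follows. This is a sketch under assumptions: the class and field names are hypothetical and do not reflect the project's actual schema, and the damage flag and the alternative reading in the example are invented for illustration. The two code-point ranges are those of the Unicode Cuneiform and Cuneiform Numbers and Punctuation blocks; the two characters used for ‘silver’ are, to our understanding of the Unicode charts, the signs KU3 (U+121AC) and UD (U+12313, read BABBAR in this word).

```python
from dataclasses import dataclass, field

CUNEIFORM = range(0x12000, 0x12400)            # U+12000..U+123FF
CUNEIFORM_NUM_PUNCT = range(0x12400, 0x12480)  # U+12400..U+1247F


def cuneiform_block(char):
    """Classify a single character by Unicode cuneiform block, or None."""
    cp = ord(char)
    if cp in CUNEIFORM:
        return "sign"
    if cp in CUNEIFORM_NUM_PUNCT:
        return "number_or_punctuation"
    return None


@dataclass
class Sign:
    glyph: str                   # one Unicode cuneiform character
    reading: str                 # preferred transliteration value
    damaged: bool = False
    alternatives: list = field(default_factory=list)  # competing readings
    comment: str = ""

    def __post_init__(self):
        if cuneiform_block(self.glyph) is None:
            raise ValueError(f"not a cuneiform code point: {self.glyph!r}")


@dataclass
class Word:
    signs: list
    normalized: str              # word-level form, e.g. "kaspu"
    logographic: bool = False

    def transliteration(self):
        # Logograms are joined with dots, syllabic spellings with hyphens.
        joiner = "." if self.logographic else "-"
        return joiner.join(s.reading for s in self.signs)


silver = Word(
    signs=[
        Sign("\U000121AC", "KÙ"),
        # Damage and alternative reading are invented for the example.
        Sign("\U00012313", "BABBAR", damaged=True, alternatives=["UD"]),
    ],
    normalized="kaspu",
    logographic=True,
)
print(silver.transliteration(), silver.normalized)
```

Keeping the alternatives and the comment on the sign itself means that competing interpretations survive every later level of analysis built on top of the sign sequence.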

The linguistic analysis includes both morphological information (parts of speech) and syntactic information. Syntax is treated according to the Universal Dependencies standard [20], which enables comparison with other languages. Translations of the original text are of great importance for ancient languages; they are organized by means of parallel corpora techniques.
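As an illustration of the target format, the sketch below serializes a sentence into CoNLL-U, the ten-column tab-separated format used by Universal Dependencies, with the translation carried in a `# text_en` comment line. The two-word clause and its annotation are invented for the example and are not taken from the collection; the `Token` class, the sentence identifier and the helper name are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: int
    form: str
    lemma: str
    upos: str
    head: int        # 0 marks the root of the sentence
    deprel: str
    feats: str = "_"


def to_conllu(sent_id, text, translation, tokens):
    """Serialize one sentence as a CoNLL-U block (10 columns per token)."""
    lines = [
        f"# sent_id = {sent_id}",
        f"# text = {text}",
        f"# text_en = {translation}",
    ]
    for t in tokens:
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        cols = [str(t.id), t.form, t.lemma, t.upos, "_",
                t.feats, str(t.head), t.deprel, "_", "_"]
        lines.append("\t".join(cols))
    return "\n".join(lines) + "\n\n"  # a blank line ends the sentence


# Invented example clause; the analysis is illustrative only.
tokens = [
    Token(1, "kaspam", "kaspu", "NOUN", head=2, deprel="obj", feats="Case=Acc"),
    Token(2, "alqe", "leqû", "VERB", head=0, deprel="root",
          feats="Number=Sing|Person=1"),
]
block = to_conllu("demo-1", "kaspam alqe", "I took the silver", tokens)
print(block, end="")
```

Because the translation travels with the sentence as a comment line, the same file can serve both dependency tools and a simple sentence-aligned parallel corpus.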

The analysis of the palaeography will result in a palaeographic database. The variability of signs within the collection is rather high, even though the sample is otherwise homogeneous (archives of several tradesmen). Every sign in the collection, including its individual variants, is described, with a concordance of its occurrences. The numbering of signs follows the standard sign list in [21]; the database also records the position of the signs on individual tablets and their diagnostic features.
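A concordance of this kind is essentially an index from a sign (and its variant) to the places where it occurs. The following minimal sketch assumes, hypothetically, that each occurrence is recorded as a sign number, a variant label, a tablet identifier, a line number and a position within the line; all identifiers and values are invented.

```python
from collections import defaultdict


def build_concordance(occurrences):
    """Map (sign number, variant) to a sorted list of (tablet, line, position)."""
    index = defaultdict(list)
    for sign_no, variant, tablet, line, position in occurrences:
        index[(sign_no, variant)].append((tablet, line, position))
    return {key: sorted(places) for key, places in index.items()}


# Hypothetical occurrence records (not data from the collection).
occurrences = [
    (1, "a", "T002", 3, 1),
    (1, "a", "T001", 5, 2),
    (1, "b", "T001", 7, 4),
    (2, "a", "T001", 1, 1),
]
concordance = build_concordance(occurrences)
print(concordance[(1, "a")])
```

Keying the index on the variant as well as the sign number makes it straightforward to ask which tablets share a particular variant, a question that bears directly on the identification of hands.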

The philological part of this domain will cover various types of other information, mostly derived from the collection (corpus) itself. One of the most important tasks is the identification of the persons that took part in the relations reflected by the tablets (correspondence or contracts), as well as the definition of their roles. The flow of goods between geographical locations as well as their prices can be mapped. At the same time, it is clear that such tasks might demand external information as well.

2.4 Domain of Cultural Context

This domain gathers information from external sources. Currently, we have identified several areas where the connection of the database with external data sources is desirable.

From the point of view of the textual information, the connection with other sets of Old Assyrian documents presently studied is crucial. Such a connection could offer important comparisons regarding the content, but also the style of the letters and contracts (personal memoranda), which could further help to obtain a more structured view of life in the Assyrian trading centres. There are many textual editions of such sources; unfortunately, the corpus of Old Assyrian texts that used to join them (The Old Assyrian Text Project [http://oatp.ku.dk/]) is currently not accessible, and one of our aims is to help revive this activity.

A similar type of connection is the palaeographic database (Footnote 6). A detailed analysis could shed more light on the use of the script in the colonies, but alternatively also bring important information on the possible link between the Old Assyrian texts and the later cuneiform tradition of the Hittites. Preliminary investigation seems to point to an extensive variability of individual signs reflecting possible individual preferences and/or traditions of the respective tradesmen writing their messages on their own.

There is also historical information available. The concept of the Assyrian trade network has been mentioned above [1, 2], which means that the integration of this database with other sources of information on this network is important. The network derived from the collection can help in comparing the flow of goods, price level and types of contacts.

Geographical information contained in the letters (and contracts) needs to be completed with a map of locations in Anatolia. Such a map can also contain additional data, such as demographic information, communications between the cities, etc.

This part of the database is the most open one. It can be expected that many new possibilities of connections will appear in the future, as the digitization of other collections progresses.

3 Integrated Multi-disciplinary Database

The resulting database is intended to bring together the types of data mentioned above. It is certainly necessary to distinguish data, metadata and analyses, but on the other hand, there should be no technical obstacles to joining the data from multiple domains. The database should allow multi-disciplinary queries that combine knowledge gathered in separate domains. It will also provide a user interface through which the research community can access the accumulated information. As much data as possible will be made available via an internet browser.
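What a multi-disciplinary query could look like is sketched below with an in-memory SQLite database. The schema is hypothetical (three tables standing for the physical, palaeographic and philological domains, joined on a tablet identifier) and the rows are invented; the actual architecture of the database is described in [10]. The query asks, for letters only, how many tablets each hand wrote and across how many clay groups they are spread.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE physical     (tablet_id TEXT PRIMARY KEY, clay_group TEXT, colour_group TEXT);
CREATE TABLE palaeography (tablet_id TEXT PRIMARY KEY, hand TEXT);
CREATE TABLE philology    (tablet_id TEXT PRIMARY KEY, genre TEXT, sender TEXT);
""")

# Invented rows; identifiers and group labels are placeholders.
con.executemany("INSERT INTO physical VALUES (?, ?, ?)", [
    ("T1", "A", "grey"), ("T2", "A", "grey"),
    ("T3", "B", "beige"), ("T4", "B", "yellow"),
])
con.executemany("INSERT INTO palaeography VALUES (?, ?)", [
    ("T1", "hand_1"), ("T2", "hand_1"), ("T3", "hand_1"), ("T4", "hand_2"),
])
con.executemany("INSERT INTO philology VALUES (?, ?, ?)", [
    ("T1", "letter", "sender_1"), ("T2", "letter", "sender_1"),
    ("T3", "letter", "sender_2"), ("T4", "contract", "sender_2"),
])

query = """
SELECT pa.hand,
       COUNT(*)                      AS tablets,
       COUNT(DISTINCT ph.clay_group) AS clay_groups
FROM palaeography AS pa
JOIN physical  AS ph ON ph.tablet_id = pa.tablet_id
JOIN philology AS tx ON tx.tablet_id = pa.tablet_id
WHERE tx.genre = 'letter'
GROUP BY pa.hand
ORDER BY pa.hand
"""
rows = con.execute(query).fetchall()
print(rows)
```

A hand that appears on several clay groups, or a clay group shared by several hands, is the kind of cross-domain observation that no single domain table could produce by itself.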

It is to be hoped that such an approach will lead to new discoveries about the collection and its position within society, and that it will provide a tool for more reliable solutions to complex tasks that have so far been addressed only within a single discipline. The added value can be seen in the examples offered below.

The identification of fakes: several tablets from the collection have been labeled as fakes by some scholars [4: XXIV], based on the analysis of the handwriting. However, other attributes of a given tablet may be in line with the attributes of the other tablets (e.g., clay composition, tablet shape, or stylistic characteristics). This might point in a different direction, showing that the (somewhat peripheral) community developed a complex specific style, formed from diverse (family?) traditions.

The identification of a scribe is traditionally based on palaeography (cf. the example above), but there are several other attributes that can be expected as characterizing. It can be surmised that a (professional) scribe will be defined not only by his handwriting style. Other attributes that can be taken into consideration in connection with this type of craft may include: the technique of tablet preparation (processing of clay, shape, measures in relation to the amount of text, etc.); the clay composition (well-tried source of clay); the linguistic style characteristics, but also a certain circle of persons that used the service of the scribe, and possibly other features.

Many other similar examples of possible analyses based on cross correlation of the data could be added. In our opinion, such types of analyses are in line with the current turn to the exploitation of digitized data. The connection with other types of databases created in the field (in its broad understanding) ensures the usability of the database in the future.

4 Conclusion and Future Work

The project dealing with the collection of cuneiform tablets shows that the integration of data coming from various fields can be useful in several ways, starting from the maintenance of the collection itself to various ways of publishing the content of the collection and opening it to scholarly research.

Already in a rather early stage, it can be seen that the data from various fields bring interesting correlations usable in the research. New hypotheses can be supported by a number of data that were not combined together before, and old hypotheses can be re-examined on the basis of cross-related data.

For future work, the primary aim is to fill in the whole database: currently, about one fourth of the collection is processed. However, the most important and possibly the most difficult task will be to ensure that the combined data from the various domains work smoothly together. This is inevitably connected with existing standards: in some cases we will strictly follow a certain standard, and in others the system will be able to export the data in the desired format. The choice is sometimes driven by the given task (e.g., generation of data for EUROPEANA [23], the specification of Universal Dependencies or parallel corpora); in other cases the project brings its own way of data acquisition (e.g., colour characterization), and the creation of interfaces becomes important. Another task in this direction is the merging of data acquired within the project with data from other, external sources, especially within the cultural context domain.

The digitization of the collection also opens the possibility of creating links to other such collections, especially other Old Assyrian ones. Such links among related collections could lead to a virtual space concentrating knowledge on a particular place or period in history and could bring a better understanding of the past. Apart from purely technical sustainability (which is in our case guaranteed by Charles University and the National Museum in Prague and by adherence to standards), this is another opportunity to keep the information derived from the collection in use within the scholarly discourse on Old Assyrian society.