1 Introduction

Archaeological science, i.e., the use of scientific techniques in archaeological investigations, comprises a wide range of methods for analysing remains, used to obtain indirect information from which archaeological conclusions are then deduced [1]. Apart from methods requiring direct human observation, like microscopy, where the equipment enhances the researcher’s ability to inspect the finds, many methods involve automatic analyses and complex instruments, which produce numeric results that the researcher must interpret to obtain significant archaeological meaning. Often these outcomes are also automatically post-processed and only the processed result is provided; the archaeological interpretation nevertheless remains the researcher’s responsibility.

There are many popular examples, some of them known also outside research circles. The well-known 14C dating method [2], for instance, measures the residual quantity of this isotope left in an organic sample, which is then used to date the sample. In this case, the measuring step consists of establishing the percentage of 14C in the carbon composition of the sample; further processing, through calibration and software computations, then yields the sample age.
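As a hedged illustration of the two steps just described, the sketch below converts a measured 14C fraction into a conventional (uncalibrated) radiocarbon age, using the standard convention age = -8033 · ln(F), where F is the measured 14C content relative to the modern standard and 8033 years is the mean life derived from the Libby half-life of 5568 years. The function name and the input range check are illustrative assumptions, and the subsequent calibration step is deliberately not shown.

```python
import math

# Conventional mean life in years, derived from the Libby half-life (5568 years).
LIBBY_MEAN_LIFE_YEARS = 8033


def conventional_radiocarbon_age(fraction_modern: float) -> float:
    """Return the uncalibrated age in radiocarbon years before present.

    `fraction_modern` is the measured 14C content of the sample relative
    to the modern standard. This sketch only accepts values in (0, 1].
    """
    if not 0 < fraction_modern <= 1:
        raise ValueError("fraction_modern must be in the interval (0, 1]")
    return -LIBBY_MEAN_LIFE_YEARS * math.log(fraction_modern)


if __name__ == "__main__":
    # A sample retaining half of the modern 14C content is about one
    # Libby half-life old.
    print(round(conventional_radiocarbon_age(0.5)))
```

The value returned is only the measuring-and-processing outcome; turning it into a calendar date requires calibration against a reference curve, which is outside this sketch.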

While direct observation methods produce a description by the researcher, analyses involving measuring and processing generate datasets. The processing may rely on tools that consolidate and summarise the numbers to facilitate the researcher’s interpretation, for example generating a diagram from spectra in XRF (X-Ray Fluorescence), a technique used to detect the chemical elements composing an artefact; or computing a Fourier Transform of data from different samples, as used in FTIR (Fourier Transform InfraRed) spectroscopy, which is applied in archaeology, e.g. to characterise ceramic artefacts.

In this paper, we propose an extension of the CIDOC CRM ontology [8] to be used for the documentation of such archaeological science activities and results. The documentation discussed here concerns only the part of the investigation process that produces the results to be interpreted. Archaeological interpretation is a substantially different step, which draws on the scientific results together with other evidence, all distilled by the researcher’s expertise to produce archaeologically significant conclusions.

2 Scientific experiments and sampling

As already noted, scientific experiments are of two kinds: one, which we will call observations, where the researcher observes (possibly with the help of ‘passive’ equipment) an archaeological find and notes some features; and a second one, which we will call analyses, where the find undergoes some processing by an ‘active’ machine, which directly produces some numeric results as a consequence of the processing. Such results are then used for archaeological interpretation. To indicate either of them, regardless of its specific nature, the term experiment will be used.

Usually observations operate on analogue information, while for analyses the raw data produced by the equipment as well as the data resulting from post-processing are completely digital. Generally, the equipment creates large datasets that could be stored for future use, or perhaps used to assess the results of the investigation, although in practice they are discarded once the archaeological interpretation is completed. In some cases, the same activity may be performed as an observation or as an analysis: to measure an object, one can use a tape (and it is an observation) or a laser meter (and it becomes an analysis).

In many cases, an analysis does not concern the whole of the object under examination. Only a small part of it is analysed: this is called a sample. According to the Oxford Dictionary, a sample is “a small part or quantity intended to show what the whole is like”. Thus a sample has three characteristics: it is a part of the whole; it is intentionally chosen; and it is representative of the property or attribute of the whole under examination. Samples are used in many fields, for example in statistics, where there is a specific methodology to define them in such a way that representativeness (and randomness) is guaranteed. In our case, as the examination concerns physical and chemical properties, the sample must be chosen in such a way that any result obtained from its analysis may be extended to a larger part or even to the whole. The sample choice is frequently the result of an observation of the whole. A bad choice of the sample may negatively affect the conclusions.

The technique used in the experiment may either not damage the sample (non-destructive technique) or imply the destruction of the sample (destructive technique); in the latter case, the technique may allow using only very small samples (non-invasive technique on micro-samples). Even with non-destructive techniques, depending on the technique used, the size of the whole object and of the equipment, and the way the latter operates, it may be necessary to remove a smaller sample from the object for analysis, and sometimes also to prepare it, for example by diluting it in a liquid or heating it to create a gas.

Sometimes the choice of the sample is suggested by the condition of the object: for example, a marble artefact may have broken, producing tiny pieces suitable as samples; if they are recognised as belonging to the whole, it is natural to choose them as samples for a destructive analysis to avoid further damage to the object. When the sample is not already made available by the condition of the object, whether it must be detached from the whole depends on the analytical technique and on the equipment. For destructive techniques, it matters little whether the sample is detached before the experiment or after, because in any case it will be destroyed. For non-destructive techniques, it is always preferable to leave the sample where it is and avoid damaging the object, even very slightly; unfortunately this is not always possible, for example when the object and the equipment are not movable, or the object does not fit in the experiment area of the equipment.

In conclusion, analyses that leave the object unmodified, because no sample is removed, must be described differently from analyses that require taking the sample away from the object, because they destroy it or for other reasons, as discussed above. If removing the sample from the object (as required in the latter case) is considered part of the sampling concept, then the former case differs: there the sample is actually a ‘virtual’ sample, because it is defined only for the duration of the experiment and always remains with the whole of which it is part.

Figure 1 illustrates such concepts.

Fig. 1 Sampling vs. virtual sampling
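The distinction drawn in this section can be summarised as a small decision rule. The following is a minimal sketch, not a prescribed procedure: the class name, field names and returned labels are illustrative assumptions, and real cases may involve further considerations such as sample preparation.

```python
from dataclasses import dataclass


@dataclass
class AnalysisContext:
    """Hypothetical description of the conditions of a planned analysis."""
    destructive: bool             # the technique destroys the analysed matter
    object_fits_equipment: bool   # the object fits the experiment area
    can_be_brought_together: bool  # the object or the equipment can be moved


def sampling_mode(ctx: AnalysisContext) -> str:
    """Return 'sample' when matter must be removed, else 'virtual sample'."""
    if ctx.destructive:
        # The analysed matter is destroyed, so it is a sample in the
        # usual sense, whether detached before or during the experiment.
        return "sample"
    if ctx.object_fits_equipment and ctx.can_be_brought_together:
        # Nothing needs to be removed: the analysed portion stays with
        # the whole and exists as a sample only during the experiment.
        return "virtual sample"
    # Non-destructive technique, but the object cannot be analysed whole.
    return "sample"


if __name__ == "__main__":
    portable_device = AnalysisContext(False, True, True)
    print(sampling_mode(portable_device))
```

A non-destructive technique applied with portable equipment yields a virtual sample, whereas the same technique applied with fixed equipment to an object that cannot be transported falls back to a removed sample.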

3 Documenting experiments

Documenting the outcome of an observation is not much different whether it is performed with the naked eye or with the help of equipment, e.g. a microscope: in the latter case, the equipment features should be recorded, but this may be considered as part of the environmental conditions of the experiment. If some pre-processing of the sample is executed before observation, this must also be recorded as part of the experiment preparation. The experiment steps to be documented include, besides the environmental conditions, the equipment description (if used), the equipment calibration and the experiment description, as well as any further data post-processing.

A good report of the complete experiment, from its preparation to its execution, is sufficient to guarantee the scientific outcome and to provide other researchers with the information necessary to enable future re-use of the results. This is outlined in the diagram in Fig. 2.

Fig. 2 The pipeline of a scientific observation experiment

In an analysis, instead, the use of complex equipment, which performs part of the researcher’s work, introduces a ‘black box’ in the process that needs appropriate documentation. In this case, the equipment settings may be considered as part of the environmental conditions. The presence of equipment in fact plays a significant role, which also affects data post-processing, as shown in Fig. 3.

Fig. 3 The pipeline of a scientific analysis

A special case concerns those experiments where the outcome of an analysis is observed by the operator before making assertions, e.g. in tomography: the first step consists of an analysis followed by computer post-processing. Then the visual result obtained is observed as a replica of the object showing also its interior. In such cases, one may consider the entire experiment as the combination of two separate experiments, an analysis and an observation, the results of the former being fed into the latter.

4 Documenting observations

The CIDOC CRM extension named CRMsci [3] is a framework for documenting all kinds of scientific activities. This extension provides a satisfactory way of documenting what we have called observation as well as the experiment preparation of ‘analyses’. CRMsci defines S4 Observation as the overarching class including all activities of gaining scientific knowledge from empirical evidence. This is paralleled by S5 Inference Making, which includes instead all activities based on formal logic deduction and splits into S6 Data Evaluation and S7 Simulation or Prediction. Thus, within CRMsci our ‘analyses’ would belong to S4 Observation.

Although this may be suitable for generic scientific documentation, it seems to be too general for documenting the value of data especially for their assessment in view of a potential re-use. Since machines play in this case part of the human researcher’s role, it seems necessary to provide information about the way such machines work, how their parameters are set, the procedure they follow, the methodology implied by their use and, above all, the way they were used in the specific analysis.

This refers to the overall experiment setup, including measurements as documented by S21 Measurement or any other kind of scientific observation, not directly producing numbers, such as those generating images, diagrams and so on.

5 Documenting environmental conditions and parameters

Environmental conditions having an impact on the analysis significantly depend on the technology used and the physical or chemical laws the experiment is based on. In general, a specific protocol must be followed, and compliance guarantees good results. Not following the protocol, on the contrary, may result in poor quality and limited reliability. An example of the workflow to be documented for 3D scanning is given in [4]. Similar examples may be detailed for the many different experiments used in archaeological sciences, as is done, for example, in [5] for 14C dating.

Equipment settings also have a strong impact on the experiment outcome. As the settings vary according to technology and instrument type, and then from instrument model to model, it is suggested to avoid a proliferation of properties related to individual parameter settings, and to record the settings as a literal using P3 has note. Use will suggest possible extensions, specific to widely used categories of equipment.
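One possible way of following this suggestion is to serialise the settings into a single literal. The sketch below assumes that the note is attached to the analysis event and that JSON is an acceptable serialisation; both are assumptions made for illustration, and the identifier and parameter names are invented.

```python
import json


def settings_note(analysis_id: str, settings: dict) -> tuple:
    """Return one (subject, property, literal) statement holding all
    equipment settings as a single JSON string.

    Keys are sorted so that the same settings always give the same literal.
    """
    literal = json.dumps(settings, sort_keys=True)
    return (analysis_id, "P3 has note", literal)


if __name__ == "__main__":
    # Hypothetical settings for a hypothetical analysis event.
    example = {"tube_voltage_kV": 40, "acquisition_time_s": 120}
    print(settings_note("analysis-001", example))
```

Keeping the settings in one literal avoids defining a property per parameter, while a structured serialisation still lets them be parsed later if a more specific extension is introduced.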

6 Definition of the CRMas extension

As already mentioned, CRMas builds on other CRM extensions extending the definition of their classes and properties. Such CRM extensions are the already introduced CRMsci, and CRMdig, briefly described below.

CRMdig [6] is “an ontology and RDF schema to encode metadata about the steps and methods of production of digitisation products and synthetic digital representations such as 2D, 3D or even animated models created by various technologies”. It was created as an extension of the CIDOC CRM mainly to document the creation and the reliability of digital replicas (2D or 3D) of cultural objects. We will show how most of CRMdig may be profitably used to document scientific analyses. Also concepts from CRMsci are required, as previously noted, with some newly defined classes and properties used to define the portion of the archaeological object actually examined and analysed as representative of the whole. Altogether, they form CRMas, a proposed CRM extension for the documentation of archaeological sciences activities and results. Henceforth ‘D’ and ‘L’ will denote, respectively, classes and properties of CRMdig, ‘S’ and ‘O’ classes and properties of CRMsci, and ‘AS’ and ‘HS’ newly defined classes and properties of the extension CRMas.

A scientific analysis is formed by one (or more) D7 Digital Machine Event, or more precisely by one (or more) D11 Digital Measurement Event. To parallel the class D2 Digitisation Process used in CRMdig, a similar class AS1 Digital Analysis Process is introduced. The scope note of the new class is similar to the one of D2, except specifying that it concerns the digital representation of physical or chemical characteristics or properties of an object rather than its appearance or form.

An analysis uses one (or more) D8 Digital Device, i.e. a machine, the equipment involved, which produces one (or more) D1 Digital Object, or perhaps directly a D9 Data Object. Post-processing of such raw data is a D10 Software Execution. Most CRMdig classes concerning data post-processing may in fact be used also in CRMas.

As previously noted, documenting the equipment used, the method and the procedure is particularly relevant for scientific analyses. This can be done directly using CRM classes and properties as follows. The use of a specific type of equipment in an AS1 Digital Analysis Process, a subclass of E7 Activity, may be documented with P16 used specific object (was used for), the specific object being the (each) instrument used for it. The method and procedure may be documented with P32 used general technique (was technique of) if the procedure is broadly defined, e.g. “C14 dating”, or with P33 used specific technique (was used by) if the procedure is formally described in a specific document, which is an E29 Design or Procedure, e.g. “C14 dating with calibration using INTCAL13”. One particular aspect deserving attention is documenting the part of the object chosen for the analysis, i.e. the sample. The main CRMsci class used to model samples is S13 Sample. However, the scope note of this class defines the sample as taken from some instance of S10 Material Substantial, i.e. the matter to be analysed. As already noted, sampling may on the contrary be just virtual, because a portion of the object is chosen to be analysed but is not removed from the whole, which is not affected in any way by the experiment. Since the class S13 designates sampling with removal, a new class called Virtual Sample must be introduced.
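The use of P16, P32 and P33 for equipment, method and procedure can be sketched as plain subject-property-object statements. This is only an illustrative encoding, assuming statements are held as Python tuples rather than in a triple store; the function name and all identifiers passed to it are hypothetical.

```python
def document_analysis(analysis, instruments, general_technique=None,
                      procedure=None):
    """Return statements describing how an analysis was carried out.

    `general_technique` stands for a broadly defined method (a type),
    `procedure` for a formally described E29 Design or Procedure.
    """
    statements = [(analysis, "rdf:type", "AS1 Digital Analysis Process")]
    for instrument in instruments:
        # One statement per instrument used in the analysis.
        statements.append((analysis, "P16 used specific object", instrument))
    if general_technique is not None:
        statements.append(
            (analysis, "P32 used general technique", general_technique))
    if procedure is not None:
        statements.append((analysis, "P33 used specific technique", procedure))
        statements.append((procedure, "rdf:type", "E29 Design or Procedure"))
    return statements


if __name__ == "__main__":
    for statement in document_analysis(
            "analysis-001", ["device-A"],
            general_technique="C14 dating",
            procedure="C14 dating with calibration using INTCAL13"):
        print(statement)
```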

In the next section, the definitions of the proposed new classes and properties are outlined. Due to space limitations, the scope notes are described summarily. As already mentioned, most of the new classes and properties are modelled after those of CRMsci and CRMdig, as indicated for each one.

7 CRMas class and property definition

7.1 Class definition

AS1 Digital Analysis Process This class comprises events that result in the creation of instances of D9 Data Object that digitally represent physical or chemical features or properties of an S10 Material Substantial, which is modelled as E18 Physical Thing or, in some special cases, fluids modelled as S14 Fluid Body. The subsequent processing steps on digital objects are regarded as instances of D3 Formal Derivation. AS1 is modelled like D2 Digitisation Process.

AS1 is a subclass of D11 Digital Measurement Event; S4 Observation; AS4 Measurement by Virtual Sampling; and S3 Measurement by Sampling.

Related properties: HS1 digitally analysed (was digitally analysed by): AS5 Virtual Sample; HS2 documented by digital analysis (was digitally documented by): E1 CRM Entity.

AS2 Matter Selection This class comprises the activities that result in part of an instance of S10 Material Substantial being selected without removal. AS2 is modelled like S1 Matter Removal.

AS2 is a subclass of E7 Activity and a superclass of AS3 Virtual Sample Selection.

AS3 Virtual Sample Selection This class comprises the activities that result in selecting an amount of matter as virtual sample for further analysis from a material substantial such as an archaeological object. AS3 is modelled like S2 Sample Taking.

AS3 is a subclass of AS2 Matter Selection and a superclass of AS4 Measurement by Virtual Sampling.

Related properties: HS3 virtually sampled from (was virtual sample by): S10 Material Substantial; HS4 virtually sampled at (was virtual sampling location of): E53 Place; HS5 defined (was defined by): AS5 Virtual Sample; HS6 virtually sampled from type of part (type of part was virtually sampled by): E55 Type.

AS4 Measurement by Virtual Sampling This class comprises activities of selecting a virtual sample and measuring or analysing it as one managerial unit of activity, in which the virtual sample may not be defined and preserved beyond the context of this activity. AS4 is modelled like S3 Measurement by Sampling.

AS4 is a subclass of AS3 Virtual Sample Selection and S21 Measurement, and a superclass of AS1 Digital Analysis Process.

AS5 Virtual Sample This class comprises instances of S11 Amount of Matter selected on some instance of S10 Material Substantial with the intention to be representative for some material qualities of the instance of S10 Material Substantial or the part of it that the virtual sample was selected from for further analysis. There may be various ways to define the Virtual Sample, for example stating its position on the surface of the Material Substantial using relative coordinates. AS5 is modelled like S13 Sample.

AS5 is a subclass of S11 Amount of Matter.
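The subclass statements given above for AS1 to AS5 can be collected into a small table and queried transitively. The sketch below records only the direct superclasses stated in this section and uses short identifiers for brevity; the function names are illustrative assumptions, and superclasses defined in the CRM, CRMsci or CRMdig themselves are not expanded.

```python
# Direct superclasses as stated in the class definitions of this section.
SUPERCLASSES = {
    "AS1": ["D11", "S4", "AS4", "S3"],
    "AS2": ["E7"],
    "AS3": ["AS2"],
    "AS4": ["AS3", "S21"],
    "AS5": ["S11"],
}


def ancestors(cls: str) -> set:
    """Return every direct or indirect superclass recorded in the table."""
    seen = set()
    stack = list(SUPERCLASSES.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUPERCLASSES.get(current, ()))
    return seen


def is_subclass(cls: str, target: str) -> bool:
    """True if `cls` equals `target` or inherits from it in the table."""
    return cls == target or target in ancestors(cls)


if __name__ == "__main__":
    print(sorted(ancestors("AS1")))
```

Following the chain AS1, AS4, AS3, AS2 shows that a digital analysis process is also a matter selection and, ultimately, an E7 Activity.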

7.2 Property definition

HS1 digitally analysed (was digitally analysed by) This property associates an instance of AS1 Digital Analysis Process with the instance of AS5 Virtual Sample it uses for the analysis. HS1 is modelled in a way similar to O18 observed value. HS1 has domain AS1 Digital Analysis Process and range AS5 Virtual Sample. It is a subproperty of P39 measured (was measured by).

HS2 documented by digital analysis (was digitally documented by) This property describes the CRM Entities documented by instances of AS1 Digital Analysis Process. HS2 is modelled like O8 observed. HS2 has domain AS1 Digital Analysis Process and range E1 CRM Entity. It is a subproperty of P140 assigned attribute to (was attributed by).

HS3 virtually sampled from (was virtual sample by) This property associates an instance of AS3 Virtual Sample Selection with the instance of S10 Material Substantial on which a virtual sample was defined. HS3 is modelled like O3 sampled from. HS3 has domain AS3 Virtual Sample Selection and range S10 Material Substantial.

HS4 virtually sampled at (was virtual sampling location of) This property associates an instance of AS3 Virtual Sample Selection with the instance of E53 Place where AS3 selected a virtual sample. HS4 is modelled like O4 sampled at. It has domain AS3 Virtual Sample Selection and range E53 Place.

HS5 defined (was defined by) This property associates an instance of AS3 Virtual Sample Selection with the instance of AS5 Virtual Sample defined during this activity. HS5 is modelled like O2 removed. It has domain AS3 Virtual Sample Selection and range AS5 Virtual Sample.

HS6 virtually sampled from type of part (type of part was virtually sampled by) This property associates the activity of a Virtual Sample Selection with the type of the part where a virtual sample was taken, e.g. the finger of a statue. HS6 is modelled like O20 sampled from type of part. HS6 has domain AS3 Virtual Sample Selection and range E55 Type.

HS7 has format (is format of) This property actually completes CRMdig, as it may be useful in documenting a large number of digital applications, including those (e.g. 3D scanning) for which CRMdig was created. It associates a data object, typically a file, with a description of its structure, either human- or machine-readable. In standardised cases, for example if the format is well known and well defined, the property may be replaced by the shortcut P2 has type. HS7 has domain D9 Data Object and range E73 Information Object.
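The domains and ranges listed above lend themselves to a simple consistency check on statements. The following sketch is an assumption about how such a check might be written, not part of the proposed extension: it uses short class identifiers and, by default, compares classes by exact match, leaving subclass reasoning to an optional function supplied by the caller.

```python
# Property identifier -> (label, domain, range), as defined in this section.
PROPERTIES = {
    "HS1": ("digitally analysed", "AS1", "AS5"),
    "HS2": ("documented by digital analysis", "AS1", "E1"),
    "HS3": ("virtually sampled from", "AS3", "S10"),
    "HS4": ("virtually sampled at", "AS3", "E53"),
    "HS5": ("defined", "AS3", "AS5"),
    "HS6": ("virtually sampled from type of part", "AS3", "E55"),
    "HS7": ("has format", "D9", "E73"),
}


def check_statement(prop, subject_class, object_class,
                    is_a=lambda cls, target: cls == target):
    """Return True if the subject and object classes fit the property.

    `is_a(cls, target)` decides whether `cls` counts as `target`; the
    default is exact equality, so inherited domains are not recognised
    unless a subclass-aware function is passed in.
    """
    if prop not in PROPERTIES:
        raise KeyError(f"unknown property: {prop}")
    _label, domain, range_ = PROPERTIES[prop]
    return is_a(subject_class, domain) and is_a(object_class, range_)


if __name__ == "__main__":
    print(check_statement("HS1", "AS1", "AS5"))
```

With the default exact matching, a statement using HS3 with an AS1 subject is rejected even though AS1 inherits from AS3, which is why the `is_a` hook is provided.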

Figures 4 and 5, respectively, illustrate the process to create a virtual sample, and its analysis. Figure 6 gives a general overview of CRMas illustrating its hierarchy and the connections with the CRM, CRMsci and CRMdig.

Fig. 4 Virtual sampling

Fig. 5 Analysing a virtual sample

Fig. 6 The CRMas model. Labels in black indicate classes from CRM (E), in grey classes from CRMsci (S) or CRMdig (D). Hierarchy, i.e. isA relationship, is indicated with a double-line arrow

8 Documenting the experiment purpose

Some have argued that the result of a scientific analysis is biased by the research question leading the research that the analysis belongs to. If the experiment documentation is complete, the negative impact of this factor may be reasonably assessed and taken into account. In any case, the purpose of the analysis may be documented separately in three ways: with a note (an E62 String) using P3 has note; using P21 had general purpose (was purpose of), which gives the type of the purpose, a simplified way of noting it that assumes a taxonomy of experiment purposes; or, more extensively, using P20 had specific purpose (was purpose of), which identifies the activity at which the analysis is aimed, for example dating the Turin Shroud, traditionally believed to belong to the Roman period. An alternative and still more articulate way of describing the research question is described in [4]. The purpose definition is an (intellectual) E7 Activity, which P17 motivated (the overall research, or some of its parts, and ultimately) the AS1 Digital Analysis Process. The purpose definition P70 is documented in the purpose description, an E31 Document. Such a document P94 was created by an E65 Creation activity. Researchers (i.e. E39 Actor) P14 performed the document creation or the purpose definition (or both) as specified by P14.1 in the role of. This way is more verbose but also provides richer information on such a delicate subject. This extended description also enables third parties, for example when re-using legacy data, to describe the purpose definition by deducing it from other information, acting in the role of reviewer or commenter.
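The extended description of the purpose can be written down as a chain of statements. The sketch below is a simplified illustration: it uses the forward names of the CIDOC CRM properties (P17 was motivated by, P70 documents, P94 has created, P14 carried out by), records only the author of the document creation, and flattens P14.1 in the role of, which qualifies the P14 link itself, into an ordinary statement. The function name and all identifiers are hypothetical.

```python
def document_purpose(analysis, purpose_definition, description,
                     creation, researcher, role=None):
    """Return statements linking an analysis to its documented purpose."""
    statements = [
        (purpose_definition, "rdf:type", "E7 Activity"),
        (analysis, "P17 was motivated by", purpose_definition),
        (description, "rdf:type", "E31 Document"),
        (description, "P70 documents", purpose_definition),
        (creation, "rdf:type", "E65 Creation"),
        (creation, "P94 has created", description),
        (researcher, "rdf:type", "E39 Actor"),
        (creation, "P14 carried out by", researcher),
    ]
    if role is not None:
        # Simplification: P14.1 is a property of the P14 link, not of the
        # creation activity; it is flattened here for readability.
        statements.append((creation, "P14.1 in the role of", role))
    return statements


if __name__ == "__main__":
    for statement in document_purpose(
            "analysis-001", "purpose-definition-001",
            "purpose-description-001", "description-creation-001",
            "researcher-001", role="reviewer"):
        print(statement)
```

The optional role covers the case mentioned above of a third party reconstructing the purpose of legacy data while acting as reviewer or commenter.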

9 Examples

The following example concerns a real XRF analysis carried out on the paint of a painted sarcophagus accidentally discovered in 2008 near Larnaka, Cyprus, during construction works. The sarcophagus was well preserved and still retained traces of the colour with which it had been painted. A team from C2RMF (http://www.c2rmf.fr) and the Cyprus Institute (http://www.cyi.ac.cy), in agreement with the local Department of Antiquities, carried out an XRF analysis of the paint to get insights into its provenance, and consequently information on the provenance of the sarcophagus, using a portable XRF device whose output is a file automatically visualised as a spectrum. The sample or, better, the virtual sample to be analysed consisted of any still extant blue pigment, which was considered to be of the same nature throughout the sarcophagus. The experiment was motivated by the fact that, at a preliminary observation, the paint looked very similar to a paint used in Egypt: the XRF analysis was expected to confirm such provenance, by comparing the XRF spectrum with those of pigments known to come from Egypt. No artefact preparation was required. The study on the sarcophagus and the archaeological science activities, which also comprised other analyses, are fully described in [7].

Some details of the example presented here are fictitious or have been simplified for this presentation. XRF data, actually dispersed after the study completion, are assumed to be stored in a file called xrf-sarcophagus-data, and the log of the equipment operations is assumed to have been automatically recorded in the file xrf-sarcophagus-data.log. In the example, explanatory comments are given in parentheses.

figure a

The following example concerns a PIXE analysis carried out on the same artefact. PIXE is actually a non-destructive technique, so it could use a virtual sample, but in this case the equipment to be used was in Paris and transportation of the sarcophagus was evidently not feasible. However, it was possible to use as samples some small pieces accidentally broken from the sarcophagus when it was discovered, and easily recognisable as parts of it. The real experiment is described in [7]; in the following example, many details are fictitious and have an illustrative purpose only. Note that the experiment may be described using CRMsci and CRMdig only.

figure b

We note that in the examples, the equipment fictitiously supposed to have been used to carry out each experiment, such as the AGLAE C2RMF accelerator or the ELIO XRF device, is a very complex machine, not just a digital computer. For the purpose of this documentation, however, the instrument is a black box into which the sample is fed and out of which digital data come. Anybody interested in how a D8 Digital Device works should follow a path from its identifier here to its description somewhere else, possibly at the maker’s, documented with L33 has maker. For this purpose, the device identifier should use a standardised format.

10 Conclusions and further work

The suitability and completeness of the proposed extension, so far an intellectual construct only, need to be assessed through the application in many real examples, including also legacy material. At the same time, CRMas must be tested to document current archaeological science activities.

The solution proposed here for knowledge organization in archaeological sciences also seems to meet the needs of heritage science, i.e. the wider domain of applications of scientific techniques to cultural heritage studies, e.g. for the conservation and preservation of artistic objects. In fact most of the scientific investigation techniques are the same, but they are used in a different perspective and to answer different research questions. Whether such differences imply different documentation needs remains to be analysed. This activity has already started in collaboration with sector experts.

An approach similar to the present one might be useful in many other scientific domains, all subject to data re-use issues that involve data reliability. While it is up to domain experts to evaluate the suitability of CRMas to their research questions, this is another development to be further explored in the future.