Introduction

Studies of the faunal remains at archaeological sites have been helpful in explaining the origin of bone accumulations and the role of hominins in their appearance. The “hunting-scavenging” debate has focused the attention of taphonomists on efforts to determine whether hominins or carnivores were the main agents responsible, and how secondary agents might modify the characteristics of these remains (Bunn 1982; Bunn and Ezzo 1993; Domínguez-Rodrigo 2002; Egeland et al. 2007; Pante et al. 2012; James and Thompson 2015; Domínguez-Rodrigo 2015; Harris et al. 2017; Parkinson 2018). The examination of bone surface modifications (BSM) has been the traditional manner in which these questions have been addressed. Percussion marks made during attempts to reach the bone marrow (Blumenschine and Selvaggio 1988; Blumenschine 1995; Domínguez-Rodrigo and Barba 2006; Galán et al. 2009; Blasco et al. 2014; Yravedra et al. 2018), cut marks on the surface of bones (Abe et al. 2002; Andrews and Cook 1985; Behrensmeyer et al. 1986; Bello et al. 2009; Bello and Soligo 2008; Binford 1981; Braun et al. 2016; Bromage and Boyde 1984; Bunn 1982, 1981; Bunn et al. 1986; Courtenay et al. 2017; de Juana et al. 2010; Dewbury and Russell 2007; Domínguez-Rodrigo 1997; Domínguez-Rodrigo et al. 2009, 2005; Maté-González et al. 2019, 2018, 2015; Olsen and Shipman 1988; Palomeque-González et al. 2017; Shipman and Rose 1983; Wallduck and Bello 2018; Yravedra et al. 2017), and tooth marks left by carnivores (Selvaggio 1994; Blumenschine 1995; Blumenschine et al. 1996; Domínguez-Rodrigo and Piqueras 2003; Domínguez-Rodrigo and Barba 2006; Njau and Blumenschine 2006; Baquedano et al. 2012a; Andrés et al. 2013; Saladié et al. 2013; Arilla et al. 2014; Aramendi et al. 2017; Arriaza et al. 2017b; Yravedra et al. 2018) have all provided clues.

BSM, however, cannot always provide the answer. Sometimes it is difficult to identify de visu what actually made an alteration due to the equifinality involved in mark interpretation and its highly subjective identification (Domínguez-Rodrigo et al. 2017). For example, when a site has been subject to trampling, the marks left on bone remains may look very similar to cut marks (Behrensmeyer et al. 1986; Olsen and Shipman 1988; Blasco et al. 2008; Domínguez-Rodrigo et al. 2009; Pineda et al. 2014; Courtenay et al. 2018). In addition, percussion marks can be confused with tooth marks if the former have no associated microstriations (Galán et al. 2009; Yravedra et al. 2018). On other occasions, the state of preservation renders the task of identification impossible, and no conclusions can be drawn regarding the agent responsible for making bone accumulations (e.g. Egeland and Domínguez-Rodrigo 2008; Domínguez-Rodrigo and Martínez-Navarro 2012; Pineda et al. 2014, 2017, 2019; Yravedra et al. 2016; Pineda and Saladié 2018). However, the debate surrounding anthropic activity at archaeological sites has tended to overlook the evidence that bone breakage (which has been documented at sites at least 2.6 Ma old; Domínguez-Rodrigo et al. 2005) can provide. Bone marrow is an important source of lipids and essential fat-soluble vitamins (A, D, E, K) (Saint-Germain 2005; Malet 2007; Costamagno and Rigaud 2014) for both human groups and carnivores (Binford 1978; Metcalfe and Jones 1988; Jones and Metcalfe 1988), and both break bones to some degree in order to obtain it.

BSM may not by themselves reveal how a bone was broken, although the distribution of percussion or tooth marks can be related to the breakage; other methods are therefore required to determine how it occurred. One approach has been to examine the notches that appear on fracture planes. Classifying these notches has made it possible to distinguish between breakage caused by anthropic percussion and breakage caused by the action of carnivores (Capaldo and Blumenschine 1994; de Juana and Domínguez-Rodrigo 2011; Galán et al. 2009; Moclán and Domínguez-Rodrigo 2018).

Another method is to analyse the fracture planes themselves, which can have different morphologies depending on the state of the bone (i.e. green or dry) (Villa and Mahieu 1991) and different angles (Alcántara García et al. 2006). The main variations in green fracture planes are due to the fact that anthropic breakage is percussive, i.e. it involves dynamic loading which distributes the energy of a strike differently around a bone’s structure. In contrast, the static loading associated with carnivore bites means the associated energy is distributed much more evenly across the bone surface (Johnson 1985).

With this in mind, Alcántara García et al. (2006) suggested an analytical method (for use with the remains of large and small prey animals) for identifying the agent responsible for the breaking of a bone via the angles of the fracture planes. However, the results obtained with this method were never very clear when used with archaeological remains (Pickering et al. 2005)—some authors highlighted methodological problems associated with the technique (Coil et al. 2017), while others raised concerns about the size of the samples available for analysis (Moclán et al. 2019).

Based on the method of Alcántara García et al. (2006), Moclán et al. (2019) recently reported the use of machine learning (ML)-based statistical techniques for analysing the fracture planes of broken bones of medium-sized animals (50–200 kg) and thus identifying the bone-breaking agent. In that work, it was shown experimentally that it is possible to distinguish between bone accumulations made by hyaenas, wolves and hominins with more than 95% confidence. The technique involved examining 12 variables related to bone breakage, including the morphology of the fracture plane, the angle of the fracture plane and the presence and type of notches on bone fragments. The results also suggested this method might be useful for examining material with poorly preserved cortical surfaces. Other authors too have shown that ML algorithms (Arriaza and Domínguez-Rodrigo 2016; Egeland et al. 2018; Domínguez-Rodrigo and Baquedano 2018; Domínguez-Rodrigo 2018; Courtenay et al. 2019), with their enormous statistical potential, can be of use in solving taphonomic problems.

However, in the above work (Moclán et al. 2019), no material from the archaeological record was ever examined. The Navalmaíllo Rock Shelter, a Mousterian site, is a good laboratory for testing this method on such remains. The BSM on the bone remains at the site are well enough preserved to have established Homo neanderthalensis as the main agent behind their accumulation. The clear majority of the remains worked by these hominins are of very large-, large- and medium-sized animals (Huguet et al. 2010; Moclán et al. 2017, 2018a). The marks left by carnivores are scant, and generally found on the remains of small and very small animals (Moclán et al. 2017; Arriaza et al. 2017a; Moclán et al. 2018a). The aim of the present work was to test the validity of the above ML method on the remains of medium-sized animals from the Navalmaíllo Rock Shelter, to re-determine whether the bone breakages seen there are the result of intense anthropic activity (Huguet et al. 2010).

The Navalmaíllo Rock Shelter: a Middle Palaeolithic site on the Iberian Plateau

The archaeological sites at Calvero de la Higuera in Pinilla del Valle (Madrid, Spain) are located in the upper valley of the Lozoya River in the Guadarrama Mountains (Sierra de Guadarrama), 55 km north of the city of Madrid. Five different sites have been found to date: the Cueva del Camino, the Buena Pinta Cave, the Des-Cubierta Cave, the Ocelado Rock Shelter and the Navalmaíllo Rock Shelter (Fig. 1) (Alférez et al. 1982; Arsuaga et al. 2009, 2010; Baquedano et al. 2010; Pérez-González et al. 2010; Arsuaga et al. 2011; Baquedano et al. 2012b; Arsuaga et al. 2012; Álvarez-Lao et al. 2013; Laplana et al. 2013; Márquez et al. 2013; Baquedano et al. 2014; Blain et al. 2014; Karampaglidis 2015; Laplana et al. 2015; Baquedano et al. 2016; Márquez et al. 2016a; Laplana et al. 2016; Márquez et al. 2017; Moclán et al. 2018a).

Fig. 1

Upper left: orthophotographic view of the Pinilla del Valle archaeological sites (1, the Camino Cave; 2, the Navalmaíllo Rock Shelter; 3, the Buena Pinta Cave; 4, the Des-Cubierta Cave). Upper right: general view of the excavation area of Level F from the north part of the site (note the presence of the archaeological pit for context in further figures). Lower image: orthophotograph of the Navalmaíllo Rock Shelter where Level F and other parts of the site can be seen. Orthophotograph by Alfonso Dávila Lucio/MAR

The Navalmaíllo Rock Shelter (hereinafter NV) was discovered in 2002, and has been excavated without interruption ever since. The site is a rock shelter, carved out by the Valmaíllo stream (Pérez-González et al. 2010), with a minimum surface area of ~ 250 m² (Análisis y Gestión del Subsuelo S.L. [AGS] 2006). The stratigraphic sequence (Fig. 2) from top to bottom consists of an Ap horizon (10YR 5/2) some 0.20–0.40 m thick and at least two colluvium stages of dolomitic clasts within a silt-sand matrix (7.5YR 6/3) up to 1 m thick. Below this lie large dolomite blocks (some more than 1 m in height) that have fallen from the rock shelter ceiling. The falling of the blocks produced the hydroplastic injection of clays (Level D) containing some lithic and faunal remains from Level F. Level F lies below Level D, forming a bed up to 0.85 m thick composed of clay/sand (10YR 4/3) and carbonate clasts with a long axis of up to 0.35 m. Burnt sediments from this level have been dated by thermoluminescence to between 71.685 ± 5.082 and 77.230 ± 6.016 ka, i.e. the final part of MIS 5a or the first part of MIS 4. According to pollen analyses, Level F reflects an open ecosystem (Ruiz Zapata et al. 2015). Under Level F are at least 2 m of allochthonous fluvial facies of siliceous gravel and sands deposited by the Valmaíllo stream, which drains Variscan gneisses before flowing into the Lozoya River.

Fig. 2

a Stratigraphic sequence of the site (for a complete view of the sequence and its legend, see Fig. 2 in Arriaza et al. 2017a). Note that the archaeological materials of Level D are found in a secondary position due to the effects of hydroplasticity; b orthophotograph of the stratigraphic sequence of the Navalmaíllo Rock Shelter in the archaeological pit (see Fig. 1). Orthophotograph by Alfonso Dávila Lucio/MAR

The remains of lithic industry make up some 60% of the archaeological record of Level F (Márquez et al. 2016b). Most of these remains (~ 78%) are made of locally collected quartz (Abrunhosa et al. 2014, 2019); the most common technological category recorded is that of simple flakes (Márquez et al. 2013). There is a clear trend towards the production of microlithics, an apparently intentional choice, at least for those tools made of quartz. The presence of anvils near cores, along with bipolar products, indicates bipolar knapping to have been the best way of handling this type of raw material, especially for small format products (Márquez et al. 2013, 2016b, 2017).

The faunal remains of Level F belong to animals of different size (Table 1) (Huguet et al. 2010; Arsuaga et al. 2011; Moclán et al. 2017, 2018a, 2018b; Arriaza et al. 2017a). Previous studies have highlighted the major presence of large animals such as Bos primigenius/Bison priscus in the assemblage, as well as Equus ferus and Stephanorhinus hemitoechus (Huguet et al. 2010). Medium-sized animals (such as cervids) are also well represented, but small-sized animals are clearly underrepresented (Moclán et al. 2017, 2018a). The remains of the very large-, large- and medium-sized mammals show clear evidence of anthropic activity, such as cut marks, percussion marks and evidence of burning to varying degrees of intensity. Taphonomic alterations produced by carnivores are not common in the faunal assemblage, and it would appear that Neanderthals were the main drivers of bone accumulation at the site (Huguet et al. 2010; Moclán et al. 2017, 2018a). The remains of small-sized animals (such as rabbits and tortoises) and carnivores are not the result of anthropic activity. The rabbits, for example, were almost certainly brought into the shelter by lynxes when the shelter was not occupied by human groups (Arriaza et al. 2017a).

Table 1 Animal species recovered from Level F of the Navalmaíllo Rock Shelter (Arriaza et al. 2017a; Arsuaga et al. 2011; Huguet et al. 2010; Moclán et al. 2018b, 2017)

Materials

The materials examined in this work were bones from the archaeological record of NV Levels D and F, plus the non-archaeological materials presented by Moclán et al. (2019). The experimental (non-archaeological) materials were divided into three sets: set 1, 40 anthropically broken long bones (10 humeri, 10 radii-ulnae, 10 femurs, 10 tibiae) from Cervus elaphus (a medium-sized species); set 2, medium-sized bone remains from a hyaena (Crocuta crocuta) den near Lake Eyasi in Tanzania; and set 3, the bones of medium-sized animals (Cervus elaphus and Sus scrofa) broken by wolves (Canis lupus) living semi-free in the Hosquillo Natural Park (Serranía de Cuenca, Spain). All three sets have been previously described in detail (Prendergast and Domínguez-Rodrigo 2008; Moclán and Domínguez-Rodrigo 2018; Moclán et al. 2019). All the bone fragments represented in the latter two sets were identifiable as belonging to long bones; no metapodials were included since previous work has shown that, given their very thick cortical cross-section, they provide little evidence about the bone-breaking agent (Capaldo and Blumenschine 1994). The anthropically generated sample (n = 332) provided 881 fracture planes for analysis (oblique = 549; longitudinal = 297; transverse = 35), the hyaena-generated sample (n = 66) provided 202 fracture planes (oblique = 87; longitudinal = 91; transverse = 24), and the wolf-generated sample (n = 61) provided 237 (oblique = 273; longitudinal = 287; transverse = 50).

The NV archaeological remains examined were 12,966 bone fragments (unearthed between 2002 and 2018), with 11,919 from Level F and 1047 from Level D. The materials from Level D were originally from Level F, but were moved by the hydroplastic processes that affected the ceiling of Level F, where they eventually settled. Metapodials were excluded from the analysis. A total of 1104 specimens of medium-sized animals were identified (long bones = 63.77%), with a total of 377 fracture planes.

Methods

Zooarchaeological and taphonomic methods

For taxonomic identifications and the assignment of a size (via body weight), the reference collections of the Institut Català de Paleoecologia Humana i Evolució Social (IPHES) and the Museo Arqueológico Regional (MAR), and manuals of comparative anatomy were used (Pales and Lambert 1971; Schmid 1972; Barone 1976; France 2009).

The sizes of the animals represented by the bone remains were defined via their weight as very large (> 800 kg), large (200–800 kg), medium (50–200 kg), small (10–50 kg) and very small (< 10 kg). The assignment of a size was independent of any taxonomic identification.
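The weight classes above can be written as a simple lookup. The following is a minimal sketch in Python (the study itself used R); the function name is hypothetical, and because the published limits share their end points (for example, 200 kg closes the medium class and opens the large class), the handling of boundary weights is an assumption of this sketch.

```python
def size_class(weight_kg: float) -> str:
    """Map an estimated body weight (kg) to the size classes used in the text."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    # Assumption: a weight exactly on a shared boundary (50 or 200 kg)
    # is assigned to the smaller of the two classes.
    if weight_kg > 800:
        return "very large"
    if weight_kg > 200:
        return "large"
    if weight_kg > 50:
        return "medium"
    if weight_kg >= 10:
        return "small"
    return "very small"
```

Only fragments falling in the "medium" class (50–200 kg) enter the analyses that follow.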

BSM were identified following different methods: cut marks were distinguished using the criteria of Domínguez-Rodrigo et al. (2009), tooth marks were identified following the method of Blumenschine (1988, 1995) and percussion marks were distinguished following the methods of Blumenschine and Selvaggio (1988) and Blumenschine (1995).

Breakage patterns were identified following the method of Villa and Mahieu (1991), differentiating between those in dry and fresh bones. The fracture planes were divided into three types (transverse, longitudinal and oblique) according to the trajectory followed with respect to the long axis of the bone (Alcántara García et al. 2006). Gifford-Gonzalez (1989) and Haynes (1983) defined transverse planes as fractures occurring at right angles to the long axis of the bone, while longitudinal planes are fractures parallel to the long axis of the bone. Pickering et al. (2005) described oblique fracture planes as either straight or curved in a helical pattern, with a subparallel angle in relation to the long axis of the bone.

Only remains over 4 cm in length were examined, following the indications of Alcántara García et al. (2006)—a requirement for determining some of the variables measured (see below). The angles of the fracture planes were measured using a goniometer at the point of greatest inflexion.

Notches were defined as “semi-circular to arcuate indentations on the fracture edge of a long bone that are produced by dynamic or static loading on cortical surfaces […] leaving a negative flake scar onto the medullary surface” as reported by Capaldo and Blumenschine (1994). Here we identified notches according to the typological classification proposed by Pickering and Egeland (2006; modified from Capaldo and Blumenschine 1994):

• Complete or type A notches: those with two inflection points on the cortical surfaces and a non-overlapping negative flake scar.

• Incomplete or type B notches: those missing one of the inflection points.

• Double overlapping or type C notches: those with negative flake scars that overlap an adjacent notch.

• Double opposing complete or type D notches: the two notches that appear on opposite sides of a fragment and that result from two opposing loading points.

• Micronotches or type E notches: always < 1 cm in length.

Following the method of Moclán et al. (2019), the fracture planes were examined using ML algorithms. This required collecting data for 12 variables that allowed the planes to be individualised with no loss of a fragment’s general information, as follows:

1. The presence or absence of an epiphyseal section in the bone fragment.

2. Length (mm) of the bone fragment.

3. Length category (the fragments were included in categories related to their maximum length; see Table 2).

4. The number of total fracture planes measurable in the fragment, including green transversal planes.

5. Type of fracture plane, i.e. longitudinal or oblique (a variable used only when comparing all the samples at the same time).

6. Angle of the fracture plane.

7. Type of angle: if the angle is right or close (i.e. 85–95°) or not (obtuse or acute).

8. Fracture plane longer than 4 cm or not.

9. Presence or absence of notches in the fragment.

10. Presence or absence of type A notches.

11. Presence or absence of type C notches.

12. Presence or absence of type D notches.

Table 2 Intervals proposed by Moclán et al. (2019) for the bone fragment variable “length category”, for use in ML analysis

Note that some variables refer to the structure of the fragment studied (1, 2, 3, 4, 9, 10, 11, 12) and others to the fracture plane (5, 6, 7, 8), allowing for multivariate analysis (Domínguez-Rodrigo and Yravedra 2009; Domínguez-Rodrigo and Pickering 2010) rather than the univariate analysis covered by the original method of Alcántara García et al. (2006).
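To make the structure of one observation concrete, the 12 variables can be held in a single record per fracture plane. The sketch below is illustrative only and is written in Python rather than the R used in the study; the class and field names are hypothetical, the Table 2 intervals are not reproduced, and encoding variable 7 as three levels (right, acute, obtuse) is an assumption about how the published description would be coded.

```python
from dataclasses import dataclass


@dataclass
class FracturePlaneRecord:
    # Fragment-level variables (1, 2, 3, 4, 9, 10, 11, 12)
    has_epiphysis: bool
    fragment_length_mm: float
    length_category: str          # interval label from Table 2 (not reproduced here)
    n_fracture_planes: int
    has_notches: bool
    has_type_a_notch: bool
    has_type_c_notch: bool
    has_type_d_notch: bool
    # Plane-level variables (5, 6, 8); variable 7 is derived from the angle
    plane_type: str               # "longitudinal" or "oblique"
    angle_deg: float
    plane_longer_than_4cm: bool

    @property
    def angle_type(self) -> str:
        """Variable 7: right or close to right (85-95 degrees), else acute/obtuse."""
        if 85 <= self.angle_deg <= 95:
            return "right"
        return "acute" if self.angle_deg < 85 else "obtuse"
```

A fragment with several measurable planes would contribute one record per plane, with the fragment-level fields repeated, which is what allows each plane to be treated individually without losing the information about its fragment.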

Statistical methods

The present work follows the methodology of Moclán et al. (2019), and thus involves classic univariate and bivariate analyses followed by the use of ML algorithms. All analyses were performed using R v.3.2.3 software (R Core Team 2015).

Firstly, the ratio of longitudinal to oblique fracture planes was established. According to Moclán et al. (2019), this ratio provides an initial means of testing whether a sample of fragments was subject to anthropic breakage (ratio < 1) or breakage via the action of carnivores (ratio > 1). The angles of the fracture planes were then compared using the non-parametric Wilcoxon signed-rank test, since the Shapiro-Wilk test indicated that the data were not normally distributed.

Following the indications of Alcántara García et al. (2006), and making use of the “plotrix” library in R (Lemon 2006), the mean and 95% confidence intervals of the fracture plane angles were compared to examine the variation between the different experimental non-archaeological sets of bones (sets 1–3) and the NV sample.
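The comparison of means and 95% confidence intervals can be sketched as follows. This is a minimal Python illustration, not the R code used in the study: the function names are hypothetical, and the interval is built with the normal approximation (mean ± 1.96 standard errors), which is an assumption of this sketch and may differ from the interval plotted in the original analysis.

```python
import math
import statistics


def mean_ci95(angles_deg):
    """Return (mean, lower, upper) for a list of fracture plane angles.

    Assumption: normal-approximation interval, mean +/- 1.96 * standard error.
    """
    n = len(angles_deg)
    if n < 2:
        raise ValueError("at least two angles are needed")
    mean = statistics.fmean(angles_deg)
    standard_error = statistics.stdev(angles_deg) / math.sqrt(n)
    half_width = 1.96 * standard_error
    return mean, mean - half_width, mean + half_width


def intervals_overlap(first, second):
    """True if two (mean, lower, upper) intervals share any values."""
    return first[1] <= second[2] and second[1] <= first[2]
```

Under this reading, an archaeological sample whose interval overlaps that of one experimental set but not the others would be considered closest to that set for the fracture plane type in question.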

Following the method of Moclán and Domínguez-Rodrigo (2018), and using the “cabootcrs” library in R (Ringrose 2013), the notches on alternative sample materials made by known agents—lions (Arriaza et al. 2016), hyaenas (Domínguez-Rodrigo et al. 2007), anthropic percussion action on small-, medium- and large-sized animal bones (Domínguez-Rodrigo et al. 2007; Blasco et al. 2014; Moclán and Domínguez-Rodrigo 2018), and anthropic percussion via battering (Blasco et al. 2014)—were compared via correspondence analysis of notch types A, B and C, thus providing a further comparator for the present archaeological materials. To facilitate understanding of the correspondence diagram, the examples of anthropic action on all large-sized animals in the alternative sample (Domínguez-Rodrigo et al. 2007; Blasco et al. 2014) were taken together.

Finally, ML statistical analysis was performed using the “caret” library in R (Kuhn 2017). The powerful statistical methods involved, which were first used in a taphonomic context by Arriaza and Domínguez-Rodrigo (2016), allow the classification of data, and predict into which categories new data will fall (Kuhn and Johnson 2013). A standard procedure in these ML analyses is the use of bootstrapping (Efron 1979) to render the results more robust. Following the method of Moclán et al. (2019), raw data for the experimental materials were bootstrapped 1000 times to generate a predictive model for comparison with the NV archaeological sample.

The functioning of the different ML algorithms differs mathematically, but the results are interpreted in the same way. In previous taphonomic studies (Arriaza and Domínguez-Rodrigo 2016; Domínguez-Rodrigo and Baquedano 2018; Domínguez-Rodrigo 2018; Moclán et al. 2019; Courtenay et al. 2019), kappa agreement indices were calculated. The kappa index, which accounts for the possibility of a correct prediction occurring by chance alone, takes a value of −1 to 1, with values of 0.80 to 1 reflecting results in “very good agreement” (Lantz 2013). However, other variables also need to be taken into account, such as accuracy, sensitivity, specificity and balanced accuracy. Accuracy refers to the proportion of cases correctly classified by an algorithm (represented on an ascending 0–1 scale). Sensitivity and specificity complement the kappa and accuracy values: sensitivity describes the proportion of positive cases that are correctly classified, and specificity describes the proportion of negative cases that are correctly classified. The balanced accuracy, the mean of sensitivity and specificity, corrects the final result by taking into account both true positives and true negatives.
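These indices can all be derived from a confusion matrix. The sketch below, in Python rather than the R used in the study, shows the standard calculations for a multi-class problem, with sensitivity and specificity computed one class against the rest; the function name and the example matrix are hypothetical and do not come from the study.

```python
def classification_metrics(confusion, classes):
    """Compute accuracy, Cohen's kappa and per-class indices.

    confusion[i][j] is the number of cases whose true class is classes[i]
    and whose predicted class is classes[j]. Every class is assumed to
    occur at least once, otherwise a division by zero would result.
    """
    k = len(classes)
    total = sum(sum(row) for row in confusion)
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    observed = sum(confusion[i][i] for i in range(k)) / total  # accuracy
    # Agreement expected by chance alone, from the marginal totals
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    kappa = (observed - expected) / (1 - expected)

    per_class = {}
    for i, name in enumerate(classes):
        tp = confusion[i][i]
        fn = row_totals[i] - tp
        fp = col_totals[i] - tp
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        per_class[name] = {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "balanced_accuracy": (sensitivity + specificity) / 2,
        }
    return {"accuracy": observed, "kappa": kappa, "per_class": per_class}


# Hypothetical confusion matrix (rows: true agent, columns: predicted agent)
example_classes = ["hominin", "hyaena", "wolf"]
example_matrix = [[90, 5, 5], [4, 50, 6], [6, 5, 49]]
example_metrics = classification_metrics(example_matrix, example_classes)
```

Because kappa subtracts the agreement expected from the marginal totals, it is always lower than accuracy when classification is imperfect, which is why the two are reported together.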

To generate the predictive model, the material in sets 1–3 was divided into “training” and “testing” groups (70 and 30%, respectively) (Moclán et al. 2019). This allows one to check whether the algorithms used make correct classifications (via the generation of kappa, accuracy, sensitivity, specificity and balanced accuracy values). For further details on the learning process, see Tables 5–9 of Moclán et al. (2019).
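The 70/30 division can be sketched as below. This Python illustration is not the routine used in the study; the function name is hypothetical, and splitting within each agent so that the three agents keep similar proportions in both groups (a stratified split) is an assumption of this sketch.

```python
import random
from collections import defaultdict


def stratified_split(records, label_of, train_fraction=0.7, seed=0):
    """Split records into training and testing groups within each label."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    by_label = defaultdict(list)
    for record in records:
        by_label[label_of(record)].append(record)

    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Hypothetical records: (agent, identifier) pairs, ten per agent
example_records = [(agent, i) for agent in ("hominin", "hyaena", "wolf") for i in range(10)]
example_train, example_test = stratified_split(example_records, label_of=lambda r: r[0])
```

The testing group is never shown to the algorithm during training, so the indices computed on it estimate how the model behaves on unseen fracture planes.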

Since the present work involves archaeological material, a second step was required to examine that material with the predictive model generated by the algorithms. In this step, the algorithms process the data for each of the archaeological materials, and determine the probability that they were broken by one agent or another. This provides a final view of whether, as a whole, these materials were broken by wolves, hyaenas or anthropic percussion (the agents used in the training stage).
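The text does not spell out how the classifications of individual fracture planes are summarised for the assemblage as a whole. One plausible reading, assumed in the Python sketch below, is that the percentage reported for each agent is the share of fracture planes that the model assigns to that agent. The function name is hypothetical and the counts are illustrative; they were chosen only so that they sum to the 305 longitudinal and oblique NV planes and are not taken from the study's tables.

```python
from collections import Counter


def agent_percentages(predicted_agents):
    """Percentage of fracture planes assigned to each agent by a classifier."""
    counts = Counter(predicted_agents)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no predictions supplied")
    return {agent: round(100 * n / total, 2) for agent, n in counts.items()}


# Illustrative predictions for 305 fracture planes (hypothetical counts)
example_predictions = ["hominin"] * 245 + ["hyaena"] * 56 + ["wolf"] * 4
example_shares = agent_percentages(example_predictions)
```

Under this reading, the result describes the assemblage as a mixture: a dominant agent with a minority of planes attributed to others, rather than a single verdict for every fragment.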

As in Moclán et al. (2019), the algorithms used in the present work were the neural network (NN), support vector machines (SVM), k-nearest neighbour (KNN), random forest (RF), mixture discriminant analysis (MDA) and naive Bayes (NB) algorithms. The partial least squares (PLS) algorithm and “decision trees using the C5.0 algorithm” (DTC5.0) were also used since these have been reported useful in taphonomic contexts (Arriaza and Domínguez-Rodrigo 2016; Domínguez-Rodrigo and Baquedano 2018; Domínguez-Rodrigo 2018; Moclán et al. 2019). Accuracy, kappa, sensitivity, specificity and balanced accuracy are reported for these two algorithms since they were not included in Moclán et al. (2019).

Results

Univariate and bivariate analyses

A total of 1104 NV remains were eligible for analysis out of the original 12,966 available. The sample represented two types of medium-sized herbivore—Cervus elaphus and Dama dama. Some samples were identified at the family level of Cervidae when the species could not be confirmed. These remains belonged to all anatomical areas, although remains of the long bones were the most common (n = 704), followed by those of cranial remains (n = 198) and flat bones (n = 122). BSM of different types were detected, including cut marks (n = 72; 7.17% of identified specimens [%NISP]), percussion marks (n = 32; 3.19% NISP) (Fig. 3), tooth marks (n = 13; 1.29% NISP) and thermal alterations (n = 136; 13.55% NISP). Notches were also seen at the fracture planes of the long bones (n = 30; 2.99% NISP).

Fig. 3

Left: NV specimens of medium-sized animals with green fracture planes. Right: an example of a percussion mark on a shaft fragment of a medium-sized animal (Mag: ×35). Photos: Abel Moclán

Three hundred and thirteen remains of these medium-sized animals showed evidence of being broken when fresh; 168 fragments were over 4 cm long with a total of 377 green fracture planes, of which 327 were in long bones of the upper and intermediate anatomical sections (it should be remembered that no metapodials were included in the sample). Twenty-two transverse fracture planes were seen, along with 136 longitudinal planes and 169 oblique planes (see Table 3); the longitudinal/oblique ratio was 0.80.
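The ratio reported above is a one-line calculation, sketched here in Python with hypothetical function names. The thresholds follow the rule described in the statistical methods (ratio < 1 for anthropic breakage, ratio > 1 for carnivore breakage); treating a ratio of exactly 1 as indeterminate is an assumption of this sketch.

```python
def longitudinal_oblique_ratio(n_longitudinal, n_oblique):
    """Ratio of longitudinal to oblique green fracture planes."""
    if n_oblique <= 0:
        raise ValueError("at least one oblique plane is needed")
    return n_longitudinal / n_oblique


def first_pass_reading(ratio):
    """Initial interpretation of the ratio; not a substitute for the ML step."""
    if ratio < 1:
        return "consistent with anthropic breakage"
    if ratio > 1:
        return "consistent with carnivore breakage"
    return "indeterminate"  # assumption: the text gives no rule for ratio == 1


# Counts for the NV sample as given in the text
nv_ratio = longitudinal_oblique_ratio(136, 169)
nv_reading = first_pass_reading(nv_ratio)
```

Applied to the experimental counts listed in the Materials section, the same function gives about 0.54 for the anthropic sample (297/549) and about 1.05 for the hyaena sample (91/87), in line with the direction of the rule.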

Table 3 Fracture planes in the remains of the medium-sized animals of the Navalmaíllo Rock Shelter. Note the small number of data available for the transverse fracture planes

Comparison of the mean values for the angles for all sample sets (Wilcoxon signed-rank test) indicated the NV set to differ significantly (p < 0.05) from sets 1–3 (whether taking all specimens of each set together or analysing by type of fracture).

Following the method of Alcántara García et al. (2006), large differences were revealed between the NV materials and those of sets 1–3 in terms of the types of fracture planes present (Table 3; Fig. 4).

Fig. 4

Comparison of mean and 95% confidence intervals for fracture angles in the NV and sets 1–3 material

As indicated by other authors (Alcántara García et al. 2006; Coil et al. 2017; Moclán et al. 2019), the transverse fracture planes showed too much variation in their angles to allow any conclusions to be drawn regarding the bone-breaking agent. The longitudinal and oblique planes, however, showed much less variation; < 90° longitudinal fracture planes were seen in both anthropically and carnivore-generated samples, while those of > 90° were seen only in the latter. In contrast, > 90° oblique fracture planes were seen in all samples generated by wolves, while those of < 90° were seen only in the specimens generated by percussion.

Thirty of the NV remains had notches; 15 were type A, 8 were type B, 1 was type C, 1 was type D and 2 were type E. Correspondence analysis showed the medium-sized NV remains to be clearly associated with an anthropic origin; certainly, the NV sample contained many simple notches (Fig. 5).

Fig. 5

Bootstrapped correspondence analysis distinguishing the alternative experimental samples produced by carnivores, battering and percussion, and the NV sample of medium-sized animals. Ellipses cover the 95% confidence intervals. Left, distribution of the different types of notches. Right, distribution of the different samples

Machine learning analysis

The two new ML algorithms proposed (PLS and DTC5.0) returned results very similar to those reported by Moclán et al. (2019). PLS returned 81–89% correct classifications for the experimental pieces, while DTC5.0 returned 91–99% correct classifications (Table 4). This result shows DTC5.0, along with NN, RF and SVM, to be among the best algorithms available for determining the identity of bone-breakers. Like KNN, MDA and NB (Moclán et al. 2019), PLS is not as reliable.

Table 4 Results returned by the PLS and DTC5.0 algorithms when used with the different experimental samples (sets 1–3)

Analysing the longitudinal and oblique fracture planes together revealed anthropic action to be the most likely agent that broke the NV bones. Depending on the algorithm used, the probabilities returned varied between 80.33% (DTC5.0) and 100% (PLS). The probability that hyaenas were the bone-breakers was just 18.36% according to the DTC5.0 algorithm, and 0% according to PLS. Wolves were the least probable bone-breakers; the NB algorithm returned a value of 4.59%, while PLS returned a value of 0% (Table 5).

Table 5 Probability, according to different algorithms, that the NV bones were broken by hyaenas, wolves or hominins, analysing the longitudinal and oblique fracture planes together

When the same analysis was performed using only the longitudinal fracture planes, the same conclusion was reached. The probability that the NV material was broken anthropically was determined as 55.43% by the KNN algorithm and 98.91% by the NB algorithm when considering fracture planes of > 90°, and as 86.27% by NNET and 100% by PLS when considering fracture planes of < 90°.

The values returned for hyaenas ranged from 26.09 to 36.96% according to the NNET, SVM, KNN, RF and DTC5.0 algorithms, while the highest probability that wolves were responsible was returned by the KNN algorithm (7.61%) (Table 6).

Table 6 Probability that the NV material was broken by hyaenas, wolves or hominins according to the different algorithms when longitudinal fracture planes were examined alone (left, < 90°; right, > 90°)

A similar pattern was seen when the oblique fracture planes were examined alone (Table 7); once again, the most likely bone-breaker was determined to be hominins (60.80–100% depending on the algorithm). When considering fracture planes of < 90°, the maximum probability that hyaenas were responsible was 33.60%. When considering fracture planes of > 90°, the highest values suggesting hominins were responsible were recorded, ranging between 86.67% for the MDA algorithm, and 100% for NB and PLS.

Table 7 Probability that the NV material was broken by hyaenas, wolves or hominins according to the different algorithms when oblique fracture planes were examined alone (left, < 90°; right, > 90°)

Discussion

The statistical analysis of the green fracture planes of the NV material returned different results depending on the complexity of the test employed. The ratio of the types of fracture plane provided a clue that the bones were broken by hominins, although this kind of result needs to be interpreted with caution since large sets of material can have different proportions of different bones (i.e. humerus, radius-ulna, femur and tibia), and the over- or underrepresentation of any of them could affect the results obtained (Moclán et al. 2019). Analysing the mean values of the fracture plane angles provided no clear result either; the Wilcoxon signed-rank test showed differences between the different sets of samples, while the comparison of 95% confidence intervals (Alcántara García et al. 2006) indicated one type of fracture plane had been generated by carnivores, and the other by anthropic action.

The results obtained in the analysis of the notches are also a little problematic. The correspondence analysis revealed a clear relationship between the NV notches and all the anthropic-origin samples. However, very wide variation was detected, and indeed, more overlap was seen with other small- and large-sized remains than other medium-sized remains.

The NV site has been described as a camp used by Homo neanderthalensis with evidence of intense anthropic action, including the fracturing of bones, on carcasses weighing over 50 kg (Huguet et al. 2010; Moclán et al. 2017, 2018a, 2018b). Here the anthropic activity is clearly confirmed by the use of ML algorithms. The examination of the longitudinal and oblique fracture planes, and their combination, by these algorithms indicated anthropic action to be the main cause of bone breakage at the site.

That said, the different algorithms returned different probability values regarding this anthropic breakage. PLS, for example, returned a probability of 100% when oblique or longitudinal fracture planes of > 90° were considered, and when the entire sample was studied as a whole. These results should be interpreted with some caution, however, since the presence and the distribution of tooth marks on the NV materials and the absence of epiphyseal fragments indicate some degree of ravaging, albeit slight (Huguet et al. 2010; Moclán et al. 2017, 2018a, 2018b). It should also be remembered that the accuracy of PLS was well below 100% when the experimental materials in sets 1–3 were examined; errors might also occur, therefore, when archaeological material is examined. In fact, the NNET, RF, SVM (Moclán et al. 2019) and DTC5.0 (current study) algorithms were shown to be the safest to use, and indeed these identified anthropic activity to be behind the breakages too (the RF algorithm returned a probability of 97.78% when considering oblique fractures of > 90°) (Fig. 6).

Fig. 6

Differences in the classification of the bone-breaker of the NV material according to the NNET, SVM, RF and DTC5.0 algorithms (contemplating all fracture planes together, and longitudinal and oblique fracture planes separately)

It should not be overlooked that all these algorithms also returned a high probability that hyaenas—rather than wolves—were responsible for some modifications to the remains. Both taxa have been identified among the remains at the rock shelter, and previous work has shown that hyaenas probably ravaged some bones (Moclán et al. 2017, 2018a).

The present work shows the usefulness of ML algorithms for analysing bone breakages in archaeological materials. The study of BSM has been a priority of neotaphonomic studies, as the literature reveals, for example, with respect to cut marks (e.g. Shipman and Rose 1983; Bromage and Boyde 1984; Andrews and Cook 1985; Behrensmeyer et al. 1986; Olsen and Shipman 1988; Domínguez-Rodrigo 1997; Greenfield 1999, 2006; Abe et al. 2002; Domínguez-Rodrigo et al. 2005, 2009; Dewbury and Russell 2007; Bello and Soligo 2008; Bello et al. 2009; de Juana et al. 2010; Maté-González et al. 2015, 2018, 2019; Braun et al. 2016; Yravedra et al. 2017; Palomeque-González et al. 2017; Courtenay et al. 2017; Wallduck and Bello 2018). However, the present and earlier studies show that bone breakage is an important variable to bear in mind. It can be used to help reveal the origin of faunal assemblages. Future work should focus on expanding the reference base for comparisons, and include studies on large- and small-sized bones, and different taxa.

Conclusions

This work, based on the methodology of Moclán et al. (2019), shows that ML algorithms can be used to identify bone-breakers, even those of archaeological material, without the need to examine BSM. The present results show that hominins were the bone-breakers at the Navalmaíllo Rock Shelter, but also indicate the remains suffered slight intervention by carnivores, probably hyaenas.