Introduction

In part I of this contribution, the reproducibility and accuracy of Number of Identified SPecimens (NISP), Minimum Number of Elements (MNE), and, to a lesser extent, Minimum Number of Individuals (MNI) tallies were assessed in a blind test focused on problems in the identification and quantification of ungulate remains. Analysis of experimental data showed that MNE counts give more robust estimates of skeletal abundances than NISP. Given these results, one might conclude that MNE should be widely adopted. However, several points make this conclusion premature. One of these points concerns the behavior of MNE with respect to sample size, an issue that could not be investigated in the blind test. In this paper, we examine how NISP and MNE are numerically related in a large sample of assemblages and review the implications of this relationship for the analysis of skeletal abundances.

A small number of studies have previously investigated the relationship between NISP and MNE. Grayson and Frey (2004) observed strong correlations (r ≥ 0.90, p < 0.001) between paired NISP-MNE data in three Paleolithic collections, which they interpreted as indicating that the two measures may frequently yield consistent results. Lyman (2008) reached a similar conclusion after studying 29 assemblages, although his analysis documented a wider spread of correlation coefficients (r = 0.66–0.96). These authors noted that such correlations are not surprising, given that MNE counts are ultimately derived from NISP counts. The question asked here, however, is one of sampling: do MNE values reflect the underlying structure of the NISP sample? In other words, can MNE values be treated as a random, and therefore representative, sample of NISP values? Because MNE is closely linked in its construction to the more intensively studied MNI, it seems legitimate to ask whether MNE is affected by the same sampling problems. A short discussion of MNI helps to clarify this point.

Ducos (1968) should be credited with first demonstrating the tendency for MNI to increase according to a power function as NISP gets larger. He showed that the net effect of this trend is that MNI inflates the representation of taxa with low NISP counts in archaeological collections, particularly at small sample sizes. As noted by Grayson (1984) and Lyman (2008), the reason for this inflation is easily understood. The first fragment attributed to a new taxon automatically entails the presence of one individual of that taxon in the sample. However, the probability of assigning a second fragment from the same taxon to a second individual is less than one because the specimen may derive from the previously identified individual. The probability that a third fragment will represent a new individual is even lower, as that specimen may belong to either of the individuals already identified. As a result of the decreasing probability of identifying a new individual, MNI is expected to be curvilinearly related to NISP in fragmented assemblages. There is ample evidence that this is indeed the case, at least in small samples (Ducos 1968; Casteel 1977; Grayson 1978a, b, 1984; Lyman 2008; Cannon 2013).
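The decelerating increase can be illustrated with a minimal, idealized sketch. It assumes (our assumption, not a model taken from the studies cited) that each identified fragment is drawn independently and with equal probability from a hypothetical death assemblage of `n_individuals` animals, and it counts the distinct individuals represented, i.e., the best case an analyst could reach if every fragment could be assigned to its individual. The expected count follows from the probability that a given individual contributes no fragment; a small random simulation checks the formula.

```python
import random


def expected_distinct(nisp: int, n_individuals: int) -> float:
    """Expected number of distinct individuals among `nisp` fragments, each
    drawn independently and uniformly from `n_individuals` animals."""
    # P(a given individual contributes no fragment) = (1 - 1/N) ** NISP
    p_missing = (1 - 1 / n_individuals) ** nisp
    return n_individuals * (1 - p_missing)


def simulate_distinct(nisp: int, n_individuals: int,
                      trials: int = 2000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity (hypothetical population)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # set of individuals that contributed at least one fragment
        individuals = {rng.randrange(n_individuals) for _ in range(nisp)}
        total += len(individuals)
    return total / trials


if __name__ == "__main__":
    for nisp in (1, 5, 20, 80):
        print(nisp, round(expected_distinct(nisp, 20), 2),
              round(simulate_distinct(nisp, 20), 2))
```

With 20 hypothetical individuals, the expected count rises from 1 at NISP = 1 to about 4.5 at NISP = 5, 12.8 at NISP = 20, and 19.7 at NISP = 80: each additional fragment adds less on average, which is the curvilinear pattern described above.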

Because the identification of distinct elements proceeds in a similar fashion and provides the foundations for the derivation of MNI values, MNE should likewise artificially inflate the representation of rare parts as measured by NISP. This problem deserves serious consideration because it implies that MNE values may tally specimens in a fundamentally different way in samples that differ appreciably in NISP size and/or patterns of skeletal representation. In this paper, we evaluate this hypothesis using paired NISP-MNE data from archaeological assemblages. The analysis of these data is followed by an examination of an alternative metric of abundance that circumvents most of the problems encountered with MNE and MNI.

Materials and Methods

To assess whether MNE tends to inflate the representation of elements with low NISP counts, we compiled paired NISP-MNE data for 58 Western European assemblages excavated and analyzed according to modern standards. These assemblages, which are characterized by a wide spectrum of NISP sample sizes (25–18,523; Table 1), are all of Pleistocene age and derive from cave, rockshelter, or cliff deposits. The faunal remains were, in each case, primarily accumulated by humans, although carnivore intervention is sometimes also documented (e.g., Teixoneres cave level III, Rosell et al. 2010). To control for taxonomic differences in skeletal morphology, only two closely related species are considered in the dataset: red deer (Cervus elaphus) and reindeer (Rangifer tarandus).

Table 1 Archaeological red deer (Cervus elaphus) and reindeer (Rangifer tarandus) samples used in the analysis of the relationship between NISP and MNE

Although comparing overall NISP-MNE relationships across assemblages has previously proven productive (Grayson and Frey 2004; Lyman 2008), the present study examines relationships within classes of skeletal parts to assess the impact of calculation methods at the anatomical level. This means that each NISP-MNE relationship focuses on a single class of elements (e.g., the mandible), with each data point in the scatter plots representing that element in a different assemblage. Correlations were calculated for 24 classes of skeletal elements, including the cranium; mandible; hyoid; all main types of vertebrae (atlas, axis, other cervical vertebrae, thoracic vertebrae, lumbar vertebrae, sacrum); scapula; ribs; innominates; all six types of long bones; malleolus; carpals; tarsals; and phalanges. NISP-MNE relationships were analyzed by comparing coefficients of determination (R²) obtained using a linear versus a power function. In these comparisons, a better fit with a power function means that the two measures increase at different rates with increasing sample size. This pattern is problematic because it implies that the two metrics will not be fully comparable across sample sizes or methods of aggregation (Grayson 1984). We also note that some variation, due to differences in patterns of site occupation, context of preservation, and degree of fragmentation, is expected in the dataset. Nonetheless, the fact that the same assemblages (or a sub-sample of these assemblages in cases of missing data) are included in the regressions should make the results roughly comparable between classes of elements.
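The comparison of linear and power fits can be sketched as follows. The text does not state how the power function was fitted, so this sketch assumes the common approach of ordinary least squares on log-transformed values (ln MNE = ln a + b ln NISP), with R² for that model computed on the log scale; other choices (e.g., nonlinear least squares on the raw values) can give different R² values. Zero counts must be excluded before the log transform. Function names are illustrative.

```python
import math


def ols(x, y):
    """Ordinary least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(observed, fitted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def compare_fits(nisp, mne):
    """R² of a linear and a power model of MNE as a function of NISP.

    Assumes every value is > 0 (required by the log transform).
    """
    # Linear model: MNE = a + b * NISP
    a, b = ols(nisp, mne)
    r2_linear = r_squared(mne, [a + b * x for x in nisp])

    # Power model: MNE = c * NISP ** d, fitted on ln-ln axes (assumption)
    log_x = [math.log(x) for x in nisp]
    log_y = [math.log(y) for y in mne]
    ln_c, d = ols(log_x, log_y)
    r2_power = r_squared(log_y, [ln_c + d * lx for lx in log_x])

    best = "power" if r2_power > r2_linear else "linear"
    return {"r2_linear": r2_linear, "r2_power": r2_power,
            "coefficient": math.exp(ln_c), "exponent": d, "best": best}
```

Each call handles one class of elements, with one (NISP, MNE) pair per assemblage, mirroring the per-element design described above.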

Analysis of the NISP-MNE Data

Table 2 gives the regression equations obtained for each of the 24 classes of skeletal elements. When all the assemblages with relevant data are considered, the comparisons indicate that a linear function provides the best-fit model for 13 classes of elements, whereas a power function gives the best-fit model for 8 classes. This difference in frequencies does not depart significantly from chance (χ² = 1.2, p = 0.28). Best-fit models are shown for the long bones in Fig. 1. We note that for some elements, the power function substantially improves the strength of the NISP-MNE relationship, as is the case for ribs (linear: R² = 0.58, p < 0.0001; power: R² = 0.74, p < 0.0001), cervical vertebrae other than the atlas or axis (linear: R² = 0.59, p < 0.0001; power: R² = 0.88, p < 0.0001), and thoracic vertebrae (linear: R² = 0.61, p < 0.0001; power: R² = 0.79, p < 0.0001).

Table 2 Best-fit relationships between NISP and MNE data for a sample of Paleolithic assemblages
Fig. 1 NISP-MNE relationships for classes of long bones in a sample of Paleolithic assemblages. Only the best-fit line is shown for each class of elements (see Table 2)

Excluding large collections (NISP >50) from the correlations provides information on the numerical response of MNE to changes in NISP in small assemblages. In the small samples (Table 3, Figs. 2, 3, 4, and 5), the relationships between the two metrics are generally better described by a power function (n = 17) than by a linear function (n = 5), a difference in frequencies that is significantly different from chance (χ² = 6.5, p < 0.02). The high number of elements for which a power function provides the best-fit model appears to confirm that MNE is, as predicted, curvilinearly related to NISP in small samples. The metatarsal is the only large bone that shows a stronger fit with a linear function. However, this probably reflects sampling error, as the power function gives the best-fit model for this bone when a lower sample-size threshold is used, although the relationship is not significant (NISP ≤30; linear: R² = 0.03, p = 0.50; power: R² = 0.11, p < 0.15). The other elements that yield a better fit with a linear function (the atlas, axis, hyoid, and small tarsals) all show very minor differences between NISP and MNE estimates.
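The χ² values reported for the counts of best-fit models can be checked with a one-sample goodness-of-fit test against an equal split between the two function types. This is our reconstruction, assuming one degree of freedom and no continuity correction; for one degree of freedom the upper-tail probability equals erfc(√(χ²/2)), so no statistics library is needed.

```python
import math


def chi_square_equal_split(count_a: int, count_b: int):
    """Chi-square goodness-of-fit of two counts against a 50:50 expectation.

    Assumes df = 1 and no continuity correction.
    """
    expected = (count_a + count_b) / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # For df = 1, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    print(chi_square_equal_split(13, 8))  # all assemblages
    print(chi_square_equal_split(17, 5))  # assemblages with NISP <= 50
```

For 13 versus 8 classes the sketch gives χ² ≈ 1.19 and p ≈ 0.28, and for 17 versus 5 it gives χ² ≈ 6.55 and p ≈ 0.01, consistent with the values reported in the text.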

Table 3 Best-fit relationships between NISP and MNE data for a sample of Paleolithic assemblages with NISP size ≤50
Fig. 2 NISP-MNE relationships for classes of long bones in a sample of Paleolithic assemblages with NISP sample size ≤50. Only the best-fit line is shown for each class of elements (see Table 3)

Fig. 3 NISP-MNE relationships for six classes of skeletal elements in a sample of Paleolithic assemblages with NISP sample size ≤50. Only the best-fit line is shown for each class of elements (see Table 3)

Fig. 4 NISP-MNE relationships for elements of the spine in a sample of Paleolithic assemblages with NISP sample size ≤50. Only the best-fit line is shown for each class of elements (see Table 3)

Fig. 5 NISP-MNE relationships for six classes of short/small elements in a sample of Paleolithic assemblages with NISP sample size ≤50. Only the best-fit line is shown for each class of elements (see Table 3)

The above-mentioned findings mean that MNE tends to inflate the representation of poorly represented elements, similar to the way in which MNI exaggerates the representation of individuals in small assemblages (Ducos 1968; Casteel 1977; Grayson 1978a, b, 1984; Lyman 2008; Cannon 2013). This finding implies that the calculation of MNE values is influenced by the size of the NISP sample. The trend for MNE values to increase at a decelerating rate relative to NISP likely follows from the decreasing probability of identifying new elements as NISP increases (Lyman 2008).

Our results also indicate that MNE tallies may not increase at the same rate across classes of skeletal parts, a problem illustrated in Fig. 6. Some classes of elements, such as the cranium, mandible, hyoid, scapula, vertebrae other than the atlas and axis, ribs, innominates, and the long bones, have a low scaling exponent (i.e., the exponent of the power function), which is indicative of a strongly curvilinear relationship. Conversely, the atlas, axis, carpals, tarsals, malleolus, and phalanges show a high scaling exponent, which is indicative of a weakly curvilinear relationship. Because archaeological samples often differ widely in NISP size and skeletal composition, the fact that MNE tallies for different elements do not always scale proportionately severely constrains the usefulness of MNE as a proxy measure of skeletal abundance.
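The practical consequence of different scaling exponents can be shown with two hypothetical power functions of the form MNE = a·NISP^b. The coefficients and exponents below are invented for illustration and are not the values in Table 3. Both elements are given the same NISP in each assemblage, so any change in their MNE ratio comes from the exponents alone.

```python
def power_mne(nisp: float, coef: float, exponent: float) -> float:
    """MNE predicted by a power function MNE = coef * NISP ** exponent."""
    return coef * nisp ** exponent


# Hypothetical parameters (coefficient, exponent), not taken from Table 3
STRONGLY_CURVED = (1.0, 0.50)  # low exponent, e.g. a heavily fragmented element
WEAKLY_CURVED = (1.0, 0.95)    # high exponent, e.g. a compact short bone


def mne_ratio(nisp: float) -> float:
    """MNE of the strongly curved element divided by MNE of the weakly
    curved one, when both elements have the same NISP."""
    return power_mne(nisp, *STRONGLY_CURVED) / power_mne(nisp, *WEAKLY_CURVED)


if __name__ == "__main__":
    for nisp in (10, 100):
        print(nisp, round(mne_ratio(nisp), 3))
```

Although the NISP ratio is 1:1 in both cases, the MNE ratio falls from about 0.35 at NISP = 10 to about 0.13 at NISP = 100, so the relative abundance of the two elements expressed in MNE depends on sample size.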

Fig. 6 Plot showing the shape of the NISP-MNE curve for four skeletal elements. The data are the same as those displayed in Figs. 2, 3 and 5. Note that the scales are harmonized in the present figure

While scaling exponents shed light on the numerical response of MNE to an increase in NISP size, comparisons of the coefficients of determination may provide information about variation in the reproducibility of tallies between classes of parts. Holding the set of assemblages constant, elements that are easily and consistently counted should produce high NISP-MNE correlations, whereas a wide scatter of MNE values at comparable NISP sizes likely indicates that the abundance of the part of interest is more difficult to calculate. In the Paleolithic dataset, the small samples indicate a trend for the thoracic vertebrae, mandible, ribs, and, more particularly, the cranium and metatarsal to show only moderate relationships (R² = 0.33–0.68, power function) between NISP and MNE data. The increased data dispersion observed in these elements may reflect a problem of reproducibility of the tallies (Fig. 7). Conversely, the relationships are very strong (R² = 0.84–1.00, power function) for the cervical and lumbar vertebrae, the sacrum, innominates, and most of the small/short bones (i.e., carpals, malleolus, tarsals, phalanges). These very strong relationships suggest that methods of calculation have only a small impact on the MNE values for these parts, presumably because these parts are mostly represented (as suggested by our own experience with some of these assemblages) by relatively complete specimens or by fragments with the same landmarks. If these assumptions are correct, classes of parts are not always comparable in terms of MNE reproducibility.

Fig. 7 Plot showing the coefficients of determination (R²) for six skeletal elements. The R² values are those given in Table 3. Note that, in the sample of Paleolithic assemblages, the metatarsal is mostly represented by shaft fragments, which probably explains the low coefficients of determination obtained for this bone

In this dataset, reproducibility seems substantially lower than among the blind test participants (see Morin et al., part I, this issue). This difference is easily explained. While Paleolithic specialists use a wide variety of methods for deriving MNE values, the subjects who participated in the blind test received very specific instructions about how to tally elements, which likely considerably increased reproducibility. This strategy was deliberate as the goal of the blind test was to produce a conservative test in which counting methods were as comparable as possible between the subjects.

Yet, in actual practice, archaeozoologists often diverge greatly in the way they generate MNE counts (Lyman 1994, 2008; Marean et al. 2001; Reitz and Wing 2008). For instance, some analysts identify new elements by looking for overlap between fragments manually (e.g., Bunn and Kroll 1986) or using GIS software (e.g., Marean et al. 2001), whereas others prefer to sum fractions in specific zones of bones (Klein and Cruz-Uribe 1984). Moreover, there is much variation between authors in the consideration of criteria relating to sex, age, size, and morphological idiosyncrasies when identifying new elements. Studies have also highlighted major disparities in how faunal specialists treat long bone shaft fragments in their tallies: some consider them to be critical (Bunn 1991; Marean and Kim 1998; Pickering et al. 2003; Cleghorn and Marean 2004), while others believe they can be safely ignored (Klein et al. 1999; Stiner 2002). Thus, results obtained with MNE in real archaeological applications are much more variable than suggested by the blind test. However, our experiments indicate that the problem does not necessarily lie in the measure itself, but in a lack of consensus about how MNE values should be calculated in archaeological contexts.

Other problems have been identified with MNE. Estimates based on the minimum number concept suffer from the notorious problem of aggregation (Paaver 1958 [cited in Casteel 1977]; Grayson 1973, 1979, 1984; Lyman 1994, 2008). Indeed, the fact that MNE values need to be recalculated after a new field season or when stratigraphic and/or spatial units are modified at a site is a major drawback as it slows the process of analysis and publication. Moreover, Grayson (1973, 1979, 1984) showed that different aggregation approaches applied to the same assemblage can alter the rank order of taxa at a site when tallies are based on the minimum number concept. As emphasized by Grayson (1984) and Lyman (2008), the problem of aggregation significantly undermines the value of MNE as a tallying method. This and the other issues raised earlier confirm that MNE is an imperfect measure of skeletal abundance. A different approach discussed below circumvents many of the pitfalls associated with this method.

An Alternative Metric: the Number of Distinct Elements

The many, sometimes daunting, problems associated with NISP, MNE, and MNI have led several specialists to explore alternative metrics of skeletal and taxonomic abundance (e.g., Davis 1992; Albarella and Davis 1996; Broughton 2004). Here, we discuss a different type of estimate that focuses on the abundance of specific landmarks. This metric, referred to as the Number of Distinct Elements (NDE), simply tallies the number of times a diagnostic landmark is represented in a sample of specimens attributed to the same element and taxon. In this approach, a landmark gives an NDE tally of “1” if and only if the fragment shows at least 50 % of the cortical surface of that landmark. This threshold is a safeguard against counting the same element more than once. Although developed independently, this approach is similar in design to one devised by Watson (1979) for ungulates, and to counting systems implemented by archaeomalacologists (Mason et al. 1998; Harris et al. 2015).

With respect to vertebrates, the main differences between the NDE and Watson’s “diagnostic zone” approach lie in how landmarks are defined. The landmarks used in the NDE tend to cover smaller regions of bones, include a wider range of elements and bone regions (such as shaft fragments, sesamoids, ribs, and vertebrae), and are assessed using a control cutout. Moreover, in contrast to Watson’s method, left and right sides of paired elements are not treated as separate landmarks in the NDE, although these can be distinguished in a database. Despite these differences, both approaches share the same goal of tallying distinct elements with the aid of a pre-determined list of landmarks.

An example will illustrate how NDE estimates are generated. In a sample of five red deer mandible specimens, three left and one right fragment give an NDE of 4 for the mandibular condyle because this landmark is mostly or fully represented on all four remains (Fig. 8). This tally excludes the fifth fragment, which only comprises a small fraction of the landmark (the rightmost specimen in Fig. 8). This specimen is ignored in the NDE approach because future samples from the same archaeological context may comprise the associated fragment that contains most of the landmark. Indeed, as pointed out by Watson (1979, pp. 129–130), because the aim is to avoid counting the same element twice, one must: “reject any piece that does not have more than half the zone present.” Importantly, to reduce subjectivity and increase standardization, a small square cutout (shown as black squares in Fig. 8) provides a control for assessing whether at least 50 % of the cortical surface of the landmark is preserved. Concerning foramina, the NDE count only includes specimens that comprise at least half of the external circumference of the foramen.
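The counting rule can be expressed as a short sketch. In practice the 50 % judgment is made visually with the control cutout; here we assume, purely for illustration, that each specimen's preserved share of the landmark's cortical surface (or, for a foramen, of its external circumference) has been recorded as a fraction. The specimen IDs, the side of the excluded fragment, and all fractions are hypothetical values chosen to mirror the Fig. 8 example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Specimen:
    specimen_id: str
    side: str        # "L" or "R"
    coverage: float  # preserved share of the landmark (0.0 to 1.0), hypothetical


def nde(specimens, threshold=0.5):
    """NDE for one landmark: specimens preserving at least half of it."""
    return sum(1 for s in specimens if s.coverage >= threshold)


def nde_by_side(specimens, threshold=0.5):
    """Same tally broken down by side, for database reporting."""
    return Counter(s.side for s in specimens if s.coverage >= threshold)


# Hypothetical mandibular-condyle data patterned on Fig. 8
condyle = [
    Specimen("M1", "L", 1.0),
    Specimen("M2", "L", 0.9),
    Specimen("M3", "L", 0.6),
    Specimen("M4", "R", 0.8),
    Specimen("M5", "R", 0.2),  # small fraction of the landmark: not counted
]

if __name__ == "__main__":
    print(nde(condyle), dict(nde_by_side(condyle)))
```

As in the worked example, the tally is 4 (three left, one right), and the fifth fragment is left out.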

Fig. 8 Calculation of NDE values in a sample comprising five red deer mandible specimens. Because the rightmost specimen shows less than 50 % of the cortical surface of the landmark, it was not included in the total NDE. The control cutout shown here has an internal side of 15 mm

We note that the size of the cutout may need to be adapted to the size of the species. For small- (<50 kg), moderate- (50–250 kg) and large-sized (>250 kg) species, we suggest using control cutouts with an internal side of 10, 15, and 25 mm, respectively. We also note that, for purposes of standardization, the cutout is invariably of the same shape (a square) and dimensions for a given body size class and is preferably made of flexible material (e.g., fabric, plastic wrapping) so that it can be placed on the surface of bones with different morphology. In all cases, the landmark must lie at the center of the cutout.
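The size classes above map onto a simple lookup. The text does not say how a body mass falling exactly on a class boundary should be treated; the sketch assumes, following the “50–250 kg” range, that both 50 kg and 250 kg belong to the moderate class.

```python
def cutout_side_mm(body_mass_kg: float) -> int:
    """Internal side (mm) of the square control cutout for a body-size class.

    Boundary handling (50 and 250 kg both treated as 'moderate') is an
    assumption; the classes are given as <50, 50-250 and >250 kg.
    """
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    if body_mass_kg < 50:
        return 10  # small species
    if body_mass_kg <= 250:
        return 15  # moderate-sized species
    return 25      # large species


if __name__ == "__main__":
    # hypothetical body masses in kg
    for mass in (20, 150, 600):
        print(mass, cutout_side_mm(mass))
```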

For cervids and bovids, the NDE is based on 87 landmarks covering all classes of skeletal elements (Table 4). This list includes many “point-specific” landmarks that are likely to occur even on small fragments, such as nutrient foramina, fossae, eminences, the base of processes, and small articulatory surfaces. To ensure consistent coverage of all portions of the long bones, four landmarks were selected for each class of long bones: one for each epiphysis and one for each of the shaft halves (Fig. 9). For cervids, we include a fifth landmark—length of the anterior groove in millimeters (Castel 1999)—for the metatarsal and metacarpal because there are no point-specific landmarks on the shaft of these bones, but only long homogeneous diagnostic zones (Fig. 10). To derive an NDE value for the metatarsal and metacarpal, the summed measurements obtained for this landmark are divided by the known length of a metatarsal or metacarpal similar in dimensions to those reconstructed for the assemblage. For skeletal elements other than long bones (Fig. 11), between one and three landmarks were selected, generally as a function of the size of the bone.
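The metapodial calculation reduces to a ratio: the summed groove lengths divided by the length of a reference bone. In the sketch below, the groove lengths and the reference length are hypothetical, and the result is left unrounded because the text does not state a rounding rule.

```python
def groove_nde(groove_lengths_mm, reference_length_mm):
    """NDE for the metapodial anterior-groove landmark (cervids).

    Sums the groove lengths measured on all fragments and divides by the
    length of a complete metapodial similar in size to those in the sample.
    """
    if reference_length_mm <= 0:
        raise ValueError("reference length must be positive")
    return sum(groove_lengths_mm) / reference_length_mm


if __name__ == "__main__":
    # Hypothetical metatarsal fragments (mm) and reference bone length (mm)
    fragments = [120, 85, 60, 95]
    print(groove_nde(fragments, 250))  # 360 / 250 = 1.44
```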

Table 4 Landmarks that form the basis of the NDE for cervids and bovids
Fig. 9 Location of landmarks for cervid and bovid long bones. The landmarks are described in Table 4. All images of elements are from the left side. The bones were drawn by François Lacrampe-Cuyaubère, Archéosphère

Fig. 10 Method used to calculate measurement length in mm for the metatarsal and metacarpal. See text for explanation

Fig. 11 Location of landmarks for cervid and bovid elements other than long bones. The landmarks are described in Table 4. Images of all paired elements are from the left side. The bones were drawn by François Lacrampe-Cuyaubère, Archéosphère

Landmarks were generally chosen on the basis of their likelihood of being identified in highly fragmented assemblages, as well as our experience with archaeological material. It is important to note that, when calculating the total for a class of elements with multiple landmarks, only the NDE value from the most common landmark is retained. This is crucial as it ensures that the tallies are truly independent at the level of the element. To illustrate this point, two examples are given in Fig. 12. In this figure, the NDE for the tibia is “6,” and that for the mandible is “5.” Having multiple landmarks is also useful because they allow for intra-element comparisons of patterns of representation.
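The rule for combining landmarks amounts to taking the maximum, never the sum, of the landmark counts for an element. The landmark labels and counts below are hypothetical (they are not the Table 4 identifiers), chosen so that the totals match the values quoted for Fig. 12.

```python
def element_nde(landmark_counts: dict) -> int:
    """Element-level NDE: the count of its most common landmark.

    Summing the landmarks would count an element more than once whenever a
    specimen preserves several landmarks, so only the maximum is kept.
    """
    return max(landmark_counts.values(), default=0)


# Hypothetical landmark tallies for two elements
tibia = {"proximal epiphysis": 2, "proximal shaft": 6,
         "distal shaft": 4, "distal epiphysis": 5}
mandible = {"condyle": 5, "mental foramen": 3}

if __name__ == "__main__":
    print(element_nde(tibia), element_nde(mandible))
```

Keeping the full per-landmark dictionary, rather than only the element total, preserves the intra-element comparisons mentioned above.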

Fig. 12 Example of calculation of NDE estimates for the tibia and mandible

In addition to the study of skeletal abundances, the NDE can be used as a proxy measure of taxonomic representation. However, NDE tallies will normally need to be adjusted because taxa differ in the number of times a given element occurs in the skeleton. In a manner reminiscent of the MAU routine (Binford 1984), normed NDE (NNDE for short) values are derived by dividing the NDE count for an element by the abundance of the same element in a living animal. The NNDE values can then be summed (ΣNNDE) to obtain a total standardized element count for each taxon. The NDE is thus a flexible measure, as it can be used to estimate both skeletal and taxonomic abundances.
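Normalization and summation can be sketched as follows. The NDE counts are hypothetical, and the per-skeleton frequencies are illustrative values for a cervid, assuming that left and right sides are pooled (as in the NDE) and that only the main digits are counted for the first phalanx.

```python
def nnde(nde_counts: dict, per_skeleton: dict) -> dict:
    """Normed NDE: NDE divided by how often the element occurs in one animal."""
    return {element: count / per_skeleton[element]
            for element, count in nde_counts.items()}


# Hypothetical NDE counts for one taxon
nde_counts = {"mandible": 5, "tibia": 6, "first phalanx": 12, "atlas": 1}
# Illustrative element frequencies per skeleton (sides pooled)
per_skeleton = {"mandible": 2, "tibia": 2, "first phalanx": 8, "atlas": 1}

normed = nnde(nde_counts, per_skeleton)
total = sum(normed.values())  # ΣNNDE for the taxon

if __name__ == "__main__":
    print(normed, total)
```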

Comparisons of the NDE and MNE Approaches

Because it was developed late in the course of this research, the NDE approach could not be integrated into the blind test discussed in part I. Nonetheless, NDE values were derived by the first author (see Appendix Table 7) for the experimental samples presented in part I and compared with the actual numbers of elements (ANE). Unfortunately, the author’s prior knowledge of the composition of the samples and other factors (long bone specimens in one experiment were extensively refitted for another experiment) limit the power of this “test,” although efforts were made to ignore these favorable conditions during the counting procedure. The results presented below therefore provide only a preliminary assessment of the accuracy of the NDE.

Before discussing the results of this test, some information must be presented about the experimental samples that were used as a control. The two samples comprise a large number of red deer (Cervus elaphus, ΣANE = 501, samples combined) and a few cattle (Bos taurus, ΣANE = 5, samples combined) elements, mostly long bones from the former species. In the first experiment, the marrow-cracking experiment (MCE), elements were only marrow-cracked, whereas in the second, the bone grease rendering experiment (BGRE), elements were marrow-cracked and subsequently processed for bone grease. Detailed records were kept of initial abundances and specimen counts. Additional information about the samples can be found in part I.

Rank order correlations between the NDE and ANE values are very strong in the MCE and BGRE for all elements, and for the sample that excludes long bones (Table 5). As was the case with MNE (see part I), the relationship for long bones is weaker in the MCE. The data also indicate that the performance of the two metrics is similar for long bone regions (Table 6). Comparisons at the taxonomic level show no significant difference between ΣNDE and ΣANE (MCE: χ² = 0.2, p = 0.68; BGRE: χ² = 0.1, p = 0.72; whole bone counts, vestigial metacarpals excluded [see Table 5, note 1]; ΣNDE counts from Appendix Table 7; raw NDE values were used because the two species have the same skeletal frequencies for the elements considered). These observations suggest that the NDE is as robust as MNE for estimating skeletal, and possibly taxonomic, abundances.
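The rank-order comparisons rely on Spearman's ρ. A dependency-free sketch is given below: it ranks both variables, assigning tied values their average rank, and computes Pearson's correlation on the ranks. It returns ρ only; the significance tests reported in Tables 5 and 6 would need an additional step (for example, SciPy's `scipy.stats.spearmanr` also returns a p-value). The paired values in the usage line are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (undefined if either variable is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation between two paired lists of counts."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical NDE and ANE values for five elements
    print(round(spearman_rho([45, 30, 12, 8, 8], [50, 32, 15, 10, 6]), 3))
```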

Table 5 Spearman’s rank correlation tests of accuracy for MNE versus NDE
Table 6 Spearman’s rank correlation tests of accuracy in estimates of skeletal abundances for bone regions comparing MNE with NDE

Advantages and Limitations of the NDE Approach

In addition to yielding promising experimental results, NDE has five advantages over MNE. These include the following:

  1. NDE counts are more easily calculated than MNE counts.
  2. The measure is inherently more standardized than the MNE method.
  3. NDE values are expected to increase linearly with NISP sample size.
  4. The NDE approach does not suffer from the problem of aggregation.
  5. NDE counts facilitate comparisons with recent approaches to calculating mollusk abundance.

The first point is relatively straightforward. The NDE approach eliminates the time-consuming task of spreading/drawing the material and looking for overlaps between specimens (see Marean et al. [2001] for a review of overlap methods with MNE). With the NDE, the tallying process simply involves identifying a diagnostic landmark and tabulating it. Thus, NDE totals can be generated at any point during analysis, for instance, by counting the number of times a given landmark is recorded as present in a spreadsheet column. In the example provided in Fig. 13, tallies are easily calculated: the NDE for landmark #18 (femur, Fovea capitis) is “3,” that for landmark #22 (femur, Sulcus extensorius) is “2,” and that for landmark #23 (femur, foramen) is “1.” If needed, the side of a paired element can be identified by adding an “L” (for left) or an “R” (for right) next to the landmark number in the database. Moreover, because specimens counted by a single NDE landmark are, by definition, from distinct elements (each represents >50 % of the landmark), the problem of interdependence seen in NISP is also avoided.
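Tallying from a spreadsheet column can be sketched as below. The cell format is our assumption: a landmark number optionally followed by “L” or “R”, several landmarks on one specimen separated by a semicolon, and an empty cell for a specimen without a qualifying landmark (which still counts toward NISP). The rows are hypothetical but reproduce the totals given for Fig. 13.

```python
from collections import Counter


def tally_landmarks(cells):
    """Return (NDE per landmark, NDE per landmark and side) from column cells."""
    totals, by_side = Counter(), Counter()
    for cell in cells:
        for entry in (cell or "").split(";"):
            entry = entry.strip()
            if not entry:
                continue  # specimen without a qualifying landmark
            side = entry[-1] if entry[-1] in ("L", "R") else None
            number = int(entry[:-1] if side else entry)
            totals[number] += 1
            if side:
                by_side[(number, side)] += 1
    return totals, by_side


# Hypothetical femur rows: one specimen preserves landmarks #22 and #23
column = ["18L", "18L", "18R", "22L;23L", "22L", ""]
nde_totals, nde_sides = tally_landmarks(column)
nisp = len(column)  # every identified specimen counts toward NISP

if __name__ == "__main__":
    print(dict(nde_totals), dict(nde_sides), nisp)
```

Because NDE and NISP are both read from the same table, either total can be regenerated at any point during analysis.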

Fig. 13 An example showing how the NDE can be calculated from a spreadsheet. Numbers in the “NDE” column correspond to the landmark identification numbers in Table 4. For each identified specimen, a landmark ID number is entered only when at least half of the landmark is present. In this figure, the NDE totals for landmarks #18, #22, and #23 are 3 (2 lefts and 1 right), 2 (2 lefts), and 1 (1 left), respectively. Note that one of the specimens shows two landmarks; both are therefore tabulated. Alternatively, the information can be recorded on two separate lines with an additional column indicating the specimen ID, which may be more suitable for many databases

The second advantage of NDE follows from the first. As discussed earlier, methods of MNE calculation are notoriously variable. Common sources of disagreement include whether specimens should be tallied as fractions or integers and whether long bone shaft fragments and criteria of sex, age, and size should be considered during the counting process (Klein and Cruz-Uribe 1984; Bunn 1991; Lyman 1994, 2008; Marean et al. 2001; Reitz and Wing 2008). In comparison, the NDE approach allows for greater standardization because the method focuses on a set of precise and invariable landmarks that are counted as integers. Moreover, criteria of sex, age, and size are not pertinent to the calculation of NDE values, and therefore, are ignored. The use of a control cutout also contributes to reducing subjectivity. For these reasons, the approach should produce tallies that are more comparable between specialists and samples than those based on MNE.

The third advantage of NDE is that it should not inflate the representation of rare elements. NDE values are expected to increase linearly with NISP because the probability of identifying a new element is independent of previous identifications. This is unlike MNE, in which identifications of new elements are linked, so that tallies increase at a decelerating rate relative to NISP (Lyman 2008). However, analyses of large samples of paired NISP-NDE data will be required to verify these inferences.

A fourth advantage revolves around the issue of aggregation. Grayson (1973, 1979, 1984) pointed out that MNE (and MNI) values are not additive because skeletal elements are unlikely to be identically distributed across all excavation units. Indeed, due to the vagaries of sampling, the most common element in an excavation unit is likely to differ from the most common element in another or larger excavation unit (Ducos 1968; Grayson 1984). Figure 14 illustrates this difficulty by showing the same set of metatarsal specimens in two hypothetical situations: a floor plan consisting of two houses and a stratigraphy composed of two layers. In the house example, the total of the MNE values (4 + 2 = 6) is identical to the actual MNE value (“6”) for the aggregated houses because all the tallies were derived from the same landmark (identified by a polygon). In contrast, in the stratigraphic example, the MNE for layer A was derived from the landmark marked by a circle, whereas that for layer B was derived from a different landmark identified by a square. To complicate things even further, the MNE for the sample that combines the two layers was derived from a third landmark (marked by a polygon). Because the most abundant landmark differs between these three units, it is not surprising to find that the actual value for the aggregated layers (MNE = 6) conflicts with the total (4 + 4 = 8) of the MNE values for layers A and B.

Fig. 14 Example showing how MNE values for the same set of bones differ in two hypothetical situations: a floor plan with two houses and a stratigraphy comprising two layers. In the uppermost example, the total for the two houses (MNE = 6) is consistent with the actual value (MNE = 6) because the same landmark (marked by a polygon) was used. In the lowermost example, the total (MNE = 8) and the actual value (MNE = 6) conflict because different landmarks were used in the derivation of the tallies

Unlike MNE or MNI, the landmarks that provide the estimates in the NDE approach are constant and do not vary with the composition of the sample. The use of constant landmarks means that aggregation has no effect on the calculation of NDE counts. Figure 15 demonstrates this point by using the same set of metatarsal specimens shown in Fig. 14. In both the floor plan and stratigraphic examples, the NDE approach gives identical values (“5”) for the aggregated samples. This aspect of the NDE is critical, as it indicates that counts for a given landmark can, like NISP tallies, be added indefinitely as long as the criteria of identification remain unchanged and data are reported for all landmarks. Counts may not be fully additive if only published for whole bones, as the most common landmark may differ between excavation units for elements with multiple landmarks. Systematic reporting of all landmark data simply and effectively eliminates this potential problem. In avoiding the problem of aggregation, the NDE approach represents a marked improvement relative to the MNE.
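The contrast between additive landmark counts and non-additive element-level values can be reproduced with a toy example. The two layers and the three landmarks (X, Y, Z) are hypothetical, not the specimens of Figs. 14 and 15, and the element-level value is simplified to the count of the most common landmark in each unit, mimicking how the MNE values in Fig. 14 were derived from different landmarks.

```python
from collections import Counter


def landmark_counts(specimens):
    """Per-landmark tallies; a specimen preserving several landmarks counts
    once for each of them."""
    counts = Counter()
    for landmarks in specimens:
        counts.update(landmarks)
    return counts


def most_common_landmark_count(specimens):
    """Element-level value taken from whichever landmark is most common."""
    return max(landmark_counts(specimens).values(), default=0)


# Hypothetical metatarsal specimens, each listed by the landmarks it preserves
layer_a = [{"X", "Z"}, {"X", "Z"}, {"X"}]
layer_b = [{"Y", "Z"}, {"Y", "Z"}, {"Y"}]
combined = layer_a + layer_b

# Element-level values are not additive: 3 + 3 = 6, but the combined value is 4
per_layer_total = (most_common_landmark_count(layer_a)
                   + most_common_landmark_count(layer_b))
combined_value = most_common_landmark_count(combined)

if __name__ == "__main__":
    print(per_layer_total, combined_value)
    # Landmark tallies are additive: layer tallies sum to the combined tallies
    print(landmark_counts(layer_a) + landmark_counts(layer_b)
          == landmark_counts(combined))
```

Because the per-landmark tallies add up exactly, any element-level value can be recomputed after units are merged or redefined, provided all landmark tallies are reported.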

Fig. 15

Two examples involving the same set of bones and the same methods of aggregation shown in Fig. 14, but this time calculated with the NDE. Note that the values derived in the two examples are consistent
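The difference between landmark-level and whole-element additivity can be sketched as follows. The sketch assumes, as the paragraph above implies, that a whole-element value is read from the element's best-represented landmark; the landmark names, the counts, and the function `element_nde` are hypothetical.

```python
from collections import Counter


def element_nde(landmark_counts: Counter) -> int:
    """Whole-element value read from the best-represented landmark (assumption)."""
    return max(landmark_counts.values(), default=0)


# Hypothetical NDE tallies for one element with two landmarks in two layers.
layer_a = Counter({"landmark_1": 3, "landmark_2": 1})
layer_b = Counter({"landmark_1": 1, "landmark_2": 3})

# Landmark-level counts are additive: pooling the raw tallies is a plain sum.
pooled = layer_a + layer_b
print(dict(pooled))

# Summing published whole-element values is not safe when the best-represented
# landmark differs between units; recomputing from the pooled tallies is.
print(element_nde(layer_a) + element_nde(layer_b), element_nde(pooled))
```

Here the summed whole-element values give 6, while the value recomputed from the pooled landmark tallies is 4, which is why reporting raw counts for every landmark matters.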

The fifth advantage of NDE relates to contexts where counts for vertebrates need to be compared with those for molluscs, a group of animals with a simplified exoskeleton (Claassen 1998). Recently, archaeomalacologists have turned to new methods for estimating the abundance of molluscan taxa using predetermined “non-repetitive elements” (NRE) that provide the basis for the calculation of MNI values (Giovas 2009; Harris et al. 2015; Thomas and Mannino 2016). Of particular importance in this respect is Harris et al.’s (2015) extensive study aimed at standardizing NRE methods of calculation for a wide range of diagnostic landmarks, including hinges and umbos in bivalves and apices in univalves. In their approach, NRE-based MNI counts differ conceptually from vertebrate MNI because they circumvent the problem of aggregation by tallying mutually exclusive specimens using predetermined landmarks. This fundamental characteristic means that the NRE-based MNI of Harris et al. (2015) is conceptually similar to the NDE. For this reason, the two approaches should enable sounder comparisons of vertebrate and invertebrate tallies than previous counting methods.

The above-mentioned features and the simplicity of the method make the NDE a valuable substitute for traditional metrics such as MNE and MNI. For an experienced analyst, becoming familiar with the NDE takes no more than two or three hours of practice. Although the number of landmarks that one must pay attention to is relatively large, most of them are probably well known to archaeozoologists. Importantly, a test of the approach with Paleolithic material has shown that recording NDE only marginally increases analysis time relative to tallying NISP alone and is considerably more efficient than methods commonly used for tabulating MNE. One current disadvantage of the NDE approach is that the landmarks in Table 4 are only relevant for cervids and most bovids. We hope in the near future to publish lists of landmarks for a large number of taxonomic groups (e.g., equids, suids, carnivores). Another limitation of the method is that it may not be suitable when estimates need to be generated for bone portions that do not include an NDE landmark.

Discussion

As discussed in part I, MNE seems to avoid many biases inherent in NISP. However, comparisons of NISP-MNE data within classes of skeletal parts show a clear tendency for the relationships to be curvilinear. The degree of curvilinearity seems to be explained mostly by fragmentation: in general, the more fragmented an element, the more curvilinear the relationship. This observation probably explains why rodent elements, which are more frequently recovered complete than ungulate elements, tend toward linearity in NISP-MNI comparisons (Grayson 1984; Lyman 2003, 2008).
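One simple way to quantify such curvilinearity, by analogy with the power function that Ducos (1968) documented for MNI, is to fit MNE = a·NISP^b on log-transformed values: an exponent b close to 1 indicates a linear relationship, whereas b < 1 means that MNE grows ever more slowly as NISP increases. The sketch below is a minimal ordinary least-squares fit; the function name and the example values are hypothetical, and it is not presented as the procedure used in the analyses reported here.

```python
import math


def fit_power_law(nisp, mne):
    """Fit MNE = a * NISP**b by least squares on log-transformed values.

    All counts must be positive because the logarithm of zero is undefined.
    Returns the tuple (a, b).
    """
    xs = [math.log(x) for x in nisp]
    ys = [math.log(y) for y in mne]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Hypothetical paired counts generated from MNE = NISP**0.5.
nisp = [1, 4, 16, 64, 256]
mne = [1, 2, 4, 8, 16]
a, b = fit_power_law(nisp, mne)
print(f"a = {a:.2f}, b = {b:.2f}")
```

For these constructed values the fit recovers a = 1.00 and b = 0.50; with real assemblage data the exponent would of course carry sampling error.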

The curvilinear relationships observed in the Paleolithic sample mean that MNE, like MNI, tends to inflate the representation of rare elements. Therefore, derived metrics such as NISP/MNE ratios (or their converse, MNE/NISP ratios) are unlikely to be reliable proxies of fragmentation because they measure, among other factors, the size of the NISP sample. In fact, any ratio that incorporates an estimate based on the minimum number concept will be affected by this problem, because built into this approach is a declining probability of identifying new units as NISP increases (Lyman 2008). Viable alternatives for the study of fragmentation include mean fragment length and mean specimen area (Lyman and O’Brien 1987, pp. 494–497; Morin 2012, pp. 95–98; Cannon 2013, pp. 413–416).
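The sample-size effect on NISP/MNE ratios can be illustrated with an idealized sampling model, which is an assumption made for illustration rather than a procedure from the analyses above: 100 elements are each broken into 5 identifiable fragments, fragments are drawn at random without replacement, and every fragment can be attributed unambiguously to its parent element, so the expected number of distinct elements represented stands in for MNE. Because fragmentation is held constant, any change in the ratio reflects sample size alone. The function name `expected_distinct` is hypothetical.

```python
from math import comb


def expected_distinct(n_elements, frags_per_element, n_sampled):
    """Expected number of distinct elements represented among n_sampled fragments
    drawn at random, without replacement, from n_elements bones that were each
    broken into frags_per_element identifiable fragments (idealized model)."""
    total = n_elements * frags_per_element
    # Probability that one particular element contributes none of the sampled fragments.
    p_absent = comb(total - frags_per_element, n_sampled) / comb(total, n_sampled)
    return n_elements * (1 - p_absent)


# Fragmentation is held constant (5 fragments per element); only NISP changes.
ratios = []
for nisp in (10, 50, 100, 200, 400):
    mne_like = expected_distinct(100, 5, nisp)
    ratios.append(nisp / mne_like)
    print(f"NISP = {nisp:3d}  expected elements = {mne_like:5.1f}  "
          f"NISP/MNE = {nisp / mne_like:.2f}")
```

Under this model the ratio climbs from just above 1 at NISP = 10 to about 4 at NISP = 400, even though every element was broken in exactly the same way.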

Problems identified in the preceding analyses led to the proposal of a substitute counting method, the NDE. Like Watson’s (1979) diagnostic zone approach, the NDE is a simpler and more productive alternative to metrics commonly used in archaeozoological analysis. Indeed, unlike MNE, NDE should increase linearly with NISP and is not affected by the problem of aggregation. Moreover, NDE is additive; counts can be derived and re-derived almost instantaneously as new identifications are entered into a database. Furthermore, comparisons can more readily be made with mollusc NRE-based MNI counts. These advantages mean that NDE values are more appropriate for statistical treatment than values generated with other approaches such as MNE or MNI. Importantly, to facilitate intra- and inter-site comparisons, we encourage archaeozoologists to publish raw NDE data for all observed landmarks. Systematic reporting is essential here because it permits analysts to calculate and recalculate NDE values for whole elements using any landmark.

One may object that the NDE approach gives values that tend to be slightly smaller than MNE, and that an element known to be present in the sample may become analytically absent if none of its specimens preserves at least 50 % of an NDE landmark. These observations are correct and echo points made in the archaeomalacological literature about some NRE-based methods of calculation (Mason et al. 1998, 2000; Giovas 2009; Gutiérrez Zugasti 2011; Harris et al. 2015; Thomas and Mannino 2016). However, generating large (or larger) values is no guarantee of accuracy, as NISP clearly illustrates. In fact, as pointed out by Ducos (1968) nearly 50 years ago, what is critical is not the magnitude of the values but their proportionality to the actual numbers of elements, individuals, and species in a sample. The experimental results seem to confirm that NDE counts are proportional to known abundances.
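Ducos's point about proportionality can be made concrete with a small calculation on hypothetical counts (not experimental data): an estimate that is uniformly half the true value reproduces the true relative abundances exactly, whereas a larger estimate that is not proportional distorts them, here by inflating the rarest element. The element names, numbers, and the helper `proportions` are illustrative only.

```python
def proportions(counts):
    """Convert raw counts into each category's share of the total."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


# Hypothetical true abundances and two estimates of them.
true_counts = {"femur": 20, "tibia": 10, "radius": 6}
smaller = {key: value // 2 for key, value in true_counts.items()}  # uniformly halved
larger = {"femur": 22, "tibia": 14, "radius": 11}  # bigger, but not proportional

for name, counts in (("true", true_counts), ("smaller", smaller), ("larger", larger)):
    shares = proportions(counts)
    print(name, {key: round(share, 3) for key, share in shares.items()})
```

The halved estimate yields the same shares as the true counts, while the larger estimate raises the share of the radius at the expense of the femur.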

In a related vein, it should be noted that attempts to increase MNE (or MNI) tallies through matching do not increase accuracy because the success of this exercise is highly dependent on the characteristics and, perhaps more importantly, the size of the sample (Watson 1979; Klein and Cruz-Uribe 1984; Lyman 2006). Indeed, identifying individuals or elements using sex, age, size, and similar criteria is easy with three specimens, a bit more challenging with 10 fragments and becomes complicated with 20 or more remains. Furthermore, matching teeth is much easier than matching long bone shaft specimens or scapula fragments, which introduces another source of bias. Consequently, trying to derive the largest MNE or MNI from a sample in order to get closer to the “real” abundances is a trap because this procedure exaggerates the curvilinear relationship between MNE and NISP. For this reason, we believe this practice should be discontinued.
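One way to see why matching becomes harder as samples grow: if each specimen of an element had to be checked against every other one (a simplifying assumption; in practice analysts first sort specimens by side, age, and size), the number of pairwise comparisons grows as n(n − 1)/2, that is, roughly with the square of the number of specimens. The function name below is hypothetical.

```python
from math import comb


def pairwise_comparisons(n_specimens: int) -> int:
    """Number of distinct specimen pairs, n * (n - 1) / 2."""
    return comb(n_specimens, 2)


for n in (3, 10, 20, 50):
    print(n, pairwise_comparisons(n))
```

This gives 3, 45, 190, and 1,225 comparisons respectively, so moving from 10 to 20 specimens roughly quadruples the work.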

For similar reasons, and despite relatively encouraging experimental results (Hudson 1990; Domínguez-Rodrigo 2012), MNI is probably a poor proxy measure of taxonomic abundance. Its curvilinear relationship to NISP is known to influence the representation of rare taxa in small samples, an issue of particular relevance in regions such as the tropics, where collections frequently comprise a wide range of species represented by few specimens (see Emery [2008] and Boileau [2014] for examples). Another problem with this measure is the considerable variation between analysts in the way MNI counts are generated (Payne 1972; Watson 1979; Grayson 1984; Lyman 2008; Reitz and Wing 2008). Although ΣMNE (see part I) and ΣNDE counts are rarely used to quantify taxonomic composition, the experimental samples show that they produce more accurate estimates of species representation than NISP. Indeed, summing element counts by taxon seems a more effective way of interpreting variation in taxonomic composition than counting numbers of specimens or minimum numbers of individuals.
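Summing element counts by taxon is simple arithmetic, sketched here with hypothetical whole-element NDE values for two taxa; the taxa, elements, numbers, and the helper `sum_by_taxon` are illustrative only. ΣNDE is taken here as the plain sum across a taxon's elements; the normed version is not covered.

```python
from collections import defaultdict

# Hypothetical whole-element NDE values, keyed by (taxon, element).
nde = {
    ("Rangifer", "femur"): 12,
    ("Rangifer", "tibia"): 15,
    ("Bison", "femur"): 4,
    ("Bison", "tibia"): 6,
}


def sum_by_taxon(element_counts):
    """Sum element-level counts within each taxon."""
    totals = defaultdict(int)
    for (taxon, _element), count in element_counts.items():
        totals[taxon] += count
    return dict(totals)


totals = sum_by_taxon(nde)
grand_total = sum(totals.values())
for taxon, total in totals.items():
    print(taxon, total, f"{100 * total / grand_total:.1f} %")
```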

Because they have fewer limitations, the NDE approach and its derivatives (ΣNDE or ΣNNDE, depending on whether the values need to be normed) seem to represent improvements over MNE and MNI. However, the NDE was not designed to replace NISP, a more fundamental measure. Given that NDE and NISP are partly complementary, more robust interpretations of skeletal and taxonomic abundances can potentially be achieved by using them in concert.

Conclusion

The MNE approach has often been received with mixed enthusiasm, in large part because of the many problems identified with it. Foremost among these are a lack of standardization in calculation methods and the non-additivity of the values. Moreover, our results show that the MNE approach inflates the abundance of rare elements, particularly at small sample sizes. Its non-linear relationship with NISP means that MNE is a less than ideal measure of abundance.

The NDE emerges as an improvement over the MNE approach because it largely avoids these problems. In addition to yielding positive experimental results, this metric produces mutually independent values that are presumably unbiased by sample size and methods of aggregation. Moreover, by eliminating the complicated step of comparing and/or drawing specimens to identify zones of overlap, the NDE approach yields significant time savings, particularly in large assemblages. Thus, the NDE approach represents a step toward more robust interpretations of past skeletal and taxonomic abundances.