Introduction

In the last few decades, phytolith analysis has become an essential tool for the study of past and present vegetation and for the identification of past plant resource management and consumption. Phytolith research in archaeology, palaeoecology and geology has intensified substantially since the 1970s (see Cabanes 2008: 40–41; Madella et al. 2013a: 1). International conferences, as well as specialised sessions and workshops on phytolith studies, are currently common, showing that phytolith analysis is a mature discipline (see Albert and Madella 2009; Madella and Zurro 2007; Madella et al. 2013b; Meunier and Colin 2001; Pinilla et al. 1997; among others).

The state of the art of phytolith studies shows that there is a lack of balance between the different steps of the research process and that, despite the steep growth, phytolith studies have not yet reached a point where we can think about a real standardisation of all the aspects involved.

Certain areas, such as laboratory procedures (e.g. Calegari et al. 2013a; Lentfer and Boyd 1998, 1999; Parr 2002; Zhao and Pearsall 1998; Zucol and Osterrieth 2002), the production of reference collections (e.g. Albert et al. 2007; Carnelli et al. 2001; Whang et al. 1998), taxonomical and botanical identification (e.g. Ball et al. 1996, 1999; Bozarth 1990; Ollendorf et al. 1988; Piperno 1988, 2006; Twiss 1992) or the understanding of the silicification physiology (e.g. Bennet and Parry 1980; Morikawa and Saigusa 2004; Sangster et al. 1983), have been well explored. However, other issues have been less well researched: sampling, counting procedures and interpretation criteria are among them (Zurro et al. 2016). One of the reasons that may explain this disparity might be that while extraction procedures or reference collection development are common to several disciplines that apply phytolith analysis, other issues are discipline oriented. Therefore, they require specific development in each field.

Human activities produce specific inputs in the archaeological record. These inputs comprise different levels of anthropisation that range from the changes in soils due to agricultural practices to the occupation layers we find in several archaeological sites. Hence, the challenge of archaeobotanical research is to discriminate the specific human actions within an anthropically modified environment. While the study of arable fields or landscapes can address palaeovegetation reconstruction or climate change, the analyses of occupation layers or intrasite analyses are essentially the focus of interest for archaeological studies and are less often shared with other disciplines.

Experimental archaeology and ethnoarchaeology have been systematically used to construct methodological tools to aid the evaluation of these inputs in order to identify production and consumption processes associated with plant materials. In this respect, the main questions are: do these anthropic phytolith assemblages differ from natural ones? And if so, can we associate particular assemblages with the specific activities that produced them? Recent research in this area is encouraging. These issues have been approached in a few studies focusing on work investment (Zurro 2006), recognition of specific agricultural by-products (Harvey and Fuller 2005) or the creation of interpretative indices and markers developed from ethnoarchaeology (Albert et al. 2008; Novello and Barboni 2015; Rondelli et al. 2014; Tsartsidou et al. 2008; Zurro 2011). Within the field of palaeoecological studies, several indices and ratios have been developed as well, aiming at the interpretation of the phytolith assemblages for vegetation reconstruction (see Strömberg 2009: 124 and Coe et al. 2013: 69 for a review).

Considering, however, that both phytolith richness and variability characterise an assemblage, do different archaeological contexts require a specific count size too?

This paper examines count size in phytolith studies and presents the results from counting experiments carried out on material from different archaeological contexts. The study is based on the application of well-established tests to archaeological samples in order to evaluate both variability and component distribution of morphotypes within archaeological samples (Chao et al. 2005; Gray et al. 2004).

The rationale behind this research is that count size must not only be accurate and precise but also oriented towards efficiency. Indeed, there is no need for a large work investment when bigger count sizes only produce redundant data. While studies on phytolith count size have been performed in palaeoecology (i.e. semi-anthropic and natural contexts), they need to be explored archaeologically to assess whether variability and richness differ substantially in “anthropic” contexts.

Counting methods in archaeobotany

Counting methods, as subsampling methods, are concerned with the minimum number of individuals needed to retrieve a representative sample of the population under study. We maintain that the concept of “representativeness” has to be formulated in relation to specific research questions.

To produce a representative assemblage, the observations must recover diversity, which is the combination of evenness (abundance of each taxon within a sample) and richness (number of present taxa). Another factor to take into consideration is the procedure used to obtain this diversity (see, for instance, the use of Lycopodium spore tablets when counting pollen). The control of this procedure is essential to avoid biases during sample recovery or processing. Indeed, a bigger count size does not necessarily signify smaller biases.
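As an illustration of the two components of diversity, the following minimal sketch computes richness and one common evenness measure for a tallied sample. Pielou's index is used here only as an assumption for the sake of the example, since the text does not prescribe a particular evenness measure; the morphotype names and counts are hypothetical.

```python
import math

def richness(counts):
    """Number of taxa (or morphotypes) with at least one individual."""
    return sum(1 for n in counts.values() if n > 0)

def pielou_evenness(counts):
    """Shannon entropy divided by its maximum, ln(richness).

    A value of 1.0 means all taxa are equally abundant. Pielou's index is an
    assumption of this sketch, not a measure named in the text.
    """
    present = [n for n in counts.values() if n > 0]
    if len(present) < 2:
        return 0.0  # undefined for a single taxon; 0.0 is a convention here
    total = sum(present)
    shannon = -sum((n / total) * math.log(n / total) for n in present)
    return shannon / math.log(len(present))

# Hypothetical counts of 200 particles each, not data from this study
even_sample = {"type_a": 50, "type_b": 50, "type_c": 50, "type_d": 50}
skewed_sample = {"type_a": 185, "type_b": 5, "type_c": 5, "type_d": 5}

print(richness(even_sample), pielou_evenness(even_sample))
print(richness(skewed_sample), pielou_evenness(skewed_sample))
```

Both hypothetical samples have the same richness, but the second is dominated by a single morphotype and therefore has a much lower evenness.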

In archaeobotany, the accepted minimum number is very similar for different remains (see Wright 2010 for a general review), and several variability and taxonomical redundancy tests have been developed (see Lepofsky 2002; Shackleton and Prins 1992; Wright 2005) as well as other methods to check error margins (see Albert et al. 2003; Pegg and Weisstein 2013). In anthracological studies, counts can range between 200 and 500 fragments (Badal et al. 2003) or 300 and 400 (Scheel-Ybert 2002) with most researchers counting ca. 300 (Chabal 1997; Chabal et al. 1999). In pedoanthracology, 200 fragments are considered a meaningful number (Nelle et al. 2013; Dotte-Sarout et al. 2015). In carpology, ubiquity has sometimes been considered enough (Marinval 1988) but other factors such as density per unit of soil volume have been used too (Buxó 1993; Jones 1991; Van der Veen 1985). In palynology, a count of 300 individuals is considered an accepted standard (Moore et al. 1991) even though some studies count a total of 150 to 300 palynomorphs (Burjachs 1992). However, further counting is recommended for studies that need to capture rare but ecologically significant forms (see Piperno 1988), such as those in high biodiversity environments (Rull 1987). Additionally, analyses of specific species or of specific morphologies in phytolith analyses require higher count sizes so that these species or morphotypes reach an accepted count size too (Iriarte and Paz 2009).

Therefore, it does appear that for most case studies, 300 individuals are accepted as a standard count size for macrobotanical (anthracological and carpological analyses) as well as for microbotanical remains (pollen and phytoliths) in both archaeology and paleoecology.

Minimum phytolith sum in archaeological studies

The phytolith assemblage constitutes the minimum interpretative unit and it is defined as “the total tabulation and quantification in percentages, absolute numbers or ratios of all morphological variants observed in a sample” (Piperno 1988: 132). Count size must represent the diversity of a given sample, but this resulting number is the product of different steps; “(…) the last in the long series of steps of sub-sampling that occur when a soil assemblage or a plant sample is processed and analyzed for phytoliths” (Strömberg 2009: 125).

The phytolith association, on the other hand, refers to a given set of morphotypes whose nature can be explained on the basis of a specific input or event (i.e. it corresponds to a species or plant community, to a soil or is the result of anthropic activity) (Zucol 1996, 1998; Zurro 2011). Since in archaeological contexts multiple anthropic and non-anthropic inputs are mixed, it is paramount to produce reference frames, both ethnographical and plant collections, to allow the construction of correct interpretative models. Analogous to the pollen sum, the phytolith sum (PS) is the result of a counting method set in place to produce a phytolith assemblage (Horrocks et al. 2003: 15; Zurro 2011: 121).

In phytolith studies, the minimum count size has not been discussed in depth. The phytolith community has reached a kind of consensus for a count size of ca. 250 phytoliths per sample (see Piperno 2006: 115 for a general discussion on this topic). However, most publications do not explicitly state why a specific phytolith count size or PS was chosen apart from a few exceptions (e.g. Morris et al. 2009: 93; or Pearsall 2000) and works specifically devoted to counting procedures are few (see Strömberg 2009 as an exception).

Albert and Weiner (2001: 258) positively tested the significance of a count of 200 phytoliths. The test was performed with reference collection material (Quercus calliprinos Webb), which constitutes a controlled environment when compared to an archaeological sample. Other researchers carried out tests to check the statistical significance of counts (Ball et al. 1996, 1999). Strömberg, on the other hand, reviewed phytolith counting across different disciplines, providing a proposal for counting phytoliths in palaeoecological studies (Strömberg 2009).

Even though a 250 PS seems to be considered a standard minimum number of phytoliths, the approaches developed in phytolith analysis for “collecting” a significant PS are quite varied (see Table 1) and they can be summarised as

  1. Counting a minimum number of phytoliths, such as 200 (Alexandre et al. 1997: 216), 300 (Thorn 2004: 43), etc.

  2. Counting a minimum number of significant morphotypes, such as grass short cells (Barboni et al. 1999: 91–93).

  3. Counting a minimum number of phytoliths in relation to a standard. This can be:

    (a) Counting according to an internal standard, e.g. the number of phytoliths encountered when a count of 200 grass short cells has been reached.

    (b) Counting according to an external standard:

      i. Phytoliths per number of pollen aliquots or diatoms (Abrantes 2003: 168; Bracco et al. 2005: 41; Lentfer et al. 1997: 845).

      ii. Phytoliths per number of mineral grains (Golyeva 1997: 17; Benvenuto et al. 2013: 23; Bonomo et al. 2013: 36).

      iii. Phytoliths per number of other biosilicates (del Puerto et al. 2013: 92).

    (c) Phytoliths per slide’s surface, e.g. all slide surface or a number of transects or fieldviews (Albert et al. 2014: 4; Peto 2013: 154; Zurro et al. 2009: 187).

Table 1 Different count sizes proposed to produce appropriate phytolith assemblages following different types of criteria: (a) General phytolith counting, (b) Identifiable phytoliths, (c) Diagnostic phytoliths and (d) Minimum sum for specific morphotypes

Other researchers add 100 silica skeletons to the count (Miller-Rosen 2001), give a specific amount of time to invest in poor samples (Carnelli et al. 2004: 41) or divide the sample in sedimentological fractions (Mercader et al. 2000: 105; Runge 1999: 41). When counting phytoliths from reference collections, where variability is narrower, other approaches are followed. These approaches include scanning five transects per slide (Watling and Iriarte 2013: 168), producing scales of abundance (Wallis 2003: 206) or focusing on specific morphologies or species (Gu et al. 2013: 142; Iriarte 2003: 1092; Piperno et al. 2002: 10,923; Whang et al. 1998: 462).

In addition, many other researchers recommend a count size around 200–300 phytoliths (see column a in Table 1).

To summarise, two issues must be considered. First, counting methods must produce reliable data, and the quality of those data needs to be weighed against the time invested in counting (Piperno 2006: 115). Second, the presence of multiple standards makes it difficult to carry out comparisons between studies (Strömberg 2004: 244).

Materials and methods

To carry out the current work, samples from two sites of the Iberian Peninsula were chosen: El Mirón and La Bauma del Serrat del Pont (see Fig. 1 and Table 2). El Mirón (M) is a cave of the Cantabrian Cornice (Straus and González Morales 2003; Straus et al. 2001, 2002) with an anthropic occupation dating from the Palaeolithic to the Bronze Age (Peña-Chocarro et al. 2005). La Bauma del Serrat del Pont (B) is a rock shelter in northeast Catalonia with an occupation dating from the Mesolithic to pre-Roman times (Alcalde and Saña 2008; Alcalde et al. 2002). The samples, mostly from hunter-gatherer occupations, were chosen to cover the possible spectrum of depositional realities.

Fig. 1

Location of El Mirón and La Bauma del Serrat del Pont sites in the Iberian Peninsula. Image modified from the NASA image gallery (http://visibleearth.nasa.gov/view.php?id=64573; Jacques Descloitres, MODIS Rapid Response Team, NASA/GSFC)

Table 2 Samples from El Mirón (M) and La Bauma del Serrat del Pont (B) with corresponding archaeological information

Sample processing and counting

Sediment samples were processed according to Madella et al. (1998). This procedure comprises the elimination of the carbonates, deflocculation of the sample, removal of the organic matter and densimetric separation of the minerals with sodium polytungstate (SPT). The opaline silica residues obtained were mounted on microscopy slides with Entellan® or Eukitt® permanent mounting media. Samples were scanned at ×400 magnification with an Olympus BX-51 optical microscope. Phytolith concentration per gramme of original sediment was calculated by adapting accepted methodologies (Albert et al. 1999; Albert and Weiner 2001).

Furthermore, X-ray diffraction analyses on the sediments were carried out to identify any trace of diagenetic processes, in which case further analyses (Fourier transform infrared spectroscopy, FTIR) would be performed (Burton 2000; Karkanas et al. 2000). Results showed no trace of diagenetic minerals (see the whole set of mineralogical spectra in Zurro (2011)).

Counting was carried out in batches of 50 phytoliths, and a total of 31 phytolith categories were identified (see Table 3). Categories used to record phytolith variability correspond to morphotypes that could be identified without using morphometrics (based on plant tissue and phytolith morphology). No separation between poorly and well-preserved phytoliths was considered, and only isolated phytoliths were counted for the analyses (avoiding silica skeletons that were present only in part of the samples). When samples were very rich, approximately 50% of the counting was carried out along transects in the central area of the slides to avoid any potential bias due to a possible differential spatial distribution of particles. Considering that in the field of phytolith analysis a standard phytolith count is assumed to be ca. 250 particles, in this study, a higher PS was fixed (400 individuals) to test this assumption. Furthermore, in nine samples, the PS was raised to 600 phytoliths to explore behaviour of morphotype variability when increasing count size. In both cases, non-identified phytoliths were included. Once this category was eliminated after the counting, a minimum 350 PS or 550 PS was achieved.
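The effect of removing the non-identified category on the final PS can be sketched as follows. This is a minimal illustration under assumed data: the tallies are hypothetical, and the category label "undetermined" is simply a placeholder for the non-identified phytoliths mentioned above.

```python
from collections import Counter

BATCH_SIZE = 50  # batch size stated in the text

def effective_sum(batches, excluded="undetermined"):
    """Return (raw PS, PS after dropping the excluded category)."""
    total = Counter()
    for batch in batches:
        total.update(batch)  # adds the counts of each batch to the running tally
    raw = sum(total.values())
    kept = raw - total.get(excluded, 0)
    return raw, kept

# Hypothetical tallies: eight batches of 50, each with five unidentified particles
batches = [{"type_a": 30, "type_b": 15, "undetermined": 5} for _ in range(8)]
raw_ps, final_ps = effective_sum(batches)
print(raw_ps, final_ps)
```

With these assumed tallies, a raw count of 400 particles yields a final PS of 360 once the unidentified particles are excluded, which is how a nominal 400 PS can end up as a minimum of 350 PS.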

Table 3 Categories used for phytolith identification during microscopy scanning

The tables of raw counts and the R script used for performing part of the statistical analysis can be accessed via a GitHub repository (https://github.com/Dzurro/Zurro_2016JAAS).

Phytolith variability in the phytolith sums

Two techniques were used to assess sample variability: the morphotype accumulation curve (MAC) and the phytolith sum variability analysis (PSVA) (see as examples Fig. 2 for samples with a 400 PS and Fig. 3 for samples with a 600 PS).

Fig. 2

MAC and PSVA for samples M1 and B6, both with 400 PS. MACs show that at 250 PS, almost all variability has been recorded. The dendrograms show that low count sizes tend to separate from the rest and that big clusters are created between 300 PS and 400 PS

Fig. 3

MAC and PSVA for samples M6 and B9, both with a 600 PS. MACs show that at 250 PS, almost all variability has been recorded. The dendrograms show that high count sizes (400 PS or more) tend to separate from the rest. Without considering the lowest PSs, sample M6 shows two count areas with no new data: 200–400 and 450–600

In addition, a linear regression analysis was performed to test for dependence between variability and count size. For each count batch, the median variability was extracted and regressed against increasing sample size of the corresponding batch.
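The regression step can be sketched in a few lines. The following is a minimal illustration, assuming ordinary least squares and hypothetical richness values; it is not the R script used for the analysis, and it reports only the coefficient of determination, not the p value.

```python
from statistics import median

def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared

# Hypothetical morphotype richness of three samples at each cumulative count size
richness_by_count = {
    50: [6, 8, 7],
    100: [9, 11, 10],
    150: [11, 13, 12],
    200: [12, 14, 14],
    250: [13, 15, 15],
}
count_sizes = sorted(richness_by_count)
medians = [median(richness_by_count[c]) for c in count_sizes]
intercept, slope, r2 = linear_fit(count_sizes, medians)
print(slope, r2)
```

A positive slope with a high coefficient of determination indicates that median variability increases with count size over the range examined.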

Morphotype accumulation curve

Techniques for the identification of species variability originated within ecological studies on the basis of the “habitat heterogeneity” hypothesis. According to this hypothesis, different ecological niches have different characteristics and thus different flora and fauna will occupy them. Therefore, the more areas that are surveyed, the more taxonomical variability is recorded (Cramer and Willig 2005: 209). Species-area relationship (SAR) techniques (see Drakare et al. 2005) were designed to establish a relationship between two variables: the surveyed or sampled areas and the number of species found in them (usually represented as species accumulation curves; Rosindell and Cornell 2009; Zillio and He 2010).

In archaeobotany, it is also essential to recover the variability (richness) of a sample, identifying rare species that can be diagnostic of anthropic activities (on site and off site) or of ecological communities (Jones 1991; Pearsall 2000). Rare species (or morphotypes) may not have statistical weight, but they can strongly influence the interpretation of the assemblages. The species accumulation curve (SAC, see Gray et al. 2004) is an approach similar to SAR that obviates spatial information (Lyman and Ames 2007) and that is widely used in archaeobotany (Lepofsky and Lertzman 2005; Giesecke et al. 2014). In the case of SAC, new species are recorded throughout the counting procedure, producing a simple graphic that shows the variability (the number of species) with respect to the total number of recorded elements.

MACs were produced to assess assemblage variability with respect to phytoliths counted. The curves allow the identification, when present, of a stabilisation plateau. A stabilisation plateau is present when more than one 50 phytolith batch does not add new morphotypes (i.e. at least a total count of 100 phytoliths) (see in Fig. 3 a central stabilisation plateau for samples M6 and B9).
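The construction of a MAC and the plateau criterion described above can be sketched as follows. This is a minimal illustration with hypothetical batches; the function names and the representation of a batch as a dictionary of morphotype counts are assumptions made for the example.

```python
def accumulation_curve(batches):
    """Cumulative number of distinct morphotypes after each batch."""
    seen = set()
    curve = []
    for batch in batches:
        seen.update(m for m, n in batch.items() if n > 0)
        curve.append(len(seen))
    return curve

def plateaus(curve, min_batches=2):
    """Return (start, end) zero-based batch indices of runs in which at least
    `min_batches` consecutive batches add no new morphotypes."""
    runs = []
    run_start = None
    for i in range(1, len(curve)):
        if curve[i] == curve[i - 1]:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_batches:
                runs.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the end of the count
    if run_start is not None and len(curve) - run_start >= min_batches:
        runs.append((run_start, len(curve) - 1))
    return runs

# Hypothetical batches of 50 phytoliths each
batches = [
    {"type_a": 30, "type_b": 15, "type_c": 5},
    {"type_a": 28, "type_b": 17, "type_d": 5},
    {"type_a": 35, "type_c": 15},
    {"type_b": 40, "type_d": 10},
    {"type_a": 45, "type_e": 5},
    {"type_a": 25, "type_b": 25},
]
curve = accumulation_curve(batches)
print(curve)
print(plateaus(curve))
```

In this hypothetical count, the third and fourth batches add no new morphotypes and therefore form a plateau, whereas the single unproductive batch at the end does not meet the two-batch criterion.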

Phytolith sum variability analysis

PSVA was designed to highlight the relationship between the retrieved data and the effort invested in counting. Consecutive PSs were generated, each of them including the previous batches (i.e. 1 = 50 PS, 2 = 100 PS, 3 = 150 PS, etc.), in order to show differences in the composition (dissimilarity) of the assemblage after each increase. For each sample, these subsets were then compared to the final PS to assess the increased information at each step and to establish a relationship between the counting effort and the information acquired during the counting.

This diversity analysis measures the degree of similarity between consecutive PSs as if they were different samples. In phytolith studies, diversity indices have been used to analyse the relationship between plant communities and soil phytolith assemblages (Fernández Honaine et al. 2009: 92) and to measure similarity among phytolith assemblages (Gallego and Distel 2004: 866).

In the current work, the Chao index was used (Chao et al. 2005). This is a non-parametric statistical tool designed to measure similarity between samples on the basis of their composition, and it is not affected by sample size. Analyses have been carried out with the R statistical software using absolute and percentage data.
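The logic of comparing each cumulative PS with the final PS can be sketched as below. This is a simplified illustration, not the R analysis used here: it implements only the uncorrected abundance-based Jaccard form, UV / (U + V − UV), and omits the correction for unseen shared species that the full estimator of Chao et al. (2005) includes. The batches and function names are hypothetical.

```python
from collections import Counter

def abundance_jaccard(a, b):
    """Uncorrected abundance-based Jaccard similarity, UV / (U + V - UV).

    U and V are the summed relative abundances, in samples a and b
    respectively, of the morphotypes present in both samples.
    """
    total_a, total_b = sum(a.values()), sum(b.values())
    shared = [m for m in a if a[m] > 0 and b.get(m, 0) > 0]
    u = sum(a[m] for m in shared) / total_a
    v = sum(b[m] for m in shared) / total_b
    if u == 0 or v == 0:
        return 0.0
    return (u * v) / (u + v - u * v)

def cumulative_sums(batches):
    """Cumulative assemblages: batch 1, batches 1-2, batches 1-3, and so on."""
    running = Counter()
    subsets = []
    for batch in batches:
        running.update(batch)
        subsets.append(Counter(running))  # store a copy of the running tally
    return subsets

# Hypothetical batches of 50 phytoliths each
batches = [
    {"type_a": 40, "type_b": 10},
    {"type_a": 30, "type_b": 15, "type_c": 5},
    {"type_a": 28, "type_b": 12, "type_c": 6, "type_d": 4},
]
subsets = cumulative_sums(batches)
final = subsets[-1]
dissimilarity = [1 - abundance_jaccard(s, final) for s in subsets]
print(dissimilarity)
```

In this hypothetical example, dissimilarity to the final PS falls with each added batch and reaches zero for the complete count; a full analysis would compare all subsets with one another and cluster the resulting matrix.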

Results are retrieved as a table and a dendrogram, in which the horizontal axis represents dissimilarity (0–1), so the lower the level at which branches are generated, the more similar the samples are (see Figs. 2 and 3). Dendrograms show one or more internal ruptures that allow the identification of quantitative increases in the data that imply a qualitative change in relation to variability and distribution, thus showing dissimilarity among the subsamples (see in Fig. 2 the dendrogram of sample M1, in which there are PSs with a high dissimilarity that separate from the rest, such as 50 PS or 100 PS, while others with lower dissimilarity tend to group, such as 150–200 PS and 250–400 PS).

Results

Phytoliths were identified in all the samples, with the number of phytoliths per gramme of soil sample varying from around 500 to more than half a million (see Table 4).

Table 4 Data from sample processing and counting, including the number of morphotypes identified per sample (in italics samples coming from Hearths).

Preservation of phytoliths was generally good. The mineralogical spectra of the samples do not show the presence of minerals that can be associated with diagenetic processes, such as dahllite or montgomeryite (Karkanas et al. 2000). Therefore, any possible soil modification regarding volume or composition was considered minimal and phytolith concentrations were calculated per gramme of original soil sample.

In a limited number of samples, final count sizes differ substantially from what was fixed according to the methodology (400 PS or 600 PS) once the non-identifiable phytoliths were removed and corrections were done after counting. In order to avoid adding noise to the results, the category “undetermined phytoliths” was eliminated from the final figure for each batch of 50 phytoliths. In a few cases, samples were not as rich, so slides were completed before reaching the expected PS. These cases were kept within the set to check the behaviour of phytolith assemblage with a low PS.

Morphotype accumulation curve

A comparative observation of MACs shows major differences between samples, with variability ranging from 11 to 24 observed morphotypes. A higher number of morphotypes does not necessarily relate to greater count sizes: those samples showing the highest (sample M3 with 24 morphotypes) and lowest (sample M7 with 11 morphotypes) variability both have a 600 PS count size (see Table 4).

Although some samples show simple stabilisations (as sample M1 in Fig. 2), most MAC curves have more than one plateau, showing an increase in morphotype variability when count sizes increase too (see sample B6 in Fig. 2 and samples M6 and B9 in Fig. 3).

MACs from both sets of samples (400 PS and 600 PS) show different tendencies (see Figs. 4 and 5). As expected, the 600 PS set, with a generally higher variability in both hearth (H) and non-hearth samples, tends to stabilise later than the 400 PS set. For the 400 PS set, 17 of 27 samples (62.96%) reach at least 90% of their variability at 250 PS (see Table 5 and Fig. 4). In samples with a 600 PS, 90% of their variability is achieved in 44% of the samples at 300 and 350 PS and in 33% of the samples at 400 PS (see Table 6 and Fig. 5).
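The count size at which a sample reaches a given share of its final variability can be derived directly from its accumulation curve. The following minimal sketch uses hypothetical curves and an assumed batch size of 50; the sample labels and values are illustrative only.

```python
def count_at_threshold(curve, batch_size=50, threshold=0.9):
    """Smallest cumulative count at which the curve reaches `threshold`
    of the final number of morphotypes."""
    final = curve[-1]
    for i, richness in enumerate(curve, start=1):
        if richness >= threshold * final:
            return i * batch_size
    return len(curve) * batch_size  # fallback; only reached if threshold > 1

# Hypothetical accumulation curves for three samples counted to 400 PS
curves = {
    "S1": [8, 12, 15, 17, 18, 18, 19, 19],
    "S2": [10, 14, 16, 17, 17, 17, 17, 17],
    "S3": [6, 9, 11, 13, 15, 17, 19, 20],
}
reached = {name: count_at_threshold(c) for name, c in curves.items()}
share_by_250 = sum(1 for v in reached.values() if v <= 250) / len(reached)
print(reached, share_by_250)
```

Here two of the three hypothetical samples reach 90% of their final variability by 250 PS, which is the kind of proportion reported for the two sets above.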

Fig. 4

MACs for both standard and hearth (H) samples with a 400 PS

Fig. 5

MACs for both standard and hearth (H) samples with a 600 PS

Table 5 Percentage variability in subcountings, in relation to morphotype variability, in 400 PS count size
Table 6 Percentage variability in subcountings, in relation to morphotype variability, in 600 PS count size

For the 400 PS set, curves start showing plateaus at 250 PS and after 300 PS, curves tend to be almost completely stabilised with 80–100% variability achieved, excluding the anomalous sample B5, and only a very few new morphotypes are subsequently added. For the 600 PS set, curves start to stabilise later, after 300 PS (see Table 7 with percentage data for each dataset and Figs. 4 and 5).

Table 7 Average percentage of morphotype variability reached at each subcounting for samples coming from hearths (H) and from non-hearth samples

An important result concerns the archaeological provenience of the samples. The samples from well-identified contexts with an expected anthropic plant input, such as hearths or ashes, reach a higher variety of morphotypes (19 morphotypes for 400 PS and 20 for 400 PS H and 21 and 24 for 600 PS and 600 PS H, respectively). In addition, for the 400 PS set, MACs from H samples tend to stabilise later or they do not even produce two consecutive batches without increasing number of morphotypes (see Figs. 4 and 5 and Tables 5, 6 and 7).

Phytolith sum variability analysis

The analysis was carried out with both absolute and relative (percent) data, and because the results are identical, only the absolute dataset is illustrated. In three cases, no dendrogram was built as no differences were identified among the subcountings (samples M14, B11 and B12). This could be explained by the low count size for B11 (215 PS) and by the low variability for B12. Sample M14, on the contrary, does not present a specific feature that could explain this.

Even though results from the PSVA seem to be quite heterogeneous (see Online Resources 1 to 3), there are some patterns that can be identified. In most dendrograms, the 50, 100 and 150 count batches are usually grouped and separated from the rest because of the relatively scarce variability recorded with such low count sizes. For this reason, first ruptures are not taken into consideration.

After count sizes 200 PS to 300 PS, we find a high number of ruptures, indicating the range where qualitative sample distributions show the biggest changes. In most 400 PS samples, this implies that from 300 PS to 400 PS count sizes, batches are more similar. In 600 PS samples, lower dissimilarity is placed between 400 PS and 600 PS (see Online Resources 1 to 3).

The distribution of morphotype variability for each of the count batches was then qualitatively examined by means of boxplots (Figs. 6 and 7), and the median variability of H samples for each bin was also plotted for comparison.

Fig. 6

Boxplot showing the behaviour of morphotype variability in relation to count size for the 400 PS set. Black squares correspond to samples coming from hearths (H)

Fig. 7

Boxplot showing the behaviour of morphotype variability in relation to count size for the 600 PS set. Black squares correspond to samples coming from hearths (H)

For the 400 PS set, the distribution for low count sizes is more heterogeneous, showing some outliers in the batch corresponding to 100 PS. When count size increases to 200 PS, the median variability is already close to 90% of the total variability, with a further increase to 95% at 250 PS. In this subset, H samples exhibit consistently lower median values than non-H samples, in particular, for 100 and 200 PS.

As far as the 600 PS set is concerned, median variability for non-H samples remarkably reaches almost 90% of the total variability already after 350 PS. For this subset, H samples exhibit a different behaviour from that described for 400 PS samples. In this case, they start showing higher median variability than non-H samples, while they get closer to the median at 250–300 PS and tend to exhibit lower median values between 350 and 500 PS.

Finally, a linear regression was performed to explore potential significant correlation between sample size and morphotype variability. As expected, results highlight a strong and positive correlation for both H and non-H samples in both 400 PS (r² = 0.89, p < 0.0001 and r² = 0.98, p < 0.0001, respectively) and 600 PS (r² = 0.9, p < 0.0001 and r² = 0.92, p < 0.0001, respectively) sets.

Discussion

This study shows the relationship among the factors involved in the analyses: PS, species curves and morphotypological variability. The results show that once 250 phytoliths are reached, assemblages show a substantially different distribution of morphotypes with respect to lower count sizes.

A problem inherent to counting phytoliths is that analysts differ widely in how finely they subdivide the morphological spectrum of phytolith assemblages and also in the morphotypes they count (Strömberg 2009: 126). This is also an aspect that varies in relation to research questions (see Table 1 for further examples). In this study, results allow the comparison of the whole set as a single counting procedure has been used.

However, the tests carried out have been useful to highlight the point at which adding new data does not significantly affect the composition of the assemblages, which corresponds to 300–400 PS for the 400 PS set and 450–600 PS for the 600 PS set (that is, the highest count sizes).

The MACs have shown that for the 400 PS set, stabilisation mostly occurs between 200 and 350 counted particles. Indeed, at 300 PS, 8 of 27 (29.63%) of the 400 PS set have reached 90% of the morphotype variability and several display 100% of their variability (14 of 27; 51.85%). It is important to note that an increase in the final PS (samples in which the counting was raised to 600 PS) produces a change in this pattern in which less than half of the samples (4 of 9; 44%) have reached 90% or more of the recorded variability at 300 PS.

The PSVA analyses show similar results. Although it is difficult to find specific patterns due to the heterogeneity of the dendrograms, it is possible to highlight some tendencies. One of the patterns that clearly emerges is that the smallest count sizes (up to 150–200 PS), as well as the largest in those samples where the PS was raised to 600, tend to appear as separated branches in the dendrograms. Therefore, the results show that within the range 200–350, most samples show lower dissimilarity (see samples M17 or M21 as examples in Online Materials). This range has already been pointed out as the most common phytolith count size range in palaeoecological literature (Strömberg 2009 and references therein). This means that within these limits, there are no essential differences in relation to the composition (variability and distribution) of the assemblages, while it is probably the appearance of new but uncommon morphotypes that produces new clusters in higher count sizes (between 450 PS and 600 PS).

The results of both tests support the accepted standard count size (250–300 PS) most commonly used in phytolith studies (e.g. Persaits et al. 2008; Cabanes et al. 2011). When the effort invested is weighed against the results obtained, counting between 250 and 300 particles seems to be the best compromise. Whenever there is higher morphotype richness (as in H samples), this strategy would lead to redundant data (as H samples very quickly reach higher variability). At the same time, a 250–300 count can be coupled with a quick scan (as proposed by Pearsall 2000) to assess the presence of rare but archaeologically important morphotypes. As these data highlight, counts below 200 particles may imply the loss of less abundant forms that may be significant (see also Albert and Weiner 2001; Piperno 1988). Therefore, this strategy helps to recover a significant assemblage while minimising counting time and it helps in assessing variability for samples with a higher number of morphotypes, such as those where anthropic plant input is usually richer (latest morphotypes that appear for highest count sizes in hearths and ashy areas). This approach is also useful for observing silica skeletons and other particles (such as cellulose rings) that would add further information to the counting.

In phytolith analysis, few studies have addressed count size in non-natural contexts. In addition, phytolith researchers share overlapping areas of interest, and most reviews tend to mix palaeoecological and archaeological approaches even though these target different research questions (ecology vs. anthropic activity). The two approaches can be distinguished by the questions they address: the former concerns vegetation, while the latter focuses on identifying the anthropic selection of plants (see Strömberg 2009 for a review).

In archaeology, Albert and Weiner (2001) assessed the error margin when identifying morphotypes and demonstrated that counting 265 single phytoliths with a consistent range of morphology gives a 12% error margin. Lentfer et al. (1997) analysed samples from a historical windmill open to natural phytolith input, fixing a variable count size according to the stabilisation of phytolith cumulative curves.
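The idea of fixing count size by the stabilisation of a cumulative curve can be illustrated with a minimal sketch. It is not the procedure of any of the cited studies: the function names, the `window` stopping rule and the morphotype labels in the usage note are assumptions made purely for illustration.

```python
def cumulative_richness(sequence):
    """Number of distinct morphotypes seen after each counted particle.

    `sequence` lists morphotype labels in the order they were counted.
    """
    seen = set()
    curve = []
    for morphotype in sequence:
        seen.add(morphotype)
        curve.append(len(seen))
    return curve


def stabilisation_point(curve, window=50):
    """Smallest count size after which no new morphotype appears for
    `window` consecutive particles (a hypothetical stopping rule).

    Returns None if the curve never stabilises within the counted sequence.
    """
    for i in range(len(curve) - window):
        # curve[i] is the richness after i + 1 particles; if it is unchanged
        # `window` particles later, the curve is flat over that stretch.
        if curve[i + window] == curve[i]:
            return i + 1
    return None
```

For a toy sequence such as `["elongate"] * 3 + ["bulliform"] + ["elongate"] * 5` with `window=3`, the curve rises once and is then flat, so the sketch reports stabilisation at the fourth particle. In practice the window would have to be chosen to match the context, since rare morphotypes can appear late in the count.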

In spite of these examples, in archaeology, phytolith counting methods have been mostly designed to assess the reliability of extraction procedures. Lentfer and Boyd (1998), for example, compared the number and diversity of morphotypes and also the clarity and count time of samples which had been processed following different extraction protocols. More recently, Katz et al. (2010) assessed the reliability of quick extraction protocols compared to long laboratory standard ones.

By contrast, count size has been discussed in depth in taxonomical identification (Ball et al. 1996) and in palaeoecological studies. In palaeoecological research, the use of different types of vegetation cover or aridity indices is already a common standard (Barboni et al. 2007; Bremond et al. 2005; Delhon et al. 2003 among others). In these cases, there is a specific need to identify a minimum number of particular morphotypes to calculate ratios and identify changes in vegetation. While some morphotypes are essential, others may be less significant. Strömberg (2009) showed that cumulative curves are useless for palaeovegetation studies because these studies rely on indices. She also pointed out that count size should vary according to the specific requirements of the selected index.
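Why the required count size depends on the index can be seen with a simple, generic calculation. The sketch below treats an index as a proportion of diagnostic morphotypes in a count and uses the normal approximation to the binomial distribution. This is a textbook approximation chosen for illustration, not the method of Strömberg (2009) or of the present study, and the function names and the default `z = 1.96` (roughly a 95% interval) are assumptions.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate half-width of the confidence interval for a proportion `p`
    estimated from `n` counted particles (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


def required_count(p, margin, z=1.96):
    """Approximate number of particles needed so that the half-width of the
    interval around proportion `p` does not exceed `margin`."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)
```

Under these assumptions, `margin_of_error(0.5, 300)` gives the approximate uncertainty of a ratio near 0.5 at a 300-particle count, and `required_count` shows how quickly the necessary count grows as the tolerated margin shrinks. The approximation is poor for very rare morphotypes, where a different interval or a resampling approach would be more appropriate.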

Conversely, cumulative curves are the proper tool for simply understanding the composition of the assemblages. Currently, even when there are hypotheses about plant consumption and a general idea about what the assemblages should “look like”, most archaeological studies explore phytolith assemblages rather than testing hypotheses with them. Indeed, the development of archaeological interpretative tools such as indices is a new area within phytolith studies that still needs work.

Therefore, the study of phytolith assemblages in archaeology presents two big challenges: distinguishing natural from anthropic inputs and understanding the meaning of the anthropic input.

The present study centres on caves, which constitute common archaeological contexts. The study of phytolith assemblage variability in different archaeological contexts, periods and settings is essential to fix context-appropriate count sizes and to identify patterns corresponding to plant production and consumption processes in each of these.

This study shows that, apart from presenting a slightly wider variability, there are no substantial differences between samples coming from hearths (or specific features, such as sample B1) and other contexts. In archaeological contexts, a high presence of phytoliths is considered to be an anthropogenic signal (Madella 2007). This is even more remarkable within caves and shelters, where the phytolith recovery rate is much greater from hearths and ashy layers than from other contexts (Albert et al. 1999, 2000; Madella et al. 2002). The higher heterogeneity of these assemblages can be explained in terms of their functionality, as maintenance activities almost certainly implied continuous dumping of various discarded material in the fireplaces.

Conclusions

The present study supports the accepted standard count size (250–300 PS) most commonly used in phytolith studies.

The need to produce specific methodologies for specific studies has already been acknowledged. In palaeoecological studies, Strömberg (2009) pointed to the need for defining an appropriate count size for each study. In archaeology, the fact that archaeological contexts and sediments differ so widely has been recognised as a characteristic that may compel ad hoc extraction procedures (Lentfer and Boyd 1998; Katz et al. 2010). Accordingly, in archaeological research, a single appropriate count size can be complemented with the quick scan for samples with different kinds of inputs, such as hearths, storage structures or trampling areas, to assess the whole variability of the spectra.

Understanding the variability of phytolith assemblages in archaeological contexts requires us to take into consideration the interplay of different factors, such as natural inputs present in the sediments together with the anthropic signal and the taphonomical issues affecting the preservation of specific particles. For this reason, further experiments ought to be done to establish whether it would be possible to propose specific counting methodologies for particular contexts. For example, knowing that fireplaces normally display a higher variability, a proposal to increase the number of counted phytoliths in samples coming from these contexts could be appropriate. Further experimentation could allow us to develop interpretative indices and support the identification of anthropic markers of specific activities (Rondelli et al. 2014), so that different inputs could be recognised qualitatively and/or quantitatively through the analysis of the phytolith assemblages.