Prehistoric spears, javelins, spearthrower darts, and arrows can be readily recognizable items when recovered in their entirety. The length of a weapon shaft, its overall size and weight, its balance, and the presence or absence of a notched versus a dimpled nock, for example, are often indicators of such an implement’s function. Unfortunately, the hafts of these weapons were constructed from organic materials, and apart from rare instances of unusual preservation, are not often preserved in archaeological contexts; the archaeologist generally recovers only the lithic artifact, commonly referred to as a “point”, that once served as the armature component of the weapon.

In common parlance we tend to use the terms “point” and “projectile point” rather ambiguously. The former commonly implies the latter, while the latter is commonly, though incorrectly, used in reference to the armature of a spear, or other similar weapon, which is a thrusting weapon per se, rather than a projectile weapon (the term javelin is used herein to refer to a spear-like weapon that is thrown). Of course, archaeologists seldom recover direct evidence of the weapon delivery technology employed by ancient hunters. As a result, they face some important challenges: (1) how to determine whether an individual lithic artifact actually functioned as a weapon armature, and (2) how to recognize the delivery technology associated with that armature. We need to know both before we can be sure whether a specific pointed artifact is indeed a projectile point.

The first challenge is commonly tackled via generalization and analogy; repeated classes of artifacts are observed in repeated associations. Considered as an aggregate, when similar forms of lithic artifacts are recovered repeatedly in similar contexts (pointed lithic bifaces embedded in animal bone in kill sites, for example), they might reliably be classified as components of weaponry, though the specific form of the weapon may still be unknown. The second challenge is more formidable. Of particular significance to the discussion that follows is the simple fact that, lacking direct evidence of specific weapon technologies, archaeologists have from necessity attempted to identify secondary criteria that demonstrate associations with specific weaponry.

This paper presents a critical review of contributions that have led to the popular acceptance of certain mainstream criteria as diagnostic indicators of weapon identification and use, and identifies deficiencies in those criteria. The purpose is to argue for more rigorous experimental methods, and to draw attention to the need to recognize the limits of inferences that reasonably can be drawn from our work. The author emphasizes that an important aspect of science is its self-correcting nature; errors and approximations are an inherent and necessary part of the scientific process. The critical review that follows is likewise offered as a necessary part of this process, and is not intended to be a critique of the individuals whose contributions are discussed; indeed, without such contributions there can be no tradition of archaeological science.

The author also acknowledges a bias in the discussion that follows; in particular, a strong bias towards North American perspectives and associated point forms, due to the underlying fact that this discussion grew out of an initial concern for the objective identification of North American Paleoindian weaponry. Thus the focus tends to be on generalized North American artifact forms that might reasonably have functioned as various hafted weapon armatures, and on bifacial “points” rather than microliths.

Identifying Weapons and Delivery Technologies

Studies Based on Morphology, and Morphological Types

Numerous researchers (e.g., Evans 1957; Forbis 1962; Wyckoff 1964; Corliss 1972; Thomas 1978; Shott 1997; see also Shea 2006) have sought to identify prehistoric weapons and delivery technologies through examination of the one surviving component of these systems: their lithic armatures. Such research has commonly involved investigation of neck, shoulder, or stem widths of points; or various measures reflecting overall point size or shape. The underlying assumptions pertaining to these analyses are that:

  1. a relatively thin, triangular, leaf-shaped, or lanceolate, pointed artifact was probably a “point” (which is understood to be a weapon armature);

  2. neck, shoulder, and stem widths of points reflect the diameter of weapon shafts or foreshafts (“hafts”) of weaponry;

  3. spear and javelin hafts are large, dart hafts smaller, and arrow hafts smaller still; and

  4. spear and javelin points are big and heavy; arrow-points are small and light; and spearthrower dart point sizes and weights lie somewhere in between.

These assumptions have long-standing historical precedent in the North American archaeological literature. For example,

These [points] will be discussed in two categories: (1) small, thin, light, finely chipped specimens believed to have served on arrows; and (2) larger, thicker, heavier and more crudely chipped specimens we believe were used on darts thrown with atlatls. That such a distinction actually existed over vast areas of America is no longer denied by many archaeologists (Baker and Kidder 1937: 51).

Despite the simplicity of such assumptions, an influential study by Fenenga (1953) of 884 points from the American Midwest, Southwest, and California, suggested that there may be some basis for these distinctions. Fenenga (1953) demonstrated that a frequency plot of either point neck widths, or overall weights, produced a bi-modal distribution suggesting mutually exclusive point groupings. Even though no data were presented to establish the actual sizes and weights of prehistoric weapon shafts themselves, the bi-modal distribution was interpreted as reflecting the morphological differences between spearthrower and bow projectiles.

The issue was later addressed by Thomas (1978) who employed a sample of 132 hafted arrow points and 10 hafted spearthrower dart points drawn from ethnographic collections, as well as archaeological specimens recovered from Pueblo Bonito (New Mexico), to determine the relationship between point size and the diameter of the actual foreshaft it was attached to. Thomas (1978) noted a correlation between arrow foreshaft diameter and arrow point neck width, but was unable to document a similar relationship between spearthrower dart foreshafts and their respective points. Despite this, the data suggested that arrow foreshafts were significantly smaller than spearthrower dart foreshafts, and arrowheads themselves were significantly smaller than dart tips. Furthermore, a discriminant analysis based on considerations of length, width, thickness, and neck width of the points correctly classified approximately 86% of the study sample (Thomas 1978: 471). Thomas’s approach provided no mechanism for dealing with unnotched points.

Shott’s (1997) reassessment of Thomas’s data utilizes a significantly enlarged sample of hafted dart points, and considers shoulder width as an alternative to neck width, since he found the latter variable to be inadequate:

A neck width threshold of 9 mm correctly classifies 38 of 39 dart points, but misclassifies as darts 82 of 132 arrow points (62.1 percent). A threshold value of 8.5 mm produces identical results for darts but misclassifies 89 arrow points (67.4 percent). Even a threshold of 10.4 mm, one standard deviation lower than Chatters et al.’s (1995:757) mean for inferred dart points, misclassifies 57 arrows (43.2 percent) (Shott 1997: 98).

Employing shoulder width criteria and a larger sample, Shott was better able to classify dart points; however, Shott’s (1997: 99) overall ability to distinguish dart and arrow points, at 85% success, is essentially equivalent to Thomas’s at 86% (1978: 471).
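The neck-width figures quoted from Shott above can be rechecked with simple arithmetic. The following is a minimal Python sketch, not part of the original studies: the counts are those given in the quotation (39 hafted dart points, 132 hafted arrow points), while the function names and the combined-accuracy measure are illustrative assumptions introduced here.

```python
# Counts quoted from Shott (1997: 98); all names below are illustrative.
N_DARTS = 39
N_ARROWS = 132

# Dart points correctly classified, for the thresholds where this is reported.
DARTS_CORRECT = {9.0: 38, 8.5: 38}

# Arrow points misclassified as darts at each neck-width threshold (mm).
ARROWS_MISCLASSIFIED = {9.0: 82, 8.5: 89, 10.4: 57}


def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def arrow_error_rate(threshold_mm):
    """Percentage of arrow points misclassified as darts at a threshold."""
    return percent(ARROWS_MISCLASSIFIED[threshold_mm], N_ARROWS)


def combined_accuracy(threshold_mm):
    """Share of all hafted points classified correctly.

    Assumed summary measure (not reported in the quoted passage); only
    defined for thresholds where the dart result is given.
    """
    arrows_correct = N_ARROWS - ARROWS_MISCLASSIFIED[threshold_mm]
    darts_correct = DARTS_CORRECT[threshold_mm]
    return percent(darts_correct + arrows_correct, N_DARTS + N_ARROWS)


if __name__ == "__main__":
    for threshold in sorted(ARROWS_MISCLASSIFIED):
        print(threshold, "mm:", arrow_error_rate(threshold), "% of arrows misclassified")
```

On these counts, a threshold that classifies nearly every dart point correctly still misassigns most arrow points, which is why neck width alone was judged inadequate.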

The relevant measure in these approaches was selected because of its relation to shaft or foreshaft diameter. The latter is the more important variable, however, since within reasonable limits of variation it is more closely related to weapon performance, and is, therefore, indicative of the weapon system. Even a casual perusal of archery equipment, for example, whether ancient or modern, will convince the reader that much more variation exists in point dimensions than in shaft or foreshaft dimensions for a given weapon kit. So while points are much more common in the archaeological record, the data recovered directly from the study of shafts and foreshafts, rather than inferred from point metrics, are expected to be a better reflection of weapon system design considerations.

Published metric data and scale photographs are readily available for hundreds of dart foreshafts recovered from dry cave sites throughout the American Great Basin and Southwest (e.g., Kidder and Guernsey 1919; Guernsey and Kidder 1921; Loud and Harrington 1929; Guernsey 1931; Harrington 1933; Woodward 1937; Heizer 1938; Fenenga and Heizer 1941; Cosgrove 1947; Jennings 1957; Smith 1963; Smith et al. 1963; Taylor 1966; Dalley and Petersen 1970; Berry 1976; Dalley 1976; Janetski 1980; Hattori 1982; Tuohy 1982; Pendleton 1985; Salls 1986). This literature indicates that while most recovered dart foreshafts are approximately 0.8–1.1 cm in diameter, many are less than 0.6 cm in diameter (Hutchings 1997). In comparison, the mean diameter of arrow foreshafts from Thomas’s ethnographic sample (n = 118) is 0.7 cm, while the mean diameter of arrow foreshafts from his archaeological sample (n = 14) is 0.9 cm (Thomas 1978: Tables 1 and 2). In 1981, a 0.6 cm diameter dart foreshaft with a hafted stone point, along with five other dart foreshafts ranging from 0.4 to 0.6 cm, was recovered from NC Cave, Lincoln County, Nevada (Tuohy 1982). In reference to Thomas’s (1978) study, these finds, as well as 56 other dart foreshafts from cave sites in the vicinity of Lake Winnemucca, Nevada, prompted Tuohy (1982: 97) to comment:

I am not convinced that enough data have been marshalled [sic] to segregate arrow foreshafts from dart foreshafts on the basis of size or variability in dimensions such as length, width, weight, or shaft diameters, and the new data from “NC” Cave and the Winnemucca Lake foreshafts from a cache support this contention.

Studies such as those of Fenenga (1953), Thomas (1978), and the archaeological specimens from the American Great Basin and Southwest referred to above rely on a number of normative assumptions. First, they assume that the point samples are representative of one specific technology, the one they are found associated with (e.g., arrow or dart, etc.), and are not transferable between coexisting technologies. Second, they assume that there is no meaningful variation within a single technology, that point and shaft dimensions did not vary to adapt to application (e.g., larger projectiles for larger game). Third, they assume that the study samples are representative of that technology through time and space. Furthermore, in differentiating points based on metric attributes, particularly attributes of size, these studies can be impacted by both subtle and dramatic instances of repair and resharpening. While some researchers (e.g., Shott 1997) have attempted to compensate for this, it is a much more complex concern in this specific regard than has generally been recognized. In particular, it may be difficult to determine how often an artifact has been recycled, and whether it was recycled within or between technologies. As an example that explores the implications of each, a hypothetical point that was created for use as a spear armature may conceivably be recycled into an arrow point. If the recycling is noticeable, the resulting point may be treated separately by the analyst who may be inclined to decide that the morphology had been adversely affected by the recycling event, therefore excluding its metric data from the aggregate. It is possible, however, that the recycling may have resulted in an arrow point of ideal morphology (i.e., just because it was recycled, does not necessarily mean that the end product was not exactly what was desired for the new end use; and, in addition, as an “arrow point” the piece had never been resharpened). 
Certainly, the fact that this hypothetical point was eventually hafted as an arrow armature tells us that it was considered an acceptable point for that technology, so treating it as a resharpened point may skew our research. Had a second hypothetical arrow point been manufactured with identical proportions it is likely that the metric data derived from it would not be considered comparable with the first. Of course, data derived from shaft and foreshaft diameters avoid such issues altogether, and benefit from being more closely related to the phenomena we are interested in (i.e., the propulsion technology rather than just the points).

In choosing to study examples of hafted arrow points, Thomas’s (1978) sample was unavoidably recent by way of preservation bias. This was an inevitable consequence of the research parameters, and it may have biased the sample by assuming a priori that small, late period, and ethnographic arrow points and shafts are representative of bow technology throughout time. In contrast, we must accept that the absence of point types known to be associated with arrows does not constitute evidence for the absence of the bow; if we choose to rely on characteristics of size and suitability we must keep in mind that many early lithic points are of a size and weight suitable for use with the bow, even if not ideal, and we have no empirical proof that hafting methods are discrete indicators of delivery technologies. In fact, Browne (1940: 211) noted that even North American Folsom Paleoindian points would make highly efficient arrow points.

More recently, tip cross-sectional area (TCSA) and tip cross-sectional perimeter (TCSP) have been proposed as criteria to be used in combination with other data (e.g., use-wear, context) to distinguish weapon armatures (Hughes 1998; Shea et al. 2001; Shea 2006; Sisk and Shea 2009). Reminiscent of older morphological studies, morphometric criteria are derived from both ethnographic and experimental studies of relatively recent weaponry, and consider characteristics deemed to optimize mechanical and aerodynamic efficiency. TCSA and TCSP identify as a possible projectile or spear point any suitable pointed object that falls within the range of variables known from ethnography and experimentation to be acceptable for use as a weapon tip, substituting area and perimeter measures to identify delivery technologies. As such, TCSA/TCSP may be considered suitable for identifying additional objects with similar mechanical and aerodynamic characteristics as those known from ethnography and experimentation, even though they do not offer any empirical evidence in and of themselves that objects so classified were actually employed as weapon armatures. Perhaps more to the point, when applied to assemblages distant in time or space from those on which the measures were developed, these criteria imply that the same values of mechanical and aerodynamic efficiency were of primary concern to the people who produced those distant assemblages. Stated another way, they constitute a tautologous argument that assumes a priori that we already know the range of objects that constitute an acceptable weapon technology. While there can be no doubt that underlying mechanical and aerodynamic principles do not change, there is no reason to expect that mechanical and aerodynamic considerations or priorities were the same, and of equal value, to all people in all places and times. 
At the very least, we might reasonably expect periods of ancient weaponry development and experimentation to produce variability beyond the TCSA/TCSP ranges expected for more recent, well-developed technologies. For these reasons, TCSA and TCSP cannot be considered valid indicators of projectile function within assemblages where the ranges of pertinent variables have not already been established.
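For readers unfamiliar with these measures, the following Python sketch shows how TCSA and TCSP are commonly computed. The formulas are not given in this text; they are stated here as the definitions usually attributed to the cited literature (half the product of maximum width and thickness for TCSA, and the perimeter of an idealized triangular or rhomboid cross-section for TCSP), and should be checked against those sources. The function names and the `in_reference_range` helper are illustrative assumptions.

```python
import math


def tcsa(width_mm, thickness_mm):
    """Tip cross-sectional area (mm^2), assuming a triangular cross-section."""
    return 0.5 * width_mm * thickness_mm


def tcsp_triangular(width_mm, thickness_mm):
    """Tip cross-sectional perimeter (mm) for a triangular cross-section:
    the base (width) plus two equal sloping sides."""
    return width_mm + 2.0 * math.hypot(width_mm / 2.0, thickness_mm)


def tcsp_rhomboid(width_mm, thickness_mm):
    """Tip cross-sectional perimeter (mm) for a rhomboid cross-section:
    four equal sides spanning half the width and half the thickness."""
    return 4.0 * math.hypot(width_mm / 2.0, thickness_mm / 2.0)


def in_reference_range(value, low, high):
    """True if a value falls inside a range taken from a reference sample.

    The bounds must come from ethnographic or experimental weaponry, which
    is exactly the prior knowledge the argument above says is presupposed.
    """
    return low <= value <= high


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    print(tcsa(6.0, 4.0), tcsp_triangular(6.0, 4.0), tcsp_rhomboid(6.0, 8.0))
```

Note that nothing in these calculations tests how an object was used; they only report whether its dimensions resemble those of a chosen reference sample.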

Further complicating the issue of acceptable morphology, Ahler (1971) found evidence suggesting that artifacts that might otherwise be readily labeled as bifacial “projectile” points were not always used primarily as projectile armatures, but were often used as knives and multi-purpose tools. When faced with the tautology of existing morphometric criteria, as well as evidence that readily recognizable “projectile points” were at times not used to arm projectiles, one is forced to question the usefulness of morphological and morphometric methods of identification.

Studies Based on Microwear, Residues, and Impact Fractures

Microwear analyses have proven effective in differentiating modes of contact between stone tools, location and orientation of use contact, hafting, and even materials against which tools were used (e.g., Semenov 1964; Tringham et al. 1974; Keeley 1980; Tomenchuk 1985; Kay 1996; Dockall 1997; Rots 2003, 2004). Unfortunately, since the direction of contact for spears, javelins, darts, and arrows can be identical, and the use of each weapon type might be expected on identical contact materials, microwear analyses have not demonstrated an ability to identify specific weapon technologies per se independent of relational analogues (i.e., independent of ethnographic or direct historic analogies). Hafting traces are also not necessarily diagnostic of a specific weapon technology since spears, javelins, darts, and arrows can exhibit common patterns of hafting wear (although the area of contact may be noticeably larger and more intense for a spear point than an arrow point).

Organic residues may tell us what materials a lithic artifact has been in contact with (Hardy and Raff 1997; Hardy and Kay 1999; Hardy et al. 2001), perhaps also identifying the area of hafting. Three flakes bearing a tar mastic, and recovered from a Mid-Pleistocene bone-bearing deposit, are considered by Mazza et al. (2006: 1317) to constitute evidence for hurled weapons, despite a lack of any wear traces or other corroborating evidence apart from their association with bone (see also Hardy et al. 2001; Boëda et al. 1998). Of course there may be many conceivable reasons to haft pointed lithic artifacts, but ultimately, evidence of hafting, even if associated with longitudinal wear traces, is not indicative of any specific weapon technology.

So-called “diagnostic impact-fractures” (DIFs) have been touted by many analysts (e.g., Witthoff 1968; Frison 1974; Ahler and McMillan 1976; Frison et al. 1976; Odell 1977; Frison 1978; Roper 1979; Barton and Bergman 1982; Bergman and Newcomer 1983; Fischer et al. 1984; Odell 1988; Shea 1988; Woods 1988; Holdaway 1989; Dockall 1997) to be indicative of weapon impact. For example, Bergman and Newcomer (1983: 241–243) describe three types of DIFs identified during their projectile experiments: the burin-like fracture, the flute-like fracture, and the bending fracture. A fourth type of DIF, the bending fracture-initiated spin-off, was identified by Fischer et al. (1984). Bergman and Newcomer (1983) employed DIFs to suggest that certain Upper Paleolithic artifacts may constitute projectile armatures. Likewise, Fischer et al. (1984) employed DIFs to suggest that certain Mesolithic and Neolithic artifacts may constitute projectile armatures.

While most archaeologists have restricted such analyses to formal points, Odell (1988) has labeled large numbers of modified and unmodified flakes from a site in the Lower Illinois Valley (USA) as projectile armatures. Relying primarily on the identification of DIFs, he suggests: (1) that diagnostic projectile impact fractures may often be observed on simple retouched flakes, as well as unretouched waste flakes and detritus; (2) that the practice of employing suitable waste flakes as functional projectile points may be widespread; and (3) that this phenomenon will have repercussions on studies of technology and foraging efficiency. Odell’s (1988) waste flake analysis is based on previous comparative studies of impact-related breakage patterns, most notably Odell and Cowan (1986), which used an extensive series of shooting experiments employing replicated bifacial points and unmodified flakes as armatures on both javelins (the authors use the term “spears”) and arrows. Various other types of use-wear such as edge-rounding, surface polish, and linear striations were also used to identify projectile impact-related damage. Odell’s (1988: 344–345) tabulated data are unclear with respect to the percentage of the study sample represented by waste flakes and detritus, versus morphological projectile points. He does, however, state that only 3% of the functional projectile points from the Smiling Dan site sample are found among “… modified type collection objects” (presumably, morphological projectile points) (Odell 1988: 346), suggesting that 97% of the site’s projectile points were not, for lack of precise terminology, traditional points.

This author finds little reason to doubt the possibility that prehistoric peoples made greater use of materials usually classified as debitage and detritus than has been popularly recognized, yet there are several problems inherent in the use of impact breakage patterns and wear traces as evidence of projectile use.

These problems arise due to both the general morphology and functional nature of lithic projectiles:

  1. lithic projectiles generally exhibit little use-wear or haft-related polish (Kay 1996) prior to catastrophic failure; and

  2. impact fractures are generally location- and orientation-specific forms of damage that can be caused as much through thrusting, or even dropping (Hutchings 1991, 2011), as from projectile impact.

In fact, it is possible to produce flakes and blades during simple core reduction which unintentionally exhibit sympathetic or repercussive fractures that often appear similar to projectile point impact fractures; an issue also noted by Fischer et al. (1984: 24). For example, the thin distal and lateral margins of flakes and blades can be damaged when they strike the ground after removal from a core, or from being dropped into a pile for subsequent use by the flintknapper. Such damage would constitute an impact fracture per se, but not an impact fracture caused by use as any type of weapon armature. Given a site with a relatively large population of waste flakes, blades, and other debitage, a significant number of pieces might be expected to exhibit so-called DIFs, but even though these fractures were caused by an impact, they are certainly not diagnostic of any weapon use.

A Study of Impact Fractures Among Debitage

An investigation of modern (replicative) flintknapping debris, intended to explore the incidence of “impact fractures” on discarded flint debitage (Hutchings 1991: Appendix F), demonstrated that 72.4% of a sample of 246 pieces of flint chipping debris were suitable, with respect to overall morphology and weight, for hafting as practical arrowheads. Of these, 15 pieces (6.1% of the original sample) were found to exhibit damage suggestive, by location, distribution, and morphology, of projectile use according to the macroscopic criteria of Odell and Cowan (1986), as well as those of Odell (1988) and others (e.g., Ahler 1971; Roper 1979; Barton and Bergman 1982; Bergman and Newcomer 1983; Fischer et al. 1984). In fact, three of the haftable pieces which exhibited DIFs also exhibited simple side-notches that could facilitate hafting; one of these three exhibited simple, uniform, bilateral side-notches.

The results obtained by this simple study demonstrate a high probability of observing projectile impact-like breakage patterns among discarded waste flakes and other debitage and detritus. Over 6% of the sample produced erroneous “use-wear”. The overall morphology of these pieces, and the current definition of what constitutes a projectile point, would suggest not only that they came in contact with some target material, but that they were shot or thrown at the target material as projectile points (Hutchings 1991: Appendix F, emphasis in original).

These results have been duplicated by Pargeter (2011) who found diagnostic impact fractures on 1.8% of an assemblage of experimental knapped debris, and as much as 2.4% of a trampled experimental debitage assemblage. As a result, Pargeter suggests that erroneous diagnostic impact fractures can be expected on approximately 3% of a lithic assemblage. Pargeter (2011: 2885) also noted the occasional formation of smooth, semi-circular notches on the trampled debris (see also Lombard and Pargeter 2008). Villa et al. (2009b: 449) also note that fractures like those associated with weapon impacts can result from processes of manufacture and trampling. Likewise, Sano (2009) found that relatively low frequencies of erroneous DIF types can be expected on knapped, retouched, and trampled assemblages.
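The proportions reported in this section can be restated as counts, which makes the practical consequence easier to see. The Python sketch below is an illustration only: the sample size, percentages, and rates are those reported above, while the function names and the assemblage size used in the example are hypothetical.

```python
# Figures reported for the replicative debitage study (Hutchings 1991).
SAMPLE_SIZE = 246
HAFTABLE_SHARE = 0.724          # suitable for hafting as arrowheads
PIECES_WITH_DIF_LIKE_DAMAGE = 15

# Rates of spurious DIFs reported above (as proportions).
REPORTED_RATES = {
    "knapped debris (Pargeter 2011)": 0.018,
    "trampled debitage (Pargeter 2011)": 0.024,
    "suggested general expectation (Pargeter 2011)": 0.03,
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def haftable_count():
    """Number of pieces implied by the reported haftable share."""
    return round(SAMPLE_SIZE * HAFTABLE_SHARE)


def expected_spurious_difs(assemblage_size, rate):
    """Expected number of pieces with DIF-like damage from non-weapon causes."""
    return assemblage_size * rate


if __name__ == "__main__":
    print("haftable pieces:", haftable_count())
    print("DIF-like damage, % of sample:",
          percent(PIECES_WITH_DIF_LIKE_DAMAGE, SAMPLE_SIZE))
    # Hypothetical assemblage of 1000 pieces of debitage.
    for label, rate in REPORTED_RATES.items():
        print(label, expected_spurious_difs(1000, rate))
```

Even at the lowest reported rate, a large debitage assemblage would be expected to contain a number of pieces that meet the macroscopic criteria for projectile use without ever having been hafted.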

Discussion

At the heart of shortcomings in the methodologies discussed are two critical issues: the confounding factor of equifinality, and the extent to which we can make reasonable inferences based on the parameters of our replicative experiments. The methodologies discussed have been derived from analogues and tested by replicative experimentation, but experimental hypothesis testing in and of itself does not guarantee robust results. Hypotheses can take several different forms, each of which may be scientifically valid, but each of which may be best suited to differing circumstances. In the study of complex phenomena, where numerous and confounding variables may produce instances of equifinality, direct testing of hypotheses may result in a lack of robusticity since testing often ceases once a finite number of positive results are generated. For example, given a phenomenon (P1), we may hypothesize that P1 is the result of a suspected behavior (b1), so we conduct an experiment to determine whether b1 results in the production of P1. If it does (i.e., we observe a positive result), we have verified our hypothesis, but we have not demonstrated that other behaviors (b…x) could not also produce P1. If we do not continue to test alternative hypotheses, we have succeeded only in accommodating the data by demonstrating a correlation.

[The] accommodation process suffers from a lack of empirical sufficiency…. Data are, in fact, used in model construction (the models are fitted to the data observed), and only those dimensions of the data supporting model construction are considered… however, the resulting model cannot be tested because relevant data have already been used in construction.

The accommodation approach to explanation has the appeal of common sense but is different from the usual scientific process of interpretation. A scientific approach fits data to models through falsification procedures, thereby assessing the utility of interpretations for explaining observations. The fitting of data to models requires that all causes for observed patterning be considered and compared to model implications. In other words, relations among phenomena are predicted from theory and compared to the actual, empirically measured relationships defined for the data. Data are not simply interpreted in terms of the model [Rigaud and Simek 1987: 48, emphasis in original].
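The logical gap between verifying that b1 produces P1 and showing that P1 implies b1 can be made concrete with a small lookup. The Python sketch below is a deliberately simplified illustration: the behaviors listed are those named earlier in this paper as capable of producing impact fractures, but the table structure and function names are assumptions introduced here, and real outcomes are of course not all-or-nothing.

```python
# Simplified table: each behavior mapped to the phenomena it can produce.
# Entries follow the causes of impact fractures discussed in the text.
OUTCOMES = {
    "projectile impact": {"impact fracture"},
    "thrusting": {"impact fracture"},
    "dropping": {"impact fracture"},
    "knapping": {"impact fracture"},
    "trampling": {"impact fracture"},
}


def produces(behavior, phenomenon, outcomes=OUTCOMES):
    """Direct test: does the suspected behavior yield the phenomenon?"""
    return phenomenon in outcomes.get(behavior, set())


def candidate_causes(phenomenon, outcomes=OUTCOMES):
    """All behaviors in the table that can yield the phenomenon."""
    return sorted(b for b, ps in outcomes.items() if phenomenon in ps)


def is_diagnostic(phenomenon, behavior, outcomes=OUTCOMES):
    """A phenomenon is diagnostic of a behavior only if nothing else
    in the table produces it."""
    return candidate_causes(phenomenon, outcomes) == [behavior]


if __name__ == "__main__":
    # The direct test succeeds (a positive result) ...
    print(produces("projectile impact", "impact fracture"))
    # ... yet the same phenomenon has several other candidate causes.
    print(candidate_causes("impact fracture"))
    print(is_diagnostic("impact fracture", "projectile impact"))
```

A positive result from `produces` corresponds to accommodation; only the check over all candidate causes addresses equifinality.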

In the study of complex phenomena, it may be more suitable to test the consequent of the hypothesis. This form of hypothesis testing generally takes the form of a predictive if:then statement, and relies on the concept of coherence. For example, if the hypothesis is correct, then we predict that we will also observe other specific phenomena (P…x). In this form of testing, confidence in the hypothesis increases as more and more instances of coherence are observed (i.e., more P…x are successfully predicted).

Arguably, in the study of complex phenomena hypothesis testing by falsification may produce the most robust results. In this form of testing, the hypothesis is considered valid provided it cannot be falsified; naturally, a successfully falsified hypothesis must be abandoned and an alternative sought.

The falsification process has demonstrated that DIFs are not, individually, or in small assemblages, diagnostic of weapon impact as they have been shown to be produced in low frequencies by knapping, retouch, and trampling activities. Granted, it has been adequately documented, both experimentally and at kill sites, that at an assemblage level, significant frequencies of DIFs (present on approximately 40% or more of a pointed tool assemblage) are indicative of weapon impacts (e.g., Frison 1974; Fischer et al. 1984; Bratlund 1996; Villa et al. 2009a, b). At present, specific weapon delivery technologies cannot, however, be reliably differentiated via DIFs.
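Why assemblage size matters here can be illustrated with a simple binomial calculation. The Python sketch below asks how likely it is that non-weapon processes alone would leave DIF-like damage on at least 40% of an assemblage. It assumes, purely for illustration, that each piece is damaged independently at a single fixed rate (here the approximately 3% figure cited above); neither assumption is claimed by the studies discussed, and the function names are hypothetical.

```python
from math import comb


def prob_at_least(k_min, n, p):
    """Binomial probability of k_min or more 'successes' in n trials."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k)
               for k in range(k_min, n + 1))


def prob_spurious_threshold(n, rate=0.03, threshold_percent=40):
    """Probability that spurious damage alone reaches the threshold share.

    n: number of pieces in the assemblage
    rate: assumed per-piece rate of spurious DIF-like damage
    threshold_percent: assemblage-level share treated as indicative
    """
    # Smallest whole number of pieces that is at least threshold_percent of n.
    k_min = (threshold_percent * n + 99) // 100
    return prob_at_least(k_min, n, rate)


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        print(n, prob_spurious_threshold(n))
```

Under these assumptions the probability is equal to the per-piece rate for a single artifact and falls steeply as the assemblage grows, which is consistent with treating high DIF frequencies as meaningful only at the assemblage level.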

The validity of morphological analyses (including morphometrics) rests on the rigor of an appropriate analogue, but can be seriously confounded by significant morphological similarities and overlaps between technologies. The robusticity of morphological analyses must be considered increasingly suspect with increased spatial and temporal separation from our analogue. Considered either individually or together, the presence of wear traces and residues on lithic artifacts can potentially indicate the area of contact, direction of motion, and the contact material. As such, they offer the greatest potential relative to the approaches discussed herein for the recognition of weaponry. In the absence of relational analogues, however, they are likewise, in and of themselves, incapable of differentiating weapon technologies.

Does all of this mean that we cannot trust our ability to identify any stone weapon armatures? Of course not. While the individual weapon identification methodologies critiqued above have failed, in the author’s opinion, to demonstrate adequate scientific rigor, when combined as multiple lines of evidence they benefit from the principle of coherence, and so are best employed in concert to provide identifications with varying levels of robusticity. The confounding factor of equifinality, however, still renders the multiple lines of evidence approach incapable of differentiating specific weapon delivery technologies. Of course the repeated associations of certain pointed artifact types and morphological characteristics, not only with kill sites and animal remains, but with good ethnographic analogues, are sufficiently reliable in most instances that we can be comfortable identifying classes of pointed artifacts as weapon armatures.

Where we start to encounter problems is when our assemblages are very small, or we wish to know whether a specific artifact actually served as a weapon armature; more tenuous still is our ability to associate the actual use of a specific artifact with a specific weapon technology (cf. Callow 1986; Villa and Lenoir 2006; Mussi and Villa 2008). These issues represent serious challenges and impact our basic ability to accurately reconstruct the lives of ancient peoples since weapon technologies affect not only subsistence focus and success, but also basic settlement patterning, resource scheduling, and myriad other issues that make up a culture. In fact, archaeologists intuitively recognize the inability of these methods to generate convincing results when important issues, such as those related to higher order concepts, are at stake. The question of Neanderthal use of ranged weaponry, for example, is one such issue. The same weapon identification methodologies discussed above have been employed to suggest that Middle Paleolithic Levallois and Mousterian points represent hafted weapon armatures rather than tools used for unspecialized tasks such as cutting, scraping, or other (and multiple) purposes (Shea 1988, 1990, 1991, 1993, 1995a, b, 1997, 2003a, 2006, 2009; Solecki 1992; Shea et al. 2001; Sisk and Shea 2009). This suggestion carries with it the connotation that Neanderthals possessed more sophisticated cognitive capacities than often credited, since they were capable of complex behavior (Shea 2003a, b; O’Connell 2006; see also McBrearty and Brooks 2000; cf. Shea 2011). My concern here is not to argue whether these Middle Paleolithic artifacts are, or are not, actual projectile points or even weapons, but rather, whether from a scientific point-of-view, the proffered evidence is logical and supportable, or whether it is instead attempting to reach beyond the limits of reasonable and supportable inference.

Due to the significance of the cognitive implications, the validity of the Neanderthal weaponry data has been met with apprehension and even skepticism (see Bordes 1961; Holdaway 1989; Anderson-Gerfaud 1990; Holdaway 1990; Debénath and Dibble 1994; Plisson and Béyries 1998; Kuhn and Stiner 2001). When one considers the suggested antiquity of projectile weaponry (e.g., Thieme 1997; cf. Shea 2006) it seems tempting to compare our Paleolithic cousins to recent hunter-gatherers, using ranged weapons to safely and efficiently harvest game; but in this instance ranged weapon use has yet to be demonstrated empirically. Even considering the evidence from Umm el-Tlel, Syria, of a Levallois point embedded in the cervical vertebra of a wild ass (Boëda et al. 1999), in the absence of a haft element, or some other supportive evidence, one can only speculate whether the point was thrust or thrown. In fact, we cannot even be sure that the piece was thrust as a spear armature, since it is entirely possible that it was thrust as a simple form of hafted dagger, perhaps to deliver a coup de grâce; a possibility also recognized by Sisk and Shea (2009: 2044), and indeed by Boëda et al. (1999) when they conclude that the piece was minimally “hafted onto the distal extremity of a shaft”, and that “the use of Levallois points as projectile weapons is only one of several functional possibilities” (Boëda et al. 1999: 401). Other examples of lithic artifacts embedded within animal bone are discussed by Villa et al. (2009a: 856–857), but are notably more recent. Even taking preservation issues into account, the fact that such associations are not more common suggests that at least some caution is indicated with respect to the issue of Neanderthal ranged weaponry. Considering also the increasing interest in the role of weapon technologies in human dispersal (e.g., McBrearty and Brooks 2000; Brooks et al. 2006; Shea 2006; Villa and Lenoir 2006; Churchill and Rhodes 2009; Shea and Sisk 2010), it seems a most propitious time to exercise caution and evaluate methodological robusticity.

Of course, it should not be the case that we require significant issues and implications before properly assessing the validity of a given methodology. The simple experimental results presented above, and replicated by others (Sano 2009; Pargeter 2011), illustrate obvious problems inherent in employing impact breakage patterns as evidence of projectile function at the level of the individual artifact. Due to their fragility, we should actually expect narrow, fragile tips of pointed artifacts to exhibit fracture damage, the majority of which may be directed parallel to the long axis of the artifact. These fractures can exhibit near-identical morphologies, and may have been caused by some form of “impact”, but there are many causes of impact, apart from, and in addition to, weapon use. From this perspective, so-called “diagnostic impact-fractures” are little more than a series of tip fractures found on both projectile points and other pointed lithic implements and debitage, that are not diagnostic of anything other than breakage, and the term “impact fracture”, as it pertains to a visual means of identifying weapon armatures, is rendered meaningless.

Does this mean that we must discard these studies? Not necessarily, as it is entirely possible that there is something useful here, but we do have to be aware of how these results were derived in order to assess their applicability and scientific validity. For example, the breakage patterns associated with thrusting- and projectile-weapon use, when observed on point types already known to generally have been used as weapon armatures, may reasonably be employed to construct functional hypotheses; such hypotheses are reasonably derived by invoking the Direct Historical Approach. One cannot, however, with respect to artifacts of unknown function, employ the correlation between projectiles and breakage patterns as evidence of use as a projectile; we are all aware that correlation is not causation.

Conclusion

Since hafted spear points may conceivably be used on similar contact materials, and with similar directions of use, as javelins, darts, and arrows, we can expect some similarities in the microwear, hafting traces, residues, TCSA/TCSP, and impact fractures exhibited by each. The end result is that archaeological identifications of weapon technologies that rely on visual recognition of morphology, or on any morphology-based metrics (including DIFs, whether alone or in connection with microwear traces, residues, or TCSA/TCSP), are incapable of independently and reliably identifying specific weapon technologies. The simple fact that pointed artifacts known to never have been used as weapons would be identified as “projectile points” serves to falsify the hypothesis that commonly employed existing criteria reliably identify weaponry. The author concedes, however, that by employing multiple lines of evidence, we can increase the robusticity of weapon identification.

In research where reasonable relational analogs exist to support the identification of specific projectile technologies, and specific artifacts as projectiles (e.g., by employing the Direct Historical Approach), this is much less of an issue. Unfortunately, studies that rely on our ability to identify weapon technologies in the archaeological record in the absence of good relational analogs, that is, those that ultimately rely on the common methodologies discussed herein rather than on direct evidence related to propulsion technology (i.e., association with notched or dimpled shafts, or other direct evidence of bow, spearthrower, or javelin technologies), have been built on a tenuous foundation.

As anthropologists, we are driven by our curiosity and our desire to find answers to questions regarding the human past, but as scientists we must be careful to adhere to the precepts of good science so that we can be confident in our results; only then can we proceed to build upon existing research. Replicative experimental archaeology can contribute, and has contributed, significantly to our understanding of weapon technologies, but we must take care that our methodologies are rigorous, and that our inferences are supported logically and empirically. To accomplish this, replication studies must successfully eliminate confounding factors and alternative explanations if they are to avoid instances of equifinality.