…first labour, after it and then with it speech – these are the two most essential stimuli under the influence of which the brain of the ape gradually changed into that of man.

Friedrich Engels, The Part Played by Labour in the Transition from Ape to Man, 1876.

Introduction

Many species make and use tools, but humans are distinguished by the extent and complexity of their technological behavior. It is likely that the neural prerequisites for this uniquely human facility evolved during the Paleolithic (2,600,000–10,000 years ago), a period which witnessed the origin (Semaw et al. 2003) and evolution (Stout 2011) of stone toolmaking. During the same time period, the hominin brain underwent a roughly threefold increase from ape to human proportions (Holloway et al. 2004a, b). Over 100 years of theorizing has linked Paleolithic toolmaking to human brain evolution (Oakley 1949; Holloway 1967; Engels 2003 [1876]), but empirical evidence bearing on the exact nature of this interaction has remained scant. In order to identify likely neuroanatomical targets of selection acting on toolmaking ability, we replicated the skill learning challenges faced by our early toolmaking ancestors and measured structural effects on the brain. This experiment builds on previous comparative work to test the hypothesis that stone toolmaking can elicit plastic structural responses in evolutionarily relevant brain structures. Such phenotypic plasticity enables learning to facilitate evolutionary adaptation via what is commonly termed the “Baldwin effect” (Weber and Depew 2003; Bateson 2004), potentially providing a mechanistic link between Paleolithic toolmaking and human brain evolution.

Our experiment builds on two lines of research into human brain evolution: comparative studies with nonhuman primates and experimental “neuroarchaeological” studies with modern humans replicating Paleolithic technology. These methods have complementary strengths and weaknesses, and are best used in conjunction. Comparative studies provide evidence of differences in brain structure and function between extant species that can be used to identify evolutionary changes specific to the human lineage. Comparative studies cannot, however, reveal the more specific timing and behavioral context of adaptations that occurred since the last common ancestor of species under consideration [e.g., chimpanzees and human LCA, ca. 7–8 million years ago (Langergraber et al. 2012)]. The archeological record provides an underutilized resource to fill this gap. Archeological evidence allows the detailed reconstruction of ancient human behaviors, which may then be studied using experimental neuroscience methods. A limitation of this approach is that it must use evidence from modern human subjects to support inferences about brain responses in pre-modern hominins. It is thus important that neuroarchaeological hypotheses and interpretations be constrained by comparative evidence. Here we focus on shared brain systems known to support tool use in humans and nonhuman primates, and which also show additional human specializations for tool use. The substantial homology of these systems across nonhuman primates and modern humans makes it unlikely that ancestral hominins displayed a uniquely divergent overall functional organization. Moreover, experimental results linking particular, archeologically attested toolmaking behaviors to modern human specializations within this network support inferences regarding the likely timing and behavioral context for the emergence of these specializations.

Comparative studies with nonhuman primates point toward human specializations in frontoparietal tool-use networks. For example, humans activate a region of anterior supramarginal gyrus during the observation of tool use actions, while macaque monkeys fail to activate the homologous region (Peeters et al. 2009). This region responds to the rigid kinematics of tools as opposed to the biological motion of hands (Peeters et al. 2013) and, together with enhanced parietal contributions to 3-dimensional form-from-motion perception (Vanduffel et al. 2002), has been proposed to support uniquely human tool use capacities (Peeters et al. 2009; Orban and Rizzolatti 2012). While macaques do not naturally use tools, experimentally tool-trained macaques show increases in gray matter in superior temporal sulcus, secondary somatosensory cortex, and intraparietal sulcus (Quallo et al. 2009). Compared to humans, macaques also have relatively more prefrontal activation during object perception (Denys et al. 2004), and relatively more frontal and less parietal activation during the perception of grasping actions (Nelissen et al. 2005). A comparison between human and chimpanzee brain activation during the observation of object-directed grasping found relatively greater human activation in ventral premotor cortex, inferior parietal cortex, and inferotemporal cortex, with human activation being generally more posterior than chimpanzee activation (Hecht et al. 2013b). Anatomically, lateral frontal, parietal and temporal association cortices are among the most volumetrically expanded portions of the human brain (Hill et al. 2010; Avants et al. 2006). This parallels comparative studies of frontoparietal white matter connectivity, which have reported relatively greater human connectivity with parietal and temporal regions (Rilling et al. 2008; Hecht et al. 2013a) in tracts that may be involved in human tool use (Ramayya et al. 2010). We have argued (Hecht et al. 2013b) that this indicates a greater contribution for “bottom-up” representation of observed action in the human brain, with more neural resources devoted to processing the details of objects and movements. Because action understanding is fundamental to tool use, social learning, and cumulative culture, we suggest that the evolution of these functions may have involved modifications of frontoparietal interactions.

Neuroarchaeological methods have previously been applied in functional studies of Paleolithic toolmaking using FDG-PET (Stout and Chaminade 2007; Stout et al. 2008), fMRI (Stout et al. 2011), and transcranial Doppler ultrasound (Uomini and Meyer 2013). Echoing the comparative evidence of frontoparietal elaboration in human evolution, experimental results have identified a distributed, bilateral frontoparietal network supporting stone toolmaking in modern humans. Some of these activation loci are plotted in Fig. 8. Early toolmaking (“Oldowan”, ca. 2.6 million years ago) activates ventral premotor and parietal regions identified by comparative studies as likely foci of human perceptual-motor specializations for tool use (see above). More advanced toolmaking (“Acheulean handaxe”, ca. 500,000 years ago) generates increased activation of inferior prefrontal cortex (Stout et al. 2008, 2011), another major locus of structural and functional change in human brain evolution (Falk 1983; Holloway et al. 2004a, b; Schenker et al. 2008; Teffer and Semendeferi 2012). This supports an evolutionary scenario in which perceptual-motor adaptations enabled the initial stages of human technological evolution, whereas later developments were dependent on enhanced cognitive control (Faisal et al. 2010; Stout 2010). Furthermore, observed overlap in functional neuroanatomy (Stout and Chaminade 2012) and the time-course of hemodynamic responses (Uomini and Meyer 2013) between stone toolmaking and language has renewed support for longstanding hypotheses of an evolutionary connection between these two distinctive human capacities (Holloway 1967). In keeping with broader “gestural origin” hypotheses of language evolution (e.g., Arbib 2005; Corballis 2002), it has been proposed that adaptations for toolmaking were evolutionarily co-opted [“exapted” (Gould and Vrba 1982)] at various points in human evolution to support proto-linguistic communicative behaviors (Pulvermüller and Fadiga 2010; Greenfield 1991; Stout and Chaminade 2012).

Together, the comparative (Peeters et al. 2009, 2013; Vanduffel et al. 2002; Orban and Rizzolatti 2012; Denys et al. 2004; Nelissen et al. 2005; Hecht et al. 2013a, b; Rilling et al. 2008; Ramayya et al. 2010) and neuroarchaeological (Stout and Chaminade 2007; Stout et al. 2008, 2011; Uomini and Meyer 2013; Faisal et al. 2010) studies reviewed here suggest that selection pressure for toolmaking ability may have influenced the evolution of human inferior frontoparietal regions. However, task-related functional activations cannot show that stone toolmaking actually generates pressure for structural changes of the kind (e.g., frontoparietal expansion and increased connectivity) indicated by the comparative evidence. To test this, we replicated the learning challenges faced by our early toolmaking ancestors and measured structural effects on the brain. Six subjects underwent an intensive, 2-year training program in a variety of archeologically attested Paleolithic toolmaking methods (Fig. 1). Structural MRI and DTI scans were collected at the start (Time 1), during (Time 2), and at the end (Time 3) of training. Individual training time and sensorimotor performance data for these time points were also collected, in order to directly test the correlation of observed structural changes with stone toolmaking training (cf. Thomas and Baker 2013). Stone toolmaking is a complex, real-world craft and a simple, objective method for quantifying variation in overall toolmaking “skill” does not exist. We chose to measure one key aspect of toolmaking skill, the accuracy of percussion strikes, using a controlled experimental task. Previous research has shown that control over percussive accuracy is critical to the planning and control of toolmaking outcomes and varies with toolmaking experience (Nonaka et al. 2010).

Fig. 1

Intensive 2-year training program in Paleolithic stone toolmaking. a Training included direct instruction and individual practice, both on site and during intensive field trips (04/2011, 09/2011, 04/2012). b Stone tools produced by a representative subject at T1, T2, and T3. The last three tools were created using porcelain which was coated with red paint in order to facilitate recognition of the original surface (Khreisheh et al. 2013). c An actual Paleolithic stone tool (Acheulean handaxe, ~500,000 years old)

We expected that, if stone toolmaking does indeed place acute demands on the anatomical structure of frontoparietal cortex, this should be reflected in measurable changes in gray and white matter in this region and that the magnitude of these changes should correlate with training time and behavioral measures. These are conservative predictions because it is likely that the evolved frontoparietal cortices of our modern human subjects are already structurally well-adapted to the skilled use of tools, and thus less likely to show additional plastic response to stone toolmaking practice. If toolmaking nevertheless generates a measurable response in modern human subjects, this provides a strong indication that comparable toolmaking by ancestral hominins would have stressed homologous substrates to an equal or greater extent.

Materials and methods

Subjects

Subjects were recruited from the undergraduate and postgraduate programs in Archaeology at Exeter University. Subjects were aged 18–25 at the time the first scan was collected; five were male and one was female. All were right-handed by self-report and had no neurological or psychiatric illness. All subjects consented to participate and the study was approved by the Ethics Committee at Exeter University.

Toolmaking training

Paleolithic stone toolmaking involves striking a stone “core” with a “percussor” of bone, antler, or stone to detach carefully controlled chips and incrementally achieve various design goals. This demanding task requires precise sensorimotor coordination and strategic action sequence planning. Training occurred on site at Exeter University and during intensive field trips to the USA, France, and Denmark. Subjects were trained by Bruce Bradley, Ph.D., Director of the Experimental Archaeology Masters Programme at Exeter University, and an expert stone toolmaker with decades of experience teaching Paleolithic technologies. Training included instruction, coaching, and demonstration as well as independent practice, which was recorded by subjects in a log book. The Paleolithic toolmaking methods learned by subjects included: (1) basic flake production, comparable to the earliest known (Oldowan) tools of Homo habilis 2.6–1.5 million years ago (mya); (2) “Handaxe” making, comparable to the Acheulean tools of Homo erectus and Homo heidelbergensis 1.7–0.25 mya; and (3) “prepared core” flake production, comparable to the Levallois tools of Neanderthals and early Homo sapiens <0.25 mya. Training was naturalistic and self-paced, resulting in substantial variation across subjects in the duration, intensity, and content of practice.

Paleolithic toolmaking occurred over a vast time period and across many millions of square miles, and naturally encompasses a great deal of variation that could not be included in the methods learned by our subjects. The methods we did select are considered broadly representative of Early and Middle Paleolithic technology, whereas details of the production techniques employed closely match those documented in specific archeological collections (e.g., Stout et al. 2014, Bradley and Sampson 1986). This gives us confidence that our training protocol was both generally representative and specifically accurate in re-creating learning challenges actually faced by Paleolithic toolmakers.

Motor performance

Motor testing took place at the Brain and Behaviour Lab, Department of Bioengineering and Department of Computing, Imperial College London, on the same days as MR image acquisition. Due to scheduling constraints performance data were collected for only five (four male) of the six subjects. All subjects provided written informed consent, and experiments were carried out in accordance with institutional guidelines. A local ethics committee approved the experimental protocols.

Subjects held in their right hand an object (250 g, hand-sized cylinder) with an embedded Liberty motion tracking node sampled at 240 Hz, calibrated to sub-millimeter precision and tracked by a Polhemus Liberty electromagnetic tracking system [POLHEMUS, Colchester (VT)]. The measurement markers of this system recorded horizontal, vertical and depth position of the marker in a calibrated reference coordinate system (see Faisal et al. 2010). The coordinate system was aligned veridically with an augmented reality projection system linked to a computer controlling the experiment and processing the position information in real time (akin to Faisal and Wolpert 2009).

Subjects were instructed to strike from three different strike positions towards a fixed target point. The start points were located on a line 7.5, 24, and 40 cm away from the target point. The spatial position of the start point and target were located on an imaginary line, and mimicked the motion of percussion strikes during natural knapping. Start positions were randomly selected before each trial. Subjects had to move the marker to the start position before being able to initiate the strike. Subjects could initiate a strike at any time of their choosing, but once movement was initiated subjects had a time constraint of 200 ms to hit the target, or had to repeat the trial (classified as miss). Invalid trials (e.g., movement not executed in time window) were discarded. Subjects first performed 500 trials with a circular target of 2 cm, then 500 trials with a circular target of 1 cm, for a total average of 920 ± 31.7 (range 845–991) valid trials per session. We calculated accuracy (proportion of valid trials passing through target) per session and tested for correlation between changes in accuracy and changes in fractional anisotropy (FA, see Sect. “Tract-based spatial statistics”) using Pearson’s r correlation coefficient.
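The accuracy measure and its correlation with FA change can be illustrated with a minimal sketch. The function names and all numerical values below are hypothetical and chosen only for illustration; they are not data from this study, and the sketch is not the analysis code that was actually used.

```python
import math

def session_accuracy(hits):
    """Proportion of valid trials whose strike passed through the target.

    `hits` holds one boolean per valid trial (invalid trials already removed).
    """
    return sum(hits) / len(hits)

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-subject changes between two sessions (illustrative only).
delta_accuracy = [0.10, 0.06, 0.08, -0.04, 0.02]
delta_fa = [0.020, 0.011, 0.015, -0.006, 0.004]
r = pearson_r(delta_accuracy, delta_fa)
```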

Image acquisition and preprocessing

Imaging took place at the Wellcome Department of Imaging Neuroscience in London. All subjects consented and the research was approved by the National Hospital for Neurology and Neurosurgery and Institute of Neurology Joint Research Ethics Committee. 61-direction DTI images were acquired on a Siemens Trio 3.0 T scanner with a voxel size of 1.7 mm3. Seven volumes were acquired with no diffusion weighting (B0s); these images provide a “baseline” from which diffusion magnitude and directionality can be measured. The FSL software package (Smith et al. 2004; Woolrich et al. 2009; Jenkinson et al. 2012) was used for image processing and analysis. The first B0 image from each scan was discarded to allow for equipment warm-up and the remaining 6 were averaged. FSL’s brain extraction tool (BET) (Smith 2002) was applied to the averaged B0s, which were then aligned to template space using a nonlinear registration algorithm (FNIRT) (Andersson et al. 2007) with an initial affine registration (FLIRT) (Jenkinson and Smith 2001; Jenkinson et al. 2002). FSL’s eddy correction algorithm (EDDY) was used to correct distortion caused by eddy currents. DTIFIT, part of FSL’s FDT package (Behrens et al. 2003) was used to fit the raw diffusion data to a tensor model at each voxel to obtain diffusion scalars, including fractional anisotropy. Dates for each subject’s scans at Time 1, 2, and 3 are listed in Table 1.
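The B0 handling described above (discard the first of the seven non-diffusion-weighted volumes, average the remaining six) can be sketched as follows. This is an illustrative assumption about the arithmetic only: volumes are represented as flat lists of voxel intensities, and the function name is hypothetical rather than an FSL command.

```python
def average_b0(b0_volumes, n_discard=1):
    """Voxelwise mean of the B0 volumes after dropping the first `n_discard`.

    Each volume is a flat list of voxel intensities; all volumes must have
    the same length.
    """
    kept = b0_volumes[n_discard:]
    if not kept:
        raise ValueError("no B0 volumes left after discarding")
    n_voxels = len(kept[0])
    return [sum(vol[i] for vol in kept) / len(kept) for i in range(n_voxels)]

# Seven toy two-voxel volumes; the first (warm-up) volume is discarded.
toy_b0s = [[100, 100]] + [[i, 2 * i] for i in range(1, 7)]
baseline = average_b0(toy_b0s)
```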

Table 1 Time points for each subject’s MRI and DTI scans, given in days from the beginning of the study

Registration to template space

Since the goal of the present study was to compare scans within subjects across time, we used an approach similar to a previous longitudinal DTI study (Engvig et al. 2012) in order to ensure accurate within-subject alignment across time points despite potential differences in the subject’s position in the scanner or in equipment calibration. First, for each time point, each subject’s fractional anisotropy (FA) image, which is a scalar measure of diffusion directionality at each voxel, was linearly registered to the FMRIB 1 mm3 FA template, which is an MNI-space average of FA images from 58 subjects. Next, FSL’s midtrans algorithm was used to find the transformation matrix from a geometrically intermediate space of each subject’s 3 FA images to the FA template (their “midspace”). This matrix was inverted and applied to the FA template to transform the template into each subject’s FA midspace. Each subject’s 3 native-space FA images were then linearly registered to their own midspace template. These images were fed into the FSL software package’s standard TBSS pipeline (see below), which includes nonlinear registration to a common template space. For probabilistic tractography, each subject’s midspace FA template was nonlinearly aligned to the MNI-space FMRIB 1 mm3 FA template. This warp was inverted and combined with the linear midspace-to-native-space transformation matrix in order to produce the MNI space-to-native-space transformation matrix for each subject, which was used to apply standard-space regions of interest to native diffusion space (Engvig et al. 2012). The inverse transformation (a linear native space-to-midspace transform combined with a nonlinear midspace-to-MNI-space warp) was used to register all tractography results into a common template space for viewing.

Tract-based spatial statistics

DTI is a structural neuroimaging method that is sensitive to the diffusion of water. Because water diffusion is more directionally constrained and homogenous inside than outside an axon bundle, DTI measurements can be used to trace white matter pathways and to identify changes in those pathways over time (for a primer, see Mori and Zhang, 2006). FSL’s Tract-Based Spatial Statistics (TBSS) package (Smith et al. 2006) was used to identify regions of white matter with changes in white matter integrity over time. This algorithm has been successfully used by multiple previous studies to identify changes in white matter and links between white matter variation and behavioral variation (e.g., Scholz et al. 2009; Schlegel et al. 2012; Lee et al. 2010; Taubert et al. 2010). TBSS performs voxelwise statistical analysis of fractional anisotropy (FA) images. FA reflects constraints on the directionality of water diffusion and is thought to be related to the degree of axon myelination, axon diameter, and/or tract density (Zatorre et al. 2012). Each subject’s FA images for each time point were first nonlinearly registered to the FMRIB 1 mm3 FA template, which is an MNI-space average of FA images from 58 subjects. All subjects’ template-aligned FA images were then averaged to create a study-specific mean FA image at all time points. This mean FA image was thinned to create a mean FA skeleton representing the centers of all white matter tracts common to the group. Aligned FA data from each subject at each time point was then projected onto this skeleton for voxelwise cross-time point statistical comparisons.

Previous functional studies of stone toolmaking found task-related activations in inferior frontoparietal regions (Stout and Chaminade 2007; Stout et al. 2008, 2011), so statistical comparison was constrained to the portion of the white matter skeleton that fell within the white matter beneath lateral frontoparietal cortex. This ROI was based on the superior longitudinal fasciculus (SLF) ROI from the JHU White Matter Atlas included in FSL, which was generated using probabilistic tractography in 28 subjects (Wakana et al. 2007; Hua et al. 2008). While the atlas’s probabilistic SLF ROI terminates around the vicinity of premotor cortex, previous functional imaging studies have indicated that more anterior regions of inferior frontal gyrus within the atlas may be involved in toolmaking (Stout et al. 2008, 2011; Stout and Chaminade 2012). Monkey tract-tracing studies have shown that these anterior ventrolateral prefrontal regions do indeed receive white matter connections from the SLF (Schmahmann et al. 2007; Petrides and Pandya 2009), and diffusion imaging studies have successfully tracked these connections in both monkeys and humans (Schmahmann et al. 2007; Hecht et al. 2013a; Thiebaut de Schotten et al. 2012). Therefore, we thresholded the atlas’s SLF ROI at 25 % and then manually extended it toward the anterior aspect of the inferior frontal gyrus. The threshold of 25 % was chosen as a conservative compromise between including voxels belonging to any subject in the probabilistic atlas (which would reflect a wide range of inter-subject variability) and including only voxels common to all subjects in the probabilistic atlas (which would create a very small ROI). This ROI was then used to mask the mean FA skeleton. As a control tract, we included the forceps major ROI from the JHU White Matter Atlas (Wakana et al. 2007; Hua et al. 2008), also thresholded at 25 %. Like the SLF, the forceps major is a cortico-cortical tract. 
It carries callosal connections between the left and right primary visual cortices and other early visual areas. To our knowledge there is no reason to suspect that this tract would be altered by toolmaking training. A single analysis was carried out on one mask that included both the SLF and forceps major in order to use a single test to establish whether toolmaking training had an effect on the SLF but not the forceps major (Nieuwenhuis et al. 2011).
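The construction of the combined analysis mask can be sketched as below. This is a simplified illustration under stated assumptions: atlas probabilities are expressed as fractions (0 to 1), voxels at or above the threshold are retained, images are flat lists, and all function names are hypothetical. The manual anterior extension of the SLF ROI is not modeled.

```python
def threshold_roi(prob_map, threshold=0.25):
    """Binarize a probabilistic atlas ROI, keeping voxels at or above threshold."""
    return [p >= threshold for p in prob_map]

def union_masks(*masks):
    """Combine several binary masks into one (a voxel is kept if any mask has it)."""
    return [any(voxel) for voxel in zip(*masks)]

def mask_skeleton(skeleton_fa, mask):
    """Zero out skeleton voxels that fall outside the mask."""
    return [fa if keep else 0.0 for fa, keep in zip(skeleton_fa, mask)]

# Toy four-voxel example: experimental (SLF) and control (forceps major) ROIs.
slf = threshold_roi([0.10, 0.25, 0.60, 0.00])
forceps_major = threshold_roi([0.00, 0.00, 0.10, 0.90])
combined = union_masks(slf, forceps_major)
masked = mask_skeleton([0.5, 0.4, 0.3, 0.2], combined)
```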

Statistical comparisons were carried out using randomise, FSL’s tool for permutation-based testing with a general linear model (GLM) (Nichols and Holmes 2002; see Friston et al. 1994 for a discussion of the integration of general linear models with statistical parametric maps in neuroimaging). GLMs were constructed to compare FA changes along the 3 time points, from Time 1 to Time 2, Time 2 to Time 3, and Time 1 to Time 3 (i.e., not considering Time 2), both across the whole brain and in the SLF ROI. The GLMs incorporated the time point of each scan for each subject in days, with Time 1 being day 0. Threshold-free cluster enhancement (TFCE) (Smith and Nichols 2009) was used to identify statistically significant clusters of voxels. Results were corrected for multiple comparisons and thresholded at p < .05.
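The logic of permutation testing for a within-subject contrast can be illustrated with a deliberately simplified sketch: a one-sided, exhaustive sign-flipping test on paired differences at a single voxel. This is not FSL's implementation; it omits the GLM covariates and TFCE, and the function name and data are hypothetical.

```python
from itertools import product

def sign_flip_p_value(diffs):
    """One-sided exact sign-flip permutation p value for the mean of paired differences.

    Under the null hypothesis each within-subject difference is equally likely
    to be positive or negative, so every pattern of sign flips is enumerated.
    """
    n = len(diffs)
    observed = sum(diffs) / n
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        total += 1
        stat = sum(s * d for s, d in zip(signs, diffs)) / n
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            as_extreme += 1
    return as_extreme / total

# Six hypothetical subjects, all showing an increase (illustrative values only).
p = sign_flip_p_value([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

With six subjects there are only 2^6 = 64 possible sign patterns, so the smallest p value obtainable from this simplified test is 1/64 (about .016).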

In clusters that showed significant FA change over time, we also examined axial and radial diffusivity. Fractional anisotropy is a measure of diffusivity in the primary diffusion direction (axial diffusivity) relative to diffusion in the other two perpendicular directions (radial diffusivity: the mean of these two latter measurements); changes in FA can therefore be due to underlying changes in axial and/or radial diffusivity. Preliminary research suggests that these may correspond to different biological mechanisms of change, although additional research in this area is warranted (Wheeler-Kingshott and Cercignani 2009). Increased axial diffusivity may reflect increases in axon density, axon caliber, microtubule packing, and/or microtubule organization (Kumar et al. 2010, 2012; Song et al. 2002, 2003; Choe et al. 2012), whereas decreased radial diffusivity may reflect increased axon myelination (Keller and Just 2009; Zhang et al. 2009; Bennett et al. 2010).

Radial diffusion images were created by averaging the images representing the magnitude of diffusion in the second and third directions; axial diffusion images were generated automatically by FSL in the standard DTI preprocessing pipeline. Both radial and axial images were then subjected to the same registration and processing steps as the FA images in order to enable measurement in equivalent voxels within and across subjects.
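The relationship between FA, axial diffusivity, and radial diffusivity can be made concrete with the standard tensor-eigenvalue formulas. The sketch below assumes the three eigenvalues at a voxel are already sorted in descending order; the function name and example values are hypothetical.

```python
import math

def diffusion_scalars(l1, l2, l3):
    """Return FA, axial diffusivity (AD), and radial diffusivity (RD).

    l1 >= l2 >= l3 are the eigenvalues of the diffusion tensor at one voxel.
    """
    mean_diffusivity = (l1 + l2 + l3) / 3.0
    numerator = ((l1 - mean_diffusivity) ** 2
                 + (l2 - mean_diffusivity) ** 2
                 + (l3 - mean_diffusivity) ** 2)
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    fa = math.sqrt(1.5 * numerator / denominator) if denominator > 0 else 0.0
    return {
        "FA": fa,               # 0 = isotropic, 1 = diffusion along one axis only
        "AD": l1,               # diffusivity along the primary direction
        "RD": (l2 + l3) / 2.0,  # mean of the two perpendicular diffusivities
    }

# Illustrative eigenvalues (arbitrary units), not measurements from this study.
example = diffusion_scalars(1.7, 0.4, 0.2)
```

Because FA depends on all three eigenvalues, an FA increase can arise from a rise in AD, a fall in RD, or both, which is why the two measures are examined separately.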

Voxel-based morphometry

Voxel-based morphometry is an approach to identifying structural differences in gray matter between subjects or within subjects across time which involves registering all scans to a common space and then measuring the spatial difference between each scan and the group average (for a primer, see Ashburner and Friston 2000). FSL’s VBM tool (Douaud et al. 2007) was used to compare gray matter volume across time points. This algorithm performs voxelwise statistical analysis of T1-weighted MRI images. Scans were segmented into CSF, white matter, and gray matter, with the latter voxels being retained for further analyses. A study-specific, bilaterally symmetric gray matter template was produced by nonlinearly registering each scan to the MNI T1 template, flipping each scan along the midline, and then averaging all scans. Each subject’s native gray matter data was then nonlinearly registered to this template and modulated to correct for local expansion or contraction resulting from nonlinear registration, and then smoothed using a Gaussian kernel with a sigma of 3 mm (which is equivalent to a FWHM of approximately 7.06 mm). VBM compares individual subjects’ template-space images to the template-space average, and intensity differences between the two indicate differences in gray matter volume. Analysis was constrained to gray matter regions connected by SLF (BA 45, 44, 6, 40, 39, and 37; the regions of experimental interest) and regions of early visual cortex (BA 17 and 18), since these regions are linked by the forceps major, the tract used as a control ROI in the TBSS analysis. As in the TBSS analysis, both the experimental and control regions were combined into one mask and subjected to a single statistical test in order to establish whether toolmaking training affected the gray matter areas connected by the SLF but not the gray matter areas connected by the forceps major (Nieuwenhuis et al. 2011). Voxelwise statistical comparisons of gray matter expansion/contraction were carried out using the same GLMs as the TBSS analyses, and threshold-free cluster enhancement (TFCE) (Smith and Nichols 2009) was used to identify statistically significant clusters of voxels. Clusters were thresholded at p < .01 but were not corrected for multiple comparisons.
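The smoothing-kernel width can be converted between sigma and full width at half maximum (FWHM) using the standard Gaussian relationship FWHM = 2·sqrt(2·ln 2)·sigma. The helper name below is hypothetical and the snippet is only a worked check of that arithmetic.

```python
import math

def sigma_to_fwhm(sigma_mm):
    """Convert a Gaussian kernel's standard deviation to its FWHM (same units)."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma_mm

# A sigma of 3 mm corresponds to a FWHM of roughly 7.06 mm.
fwhm_mm = sigma_to_fwhm(3.0)
```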

Tractography from regions of white matter with significant change in fractional anisotropy

In order to obtain information about the anatomical connectivity of clusters showing significant changes in fractional anisotropy in the TBSS analysis, we used these clusters as seeds for further tractography analyses. Tractography was performed on each seed individually. Tracts were thresholded, normalized, binarized, and summed to produce composite images following the same approach as in the control tractography analyses. These tracts were produced to give anatomical elaboration to the TBSS results, and were not statistically compared across time points since they were the result of a previous longitudinal statistical analysis.

Tractography was performed in native diffusion space using FSL’s BEDPOSTX and probtrackx tools (Behrens et al. 2003). BEDPOSTX uses Markov chain Monte Carlo sampling of the raw diffusion information to build a Bayesian distribution of diffusion information at each voxel. Probtrackx samples from these distributions in order to generate probabilistic streamlines; in the resultant tractography output images, the intensity at each voxel corresponds to the number of streamlines that reached that voxel (so greater intensity corresponds to a greater probability of connectivity within the model). Both BEDPOSTX and probtrackx model diffusion information not only along the principal direction of diffusion, but also along the other two orthogonal axes. Importantly, sampling of the Bayesian distributions for these additional diffusion directions allows tractography through regions with low fractional anisotropy, i.e., in regions of crossing fibers within white matter, and through the gray matter/white matter boundary (Behrens et al. 2007).

Results

Changes in white matter fractional anisotropy

FA changes occurred in white matter leading into left supramarginal gyrus (SMG), bilateral ventral precentral gyrus (vPrCG), and right inferior frontal gyrus pars triangularis (IFGpt). FA increased in these locations from Time 1 to Time 2 and decreased in almost identical locations from Time 2 to Time 3, although note that the vPrCG increase occurred in the right hemisphere and the vPrCG decrease occurred in almost exactly the same location in the left hemisphere (Fig. 2). No changes were observed in the forceps major. Coordinates, size, baseline FA values, and t values for FA changes for each cluster are reported in Table 2. The connectivity of each of these clusters is shown in Fig. 8 (described below).

Fig. 2

Structural changes associated with the acquisition of Paleolithic stone toolmaking skills. Changes in SLF white matter (p < .05, corrected). Yellow–red increases from T1 to T2; cyan–blue decreases from T2 to T3. Insets show mean FA at each time point. Note the close overlap in the spatial location of increases/decreases in IFGpt and SMG; increases/decreases were observed in very similar locations of vPrCG in each hemisphere. While FA within the R vPrCG cluster continued to increase from T2 to T3, note that this was not significant (i.e., the voxelwise analysis that identified clusters showing significant change from T2 to T3 did not find such an increase at this location). R/L vPrCG right/left ventral precentral gyrus, R aIFG right anterior inferior frontal gyrus, L pSMG left posterior supramarginal gyrus

Table 2 Baseline FA, t scores for FA change, volume, and coordinates for clusters showing significant change in FA over time

In the clusters that showed significant FA change, we also measured axial and radial diffusivity in order to determine whether change in one or both of these measures was driving change in FA. From Time 1 to Time 2, axial diffusivity showed a significant increase [paired samples t test; t(17) = 5.299, p < .001, 2-tailed], while radial diffusivity showed no significant change [paired samples t test; t(17) = −.954, p = .353, 2-tailed]. From Time 2 to Time 3, axial diffusivity showed a significant decrease [paired samples t test; t(17) = −2.565, p = .020, 2-tailed], while radial diffusivity again showed no significant change [paired samples t test; t(17) = −.596, p = .559, 2-tailed], suggesting that FA changes may have been due to changes in axon density, axon caliber, microtubule packing, and/or microtubule organization, rather than changes in myelination (Keller and Just 2009; Zhang et al. 2009; Bennett et al. 2010; Kumar et al. 2010, 2012; Song et al. 2002, 2003; Choe et al. 2012).
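For readers unfamiliar with the statistic reported above, a paired samples t test can be sketched as follows. The function name and input values are hypothetical and serve only to show the computation; they are not the diffusivity measurements from this study.

```python
import math

def paired_t(before, after):
    """Paired samples t statistic and its degrees of freedom.

    `before` and `after` are equal-length lists of matched measurements.
    """
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)  # sample variance
    t_stat = mean_diff / math.sqrt(variance / n)
    return t_stat, n - 1

# Three illustrative matched pairs.
t_stat, df = paired_t([0.0, 0.0, 0.0], [1.0, 2.0, 3.0])
```

The degrees of freedom equal the number of matched pairs minus one, so a reported t(17) corresponds to 18 paired observations.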

Correlation with training time

White and gray matter changes mirrored the intensive training (mean 120 h) from Time 1 to Time 2 and the relative decrease (mean 47 h) from Time 2 to Time 3 (see Fig. 1). Total FA change (∆FA) was correlated with training hours since the previous scan (r = .580, p = .024, 1-tailed) (Fig. 3). ∆FA values within each cluster individually were not significantly correlated with training time, likely reflecting individual variability in the allocation of structural responses (Sect. "Variability in FA change across subjects").
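A Pearson correlation can be related to a t statistic through t = r·sqrt((n − 2)/(1 − r²)), with n − 2 degrees of freedom. The short Python sketch below illustrates that conversion for the reported r = .580. The number of observations is not stated in this section; n = 12 (six subjects, each contributing two between-scan intervals) is an assumption made purely for illustration, and the function name is hypothetical.

```python
import math


def pearson_r_to_t(r, n):
    """Convert a Pearson r from n paired observations to a t statistic (df = n - 2)."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


r_reported = 0.580   # value reported in the text
n_assumed = 12       # ASSUMPTION: 6 subjects x 2 intervals; not stated in the source
t_value = pearson_r_to_t(r_reported, n_assumed)
print(f"t({n_assumed - 2}) = {t_value:.3f}")
```

Under the assumed n, this gives a t of about 2.25 on 10 degrees of freedom, which can then be checked against a one-tailed t table.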

Fig. 3

Correlation between change in FA and practice hours. Gray lines denote 95 % confidence interval. Pearson’s r = .580, p = .024, 1-tailed

Correlation with motor performance

Across all time points, accuracy ranged from .74 to .96 (mean = .86, SD = .05). Accuracy at Time 3 did not differ significantly from Time 1 (paired t test, t = 1.717, p = .161), reflecting substantial variability across individuals (range +.10 to −.04). Total FA change (∆FA) was correlated with changes in accuracy (∆accuracy r = .616, p = .029, 1-tailed) (Fig. 4). Performance differed (t = 3.226, df = 8, p = .012) between the two subject groups identified from PCA factor scores, with Group 1 showing a mean ∆accuracy of +4 % across time points and Group 2 a mean of −1 %. Thus, Group 1 subjects improved their striking accuracy over the training period (mean = +8 %, paired t = 5.869, df = 2, p = .028), whereas Group 2 subjects did not (mean = −1 %, paired t = −.467, df = 1, p = .722). From Time 1 to Time 2, ∆accuracy correlated with PCA Component 2 factor scores (r = .878, p = .025, 1-tailed), whereas from Time 2 to Time 3 it correlated with Component 1 (r = .973, p = .003, 1-tailed). Analysis at the level of individual clusters is consistent with this: from Time 1 to Time 2 ∆accuracy is positively correlated with right vPrCG (r = .822, p = .044, 1-tailed) and negatively correlated with right IFGpt (r = −.989, p = .001, 1-tailed), whereas from Time 2 to Time 3, ∆accuracy is positively correlated with right IFGpt (r = .837, p = .039, 1-tailed).

Fig. 4

Correlation between change in FA and change in accuracy on striking task. Gray lines denote 95 % confidence interval. Pearson’s r = .616, p = .029, 1-tailed

Variability in FA change across subjects

Closer examination of each subject’s pattern of FA changes across clusters reveals substantial individual variability (Fig. 5). A principal components analysis identified two components accounting for 83 % of this variance (Fig. 6). Component 1 (61 % of variance) captures the covariance of all clusters apart from the right vPrCG, while Component 2 (22 % of variance) primarily captures variation in right vPrCG, including a negative correlation with right IFGpt. Inspection of individual factor scores for Component 2 suggests the presence of two alternative responses to training. Group 1 (subjects 1, 3, 5 and 6) is characterized by positive loadings from Time 1 to Time 2 and Group 2 (subjects 2 and 4) by negative loadings (Fig. 7). For all subjects, this pattern reverses from Time 2 to Time 3, reflecting the symmetric decreases in FA noted above (Fig. 2). Training time was not correlated with PCA factor scores.

Fig. 5

Individual variation in pattern of FA change across clusters

Fig. 6

Factor loadings of each cluster in each of two components identified by principal components analysis. Note that the left hemisphere clusters have very similar factor loadings (dashed circle)

Fig. 7

Individual subjects’ factor scores for Component 2. Note that subjects are arranged on the x-axis to emphasize the distinction between Subjects 2 and 4 versus the others

Changes in gray matter volume

The VBM analysis in regions surrounding the SLF found gray matter expansion from Time 1 to Time 2 in the supramarginal gyrus, and a reversal of this effect from Time 2 to Time 3 (Fig. 8c), mirroring the pattern of change in the TBSS analysis. This effect was significant at the p < .01 level without correction for multiple comparisons, and should therefore be considered a trend. At this threshold there were no voxels showing gray matter change over time in visual cortex.

Fig. 8

Regions of white matter change project to regions of gray matter change, and to regions of gray matter activated in previous Paleolithic toolmaking studies. Clusters of white matter change (Fig. 2) were used to seed tractography analyses. In a–c, hot scale indicates increases from T1 to T2; cool scale indicates decreases from T2 to T3. d Shows the same tracts, all made partially transparent and overlaid. a Tracts produced after seeding with the L and R vPrCG TBSS clusters. These projections connect the body of the SLF to the cortex of the ventral precentral gyrus. b Tracts produced after seeding with the R aIFG clusters. These projections connect the pars triangularis of the right IFG with the right dorsomedial frontal pole. c Tracts produced after seeding with the L pSMG clusters. These projections connect the posterior inferior parietal cortex with lateral temporal cortex and project into a region of gray matter which showed a trend toward VBM increases from T1 to T2 and decreases from T2 to T3 (p < .01, uncorrected). d All tractography results rendered on the MNI template brain. Dots indicate activations related to Paleolithic toolmaking from previous studies that fall within or immediately adjacent to tractography results from the present study. Stout and Chaminade (2007); Stout et al. (2008); §Stout et al. (2011)

Probabilistic tractography from regions of white matter change

Probabilistic tractography confirmed that the voxels showing white matter change projected to the voxels showing gray matter change. Cortical regions identified by tractography and morphometry are known to be functionally responsive to stone toolmaking (Stout and Chaminade 2007; Stout et al. 2008, 2011) (Fig. 8).

Discussion

Skill acquisition and structural remodeling

These results establish that the acquisition of Stone Age toolmaking skills causes structural remodeling of frontoparietal circuits. This is in agreement with previous studies that have found structural changes with skill acquisition [e.g., (Maguire et al. 2000; Draganski et al. 2004; Floyer-Lea and Matthews 2004; Lappe et al. 2008; Scholz et al. 2009; Lee et al. 2010; Taubert et al. 2010, 2011; Sisti et al. 2012; Steele et al. 2012)]. The robustness of the anatomical changes we observed is underscored by the fact that they reached statistical significance in a relatively small sample. This is likely related to the intensity and duration of our training program. Previous studies have reported significant results from larger samples using considerably shorter, more limited, and less evolutionarily relevant training regimes. For example, in a group of studies on balancing (Taubert et al. 2010, 2011, 2012), 14 subjects received six 45-min training sessions over the course of 6 weeks. In another study (Steele et al. 2012), 13 subjects were trained on a motor sequence task over a period of only 5 days. Significant effects have also been obtained using relatively small samples (<12) (Floel et al. 2009; Takeuchi et al. 2010; Antonenko et al. 2012; Gebauer et al. 2012; Schlegel et al. 2012; Wang et al. 2012; Tseng et al. 2013). In our study training was both intensive and long-term, extending over a period of 2 years. This enabled us to study the complex, real-world skill of stone toolmaking, which can take years to master (Stout 2002). The current study thus applied a well-established experimental approach to address for the first time questions about human brain evolution. We did so by studying an evolutionarily relevant skill that can be archeologically tied to particular times and places in prehistory.

The cellular mechanisms responsible for experience-dependent gray and white matter change are under active investigation, but are not yet well understood (Zatorre et al. 2012). Recent reviews (e.g., Thomas and Baker 2013) highlight the need to ensure that structural change over time is due to training and not some other temporally varying process such as exam schedules or equipment change. In the current study, structural changes correlated with both practice hours and motor performance, confirming their causal association with stone toolmaking training. Training time between scans varied across subjects and predicted total ∆FA. This included increases in total FA associated with intense training from Time 1 to Time 2 as well as symmetrical decreases in FA associated with lower levels of training from Time 2 to Time 3. A trend toward gray matter expansion and contraction observed in the supramarginal gyrus followed the same pattern. These sensitive anatomical responses likely reflect the acute nature of behavioral demands on brain structure, a finding which is supported by a growing body of evidence: white matter changes can occur in as little as 2 h of behavioral training in humans (Sagi et al. 2012), and changes in fractional anisotropy of white matter can be detected in as little as 1 h in mice (Ding et al. 2013). The fact that white and gray matter changes rapidly reversed with reduction of toolmaking activity may also reflect the metabolic costliness of maintaining such changes. This acute response confirms the structurally demanding nature of Paleolithic toolmaking behavior, even in our modern human subjects, and is consistent with toolmaking’s hypothesized role in generating selective pressure on frontoparietal cortex (see below).

Total ∆FA also correlated with changes in striking accuracy across subjects. Striking accuracy is essential in stone toolmaking to control fracture and achieve desired effects. Experimental studies of modern stone craftsmen have shown that control over the elementary striking gesture is a more reliable indicator of expertise than years of experience or amount of technical knowledge (Bril et al. 2005). Stone toolmakers must adapt motor performance to variable conditions (e.g., composition, size, and shape of both percussor and core), and experimental manipulations have documented the flexible adaptation of skilled toolmakers (Bril et al. 2005, 2010). We examined the transfer of motor skill from toolmaking training to a controlled virtual reality striking task and found that individual changes in accuracy (rate of hitting the target) correlated with FA change. In a naturalistic study, Nonaka et al. (2010) had subjects strike stone cores to remove flakes and found that the mean distance between predicted and actual points of impact for novice toolmakers was 7.4 mm, with an SD of approximately 4.6 mm (see Fig. 6 in Nonaka et al. 2010). Assuming a normal distribution, this implies that approximately 84 % of strikes would have been within 12 mm of the target (mean + 1 SD), which is comparable to the mean hit rate in our experiment (86 %) using targets with an average diameter of 15 mm. Our most successful subject achieved a hit rate of 96 % by Time 3, which is roughly that expected for the “intermediate” subjects of Nonaka et al. (mean error = 4.3 mm, SD = 5.7, mean + 2 SD = 15.7), but falls below the expected rate for “experts” (mean error = .6 mm, SD = 1.1, mean + 13 SD = 14.9) which should approximate 100 %. To the extent that increases in FA are correlated with increases in accuracy, this suggests that the training-related anatomical adaptations we observed may be relatively modest compared to those that would have been experienced by expert Pleistocene toolmakers (compare Fig. 1b vs. c, see also Stout et al. 2014).
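The hit-rate comparison above rests on simple normal-distribution arithmetic: the expected fraction of strikes landing within a given distance is the normal cumulative probability at (threshold − mean)/SD. The Python sketch below reproduces that arithmetic from the means, SDs, and thresholds quoted in the text. The function names are hypothetical, and the one-sided normal model is the simplifying assumption stated in the paragraph, not a fitted model of the striking data.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fraction_within(threshold_mm, mean_mm, sd_mm):
    """Expected fraction of strikes with error <= threshold, assuming
    errors are normally distributed with the given mean and SD."""
    return normal_cdf((threshold_mm - mean_mm) / sd_mm)


# (mean error, SD, threshold) in mm, as quoted in the text from Nonaka et al. (2010)
groups = {
    "novice": (7.4, 4.6, 12.0),        # threshold = mean + 1 SD
    "intermediate": (4.3, 5.7, 15.7),  # threshold = mean + 2 SD
    "expert": (0.6, 1.1, 14.9),        # threshold = mean + 13 SD
}

for name, (mean_mm, sd_mm, threshold_mm) in groups.items():
    rate = fraction_within(threshold_mm, mean_mm, sd_mm)
    print(f"{name}: {100 * rate:.1f} % of strikes within {threshold_mm} mm")
```

Under these assumptions the novice figure comes out near 84 %, the intermediate figure near 98 %, and the expert figure is effectively 100 %, in line with the rates discussed above.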

In contrast to total ∆FA, training time did not correlate with ∆FA in individual clusters. This reflects individual variation in the specific response to training. Despite the limitations of our small sample size for investigating individual variation, we did observe apparent patterning that might be further investigated in future research. TBSS analysis identified six clusters of significant FA change at the group level, but also substantial individual variability in the relative allocation of structural changes across these clusters. A Principal Components Analysis revealed two main dimensions of this variation, including a pattern of coordinated change across all clusters apart from right vPrCG (Component 1), and an inverse relationship between right vPrCG and right IFGpt (Component 2). Changes in striking accuracy were predicted by Component 2 from Time 1 to Time 2 and by Component 1 from Time 2 to Time 3. In other words, initial increases in striking accuracy were associated with increased right vPrCG FA at the expense of right IFGpt whereas subsequent increases in accuracy were associated with retention of FA increases (i.e., less negative ∆FA) outside right vPrCG, especially in right IFGpt. Changes in accuracy did not correlate with training time, in keeping with the disjunction between stone working skill and experience reported by Bril et al. (2005) and again reflecting a variable response to training across our subjects.

Interestingly, subjects fell into two groups with respect to Component 2 scores. Group 1 showed positive scores from Time 1 to Time 2 followed by negative scores from Time 2 to Time 3 whereas Group 2 showed the opposite pattern. Group 1 subjects improved their striking accuracy over the training period whereas Group 2 subjects did not, suggesting that the former pursued a more effective learning strategy associated with a different pattern of structural brain changes. However, our data do not allow us to test the possibility that Group 2 subjects prioritized other aspects of skill learning (e.g., conceptual knowledge) that we were unable to measure, rather than simply displaying less effective learning overall.

This variable pattern is consistent with the known involvement of right inferior frontal cortex in action programming, and with the context-sensitive nature of interactions between IFG, the ventral premotor cortex or vPrCG, and primary motor cortex. Whereas ventral premotor cortex directly modulates primary motor activity, right IFGpt is thought to play a role in the more abstract cognitive control of action [e.g., inhibition under conditions of response uncertainty (Levy and Wagner 2011)] through its extensive projections to ventral premotor cortex (Dum and Strick 2005). Using a paired-pulse transcranial magnetic stimulation paradigm, Buch et al. (2010) found that ventral premotor cortex variably facilitated or inhibited primary motor corticospinal activity, depending on task context (reach vs. switch targets), and that individual variation in the magnitude of these effects correlated with FA in right precentral and inferior frontal gyrus clusters closely approximating (<9.2 mm Euclidean distance) those identified here. Our results suggest that anatomical changes in right vPrCG and IFGpt may play somewhat different roles in the acquisition of complex skills, perhaps corresponding to early vs. late learning processes and/or different learning objectives (e.g., motor skill vs. conceptual understanding).

Toolmaking, neural plasticity, and brain evolution

Previous functional imaging research (Stout and Chaminade 2007; Stout et al. 2008, 2011) has established that Paleolithic toolmaking by modern human subjects recruits portions of frontoparietal cortex which have been identified by comparative studies (Peeters et al. 2009, 2013; Orban and Rizzolatti 2012; Hecht et al. 2013a, b; Rilling et al. 2008; Ramayya et al. 2010) as likely sites of human-specific adaptations for tool use. Current results demonstrate that active practice of Paleolithic toolmaking skills elicits structural remodeling in these same regions (Figs. 2, 8), providing further confirmation for hypothesized links between stone toolmaking and frontoparietal brain structure and function. The observation of anatomical effects in our subjects indicates that, even in large-brained, modern humans, stone toolmaking is sufficiently demanding to induce the metabolically costly re-allocation of structural resources in ways that parallel inferred evolutionary changes (increased frontoparietal gray matter volume and connectivity). Habitual toolmaking by smaller-brained Paleolithic ancestors would have stressed these anatomical substrates to a similar or even greater degree, generating positive selection pressure on any genetic variants that enhanced the reliability and efficiency of the plastic phenotypic response.

This potential role of phenotypic plasticity in facilitating evolutionary adaptation has been appreciated since the nineteenth century (Baldwin 1896; Osborn 1896) and is now widely known as the “Baldwin effect” (Weber and Depew 2003; Bateson 2004). Briefly, phenotypic plasticity allows organisms to adapt to new environmental conditions and/or generate novel adaptive behaviors. In the context of these new conditions or behaviors, any variation in the ease, efficiency, or reliability of the expression of the plastic trait will be acted on by selection. This would potentially lead to alterations in the genes regulating the trait and to the inheritance of a predisposition to develop the modified condition more or less independently of the original environmental/behavioral stimulus. We suggest that such a process may have occurred over the 2.5 million years during which stone toolmaking was a key adaptive skill for our ancestors.

The earliest, Oldowan, stone tools consist of sharp stone flakes struck from river cobbles (Semaw et al. 2003). This early technology predates fossil evidence of substantial hominin brain expansion, although it has been suggested that size-independent adaptations (i.e., “re-organization”) of posterior parietal and inferior frontal cortex may already have been present (Holloway et al. 2004a, b). Functional investigations of Oldowan toolmaking (Stout and Chaminade 2007; Stout et al. 2008) observed activations in vPrCG and in SMG which are now closely paralleled by evidence of training-related structural white and gray matter changes reported here. Fossil, functional and structural evidence are thus consistent with the hypothesis (Stout and Chaminade 2007) that early hominin toolmaking was supported by evolutionary elaborations of a primitive ventral frontoparietal circuit for object manipulation that is shared with other primates (Maravita and Iriki 2004; Rizzolatti et al. 1998). Indeed, a VBM study of tool-use (rake) learning in macaques found evidence of parietal gray matter expansion encompassing portions of anterior intraparietal sulcus (Quallo et al. 2009) that may be comparable to the human gray matter expansion reported here. It has previously been suggested (Iriki and Taoka 2012) that such plasticity, already present in monkeys, may have contributed to the evolution of human inferior parietal areas through the Baldwin effect. Our results extend this argument to include frontoparietal white matter connections, and link it to specific, archeologically observable, behaviors known to have occurred during human evolution.

In addition to changes in white matter underlying ventral PrCG and SMG, we observed a significant effect under right IFGpt. Previous functional studies have shown increased activation of this region by Acheulean, but not Oldowan, toolmaking (Stout et al. 2008, 2011). Early Acheulean toolmaking first appeared around 1.75 million years ago (Lepre et al. 2011; Beyene et al. 2013), roughly .25 million years after the first widely accepted fossil evidence of brain expansion and inferior frontal re-organization (Holloway et al. 2004a, b). The Acheulean is widely considered to represent a major advance in toolmaking sophistication over the preceding Oldowan, providing the first clear evidence of artifacts (“handaxes”, “picks”) made to a predetermined design. Acheulean tools also show evidence of increasing refinement through time (Beyene et al. 2013) and by 500,000 years ago (Fig. 1c) their production involved an elaborate sequence of hierarchically embedded goals and sub-goals (Stout 2011; Stout et al. 2014). This later Acheulean time period (780–400,000 years ago) coincides with the fastest rate of hominin encephalization in the past 2 million years (Ruff et al. 1997), and includes the form of “Late Acheulean” toolmaking in which our research subjects were trained (Fig. 1b). In keeping with evidence that the basic manipulative complexity of Oldowan vs. Late Acheulean toolmaking does not differ (Faisal et al. 2010), we have previously interpreted (Stout et al. 2008, 2011) increased right inferior frontal gyrus pars triangularis (IFGpt) activation as reflecting the cognitive control [e.g., inhibition, task switching (Aron et al. 2004; Koechlin and Jubault 2006; Levy and Wagner 2011)] demands of the more complex action sequences involved in this technology. Current evidence of training-related FA changes in white matter under right IFGpt corroborates functional evidence of this region’s involvement in stone toolmaking, and provides the first direct evidence that Paleolithic toolmaking can elicit measurable structural responses in prefrontal cortex.

Toolmaking and language

The frontoparietal regions in which we observed structural change participate not only in tool use, but also in other complex cognitive tasks including action planning and language. Loci of overlap between praxis and communication have been identified in left ventral premotor (Rizzolatti and Arbib 1998) and parietal (Frey 2008) cortex, and both are now specifically linked to Paleolithic toolmaking by functional (Stout and Chaminade 2007; Stout et al. 2008, 2011) and structural results. This supports the hypothesized co-evolution of tool use and other complex functions (Rizzolatti and Arbib 1998; Corballis 2002; Pulvermüller and Fadiga 2010; Stout and Chaminade 2012). Our results also corroborate functional evidence of right IFGpt involvement in toolmaking (Stout et al. 2008, 2011). Right IFGpt is not a core language or tool use region, but supports domain-general cognitive control (Vigneau et al. 2011) relevant to a range of complex communicative [e.g., discourse comprehension (Menenti et al. 2009)] and instrumental [e.g., task switching (Aron et al. 2004; Koechlin and Jubault 2006; Levy and Wagner 2011)] behaviors.

Structural remodeling of these regions in response to Paleolithic toolmaking is consistent with longstanding models of the mutually reinforcing interaction between technological, social, communicative, and neural complexity in human evolution (Holloway 1967; Engels 2003 [1876]). More specifically, we propose that human frontoparietal circuits underwent adaptations for Paleolithic toolmaking that were behaviorally co-opted [“exapted” (Gould and Vrba 1982)] to support proto-linguistic communication and then subsequently altered by secondary adaptations specific to language.