1 Introduction

Since antiquity, humans have developed useful tools to improve their lives, and human culture was created by transmitting these technologies between individuals and generations. Technology is transferred by both social learning, which is learning from the behavior of another individual through social interaction, including imitating another’s behavior, and individual learning through trial and error. In particular, social learning such as imitation plays an important role in the acquisition of knowledge about the social group to which the individual belongs. This kind of social learning is realized by developing cognitive abilities to comprehend the intentions of the other individual and learning a novel technology. Thus, elucidating the cognitive abilities underpinning social learning behavior is important to comprehend human evolutionary history, particularly the acquisition of technology. Neuroimaging techniques are highly effective for investigating the neural basis of cognitive abilities (Rilling 2008; for a review). Previous interdisciplinary studies between neuroscience and archaeology have investigated the neural correlates of execution during the construction of early Stone-age tools by experts (Stout et al. 2008) and examined the effect of motor-skill learning on the construction of stone tools by inexperienced subjects (Stout and Chaminade 2007). However, there is no direct evidence about which component of the neural mechanism is required to learn a novel technology from another individual.

In the present study, we examined the neural basis of social learning, particularly the contribution of linguistic ability. For this purpose, a social learning situation in which Mousterian stone tools were constructed was used as an experimental task, because the neural activity expected in a historical environment can be depicted by reproducing the behaviors of that time. A previous study suggested that the cognitive abilities for tool use and language are closely related in their development and cortical representation (Greenfield 1991), and the neural substrates for language and tool use share a common region in Broca’s area (Higuchi et al. 2009). Overlapping cortical regions contribute to the neural mechanisms of understanding actions (i.e., the mirror neuron system) (Buccino et al. 2004; Vogt et al. 2007) and tool use in non-human primates (Maravita and Iriki 2004). The importance of this region in language processing and its evolution has been widely discussed (for reviews, see Fogassi and Ferrari 2007; Corballis 2010). However, it is unknown whether the cognitive ability to learn unfamiliar tool use via social interaction during the prehistoric age involved the cognitive ability for language processing. Humans in the prehistoric age transmitted their technology over many generations, beginning at a time for which there is no explicit evidence of language use. Thus, even if the cognitive mechanisms of tool use and language share a common neural basis, the key components of the social learning mechanisms for acquiring tool use and language might differ.

We assumed that imitative learning was the fundamental avenue for transmitting technology such as stone tool making in the prehistoric social environment. Observing the behavior of others is particularly important as a trigger for acquiring knowledge, and imitative learning is effective for acquiring generalized knowledge within a social group because learners can easily find good examples (e.g., elders) in their living space. It is speculated that imitative learning was essential to propagate technology or culture in the prehistoric social environment. We also expected that essential parts of the neural basis of imitative learning are different for stone tool making and language information processing. In imitative learning an individual needs to understand another’s intention. In the case of stone tool making, the individual must extract intention from bodily actions, whereas in the case of spoken language s/he must extract it from articulated sounds. We speculate that the building process of an internal model for observed bodily action by imitative learning is different than that of internal model for observed speech by imitative learning. Humans in the prehistoric age may have used some kind of verbal communication even if there is no evidence of language at that time. If the essential neural bases of imitative learning to acquire the stone tool making skill and spoken language are different even though there are some common cortical regions that process both skills, this would help inform the cognitive mechanisms of tool-use and language.

To address this question, we used functional magnetic resonance imaging (fMRI) to investigate brain activity while subjects observed the bodily actions used to make a stone tool and observed an unknown spoken language. Common and different activations were analyzed to clarify the differences in the neural processes associated with observing bodily actions versus those associated with word pronunciation. We used a repetition suppression approach to identify the cortical areas in which imitative learning occurs that are different from those in which other behaviors occur. Repetition suppression is a robust neural mechanism in which a neural activity is reduced when stimuli are repeated, and this suppression has been used to identify shared populations of neurons responsive to different stimuli (for a review, see Grill-Spector et al. 2006). Repetition-related reduction in neural activity was also used to probe the neural basis of learning. Repetitive-perceptual learning of auditory words shows a significant repetition-related decrease in the left superior temporal region (Rauschecker et al. 2008; Graves et al. 2008) and the frontal region related to articulation (Rauschecker et al. 2008). Similar repetition suppression effects have been reported in previous observational learning studies for bodily action in premotor and parietal regions (Hamilton and Grafton 2009). The neural mechanism of understanding another’s intention from behavior is represented as a repetitive decrease in neural activity in the right superior temporal sulcus and intraparietal region (Ortigue et al. 2009). Thus, we hypothesized that when learning bodily actions and word pronunciation progress by observing the behavior of others, the neural activity that takes part in the observed behavior decreases due to repetitive observation. 
The repetition suppression of task-related activation was analyzed to detect the process by which a bodily action or word pronunciation is learned from the repetitively observed behavior of others.

2 Materials and Methods

2.1 Participants

Twenty-four healthy Japanese volunteers participated in this study (12 males and 12 females; mean age, 25 ± 5; range, 20–36 years). The experimental data from six subjects were excluded because of excessive head movement or an insufficient number of responses. Thus, we analyzed data from 18 participants (9 males and 9 females). All participants were right-handed according to the Edinburgh Handedness Inventory (Oldfield 1971), and none had a history of neurological or psychiatric disorders. We confirmed that all participants were inexperienced with stone tool-making and the Uzbek language. All participants provided written informed consent to the experimental protocol, which was approved by the Ethical Committee of the National Institute for Physiological Sciences. The experiments were conducted in compliance with national legislation and the Code of Ethical Principles for Medical Research Involving Human Subjects of the World Medical Association (Declaration of Helsinki).

2.2 Experimental Procedure

The fMRI experiment consisted of four runs of actual measurement and one practice run. A rapid event-related design was used for the fMRI experiment. During the fMRI session, all participants observed 15 moving pictures and one still picture of stone tool-making (mSTM and sSTM) and 15 moving pictures and one still picture of the pronunciation of an Uzbek word (mUWP and sUWP). Video clips of an expert making stone tools and of an Uzbek word pronounced by a native speaker were recorded using a digital video camera. The moving pictures were 1 s segments extracted from video clips. Each picture showed a kind of bodily action used to make a Mousterian stone tool from which it was easy to understand what kind of process was being depicted such as flaking, which includes a platform-preparation, or abrading (although it is unclear whether this abrading procedure accompanied the actual production of Mousterian stone tools) or to pronounce one Uzbek word. Each moving picture was presented twice in each run. In total, participants observed the same moving picture eight times throughout the fMRI session. The still pictures were used as a low-level control condition of visual stimuli to subtract the difference in the stone tool maker and Uzbek speaker on displayed moving pictures. Both still pictures were presented 15 times during each run. The practice run presented moving pictures that differed from those used in the actual run. The pictures were separated by resting intervals of approximately 4 s, during which time a white fixation cross was presented. The efficiency of the experimental design was highly dependent on the temporal pattern of the stimulus presentations (Dale 1999; Friston et al. 1999). We designed the order of moving and still pictures to become a highly efficient experimental design throughout the fMRI session. The detailed method has been described previously (Morita et al. 2008).

Participants were instructed to observe the pictures and to memorize the content of the bodily action or word pronunciation. To ensure that the participants attended to the task, an actual imitation task was conducted immediately after the fMRI measurements. The participants were informed of the imitation task beforehand, and they were asked to imitate the observed bodily actions and the word pronunciation during that task. The participants’ imitations were video-recorded to evaluate the accuracy of the memorized content. To confirm the arousal state of the participants, the color of the fixation cross occasionally changed to yellow, and the participants were instructed to press a button when they noticed the change. Figure 26.1 illustrates the timeline of an fMRI run.

Fig. 26.1 Example of fMRI experimental stimulus

2.3 fMRI Scanning

All images were acquired using a 3-T Siemens Allegra scanner with a bird cage head coil (Siemens, Erlangen, Germany). To acquire a fine structural whole-brain image, magnetization-prepared rapid-acquisition gradient-echo (MP-RAGE) images were obtained (repetition time [TR], 2500 ms; echo time [TE], 4.38 ms; flip angle = 8°; field of view [FoV], 230 mm; one slab; number of slices per slab = 192; voxel dimensions = 0.9 × 0.9 × 1.0 mm). The fMRI time-series data covering the entire brain were acquired using a T2*-weighted gradient-echo echo-planar imaging (EPI) sequence. Oblique scanning was used to exclude artifacts from the eyeballs and to cover the entire cerebrum. The parameters were as follows: TR, 3000 ms; acquisition time [TA], 2000 ms; TE, 30 ms; flip angle, 85°; 34 slices; FoV, 192 × 192 mm; 64 × 64 matrix; slice thickness, 3 mm; slice gap, 0.45 mm. The initial two scans of each run were dummy scans to equilibrate the state of magnetization and were discarded from the time-series data; thus, we collected 93 scans for each run. In total, 372 scans per subject were included in the analysis.
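The acquisition figures above can be cross-checked with simple arithmetic. The short sketch below uses only values stated in this section; the variable names are illustrative.

```python
# Values taken from the EPI protocol described in Sect. 2.3
tr_s = 3.0             # repetition time (s)
ta_s = 2.0             # acquisition time within each TR (s)
scans_per_run = 93     # after discarding the two dummy scans
n_runs = 4

fov_mm, matrix = 192.0, 64
n_slices, thickness_mm, gap_mm = 34, 3.0, 0.45

total_scans = scans_per_run * n_runs        # scans analysed per subject
run_duration_s = scans_per_run * tr_s       # analysed time per run (s)
silent_gap_s = tr_s - ta_s                  # part of each TR without image acquisition (s)
in_plane_mm = fov_mm / matrix               # in-plane voxel size (mm)
# Extent along the slice axis: all slices plus the gaps between them
coverage_mm = n_slices * thickness_mm + (n_slices - 1) * gap_mm
```

This gives 372 scans per subject, matching the total reported above, 279 s of analysed data per run, 3 × 3 mm in-plane voxels, and roughly 117 mm of coverage along the slice axis.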

Stimulus presentation and response collection were performed using Presentation 1.21 (Neurobehavioral Systems, Albany, CA, USA) software implemented on a personal computer (Dimension 8200; Dell Computer Co., Round Rock, TX, USA). A liquid crystal display projector (DLA-M200L; Victor, Yokohama, Japan) located outside and behind the scanner projected the stimuli through another waveguide onto a translucent screen, which the subjects viewed via a mirror attached to the head coil of the MRI scanner. The auditory stimuli were presented via MRI-compatible headphones (Hitachi Advanced Systems, Yokohama, Japan). Behavioral responses were recorded using a fiber-optic response box (Current Designs Inc., Philadelphia, PA, USA).

2.4 Data Analysis

Data preprocessing and statistical analyses of fMRI data were performed using statistical parametric mapping (SPM8, Wellcome Trust Center for Neuroimaging, London, UK). The effect of head motion across the scans was corrected by realigning all scans to the first one. The whole-head MP-RAGE image volume was then co-registered with the first EPI image. All EPI images were spatially normalized to the Montréal Neurological Institute T1 template using the anatomical T1-weighted MRI image for each subject. Finally, each scan was smoothed with a Gaussian filter in a spatial domain (8-mm full-width at half-maximum).

The fMRI data were analyzed using a two-level approach in SPM8. During the first level, the hemodynamic responses produced under the different experimental conditions were assessed at each voxel on a subject base using a general linear model. We hypothesized that the hemodynamic responses under the mSTM, sSTM, mUWP, and sUWP conditions would be the canonical hemodynamic response functions with a 1-s duration. These hemodynamic responses were modeled for every repetition of the mSTM, sSTM, mUWP, and sUWP conditions. Hemodynamic responses to the observation of still pictures and to button responses were also modeled. Global changes were adjusted by proportional scaling, and low-frequency confounding effects were removed using a high-pass filter with a 128-s cutoff. Multiple regression analyses were performed on each voxel to detect the regions in which MR signal changes were correlated with the hypothesized model to obtain the partial regression coefficient of each voxel.
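As a rough illustration of the first-level model described above, the sketch below builds a single regressor by convolving a 1-s boxcar with a double-gamma hemodynamic response function and sampling it once per scan (TR = 3 s, 93 scans). The double-gamma parameters (shapes 6 and 16, undershoot ratio 1/6) are a commonly used canonical form and are assumed here, as are the onset times and all function names; this is not SPM8's implementation.

```python
import math

def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density at time t (seconds); zero for t <= 0."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)

def canonical_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF: positive response minus a scaled late undershoot (assumed parameters)."""
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, under_shape)

def build_regressor(onsets, duration, n_scans, tr, dt=0.1, hrf_length=32.0):
    """Convolve an event boxcar with the HRF on a fine time grid, then sample once per scan."""
    n_fine = int(round(n_scans * tr / dt))
    boxcar = [0.0] * n_fine
    for onset in onsets:
        start = int(round(onset / dt))
        stop = int(round((onset + duration) / dt))
        for i in range(start, min(stop, n_fine)):
            boxcar[i] = 1.0
    kernel = [canonical_hrf(k * dt) for k in range(int(round(hrf_length / dt)))]
    fine = [0.0] * n_fine
    for i, b in enumerate(boxcar):
        if b == 0.0:
            continue
        for k, h in enumerate(kernel):
            if i + k >= n_fine:
                break
            fine[i + k] += b * h * dt   # discrete approximation of the convolution integral
    step = int(round(tr / dt))
    return [fine[s * step] for s in range(n_scans)]

# Hypothetical onsets (s) for one condition in one run
regressor = build_regressor(onsets=[10.0, 55.0, 120.0], duration=1.0, n_scans=93, tr=3.0)
```

In a full model, one such column would be built for each modeled event type (here, each repetition of each condition, plus the button responses), and the partial regression coefficients would be estimated voxel by voxel after high-pass filtering.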

The second level of the analysis was a population-level random-effects analysis using a two-way repeated-measures factorial design. One factor was the type of picture observed (STM or UWP; two conditions), and the other factor was the number of times the same moving picture had been presented (1–8; eight conditions). The contrast images obtained by subtraction for the (mSTM–sSTM) and (mUWP–sUWP) conditions were used for this analysis to remove differences between the visual stimuli and the simple repetition effect caused by repeatedly observing the same image. The statistical threshold was set at p < 0.05 (corrected for family-wise error [FWE] at the voxel level). The cytoarchitectonic location of each activation focus was confirmed with the SPM Anatomy toolbox (Eickhoff et al. 2005).

To identify the regions showing learning effects for each task, contrast images representing a repetition suppression effect of task-related activation were created and estimated. The contrast images were made using linearly decreasing contrast weights (7, 5, 3, 1, −1, −3, −5, −7) for the factor of the number of repetitions (i.e., eight times) under each of the STM and UWP conditions. Contrast images of the first repetition of the (mSTM–sSTM) and (mUWP–sUWP) contrasts (p < 0.05, corrected for FWE) were also made to specify brain activation when the subjects observed each stimulus for the first time, and were used as mask images to detect the repetition suppression effect of task-related activation. The statistical threshold was set at p < 0.05 (corrected for FWE). In addition, to compare the task-specific learning effect for each task, the parameter estimates of the activation foci showing the repetition suppression effect were extracted from spheres with a radius of 4 mm using the MarsBaR 0.42 toolbox (Brett et al. 2002). The difference between the (mSTM–sSTM) and (mUWP–sUWP) contrasts in the gradient of the repetition-related decrease in activation was then tested.
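The linear contrast over repetitions can be written out explicitly. The sketch below generates zero-sum, linearly decreasing weights for eight repetitions and applies them to a set of per-repetition parameter estimates; the estimates are made-up numbers for illustration only, and the function names are hypothetical.

```python
def linear_decrease_weights(n_levels):
    """Zero-sum weights that decrease linearly, e.g. 7, 5, ..., -7 for n_levels = 8."""
    return [n_levels - 1 - 2 * i for i in range(n_levels)]

def contrast_value(weights, betas):
    """Weighted sum of parameter estimates; positive when the estimates decline over repetitions."""
    if len(weights) != len(betas):
        raise ValueError("weights and betas must have the same length")
    return sum(w * b for w, b in zip(weights, betas))

weights = linear_decrease_weights(8)

# Hypothetical (mSTM - sSTM) estimates for repetitions 1..8 at one voxel
declining = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
flat = [0.6] * 8

declining_effect = contrast_value(weights, declining)   # clearly positive
flat_effect = contrast_value(weights, flat)              # approximately zero
```

Because the weights sum to zero, a response that stays constant across repetitions yields a contrast value of about zero, whereas a steadily declining response yields a positive value; this is what makes the contrast sensitive to repetition suppression rather than to overall activation.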

3 Results

The cortical activations during observation of STM and UWP pictures are summarized in Table 26.1. A cytoarchitectonic location was obtained by Anatomy toolbox 1.7 (Eickhoff et al. 2005, 2007). Significant activations in the bilateral premotor and pre-supplementary motor areas and the right superior and bilateral posterior parts of the middle temporal gyri were commonly manifested during observations of both STM and UWP moving pictures. The large activation cluster over the parietal cortex was found bilaterally during observation of the STM moving pictures, as determined by the procedure (mSTM–sSTM); activation peaks were identified in the bilateral superior and inferior parietal lobules, intraparietal sulcus, and supramarginal gyrus. The frontal activation clusters were extended dorsally to the superior frontal gyrus and ventrally to the middle or inferior frontal gyrus. Moreover, the left insula and right cerebellar posterior lobule were significantly activated (Fig. 26.2a). In contrast, significant activation clusters during observation of moving pictures of UWP were demonstrated by (mUWP–sUWP) in the bilateral temporal areas, including the superior temporal gyrus, and extended to the parietal operculum (Fig. 26.2b).

Table 26.1 Cortical activation during observation of pictures under the STM and UWP conditions
Fig. 26.2 Cortical activation associated with observation of stone tool-making and pronunciation of an Uzbek word. Comparison between moving and still pictures under (a) STM and (b) UWP conditions, and (c) result of conjunction analysis of (a) and (b). (d) and (e) show differential activations under STM and UWP conditions

The common and differential activations in response to observation of the STM and UWP pictures are summarized in Tables 26.2 and 26.3, respectively. Significant activations depicted by conjunction analysis of the aforementioned conditions were observed in the left premotor and pre-supplementary motor areas and in the bilateral superior and posterior parts of the middle temporal gyri (Fig. 26.2c). Differential activations associated with (mSTM–sSTM) − (mUWP–sUWP) were obtained in the bilateral middle frontal gyrus, left premotor area, large regions of the bilateral intraparietal sulcus extending to the left supramarginal gyrus, right postcentral gyrus, posterior part of the bilateral middle temporal gyrus, left insula, and right cerebellar posterior lobule (Fig. 26.2d). In contrast, differential activations associated with (mUWP–sUWP) − (mSTM–sSTM) were obtained in the bilateral premotor area, pre-supplementary motor area, and superior temporal gyrus (Fig. 26.2e).

Table 26.2 Common activations during observation of STM and UWP pictures
Table 26.3 Differential activations during observation of STM and UWP pictures

The repetition-suppression effect was found in a region of the right cerebellar posterior lobule under the STM condition and in a border region of the left superior temporal gyrus and inferior parietal lobule under the UWP condition. Figure 26.3 depicts the regions showing the effect of repetition-suppression on task-related activation, and Table 26.4 summarizes the anatomical locations of those regions. The decrease in the activation profile associated with the repetition of the (mSTM–sSTM) condition was significantly larger than that associated with repetition of the (mUWP–sUWP) condition (F (1, 284) = 14.88, p = 0.0001) in the ROI of the right cerebellar posterior lobule (Fig. 26.3a). Additionally, the decrease in the activation profile associated with the repetition of the (mUWP–sUWP) condition was significantly greater than that associated with the (mSTM–sSTM) condition (F (1, 284) = 20.44, p = 0.000001) in the ROI of the left superior temporal gyrus (Fig. 26.3b).

Fig. 26.3 Repetition-suppression effect in the (a) right cerebellar posterior lobule with STM task-related activation, and (b) in the left superior temporal gyrus with UWP task-related activation. The plots show repetition-related changes of parameter estimates in each activation focus, and error bars show the standard error of the mean

Table 26.4 Regions showing the repetition-suppression effect on task-related activation during observation of STM and UWP pictures

4 Discussion

In the present study, we observed activity in the parietofrontal network when an inexperienced novice observed the STM actions of an expert. We also found that the cortical network was primarily involved in the experimental task, which is consistent with previous findings of the neural correlates of Acheulean stone tool-making by experts (Stout et al. 2008). A recent meta-analysis of action-observation and imitation tasks also reported similar patterns in cortical networks (Caspers et al. 2010). In contrast, the activation pattern in the dorsal premotor area and the supramarginal gyrus was different from the previous findings of a novice during execution of stone tool making (Stout and Chaminade 2007). This discrepancy may have depended on whether or not the actual execution process was involved. Moreover, we observed a repetition-related decrease in specific activation in the right posterior part of the cerebellum (lobule VI) under the STM picture-observation condition. In contrast, a border region of the superior temporal gyrus and left inferior parietal lobule showed a significant decrease in specific activation under the UWP picture-observation condition. This result supports our hypothesis; the cognitive mechanism of imitative learning for stone tool-making and that of word-pronunciation involves different cortical regions.

4.1 Effect of Learning on Task-Related Activation

The activation of the right cerebellar posterior lobule (lobule VI) showed a significant repetition-related decrease under the STM condition. A previous study suggested that changes in cerebellar activity reflect the progress of forming an internal model for learning the motor control needed to manipulate a novel tool (Imamizu et al. 2000). A learning-related decrease in cerebellar lobule VI activation was also observed in a neuroimaging study focused on acquisition of a bimanual coordination task (Debaere et al. 2004). Because the subjects were asked to imitate the observed STM action immediately after the fMRI scan, they had to be alert to acquire the STM action presented. To imitate the STM action, subjects must have observed the posture of the whole body and how to handle the hammerstone and stone core that the expert held in both hands. As each observed STM action itself was simple, the subjects could form the internal model of the STM action using their own repertoires of motor control through repetitive observation of another’s action. Hence, it was expected that the subjects interpreted the observed STM action and could roughly translate it into their own motor representation via repetitive observation, and that the repetition-related decline of cerebellar activation reflected the progress of learning to construct an internal model of STM actions.

In contrast, the border area of the left superior temporal gyrus and inferior parietal lobule showed repetition-dependent suppression of activation. A repetition-related decrease in neural activity associated with the learning of pseudo-words was observed in the left superior temporal gyrus as well as the left frontal cortex, suggesting that the repetition-related decreases in the left superior temporal gyrus reflected the effect of learning on the neural processing of the perception of auditory stimuli (Rauschecker et al. 2008). Graves et al. (2008) also reported that the posterior superior temporal gyrus showed reduced neural activity related to repetitive lexical phonological processing. In the present study, subjects were asked to vocalize an Uzbek word after the fMRI scan as well as to imitate an STM action. Because subjects were inexperienced with the Uzbek language and no information about the meaning of the words presented was provided during the experiment, they had to concentrate on catching the phonological information about the presented word during the mUWP task. Thus, the repetition-dependent suppression in the activity of the left superior temporal gyrus reflected the progress in learning the phonological information. A learning-related decrease in cortical activation was not observed in the common cortical regions of both imitation tasks, suggesting that activity in these regions was not being reflected in the progress of imitative learning by observation without execution such as with the present experimental task. If individual learning accompanied by actual execution such as trial-and-error learning is performed, the learning-related activation change may also be observed in common regions.

Stout et al. (2011) suggested that brain activation specific to naïve subjects reflects kinematic information and visuospatial attention that represents a strategy of observational learning to simulate low-level aspects of task performance such as understanding action elements. The present results support their assertion; that is, a learning effect based on repetitive observation of stone tool-making was found in the neural basis to construct an internal model of action in the cerebellum, whereas the neural basis to understand an action intention such as the mirror neuron system did not show that kind of learning effect. A similar interpretation is possible for the case of Uzbek word articulation. The learning effect due to repetitive observation of Uzbek word articulation was observed in the neural basis of phonological processing, whereas cortical regions representing articulatory processes, such as the dorsal premotor area (Brown et al. 2009; Koelsch et al. 2009), did not show such a learning effect. Therefore, participants who had no exact knowledge about stone tool-making or the Uzbek language focused on a low-level aspect of task information during repetitive observational learning, that is, to acquire the motor sequence of each bodily action for stone tool-making or to follow the sequence of the phoneme for the Uzbek word. It can be predicted that when trained subjects or experts perform the same experiment, subjective attention is turned to a higher-level aspect of task information, that is, to understand the intention of each process of stone tool-making or to simulate articulation of Uzbek words. Cortical regions that show repetition-related decreases in trained subjects may also differ via the aforementioned differences between naïve and trained subjects. Further investigations using trained subjects are necessary to clarify this point.

These results suggest a decrease in brain activation by repetitive observation of another’s behavior, but we did not investigate the quantitative relationship between changes in brain activation and skill progress to carry out the actual behavior. It is expected that there is a relationship between not only social learning such as imitation but also individual learning such as trial-and-error and skill progress. In addition, the internal model of observed behavior may be preliminarily constructed by imitative learning and the model may be refined to suit the personal characteristics of each individual through individual learning. To clarify this point, it is necessary to measure brain activity at each stage of skill acquisition, which can be evaluated from an outside learner as in Stout and Chaminade (2007) and Stout et al. (2011).

4.2 Common Activation for Both Imitation Tasks

The posterior part of the bilateral superior temporal gyrus showed significant activation in the conjunction analysis of the STM and UWP conditions. The location of the activation site was close to that identified by previous studies on biological motion (Grossman and Blake 2002; Thompson et al. 2005; Peelen et al. 2006). The activity of this area is involved in the perception of dynamic facial motion (Sato et al. 2004; Schultz and Pilz 2009). Because the participants observed motion pictures to memorize the projected bodily action or speech for the post-hoc mimicking test in our fMRI experiment, the activity of this region reflected bodily or facial motion for learning the observed behavior of others.

Activation in the superior part of the dorsal premotor area was observed under the STM condition. From the conjunction analysis, a part of the activated region was also observed under the UWP condition. A previous study suggested that the dorsolateral prefrontal cortex was active during tool-use action-planning tasks (Johnson-Frey et al. 2005), and the influence of spatial information on observed behavior has also been suggested (Vogt et al. 2007). Thus, dorsal premotor activation reflected the cognitive process used to integrate an observer’s motor representation with the observed bodily or facial action. However, it has also been reported that stone tool making by a novice does not involve activation of the dorsal premotor area (Stout and Chaminade 2007). The discrepancy in the results of these two studies may be attributable to differences in the actual interpretation of the practice of stone tool making by participants. In the present study, participants did not have an execution session until they were tested with video recordings after the fMRI measurement. Thus, it would be expected that they interpreted and planned to imitate the observed action by substituting their own motor representation, which was similar to the STM action presented during the fMRI measurement, because they were inexperienced and did not have an actual motor representation of the STM action.

Cognitive functions in the pre-supplementary motor area have been suggested to be important for response inhibition (Duann et al. 2009; Chen et al. 2009). In the fMRI experiment, participants were asked to mentally imitate the presented action or speech to memorize them. However, they were asked not to move or vocalize during the experiment. Therefore, the activity of the pre-supplementary motor area was observed in relation to suppression of imitative body movement. Activation of the bilateral posterior part of the middle temporal gyrus that has been reported as a visual motion processing area (Tootell et al. 1995; Malikovic et al. 2007) was commonly observed under the STM and UWP conditions. Because the contrast between the motion-picture minus the still-picture condition was used for the second-level analysis, that activation reflected the perception of visual motion.

4.3 Differential Activation for Each Imitation Task

Ventral premotor activation, which was also specifically induced by observing STM action, was included in the region that has been interpreted as part of the mirror neuron system in previous studies (Buccino et al. 2004; Vogt et al. 2007). Furthermore, execution of stone tool-making involves activity in the ventral premotor region (Stout and Chaminade 2007; Stout et al. 2008, 2011). To imitate an STM action, subjects must observe the posture of the whole body and learn how to handle the hammerstone and stone core that the expert held in both hands. Thus, ventral premotor activation reflected the cognitive process used to analyze the observed action. Caspers et al. (2010) reported that the dorsal part of BA44 is commonly involved in action-observation and imitation tasks, whereas the caudoventral part of BA44 is consistently involved during action imitation. Based on the activation induced by observing the STM action, the peak location of the ventral premotor area was located on the border region between the probabilistic map of BA6 and BA44, and the activation cluster was more expansive on the ventral side of the inferior frontal region. These results suggest that the cognitive mechanism for manipulating self-motor representations contributes to action observation during imitative learning. In contrast, the activation peak in the dorsal premotor area under the UWP compared with the STM condition was observed in the inferior portion. Previous neuroimaging studies of phonological processing have reported that the inferior part of the dorsal premotor area plays an important role in the articulatory process (Brown et al. 2009; Koelsch et al. 2009). The participants were not familiar with the pronunciation of Uzbek words because the sequence of the phonemes differed from that of Japanese words. We concluded that the increase in activity reflected the cognitive load of the mental rehearsal needed to pronounce the Uzbek words.

Activations were observed in the parietal cortex adjacent to the bilateral intraparietal sulcus and extended to the left supramarginal gyrus specifically under the STM condition. These findings are consistent with those of a previous study on the neural correlates in the posterior parietal region associated with the observation of action (Caspers et al. 2010). Previous studies have suggested that intraparietal sulcus regions are involved in stone tool-making by both novices and experts (Stout and Chaminade 2007; Stout et al. 2008, 2011). In the present study, subjects were asked to memorize observed unfamiliar STM actions that consisted of an interpretable bimanual manipulation using a hammerstone and stone core. Therefore, the activations reflected the acquisition of the procedure of bimanual motor representation for stone tool-making by observing the actions of others. Additionally, a differential activation pattern in the supramarginal gyrus and the dorsal premotor area was observed when compared with the results of a previous study of stone tool-making by a novice (Stout and Chaminade 2007). Activity in the left posterior parietal regions including the supramarginal gyrus occurred while planning the bodily actions associated with the use of a familiar tool, even though the execution of actual motor actions was not involved (Johnson-Frey et al. 2005). Moreover, stone tool-making by experts was associated with activation in the bilateral supramarginal gyrus (Stout et al. 2008). This may also indicate that participants were using their own existing motor representations, which would be similar to the STM action, to plan the imitative action instead of using an actual motor representation of an STM action that they had not yet learned.

A direct comparison of (mSTM − sSTM) − (mUWP − sUWP) showed activation of the junction areas of the bilateral middle temporal and occipital regions, which had a slightly inferior peak location compared with that during common activation. The pictures under the mSTM condition presented all the bodily motions involved in stone tool-making. By contrast, the pictures under the mUWP condition presented only the facial motion. A previous study reported that the occipitotemporal region was particularly sensitive to the perception of the human body (Astafiev et al. 2004). Thus, differential activation was induced by the perception of motion enacted by the whole human body. In contrast, observation of the pronunciation of an Uzbek word involved large activation clusters in the bilateral superior temporal gyrus. Because only the mUWP condition contained auditory information in the form of pronunciation of an Uzbek word, these activations were considered to reflect audiovisual speech perception (Murase et al. 2008).
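The direct comparison between conditions described above amounts to an interaction contrast over the four conditions. The following is a minimal sketch of how such a contrast expands into weights. The condition ordering, the helper `contrast_value`, and the example estimates are illustrative assumptions and are not taken from the analysis pipeline of the present study.

```python
import numpy as np

# Condition labels follow the text; their ordering here is an assumption.
conditions = ["mSTM", "sSTM", "mUWP", "sUWP"]

# Each within-task difference as a weight vector over the four conditions.
stm_effect = np.array([1, -1, 0, 0])   # mSTM - sSTM
uwp_effect = np.array([0, 0, 1, -1])   # mUWP - sUWP

# (mSTM - sSTM) - (mUWP - sUWP) expands to the weights +1, -1, -1, +1.
interaction = stm_effect - uwp_effect


def contrast_value(betas, weights):
    """Hypothetical helper: weighted sum of per-condition estimates."""
    return float(np.dot(weights, betas))


# Made-up per-condition estimates for one voxel, for illustration only.
example_betas = np.array([2.0, 0.5, 1.2, 0.4])
value = contrast_value(example_betas, interaction)
print(dict(zip(conditions, interaction.tolist())), value)
```

A positive value indicates a larger difference within the STM task than within the UWP task; reversing the sign of the weights gives the opposite comparison.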

5 Conclusion

The progress of imitative learning through repetitive observation was represented in specific cortical regions, and the region involved depended on the information that the subject focused on during learning. A common mechanism involving the premotor and supplementary motor areas and the posterior part of the temporal region was nevertheless shared by both imitative learning tasks. That is, the subject had to focus on the internal model formulation of the observed action to imitate the stone tool-making procedure, and the imitative learning of the observed action caused activity to decrease in the right cerebellum. In contrast, the subject had to focus on the phonological components of the auditory information to imitate an unknown Uzbek word, and the imitative learning of the Uzbek word caused activity to decrease in the left superior temporal gyrus. These results support our hypothesis that the cortical region in which imitative learning through repetitive observation of stone tool-making appears as a change in neural activity differs from that for word pronunciation. Our results demonstrate the cortical mechanisms of social learning behavior that were assumed to have been used in the prehistoric age.