Introduction

Over the past two decades, animal model research has documented two phenomena with profound implications for neurorehabilitation. The first is significant plasticity in the adult brain that accounts for new learning (e.g., Kilgard & Merzenich, 1998, 2002). The second is that retraining animals to perform functions impaired by a brain lesion triggers remapping that does not occur without training (e.g., Friel et al., 2000; Nudo et al., 1996). Such findings have sparked intense interest in the neuroplastic processes that underlie recovery of function and rehabilitative change after brain damage. Functional neuroimaging (fMRI, PET, MEG) allows us to visualize the brain areas responsible for cognitive and language functions in humans. Within the past 12 years, fMRI instrumentation and technology have made fMRI the most widely available functional neuroimaging modality, giving many investigators the opportunity to image the neuroplastic mechanisms responsible for language processing, language recovery, and rehabilitation in aphasia. Because these mechanisms not only drive improvement during treatment but also define its limits, the importance of understanding them can hardly be overestimated. Once they are understood, fMRI may even become a useful clinical tool for aphasia treatment. However, investigators who pursue fMRI of language in aphasia confront significant methodological challenges, especially in imaging language production. These issues must be resolved to advance research and, eventually, clinical applications in aphasia treatment.

This review has two goals. The first is to review the functional neuroimaging literature on aphasia recovery and treatment, emphasizing the neuroplastic substrates of recovery and treatment with an eye toward eventual contributions of fMRI to rehabilitation. The second is to summarize the most common methodological challenges that aphasia investigators encounter when imaging language production with fMRI, along with potential solutions. This latter discussion roughly follows the ontogeny of an experiment. Problems of experimental design are discussed first, then problems of implementation, and finally problems of data analysis. Some of these challenges are unique to fMRI; others are common to other functional imaging modalities. However, the ubiquity of fMRI, and its resulting potential to yield crucial knowledge for developing aphasia treatments, justifies the focus on this modality.

Imaging neuroplasticity in aphasia recovery and treatment

The most important issue regarding neuroplasticity in aphasia recovery and treatment is the role of the nondominant, usually right, hemisphere versus that of perilesional cortex. This issue has been debated for well over 100 years. As early as 1877, Barlow reported that a ten-year-old boy regained language after a lesion of Broca's area and lost language function again when its right-hemisphere counterpart was lesioned. Ten years later, Gowers (1887) reported patients who became aphasic after left-hemisphere lesions, regained language function, and lost it again after right-hemisphere lesions. In both instances, it was suggested that some language functions had reorganized to the right hemisphere. More recent evidence from Wada tests and repeated aphasia assessments has continued to indicate a role for the right hemisphere in the language of at least some aphasia patients. Kinsborne (1971) described aphasia patients who lost language function when the right but not the left hemisphere was anesthetized during Wada tests. Basso et al. (1989) reported patients who partially recovered from aphasia after a left-hemisphere lesion but whose language functions worsened on objective testing after a subsequent right-hemisphere lesion. In the latter part of the 1900s, dichotic listening was also used as an index of hemispheric lateralization of language perception in aphasia. Some studies suggested transfer of language comprehension to the right hemisphere in both Wernicke's and Broca's aphasia (Crosson & Warren, 1981; Johnson et al., 1977). Others indicated that such lateralization may vary from patient to patient (Dobie & Simmons, 1971; Schulhoff & Goodglass, 1969; Shanks & Ryan, 1976; Sparks, 1970). The following review addresses two topics: (1) neuroplasticity during recovery from aphasia, and (2) neuroplastic changes during rehabilitation.

Functional neuroimaging and recovery from aphasia

Even with functional neuroimaging, the debate regarding right-hemisphere versus perilesional participation persists, which is an indication that the issue is more complicated than often recognized. Nonetheless, a careful review of the studies of the last decade does indicate a likely resolution to the debate. Both the right-hemisphere and the perilesional positions are tenable under specific circumstances. The following synthesis of the data will endeavor to describe those circumstances.

Some studies suggest that language functions in aphasia are primarily the product of right-hemisphere activity (Abo et al., 2004; Gold & Kertesz, 2000; Weiller et al., 1995). Other studies indicate that language functions in aphasia are subserved primarily by reorganization in perilesional regions of the language-dominant hemisphere (Breier et al., 2004; Duffau et al., 2001; Léger et al., 2002; Miura et al., 1999; Seghier et al., 2001; Warburton et al., 1999). One review of six recent studies (Price & Crinion, 2005) espouses this position unequivocally, though it must still account for correlations between right-hemisphere activity and language functions (Peck et al., 2004). A different viewpoint starts by noting that all of these studies rest on empirical evidence of activity in the left or right hemisphere, and that they often draw adequate inferences from the data in front of them. Yet neither position can be correct in an absolute sense, given the evidence supporting the other. Further, more than a century of lesion and Wada data shows that the right hemisphere is important for language in some aphasia patients, and this evidence cannot be dismissed (Barlow, 1877; Basso et al., 1989; Gowers, 1887; Kinsborne, 1971). Thus, a better conclusion is that either position is tenable under the right circumstances. This line of reasoning leads us to ask two questions. Under what circumstances do right-hemisphere mechanisms play a role in the language of aphasia patients? Under what circumstances do perilesional, left-hemisphere mechanisms play that role?

In this regard, several studies have made an interesting observation. Good recovery of language functions in aphasia is accompanied by greater perilesional than right-hemisphere reorganization, while poorer recovery is accompanied by greater right-hemisphere than perilesional reorganization (Cao et al., 1999; Heiss et al., 1997, 1999; Karbe et al., 1998; Perani et al., 2003; Rosen et al., 2000). Indeed, the data of Heiss et al. (1997) indicate that larger lesions are associated with poor recovery of language functions and with reorganization to the right hemisphere. These findings suggest an important principle of recovery. When left-hemisphere lesions are relatively small, perilesional cortex provides an adequate substrate for language recovery; as larger lesions destroy more language-eloquent and adjacent cortex, right-hemisphere areas become more active. Two questions arise: (1) Does right-hemisphere cortex contribute to language functions, or does its activity merely represent disinhibition of right-hemisphere cortices after left-hemisphere damage, as suggested by Rosen et al. (2000) and Price and Crinion (2005)? (2) Does the right hemisphere ever contribute to good recovery?

The progress of functional neuroimaging, particularly fMRI, over the last decade has been nothing short of miraculous. At no other time in the short history of behavioral neurology and neuropsychology has technology advanced quickly enough to make possible such a rapid expansion of brain-behavior research. In the face of such a captivating methodology, it is often tempting to ignore the lesion literature of the past century and a half, which has yielded valuable insights. On the first question, whether the right hemisphere contributes to language functions, a tendency to ignore past discoveries leads to needless debate and distracts us from the important questions. Barlow (1877), Gowers (1887), and Basso et al. (1989) found that some patients who recover language function after a left-hemisphere lesion lose it after a subsequent right-hemisphere lesion. Kinsborne (1971) found that some patients with aphasia lose language function when the right but not the left hemisphere is anesthetized during Wada testing. Together, these findings clearly indicate that the right hemisphere can play a role in language recovery for patients with aphasia. When they are combined with the more recent functional imaging evidence just reviewed, we can conclude that such right-hemisphere participation is more likely in the case of large lesions.

Even so, the second question, whether the right hemisphere can play a role in good recoveries, is an important one. Some studies indicate that right-hemisphere activity in aphasic patients occurs primarily in regions homologous to damaged areas of the left hemisphere (Calvert et al., 2000; Lazar et al., 2000; Thulborn, 1999). For example, Weiller et al. (1995) showed reorganization of activity to the right-hemisphere homologue of Wernicke's area in patients who had lesions of Wernicke's area and had recovered from Wernicke's aphasia. In another study, aphasic patients with lesions of left pars opercularis (the posterior part of Broca's area) showed right pars opercularis activity during narrative language production. This area was not active either in aphasic patients without left pars opercularis lesions or in neurologically normal controls (Blank et al., 2003). Such studies raise the possibility that right-hemisphere homologues of damaged left-hemisphere cortex contribute to recovery, but they do not prove it. The possibility that right-hemisphere activity can, in some instances, impede recovery should not be dismissed out of hand.

Indeed, Naeser and her colleagues (Martin et al., 2004; Naeser et al., 2005) used repetitive transcranial magnetic stimulation (rTMS) to inactivate right pars triangularis. The stimulation was repeated daily for 10 days, and no other aphasia treatment was given. All four patients given this treatment had improved language performance two months post-treatment, suggesting that inactivating right pars triangularis had a beneficial effect on language (Martin et al., 2004). Many advocates of the position that only the left hemisphere can participate in language recovery cite this evidence as indicating that the right hemisphere does not contribute to language functions. However, Naeser et al. (2005) described a pilot study in which aphasic patients increased naming accuracy and decreased naming latency after rTMS of right pars triangularis. The same patients decreased naming accuracy and increased naming latency after rTMS of right pars opercularis. These data indicate that activity in right pars triangularis may impede optimal word finding, while activity in right pars opercularis may contribute to it. Perhaps this difference reflects the different roles of these areas in normal language, with pars triangularis more involved in semantic functions and pars opercularis more involved in phonological functions (Devlin et al., 2003). In any event, the data indicate that optimizing language function may be less a question of activating the right hemisphere than of activating the right-hemisphere structures that can contribute to language while avoiding those that may interfere. Clearly, more attention should be focused on several questions: which structures can contribute, the circumstances under which they can contribute, how to engage them in the service of language rehabilitation, and how to suppress activity in structures that might interfere with optimal language performance.

However, another important question is whether activity in left-hemisphere structures ever impedes optimal language performance. Data from a recent dissertation performed in our laboratory suggest that this might be the case. In a structural imaging study, Parkinson (2005) studied 15 patients with chronic aphasia and naming deficits due to left-hemisphere lesions. Compared with patients in previous functional imaging studies, these patients had moderately large to very large lesions and modest to poor recoveries. They received either gestural or semantic and phonological cuing treatments for aphasia. The degree of lesion in 29 left-hemisphere regions was rated using a modification of Naeser's system (Naeser et al., 1998; Naeser & Hayward, 1978; Naeser, Palumbo, Helm-Estabrooks, Stiassny-Eder & Albert, 1987). With the degree of basal ganglia lesion statistically controlled, the degree of left frontal lesion correlated highly and positively with improvement during treatment. In other words, very large frontal lesions were associated with greater improvement during the naming treatments than moderately large lesions were. Why would larger frontal lesions be associated with greater improvement? This phenomenon is difficult to explain unless one considers a specific possibility. In cases of moderately large lesions, the left frontal cortex may produce activity that interferes with recovery of function during treatment, and a larger lesion would destroy that interfering cortex. Thus, the best interpretation of the available data is that, under some circumstances, activity in either hemisphere can impede language functions in aphasia. In short, this problem does not appear to be limited to the right hemisphere.
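Correlating one lesion variable with treatment gain while statistically controlling another, as in the analysis just described, is commonly done with a first-order partial correlation. The sketch below assumes that approach; the text does not say which procedure Parkinson (2005) actually used. The function names and all of the ratings are hypothetical illustrations, not data from that study.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (n >= 3)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation r_xy.z: the x-y association after the
    linear contribution of covariate z is removed from both x and y."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical per-patient ratings (NOT data from Parkinson, 2005).
frontal_lesion = [2, 3, 3, 4, 5, 5, 6, 7]        # degree of left frontal lesion
basal_ganglia_lesion = [0, 2, 1, 3, 1, 4, 2, 3]  # degree of basal ganglia lesion
naming_gain = [1, 0, 2, 1, 4, 1, 4, 5]           # improvement during treatment

r = partial_corr(frontal_lesion, naming_gain, basal_ganglia_lesion)
print(f"frontal lesion vs. gain, controlling basal ganglia: r = {r:.2f}")
```

With one covariate, the significance of a partial correlation is usually tested with n - 3 degrees of freedom, so with 15 patients the test has 12 degrees of freedom.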

Another question to consider is which factors other than lesion size contribute to the degree of right-hemisphere activity in aphasia. It has long been known that when the left basal ganglia are lesioned in addition to left-hemisphere language cortex, aphasia is more severe and persistent (Brunner et al., 1982). Kim et al. (2002) studied nonfluent patients during language production. Patients with left basal ganglia lesions in addition to left frontal damage demonstrated bilateral lateral frontal activity, whereas patients with left frontal but no left basal ganglia damage demonstrated primarily right lateral frontal activity. Crosson et al. (2005) found the same pattern during word production in two patients. The patient with a left frontal and basal ganglia lesion showed bilateral lateral frontal activity. In the patient with a left frontal lesion but intact left basal ganglia, lateral frontal activity was completely lateralized to the right hemisphere. Crosson et al. (2003) had found that left pre-SMA uses the right basal ganglia to suppress right frontal activity during normal word production. On that basis, Crosson et al. (2005) suggested that right pre-SMA may use intact left basal ganglia to suppress left lateral frontal activity, making it easier for right frontal mechanisms to take over language production. This interpretation is in keeping with the known bilateral connections of pre-SMA to the basal ganglia (Inase et al., 1999). It also fits the concept that one function of the basal ganglia is to suppress undesired activity (e.g., Mink, 1996; Nambu et al., 2002). It is likewise consistent with the PET study of Blank et al. (2003), who showed a contrast in neurologically normal subjects during narrative language production compared to a resting baseline: right pars opercularis showed a decrease in activity, while left pars opercularis showed an increase. Finally, this interpretation of the findings of Kim et al. (2002) and Crosson et al. (2005) would explain a result from Parkinson (2005). Among patients with relatively large lesions, and with degree of frontal lesion statistically controlled, larger basal ganglia lesions predicted worse treatment outcome than smaller or no basal ganglia lesions. The implication is that intact basal ganglia can be used to suppress activity in left- or right-hemisphere structures that interfere with language functions.
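Descriptions such as "bilateral" or "completely lateralized to the right hemisphere" are often quantified with a laterality index (LI). The LI is computed from suprathreshold voxel counts in homologous left and right regions of interest. The text does not state which measure these studies used, so the sketch below is an assumption. The ±0.2 cutoff for calling a pattern lateralized is one common but arbitrary convention, and the t-values are hypothetical.

```python
def count_active(t_values, threshold):
    """Number of voxels in a region whose statistic exceeds the threshold."""
    return sum(1 for t in t_values if t > threshold)


def laterality_index(left_count, right_count):
    """LI = (L - R) / (L + R): +1 fully left, -1 fully right, 0 symmetric."""
    total = left_count + right_count
    if total == 0:
        raise ValueError("no suprathreshold voxels in either region")
    return (left_count - right_count) / total


def classify(li, cutoff=0.2):
    """Label an LI using an assumed (arbitrary) symmetric cutoff."""
    if li >= cutoff:
        return "left-lateralized"
    if li <= -cutoff:
        return "right-lateralized"
    return "bilateral"


# Hypothetical t-values from left and right lateral frontal ROIs.
left_roi = [1.2, 0.8, 2.1, 1.9]
right_roi = [4.5, 3.9, 5.2, 2.8, 3.6]
li = laterality_index(count_active(left_roi, 3.0), count_active(right_roi, 3.0))
print(li, classify(li))  # prints: -1.0 right-lateralized
```

Voxel-count LIs depend on the chosen threshold, so it may be worth checking that a conclusion such as "100% right-lateralized" holds across several thresholds.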

Finally, it should be noted that longitudinal as well as cross-sectional studies of recovery have been done. Fernandez et al. (2004) imaged a patient with conduction aphasia both 1 month and 12 months after stroke. The lesion involved Wernicke's area, the left inferior supramarginal gyrus, and the left posterior insula, and significant recovery occurred across the interval. In the early phase of recovery, the main difference from normal controls was greater activity in the right supramarginal gyrus. This increased right supramarginal activity persisted in the chronic scan, but increased activity in perilesional areas of the left hemisphere was also evident at that time, suggesting that perilesional activity may have played some role in recovery. Cardebat et al. (2003) gave eight aphasia patients PET scans of word generation, on average 2 and 11 months post-stroke. They found increased activity in both the right and left hemispheres, and language improvement correlated positively with activity in the superior temporal cortex bilaterally. In short, activity in both hemispheres may have contributed to recovery. Thus, longitudinal studies may help determine when activity in an area represents a contribution to recovery. Repeated scans have also been used in studies of neuroplasticity during treatment, the topic of our next section.

Prior to addressing this topic, however, a brief summary is in order. A careful review of the bulk of the recovery literature indicates that the question of left- versus right-hemisphere participation in recovery for patients with chronic aphasia is often framed in the wrong way. One might approach the literature asking whether the right hemisphere is responsible for language functions to the exclusion of the left, or the left to the exclusion of the right. Framed that way, the literature will seem confusing, with many contradictory findings. However, if one asks when left-hemisphere structures are responsible for language in aphasia and when the right hemisphere contributes, the literature begins to make more sense. The literature indicates that the degree of right-hemisphere participation in language in aphasia may be a function of lesion size and aphasia severity (e.g., Heiss et al., 1997). More severely impaired patients with larger lesions may have to rely on the right hemisphere for some types of processing. Less severely impaired patients with smaller lesions may be able to use remaining left-hemisphere mechanisms to support good recovery. Naeser's studies (Martin et al., 2004; Naeser et al., 2005) indicate that right pars triangularis activity can impede language recovery, at least under some conditions. Parkinson's (2005) study suggests that, in cases of moderate to large lesions, left frontal activity may impede rehabilitation. Some findings (e.g., Cardebat et al., 2003) indicate that both hemispheres are likely to be involved in recovery. Other studies indicate that, under some circumstances, right-hemisphere participation is specific to homologues of the damaged left-hemisphere mechanism (e.g., Blank et al., 2003; Weiller et al., 1995).
In short, to concentrate on proving the participation of one hemisphere to the exclusion of the other is no longer a fruitful strategy for functional imaging studies of language recovery in aphasia. Future studies should endeavor to specifically address when left-hemisphere structures are responsible for recovery and when right-hemisphere structures are.

Functional imaging of neuroplasticity during aphasia treatment

Functional imaging studies of neuroplasticity during aphasia treatment are just beginning. Frequently, such studies are not driven by any theoretical position, either about which substrates a treatment targets or even about which substrates it should change. Given the paucity of studies and the general lack of conceptual direction, it is too early to draw many definitive conclusions that would be helpful in clinical applications. Nonetheless, the studies are worth reviewing as a starting point for future endeavors. Belin et al. (1996) studied seven patients who had received melodic intonation therapy (MIT). Using positron emission tomography (PET), they compared repetition without MIT strategies to listening to words. They found increased activity in right-hemisphere regions, including the sensorimotor mouth region, the homologue of Wernicke's area, prefrontal cortex, and anterior superior temporal gyrus. Many have assumed that MIT leverages a shift in substrates from the left to the right hemisphere by using a strategy that engages right-hemisphere mechanisms during treatment. Viewed in isolation, these results appear to confirm engagement of right-hemisphere compensatory mechanisms during repetition. However, the investigators also compared repetition with MIT strategies to repetition without them. That comparison showed a significant increase in activity in (left) Broca's area and a decrease in several right-hemisphere regions. One criticism of this study is that it did not measure change in neural substrates with both pre- and post-treatment images.

Wierenga et al.’s (2006) study of two patients receiving a syntax treatment also indicated the importance of left-hemisphere changes in rehabilitation. Both patients showed primarily left-hemisphere activity both pre- and post-treatment. In the patient whose performance demonstrated generalization from treated to untreated tasks, a significant re-engagement of (left) Broca's area was demonstrated from pre- to post-treatment images. It is worth noting that both patients demonstrated relatively small left-hemisphere lesions and could be classified as having mild aphasias at the time of treatment. Thus, the dependence on left-hemisphere activity for syntax production is not surprising.

Another study, by Cornelissen et al. (2003), also supported the importance of the left hemisphere as a substrate for aphasia treatment. They administered a contextual priming treatment for naming to three patients with moderately severe anomia and primarily posterior lesions. All patients improved after treatment. Magnetoencephalography (MEG)/magnetic source imaging (MSI) was used to measure changes in neural substrates. Strong areas of right-hemisphere activity were noted in pre- and post-treatment imaging for each patient. However, the only area to show a consistent increase in activity across patients was the left inferior parietal cortex. Thus, the authors suggested that activation of left-hemisphere cortex can be an important substrate for treatment. However, the between-subject variability in which right-hemisphere areas were active may have been due to individual differences in lesion location or other factors. The possibility that the active right-hemisphere structures played an important role in language cannot be ruled out.

Indeed, a study of a relatively homogeneous group of patients by Musso et al. (1999) has provided a clearer indication that right-hemisphere mechanisms can be recruited during aphasia treatment. These investigators gave four patients with left temporoparietal lesions and Wernicke's aphasia brief language training emphasizing comprehension. Training sessions were interleaved among 12 PET scans that imaged language comprehension mechanisms. Improved performance on the comprehension task correlated with increased activity in the right superior temporal gyrus and the left precuneus. These data support the idea that right-hemisphere regions homologous to damaged left-hemisphere mechanisms can be recruited to support reorganization of language mechanisms during rehabilitation.

Crosson et al. (2005) used a complex left-hand movement during a picture-naming treatment. The aim was to activate right-hemisphere intention mechanisms that could catalyze a shift of language production from left frontal to right frontal structures and/or increase the efficiency of right frontal mechanisms during language production. Two chronic aphasia patients with moderately severe anomia were studied with fMRI of word generation pre- and post-treatment. The treatment was applied to patients with nonfluent aphasia because the intention mechanisms engaged during treatment interact closely with the anterior neural substrates affected in nonfluent aphasia. Crosson et al. (in press) have shown three things about this treatment in patients functioning at a level similar to that of the subjects of this study. It improves naming in just under 90% of such patients. It produces greater incremental improvement than a similar treatment without the intention component. Most patients who improve during treatment also show signs of generalization to untreated stimuli. In one patient, the lesion involved both left frontal cortex and the basal ganglia. Right pre-SMA and right lateral frontal cortex showed substantial increases in extent of activation from pre- to post-treatment imaging, consistent with the intent of the treatment. (See the brief case description below for more information about this patient.) The second patient had a lesion that involved left lateral frontal cortex but spared the basal ganglia. She showed significant improvement with the intention treatment. During fMRI prior to treatment, her lateral frontal activity was already 100% lateralized to right lateral frontal cortex, and it remained so after treatment. Both left and right pre-SMA were active before and after treatment. However, the amount of activity in right frontal cortex decreased from pre- to post-treatment scans.
This pattern of activity may indicate more efficient use of right frontal mechanisms, though other explanations are possible. Peck et al. (2004) had previously studied the hemodynamic response (HDR) peaks of these patients during their pre- and post-treatment fMRI. They measured the delay between HDR peaks in right primary auditory cortex and mouth sensorimotor cortex. In both pre- and post-treatment images, this delay was highly correlated with the delay between hearing a category and generating a category member. Right frontal and motor activity was prominent in the post-treatment images of these patients, to the exclusion of left frontal and motor activity in one patient. Given this, these findings support a role for right frontal cortex in language production during this intention treatment.
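Relating the delay between two regions' HDR peaks to a behavioral delay first requires a peak-time estimate for each region. The sketch below assumes one simple approach. It fits a parabola through the largest sample of a trial-averaged time course and its two neighbors to get a sub-TR peak estimate. Peck et al. (2004) may have used a different method, such as fitting a model HDR. The time courses and function names are hypothetical.

```python
def peak_latency(timecourse, tr):
    """Time (s) of the maximum of a trial-averaged response, refined by a
    parabola through the peak sample and its two neighbours."""
    i = max(range(len(timecourse)), key=lambda k: timecourse[k])
    if i == 0 or i == len(timecourse) - 1:
        return i * tr  # peak at the window edge: no interpolation possible
    y0, y1, y2 = timecourse[i - 1], timecourse[i], timecourse[i + 1]
    curvature = y0 - 2 * y1 + y2
    offset = 0.0 if curvature == 0 else 0.5 * (y0 - y2) / curvature
    return (i + offset) * tr


def peak_delay(tc_from, tc_to, tr):
    """Delay (s) of the HDR peak in tc_to relative to the peak in tc_from."""
    return peak_latency(tc_to, tr) - peak_latency(tc_from, tr)


# Hypothetical trial-averaged percent signal change, one value per scan (TR = 2 s).
auditory = [0.0, 0.3, 0.9, 1.1, 0.6, 0.2, 0.0, -0.1]
motor = [0.0, 0.1, 0.4, 0.9, 1.2, 0.7, 0.2, 0.0]
print(f"auditory-to-motor peak delay: {peak_delay(auditory, motor, 2.0):.2f} s")
```

The interpolation is exact only for a parabolic peak; for real responses it is an approximation whose accuracy depends on the TR and on noise. Delays estimated this way for each session or condition could then be correlated with the corresponding behavioral latencies.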

A final study, by Meinzer et al. (2006), is notable for separating correct and incorrect responses from picture-naming trials during fMRI. The patient in this study improved in naming after two weeks of constraint-induced language therapy. This therapy requires verbal communication (as opposed to other forms of communication) during functional communication tasks and uses an intensive treatment schedule to produce significant gains in communication (Maher et al., 2006; Meinzer et al., 2005; Pulvermueller et al., 2001). Correct responses from both the pre- and post-treatment sessions were compared with incorrect responses from both sessions. This comparison showed significantly greater activity in the right inferior frontal gyrus for correct than for incorrect responses. The same region was also more active in post- than in pre-treatment images when the comparison was restricted to items that were incorrect at pre-treatment and correct at post-treatment. However, in this latter comparison, post-treatment increases were also seen in the right thalamus, the left and right putamen, and the anterior cingulate region.
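Separating correct from incorrect responses, as Meinzer et al. (2006) did, amounts to sorting trial onsets by session and accuracy before the design matrix is built. The same step also identifies the items that changed from incorrect to correct. The sketch below assumes a simple per-trial log of (session, item, onset, correct) tuples. That format and the function names are hypothetical. The resulting onset lists would still need to be convolved with a hemodynamic response model in whatever analysis package is used.

```python
from collections import defaultdict


def split_onsets(trials):
    """Group trial onsets (s) by (session, accuracy) so that correct and
    incorrect responses can be modelled as separate regressors.
    trials: iterable of (session, item, onset_s, correct) tuples."""
    conditions = defaultdict(list)
    for session, _item, onset, correct in trials:
        label = "correct" if correct else "incorrect"
        conditions[(session, label)].append(onset)
    return dict(conditions)


def improved_items(trials, before="pre", after="post"):
    """Items named incorrectly in the `before` session but correctly in `after`."""
    accuracy = {(session, item): bool(correct) for session, item, _, correct in trials}
    items = {item for _, item, _, _ in trials}
    return sorted(
        item for item in items
        if (before, item) in accuracy and not accuracy[(before, item)]
        and accuracy.get((after, item), False)
    )


# Hypothetical naming trials: (session, item, onset in seconds, correct?)
trials = [
    ("pre", "cat", 10.0, False), ("pre", "dog", 20.0, True), ("pre", "cup", 30.0, False),
    ("post", "cat", 12.0, True), ("post", "dog", 22.0, True), ("post", "cup", 31.0, False),
]
print(split_onsets(trials))
print(improved_items(trials))  # prints: ['cat']
```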

In summary, these treatment studies have each relied on small numbers of patients, which makes it difficult to draw definitive conclusions. Nonetheless, it is worth noting that both left- and right-hemisphere activity may act as neural substrates for treatment gains. A central question is under what circumstances left- versus right-hemisphere contributions are important, and current data are not adequate to answer it definitively. Two ideas are important here: that certain substrates can be targeted by treatment, and that fMRI can verify whether the targeted changes occur (e.g., Crosson et al., 2005). They are likely to be the topic of future studies as treatment imaging research matures and becomes more conceptually driven.

Methodological challenges for fMRI of language in aphasia

Even though a fair amount of research has been accomplished in the functional imaging of aphasia, significant methodological challenges must be overcome in this endeavor. Imaging language comprehension in this population poses many challenges, but the hurdles in imaging language production are even more daunting. Indeed, until recently, almost all studies of language production used covert (silent) production. Some problems with covert production in aphasic patients should be immediately apparent, but they are enumerated below. The point here is that significant methodological development is needed not only to improve research but also before any routine clinical application. With the recent approval of billing codes for fMRI, clinical applications are likely to receive increasing attention. Hence, the following discussion is relevant to clinical applications as well as to research. It roughly follows the ontogeny of an experiment, and it also serves as a roadmap for developing clinical techniques. Topics are presented in the following order: (1) selection of a baseline task against which the task of interest will demonstrate changes, (2) structure of language production trials to capture spoken language and minimize motion-related artifacts, (3) techniques available for mitigating motion-related artifacts, (4) whether to time-lock analyses of brain activity to stimulus presentation or to response onset when the interval between stimulus presentation and speech onset is variable, (5) use of trials with correct responses and errors in analyses, and (6) reliability and stability of fMRI images across sessions.
At varying points, the exposition will rely upon conceptual analysis of the challenges, literature regarding imaging of cognition or language in normal subjects, available literature on fMRI of language in aphasia, and occasionally, data from our own laboratory to illustrate a point.
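Item (4) can be made concrete. When speech onset lags the stimulus by a variable amount, a regressor time-locked to stimulus presentation and one time-locked to response onset predict different BOLD time courses. The sketch below builds both by placing a double-gamma hemodynamic response at each event onset and summing. The HRF parameters (gamma shapes 6 and 16, undershoot ratio 1/6) are assumed to follow widely used SPM-style defaults. The onsets, latencies, and function names are hypothetical.

```python
import math


def hrf(t):
    """Double-gamma hemodynamic response (peak near 5 s, later undershoot)."""
    if t <= 0:
        return 0.0

    def gamma_pdf(shape):
        return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

    return gamma_pdf(6) - gamma_pdf(16) / 6


def regressor(onsets, n_scans, tr, dt=0.1, hrf_len=32.0):
    """Predicted BOLD time course at each scan: HRFs summed over event onsets (s),
    built on a fine time grid of step dt and then sampled once per TR."""
    kernel = [hrf(k * dt) for k in range(int(round(hrf_len / dt)))]
    n_fine = int(round(n_scans * tr / dt))
    fine = [0.0] * n_fine
    for onset in onsets:
        start = int(round(onset / dt))
        for k, h in enumerate(kernel):
            if start + k < n_fine:
                fine[start + k] += h
    step = int(round(tr / dt))
    return [fine[i * step] for i in range(n_scans)]


stimulus_onsets = [10.0, 30.0, 50.0]
speech_latencies = [1.2, 3.5, 2.0]  # hypothetical, variable across trials
response_onsets = [s + lat for s, lat in zip(stimulus_onsets, speech_latencies)]
stim_reg = regressor(stimulus_onsets, n_scans=40, tr=2.0)
resp_reg = regressor(response_onsets, n_scans=40, tr=2.0)
```

The two regressors diverge more as latencies grow longer and more variable, which is the situation typical of naming in aphasia.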

Selection of a baseline task

We will discuss two criteria for selecting baseline tasks. The first is what cognitive function is to be imaged, a consideration often approached from some theoretical perspective. The second is making certain that selected baseline-experimental task combinations result in images sensitive to neural mechanisms in aphasia and/or changes during recovery or rehabilitation. Superficially, it seems that meeting one of these criteria will lead to meeting the other. In practice, there are trade-offs that must be considered in light of the goals of a specific study. The following example illustrates one such circumstance.

In preparation for studying neural substrates of syntax treatment, Peck et al. (2004) compared two baseline conditions for an experimental task in which participants generated simple passive sentences from a picture. Conceptually, picture naming was considered the ideal baseline task because it controlled for most demands of sentence production (e.g., visual processing, semantics, word retrieval) other than ordering words (i.e., syntax). Yet Newman et al. (2001) suggested that, in some circumstances, a resting baseline gives the most accurate map of all the areas necessary to perform language tasks. This analysis suggested that there might be some advantage to retaining all areas involved in word retrieval in the map. Further, in imaging aphasia, it is often necessary to analyze images at the individual-subject level because it is important to quantify the amount of perilesional activity. Since lesion size, shape, and location vary considerably between patients, one patient's perilesional area may lie within another patient's lesion; in such cases, averaging images can grossly underestimate perilesional activity. Careful matching of tasks to limit images to a single cognitive operation works well in group studies, where averaging across subjects optimizes sensitivity, but we have frequently found that such delicate subtractions do not yield reliable activity patterns in individual subjects. Thus, a second baseline task, differing from the experimental task by a larger number of cognitive operations, was chosen for comparison: passive viewing of nonsense objects. In a group of neurologically normal participants, Peck et al. (2004b) mapped passive sentence generation against each of these baseline tasks. A critical difference in the results was that Broca's area showed activity increases for sentence generation relative to nonsense-object viewing, but not relative to picture naming. Because activity in Broca's area is reported most frequently in studies of syntactic processing (e.g., Caplan et al., 2000; Grodzinsky, 2000; Kuperberg et al., 2000; Ni et al., 2000; Zurif et al., 1993), it was deemed a critical area for imaging, and nonsense-object viewing was therefore selected as the baseline task for the experiment.

This example illustrates one conflict between the two criteria for selecting baseline tasks. The ideal task for isolating syntax, picture naming, would eliminate non-syntactic language components. Yet, relative to passive nonsense-object viewing, it also eliminated activity in Broca's area, an important region for syntax. One potential reason is that Broca's area may support functions important for both syntax and word retrieval. For example, procedural memory is probably important both for ordering phonemes during word retrieval and for ordering words in sentences (e.g., Nadeau & Crosson, 1997), and Ullman (2004) has suggested that Broca's area is important in such procedural memory processes. Irrespective of why Broca's area was lost in the comparison with the picture-naming baseline, practical considerations dictated that passive nonsense-object viewing be used as the baseline task so that potential activity in Broca's area could be visualized.

Structure of language production trials

In structuring language production trials, it is important to consider how motion artifacts can be reduced or compensated for, taking into account their spatial and temporal characteristics as well as the physical processes that produce them. Rigid motion can be random, stimulus-correlated, or response-correlated. Artifacts from overt speaking are response-correlated, but because normal subjects respond uniformly rapidly, these artifacts can also be modeled well as stimulus-correlated. This is not always the case in patients with aphasia. Both rigid head motion and localized speaking motion produce artifacts not only where there are signal intensity gradients but also near magnetic susceptibility gradients. Although fMRI data sets typically undergo volume registration as a first step of motion correction, registration corrects only for rigid motion brought about by gross head movement. Other sources of signal change are not mitigated by correcting for gross head movement, including: (1) regional magnetic field inhomogeneities due to global movement of the head and structures within it relative to the static B0 field, (2) differences in spin history within slices resulting from out-of-plane motion from one acquisition to another, and (3) susceptibility changes due to the rarefaction and compression of paramagnetic oxygen in and around the vocal cavity during speech.

The HDR on which BOLD contrast fMRI is based takes roughly 12 seconds to evolve. Thus, in overt language production, spoken responses can introduce artifacts not only into the fMRI signal representing the HDR for the ongoing trial but also into the signal from a preceding trial, if that trial's HDR has not returned to baseline before the participant begins to speak in the ensuing trial. Therefore, motion contaminating the latter portion of an HDR, as well as its initial portion, should be avoided. Fig. 1 shows that if two overt language production trials are presented too closely in succession, motion-related signal change from the second trial will occur before the HDR from the first trial has fully evolved, contaminating the first HDR. In neurologically normal participants, the tight temporal relationship between stimulus presentation and response makes it relatively easy to time trial onsets to avoid this problem. Ideally, the inter-trial interval should be at least long enough for the fMRI signal representing the HDR to return to baseline before the ensuing trial begins, so that HDRs are not contaminated by motion-related signal change from the next trial.
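The timing constraint in this paragraph can be checked numerically. The following Python sketch is a minimal illustration, not the authors' procedure: it assumes a single gamma-variate HDR with the shape used by AFNI's GAM basis function (b = 8.6, c = 0.547 s), locks each HDR to the spoken response, and treats the HDR as having "returned to baseline" once it falls below 1% of its peak. The function names are hypothetical.

```python
import numpy as np

def gamma_hdr(t, b=8.6, c=0.547):
    """Peak-normalized gamma-variate HDR, (t/(b*c))**b * exp(b - t/c); zero for t <= 0."""
    tc = np.clip(np.asarray(t, dtype=float), 0.0, None)
    return (tc / (b * c)) ** b * np.exp(b - tc / c)

def time_to_baseline(frac=0.01, dt=0.01, t_max=40.0):
    """Seconds after onset at which the HDR, past its peak, first drops below frac * peak."""
    t = np.arange(0.0, t_max, dt)
    h = gamma_hdr(t)
    peak = int(np.argmax(h))
    below = np.nonzero(h[peak:] < frac * h[peak])[0]
    return float(t[peak + below[0]])

def contaminated_trials(response_onsets_s, hdr_duration_s):
    """Indices of trials whose response-locked HDR is still evolving
    when the next spoken response (and its motion) begins."""
    gaps = np.diff(np.asarray(response_onsets_s, dtype=float))
    return [i for i, gap in enumerate(gaps) if gap < hdr_duration_s]

hdr_len = time_to_baseline()
print(f"HDR below 1% of peak after {hdr_len:.1f} s")          # about 11.4 s
print(contaminated_trials([0.0, 8.0, 30.0, 44.0], hdr_len))   # trial 0 overlaps trial 1
```

With these assumed parameters the modeled HDR needs a little over 11 s to subside, in line with the roughly 12 s quoted above, so responses only 8 s apart would contaminate the first trial.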

Fig. 1

Time courses of BOLD contrast signal changes related to spoken word production in successive trials. Times of response initiation are indicated by arrows below the time axis of the plot, in seconds after stimulus presentation. The darkly shaded area represents the portion of the first time course vulnerable to motion-related signal artifacts from overt language production. If a second trial is presented too soon after the first, the latter portion of the first trial's HDR may become contaminated by motion-related signal changes from the second trial. The lightly shaded region represents the portion of the time course vulnerable to motion-related signal changes from a second trial presented too soon after the first.

The response latencies of aphasic patients are often both long and variable. In our experience with some word production paradigms, average response latencies can be as long as 8 seconds, with standard deviations of up to 3 seconds. Such long and variable latencies make it difficult to anticipate when HDRs related to word production will start and end. One solution we have used is to measure the mean and standard deviation of response latencies for the experimental task a day or more before the scanning session. Using this information, the trial length can be set to allow for the mean latency plus 1.2 standard deviations, in addition to the time required for the HDR to evolve, so that roughly 90% of the HDRs will have evolved before the onset of the ensuing trial (assuming approximately normally distributed latencies). Individualizing trial length in this way requires scheduling an extra session outside the scanner, and reprogramming the experimental paradigm for each patient can be tedious. Thus, another strategy we have employed is to program the task on the basis of the maximum anticipated mean and standard deviation of response latency. This strategy requires some knowledge of response latencies in the experimental paradigm, which can vary with the characteristics of the sample, and it can lengthen scanning sessions for patients who are able to respond relatively quickly.
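The individualized trial length can be sketched as follows. This is a hypothetical helper, and two of its assumptions are ours rather than the text's: the trial length also includes a fixed HDR duration (12 s by default, taken from the preceding discussion), and latencies are treated as normally distributed. Under normality, the mean plus 1.2 SD covers about 88% of responses; exactly 90% would require about 1.28 SD.

```python
from statistics import NormalDist, mean, stdev

def individualized_trial_length(latencies_s, hdr_duration_s=12.0, z=1.2):
    """Hypothetical helper: trial length = HDR duration + mean latency + z * SD.

    Also returns the fraction of responses expected to begin within
    mean + z * SD, assuming normally distributed latencies.
    """
    m, sd = mean(latencies_s), stdev(latencies_s)
    length = hdr_duration_s + m + z * sd
    coverage = NormalDist().cdf(z)
    return length, coverage

# Latencies (s) from a hypothetical pre-scan practice session.
length, coverage = individualized_trial_length([5.0, 8.0, 11.0])
print(f"trial length {length:.1f} s, about {coverage:.0%} of responses covered")  # 23.6 s, 88%

# z needed for exactly 90% coverage under the normality assumption.
print(round(NormalDist().inv_cdf(0.90), 2))  # 1.28
```

Using the sample standard deviation (`stdev`) rather than the population value is a deliberate, slightly conservative choice for the small number of practice trials typically available.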

Mitigation of motion-related artifacts

In patients with aphasia, overt responses are preferable when language production is being imaged. Overt responses allow the investigator to track response accuracy, which in turn allows correct-response and error trials to be compared in analyses to determine whether specific brain areas are necessary to produce a correct response (e.g., Meinzer et al., 2006). Further, on-line monitoring of responses allows the investigator to determine whether the patient is complying with task instructions, a potential problem for patients with compromised auditory-verbal comprehension. However, motion-related signal changes induced during overt production in BOLD fMRI can be mistaken for brain activity supporting language production in images of individual patients. In neurologically normal subjects, Barch et al. (1999, 2000) showed that such artifacts can be “averaged out” in group studies; however, they recommended that spoken language not be used in fMRI studies of individual subjects. Group analyses can be used on a few occasions in studies of aphasia. For example, Blank et al. (2003) specifically targeted activity in the right pars opercularis for measurement; since this structure was intact in all of their patients, the strategy worked well. Most commonly, however, interest in perilesional activity, together with the variability in lesion size, shape, and location, prevents accurate quantification of perilesional activity in aphasic patients with group analyses, as noted above. Thus, fMRI research in aphasia is largely limited to the individual-subject level. Further, for clinical purposes only the individual patient is of interest for diagnosis, treatment planning, and outcome measurement. Hence, methods must be developed for managing motion-related artifacts in fMRI images acquired during spoken language production. The darkly shaded area at the beginning of the first HDR in Fig. 1 indicates that the first 3 seconds after the onset of a spoken response are usually the most vulnerable to motion-related signal change, especially in normal subjects.

Although techniques have been developed for “real-time” prospective motion correction during image acquisition (e.g., Thesen et al., 2000; Ward et al., 2000), as well as “real-time auto-shimming” to mitigate regional magnetic field inhomogeneities (Ward et al., 2002), all of these techniques correct mainly for the effects of rigid global head movement and do not sufficiently address artifacts in the fMRI time series arising from local non-rigid motion during speech production.

Bullmore et al. (1999) and Friston et al. (2000) have corrected fMRI time series for global motion-related signal changes using the motion parameter estimates of the image registration program. The problems with this technique are that voxel-wise signal changes are not all linearly related to these motion parameters, either individually or in combination, and that global rigid motion accounts for only a portion of motion-related signal change during speech, as just noted. Birn et al. (1999) presented alternative methods for correcting fMRI images for motion-related signal changes induced during overt speech. These methods can be particularly effective in event-related language production paradigms in which participants produce a single word. Their effectiveness rests on the fact that motion-related signal changes typically occur on a faster time scale than the blood oxygenation level dependent (BOLD) HDR, which facilitates separating the two. Fig. 2 shows the time course of a typical motion-related signal change and the time course of a representative HDR in an area of brain participating in correct responses for a category-member generation task. The simplest method for identifying voxels with true HDRs is to drop the images contaminated by response-correlated signal artifact (e.g., the first two or three images) from the data analysis. Enough of the HDR usually remains to identify voxels with a true HDR related to producing the verbal response.
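Dropping response-contaminated images amounts to censoring volumes before model fitting. The Python sketch below assumes that volume i is acquired during the interval [i·TR, (i+1)·TR), drops the volume in which each response begins plus the following volumes (three per response by default), and fits ordinary least squares to the remaining volumes. `censor_mask` and `fit_censored` are hypothetical names, and the design matrix in the usage lines is a toy.

```python
import numpy as np

def censor_mask(response_onsets_s, tr_s, n_vols, n_drop=3):
    """Boolean mask (True = keep). Drops the volume in which each response begins
    and the following volumes, n_drop volumes in total per response.
    Assumes volume i is acquired during [i * TR, (i + 1) * TR)."""
    keep = np.ones(n_vols, dtype=bool)
    for onset in response_onsets_s:
        first = int(np.floor(onset / tr_s))
        keep[first:first + n_drop] = False
    return keep

def fit_censored(y, X, keep):
    """Ordinary least squares using only the retained volumes."""
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta

n_vols, tr = 30, 2.0
keep = censor_mask([5.0, 31.0], tr, n_vols)
X = np.column_stack([np.ones(n_vols), np.linspace(0.0, 1.0, n_vols)])  # toy design
y = X @ np.array([2.0, 1.0])
y[~keep] += 50.0                 # large spikes standing in for speech-related artifact
print(np.nonzero(~keep)[0])      # volumes 2-4 and 15-17 are dropped
print(fit_censored(y, X, keep))  # close to [2, 1] despite the spikes
```

The obvious cost, noted in the text, is that any true HDR signal in the censored volumes is discarded along with the artifact.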

Fig. 2

Representation of time courses of a motion-related signal change and of a true BOLD hemodynamic response in an area of brain participating in production of a verbal response. Time from presentation of the stimulus is shown on the horizontal axis. Because the motion-related signal change evolves more rapidly than the BOLD response, this difference can be used to reduce motion-related artifacts with various analysis methods.

Carter et al. (2000) used a similar method for group analyses involving spoken responses. However, this type of analysis can decrease sensitivity to true HDRs (Birn et al., 2004; Gopinath, 2003), which is undesirable because areas of activation may be missed, and in some cases it provides insufficient protection from speech-related artifacts, namely when the artifactual signal changes persist longer than the two or three images that are ignored (Gopinath, 2003). A related method (Huang et al., 1999) uses the temporal phase of signal changes after overt word generation to discriminate between BOLD activation and speech artifacts; however, this method suffers from the same drawbacks. Birn et al. (2004) demonstrated that by optimizing data collection parameters (e.g., an event-related design with short task intervals and relatively short but variable interstimulus intervals), activity related to reading words could be detected without substantial false-positive activity, even when motion-related activity was not modeled in the regression. However, even if this strategy is effective, it relies on a tight linkage between stimulus presentation and subject response, an assumption that cannot be made in patients with significant aphasia.

An alternative image-analysis method for reducing motion-related signal changes is to detrend the time series for the motion-related artifact, i.e., to remove motion-related signal changes from the fMRI time series. This method has two advantages over dropping the images in which motion occurs. First, it can preserve information about the HDR at time points affected by motion. Second, motion-induced artifact sometimes extends well into the HDR, so that dropping all the images compromised by motion would not leave enough of the HDR to identify active voxels. Birn et al. (1999) advocated orthogonalizing or detrending all voxels nonselectively with respect to motion-related signal; however, Gopinath (2003) demonstrated that detection sensitivity can be compromised when detrending algorithms are applied nonselectively. The problem is that not all voxels are equally affected by motion-related signal changes, and in some voxels the motion artifact may not be sufficiently separated in time from signal changes produced by brain activation. Thus, in the course of removing motion artifact, nonselective detrending can also reduce the ability to detect true BOLD HDRs.
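Nonselective detrending can be illustrated as projecting a single artifact regressor out of every voxel. In this minimal sketch (hypothetical function name, toy time courses), the artifact and HDR shapes overlap slightly in time, so removing the artifact component from all voxels also removes part of the signal from a voxel that contains only a true HDR, the kind of sensitivity loss described above.

```python
import numpy as np

def nonselective_detrend(Y, artifact_tc):
    """Project the (centered) artifact time course out of every voxel column of Y.
    Voxel means are preserved; residuals are orthogonal to the artifact regressor."""
    a = np.asarray(artifact_tc, dtype=float)
    a = a - a.mean()
    a = a / np.linalg.norm(a)
    Y = np.asarray(Y, dtype=float)
    return Y - np.outer(a, a @ Y)

t = np.arange(20)
artifact = np.where(t < 3, 1.0, 0.0)           # fast, early deflection (toy)
hdr = np.exp(-0.5 * ((t - 5) / 2.0) ** 2)       # slower, later response (toy shape)
Y = np.column_stack([100 + 4 * artifact,       # artifact-only voxel
                     100 + 2 * hdr])            # HDR-only voxel
D = nonselective_detrend(Y, artifact)
print(D[:, 0].std())                    # ~0: artifact removed
print(D[:, 1].std() / Y[:, 1].std())    # < 1: some true HDR signal removed as well
```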

Gopinath (2003) developed an alternative detrending technique to selectively remove trends in signal change due to motion-related artifacts from fMRI data acquired during spoken language. The procedure involves the following steps. (1) An initial deconvolution analysis is performed on the raw voxel time series: fMRI signal change is estimated at each voxel over a fixed period (e.g., 20 seconds) after the onset of the vocal response, and the significance of this signal change is assessed. At this stage, voxels with task-related BOLD HDRs and voxels with task-related motion artifacts both exhibit significant fMRI signal change. (2) A trained operator selects voxels outside the brain that show a significant relationship to response onset as prototypical motion-related artifacts. The sources of these artifacts were explained above. Signal changes from these artifacts occur both inside and outside the brain because motion of the articulators affects magnetic field homogeneity and produces differences in local oxygen density that propagate the signal artifacts. However, signal changes in voxels outside the brain are not commingled with BOLD HDRs, so they represent only the motion-related artifacts. Signal changes from speech-correlated motion usually have a more rapid time course than BOLD HDRs, and in event-related paradigms this temporal difference can be used to separate artifacts from true HDRs (Birn et al., 1999). Because the artifacts can vary in shape (e.g., in amplitude and in positive versus negative deflection), several prototypical artifacts of different shapes are selected. (3) The trained operator also selects a small number of voxels (e.g., three) in which the deconvolved fMRI signal change is significant and representative of prototypical BOLD HDRs.
Frequently, the onset of HDRs varies by an image or two depending on voxel location; when such differences occur, an attempt is made to represent them in the prototypical HDRs. (4) Voxels in which the deconvolved fMRI signal changes are significantly correlated with prototypical motion artifacts but not with prototypical BOLD HDRs are detrended of signal proportional to the maximally correlated motion artifact prototype. Essentially, the artifact prototype is convolved with the time series of speech-onset events to generate an artifact time course of the same length as the voxel time series, and signal changes proportional to this artifact time course are then subtracted from the voxel time series. (5) For voxels in which the deconvolved signal change correlates significantly with both prototypical artifacts and prototypical BOLD HDRs, only the points in the time course most affected by the artifact, i.e., those representing the first 3 to 5 seconds after the onset of each vocal response, are treated by the algorithm. This is done by using only the first 3 to 5 seconds of the deconvolved artifact prototype when constructing the full artifact time course described in step 4. (6) For voxels in which the deconvolved fMRI signal change correlates only with prototypical HDRs, and for voxels in which it correlates with neither prototypical artifacts nor HDRs, no detrending is performed. (7) Finally, the deconvolution analysis is repeated, this time on the “artifact-free” voxel time series, yielding activation maps sensitive only to BOLD signal changes. Gopinath (2003) described the technique in detail, and preliminary analyses showed that selective detrending performs better than nonselective detrending, dropping initial images, and motion parameter regression when both sensitivity and specificity are taken into account.
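The decision logic of steps 4 to 6 can be sketched in Python. This is an illustration under stated assumptions, not Gopinath's (2003) implementation: the deconvolved responses from step 1 and the operator-selected prototypes from steps 2 and 3 are taken as given; a fixed correlation threshold `r_crit` stands in for a significance test; the amplitude of the subtracted artifact is estimated by least squares; and `n_early`, the number of post-response images kept in the truncated prototype for step 5, would have to be chosen to span roughly 3 to 5 seconds at the actual TR. All function names are hypothetical, and the repeated deconvolution of step 7 is not shown.

```python
import numpy as np

def corr(a, b):
    """Pearson correlation; returns 0 if either input is constant."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])

def event_timecourse(kernel, events):
    """Convolve a response prototype with a 0/1 series marking speech-onset volumes."""
    return np.convolve(events, kernel)[: len(events)]

def selective_detrend(Y, irf, events, art_protos, hdr_protos, r_crit=0.5, n_early=2):
    """Y: (n_vols, n_vox) raw time series; irf: (L, n_vox) deconvolved responses (step 1).
    art_protos / hdr_protos: lists of length-L prototypes (steps 2 and 3)."""
    out = np.asarray(Y, dtype=float).copy()
    for v in range(out.shape[1]):
        r_art = [abs(corr(irf[:, v], p)) for p in art_protos]
        r_hdr = max(abs(corr(irf[:, v], p)) for p in hdr_protos)
        best = int(np.argmax(r_art))
        if r_art[best] < r_crit:
            continue                          # step 6: no artifact correlation, leave voxel alone
        kernel = np.array(art_protos[best], dtype=float)
        if r_hdr >= r_crit:
            kernel[n_early:] = 0.0            # step 5: treat only the earliest post-response images
        tc = event_timecourse(kernel, events)
        tc = tc - tc.mean()
        if tc @ tc > 0:
            y = out[:, v]
            beta = tc @ (y - y.mean()) / (tc @ tc)   # least-squares amplitude (assumption)
            out[:, v] = y - beta * tc                # step 4: subtract scaled artifact time course
    return out

# Toy data: one artifact-only voxel and one HDR-only voxel.
art = np.array([1.0, -0.5, 0, 0, 0, 0, 0, 0])
hdr = np.array([0.0, 0.3, 0.8, 1.0, 0.8, 0.5, 0.2, 0.05])
events = np.zeros(40)
events[[0, 10, 20, 30]] = 1.0
Y = np.column_stack([100 + 3 * event_timecourse(art, events),
                     100 + event_timecourse(hdr, events)])
irf = np.column_stack([3 * art, hdr])
D = selective_detrend(Y, irf, events, [art], [hdr])
print(D[:, 0].std())                  # ~0: artifact removed
print(np.array_equal(D[:, 1], Y[:, 1]))  # True: HDR-only voxel untouched
```

The point of the selectivity is visible in the toy: the HDR-only voxel is left exactly as it was, whereas a nonselective projection would have altered it.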

Figure 3 shows a sagittal slice with significant signal changes for each of four participants with aphasia producing spoken words in studies in our laboratory, before (a) and after (c) selective detrending. Two participants (A008, X030) generated exemplars from given categories; two (X105, X115) named pictures. Note the reduction in false-positive activity after selective detrending. The figure also shows a representative motion-related time series (b) within the parenchyma for each participant prior to selective detrending. All of these time series were eliminated by selective detrending on the basis of their correlation with time series in voxels outside the brain designated as containing motion-related signal changes. Generally, most motion-related signal change occurred within the first three images of the time series and could easily be distinguished from the typical hemodynamic response, which evolves over a longer period. Note that motion-related signal changes can be either positive (A008, X105) or negative (X030, X115). Time series representing “true” hemodynamic responses (d) evolved over a longer period than those in voxels with motion-related signal change. For each of these participants, selective detrending substantially reduced the volume of brain showing a significant R²: by 51% for A008, 40% for X030, 48% for X105, and 95% for X115. In other words, between 40% and 95% of the significant voxels in these four participants represented motion-related signal change without a significant hemodynamic response. Clearly, selective detrending greatly improves the specificity of significant clusters of “activity” in these participants.
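Percentages like these, and the ROI volumes reported later, reduce to counting suprathreshold voxels and converting the count to volume (1 mm³ equals 1 μl). A minimal sketch follows; the function names are hypothetical, and the R² values and 2 × 2 × 2 mm voxel size in the usage lines are made-up illustrations, not data from these participants.

```python
import numpy as np

def suprathreshold_volume_ul(r2_map, r2_thresh, voxel_size_mm):
    """Volume (microliters) of voxels with R^2 >= r2_thresh; 1 mm^3 equals 1 microliter."""
    n_vox = int(np.count_nonzero(np.asarray(r2_map) >= r2_thresh))
    return n_vox * float(np.prod(voxel_size_mm))

def percent_reduction(before, after):
    """Percentage of the original suprathreshold volume removed."""
    return 100.0 * (before - after) / before

r2_raw = np.array([0.05, 0.25, 0.31, 0.22, 0.40, 0.10])        # hypothetical values
r2_detrended = np.array([0.04, 0.08, 0.30, 0.05, 0.38, 0.09])  # hypothetical values
vox = (2.0, 2.0, 2.0)                                           # hypothetical voxel size (mm)
v_raw = suprathreshold_volume_ul(r2_raw, 0.20, vox)
v_det = suprathreshold_volume_ul(r2_detrended, 0.20, vox)
print(v_raw, v_det, percent_reduction(v_raw, v_det))   # 32.0 16.0 50.0
```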

Fig. 3

Examples of motion-related signal change and HDRs for each participant (response-locked analyses). Each of the four columns represents a different aphasic patient. The first (top) row (a) shows images of significant signal change before selective detrending was applied to the data. The second row (b) shows the deconvolved time course of the voxel at the cross hairs in the image just above it (a). The third row (c) shows images of significant signal change after selective detrending was applied to remove motion-related signal changes. Note that the detrended images have lost many voxels of significant signal change that represent motion-related signal change rather than HDRs. Many voxels eliminated by selective detrending were in areas of lesion or outside the brain. The fourth (bottom) row (d) shows the deconvolved time course of the voxel at the cross hairs in the image just above it (c). Note that motion-related signal change is most dramatic in the first 3 images after the spoken response, whereas hemodynamic responses have a characteristically extended time course across several images. Thresholds for significant activity (red) were set at R² = 0.20 for word generation and R² = 0.16 for picture naming because of the differences in sensitivity between paradigms.

Although selective detrending appears to perform better than the other techniques described above, it has two disadvantages as presently implemented: it requires the judgment of a trained operator, and it is labor intensive. Further, in occasional subjects, the time courses of motion-related signal changes extend beyond the first few seconds after a response, making it more difficult to separate the artifact from true HDRs; in such instances, other detrending techniques must be applied. Clearly, more work is needed in this area before data from overt language production paradigms can be used routinely in the clinic.

Finally, four other methods for dealing with motion-related artifacts, some of which involve techniques other than BOLD contrast fMRI, deserve brief mention. (1) Martin et al. (2005) presented multiple naming trials in each of several trial blocks. Essentially, they dropped from their analysis all images acquired while subjects were speaking, which included the rise of the HDR signal and most of the signal plateau. In other words, the analysis depended on the final portion of the HDR plateau and the portion in which signal returns to baseline. One advantage of this technique is that a blocked format drives BOLD signal above levels characteristic of single responses, improving the contrast-to-noise ratio (CNR). However, discarding data from the time during which participants are speaking can reduce the ability to detect true HDRs (Birn et al., 2004). This strategy also sacrifices the flexibility of an event-related format, which precludes separating trials in which errors were made or no response was given from trials in which correct responses occurred (e.g., see Meinzer et al., 2006). (2) Naeser et al. (2004) used dynamic susceptibility contrast (i.e., gadolinium infusion) to assess differences in relative cerebral blood volume (rCBV) while participants told a series of stories cued by cartoon pictures versus while they silently viewed patterns. The technique uses a long block of trials to elicit changes in rCBV. The contrast-to-noise ratio with gadolinium injection is relatively high, and motion-related artifacts are less likely to be a problem, so images acquired during overt speech can be analyzed. However, again the flexibility of event-related trials is lost. Further, injection of the contrast agent is invasive and entails some risk.
Gadolinium is deposited in bone and other tissues (Gibby et al., 2004), where it remains for a long period of time (Talbot et al., 1965). Although the long-term effects of gadolinium are not known, there is cause for concern: gadolinium is one of the most potent inorganic calcium antagonists known (Biagi et al., 1990; Talbot et al., 1965), has carcinogenic effects (Rocklage et al., 1991; Costa, 1980), and may interfere with coagulation. Thus, repeated administration of this contrast agent may be inadvisable. (3) A technique called sparse temporal sampling may be used for overt language production in aphasia (e.g., Meinzer et al., 2006). In this technique, fMRI data are not acquired while the subject is speaking. The loss of time points during speech is offset by the greater signal magnitude that results from more complete recovery of longitudinal magnetization (Edmister et al., 1999; Hall et al., 1999). The technique has the further advantage of providing silent intervals during which auditory stimuli can be presented without interference from gradient noise, which can compromise auditory comprehension of stimuli. However, this technique also relies on a relatively predictable interval between stimulus and response and may not accommodate paradigms or subject samples in which the latency between stimulus and response is long and/or variable. (4) Preliminary findings have shown that arterial spin labeling (ASL) can be used to acquire fMRI images during spoken language without producing the same artifacts as BOLD contrast fMRI (Kemeny et al., 2005). ASL measures regional cerebral blood flow by comparing an image in which blood has been tagged with a radio-frequency pulse to a control image with no tag.
However, the signal-to-noise ratio of ASL is typically less than half that of BOLD contrast fMRI; its temporal resolution is poorer than that of BOLD because both tag and control images must be acquired; and the maximum number of slices that can be acquired is typically smaller than for BOLD because images must be acquired before the tagged blood signal has fully relaxed (Liu & Brown, in press). Although the limitations of ASL in temporal and spatial resolution have not yet been overcome, the rapid pace of development in the field suggests that they may be resolved or at least made manageable. If ASL can achieve sensitivity nearly equal to that of BOLD contrast fMRI, it may become the technique of choice for fMRI of spoken language within the next few years.
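The tag-control comparison behind ASL, and the reason its effective sampling is slower than BOLD, can be sketched with simple pairwise subtraction of interleaved volumes. This sketch assumes the series alternates control and tag volumes, starting with control by default; other subtraction schemes (e.g., surround subtraction) exist, and converting the difference signal to absolute blood flow requires a kinetic model that is not shown. Function names and signal values are hypothetical.

```python
import numpy as np

def asl_difference(images, control_first=True):
    """Pairwise control - tag subtraction of an interleaved ASL series.

    images: array of shape (n_vols, ...) alternating control/tag volumes.
    A trailing unpaired volume is ignored. Returns shape (n_vols // 2, ...).
    """
    imgs = np.asarray(images, dtype=float)
    n_pairs = imgs.shape[0] // 2
    first = imgs[0:2 * n_pairs:2]
    second = imgs[1:2 * n_pairs:2]
    control, tag = (first, second) if control_first else (second, first)
    return control - tag

def perfusion_sampling_interval(tr_s):
    """Each perfusion-weighted sample needs one control and one tag volume."""
    return 2.0 * tr_s

# Hypothetical single-voxel series: control volumes near 100, tag volumes slightly lower.
series = [100.0, 99.0, 101.0, 99.5, 100.5, 99.0, 100.2]
print(asl_difference(series))             # pairwise differences 1.0, 1.5, 1.5
print(perfusion_sampling_interval(3.0))   # 6.0
```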

Fig. 4

Representation of time courses of BOLD contrast signal changes related to three different cognitive activities during a word generation trial: perceiving and comprehending the stimulus, retrieving the appropriate word for the picture or category member, and speaking (producing) the selected word. Times of stimulus presentation and response initiation are indicated by arrows below the time axis of the plot. Because of variable response latencies in aphasic patients, the onset of the response cannot be accurately predicted from stimulus onset. The BOLD response related to perceiving and comprehending the stimulus begins soon after stimulus presentation and may return to baseline independently of when the response is given. The BOLD response related to word retrieval may also begin soon after stimulus presentation; however, because cognitive processes related to word retrieval may continue until a response is given, this hemodynamic response is extended in time and may return to baseline only after the response. The BOLD response related to producing the selected word begins just before the response. The model depicted in this figure assumes that the major difficulty in word production lies in word retrieval. Difficulties in comprehension or in motor programming of a response may lengthen the hemodynamic responses for stimulus perception or verbal response, respectively.

Use of stimulus presentation versus response onset to time analyses

The problem of long response latencies in language production tasks must be considered when analyzing fMRI findings in patients with aphasia. Whether an analysis is timed to the presentation of the stimulus eliciting a response or to the response itself can make a difference in which brain areas demonstrate significant activity. Put simply, processes more closely linked in time to stimulus presentation are favored in a stimulus-locked analysis, and processes more closely linked in time to the response are favored in a response-locked analysis. This consideration becomes increasingly important as the interval between stimulus and response becomes longer or more variable. Fig. 4 shows three hypothetical HDRs for a single trial of a spoken-word paradigm for an aphasic participant with a fairly long interval between stimulus onset and response. (1) In such cases, HDRs related to perceiving and comprehending the stimulus should begin around stimulus onset and be more closely linked to it than to the response. (2) Similarly, HDRs related to response processes should begin somewhat before the response and be more closely linked to the response than to stimulus onset. (3) HDRs related to word retrieval may be less predictably linked to either stimulus onset or response. If an aphasic participant has problems perceiving or comprehending stimuli, so that comprehension is delayed and variable, then HDRs in areas participating in word retrieval may not begin until the stimulus is comprehended, and their linkage to stimulus onset may be compromised. On the other hand, if an aphasic participant has difficulty producing a response once the word is retrieved, there may be a variable delay between the end of word retrieval and response production; in that case, the link between HDRs in areas participating in word retrieval and the timing of the response may be compromised. How the data are analyzed should therefore depend on which processes are to be imaged. If stimulus onset is used to mark the beginning of HDRs in the analysis, as in most similar experiments with neurologically normal subjects, the analysis is likely to favor cognitive processes associated with perceiving and comprehending the stimulus, especially when the timing of the spoken response cannot be reliably predicted from stimulus onset. Likewise, if response onset is used to mark the beginning of HDRs, the analysis will be biased toward response processes when stimulus onset and response onset are not closely linked. Further, there will often be some interest in imaging word-retrieval processes, which may be variably linked to either stimulus or response onset. From a practical standpoint, it may not always be clear which type of analysis would best image activity in structures performing word-retrieval processes.
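The difference between stimulus-locked and response-locked analyses comes down to which onset times are convolved with the HDR. The sketch below simulates a noise-free voxel driven only by response processes and compares the squared correlation (R²) of its time series with each regressor. It is a simplification of the analyses reported here: it assumes a fixed gamma-variate HDR (the shape of AFNI's GAM basis function) rather than a deconvolved HDR estimate, and the TR, trial spacing, and latencies are hypothetical values chosen to be long and variable.

```python
import numpy as np

def gamma_hdr(t, b=8.6, c=0.547):
    """Peak-normalized gamma-variate HDR; zero for t <= 0."""
    tc = np.clip(np.asarray(t, dtype=float), 0.0, None)
    return (tc / (b * c)) ** b * np.exp(b - tc / c)

def event_regressor(onsets_s, n_vols, tr_s):
    """Sum of HDRs starting at each onset, sampled at volume times i * TR."""
    t = np.arange(n_vols) * tr_s
    x = np.zeros(n_vols)
    for onset in onsets_s:
        x += gamma_hdr(t - onset)
    return x

def r_squared(y, x):
    """Squared correlation between a voxel time series and a regressor."""
    return float(np.corrcoef(y, x)[0, 1] ** 2)

tr, n_vols = 2.0, 90
stim = np.arange(6) * 30.0                             # one stimulus every 30 s
latency = np.array([2.0, 8.0, 4.0, 10.0, 3.0, 7.0])   # long, variable (hypothetical)
resp = stim + latency

y = event_regressor(resp, n_vols, tr)                  # noise-free "response-process" voxel
r2_resp = r_squared(y, event_regressor(resp, n_vols, tr))
r2_stim = r_squared(y, event_regressor(stim, n_vols, tr))
print(f"response-locked R2 = {r2_resp:.2f}, stimulus-locked R2 = {r2_stim:.2f}")
```

Swapping the roles (simulating a voxel locked to stimulus onset) reverses the advantage, which is the bias described in the paragraph above.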

Table 1. Lag between stimulus presentation and response onset for all vocal responses and for correct responses only (mean and standard deviation)

Table 1 displays the mean and SD of the interval between stimulus onset and response for the four participants whose selective detrending results are shown above, both for all responses and for correct responses only. As noted above, A008 and X030 received the word-generation task, and X105 and X115 received the picture-naming task. For A008, most responses were spread across five images for all responses, or across four images for correct responses. For the other participants, responses usually occurred within the first two to three images, both for all responses and for correct responses.

Fig. 5

Total volume of significant activity in selected regions of interest for response-locked (dark gray) and stimulus-locked (light gray) analyses. Both A008 and X030 participated in the word-generation paradigm, where the statistical threshold was R² ≥ 0.20. For A008, the response-locked deconvolution is clearly more sensitive than the stimulus-locked deconvolution: 15 of the 16 active ROIs show greater activity with the response-locked than with the stimulus-locked analysis. For X030, neither analysis is clearly more sensitive. For both X105 and X115, who received the picture-naming paradigm (with a statistical threshold of R² ≥ 0.16), the stimulus-locked analysis was more sensitive. For future research, these profiles suggest that both paradigmatic differences and patient variables should be explored to assess which analysis provides superior sensitivity. Also, some areas may show activity with one type of analysis but not the other; thus, the purpose of the analysis also should be considered. L = left, R = right. LF = lateral frontal, MF = medial frontal, BG = basal ganglia, Th = thalamus, PP = posterior perisylvian, OP = other parietal, Au = auditory cortex, Vi = visual cortex, Lm = perilimbic cortex.

Figure 5 shows bar graphs of response-locked (dark gray) and stimulus-locked (light gray) analyses by ROI. Data were analyzed with the deconvolution program from AFNI (Cox, 1996), and functional intensities represent the squared correlation of the acquired time series with the estimated HDR convolved with the temporal sequence of either stimulus onsets (stimulus-locked analysis) or response onsets (response-locked analysis). The dependent variable is the volume of tissue in microliters in each ROI that exceeds the critical statistical threshold. For A008, the response-locked analysis produced a total of 21,256 μl of significant activity, while the stimulus-locked analysis produced a total of 9,164 μl. All 13 areas showing activity on both analyses demonstrated more activity on the response-locked than the stimulus-locked analysis. Additionally, two areas showed activity on the response-locked but not the stimulus-locked analysis, and one area (right thalamus) showed activity on the stimulus-locked but not the response-locked analysis. Thus, for A008, 15 of 16 areas showing activity on one or both analyses showed greater activity in the response-locked than the stimulus-locked analysis. Based on a binomial test with an expected probability of 0.5, this distribution is highly significant (p < 0.001). For X030, by contrast, the response-locked analysis produced a total of 8,831 μl of significant activity, while the stimulus-locked analysis produced a total of 10,100 μl. Five of eight areas showing activity on both analyses showed greater activity on the response-locked than the stimulus-locked analysis. An additional two areas showed activity on the response-locked but not the stimulus-locked analysis, but seven additional areas showed activity on the stimulus-locked but not the response-locked analysis. Thus, for X030, seven of 17 areas showing activity on one or both analyses showed greater activity on the response-locked deconvolution (p = 0.148).
For X105, the response-locked analysis produced a total of 2,902 μl of significant activity, but the stimulus-locked analysis produced a total of 6,160 μl. One of two areas showing activity on both analyses demonstrated more activity on the response-locked than the stimulus-locked analysis. Three areas showed activity on the stimulus-locked but not the response-locked analysis, and no areas showed activity on the response-locked but not the stimulus-locked analysis. Thus, for X105, one of five areas showing activity on one or both analyses showed greater activity in the response-locked than the stimulus-locked analysis; because of the small number of clusters, the binomial probability does not reach significance (p = 0.15). For X115, the response-locked analysis produced a total of 558 μl of significant activity, and the stimulus-locked analysis produced a total of 1,960 μl. One of two areas showing activity on both analyses demonstrated more activity on the response-locked than on the stimulus-locked analysis. Seven areas showed activity on the stimulus-locked but not the response-locked analysis, and no areas showed activity on the response-locked but not the stimulus-locked analysis. Thus, for X115, eight of nine clusters showing activity on one or both analyses showed greater activity in the stimulus-locked than the response-locked analysis (p = 0.018). While differences between the two word production paradigms (word generation, picture naming) may have contributed to the variability among these aphasic participants, the difference in the comparative sensitivity of the stimulus- and response-locked analyses for A008 and X030, who received the same paradigm, suggests that participant variables may play an important role in determining which analysis technique maximizes sensitivity.
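The per-participant ROI counts above can be checked with a short exact binomial computation. This sketch assumes, as the analysis above does, that each ROI is an independent observation with probability 0.5 of favoring either analysis. It reports two quantities: the probability of exactly the observed count and a conventional two-sided tail probability. For the counts given above, the exact-count probability is 0.148 for 7 of 17 and about 0.018 for 8 of 9, matching the values reported; the corresponding two-sided tail probabilities are about 0.63 and 0.039, so the conclusions about which results are significant at the 0.05 level are the same under either quantity.

```python
from math import comb

def binom_exact(k, n, p=0.5):
    """Probability of exactly k 'successes' (e.g., ROIs favoring one analysis) out of n."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_two_sided(k, n):
    """Exact two-sided binomial p-value for p = 0.5, where the two tails are symmetric."""
    extreme = min(k, n - k)
    tail = sum(binom_exact(i, n) for i in range(extreme + 1))
    return min(1.0, 2.0 * tail)

# Counts as reported in the text: (ROIs favoring one analysis, ROIs with any activity)
counts = {"A008": (15, 16), "X030": (7, 17), "X105": (1, 5), "X115": (8, 9)}

if __name__ == "__main__":
    for subject, (k, n) in counts.items():
        print(subject, f"exact={binom_exact(k, n):.4f}", f"two-sided={binom_two_sided(k, n):.4f}")
```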

From a practical standpoint, it can be difficult to choose a single analysis technique for a series of patients who are likely to show this kind of variability while still keeping analyses consistent across patients, so that findings from one patient can be compared with those of another. A solution to this dilemma that we currently use is to merge the stimulus- and response-locked analyses, such that each voxel is assigned whichever statistical value (R² from the stimulus-locked analysis or R² from the response-locked analysis) is greater.
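The merging rule just described, together with the ROI volume measure used in Fig. 5, can be sketched in a few lines of NumPy. The sketch assumes that the two R² maps are already co-registered arrays of equal shape, that ROIs are given as an integer label array with 0 meaning "outside any ROI", and that each voxel's volume is known in cubic millimeters (1 mm³ = 1 μl). These conventions and the function names are hypothetical and are not tied to AFNI's own file formats.

```python
import numpy as np

def merge_r2(r2_stim, r2_resp):
    """Voxelwise merge: keep whichever R^2 (stimulus- or response-locked) is larger."""
    return np.maximum(r2_stim, r2_resp)

def roi_active_volume(r2_map, roi_labels, threshold, voxel_volume_ul):
    """Volume (microliters) of voxels at or above threshold within each labeled ROI."""
    active = r2_map >= threshold
    volumes = {}
    for label in np.unique(roi_labels):
        if label == 0:  # assumed convention: 0 = outside any ROI
            continue
        n_active = np.count_nonzero(active & (roi_labels == label))
        volumes[int(label)] = n_active * voxel_volume_ul
    return volumes

if __name__ == "__main__":
    r2_stim = np.array([0.10, 0.25, 0.05, 0.30])
    r2_resp = np.array([0.22, 0.15, 0.18, 0.05])
    rois = np.array([1, 1, 2, 2])
    merged = merge_r2(r2_stim, r2_resp)  # [0.22, 0.25, 0.18, 0.30]
    # Hypothetical 3 x 3 x 3 mm voxels = 27 microliters each
    print(roi_active_volume(merged, rois, threshold=0.20, voxel_volume_ul=27.0))  # {1: 54.0, 2: 27.0}
```

Because each voxel keeps the larger of two correlated statistics, thresholding the merged map at a single-analysis threshold such as R² ≥ 0.20 admits somewhat more false-positive voxels than either analysis alone; a somewhat stricter threshold for the merged map could be used to compensate.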

Correct responses and errors in analyses

Another dilemma is whether to include only correct responses in analyses or to include all responses. Generally, increasing the number of trials on which an analysis is based (i.e., by including all trials and not just trials on which a correct response was given) will increase the volume of significant activity, because more trials yield a more stable statistical estimate, as long as there are no major differences in the shape, intensity, or brain location of the hemodynamic response on correct-response versus error trials. However, some brain areas might function differently on correct-response and error trials, which would give a clue as to the nature of error responses. If this were the case, adding error trials to the deconvolution would dilute the HDRs for correct-response trials, resulting in a lower total volume of activity. Further, in some instances it may be desirable to compare correct and incorrect responses directly. Indeed, data from Meinzer et al. (2006) demonstrated the value of such a comparison. They were able to make it because they used sparse temporal sampling, in which signal amplitude is greater than in continuous BOLD sampling, so that fewer responses are needed to yield a stable estimate of the HDR.

In contrast, we used continuous BOLD sampling with the patients discussed in this paper and did not have enough responses to estimate HDRs separately for correct-response and error trials. Instead, we examined whether it was better to use only correct responses in analyses or to combine correct-response trials with error trials on which some response was given. Table 2 addresses whether any clusters of activity unique to correct-response trials would be lost if an analysis using all responses were used. Unique clusters of activity from the analysis using all responses were compared with those from the analysis using only correct-response trials. On the left are the number and activity volume of clusters from the all-response analysis that did not appear in the correct-response analysis. On the right are the number and activity volume of clusters from the correct-response analysis that did not appear in the all-response analysis. All participants demonstrated clusters of activity in the deconvolution using all responses that did not appear in the deconvolution using only correct-response trials. The number of such clusters ranged from three for X030 to 12 for A008, with total volumes of the unique clusters ranging from 332 μl to 1,991 μl. Only two of the four participants showed activity in the correct-response-only deconvolution that did not appear in the all-response deconvolution, and in each case the additional activity consisted of a single cluster. In addition to the unique clusters, many individual clusters were larger in one analysis than in the other; in such instances, they were most often larger in the all-response deconvolution. Thus, greater sensitivity generally was gained by using all vocal responses in data analyses.

Table 2 Number and volume of activity for unique clusters for analysis of all responses versus analysis of correct responses only
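The comparison summarized in Table 2 can be sketched with connected-component labeling. The sketch makes two assumptions that the text does not specify: clusters are face-connected groups of suprathreshold voxels, and a cluster counts as "unique" to one analysis if none of its voxels is suprathreshold in the other analysis. It uses `scipy.ndimage.label`; the function name and toy arrays are hypothetical, and the toy maps are 2-D only for brevity (the same code works on 3-D volumes).

```python
import numpy as np
from scipy import ndimage

def unique_clusters(map_a, map_b, threshold, voxel_volume_ul=1.0):
    """Count and total volume (ul) of clusters in map_a sharing no suprathreshold voxel with map_b."""
    active_a = map_a >= threshold
    active_b = map_b >= threshold
    labels, n_clusters = ndimage.label(active_a)  # default structure: face connectivity
    count, volume = 0, 0.0
    for k in range(1, n_clusters + 1):
        cluster = labels == k
        if not np.any(cluster & active_b):  # no overlap, so the cluster is unique to map_a
            count += 1
            volume += np.count_nonzero(cluster) * voxel_volume_ul
    return count, volume

if __name__ == "__main__":
    all_resp = np.zeros((5, 5))
    correct_only = np.zeros((5, 5))
    all_resp[0, 0:2] = 0.3      # cluster present in both maps
    correct_only[0, 1] = 0.3
    all_resp[3, 3:5] = 0.3      # cluster only in the all-response map
    correct_only[4, 0] = 0.3    # cluster only in the correct-response map
    print(unique_clusters(all_resp, correct_only, 0.20))  # (1, 2.0)
    print(unique_clusters(correct_only, all_resp, 0.20))  # (1, 1.0)
```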

As noted above, it may be desirable in some instances to compare brain activity on trials with correct responses to that on trials with errors. To do so, one must make certain that enough responses of each type are available to provide a stable map of activity. Further, one must be careful when the two categories contain different numbers of responses, since increasing the number of trials in an analysis can increase sensitivity. Meinzer et al. (2006) analyzed correct-response versus error trials in data collected through sparse temporal sampling, and the reader is referred to this study for a good example of how to analyze such data.

Reliability and sensitivity of images

Another issue requiring some consideration is the stability of images across time. Such stability involves the related problems of image reliability (i.e., the ability to reproduce findings across time) and changes in image sensitivity (i.e., changes in the ability to detect brain activity across sessions). Studies of image reliability across sessions for language paradigms in aphasia are lacking. However, reliability studies in neurologically normal subjects suggest some strategies for dealing with reliability concerns when images must be collected across time, as is necessary in longitudinal studies of recovery or in studies of changes in neural substrates during treatment (Crosson, in press). In general, if a voxel for an individual subject is active in a paradigm during one session, the probability of it being active during the same paradigm in another session is about one in three. This suggests that voxel-by-voxel comparisons across sessions within individual subjects are not stable enough to separate ordinary session-to-session variability from changes related to treatment or recovery. However, larger regions of interest (ROIs) can yield much better repeatability across sessions. Generally, within larger ROIs known to be relevant to a specific task, the percentage of active volume that is reproduced from one session to another ranges from 60 to 85% (Machielsen et al., 2000; Maldjian et al., 2002; Wei et al., 2004; Swallow et al., 2003). Hence, for individual subjects, reproducibility is much better at the ROI level than at the voxel-by-voxel level. Such ROI approaches have been used in imaging of language (e.g., Naeser et al., 2004) and of treatment change (e.g., Crosson et al., 2005) in aphasia patients. Some of the analyses above have used this approach. Sabsevitz et al. (2003) have shown that laterality indices from large ROIs correlate well with results of the Wada test and can be used to predict decline in function after temporal lobectomy for epilepsy.
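The contrast between voxelwise and ROI-level reproducibility can be illustrated with a toy simulation. The sketch assumes, purely hypothetically, that every voxel in a task-relevant ROI has the same 0.3 probability of exceeding threshold in each session, independently across sessions and voxels. Under that assumption the voxelwise reactivation rate stays near the one-in-three figure cited above, while the active volume within the ROI is nearly identical across sessions. The ROI agreement measure used here (smaller volume divided by larger volume) is one simple choice and is not necessarily the measure used in the cited studies.

```python
import numpy as np

def voxel_reactivation_rate(active_s1, active_s2):
    """Fraction of voxels active in session 1 that are also active in session 2."""
    n1 = np.count_nonzero(active_s1)
    return np.count_nonzero(active_s1 & active_s2) / n1 if n1 else float("nan")

def roi_volume_agreement(active_s1, active_s2, roi_mask):
    """Smaller / larger active volume inside the ROI across two sessions (1.0 = identical)."""
    v1 = np.count_nonzero(active_s1 & roi_mask)
    v2 = np.count_nonzero(active_s2 & roi_mask)
    return min(v1, v2) / max(v1, v2) if max(v1, v2) else float("nan")

def simulate_sessions(n_voxels=20_000, roi_size=5_000, p_active=0.3, seed=1):
    """Hypothetical model: ROI voxels activate independently with probability p_active per session."""
    rng = np.random.default_rng(seed)
    roi = np.zeros(n_voxels, dtype=bool)
    roi[:roi_size] = True
    s1 = roi & (rng.random(n_voxels) < p_active)
    s2 = roi & (rng.random(n_voxels) < p_active)
    return roi, s1, s2

if __name__ == "__main__":
    roi, s1, s2 = simulate_sessions()
    print("voxelwise reactivation:", round(voxel_reactivation_rate(s1, s2), 2))
    print("ROI volume agreement:  ", round(roi_volume_agreement(s1, s2, roi), 2))
```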

In considering the lack of voxel-to-voxel correspondence of activity between fMRI sessions, one might question whether fMRI shows adequate reliability to be useful from a research or clinical standpoint. However, parallels in microelectrode stimulation mapping in animals suggest that at least some of this variability may reflect underlying variability in the actual tissue response rather than a product of the mapping technique. In microelectrode stimulation mapping in animals, point-to-point correspondence can vary considerably over intervals as short as a few minutes. For this reason, the total area of cortex or the percent of the total area mapped that is occupied by a specific function (e.g., motor representation for a specific digit in primates) is used as a measure of plasticity in pre- to post-intervention studies (e.g., Nudo et al., 1996). Kleim et al. (2003) have proposed that this underlying variability in cortical maps is necessary for learning. We cannot be certain how these observations in the motor cortex of rats and monkeys scale to humans and fMRI. Nonetheless, the fact that such variability exists in animal cortex suggests that variability in voxel-to-voxel correspondence of fMRI maps from one session to another might to some degree reflect variability in the underlying map rather than the reliability of the mapping technique. Further, the use of the total volume of activity within a mapped region in animal microelectrode stimulation studies is similar to the ROI approach in fMRI that has shown adequate reproducibility.

Even so, another factor affecting session-to-session reliability of fMRI images is variable sensitivity of images to activity from one session to another. In our experience, the number of trials on which a response is given and the position of these responses in the temporal sequence are major contributors to variability in image sensitivity across sessions. (We include only trials on which a response was given in analyses.) Gopinath et al. (2005) have developed a technique to compensate for differences in detection power of BOLD measures across sessions of the same paradigm. The technique uses the white noise variance estimated from the voxel time-series power spectral density to encode the noise structure of the voxel with a mixed autoregressive plus white noise model (Purdon & Weisskoff, 1998). Simulated HDRs of varying amplitudes can then be added to the resulting “noise” time series at the exact time points where the patient's responses actually occurred. Detection power curves can be created for each session's dataset for each simulated HDR amplitude. The functional intensities of one session's dataset can then be adjusted to the values that would provide detection power (i.e., sensitivity) equal to that of the other session. The derived amplitude of the HDR in an individual voxel determines the amplitude of the simulated HDR that is used to equate detection power for that voxel. Although this technique has proven useful in correcting for differences in image sensitivity due to differences in the number of responses in an analysis or their position in the time series, it also can be used to correct for differences in the underlying noise structure between sessions. The technique has been useful in assessing changes in regional activity from pre- to post-treatment images (Crosson et al., 2005; Wierenga et al., 2006).
One weakness in the current implementation of the technique is that it performs the detection power adjustment for all voxels in the brain from the same set of detection power curves, which implicitly assumes a similar noise structure across all voxels in a dataset. However, for voxels with sufficient fMRI temporal signal-to-noise ratio, changes in task performance are the main contributors to detection power differences, and the effects of assuming similar noise characteristics are minimal.
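The core idea behind equating detection power across sessions can be sketched as a small simulation. This is a simplified illustration, not the implementation of Gopinath et al. (2005): the noise model is a fixed AR(1) process plus white noise with hypothetical parameters (the actual technique estimates the noise structure from each voxel's data), the HDR shape, the 1.7-s repetition time, and the evenly spaced response positions are assumptions, and "detection" is defined here as R² ≥ 0.20. The two hypothetical sessions differ only in the number of responses (24 versus 32), the kind of difference described above as changing sensitivity between sessions.

```python
import numpy as np
from scipy.signal import lfilter

TR = 1.7        # hypothetical repetition time (s)
N_SCANS = 450   # hypothetical run length (images)
AMPLITUDES = np.array([1.0, 2.0, 2.25, 2.5, 3.0, 4.0])  # simulated HDR amplitudes

def response_regressor(response_scans, n_scans=N_SCANS, tr=TR):
    """Impulses at the images where responses occurred, convolved with a gamma-shaped HDR."""
    t = np.arange(0, 30, tr)
    hdr = t ** 5 * np.exp(-t / 0.9)
    hdr /= hdr.max()
    impulses = np.zeros(n_scans)
    impulses[np.asarray(response_scans)] = 1.0
    return np.convolve(impulses, hdr)[:n_scans]

def ar1_plus_white(n, phi, sd_ar, sd_white, rng):
    """AR(1) noise (x[t] = phi * x[t-1] + e[t]) plus independent white noise."""
    ar = lfilter([1.0], [1.0, -phi], rng.normal(0.0, sd_ar, n))
    return ar + rng.normal(0.0, sd_white, n)

def detection_power(regressor, amplitude, r2_threshold=0.20, n_sim=200,
                    phi=0.3, sd_ar=1.0, sd_white=1.0, seed=0):
    """Fraction of simulated noise series in which an added HDR of this amplitude is detected."""
    rng = np.random.default_rng(seed)  # same seed, so every call sees the same noise draws
    hits = 0
    for _ in range(n_sim):
        y = ar1_plus_white(regressor.size, phi, sd_ar, sd_white, rng) + amplitude * regressor
        r2 = np.corrcoef(y, regressor)[0, 1] ** 2
        hits += int(r2 >= r2_threshold)
    return hits / n_sim

def power_curve(n_responses):
    """Detection power at each amplitude for evenly spaced responses (a hypothetical layout)."""
    scans = np.linspace(5, N_SCANS - 10, n_responses).astype(int)
    x = response_regressor(scans)
    return [detection_power(x, a) for a in AMPLITUDES]

def amplitude_for_power(amplitudes, powers, target=0.5):
    """Smallest tested amplitude whose detection power reaches the target (NaN if none)."""
    reached = np.nonzero(np.asarray(powers) >= target)[0]
    return amplitudes[reached[0]] if reached.size else float("nan")

if __name__ == "__main__":
    pre, post = power_curve(24), power_curve(32)
    print("24 responses:", pre)
    print("32 responses:", post)
    print("amplitude for 50% power:",
          amplitude_for_power(AMPLITUDES, pre), amplitude_for_power(AMPLITUDES, post))
```

In this toy setting the session with more responses tends to reach a given detection power at a smaller HDR amplitude; the ratio of amplitudes needed for equal power is the kind of quantity that could be used to rescale one session's functional intensities toward the other's.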

Brief case example

As noted above, Crosson et al. (2005, in press) developed a treatment designed to shift frontal activity toward the right frontal lobe during language production. A008 was a 47-year-old man who had an ischemic lesion in the left middle cerebral artery territory four years before he was treated. His case is more completely described by Crosson et al. (2005). His lesion encompassed the left frontal operculum, left prefrontal cortex, and adjacent parts of the insula inferiorly. It extended into the temporal and parietal opercula and into the frontal-parietal region above the operculum. The left caudate nucleus, lentiform nucleus, and thalamus were almost completely destroyed by the lesion. At the time he was treated, he had a mild to moderate nonfluent aphasia with a Western Aphasia Battery (WAB) aphasia quotient of 79.6. His naming was moderately impaired (72 of 100 points on the WAB). His repetition was slightly better than his naming (80 of 100 points on the WAB), and comprehension was only mildly impaired (172 of 200 points on the WAB).

He was given a novel treatment with an intention manipulation in which picture-naming trials were initiated by a complex left-hand movement. The treatment was designed to shift frontal activity toward right frontal cortex by activating right frontal mechanisms with the complex left-hand movement. Because intention is related to action selection and initiation, and patients with nonfluent aphasia show difficulty with selection and initiation of spoken responses, it was hypothesized that the patient would perform better on this intention treatment than on a treatment in which the intention manipulation was replaced by a manipulation of spatial attention (viewing pictures to be named in the left hemispace). Consistent with this hypothesis, the patient showed significant improvement on naming probes for the intention treatment but not for the alternative treatment. However, such a treatment response does not necessarily indicate that the hypothesized shift in frontal activity occurred during the intention treatment.

Such confirmation was sought through the use of BOLD contrast fMRI, which was administered before and after the intention treatment. In the experimental task given during fMRI, the patient was given category names, and for each category name, he generated a single category member. This task was used instead of picture naming both because it was more likely than picture naming to demonstrate the medial frontal activity in which we were interested and because we assumed that his improvement in picture naming would generalize to category member generation. The latter assumption was supported: the patient generated correct responses on 24 of 45 trials during pre-treatment fMRI and 32 of 45 trials during post-treatment fMRI, and his mean response latency decreased from 8.68 seconds (SD = 3.72) during pre-treatment imaging to 4.80 seconds (SD = 2.39) during post-treatment imaging. The baseline task was viewing a fixation cross. Intertrial intervals were long enough to allow the HDR to return to baseline before the beginning of the next trial. fMRI data were analyzed with both response-locked and stimulus-locked deconvolutions after the selective detrending procedure described above was performed. Images from the two sessions were equated for sensitivity to the BOLD response, as described above. A statistical threshold of R² ≥ 0.20 was used. Images combining the response-locked and stimulus-locked analyses were created; however, they differed little from the images containing only response-locked data, because the response-locked image was consistently more sensitive than the stimulus-locked image (see the above analysis for A008). Fig. 6 shows frontal views of activity from pre- and post-treatment imaging (the right side of the image represents the left side of the brain). The left frontal activity volume is relatively stable from pre- to post-treatment imaging.
However, the right frontal activity volume increases substantially from pre- to post-treatment imaging. Lateral frontal activity showed no significant lateralization in pre-treatment images but significantly greater right than left frontal activity post-treatment. Pre-supplementary motor area (pre-SMA) activity on the medial frontal wall was significantly lateralized to the left hemisphere in pre-treatment images but was not significantly lateralized at post-treatment (see Crosson et al., 2005). Hence, fMRI findings indicate that the predicted shift in frontal activity toward the right hemisphere actually occurred. Nonetheless, the other patient whom Crosson et al. (2005) imaged before and after treatment showed frontal activity lateralized to the right hemisphere even before treatment began, yet this patient also improved with the intention treatment. Thus, the impact of this treatment on lateralization needs to be assessed in a larger sample.
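Hemispheric lateralization of ROI activity is commonly summarized with a laterality index computed from left- and right-hemisphere activity volumes, LI = (L - R) / (L + R), which runs from +1 (entirely left) to -1 (entirely right). The sketch below computes it. The volumes in the example are hypothetical, not A008's data, and the sketch does not reproduce the significance tests reported by Crosson et al. (2005).

```python
def laterality_index(left_volume, right_volume):
    """(L - R) / (L + R): positive = left-lateralized, negative = right-lateralized."""
    total = left_volume + right_volume
    if total == 0:
        raise ValueError("no activity in either hemisphere")
    return (left_volume - right_volume) / total

if __name__ == "__main__":
    # Hypothetical lateral frontal activity volumes (microliters), for illustration only
    pre_li = laterality_index(left_volume=3000.0, right_volume=2800.0)
    post_li = laterality_index(left_volume=3000.0, right_volume=6000.0)
    print(f"pre LI = {pre_li:+.3f}, post LI = {post_li:+.3f}")  # pre LI = +0.034, post LI = -0.333
```

A stable left-hemisphere volume combined with a growing right-hemisphere volume, as in this hypothetical example, moves the index toward negative values, that is, toward the right hemisphere.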

Fig. 6

Frontal views of pre- and post-intention treatment images for A008. Left-hemisphere activity volumes remain relatively stable from pre- to post-treatment images. However, there is a significant increase in right-hemisphere activity volumes from pre- to post-treatment images. No significant difference in lateralization of lateral frontal activity existed on the pre-treatment image; however, lateral frontal activity is significantly lateralized to the right hemisphere at post-treatment, consistent with the experimental hypothesis.

Conclusions

En masse, the functional neuroimaging literature in aphasia indicates that the question of whether the left or the right hemisphere is responsible for recovery in aphasia cannot be adequately answered. Rather, the appropriate question is when the left hemisphere plays an important role and when the right hemisphere plays an important role. These two outcomes are not mutually exclusive. Smaller lesions with good recoveries tend to favor left hemisphere substrates; larger lesions with poorer recoveries tend to favor the right hemisphere. However, studies also indicate that right-hemisphere activity in some instances is specific for homologues of the damaged left-hemisphere structures. Treatment research indicates that both hemispheres play a role in treatment substrates, depending upon circumstances and individual patients. Clearly, more research is needed to address the patient and treatment variables that determine which left- and right-hemisphere structures are involved in treatment gains and how to best engage them.

In order to accomplish this research, technical challenges for fMRI of spoken language in aphasia must be managed. There is no generally agreed-upon method for managing motion-related artifacts in fMRI of language production in aphasia. At present, the best technique seems to depend on task and patient characteristics. If patient response latencies are relatively short and not too variable, sparse temporal sampling may be a viable technique, with the added advantages of increased signal and silent intervals during which stimuli can be presented. When response latencies are relatively long and variable, continuous BOLD sampling may be the best technique, and selective detrending or other techniques can be used to mitigate motion-related effects. If the limitations of ASL can be resolved, it may replace BOLD as the preferred fMRI acquisition technique in spoken language paradigms, because it is less vulnerable to motion artifacts than BOLD contrast.

With implementation of CPT billing codes for fMRI, the pressure to use fMRI in clinical rehabilitation settings will increase. However, a great deal of research needs to be done before these techniques are ready for implementation in the clinical arena. For example, knowledge regarding the relationship between regional brain activity and treatment success does not exist. Currently, there is no basis in the fMRI literature to suggest that it can be useful in treatment selection and/or patient management. More research will be necessary to define these relationships. Even so, the promise of fMRI as a tool in clinical rehabilitation is high. It is possible that pre-treatment scans could be useful for selecting treatments, once the proper database exists. Further, it is possible that fMRI research can be helpful in developing new treatments based on knowledge of what neural substrates can be recruited for treatment. The future for fMRI as a clinical tool in rehabilitation is bright, but for aphasia treatment, that future is not yet here.