Introduction

Reading and language researchers have made steady progress in the past three decades towards generating comprehensive neuropsychological models of reading (Marshall and Newcombe 1973; Derouesne and Beauvois 1979; Coltheart 1981; Coltheart et al. 1993, 2001). Comparable advances have been made in targeting underlying cognitive deficits in remediation of distinct reading impairments following stroke (de Partz 1986; Friedman et al. 2002; Lott et al. 1994; Mitchum and Berndt 1991). Despite these gains, many questions still persist with regard to our understanding of normal and disordered reading as well as the practicability of alexia rehabilitation. Some of the enduring questions regarding effective treatments for the alexias are common to research examining other aspects of aphasia rehabilitation. These include questions concerning underlying mechanisms of recovery and long-term maintenance of therapeutic gains.

Findings from recent neuroimaging investigations have underscored the complex and dynamic nature of the mechanisms of post-stroke functional re-organization. Some studies have linked recovery with greater activation in right hemisphere homologues of regions known to be active in normal (left-lateralized) language networks (Cappa et al. 1997; Gold and Kertesz 2000; Musso et al. 1999). This is also in line with studies on children with early left-hemispheric lesions showing a completely mirrored activation pattern for language tasks (Staudt et al. 2001). This notion has been challenged, however, by evidence suggesting that abnormal and/or over-activation of right hemisphere structures, particularly during overt speech tasks, may be, at least in part, a maladaptive response (Belin et al. 1996; Naeser et al. 2004; Rosen et al. 2000). Other studies have suggested that activation of residual left hemisphere perilesional areas may be critical to better, more efficient, or long-term language recovery (Heiss et al. 1999; Saur et al. 2006; Warburton et al. 1999).

With the exception of Belin et al. (1996) and Musso et al. (1999) noted above, along with a handful of more recent studies examining treatment-induced functional reorganization, the lion’s share of investigations into brain plasticity in post-stroke aphasia have focused on patients whose recovery is of natural, or unknown, origins. While this work has contributed to our understanding of neural mechanisms that may support language recovery, only those studies that utilize neuroimaging before and after treatment can actually demonstrate how improvement in a specific behavior targeted by therapy is associated with functional brain reorganization.

Animal studies have demonstrated that training specific motor skills after lesioning of motor cortex can induce cerebral reorganization associated with improved skilled motor movements. The effects of such plasticity can be observed in both perilesional tissue and areas remote from the site of injury for months following stroke. Nudo and colleagues, for example, have demonstrated in both rats and monkeys reorganization of representation for movements following rehabilitative training in a skilled reaching task (Nudo et al. 1996; Nudo and Friel 1999).

Unlike vision, hearing, and locomotion, however, language (and neural mechanisms supporting language recovery following stroke) can only be studied in humans. This has severely limited the methods for elucidating models of treatment-induced functional reorganization in aphasia. The animal research may have important implications for rehabilitation, but what is needed is translational research that can demonstrate how functional reorganization of motor mapping in rats can be extended to language recovery in humans.

A modest number of recent studies utilizing functional neuroimaging to examine treatment-induced changes in patterns of activation has begun to accumulate data supporting the notion that functional reorganization underlies language improvement associated with specific language treatment (Belin et al. 1996; Breier et al. 2007; Cornelissen et al. 2003; Leger et al. 2002; Meinzer et al. 2006; Musso et al. 1999; Pulvermuller et al. 2005; Small et al. 1998; Vitali et al. 2007). As Raymer et al. (2008) recently noted, in addition to replicating these initial findings, it will be critical to begin to explore how other stroke recovery factors, both intrinsic to the patient (e.g., site and size of lesion, and type of language deficit) and intrinsic to the therapy (e.g., timing, intensity and duration of treatment) might influence treatment-induced functional brain reorganization.

Many of the early investigations into neural reorganization have attempted to quantify the degree of left versus right hemisphere activation in aphasia recovery by utilizing a lateralization index ((left − right)/(left + right)) based on number of statistically significantly activated voxels in left and right hemispheres (Fernandez et al. 2004; Thulborn et al. 1999). A significant problem with the traditional lateralization index method is its dependence on one static significance threshold. The classical method, with its threshold dependency, has thus been criticized as failing to produce reliable laterality results, and attempts at improving the methodology have recently emerged (Jansen et al. 2006; Wilke and Schmithorst 2006). In light of the importance of understanding functional re-organization in this clinical population, the present study explores the reliability and robustness of our findings in the context of regional and hemispheric lateralization. This is done utilizing a new method (Wilke and Schmithorst 2006) for examining the dynamic aspect of the laterality question and its application to aphasia/alexia recovery.
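The threshold dependence of the classical index can be illustrated with a minimal sketch. The t-values below are hypothetical and are not data from this study; the function name `lateralization_index` is likewise our own illustration, assuming the index is computed from counts of suprathreshold voxels in each hemisphere.

```python
import numpy as np

def lateralization_index(left_t, right_t, threshold):
    """Classical LI = (L - R) / (L + R) from suprathreshold voxel counts."""
    left = int(np.sum(left_t > threshold))
    right = int(np.sum(right_t > threshold))
    if left + right == 0:
        return float("nan")  # LI is undefined when no voxel survives
    return (left - right) / (left + right)

# Hypothetical t-values: broad but weak left-hemisphere activation and a
# small but strong right-hemisphere focus (illustrative numbers only).
left_t = np.array([2.1, 2.3, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4])
right_t = np.array([4.5, 4.8, 5.1, 5.5])

for thr in (2.0, 3.1, 4.0):
    li = lateralization_index(left_t, right_t, thr)
    print(f"threshold {thr:.1f}: LI = {li:+.2f}")
```

With these illustrative values the index moves from left-lateralized (+0.33) at the most lenient threshold to fully right-lateralized (−1.00) at the strictest one, so a single static threshold can determine the apparent direction of lateralization.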

A second important question in aphasia and alexia rehabilitation concerns the mechanism underlying long-term maintenance of results. It has often been observed that the gains produced in treatment “do not survive beyond the therapist’s parking lot”. Among the relatively few studies that address issues related to maintenance, results have been mixed (e.g., Hillis 1989; Friedman and Lott 2002; Thompson et al. 2003). Such inconsistency further underscores the complex nature of aphasia recovery, and suggests the need for future investigations into such diverse factors as treatment type, length, intensity, target selection, task difficulty, and presence or absence of extended treatment beyond criterion. The last of these factors, which we predict will affect the likelihood of long-term maintenance, receives particular attention in the current study.

“Overlearning” (treatment continued after performance has reached criterion) has already been demonstrated to have beneficial effects in cognitive skill acquisition in neurologically intact subjects. Moreover, repetition of a newly acquired (or relearned) skill may be a necessary precursor to long-term behavioral and neural changes. In fact it is possible that the plasticity induced by extended practice represents the instantiation of skill within neural circuitry and may be ultimately responsible for maintenance of the acquired skill once training ends (Monfils et al. 2005). Although not well studied in aphasia rehabilitation, a few published reports have suggested that extended practice may be responsible for maintenance of treatment effects (McNeil et al. 1998; Wambaugh et al. 1999). While the mechanisms for such effects are unknown, McNeil and colleagues suggested that overlearning may lead to automaticity and the ‘freeing up’ of cognitive resources such as attention. Because the treatment approach in the current study relies on training a mediating step that introduces an unavoidably inefficient relay mechanism, it presents an ideal situation for examining the benefits of extended practice and its potential to remedy the patient’s dependence on the mediating process. At the same time, we aimed to evaluate changes in the patient’s cognitive and neural strategies by comparing results obtained from functional magnetic resonance imaging (fMRI) pre- and post-treatment, and post-overlearning.

Friedman et al. (1998, 2002) described a re-organization approach to treating phonologic alexia and demonstrated the efficacy of this approach in two case studies. They used paired associate learning to circumvent the impaired orthography-to-phonology reading route in these two patients, taking advantage of their relatively preserved access to phonology through semantics. Words that could not be read at baseline were paired with picture-able nouns that were either homophones (e.g., not/knot, knows/nose) or ‘near-homophones’ (e.g., of/oven, she/sheet). Results demonstrated high success rates for both patients using this re-organization of function approach, as compared to either untrained words or use of a stimulation (repeated practice) approach.

Neuropsychological models of reading posit multiple parallel pathways from orthographic to phonologic representation, which suggests that the success of this ‘semantic mediation’ treatment depended upon patients’ re-organization of cognitive strategies for reading trained words. Implicit in these models is the notion that, by training patients in a new cognitive strategy, one also elicits a re-organization of neural pathways being utilized to read trained words, but this has rarely been demonstrated in language rehabilitation. Small et al. (1998) did demonstrate functional brain reorganization in a patient with acquired phonologic alexia following a treatment aimed at re-teaching grapheme-to-phoneme correspondences. As the patient moved from a whole word reading approach to a sub-lexical processing approach, she also demonstrated decreased activation in left angular gyrus and increased activation in left lingual gyrus. The authors concluded that their neuropsychological model of reading predicted that this patient’s shift from whole word reading to a de-compositional approach would be accompanied by functional neuroanatomical changes that were measurable by fMRI.

Building upon the earlier case studies demonstrating successful re-organization of function via the semantic mediation strategy (Friedman et al. 2002), we utilized fMRI to investigate the changes in blood oxygenation level dependent (BOLD) signal associated with one patient’s performance during reading of trained and untrained words at three time points (pre-treatment, post-treatment, and post-overlearning). We hypothesized that patterns of activation would change over time, not only owing to greater accuracy in the task, but also due to changes in cognitive strategies as the patient first learned a paired associate task that encouraged semantic processing, and ultimately read overlearned words more automatically. The patient had suffered a stroke in the middle cerebral artery territory and was initially found to have Wernicke’s aphasia and phonologic alexia. We examined differences in activation that could be attributed to semantic mediation treatment as well as to the effects of overlearning.

We expected that there would be no significant differences in behavior (accuracy) or BOLD signal during attempts to read (not yet) “trained” and untrained words at baseline. Post-treatment, we expected to see differences both behaviorally and in BOLD signal on trained vs. untrained words, but not between different sets of trained words (those designated for eventual overlearning versus those designated to be trained-but-not-overlearned). Specifically, we expected to see bilateral activation in regions typically activated during tasks requiring semantic processing, e.g., L inferior frontal, and L middle and inferior temporal cortex (e.g., Vandenberghe et al. 1996). Post-overlearning, we expected to see further differences between trained and untrained words, as well as between trained and overlearned words, reflecting changes in the patient’s cognitive and neural strategies for reading single words. Specifically, we hypothesized that overlearned words would recruit more perilesional cortex, as the patient would be most efficient with this set of words. This study also explored the use of a new method for examining changes in lateralization of function (Wilke and Schmithorst 2006) in the context of functional re-organization after treatment for phonologic alexia.

Methods

Patient YCR

YCR was a right-handed 50-year-old African-American man with 15 years of education. He joined the study 2 years following a single left hemisphere stroke. A structural T1-weighted MRI scan revealed a lesion in cortex and subjacent white matter in temporal and parietal lobes, including Wernicke’s area, posterior middle and superior temporal gyri, angular gyrus, supramarginal gyrus, and portions of inferior and superior parietal lobules (Fig. 1).

Fig. 1

Structural T1-weighted MRI scan for patient YCR (phonologic alexia, 2 years poststroke). Lesion was present in cortex and subjacent white matter in temporal and parietal lobes, including Wernicke’s area, posterior middle and superior temporal gyri, angular gyrus, supramarginal gyrus, and portions of inferior and superior parietal lobules

The Institutional Review Board of Georgetown University Medical Center approved the study, and signed informed consent was obtained prior to assessment and treatment. YCR’s language was assessed by the Boston Diagnostic Aphasia Examination 2nd edition (BDAE, Goodglass et al. 2001). He presented with fluent aphasia: 22nd percentile on mean of three auditory comprehension tasks; 20th percentile on word repetition and 35th percentile on sentence repetition tasks; and 33rd percentile on responsive naming on the BDAE. His reading comprehension was similarly impaired: 5th percentile on sentence comprehension; and 20th percentile on sentence/paragraph comprehension. He scored 19/60 on the Boston Naming Test (BNT, Kaplan et al. 2001).

Analysis of reading

In addition to standardized tests of aphasia, YCR received reading tests designed to assess letter knowledge, oral reading, spelling, and recognition of orally spelled words, across several parameters including part-of-speech, length, frequency, and regularity. In addition, ability to read pseudowords was assessed.

Letter knowledge

YCR was able to name 15/26 letters presented in lower case and 16/26 presented in upper case. His identification of letters (ability to point to the letter named by the examiner from an array of 26) was 16/26 for lower case and 18/26 for upper case. When asked whether a written letter was in its correct or mirror image orientation, YCR answered correctly on 52/52 trials. Finally, letter pairs were presented with one letter lower case and the other upper case, and he was asked to determine if the pair represented the same or different letters. He scored 26/26 on same letter pairs and 23/26 on different letter pairs.

Oral reading

Words of different parts of speech were presented individually for oral reading. YCR demonstrated a part-of-speech effect, with better performance reading concrete nouns (59% correct) and adjectives (51%) than abstract nouns (29%), verbs (12%), and functors (7%). Further testing of concrete and abstract nouns also showed a concreteness effect (20/30 concrete vs. 11/30 abstract). YCR had great difficulty reading functors (3/41). He also could not read pseudowords (0/20), although he read real words that differed from the pseudowords by only a single letter moderately well (12/20). He did not show a length effect.

Summary of reading

YCR presented with a phonologic alexia characterized by an inability to read pseudowords, a part-of-speech effect, and a concreteness effect in reading real words. His reading errors were mixed: orthographic and inflectional/derivational errors, but also errors containing some apparently unrelated words, and occasional semantic paralexias. In addition, he tended to make perseverative errors when reading long lists of words.

Treatment experimental design

Two hundred thirty-six words with low imageability (functors and abstract words) were presented for baseline testing on three occasions. Treatment stimuli were selected from the words that YCR was unable to read correctly on at least two of three baseline tests. Of these words, 60 words with low imageability were assigned to three lists:

  1. Twenty words that would not be trained (“Incorrect at Baseline”, IB)

  2. Twenty target words that would be initially trained to 90% accuracy, and then practiced for a period of time beyond criterion (“overlearned”) in an attempt to prolong maintenance (“Incorrect, to be Trained and Practiced”, ITP)

  3. Twenty target words that would be trained to a criterion of 90% accuracy but not practiced beyond that point (“Incorrect, to be Trained; Unpracticed”, ITU)

These three lists of 20 words were matched for frequency distribution (based on logarithmic groups), part of speech, and syllables per word. The two lists of target words to be trained were also matched for number of exact and near homophones. One of these two lists contained slightly less frequent and more multi-syllabic relay words, and was therefore chosen to be the set that would be overlearned, to bias against our hypothesis of overlearned words being better maintained. Of the original 236 words, 32 were consistently read correctly throughout baseline testing, and 20 of these were set aside (“Consistently Correct”, CC) to be probed during functional imaging.

Treatment consisted of three 1-h sessions per week. Each session began with a probe test of the words currently being trained. ITP and ITU words were trained via the semantic mediation strategy (Friedman et al. 2002) described earlier. After reaching criterion of 90% accuracy on two consecutive probe tests, the ITP words, but not the ITU words, were then “overlearned” by continuing practice for an additional 8 weeks.
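The criterion rule described above can be sketched as follows. This is a minimal illustration, not the scoring procedure actually used; it assumes that "reaching criterion" means a probe score of at least 90% on two consecutive session probes, and the function name and example scores are hypothetical.

```python
def sessions_to_criterion(probe_scores, criterion=0.90, consecutive=2):
    """Return the 1-based session at which criterion is met, or None.

    probe_scores: proportion correct on the probe that opens each session.
    Criterion is met once `consecutive` probes in a row are >= `criterion`
    (the >= comparison is an assumption about how 90% was applied).
    """
    run = 0
    for session, score in enumerate(probe_scores, start=1):
        run = run + 1 if score >= criterion else 0
        if run == consecutive:
            return session
    return None

# Hypothetical probe scores for one word list (not YCR's actual data)
example_scores = [0.50, 0.90, 0.85, 0.95, 0.90]
print(sessions_to_criterion(example_scores))
```

In this hypothetical series the single 90% probe in session 2 does not count, because the following probe falls below criterion; the run of two qualifying probes is completed in session 5.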

fMRI data acquisition

High resolution (1 × 1 × 1 mm³) T1-weighted MPRAGE anatomical scans were acquired on a Siemens Trio 3.0T MRI scanner, including 160 slices parallel to the anterior commissure–posterior commissure (AC–PC) line (FOV = 256 mm, matrix = 256 × 256, TR/TE/TI = 2,300/2.94/900 ms, flip angle = 9°). Functional images were acquired in the same plane (parallel to AC–PC) using a T2*-weighted gradient echo, EPI sequence (TR/TE = 1,500/30 ms, FA = 90°, FOV = 192 mm, 64 × 64 matrix). Twenty-five 5 mm thick slices were acquired with an effective TR of 6.0 s, including a 4.5-s delay inserted between acquisitions.

During each of four 11-min fMRI runs, each condition block (ITP, CC, ITU, IB, or X, a control condition) lasted 60 s: ten stimuli, each presented for 4 s and followed by a crosshair (rest/fixation) for 2 s. In order to minimize motion artifact associated with overt speech tasks, while maximizing speech intelligibility in the scanner, a behavior interleaved gradient technique (Eden et al. 1999) was utilized. Data acquisition was delayed until after the overt responses were captured, the latter occurring during the delay period when the gradients were off, thereby capitalizing on the hemodynamic delay. Thus, stimulus presentation began 500 ms prior to the completion of each volume acquisition (TA = 1.44 s). Given YCR’s impaired reading, there was no concern that he might be able to read the next stimulus during the last 500 ms of the previous volume acquisition. The next acquisition began 5 s after onset of the stimulus, taking advantage of the robust positive BOLD response which peaks 5–8 s following stimulus presentation, and resulting in an effective TR of 6.0 s. Overt responses were recorded from the microphone output of the scanner into a digital audio recording and editing software package (Audacity; http://www.audacity.sourceforge.net).
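The timing of this interleaved schedule can be checked with a short sketch. The constants are taken from the description above, except that the acquisition duration is set to the nominal 1.5-s TR rather than the reported TA of 1.44 s, which is an assumption made here so that the 500-ms overlap and the 4.5-s delay come out exactly. The class and function names are illustrative only.

```python
from dataclasses import dataclass

STIM_INTERVAL = 6.0  # s between stimulus onsets (the effective TR)
ACQ_DELAY = 5.0      # s from stimulus onset to start of the next acquisition
ACQ_DURATION = 1.5   # s per volume (nominal TR; assumed in place of TA = 1.44 s)

@dataclass
class Trial:
    stimulus_onset: float
    quiet_start: float        # nominal end of the preceding acquisition
    acquisition_start: float
    acquisition_end: float

def trial_schedule(n_trials):
    """Lay out stimulus and acquisition times for consecutive trials."""
    trials = []
    for k in range(n_trials):
        onset = k * STIM_INTERVAL
        acq_start = onset + ACQ_DELAY
        acq_end = acq_start + ACQ_DURATION
        # The preceding volume ends one stimulus interval before this one ends.
        quiet_start = acq_end - STIM_INTERVAL
        trials.append(Trial(onset, quiet_start, acq_start, acq_end))
    return trials

trials = trial_schedule(3)
first, second = trials[0], trials[1]
print("gradient-off window (s):", first.acquisition_start - first.quiet_start)
print("stimulus/acquisition overlap (s):",
      first.acquisition_end - second.stimulus_onset)
```

Under these assumptions the gradient-off window available for the spoken response is 4.5 s, and each stimulus appears 0.5 s before the preceding volume is complete, matching the figures given in the text.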

Functional MRI experimental design

Functional MRI scans were acquired while YCR read blocks of words (to be) trained (ITU and ITP) and untrained words (CC and IB) on three occasions: (1) T1: pre-treatment; (2) T2: post-treatment (approximately 4 months after T1); and (3) T3: post-overlearning (approximately 5 months after T2). Scanning sessions consisted of eight blocks of words (the experimental task in two blocks per condition) and two blocks of number and letter strings (the control task) presented in this order: X, ITP, CC, ITU, IB, X, ITP, CC, ITU, IB. In the control blocks, the subject was instructed to simply say ‘letters’ when a string of letters appeared, and to say ‘numbers’ when a string of numbers appeared. This control task was chosen because it requires visual processing of alphanumeric strings and vocalization of a single word, as does the experimental task, but it does not require reading or accessing specific words. The four conditions reflected the four groups of 20 words that were: (1) not trained; (2) trained to criterion; (3) trained to criterion and then overlearned; or (4) consistently correct at baseline. YCR was trained on a different set of words to either read the word aloud, using a one-word response, or to say “pass”. The interstimulus interval was 6 s long, but he was trained to “read the word or say ‘pass’ as soon as possible after the banging noise (scanner) stops”. Each run of ten blocks lasted approximately 11 min, and was repeated four times. The order of the words listed within each block was random, and thus presentation order varied from run to run.
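The block structure of a single run can be sketched as below. The block order is the one stated above; the assumption that each 20-item list is split into two blocks of ten, so that every item appears once per run, is ours, and the item labels are placeholders rather than the actual stimuli.

```python
import random

BLOCK_ORDER = ["X", "ITP", "CC", "ITU", "IB", "X", "ITP", "CC", "ITU", "IB"]
STIMULI_PER_BLOCK = 10
SECONDS_PER_STIMULUS = 6.0  # 4 s stimulus + 2 s crosshair

def build_run(item_lists, rng):
    """Return one run as a list of (condition, items) blocks.

    Each condition's items are shuffled once per run and then split
    across that condition's two blocks (an assumed allocation scheme).
    """
    shuffled = {}
    for cond, items in item_lists.items():
        pool = list(items)
        rng.shuffle(pool)
        shuffled[cond] = pool
    used = {cond: 0 for cond in item_lists}
    run = []
    for cond in BLOCK_ORDER:
        start = used[cond] * STIMULI_PER_BLOCK
        run.append((cond, shuffled[cond][start:start + STIMULI_PER_BLOCK]))
        used[cond] += 1
    return run

# Placeholder item labels, 20 per condition
item_lists = {c: [f"{c}_{i:02d}" for i in range(20)]
              for c in ("X", "ITP", "CC", "ITU", "IB")}
run = build_run(item_lists, random.Random(0))
task_seconds = len(run) * STIMULI_PER_BLOCK * SECONDS_PER_STIMULUS
print([cond for cond, _ in run], task_seconds)
```

Ten blocks of ten 6-s trials give 600 s of task time, consistent with the run length of approximately 11 min once the remaining scanner time is included.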

Behavioral analyses

YCR’s overt speech was recorded and analyzed by three raters for response accuracy and reaction time (RT). RT was measured in Audacity as the time from stimulus onset to voice onset.

Functional MRI analyses

Functional MRI data were processed using Matlab 7.0.4 (The MathWorks, Inc, Natick, MA) and the SPM2 software package (Wellcome Department of Imaging Neuroscience). The datasets were slice-timing adjusted to correct for differences in image acquisition time between slices due to the long effective TR used. All time series images were realigned using the middle scan as a reference. The origin (0,0,0) was reset to the anterior commissure. The mean realigned EPI image was co-registered to the 3D MPRAGE using mutual information, and the resulting orientation shifts were applied to the realigned EPI time series. Prior to spatial normalization, cost-function masking (Brett et al. 2001) was utilized to mask out YCR’s lesion, effectively excluding the area of lesion and potential distortions from the normalization process. The MPRAGE was spatially normalized to the MNI T1 template. The resulting parameters were then applied to the entire fMRI time series data and images were re-sampled to a 3 mm isotropic voxel size. Spatially normalized functional data were smoothed using an 8 mm isotropic full width half maximum Gaussian kernel.

First level t-tests were performed to investigate significant clusters of task-related activation for the following contrasts: (a) Trained vs. Untrained Words (across training sessions); and (b) Overlearned Words vs. all other conditions (T3, post-overlearning). Voxel time series were regressed against a box-car reference waveform convolved with a canonical hemodynamic response function (HRF) to approximate the sluggish and blurred nature of the response. A 128-s high-pass filter was applied to remove unwanted low frequency signals, and a first order autoregressive model was utilized to correct for serial correlations in time. A cluster-level threshold for statistical significance of p < 0.05, Family-Wise Error (FWE) corrected for whole brain analysis, was utilized in this study. Significantly activated voxels were transformed from MNI space to the standard stereotaxic space of Talairach and Tournoux using a Matlab function designed for this purpose (Brett 2003). Graphic imaging was performed using MRIcro software (Rorden and Brett 2000).
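The construction of a box-car regressor convolved with an HRF can be sketched as follows. This is not the SPM2 implementation: the double-gamma function below is a common approximation of a canonical HRF (shape parameters 6 and 16, undershoot ratio 1/6, all assumed here), the regressor is sampled at multiples of the 6-s effective TR rather than at the exact acquisition offsets, and the function names are illustrative.

```python
import math
import numpy as np

def double_gamma_hrf(t):
    """Double-gamma HRF approximation: peak near 5 s, later undershoot."""
    t = np.asarray(t, dtype=float)
    peak = t**5 * np.exp(-t) / math.gamma(6)
    undershoot = t**15 * np.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

def boxcar_regressor(block_onsets, block_len, n_scans, tr, dt=0.1):
    """Box-car (1 during blocks, 0 elsewhere) convolved with the HRF.

    The convolution is done on a fine time grid (step dt, in seconds)
    and then sampled once per scan.
    """
    n_fine = int(round(n_scans * tr / dt))
    boxcar = np.zeros(n_fine)
    for onset in block_onsets:
        start = int(round(onset / dt))
        stop = int(round((onset + block_len) / dt))
        boxcar[start:stop] = 1.0
    kernel = double_gamma_hrf(np.arange(0.0, 32.0, dt))
    convolved = np.convolve(boxcar, kernel)[:n_fine] * dt
    return convolved[::int(round(tr / dt))]

# ITP blocks are the 2nd and 7th of ten 60-s blocks in a run,
# i.e., onsets at 60 s and 360 s; 100 scans at an effective TR of 6 s.
itp_regressor = boxcar_regressor([60.0, 360.0], 60.0, n_scans=100, tr=6.0)
print(np.round(itp_regressor[8:22], 2))
```

The regressor rises over the first few scans of each block, plateaus, and returns to baseline with a brief undershoot after the block ends, which is the sluggish and blurred response shape that the box-car alone would not capture.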

In order to examine changes in lateralization index over time, we generated Lateralization Index (LI) curves by using the resulting t-maps from the contrast between (to be) over-learned and untrained words (ITP > IB) for each of the three time points in the LI-toolbox plug-in for SPM2 (Wilke and Lidzba 2007). Within the LI-toolbox we chose the bootstrapping algorithm (Wilke and Schmithorst 2006) in order to obtain weighted mean values of lateralization. We then applied a global inclusive gray matter mask. The midline ±5 mm was excluded from the volumes to be investigated. Regional masks of the separate lobes were also calculated for analyses of their separate contributions to the LI. In order to correct for the presence of the lesion in one of the hemispheres, we also used the clustering and variance weighting options within the toolbox.
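The general idea of a bootstrapped, threshold-weighted LI can be sketched as below. This is a simplified illustration and not the LI-toolbox code: the resampling fraction, the number of resamples, the 25% trimming, the use of summed t-values, and the weighting of each LI by its threshold are all assumptions made for the sketch, and the voxel values are simulated.

```python
import numpy as np

def trimmed_mean(values, trim=0.25):
    """Mean after dropping the lowest and highest `trim` fraction."""
    values = np.sort(np.asarray(values, dtype=float))
    cut = int(len(values) * trim)
    return float(values[cut:len(values) - cut].mean())

def resample_sum(values, rng, frac):
    """Sum of a with-replacement resample of the given voxel values."""
    if values.size == 0:
        return 0.0
    k = max(1, int(round(values.size * frac)))
    return float(rng.choice(values, size=k, replace=True).sum())

def bootstrap_li(left_t, right_t, threshold, rng, n_boot=100, frac=0.25):
    """Trimmed-mean LI over resamples at one non-negative threshold."""
    left = left_t[left_t > threshold]
    right = right_t[right_t > threshold]
    if left.size + right.size == 0:
        return None  # nothing survives this threshold
    lis = []
    for _ in range(n_boot):
        l = resample_sum(left, rng, frac)
        r = resample_sum(right, rng, frac)
        lis.append((l - r) / (l + r))
    return trimmed_mean(lis)

def weighted_mean_li(thresholds, lis):
    """Mean LI with each value weighted by its threshold."""
    w = np.asarray(thresholds, dtype=float)
    return float(np.sum(w * np.asarray(lis, dtype=float)) / np.sum(w))

# Simulated t-values with somewhat stronger right-hemisphere activation
rng = np.random.default_rng(0)
left_t = rng.normal(2.0, 1.0, 500)
right_t = rng.normal(2.5, 1.0, 500)

curve = [(thr, bootstrap_li(left_t, right_t, thr, rng))
         for thr in np.linspace(0.5, 3.0, 6)]
curve = [(thr, li) for thr, li in curve if li is not None]
wm = weighted_mean_li([thr for thr, _ in curve], [li for _, li in curve])
print(curve, wm)
```

Plotting the (threshold, LI) pairs gives a lateralization curve of the kind described here, and the weighted mean summarizes that curve in a single value that gives more influence to LI values obtained at stricter thresholds.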

Results

Behavioral results

YCR learned to read the trained target words with greater than 90% accuracy, as assessed by probes at the beginning of each session. He required nine sessions over 4 weeks to reach criterion on the first set of trained words (ITU), followed by four sessions in just over 1 week to reach criterion on the second set of trained words (ITP). There was little to no improvement on untrained words (IB). After the second (post-treatment) scan, YCR returned for extended practice on the ITP words. He attended 19 sessions over 2 months and attained 100% accuracy on this set of overlearned words. Although he reached criterion on probes outside the scanner, performance in the scanner was less accurate (Fig. 2). This was likely due to the decreased time allowed for responses in the scanner and the instructions to use only one-word responses per trial, or to say “pass”.

Fig. 2

YCR scores on experimental word lists (ITP incorrect words to be trained and practiced (“overlearned”) with semantic mediation, ITU incorrect words to be trained-but-not-practiced, IB untrained incorrect words from baseline, CC words consistently correct from baseline) and control condition (X strings of letters or numbers) over time. Percent correct represent mean scores averaged over four runs per condition, except for during baseline testing. No audio recordings were available in the first run, therefore accuracy scores at baseline represent mean scores over three runs

Average reaction times (RTs) were analyzed by condition to ensure that the effects of differences in activation between trained and untrained words were real effects and not due to significantly decreased RTs on trained words, thus sampling closer to the peak of the hemodynamic response after training. Mean RTs for the conditions of interest were as follows: (1) Pre-treatment: ITP = 2.503 s; IB = 2.493 s; (2) Post-treatment: ITP = 1.268 s; ITU = 1.267 s; IB = 1.693 s; and (3) Post-overlearning: ITP = 1.129 s; ITU = 1.222 s; CC = 1.103 s; IB = 1.392 s; X = 0.944 s.

Functional MRI results

Trained vs. untrained words (across training sessions)

Reading of trained and untrained words was contrasted at three time periods: pre-treatment, post-treatment, and post-overlearning (Table 1, Fig. 3). At baseline, there were no statistically significant differences in activation patterns associated with either set of (to be) trained (ITP and ITU) versus untrained (IB) words. Post-treatment, the activation patterns associated with reading of one set of trained words vs. untrained words (ITP vs. IB) included three clusters within the bilateral frontal and left occipital lobes. Clusters of activation were centered in left (−54 30 9; Z = 3.97) and right (54 27 24; Z = 4.21) inferior frontal cortex and in the left lingual gyrus (−9 −81 −12; Z = 5.20). A larger cluster in right inferior frontal cortex (51 21 24; Z = 5.04) was activated preferentially during reading of the other set of trained versus untrained words (ITU vs. IB). In addition, the activation pattern demonstrated by this contrast included clusters in the left and right temporal and right parietal lobes. These clusters were centered in left posterior middle (−63 −54 −6; Z = 4.78) and right anterior inferior temporal gyri (69 −9 −18; Z = 4.45) as well as in the right inferior parietal lobule (45 −48 54; Z = 4.98) in a region homologous to YCR’s lesion.

Fig. 3

Semantic Mediation Treatment effect: BOLD signal increases observed during reading of trained and practiced (ITP) or trained but not practiced (ITU) vs. untrained (IB) words at three time points: (1) pre-treatment (T1); (2) post-classical treatment (T2); and (3) post-overlearning (T3). Significant regions of activation shown superimposed on YCR’s reconstructed lateral and medial images at p < 0.001 uncorrected, for display purposes. Numbers in parentheses indicate accuracy on the task

Table 1 Mean activation peaks identified during reading of trained vs. untrained words

Post-overlearning, statistically significant activation patterns associated with reading of overlearned versus untrained words (ITP vs. IB) included four clusters in the bilateral frontal, and left temporal and parietal lobes. The main cluster preferentially activated during this contrast was centered on the left anterior superior temporal gyrus (−33 9 −30; Z = 5.99). This cluster included a smaller peak of activation in left inferior frontal cortex (−36 30 −15; Z = 4.69). Other significant clusters of activation included one perilesional cluster in left superior parietal lobule (−30 −81 45; Z = 5.35), and one cluster in right inferior frontal cortex (54 39 −6; Z = 5.43). At this same time point, the contrast examining differences between reading trained (but not overlearned) versus untrained words (ITU vs. IB) revealed one main cluster of activation in the left anterior temporal lobe. Although a smaller cluster, and not as strongly activated, it was centered on the identical voxel of peak activation (−33 9 −30; Z = 5.30) observed in the contrast examining overlearned vs. untrained words.

Overlearned words vs. other conditions (time 3)

Reading of overlearned words was contrasted with each of the other word conditions and with the control task (Table 2, Fig. 4). Overlearned versus untrained words (ITP vs. IB) is described above. During reading of overlearned words versus trained-but-not-overlearned words (ITP vs. ITU), a pattern of activation was observed that appears to be a subset of the network activated during ITP versus IB. The main cluster of activation was a perilesional cluster in L precuneus (−21 −81 45; Z = 4.99), centered more medially, but very close to the one observed earlier (ITP vs. IB) in L superior parietal lobule (green squares in Fig. 4). Contrasting overlearned words with words that had been read consistently correctly at baseline (ITP vs. CC) revealed a network of regions including the left frontal, bilateral temporal, and right parietal lobes. There were two large clusters of activation in left inferior (−30 21 −18; Z = 4.78) and middle frontal cortex (−39 33 18; Z = 4.56). Left temporal activation included a large cluster with foci of activation in posterior middle temporal (−54 −45 −18; Z = 5.90) and temporal fusiform (−45 −39 −24; Z = 5.49) and one in anterior superior temporal cortex (−45 15 −27; Z = 5.16). The largest cluster of activation was centered in right parieto-temporal cortex, with foci of activation in right angular gyrus (57 −63 36; Z = 6.46) and right superior parietal lobule (36 −69 60; Z = 6.53). In the contrast of overlearned words with the control task (identifying strings of, and saying, “letters” or “numbers”), one cluster of activation survived corrections for multiple comparisons. This cluster had a nearly identical focus of activation in left anterior superior temporal cortex (−48 12 −27; Z = 5.26) to that observed earlier (red squares in Fig. 4).

Fig. 4

Overlearning effect: BOLD signal increases observed during reading of: (1) trained and practiced (ITP) words vs. untrained (IB) words; (2) ITP vs. trained but unpracticed words (ITU); (3) ITP vs. words that were consistently correct from baseline (CC); and (4) ITP vs. the control task. Significant regions of activation shown superimposed on YCR’s reconstructed lateral images at p < 0.001 uncorrected, for display purposes. Numbers in parentheses indicate accuracy on the task

Table 2 Mean activation peaks identified during reading of overlearned words vs. other conditions at time 3 (post-overlearning)

Lateralization index

Given the threshold dependency and inherent unreliability of the classical lateralization index (LI), a new method for computing the LI (Wilke and Lidzba 2007) was used to examine changes in lateralization curves across increasingly stringent thresholds (Fig. 5). Figure 5 demonstrates the variable nature of the LI across the whole brain (gray matter), over varying thresholds, and between training sessions using the contrast of overlearned versus untrained words. Rather than the sparse information provided by the classical index reported above, the lateralization curves demonstrate a trend in shifting patterns of lateralization over the course of treatment. In addition, the mean LI is weighted to take into account lateralization index values obtained at higher thresholds. In other words, the LI weighted mean applies more weight to values obtained from more statistically significant clusters of activation. Thus, using this new method, compared to untrained words, the set of words that would be/were trained and practiced (overlearned words) shifted from a pattern of predominantly right-lateralized activation pre-treatment (LI weighted mean = −0.42) to strongly right lateralized post-treatment (LI weighted mean = −0.84) to predominantly left-lateralized post-overlearning (LI weighted mean = 0.3).
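The idea of a threshold-weighted mean can be illustrated with a minimal sketch. It assumes that each LI value on the curve is weighted by the threshold at which it was obtained; this weighting scheme, the function name `weighted_mean_li`, and the curve values are illustrative assumptions and are not taken from the LI toolbox or from YCR's data.

```python
def weighted_mean_li(curve):
    """Threshold-weighted mean of a lateralization curve.

    `curve` is a list of (threshold, li) pairs. Each LI value is weighted
    by the threshold at which it was obtained (an assumed weighting scheme),
    so values from more stringent thresholds count for more.
    """
    total_weight = sum(threshold for threshold, _ in curve)
    if total_weight == 0:
        raise ValueError("thresholds must not sum to zero")
    return sum(threshold * li for threshold, li in curve) / total_weight


# Hypothetical curve: the LI drifts leftward as the threshold rises.
curve = [(1.0, -0.2), (2.0, 0.0), (3.0, 0.4), (4.0, 0.8)]

# Unweighted mean treats every threshold equally.
plain_mean = sum(li for _, li in curve) / len(curve)

print(round(plain_mean, 2))               # 0.25
print(round(weighted_mean_li(curve), 2))  # 0.42
```

In this made-up curve the weighted mean is pulled toward the values obtained at higher thresholds, which is the behavior described above.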

Fig. 5

Lateralization curves for gray matter for YCR in contrasts examining words (to be) trained and practiced vs. untrained words (ITP vs. IB) at three different time points: (1) T1 (blue: pre-treatment); (2) T2 (green: post-treatment with Semantic Mediation); and (3) T3 (red: post-extended practice [i.e., overlearning]). Curves produced using bootstrapping method (Wilke and Schmithorst 2006) of examining lateralization over varying thresholds for the same contrasts. Numbers in parentheses indicate weighted means (WM)

In addition to utilizing a whole brain inclusive mask, the LI was calculated using standard masks of individual lobes and the cerebellum for analysis of their separate contributions. Separate regional analyses revealed that the main contributor to a shift to left lateralization post-overlearning was parietal activation (strongly left lateralized across all thresholds). Frontal and cingulate activations remained strongly right lateralized across thresholds, and increasingly so from pre-treatment to post-treatment to post-overlearning. Across these same timepoints, temporal activations shifted from left-lateralized to strongly right-lateralized, and back to left-lateralized.

Discussion

In this study, a patient with phonologic alexia, 2 years post left temporoparietal stroke, participated in a behavioral treatment designed to recruit the relatively preserved semantic route in order to improve his reading of words with low semantic valences. After reaching criterion on two sets of trained words, he continued a period of extended training (“overlearning”) on one set of trained words. Over the course of therapy, he also participated in three fMRI sessions (pre-treatment, post-treatment, and post-overlearning).

Pre-treatment, as predicted, there was equivalent (low) accuracy on sets of words (to be trained or not), and no statistically significant differences in BOLD activation in contrasts comparing these different sets of words. Post-treatment, the patient’s accuracy on trained, but not untrained, words improved. In contrasts comparing both sets of trained to untrained words, a bilateral pattern of frontal and temporal lobe activation was observed, but larger and more significant clusters of activation were recruited in the right hemisphere, including right inferior frontal, inferior parietal, and anterior inferior temporal cortex. Post-overlearning, accuracy on all trained words continued to improve, with overlearned words seeing the greatest improvement. In the contrast examining overlearned vs. untrained words, predominant foci of activation appeared to “shift” to the left hemisphere, including perilesional activation in superior parietal cortex, and a cluster that included left inferior frontal and anterior superior temporal cortex.

In order to appreciate the significance of these apparent shifts in activation during the course of learning (and subsequently overlearning) a new reading strategy, it may be helpful to situate these results within the context of what is known (and what remains elusive) in current neurological and cognitive models of single word reading. As Price et al. (2003) note, cognitive models of reading suggest at least two different reading routes, but so far, neither lesion studies nor neuroimaging studies of normal readers have been able to precisely identify the corresponding neural systems.

We hypothesize that normal literate adults take advantage of a direct whole word orthographic-to-phonologic lexical processing strategy when reading single words. Such processing includes early visual analysis, letter and word form recognition, access to semantics/concepts, phonological retrieval, and articulatory planning and execution. Functional neuroimaging studies examining single word reading in this population suggest that regions normally participating in this endeavor often include the following: bilateral primary visual cortex, bilateral/predominantly left fusiform/occipito-temporal cortex, bilateral posterior superior, middle, and inferior temporal cortex, left inferior frontal cortex, and premotor and motor cortices (Price 2000; Jobard et al. 2003). This list is not exhaustive, because, as Price et al. (2003) note, functional neuroimaging of normal subjects is likely to reveal only prepotent systems. That is, of the many coarse and degenerate (one-to-many, and many-to-one) possible mappings of anatomical substrate onto neural computation, and neural computation onto complex behaviors (Mesulam 1990), functional neuroimaging when averaged over subjects may not reveal uncommon or degenerate pathways. Another limitation is that functional neuroimaging of normal subjects can only reveal sufficient systems, i.e., it does not contribute to constraining a model of which regions are necessary for single word reading.

Lesion studies, which do reveal areas necessary for a task, introduce their own limitations, notably that lesions can be large and are constrained by vascular, rather than cognitive, architecture, making it difficult to determine which precise areas within a lesion may be causing abnormal behavior. Indeed, the abnormalities may be due to disconnection of distant, unaffected cortex, and not related to the lesion location per se. Nonetheless, the lesion method has contributed to our understanding of the normal cognitive architecture underlying single word reading. It is clear, for example, that patients with lesions in posterior superior temporal and/or angular gyrus, putatively involved in generating auditory word representations from visual word representations, tend to exhibit phonological alexia characterized by an inability to perform spelling-to-sound conversion, and a relative advantage for reading real words, especially concrete and/or semantically rich words.

Our patient, YCR, fit this same profile. What we hoped to gain from acquiring functional neuroimaging data before and after a treatment program designed to alter his behavioral strategy for single word reading was a window into potential changes in his cognitive and/or neural strategies that correlated with treatment-induced improvements in reading. The functional neuroimaging data do suggest that behavioral changes were accompanied by evidence of treatment-induced neural plasticity. Compared to mostly unsuccessful attempts at reading of untrained words post-treatment, YCR’s improvement in reading trained sets of words was accompanied by a bilateral pattern of frontal and temporal lobe activation, with larger and more significant clusters of activation recruited in the right hemisphere, including right inferior frontal, inferior parietal, and anterior inferior temporal cortex.

This predominant right hemisphere activation may reflect the nature of the cognitive strategy that was inherent in the treatment, i.e., in taking advantage of a semantic route using a paired associate paradigm. As Hillis (2006) suggests, rather than imagining a right hemisphere “take-over of function” in the early stages of recovery, it is more plausible that some patients may rely more on normal right hemisphere functions to perform certain tasks for which the right hemisphere is capable of contributing in a beneficial way. This increase in right hemisphere activation post-treatment may thus reflect a re-organization in cognitive strategies supporting single word reading, rather than re-organization of neural structure/function relationships. It is also plausible that the choice is not either cognitive or neural, but rather that the right hemisphere activation was due to a combination of the release of inhibition of the dominant left hemisphere along with changes in strategy due to treatment.

YCR’s markedly improved reading of trained words was not, however, demonstrated in the scanner. Post-treatment, in the scanner, he was still making errors on every other word, while recruiting predominantly right hemisphere activation. One probable reason for YCR’s dip in performance compared to his post-treatment probes outside of the scanner is the time constraint imposed in the scanner. During probes, he was allowed to take his time, to self-correct if necessary, and he often benefited from semantic priming. For example, in responding to the word “fare”, he might say, “money...when I came today...bus...fare”. Because of the issues surrounding overt speech in the scanner, YCR was trained to give one-word responses, and if he could not read the word within the time limit, to just say, “pass”. While mitigating the potential artifacts from motion, this also lowered performance in the scanner.

Post-overlearning, the rise in performance in the scanner to 89% correct correlated with a shift to predominantly left hemisphere activation. This shift in hemispheric lateralization and YCR’s near-normal performance under the time constraints of the scanning protocol suggest that a shift in cognitive strategies may have taken place during overlearning. The treatment required a mediating step—linking a target word with a highly picture-able homophonic or near-homophonic word. One goal of extended practice beyond criterion was to make the task of reading these words automatic, possibly rendering the mediating step no longer necessary. It is possible that as the task became more automatic for YCR, he was able to perform it via a more normal processing route that was more heavily dependent on remaining, viable left hemisphere structures. While normal controls would tend to recruit a network including parietal cortex that lies within YCR’s lesion, it is apparent from his highly accurate performance in the scanner that a network which included a cluster in perilesional parietal cortex co-activating with anterior superior temporal cortex was sufficient for the task.

Because the treatment involved a mediating step that took advantage of YCR’s intact semantic reading route, recruitment of this particular region of temporal cortex (L temporal pole—BA 38) is likely related to semantic processing. Indeed, DeLeon et al. (2007) recently found that degree of hypoperfusion of this area in acute ischemic stroke (within 24 h) was most highly correlated with semantic deficits in tests of picture and tactile naming and word/picture verification. Moreover, this region has previously been observed to be activated in functional neuroimaging studies examining naming of concrete objects in healthy subjects, especially in males (Grabowski et al. 2003).

Overlearning, in comparison to no training, was correlated with predominant activation in the left hemisphere in this patient. A shift of language function to the left hemisphere has previously been correlated with better recovery in patients (Rosen et al. 2000; Saur et al. 2006). Patients with small lesions in regions necessary for a particular language function may have the best chance in terms of recovery due to perilesional take-over of function. This was the case in Rosen and colleagues’ study, where two of the six patients with left inferior frontal lesions who showed the best recovery also had the smallest lesion extent. Certainly the smaller lesions make possible a lateralization shift back to the left hemisphere. What is remarkable in YCR is this pattern of predominant left hemisphere activation post-overlearning that includes perilesional parietal cortex, in the context of his massive temporoparietal lesion.

What is the mechanism for changes in neural plasticity underlying YCR’s improvements in single word reading and how do they compare with other treatment-induced studies of functional reorganization? As alluded to earlier, Small et al. (1998) also conducted a neuroimaging treatment study of a patient with phonologic alexia. There are several reasons why a direct comparison of these studies is difficult. Most notably, their patient’s lesion was fronto-temporal (not temporoparietal) and her left angular gyrus was intact. Also, the treatment program focused on teaching their patient a sublexical, i.e., de-compositional approach to word reading, whereas YCR continued to use a whole word approach to reading. It is also not clear how much improvement their patient made or how accurate her performance was in the scanner, as these are not reported. Finally, the experimental and control tasks were very different. Their patient silently read a version of the Token Test developed as a picture/word verification task for fMRI studies. Nonetheless, there are some intriguing similarities and differences in the changing patterns of activation induced by these two different treatment methods in two different patients.

Pre-treatment, their patient primarily activated the left angular gyrus of the inferior parietal lobule (BA 39), with a center of activation at Talairach coordinates (−51, −64, 36). The authors state that her reading strategy pre-treatment was a whole word (lexical) approach. Our patient, YCR, recruited a similar (although homologous) cluster of activation within a network of regions activated during a comparison of overlearned vs. consistently correct words. That is, compared to words that he consistently read correctly and therefore were untrained, YCR recruited a network of regions that included the right angular gyrus with a center of activation at Talairach coordinates (56, −59, 36). It is tempting to conclude that the angular gyrus—in either hemisphere—forms part of a sufficient network activated during whole word reading. In YCR’s case, the left angular gyrus is in his lesion, therefore recruitment of the right angular gyrus may be indicative of functional reorganization of the type involving homologous area adaptation. At any rate, it is specific to experience-dependent plasticity, as this activation occurred in a contrast between overlearned vs. consistently correct words.

Post-treatment, after learning a phonological, de-compositional approach to word reading, the patient in Small et al. (1998) demonstrated a shift in predominant activation to the left lingual gyrus (BA 18), with a center of activation at Talairach coordinates (−16, −79, −10). The authors suggest that her learning of grapheme-to-phoneme correspondences led to a greater use of a phonological/sublexical strategy which was neurally instantiated in both a decrease in activation in left angular gyrus and an increase in activation in left lingual gyrus. In YCR’s case, compared to untrained words post-treatment with a semantic mediation whole word approach, reading of trained words activated a cluster in left lingual gyrus with a similar center of activation at Talairach coordinates (−9, −79, −6). Whereas Small and colleagues suggest that functional brain reorganization following therapy in their patient led to unmasking of occipital circuits normally recruited in early stages of phonological reading, we do not think that semantic mediation therapy produced this same effect in our patient. Rather, as Price (2000) notes, lingual gyri activation is not specific to reading because it is also activated by picture naming. Since this area was significantly activated post-treatment, but not post-overlearning, we suggest that lingual gyrus activation in our patient was indicative of the mediating step of recalling the paired associate, including its picture and name.

While these two studies of treatment-induced functional reorganization have more differences than commonalities, they each contribute to a modest but growing literature demonstrating how individuals with specific neuropsychological deficits can be retrained with theoretically motivated therapy programs, and how fMRI can be used to measure therapy-induced language improvements. Such studies are important first steps in translational research, i.e., in translating what is known about neural plasticity from animal models into models of language recovery and rehabilitation in humans. One of the principles of experience-dependent neural plasticity that emerged from animal models, for example, is that repetition matters (Kleim and Jones 2008). Rats trained on a skilled reaching task did not immediately demonstrate increases in synapse number or map reorganization, despite behavioral improvements (Kleim et al. 2002). Monfils et al. (2005) hypothesized that changes in neural plasticity evoked by repetition actually represent the instantiation of skill within neural circuitry. In the case of YCR, extended practice led to overlearning a set of words, and the functional reorganization that accompanied this behavioral change may represent “...a surrogate marker of functional recovery indicative of behavioral change that is resistant to decay” (Kleim and Jones 2008, p. S229). It may be too soon to tell, but during maintenance testing 1 year post-treatment, YCR was still reading the overlearned set of words at 80% accuracy.

The current study also demonstrates use of a new tool for reliably examining shifts in lateralization (Wilke and Schmithorst 2006) in the context of functional reorganization in poststroke aphasia. A common practice in attempts to quantify changes in hemispheric lateralization that correlate with aphasia recovery has been to compute a laterality index (LI): ((left − right)/(left + right)) using the number of significantly activated voxels at a given threshold for the whole brain. In using a laterality index in this population, one must be aware of some potential issues that threaten validity. Notably, there are at least three concerns related to counting activated voxels in patients with unilateral lesions: (1) the LI depends on the number of voxels remaining in the left hemisphere, i.e., the larger the lesion, the more potentially biased the LI is toward right lateralization; (2) finding more activity in one hemisphere may be interesting, but ‘more is not always better’, i.e., greater activation can also be associated with abnormal, or maladaptive, function (Belin et al. 1996; Perani et al. 2003; Naeser et al. 2004); and (3) the whole brain LI glosses over the significance of critical regions within the language network. For these reasons, reporting LI may not always be warranted.

In longitudinal single-subject treatment studies, however, it is informative to track changes in patterns of activation that correlate with patients’ improvement in language function. There is as yet no approach to describing laterality as changes in a network, and fMRI does by design take a segregationist instead of an integrative approach. This does not invalidate the approach itself but rather defines the questions that can be answered by it and poses limits to the interpretation of results. Networks can be assessed by more elaborate approaches like dynamic causal modeling or structural equation modeling, but neither is feasible in this setting due to the single-subject setup. We therefore chose this approach, and the results of our LI calculations are in line with the observable behavioral changes.

Even when computing an LI is warranted, there are potential issues with the traditional method that should not be overlooked. One problem with this method is its dependence on an arbitrary threshold for determining statistical significance after correction for multiple comparisons. That is, there are a number of different correction methods and while each of these is legitimate, they yield different numbers of significantly activated voxels, and thus different laterality indices. For example, in the contrast of overlearned words compared with the control task, we observed different “shifts in activation” between post-treatment and post-overlearning that depended on which correction method was utilized. Using the most conservative method (p < 0.05 FWE), there appears to be a shift from left lateralized (LI = 1.0) to bilateral (LI = 0.14). With a less conservative method (p < 0.05 False Discovery Rate), the index shifts from bilateral (LI = 0.13) to even more strongly bilateral (LI = 0.04). Using a cluster level family-wise correction (p < 0.05 FWE, cluster level), the shift seems to be from right lateralized (LI = −0.70) to left lateralized (LI = 1.0).

Although many studies of aphasia recovery still report lateralization indices based on this classical approach, there are several problems with leaving the analysis at this static level. The classical method essentially counts voxels at one threshold, regardless of the statistical significance, or weight, of the cluster. This seems counter-intuitive since degree of statistical significance gives us relative confidence regarding the probability of a real effect. A good example of this is demonstrated by trained (ITP) vs. untrained (IB) words post-treatment. The classical LI would assume that the two left hemisphere clusters (left inferior frontal and lingual cortex, each having 49 voxels) had about the same weight as the single cluster in right inferior frontal cortex (85 voxels). It, therefore, produces an evenly bilateral index (LI = 0.07). This classical LI appears to be supported by the activation map (Fig. 3). In reality, however, both of these representations are snapshots of the contrast examining trained versus untrained words, and only reveal a part of the story. A finer grained picture can be found in the LI curve for this contrast (Fig. 5, green line), which reflects the fact that the activation in the right inferior frontal cluster (p = 0.004, FWE corrected) is more statistically significant than the two left hemisphere clusters (both p = 0.046, FWE corrected). Thus the LI curve’s weighted mean (−0.84) in Fig. 5 clearly shows right lateralized activation, which is a truer picture, giving greater confidence in the results of this contrast over a number of thresholds.
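The arithmetic behind the classical index in this example can be checked with a short sketch. It uses the formula (left − right)/(left + right) and the cluster sizes reported in the text; the function name `classical_li` is hypothetical, and the sketch assumes that only these three clusters entered the voxel count.

```python
def classical_li(left_voxels, right_voxels):
    """Classical laterality index from suprathreshold voxel counts.

    Returns a value from -1 (fully right lateralized) to +1 (fully left
    lateralized). Every voxel counts equally, whatever its significance.
    """
    total = left_voxels + right_voxels
    if total == 0:
        raise ValueError("no suprathreshold voxels in either hemisphere")
    return (left_voxels - right_voxels) / total


# Cluster sizes reported for trained (ITP) vs. untrained (IB) words post-treatment.
left_voxels = 49 + 49   # left inferior frontal + left lingual clusters
right_voxels = 85       # right inferior frontal cluster

li = classical_li(left_voxels, right_voxels)
print(round(li, 2))  # 0.07
```

The count-based index comes out close to zero because the 98 left hemisphere voxels and the 85 right hemisphere voxels are treated as equivalent, even though the right cluster is the more significant one.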

In classical approaches, an equally thorny problem could occur when only small samples survive the correction for multiple comparisons. In the most extreme case, one remaining activated voxel would yield a lateralization index of ±1. Unfortunately, as Wilke and Schmithorst (2006) note, this is “...not a plausible scenario, biologically, statistically or computationally” (p. 524). In fact, several equally legitimate methods for correction of multiple comparisons are available, depending on the user’s preference for sensitivity (e.g., using the False Discovery Rate correction) versus specificity (e.g., using the Family-Wise Error correction) in identifying functionally specialized regions of cortex. Each yields different patterns of activation along with different classically computed LIs. The weighted means mitigate this threshold-dependency problem.
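The threshold dependency and the single-voxel extreme can be made concrete with a toy sketch. The voxel statistics below are invented for illustration and are not drawn from YCR's data; `li_at_threshold` is a hypothetical helper that counts suprathreshold voxels in each hemisphere.

```python
def li_at_threshold(left_stats, right_stats, threshold):
    """Classical LI from voxel statistics, counting voxels above `threshold`.

    Returns None when no voxel survives in either hemisphere.
    """
    left = sum(1 for value in left_stats if value > threshold)
    right = sum(1 for value in right_stats if value > threshold)
    if left + right == 0:
        return None
    return (left - right) / (left + right)


# Invented voxel statistics for each hemisphere (illustration only).
left_stats = [3.2, 3.5, 4.1, 4.4, 6.3]
right_stats = [3.1, 3.3, 3.6, 3.9, 4.0, 4.2, 5.0]

for threshold in (3.0, 4.0, 6.0, 7.0):
    print(threshold, li_at_threshold(left_stats, right_stats, threshold))
```

With these made-up values the index is mildly right lateralized at the most lenient threshold, mildly left lateralized at the next, and exactly 1.0 once a single left hemisphere voxel is all that survives, which is why an index reported at one threshold can be misleading.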

The LI toolbox (Wilke and Lidzba 2007) offers an alternative method for assessing lateralization without the shortcomings of classical approaches. As with any new tool, however, the results are constrained by the analytical methods. For example, we opted to analyze all three sessions together, rather than individually. This has consequences for variance weighting, one of the built-in approaches for dealing with data sparsity and statistical outliers. Indeed, optional use of clustering and variance weighting, intended to mitigate possible outliers, also can have a considerable effect on the curves. Clustering does this by smoothing, and variance weighting by “down-weighting” voxels with high variance, i.e., those that are not a good fit to the model. Use of these techniques should provide a “stabilizing influence” on the inherent trend to lateralization. Nonetheless, we will continue to explore the effects of these optional techniques on analysis and interpretation of the data.

In summary, a patient with phonologic alexia, 2 years post left MCA stroke, participated in a behavioral treatment designed to recruit the preserved semantic route in order to improve his reading of semantically impoverished words. He was tested with fMRI pre- and post-treatment, and after a period of extended practice (“overlearning”) in order to examine functional reorganization following treatment. Shifts from bilateral, to right lateralized, to left lateralized were observed at these three time points in contrasts examining trained and overlearned words versus untrained words.

Attempts to frame the question of functional reorganization in post-stroke alexia (or aphasia) in terms of a dichotomous left/right choice likely trivialize the very complex and dynamic nature of functional recovery in individual patients. Imaging snapshots, whether frozen by choice of time in recovery or by thresholding methods, may obscure the evolving roles of perilesional and homologous cortex. Alternative methods of calculating a laterality index (e.g., Wilke and Schmithorst 2006) may provide a more holistic view of hemispheric lateralization. These methods also remind us to avoid over-reliance on static thresholds when using functional neuroimaging methods to explore the elusive questions surrounding functional re-organization after stroke. Finally, it is also possible that the changes that appear to be shifts in lateralization evoked by treatment-induced neural plasticity may be influenced by physiological fluctuations. For this reason, further research is necessary in order to clearly define the natural course of events in larger groups.