Introduction

Stroke is a leading cause of long-term acquired disability in adults (Ma et al. 2014; Mendis 2013). Over the coming decades the incidence of post-stroke disability is only expected to increase, with the aging of worldwide populations, the rising prevalence of cerebrovascular risk factors (NCD Risk Factor Collaboration 2016; Ng et al. 2014), and medical advances continuing to reduce post-stroke mortality rates (Koton et al. 2014). Historically, stroke rehabilitation has emphasized physical rather than cognitive recovery (Park and Yoon 2015). This is problematic, as post-stroke cognitive deficits are pervasive (Cumming et al. 2013b) and enduring (Barker-Collo et al. 2012; Chahal et al. 2011), and stroke survivors identify efforts to detect and improve cognitive impairment as one of their top priorities (Andrew et al. 2014; Pollock et al. 2012).

Cognitive deficits can significantly impair everyday functional capacities (Saxena et al. 2007; Stephens et al. 2005), social engagement (Njomboro 2017; Yuvaraj et al. 2013), and quality of life (Hochstenbach et al. 2001; Park et al. 2013a), irrespective of stroke severity (Tatemichi et al. 1994; Wagle et al. 2011). Cognitive impairment also diminishes the efficacy of rehabilitation interventions (Nys et al. 2007; Paolucci et al. 1996), and the cost of stroke care is three times higher in patients with cognitive impairment than in those without (Cumming et al. 2013a; Douiri et al. 2013).

Recovery following stroke occurs through a combination of spontaneous and learning dependent remediation approaches, including restitution, substitution, and compensation (Kwakkel et al. 2004). Despite growing interest in new pharmaceutical agents, brain stimulation, stem cell and tissue engineering technologies, and brain–computer interface systems (Brewer et al. 2013; Frisoli et al. 2016; Wu et al. 2017), cognitive remediation (CR) programs remain the most common approach to treating cognitive impairment (Pollack and Disler 2002; Stringer 2003). These programs use either cognitive training or cognitive rehabilitation paradigms. Cognitive training involves completing highly structured and repetitive cognitive tasks with the aim of improving specific cognitive abilities, whilst cognitive rehabilitation engages a broader set of abilities in more “real-world” settings, to address the functional performance goals of the individual (Bahar-Fuchs et al. 2013). Seminal reviews by Cicerone and colleagues highlighted the efficacy of CR approaches for post-stroke cognitive deficits (Cicerone et al. 2000, 2005, 2011). While subsequent reviews (Table 1) have often reproduced these findings, there has been limited progression in our understanding of how best to optimize the design and implementation of CR programs to maximize efficacy. Additionally, an objective comparison of cognitive training and cognitive rehabilitation approaches to remediation of post-stroke deficits is overdue. Finally, methodological weaknesses have persisted in this research that require addressing.

Table 1 Summary of past reviews and meta-analyses of post-stroke cognitive remediation (in chronological order)

A particular problem in previous reviews has been the inclusion of combined cohorts of both traumatic brain injury (TBI) and stroke survivors (Chung et al. 2013; Park and Ingles 2001; van de Ven et al. 2016; van Heugten et al. 2012; Weicker et al. 2016) despite evidence of different CR outcomes between the two (Elliott and Parente 2014; Miklos et al. 2015; Rohling et al. 2009; Virk et al. 2015). Disparity in age, mechanism of injury, and neuropathology may account for these differences in outcome between stroke and TBI (Herrmann et al. 2000), and the grouping of mixed acquired brain injury (ABI) patients may obscure unique effects within each population.

Second, many past reviews included cohort studies (Bogdanova et al. 2016; Cicerone et al. 2000, 2005, 2011; Elliott and Parente 2014; Park and Ingles 2001; Poulin et al. 2012; Rohling et al. 2009; van de Ven et al. 2016). Good quality control group designs are critical in brain injury research because spontaneous recovery can and does occur, and retest effects are common (Elliott and Parente 2014; Miklos et al. 2015; van de Ven et al. 2016). Previous reviews of CR in stroke have confirmed these concerns, reporting larger effect sizes for single group treatment studies compared with randomized controlled trial (RCT) designs and practice effects from repeat testing in control conditions (Park and Ingles 2001; Rohling et al. 2009). Notwithstanding this, the ideal control group remains unclear. A passive control group (i.e., a “no treatment” group) can correct for practice effects and spontaneous recovery, while active control groups (i.e., “sham” treatment) can also correct for placebo and Hawthorne effects (van de Ven et al. 2016).

Third, past reviews have tended to focus on outcomes within a specific cognitive domain, despite evidence that stroke can impact cognitive domains differentially (Cumming et al. 2013b; Tatemichi et al. 1994) and that recovery rates vary across different cognitive domains (Cicerone et al. 2000, 2005, 2011; Hurford et al. 2013; Rohling et al. 2009). Unfortunately, inconsistencies in inclusion and exclusion criteria, and variability in outcome assessment methods between different reviews currently confound attempts to make meaningful within- and between-domain comparisons. A single review simultaneously examining multiple cognitive domains would resolve this issue. In addition, confining analysis of CR to single cognitive domains is contrary to long-standing conceptualizations of cognitive domains as highly overlapping and hierarchically interconnected networks of function (Lezak et al. 2012; Spearman 1904).

Fourth, there remains an unmet need to move beyond the “simple question” of whether CR is effective (Cicerone et al. 2005, 2011), and examine what design and implementation factors make interventions most effective. Empirically derived recommendations regarding intervention design choices such as duration and frequency of therapy (i.e., “dose”) have recently been identified for CR in healthy older adults (Lampit et al. 2014); similar empirical evidence is lacking for stroke survivors. Fifth, the durability of immediate post-intervention gains following CR is under-examined and unclear, with conflicting reports of sustained improvement (e.g., Weicker et al. 2016) and no maintenance of gains (e.g., Loetscher and Lincoln 2013; Virk et al. 2015). Certainty regarding the sustained benefits is necessary if CR is to be considered an effective component of post-stroke care.

To address the identified shortcomings of past research, the aim of the current systematic literature review and meta-analysis was to evaluate the impact of CR exclusively in RCT studies of stroke survivors, analyzing immediate and longer-term outcomes across a range of cognitive and non-cognitive domains (Fig. 1). Combined with an analysis of intervention design and implementation factors that may moderate treatment efficacy, the current review aimed to provide a more valid and clinically meaningful set of conclusions to inform researchers, practitioners, and the patients they treat.

Fig. 1

PICO question and the main variables included in the systematic literature review and meta-analysis

Methods

The current review was conducted and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement (Liberati et al. 2009). Prior to commencement, protocol details for the review were registered with the online International Prospective Register of Systematic Reviews (PROSPERO ID: CRD42017076385).

Data Sources and Search Strategy

AMED, CINAHL, Cochrane Library, EMBASE, MEDLINE, PsycEXTRA, PsycINFO, Science Direct, and Scopus indexing databases were systematically searched from inception to 6 December 2017. Combinations of the following medical subject headings (i.e., MeSH terms) and key words were used across all databases: aneurysm, bleed, cerebral artery disease, cerebrovascular accident, brain embolism, brain cortex lesion, aneurysm rupture, artery rupture, brain artery aneurysm, vein rupture, hemiplegia, paresis, stroke patient, stroke rehabilitation, artery occlusion, middle cerebral artery occlusion, internal carotid artery occlusion, venous thromboembolism, thromboembolism, thrombosis, subarachnoid hemorrhage AND cognitive rehabilitation, cognitive remediation therapy, cognition, cognitive enhancement, cognitive feedback, cognitive intervention, memory training, brain training, cognitive strategy, cognitive defect, cognition, mental function, cognitive retraining, cognitive treatment, neuropsychological, neuropsychology, cognition assessment, cognitive function test, problem solving. To illustrate, the full electronic search strategy for the EMBASE database is included in Appendix Table 5.

Inclusion and Exclusion Criteria

Eligible RCT studies met the following inclusion criteria: (i) were specific to a stroke population; (ii) delivered a CR protocol; (iii) included participants affected by a cognitive deficit(s) following stroke; (iv) measured outcomes using a validated measure of cognitive function; (v) evaluated treatment superiority relative to a treatment as usual, placebo, or waitlist control group; and (vi) were published in English in a peer-reviewed journal. Studies were excluded that recruited a mixed study cohort including non-stroke cases, included cases under 18 years old, or did not include a randomized control group. There is currently limited consensus regarding recommendations for the frequency and intensity of cognitive remediation; however, common practice suggests that interventions require regular and repeated practice (Bahar-Fuchs et al. 2013). Thus, for the current meta-analysis, studies where the intervention consisted of a single treatment session were excluded. Randomized controlled non-inferiority trials were also excluded, that is, those studies that aimed to establish whether CR was equivalent to or no worse than another accepted treatment approach (e.g., Cherney 2010). Finally, due to the inability to isolate the specific effects of CR, studies were excluded that applied a “hybrid” approach, for example combining CR with a virtual reality intervention (e.g., Kim et al. 2011) or brain stimulation (Park et al. 2013b).

Identification of Relevant Studies and Data Extraction

The eligibility assessment was performed independently by two of the authors (JR and RF) using a standardized protocol. After deleting duplicate papers, the title and abstract of all studies were screened by the authors to assess suitability for inclusion. Those considered potentially eligible were read in full. Hand searching the reference lists of relevant reviews, meta-analyses, and included studies was also used to identify potentially relevant publications, which yielded three additional relevant studies (Barker-Collo et al. 2009; Winkens et al. 2009; Young et al. 1983).

For articles meeting inclusion criteria, two of the authors (JR and RF) extracted data on study design, intervention characteristics, participant characteristics, and outcomes at post-intervention and, when available, at follow-up (Fig. 1). Disagreements between reviewers were resolved by consensus, with the senior author (PW) as arbitrator. Cognitive outcomes measured at the impairment level (World Health Organization 2017) were categorized using accepted typologies (Donovan et al. 2008; Lezak et al. 2012; Strauss et al. 2006), including: general cognition (e.g., Mini-Mental State Examination); processing speed (e.g., Mental Slowness Questionnaire); attention (e.g., Digit Span Forward/Backward); language function (e.g., Western Aphasia Battery); visuospatial and perceptual skills (e.g., Picture Completion); memory (e.g., Wechsler Memory Scale); and executive function (e.g., Delis-Kaplan Executive Function System). Cognitive outcomes measured at the activity or participation level (e.g., Cognitive Failures Questionnaire), and other non-cognitive measures of impairment (e.g., mood questionnaires), activity (e.g., disability scales) and participation (e.g., quality of life instruments) were also extracted, when available.

Each included study could contribute to one or more outcome measures. When a study reported on more than one instrument for an outcome measure (e.g., multiple measures of attention), all results were combined into a single mean effect size (Borenstein et al. 2009). Furthermore, when there were multiple comparison groups (e.g., both an active and a passive control group), the active control group, presumably controlling for more potential confounding factors, was selected (e.g., Katz and Wertz 1997).
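As a minimal sketch of how several instruments for one outcome might be collapsed into a single study-level effect, the Python below averages the effect sizes and computes the variance of that mean for correlated measures. The function name `combine_outcomes` and the default correlation `r = 0.5` are illustrative assumptions; the review does not report the correlation it assumed between instruments.

```python
import math


def combine_outcomes(effects, variances, r=0.5):
    """Average several effect sizes from one study into a single composite.

    effects   -- effect size (e.g., Hedges' g) for each instrument
    variances -- sampling variance of each effect size
    r         -- assumed correlation between instruments (hypothetical value)
    """
    m = len(effects)
    if m == 0 or m != len(variances):
        raise ValueError("effects and variances must be non-empty and equal length")

    mean_effect = sum(effects) / m

    # Variance of a mean of correlated estimates:
    # (1/m)^2 * [sum(V_i) + sum over i != j of r * sqrt(V_i * V_j)]
    total = sum(variances)
    for i in range(m):
        for j in range(m):
            if i != j:
                total += r * math.sqrt(variances[i] * variances[j])

    return mean_effect, total / m ** 2


# Hypothetical example: two attention measures reported by one study.
g_composite, v_composite = combine_outcomes([0.4, 0.6], [0.04, 0.04])
```

Setting `r = 1.0` treats the instruments as fully redundant (no gain in precision from averaging), while `r = 0.0` treats them as independent, so the chosen value directly affects the weight a multi-instrument study receives.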

Quality Assessment

Two authors (JR and RF) assessed the risk of bias of each included study using the Physiotherapy Evidence Database (PEDro) Scale (Maher et al. 2003). The 11-item PEDro Scale rates methodological quality across the domains of Selection, Performance, Detection, Information, and Attribution biases (Kamper et al. 2015). Studies with PEDro total scores 6 and above are typically considered high quality (Sherrington et al. 2002). The current review adopted a slightly more stringent threshold of 7 or above, while papers with a rating of 6 and below were classified as low quality. Disagreements between reviewers were resolved by consensus.

Quantitative Analysis

From the published manuscripts, post-intervention means and standard deviations on each outcome measure, p values, and sample sizes for the experimental and control groups were entered into Comprehensive Meta-Analysis version 3.3.070 (CMA; Biostat, Englewood, NJ, USA). Heterogeneity was formally assessed with the Q (the distribution of observed effects) and τ² (the absolute variance of true effects) statistics (Borenstein et al. 2017). The risk of publication bias was assessed qualitatively by examining funnel plot asymmetry, and quantitatively using Egger’s regression test (2-tailed p value) and computation of the Classic fail-safe N (Egger et al. 1997).
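To illustrate the two heterogeneity statistics, the sketch below computes Cochran's Q and a method-of-moments (DerSimonian-Laird) estimate of τ² from study-level effect sizes and their variances. The choice of the DerSimonian-Laird estimator and the function name `heterogeneity` are assumptions made for illustration; the review states only that the statistics were obtained from CMA.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, its degrees of freedom, and a DerSimonian-Laird tau^2."""
    if len(effects) < 2 or len(effects) != len(variances):
        raise ValueError("need at least two studies with matching variances")

    weights = [1.0 / v for v in variances]          # inverse-variance weights
    sum_w = sum(weights)
    pooled = sum(w * g for w, g in zip(weights, effects)) / sum_w

    # Q: weighted squared deviations of observed effects from the pooled effect.
    q = sum(w * (g - pooled) ** 2 for w, g in zip(weights, effects))
    df = len(effects) - 1

    # tau^2: excess of Q over its expectation (df), rescaled and floored at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, df, tau2
```

Q is compared against a chi-square distribution with `df` degrees of freedom, whereas τ² is on the scale of the squared effect size and is what enters the random-effects weights.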

A random-effects model was used to compute the effect size estimate Hedges’ g, a variation of Cohen’s d that corrects for small sample sizes. The magnitude of Hedges’ g was categorized as follows (Cohen 1988): small (≥0.2), medium (≥0.5) and large (≥0.8). Pooled effect sizes were calculated by aggregating the mean effect sizes weighted by each study’s sample size, and the 95% confidence intervals (CI) and z scores were based on the overall mean and standard error. Effect size outcomes favoring CR were assigned a positive value while effects favoring the control condition (i.e., treatment-as-usual) had a negative value. When no means or standard deviations were provided, other reported test statistics (e.g., t, F, or p) were converted into Hedges’ g.
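The calculation described above can be sketched in Python as follows. This is a hedged illustration using the standard textbook formulas for Hedges' g, not a reproduction of CMA's internals; the function names, the "negligible" label for values below 0.2, and the fixed 1.96 multiplier for the 95% CI are assumptions introduced here.

```python
import math


def hedges_g(m_cr, sd_cr, n_cr, m_ctl, sd_ctl, n_ctl):
    """Hedges' g and its variance from post-intervention group summaries."""
    df = n_cr + n_ctl - 2
    sd_pooled = math.sqrt(((n_cr - 1) * sd_cr ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df)
    d = (m_cr - m_ctl) / sd_pooled              # Cohen's d; positive favours CR
    j = 1 - 3 / (4 * df - 1)                    # small-sample correction factor
    var_d = (n_cr + n_ctl) / (n_cr * n_ctl) + d ** 2 / (2 * (n_cr + n_ctl))
    return j * d, j ** 2 * var_d


def d_from_t(t, n_cr, n_ctl):
    """Cohen's d from an independent-groups t statistic (means/SDs not reported)."""
    return t * math.sqrt(1 / n_cr + 1 / n_ctl)


def magnitude(g):
    """Label an effect using the thresholds stated in the text (Cohen 1988)."""
    size = abs(g)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed here; the text defines no category below 0.2


def pool_random_effects(effects, variances, tau2):
    """Random-effects pooled estimate, standard error, 95% CI, and z score."""
    weights = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(weights)
    pooled = sum(w * g for w, g in zip(weights, effects)) / sum_w
    se = math.sqrt(1.0 / sum_w)
    return pooled, se, (pooled - 1.96 * se, pooled + 1.96 * se), pooled / se
```

Because the between-study variance `tau2` is added to every study's own variance, the random-effects weights are more even across studies than pure inverse-variance weights would be.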

Subgroup analyses were performed by calculating Hedges’ g separately for each cognitive domain measured at the impairment level immediately post-intervention, and at follow-up, when available. For studies with longitudinal data, only outcomes that were reported at multiple end points were analysed to compare post-intervention and follow-up effects. A separate sub-group composed of “other” impairment and activity/participation level outcomes was also examined at post-intervention and follow-up.

Random-effects meta-regression analysis using maximum likelihood estimates (2-tailed p value) was also conducted to estimate the likelihood of a given variable moderating observed effect sizes. Seven moderator variables were examined (Table 2). Study quality, recovery stage, study duration, and weekly frequency were examined as continuous variables. Intervention type, control group type, and generalizability of training were treated as dichotomous categorical variables. Generalizability of training was classified as either a trained domain or an untrained domain effect. A trained domain effect was defined as gains made on similar, but untrained tasks (e.g., improvement on untrained attention tasks following attention training). An untrained domain effect referred to gains to dissimilar, untrained domains (e.g., improvement in memory function or quality of life following attention training). Moderator effects were only interpreted for outcomes reported in two or more studies (Valentine et al. 2010).
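A simplified sketch of a single-moderator meta-regression is given below. It uses weighted least squares with a between-study variance `tau2` supplied by the caller, which is a deliberate simplification: the review estimated its models by maximum likelihood in CMA, so this code illustrates the idea of the slope b and its z test rather than the exact procedure. The function name and argument names are hypothetical.

```python
import math


def meta_regression(effects, variances, moderator, tau2=0.0):
    """Weighted least-squares regression of effect size on one moderator.

    Returns (intercept, slope, se_slope, z_slope). Weights are 1 / (v_i + tau2),
    with tau2 treated as known rather than estimated by maximum likelihood.
    """
    if not (len(effects) == len(variances) == len(moderator)) or len(effects) < 3:
        raise ValueError("need three or more studies with matching inputs")

    w = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, moderator)) / sum_w
    y_bar = sum(wi * y for wi, y in zip(w, effects)) / sum_w

    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, moderator))
    if sxx == 0:
        raise ValueError("moderator does not vary across studies")
    sxy = sum(wi * (x - x_bar) * (y - y_bar)
              for wi, x, y in zip(w, moderator, effects))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)     # valid when the weights are inverse variances
    return intercept, slope, se_slope, slope / se_slope


# Hypothetical data: effect sizes that shrink as a quality score rises.
b0, b1, se_b1, z_b1 = meta_regression(
    effects=[0.9, 0.7, 0.5, 0.3],
    variances=[0.04, 0.04, 0.04, 0.04],
    moderator=[6, 7, 8, 9],
)
```

A negative slope in this setup corresponds to smaller effects at higher moderator values. Dichotomous moderators such as control group type can be passed as 0/1 codes.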

Table 2 Moderators included in the analysis

Results

Following the selection process depicted in Fig. 2, a final sample of 22 articles was identified for inclusion in this review (Table 3). Two articles were published on the same cohort, the first detailing pre-post outcomes (Aben et al. 2013), and the second the follow-up results (Aben et al. 2014). Data from these two articles were analysed as a single study. One other article used two different treatment groups (Bakheit et al. 2007), and was subsequently divided into separate studies: Bakheit comparison “a” examining intensive (i.e., high frequency) therapy to a control group; and Bakheit comparison “b” examining standard intensity therapy to a control group.

Fig. 2

Four-phase PRISMA flow-diagram for study collection, showing the process for identifying and screening of the articles for inclusion in the review and meta-analysis

Table 3 Characteristics of the included studies

Participant Characteristics

The 22 studies yielded 1098 participants including 583 cases receiving CR. Sample sizes ranged from four to 62 participants per group. Only four studies (18%) had fewer than 10 cases in the CR group (Table 3), while 11 studies (50%) had 20 or more participants in the treatment condition. The average age of participants was 62 years (SD = 7.32 years, range 48–78 years). The average time post-stroke ranged from three days to 6.7 years (M = 16 months, SD = 23 months). This included 10 studies (45%) conducted during the sub-acute stage (≤ 3 months) and 12 studies (55%) completed during the chronic stage (> 3 months). Seven studies (32%) included both ischemic and hemorrhagic stroke patients, two (9%) included only ischemic stroke patients, and 13 (59%) did not report on stroke type. Twelve studies (55%) recruited from inpatient hospital or inpatient rehabilitation settings, while nine studies (41%) recruited from outpatient services or community settings. One study provided no information on recruitment setting (Young et al. 1983).

Cognitive Remediation and Control Group Interventions

Of the 22 studies, CR targeted language function in six studies (27%), attention in four studies (18%), visuospatial and perceptual skills in four studies (18%), memory in three studies (14%), executive function in two studies (9%), and processing speed in one study (5%). In two studies (9%) the aim was to remediate general, non-specific cognitive ability after stroke (Wentink et al. 2016; Zucchella et al. 2014), and thus all impairment level cognitive outcomes were included in trained domain effect analysis.

A variety of methods were used to administer the CR interventions (Table 3). Eight studies (36%) delivered CR via computerized programs, three (14%) used pen and paper activities or workbooks, and seven others (32%) described therapist-led strategy interventions. Four studies (18%) utilized a group therapy approach. Furthermore, 12 studies (55%) evaluated a cognitive training approach, and 10 (45%) a cognitive rehabilitation approach. Typical cognitive training approaches included repeated practice of pen and paper tasks (Weinberg et al. 1977, 1982) or computer-based repetitive exercises (Cho et al. 2015; Katz and Wertz 1997; Lin et al. 2014). Cognitive rehabilitation approaches included both group and individual sessions designed to develop thinking skills and strategies (Aben et al. 2013, 2014; Doornhein and De Haan 1998), practice daily living activities, and achieve behavioral goals (Wolf et al. 2016; Worrall and Yiu 2000).

A “passive” control group (either waitlisted for CR or receiving treatment as usual only) was used in eight studies (36%). Treatment as usual was described in limited terms, but typically involved engagement with inpatient or outpatient physical and occupational rehabilitation programs, medical check-ups, and general post-stroke care. In 14 studies (64%) control group participants received further rehabilitation interventions to match the additional time in therapy provided to participants receiving CR. These so-called “active” control group interventions included structured peer support groups (Aben et al. 2013, 2014), computer games (Katz and Wertz 1997), organized recreational activities (Worrall and Yiu 2000), and general discussion with a psychologist (Wentink et al. 2016; Zucchella et al. 2014).

For all CR approaches combined, the mean overall duration was 24 h (SD = 23 h), with a range of 4.5 (Carter et al. 1983) to 80 h (Elman and Bernstein-Ellis 1999). The mean frequency was 3.5 sessions per week (SD = 1.7 sessions), with a range of one (Winkens et al. 2009; Worrall and Yiu 2000) to six sessions per week (Lin et al. 2014). Session length varied from 15 min (Wentink et al. 2016) to 2.5 h (Elman and Bernstein-Ellis 1999), with a mean daily intensity of 63 min (SD = 38 min). The total length of CR varied from two (Prokopenko et al. 2013) to 26 weeks (Katz and Wertz 1997), with an average of 8 weeks (SD = 5.5 weeks).

Outcome Measures

For cognitive outcomes at the impairment level, attention was the most commonly assessed domain, with 10 studies (45%) including a measure addressing this domain (Table 3). Language was the second most commonly assessed cognitive domain, with eight studies (36%). Six studies (27%) measured executive function, five (23%) examined memory outcomes, four (18%) measured visuospatial and perceptual skills, three (14%) measured processing speed, and two (9%) included a measure of general cognition. “Other” outcomes included subjective cognitive failures in three studies (14%), disability in three studies (14%), and mood state or quality of life in two studies (9%).

Risk of Bias

The methodological quality of included studies was generally rated high (see Table 4), with an average PEDro total score of 7.8 (SD = 1.1, range 6–9). Four studies (Aben et al. 2014; Carter et al. 1983; Elman and Bernstein-Ellis 1999; Kim et al. 2014; Worrall and Yiu 2000) were classified as low quality (PEDro total score = 6), mainly omitting details on allocation concealment, and blinding of participants, therapists, and assessors. The Egger’s test was performed to provide statistical evidence of funnel plot asymmetry (Fig. 3), and the intercept value for all outcomes combined was 1.85, p < 0.01 (two-tailed), suggesting pronounced asymmetry and increased likelihood that smaller studies reported larger than average effects (Rothstein et al. 2006). To minimize the risk of publication bias for all analyses, all reported effect size outcomes were based on a random-effects model to give more weight to larger trials (Egger et al. 1997).
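For readers unfamiliar with Egger's test, the sketch below shows the core regression behind it: each study's standardized effect (g divided by its standard error) is regressed on its precision (1 divided by the standard error), and the intercept indexes funnel plot asymmetry. The function name is hypothetical, ordinary least squares is used, and the significance test on the intercept is omitted for brevity, so this is an illustration rather than the CMA routine.

```python
def egger_intercept(effects, std_errors):
    """Intercept and slope from regressing (g / SE) on (1 / SE).

    An intercept far from zero suggests small-study effects, i.e., funnel
    plot asymmetry. The p value for the intercept is not computed here.
    """
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need three or more studies with matching standard errors")

    y = [g / se for g, se in zip(effects, std_errors)]   # standardized effects
    x = [1.0 / se for se in std_errors]                   # precision

    x_bar = sum(x) / k
    y_bar = sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same standard error")

    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

A positive intercept, as reported above, is the pattern expected when less precise (typically smaller) studies report larger effects than more precise ones.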

Table 4 PEDro scale risk of bias ratings for the included studies

Fig. 3

Funnel plot of the overall effect size Hedges’ g versus standard errors

Overall Efficacy of Cognitive Remediation

For all cognitive outcomes combined (Fig. 4), CR approaches had a significant, small overall effect compared with control conditions [g = 0.48 (95% CI 0.35–0.60), p < 0.01]. Heterogeneity approached significance [Q(21) = 32.91, p = 0.05, τ² = 0.03], but the overall fail-safe N was high at 494, suggesting a robust finding. Meta-regression analysis revealed overall cognitive outcomes were significantly moderated by study quality (b = −0.13, z = −1.97, p = 0.04, R² = 0.60), with lower quality studies associated with larger effect sizes. The moderating effect of recovery stage (b = −0.01, z = −3.06, p < 0.01, R² = 0.32) was also significant, with earlier interventions associated with larger effect sizes. Specifically, the largest effect size was observed in the three studies (Bakheit et al. 2007; Zucchella et al. 2014) delivering cognitive remediation within one month of stroke [g = 0.62 (95% CI 0.38–0.86), p < 0.01].

Fig. 4

Overall effects of post-stroke cognitive remediation on cognitive outcomes

Study duration (b = 0.01, z = 2.05, p = 0.04, R² = 0.08) also had a significant moderating effect, with more hours of intervention associated with larger effect sizes. The largest effect size [g = 0.74 (95% CI 0.36–1.12), p < 0.01] was observed in the two studies delivering 60 h of intervention (Bakheit et al. 2007; Lin et al. 2014). However, the most common total duration was 20 h (Weinberg et al. 1977, 1982; Young et al. 1983), associated with the second largest effect size [g = 0.57 (95% CI 0.28–0.86), p < 0.01]. There was no significant moderating effect of generalizability to trained or untrained processes (p = 0.50), control group type (p = 0.56), remediation approach (p = 0.63), or intervention frequency (p = 0.65).

Nine studies (41%) provided follow-up data (Fig. 5), obtained two to 52 weeks later. At follow-up, a small but significant overall effect on cognitive outcomes was maintained [g = 0.27 (95% CI 0.04–0.51), p = 0.02]. Heterogeneity was significant [Q(8) = 16.31, p = 0.04, τ² = 0.06]. Mixed-effects meta-regression analysis found that variations in study quality (p = 0.10), frequency (p = 0.16), duration (p = 0.18), recovery stage (p = 0.26), remediation type (p = 0.35), control group (p = 0.52), or generalizability to trained or untrained processes (p = 0.90) had no significant moderating effect on cognitive outcomes at follow-up.

Fig. 5

Follow-up effects of post-stroke cognitive remediation

Domain Specific Efficacy of Cognitive Remediation

Medium effects were observed on language (g = 0.66) and visuospatial and perceptual (g = 0.75) outcomes (Fig. 6). Small effects were observed for general cognition (g = 0.29), processing speed (g = 0.37), attention (g = 0.40), executive functioning (g = 0.47), and memory (g = 0.47) outcomes. Heterogeneity was significant (p < 0.05) only for memory outcomes [Q(4) = 11.32, τ² = 0.15]. Improvements in processing speed were observed for CR that targeted general cognition (Wentink et al. 2016), but not for the CR that targeted processing speed deficits specifically (Winkens et al. 2009). Improvements in attention function were observed for interventions targeting either attention (Prokopenko et al. 2013), visuospatial and perceptual skills (Young et al. 1983), or general cognition (Zucchella et al. 2014). Improvements in language outcomes were observed for CR targeting either language (Bakheit et al. 2007; Elman and Bernstein-Ellis 1999) or visuospatial and perceptual function (Weinberg et al. 1977; Young et al. 1983). Improvements in visuospatial and perceptual skills were only observed for CR that targeted these functions specifically (Carter et al. 1983; Weinberg et al. 1977). Improvements in memory function were observed for interventions targeting either memory (Lin et al. 2014) or general cognition (Zucchella et al. 2014). Finally, improvements in executive function were observed for CR targeting either executive function (Kim et al. 2014), attention (Prokopenko et al. 2013), or general cognition (Zucchella et al. 2014).

Fig. 6

Domain-specific effects of post-stroke cognitive remediation

Memory outcomes were significantly moderated by recovery stage (b = −0.01, z = −2.97, p < 0.01, R² = 1.00), with earlier intervention associated with larger effect sizes. Visuospatial and perceptual outcomes were significantly moderated by the frequency (b = −0.59, z = −2.26, p = 0.02, R² = 0.57), duration (b = −0.08, z = −2.26, p = 0.02, R² = 0.28), and type of intervention (b = −1.18, z = −2.26, p = 0.02, R² = 0.15), with lower dose interventions and cognitive rehabilitation approaches associated with larger effect sizes. Finally, follow-up data (Fig. 5) on each specific cognitive domain were either insufficient (k < 2) for analysis or produced a non-significant effect (p ≥ 0.05).

Other Outcome Measures

Six studies (27%) examined “other” impairment level (i.e., mood state) and activity/participation level (i.e., subjective cognitive failures, quality of life, disability) outcomes of CR (Fig. 6). There was a small but significant combined effect [g = 0.25 (95% CI 0.07–0.43), p = 0.01], with improvements observed for CR that targeted either attention (Westerberg et al. 2007) or memory function (Aben et al. 2013, 2014). As heterogeneity was not significant [Q(5) = 5.40, p = 0.37, τ² = 0.00], no moderator analysis of the overall effect was performed. Four studies included follow-up data, but the effect size was small and non-significant (Fig. 5). Heterogeneity was not significant [Q(3) = 1.12, p = 0.77, τ² = 0.00], and therefore no moderator analysis was performed.

Discussion

While there has been significant growth in the evidence base for post-stroke CR, much of the review literature to date has failed to capture critical design and implementation factors that may explain treatment effects. These include the clinical and functional differences between stroke and other ABI groups, the relative risk of bias in RCT and non-RCT studies, transfer and durability of effects, and other moderators like dose and time since injury. In response, the current review focused on high quality RCT studies in stroke and systematically evaluated a range of factors that may potentially moderate the effect of post-stroke CR.

General Effects of Cognitive Remediation

Taken together, the results of this review confirm that CR interventions have a positive impact on post-stroke cognitive outcomes measured at the impairment level of function. Specifically, when compared with control groups, stroke patients showed general improvement in cognitive functioning of a small magnitude (g = 0.48; Fig. 4), above and beyond that experienced via natural recovery or treatment as usual. This general effect mirrors the quantitative (Rohling et al. 2009) and qualitative (Cicerone et al. 2000, 2005, 2011; Gillespie et al. 2015; van Heugten et al. 2012) findings of previous reviews that have examined the effect of CR across multiple cognitive domains.

As expected, the current study found the strength of the overall effect of CR was moderated significantly by study quality, with lower quality RCT studies reporting larger effect sizes. Studies at highest risk of bias often overestimate effect sizes (Balk et al. 2002), and it is possible that treatment effects have been inflated in past reviews that included (lower quality) non-RCT designs (Bogdanova et al. 2016; Cicerone et al. 2000, 2005, 2011; Elliott and Parente 2014; Miklos et al. 2015; Park and Ingles 2001; Rohling et al. 2009; van de Ven et al. 2016). As the field has progressed over time, decision making can now draw exclusively on Level 1 and 2 evidence (Howick et al. 2011), and lower quality research should be interpreted with caution.

Risk of Bias

To maximize the quality of evidence in this review, all of the included studies were Level 1b (RCTs) to Level 2b (small RCTs). Using a modified PEDro cut-off ≥7 (Table 4), the overall quality rating of included studies was also generally high (18 of 22 studies). However, even studies rated as high quality still require considered critical appraisal, as no study satisfied all quality assessment criteria, and each still exhibited some methodological weaknesses and risks of bias. As the overall rating of study quality can obscure this risk, domain by domain analysis may be more informative. Not surprisingly, blinding of participants and therapists administering CR was difficult to achieve when novel and distinct clinical interventions are used (Miller and Stewart 2011). Articles rated as low quality (Carter et al. 1983; Elman and Bernstein-Ellis 1999; Kim et al. 2014; Worrall and Yiu 2000) also routinely failed to conceal group allocation or blind assessors, increasing the risk of treatment bias. While the PEDro criteria require full reporting of only one key outcome, reporting bias remained a concern in several studies that failed to fully report point measures and measures of variance for all outcomes, including both significant and non-significant results (Weinberg et al. 1977, 1982; Wolf et al. 2016; Worrall and Yiu 2000). However, with an overall fail-safe N of 494, the risk of publication bias was low: 22 missing studies for every included study would be required to nullify the overall effect of CR.
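The ratio quoted above follows directly from the reported figures: 494 / 22 ≈ 22.5 null studies per included study. As a hedged sketch, the Python below shows Rosenthal's classic fail-safe N alongside that ratio. The function names and the default one-tailed critical value of 1.645 are assumptions for illustration; the review does not state which criterion was set in CMA, and the per-study z scores are not reproduced here.

```python
def failsafe_n(z_scores, z_crit=1.645):
    """Rosenthal's fail-safe N: null studies needed to make the combined z non-significant.

    z_scores -- one z value per included study
    z_crit   -- critical z for the chosen alpha (assumed one-tailed 0.05 by default)
    """
    k = len(z_scores)
    if k == 0:
        raise ValueError("at least one study is required")
    n_missing = (sum(z_scores) / z_crit) ** 2 - k
    return max(0.0, n_missing)


def missing_per_included(n_failsafe, k_included):
    """How many unpublished null studies each included study would need to 'hide'."""
    return n_failsafe / k_included


# Using the values reported in the text: fail-safe N = 494 across 22 studies.
ratio = missing_per_included(494, 22)
```

The ratio is a convenient way to judge the fail-safe N relative to the size of the evidence base, since the same N is far more reassuring for a small set of trials than for a large one.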

Analysis of Moderating Factors

Treatment Dose

For all cognitive outcomes combined, increasing the duration of CR was associated with larger effect sizes (p = 0.04). However, the dose-response relationship suggested a logarithmic function, with a plateauing of effects occurring around 20 h of active treatment. Language interventions included in the current review uniquely employed longer durations of CR (mean = 44 h). Previous reviews of aphasia treatments after stroke and other ABIs have also reported a need for more intensive and comprehensive therapy (Cherney et al. 2008; Cicerone et al. 2011; Robey 1998), suggesting higher intensity linguistic programs are required to promote neuroplasticity and reformation of cortical pathways within language networks (Bakheit et al. 2007). Such interventions are likely best designed and delivered in consultation with a speech pathology expert (Hinckley and Douglas 2013).
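
The logarithmic dose-response pattern described above can be sketched as a simple fit of effect size against the natural log of treatment hours. This is a minimal, unweighted illustration: the data points are hypothetical (generated from an assumed curve, not extracted from the included studies), and a full meta-regression would weight each study by its inverse variance.

```python
import math

def fit_log_dose_response(hours, effects):
    """Ordinary least squares fit of effect = a + b * ln(hours)."""
    xs = [math.log(h) for h in hours]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(effects) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, effects))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def marginal_gain_per_hour(slope, hours):
    """Derivative of a + b * ln(h): the extra effect per additional hour."""
    return slope / hours

# Hypothetical doses (hours) and effect sizes, for illustration only.
hours = [5, 10, 20, 40, 80]
effects = [0.1 + 0.15 * math.log(h) for h in hours]

a, b = fit_log_dose_response(hours, effects)
gain_at_10h = marginal_gain_per_hour(b, 10)
gain_at_40h = marginal_gain_per_hour(b, 40)
print(f"intercept={a:.2f}, slope={b:.2f}")
print(f"gain per extra hour at 10 h: {gain_at_10h:.4f}, at 40 h: {gain_at_40h:.4f}")
```

Under a logarithmic curve each doubling of dose adds the same increment, so the gain per additional hour shrinks as total hours rise, which is the plateau pattern noted in the text.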

Visuospatial and perceptual function was the only individual cognitive domain to be significantly affected by dosing parameters. Specifically, shorter durations were significantly associated (p = 0.02) with larger effect sizes, in contrast to the findings for overall cognitive outcomes. Larger effect sizes for visuospatial and perceptual outcomes were also associated with a less frequent training schedule (p = 0.02). In comparison, overall cognitive outcomes were not significantly affected by weekly frequency. Notwithstanding, dose-response analysis indicated the largest effect sizes for overall cognitive outcomes were observed for studies (Carter et al. 1983; Katz and Wertz 1997; Kim et al. 2014) delivering three sessions of CR per week [g = 0.85 (95% CI 0.41, 1.30), p < 0.01]. The most common frequency (k = 9) was five times weekly, but this was associated with an overall effect size of only 0.38.
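
As a consistency check on the reported estimate for three sessions per week (g = 0.85, 95% CI 0.41 to 1.30, p < 0.01), the sketch below recovers an approximate standard error from the interval and converts it to a two-sided p-value. It assumes a symmetric, normal-approximation interval; the review may have used a different method, so this is only a sanity check, not a reproduction of the original analysis.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)

def two_sided_p(estimate, se):
    """Two-sided p-value for estimate / se under a standard normal."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2))

# Values reported in the text.
g, ci_lower, ci_upper = 0.85, 0.41, 1.30

se = se_from_ci(ci_lower, ci_upper)
p_value = two_sided_p(g, se)
print(f"SE ~ {se:.3f}, z ~ {g / se:.2f}, p ~ {p_value:.5f}")
```

The recovered p-value falls well below 0.01, in line with the reported result.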

In sum, overall and domain specific cognitive outcomes were somewhat consistent with the notion of diminishing returns for CR, with limited additional gains experienced after exceeding dose and frequency thresholds (Dobkin 2004). These findings encourage reexamination of assumptions that interventions should occur as long and as frequently as the patient can tolerate (Brewer et al. 2013; Weicker et al. 2016) and suggest CR after stroke is not only effective, but also relatively efficient. This is a potentially important finding for stroke patients, who have pressing rehabilitation needs, but often exhibit reduced physical and mental stamina (Acciarresi et al. 2014).

Recovery Stage

The current review suggested that CR produced the largest gains in overall and domain-specific (i.e., memory) cognitive outcomes when delivered during the acute stage of recovery after stroke. Previous literature on this issue is both limited and inconsistent: some research suggests the greatest recovery occurs when interventions are delivered during the chronic stage (Rohling et al. 2009), while others have found no significant effect of chronicity (Bakheit et al. 2007; Laska et al. 2010). In particular, CR delivered within the first month after stroke may be preferable (Zucchella et al. 2014), to address cognitive deficits in the early stages before they evolve into chronic impairment (Hakkennes et al. 2011; Musicco et al. 2003). Currently, there are too few studies measuring activity and participation level outcomes to reach conclusions regarding the impact of acute CR on everyday functioning.

Active Versus Passive Controls

There were no significant differences in effect sizes for studies using either passive or active control groups. This result was surprising, given that active controls account for a greater number of non-specific factors (Mohr et al. 2009). The current results add to a small body of evidence suggesting that passive control groups are still valid in treatment efficacy studies, and may not overestimate the effect size for intervention groups (Aminov et al. 2018; Weicker et al. 2016). However, further research is required to compare the utility of passive and active control group design within CR research (Freedland et al. 2011).

Domain Specific Effects of Cognitive Remediation

Small to moderate effects of CR were shown within the individual outcome domains examined (Fig. 6). Specifically, the current review identified a small effect (g = 0.37) for processing speed outcomes, a cognitive domain not covered in prior reviews. Processing speed is presumed to make substantial contributions to general cognitive ability (Salthouse 1996), and is a key component underlying new learning and the successful execution of most other cognitive processes (Fry and Hale 2000). To date, CR approaches that targeted processing speed have had limited cognitive effects (Winkens et al. 2009). However, further research is encouraged, as such interventions may afford both domain specific and generalized cognitive benefits (Su et al. 2015; Takeuchi and Kawashima 2012).

The current study found a small effect for attention (g = 0.40), consistent with a number of recent reviews (Bogdanova et al. 2016; van de Ven et al. 2016; Weicker et al. 2016). Conversely, other reviews examining attention have found non-significant or negligible effects, albeit for combined ABI cohorts (Park and Ingles 2001), or positive effects confined to particular aspects of attention (Elliott and Parente 2014; Gillespie et al. 2015; Loetscher and Lincoln 2013; Virk et al. 2015). Reviews that have divided attention outcome measures into sub-domains have tended to show large variations in effect size (Loetscher and Lincoln 2013; Virk et al. 2015). A more comprehensive analysis of this issue was not possible in the current review, but should be a focus for future investigation.

Small effects were also observed for memory remediation (g = 0.47), whereas previous research has suggested little or no effect (Gillespie et al. 2015), or positive impact only on subjective measures of memory (das Nair et al. 2016). It is important to note that the findings of Gillespie et al. (2015) were based on only two publications, while the review of das Nair et al. (2016) included a mixed ABI population.

A small effect of CR was also shown on executive function outcomes (g = 0.47), with previous research providing mixed results. The earlier reviews of Chung et al. (2013) and Gillespie et al. (2015) had too few studies meeting inclusion criteria to draw firm conclusions. The qualitative reviews of Bogdanova et al. (2016) and Poulin et al. (2012) suggested a treatment effect, but their results were derived from mixed ABI populations and non-RCT studies.

The moderate effect for language remediation (g = 0.66) was in keeping with the bulk of previous findings (Cicerone et al. 2000, 2005, 2011; Miklos et al. 2015; Rohling et al. 2009). Only a review by Paiva et al. (2015), based on just three studies, contradicts the collective evidence in the language domain.

Finally, the moderate effect of CR on visuo-spatial functioning (g = 0.75) is consistent with the reviews of Cicerone and colleagues, albeit confined to right-sided stroke (Cicerone et al. 2000, 2005, 2011; Rohling et al. 2009). In other work by Bowen et al. (2013) and Gillespie et al. (2015) there was an immediate effect of CR similar to the current review, but no follow-up or far transfer effects.

The current study also provides evidence of transfer effects to untrained cognitive domains. Moderator analysis suggested these so-called far transfer effects were generally equivalent in size to those observed for trained cognitive domains (i.e., near transfer). The suggestion of generalized cognitive gains is perhaps unsurprising given the interdependence of cognitive processes (Lezak et al. 2012; Spearman 1904). Specifically, CR targeting general cognition also improved processing speed, memory, and executive function; CR targeting visuospatial and perceptual functions also improved attention and language function; and attention-focused training also improved executive function. CR protocols that targeted either language, memory, or executive function appeared more specific, with no evidence of generalization. Data on processing speed protocols did not support either near or far transfer.

Few studies investigated transfer effects from impairment to activity and participation level outcomes. The small effect of CR on untrained mood, disability, and quality of life outcomes (g = 0.25) was consistent with earlier reviews that examined non-cognitive outcomes (das Nair and Lincoln 2012; Hoffmann et al. 2010); however, the current result was based on a small number of studies (k = 6).

Limitations of Current Research and Directions for Future Research

Several theoretical and methodological limitations persist in the CR literature. Only one of the included studies provided a thorough theoretical rationale for their CR intervention (Katz and Wertz 1997). Most provided only a cursory rationale, and several studies provided no rationale whatsoever (Carter et al. 1983; Cho et al. 2015; Lin et al. 2014; Young et al. 1983). Moreover, most studies did not draw strong connections between the aims of the study and primary outcome measures, contributing to a risk of detection bias not assessed by the PEDro Scale. Moving forward, intervention design in CR should be informed by a clear theoretical model (Gillespie et al. 2015), identifying the cognitive control functions/networks that are a target for training, describing the presumed active ingredients of training, and modelling the expected gains across different levels of function (World Health Organization 2017).

Many of the included studies provided limited or incomplete descriptions of the CR program itself and control conditions (Bakheit et al. 2007; Kim et al. 2014; Prokopenko et al. 2013; Weinberg et al. 1977, 1982; Winkens et al. 2009; Wolf et al. 2016; Young et al. 1983). More detailed descriptions were provided in fewer studies (Aben et al. 2014, 2013; Elman and Bernstein-Ellis 1999; Katz and Wertz 1997). Use of checklists (e.g., van Heugten et al. 2012) is recommended to improve the precision of reporting and, with it, opportunities for between-study comparison and replication studies. While several studies made use of off-the-shelf computer programs (Barker-Collo et al. 2009; Cho et al. 2015; Lin et al. 2014; Wentink et al. 2016; Westerberg et al. 2007; Zucchella et al. 2014), training work-books (Carter et al. 1983; Doornhein and De Haan 1998), or modular programs (Laska et al. 2010; Worrall and Yiu 2000), enhancing replication, the role of the therapist and program modifications tailored to individual needs remain under-reported.

There was substantial variability in the quality and type of cognitive outcome measures. While the majority of studies utilized standardized assessment batteries or testing instruments (Bakheit et al. 2007; Elman and Bernstein-Ellis 1999; Katz and Wertz 1997; Weinberg et al. 1977, 1982; Westerberg et al. 2007; Worrall and Yiu 2000; Young et al. 1983), poorly validated tests were still used (Aben et al. 2014, 2013; Carter et al. 1983). Even when standardized measures were used, the exact choice varied in how specific cognitive skills and abilities were assessed (Table 3). Use of a minimum core set of outcome measures (e.g., the NIH Toolbox) has been suggested to improve between-study comparisons and meta-analysis (Gillespie et al. 2015). The included studies also omitted objective criteria to classify the degree of cognitive impairment (e.g., mild, moderate, or severe) of participants before and after interventions. As a result, it remains unclear if CR is most effective in patients with mild degrees of impairment (Cicerone et al. 2000, 2005), or if there are lower limits to the degree of post-stroke cognitive impairment that will respond to CR (Stringer and Small 2011). Furthermore, as none of the included studies completed clinical significance analysis of outcomes (Page 2014), the practical importance of the generally small reported effects is not well understood. Future research is encouraged to include analyses of treatment effects beyond statistical significance, including the evaluation of outcomes using functional measures examining the effect of CR on daily life.

Unfortunately, the assessment instruments of included studies were largely at the level of Body Structure and Body Function (World Health Organization 2017), with more limited relevance to functional gains (van Heugten et al. 2012). While researchers have previously been encouraged to keep the “real life” significance of CR in mind (Gillespie et al. 2015), the majority of studies included in the current review used neuropsychological outcomes only. Only three studies (14%) examined cognition at the activity/participation level (Barker-Collo et al. 2009; Wentink et al. 2016; Westerberg et al. 2007), and only five studies (23%) examined non-cognitive outcomes, including mood state (Aben et al. 2014, 2013; Barker-Collo et al. 2009), disability (Barker-Collo et al. 2009; Weinberg et al. 1977; Wolf et al. 2016), and quality of life (Aben et al. 2014, 2013; Barker-Collo et al. 2009). A priority in future research is the inclusion of activity-based cognitive outcome measures with demonstrated ecological validity, as well as non-cognitive outcome measures tapping into a broad range of outcomes at the impairment, activity, and participation levels (Stringer and Small 2011; Virk et al. 2015).

It is still unclear whether improvements following CR are sustained over time, with past reviews identifying little or no evidence of the durability of post-intervention effects at follow-up (Bowen et al. 2013; das Nair et al. 2016; Loetscher and Lincoln 2013; Virk et al. 2015). Nine of the reviewed studies (41%) included follow-up assessment (Aben et al. 2014, 2013; Bakheit et al. 2007; Barker-Collo et al. 2009; Kim et al. 2014; Laska et al. 2010; Wentink et al. 2016; Winkens et al. 2009; Wolf et al. 2016), varying from two weeks (Kim et al. 2014) to 12 months (Aben et al. 2014) post-intervention (Fig. 5). A further study described a follow-up protocol, but results could not be extracted due to insufficient reporting of data points (Elman and Bernstein-Ellis 1999). While the overall effect (g = 0.26) for follow-up outcomes was significant (p = 0.03), no individual outcome domain was statistically significant on its own. Notably, higher intervention doses were not reliably associated with larger follow-up effects, as previously suggested (Weicker et al. 2016). Future research is recommended to examine means by which treatment gains can be sustained and longer-term outcomes optimized, using techniques like booster sessions, activity monitoring, goal setting, or feedback systems (Peek et al. 2016).

Post-stroke cognitive deficits may be the consequence of a variety of injury-related factors including the location of focal damage, diffuse neurological dysfunction, and functional deactivation of distant areas in the brain (i.e., diaschisis; Ferro 2001). Patient-related variables such as age, premorbid level of functioning, and comorbidities also affect post-stroke cognitive outcomes (de Haan et al. 2006). However, studies included in the current review typically provided limited details on either injury- or patient-related variables, and when such data were included, they were not utilized as a moderator in analysis. In particular, several papers omitted stroke details (Doornhein and De Haan 1998; Kim et al. 2014; Prokopenko et al. 2013; Worrall and Yiu 2000) or provided only surface details such as stroke hemisphere or time since injury (Cho et al. 2015; Elman and Bernstein-Ellis 1999; Katz and Wertz 1997; Lin et al. 2014; Weinberg et al. 1977, 1982; Young et al. 1983). Others reported more in-depth details (Aben et al. 2014, 2013; Bakheit et al. 2007; Barker-Collo et al. 2009; Wentink et al. 2016), but only a few used an objective measure (i.e., National Institutes of Health Stroke Scale, Rankin Scale, Barthel Index) of stroke severity (Carter et al. 1983; Laska et al. 2010; Westerberg et al. 2007; Winkens et al. 2009; Wolf et al. 2016; Zucchella et al. 2014). Few studies provided details of premorbid intellectual or functional status, or comorbidities, and none controlled for these individual differences. Future studies are recommended to report injury- and patient-related data, to facilitate identification of sub-groups likely to be more or less responsive to CR. Furthermore, optimism surrounding CR “thrives on the lure of neuroplasticity” (Rabipour and Raz 2012), but only a single study eligible for inclusion in the current review examined the association between post-stroke cognitive outcomes and neuroimaging results (Lin et al. 2014). These preliminary results suggested both structural and functional brain changes were related to training improvements, but to better understand the mechanisms underlying CR, future studies are encouraged to include adequate visualization of the brain.

CR is generally accepted as safe and well tolerated (Institute of Medicine 2011), and none of the studies included in the current review reported data about any adverse events. However, it is unclear if the lack of data is due to the absence of events, or the absence of monitoring for such events. Concerns have been expressed that cognitive training may provoke frustration and low mood in stroke (Withiel et al. 2018) or dementia patients and their caregivers (Small et al. 1997). While this negative impact can likely be avoided by focusing on a patient’s successes rather than their deficits, formal recording and reporting of data regarding the occurrence of adverse events or harm would be beneficial for establishing the safety and efficacy profile of CR. Furthermore, future research is encouraged to provide evidence of the cost-effectiveness of CR. If effective, the short-term costs of delivering an intervention may lead to benefits in terms of reduced length of hospital stay after stroke, decreased long-term care needs, or increased opportunity to participate in valued roles.

Finally, the current study found both cognitive rehabilitation and cognitive training approaches were effective in improving overall post-stroke cognitive outcomes. This finding may in part be due to the lack of a standardized definition of what constitutes cognitive training compared to cognitive rehabilitation, and to interventions classified as one or the other in fact incorporating elements of both a restorative and a compensatory approach. We also acknowledge the possibility of misclassification of studies in our analysis, owing to the limited details on treatment design and delivery that were typically available for extraction, as discussed above. Past reviews examining the topic have provided mixed results, with one advocating for cognitive training approaches (Park and Ingles 2001), and another reporting efficacy for either approach (Poulin et al. 2012). While several important theoretical distinctions have been made between the two approaches (Bahar-Fuchs et al. 2013), both approaches may produce gains in many cognitive outcomes when delivered within a high-quality experimental design. However, visuospatial and perceptual outcomes specifically were more likely to benefit from a cognitive rehabilitation approach (p = 0.02). Research directly comparing the two approaches in randomized controlled trials is encouraged to further investigate such trends.

Strengths and Limitations of the Current Review

This is one of the first comprehensive systematic reviews of stroke-specific CR efficacy research. Strengths of the review include strict compliance with PRISMA reporting guidelines and analysis of a range of intervention design and implementation factors that may moderate treatment efficacy with respect to both overall and domain-specific outcomes. In regard to limitations, we acknowledge that other CR studies have been conducted using mixed ABI samples. However, upon closer inspection, the reporting of stroke-specific data within these studies was generally limited and did not meet our inclusion criteria. We also acknowledge that due to resource limitations we were unable to include non-English research studies (e.g., Schöttke 1997). To facilitate comparability, a standardized risk of bias assessment tool was utilized. However, the scale lacked sensitivity to incomplete reporting of study design and outcomes data, and may over-estimate overall study quality compared to past reviews utilizing alternate risk of bias instruments (e.g., Bowen et al. 2013; das Nair et al. 2016; Loetscher and Lincoln 2013). All of the cognitive domains examined in the current review are multi-dimensional constructs (e.g., attention can be divided into selective, focused, sustained, divided, etc.; memory can be divided into immediate, long-term, prospective, free recall, recognition, etc.). However, with no clear consensus on the valid factor structure of models of each domain, or agreement on how different cognitive instruments map onto those factors, a finer grained analysis of each cognitive domain was beyond the scope of the current review. Furthermore, multiple comparisons in the current review were handled by averaging effect sizes into one mean effect per study (e.g., for multiple measures of attention in a study) or selecting one effect size per study (e.g., for multiple control groups in a study). However, this approach can result in loss of information, and emerging techniques such as three-level Structural Equation Modelling may be preferable to handle statistically-dependent effect sizes (Cheung 2014; Van den Noortgate et al. 2015). Finally, while most CR interventions examined efficacy exclusively using impairment-level cognitive outcomes, the pooling of remaining outcomes created a heterogeneous category of impairment, activity, and participation measures. This “other” outcome should therefore be interpreted with caution, but does provide promising preliminary evidence of the generalizability of gains following CR interventions for stroke.
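
The per-study averaging step described above, in which several effect sizes from one study are collapsed into a single mean, can be sketched as follows. The study labels and effect sizes are hypothetical and serve only to show the mechanics; the sketch uses a simple unweighted mean and does not model the correlation between measures, which is the loss of information the three-level approach aims to address.

```python
from collections import defaultdict
from statistics import mean

def average_effects_per_study(records):
    """Collapse multiple (study, effect size) records into one mean per study."""
    grouped = defaultdict(list)
    for study, effect in records:
        grouped[study].append(effect)
    return {study: mean(effects) for study, effects in grouped.items()}

# Hypothetical records: two attention measures in one study, one in another.
records = [
    ("Study A", 0.30),
    ("Study A", 0.50),
    ("Study B", 0.40),
]

per_study = average_effects_per_study(records)
for study, effect in per_study.items():
    print(f"{study}: mean g = {effect:.2f}")
```

After this step each study contributes exactly one effect size to the pooled analysis, so the pooled estimates treat the studies as independent.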

Conclusions

Stroke is a major health issue, and frequently results in persistent and pervasive cognitive difficulties. Although these cognitive deficits are a major contributor to long-term disability and impairment in quality of life, treatments for cognitive deficits after stroke remain under-prescribed and underutilized (Mellon et al. 2015; Shigaki et al. 2014). In response, this review highlights an evidence base of well-designed studies to inform practice, and provides a foundation from which to advance understanding of the efficacy of CR in stroke. While there is currently insufficient evidence to recommend one form of CR over another, delivering interventions more than three times per week or for more than 20 h in total may not be productive. The efficacy of acute interventions should encourage the early deployment of CR in clinical settings, with the strength of effect greatest for the remediation of visuospatial and perceptual skills, and language ability. Speed of processing, attention, memory, and executive function exhibit significant but more modest improvements. However, this set of cognitive domains reflects only part of the difficulties experienced by stroke survivors, and the generalization of CR gains to daily living activities should be a focus of future research. Finally, while this review demonstrates that acquired cognitive deficits after stroke are responsive to CR, greater attention to theories of brain function and individual participant characteristics is still needed to identify and tailor factors capable of enhancing the efficacy and durability of these effects.