Introduction

The prevalence of chronic disease has risen dramatically over past decades, making it the leading cause of death worldwide [1]. In 2005, the World Health Organization estimated that chronic illnesses accounted for 49 % of the total global burden of disease; when the analysis was restricted to people aged 30 years and above, this proportion rose to an estimated 72 % [2, 3]. To respond to this burden and reduce both the societal and the economic impact of chronic illnesses, effective chronic disease management is essential. While disease management programs are systematic approaches to navigate patients through the healthcare system and improve quality of care [4], successful management of a chronic illness depends heavily on the individual patients, who have to take extensive responsibility themselves [5]; that is, as affected persons spend the majority of their time outside the healthcare system, they have to learn to manage their chronic disease on their own and in their own time. Active self-management, and interventions that support patients in acquiring the skills and techniques to live with their disease, are therefore key components in managing a chronic condition.

While the logic of providing self-management education for people with chronic disease is clear, it is surprising that the current evidence of the effectiveness of these interventions is rather mixed. A summary of meta-analytic and systematic reviews on ‘self-management’, ‘psycho-educational’, or ‘psychological’ programs suggests that there is reasonable evidence that these interventions are beneficial for a wide range of people, and courses are regarded as an important adjunct to standard medical care [6–10]. For example, trials in individuals with diabetes have reported small effects on outcomes such as fasting blood glucose levels and medium effects on both glycated hemoglobin (HbA1c) and psychological variables [7, 9, 11]. Reviews on programs for hypertension have shown positive effects on systolic blood pressure [7, 11]. In contrast, in studies with people with arthritis, fewer benefits have been reported. Despite this group receiving much attention in the literature, most studies report negligible to small effects on outcomes such as disability, function, impairment, and pain [6–8, 11, 12]. Similarly, a narrative review shows that only a few trials report positive effects for disability, pain, painful and swollen joints, and symptoms [5].

In Australia, several programs for chronic disease self-management education are offered in a variety of settings. Despite the large number of different interventions, ranging from face-to-face consultations to multimedia campaigns [13], group-based courses are the most common form of self-management program delivery [14]. Around 50 organizations, including most Australian Arthritis Foundations, are currently licensed to run self-management courses following the curricula of the Stanford Patient Education Research Center [15]. These courses are highly structured, with course leaders following a clearly defined protocol. The predominant interventions are the generic Chronic Disease Self-Management Program (CDSMP) [16] and the disease-specific Arthritis Self-Management Course [17]. The main difference between the two interventions is that the former is aimed at a broader audience with a wide range of different chronic conditions [18], as it is built on the assumption that people with any type of chronic disease face similar problems in managing their condition [19].

In view of the large number of organizations offering Stanford programs, and thereby investing considerable public and private funds, there is an urgent need to understand and document the true impact of this intervention. As none of the reviewed meta-analyses or systematic reviews focused on this specific protocol, but instead included a range of self-management programs that differed not only across but even within the same chronic disease [5], it was deemed necessary to prepare a systematic review of self-management trials that largely followed the Stanford curricula [16, 17]. Apart from summarizing the effectiveness of these interventions, the main objective of this research is to provide an in-depth investigation of the types of outcomes on which the studies are based. While the present research summarizes the systematic review, our companion paper explores the pattern of effect sizes across studies and relates this pattern to the different types of outcome measures used.

Methods

Search strategy

Both the systematic search and the literature review were carried out in January 2007. The search was performed across the databases MEDLINE, EMBASE, CINAHL, and PsycINFO, as recommended for systematic reviews [20].

The criteria and rationale for selecting studies for the systematic review were as follows:

  1. Inclusion of studies evaluating disease-specific or generic self-management interventions comparable with the Stanford curricula. If studies did not refer directly to Stanford [16, 17], studies were selected that evaluated interventions including at least two of the three keywords ‘problem-solving’, ‘action planning’, and ‘relaxation’. To be included in the review, the self-management program had to meet four characteristics:

     a. Interventions were delivered in a group setting;

     b. Interventions were based on a formal syllabus;

     c. Interventions ran for between four and ten sessions within a period of 3 months;

     d. Interventions did not include any additional components such as exercise lessons, reinforcement techniques, individual consultations, and/or home visits.

  2. Inclusion of studies published between 1982 and 2006, as the first Stanford program was published in 1982 [21].

  3. Restriction of the search to randomized controlled trials (RCTs), following the hierarchy of research designs in which RCTs are considered the ‘gold standard’ [22].

  4. Exclusion of studies that did not have sufficient power to detect a large-sized difference between intervention and control group means; that is, at α = 0.05, a minimum sample size of n = 26 is required to detect a large difference between the means of two independent samples [23, 24] (see the sketch following this list).

  5. Exclusion of studies that did not provide sufficient information on the outcome variables for calculating effect sizes (ES) and for which missing data could not be obtained from the authors.

  6. Exclusion of studies that did not assess any self-report outcomes; that is, studies assessing only outcomes such as cost-effectiveness or drug adherence were excluded.

  7. Exclusion of studies on interventions for children or adolescents.
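The sample size threshold in criterion 4 can be illustrated with a brief power calculation. The sketch below is illustrative only; it assumes a conventional two-tailed independent-samples t-test with 80 % power, which is not stated explicitly in the criterion, and uses the statsmodels library.

```python
# Illustrative power calculation behind inclusion criterion 4 (assumed
# settings: two-tailed independent-samples t-test, alpha = 0.05,
# power = 0.80; the power level is an assumption, not stated above).
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.8,          # 'large' effect in Cohen's terms
    alpha=0.05,
    power=0.80,
    alternative='two-sided',
)
print(round(n_per_group))     # approximately 26 participants per group
```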

The search terms used in the systematic review were derived from other reviews in this area [5, 7] and were chosen so that the first part of the search string referred to the type of intervention, the second part to the study design, and the third part to the target group of the intervention. Hence, the search terms were (‘patient education’ or ‘self-management’) and (‘randomised’ or ‘randomized’ or ‘RCT’) and (‘arthritis’ or ‘asthma’ or ‘chronic condition’ or ‘chronic disease’ or ‘chronic obstructive pulmonary disease’ or ‘congestive heart failure’ or ‘COPD’ or ‘diabetes’ or ‘fibromyalgia’ or ‘hypertension’ or ‘musculoskeletal’ or ‘osteoarthritis’ or ‘osteoporosis’ or ‘pain’ or ‘rheumatoid’ or ‘stress’). The following limits were set: in PsycINFO, the search was restricted to ‘Journal articles only’, the languages ‘English, German, Spanish’, and ‘Age groups 18 years and older’; in CINAHL, the search was restricted to the above languages and ‘all adult’. Additional studies were retrieved from the Cochrane Database of Systematic Reviews [25] and from the reference lists of other reviews and meta-analyses.
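For readers who wish to reproduce or adapt the search, the three-part structure of the string can be composed programmatically. The snippet below is only a sketch; the exact field tags and syntax differ between MEDLINE, EMBASE, CINAHL, and PsycINFO.

```python
# Sketch of how the three-part boolean search string combines;
# database-specific syntax (field tags, truncation) is omitted.
intervention = ["patient education", "self-management"]
design = ["randomised", "randomized", "RCT"]
target_group = [
    "arthritis", "asthma", "chronic condition", "chronic disease",
    "chronic obstructive pulmonary disease", "congestive heart failure",
    "COPD", "diabetes", "fibromyalgia", "hypertension", "musculoskeletal",
    "osteoarthritis", "osteoporosis", "pain", "rheumatoid", "stress",
]

def or_block(terms):
    # Join the terms of one concept into a parenthesized OR block.
    return "(" + " or ".join(f"'{t}'" for t in terms) + ")"

search_string = " and ".join(or_block(part) for part in (intervention, design, target_group))
print(search_string)
```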

Data analysis and presentation

For comparison across trials, ES were calculated for each outcome. Between-group treatment effects were calculated using Cohen’s d [26], a commonly reported effect size index [27]. The 95 % confidence interval was estimated by multiplying the standard error of the effect size, that is, the square root of its variance, by 1.96 [28]. Additionally, within-group ES were calculated for each treatment condition separately. These were used to identify the source of between-group differences; for example, whether a large effect was caused by an improvement in the intervention group (IG) and/or a decline in the control group (CG), or whether a negligible between-group ES was caused by no change in either group or by a parallel change in both IG and CG subjects. Reported effects are presented such that positive ES reflect improvement and negative ES reflect decline. Results were interpreted as small (ES ~ 0.2), medium (ES ~ 0.5), or large (ES ~ 0.8) effects [26].
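As an illustration of the between-group calculation, the sketch below computes Cohen’s d from post-intervention group summary statistics and derives a 95 % confidence interval from the usual large-sample variance of d. This is a minimal sketch under those assumptions, not the exact computation used in the review; the function name and example values are hypothetical.

```python
# Minimal sketch: Cohen's d from post-intervention group means and a
# pooled SD, with a 95 % CI based on the large-sample variance of d.
import math

def cohens_d(mean_ig, sd_ig, n_ig, mean_cg, sd_cg, n_cg):
    """Between-group Cohen's d (positive = IG better than CG) with 95 % CI."""
    pooled_sd = math.sqrt(((n_ig - 1) * sd_ig**2 + (n_cg - 1) * sd_cg**2)
                          / (n_ig + n_cg - 2))
    d = (mean_ig - mean_cg) / pooled_sd
    # Approximate variance of d; standard error is its square root.
    var_d = (n_ig + n_cg) / (n_ig * n_cg) + d**2 / (2 * (n_ig + n_cg))
    se = math.sqrt(var_d)
    return d, (d - 1.96 * se, d + 1.96 * se)

# Hypothetical example with scores aligned so that higher means better.
d, ci = cohens_d(mean_ig=2.1, sd_ig=1.0, n_ig=60, mean_cg=1.8, sd_cg=1.1, n_cg=58)
print(f"d = {d:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```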

While several studies included repeated measures with varying time points, only the first post-intervention assessment was used for the calculation of ES. Results are presented so that both the minimum and the maximum ES per outcome across studies are shown; in addition, the median ES of the included studies was calculated. No pooled summary scores such as those used in meta-analyses are reported, as the aim of the review was to show and discuss the full range of effects across studies.
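A minimal sketch of this per-outcome summary is shown below, assuming a simple long-format table of effect sizes; the column names and values are illustrative only.

```python
# Illustrative summary of effect sizes per outcome as minimum, median,
# and maximum (no pooled meta-analytic estimate); data are made up.
import pandas as pd

effects = pd.DataFrame({
    "outcome": ["pain", "pain", "pain", "disability", "disability"],
    "between_group_es": [-0.05, 0.21, 0.43, -0.18, 0.30],
})

summary = effects.groupby("outcome")["between_group_es"].agg(["min", "median", "max"])
print(summary)
```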

Results

The results of the systematic search are presented in Fig. 1. A total of 2,175 papers were identified in this search. After pre-screening the titles, 1,676 publications were excluded as they failed to meet all inclusion criteria. The majority of studies were rejected because they evaluated other types of interventions (n = 861), were a description of a program (n = 319), or included children/adolescents (n = 177). The abstracts of the remaining 499 publications were screened, and again the majority did not meet all inclusion criteria. This left 78 papers which were examined in full. Of these, 55 studies were excluded, with the majority again describing different types of interventions such as classic patient education. Eight of the remaining 23 studies did not report sufficient data for the calculation of effect sizes. After contacting the authors, three researchers were able to provide most missing data [29–31], while no further information could be obtained from five authors [32–36]. As a result, the review is based on 18 studies.

Fig. 1 Flowchart of the search strategy

The majority of the 18 trials investigated the effectiveness of arthritis-specific interventions. Seven evaluated the Stanford Arthritis Self-Management Course for osteoarthritis [37–39] or other musculoskeletal disorders [40–43]. Three trials evaluated alternative arthritis-specific interventions, one being the ‘Bone Up On Arthritis’ program [44], while two explored interventions for rheumatoid arthritis [31, 45]. The remaining studies evaluated disease-specific interventions for people with back pain [46], chronic pain [47], and fibromyalgia [48], and four focused on the generic CDSMP [19, 30, 49, 50]. One of the above trials compared lay-led with professional-led interventions, each with its own control group [41]. As we did not differentiate between ‘modes of instruction’ in the present research, these two comparisons were regarded as separate studies. Consequently, the total number of trials included in this review increased to 19.

Across trials, more than 70 different variables were assessed. As most of these were not assessed frequently enough to perform inter-study comparisons, this review is restricted to the most often reported outcomes. Hence, a total of 11 variables are shown in Table 1, with pain, disability, depression, and self-efficacy being the most commonly assessed outcomes. Further outcomes presented herein are visits to physician, general health, fatigue, communication with physician, knowledge, anxiety, and physical functioning. For a more detailed overview of the results, see Appendix.

Table 1 Effect sizes (ES) of most frequently assessed outcome measures in studies of chronic disease self-management programs largely following the Stanford curricula

The impact on pain was assessed in all but one study [45]. Reported ES varied greatly, ranging from negative to negligible between-group effects in some trials [19, 29, 30, 40, 41, 44, 46, 47] to small to medium positive between-group effects in others [31, 37–39, 42, 43, 48–50]. In three of these studies [38, 48, 49], however, between-group effects were largely influenced by increased pain in control group subjects.

Another outcome that was assessed frequently was disability. Reported effects again varied greatly across studies, with Cohen’s d ranging from ES = −0.18 [39] to ES = 1.42 [45]. Half of the included studies showed negligible to small effects [19, 41–44, 47], while four reported small to medium between-group effects [31, 46, 48, 49]. Within-group effects for intervention group subjects showed a similar range of benefits, with almost all effects ranging between ES = 0.0 and ES = 0.26 [19, 31, 41–44, 48, 49].

Depression was reported in 10 trials. Both between- and within-group effects varied greatly. While one trial showed medium effects [45] and two trials showed small between- and within-group effects [40, 48], most other studies reported negligible to small between-group ES [29–31, 42–44, 49]. In three of these studies, however, between-group effects were influenced by simultaneous improvements in both intervention and control group subjects [31, 42, 49].

Self-efficacy was also assessed in 10 trials. Reported results again varied greatly. Effects ranged from very small [37, 46] to above medium size between- and within-group effects [48]. Small to medium between- and within-group effects were reported in five studies [29, 30, 40, 49, 50]. Medium between-group ES were observed in another two trials [38, 42] of which one also showed medium within-group effects [42].

The number of visits to the physician was assessed in eight studies. Calculated ES ranged from small decreases to small increases in the number of visits [19, 30, 40–42, 49, 50].

The impact of self-management courses on general health was assessed in seven trials. Similar to most previous outcomes, effects varied greatly between studies, ranging from small negative between-group effects [46] to medium positive between- and within-group effects [50]. The median effect was small, with a between-group effect size of 0.16 and a within-group effect size of 0.17 for intervention group subjects.

Six studies assessed fatigue. The majority of trials reported negligible to small between- and within-group ES [19, 29, 30, 40, 49]. The only exception was a Hispanic chronic disease self-management intervention with between-group effects of ES = 0.26 and within-group effects of ES = 0.40 for intervention group subjects [50].

Communication with the physician was assessed in five trials. Between- and within-group effects were negligible to small [19, 30, 40, 49] in most studies. Larger effects were observed in a Spanish-speaking population with between-group effects of ES = 0.34 and within-group effects of ES = 0.49 [50].

Knowledge was also assessed in five of the included studies, all of which evaluated arthritis-specific interventions. In contrast to all other outcomes, both between- and within-group effects were generally medium or large [38, 41, 43, 44]. The only exception was the trial comparing lay-led courses with interventions run by health professionals. While the former showed medium-sized improvements in both intervention and control groups, the latter had a much larger impact on intervention group subjects [41].

The impact of self-management interventions on levels of anxiety was assessed in four trials. Overall, negligible to small effects were found, with within-group ES being consistently larger than between-group ES [29, 30, 40]. The largest within-group effects were reported by Taal et al. [31], with an effect size of 0.31 for intervention group subjects compared with 0.13 for control group subjects.

Finally, physical functioning was also assessed in four trials. In contrast to most previously presented outcomes, between- and within-group effects were largely consistent, with all studies showing negligible [37, 40] or small effects [29, 48].

Discussion

This review summarizes the effectiveness of group-based chronic disease self-management courses. In contrast to other reviews in this area, the inclusion criteria were rather strict in that only studies that either followed or were reasonably similar to the Stanford curricula [16, 17] were included. As the majority of included trials described programs targeted at people with musculoskeletal conditions, this review can best be compared with reviews on arthritis. Although our review is restricted to Stanford-type programs and only median scores are presented herein, the reported effects are largely similar to the summary scores of reviews on arthritis. For example, the outcomes depression [10], disability [6, 10, 12], and pain [6, 7, 10, 12] showed negligible to small ES across all publications. The only deviation was found for self-efficacy, with the present review showing somewhat larger effects than reported by others [6]. In sum, people with arthritis seem to benefit only marginally from participating in self-management programs across a range of outcomes [5–8, 11, 12].

The above findings are discouraging and seem to be in stark contrast to our previous research, in which we showed that on average one-third of participants received substantial benefits from attending such courses. That is, expressed in terms of number needed to treat (NNT), three persons had to attend a chronic disease self-management course for one person to receive substantial benefits [14], which is a rather good result. However, several aspects need to be considered when interpreting the results presented in this review. First, it needs to be discussed whether changes are to be expected in some of the frequently assessed outcomes. For example, psycho-educational programs such as self-management courses are neither aimed at, nor can they be expected to achieve, a reduction in levels of pain. Rather, these programs aim to provide individuals with the skills to cope with symptoms and manage episodes. The aim of self-management education is thus a reduction in the impact that perceived pain has on the patient rather than a reduction in the actual level of pain [5]. Hence, if such outcomes are included, it is important to define the intent of the outcome measure, that is, level of pain versus pain coping and self-management skills.

Second, and closely related to the previous aspect, it is questionable whether studies assess the most pertinent outcomes that self-management programs are supposed to have an impact on. As was observed in a narrative review [5], studies frequently assess outcomes that are not particularly targeted by these interventions. While the Stanford curricula include topics on communication with the physician, emotions, and self-efficacy, it is questionable whether self-management programs are able to have an impact on variables such as disability, fatigue, physical functioning, and, again, pain, as these outcomes are not specifically targeted by the programs’ curricula. An overview of the intended impacts of chronic disease self-management programs, incorporating both the patients’ and the health professionals’ perspectives, has been provided through the Health Education Impact Questionnaire (heiQ). This questionnaire was developed during a comprehensive stakeholder consultation phase and can be used to reliably evaluate chronic disease self-management programs [14, 51, 52]. Instead of focusing on physiological or clinical outcomes, the heiQ covers areas such as behaviors, skills, attitudes, self-monitoring, health services navigation, and emotional distress [52].

Third, a further challenge in interpreting outcomes of self-management programs concerns the time frame in which outcomes can be expected to occur. The trials of the present review assessed post-intervention outcomes ranging from direct post-course assessment to several months after the intervention. It is beyond the scope of the present research to consider this dimension further; however, studies that are concerned with the effectiveness of chronic disease self-management education should take the dimension ‘time’ into account as the different types of outcomes can be expected to occur at different points in time. A program logic model of impacts of self-management education can serve as a guide to categorize outcomes into short-, medium-, and long-term effects [53].

Finally, when comparing program outcomes for people with arthritis with those for other chronic illnesses, it becomes clear that studies rely on a different set of outcomes across the different types of interventions. For example, outcomes for participants with diabetes or hypertension can be both self-report and clinical. In contrast, there are no objective biological measures of disease severity in musculoskeletal diseases. As a result, when looking at program effectiveness, evaluators have to rely exclusively on participant self-report [5]. In view of the large differences in the accuracy of measuring self-report variables compared with clinically assessed outcomes, it is plausible that evaluations do not accurately reflect program effects. This hypothesis is supported by the Quality of Life Appraisal model of Schwartz and Rapkin [54], in which ‘appraisal’ denotes the cognitive process carried out when responding to a question. The model describes an increase in measurement error and bias with increasing complexity of the response process [54]. In our companion paper [reference to potential companion paper], we describe a practical application of the model in which the program outcomes presented herein are allocated to the three categories of performance-, perception-, and evaluation-based measures introduced by Schwartz and Rapkin [54].

The present research has some limitations. Even though this review was defined narrowly, with only Stanford-type interventions included, comparability of the included trials remains difficult: over 70 variables were assessed across studies, and data were often collected with different measurement instruments. The former aspect shows that, although only comparable interventions with presumably similar if not identical program objectives were included, different research groups surprisingly expect impacts across a very large range of outcomes. The range of instruments is a further concern. While this aspect was not specifically considered in this research, a careful review of questionnaires suitable for the evaluation of self-management programs should be carried out, as the instruments differ in their relative sensitivity to change [5], and it seems inappropriate to merge scores from different tests. To take into account the objectives of different self-management education interventions while making results comparable across trials, a standardized suite of outcome variables and corresponding instruments needs to be defined for the evaluation of self-management programs. For example, the heiQ [52] may well be suitable given its robust psychometrics and wide application, including its function as the national quality and monitoring tool for the UK Expert Patients Program [55].

In summary, this research confirms other reviews in this area, suggesting that the Stanford CDSMP and related programs have negligible to small effects on program participants. However, it also suggests that, when looking at the types of outcomes on which the trials are based, alternative explanations for these results are plausible. As evaluations rely heavily on participant self-report, that is, on the types of outcomes that show negligible to small effects, current approaches to program evaluation may not be sufficient to assess the true impact of chronic disease self-management education. An in-depth exploration of the types of outcomes on which the reviewed trials are based is described in a separate paper.