Introduction

Self-management programs for chronic diseases seek to generate substantial positive change in individuals in areas such as self-management abilities, health-directed behavior or illness-related emotional distress [1–4]. “Stable changes” means that observed variations over time in these attributes do not primarily depend on specific (measurement) situations, but on “real” changes in the persons. However, measurements of psychological attributes are always influenced by aspects of both the measurement situation and aspects of the measured person, albeit to varying degrees [5]. Instruments to measure situation-independent constructs should therefore primarily capture person factors and should be little affected by situational factors. Otherwise, it would be difficult to differentiate between mere situation-dependent and short-term variability and situation-independent long-term changes in the constructs of interest [6, 7].

A framework to empirically test assumptions about the influences of either measurement situations or measured persons is offered by latent state–trait theory (LST theory) [5, 8, 9]. LST theory postulates that the variance of an observed variable (OV) can be broken down into a trait component, a state component and measurement error. The trait component reflects the influence of the measured person, whereas the state component reflects the influence of measurement situations (and person × situation interactions) on the OV. If method effects of different measures for the same construct are taken into account, method factors also explain parts of the variance of the OV. From these theoretical assumptions, psychometric parameters, referred to as LST parameters, can be defined that quantify the influences of these different sources [5, 8, 10]. For example, in the SF-36, it was shown that 69–75 % of the variance in the mental component summary, which measures situation-independent mental health, is explained by stable person factors [11], while up to 77 % of the variance in measures of mood states can be attributed to measurement situations [12].

A variety of different LST models can be tested empirically via structural equation modeling techniques [9]. Basically, models that assume a constant trait over time can be distinguished from models that allow trait-changes over time [13, 14]. In this study, a latent trait-change model is used to model changes in important outcomes of self-management programs in the context of inpatient rehabilitation.

A generic instrument that comprehensively captures goals of self-management programs is the Health Education Impact Questionnaire (heiQ™) [15, 16]. The heiQ™ measures proximal outcomes of self-management programs across eight different constructs, ranging from self-management skills to navigation in the health care system. The original English version as well as translations into German and French show good psychometric properties in terms of factorial validity, concurrent validity and reliability [15, 17, 18]. However, to be a valid instrument for measuring stable changes that result from attending a self-management program, the heiQ™ must also be shown to be primarily influenced by the measured persons, not by the measurement situations.

Therefore, this study seeks to answer the following main question: Are the items and scales of the heiQ™ primarily influenced by the measured persons (traits) or by the measurement situations/interactions of person × situations (states)?

Furthermore, the study explored how the traits measured by the heiQ™ scales change after inpatient rehabilitation that included self-management programs and whether these changes are associated with disease group, sex, age or initial trait values at the beginning of the inpatient rehabilitation.

Methods

Sample

All analyses are based on a subsample (N = 580) of subjects from a study that translated and psychometrically tested the heiQ™ in Germany [18]. Patients with rheumatic disorders (e.g., psoriatic arthritis, ankylosing spondylitis; n = 186), asthma (n = 170), orthopedic conditions (e.g., chronic back pain; n = 121) and inflammatory bowel disease (n = 103) from five German rehabilitation clinics were included. All participants filled out the heiQ™ at the beginning (T1), at the end (T2) and 3 months after the end (T3) of a disease-specific inpatient rehabilitation. All interventions lasted 3–4 weeks and included a self-management program aimed at enhancing self-management skills and health-directed behaviors as well as reducing emotional distress. Further details about the interventions are presented elsewhere [19–23] and are available upon request.

Measures

heiQ™

The heiQ™ contains 40 items (4-point response scale) across eight independent scales: Positive and active engagement in life, Health-directed activities, Skill and technique acquisition, Constructive attitudes and approaches, Self-monitoring and insight, Health service navigation, Social integration and support, and Emotional distress. The scale scores are formed by computing the mean of respective items. Generally, higher values in the heiQ™ scales indicate better status, except for Emotional distress, in which higher values indicate higher distress [15, 16, 18].
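The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the item-to-scale mapping, the item names and the example responses are hypothetical placeholders and do not reproduce the actual heiQ™ item assignment, and the 1–4 coding of the response scale is an assumption.

```python
from statistics import mean

# Hypothetical item-to-scale mapping (NOT the real heiQ item assignment).
SCALE_ITEMS = {
    "scale_a": ["item_1", "item_2", "item_3"],
    "scale_b": ["item_4", "item_5"],
}


def score_scales(responses, scale_items=SCALE_ITEMS):
    """Return {scale: mean of that scale's item responses} for one person."""
    return {
        scale: mean(responses[item] for item in items)
        for scale, items in scale_items.items()
    }


# Made-up responses of one person, assumed to be coded 1-4.
person = {"item_1": 1, "item_2": 2, "item_3": 3, "item_4": 2, "item_5": 4}
scores = score_scales(person)
```

The sketch assumes complete responses; how incomplete questionnaires are scored is not covered here.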

Statistical analysis

The statistical analyses were done in three steps. First, for each heiQ™ scale, we tested whether the factor structure and model parameters (e.g., factor loadings) remained unchanged over time, i.e., whether the scales showed measurement invariance over time [24, 25]. At minimum, scalar invariance (i.e., same factor structure and constant factor loadings/intercepts over time) is a prerequisite to interpret LST models accurately (Geiser et al., under review). Second, the change process was modeled for each heiQ™ scale using LST models as described below. Third, psychometric LST parameters were computed.

Measurement invariance over time

Configural, metric and scalar invariance over time [24, 26, 27] were tested through several confirmatory factor analyses. First, a measurement model over time (also called a multi-state model in LST theory [9]) was computed for each heiQ™ scale (Fig. 1). All observed items OVit of a measurement occasion t load on a latent state factor (S t ). S 1–S 3 were allowed to correlate with each other. They represent the common variance of the items of a scale on occasion t, and the correlations between the S t account for stable individual differences over time. However, state- and trait-aspects are not clearly separated in a multi-state model [9]. Item-specific associations over time were modeled according to the correlated trait-correlated method minus one (CT − C(M − 1)) approach [28, 29]. In this approach, latent method factors were modeled for each indicator, except for a reference indicator (see below). The method factors represent stable aspects of each indicator that are not shared by the reference indicator.

Fig. 1

Multi-state model of heiQ™ scale Skill and technique acquisition with three measurement occasions (OV i,t : observed item i on time t; S t : latent state variable of measurement occasion t; M i : method factor of item i; δ it : factor loading of item i on occasion t on S t ; γ it : factor loading of item i on occasion t on method factor M i )

To identify the model and test for configural invariance, variances of S 1–S 3 were set to 1 and corresponding mean values to 0 [30]; all other parameters were estimated freely. Metric invariance was tested by holding the factor loadings on the latent state variables (δ it ) and the factor loadings on the method factors (γ it ) constant over time, i.e., δ it  = δ i and γ it  = γ i. The variance of S 1 was still fixed to 1, but the variances of S 2 and S 3 were now estimated freely. Scalar invariance was tested by additionally holding all intercepts constant over time. The mean value of S 1 was still fixed to zero, while the mean values of S 2 and S 3 were estimated freely.

Non-invariant parameters were identified via expected parameter changes (EPC) and modification indices using the software JruleMplus [31]. JruleMplus tests whether an EPC exceeds a reference value. However, up to now, there are no guidelines for choosing appropriate reference values in invariance testing. In some studies, differences in intercepts of about 0.3 led to biased estimates of mean values, while differences in factor loadings of about 0.2 did not [32; Schuler et al., in press]. Therefore, reference values were fixed to 0.25 in this study, representing the minimal difference in factor loadings or intercepts over time regarded as meaningful [33, 34]. Whenever a non-invariant parameter was identified, this parameter was set free and partial invariance models [35] were tested.
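The threshold comparison described above can be illustrated with a short Python sketch. It only flags parameters whose absolute EPC exceeds the reference value of 0.25 and does not reproduce what JruleMplus actually computes (for example, its use of modification indices). The parameter names and EPC values are made up for illustration.

```python
REFERENCE_VALUE = 0.25  # reference value used in this study


def flag_noninvariant(epcs, reference=REFERENCE_VALUE):
    """Return the names of parameters whose absolute EPC exceeds the reference."""
    return sorted(name for name, epc in epcs.items() if abs(epc) > reference)


# Made-up EPCs for equality-constrained parameters (hypothetical names).
epcs = {
    "loading_item2_T2": 0.08,
    "intercept_item3_T2": -0.31,
    "intercept_item4_T3": 0.25,  # equal to the reference, so not flagged
}
flagged = flag_noninvariant(epcs)
```

In this sketch, a flagged parameter would be the candidate to be set free in a partial invariance model.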

Estimation of LST model

Afterwards, each heiQ™ scale was modeled by an LST model (Fig. 2). The latent state variables S t of Fig. 1 were separated into latent trait variables (Trait1, Trait2) and latent state residual variables (SR1–SR3). Trait1 and Trait2 represent the influence of the persons on the OVit that is independent of the measurement situation, while SR1–SR3 represent the “pure” influence of the three measurement situations. Therefore, SR1–SR3 no longer correlate with each other.

Fig. 2

Latent trait-change model of heiQ™ scale Skill and technique acquisition with three measurement occasions and predictors (OV i,t : observed item i on time t; SR t : latent state residual variable of measurement occasion t; M i : method factor of item i; Trait k : latent trait variable k; Trait2-1: latent difference variable; δ it : factor loading of item i on occasion t on S t ; γ it : factor loading of item i on occasion t on method factor M i ; ortho orthopedic; IBD inflammatory bowel disease)

To identify SR1–SR3, the factor loading of a reference indicator was fixed to 1 and its intercept was fixed to 0 on each measurement occasion. Indicators that best represented the content of the scale and that showed high factor loadings in [18] were chosen as reference indicators. All other factor loadings and intercepts were freely estimated, but were required to show scalar invariance over time.

SR1 loaded only on Trait1, while SR2 and SR3 loaded only on Trait2. The rationale for this model is as follows: It was assumed that participating in a self-management program and, for example, learning new techniques to cope with the chronic illness may change the traits of the heiQ™ constructs between T1 and T2. This means that possible mean changes in heiQ™ items/scales between T1 and T2/T3 may indicate “real” trait-changes. In contrast, changes in heiQ™ items/scales between T2 and T3 were regarded as mere fluctuations around a common trait. For example, a decline in mean values between T2 and T3 may reflect the adaptation of newly learned techniques to daily life circumstances rather than a “real” change in the measured trait. Note that these assumptions are difficult to test empirically; they rather represent underlying interpretations made by the authors.

Of note, Trait1 and Trait2 were not fully identified with only three latent state residual variables. As an additional restriction, we therefore assumed constant variances of the latent state residual variables over time [14]. Furthermore, the intercepts of SR1 and SR3 were fixed to zero, but the intercept of SR2 was estimated freely.

Changes in latent trait variables

To estimate the difference between the two trait variables directly, a latent difference variable Trait2-1 was introduced [14, 36]. A latent difference variable represents the true difference of two latent variables, i.e., a difference without measurement error. Technically, it is defined and identified by fixing the paths from Trait1 and Trait2-1 to Trait2 at 1 and the variance of Trait2 at 0 [36]. A standardized estimate was computed by dividing the mean of Trait2-1 by the standard deviation of Trait1 [37, 38]. Furthermore, to compare these values with more common parameters, the difference between T3 and T1 in a manifest heiQ™ scale was computed and divided by the standard deviation of the scale value on T1. Standardized differences of 0.2/0.5/0.8 were regarded as small/medium/large in latent and observed scores [39].
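The two standardized differences described above can be sketched as follows. The input values are made up for illustration and are not estimates from this study. Two details are assumptions of the sketch rather than statements from the text: the sample standard deviation (n − 1 denominator) is used for the observed scores, and the 0.2/0.5/0.8 conventions are treated as lower bounds of each category.

```python
from math import sqrt
from statistics import mean, stdev


def latent_effect_size(mean_difference, var_trait1):
    """Mean of Trait2-1 divided by the standard deviation of Trait1."""
    return mean_difference / sqrt(var_trait1)


def observed_effect_size(scores_t1, scores_t3):
    """Mean T3 - T1 difference divided by the SD of the T1 scale scores."""
    return (mean(scores_t3) - mean(scores_t1)) / stdev(scores_t1)


def size_label(effect_size):
    """Classify an effect size by the 0.2/0.5/0.8 conventions."""
    size = abs(effect_size)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"  # label for values under 0.2 is an assumption


# Made-up inputs: latent mean difference 0.2, Var(Trait1) = 0.16.
latent_es = latent_effect_size(mean_difference=0.2, var_trait1=0.16)
# Made-up observed scale scores of three persons at T1 and T3.
observed_es = observed_effect_size([2.0, 3.0, 4.0], [2.5, 3.5, 4.5])
```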

Method effects were again modeled according to the CT − C(M − 1) approach [28, 29]. The latent method factors have expected values of zero and do not correlate with latent trait factors or the latent state residuals.

Predictors of change in latent trait variables

Age, sex and diagnostic group were included as predictors of Trait1 and Trait2-1 in the model using weighted effect-coding [40].
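As an illustration of weighted effect coding for the diagnostic groups, the following sketch uses the group sizes reported in the Sample section. In this coding scheme, members of a non-reference group receive 1 on their own code variable and 0 elsewhere, while members of the reference group receive the negative ratio of the respective group size to the reference group size, so that each code variable sums to zero over the sample. The choice of the rheumatic group as reference category is an assumption made only for this example; the text does not state which group served as reference.

```python
# Group sizes as reported in the Sample section (N = 580).
GROUP_SIZES = {"rheumatic": 186, "asthma": 170, "orthopedic": 121, "IBD": 103}


def weighted_effect_codes(group, sizes=GROUP_SIZES, reference="rheumatic"):
    """Return the weighted effect codes of one person as {code variable: value}.

    The reference category is a hypothetical choice for illustration.
    """
    codes = {}
    for name, n in sizes.items():
        if name == reference:
            continue  # the reference group has no code variable of its own
        if group == name:
            codes[name] = 1.0
        elif group == reference:
            codes[name] = -n / sizes[reference]
        else:
            codes[name] = 0.0
    return codes


asthma_codes = weighted_effect_codes("asthma")
reference_codes = weighted_effect_codes("rheumatic")
```

With this coding, regression coefficients of the code variables express deviations of a group from the sample mean rather than from a single comparison group.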

LST parameters for items

From the estimated parameters of the LST models, the following five psychometric parameters were computed (abbreviations used in the equations are explained in the following paragraph) [28]:

$$\mathrm{CCO}(\mathrm{OV}_{it}) = \frac{\lambda_{it}^{2}\,\mathrm{Var}(\mathrm{Trait}_{k})}{\mathrm{Var}(\mathrm{OV}_{it})} \tag{1}$$
$$\mathrm{UCO}(\mathrm{OV}_{it}) = I(i \ne r)\,\frac{\gamma_{it}^{2}\,\mathrm{Var}(\mathrm{MF}_{i})}{\mathrm{Var}(\mathrm{OV}_{it})} \tag{2}$$
$$\mathrm{TCO}(\mathrm{OV}_{it}) = \mathrm{CCO}(\mathrm{OV}_{it}) + \mathrm{UCO}(\mathrm{OV}_{it}) \tag{3}$$
$$\mathrm{OSpe}(\mathrm{OV}_{it}) = \frac{\delta_{it}^{2}\,\mathrm{Var}(\mathrm{SR}_{t})}{\mathrm{Var}(\mathrm{OV}_{it})} \tag{4}$$
$$\mathrm{Rel}(\mathrm{OV}_{it}) = \mathrm{TCO}(\mathrm{OV}_{it}) + \mathrm{OSpe}(\mathrm{OV}_{it}) \tag{5}$$

The common consistency coefficient (CCO) represents the proportion of the variance of an OV that is stable over time and explained by the trait factor Trait k (k = 1, 2). The unique consistency coefficient (UCO) represents the part of the variance that is also stable over time, but not shared with the reference indicator. These two coefficients sum up to the total consistency coefficient (TCO), reflecting the total stable part of the variance of an OV. The part of the variance that is explained by the measurement situation and/or person × situation interaction is represented by the occasion specificity coefficient (OSpe). The sum of TCO and OSpe results in the reliability coefficient (Rel) that indicates the degree to which the observed variance is explained by systematic influences, i.e., not determined by measurement error. Reliability coefficients above 0.7/0.8/0.9 were regarded as acceptable/good/very good. The terms λ it and δ it represent the factor loading of an item i at time t on Trait k and SR t , respectively, while γ it represents the factor loading of item i at time t on the latent method factor of item i, whose variance is Var(MF i ). The indicator I(i ≠ r) equals 1 for all items other than the reference indicator r and 0 for the reference indicator itself, which does not load on any method factor.
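Equations (1)–(5) can be sketched in Python as follows. The sketch assumes that the model-implied variance of an item is the sum of its trait, method, state residual and error components, i.e., that these components are mutually uncorrelated; the function name, argument names and all numerical values are hypothetical and are not estimates from this study.

```python
def lst_coefficients(lam, delta, gamma, var_trait, var_sr, var_method,
                     var_error, is_reference=False):
    """Compute CCO, UCO, TCO, OSpe and Rel for one item at one occasion.

    Assumes Var(OV) = trait part + method part + state residual part + error,
    with all components mutually uncorrelated.
    """
    trait_part = lam ** 2 * var_trait
    # Reference indicators do not load on a method factor: I(i != r) = 0.
    method_part = 0.0 if is_reference else gamma ** 2 * var_method
    state_part = delta ** 2 * var_sr
    var_ov = trait_part + method_part + state_part + var_error

    cco = trait_part / var_ov
    uco = method_part / var_ov
    tco = cco + uco
    ospe = state_part / var_ov
    return {"CCO": cco, "UCO": uco, "TCO": tco, "OSpe": ospe, "Rel": tco + ospe}


# Made-up parameter values for a non-reference item.
item = lst_coefficients(lam=1.0, delta=1.0, gamma=1.0, var_trait=0.5,
                        var_sr=0.2, var_method=0.1, var_error=0.2)
# Same values for a reference indicator (no method factor).
reference_item = lst_coefficients(lam=1.0, delta=1.0, gamma=1.0, var_trait=0.5,
                                  var_sr=0.2, var_method=0.1, var_error=0.2,
                                  is_reference=True)
```

Under these assumptions, 1 − Rel equals the proportion of the item variance that is due to measurement error.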

LST parameters for scales

LST parameters were also estimated for whole heiQ™ scales. Aggregated latent variables were computed for each scale on each measurement occasion [41, 42]. The aggregated equations developed by Eid and Diener [42] were used and adjusted (see Appendix). Note that all computations of LST parameters were based on model-implied estimates.

Software, missing values and alpha

All structural equation models were computed using Mplus v7.1 [43] with robust maximum likelihood estimator and were based on covariance matrices. Missing values (0–3 %) were handled using full-information maximum likelihood estimation. Models were evaluated using chi2-test and fit-indices Comparative Fit Index (CFI) and Root Mean Square Error of Approximation (RMSEA), but as the chi2-test is highly influenced by sample size, a significant test did not automatically lead to model rejection. Generally, model fit was regarded as acceptable with CFI close to 0.95 and RMSEA close to 0.06 [44]. However, invariance tests were guided by EPCs/modification indices and not by measures of fit for the whole model. Therefore, models with fit-indices below these values may sometimes also be regarded as acceptable. Alpha was fixed to 0.05 for all analyses.

Results

Sample

Fifty-eight percent of the patients were female; mean age was 48.3 years (SD = 9.1). Detailed sample characteristics are found in Table 1.

Table 1 Sample characteristics

Measurement invariance over time

All heiQ™ scales showed measurement invariance over time in factor loadings of the latent state factors S t and in all intercepts (Table 2). Therefore, they fulfilled the requirements for scalar invariance over time with respect to the latent state factors.

Table 2 Measurement invariance analysis over time of all heiQ™ scales

Estimation of LST models

Model fit of the LST models was nearly the same as that of the scalar invariance models in Table 2 (not shown). However, in Health-directed activities, estimation of the LST model led to a negative residual variance in Trait2-1 [45]. After fixing this parameter to 0.01, the model could be estimated regularly.

LST parameters for scales

Estimated LST parameters for the scales are presented in Table 3. Reliability coefficients range from 0.81 to 0.94. In all scales, TCO values are clearly higher than OSpe values, indicating that the scales are influenced more strongly by stable person factors than by situational factors. Highest OSpe values are found in Skill and technique acquisition (0.23–0.27) and Health-directed activities (0.28–0.39), i.e., up to 40 % of the latter scale’s variance can be attributed to the measurement situation. CCO varies between 0.37 and 0.75, showing that 75 % of the observed variance in Emotional distress can be attributed to the common trait, while in most other scales, the common trait accounts for about 50 % of the variances. Method-specific factors account for 4–5 % (Emotional distress) to up to about 15 % (Positive and active engagement in life, Self-monitoring and insight) of the scale variances.

Table 3 Latent state–trait parameters for each heiQ™ scale

LST parameters for items

Table 4 shows the range of LST parameters for each scale per measurement occasion (detailed results are available upon request). The range of the parameters can be viewed as an indicator of homogeneity. The smaller the range, the more homogeneous the items of that scale are with respect to the parameter in question. For example, reliability coefficients show little variation in Social integration and support, indicating that these items hardly differ in reliability.

Table 4 Range of LST parameters of respective heiQ™ items in each heiQ™ scale

For nearly all items, TCO values are clearly higher than OSpe values. Most scales show considerable ranges in CCO. However, they differ in the level of the items' CCO values. For example, the range of CCO values is > 0.30 in Emotional distress as well as in Self-monitoring and insight. However, even the item with the lowest CCO value in Emotional distress (CCO = 0.34) shares one-third of its variance with the reference indicator, while in Self-monitoring and insight, the item with the lowest CCO value (CCO = 0.12) shares only about 12 % with the reference indicator.

Reliability coefficients range between 0.38 (item 3, “As well as seeing my doctor, I regularly monitor changes in my health”) and 0.85 (item 21, “If I think about my health, I get depressed”). There is no general decrease or increase in LST parameters over time. Overall, CCO, OSpe and Rel are lower for most items than for respective scales. However, UCO is sometimes higher on the item level than on the scale level.

Changes in latent trait variables

The standardized means of Trait2-1 across all disease groups (Table 5) vary from 0.07 (Social integration and support) to 0.57 (Skill and technique acquisition, Self-monitoring and insight), indicating that in some constructs trait-changes of small to medium size can be detected. In five scales, the statistically significant intercept of SR2 indicates mean changes between T2 and T3. The highest intercept value is found in Health-directed activities. Table 5 also shows standardized mean differences between T1 and T3 in observed heiQ™ scales (ESObs3-1). In Positive and active engagement in life, Health-directed activities and Self-monitoring and insight, ESObs3-1 values were substantially lower than the standardized differences between latent trait variables. In all other scales, latent and observed differences were nearly equivalent.

Table 5 Means and standard deviations of latent trait variables (Trait1, Trait2) and the latent difference variable (Trait2-1), effect sizes and intercepts of latent state residual variable on measurement occasion 2 (SR2)

Predictors of change in latent trait variables

Table 6 shows that in some scales Trait1 and/or Trait2-1 mean values depend on diagnostic group. For example, patients with rheumatic disorders show lower T1 mean values in most scales and also smaller increases in three scales (Self-monitoring and insight, Positive and active engagement in life and Health-directed activities). Asthma patients show higher means in Trait1 and in Trait2-1 in Skill and technique acquisition and Self-monitoring and insight as well as lower Emotional distress at T1 and a larger decline to T2. Higher gains for asthma patients were also found in Positive and active engagement in life and in Health-directed activities. Furthermore, in some scales, mean differences and Trait1 means depend on age and sex. For example, women as well as younger patients show higher Trait1 values and a larger decrease over time in Emotional distress. Women also show higher gains in Constructive attitudes and approaches and in Positive and active engagement in life.

Table 6 Impact of disease group, age and sex on latent trait variables (Trait1) and latent difference variables (Trait2-1) of each heiQ™ scale

Discussion

By using models of LST theory, this study showed that the scales of the German version of the heiQ™ are invariant over time, that they are reliable, and that they are primarily influenced by person factors that are independent of the measurement situation. Therefore, the heiQ™ may be a useful instrument to measure stable effects of self-management programs over time. This study further showed that short-term changes in self-management skills and emotional well-being after inpatient rehabilitation remain stable over a period of at least 3 months.

The LST models of seven scales showed good fit values, suggesting that parameter estimates are trustworthy. Estimates for consistency and occasion specificity showed that most items and scales of the heiQ™ are primarily influenced by person factors rather than by measurement situations. In addition, the heiQ™ scales show (very) good reliability coefficients. The TCO coefficients are comparable with or even higher than those of other measures of situation-unspecific mental health [11, 14]. For example, it was shown that some measures of depression show OSpe > 0.4, indicating that the scales are affected to a large extent by the measurement situation [46]. These results demonstrate that the heiQ™ fulfills important requirements to be a valid instrument to evaluate stable effects of self-management programs.

However, the results for Health-directed activities must be interpreted with caution. The model could only be estimated by fixing the residual variance of Trait2-1 to 0.01; however, this procedure assumes that the population value of this parameter is close to 0 and that the improper solution is not caused by wrong model specification [45, 47].

As expected, most items of the heiQ™ showed lower reliability values than their respective scales, supporting the notion that researchers should use scales rather than individual items. Interestingly, some items show higher UCO values than CCO values, indicating that most of the stable variance measured by these items is not shared with the reference item. For example, in Self-monitoring and insight, the CCO values of item 11 (“I have a very good understanding of when and why I am supposed to take my medication”) are somewhat low (CCO = 0.12–0.14), while the UCO values are comparatively high (UCO = 0.28–0.31). One might conclude that item 11 is a poor indicator of the measured construct. However, another explanation might be that the construct measured by this scale is not fully captured by the reference item (in this case, item 16 [“When I have health problems, I have a clear understanding of what I need to do to control them”]). Note that in a CT − C(M − 1) model the contents of the state and trait variables are determined by the respective reference items.

Schwarze and Spanier et al. (in prep) demonstrated short-term changes in the heiQ™ scales after inpatient rehabilitation. Our analyses complement their results and showed that changes in self-management skills (Skill and technique acquisition, Self-monitoring and insight) or emotional well-being (Emotional distress, Positive and active engagement in life) of medium size were stable at least over a period of 3 months. These results are also in line with the assumption of a trait-change in the constructs measured by the heiQ™ scales. However, it needs to be noted that we did not (and could not) prove that the trait-change model is correct; we only showed that the model provides a plausible interpretation of the data.

Also, in line with Schwarze and Spanier et al. (in prep), changes in most heiQ™ scales can be found in all diagnostic groups. This result supports the generic definition of the heiQ™ scales, as all self-management programs pursue similar goals (e.g., enhancing self-management). Nevertheless, in many scales, Trait1 and Trait2-1 are clearly affected by type of chronic disease. This may reflect differences in clinical presentation, course of disease and demands of treatment among these chronic conditions. For example, the lower values of patients with rheumatic diseases on Trait1 and Trait2-1 in some heiQ™ scales may reflect that rheumatic diseases have a high impact on health status and that both symptoms and disease trajectories are difficult to control. On the other hand, the higher gains of asthma patients in Skill and technique acquisition may reflect that training to control asthma attacks is a major part of the treatment [48].

All heiQ™ scales show scalar invariance over time with respect to the latent state factors. In contrast, Nolte et al. [49] found some heiQ™ items to be scalar non-invariant; however, they used an earlier (English) version of the heiQ™ (42 items, 6-point response scale) and a different (and probably stricter) criterion to identify non-invariant items, i.e., the chi2-difference test [50]. In our study, non-invariance is defined by the size of the difference in a parameter over time [33], i.e., EPC > 0.25.

Nearly all heiQ™ scales show non-invariant factor loadings in some items according to method factors. However, since the method factors in the CT − C(M − 1) approach are defined as residuals with respect to the latent trait factors/latent state residual factors and have an expected mean of 0, non-invariant factor loadings of method factors do not change the meaning of the latent trait factors/latent state residual factors nor do they affect latent means.

Limitations

The main limitation of this study is that some model assumptions could not be tested empirically. For example, whether mean differences in heiQ™ items or scales should be modeled as trait-changes or as state-changes is a conceptual decision, not an empirical result. Another challenging problem is that our LST models could only be identified by restricting the latent state residual variances to be stable over time. Neither the correctness of this assumption nor its consequences for the estimates of latent trait and state variances can be tested. To identify a latent change model without such restrictions, a further measurement occasion for Trait1 is necessary [13, 14]. Unfortunately, data from two or more time points before an intervention are often unavailable or difficult to obtain.

Using the CT − C(M − 1) approach, parts of our results (e.g., CCO/UCO coefficients, latent trait means) strongly depend on the choice of an appropriate reference indicator. However, small CCO values in some items raise the questions of whether the main content of all heiQ™ scales can be captured by a single item, whether the best items were chosen as reference indicators and whether items with low CCO values should be deleted from the scale. Further psychometric studies of the heiQ™ may clarify these issues.

Though we included disease group as predictor of Trait1 and Trait2-1, we did not examine whether LST parameters differed between disease groups since the numbers of patients in the subgroups would have been too small to yield sufficiently robust estimates.

Conclusion

The heiQ™ can be used with confidence in a variety of settings and conditions. Our results indicate that it may be a useful tool for assessing stable effects in important outcomes of self-management programs over time, e.g., changes in self-management skills or emotional well-being. Furthermore, we showed that applying LST theory can give further insights into the psychometric properties of measurement instruments in the health sciences.