Synonyms

Critical difference; Clinical significance; Clinically significant change; Significant difference

Definition

Reliable Change Index (RCI) is a concept in measurement and assessment. An RCI is a psychometric criterion used to evaluate whether a change over time of an individual score (i.e., the difference score between two measurements in time) is considered statistically significant. Computationally, RCIs represent a ratio, in which the numerator represents an actual observed difference score between two measurements, and the denominator is some form of standard error of measurement of the difference. An RCI indicates whether an individual change score (e.g., between a patient’s pre-intervention and post-intervention assessment) is statistically significantly greater than a difference that could have occurred due to random measurement error alone.

Description

The concept of Reliable Change Index (RCI) refers to a method that is used to test whether a change over time – that is, the difference score between two assessments of the same person at two points in time – may be considered “reliable” or “(clinically) significant.” In particular, RCIs are commonly used to assess whether some condition or construct (e.g., depression; cognitive functioning) changed during an intervention (e.g., between pre-intervention and post-intervention).

The term “reliable change” is used to differentiate change that is reliable in the statistical sense (i.e., change that is statistically significant) from change that may have occurred due to random fluctuation in measurement (e.g., measurement error; Jacobson & Truax, 1991; Maassen, 2004).

Originally, Jacobson, Follette, and Revenstorf (1984) introduced an index to assess change, based on Classical Test Theory and building on previous work by McNemar (1962) and Lord and Novick (1968); they also coined the term Reliable Change Index. The approach was later refined by Jacobson and Truax (1991). Today, the method is commonly referred to as the Jacobson-Truax Index or as the classical approach to reliable change. However, alternative methods to calculate RCIs have been developed, and the term RCI generally refers to a large number of different variations of reliable change indices that are based on similar concepts.

In essence, an RCI is designed to numerically quantify whether an observed difference between two measurements or assessments may be considered “reliable” or statistically significant. In principle, the formula for an RCI is simple: An RCI is calculated as a ratio, in which the numerator represents the difference between two measurements (e.g., a pre-intervention assessment of depression and a post-intervention assessment of depression) and the denominator represents some form of a standard error of measurement of the difference.

$$ \mathrm{Reliable\ Change\ Index} = \frac{x_{(\mathrm{time}\ 2)} - x_{(\mathrm{time}\ 1)}}{\mathrm{standard\ error\ of\ measurement\ of\ the\ difference}} $$
(1)

where x(time 1) and x(time 2) are measurement scores for an individual at different points in time.
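To make the generic ratio concrete, a minimal Python sketch is given below; the function name, the example scores, and the value used for the standard error of measurement of the difference are purely illustrative assumptions.

```python
def reliable_change_index(x_time1, x_time2, se_diff):
    """Generic RCI (formula 1): the observed difference score divided by
    the standard error of measurement of the difference."""
    return (x_time2 - x_time1) / se_diff

# Hypothetical example: depression scores of 28 (pre) and 19 (post) and an
# assumed standard error of measurement of the difference of 4.2.
rci = reliable_change_index(x_time1=28, x_time2=19, se_diff=4.2)
print(round(rci, 2))  # approx. -2.14
```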

In the original formulation within Classical Test Theory, the standard error of measurement of the difference was defined as follows (cf. McNemar, 1962):

$$ \mathrm{standard\ error\ of\ measurement\ of\ the\ difference} = \sqrt{\mathrm{variance}_{x1} \cdot \left(1 - r_{x1}\right) + \mathrm{variance}_{x2} \cdot \left(1 - r_{x2}\right)} $$
(2)

where variance x1 refers to the variance of all x scores at time 1, variance x2 refers to the variance of all x scores at time 2, rx1 is the test reliability at time 1, and rx2 is the test reliability at time 2.

The formula shows that the standard error of measurement of the difference takes into account the variance of scores at time 1 and the variance of scores at time 2 as well as the reliability coefficients of time 1 and time 2. All methods for calculating the standard error of measurement of the difference are variations of this formula. The various methods differ in how they substitute – or estimate – the parameters of the formula (e.g., for cases in which the reliability coefficient and/or the variance of scores at time 2 is unknown, some methods assume/propose it is equal to the reliability coefficient and/or variance at time 1).
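As an illustration of formula (2), the following sketch computes the standard error of measurement of the difference from the variances and reliability coefficients at the two measurement occasions; the numerical values are assumed for demonstration only.

```python
import math

def se_diff_classical(variance_x1, variance_x2, r_x1, r_x2):
    """Standard error of measurement of the difference (formula 2),
    combining the error variances at time 1 and time 2."""
    return math.sqrt(variance_x1 * (1 - r_x1) + variance_x2 * (1 - r_x2))

# Assumed values: variances of 25 and reliability coefficients of .90 at both occasions.
print(round(se_diff_classical(25, 25, 0.90, 0.90), 2))  # approx. 2.24
```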

The size of an RCI is a direct estimate of the statistical significance of the difference score. Because the observed difference score is divided by the corresponding standard error of measurement of the difference, RCI values can be interpreted like standardized z-scores (i.e., under the assumption that no true change has occurred, they have a mean of 0 and a standard deviation of 1). Thus, an RCI with an absolute value greater than 1.96 denotes a statistically significant difference or, to use the current terminology, reflects a “reliable change.” (A z-score of +1.96 corresponds to the 97.5th percentile of a standard normal distribution; in other words, 95% of all z-standardized values fall between −1.96 and +1.96. Following the convention of using 5% as the threshold for statistical significance, an RCI whose absolute value exceeds 1.96 is therefore considered statistically significant.)
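This decision rule can be expressed in a few lines of Python; the cutoff of 1.96 is simply the 97.5th percentile of the standard normal distribution, and the example RCI value is hypothetical.

```python
from statistics import NormalDist

# The 97.5th percentile of the standard normal distribution is approx. 1.96.
cutoff = NormalDist().inv_cdf(0.975)
print(round(cutoff, 2))  # 1.96

def is_reliable_change(rci, cutoff=1.96):
    """Change is considered 'reliable' if the absolute RCI exceeds the cutoff."""
    return abs(rci) > cutoff

print(is_reliable_change(-2.14))  # True: larger than expected from measurement error alone
```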

Despite the simplicity of the generic RCI formula, RCIs have been a matter of debate in the psychometric literature (Hinton-Bayre, 2000; Maassen, 2000; Mellenbergh & Van den Brink, 1998; Temkin, Heaton, Grant, & Dikmen, 1999). Primarily, the debate concerns the question of how to calculate the standard error of measurement of the difference – not to be confused with the standard error or the standard error of measurement, both of which are related but different concepts. Because of the different approaches to conceptualizing and calculating the standard error of measurement of the difference, numerous RCIs exist in the literature. In a review, Perdices (2005) presents eight different approaches to calculating the standard error of measurement of the difference. The differences between these methods are not trivial. In fact, Maassen (2004), Maassen, Bossema, and Brand (2009), and Perdices (2005) demonstrate how applying various RCIs to identical data may lead to different standard errors of measurement of the difference and, consequently, to different conclusions about the “significance” of a change score.

For a user of RCIs, it is critical to explicate the different statistical and conceptual assumptions that underlie the different methods for calculating RCIs. For example, the method proposed by Ley (1972) calculates the standard error of measurement of the difference by inserting the (generic) variance and retest reliability of a test (which, of course, need to be known, e.g., from a test manual/normative sample), as shown in formula (3).

$$ \mathrm{standard\ error\ of\ measurement\ of\ the\ difference} = \sqrt{2 \cdot \mathrm{variance}_{x} \cdot \left(1 - \mathrm{retest\ reliability}\right)} $$
(3)
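Under the assumptions just described (a known test variance and retest reliability, e.g., taken from a test manual or normative sample), formula (3) might be implemented as in the following sketch; the normative values are assumed purely for illustration.

```python
import math

def se_diff_ley(variance_x, retest_reliability):
    """Ley's (1972) standard error of measurement of the difference (formula 3),
    based on a single (normative) test variance and the retest reliability."""
    return math.sqrt(2 * variance_x * (1 - retest_reliability))

# Assumed normative values: variance of 25 (SD = 5) and a retest reliability of .90.
print(round(se_diff_ley(25, 0.90), 2))  # approx. 2.24
```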

In comparison, the formula by Christensen and Mendoza (1986) estimates its parameters from a sample under study, that is, by using the variances of measurement scores at time 1 and time 2 and the covariance between measurement scores at time 1 and time 2, as illustrated by formula (4).

$$ \mathrm{standard\ error\ of\ measurement\ of\ the\ difference} = \sqrt{\mathrm{variance}_{x1} + \mathrm{variance}_{x2} - 2 \cdot \mathrm{SD}_{x1} \cdot \mathrm{SD}_{x2} \cdot \mathrm{correlation}_{x1x2}} $$
(4)

where SDx1 and SDx2 are the standard deviations (i.e., the square roots of the variances) of scores at time 1 and time 2, and correlationx1x2 is the correlation between scores at the two occasions; the last term thus equals twice the covariance between time 1 and time 2.
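Because formula (4) is estimated from the sample under study, a sketch needs the raw scores at both occasions; the data below are hypothetical, and the helper relies on the Python standard library's variance and covariance functions (the latter requires Python 3.10 or later).

```python
import math
from statistics import variance, covariance  # covariance requires Python 3.10+

def se_diff_christensen_mendoza(scores_t1, scores_t2):
    """Standard error of measurement of the difference in the sense of formula (4):
    computed from the sample variances at time 1 and time 2 and their covariance,
    i.e., the standard deviation of the observed difference scores."""
    return math.sqrt(variance(scores_t1) + variance(scores_t2)
                     - 2 * covariance(scores_t1, scores_t2))

# Hypothetical pre- and post-intervention scores for five persons.
pre = [28, 31, 25, 33, 27]
post = [19, 30, 24, 28, 26]
print(round(se_diff_christensen_mendoza(pre, post), 2))  # approx. 3.58
```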

Except in a few special scenarios, these two formulas lead to different results. Furthermore, the two methods are based on different assumptions and are designed for different scenarios. Ley uses a test’s generic variance and test-retest reliability (given that they are known) to calculate the standard error of measurement of the difference. This approach assumes that a test’s variance and its reliability are equal across samples and across measurement occasions.

Christensen and Mendoza’s (1986) approach, on the other hand, takes into account that, for a given sample of interest, the variance of x scores may change between measurement time 1 and time 2, and it considers the covariance between the two measurement time points in the observed sample to be an adequate reference criterion against which to evaluate the significance of an individual change score.

Therefore, Ley’s formula seems appropriate if an individual’s change over time is to be evaluated in relation to a representative reference sample and if the retest reliability of the measure is known. Christensen and Mendoza’s method, on the other hand, can be used when measurements at time 1 and time 2 are available for a sample and individual change scores are to be evaluated in relation to the distribution of difference scores in that sample. Choosing an appropriate RCI should therefore be guided by conceptual and theoretical considerations. In practice, however, the parameters of a given scenario limit which RCIs can be calculated at all: as demonstrated above, some RCIs require that the test reliability be known, whereas others require that the distribution of scores at time 1 and time 2 in a reference sample be known.
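A hypothetical comparison makes the practical consequence concrete: applying the two standard errors sketched above to the same individual change score can lead to different conclusions about whether the change is reliable.

```python
# Hypothetical illustration: one observed change evaluated against two different
# standard errors of measurement of the difference (values taken from the sketches above).
observed_change = -5    # post minus pre for a single individual
se_estimates = {
    "Ley (formula 3)": 2.24,                    # normative variance and retest reliability
    "Christensen & Mendoza (formula 4)": 3.58,  # sample difference-score distribution
}

for label, se in se_estimates.items():
    rci = observed_change / se
    print(f"{label}: RCI = {rci:.2f}, reliable change: {abs(rci) > 1.96}")
# Ley (formula 3): RCI = -2.23, reliable change: True
# Christensen & Mendoza (formula 4): RCI = -1.40, reliable change: False
```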

Conclusion and Further Resources

The discussion of the RCI presented here has focused on a generic description of the RCI as a concept in assessment. In addition, we briefly delineated some key challenges and limitations of RCIs that have been discussed in the psychometric literature. Given the substantially different conceptual and computational approaches of the various RCIs for calculating the standard error of measurement of the difference, it is critical that users indicate which RCI they have chosen to use, point out the RCI’s potential limitations, and stringently explicate and evaluate the assumptions that underlie the chosen RCI.

Cross-References

Measurement Error

Reliability

Significance, Statistical

Standard Error of Measurement

Standard Errors

Standard Scores

Test-Retest Reliability