3.1 Introduction

In Chap. 2, the analysis of data from an RCT with only one follow-up measurement was discussed. However, as has been mentioned before, in the past decade, an RCT with only one follow-up measurement has become very rare. Mostly more than one follow-up measurement is performed. In some RCTs two follow-up measurements are performed in order to estimate the short-term and long-term effects of the intervention, but sometimes even more follow-up measurements are performed in order to estimate the difference in the development over time in a particular outcome between the intervention and control groups.

Basically, the analysis of data from an RCT with more than one follow-up measurement raises the same problems as the analysis of data from an RCT with only one follow-up measurement, i.e., an adjustment must be made for the baseline value in order to adjust for regression to the mean. Until the start of the new millennium, the analyses of data from an RCT with more than one follow-up measurement were split into separate parts, i.e., the effect of the intervention was estimated for all follow-up measurements separately. Although it is interesting to estimate the intervention effects at the different follow-up measurements, performing separate analyses for the different follow-up measurements ignores the fact that the measurements were performed on the same subjects, i.e., it ignores the fact that the repeated measurements on the same subject are dependent on each other. Because of that, nowadays, it is necessary to take this dependency of the observations into account and to estimate the effects of the intervention at the different follow-up measurements in one statistical model. The most classical way to do this is to use a generalized linear model (GLM) for repeated measures, but that method has some serious flaws. Therefore, regression-based methods such as mixed models or generalized estimating equations (GEE analysis) are mostly used to estimate the effect of an intervention from an RCT with more than one follow-up measurement. In the remaining part of this chapter, several methods will be discussed that can be used (or are used) in the analysis of RCT data with more than one follow-up measurement. Not all of the methods are equally appropriate for the analysis of RCT data with more than one follow-up measurement, but it is important to discuss the pros and cons of the different methods in order to give a solid recommendation on which method(s) should be used.

3.2 Example

To illustrate the different possible ways to analyze RCT data with more than one follow-up measurement, a hypothetical example will be used. In this example dataset, a new intervention is compared to a control condition regarding the outcome variable complaints. Complaints are measured as a continuous outcome variable, and besides a baseline measurement, three follow-up measurements were performed. Table 3.1 shows descriptive information for both the intervention and control groups at all four measurements.

Table 3.1 Descriptive information (mean and standard deviation) of the outcome variable complaints

3.3 GLM for Repeated Measures

The basic idea behind GLM for repeated measures (which is also known as (multivariate) analysis of variance ((M)ANOVA) for repeated measures) is the same as for the well-known paired t-test. Within a GLM for repeated measures, the statistical testing is carried out for the T − 1 absolute differences between subsequent measurements. In fact, GLM for repeated measures is a multivariate analysis of these T − 1 absolute differences. Multivariate refers to the fact that the T − 1 differences are used simultaneously as outcome variables. Besides the multivariate approach, the same research question can also be answered with a univariate approach. This univariate procedure is comparable to the procedures carried out in an analysis of variance (ANOVA) and is based on the sum of squares, i.e., squared differences between observed values and average values. From a GLM for repeated measures with one dichotomous independent variable (i.e., the intervention variable), basically three effects can be derived: an overall time effect (i.e., is there a change over time, independent of the different groups), an overall group effect (i.e., is there a difference between the groups on average over time) and, most importantly, a group-time interaction effect (i.e., is there a difference between the groups in development over time). For details regarding GLM for repeated measures, see Twisk et al. (2013). Table 3.2 shows the structure of the data used to estimate the parameters of a GLM for repeated measures.
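The T − 1 differences mentioned above can be made concrete with a minimal sketch. It assumes that "absolute difference" means the raw change between two consecutive measurements (as opposed to a relative change); the function name and the scores are hypothetical and only serve as illustration.

```python
def successive_differences(measurements):
    """Return the T - 1 raw changes between consecutive measurements."""
    return [later - earlier
            for earlier, later in zip(measurements, measurements[1:])]

# Hypothetical complaint scores of one subject: baseline and three follow-ups.
subject_scores = [4.0, 3.5, 3.0, 2.75]
differences = successive_differences(subject_scores)

# With T = 4 measurements there are T - 1 = 3 differences per subject.
print(differences)
```

In a GLM for repeated measures these differences are analyzed simultaneously for all subjects; the sketch only shows how they arise for a single subject.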

Table 3.2 Data structure needed to perform a GLM for repeated measures

Output 3.1 shows the results of a GLM for repeated measures performed on the example dataset, while Fig. 3.1 shows the so-called estimated marginal means resulting from the GLM for repeated measures.

Output 3.1 Results of a GLM for repeated measures performed on the example dataset

Fig. 3.1 Estimated marginal means derived from a GLM for repeated measures performed on the example dataset (continuous line = control; dotted line = intervention)

Output 3.1 contains two tables with results. The first table shows the p-value for the overall intervention effect (p = 0.0002). This highly significant p-value indicates that, on average over time, there is a difference between the intervention group and the control group. In the second table, the p-values are given for the overall time effect (p < 0.001) and for the interaction between intervention and time (p = 0.6101). The overall time effect indicates the development over time for the whole population, while the intervention-time interaction effect indicates the difference in development over time between the intervention and control groups. From Fig. 3.1 and Table 3.1, however, it can be seen that the baseline values of the two groups are different. In Chap. 2 it was already discussed that a difference in baseline values between the groups leads to regression to the mean and that, therefore, an adjustment must be made for these baseline differences. Within the framework of a GLM for repeated measures, an adjustment can also be made for the baseline value. This approach is also known as a multivariate analysis of covariance (MANCOVA) for repeated measures. Output 3.2 and Fig. 3.2 show the results of a GLM for repeated measures adjusting for the baseline value performed on the example dataset.

Output 3.2 Results of a GLM for repeated measures adjusted for the baseline differences performed on the example dataset

Fig. 3.2 Estimated marginal means derived from a GLM for repeated measures adjusted for the baseline differences performed on the example dataset (continuous line = control; dotted line = intervention)

From Output 3.2 it can be seen that the p-value for the interaction between intervention and time decreases to 0.0987. This adjusted p-value is a better indication of the significance level of the intervention effect, because from Fig. 3.2 it can be seen that the decrease in complaints over time is a bit in favor of the intervention group. It also makes sense in light of the adjustment for regression to the mean. Because the intervention group has a lower baseline value, a decrease in complaints is harder to achieve in that group. The adjustment for the baseline value therefore resulted in a lower p-value for the interaction between intervention and time.

Although GLM for repeated measures is often used, it has a few major drawbacks. First of all, it can only be applied to complete cases; all subjects with one or more missing observations are excluded from the analyses. Secondly, GLM for repeated measures is mainly based on statistical testing. The parameters obtained from a GLM for repeated measures are p-values. This is a major drawback, because there is much more interest in effect estimates and confidence intervals around the effect estimates. Within a GLM for repeated measures, it is hard to get a proper effect estimate. Because of this, nowadays, GLM for repeated measures is not often used for the analysis of RCT data with more than one follow-up measurement.

3.4 Regression-Based Methods

The two most commonly used regression-based methods to analyze RCT data with more than one follow-up measurement are mixed model analysis and GEE analysis (Twisk, 2013). The two most important advantages of the regression-based methods are that all available data are included in the analysis and that they provide effect estimates and confidence intervals around the effect estimates.

It has been mentioned before that when more than one follow-up measurement is analyzed in one statistical model, an adjustment must be made for the dependency of the repeated observations within the subject. In fact, when there is more than one follow-up measurement, there is longitudinal data. Both mixed models and GEE analysis can be used to analyze longitudinal data, and the difference between the two methods is that they take into account this dependency in a different way.

The basic idea behind the adjustment for the dependency of the observations within the subject is that in the regression model an adjustment has to be made for the variable “subject.” The variable “subject” is mostly the id number, and although it looks like a discrete variable, in regression modeling, it should be treated as a categorical variable, and a categorical variable must be represented by dummy variables. Suppose there are 200 subjects in a particular RCT; this means that 199 dummy variables are needed to adjust for “subject.” Because this is practically impossible, the adjustment for “subject” has to be performed in a more efficient way, and the two regression-based methods that are mostly used to analyze longitudinal data (mixed model analysis and GEE analysis) differ from each other in the way they perform that adjustment (Twisk, 2013).

As has been mentioned before, the general idea behind all longitudinal statistical methods is to adjust for “subject” in an efficient way. If the adjustment for the “subject” variable was performed by adding dummy variables to the regression model, basically for each subject a separate intercept is estimated. The starting point of a mixed model analysis, which is also known as multilevel analysis (Goldstein, 2003; Twisk, 2006), hierarchical linear modeling, or random effects modeling (Fitzmaurice et al., 2004; Laird & Ware, 1982), is the estimation of all these intercepts, but then the different intercepts are summarized into one coefficient: the variance. This variance is based on a normal distribution that is drawn over all the intercepts. So, a mixed model analysis consists of three steps: (1) estimating the different intercepts for all subjects, (2) drawing a normal distribution over all these intercepts, and (3) estimating the variance of that normal distribution. That variance is known as the random intercept variance, and the random intercept variance is added to the regression model.
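The three steps can be mimicked with a deliberately naive sketch. It is not how mixed model software estimates the random intercept variance (that is done by (restricted) maximum likelihood within the model itself); it only illustrates the idea of summarizing many subject-specific intercepts by one variance. The data and function names are hypothetical, and the subject mean is used as a stand-in for the subject-specific intercept.

```python
from statistics import mean, variance

# Hypothetical long-format data: subject id -> observed outcome values.
observations = {
    1: [3.0, 2.8, 2.9],
    2: [2.0, 2.2, 2.1],
    3: [4.0, 3.9, 4.1],
    4: [3.0, 3.1, 2.9],
}

def subject_intercepts(data):
    """Step 1: one intercept per subject (here simply the subject's mean)."""
    return {subject: mean(values) for subject, values in data.items()}

def intercept_variance(intercepts):
    """Steps 2 and 3: summarize all intercepts by the variance of their distribution."""
    return variance(list(intercepts.values()))

intercepts = subject_intercepts(observations)
print(intercepts)
print(intercept_variance(intercepts))
```

Because each subject mean also contains measurement noise, this naive variance tends to be somewhat larger than the random intercept variance estimated by a real mixed model; the point of the sketch is only that one coefficient replaces a separate intercept (or dummy variable) for every subject.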

It is also possible that not only the intercept is different for each subject but that the development over time is also different for each subject; in other words, there is an interaction between “subject” and time. In this situation the variance of the regression coefficients for time can be estimated, i.e., a random slope for time. In fact, these kinds of individual interactions (i.e., random slopes) can be added to the regression model for all independent variables that are time-dependent. In a regular RCT, however, assuming a random slope for the intervention is not possible, because the intervention variable is time-independent (Twisk, 2006). When a certain subject is assigned to either the intervention or control group, that subject stays in that group throughout the intervention period. An exception is the cross-over trial, in which the subject is its own control and the intervention variable is, therefore, time-dependent. In this situation the intervention effect can be different for each subject, and therefore a random slope for the intervention variable can be added to the model (see Chap. 5).

Within GEE analysis, the adjustment for the dependency of observations is done in a slightly different way, i.e., by assuming (a priori) a certain working correlation structure for the repeated measurements of the outcome variable (Liang & Zeger, 1986; Zeger & Liang, 1986). Depending on the software package used to estimate the regression coefficients, different correlation structures are available. They basically vary from an exchangeable (or compound symmetry) correlation structure, i.e., the correlations between all pairs of measurements are assumed to be the same, irrespective of the length of the interval between the repeated measurements, to an unstructured correlation structure. In the latter, no particular structure is assumed, which means that all possible correlations between the follow-up measurements are estimated.

In the literature it is assumed that GEE analysis is robust against a wrong choice for a correlation structure, i.e., it does not matter which correlation structure is chosen; the results of the longitudinal analysis will be more or less the same (Liang & Zeger, 1986; Twisk, 2004). However, when the results of analyses with different working correlation structures are compared to each other, the magnitude of the regression coefficients can be different (Twisk, 2013). It is therefore important to consider carefully which correlation structure should be chosen for the analysis. Although the unstructured working correlation structure is theoretically always the best, the simplicity of the correlation structure also has to be taken into account. The number of parameters (in this case correlation coefficients) which need to be estimated differs for the various working correlation structures. The best option is therefore to choose the simplest structure which fits the data well. The first step in choosing a certain correlation structure can be to investigate the observed within-person correlation coefficients for the outcome variable. It should be kept in mind that when covariates are added to the analysis, the correlation structure can change (i.e., the choice of the correlation structure should preferably be based on the correlations conditional on the covariates). For a detailed explanation of the principles behind mixed model analysis and GEE analysis, one is referred to Twisk et al. (2013).
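The difference in complexity between the two extremes can be illustrated with a small sketch. The function names and the correlation of 0.6 are hypothetical; the sketch only builds an exchangeable working correlation matrix and counts how many correlation coefficients each structure requires for a given number of repeated measurements.

```python
def exchangeable(n_measurements, rho):
    """Working correlation matrix with one common correlation rho."""
    return [[1.0 if i == j else rho for j in range(n_measurements)]
            for i in range(n_measurements)]

def n_correlation_parameters(n_measurements, structure):
    """Number of correlation coefficients that must be estimated."""
    if structure == "exchangeable":
        return 1
    if structure == "unstructured":
        # One coefficient for every pair of measurements.
        return n_measurements * (n_measurements - 1) // 2
    raise ValueError(f"unknown structure: {structure}")

# Three follow-up measurements; rho = 0.6 is a made-up value.
for row in exchangeable(3, 0.6):
    print(row)
print(n_correlation_parameters(3, "exchangeable"))   # 1
print(n_correlation_parameters(3, "unstructured"))   # 3
```

With three follow-up measurements the difference is small (one versus three coefficients), but the number of coefficients in the unstructured case grows quickly with the number of repeated measurements.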

Within the framework of the regression-based methods, several models are available to evaluate the effect of an intervention in an RCT with more than one follow-up measurement (Twisk et al., 2018). In the next part of this chapter, the different models will be discussed.

3.4.1 Longitudinal Analysis of Covariance

Longitudinal analysis of covariance is an extension of the analysis of covariance described in Chap. 2, i.e., the outcome variable measured at the different follow-up measurements is adjusted for the baseline value of the outcome (Eq. 3.1):

$$ {Y}_t={\beta}_0+{\beta}_1X+{\beta}_2{Y}_{t0} $$
(3.1)

where Yt = outcome measured at the follow-up measurements, X = intervention variable, β1 = overall intervention effect, and Yt0 = outcome measured at baseline.

Table 3.3 shows the structure of the data used to estimate the parameters for a longitudinal analysis of covariance.

Table 3.3 Data structure needed to perform a longitudinal analysis of covariance
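As an illustration of such a long data structure, the following sketch converts one wide record per subject into one row per follow-up measurement, with the baseline value repeated on every row as a covariate. All column names and values are hypothetical and are not taken from the example dataset.

```python
def to_ancova_long(wide_rows):
    """Turn one wide record per subject into one row per follow-up measurement.

    The baseline value is repeated on every row as a covariate. Follow-ups
    that are missing (None) are dropped, so subjects with incomplete data
    still contribute the rows they do have.
    """
    long_rows = []
    for row in wide_rows:
        for time, key in enumerate(("y_t1", "y_t2", "y_t3"), start=1):
            if row[key] is not None:
                long_rows.append({
                    "id": row["id"],
                    "intervention": row["intervention"],
                    "baseline": row["y_t0"],
                    "time": time,
                    "complaints": row[key],
                })
    return long_rows

# Hypothetical subjects; the second one misses the second follow-up.
wide = [
    {"id": 1, "intervention": 1, "y_t0": 3.0, "y_t1": 2.5, "y_t2": 2.4, "y_t3": 2.0},
    {"id": 2, "intervention": 0, "y_t0": 3.2, "y_t1": 3.0, "y_t2": None, "y_t3": 2.9},
]
for record in to_ancova_long(wide):
    print(record)
```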

Output 3.3 shows the results of the longitudinal analysis of covariance (Eq. 3.1) performed with linear mixed model analysis to estimate the overall intervention effect over time in the example dataset which was introduced in Sect. 3.2.

Output 3.3 Results of the longitudinal mixed model analysis of covariance

Output 3.3 basically contains three parts. The first part shows some general information regarding the analysis which is performed. It can be seen that a mixed effects maximum likelihood (ML) regression analysis is performed and that the group variable is the id number. This means that the mixed model analysis takes into account the dependency of the observations within the subject. It can also be seen that there are 416 observations performed among 150 subjects and that the average number of follow-up measurements is 2.8. These numbers indicate that not all patients were measured at all follow-up measurements. It should be noted that the regression-based methods and especially mixed model analysis are highly suitable to deal with missing data (Twisk et al., 2013). Furthermore, this part of the output shows some additional model fit information, such as the log likelihood. The log likelihood is used in the likelihood ratio test, which can be used to compare models with each other.

The second part of the output contains the fixed part of the mixed model. In this part of the output, the regression coefficients are given. Besides that, also the standard errors, z-values, p-values, and 95% confidence intervals around the regression coefficients are provided. The coefficient for intervention (−0.1419588) indicates that on average over time, the intervention group has a 0.14 lower score on complaints compared to the control group. The standard error of this coefficient equals 0.0654837, and the z-value (−2.17) is derived by dividing the regression coefficient by its standard error. Based on the z-value, the p-value (0.030) is obtained, and the 95% confidence interval around the regression coefficient (−0.2703044 to −0.0136132) is calculated by the regression coefficient ± 1.96 times the standard error. It can further be seen that the difference between the groups (i.e., the effect of the intervention) is adjusted for the differences between the groups at baseline, i.e., the baseline value is added to the model as a covariate. The last part of the output contains the random part of the model, which contains the random intercept variance (0.1129856). This variance indicates the variation between the subjects in the outcome variable or in other words, the amount of variance in the outcome explained by the differences between the subjects.
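The relation between the coefficient, its standard error, the z-value, the p-value, and the confidence interval can be checked with a short sketch. It uses the coefficient and standard error quoted above; the function name is hypothetical, and the standard normal distribution is taken from Python's standard library.

```python
from statistics import NormalDist

def wald_summary(coefficient, standard_error, level=0.95):
    """Return z-value, two-sided p-value and confidence interval."""
    standard_normal = NormalDist()
    z = coefficient / standard_error
    p_value = 2 * (1 - standard_normal.cdf(abs(z)))
    # Critical value of the standard normal distribution (about 1.96 for 95%).
    critical = standard_normal.inv_cdf(1 - (1 - level) / 2)
    lower = coefficient - critical * standard_error
    upper = coefficient + critical * standard_error
    return z, p_value, (lower, upper)

# Coefficient and standard error of the intervention variable (Output 3.3).
z, p_value, (lower, upper) = wald_summary(-0.1419588, 0.0654837)
print(round(z, 2), round(p_value, 3), round(lower, 4), round(upper, 4))
```

Small deviations in the last decimals compared with the output can occur, because the coefficient and the standard error are themselves rounded.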

Output 3.4 shows the results of exactly the same analysis but now performed with a linear GEE analysis. In this GEE analysis, an exchangeable correlation structure is used.

Output 3.4 Results of the longitudinal GEE analysis of covariance

The output of a longitudinal GEE analysis of covariance contains two parts, which are more or less the same as the first two parts of the output of the longitudinal mixed model analysis of covariance. In the first part, some general information is provided. This general information contains the group variable (id) and what kind of regression model is performed. In this situation a linear regression model is used (i.e., the link function is identity and the family is Gaussian). The information also shows that an exchangeable correlation structure is used for the estimation and it provides the scale parameter, which is a measure for the remaining unexplained variance after the analysis is performed. In the right column of the first part of the output, the same information is provided as has been provided in the first part of the output of the longitudinal mixed model analysis of covariance.

The second part of the output of a longitudinal GEE analysis of covariance provides the regression coefficients. The interpretation of the regression coefficient for the intervention variable (−0.1432429) is exactly the same as the interpretation of the regression coefficient of the intervention variable obtained from the longitudinal mixed model analysis of covariance. The output also provides the standard error of the estimate (0.0649198), which is used in the calculation of the 95% confidence interval around the estimate (ranging from −0.2704833 to −0.0160025) and of the corresponding p-value (0.027). It should be noted that the effect estimate obtained from the longitudinal GEE analysis of covariance is almost the same as the one obtained from the longitudinal mixed model analysis of covariance (−0.1432429 versus −0.1419588). This is always the case. In fact, when there are no missing data, the regression coefficient obtained from a linear mixed model analysis with a random intercept is exactly the same as the regression coefficient obtained from a linear GEE analysis with an exchangeable correlation structure. This is caused by the fact that estimating one variance (the random intercept variance) is exactly the same as estimating one correlation (an exchangeable correlation structure) (Twisk, 2013). The only difference between the two regression coefficients in the present example is caused by missing data, and it is generally accepted that mixed model analysis deals better with missing data than GEE analysis (Twisk et al., 2013). Because the two methods give almost the same estimate of the intervention effect and because mixed model analysis deals better with missing data, in the remaining part of this book, all examples with a continuous outcome variable will be analyzed with linear mixed model analyses.
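The link between one variance and one correlation can be written out explicitly. As a sketch, assume a linear mixed model with only a random intercept, and let σ²u denote the random intercept variance and σ²e the residual variance (these symbols are introduced here for illustration and do not appear in the outputs). The correlation between any two measurements of the same subject i at different time points t and s is then

$$ \operatorname{corr}\left({Y}_{it},{Y}_{is}\right)=\frac{\sigma_u^2}{\sigma_u^2+\sigma_e^2},\kern1em t\ne s $$

which does not depend on t or s. One common correlation for all pairs of measurements is exactly what an exchangeable correlation structure assumes.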

After estimating the overall effect of the intervention on average over time, in a second step, the effects of the intervention at the (three) follow-up measurements can be estimated. With the longitudinal analysis of covariance, this is not done with three separate linear regression analyses, but this is done in one model. To assess the effect of the intervention at the different follow-up measurements, time and the interaction between the intervention variable and time are added to the model (Eq. 3.2):

$$ {Y}_t={\beta}_0+{\beta}_1X+{\beta}_2{Y}_{t0}+{\beta}_3{time}_2+{\beta}_4{time}_3+{\beta}_5X\times {time}_2+{\beta}_6X\times {time}_3 $$
(3.2)

where Yt = outcome measured at the follow-up measurements, X = intervention variable, β1 = intervention effect at the first follow-up measurement, Yt0 = outcome measured at baseline, and time2, time3 = dummy variables for the second and third follow-up measurement.
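How the dummy variables and interaction terms of Eq. 3.2 are coded for a single observation can be sketched as follows. The function name, its arguments, and the example values are hypothetical; the optional reference argument shows what changes when another follow-up measurement is chosen as reference category for time.

```python
def design_row(x, baseline, followup, reference=1):
    """Predictor values of Eq. 3.2 for one observation.

    x         : intervention indicator (0 = control, 1 = intervention)
    baseline  : baseline value of the outcome (Y_t0)
    followup  : follow-up number of this observation (1, 2 or 3)
    reference : follow-up used as reference category for time
    """
    others = [t for t in (1, 2, 3) if t != reference]
    dummies = [1 if followup == t else 0 for t in others]
    interactions = [x * d for d in dummies]
    # Order: intercept, X, Y_t0, two time dummies, two interaction terms.
    return [1, x, baseline] + dummies + interactions

# Intervention subject with a (made-up) baseline value of 3.0.
print(design_row(x=1, baseline=3.0, followup=1))
print(design_row(x=1, baseline=3.0, followup=2))
print(design_row(x=1, baseline=3.0, followup=2, reference=2))
```

At the reference time point all dummy variables and interaction terms are zero, so the group difference at that time point is carried by the coefficient of the intervention variable alone.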

In this model, the regression coefficient for the intervention variable indicates the intervention effect at the first follow-up measurement. The intervention effect at the second follow-up measurement is calculated as the sum of the regression coefficient for the intervention variable and the regression coefficient for the interaction between the intervention variable and the time dummy variable for the second follow-up measurement (β1 + β5), while the intervention effect at the third follow-up measurement is calculated as the sum of the regression coefficient for the intervention variable and the regression coefficient for the interaction between the intervention variable and the time dummy variable for the third follow-up measurement (β1 + β6). Output 3.5 shows the result of this analysis.

Output 3.5 Results of the longitudinal mixed model analysis of covariance including an interaction between intervention and time

Output 3.5 also contains three parts: the upper part which contains the overall information, the middle part which contains the fixed part of the model, and the lower part which contains the random part of the model. Most interesting is, of course, the middle part, because that part contains the regression coefficients. The analysis performed leads to a regression coefficient for the intervention variable, two regression coefficients for the time dummy variables, two regression coefficients for the interactions between the intervention variable and the two time dummy variables, and the regression coefficient for the baseline value. The latter indicates again that a longitudinal analysis of covariance was performed with an adjustment for the baseline value. The regression coefficient for the intervention variable (−0.1036192) indicates the difference between the intervention group and the control group at the first follow-up measurement (i.e., the reference time point). The regression coefficients for the two time dummy variables indicate the difference in complaints between the reference time point (i.e., the first follow-up measurement) and the other two follow-up measurements for the control group. These coefficients are, therefore, not really interesting. The regression coefficients for the two interaction terms indicate the difference between the first follow-up measurement and the other two follow-up measurements in the difference between the two groups. With these coefficients the effect estimates for the intervention at the second and third follow-up measurement can be calculated. For the second follow-up measurement, the effect estimate is −0.1036192 + −0.040448 = −0.1440672, while the effect estimate at the third follow-up measurement equals −0.1036192 + −0.0799769 = −0.1835961. The problem, however, is that although the effect estimates at the second and the third follow-up measurement can be calculated in this way, the standard errors (and therefore also the 95% confidence intervals and corresponding p-values) cannot be calculated directly from this output. To obtain these standard errors, the longitudinal analysis of covariance should be repeated with a different reference category for time. Output 3.6 shows the result of the analysis with the second follow-up measurement as reference time point, and Output 3.7 shows the result of the analysis with the third follow-up measurement as reference time point.
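The calculation of the two effect estimates can be reproduced with a few lines of code, using the coefficients quoted in the text. The sketch also includes the general formula for the standard error of a sum of two coefficients. That formula is not part of the approach described here and is added only to show why the standard errors cannot be read from the output: it needs the covariance between the two estimates, which is assumed not to be part of the printed output. All names are hypothetical.

```python
from math import sqrt

# Coefficients as quoted in the text for Output 3.5.
b_intervention = -0.1036192       # beta_1: effect at the first follow-up
b_interaction_time2 = -0.040448   # beta_5: intervention x second follow-up
b_interaction_time3 = -0.0799769  # beta_6: intervention x third follow-up

effect_followup2 = b_intervention + b_interaction_time2
effect_followup3 = b_intervention + b_interaction_time3
print(round(effect_followup2, 7), round(effect_followup3, 7))

def se_of_sum(var_a, var_b, cov_ab):
    """Standard error of the sum of two estimated coefficients.

    Requires both variances and the covariance between the estimates;
    the covariance is the piece that a table of coefficients lacks.
    """
    return sqrt(var_a + var_b + 2 * cov_ab)
```

Re-estimating the model with a different reference category for time, as described in the text, gives the same effect estimates and avoids the need for the covariance altogether.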

Output 3.6 Results of the longitudinal mixed model analysis of covariance including an interaction between intervention and time, with the second follow-up measurement as reference time point

Output 3.7 Results of the longitudinal mixed model analysis of covariance including an interaction between intervention and time, with the third follow-up measurement as reference time point

From Output 3.6, it can be seen that the regression coefficient for the intervention variable equals −0.1440672, which is equal to the number calculated based on the two regression coefficients provided in Output 3.5. Besides the effect estimate, the output also gives the standard error of the estimate and, therefore, also the 95% confidence interval around the effect estimate and the corresponding p-value. In Output 3.7, the effect estimate at the third follow-up measurement is provided (−0.1835961) with its 95% confidence interval and corresponding p-value.

3.4.2 Repeated Measures

In the repeated measures analysis, the values of all four measurements of the outcome variable (i.e., the baseline value as well as the values of the three follow-up measurements) are used as outcome in the analysis. When the overall intervention effect is estimated, the model does not include time (Eq. 3.3), while when the intervention effect at the different follow-up measurements is estimated, time is represented by dummy variables (Eq. 3.4). Because all four measurements are used as outcome, in the latter, three dummy variables are needed to represent time. The model further includes the interaction between intervention and time:

$$ {Y}_t={\beta}_0+{\beta}_1X $$
(3.3)

where Yt = outcome measured at all measurements, X = intervention variable, and β1 = overall intervention effect.

$$ {Y}_t={\beta}_0+{\beta}_1X+{\beta}_2{time}_1+{\beta}_3{time}_2+{\beta}_4{time}_3+{\beta}_5X\times {time}_1+{\beta}_6X\times {time}_2+{\beta}_7X\times {time}_3 $$
(3.4)

where Yt = outcome measured at all measurements, X = intervention variable, β1 = difference between the groups at baseline, and time1, time2, time3 = dummy variables for the first, second, and third follow-up measurement.

Table 3.4 shows the structure of the data used to estimate the parameters of a repeated measures analysis.

Table 3.4 Data structure needed to perform a repeated measures analysis
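For comparison with the data structure of the longitudinal analysis of covariance, the following sketch shows a long structure in which the baseline value is an outcome row like any other measurement, so that each subject contributes up to four rows. The column names, the coding of time (0 for baseline, 1 to 3 for the follow-up measurements), and the values are hypothetical.

```python
def to_repeated_measures_long(wide_rows):
    """One row per measurement; the baseline (time 0) is an outcome row as well."""
    long_rows = []
    for row in wide_rows:
        for time, value in enumerate(row["complaints"]):
            if value is not None:  # keep all available measurements
                long_rows.append({
                    "id": row["id"],
                    "intervention": row["intervention"],
                    "time": time,  # 0 = baseline, 1-3 = follow-up measurements
                    "complaints": value,
                })
    return long_rows

# Hypothetical subjects; the second one misses the second follow-up.
wide = [
    {"id": 1, "intervention": 1, "complaints": [3.0, 2.5, 2.4, 2.0]},
    {"id": 2, "intervention": 0, "complaints": [3.2, 3.0, None, 2.9]},
]
for record in to_repeated_measures_long(wide):
    print(record)
```

Note that there is no separate baseline column in this structure, which is why the baseline value cannot also serve as a covariate.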

In Eq. 3.3, the regression coefficient for the intervention variable indicates the difference between the intervention and control groups on average over time. In the model with the three dummy variables (Eq. 3.4), the intervention effect at the first follow-up measurement is calculated as the sum of the regression coefficient for the intervention variable and the regression coefficient for the interaction between the intervention variable and the dummy variable for the first follow-up measurement (β1 + β5), while the intervention effect at the second follow-up measurement is calculated as the sum of the regression coefficient for the intervention variable and the regression coefficient for the interaction between the intervention variable and the dummy variable for the second follow-up measurement (β1 + β6). Likewise, the intervention effect at the third follow-up measurement is calculated as the sum of the regression coefficient for the intervention variable and the regression coefficient for the interaction between the intervention variable and the dummy variable for the third follow-up measurement (β1 + β7).

In the repeated measures analysis, the baseline value is part of the outcome (see Table 3.4), and therefore it is not possible to adjust for the baseline values as well. Although some researchers try to do so, it does not make sense, because in that situation the baseline value as outcome is adjusted for itself. So, therefore, the analysis is relatively simple and only contains the intervention variable (Eq. 3.3). Output 3.8 shows the result of the analysis.

Output 3.8 Results of the longitudinal repeated measures mixed model analysis

In the upper part of Output 3.8, it can be seen that the maximal number of measurements for each subject is equal to 4, which shows that in this analysis, the baseline value is part of the outcome. As for the outputs of the longitudinal analysis of covariance, the most interesting part of the output is the middle part which contains the effect estimate for the intervention. The effect estimate is the regression coefficient for the intervention variable (−0.2480394), which indicates the difference between the intervention and control groups on average over time. It should be realized that this effect estimate includes the difference between the two groups at baseline, which is not caused by the intervention.

With the repeated measures analysis, it is also possible to obtain the effects of the intervention at the different time points. To do so, three time dummy variables and the interactions between the intervention variable and the three time dummy variables were added to the model (Eq. 3.4). Although the default option in analyses with a categorical variable (i.e., time) is to take the first category as reference category, in this particular situation that makes no sense. Because the first category indicates the first measurement (i.e., the baseline value), the estimated difference between the groups at the first measurement is not related to the intervention and, therefore, not an actual effect estimate of the intervention. Therefore, in the first analysis, the second measurement (i.e., the first follow-up measurement) is used as reference category. Output 3.9 shows the results of this analysis.

Output 3.9 Results of the longitudinal mixed model repeated measures analysis including an interaction between intervention and time, with the second measurement (i.e., the first follow-up measurement) as reference time point

As in the longitudinal analysis of covariance, the most interesting regression coefficient in Output 3.9 is the coefficient for the intervention variable (−0.2212681). That coefficient indicates the difference between the intervention and control groups at the first follow-up measurement. The regression coefficients given in Output 3.9 can also be used to calculate the effect estimates for the intervention at the second and third follow-up measurement. To do so, the regression coefficient of the intervention variable has to be added to the regression coefficient of the interaction between the intervention variable and the corresponding time dummy variable. So, the effect estimate for the intervention at the second follow-up measurement equals −0.2212681 + −0.0502898 = −0.2715579, while the effect estimate for the intervention at the third follow-up measurement equals −0.2212681 + −0.0774314 = −0.2986995. The problem with these calculations is (again) that there is no estimation of the standard errors of the estimates, and therefore there is no estimation of the 95% confidence intervals and the corresponding p-values. To obtain those, the repeated measures analysis with the interaction between the intervention variable and time must be performed with different reference categories for time. Outputs 3.10 and 3.11 show the results of the analyses with the second follow-up measurement and the third follow-up measurement as reference category.

Output 3.10 Results of the longitudinal mixed model repeated measures analysis including an interaction between intervention and time, with the third measurement (i.e., the second follow-up measurement) as reference time point

Output 3.11 Results of the longitudinal mixed model repeated measures analysis including an interaction between intervention and time, with the fourth measurement (i.e., the third follow-up measurement) as reference time point

The regression coefficient of the intervention variable provided by Output 3.10 (−0.2715579) gives the effect estimate for the intervention at the second follow-up, while the regression coefficient of the intervention variable provided by Output 3.11 (−0.2986994) gives the effect estimate for the intervention at the third follow-up. Although these effect estimates were already known from the calculation performed on the regression coefficients provided in Output 3.9, the standard errors are now also given for both effect estimates; these are used in the estimation of the 95% confidence intervals and the corresponding p-values.
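Why changing the reference category reproduces the sums calculated earlier can be illustrated with a minimal sketch. It assumes a complete dataset without missing values, in which the fixed part of a model with the intervention variable, the time dummy variables, and all their interactions simply reproduces the group means per measurement. The means below are hypothetical and are not taken from the example dataset; the function name `saturated_coefficients` and the coefficient labels are also only illustrative. In the mixed model analyses of the example, with missing observations, the estimates are model based and will not equal raw means exactly.

```python
import math

# Hypothetical group means per measurement (1 = baseline, 2-4 = follow-ups).
# These numbers are made up for illustration only.
CELL_MEANS = {
    "control":      {1: 3.5, 2: 3.4, 3: 3.3, 4: 3.3},
    "intervention": {1: 3.3, 2: 3.1, 3: 2.9, 4: 2.8},
}


def saturated_coefficients(means, ref):
    """Coefficients of outcome ~ X + time dummies + X*time dummies
    when measurement `ref` is the reference category for time."""
    ctrl, intv = means["control"], means["intervention"]
    coefs = {
        "intercept": ctrl[ref],          # control group at the reference time
        "X": intv[ref] - ctrl[ref],      # group difference at the reference time
    }
    for t in ctrl:
        if t == ref:
            continue
        coefs[f"time{t}"] = ctrl[t] - ctrl[ref]
        # Interaction: how much the group difference at time t deviates
        # from the group difference at the reference time.
        coefs[f"X*time{t}"] = (intv[t] - ctrl[t]) - (intv[ref] - ctrl[ref])
    return coefs


ref2 = saturated_coefficients(CELL_MEANS, ref=2)  # first follow-up as reference
ref3 = saturated_coefficients(CELL_MEANS, ref=3)  # second follow-up as reference

# Adding the interaction to the intervention coefficient (reference = 2)
# gives the intervention coefficient of the re-referenced model (reference = 3).
summed = ref2["X"] + ref2["X*time3"]
print(summed, ref3["X"], math.isclose(summed, ref3["X"], abs_tol=1e-12))
```

In this sketch the coefficient of X always equals the group difference at whichever measurement is chosen as reference, which is why rerunning the analysis with another reference category yields the same point estimate as the manual addition, now accompanied by its own standard error.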

Although the repeated measures analyses performed so far included the baseline value, it should be noted again that in these analyses there is no adjustment for the baseline value. This is a common misunderstanding: many researchers believe that the repeated measures analysis performed does adjust for the baseline value. However, because the baseline value in these analyses is treated as an outcome instead of a covariate, the method actually does not adjust for the baseline value. To obtain an effect estimate of the intervention with a repeated measures analysis adjusted for the baseline, an alternative repeated measures analysis can be used. In this alternative repeated measures analysis, the intervention variable is not part of the model, but its interaction with time still is (Eqs. 3.5 and 3.6):

$$ {Y}_t={\beta}_0+{\beta}_1 time+{\beta}_2X\times time $$
(3.5)

where Yt = outcome measured at all measurements, X = intervention variable, and β2 = overall intervention effect.

$$ {Y}_t={\beta}_0+{\beta}_1{time}_1+{\beta}_2{time}_2+{\beta}_3{time}_3+{\beta}_4X\times {time}_1+{\beta}_5X\times {time}_2+{\beta}_6X\times {time}_3 $$
(3.6)

where Yt = outcome measured at all measurements, X = intervention variable, β4 = intervention effect at the first follow-up measurement, β5 = intervention effect at the second follow-up measurement, β6 = intervention effect at the third follow-up measurement, and time1, time2, time3 = dummy variables for the first, second, and third follow-up measurement.

Table 3.5 shows the structure of the data used to estimate the parameters of an alternative repeated measures analysis.

Table 3.5 Data structure needed to perform the alternative repeated measures analysis

Because the intervention variable is not included in the model, the baseline values for both groups are assumed to be equal and are reflected in the intercept of the model (β0). The intervention effects can be directly obtained from the regression coefficients for the interaction between the intervention variable and time (the overall intervention effect over time; β2 in Eq. 3.5) or between the intervention variable and the three dummy variables for time (intervention effects at the three follow-up measurements: β4, β5, and β6 in Eq. 3.6).
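The long data structure that Eqs. 3.5 and 3.6 require can be sketched as follows. This is only an illustration under stated assumptions: the function name `to_long_alternative` and the column names (`id`, `y`, `time`, `x_time`, `time1`, `x_time1`, and so on) are hypothetical and are not taken from Table 3.5, and the outcome values in the usage lines are made up.

```python
def to_long_alternative(subject_id, x, outcomes):
    """Convert one subject's measurements to long format.

    outcomes: [baseline, follow-up 1, follow-up 2, follow-up 3];
              None marks a missing measurement.
    x:        intervention variable (0 = control, 1 = intervention).
    """
    rows = []
    for k, y in enumerate(outcomes):
        if y is None:
            continue  # a missing measurement simply contributes no record
        time = 0 if k == 0 else 1  # 0 at baseline, 1 at every follow-up (Eq. 3.5)
        row = {"id": subject_id, "y": y, "time": time, "x_time": x * time}
        for j in (1, 2, 3):  # dummy variables per follow-up (Eq. 3.6)
            dummy = 1 if k == j else 0
            row[f"time{j}"] = dummy
            row[f"x_time{j}"] = x * dummy
        rows.append(row)
    return rows


# Hypothetical subjects: one complete case in the intervention group and
# one control subject with only a baseline measurement.
complete = to_long_alternative(1, 1, [3.2, 3.0, 2.9, 2.8])
baseline_only = to_long_alternative(2, 0, [3.6, None, None, None])
for record in complete + baseline_only:
    print(record)
```

Two features of the method are visible in this sketch. All interaction columns are 0 in the baseline record, also for subjects in the intervention group, so both groups share the intercept at baseline. In addition, a subject with only a baseline measurement still contributes one record to the analysis.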

First, the alternative repeated measures mixed model analysis is applied to estimate the overall intervention effect on average over time. The model only includes time (coded 0 for the baseline value and 1 for all follow-up measurements; see Table 3.5) and the interaction between the intervention variable and time. Output 3.12 shows the result of this analysis.

Output 3.12 Results of the alternative longitudinal mixed model repeated measures analysis

From Output 3.12, the most important estimate is the regression coefficient for the interaction between the intervention variable and time (−0.1503951). This coefficient indicates the difference between the intervention and control groups on average over time. Because the intervention variable is not present in the model, β0 (3.367597) is an estimate of the outcome (i.e., complaints) for the whole population when the time variable equals 0, which in this situation is the baseline value (see Table 3.5). Because it is an estimate for the whole population, the baseline value is assumed to be equal for both groups, which in turn implies that the analysis is adjusted for the baseline value.

To get effect estimates of the intervention at the three follow-up measurements, for each follow-up measurement, a time dummy variable must be used, and for all these three dummy variables, an interaction with the intervention variable must be added to the model. Again, the intervention variable itself is not part of the model (see Eq. 3.6). Output 3.13 shows the results of this analysis.

Output 3.13 Results of the alternative longitudinal mixed model repeated measures analysis including an interaction between intervention and time

The regression coefficients of interest from Output 3.13 are the three regression coefficients for the interactions between the intervention variable and the time dummy variables. These regression coefficients directly provide the effect estimates for the intervention at the three follow-up measurements. The regression coefficient for the interaction between the intervention variable and the dummy variable for the first follow-up measurement (−0.1077524) indicates the intervention effect at the first follow-up measurement, and the regression coefficient for the interaction between the intervention variable and the dummy variable for the second follow-up measurement (−0.1579727) indicates the intervention effect at the second follow-up measurement, while the regression coefficient for the interaction between the intervention variable and the dummy variable for the third follow-up measurement (−0.1852883) indicates the intervention effect at the third follow-up measurement. All these effect estimates are adjusted for the baseline value, because (again) the intervention variable itself is not added to the model. A nice advantage of the analysis performed is that for all effect estimates at the different follow-up measurements, the corresponding standard errors are estimated directly, and, therefore, the 95% confidence intervals around the effect estimates and the corresponding p-values are directly provided by Output 3.13. So, it is not necessary to reanalyze the data with different reference categories for the different follow-up measurements.

3.4.3 Analysis of Changes

In the third method to analyze RCT data with more than one follow-up measurement, it is not the observed values at the different follow-up measurements that are analyzed but the changes between the baseline measurement and the first follow-up measurement, between the baseline measurement and the second follow-up measurement, and between the baseline measurement and the third follow-up measurement (Eq. 3.7):

$$ {Y}_t-{Y}_{t0}={\beta}_0+{\beta}_1X $$
(3.7)

where Yt = outcome measured at the follow-up measurements, Yt0 = outcome measured at baseline, X = intervention variable, and β1 = overall intervention effect.

Although it is sometimes suggested that the analysis of changes takes into account the difference between the groups at baseline, this is not the case (see Sect. 2.1). Therefore, this method can also be performed with an adjustment for the baseline value of the outcome variable (Eq. 3.8):

$$ {Y}_t-{Y}_{t0}={\beta}_0+{\beta}_1X+{\beta}_2{Y}_{t0} $$
(3.8)

where Yt = outcome measured at the follow-up measurements, Yt0 = outcome measured at baseline, X = intervention variable, and β1 = overall intervention effect.

As in all other discussed methods, the model can be extended with time and the interaction between the intervention variable and time to estimate the effect of the intervention at the different follow-up measurements (Eqs. 3.9 and 3.10):

$$ {Y}_t-{Y}_{t0}={\beta}_0+{\beta}_1X+{\beta}_2{time}_2+{\beta}_3{time}_3+{\beta}_4X\times {time}_2+{\beta}_5X\times {time}_3 $$
(3.9)
$$ {Y}_t-{Y}_{t0}={\beta}_0+{\beta}_1X+{\beta}_2{Y}_{t0}+{\beta}_3{time}_2+{\beta}_4{time}_3+{\beta}_5X\times {time}_2+{\beta}_6X\times {time}_3 $$
(3.10)

where Yt = outcome measured at the follow-up measurements, Yt0 = outcome measured at baseline, X = intervention variable, β1 = intervention effect at the first follow-up measurement, and time2, time3 = dummy variables for the second and third follow-up measurement.

The overall intervention effect and the intervention effects at the three follow-up measurements can be obtained in the same way as has been described for the longitudinal analysis of covariance (see Sect. 3.4.1). Table 3.6 shows the structure of the data used to estimate the parameters of the analysis of changes.

Table 3.6 Data structure needed to perform an analysis of changes

Output 3.14 shows the result of the mixed model analysis performed on the change scores in the outcome variable. The three change scores are calculated as the difference between the baseline value and the three follow-up measurements (see Eq. 3.7 and Table 3.6).
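The construction of the change scores can be sketched as follows. The function name `to_long_changes`, the column names, and the outcome values are hypothetical and only serve as an illustration; they are not taken from Table 3.6. The sign convention follows Eq. 3.7, i.e., follow-up value minus baseline value.

```python
def to_long_changes(subject_id, x, baseline, follow_ups):
    """Build one record per available follow-up with the change score.

    follow_ups: [follow-up 1, follow-up 2, follow-up 3]; None = missing.
    The change score is Y_t - Y_t0, as in Eq. 3.7.
    """
    rows = []
    if baseline is None:
        return rows  # without a baseline value no change score can be calculated
    for j, y in enumerate(follow_ups, start=1):
        if y is None:
            continue  # no change score for a missing follow-up
        rows.append({
            "id": subject_id,
            "x": x,                  # intervention variable
            "time": j,               # 1, 2, 3 = follow-up measurement
            "change": y - baseline,  # outcome of Eqs. 3.7 to 3.10
            "baseline": baseline,    # covariate in Eqs. 3.8 and 3.10
        })
    return rows


# Hypothetical subject with a missing second follow-up measurement.
records = to_long_changes(1, 1, 3.0, [2.5, None, 2.0])
for record in records:
    print(record)
```

In contrast to the alternative repeated measures analysis, a subject needs a baseline value and at least one follow-up measurement to contribute to this analysis, and each subject contributes at most three records.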

Output 3.14 Results of the longitudinal mixed model analysis of changes

The output of the longitudinal mixed model analysis of changes looks similar to the outputs of the mixed model analyses performed earlier. From the first part of Output 3.14, it can be seen that there is a maximum of three observations per subject, i.e., the three change scores between the baseline measurement and the three follow-up measurements. In the second part of the output (the fixed part of the model), the regression coefficients are given. The regression coefficient for the intervention variable (−0.0522368) indicates the overall intervention effect on average over time. This intervention effect actually is the difference between the groups in the changes between the baseline measurement and the three follow-up measurements. In Chap. 2 it was already argued that analyzing change scores can lead to bias in the effect estimates due to regression to the mean. It was also argued that a solution to this problem is to adjust the analysis of the change score for the baseline value. It has been mentioned before that this solution can also be applied to the longitudinal analysis of change scores (see Eq. 3.8). Output 3.15 shows the result of the analysis.

Output 3.15 Results of the longitudinal mixed model analysis of changes adjusted for the baseline value

In the middle part of Output 3.15 (the fixed part of the model), it can be seen that an adjustment is made for the baseline value of the outcome variable. The regression coefficient of the intervention variable (−0.1419588) again indicates the overall intervention effect on average over time, i.e., the difference between the groups in the differences between the baseline value and the three follow-up measurements. This difference, however, is now adjusted for the baseline differences between the groups.

In the same way, it is of course also possible to obtain the effects of the intervention at the different follow-up measurements. To do so, the models have to be extended with time (i.e., two time dummy variables) and the interaction between the intervention variable and time (see Eqs. 3.9 and 3.10). Outputs 3.16 and 3.17 show the results of the two analyses. In the first analysis, there is no adjustment for the baseline value, while in the second analysis, the baseline value of the outcome is added to the model.

Output 3.16 Results of the longitudinal mixed model analysis of changes including an interaction between intervention and time

Output 3.17 Results of the longitudinal mixed model analysis of changes including an interaction between intervention and time, adjusted for the baseline value

As has been mentioned before, in Outputs 3.16 and 3.17, the regression coefficient for the intervention variable indicates the effect of the intervention at the reference time point, which is the first follow-up measurement. Without an adjustment for the baseline value, the intervention effect at the first follow-up measurement equals −0.0139447, while with an adjustment for the baseline value, the effect estimate equals −0.1036192. The difference illustrates nicely the importance of the adjustment for the baseline value, i.e., the adjustment for the baseline differences between the two groups.

As for all analyses with an interaction term, it is possible to calculate the effect estimates at the other two follow-up measurements from the analyses performed. To do so, the regression coefficient for the interaction between the particular time dummy variable and the intervention variable has to be added to the regression coefficient for the intervention variable itself. For instance, the intervention effect at the second follow-up measurement based on the analysis of changes with an adjustment for the baseline value (Output 3.17) equals −0.1036192 + (−0.040448) = −0.1440672. Although Outputs 3.16 and 3.17 can be used to calculate the effect estimates at the different time points, they cannot be used to calculate the standard errors of these estimates, and therefore they also cannot be used to calculate the 95% confidence intervals around the effect estimates and the corresponding p-values. To obtain the 95% confidence intervals and p-values, the analyses have to be redone with different reference categories for the time dummy variables. Outputs 3.18 to 3.21 show the results of these analyses, both without and with an adjustment for the baseline value.

Output 3.18 Results of the longitudinal mixed model analysis of changes including an interaction between intervention and time with the second follow-up measurement as reference time point

Output 3.19 Results of the longitudinal mixed model analysis of changes including an interaction between intervention and time with the third follow-up measurement as reference time point

Output 3.20 Results of the longitudinal mixed model analysis of changes including an interaction between intervention and time with the second follow-up measurement as reference time point, adjusted for the baseline value

Output 3.21 Results of the longitudinal mixed model analysis of changes including an interaction between intervention and time with the third follow-up measurement as reference time point, adjusted for the baseline value
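The reason why the sum of two regression coefficients does not come with a standard error is that this standard error also depends on the covariance between the two estimates, which a standard regression output typically does not display. The sketch below illustrates the general formula Var(b1 + b2) = Var(b1) + Var(b2) + 2 Cov(b1, b2). The two coefficients are the ones from Output 3.17 quoted in the text; the variances and the covariance are hypothetical values chosen purely for illustration and are not results from the example dataset, and the function name `combined_effect` is also only illustrative.

```python
import math


def combined_effect(b_x, b_interaction, var_x, var_interaction, cov):
    """Effect at a non-reference time point with its standard error
    and a 95% confidence interval based on the normal approximation."""
    estimate = b_x + b_interaction
    se = math.sqrt(var_x + var_interaction + 2 * cov)
    ci = (estimate - 1.96 * se, estimate + 1.96 * se)
    return estimate, se, ci


# Coefficients quoted in the text (Output 3.17).
b_x = -0.1036192            # intervention variable (first follow-up)
b_interaction = -0.040448   # intervention x second follow-up

# HYPOTHETICAL variances and covariance, for illustration only.
estimate, se, ci = combined_effect(
    b_x, b_interaction,
    var_x=0.0025, var_interaction=0.0030, cov=-0.0015,
)
print(estimate, se, ci)
```

Rerunning the analysis with another reference category, as is done in Outputs 3.18 to 3.21, avoids this manual step, because the effect at that time point then becomes a single regression coefficient with its own standard error.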

3.5 Overview and Discussion

Table 3.7 shows an overview of the results obtained from the different analyses in order to estimate the overall intervention effect on average over time, while Table 3.8 shows an overview of the results obtained from the different analyses in order to estimate the effect of the intervention at the different follow-up measurements.

Table 3.7 Overview of overall effect estimates on average over time, 95% confidence intervals (CI), and p-values obtained from the different analyses
Table 3.8 Overview of effect estimates at the different follow-up measurements, 95% confidence intervals (CI), and p-values obtained from the different analyses

From Tables 3.7 and 3.8, it is obvious that the effect estimates differ remarkably between the different methods used to estimate the intervention effect in an RCT with more than one follow-up measurement. This is partly caused by the observed differences at baseline between the groups. In Table 3.1 it could be seen that the baseline value for the intervention group was lower than the baseline value for the control group (3.25 for the intervention group and 3.47 for the control group). Because of that, the decrease over time in the intervention group is (much) harder to achieve than the decrease over time in the control group. The control group tends to decrease over time due to regression to the mean, while the intervention group tends to increase over time due to regression to the mean. Because of that, the analysis of changes without adjustment for the baseline leads to an underestimation of the intervention effects. The repeated measures analyses, on the other hand, lead to an overestimation of the effect estimates. In these analyses the differences between the groups at baseline are part of the estimated differences between the groups, i.e., are part of the effect estimates. Because the baseline differences between the groups are in favor of the intervention group (the intervention group has a lower complaint score at baseline than the control group), the effect estimates, which include the baseline difference, are (highly) overestimated.

It was already mentioned in Chap. 2 that regarding the adjustment for the baseline value, it does not matter whether the outcome variable is the observed value at the different follow-up measurements (i.e., longitudinal analysis of covariance) or the changes between the baseline measurement and the follow-up measurements (i.e., analysis of changes); the effect estimates are exactly the same in both methods. The mathematical equivalence between the two methods leading to the same estimation of the treatment effect was already explained in Chap. 2 (see Box 2.1).
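As a brief reminder of why this holds, the following sketch rearranges Eq. 3.8 by moving the baseline value to the right-hand side of the equation (it is a sketch in the notation of this chapter, not a reproduction of Box 2.1):

$$ {Y}_t-{Y}_{t0}={\beta}_0+{\beta}_1X+{\beta}_2{Y}_{t0}\kern1em \iff \kern1em {Y}_t={\beta}_0+{\beta}_1X+\left({\beta}_2+1\right){Y}_{t0} $$

The right-hand equation is a longitudinal analysis of covariance in which only the regression coefficient for the baseline value differs (by exactly 1). The intercept β0 and the intervention effect β1 are the same in both formulations.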

Although the general idea is the same, the results of the alternative repeated measures analysis without the intervention variable in the model (Eqs. 3.5 and 3.6) differed slightly from the results of the longitudinal analysis of covariance. The advantage of the alternative repeated measures analysis is that subjects with only a baseline measurement are also included in the analysis. So, in the present example, the two analyses are based on a slightly different population. However, even when the method is used in a dataset without any missing data, the results of the alternative repeated measures analysis are not exactly the same as the results obtained from a longitudinal analysis of covariance. This is caused by the adjustment for the dependency of the repeated observations within the subject by adding a random intercept to the model. In the repeated measures analysis using all measurements as outcome, this random intercept variance is mostly a bit higher than in the longitudinal analysis of covariance. In the latter, part of the random intercept variance is explained by the baseline value of the outcome, which is included in the model. However, in the present example, this is not the case. Another difference between the alternative repeated measures analysis and the longitudinal analysis of covariance is that the standard errors of the effect estimates are a bit lower in the alternative repeated measures analysis. This has to do with the fact that the alternative repeated measures analysis includes more observations in the analysis. In the alternative repeated measures analysis, all four measurements are used as outcome, while in the longitudinal analysis of covariance, only the three follow-up measurements are used as outcome. The lower standard error in the alternative repeated measures analysis may, however, be invalid, because the observations at baseline are not related to the intervention. And although the inclusion of too many observations is counteracted by the correlation between the repeated measurements (Twisk, 2013, 2018), it still leads to a slight underestimation of the standard error.

3.6 Recommendation

To estimate an intervention effect in an RCT with more than one follow-up measurement, the analysis has to be adjusted for the baseline value of the outcome variable. A proper adjustment is not achieved by performing a standard repeated measures analysis with the baseline value as part of the outcome variable or by the analysis of changes without adjusting for the baseline value. It is advised to use either a longitudinal analysis of covariance (or its mathematical equivalent, analysis of changes with an adjustment for the baseline value) or an alternative repeated measures analysis.

3.7 Should the Analysis Be Adjusted for Time?

In the literature there is some discussion about whether the analysis to obtain the overall effect of the intervention on average over time should be adjusted for the time variable. Some researchers believe that the time variable should always be part of the model. The main argument for this is that there is always a development over time in the outcome variable. So, time is related to the outcome, and, therefore, the analysis should be adjusted for the time variable. Although the first part of this argument is true (mostly there is a development over time in the outcome variable), it should be realized that adding a variable to a regression model can have an influence on the regression coefficient of interest only when the variable is related to both the outcome and the independent variable. In this case, the time variable is related to the outcome, but not to the independent variable. In a regular RCT, the intervention and control groups are measured at the same time points, so there is no relationship between the intervention variable and time. Therefore, the adjustment for the time variable in the analysis to obtain the overall effect of the intervention on average over time does not make sense.
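This argument can be illustrated with a minimal sketch. It uses ordinary least squares on a small, fully balanced, hypothetical dataset (two groups, three follow-up measurements, two observations per cell) and deliberately ignores the dependency of the repeated observations; it only serves to show that the regression coefficient of the intervention variable does not change when a variable that is unrelated to it is added to the model. The outcome values and the helper function `ols` are assumptions made for this illustration.

```python
import math


def ols(columns, y):
    """Least squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]


# HYPOTHETICAL outcomes per (group, follow-up measurement); balanced design.
cells = {
    (0, 1): [3.5, 3.3], (0, 2): [3.4, 3.1], (0, 3): [3.2, 3.0],
    (1, 1): [3.2, 3.1], (1, 2): [3.0, 2.9], (1, 3): [2.9, 2.6],
}
x, time, y = [], [], []
for (group, t), values in cells.items():
    for value in values:
        x.append(group)
        time.append(t)
        y.append(value)

b_without_time = ols([x], y)[1]      # model: y ~ X
b_with_time = ols([x, time], y)[1]   # model: y ~ X + time
print(b_without_time, b_with_time)
```

Because both groups are measured at the same time points, the intervention variable and time are uncorrelated in this dataset, and the two coefficients coincide apart from rounding. In a dataset with missing observations, the balance is no longer perfect, and small differences can occur.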

3.8 Alternative Repeated Measures for the Analysis of an RCT with One Follow-Up Measurement

The alternative repeated measures analysis (i.e., the mixed model analysis with both the baseline and the follow-up measurements as outcome and without the intervention variable as independent variable) can also be used in the example with only one follow-up measurement (see Table 2.2). Output 3.22 shows the results of this analysis performed on the example with only one follow-up measurement for total serum cholesterol (Output 3.22a) and the physical activity index (Output 3.22b).

Output 3.22a Results of the alternative repeated measures mixed model analysis for total serum cholesterol in the example with only one follow-up measurement

Output 3.22b Results of the alternative repeated measures mixed model analysis for the physical activity index in the example with only one follow-up measurement

The two effect estimates can be directly derived from the outputs of the alternative repeated measures analyses. For total serum cholesterol, the effect estimate equals −0.141 with a 95% confidence interval ranging from −0.296 to 0.014 and a corresponding p-value of 0.07. For the physical activity index, the effect estimate equals 0.308 with a 95% confidence interval ranging from 0.130 to 0.486 and a corresponding p-value <0.001. In Chap. 2, the effect estimates of this example were based on an analysis of covariance, and they were −0.137 and 0.347, respectively. The (small) differences are due to the fact that in the alternative repeated measures analysis, subjects with only a baseline value are included in the analysis, while in the analysis of covariance, they are not. Therefore, the number of observations analyzed with the two methods is different. From Output 3.22 it can be seen that the numbers of subjects included in the two alternative repeated measures analyses were 299 and 297, respectively, while in Chap. 2 (Outputs 2.1 and 2.2) it could be seen that the numbers of subjects analyzed in the analysis of covariance were 222 and 217, respectively.