Introduction

Regression models with change points have a long history [1–4]. With the availability of accessible computing power, they experienced a resurgence that led to numerous theoretical and statistical advances, making piecewise linear regression splines easy and straightforward to apply [5–11]. Piecewise linear regression splines have been applied in fields ranging from epidemiology to genetics [12, 13].

Patient-reported health-related quality-of-life (HRQoL) measures have the potential to act as drivers for better organization and improvement of health care [14, 15].

Despite widespread usage of self-assessed HRQoL measures, there is no consensus in the scientific community on how to best obtain clinically and economically meaningful effect measures from them. The premise is that an intervention will likely change the self-assessed HRQoL of the patients. However, the degree of change may depend on the HRQoL prior to the intervention. Patients with poor preoperative HRQoL might experience substantial improvement, while patients with higher preoperative self-assessed HRQoL might show only a minute increase or even stagnation.

A common use of the generic EuroQol 5 dimension HRQoL questionnaire (EQ-5D) [16] is to compare populations over time and to gauge the cost-effectiveness of medical treatments [17]. When measuring HRQoL with the EQ-5D index, this effect may be exaggerated by the floor and ceiling of the EQ-5D index. According to the British time-trade-off (TTO) value set, the index is bounded between −0.594 and 1.000 [18]. Researchers who aim to understand the relationship between HRQoL measurements prior to and after clinical interventions have to ensure that the predicted scores are in the appropriate range.

In this paper, we explore the use of piecewise linear regression splines for the analysis of HRQoL measures, with specific interest in EQ-5D index profiles. Regression models with change points assume that the structural relationship between the outcome and the predictor differs between patient groups within the dataset [19, 20]. Piecewise linear regression splines have the advantage of fitting separate regression equations to different subsets of the data. Providing a vivid description of changes in HRQoL measures will aid clinicians and researchers trying to understand the influence of an intervention on patient-reported HRQoL. We illustrate the applicability of our proposed method with pre- and post-operative EQ-5D indices of 36,625 Swedish patients treated with total hip arthroplasty (THA).

Methods

Data collection and measures

In Sweden, patients undergoing THA are routinely monitored by a patient-reported outcome measures (PROMs) programme run by the Swedish Hip Arthroplasty Register. Patients are contacted preoperatively by their clinic and 1 year post-operatively by mail to complete the surveys. Response rates are 86 % preoperatively, and 90 % of the patients who return the preoperative questionnaire also participate at the one-year follow-up [21]. To achieve high response rates, the questionnaire is kept short, comprising a visual analogue scale for hip pain, the Charnley classifier for musculoskeletal comorbidity [22, 23], and the HRQoL measure EQ-5D (EuroQol Group 1990). The EQ-5D consists of five dimensions measuring mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension has three answer options of increasing severity: (1) no problems, (2) moderate problems, and (3) extreme problems. This yields \(3^{5} = 243\) unique health states, which can be used to calculate a single, weighted health utility index where 1 is perfect HRQoL, 0 represents death, and scores <0 are health states worse than death. Patients who respond that they have ‘extreme problems’ in all five dimensions receive the lowest possible score of −0.594. Different valuation sets are used to adjust for cultural response pattern differences. Lacking a specific Swedish value set, the Swedish Hip Arthroplasty Register uses the British TTO value set to calculate the EQ-5D index in their population. The EQ-5D is commonly utilized in the clinical setting as a general health measure because it is short, is easy for patients to interpret, and has been validated in over 150 languages; it is also frequently used to conduct cost-effectiveness analyses. Extensive published work confirms the validity and reliability of this tool in clinical and research settings [24–26].
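The scoring logic described above can be sketched in a few lines. The decrements below are the values commonly reported for the British TTO tariff; they are not given in this paper and are an assumption that should be verified against the official value set before any real use. The function and variable names are hypothetical.

```python
from itertools import product

# Decrements of the British TTO tariff as commonly reported (assumption:
# verify against the official value set before any real analysis).
# Each tuple holds the decrement for level 2 and level 3 of a dimension.
DECREMENTS = {
    "mobility": (0.069, 0.314),
    "self_care": (0.104, 0.214),
    "usual_activities": (0.036, 0.094),
    "pain_discomfort": (0.123, 0.386),
    "anxiety_depression": (0.071, 0.236),
}
CONSTANT = 0.081  # applied once if any dimension is above level 1
N3 = 0.269        # applied once if any dimension is at level 3


def eq5d_index(state):
    """Return the index for a 5-tuple of levels (1, 2, or 3)."""
    if len(state) != len(DECREMENTS) or any(lv not in (1, 2, 3) for lv in state):
        raise ValueError("state must hold five levels, each 1, 2, or 3")
    index = 1.0
    if any(lv > 1 for lv in state):
        index -= CONSTANT
    if any(lv == 3 for lv in state):
        index -= N3
    for level, (level2, level3) in zip(state, DECREMENTS.values()):
        if level == 2:
            index -= level2
        elif level == 3:
            index -= level3
    return round(index, 3)


# Enumerate every combination of three levels across five dimensions.
states = list(product((1, 2, 3), repeat=5))
indices = [eq5d_index(s) for s in states]
```

Under these assumed decrements, the enumeration covers all 243 health states, and the index for ‘extreme problems’ in all five dimensions matches the floor of −0.594 quoted in the text.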

Data from 36,625 patients with a diagnosis of primary osteoarthritis who underwent primary THA between 2002 and 2011 and had complete preoperative and one-year post-operative PROMs were selected from the Swedish Hip Arthroplasty Register. Each patient contributed two EQ-5D indices: one before and one a year after total hip arthroplasty.

Primary osteoarthritis is the main reason for THA surgeries, accounting for around 82 % of cases. Fracture patients, who account for around 10 % of the cases, were excluded as they lack preoperative PROM values. We included patients with the most common surgical techniques. If a patient had arthroplasty on both hips during the study period, only the first operation with complete PROMs was selected. The exclusions were made to reduce residual confounding, and these data were selected for illustrative purposes. Ethical review approval was obtained from the Central Ethical Review Board in Gothenburg, Sweden (decision 293-13).

Model formulation

The effect of preoperative EQ-5D index values on the post-operative EQ-5D index values was explored with four regression models: an intercept-only null model, a single-line regression, and segmented regressions with 1 and 2 change points. The post-operative EQ-5D index served as the outcome for the regression modelling, while the preoperative EQ-5D index was the predictor.

Generally, there are two situations in which the relationship between the outcome and predictors is best described with segmented models: a continuous piecewise model, with regression lines of different slopes connected at unknown change points, or a discontinuous regression line that jumps at the change point values.

In the following, we restrict our attention to the one change point regression equation, where the outcome is modelled as

$$y_{i} = \begin{cases} \beta_{0}^{(1)} + \beta_{1}^{(1)} x_{i} + \varepsilon_{i1} & \text{if } x_{i} < r \\ \beta_{0}^{(2)} + \beta_{1}^{(2)} x_{i} + \varepsilon_{i2} & \text{if } x_{i} > r \end{cases}$$
(1)

where \(r\) is the unknown change point, and \(\beta_{0}^{(1)}, \beta_{1}^{(1)}\) and \(\beta_{0}^{(2)}, \beta_{1}^{(2)}\) are the intercept and slope for the equations on the left and right side of the change point, respectively. Each of the model parameters (change point, intercept, and slope) was estimated from the data.

If we assume that the relationship between the outcome and predictor is continuous, then the following restriction is imposed

$$\beta_{0}^{(1)} + \beta_{1}^{(1)} r = \beta_{0}^{(2)} + \beta_{1}^{(2)} r$$
(2)

If the continuous model is assumed, then Eq. 1 can be written as

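$$y_{i} = \beta_{0} + \beta_{1}^{*} x_{i} + \beta_{r}^{*} (x_{i} - r)I(x_{i} > r) + \varepsilon_{i}$$
(3)

where \(\beta_{0} = \beta_{0}^{(1)}\), \(\beta_{1}^{*} = \beta_{1}^{(1)}\), \(\beta_{r}^{*} = \beta_{1}^{(2)} - \beta_{1}^{(1)}\), and \(\beta_{0}^{(2)} = \beta_{0}^{(1)} + r(\beta_{1}^{(1)} - \beta_{1}^{(2)})\). In this case, \(I(x_{i} > r)\) is an indicator function which takes a value of 1 if the condition is met; otherwise, it is 0. With this indicator, the equation reduces to the left-hand line of Eq. 1 for \(x_{i} < r\), while for \(x_{i} > r\) the intercept and slope become \(\beta_{0}^{(2)}\) and \(\beta_{1}^{(2)}\). The superscript asterisk (*) marks the parameters of this reparameterized model; \(\beta_{r}^{*}\) is the difference between the two slopes. For the matrix notation, we refer the reader to the Supplementary materials.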
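As a quick numerical check of this reparameterization, the sketch below evaluates the two-regime form (Eq. 1 with the continuity restriction of Eq. 2) and the single-equation form whose hinge term is active to the right of the change point, and compares them on a grid. The parameter values and function names are hypothetical and chosen only for illustration.

```python
def two_regime(x, b0_left, b1_left, b1_right, r):
    """Eq. 1 with the continuity restriction of Eq. 2 imposed."""
    # Eq. 2 fixes the right-hand intercept once the other parameters are known.
    b0_right = b0_left + r * (b1_left - b1_right)
    if x < r:
        return b0_left + b1_left * x
    return b0_right + b1_right * x


def hinge_form(x, b0, b1_star, br_star, r):
    """Single-equation form: the hinge term is active right of r."""
    return b0 + b1_star * x + br_star * max(x - r, 0.0)


# Hypothetical parameter values chosen only for illustration.
b0_left, b1_left, b1_right, r = 0.7, 0.5, 0.1, 0.2
grid = [i / 10 for i in range(-5, 11)]

# Largest absolute difference between the two forms over the grid.
max_gap = max(
    abs(two_regime(x, b0_left, b1_left, b1_right, r)
        - hinge_form(x, b0_left, b1_left, b1_right - b1_left, r))
    for x in grid
)
```

The two forms agree up to floating-point error, which is why the continuous model can be fitted as one regression with an extra hinge covariate once the change point is fixed.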

Parameter estimation and statistical inference require the definition of a loss function, either likelihood or least squares. Conditional on the change points, the density function for the outcome \(y_{i}\) will be

$$f( \cdot |{\varvec{\upbeta}}_{r} ) = f_{1} ( \cdot |{\varvec{\upbeta}}_{r} )^{I(x < r)} f_{2} ( \cdot |{\varvec{\upbeta}}_{r} )^{I(x > r)}$$
(4)

and the log-likelihood function will be

$$\ell ({\varvec{\upbeta}}_{r} ) = \sum\limits_{i = 1}^{n} {\left\{ {I(x < r){ \ln }f_{1} ( \cdot |{\varvec{\upbeta}}_{r} ) + I(x > r){ \ln }f_{2} ( \cdot |{\varvec{\upbeta}}_{r} )} \right\}}.$$
(5)

The density functions are assumed to be Gaussian. Parameter estimates are given by the \({\varvec{\upbeta}}_{r}\) that maximizes the log-likelihood. There is no single best method for optimization; the choice is influenced by the assumed structural form, whether continuous or not [19]. We chose the linearization proposed by Muggeo [27], which does not assume constant variance across regions. Muggeo’s linearization provides simultaneous parameter estimation and statistical inference, and its estimates are more accurate than those of the possible alternatives [19]. It should be noted that the EQ-5D index data violate the normality assumption; thus, in small samples, extra caution is warranted. Computer-intensive methods such as bootstrapping might be useful to consider with this kind of data.
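To make the estimation problem concrete, the sketch below profiles the residual sum of squares of the continuous one-change-point model over a grid of candidate change points. This is a plain grid search, not Muggeo's linearization, and it runs on simulated data rather than the register data; all function names and simulation settings are assumptions for illustration.

```python
import random


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    coef = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(m[row][k] * coef[k] for k in range(row + 1, n))
        coef[row] = (m[row][n] - tail) / m[row][row]
    return coef


def fit_given_r(x, y, r):
    """Least-squares fit of intercept, slope, and hinge term for a fixed r."""
    design = [(1.0, xi, max(xi - r, 0.0)) for xi in x]
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(3)] for i in range(3)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(3)]
    coef = solve3(xtx, xty)
    rss = sum((yi - sum(c * v for c, v in zip(coef, d))) ** 2
              for d, yi in zip(design, y))
    return coef, rss


def grid_search(x, y, candidates):
    """Return (rss, r, coef) for the candidate r with the smallest RSS."""
    best = None
    for r in candidates:
        coef, rss = fit_given_r(x, y, r)
        if best is None or rss < best[0]:
            best = (rss, r, coef)
    return best


# Simulated data (not the register data): true change point at 0.2,
# slope 0.5 to the left and 0.1 to the right.
rng = random.Random(1)
x = [rng.uniform(-0.5, 1.0) for _ in range(2000)]
y = [0.7 + 0.5 * xi - 0.4 * max(xi - 0.2, 0.0) + rng.gauss(0.0, 0.05) for xi in x]
candidates = [i / 100 for i in range(-30, 81)]
rss, r_hat, coef = grid_search(x, y, candidates)
```

A grid search is easy to follow but only resolves the change point to the grid spacing and gives no standard error for it, which is one reason a dedicated routine such as the one in the ‘segmented’ package is preferable in practice.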

Model selection

Model selection was based on the Bayesian Information Criterion that penalizes additional parameters with a factor of ln(n), where n is the sample size. Once the working model was selected, the preoperative index values were used to calculate the range of predicted post-operative index values as a means to describe how well the model fit the data.
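For Gaussian models, the criterion can be computed from the residual sum of squares up to an additive constant that is the same for all models fitted to the same data. The sketch below uses that form; the residual sums of squares and the parameter counts are hypothetical, and whether to count the change point and the residual variance as parameters is a convention that should be applied consistently across the models being compared.

```python
import math


def gaussian_bic(rss, n, k):
    """BIC for a Gaussian model, up to a constant shared by all models.

    rss: residual sum of squares, n: sample size,
    k: number of estimated parameters (a change point counts as one here).
    """
    if rss <= 0 or n <= 0:
        raise ValueError("rss and n must be positive")
    return n * math.log(rss / n) + k * math.log(n)


# Hypothetical residual sums of squares, for illustration only.
n = 1000
bic_line = gaussian_bic(rss=52.0, n=n, k=3)       # intercept, slope, variance
bic_segmented = gaussian_bic(rss=50.0, n=n, k=5)  # adds slope change and change point
preferred = "segmented" if bic_segmented < bic_line else "single line"
```

The model with the lower value is preferred; each additional parameter must reduce the first term by more than ln(n) to be retained.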

Model validation

To ascertain whether the model would accurately predict patient outcomes, we utilized a validation procedure based on the 0.632 bootstrap [28, 29]. A total of 1,000 bootstrap resamples were drawn from the original dataset. In each draw with replacement, a given observation is missed with probability 1 − 1/n, where n is the sample size, so it appears in a resample of size n with probability \(1 - (1 - 1/n)^{n} \approx 0.632\). The models are refitted on the bootstrapped data with the original coefficient estimates. Differences in \(R^{2}\) values between the original and resampled data denote over-optimism in the estimation. Substantial over-optimism indicates over-fitting, that is, the fit of a complex model cannot be adequately replicated in subsequent samples or resamples. For additional details, we refer the reader to the book by Harrell [30].
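The inclusion probability behind the method's name can be checked numerically. The sketch below computes it exactly for the sample size of this study and compares a small simulation against the formula; the function names, the simulated sample size, and the seed are assumptions for illustration, and the code does not reproduce the validation procedure itself.

```python
import math
import random


def inclusion_probability(n):
    """Chance that a given observation appears in a resample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def simulated_unique_fraction(n, n_resamples, seed=0):
    """Average share of distinct observations across bootstrap resamples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_resamples):
        # Draw n indices with replacement and keep the distinct ones.
        drawn = {rng.randrange(n) for _ in range(n)}
        total += len(drawn) / n
    return total / n_resamples


p_exact = inclusion_probability(36625)   # sample size used in this study
p_limit = 1.0 - math.exp(-1.0)           # large-sample limit, about 0.632
p_simulated = simulated_unique_fraction(500, 200)
```

Roughly 63 % of the observations therefore enter each resample, and the remainder are left out and can serve as a check on the fit.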

Adjustment

To see if the estimated change point withstands adjustment for confounders, we adjusted the model for two background variables (age and gender), a clinical variable (surgical approach) and the preoperative Charnley classifier for musculoskeletal comorbidities. Preoperative EQ-5D index and age were modelled as piecewise linear splines.

Software

Calculations were run in the R computing environment (R 3.0.1, R Core Team 2013) [31], and we used the ‘segmented’ package for change point estimation [27, 32].

Results

Of the 36,625 patients studied, 30,807 (84.1 %) showed improved post-operative EQ-5D indices, 2,557 (6.9 %) did not experience any change, while 3,261 (9.0 %) deteriorated in HRQoL. Regression modelling and model selection based on BIC suggested that the two-line regression provided the best fit for the data (Table 1). The estimated change point was at an EQ-5D index value of 0.159 (95 % CI 0.135–0.182), as illustrated in Fig. 1. We did not observe any support for a three-line regression: estimation of two change points failed, and the algorithm did not produce parameter estimates.

Table 1 Parameters of the three competing models considered for modelling the changes in post-operative EQ-5D index as a function of the preoperative index values
Fig. 1 Change point estimation by minimizing the residual sum of squares with the estimated change point of a two-segment piecewise linear regression spline and the associated 95 % confidence intervals

For the purpose of reporting, individuals with a preoperative EQ-5D index of 0.159 or less were classified as having low preoperative HRQoL (n = 13,157 (35.9 %)) and those over 0.159 as having high preoperative HRQoL (n = 23,468 (64.1 %)).

In the low preoperative group (with preoperative EQ-5D index values from −0.594 to 0.159), the predicted post-operative EQ-5D index values ranged from 0.368 to 0.765, with post-operative EQ-5D index values increasing by 0.528 (95 % CI 0.480–0.576) points for each unit increase in the preoperative EQ-5D index. In the high preoperative group (initial range from 0.159 to 1.000), the predicted post-operative EQ-5D index values ranged from 0.765 to 0.855, increasing by 0.106 (95 % CI 0.091–0.121) points for each unit increase in the preoperative EQ-5D index (Fig. 2). Table 2 summarizes the predicted post-operative EQ-5D index values based on preoperative EQ-5D index values with simple linear regression and piecewise linear regression splines. The increment was set at 0.1; the expected increase per 0.1-unit change in preoperative EQ-5D index was 0.018 with simple linear regression. With piecewise linear regression splines, the expected increase per 0.1-unit change in preoperative EQ-5D index was 0.052 for values under 0.159 and 0.01 for values of 0.159 and higher.
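The reported estimates can be turned into a small prediction function by anchoring both segments at the change point, where the predicted post-operative index is 0.765. The sketch below uses only the rounded figures quoted in this section, so it reproduces the reported range to within rounding rather than exactly; the function name and the choice of anchoring are assumptions for illustration.

```python
CHANGE_POINT = 0.159      # estimated change point reported in the Results
VALUE_AT_CHANGE = 0.765   # predicted post-operative index at the change point
SLOPE_LOW = 0.528         # slope below the change point
SLOPE_HIGH = 0.106        # slope above the change point


def predict_postoperative(preoperative):
    """Predicted post-operative EQ-5D index from the rounded estimates."""
    if not -0.594 <= preoperative <= 1.0:
        raise ValueError("preoperative index outside the value-set range")
    slope = SLOPE_LOW if preoperative <= CHANGE_POINT else SLOPE_HIGH
    return VALUE_AT_CHANGE + slope * (preoperative - CHANGE_POINT)


lowest = predict_postoperative(-0.594)   # floor of the preoperative index
highest = predict_postoperative(1.0)     # ceiling of the preoperative index
# Expected change in the prediction per 0.1-unit step within each segment.
step_low = predict_postoperative(0.1) - predict_postoperative(0.0)
step_high = predict_postoperative(0.6) - predict_postoperative(0.5)
```

Because the slopes are positive and the steeper one applies only below the change point, every prediction from this function stays well inside the admissible range of the index.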

Fig. 2 Piecewise linear regression spline modelling of pre- and post-operative EQ-5D indices with the associated 95 % confidence intervals. The relationship is best described by a two-segment regression equation joined at a preoperative EQ-5D index change point of 0.159

Table 2 Predicted post-operative EQ-5D index values and associated 95 % confidence intervals based on simple linear regression and by two-segment piecewise linear regression splines

The predictive power of the model was 6.59 %. The model validation procedure did not identify problems with over-fitting. The mean predictive power of the model was 6.60 % in the training set and 6.55 % in the validation set, indicating an over-estimation of 0.75 %.

Adjustment for important confounders did not induce a substantial change in the estimated change point and resulted in minor changes in the slope coefficients. BIC indicated that a model with one change point for the EQ-5D index at 0.158 (95 % CI 0.133–0.183) and one change point for age at 68.98 years (95 % CI 66.73–71.23) offered the best fit (Table 3). The model’s predictive power was 12.67 %. This was largely due to the inclusion of Charnley classification (partial \(R^{2}\) = 4.69 %); despite being statistically significant, age, gender, and surgical approach had negligible influence.

Table 3 Parameters of the multivariate model with post-operative EQ-5D index as the outcome and preoperative EQ-5D index as the exposure

Discussion

In this paper, we demonstrate that piecewise linear regression splines provide a better model fit and better predictive power than simple linear regression when HRQoL measures such as the EQ-5D index are analysed in a nationwide cohort of THA patients. The selected model accounts for the fact that the same post-operative effect is not expected throughout the entire range of the preoperative EQ-5D indices. An added benefit of the different slopes is that predicted post-operative scores based on the preoperative scores are more likely to be in the appropriate range. Thus, piecewise linear regression splines are more likely than simple linear regression to conform to the floor and ceiling effects, although they cannot explicitly take these limits into account. Alternatives like nonlinear least squares regression not only explicitly consider the existence of the limits but can also estimate their values from the data [33, 34]. However, these models were derived for modelling growth rates and disease progression, so the relationship that they assume between the predictor and outcome might not be appropriate for HRQoL studies. Censored or truncated regression models may also consider the existence of floor and ceiling values [35, 36], which can be set by the researcher, yet they assume a constant effect of the predictor throughout the whole range of outcome scores. Regression splines such as restricted cubic splines do not necessarily consider the existence of floor and ceiling values, but they will likely conform to the data well. The main drawback of such splines is the lack of presentable and interpretable effect measures. Nonlinear splines are routinely reported in graphical form, thus hampering between-study comparisons and/or meta-analyses. An added challenge with nonlinear splines is the lack of generally accepted and proven statistical routines for statistical inference. Classic tests such as the ANOVA are hard to use and interpret with splines, and information-theoretic criteria (AIC or BIC) might not guard satisfactorily against artefacts.

Piecewise linear regression splines have attractive mathematical and statistical characteristics, allowing researchers to interpret the change point as a distinguishing characteristic between patient groups preoperatively. The change they model is abrupt, whereas nonlinear splines might offer a smoother transition. However, only piecewise linear regression splines can explicitly estimate the location of the points at which the structural relationship between predictor and outcome changes. In this paper, we focused on the relationship between preoperative and post-operative values of the same measure. As we illustrated with age, this method also fits well with other exposures. Although the relationship between age and post-operative EQ-5D index was very weak, we were able to identify that patients aged over 69 years benefited less from THA than younger patients. Considering further measures such as BMI and applying them to large cohorts might lead to the determination of important thresholds that have the potential to aid in clinical decision-making.

Analysis of change scores (post-operative value minus the preoperative value) with preoperative values as a predictor still prevails in the literature; however, it can lead to inaccurate results because the change score is, by construction, negatively correlated with the preoperative data [37, 38].
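The negative correlation follows from a short derivation. Writing \(x\) for the preoperative and \(y\) for the post-operative value, and assuming for illustration only that both have the same variance \(\sigma^{2}\) and a correlation \(\rho < 1\), the covariance between the change score and the preoperative value is

$$\text{Cov}(y - x, x) = \text{Cov}(y, x) - \text{Var}(x) = \rho \sigma^{2} - \sigma^{2} = (\rho - 1)\sigma^{2} < 0$$

Without the equal-variance assumption, the covariance is \(\rho \sigma_{x} \sigma_{y} - \sigma_{x}^{2}\), which is negative whenever \(\rho \sigma_{y} < \sigma_{x}\).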

Piecewise linear regression splines can be applied as a stand-alone statistical analysis, and interpretation of the regression coefficients can offer essential insights. In this analysis, we show that the slopes of the regression lines represent the influence of preoperative HRQoL on post-operative recovery and general wellbeing. The coefficient of determination (\(R^{2}\)) indicates how much of the variation in the post-operative EQ-5D index is explained by the preoperative HRQoL. In our case, around 6.5 % of the variation in post-operative EQ-5D index was explained by preoperative HRQoL. This suggests that factors other than preoperative HRQoL also need to be considered. Adjusting the model for age, gender, surgical approach, and preoperative musculoskeletal comorbidity (Charnley classification) increased the coefficient of determination, mostly due to the addition of Charnley classification.

Often, the purpose of studying HRQoL is to compare alternative treatments or approaches. In most cases, patients cannot be randomly allocated to different treatment types as they would be in a randomized clinical trial. Thus, to be able to compare alternative treatments for the same condition, we need to conduct proper risk adjustments [39], and the better we describe the relationship between the outcome and predictors, the less likely we are to induce bias in the parameter of interest [40]. Piecewise linear regression splines offer a simple and easy-to-interpret approximation of this unknown relationship. Alternatives, such as nonparametric regression or generalized additive models, might provide a better mathematical fit. However, we believe that the simplicity and ease of interpretation of piecewise linear regression outweigh the gains in goodness of fit of the aforementioned methods. It is intuitive that patients who report poor preoperative HRQoL (those with low preoperative EQ-5D index scores) have more room for improvement on the EQ-5D index scale after treatment. In contrast, patients who report ‘no problems’ in many of the five dimensions before treatment have little to no capacity for improvement when measured by the EQ-5D index. Thus, even if the perceived improvement in HRQoL is substantial and the operation was life changing, this improvement will induce little to no change in the EQ-5D index. By dividing the patients into two (or more) groups, we can have a better understanding of the benefits of a given treatment.

Conclusions

Piecewise linear regression splines are a useful and practical approach to analysing and reporting HRQoL measures such as the EQ-5D index. The main appeal of segmented models lies in the easy interpretability of the influence of the preoperative HRQoL on post-operative scores in different patient subgroups. Piecewise linear regression splines may provide a reasonable approximation of the shape of the underlying pre- and post-operative HRQoL relationship where the threshold and slopes prove informative and meaningful [32].