Introduction

Time-to-event outcomes have often been the primary endpoint in clinical trials, with health-related quality of life (HRQoL) relegated to a secondary focus. However, it is now widely recognized that attention must be paid to both the “quantity” and the quality of life. Generally, HRQoL data are collected during trials through standardized questionnaires completed by patients at selected time points of clinical interest (e.g., baseline, and before and after treatment).

HRQoL and time-to-event analyses are usually done separately using linear mixed models (LMMs) and survival models, respectively. However, this approach may lead to inefficient or biased results [1,2,3]. In particular, when studying the HRQoL longitudinal outcome, dropouts caused by events or simply nonresponse might be observed. When the dropout mechanism depends on unobserved longitudinal HRQoL measurements, analysis based on an LMM alone can be misleading [2, 4]. To overcome this problem, the HRQoL outcome and the risk of dropout should be modeled jointly. A joint model (JM) is composed of two submodels: a model for the time-to-event outcome (e.g., a proportional hazards model) and a model for the longitudinal outcome (e.g., an LMM), linked together through shared random effects.

Furthermore, the LMM used to model the HRQoL outcome is usually a random intercept and slope model, assuming a linear trajectory over time. This oversimplification can lead to wrong or simplistic findings. For example, a linear trajectory cannot capture a nonmonotonic evolution, typically a deterioration during the treatment phase followed by an improvement in the patient’s health.

Currently, the analysis of HRQoL in clinical trials is often brief and simplistic and ignores the occurrence of dropouts. In a previous article [5], a simulation study demonstrated the benefit of using a JM rather than an LMM alone when death might interrupt observation of the HRQoL outcome, resulting in an informative dropout. The two models compared assumed linear trajectories for the HRQoL outcome. However, this assumption may not hold on real data, leading to unsatisfactory results. In this article, through the application of the models to data from a clinical trial of patients with advanced esophageal cancer, the aim is to compare two JMs in terms of interpretation of the results, graphical representation, and goodness of fit: a JM assuming a linear trajectory of the HRQoL outcome and a spline-based JM allowing for a more flexible trajectory.

In this article, we first present the clinical trial, analyze completion of the questionnaires to explain why a JM should be used rather than an LMM, and study the trajectories of the HRQoL outcome to highlight the fact that a random intercept and slope LMM might not be the best option. Second, we detail the two models and their assumptions; we also detail how to make a choice between the two. Finally, we discuss our findings.

Motivating dataset

The clinical trial PRODIGE5/ACCORD17

PRODIGE 5/ACCORD 17 (NCT00861094) was a multicenter, randomized, open-label, parallel-group, phase 2–3 clinical trial comparing two treatment regimens of definitive chemoradiotherapy: FOLFOX (the combination of oxaliplatin and fluorouracil with leucovorin) versus fluorouracil and cisplatin. A total of 267 patients (134 in the experimental arm, and 133 in the control arm) with esophageal cancer were enrolled between October 2004 and August 2011 [6, 7]. Progression-free survival (PFS) was the primary endpoint of this study, and no significant difference was found between the two treatment groups. Overall survival (OS) and HRQoL were secondary endpoints. As for PFS, no significant difference in OS was found between the groups. HRQoL was assessed by means of the European Organisation for Research and Treatment of Cancer core quality-of-life questionnaire (QLQ-C30 version 3.0, composed of 30 items grouped into five functional scales, nine symptom scales, and a global health status/HRQoL scale) [8] and the esophagus-specific questionnaire (QLQ-OES18, composed of 18 items for ten symptom scales) [9] at inclusion, during treatment (at 1.25 and 3 months), at the first evaluation of treatment efficacy (month 4), and during follow-up (at 6, 12, 24, and 36 months). In general, a high score on a functional scale or on the global health status/HRQoL scale represents a high level of functioning/HRQoL, whereas a high score on a symptom scale represents a high level of symptomatology. The secondary objective was mostly based on four QLQ-C30 scales, namely global health status/HRQoL (QL), physical functioning (PF), fatigue (FA), and pain (PA), but also included dysphagia (OESDYS) from the QLQ-OES18. Hereafter, all analyses are based only on these dimensions. Of note, unlike the symptom scales of the QLQ-C30 (i.e., FA and PA), a high OESDYS score represents low symptoms.

HRQoL questionnaire completion

At each assessment, several scenarios of completion are possible; a questionnaire can be fully completed, partially completed or entirely noncompleted. Accordingly, depending on which items have been completed or not, one score might be missing and another might not. Hereafter, all analyses are performed separately for each scale, making no distinction between missing score data (i.e., whether due to fully noncompleted questionnaire or item non-response). Barplots in Fig. 1 show the percentages of completion at each visit for each of the five scales of interest. The same conclusions can be drawn for both arms, namely a decrease in completion over time (in yellow in Fig. 1). Noncompletion across visits gives rise to two types of missing data. In the first, patients do not fill in some items or the entire questionnaire at particular visits and then fill it in again at later times; this corresponds to intermittent missing data (in blue in Fig. 1), assumed to be missing at random in our analysis. In the second type, patients do not fill in some items or the entire questionnaire at a particular visit and then do not complete any further items/questionnaires; this corresponds to monotone missing data (in gray in Fig. 1), considered to be missing not at random in our analysis. Hereafter, we use the term dropout to refer to monotone missing data, whether there is an actual dropout or simply nonresponse without any associated event being identified. Those dropouts, which can be informative (i.e., linked to the HRQoL), have to be taken into account to correctly model the HRQoL, hence the importance of a joint modeling approach as outlined previously.
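The distinction between intermittent and monotone missing data can be made concrete with a minimal sketch. The function name `classify_visits` and the example scores are hypothetical and are not taken from the trial data; the rule simply follows the definitions given above (a missing score followed by a later available score is intermittent, otherwise it is monotone).

```python
from typing import List, Optional

def classify_visits(scores: List[Optional[float]]) -> List[str]:
    """Label each planned visit as 'observed', 'intermittent', or 'monotone'.

    scores: one entry per planned visit, None when the scale score is missing.
    """
    # Index of the last visit with an available score (-1 if there is none).
    last_observed = max(
        (k for k, s in enumerate(scores) if s is not None), default=-1
    )
    labels = []
    for k, s in enumerate(scores):
        if s is not None:
            labels.append("observed")
        elif k < last_observed:
            # Missing now, but a score is available at a later visit.
            labels.append("intermittent")
        else:
            # Missing with no later score: monotone missingness (dropout).
            labels.append("monotone")
    return labels

# Hypothetical patient with 8 planned visits (months 0, 1.25, 3, 4, 6, 12, 24, 36).
example = [66.7, None, 50.0, 58.3, None, None, None, None]
labels = classify_visits(example)
print(labels)
```

In this illustrative case the gap at the second visit is intermittent, whereas the four visits after month 4 form the monotone pattern that the text refers to as dropout.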

Fig. 1 QLQ-C30 questionnaire completion over time by arm on global health status/HRQoL (QL), physical functioning (PF), fatigue (FA), pain (PA), and dysphagia (OESDYS) scales. (Color figure online)

HRQoL over time

For each dimension studied, individual trajectories by arm were plotted (Fig. 2). LOESS (locally estimated scatterplot smoothing) curves were also drawn by treatment groups to get a better idea of the mean trajectories of the observed scores. The same trend was observed for every dimension, namely an HRQoL deterioration (i.e., decrease in QL, PF, and OESDYS, increase in PA and FA) from baseline (t = 0) to the final day of radiotherapy or on day 1 of the last chemotherapy cycle (t = 1.25 months/3 months), then an improvement (i.e., increase in QL, PF, and OESDYS, decrease in PA and FA) up to 6 months, and finally a plateau until the end. Therefore, HRQoL trajectories, whatever the dimension considered, do not seem to follow a linear trend; it therefore appears to be relevant to consider a more flexible modeling approach allowing nonlinear trajectories.

Fig. 2 Individual trajectories and LOESS (locally estimated scatterplot smoothing) regression smoothing over time by arm on global health status/HRQoL (QL), physical functioning (PF), fatigue (FA), pain (PA), and dysphagia (OESDYS)

Joint model

As mentioned previously, generally a JM is composed of two submodels: an LMM for the longitudinal outcome and a survival model. In the two considered JMs, the general form of the HRQoL score of patient \(i\) at time \(t\), \({Y}_{i}\left(t\right)\), is:

$$\begin{array}{c}\left\{\begin{array}{c}{Y}_{i}\left(t\right)={Y}_{i}^{\star }\left(t\right)+ {\varepsilon }_{i}(t) \\ {Y}_{i}^{\star }\left(t\right){= \beta }^{T}{X}_{i}\left(t\right)+{b}_{i}^{T}{Z}_{i}(t)\\ {b}_{i}\sim N\left(0,D\right), {\varepsilon }_{i}(t)\sim N\left(0,{\sigma }^{2}\right)\end{array}\right.\end{array}$$
(1)

where \({X}_{i}\left(t\right)\) and \({Z}_{i}(t)\) are the time-dependent design matrices for the fixed effects \(\beta\) and the random effects \({b}_{i}\), respectively, containing the covariate values for each subject (rows representing patients and columns variables), and \({\varepsilon }_{i}(t)\) is the error term, also time-dependent. Roughly, the fixed effects have the same interpretation as in a linear regression and characterize the mean score trajectory over time [i.e., \({\mathbb{E}}\left({Y}_{i}\left(t\right)\right){= \beta }^{T}{X}_{i}\left(t\right)\)], while the random effects \({b}_{i}\) describe the deviations of each patient \(i\) from the mean trajectory and, combined with the fixed effects, depict the individual trajectories. We assume that \({b}_{i}\) and \({\varepsilon }_{i}(t)\) are independent and normally distributed with mean zero, with variance–covariance matrix \(D\) and variance \({\sigma }^{2}\), respectively. Each parameter is defined in more detail below in Sections “Linear HRQoL trajectory” and “Spline-based HRQoL trajectory”, according to each LMM. Finally, \({Y}_{i}^{\star }\left(t\right)\) denotes the true value of the longitudinal outcome at time point \(t\).

The true value \({Y}_{i}^{\star }\left(t\right)\) was included as a time-dependent covariate in a proportional hazards model for the time to dropout:

$$\begin{array}{c}{h}_{i}\left(t\right)={h}_{0}\left(t\right)\mathrm{exp}\left\{\gamma {arm}_{i}+\alpha {Y}_{i}^{\star }(t)\right\}\end{array}$$
(2)

where \({h}_{0}(t)\) represents the baseline hazard function (describing the instantaneous risk of dropout when all covariates are equal to zero), \({arm}_{i}\) is the arm indicator (equal to 1 if patient \(i\) belongs to the experimental arm and 0 if patient \(i\) belongs to the control arm), \(\gamma\) is its corresponding effect, and \(\alpha\) is the association parameter that quantifies the effect of the current true HRQoL value on the risk of dropout.

Linear HRQoL trajectory

In the first model, we assume that the HRQoL score follows a linear trajectory over time using a random coefficient model, as is commonly done in clinical trial settings:

$$\begin{array}{c}\left\{\begin{array}{c}{Y}_{i}^{\star }\left(t\right)={(\beta }_{0}+{b}_{0i})+{(\beta }_{1}+{b}_{1i})t+{\beta }_{2}({arm}_{i}\times t) \\ \left(\genfrac{}{}{0pt}{}{{b}_{0i}}{{b}_{1i}}\right)\sim N\left(0, D\right), D= \left(\begin{array}{cc}{\sigma }_{{b}_{0}}^{2}& {\sigma }_{{b}_{0}{b}_{1}}\\ {\sigma }_{{b}_{0}{b}_{1}}& {\sigma }_{{b}_{1}}^{2}\end{array}\right) \end{array}\right.\end{array}$$
(3)

where the fixed intercept \({\beta }_{0}\) represents the mean score at inclusion (\(t\) = 0), the fixed slope \({\beta }_{1}\) the score change by unit of time in the control arm, the interaction effect \({\beta }_{2}\) the difference between the slopes of the experimental and control arms (\({\beta }_{1}+{\beta }_{2}\) being the slope in the experimental arm), and the random intercept \({b}_{0i}\) and random slope \({b}_{1i}\) the individual deviations from the fixed intercept and fixed slope, respectively. Of note, no arm effect was added to the model, as HRQoL at baseline was broadly similar between the two arms due to randomization.
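A short derivation, using only the notation of Eq. 3, makes explicit a structural consequence of this specification. Taking expectations over the random effects, the between-arm difference in mean score at time \(t\) is

$$\mathbb{E}\left({Y}_{i}(t)\mid {arm}_{i}=1\right)-\mathbb{E}\left({Y}_{i}(t)\mid {arm}_{i}=0\right)=\left\{{\beta }_{0}+\left({\beta }_{1}+{\beta }_{2}\right)t\right\}-\left\{{\beta }_{0}+{\beta }_{1}t\right\}={\beta }_{2}t$$

This difference is zero at \(t=0\) and, whenever \({\beta }_{2}\ne 0\), keeps the sign of \({\beta }_{2}\) and grows linearly in absolute value for \(t>0\). Under the linear model, the two mean trajectories therefore start from the same point and cannot cross afterwards.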

Spline-based HRQoL trajectory

In the second model, we suppose that the HRQoL trajectory is not linear over time, since, as can be seen in Fig. 2, this assumption seems to be wrong. To gain flexibility, a model using natural splines was investigated:

$$\left\{\begin{array}{c}{Y}_{i}^{\star }\left(t\right)={(\beta }_{0}+{b}_{0i})+{(\beta }_{1}+ {b}_{1i}) {B}_{n}\left(t, { \lambda }_{1}\right) +{(\beta }_{2}+ {b}_{2i}) {B}_{n}(t, { \lambda }_{2})+ {(\beta }_{3}+ {b}_{3i}) {B}_{n}\left(t, { \lambda }_{3}\right)\\ + {(\beta }_{4}+ {b}_{4i}) {B}_{n}\left(t, { \lambda }_{4}\right)+ {\beta }_{5} \left\{{B}_{n}\left(t, { \lambda }_{1}\right){arm}_{i}\right\} + {\beta }_{6} \left\{{B}_{n}\left(t, { \lambda }_{2}\right){arm}_{i}\right\} \\ +{ \beta }_{7} \left\{{B}_{n}\left(t, { \lambda }_{3}\right){arm}_{i}\right\} +{ \beta }_{8} \left\{{B}_{n}\left(t, { \lambda }_{4}\right){arm}_{i}\right\}\\ \left(\genfrac{}{}{0pt}{}{\begin{array}{c}{b}_{0i}\\ {b}_{1i}\end{array}}{\begin{array}{c}{b}_{2i}\\ {b}_{3\mathrm{i}}\\ {b}_{4i}\end{array}}\right)\sim N\left(0, D\right), {\text{with}}~D~{\text{unstructured}} \end{array}\right.$$
(4)

where \({\{B}_{n}\left(t, { \lambda }_{k}\right);k=1, 2, 3, 4\}\) represents a B-spline basis matrix for a natural cubic spline with three internal knots placed at 1.25, 3, and 6 months. Based on the observations made in Section “HRQoL over time”, these time points correspond to observed changes in trend. The fixed intercept \({\beta }_{0}\) still represents the mean score at inclusion, the fixed effects \({\beta }_{1}\), \({\beta }_{2}\), \({\beta }_{3}\), and \({\beta }_{4}\) govern the score trajectory in the control arm, the fixed effects \({\beta }_{5}\), \({\beta }_{6}\), \({\beta }_{7}\), and \({\beta }_{8}\) allow the trajectory in the experimental arm to be different from the one in the control arm, and the random effects \({b}_{0i}\), \({b}_{1i}\), \({b}_{2i}\), \({b}_{3i}\), and \({b}_{4i}\) allow for individual deviations from the mean trajectories.

A spline function is a piecewise polynomial function: the time axis is divided into distinct intervals (delimited by the knots), and the polynomial pieces join smoothly at the knots. Therefore, its general formulation is not a linear combination of \(\{\mathrm{intercept}, t,{t}^{2}, \dots \}\), as in a usual polynomial, but a linear combination of basis functions that depend on time and on these intervals. As a consequence, the spline coefficients \({\beta }_{1-8}\) cannot be interpreted directly, in contrast to those of the linear model, in which the mean trajectory in each arm is simply a linear combination of the intercept and \(t\). More generally, a more restrictive model, usually used for inference purposes, is much more interpretable than a flexible model aimed at prediction; this is the tradeoff between flexibility and interpretability [10]. Therefore, for the spline-based analysis, more importance has to be given to the predicted trajectories than to the individual coefficients.
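To illustrate what "a set of functions depending on time and these intervals" looks like, the sketch below evaluates a natural cubic spline basis using the truncated power construction. This is not the B-spline parameterisation used in the analysis, so its coefficients would differ from those of Eq. 4; together with the intercept it should nevertheless span the same family of curves. The function name `natural_cubic_basis` is hypothetical, and the boundary knots at 0 and 36 months are an assumption (the text only specifies the three internal knots).

```python
from typing import List, Sequence

def natural_cubic_basis(t: float, knots: Sequence[float]) -> List[float]:
    """Evaluate the K-1 non-constant natural cubic spline basis functions at t.

    knots: K increasing knots, boundary knots included. The truncated power
    construction used here is linear beyond the last knot.
    """
    def cube_plus(x: float) -> float:
        # Truncated cubic: x^3 when x is positive, 0 otherwise.
        return x ** 3 if x > 0 else 0.0

    last = knots[-1]

    def d(k: int) -> float:
        return (cube_plus(t - knots[k]) - cube_plus(t - last)) / (last - knots[k])

    penultimate = len(knots) - 2
    # First basis function is t itself; the others bend at the knots.
    return [t] + [d(k) - d(penultimate) for k in range(penultimate)]

# Internal knots from the text; boundary knots 0 and 36 months are assumed.
KNOTS = [0.0, 1.25, 3.0, 6.0, 36.0]
basis_at_4 = natural_cubic_basis(4.0, KNOTS)
print(len(basis_at_4), basis_at_4)
```

With five knots in total, the function returns four values per time point, matching the four basis functions of Eq. 4. All four are zero at \(t=0\), so the intercept keeps its interpretation as the mean score at inclusion, but none of the remaining coefficients corresponds to a slope over a given period, which is why the fitted curve rather than the coefficients is examined.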

Arm-by-time interaction effect on HRQoL

Evaluating the arm-by-time effect in the linear trajectory JM is straightforward: attention simply has to be paid to the interaction term coefficient estimate (i.e., \(\widehat{{\beta }_{2}}\) in Eq. 3). For the spline-based JM, the spline coefficients cannot be interpreted so simply; however, the arm-by-time interaction effect can be tested using a likelihood ratio (LR) test of \({H}_{0}: {\beta }_{5}={\beta }_{6}={ \beta }_{7}={ \beta }_{8}=0\) vs \({H}_{1}: {\beta }_{5}\ne 0\) or \({\beta }_{6}\ne 0\) or \({\beta }_{7}\ne 0\) or \({\beta }_{8}\ne 0\) (cf. Eq. 4).
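The mechanics of such an LR test can be sketched as follows, assuming the usual asymptotic chi-square reference distribution with 4 degrees of freedom (one per constrained coefficient). The log-likelihood values are hypothetical placeholders, not results from the trial, and the helper names are illustrative. For an even number of degrees of freedom the chi-square upper tail has a closed form, which keeps the example free of external libraries.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^j / j! for j = 0, ..., df/2 - 1.
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def lr_test(loglik_null: float, loglik_alt: float, df: int):
    """Return the LR statistic and its asymptotic p-value."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf_even_df(stat, df)

# Hypothetical log-likelihoods: model without (null) and with (alternative)
# the four arm-by-spline interaction terms.
stat, p_value = lr_test(loglik_null=-5000.0, loglik_alt=-4994.0, df=4)
print(round(stat, 2), round(p_value, 4))
```

The null model here is the spline-based JM refitted without the four interaction terms, so both models share the same random-effects and survival structure and differ only in the coefficients being tested.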

Joint model assumptions

For the survival submodel, martingale residual plots were used to check for excess observed events and the chosen functional form (identity in our case) for the time-dependent covariate (current true score value in our case), and Cox-Snell residuals were used to assess the overall goodness of fit of the submodel [4, 11]. For the longitudinal submodel, two types of residuals widely used in mixed models were explored: subject-specific (conditional) residuals for checking the homoscedasticity and normality assumptions, and marginal (population-averaged) residuals to investigate misspecification of the mean structure \({\beta }^{T}{X}_{i}\left(t\right)\) and validate the assumptions for the within-subject covariance structure [4, 12, 13]. However, a problem arises when using these residuals in the context of joint modeling. Indeed, scores might be missing from a given time point onward (i.e., patients might drop out), which, in turn, affects the use of residuals based on the observed data alone [4, 14, 15]. To overcome this nonrandom dropout issue, a multiple imputation approach (10 imputations) was used to obtain the scores that would have been observed if patients had not dropped out of the study. The residuals produced from the augmented longitudinal data were then used to check the JM assumptions by means of the usual diagnostic plots. More details about JM assumption diagnostics can be found in Supplementary materials, Section 1.

Application

All analyses were performed in a modified intent-to-treat population; all patients with at least one available HRQoL score were included. Thus, the population of analysis was not the same across the five dimensions explored: QL (experimental arm: N = 130; control arm: N = 122), PF (experimental arm: N = 131; control arm: N = 123), PA (experimental arm: N = 131; control arm: N = 123), FA (experimental arm: N = 131; control arm: N = 123), and OESDYS (experimental arm: N = 129; control arm: N = 118). The event of interest considered in the survival submodel of the JM was the dropout defined as the last visit for which a score is available (right-censored time in case of an available score at the last planned visit), and the follow-up ended with the occurrence of one of these events.
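The event definition above can be sketched in a few lines. The function name `dropout_time` and the example score vectors are hypothetical. Using the nominal planned visit times rather than the actual assessment dates is an assumption made for illustration only.

```python
from typing import Optional, Sequence, Tuple

# Planned assessment times in months, as listed in the trial description.
VISIT_MONTHS = (0.0, 1.25, 3.0, 4.0, 6.0, 12.0, 24.0, 36.0)

def dropout_time(scores: Sequence[Optional[float]],
                 visit_months: Sequence[float] = VISIT_MONTHS) -> Tuple[float, int]:
    """Return (time, event) for the survival submodel of one patient.

    time  : month of the last visit with an available score
    event : 1 for dropout, 0 when right-censored at the last planned visit
    """
    observed = [k for k, s in enumerate(scores) if s is not None]
    if not observed:
        # Patients without any score are not part of the analysis population.
        raise ValueError("patient has no available score")
    last = observed[-1]
    event = 0 if last == len(visit_months) - 1 else 1
    return visit_months[last], event

# Hypothetical patients: one dropping out after month 4, one assessed at month 36.
dropped = dropout_time([66.7, None, 50.0, 58.3, None, None, None, None])
completer = dropout_time([66.7, 50.0, 50.0, 58.3, 66.7, 75.0, None, 75.0])
print(dropped, completer)
```

Note that the second patient has an intermittent gap at month 24 but is still censored at month 36, since only the last available score determines the event time.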

All analyses were performed with R version 4.0.3 [16] using the JM package [17] for the joint modeling approach and the splines package for the spline-based modeling strategy. R code is available from the first author upon reasonable request. The implementation of the main models can be found in Supplementary materials, Section 2; more details are given in [4].

Comparison of linear and spline-based joint models

JMs for linear and spline-based HRQoL trajectories were applied to the five scales: QL, PF, PA, FA, and OESDYS. The logarithm of the baseline hazard function was approximated by B-splines, using four knots placed at the quantiles of the event times. For each JM, the pseudo-adaptive Gauss-Hermite quadrature rule with seven points was used to approximate the integrals over the random effects. Tables 1 and 2 give an overview of the estimates obtained from these two strategies.

Table 1 Parameter estimates with 95% confidence intervals (95% CI) of the linear joint model applied to PRODIGE5/ACCORD17 HRQoL data in five dimensions: global health status/HRQoL (QL), physical functioning (PF), fatigue (FA), pain (PA), and dysphagia (OESDYS)
Table 2 Parameter estimates with 95% confidence interval (95% CI) of the spline-based joint model applied to PRODIGE5/ACCORD17 HRQoL data in five dimensions: global health status/HRQoL (QL), physical functioning (PF), fatigue (FA), pain (PA), and dysphagia (OESDYS)

Concerning the survival submodel, no arm effect \(\gamma\) was identified whatever the model and/or the dimension, meaning that the risk of dropout did not differ from one arm to the other, which is consistent with Fig. 1. The association effect between the longitudinal outcome and the risk of dropout \(\alpha\) was significant for PF and FA in the linear JM (\({\widehat{\alpha }}_{\text{PF}}\) = − 0.020 [− 0.031; − 0.010], p < 0.001 and \({\widehat{\alpha }}_{\text{FA}}\) = 0.013 [0.004; 0.023], p = 0.005); this was confirmed in the spline-based JM (\({\widehat{\alpha }}_{\text{PF}}\) = − 0.017 [− 0.025; − 0.009], p < 0.001 and \({\widehat{\alpha }}_{\text{FA}}\) = 0.012 [0.005; 0.018], p < 0.001). This means that a one-point decrease in PF score corresponded to an \(\mathrm{exp}(- {\widehat{\alpha }}_{\text{PF}})\)-fold increase in the risk of dropout, and a one-point increase in FA score to an \(\mathrm{exp}({\widehat{\alpha }}_{\text{FA}})\)-fold increase. For PA and OESDYS, no association effect was significant either in the linear or in the spline-based JM. However, for QL, no significant association effect was highlighted in the linear model (\({\widehat{\alpha }}_{\text{QL}}\) = − 0.011 [− 0.027; 0.005], p = 0.162), while in the spline-based model a decrease in QL score was found to be associated with an increased risk of dropout (\({\widehat{\alpha }}_{\text{QL}}\) = − 0.027 [− 0.041; − 0.014], p < 0.001).
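As a worked illustration of how the association parameter translates into a hazard ratio under Eq. 2, the sketch below uses the point estimates reported for the linear JM. The 10-point change in score is an arbitrary choice made for readability, and the function name is hypothetical.

```python
import math

def dropout_hazard_ratio(alpha: float, score_change: float) -> float:
    """Hazard ratio of dropout for a given change in the current true score.

    Under Eq. 2, changing the true score by `score_change` points multiplies
    the hazard by exp(alpha * score_change), the arm being held fixed.
    """
    return math.exp(alpha * score_change)

# Point estimates reported for the linear JM.
hr_pf = dropout_hazard_ratio(alpha=-0.020, score_change=-10.0)  # 10-point PF decrease
hr_fa = dropout_hazard_ratio(alpha=0.013, score_change=10.0)    # 10-point FA increase
print(round(hr_pf, 3), round(hr_fa, 3))
```

Under these estimates, a 10-point lower physical functioning score corresponds to a risk of dropout roughly 22% higher, and a 10-point higher fatigue score to a risk roughly 14% higher.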

Concerning the longitudinal outcome submodel, coefficient estimates cannot be compared directly (except \(\widehat{{\beta }_{0}}\)), since the interpretation of the regression coefficients differs across models. Moreover, as noted previously in Section “Spline-based HRQoL trajectory”, the spline coefficients \({\beta }_{1-8}\) cannot be interpreted directly, which is why they are not presented in this article. Thus, the comparison was mainly based on a graphical representation of the predicted mean HRQoL trajectories conditional on the arm (i.e., \({\mathbb{E}}\left({Y}_{i}\left(t\right)\mid {arm}_{i}\right)\), Fig. 3), which illustrates the goodness of fit of each model with respect to the mean trajectories obtained from the raw data (Fig. 2). Visually, linear trajectories showed an increase in QL and OESDYS and a decrease in PF, PA, and FA scores over time, the trajectories by arm being equal at baseline and unable to intersect thereafter. Regarding the arm-by-time interaction effect, none was found whatever the dimension. With nonlinear trajectories, the trend was the same as that shown in Fig. 2. First, a deleterious effect was observed in both arms (stronger in the experimental arm, as FOLFOX toxicities are known to be considerable), namely a decrease in PF, QL, and OESDYS as well as an increase in PA and FA scores between baseline and month 1.25 or 3 (i.e., on the final day of radiotherapy or on day 1 of the last chemotherapy cycle). Then, HRQoL improved and stabilized during follow-up for the five dimensions. Arm trajectories intersected around month 6, after which better HRQoL was observed in the experimental arm. Of note, confidence intervals became wider over time, making estimates less precise. This uncertainty can be explained by the discontinued collection of the HRQoL scores due to dropouts.
Unlike the linear JM, in which the arm-by-time interaction term \({\beta }_{2}\) was not significant for any dimension (Table 1), the spline-based JM found an arm-by-time interaction effect for QL, PF, PA and OESDYS (LR test \({p}_{\text{QL}}\) = 0.040, \({p}_{\text{PF}}\) = 0.017, \({p}_{\text{PA}}\) = 0.003 and \({p}_{\text{OESDYS}}\) < 0.001) and a trend for FA (\({p}_{\text{FA}}\) = 0.076) highlighting between-arms differences in HRQoL score trajectories.

Fig. 3 Predicted mean score trajectories with confidence bands by arm for the joint models (JM) assuming linear or spline-based trajectories on global health status/HRQoL (QL), physical functioning (PF), fatigue (FA), pain (PA), and dysphagia (OESDYS)

Tables 1 and 2 also present Akaike information criterion (AIC) and Bayesian information criterion (BIC) values for each model. The AIC always chose the spline-based model over the linear one (\({\Delta {\text{AIC}}}_{\text{QL}}=31.78\), \({\Delta {\text{AIC}}}_{\text{PF}}=127.60\), \({\Delta {\text{AIC}}}_{\text{PA}}=43.10\), \({\Delta {\text{AIC}}}_{\text{FA}}=130.07\), and \({\Delta {\text{AIC}}}_{\text{OESDYS}}=87.01\)). The BIC, meanwhile, preferred the linear model for QL and PA (\({\Delta {\text{BIC}}}_{\text{QL}}=-31.75\) and \({\Delta {\text{BIC}}}_{\text{PA}}=-20.57\)) and the spline-based model for PF, FA and OESDYS (\({\Delta {\text{BIC}}}_{\text{PF}}=63.93\), \({\Delta {\text{BIC}}}_{\text{FA}}=66.40\), and \({\Delta {\text{BIC}}}_{\text{OESDYS}}=23.84\)), which might be explained by a more conservative penalty in the BIC that will always tend to select simpler models (i.e., those containing fewer parameters).
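The disagreement between the two criteria follows directly from their penalties, as the sketch below illustrates. It relies on two assumptions that are not stated in the text: that the spline-based JM has 18 more parameters than the linear JM (6 additional fixed effects from Eqs. 3 and 4, and 12 additional variance–covariance terms when moving from a 2 × 2 to an unstructured 5 × 5 matrix \(D\)), and that the BIC penalty uses the number of patients as sample size. With these assumptions the reported BIC differences are recovered from the AIC differences up to rounding, which suggests, without confirming, that this is how they were obtained.

```python
import math

def delta_bic_from_delta_aic(delta_aic: float, extra_params: int, n: int) -> float:
    """Convert an AIC difference (linear minus spline) into a BIC difference.

    AIC = -2 logL + 2k and BIC = -2 logL + k log(n), so the two differences
    are separated by extra_params * (log(n) - 2).
    """
    return delta_aic - extra_params * (math.log(n) - 2.0)

EXTRA_PARAMS = 18  # assumed: 6 more fixed effects + 12 more (co)variance terms

# AIC differences and analysis population sizes as reported in the text.
delta_bic_pf = delta_bic_from_delta_aic(127.60, EXTRA_PARAMS, n=131 + 123)
delta_bic_ql = delta_bic_from_delta_aic(31.78, EXTRA_PARAMS, n=130 + 122)
print(round(delta_bic_pf, 2), round(delta_bic_ql, 2))
```

Under these assumptions, the BIC charges about 63.5 to 63.7 points more than the AIC for the extra flexibility, so the spline-based model is retained by the BIC only when its gain in AIC exceeds that amount, which is the case for PF, FA, and OESDYS but not for QL and PA.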

Joint model assumptions

According to Figs. S1–S5 in the Supplementary materials, for the longitudinal outcome submodel a random behavior around zero was observed for the multiply imputed residuals for all dimensions regardless of the model used, indicating compliance with the assumptions mentioned in Section “Joint model assumptions”. For all Cox-Snell residual plots, the survival function estimate was around the unit exponential distribution; the models are thus considered to fit the data well. However, more attention must be paid to the martingale residuals. Across all martingale residual plots, the same trend was highlighted, namely a deviation from zero for low longitudinal responses observed for PF, PA, and FA (high responses for QL and OESDYS); that this phenomenon was much less pronounced for the spline-based JM suggests that the functional form for the time-dependent covariate (i.e., HRQoL score over time) might be more appropriate in this model.

Discussion

In this article, through the application to five HRQoL dimensions using data from the PRODIGE 5/ACCORD 17 clinical trial, we demonstrated that, although a spline-based JM might appear more complicated than a linear JM at first sight, the interpretation of its results is not necessarily so. On the contrary, by revealing all the information that can be extracted, such a model can facilitate the interpretation of the results and remove some of the inconsistencies that may arise from the use of the usual models.

Indeed, by modeling more flexible trajectories of the longitudinal outcome, the spline-based JM, in contrast with the linear JM, allowed detection of between-arm differences in HRQoL score trajectories and association effects between the longitudinal outcome and the time-to-event. In our application, the spline-based JM identified significant arm-by-time interaction effects for almost all dimensions (a trend for FA) and a significant association effect for QL, while these effects were not significant using the linear JM. The flexible JM also gave a more reliable and precise representation of the HRQoL score trajectories, while the linear specification of the model made the score increase/decrease necessarily constant and made the beneficial effect of the experimental treatment grow throughout the whole period (the trajectories by arm being equal at baseline and unable to intersect thereafter). By contrast, the spline-based JM highlighted a nonconstant evolution of the scores and variations in the apparent benefit from one arm to the other (with trajectories intersecting once or twice). Moreover, when we checked the validity of the JM assumptions, martingale residuals suggested that the functional form of the HRQoL scores across time in the spline-based JMs might be more appropriate.

However, some aspects of applying a flexible JM might constitute limitations. For instance, in spline-based JMs, a choice had to be made concerning the number and location of the knots. Our choice was based on the trend seen in the observed mean trajectories plot but also on clinical considerations. Three knots were chosen: on the final day of radiotherapy (t = 1.25 months), on day 1 of the last chemotherapy cycle (t = 3 months), and on the first day of follow-up (t = 6 months). We used the AIC to check that using fewer knots would not have performed as well or better [18]. Thus, three knots were retained in all models, which seemed to be the best compromise between flexibility and overfitting. Of note, there were between 155 and 318 observations in the intervals defined by these knots.

Conclusion

In the literature, the choice of the functional form of a continuous variable has been widely addressed in the articles by Sauerbrei et al. [19, 20], Harrell’s book [18], and Royston and Sauerbrei’s book [21]. However, to our knowledge, few articles deal with a joint modeling approach using splines in an HRQoL data context [22,23,24]. In Li et al. [22], a semiparametric JM for the terminal trend of HRQoL was used, meaning that time is counted backward from the time of death. In Yang et al. [23], the association structure between the survival and longitudinal processes is of a different nature and a cure model was used for the survival part. Finally, the model in the article by Terrin et al. [24] is the closest to ours, but the focus is on the survival part and not on the longitudinal part. As discussed in Sauerbrei et al. [19], a linear relationship between the outcome and continuous variables (HRQoL score across time in our case) is usually the default choice, and flexible modeling techniques (e.g., fractional polynomials, splines) are underused. However, the choice of the functional form for a variable turns out to be important, as it affects the validity of the model and the statistical significance of the variable [19]. This article illustrates this statement in a joint modeling framework, but it can be extrapolated to other types of models, in particular to LMMs.

In this paper, we have given several arguments favoring the use of a spline-based JM to study HRQoL score trajectories in a clinical trial context. As each database is unique, here are some guidelines, with references, to help choose the most appropriate model:

  1. Look at missing values. If there is a large amount of monotone missing HRQoL data, using a JM that takes the last visit as the time-to-event might be preferable to using an LMM alone [5].

  2. Choose the HRQoL score trajectory model carefully (these recommendations remain valid whether a JM is used or not):

    a. Plot the observed mean trajectory (e.g., using LOESS regression smoothing) to get an idea of the degree of flexibility needed:

      i. If the relationship between the HRQoL outcome and time seems to be linear, a random intercept and slope model is the best choice, as it gives a better interpretability of the model parameters;

      ii. Otherwise, a more flexible model is needed, using fractional polynomials, splines, or simply by adding, for example, a quadratic term [18, 19, 21].

    b. Inspect the residual plots to validate the model’s assumptions, and if these are not respected make the appropriate changes and/or look at the results with a critical eye [4, 21].

    c. If hesitating between several models, consider looking at the AIC or BIC.