Introduction

Patient-reported outcomes (PROs) collected for clinical registries are increasingly playing a role in improving health outcomes by generating records of patients’ “real-world” experiences [1,2,3,4]. PROs provide information obtained directly from patients about the effects of a health condition and its management, and include their quality of life, the impact of a disease state on daily living, symptom information, and treatment satisfaction [5]. The use of PROs in atrial fibrillation (AF) is especially relevant because its management is focused on reducing symptoms and the risk of stroke [6, 7].

Studies suggest that the routine collection of PROs may help clinicians evaluate patients’ responses to treatment and tailor their interventions [8, 9]. However, most clinical and evaluation studies that have routinely collected PROs focus on an average trajectory, which assumes that all individuals in a given population follow the same pattern of change. Yet previous studies have shown substantial intra- and inter-individual diversity in the type and severity of AF that people experience, its risk factors, and its treatment [10,11,12]. For example, sex-related differences in the clinical course of AF have been observed: women (including those with new-onset AF) appear to be more symptomatic and to have poorer quality of life than men at baseline, and these differences persist throughout follow-up [13,14,15]. Other factors, such as age, education, comorbidities, and treatment strategy, likely influence changes in outcomes over time [16]. Identifying multiple PRO trajectories, rather than one “average” trajectory, is more likely to capture the variability within the patient population over time, and this specificity could have important clinical implications. For some patients, variability in their symptoms and in their response (or lack of response) to therapy could indicate a need for more aggressive treatment [17]. More broadly, certain treatment strategies may not be appropriate depending on a patient’s trajectory [18].

One approach to capturing patterns of change in PROs over time is growth mixture modeling (GMM). GMM differs from basic growth models in longitudinal analysis (e.g., multilevel and latent growth models) in that it does not assume that a population is adequately represented by a single trajectory. GMM groups individuals who share similar patterns in their measurement scores over time into previously unidentified subgroups, or “latent classes.” These latent classes are derived from the data and represented as probabilities: each individual receives a fractional membership in every identified latent class, reflecting the degree of certainty in the classification [19]. Identifying distinct PRO trajectories for the latent classes might provide insights into the diversity of patients’ outcomes, inform the tailoring of interventions for different subgroups, and guide program evaluations for heterogeneous clinical populations [20, 21].
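Fractional membership follows from Bayes’ rule: a patient’s posterior probability of belonging to class k is proportional to the class proportion multiplied by the likelihood of the patient’s observed scores under that class’s trajectory. The Python sketch below illustrates this for a linear trajectory with a class-specific random intercept. It is not the Mplus estimation procedure and does not estimate parameters; the class proportions and baseline means echo the values reported in the Results, while the slopes and variances are hypothetical.

```python
import math

# Hypothetical parameters for a linear growth mixture model with a
# class-specific random intercept (no random slope, no covariances).
# Proportions and baseline means echo the Results; slopes and variances
# are illustrative assumptions only.
CLASSES = [
    # (label, prior proportion, mean intercept, slope per year, psi, sigma2)
    ("poor but improving", 0.636, 51.4, 8.0, 100.0, 60.0),
    ("good and stable", 0.277, 79.1, 1.0, 60.0, 40.0),
    ("excellent and stable", 0.086, 95.0, 0.0, 10.0, 10.0),
]


def log_density(scores, times, intercept, slope, psi, sigma2):
    """Multivariate normal log-density with covariance sigma2*I + psi*J.

    A random intercept with variance psi gives every pair of a patient's
    observations the same covariance psi (compound symmetry), so the
    inverse and determinant have closed forms.
    """
    n = len(scores)
    resid = [y - (intercept + slope * t) for y, t in zip(scores, times)]
    ss = sum(r * r for r in resid)
    total = sum(resid)
    denom = sigma2 + n * psi
    quad = (ss - psi * total * total / denom) / sigma2
    log_det = (n - 1) * math.log(sigma2) + math.log(denom)
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)


def posterior_probabilities(scores, times, classes=CLASSES):
    """Fractional class membership P(class k | scores) via Bayes' rule."""
    logs = [
        math.log(prior) + log_density(scores, times, a, b, psi, s2)
        for _, prior, a, b, psi, s2 in classes
    ]
    top = max(logs)  # log-sum-exp trick for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


# Observation times (years since the initial consultation) vary by patient.
print(posterior_probabilities([50, 58, 68], [0.0, 0.8, 2.2]))
print(posterior_probabilities([96, 94, 95], [0.0, 0.8, 2.2]))
```

Because every class receives some probability, a patient near a class boundary is not forced into a single group; this is the uncertainty that entropy later summarizes.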

The aims of this study were to (1) identify latent subgroups of outpatients with AF based on their PRO trajectories and (2) identify factors that predicted their “membership” in different group trajectories. The factors potentially associated with PRO trajectories were grounded in the Wilson and Cleary conceptual framework [22], which has been widely applied to different patient populations and clinical settings to explore patients’ health and quality of life [23].

Methods

Study design

This was a retrospective cohort study of outpatients who had been referred to AF clinics in a province in western Canada between 2008 and 2016 and who provided data for a clinical registry. In the region, patients with new-onset AF or who required anticoagulation therapy were generally referred to a cardiologist, and patients who required more complex AF management were referred to an electrophysiologist for ablation consideration. Common treatment features of the participating clinics included a focus on education, enhanced patient participation in treatment selection, and close collaboration within a multidisciplinary team [24]. Clinic staff routinely collected PROs on paper, either at the visit or by mail, across repeated visits (a maximum of 10 visits over 5 years in the dataset). The Cardiac Services BC (CSBC) Registry was established to coordinate, monitor, evaluate, and fund cardiovascular disease-related treatment services. CSBC provided data from five of these multidisciplinary AF clinics for this study, including patients’ PROs, their demographic, clinical, and medication information, the interventions they received, and their outcomes.

Data source

Information obtained from the registry was linked to provincial administrative health data. These data sources included: (1) the CSBC AF Clinic Registry database [25], (2) the Consolidation files [26], (3) Hospital Separations [27], (4) the Medical Services Plan payment files [28], (5) PharmaNet files [29], and (6) Vital Statistics—Deaths [30].

Measures

From the linked registry database, patient characteristics were specified according to the Wilson and Cleary framework adapted for the cardiac population [31] (see Fig. 1).

Fig. 1

Measures associated with the conceptual framework

For treatment, patients were identified as having received ablation (i.e., atrioventricular node ablation, pulmonary vein isolation ablation, or the maze procedure) based on Canadian Classification of Health Interventions codes [32]. Patients were classified as having been prescribed anticoagulation therapy based on the medications reviewed in the clinics. The timing of ablation and of anticoagulation therapy was coded as a set of separate binary variables, one per time interval, with intervals based on the timing of the patients’ repeated PRO assessments (6 months to 1 year, 1 to 1.5 years, 1.5 to 2 years, and more than 2 years).
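To make this coding concrete, the hypothetical Python helper below converts a treatment date, expressed in years after the initial consultation, into one binary indicator per interval listed above. The function and key names are illustrative, and treating each interval as left-closed and right-open is an assumption; times under 6 months fall outside the listed intervals and produce no flag.

```python
# Hypothetical helper: interval bounds follow the Methods text, but the
# inclusive/exclusive handling of each boundary is an assumption.
INTERVALS = [
    ("6m_to_1y", 0.5, 1.0),
    ("1y_to_1.5y", 1.0, 1.5),
    ("1.5y_to_2y", 1.5, 2.0),
    ("over_2y", 2.0, float("inf")),
]


def timing_indicators(years_since_consult):
    """Return one 0/1 indicator per interval for a treatment date.

    Intervals are treated as [lower, upper); None (no treatment recorded)
    or a time before 6 months yields all zeros.
    """
    flags = {}
    for name, lower, upper in INTERVALS:
        hit = years_since_consult is not None and lower <= years_since_consult < upper
        flags[name] = int(hit)
    return flags


print(timing_indicators(0.75))  # ablation about 9 months after consultation
print(timing_indicators(None))  # no ablation recorded
```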

For biological function, the CHADS2 score (recorded at the initial consultation) was used to estimate the risk of stroke [33]. The score ranges from 0 to 6: 1 point each for congestive heart failure, hypertension, age ≥ 75 years, and diabetes, and 2 points for a history of stroke or transient ischemic attack. For individual and environmental characteristics, age, gender, and distance to the clinic (dichotomized as living < 100 km or ≥ 100 km away) were obtained from the linked registry.
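The CHADS2 scoring rule described above can be written as a short Python function. The argument names are illustrative and do not correspond to registry field names.

```python
def chads2(chf, hypertension, age, diabetes, prior_stroke_or_tia):
    """CHADS2 stroke-risk score (0-6).

    One point each for congestive heart failure, hypertension, age >= 75
    and diabetes; two points for prior stroke or transient ischemic attack.
    """
    score = int(chf) + int(hypertension) + int(age >= 75) + int(diabetes)
    score += 2 * int(prior_stroke_or_tia)
    return score


print(chads2(chf=False, hypertension=True, age=80, diabetes=False,
             prior_stroke_or_tia=False))  # 2
```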

For self-reported health, the Atrial Fibrillation Effect on QualiTy-of-life (AFEQT) PRO questionnaire was used [34]. It comprises 20 items, rated on a seven-point Likert-type scale, that assess four domains: symptoms (4 items), daily activities (8 items), treatment concerns (6 items), and treatment satisfaction (2 items). The analyses used the patients’ overall summary score, which incorporates the responses to the first three domains. The AFEQT questionnaire has good internal consistency reliability (Cronbach’s alpha > 0.88 for all domains) [34] and has been widely used in clinical research to compare treatment outcomes [35, 36] and to evaluate models of care delivery [37]. In this study, internal consistency reliability was evaluated across the assessments from the initial consultation to the second follow-up visit (individually varying) and ranged from 0.89 to 0.92 for symptoms, 0.97 to 0.98 for daily activities, and 0.91 to 0.94 for treatment concerns (based on 20 imputed datasets). At the AF clinics, patients could complete the questionnaire during their visit or have it mailed to their homes. Completion of the questionnaire varied throughout follow-up depending on the complexity of the patients’ management, clinic wait times, and individual patients’ decisions about whether to complete it.
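The overall score and the reliability coefficient can be sketched in Python. The AFEQT function assumes the scoring formula given in the questionnaire’s published scoring instructions as we understand them: items coded 1 (not at all bothered or limited) to 7 (extremely bothered or limited), rescaled so that 100 is best and 0 is worst, using only answered items. Cronbach’s alpha uses the standard formula k/(k − 1) × (1 − sum of item variances / variance of the total). The data layout (one list per patient, or one column per item) is an illustrative assumption.

```python
import statistics


def afeqt_overall(responses):
    """AFEQT overall score (0-100, higher = better) from items 1-18.

    Each answered item is coded 1 (not at all bothered/limited) to 7
    (extremely bothered/limited); unanswered items are passed as None.
    """
    answered = [r for r in responses if r is not None]
    n = len(answered)
    if n == 0:
        return None
    return 100 - (sum(answered) - n) * 100 / (n * 6)


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (one list per item)."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_var = sum(statistics.variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))


print(afeqt_overall([4] * 18))  # 50.0
```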

To further describe the patient population and coexisting comorbidities, the Charlson Comorbidity Index [38], prescribed AF medications, completed cardiac procedures, and the Canadian Cardiovascular Society Severity in Atrial Fibrillation Scale [39] (CCS-SAF) were obtained and included in the analyses.

Study population

The study sample was based on pre-determined eligibility criteria and information provided by clinicians. The initial study population included 16,525 patients who had been referred to the AF clinics between 2008 and 2016. Those who were not linked to the Population Directory (n = 52) or who did not have Personal Health Numbers (n = 80) were excluded, as were those not registered with the Medical Services Plan (n = 30). Among the remaining 16,362 patients, we excluded those without initial consultation dates (n = 2902), because clinicians advised that these were likely inappropriate referrals. We further excluded those whose initial consultation date did not fall between 2008 and 2016 (n = 347), resulting in 13,113 eligible patients who met our inclusion criteria. Excluding patients who did not complete at least one PRO assessment (n = 5674) resulted in a study sample of 7439 patients. For this sample, PRO data were available for 4040 patients at T0 (initial consultation), 4412 at T1 (first follow-up visit), 1285 at T2 (second follow-up visit), and 689 patients with more than two follow-up visits (see Fig. 2). The PRO data in the registry were linked to the administrative health data by matching the patients’ unique identifiers.
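The exclusion steps from the 16,362 remaining patients onward, together with the response proportion reported in the Results, can be checked with simple arithmetic; all counts below are taken from the text.

```python
# Counts reported in the text, starting from the 16,362 patients who
# remained after the linkage and registration exclusions.
remaining = 16362
no_consult_date = 2902
outside_2008_2016 = 347
no_pro = 5674

eligible = remaining - no_consult_date - outside_2008_2016
sample = eligible - no_pro
response_rate = sample / eligible

print(eligible, sample, f"{response_rate:.1%}")  # 13113 7439 56.7%
```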

Fig. 2

Flow chart of analysis cohort. T0 = initial consultation; T1 = first follow-up visit (individually time-varying); T2 = second follow-up visit (individually time-varying)

Statistical analyses

All data sources were housed within the Secure Research Environment of Population Data BC. SAS [40] was used to calculate the longitudinal Charlson Comorbidity Index; R [41] was used to prepare, clean, and describe the data; and Mplus version 8.3 [42] was used to model the PRO trajectories.

The patients’ demographic and clinical characteristics are presented with descriptive statistics. To assess selection bias, respondents who had completed at least one questionnaire were compared with non-respondents who had not completed any. Among respondents, odds ratios were estimated to identify differences between patients seen only at the initial consultation and those who had at least one follow-up visit after their initial consultation. To address missing data, we used multilevel multiple imputation to account for the longitudinal data structure, in which the repeated PRO assessments (level 1) were nested within individual patients (level 2). All available auxiliary variables were included to increase the accuracy of the imputed values. To address the unbalanced data structure, we used full information maximum likelihood, which does not impute missing values but uses all available information from the included variables to compute parameter estimates [42]. Based on published recommendations [43], we created 20 imputed datasets for modeling.
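For reference, the conventional way to combine a parameter estimated separately in each of the m imputed datasets is Rubin’s rules, sketched below in Python. This is a general technique shown for illustration, not a description of this study’s pipeline; unlike simple averaging of standard errors, it inflates the pooled variance by the between-imputation spread of the estimates. The input values are illustrative, not study results.

```python
import math
import statistics


def pool_rubin(estimates, std_errors):
    """Combine one parameter across m imputed datasets with Rubin's rules.

    Returns the pooled estimate and its standard error; the total variance
    adds the between-imputation variance (inflated by 1 + 1/m) to the
    average within-imputation variance.
    """
    m = len(estimates)
    q_bar = statistics.mean(estimates)
    within = statistics.mean(s ** 2 for s in std_errors)
    between = statistics.variance(estimates)  # m - 1 denominator
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Illustrative values only: one estimate and standard error per imputation.
estimates_by_imputation = [1.20, 1.28, 1.25, 1.31, 1.22]
ses_by_imputation = [0.02, 0.02, 0.03, 0.02, 0.02]
print(pool_rubin(estimates_by_imputation, ses_by_imputation))
```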

For the longitudinal analysis, GMMs [44] were used to identify latent classes of patients with different health status trajectories and to account for the individually varying times and numbers of observations. All growth parameter estimates and posterior probabilities (i.e., the probabilities of individuals belonging to each latent class) were obtained using the expectation–maximization algorithm with robust standard errors. We limited the analysis to three time points (T0, T1, and T2) because very few patients completed questionnaires beyond T2. Following recommended guidelines [45], we fitted three GMM parameterizations: (1) unrestricted random effects, (2) restricted random effects (random intercepts only and no covariances), and (3) restricted random effects plus a first-order autoregressive (AR1) structure (see Appendix for details).
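In generic notation (ours, not taken from the Appendix, so the study’s exact Mplus specification may differ), a linear GMM with individually varying times can be written as follows, where a_ti is the time in years since the initial consultation of patient i’s assessment t, c_i is the latent class, π_k are the class proportions, and all parameters are conditional on class k. In the unrestricted parameterization the random effects (ζ_0i, ζ_1i) have a full class-specific covariance matrix; the restricted parameterization removes the random slope and all covariances, giving the compound-symmetric covariance shown in the last line; the AR1 variant additionally allows residuals of consecutive assessments to be correlated.

```latex
\begin{aligned}
y_{ti} &= \eta_{0i} + \eta_{1i}\, a_{ti} + \varepsilon_{ti},
  \qquad \varepsilon_{ti} \sim N(0, \theta_k) \\
\eta_{0i} &= \alpha_{0k} + \zeta_{0i}, \qquad \eta_{1i} = \alpha_{1k} + \zeta_{1i}
  \qquad (\text{given } c_i = k) \\
f(\mathbf{y}_i) &= \sum_{k=1}^{K} \pi_k \,
  \phi\big(\mathbf{y}_i;\ \boldsymbol{\mu}_{ik},\ \boldsymbol{\Sigma}_{ik}\big),
  \qquad \mu_{tik} = \alpha_{0k} + \alpha_{1k}\, a_{ti} \\
P(c_i = k \mid \mathbf{y}_i) &= \frac{\pi_k \,
  \phi\big(\mathbf{y}_i;\ \boldsymbol{\mu}_{ik},\ \boldsymbol{\Sigma}_{ik}\big)}{f(\mathbf{y}_i)} \\
\text{restricted: } \zeta_{1i} &= 0
  \;\Rightarrow\; \boldsymbol{\Sigma}_{ik} = \psi_k\, \mathbf{1}\mathbf{1}^{\top} + \theta_k\, \mathbf{I}
\end{aligned}
```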

To avoid label switching across the multiple imputed datasets [46], starting values were used in the analyses of all imputed datasets for both the overall and the class-specific models. Model selection was based on several statistical fit indices, including Akaike’s information criterion (AIC), the Bayesian information criterion (BIC), and the sample-size adjusted BIC (SABIC), with smaller values indicating better fit. We also considered entropy, a summary index (ranging from 0 to 1) of the accuracy of latent class assignment, with higher values indicating clearer classification. Conventional structural equation modeling (SEM) goodness-of-fit statistics and statistical tests comparing mixture models were not available because the slope values varied across individuals [47].
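The information criteria have standard closed forms in terms of the maximized log-likelihood LL, the number of free parameters p, and the sample size n: AIC = −2LL + 2p, BIC = −2LL + p ln n, and SABIC = −2LL + p ln((n + 2)/24). The entropy below is the normalized (relative) entropy commonly reported for mixture models. This Python sketch is illustrative and does not reproduce the study’s Mplus output.

```python
import math


def information_criteria(log_likelihood, n_params, n):
    """AIC, BIC and sample-size adjusted BIC (smaller is better)."""
    deviance = -2 * log_likelihood
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n),
        "SABIC": deviance + n_params * math.log((n + 2) / 24),
    }


def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; values near 1 mean clear class assignment.

    posteriors: one list of K class probabilities per individual.
    """
    n, k = len(posteriors), len(posteriors[0])
    # Terms with p == 0 are skipped, i.e. 0 * log(0) is taken as 0.
    ent = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1 - ent / (n * math.log(k))


print(relative_entropy([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))  # 1.0
print(relative_entropy([[1 / 3] * 3, [1 / 3] * 3]))  # close to 0 (uniform posteriors)
```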

To identify the predictors of class membership, a three-step multinomial regression approach was used. First, the GMM was estimated using only the latent class indicator variables, without covariates. Second, the resulting posterior probabilities were used to assign each patient to their most likely latent class. Finally, multinomial logistic regression was used to regress the most likely latent class on the predictor variables while adjusting for classification uncertainty [48]. For variable selection, univariate analyses of each predictor variable were conducted [49]. To aid interpretation of the odds ratios, the distribution of each predictor within each class was estimated. The BCH procedure [50, 51] has been shown to be more robust than other methods across varying sample sizes and entropy levels; however, because the BCH procedure was not available for multiple imputed datasets, the means and standard errors were averaged across the 20 imputed datasets using Microsoft Excel®.
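The second step, and the classification-error information that the third step uses to adjust for uncertainty, can be sketched in Python. In common implementations of the three-step approach, the matrix D[k][j] = P(assigned to class j | true class k) is estimated from the posterior probabilities, and logits derived from it are held fixed when class membership is regressed on covariates. The functions below are an illustrative sketch, not the Mplus implementation, and the posterior values are made up.

```python
def modal_assignment(posteriors):
    """Step 2: assign each individual to the class with the largest posterior."""
    return [row.index(max(row)) for row in posteriors]


def classification_error_matrix(posteriors):
    """D[k][j] = P(assigned to class j | true class k).

    Estimated as sum_i p_ik * 1[modal_i == j] / sum_i p_ik, so each row
    sums to 1; an identity matrix means no classification error.
    """
    k = len(posteriors[0])
    modal = modal_assignment(posteriors)
    matrix = []
    for true_k in range(k):
        weight = sum(row[true_k] for row in posteriors)
        matrix.append([
            sum(row[true_k] for row, m in zip(posteriors, modal) if m == j) / weight
            for j in range(k)
        ])
    return matrix


post = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]
print(modal_assignment(post))  # [0, 0, 1, 1]
print(classification_error_matrix(post))
```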

To evaluate the fit of the GMM with the predictors, a separate multinomial logistic regression analysis was conducted in IBM® SPSS [52]. This approach is limited in that the classes were treated as discrete (i.e., the classification uncertainty reflected in entropy was not taken into account), so its results should be interpreted cautiously. The overall model was evaluated with Nagelkerke’s R² statistic.
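Nagelkerke’s R² rescales the Cox and Snell pseudo-R² so that its maximum attainable value is 1; it is a likelihood-based index rather than a literal proportion of explained variance. A Python sketch, with illustrative log-likelihoods that are not values from this study:

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R^2 from null and fitted model log-likelihoods.

    Cox-Snell R^2 = 1 - exp(2 * (ll_null - ll_model) / n), divided by its
    maximum attainable value 1 - exp(2 * ll_null / n).
    """
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell


# Illustrative log-likelihoods only.
print(round(nagelkerke_r2(ll_null=-6500.0, ll_model=-6400.0, n=7439), 3))
```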

Results

Among the 13,113 eligible patients, 7439 (56.7%) completed at least one questionnaire during follow-up. Compared with respondents, non-respondents lived farther from their clinics, were at higher risk of stroke (CHADS2), were more symptomatic (CCS-SAF), and were more likely to have undergone ablation (see Table 1).

Table 1 Characteristics of respondents and non-respondents during study period (2008–2016)

The respondents consisted of 2897 women (38.9%) and 4542 men (61.1%) mostly in the 60 and older age category (72.9%). The majority of the patients lived less than 100 km from their AF clinic (83.5%). Based on CHADS2, most patients were at some risk of stroke (65.3%). Hypertension was the most frequent comorbidity (n = 2620; 35.2%) followed by heart failure (n = 1164; 15.6%). Of the interventions performed before the initial clinic consultation, the most frequent was cardioversion (n = 2001; 15.5%) followed by ablation (n = 453; 6.1%). Most of the patients received anticoagulation therapy (n = 4614; 62.0%).

There were some differences in the characteristics of patients seen only at the initial consultation and of those with at least one follow-up visit. The latter were more likely to be older (≥ 76 years of age), to live farther from the clinic, and to be less symptomatic (CCS-SAF ≥ 1); they were less likely to have certain comorbidities (including heart failure) and more likely to have had ablation and anticoagulation therapy. These patients were also less likely to have AFEQT scores in the first and second quartiles (see Table 2).

Table 2 Characteristics of study sample (N = 7439)

The results of the three GMM parameterization models are provided with the mean fit indices and entropy values, which were averaged across the 20 imputed datasets (see Table 3).

Table 3 Mean likelihood and information criteria for the growth mixture models

The 3-class restricted standard model had the smallest information criteria (BIC, AIC, and SABIC) and the largest entropy value (0.66), suggesting that it was more appropriate than the 2-class unrestricted model. The results of the 3-class restricted standard model are plotted in Fig. 3.

Fig. 3

Three-class trajectory model of self-rated health status (AFEQT Scores)

The most common trajectory (63.6%) comprised patients whose AFEQT scores averaged 51.4 points at the initial consultation and whose self-reported health gradually improved at each follow-up. This class was labeled “poor but improving.” The second most common trajectory (27.7%) comprised patients who started with higher baseline AFEQT scores (79.1 points, on average) but showed little change over time. This class was labeled “good and stable.” The least common trajectory (8.6%) comprised patients whose baseline AFEQT scores were very high (95.0 points, on average) and changed little through the follow-up period. This class was labeled “excellent and stable” health. The mean time elapsed between consecutive visits varied across the three trajectory classes, ranging from 0.8 to 0.9 years between the initial consultation and the first follow-up and from 1.2 to 1.7 years between the first and second follow-up.

Figure 4 shows the relative frequencies of the predictors of latent class membership with odds ratios and 95% confidence intervals. The “poor but improving” health group was more likely to be in the younger age category (less than 60 years of age, with 76 or older as the reference) (27.4%) than the “good and stable” (26.7%) and “excellent and stable” (26.8%) groups (OR = 1.66, 95% CI [1.20–2.31] and OR = 1.65, 95% CI [1.03–2.65], respectively). The “poor but improving” health group was also more likely to be female (44.8%) than the “good and stable” (29.6%) and “excellent and stable” (25.2%) groups (OR = 2.15, 95% CI [1.74–2.65] and OR = 2.71, 95% CI [2.04–3.59], respectively). For CHADS2 scores, the “poor but improving” group had a higher stroke risk (mean = 1.25, SE = 0.01) than the “good and stable” (mean = 1.04, SE = 0.02) and “excellent and stable” (mean = 1.00, SE = 0.04) groups (OR = 1.29, 95% CI [1.15–1.45] and OR = 1.22, 95% CI [1.06–1.42], respectively).
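Assuming the reported intervals are Wald intervals computed on the log-odds scale (the reported values are symmetric in ln OR, consistent with this), the regression coefficient, its standard error, and an approximate two-sided p-value can be recovered from any odds ratio in this section. A Python sketch using the first odds ratio reported above:

```python
import math


def wald_from_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover log-odds, its standard error, z and two-sided p from an
    odds ratio reported with a symmetric Wald 95% confidence interval."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p


# Younger age: "poor but improving" vs "good and stable" (OR 1.66, 1.20-2.31).
beta, se, z, p = wald_from_ci(1.66, 1.20, 2.31)
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.4f}")
```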

Fig. 4

Relative frequencies by class with adjusted odds ratios for predictors of latent class membership

There were group differences in the timing of ablation therapy. Between 6 months and 1 year after the initial consultation, the “poor but improving” health group was more likely to have received ablation therapy (10.3%) than the “good and stable” (2.9%) and “excellent and stable” (6.0%) groups (OR = 3.51, 95% CI [2.04–6.05] and OR = 1.80, 95% CI [1.10–2.96], respectively). Similarly, more than 2 years after the initial consultation, the “poor but improving” group (8.8%) was more likely to have received ablation therapy than the “good and stable” (2.4%) and “excellent and stable” (3.2%) groups (OR = 3.67, 95% CI [2.01–6.70] and OR = 3.24, 95% CI [1.63–6.42], respectively). With respect to anticoagulation therapy, the “poor but improving” group (76.2%) was more likely to have received anticoagulation therapy within the first 6 months after the initial consultation than the “good and stable” (62.4%) and “excellent and stable” (60.3%) groups (OR = 1.40, 95% CI [1.13–1.74] and OR = 1.89, 95% CI [1.44–2.47], respectively). The final multivariable model had a Nagelkerke R² of 0.076 (7.6%), indicating limited explanatory power for group membership.

Discussion

To our knowledge, this is the largest longitudinal PRO study of people with AF to date. We had comprehensive descriptive information from a real-world outpatient clinical setting about patients who completed AFEQT questionnaires as well as those who did not. Although we found differences between respondents and non-respondents, the major finding is that three distinct health trajectories, each associated with different patient characteristics, were identified. A key advantage of GMM is its potential to identify and assess differences in health trajectories, which could lead to tailored, subgroup-specific interventions and enhanced patient education strategies. Clinicians could use this information to discuss the anticipated outcomes of treatment with patients and to guide shared decision-making. For individuals with higher baseline AFEQT scores, it may be beneficial to set realistic expectations, as they may be unlikely to experience substantial improvement in their symptoms. For individuals with low baseline AFEQT scores, it may be beneficial to discuss the improvements that treatment may bring, while acknowledging the variability in health status they may experience and the uncertain durability of any long-term benefits. However, while severe symptoms are expected to adversely affect patients’ health and quality of life, the absence of symptoms does not automatically equate to an optimal state [53]. Likewise, a reduction in the frequency and duration of AF may not automatically improve symptoms and quality of life [10]. It is therefore important to consider each patient’s individual circumstances when assessing PROs, given the multifaceted nature of AF.

Another notable finding is that two of the identified trajectories (the “good and stable” and “excellent and stable” health groups) approached a “ceiling effect,” beyond which further improvement could not be detected with the measurement tool used. Previous studies showing that measurement scores may improve over time regardless of treatment efficacy [54, 55] highlight the importance of selecting an appropriate PRO measure; this phenomenon may also explain the low sensitivity of certain PRO measures to changes associated with treatment. These observations may have implications for the design of clinical trials in AF. More focused AF-specific symptom assessment instruments, such as the Mayo AF-Specific Symptom Inventory (MAFSI) [56], may be more informative because they more directly reflect changes in rhythm status.

Although the assessed factors associated with group membership had little explanatory power, our study highlights the need to consider broader individual and environmental characteristics associated with PROs. For example, based on an integrative literature review of factors associated with AF and the self-reported health of patients with AF [57], exercise, physical activity, alcohol use, sleep, employment, psychological profile, and financial burden/income could also be considered as predisposing factors when predicting group membership.

Limitations

This study has some limitations. Because the findings are based on voluntary patient participation in PRO collection, there was nonresponse at both the questionnaire and item levels. Patients who chose to participate and complete the questionnaire may not be representative of the larger community-based population of patients, including the non-respondents, who were sicker than those who responded. This may partly explain why we did not observe a class with a deteriorating health trajectory. To reduce non-response bias, we used current missing data techniques; nonetheless, some degree of bias likely persisted. In addition, caution is warranted in generalizing these health status trajectories to the broader population of patients with AF, because the results may be specific to the cohort of patients seen in specialized multidisciplinary clinics who provided PRO data.

Conclusion

In summary, we found that, rather than a single health trajectory, this population of AF outpatients followed three health trajectories, each associated with different patient characteristics. GMM can provide insights into longitudinal class differences and, thus, a more nuanced understanding of the variability in the magnitude of change in symptoms and perceived health status, which could enhance patient education strategies and inform the development of tailored interventions for heterogeneous clinical populations. In particular, the association between the timing of treatment and trajectory membership was noteworthy; a better understanding of its role could help determine when to tailor interventions to improve patients’ health and quality of life. Future studies should explore the extent to which other predisposing factors affect model estimation and improve the overall predictive value of the characteristics and factors that determine patients’ trajectories.