FormalPara Key Points for Decision Makers

Classification and regression trees (CART) provide a methodology to highlight how and why variation between practice and guidelines occurs, not just that an evidence-practice gap exists.

Prescribing practices for lipid-lowering medications do not follow absolute risk guidelines or eligibility criteria for subsidisation by the Pharmaceutical Benefits Scheme. There are potentially significant gains from clarifying best-practice prescribing, to promote either greater adherence to guidelines or increased clinical freedom.

Big data techniques such as CART are applicable to a wide range of healthcare applications, including those where big data are absent.

1 Introduction

Physicians employ a range of risk assessment strategies when making prescribing decisions for primary prevention of cardiovascular disease (CVD), including assessment against thresholds on individual CVD risk factors and assessment of total or absolute risk (AR) of cardiovascular (CV) events [1, 2].

Internationally, many clinical practice guidelines recommend calculation of AR of CV events, with lipid-lowering medication (typically statins or statins in combination with another drug) recommended for patients evaluated as high risk.Footnote 1 In Australia, an AR approach is recommended by the National Vascular Disease Prevention Alliance (NVDPA) [5] and the National Heart Foundation (NHF) [6]. However, the Australian Government’s universal drug insurance scheme, the Pharmaceutical Benefits Scheme (PBS), limits the subsidising of these medicines using eligibility criteria based on individual risk factors such as diabetes and cholesterol.

In practice, several studies suggest that clinicians deviate from guidelines and/or eligibility criteria [7, 8], perhaps in response to care-seeking behaviour from patients or other aspects of patient preference [1, 2]. However, Bonner et al. [1] noted that while CVD risk management is not consistently based on AR, “little is known about … and the alternative strategies employed when AR is not the focus of assessment”. Similarly, few studies have formally tested whether different prescribing thresholds are being applied in different patient groups, and many have struggled to characterise the complexity of prescribing practice [1, 9].

The purpose of this paper was to employ classification and regression trees (CART) to analyse the prescribing patterns of Australian general practitioners (GPs) of lipid-lowering medication for the primary prevention of CVD. CART is a machine-learning ‘big data’ technique that has been shown to be particularly valuable when analysing nonlinear relationships and interactions, where it can outperform standard regression models for classification [10]. We aimed to use CART to improve our understanding of clinical practice, potentially identifying prescribing thresholds and patient subgroups missed by traditional analyses, and to demonstrate how CART can be useful for understanding complex treatment decisions in healthcare.

2 Methods

2.1 Data

We used linked survey and administrative data from the AusHeart Study, a cluster-stratified, cross-sectional survey of CVD risk management in primary care fully documented elsewhere [7, 11, 12]. The study enrolled GPs from across Australia, who recruited 15–20 consecutively presenting adults aged 55 years or older. It gathered information on patient socioeconomic factors, CVD risk factors, prescribed medications, and the GP’s own estimation of the patient’s AR of a CV event within the next 5 years [7].

For consenting patients, these data were linked to Medicare administrative data containing records of all pharmaceuticals purchased under the PBS from 1 March 2008 to 1 January 2010 [12]. To avoid complications associated with prior exposure to medication, we reduced the dataset to 1903 patients who had not been prescribed lipid-lowering medication prior to GP recruitment.Footnote 2 We developed models to classify these patients according to prescription/nonprescription of any lipid-lowering medication (Anatomical Therapeutic Chemical code C10) during a 1-year period.

2.2 Classification and Regression Trees (CART) Methodology

CART sorts observations into increasingly homogeneous subgroups [13]. At each step, CART splits observations using a simple decision rule (e.g. if total cholesterol exceeds 7.0 mmol/L, then prescribe medication) chosen to minimise diversity (with respect to the binary outcome or classification) in right and left ‘child nodes’. Branches and nodes are added until a stopping criteria is met and the tree terminates in ‘leaves’ or ‘bins’ containing proportions of correctly and incorrectly classified observations [10].

There are three distinct strengths of CART that make it particularly applicable to analysing complex decision-making processes such as those employed in clinical practice. First, the hierarchical structure of CART models is often more intuitive than traditional regression models because it mimics the heuristics of decision making [14, 15]. Second, CART can outperform standard regression models when predicting outcomes in the presence of nonlinear relationships and interactions [10]. In clinical practice, treatment decisions may depend on nonlinear thresholds with respect to one or more risk factors, and thresholds may vary with other risk factors. For example, PBS guidelines allow prescribing of statins for patients with hypercholesterolemia (>9 mmol/L) [16]. This drops to >5.5 mmol/L if the patient has diabetes [16].Footnote 3 Third, CART affords the data greater freedom to speak for themselves [20]. Whereas regression models are refined by comparing across a limited number of possible specifications, CART performs an exhaustive search over all possible cut-points and predictors [10]. As a result, the precise form of the relationship between a predictor and outcome is not delimited by the inclusion/exclusion of higher order terms. It is this strength that has seen CART used in a variety of prognostic analyses to identify risk thresholds for in-hospital mortality [21], vertebral fractures [22] and cirrhosis [23].

However, CART is subject to a number of limitations. CART “… tends not to work very well if the underlying relationship is linear” [10]. A second limitation of CART is the risk of overfitting [24, 25]. Finally, CART can be prone to instability. Small differences in the training data can lead to very different trees [26]. We manage these limitations in the methodology below.

2.3 Using CART to Understand Prescribing in Cardiovascular Disease (CVD)

We used a three-stage approach to construct the CART. In CART-1, we limited predictors to patient sociodemographics (age, sex, Aboriginal/Torres Strait Islander, household income and education level) and GP-estimated 5-year AR of a CV event. This provided a benchmark for which to compare performance. In CART-2, we added individual risk factors (smoking, body mass index, systolic and diastolic blood pressure, low- and high-density lipoprotein cholesterol, total cholesterol, triglycerides, kidney disease, diabetes, CVD history, weekly exercise and self-reported health) “… to determine whether cardiovascular risk factors might have an additional influence on prescribing beyond their contribution to [GP-estimated] cardiovascular risk” [27]. Finally, in CART-3 we added AR, estimated using the 1991 Framingham risk equations. Framingham AR forms the basis of the NHF 2004 and NVDPA 2008 guidelines. If GPs adopt NHF or NVDPA guidelines, we would expect the addition of Framingham AR to improve the predictive validity of the CART, and to see cut-off thresholds and a hierarchy similar to the guidelines (described in "Appendix 1").

We implemented CART using the Matlab fitctree function [28]. Gini’s diversity index was used as the default splitting criterion, as suggested by Breiman et al. [24], and we compared model performance under entropy splitting to check model robustness. In our default models, variables with missing data still enter the model, but training uses only valid values. In prediction, an observation with a missing value is assigned to the largest split group. An alternative method for dealing with missing data in CART is to find ‘surrogate’ variables by applying CART with the missing data as the dependent variable [28]. We checked model performance under these two methods to test robustness, and used tenfold cross-validation to indirectly evaluate out-of-sample performance.Footnote 4 We bootstraped the cross-validation 100 times to describe the distribution of mean out-of-sample error and receiver operating characteristic (ROC) area under the curve metric. We pruned the CART to reduce overfitting and optimise out-of-sample performance. This helps to eliminate illogical branches that can grow from the sample data but which would not perform well out-of-sample (e.g. where a node suggests that patients with a household income between AUS$52,000 and AUS$72,799 are less likely to be prescribed than those with a household income below AUS$52,000 or above AUS$72,799). Where there was no difference in out-of-sample performance, we followed Breiman et al. [24] in preferring smaller trees over larger trees.

Once optimised, the structure of the CART was evaluated to identify patient subgroups and prescribing thresholds. We calculated a predictor-importance metric for the preferred model using the predictor-importance Matlab algorithm.Footnote 5 Next, we compared patient subgroups and prescribing thresholds identified by the CART against NHF 2004, NVDPA 2008 and PBS guidelines to identify similarities and differences.

Finally, we evaluated the stability of our results. The robustness of predictor-importance and specific hierarchies is difficult to assess because of the conditional nature of the CART [30]. As a simple guide, we trained 100 ‘bagged’ treesFootnote 6 on bootstrapped samples of the data, and counted the number of times each predictor appeared [32, 33]. Following Dannegger [32], we calculated confidence intervals (CIs) and density functions of the cut-off thresholds used at key decision nodes to highlight stability.

3 Results

3.1 Prescribing and Risk Factor Statistics

Table 1 provides the sample mean and standard deviation (SD), or frequency count and percentage, for demographic and clinical characteristics. Of the 1903 patients, 296 (16 %) were prescribed lipid-lowering medication.

Table 1 Characteristics of the patients in the AusHeart study

3.2 Model Performance

CART-1 considers only patient demographics and the GP-estimated 5-year AR of a CV event. It provides a performance benchmark but is not expected to perform well given the absence of individual risk factors or Framingham AR. The unpruned CART-1 correctly identified 1560 (97 %) patients who were not prescribed lipid-lowering medication, but only 115 (39 %) of those who were prescribed, for an overall within-sample error rate of 12 %. As expected, the performance of CART-1 drops when moving out-of-sample; error increases to 20 % (95 % CI 20–21) but with pruning this is reduced to 18 % (95 % CI 17–18). The out-of-sample ROC metric is 0.53 (95 % CI 0.51–0.55), indicating the model is barely better than a random guess at predicting prescribing patterns (Table 2).

Table 2 Model performance (%)

CART-2 adds 13 individual risk factors to CART-1. This improves both within- and out-of-sample performance. Within sample, the model correctly identified 1585 (99 %) patients who were not prescribed lipid-lowering medication, and 157 (53 %) of those who were, for an overall error rate of 8 %. After pruning, the out-of-sample error was 17 % (95 % CI 16–17 %) and the ROC metric was 0.63 (95 % CI 0.60–0.64).

CART-3 adds Framingham AR to CART-2, which should identify NHF and/or NVDPA guidelines if they are followed. Within sample, the model correctly identified 1579 (98 %) patients who were not prescribed, and 172 (58 %) who were prescribed, for an overall error rate of 8 %. After pruning, the out-of-sample error was 17 % (95 % CI 16–18) and the ROC metric was 0.62 (95 % CI 0.60–0.63), which is not significantly different from CART-2. Framingham AR does not appear in the pruned version of CART-3.

3.3 Predictors of Prescribing

Household income, GP-estimated AR, and the individual risk factors low-density lipoprotein [LDL], diabetes, total cholesterol, CVD history and triglycerides all influence GP prescribing under the pruned CART-2 model. The predictor-importance results suggest that LDL, GP-estimated AR and diabetes make the most improvement to classification, followed by triglycerides, income, total cholesterol and CVD history (Table 3).

Table 3 Predictor results

Figure 1 shows interactions between the AR assessments, individual risk factors, and sociodemographic factors, and highlights the paths that lead to prescribing. On the right-hand side of the tree, prescribing is most likely for patients with high LDL (>4.09 mmol/L), high total cholesterol (>6.95 mmol/L), high GP-estimated AR (>17.5 %) and relatively low household income (<AUS$52,000). On the left-hand side of the tree, patients without high LDL (<4.09 mmol/L) are more likely to be prescribed if they have high triglycerides (≥4.25 mmol/L for patients with diabetes; ≥4.35 mmol/L for patients without diabetes and with GP-estimated AR ≥2.5 %), or if they have previously had a coronary artery event.

Fig. 1
figure 1

CART-2. CART classification and regression trees, LDL low-density lipoprotein (mmol/L), total cholesterol total cholesterol (mmol/L), GP AR general practitioner-estimated absolute risk (5), TRI triglycerides (mmol/L), CVD cardiovascular disease, CAD coronary artery disease, TIA transient ischaemic attack, Income household income, Asterisk 0 indicates not prescribed medication, 1 indicates prescribed medication

CART also highlights interactions where prescribing is unlikely. Patients with high LDL, total cholesterol and GP-estimated AR are less likely to be prescribed lipid-lowering medication if they have relatively high household income (≥AUS$52,000). Patients with high LDL and high total cholesterol but without high GP-estimated AR are also less likely to be prescribed. Finally, prescribing is less likely for patients with high LDL but without high total cholesterol.

3.4 Robustness of Results

Comparison of CART-2 performance under different splitting criteria and approaches to missing data show no significant differences in ROC out-of-sample performance (Table 2). Comparison of the 100 bagged trees highlights robustness of the specific hierarchies and decision nodes within CART-2. LDL and diabetes appear in all 100 trees, at the root and second node positions, and have the highest average predictor-importance (Table 3). The LDL decision threshold is bimodal, with a mode at 4.6 mmol/L in addition to the 4.1 mmol/L suggested in CART-2 (Fig. 2); however, the difference between modes is less than one SD in LDL in the sample (0.8 mmol/L). By contrast, total cholesterol appears at the third node in only 6 of the 100 bagged trees. Education (those with University education are less likely to be prescribed) appears 44 times at node 3. Triglycerides and GP-estimated AR, which appear twice in CART-2, appear 171 and 116 times, respectively, within the first 10 nodes. The triglyceride decision threshold shows the cut-off at 4.3 mmol/L, as seen in CART-2, but also identifies another mode at 2.0 mmol/L. Household income appears 76 times in the first 10 nodes, with the median cut-off at AUS$52,000 as per CART-2. Exercise, Framingham AR and self-rated health status are not present in the pruned CART-2 model but appear in 38, 10 and 4 of the first 10 nodes of the 100 bagged trees.

Fig. 2
figure 2

Sample and threshold densities for selected risk factors. AR absolute risk, GP general practitioner, LDL low density lipoprotein

4 Discussion

4.1 Key Findings

4.1.1 Prescribing Varies Across GPs and Does Not Appear to Follow AR Guidelines or PBS Regulations

We find that prescribing practices do not appear to be congruent with NHF, NVDPA or PBS eligibility guidelines. NHF and NVDPA use Framingham AR assessment as the basis of their guidelines, yet thresholds on Framingham AR rarely appear in the CART. The guidelines also recommend prescribing on the basis of individual risk factors (e.g. for patients with kidney disease or diabetes). Kidney disease does not appear in CART-2; however, diabetes does appear but is neither necessary nor sufficient for prescription.

Similarly, the PBS has conditional criteria based on individual risk that govern eligibility, such as total cholesterol >5.5 mmol/L for patients with diabetes, and >6.5 mmol/L for patients with HDL <1 mmol/L. These decision branches do not appear in CART-2; however, the model suggests that prescribing is more likely for low LDL patients with triglycerides >4.25 mmol/L (for patients with diabetes) and >4.35 mmol/L (for patients with high GP-estimated AR). This is somewhat consistent with the PBS, which allows prescribing for a subset of patients with triglycerides >4 mmol/L.

Our findings contribute to a growing body of evidence [2, 7, 18, 27], suggesting there is considerable room for improvement in the prescribing practices for CVD. If guidelines provide an accurate description of optimal treatment, divergence from guidelines is likely to be costly, both in terms of health expenditure and patient outcomes. For example, the prescription of drugs to patients who fall outside the specified indications, often referred to as leakage [34], is likely to diminish the real-world cost effectiveness of pharmaceuticals if it results in patients gaining a lower average benefit than was assumed at the time of the approval for use. There may then be dividends from interventions to improve adherence to guidelines, such as IMPLEMENT, ALIGN and IRIS [3537]. CART would be an appropriate method to assess such adherence. Conversely, if thresholds for reimbursement constrain best-practice prescribing (e.g. based on total or absolute risk of CV events or a more thorough understanding of the patient), there may be a case for removing thresholds for reimbursement and allowing increased clinical freedom in prescribing. Either way, there are potentially significant opportunity costs to this uncertainty.

While we find discordance between practice and guidelines, we do not identify one standard decision tree that consistently explains prescribing behaviour across our representative dataset. Instead, we find that prescribing practices vary across the GP population. This is perhaps unsurprising given the volume of guidance available [38] and the potential for between-GP variation in uptake and acceptance of decision-support tools and guideline recommendations [39]. We posit that the low ROC performance of the CART models is a result of this variation. In an environment of clearer and more widely adopted guidelines, we would expect the ROC performance to improve.

4.1.2 CART Suggests How and Why GP Prescribing Deviates From Guidelines

The CART analysis provides additional insights regarding the roles of individual risk factors and the hierarchy of GP decision making. LDL is the root node in all bagged trees, suggesting it is the first risk factor used in the decision-making process. Similarly, diabetes is consistently the second node in the decision tree, suggesting it is an important risk factor that GPs consider in decision making. It is well established that lowering CV risk is associated with the degree to which statins reduce LDL cholesterol [40]. Similarly, statins have been widely prescribed in people with diabetes, given their higher CV risk [41]. It appears that evidence regarding these risk factors takes precedence over AR and the eligibility criteria.

We also show that prescribing to high-risk patients varies based on the patient’s household income and/or educational attainment, with those with household income above AUS$52,000 or with a university degree unlikely to be prescribed. There has been some evidence of this internationally [42, 43]; however, the CART method uncovers the hierarchy of these factors. Specifically, we show that income/educational attainment are deciding factors at the bottom of the prescribing decision tree, after clinical establishment of high risk. However, there is likely to be confounding between these factors and patient’s health and lifestyle. Self-rated health and exercise are significantly collinear with household income (Pearson Chi-square p values of 0.016 and 0.000), and both entered some trees in the robustness analysis. Nonetheless, the results concord with theories that GPs use clinical judgement and knowledge of the patient to make decisions based on a wide range of factors, not just AR-based guidelines involving absolute or individual risk factors [1].

Finally, we show that CVD history is taken into consideration for otherwise low-risk patients, with those patients with CAD more likely to be prescribed. This concurs with previous research that highlighted inconsistent CV risk perceptions across vascular territories [44].

4.2 Limitations

There are limitations to this study. First, the analysis uses filled prescriptions, rather than written prescriptions, as the measure of prescribing. To the extent that patients with unfilled scripts differ in some respect from more compliant patients, the CART may not characterise prescribing practice across all patient groups.Footnote 7 Caution should therefore be exercised in generalising our findings to patients prescribed but who do not go on to fill their prescriptions.

Second, the AusHeart sample is a stratified random sample of GPs who had previously expressed an interest in participating in the study. While this approach produced a nationally representative sample with respect to a number of observable GP characteristics [7], it may have selected GPs with a greater than average interest in CVD management and the guidelines associated with it. There are also some limitations from the survey design; for example, we do not know the time interval of prescribing or nonprescribing prior to the study.

Finally, the CART method has limitations. Overall model performance is low, which could be due to variance in prescribing practices; each GP might use a different tree for each patient. We discuss GP variability and clustering in "Appendix 2". Instability in trees uncovered by CART can be difficult to measure and visualise.

5 Conclusions

While previous studies showed discordance between evidence and practice, CART extends traditional analyses by highlighting the alternative decision trees and key factors that GPs use in practice to make prescribing decisions. The advantages of CART are the ability to identify hierarchies and nonlinearities, and to provide results that are relatively easy to understand. These strengths are evident in this analysis, which show hierarchical decisions with complex interactions between individual risk factors and sociodemographic factors.

This example has shown that the CART big data technique is applicable to a wide range of healthcare topics, including those where big data are absent. There are an increasing range of applications in healthcare that utilise CART’s strength in uncovering nonlinear thresholds and hierarchies to develop guidelines for clinical decisions. It follows that evaluating if these guidelines are used in practice requires methods that can identify such structures and thresholds. In these instances, CART provides a useful addition to the analyst’s toolkit.