INTRODUCTION

Risk stratification identifies patients who are more likely to encounter potentially avoidable health outcomes, so that they can be targeted with early interventions. Because risk stratification can help control the costs associated with adverse health events, these programs have become more commonplace in clinical settings.1,2 In 2015, the Agency for Healthcare Research and Quality (AHRQ) estimated that 5% of the US population accounts for more than half of all health care spending.3 As a result, payors are turning to population management to reduce avoidable health care utilization among high-risk patients while improving health outcomes. Population management typically involves care coordination, monitoring, education, and social support conducted by care managers.4 The Comprehensive Primary Care Plus (CPC+) initiative from the Center for Medicare & Medicaid Innovation (CMMI) is an alternative payment model that promotes population-based care management. This program requires organizations to stratify their patient panels into risk tiers and provide targeted care management to those who would likely benefit.5,6

Identifying patients at risk for adverse outcomes remains challenging. Patients at higher risk of morbidity and mortality typically have a number of comorbid conditions, higher disease severity indicators, and more behavioral health needs, and they experience other social determinants of health, including financial challenges.7,8 Risk stratification simplifies the identification of high-risk patients by using automated algorithms, clinical intuition by providers, or both. Computerized algorithms, also known as clinical prediction models (CPMs), rely on data available in electronic health records (EHRs).9 While these data are vast and detailed, they are limited by a lack of discrete representation for many functional and psychosocial patient factors.

A definitive method of risk stratification in primary care has not been identified. On one hand, automated algorithms have been successful in predicting outcomes under varying circumstances, such as in the identification of patients at risk for hospitalization or increased costs.1,5,10,11 On the other hand, human intuition often disagrees with the computer models. One study of diabetic patient complexity found poor concordance between the opinion of 40 physicians and three commonly used algorithms.12 In another mixed-methods study, 35 nurses and social workers based their classifications of patient candidacy for intensive care management on nuanced themes, such as social support, health trajectory, motivation, and agency. The researchers were unable to adequately match their opinions using models from EHR data alone.4 Furthermore, automation can sometimes cause unintended harm, as when racial bias was identified in a widely used algorithm that predicted costs as a proxy for illness.13

Hybrid methods of risk stratification attempt to circumvent the weaknesses of automated algorithms by allowing physicians to review and adjust computer-generated scores in a process known as adjudication.8 Adjudication allows human experts to weigh nuanced factors that may not be available in computer models. Provider input generally increases overall confidence among stakeholders in the risk stratification and care management process.8,14 Studies have demonstrated that a combined approach improves prediction model accuracy.15,16 However, additional research is needed to evaluate providers’ considerations and the value they contribute to automated algorithms. In this investigation, we assessed the factors that influence provider adjudication of risk scores using qualitative interviews and statistical modeling. We then compared the performance of the commercial EHR risk score against the providers’ adjudicated risk scores by their ability to predict unplanned care utilization and mortality.

METHODS

Study Design

We conducted a mixed-methods study using key informant interviews and a patient cohort to evaluate the accuracies of risk stratification models derived from commercial algorithms and clinician input. This study was approved by the Oregon Health & Science University (OHSU) Institutional Review Board (IRB# 00008917). Patients in the cohort were those seen in primary care clinics at OHSU who were assigned risk scores using a vendor-provided algorithm. The primary care providers were then asked to adjudicate the risk scores of the patients on their panels by increasing, decreasing, or retaining the score as calculated by the algorithm. Participating providers included board-certified physicians, physicians-in-training, nurse practitioners, and physician assistants. A purposive sample of participating providers was interviewed about their experiences and perceptions of the process. Over a 12-month period, the patient cohort was observed for negative outcomes. The accuracies of the risk stratification models, with and without adjudication, were then compared through statistical analysis.

Study Setting

The adult primary care clinics at OHSU participate in the CPC+ initiative from CMMI. CPC+ practices must have a risk stratification process that includes computerized risk assessment and provider perception. An interprofessional committee at OHSU reviewed several options for algorithmic risk stratification.11 While the algorithms had comparable predictive abilities, the committee chose the commercial tool from Epic because it was readily available for integration with the existing electronic health record (Epic Systems Corporation, Verona, WI). The “risk of hospital admission or ED visit model” is a proprietary algorithm for predicting a patient’s risk for an emergency department (ED) visit or hospital admission over the following year. The model input variables include Medicare status, Medicaid status, relationship status, assignment to PCP, number of hospital and ED visits in the past year, and twelve high-risk diagnoses. The vendor documentation reports that the model demonstrated an area under the receiver operating characteristic curve (AUC or concordance-statistic) of 0.63–0.78.17 During the latter half of 2018, OHSU primary care practices began implementing the tool. Providers are shown the percentage risk and a corresponding 0 to 4 categorical risk score: 0 (0–1% risk of ED/hospitalization in the next year), 1 (2–9% risk), 2 (10–19% risk), 3 (20–59% risk), and 4 (60–100% risk).
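The tiering described above can be illustrated with a minimal sketch. It is not the vendor's implementation: the function name `risk_category` is hypothetical, and the sketch assumes that the displayed risk is a whole-number percentage, so that any fractional value between two listed ranges simply falls into the lower tier.

```python
def risk_category(percent_risk: float) -> int:
    """Map a predicted 12-month risk (in percent) to the 0-4 category.

    Hypothetical helper; the thresholds are the lower bounds of the
    ranges listed in the text (2%, 10%, 20%, 60%).
    """
    if not 0 <= percent_risk <= 100:
        raise ValueError("percent_risk must be between 0 and 100")
    lower_bounds = (2, 10, 20, 60)  # lower bounds of categories 1, 2, 3, 4
    # The category equals the number of lower bounds the risk has reached.
    return sum(percent_risk >= bound for bound in lower_bounds)
```

For example, a patient shown a 15% risk would be placed in category 2 under this mapping.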

After the risk tool was implemented, internal medicine and family medicine providers were asked to review and adjudicate the scores assigned to patients on their panels. In the adjudication process, the provider can adjust the risk score up or down by 1 point or keep it unchanged if they agree with the algorithm-generated score. The final adjudicated risk scores are used to evaluate patients for a care management intervention.
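The adjudication rule can be sketched as a small function. The name `adjudicate` is hypothetical, and the handling of the scale boundaries is an assumption: the text does not state what happens when a provider tries to lower a score of 0 or raise a score of 4, so this sketch simply keeps the result within 0 to 4.

```python
def adjudicate(algorithm_score: int, adjustment: int) -> int:
    """Apply a provider adjustment of -1, 0, or +1 to an algorithm score.

    Hypothetical helper. Clamping to the 0-4 range is an assumption,
    not a documented behavior of the tool.
    """
    if algorithm_score not in range(5):
        raise ValueError("algorithm_score must be an integer from 0 to 4")
    if adjustment not in (-1, 0, 1):
        raise ValueError("adjustment must be -1, 0, or +1")
    return min(4, max(0, algorithm_score + adjustment))
```

Under this rule, only patients with an algorithm score of 3 or 4 can end adjudication in the highest category.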

Key Informant Interviews

Structured interviews were conducted with nine providers who were responsible for the adjudication of 3029 patient risk scores. A structured interview guide was developed and then used to ask providers about their experience with adjudication, how they assess risk, and what factors they consider in predicting mortality and utilization outcomes. A think-aloud protocol was utilized, by which providers voiced their subjective experience of adjudication while the researcher observed and asked guiding questions. Interviews were conducted by a single researcher with awareness of the adjudication process.

Study Population and Variables Collected

Patients aged 18 years and older with a primary care provider at an OHSU internal medicine or family medicine clinic were first assigned Epic risk scores in 2018. Those patients who had their score adjudication completed between July 12, 2018, and December 31, 2018, were included in the quantitative analysis. July 12, 2018, was chosen as the initial date because this is when the commercial tool became available. Patient characteristics included Epic risk score, adjudicated risk score, age, gender, ethnicity, and race. Race and ethnicity data are self-identified by patients and collected as part of standard clinic intake. Disease severity indicators and behavioral health factors were chosen to align with the themes from the provider interviews. These included maximum hemoglobin A1c and maximum B-type natriuretic peptide (BNP), representing severity of diabetes and heart failure, respectively. Medication count was also utilized because polypharmacy is associated with increased mortality.18 The Patient Health Questionnaire (PHQ)-2 screen was used as an indicator for mental health. Outcome metrics included an OHSU ED visit, OHSU hospital admission, and death during the 2019 calendar year.

Data Analyses

For qualitative analysis of interviews, we used an immersion/crystallization approach to identify and describe emergent themes and to select salient exemplars, which were later used to guide EHR data collection and statistical modeling in our mixed-methods work.19 To reveal the factors that may be influencing provider adjudication, we used multivariable logistic regression to evaluate the association between up-scoring behavior and patient characteristics, including demographics and disease severity indicators. For the purposes of the regression model, maximum BNP and hemoglobin A1c were input as zero for patients without a recorded value. Similarly, patients without a PHQ-2 or PHQ-9 recorded were considered to have a negative PHQ-2. Adjusted odds ratios and 95% confidence intervals were produced from the multivariable models.
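As a minimal sketch of two steps in this paragraph, the following shows the missing-value handling and the standard conversion of a fitted logistic regression coefficient into an odds ratio with a Wald 95% confidence interval. All function names are hypothetical, the model fitting itself is not shown, and the use of a Wald interval (z = 1.96) is an assumption, since the text does not state how the intervals were computed.

```python
import math

def impute_missing_lab(value):
    """Return 0.0 when no lab value (max BNP or max A1c) was recorded."""
    return 0.0 if value is None else float(value)

def phq2_positive(result):
    """Treat a missing PHQ-2/PHQ-9 record as a negative screen."""
    return bool(result) if result is not None else False

def odds_ratio_with_ci(beta, standard_error, z=1.96):
    """Convert a log-odds coefficient to (odds ratio, lower, upper).

    Assumes a Wald interval: exp(beta +/- z * SE).
    """
    return (
        math.exp(beta),
        math.exp(beta - z * standard_error),
        math.exp(beta + z * standard_error),
    )
```

A coefficient of 0 corresponds to an odds ratio of 1, that is, no association between the characteristic and up-scoring after adjustment for the other covariates.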

After collecting patient outcome data for 2019, we compared the predictive abilities of the vendor algorithm with the adjudicated score. The receiver operating characteristic (ROC) curves were plotted on a continuous scale of 0 to 4. For each risk model, we calculated the area under the curve (AUC or c-statistic), and the AUCs were compared using DeLong’s test for two correlated ROC curves. Sensitivity, specificity, and positive predictive value (PPV) were also calculated with a score cutoff of 4 and compared using McNemar’s test. Finally, we repeated the risk model comparisons for the subgroups of males, seniors (65+ years), and Black patients.
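The metrics named above can be sketched in plain Python. This is an illustration under stated assumptions and not the analysis code used in the study: the function names are hypothetical, the AUC is computed as the pairwise concordance probability with tied scores counted as one half, DeLong's test is not implemented, and the McNemar p-value uses the asymptotic chi-square form without a continuity correction, a variant the text does not specify.

```python
import math
from collections import Counter

def auc(scores, outcomes):
    """c-statistic: probability that a patient with the outcome has a
    higher score than a patient without it (ties count as 0.5)."""
    pos = Counter(s for s, y in zip(scores, outcomes) if y)
    neg = Counter(s for s, y in zip(scores, outcomes) if not y)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need patients both with and without the outcome")
    wins = 0.0
    for s_pos, c_pos in pos.items():
        for s_neg, c_neg in neg.items():
            if s_pos > s_neg:
                wins += c_pos * c_neg
            elif s_pos == s_neg:
                wins += 0.5 * c_pos * c_neg
    return wins / (n_pos * n_neg)

def cutoff_metrics(scores, outcomes, cutoff=4):
    """Sensitivity, specificity, and PPV when score >= cutoff is 'high risk'."""
    pairs = list(zip(scores, outcomes))
    tp = sum(1 for s, y in pairs if s >= cutoff and y)
    fp = sum(1 for s, y in pairs if s >= cutoff and not y)
    fn = sum(1 for s, y in pairs if s < cutoff and y)
    tn = sum(1 for s, y in pairs if s < cutoff and not y)

    def ratio(num, den):
        return num / den if den else None  # undefined when denominator is 0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }

def mcnemar_p_value(flags_a, flags_b):
    """Asymptotic McNemar test on paired yes/no classifications.

    Assumption: no continuity correction; chi-square with 1 df.
    """
    b = sum(1 for a, other in zip(flags_a, flags_b) if a and not other)
    c = sum(1 for a, other in zip(flags_a, flags_b) if other and not a)
    if b + c == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    statistic = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square(1) variable equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))
```

To compare sensitivities, the paired flags passed to `mcnemar_p_value` would be restricted to patients who experienced the outcome, with one flag per patient from the vendor score and one from the adjudicated score.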

RESULTS

Patient Cohort

From July 12 through December 31, 2018, 47,940 patients were assigned Epic risk scores and had their scores adjudicated by primary care providers (Table 1). Of these, 47% were male, 23% were 65 years or older, 3% were Black, 7% were Asian, 5% were Hispanic, and 7.3% had Medicaid. Over the calendar year of 2019, 14% of our cohort had an ED visit, 7% were admitted to the hospital, and 1% died.

Table 1 Patient Outcomes by Demographic Characteristics

Key Informant Interviews

Nine primary care providers (PCPs), including seven physicians, one nurse practitioner, and one physician assistant from one internal medicine and one family medicine practice, participated in structured interviews. Qualitative analyses resulted in five emergent themes that the providers considered in risk adjudication (Table 2). Lower perceived risk was described as being related to strong self-management skills, including health literacy, engagement with care plans, and adherence to medications. Providers perceived that disease severity and behavioral health factors were insufficiently considered by the vendor algorithm. Patients with poor social support or self-management skills were described as benefiting most from care management intervention. Providers voiced that they occasionally up-scored patients to provide them with additional resources. In these cases, perceived risk was disregarded.

Table 2 Factors Reported by Providers in Risk Adjudication, Categorized by Theme

Overall Adjudication Results

When asked to adjudicate the algorithm-derived risk scores, providers decreased 7.5%, increased 6.8%, and left 85.7% of the scores unchanged. Of the patients whose risk scores were decreased during adjudication, 74% did not experience a measured negative outcome (ED visit, admission, or death). Of the patients whose scores were increased, 32% experienced at least one negative outcome (Table S1 of the Supplementary Appendix).

Factors Associated with Up-Scoring to High-Risk Category

Table 3 displays patient characteristics associated with provider up-scoring during adjudication. Patients who had their risk scores increased by adjudication were more likely to be male, older, or Black. Up-scored patients were also more likely to have a higher active medication count, a positive PHQ-2 screen, a hemoglobin A1c > 9%, or a pro-B-type natriuretic peptide > 200 pg/mL.

Table 3 Adjusted Odds of Provider Up-Scoring in Risk Score Adjudication by Patient Characteristics

Risk Model Performance Comparison

Overall, the adjudicated score models showed greater predictive ability than the original vendor score model for all three outcomes (Table 4). When limited to males or seniors, the adjudicated risk models again displayed greater AUC performance for all outcomes. When limited to Black patients, the differences between the Epic and adjudicated risk models were not significant. We also examined differences in sensitivity using a score cutoff of 4. Overall, adjudication improved the model sensitivity for all three outcomes. When the analysis was limited to the subgroups of males or Black patients, the increases in sensitivity were not always significant.

Table 4 Performance Comparison of the Epic and Adjudicated Risk Score Models: Concordance Statistics and Sensitivity

DISCUSSION

To our knowledge, this study is among the first to illustrate the contribution that provider adjudication can make to clinical predictions. We found that adjudication adds value to risk models by improving the prediction of adverse outcomes. As a result, complex patients can be more readily linked to needed resources. Clinical prediction models typically provide an automated process for identifying high-risk patients who can then be provided interventions to reduce adverse outcomes, including emergency department visits, inpatient admissions, and death. However, automated risk algorithms are often inadequate in identifying high-risk patients, which can result in wasted resources or missed opportunities for intervention. Adjudication allows providers with patient-specific knowledge to improve the identification of at-risk patients. This study quantifies the value of provider adjudication.

This study also found that primary care providers consider a number of factors when they adjudicate risk scores, including disease severity, health literacy, behavioral health, and whether an actionable intervention, such as care management, is warranted. These findings align with past work that demonstrated that providers and care managers consider complexity of medical decision-making, individual patient behaviors, and psychosocial factors when evaluating complexity.4,12,15 Only two of the five emergent themes identified here (disease severity and behavioral health factors) can be represented by EHR data.

In addition, we found that male sex, senior age, Black race, PHQ-2 positivity, high BNP, high A1c, and a higher medication count were all associated with provider up-scoring behavior. While not all of these factors were specifically mentioned in our interviews, they may be factors that providers considered. Some of these factors have been shown to be associated with adverse outcomes. For example, polypharmacy is associated with increased mortality, and depressive symptoms are a risk factor for hospital admission.18,20 Patient activation, as a measure of self-management skills, has been associated with improved biometrics, including hemoglobin A1c, blood pressure, and cholesterol.21 The underlying reasons why providers up-scored Black patients are multifactorial, with facets that stem from systemic anti-Black racism in the healthcare setting, including implicit bias underlying a perception of increased risk.22,23 Furthermore, it is important to recognize, as stated by Vyas et al., that “racial differences found in large data sets most likely often reflect effects of racism — that is, the experience of being black in America rather than being black itself — such as toxic stress and its physiological consequences.”24 It is not clear whether adjudication incorporates racial bias, and this issue highlights the need for deeper evaluation of providers’ racial bias in assessment of risk.

In our patient population, adjudication by PCPs improved the prediction of adverse outcomes. When compared to a commercial risk algorithm, there was a statistically significant improvement in the AUC with adjudication for prediction of ED visits, hospital admissions, and death. Among the highest risk patients (risk score 4), we demonstrated that adjudication improves sensitivity, indicating that providers are accurately identifying complex patients and labeling them as higher risk. Overall, these findings demonstrate that adjudication has significant value.

While adjudication improved the predictive abilities in our overall population, we further demonstrated that it did not harm specific populations. We analyzed the risk models for three demographic groups characterized as higher risk by providers: males, seniors (age 65+), and Black patients. Within each of these groups, adjudication improved algorithm performance, though not always to a statistically significant degree. This finding suggests that providers are not displaying harmful bias in their adjudication.

Although provider adjudication improves the predictive ability of risk stratification algorithms, it remains a highly manual process that requires clinician time and health-system funding, which are not always feasible or available. Provider adjudication behavior offers clues regarding which elements are lacking from risk algorithms and holds promise for automation of this process. Integration of clinician-identified factors into risk algorithms is a promising pathway for improving future iterations of these algorithms and, if effective, would reduce the burden of manual processes, save clinician time, and potentially improve patient outcomes.

Limitations

Our provider sample and patient cohort are limited to a single academic medical center with a large regional catchment. Of our sample, 83% were white patients, suggesting that certain demographic subgroups were underrepresented. In addition, our cohort had a lower percentage of patients insured by Medicaid compared to the general population.25 Furthermore, the patient demographics and disease severity factors were chosen for convenience from the available EHR data. The assessment of patient death was limited to what was available in the EHR and did not include claims data.

CONCLUSION

Provider adjudication of automated risk stratification algorithms contributes value to the process because human providers offer patient-specific knowledge and experience. In this two-phase study, we first showed that providers consider patient characteristics that are unused by a commercial algorithm or that are not available in the EHR. Second, we demonstrated that adjudication makes a meaningful contribution to the risk stratification process over the automated commercial system alone by improving the performance of the risk models.