INTRODUCTION

Low-value care, or patient care that produces net harm or offers no net benefit in specific clinical scenarios, can lead to unnecessary spending and patient harm.1–7 An estimated $12.8 to $26.5 billion annually could be saved in the USA through interventions that reduce low-value services.8 A major challenge to reducing low-value care has been the identification of high-priority and evidence-based clinical targets.1 Because prior studies often use loosely characterized definitions of low-value care that rely on the absence of evidence, frontline clinicians have challenged the clinical validity of those definitions.9

The US Preventive Services Task Force (USPSTF) recommendations, on the other hand, are precisely defined, and each grade accounts for the uncertainty and quality of the available evidence. Because USPSTF preventive services affect millions of Americans, one potential group of services for intervention is those deemed Grade D, which the USPSTF recommends against providing (see Table 1 for grading definitions).11 The Grade D designation for a particular service requires sound evidence that the service either causes net harm or offers no net benefit to asymptomatic patients. Grade D services, therefore, constitute one of the most rigorously developed lists of low-value services to target for reduction. Data describing the utilization or costs of Grade D services within Medicare are lacking. While studies have examined low-value care in Medicare, these studies used data no more recent than 2011 or focused on one to three services.4, 5, 7 As interest grows among policymakers in deterring the use of low-value care, a more recent and broader understanding of the extent of Grade D services is needed.

Table 1 Definitions of the USPSTF Grading System10

Hence, the objective of this study was to quantify the utilization and costs of selected Grade D services among Medicare beneficiaries. We used a nationally representative survey of outpatient visits across a 10-year period and constructed measures of Grade D services using existing literature and the USPSTF recommendations.

STUDY DATA AND METHODS

Data Source and Collection

We used data from 2007 to 2016 from the National Ambulatory Medical Care Survey (NAMCS), a nationally representative survey of ambulatory visits to non-federal office-based practices in the USA. The National Center for Health Statistics (NCHS) administers NAMCS annually and employs a multistage probability design to sample visits to office-based clinicians. Physician offices and representatives of the US Census Bureau abstract data from the medical record with a standardized survey instrument. Information collected includes reasons for visit (chief complaint and two secondary complaints), diagnosis codes (International Classification of Diseases, Ninth and Tenth Revisions), demographic information, expected payers, selected laboratory tests, imaging, and medications (both prescription and over-the-counter) either ordered or continued at the visit. Indicators for selected chronic diseases are included in addition to diagnosis codes. To derive national estimates, the NCHS calculates survey weights for visits based on the inverse probability of selection at each sampling stage. Annual response rates during the study period ranged from 46 to 64%.
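As a rough illustration of how inverse-probability weights produce national estimates, the sketch below multiplies hypothetical stage-level selection probabilities and sums the resulting weights. The three-stage layout, the probabilities, and the field names are assumptions made for illustration only; in practice NCHS supplies the final visit weight, which includes adjustments not shown here.

```python
from math import prod


def visit_weight(stage_probs):
    """Base weight = 1 / product of the per-stage selection probabilities."""
    if not stage_probs or any(p <= 0 or p > 1 for p in stage_probs):
        raise ValueError("each selection probability must be in (0, 1]")
    return 1.0 / prod(stage_probs)


def weighted_total(visits, flag):
    """Sum the weights of sampled visits for which `flag` is true."""
    return sum(v["weight"] for v in visits if v[flag])


# Hypothetical sampled visits (not NAMCS data).
sampled = [
    {"stage_probs": [0.1, 0.05, 0.2], "medicare": True},
    {"stage_probs": [0.1, 0.05, 0.5], "medicare": False},
    {"stage_probs": [0.2, 0.10, 0.25], "medicare": True},
]
for v in sampled:
    v["weight"] = visit_weight(v["stage_probs"])

national_medicare_visits = weighted_total(sampled, "medicare")
```

Each sampled visit stands in for as many visits in the population as its weight indicates, so rarely sampled visits count for more in the national total.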

Eligible visits were those made by beneficiaries aged 18 years and older in which Medicare was listed as a payer (including fee-for-service Medicare, Medicare Advantage, Medicare with supplemental plans, and dual-eligible Medicare-Medicaid coverage). NAMCS has been frequently used as a nationally representative data source for studying low-value care.12–16

Main Outcome Measures

We selected seven USPSTF Grade D services that could be feasibly identified in NAMCS. While we examined all Grade D services at the time of the analysis (n=20), we excluded measures that could not be replicated with the available diagnosis codes, reason for visit codes (corresponding to chief complaints and other secondary symptoms), and indicators for certain services (such as colonoscopy or sigmoidoscopy) (n=13). Additionally, some services were coded too infrequently in the data to be deemed reliable, such as carotid ultrasonography for carotid artery stenosis screening (n=4). The seven services that could be reliably measured were as follows: (1) asymptomatic bacteriuria screening in nonpregnant adults, (2) cardiovascular disease screening in low-risk adults with either rest or stress electrocardiography, (3) cervical cancer screening in women over 65 years old with Papanicolaou or HPV testing, (4) colorectal cancer screening in adults over 85 years old with either colonoscopy or sigmoidoscopy, (5) COPD screening in asymptomatic adults with peak flow or spirometry, (6) prostate cancer screening with prostate-specific antigen testing in men 75 years old and older, and (7) vitamin D supplementation for fracture prevention among postmenopausal women.

In calculating a specific measure, we pooled NAMCS data across the years in which the indicator variable for a test or service was available and a USPSTF recommendation existed. We used 2007 through 2016 for the cervical cancer screening and cardiovascular disease screening measures because indicators for these services were available in those years. In the case of colorectal cancer screening, a variable for colonoscopy and/or sigmoidoscopy was available from 2009 to 2016. We supplemented this definition by including ICD-9 and ICD-10 codes used during screening encounters. For consistency, we conservatively used the prostate cancer screening recommendation from 2008, as the recommendation has changed twice since then (a Grade D recommendation against screening at any age in 2012, later revised in 2018 to apply to men 70 years and older). Because the USPSTF issued the recommendation in the second half of 2008, we used data from 2009 to 2016 for this measure. For the asymptomatic bacteriuria measure, we used 2009 to 2016 data because these were the years in which the urinalysis or urine culture indicators were available. For COPD screening, we used 2012 to 2016 because 2012 was the first year the spirometry indicator was available in the data. Finally, 2013 to 2016 data were used for the vitamin D supplementation measure because 2013 was the first year the USPSTF issued a recommendation against its use for osteoporosis prevention among postmenopausal women. All measures were studied only when a Grade D recommendation was active.
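The year windows described above can be summarized in a small lookup table. The sketch below is illustrative only: the measure keys and helper names are hypothetical, and dividing a pooled weighted count by the number of pooled years is one plausible way to annualize, not necessarily the exact procedure used in the analysis.

```python
# First and last survey year pooled for each measure, as described in the text.
MEASURE_YEARS = {
    "asymptomatic_bacteriuria": (2009, 2016),
    "cardiovascular_ecg": (2007, 2016),
    "cervical_cancer": (2007, 2016),
    "colorectal_cancer": (2009, 2016),
    "copd": (2012, 2016),
    "prostate_cancer_psa": (2009, 2016),
    "vitamin_d": (2013, 2016),
}


def years_pooled(measure):
    """Number of survey years contributing to a measure."""
    first, last = MEASURE_YEARS[measure]
    return last - first + 1


def in_window(measure, survey_year):
    """True if a visit from `survey_year` can contribute to `measure`."""
    first, last = MEASURE_YEARS[measure]
    return first <= survey_year <= last


def annualized(pooled_weighted_count, measure):
    """Assumed annualization: pooled weighted count / number of pooled years."""
    return pooled_weighted_count / years_pooled(measure)
```

For example, `in_window("vitamin_d", 2012)` is false because the recommendation against vitamin D supplementation was first issued in 2013.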

To account for the clinical nuance required to identify these services, we excluded visits with competing diagnoses or other clinical information. For the asymptomatic bacteriuria screening measure, we excluded encounters that reported symptoms localizing to the urinary tract. These included reason for visit codes or ICD-9 or ICD-10 diagnosis codes corresponding to hematuria, nocturia, painful urination, and burning. We further excluded patients who were pregnant using the NAMCS indicator for pregnancy status along with ICD-9 or ICD-10 codes for pregnancy or prenatal care. To construct the population eligible for cardiovascular screening, we excluded encounters with diagnostic codes corresponding to any cardiovascular condition, such as dyslipidemia, hypertension, coronary artery disease, and ischemic stroke. We also excluded visits with reason for visit or diagnostic codes denoting clinical features prompting a diagnostic work up, such as syncope, palpitations, edema, murmurs, or history of diabetes. Among those eligible for the COPD screening measure, we excluded those with respiratory symptoms, such as cough or wheezing, and those with a history of any pulmonary disorder, including asthma, obstructive lung disease, and interstitial lung disease. For the vitamin D supplementation measure, we excluded patients with a diagnosis of osteoporosis, vitamin D deficiency, or conditions associated with increased risk for malabsorption such as inflammatory bowel disease, celiac disease, or post-bariatric surgery.

Because the USPSTF recommendations apply to average-risk patients, we further excluded patients at high risk for cancer under the cancer screening measures. For cervical cancer screening, we excluded encounters among women with a history of abnormal Papanicolaou tests, positive HPV tests, cervical dysplasia, any gynecological malignancy or carcinoma in situ, and human immunodeficiency virus, or presenting with alarm symptoms such as vaginal bleeding. For the colorectal cancer screening measure, we excluded visits where a colonoscopy or sigmoidoscopy would be recommended for diagnostic purposes, such as gastrointestinal bleeding or a diagnosis of iron deficiency anemia. We further excluded individuals that were not at average risk for colon cancer, including patients with any inflammatory bowel disease diagnosis, a personal or family history of colonic polyps, Lynch syndrome, and familial adenomatous polyposis. Similarly, we excluded patients from the prostate cancer screening measure if they had a personal or family history of prostate cancer. To minimize misclassification of an appropriately ordered service as low value, we maximized the number of exclusions to construct the most conservative measures possible. The complete list of exclusion criteria and associated codes is included in the appendix.
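The general pattern behind each measure (eligible population, service ordered, no exclusion present) can be sketched for the prostate cancer screening measure as follows. The `Visit` structure and the plain-text condition labels are hypothetical stand-ins for the diagnosis and reason-for-visit codes listed in the appendix; this is a simplified sketch, not the study's actual specification.

```python
from dataclasses import dataclass, field


@dataclass
class Visit:
    age: int
    sex: str
    services: set = field(default_factory=set)    # services ordered at the visit
    conditions: set = field(default_factory=set)  # labels standing in for codes


# Exclusions named in the text for the prostate cancer screening measure.
PSA_EXCLUSIONS = {
    "personal_history_prostate_cancer",
    "family_history_prostate_cancer",
}


def is_grade_d_psa(visit):
    """Flag a visit only if eligible, PSA ordered, and no exclusion applies."""
    eligible = visit.sex == "M" and visit.age >= 75
    ordered = "psa" in visit.services
    excluded = bool(visit.conditions & PSA_EXCLUSIONS)
    return eligible and ordered and not excluded
```

Adding more labels to the exclusion set can only shrink the number of flagged visits, which is the sense in which a longer exclusion list yields a more conservative measure.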

To calculate annual expenditures, we multiplied the weighted number of visits in which a service was ordered by the per-unit Medicare price for that individual service. In February 2020, we searched for the best publicly available sources of Medicare prices; the sources we used had publication dates ranging from 2015 to 2020 (see appendix for the list of price sources). When ranges were available, we used the lower bound to derive more conservative estimates. These prices represent the average national price paid to physicians and do not specifically reflect out-of-pocket spending.
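The expenditure calculation is a simple product, sketched below under the assumption that a price is stored either as a single number or as a (low, high) range. The function names and the example figures are hypothetical and are not drawn from the price sources in the appendix.

```python
def unit_price(price):
    """Return a single price as-is, or the lower bound of a (low, high) range."""
    if isinstance(price, tuple):
        low, high = price
        return min(low, high)
    return price


def annual_expenditure(weighted_visits, price):
    """Weighted annual visits with the service ordered x per-unit price."""
    return weighted_visits * unit_price(price)


# Hypothetical example: 1,000,000 weighted visits, price range of $20 to $35.
example = annual_expenditure(1_000_000, (20.0, 35.0))
```

Using the lower bound of a price range means the resulting estimate can be read as a floor on spending for that service.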

Statistical Analysis

We report age, gender, race/ethnicity, and payer information by receipt of any Grade D service. We collapsed race/ethnicity into mutually exclusive categories of non-Hispanic white, non-Hispanic black, Hispanic/Latino, and other. Visits where Medicare and private insurance were listed as payers were categorized as Medicare plus supplemental private insurance, and visits with Medicare and Medicaid listed were categorized as dual-eligible. We report annualized weighted counts of utilization for each measure with 95% confidence intervals, using standard methods to account for weighting and the complex survey design. In accordance with NCHS requirements, we calculated utilization only if the unweighted number of sampled visits for a service was 30 or higher and the relative standard error was 0.30 or less.17, 18 We performed all analyses using SAS (version 9.4). The UCLA IRB deemed this study exempt from human subjects review.
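The reliability rule can be expressed as a short check. In the sketch below, the standard error is assumed to come from a design-based variance estimate computed elsewhere, and the confidence interval uses a normal approximation; both are assumptions for illustration, as are the function names.

```python
def relative_standard_error(estimate, standard_error):
    """RSE = standard error / estimate."""
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    return standard_error / estimate


def is_reportable(unweighted_n, estimate, standard_error, min_n=30, max_rse=0.30):
    """Apply the two thresholds: at least 30 sampled visits and RSE <= 0.30."""
    rse = relative_standard_error(estimate, standard_error)
    return unweighted_n >= min_n and rse <= max_rse


def ci95(estimate, standard_error, z=1.96):
    """Normal-approximation 95% confidence interval (an assumption here)."""
    half_width = z * standard_error
    return (estimate - half_width, estimate + half_width)
```

An estimate based on 45 sampled visits with an RSE of 0.25 would pass both thresholds, whereas the same estimate based on 29 sampled visits would not be reported.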

STUDY RESULTS

From 2007 to 2016, we identified 95,121 unweighted Medicare patient visits within NAMCS, representing approximately 2.4 billion visits. The average age was 72.2 years and 57.4% of patients were female. Table 2 illustrates the characteristics of patients seen at visits based on whether a Grade D service was utilized. Across visits where Grade D services were used, approximately 8.5% of patients were non-Hispanic black and 10.6% were Hispanic, compared to 8.9% non-Hispanic black and 8.2% Hispanic among visits where no Grade D service was used. With respect to payer, patients with Medicare-only coverage (which include both traditional fee-for-service and Medicare Advantage enrollees) comprised a slightly higher proportion of Grade D visits at 56.4% versus 54.8% among visits without a Grade D service utilized.

Table 2 Weighted Characteristics of Medicare Beneficiaries at Visits, 2007–2016

The utilization of the selected Grade D services exceeded 30 million episodes annually, averaging approximately 13 services per 100 Medicare ambulatory visits. Table 3 shows the count for each Grade D service. The annual count ranged from 137,441 (95% CI: 62,736–212,147) for colon cancer screening in adults 85 years and older to 14,144,166 (95% CI: 12,711,424–15,576,907) for asymptomatic bacteriuria screening. The top two Grade D services, asymptomatic bacteriuria screening and vitamin D supplementation for fracture prevention among postmenopausal women, were used in high volume, comprising 83.9% of the annual count for the seven Grade D services.

Table 3 USPSTF Grade D Preventive Services by Utilization Volume and Costs

The total annual costs of these Grade D services averaged $477.9 million (95% CI: $377.2 million–$578.6 million, see Table 3). Across all Medicare visits in which these services were utilized, they contributed an additional $25 per visit on average. Some services comprised a disproportionate share of costs relative to their volume. For example, colon cancer screening comprised 0.4% of the annual count of these Grade D services, but 14.5% of the costs. The three services that contributed the most to annual costs included (1) screening for asymptomatic bacteriuria, (2) vitamin D supplements for fracture prevention, and (3) colorectal cancer screening among adults >85 years, which comprised 67.5% of spending and 84.4% of the utilization for these services (Table 3).

DISCUSSION

In this nationally representative analysis of outpatient visits made by Medicare beneficiaries over a 10-year period, a group of seven rigorously defined low-value preventive services were utilized over 30 million times each year, totaling over $477 million in estimated annual health care spending. We found that the two Grade D services that were highest in volume were also the two services that contributed most to total annual spending for the seven Grade D services. Additionally, we found that colon cancer screening for those over 85 years was used the least but ranked among the top three most costly services. While much attention to low-value care in Medicare has previously focused on a large number of measures that included a few Grade D recommendations, our study identified additional measures that comprise a relatively large proportion of spending, which reflect important targeted opportunities to safely reduce spending while improving the quality of care.

Our findings differ slightly from prior work that examined low-value preventive services in Medicare. For example, Grade D prostate cancer screening was ordered during approximately 1,786,701 visits in this study, which is higher than a previous estimate of about 762,000 instances in 2009.4 In studies that examined screening from 2013 to 2016, estimates ranged between 9.8 and 18.6% of eligible men.19, 20 While these proportions of Grade D prostate cancer screening are higher than the 4.2% of eligible visits in our study, it is important to note that our units of analysis were visits among Medicare beneficiaries seeking care rather than all Medicare beneficiaries. Therefore, the differences in screening may be due to differences in levels of analysis (visit-level versus patient-level) and the number of years used to derive estimates. With regard to Grade D colon cancer screening, our estimate of 137,441 instances is lower than the prior estimate of over 244,000; however, our specification more conservatively excluded additional clinical conditions.4 Given the limitations of claims data in capturing symptoms, data derived from medical records, such as NAMCS, offer greater detail and the ability to examine symptoms in developing exclusions.21–29 The potentially enhanced specificity of our findings is a particular strength, as we sought to avoid misclassifying encounters as low value given the potential clinical and policy implications of how low-value care is defined.

While the costs of services examined here are considerable, the full extent of utilization and costs of low-value preventive services is likely larger. Not all USPSTF Grade D services were included in this analysis, and these findings are limited to the direct costs of each service, not including those associated with subsequent harms and other unnecessary downstream testing and/or referrals. For instance, approximately 0.5% of patients undergoing prostate cancer screening experience complications of incontinence or impotence from prostate surgery each year according to a USPSTF systematic review.11 Extrapolated to our findings, these low-value prostate cancer screening tests would have caused an estimated 89,335 additional older Americans to develop incontinence or impotence during the study period. Even when no immediate complications result, unexpected findings from low-value screening often trigger further unnecessary tests in “cascades of care,” adding to costs and potential harms.30
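One way to reproduce the extrapolated figure is shown below. The sketch assumes that the annual count of 1,786,701 Grade D prostate cancer screening visits is multiplied by the 0.5% complication rate and by a 10-year study period, and that each visit corresponds to one patient; these assumptions are an inference from the reported numbers, not a statement of the authors' exact method.

```python
ANNUAL_GRADE_D_PSA_VISITS = 1_786_701  # annual visit count reported in the text
COMPLICATION_RATE = 0.005              # approximately 0.5% per year
STUDY_YEARS = 10                       # assumption: full 2007 to 2016 period

# Assumed calculation: visits per year x complication rate x number of years.
complications_per_year = ANNUAL_GRADE_D_PSA_VISITS * COMPLICATION_RATE
complications_over_period = complications_per_year * STUDY_YEARS

print(round(complications_over_period))  # prints 89335
```

Because the measure counts visits rather than patients, a man screened at more than one visit would be counted more than once, so the figure is best read as an approximation.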

Our findings also clarify and highlight an under-recognized source of low-value care for which strong evidence of no benefit exists. The USPSTF Grade D definitions of low-value care cite evidence of absence of benefit, which is more robust than looser definitions simply citing an absence of evidence.31 Lists of low-value services in general are subject to criticism given the infrequency of some of the services listed, the unclear potential impact for improving quality, and the weak evidence used in developing some of them.1, 9 These lists are often methodologically closer to Grade I services, for which evidence is uncertain or evidence to identify benefit or harm is lacking. Although this distinction is an important nuance, many would consider Grade I services to be low-value care only under broader and potentially less widely accepted definitions. Moreover, the USPSTF does not consider the cost of services in making its determinations, focusing solely on clinical benefits versus harms.11 Hence, efforts to reduce Grade D services can also avoid ethically complex debates about rationing and cost-effectiveness.

Second, as policymakers consider options to improve value for Medicare beneficiaries, reducing Grade D services can be incorporated into payment reform. Canada has set an important precedent by successfully eliminating payment for population-based vitamin D laboratory screening, leading to marked reductions in low-value vitamin D testing.32 In the USA, the Affordable Care Act grants CMS authority to decline payment for Grade D services; however, CMS has not yet exercised this provision.33 CMS could implement such a policy through a randomized pilot demonstration to reduce potentially harmful asymptomatic bacteriuria screening, with careful attention to stakeholder engagement, valid measurement, and unintended consequences (e.g., upcoding, financial toxicity to patients, or widening inequities in evidence-based care).2, 34, 35 Implementation would likely rely on administrative data using ICD-10 diagnosis codes, which have demonstrated reasonably strong sensitivity and specificity for low-value care measures when compared with manual chart review by professional coders.36

If such a pilot program is proven safe and effective, rolling it out nationally has the potential to simultaneously protect older Americans from harm and produce cost-savings, which can be directly tied to further cost-sharing reductions for evidence-based, high-value care (e.g., eliminating cost-sharing for life-saving blood pressure medications37). Tying the reduction of low-value care directly to the lowering of financial barriers to high-value care serves as a compelling ethical justification for maximizing clinical benefits for patients, while preserving financial sustainability for the Medicare program.

Finally, the COVID-19 pandemic has introduced additional constraints on Medicare spending. Many vulnerable Medicare beneficiaries face potentially catastrophic expenses during the current crisis, and reducing low-value care is an important step to prevent exacerbating the impact of financial toxicity.38 The continued ordering of these low-value services despite a Grade D recommendation, however, underscores the challenge of de-implementation in the post-pandemic period. Low-value care remains an intractable problem for a wide array of reasons, including clinician factors (e.g., lawsuit fears, time pressure, uncertainty), patient factors (assumptions that more care equals better care), and health system factors (institutional culture, fee-for-service payments).5, 14, 39–50 As the RAND Health Insurance Experiment demonstrated, the difficulty of reducing low-value care while avoiding an undesirable simultaneous reduction in high-value care poses a major challenge.51, 52 This is why rigorously defined measures of low-value care such as Grade D services can help refine currently blunt policy tools, such as cost-sharing, that undesirably lower both high- and low-value care simultaneously.

Ultimately, while there remains an evidence gap in understanding the effectiveness of most interventions, a 2017 systematic review of interventions to reduce low-value care found that multipronged interventions are more effective than single interventions.53 Strategies that combine novel payment reforms, such as those described above or accountable care organizations, with supply-side interventions, including physician education, engagement, and seamless alerts embedded in the electronic health record, might be the most effective approach to reducing exposure to the harms of low-value care for Medicare beneficiaries.2, 50, 54–56 Reducing expenditures on low-value services provides a rare cost-neutral opportunity to redesign Medicare policies aimed at increasing the use of high-value services.

Limitations

There are several limitations to note. First, our method of estimating Medicare spending on Grade D services may be less precise than claims data, which capture reimbursement of services rendered. However, using claims to estimate the utilization of Grade D services may potentially misclassify many services, as such data lack key clinical information that informs whether or not a service is indicated.21–23 For example, use of claims data alone had 56% sensitivity for identifying UTIs compared to a combination of claims and clinical information.29 In contrast, data for symptoms are considered more accurate in medical records than in administrative claims data. Additionally, previous assessments found that NAMCS displays reasonably strong validity with respect to diagnoses and procedures.25, 26

Second, NAMCS may underestimate national utilization rates by not capturing non-visit-based orders, and may overestimate utilization by measuring some services that were ordered but not necessarily rendered and reimbursed. Additionally, because the NAMCS encounter form does not distinguish between over-the-counter and prescription medications, some of the vitamin D medications may be over-the-counter. Nevertheless, patient-driven over-the-counter medications reflect a common and under-recognized source of low-value care (e.g., NSAIDs for patients with heart disease), and physicians can still discourage use of these medications during ambulatory visits. Third, while the survey accounts for non-response bias, NAMCS response rates have declined over time. We followed the NCHS recommendations for statistical analysis and strengthened our sample by pooling several years of data for each measure. Fourth, NAMCS only reflects office-based ambulatory care (approximately 90% of U.S. ambulatory care) and does not include hospital-based ambulatory care (approximately 10% of U.S. ambulatory care).14, 25 Finally, patients are not tracked longitudinally in NAMCS. This may lead to overestimation of Grade D cervical cancer screening since the measure does not apply to women with inadequate screening in the decade prior to age 65, a group we could not identify and exclude.

CONCLUSION

Medicare beneficiaries frequently received several rigorously defined low-value preventive services, costing over $477 million in estimated US health care spending each year. The negative clinical impact and total costs of these low-value services are likely larger as these findings capture neither all D-rated services nor the cascade of downstream health care utilization after their use. Reducing the use of Grade D services represents an opportunity to improve patient-centered outcomes while safely reducing US health care spending.