Introduction

Cancer, a complex and often chronic condition, impacts millions. The American Cancer Society estimated that 1.9 million new cancer cases would be diagnosed in the United States in 2023 [1]. Nationally, Bluethmann et al. predicted that 26.1 million survivors will be living in 2040 and that 47% will live more than 10 years after their diagnosis [2]. The Colorado Department of Public Health and Environment estimates that 360,000 people with histories of cancer were living in Colorado as of April 2023 [3].

Despite detection and treatment advances, people with histories of cancer experience unique health challenges including treatment-related late effects. People with histories of cancer are also more likely to develop additional cancers. Of the 765,843 incident cancers diagnosed between 2009 and 2013 within the Surveillance, Epidemiology, and End Results (SEER) Program, approximately one-fourth of those in adults aged 65 years and older and more than one-tenth of those in adults aged 20 to 64 years were a second or higher-order cancer. Most of these new cancers, termed second primary cancers, were diagnosed in different anatomic locations [4].

Gaps in information exchange, limited care coordination activities, and lack of clarity regarding provider roles have all been cited as reasons for fragmentation in cancer survivorship care. Such fragmentation persists as people with histories of cancer transition from active treatment to surveillance [5] [6]. A previous study conducted in Colorado designed to evaluate a cancer survivorship educational intervention for rural primary care practices revealed challenges associated with identifying people with past histories of cancer within their practices. Risendal and her team identified the need for additional research to better understand the patterns of care that Coloradans with past histories of cancer receive across multiple settings [7].

To bridge this gap, many organizations recommend that treatment summaries and survivorship care plans (TSCPs) or survivorship care plans (SCPs) be provided to people who complete curative treatment for malignancies. The presence of SCPs may help identify individuals who have completed primary oncology treatment and may play an important role in communicating the need for preventive care following cancer treatment. However, additional research is needed to understand their impact on health care outcomes and health care utilization.

The primary goals of this study were (a) to build a comprehensive database representing adult Coloradans who completed cancer treatment within the Metro Denver University of Colorado Health system (UCHealth) and (b) to conduct a secondary analysis of this database to describe demographic characteristics and health care utilization patterns for these individuals. It was hypothesized that a considerable number of Coloradans who have been treated with curative intent for cancer did not receive recommended cancer follow-up or screenings. It was also hypothesized that individuals who lived in rural areas had lower rates of receipt of preventive care tests than those who lived in urban areas. The information gained from this comprehensive survivorship database will enable a systematic approach to population management for cancer survivorship care, provide a foundation for collaborative outreach to help reduce disparities in care, and potentially improve health outcomes throughout Colorado.

Methods

Data source

The Health Data Compass (HDC) Enterprise Health Data Warehouse is hosted by Google Cloud and integrates patient clinical data and relevant billing information from the EPIC electronic health record used throughout UCHealth (Epic Systems Corporation, Verona, Wisconsin). Additionally, HDC may link data from other sources, including the Center for Improving Value in Health Care’s (CIVHC) Colorado All Payer Claims Database (APCD) and the University of Colorado Cancer Registry. Data are available from 2011 to present and are updated monthly. Available variables include patient demographics, medical encounters and visits, diagnoses (including cancer diagnosis), health history (including personal, family, and social), medications, procedures, labs, billing codes, payers, and provider queries and notes. HDC may be used to generate de-identified data sets, limited data sets, and fully protected health information (PHI) data sets by request.

The data in this survivorship database originated from a subset of the HDC data sources. These sources include (a) the UCHealth Epic Caboodle enterprise data warehouse; (b) the University of Colorado Medicine Provider Billing Database; (c) the Colorado Department of Public Health and Environment’s Vaccination and Death Registry Data; and (d) the Center for Improving Value in Health Care Colorado All Payer Claims Database. HDC receives such information and enables secure data sharing and delivery to national networks (Fig. 1).

Fig. 1 Health Data Compass. Credit to Melissa Haendel, PhD, FACMI and Julie McMurry, MPH

Inclusion and exclusion in HDC-SD

This research utilizes the Health Data Compass-Survivorship Database (HDC-SD), a subset of HDC. The HDC-SD includes data from individuals with histories of cancer who were diagnosed inside or outside of the UCHealth system. These individuals received some or all of their curative-intent cancer-related care through the University of Colorado Cancer Center (UCCC). Patients included in the HDC-SD were 18–85 years of age at the time of diagnosis, had a UCHealth medical record number, and were diagnosed with a non-hematologic malignancy (that is, hematologic cancers such as leukemia or multiple myeloma were excluded) from January 2011 to December 2021. Finally, patients were required to have received a completed TSCP, delivered between January 1, 2020, and December 31, 2021. Data represented in the HDC-SD were pulled on October 4, 2022.

UCHealth TSCPs are available within the EPIC Electronic Health Record Problem List. The UCHealth custom-built TSCP template specifies the patient’s treatment team, diagnostic and staging information, treatment details, follow-up recommendations as guided by the National Comprehensive Cancer Network (NCCN) and the patient’s oncology team, and overall wellness information. Wellness information includes reasons for patients to contact their oncology and primary care teams, possible late effects, mental health recommendations, health screening and immunization recommendations, advance care planning, and available health system resources including sexual health and fertility services.

Within the Metro Denver region, TSCPs are now generated based on reviews of monthly surgical reports, anti-cancer therapy discontinuation reports, and radiation end-of-treatment summary reports designed by an in-house EPIC analyst. At the time of the data pull, positive pathology reports provided by the Tumor Registry were used instead. The generation and delivery of TSCPs has evolved over time, guided by the Institute for Healthcare Improvement Plan-Do-Study-Act methodology [8].

Variables of interest

The objective was to use the HDC-SD to identify rates of American Cancer Society (ACS) recommended screening procedures, immunizations, and health care utilization among the sample. Screening procedures were identified using a combination of procedure labels and Current Procedural Terminology (CPT) codes; codes related to diagnostic procedures were excluded from the query. Receipt of immunizations was identified using immunization and procedure labels. Health care utilization was defined as primary care visits, oncology visits, and health system-specific emergency department visits in the 9-month follow-up period after the TSCP was completed. These visits were identified based on encounter data, department type, and clinician specialty. Demographic information of interest included sex, age, and race/ethnicity, as well as urban and rural residence at the time of extraction, as defined by the U.S. Department of Agriculture Economic Research Service (ERS) 2013 Urban Influence Codes [9]. Additional information of interest included the primary cancer site, payer at the time of diagnosis, and payer at the time the data were extracted (also referred to as current payer).
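The 9-month follow-up window can be illustrated with a short sketch. The function names, the visit-type labels, and the choice to count visits strictly after the TSCP date and up to and including the date 9 calendar months later are assumptions made for illustration; the actual HDC-SD query logic is not described at this level of detail.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole calendar months, clamping the day
    to the last valid day of the target month (e.g., May 31 -> Feb 28)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def utilization_flags(tscp_date: date, encounters, months: int = 9) -> dict:
    """Flag whether each visit type occurred within the follow-up window.

    encounters: iterable of (visit_date, visit_type) pairs, where visit_type
    is one of the hypothetical labels used as keys below.
    """
    window_end = add_months(tscp_date, months)
    flags = {"primary_care": False, "oncology": False, "emergency": False}
    for visit_date, visit_type in encounters:
        # Assumed window: after TSCP completion, up to and including window_end.
        if visit_type in flags and tscp_date < visit_date <= window_end:
            flags[visit_type] = True
    return flags


# Hypothetical patient: one visit inside the window, two outside it.
example = utilization_flags(
    date(2020, 3, 15),
    [
        (date(2020, 6, 1), "primary_care"),   # inside the window
        (date(2021, 1, 10), "oncology"),      # after the window closes
        (date(2020, 2, 1), "emergency"),      # before TSCP completion
    ],
)
```

Clamping the day keeps the window well defined for TSCPs completed at the end of a month; a fixed day count (for example, 270 days) would be an equally defensible convention.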

A total of 20 records were randomly selected to validate the HDC-SD-derived data with the EPIC-derived data. Validated data points of interest included date of birth, type of cancer, date of diagnosis, age at diagnosis, smoking history, date of TSCP completion, initial and subsequent oncology visits thereafter, initial and subsequent primary care visits thereafter, dates of recommended screening procedures, and dates of immunizations.

Statistical analysis

Chi-square tests were used to compare patient demographics, disease characteristics, socioeconomic characteristics, and health maintenance between urban and rural settings. All tests were two-sided and performed in SAS Version 9.4 (SAS Institute Inc., Cary, North Carolina) using a statistical significance level of p < 0.05. The study protocol was determined to be exempt by the Colorado Multiple Institutional Review Board.
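As an illustration of the urban versus rural comparison, the sketch below computes a Pearson chi-square test for a 2 × 2 table in plain Python. It assumes no continuity correction and uses made-up counts; it is not the SAS code used in the study and does not reproduce the study data.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) for the table

                 outcome yes   outcome no
        urban         a             b
        rural         c             d

    Returns (chi-square statistic, two-sided p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 520 of 1000 urban and 40 of 100 rural patients screened.
chi2, p_value = chi_square_2x2(520, 480, 40, 60)
significant = p_value < 0.05  # significance level stated in the text
```

When expected cell counts are small, an exact test is often preferred over the chi-square approximation.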

Results

Data set validation

During the chart review process, several discrepancies were identified between the HDC-SD and individualized records within the EPIC EHR. Of the 20 patients reviewed, 10 patients (50%) within the HDC-SD had conflicting dates of diagnosis versus dates determined through the EPIC EHR. In some cases, these differences stretched from months to years. Upon further review, it was discovered that the HDC-SD diagnosis date was defined as the first encounter date associated with an oncology-specific ICD-10 code, even if the code corresponded to an unspecified tumor. To resolve this issue, the date of diagnosis was subsequently defined as the first date identifying a diagnosis code associated with a specific tumor. This adjustment resulted in the HDC-SD diagnosis dates more closely reflecting the dates identified within the EPIC EHR.
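The change in the diagnosis date rule can be sketched as follows. The sketch assumes the encounter list has already been limited to oncology-related ICD-10 codes, and it treats D49 (neoplasms of unspecified behavior) and C80.1 (malignant neoplasm, unspecified) as examples of "unspecified tumor" codes; the exact set of codes treated as unspecified in the HDC-SD is not stated, so this list is hypothetical.

```python
from datetime import date

# Assumed examples of unspecified-tumor codes; not the study's actual list.
UNSPECIFIED_PREFIXES = ("D49", "C80.1")


def first_oncology_encounter(encounters):
    """Original rule: earliest encounter carrying any oncology-related code."""
    dates = [visit_date for visit_date, _ in encounters]
    return min(dates) if dates else None


def diagnosis_date(encounters, unspecified_prefixes=UNSPECIFIED_PREFIXES):
    """Revised rule: earliest encounter whose code names a specific tumor."""
    specific_dates = [
        visit_date
        for visit_date, code in encounters
        if not code.strip().upper().startswith(unspecified_prefixes)
    ]
    return min(specific_dates) if specific_dates else None


# Hypothetical patient: an unspecified code precedes the specific diagnosis.
encounters = [
    (date(2019, 11, 2), "D49.3"),    # unspecified behavior, breast
    (date(2020, 3, 9), "C50.911"),   # malignant neoplasm of right female breast
    (date(2020, 4, 1), "C50.911"),
]
original = first_oncology_encounter(encounters)
revised = diagnosis_date(encounters)
```

In this example the two rules differ by several months, which mirrors the kind of discrepancy found during chart review.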

Additionally, many completed screening procedures as identified through the EPIC EHR Media tab (where information and documents from outside sources can be scanned in for information purposes and become a part of the medical record) were not identified by the HDC-SD. As a specific example, nine (45%) patients had a record of completed colorectal cancer screening in their EHR, but only two (10%) patients had a confirmed colorectal cancer screening procedure captured by HDC-SD. Documents uploaded into the EPIC EHR Media tab do not translate into a discrete field detectable by HDC; thus, services and procedures completed outside the UCHealth system may not be fully discovered by Health Data Compass (Table 1).

Table 1 Results of chart review

Descriptive analysis

The HDC-SD contains 1933 patients who completed curative-intent primary treatment between January 1, 2020, and December 31, 2021, for diagnoses of cancer. The top three cancers included breast (24%), male reproductive (23%), and cutaneous (13%; mostly early-stage melanoma). Of those included within the HDC-SD, 50.5% were women and 79% were white non-Hispanic. The majority of patients were aged 55 years and older (68%) (Table 2).

Table 2 Comparison of characteristics by urban and rural (defined by Urban Influence Codes)

All 21 Colorado Health Statistic Regions were represented within this database (Fig. 2a and b). According to the United States 2020 Census, approximately 84% of Colorado’s population resides in Health Statistic Regions 2 (Larimer County), 3 (Douglas County), 4 (El Paso County), 12 (Garfield, Pitkin, Eagle, Summit, and Grand Counties), 14 (Adams County), 15 (Arapahoe County), 16 (Boulder and Broomfield Counties), 18 (Weld County), 20 (Denver County), and 21 (Jefferson County) [10].

Fig. 2 (a) Survivorship database by Health Statistic Region; (b) Health Statistic Region key

Urban vs. rural findings

The majority of patients represented in the HDC-SD lived in an urban setting at the time of data extraction (89.8%), and there was a higher percentage of females in the urban setting compared to rural (51.8% vs. 39.9%, p = 0.0010). In terms of diagnosis, the urban sample had a higher percentage of cutaneous malignancies (14.0% vs. 6.6%, p < 0.0001) and breast tumors (25.0% vs. 14.6%, p < 0.0001). However, the urban sample had a lower percentage of bladder and urologic diagnoses (7.7% vs. 18.2%, p < 0.0001). The majority of patients had insurance coverage at the time of diagnosis and at the time of data extraction. The urban sample contained a higher percentage of commercial insurance enrollees than the rural sample (48.8% vs. 38.9%, p = 0.0123), while the rural sample contained a higher percentage of Medicare enrollees (48.0% vs. 35.4%, p = 0.0123), both at time of diagnosis and time of data extraction. Finally, the urban sample had a smaller percentage of White non-Hispanic patients than the rural sample (78.2% vs. 85.4%, p = 0.0024).

In terms of health maintenance, there were several statistically significant differences between patients living in urban areas and those living in rural areas. A greater percentage of eligible people aged 45 years and older within the urban sample were up to date with their colorectal cancer screening (6.2% vs. 0.6%, p = 0.0020). A greater percentage of men aged 50 years and above within the urban sample had a prostate-specific antigen (PSA) test within the past 2 years (48.9% vs. 22.7%, p < 0.0001). Additionally, a greater percentage of women aged 45 years and above within the urban sample had a mammogram in the previous 2 years (25.7% vs. 10.6%, p = 0.0009). Regarding immunizations, a greater percentage of adults within the urban sample received a flu shot (41.7% vs. 13.6%, p < 0.0001) and received one or more COVID-19 vaccines (37.8% vs. 9.6%, p < 0.0001). Finally, regarding health care utilization, those living in urban areas more frequently had a primary care visit (70.3% vs. 46.0%, p < 0.0001), an oncology visit (67.7% vs. 49.0%, p < 0.0001), and an emergency department visit within the UCHealth system (11.4% vs. 2.0%, p < 0.0001) in the 9 months following TSCP completion than those in the rural setting.

Discussion

To our knowledge, this is the first study to examine the socio-demographics and health outcomes of people who have received TSCPs using a novel regional database that combines cancer treatment variables with health care utilization. Utilizing various data sources from a large cancer center, the database provides the opportunity to analyze and better understand patient outcomes and health maintenance after completing primary cancer treatment. Overall, these efforts increase the capacity to describe longitudinal care patterns of individuals within and across health care systems. Such a database has the potential to be one of the first steps towards integrating panel management into the primary care medical home model of care, which can be used in the management of other chronic health conditions such as diabetes or hypertension.

Data from the HDC-SD indicate that almost 90% of people living in urban and rural counties in Colorado who received TSCPs are seeking medical care following primary treatment for past cancer diagnoses. However, many screening exams are not being completed in accordance with ACS guidelines. Additionally, disparities in preventive care services for cancer survivors in Colorado appear to exist across the urban-rural spectrum. These findings signal the need to engage more diverse groups in addressing barriers throughout Colorado.

These results also suggest the need to refine handoffs between oncology and primary care professionals. Coordinated health care following active cancer treatment between primary care and oncology care is crucial; it is one of the six standards for survivorship care as defined by the National Comprehensive Cancer Network (Version 1.2023). Additional standards include (a) surveillance for recurrence and screening for new cancers; (b) monitoring for late effects; (c) preventing and detecting late effects; (d) evaluating cancer-related syndromes; and (e) planning for ongoing care [11]. Notably, care coordination for multiple issues such as (a) general preventive health screenings, (b) management of co-morbid conditions, (c) promotion of healthy behaviors, and (d) psychosocial support is complex. Care coordination requires communication between busy stakeholders in a fragmented health care system, and is often time-consuming.

Several lessons were learned over the course of the development of the HDC-SD. Data points regarding screening procedures and immunizations were initially obtained from lists of free-text procedure labels as they appear in the EHR. Upon analysis, standardized codes, such as CPT and ICD, provided more accuracy in identifying these data points. Additionally, it was found that performing a manual review may be necessary to ensure accurate data. Such manual review allowed for data validation. It also identified and categorized data points, such as provider type, in groups that could not otherwise be identified in the EPIC EHR interface.
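The advantage of standardized codes over free-text labels can be shown with a small sketch. The code lists below are short illustrative examples of commonly used screening and diagnostic CPT/HCPCS codes, and the procedure labels are invented; neither represents the code lists or labels used to build the HDC-SD.

```python
from typing import Optional

# Illustrative screening codes (assumed examples, not the study's query list).
SCREENING_CODES = {
    "77067": "mammogram",    # screening mammography, bilateral
    "G0121": "colorectal",   # screening colonoscopy, not high risk
    "G0105": "colorectal",   # screening colonoscopy, high risk
    "G0103": "psa",          # prostate cancer screening, PSA test
}


def classify_by_code(code: str) -> Optional[str]:
    """Return the screening type for a code, or None if it is not a screening code."""
    return SCREENING_CODES.get(code.strip().upper())


def classify_by_label(label: str) -> Optional[str]:
    """Naive free-text matching, which cannot tell screening from diagnostic."""
    text = label.lower()
    if "mammo" in text:
        return "mammogram"
    if "colonoscopy" in text:
        return "colorectal"
    if "psa" in text or "prostate specific antigen" in text:
        return "psa"
    return None


# Hypothetical records: (code, free-text procedure label).
records = [
    ("77067", "MAMMO SCREENING BILATERAL"),
    ("77066", "MAMMO DIAGNOSTIC BILATERAL"),  # diagnostic, should be excluded
    ("45378", "COLONOSCOPY DIAGNOSTIC"),      # diagnostic, should be excluded
]
by_code = [classify_by_code(code) for code, _ in records]
by_label = [classify_by_label(label) for _, label in records]
```

Here the label-based approach counts both diagnostic procedures as screenings, while the code-based approach excludes them, consistent with the decision to exclude diagnostic codes from the query.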

Finally, if TSCPs are to be used for research and quality improvement, then they need to be designed to facilitate data collection and analysis. The record review emphasized the need to restructure the EHR interface and TSCP structure to incorporate more discrete field options for data entry. Important data points, such as date of diagnoses or dates of procedures, should be entered into discrete fields rather than free-text fields so that the information may be more easily and accurately extracted from the EHR.

These results highlight the need to support and improve the care of patients who have completed active cancer treatment. These results will be used to facilitate conversation between those who work with the EHR data infrastructure and interface and clinicians to identify ways data entry may be improved to promote more accurate collection of key data points. Finally, these results and the lessons learned from this study may have great significance for individuals impacted by cancer diagnoses, as well as community and state organizations working to improve care coordination and health care delivery throughout Colorado.

Study limitations

Several limitations were noted. First, this database explored health care utilization during the COVID-19 pandemic. The pandemic contributed to screening and treatment delays, so the utilization and receipt of health care services are likely underrepresented in the database. Second, the CPT codes used for this analysis were limited to screening procedures. However, surveillance recommendations may include diagnostic studies per institutional protocols and expert recommendations. Future research should differentiate between screening and diagnostic procedures. Third, the lists of CPT codes used for colorectal cancer and lung cancer screening procedures were not complete when the data were pulled from HDC. As a result, the actual rates of colorectal cancer screening may be higher than those captured in the HDC-SD; this issue will be corrected moving forward. Additionally, lung cancer screening was omitted from this analysis due to the inability to accurately capture the appropriate procedure screening codes, as well as largely missing data on smoking history. Fourth, patient demographic information in the HDC-SD reflects patient information at the time of data extraction. Patient information, such as urban or rural residency, may have changed during the study timeframe and may not accurately represent all possible barriers to care experienced during that period.

Another significant limitation is that data uploaded from outside health systems within the EPIC Media tab cannot be captured in HDC. Consequently, we may be unable to capture some health care utilization, immunizations, or procedures completed outside of the UCHealth system unless such data were available through the CIVHC APCD. Finally, emergency department visits in this first iteration were limited to the UCHealth system; the UCHealth system includes locations across the state but does not represent all locations that a patient might have visited for care.

Future research and development

The initial creation of the HDC-SD will continue to serve as a foundation for further inquiry. After refining the data collection and extraction process, future research should explore outcomes associated with patient sociodemographic and disease characteristics, as well as health maintenance behaviors. Additionally, EHR and loco-regionally available data warehouses have the opportunity to provide data (such as laboratory results and medications) which could be used to understand more about other chronic condition histories and risk factors in individuals with histories of cancer. The US Preventive Services Task Force (USPSTF) guidelines and recommendations will be used to define and inform specific laboratory results analyzed in future research. Finally, future research will benefit from additional years of data as health care utilization may be examined over longer periods of follow-up.

Implications for cancer care clinicians

These results signal a call to action for clinicians, public health officials, and policy makers. Additional attention should be directed towards increasing the overall number of screening tests performed. Clinicians understand the important role of screening to detect new malignancies and conditions early, as well as the important role of detecting recurrences early. Early detection contributes to timely intervention. Integrating automatic messages for both clinicians and patients regarding procedures, labs, and immunizations may serve as timely reminders promoting scheduling and completion.

The results also indicate the need for more diverse partnerships with and among clinicians and organizations across Colorado. The lack of documented cancer screenings and immunizations in rural areas compared to the urban population highlights opportunities for more research to identify the barriers this group experiences and how to overcome them. This team seeks to collaborate with groups such as the Colorado Cancer Coalition, the Colorado Rural Health Center, and the University of Colorado Cancer Center Office of Community Outreach and Engagement to improve training, outreach, and delivery of services for people with histories of cancer in rural parts of Colorado. This team also seeks to collaborate with loco-regional initiatives such as those conducted by the American Cancer Society, the Cancer Prevention and Control Research Network at the Colorado School of Public Health, and the State Networks of Colorado Ambulatory Practices and Partners (SNOCAP) to ultimately improve care and outcomes across Colorado’s diverse population.

Furthermore, oncology and primary care clinicians should receive training and engage in conversation regarding best practices for improved and more accurate data entry within the EHR. Collaboration between clinicians and those involved with the EHR data infrastructure should include designing discrete fields to enter important dates and indicators of items such as procedures, medications, and laboratory tests to replace free-text fields. These discussions may help improve data accuracy and integrity while also minimizing the burden on clinicians and researchers.

The findings from this study demonstrate the importance of this novel database and its ability to describe diverse survivorship programs. This work may provide the framework for future research looking to describe health care utilization among specific populations and help identify strategies to bridge service gaps. It is the hope that research using these types of data may ultimately improve patient outcomes across health care systems.