Introduction

The Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (NCI) is a population-based cancer database that currently covers approximately 34.6% of the United States population [1]. It began in 1973 with 9 registries and had expanded to 21 registries as of 2019. Registries are carefully selected to be representative of the country as a whole, with oversampling of certain minority populations to provide adequate sample sizes for analysis [2]. The SEER database is widely used to study cancer incidence, mortality, outcomes, treatment patterns, and disparities [3, 4].

Relevant to this study, the SEER database is commonly used to study rural–urban cancer disparities. When linked to Medicare claims data as the SEER-Medicare database, it is also used to study access-to-care issues such as travel distance [5, 6]. However, there are limited data on how the rural population covered by SEER compares with the broader US rural population, and this comparison is important for judging the generalizability of these studies. A recent study of the SEER-18 database found that, on the whole, the SEER population is more urban than the non-SEER population [7], but no study has specifically examined how well the rural population covered by SEER represents the broader US rural population. Because SEER registries are not randomly chosen, they may not capture the considerable heterogeneity of the US rural population with respect to demographic and socioeconomic measures and health behaviors [8, 9]. The primary objective of this study is to compare rural areas included in the SEER database with rural areas outside SEER with respect to demographics, socioeconomic measures, health behaviors, health access, and cancer incidence.

Methods

Study population

We examined demographic, socioeconomic, health behavior, and cancer incidence measures for all counties in the United States (n = 3,142) and compared travel times from their populated census tracts (n = 72,615) as defined in the 2010 census.

Definition of rurality, SEER and non-SEER areas

The primary exposure was whether a rural county or census tract was covered by a SEER registry. The most recent SEER-21 data release contains cancer registries covering the entire state for 13 states (Utah, New York, New Jersey, New Mexico, Massachusetts, Louisiana, California, Kentucky, Iowa, Idaho, Hawaii, Georgia, and Connecticut), as well as data from 13 counties in the Seattle-Puget Sound registry in Washington and three counties in the Detroit metropolitan registry in Michigan [10]. The counties from these registries, and their corresponding census tracts, were considered SEER areas. The remaining geographic areas within the 50 states of the United States were designated non-SEER areas. For the purposes of this study, areas for which SEER includes data only from Native American tumor registries rather than from the entire population (Alaska, Arizona, and Oklahoma) were also considered non-SEER areas.

Rurality was defined at the county level using the 2017 Office of Management and Budget (OMB) guidelines [11]. Metropolitan counties were considered urban, while micropolitan and non-core counties were considered rural. Metropolitan areas are defined by a core urban county (or counties) of 50,000 or more people, along with surrounding counties that are highly integrated with the core economically and socially; micropolitan areas are defined similarly but for a core urban population of 10,000–49,999 people. Counties designated as neither metropolitan nor micropolitan are non-core [12]. Rurality was additionally defined at the census tract level using secondary rural–urban commuting area (RUCA) codes [13]. RUCA codes are based on the OMB guidelines but also incorporate daily commuting flows. They consist of 10 primary codes, which refer to the primary commuting destination, further subdivided into 21 secondary codes that also indicate secondary commuting destinations [13]. We used the secondary codes to categorize rurality into four levels (urban, large rural, small rural, and isolated rural) based on previous recommendations [14].
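The two rurality definitions above can be sketched as simple lookups. The county rule (metropolitan = urban; micropolitan and non-core = rural) comes directly from the text. The mapping of secondary RUCA codes to the four tract-level categories is an assumption for illustration only: it reflects our recollection of a commonly used four-tier grouping of the 2010 codes and should be checked against [14] before any real use.

```python
# ASSUMPTION: illustrative grouping of the 21 secondary 2010 RUCA codes into
# four rurality levels; verify against the scheme recommended in [14].
RUCA_LEVELS = {
    "urban": {1.0, 1.1, 2.0, 2.1, 3.0, 4.1, 5.1, 7.1, 8.1, 10.1},
    "large rural": {4.0, 5.0, 6.0},
    "small rural": {7.0, 7.2, 8.0, 8.2, 9.0},
    "isolated rural": {10.0, 10.2, 10.3},
}

OMB_CLASSES = {"metropolitan", "micropolitan", "noncore"}


def ruca_level(code: float) -> str:
    """Return the assumed four-level rurality category for a secondary RUCA code."""
    for level, codes in RUCA_LEVELS.items():
        if code in codes:
            return level
    raise ValueError(f"Unrecognized secondary RUCA code: {code}")


def county_is_rural(omb_class: str) -> bool:
    """County rule from the text: micropolitan and non-core counties are rural."""
    if omb_class not in OMB_CLASSES:
        raise ValueError(f"Unknown OMB class: {omb_class}")
    return omb_class != "metropolitan"
```

For example, `ruca_level(7.2)` returns `"small rural"` and `county_is_rural("micropolitan")` returns `True`. Keeping the mapping as data rather than branching logic makes it easy to swap in the exact grouping from [14].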

County level demographic, socioeconomic and health access measures

The 2018–2019 county-level Area Health Resource File (AHRF) was used to obtain county-level demographic, socioeconomic, and health access measures. The AHRF compiles these measures from over 50 data sources, so the latest available year may differ across measures depending on the source data. For each measure, we used the most recent year available to obtain the most up-to-date representation of the population in each area; the majority of the data therefore come from 2017, although some measures are older. Factors analyzed for each county included age distribution (latest year 2016), education (2013–2017; percent with less than a high school education, at least a high school education, and at least a college education), unemployment rate, per capita income, percent in poverty, and percent under 65 who are uninsured. The numbers of primary care physicians, surgeons, internal medicine subspecialists, and radiation oncologists in each county were used to calculate physician density.
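Physician density is a ratio of provider counts to county population. The sketch below assumes that density is expressed per 100,000 residents and computed separately for each provider type; the container and field names are hypothetical and are not AHRF variable names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CountyWorkforce:
    """Hypothetical container for one county's population and provider counts."""
    population: int
    primary_care: int
    surgeons: int
    im_subspecialists: int
    radiation_oncologists: int


def per_100k(count: int, population: int) -> float:
    """Providers per 100,000 residents (assumed scaling)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


def provider_densities(c: CountyWorkforce) -> Dict[str, float]:
    """One density per provider type (assumed; the text does not give the formula)."""
    return {
        "primary_care": per_100k(c.primary_care, c.population),
        "surgeons": per_100k(c.surgeons, c.population),
        "im_subspecialists": per_100k(c.im_subspecialists, c.population),
        "radiation_oncologists": per_100k(c.radiation_oncologists, c.population),
    }
```

A county of 25,000 residents with 10 primary care physicians would have a primary care density of 40 per 100,000 under these assumptions.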

Health behaviors

The 2019 County Health Rankings (CHR) data, published by the Robert Wood Johnson Foundation and the University of Wisconsin, were used to obtain county-level measures of health behaviors (percent tobacco users, percent obese, percent excessive drinking, and percent physically inactive). The CHR compiles county-level data from a variety of national data sources. Tobacco and alcohol use rates in the 2019 CHR are compiled from the 2016 Behavioral Risk Factor Surveillance System (BRFSS), while obesity and physical inactivity rates are compiled from the 2015 Centers for Disease Control and Prevention (CDC) Diabetes Interactive Atlas [15].

Cancer screening, incidence and mortality

County-level cancer screening and incidence rates were obtained from the State Cancer Profiles tool of the National Cancer Institute (NCI) and the CDC. Data from the BRFSS and the National Health Interview Survey (NHIS) are combined to provide county-level modeled estimates of screening rates for 2008–2010 (the latest period available) [16]. We analyzed the rate of breast cancer screening (defined as mammography within the past 2 years among women over 40 years old) and the rate of colorectal cancer screening (defined as ever having had a colorectal endoscopy, or having had fecal occult blood testing within the past 2 years).

County-level age-standardized cancer incidence rates were also obtained from the same tool, which uses incidence data from the National Program of Cancer Registries Cancer Surveillance System and the SEER program [17]. Average annual all-site cancer incidence for the latest 5-year period (2012–2016), for both sexes and all ages, was evaluated. Incidence rates were compared between rural SEER and rural non-SEER areas overall and then stratified by race/ethnicity (White, Black, Hispanic, Asian). Kansas and Minnesota were excluded from this analysis because they do not allow release of county-level cancer incidence data. In addition, county-level estimates were suppressed to protect confidentiality when a county had fewer than 16 cases during the time period or when the number of persons of a particular race in the county was low [18].
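Two steps behind these county estimates can be sketched in a few lines: direct age standardization, which is a weighted average of age-specific rates using standard-population weights, and the rule that suppresses counties with fewer than 16 cases. The age groups and weights below are toy values, not the standard population used by State Cancer Profiles, and the race-specific small-population rule is not modeled.

```python
from typing import Optional, Sequence

# Rule stated in the text: counties with fewer than 16 cases are suppressed.
SUPPRESSION_MIN_CASES = 16


def age_adjusted_rate(
    cases: Sequence[int],
    person_years: Sequence[float],
    std_weights: Sequence[float],
    per: int = 100_000,
) -> float:
    """Directly age-standardized rate per `per` persons.

    person_years is population x years, so summing over a 5-year window
    (e.g., 2012-2016) yields an average annual rate.
    """
    if not (len(cases) == len(person_years) == len(std_weights)):
        raise ValueError("all inputs need one entry per age group")
    total_weight = sum(std_weights)
    weighted = sum(c / py * w for c, py, w in zip(cases, person_years, std_weights))
    return weighted / total_weight * per


def reported_rate(
    cases: Sequence[int],
    person_years: Sequence[float],
    std_weights: Sequence[float],
) -> Optional[float]:
    """Return None (suppressed) when total cases fall below the threshold."""
    if sum(cases) < SUPPRESSION_MIN_CASES:
        return None
    return age_adjusted_rate(cases, person_years, std_weights)
```

With two equally weighted age groups of 10,000 person-years each and 10 and 40 cases, the sketch returns 250 per 100,000; with only 15 total cases it returns `None`.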

Travel distance

To measure travel time to care, Commission on Cancer (CoC)-certified facilities (National Cancer Institute Designated, Academic, Integrated Network, Comprehensive Community, and Community Cancer Programs) (n = 1,185) were geocoded using Google Maps. The geographic centroid of each census tract was created using ArcGIS (Environmental Systems Research Institute, Redlands, California). Network travel time was calculated as the fastest route (by time) between each census tract centroid and the nearest CoC facility using Network Analyst (Environmental Systems Research Institute, Redlands, California) and 2012 US road network data. For some tracts, the origin and destination were the same location; travel time was therefore recorded as 0 and is underestimated in those tracts.
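The study used ArcGIS Network Analyst for this step. Purely as a generic illustration of the underlying idea, nearest-facility travel time can be computed with a multi-source shortest-path search (Dijkstra's algorithm seeded at every facility) over a road graph whose edge weights are minutes. The toy graph and node names are hypothetical, and excluding unmatched tracts from the population denominator of the 60-min share is an assumption.

```python
import heapq
import itertools
from typing import Dict, Hashable, Iterable, List, Mapping, Tuple

# Adjacency list: node -> [(neighbor, travel minutes)]; edges listed both ways.
Graph = Mapping[Hashable, List[Tuple[Hashable, float]]]


def nearest_facility_times(graph: Graph, facilities: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Minutes from every reachable node to its closest facility.

    Assumes an undirected network, so time from a facility equals time to it.
    Nodes with no path to any facility are absent (like unmatched tracts).
    """
    tie = itertools.count()  # tie-breaker so nodes never need to be compared
    heap = [(0.0, next(tie), f) for f in facilities]
    heapq.heapify(heap)
    dist: Dict[Hashable, float] = {}
    while heap:
        d, _, node = heapq.heappop(heap)
        if node in dist:
            continue  # already settled with a shorter time
        dist[node] = d
        for nbr, minutes in graph.get(node, []):
            if nbr not in dist:
                heapq.heappush(heap, (d + minutes, next(tie), nbr))
    return dist


def share_within(
    times: Mapping[Hashable, float],
    population: Mapping[Hashable, int],
    threshold: float = 60.0,
) -> float:
    """Population-weighted share of matched tracts within `threshold` minutes."""
    matched = [t for t in population if t in times]
    total = sum(population[t] for t in matched)
    within = sum(population[t] for t in matched if times[t] <= threshold)
    return within / total
```

On a toy network `{"A": [("B", 20)], "B": [("A", 20), ("C", 50)], "C": [("B", 50)], "D": []}` with a facility at `"A"`, tract `"A"` gets 0 min (origin equals destination), `"B"` 20 min, `"C"` 70 min, and `"D"` is unmatched.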

Statistical analysis

The population-adjusted county-level means for each measure above were calculated and compared between SEER and non-SEER rural areas. In a sub-analysis, the rural group was stratified into its micropolitan and non-core components to compare SEER and non-SEER areas separately for larger rural areas with more economic integration (micropolitan) and for isolated areas (non-core). Given the large number of counties, even small differences would reach statistical significance. We therefore calculated standardized differences between groups, defined as the difference in means between two groups divided by the pooled standard deviation [19]. The result expresses the difference in standard deviation units and is often called the “effect size.” This measure is less sensitive to sample size and has been used in similar studies [7]. There is no agreed-upon definition of a meaningful effect size, but standardized differences < 0.1 are generally considered negligible [20]. We therefore considered absolute values of 0.1 or greater to be meaningful differences between groups.
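A minimal sketch of the population-adjusted comparison follows. It assumes that county values are weighted by county population, that the weighted standard deviation uses no small-sample correction, and that the pooled standard deviation is the square root of the mean of the two group variances, d = (mean1 − mean2) / sqrt((s1² + s2²) / 2), a common form for continuous variables; [19] should be consulted for the exact formula used.

```python
import math
from typing import Sequence, Tuple


def weighted_mean_sd(values: Sequence[float], weights: Sequence[float]) -> Tuple[float, float]:
    """Population-weighted mean and (uncorrected) weighted standard deviation."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)


def standardized_difference(
    x1: Sequence[float], w1: Sequence[float],
    x2: Sequence[float], w2: Sequence[float],
) -> float:
    """(mean1 - mean2) / pooled SD, with pooled SD = sqrt((s1^2 + s2^2) / 2)."""
    m1, s1 = weighted_mean_sd(x1, w1)
    m2, s2 = weighted_mean_sd(x2, w2)
    pooled = math.sqrt((s1 ** 2 + s2 ** 2) / 2)
    if pooled == 0:
        raise ValueError("pooled standard deviation is zero")
    return (m1 - m2) / pooled


def is_meaningful(d: float, threshold: float = 0.1) -> bool:
    """Threshold from the text: |d| >= 0.1 counts as a meaningful difference."""
    return abs(d) >= threshold
```

Applied to the values reported in the Results, a standardized difference of −0.126 would be flagged as meaningful and one of 0.062 would not.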

Median travel times for SEER and non-SEER tracts are reported overall, by RUCA rural category, and by census region. Linear regression was used to estimate the overall effect of SEER status on travel time while controlling for census region (Northeast, South, Midwest, West), because CoC facilities are not evenly distributed across the United States and travel time can vary substantially by region [21]. Census division was not used because of small or empty cells in some divisions. The regression was weighted by 2010 census tract population. Models were fit first using census tracts from all RUCA rural categories combined (large rural, small rural, and isolated rural) and then separately for each rural RUCA category. In a sensitivity analysis, travel time was log-transformed and the coefficients were exponentiated so that results could be interpreted as percent change in travel time. In an additional sensitivity analysis, the regressions were repeated after omitting the state of Alaska. All statistical analyses were performed in Stata (v16.0). This study was deemed exempt by the University of North Carolina Institutional Review Board because only deidentified secondary data sources were used.
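The regression can be sketched as weighted least squares, fitting ordinary least squares to rows scaled by the square root of tract population, with indicator variables for three census regions and the Northeast as an assumed reference category. The text does not state how zero travel times were handled before log transformation, so this sketch uses log(t + 1) purely as a labeled assumption; a log-scale coefficient b converts to a percent change of 100 × (exp(b) − 1). Confidence intervals are not computed, and numpy is assumed to be available.

```python
import numpy as np

REGIONS = ("Northeast", "South", "Midwest", "West")  # Northeast = reference (assumed)


def design_matrix(seer, region) -> np.ndarray:
    """Columns: intercept, SEER indicator, then South/Midwest/West indicators."""
    seer = np.asarray(seer, dtype=float)
    region = np.asarray(region)
    cols = [np.ones_like(seer), seer]
    cols += [(region == r).astype(float) for r in REGIONS[1:]]
    return np.column_stack(cols)


def wls(X: np.ndarray, y, weights) -> np.ndarray:
    """Weighted least squares via OLS on rows scaled by sqrt(weight)."""
    sw = np.sqrt(np.asarray(weights, dtype=float))
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def seer_effect_minutes(travel_min, seer, region, population) -> float:
    """Coefficient on SEER status in the linear model, in minutes."""
    return float(wls(design_matrix(seer, region), travel_min, population)[1])


def seer_effect_percent(travel_min, seer, region, population) -> float:
    """Percent change in travel time from the log-linear model.

    ASSUMPTION: log(t + 1) keeps zero travel times defined.
    """
    y = np.log(np.asarray(travel_min, dtype=float) + 1.0)
    b = wls(design_matrix(seer, region), y, population)[1]
    return float(100.0 * (np.exp(b) - 1.0))
```

On synthetic data in which SEER tracts are exactly 15 min faster within each region, `seer_effect_minutes` recovers −15; the same design with a multiplicative factor of 0.8 on (t + 1) yields −20% from `seer_effect_percent`.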

Results

Rural designations

Overall, 80% of counties in SEER areas (584 of 732) and 79% of counties in non-SEER areas (1,898 of 2,410) are rural. Yet the population of these rural counties represents only 8.5% of the SEER population and 17.1% of the non-SEER population.

Demographic and socioeconomic measures

Table 1 displays the results of the county-level comparison of rural SEER and rural non-SEER counties. Overall, there were no meaningful differences in population-adjusted age, race, and sex distributions between rural SEER and rural non-SEER counties. The only difference in the subgroup analysis was that SEER micropolitan areas had a lower proportion of White persons than non-SEER micropolitan areas (72.9% vs 78.3%, St. Diff = −0.126). There were no meaningful differences in educational attainment, unemployment, per capita income, or percent of persons in poverty between rural SEER and rural non-SEER counties.

Table 1 Comparison of demographic, socioeconomic, health behavior and health access measures among rural SEER and rural non-SEER counties

Health access and health behaviors

Table 1 also shows measures of health access and health behaviors. The proportion of uninsured persons under 65 was slightly higher in rural non-SEER counties (11.9% vs 9.6% in rural SEER counties). There were no differences in any provider density measure, in mammography screening rates (67% in both groups), or in colorectal cancer screening rates (61% in both groups). Rates of tobacco smoking (18.1% vs 18.7%), obesity (30.6% vs 32%), physical inactivity (25.5% vs 26.8%), and excessive drinking (16.3% vs 16.5%) were also similar between rural SEER and rural non-SEER areas, respectively (Table 1).

Cancer incidence and mortality

Table 2 shows all-site cancer incidence in rural SEER and rural non-SEER counties. The number of included counties in each group is listed because some counties were suppressed owing to low numbers of cases. Overall incidence was similar between groups (447.1 vs 444.4 per 100,000 in rural SEER and rural non-SEER counties, respectively; St. Diff = 0.062). However, when stratified by race/ethnicity, minority groups generally had higher incidence in rural SEER areas (Hispanic 376.9 vs 321.8, St. Diff = 0.75; Asian 291.2 vs 255.7, St. Diff = 0.78) (Table 2). The three-year average percentage of deaths due to cancer was 21% and did not differ between rural SEER and rural non-SEER counties (Table 2).

Table 2 County level cancer incidence and mortality rates in SEER and non-SEER counties

Travel time analysis

Figure 1 shows an overview of travel time from each census tract to the nearest CoC facility. Of 72,615 census tracts, all but 125 were mapped to a nearest facility. Most unmatched tracts were remote census tracts in Alaska and Montana; others, in Hawaii, did not match because there was no complete road network to the nearest CoC facility. Overall, 91.4% of the 2010 population lived within 60 min of a CoC facility: 93.8% of the population in SEER areas and 90.0% in non-SEER areas. A prolonged travel time of > 2 h was seen for 2.0% of the SEER population and 2.5% of the non-SEER population. The median number of CoC facilities was similar between SEER and non-SEER areas (15.5 vs 16, respectively).

Fig. 1 Travel time to nearest Commission on Cancer Certified facility by census tract in SEER and non-SEER areas. Blue areas are included in the SEER registry while green areas are not. Darker colors represent shorter travel times. Tracts that are white did not match in the network analysis or were unpopulated.

Table 3 shows median travel times in SEER and non-SEER areas stratified by census region and level of rurality. Overall, median travel time was similar in rural SEER and rural non-SEER tracts (52.6 vs 54.3 min, respectively). When stratified by region, median travel times were shorter in rural SEER areas in every region except the Midwest, where travel time was longer for the rural SEER population than for the rural non-SEER population (65.7 vs 47.1 min, respectively). As expected, median travel times increased with increasing rurality.

Table 3 Median travel times by region and rurality in SEER vs non-SEER census tracts to nearest commission on cancer certified hospital

Table 4 shows the results of the linear regression analysis. After controlling for regional variation, SEER areas had shorter travel times overall and at every level of rurality assessed. Rural SEER tracts had, on average, a 14.8-min shorter travel time than rural non-SEER tracts (95% CI −18.0 to −11.6 min). The largest difference was between small rural SEER and small rural non-SEER tracts (16.3 min shorter; 95% CI −23.6 to −9.1 min). The log-linear sensitivity model showed similar trends: rural SEER tracts had an 11.2% shorter travel time than rural non-SEER tracts (95% CI −14.3% to −8.1%). In the additional sensitivity models excluding Alaska, effect sizes were smaller, but travel times in rural SEER tracts remained significantly shorter in both the linear and log-linear models (Table 4). Rural SEER tracts had an average 8.4-min shorter travel time than rural non-SEER tracts in the linear model (95% CI −10.4 to −6.5 min) and an estimated 10% shorter travel time in the log-linear model (95% CI −13.1% to −6.8%).

Table 4 Regression model showing effect of SEER region on mean travel time

Discussion

Travel time to cancer care is increasing: a recent analysis showed that the number of patients who travel more than an hour to the nearest cancer hospital doubled from 2005 to 2015 [22]. Regionalization of care to high-volume centers, particularly for surgery, is likely a major factor underlying this trend [22, 23]. Given the well-documented relationship between higher volume and better outcomes after complex cancer surgery, shifting care to high-volume facilities should improve outcomes at the population level [24,25,26]. However, this comes at the cost of increased travel for some patients, particularly rural patients, and prolonged travel may be an insurmountable barrier for vulnerable populations at many points along the cancer care continuum [27,28,29]. For studying travel distance or travel time in cancer care, the SEER-Medicare database provides a large cohort in which these measures can be calculated, yet no study has examined whether travel times are comparable between SEER and non-SEER areas. In general, our results agree with other studies showing that travel increases with rurality and varies substantially by region of the country [21, 30]. Although the estimated 15-min shorter average travel time in rural SEER compared with rural non-SEER areas is modest (and smaller when Alaska is excluded), it implies that, to the extent that travel time is a barrier to care for rural patients, the rural areas covered by the SEER-Medicare database may underrepresent that burden relative to non-SEER rural areas. This matters because increased travel burden may be an important mediator of rural–urban cancer disparities [31].

Only a few previously published analyses have compared sociodemographic factors between the entire (rural and urban) SEER and non-SEER populations. Two of these differ substantially from our study because they used older population data and SEER included fewer registries at the time [2, 32]. A more recent 2016 study by Kuo et al. [7] used the SEER-18 database with the AHRF and 2010 census data and found that the SEER population was younger, more educated, and more urban than the non-SEER population. SEER areas also had higher representation of racial minorities, which is expected because registries are selected for SEER partly on their ability to represent these populations. That study also found that non-SEER counties were more affluent, with lower unemployment and fewer persons living in poverty than SEER counties. In our study, we focused specifically on the rural population. Data comparing rural populations are important because social determinants of health may partly explain worse cancer outcomes in rural populations [33]. We found rural SEER areas to be largely representative of the rural non-SEER population, with similar age and racial distributions and similar socioeconomic and health access measures; when rural areas were stratified into micropolitan and non-core areas, the only meaningful difference was a lower proportion of White persons in SEER micropolitan counties.

The county-level incidence rates in our study come from U.S. Cancer Statistics (USCS), which draws data from both the SEER program and the National Program of Cancer Registries (NPCR), a population-based cancer surveillance program funded by the Centers for Disease Control and Prevention that comprises state cancer registries both within and outside SEER [34]. USCS data cover the entire US population [35]. The previous analysis by Kuo et al. used 2004–2009 USCS incidence data to compare breast and colorectal cancer incidence in SEER and non-SEER counties and found similar incidence rates across all racial/ethnic subgroups [7]. In our analysis of rural counties from 2012 to 2016, overall cancer incidence was similar between rural SEER and non-SEER counties. When stratified by race, however, we found lower incidence among White, Hispanic, and Asian patients in rural non-SEER areas, in contrast to the findings of the study that included both rural and urban counties. Our findings may differ for several reasons. These studies used overall population incidence, whereas we analyzed publicly available county-level incidence data; in years with low numbers of cases for a particular racial group, some counties were suppressed and excluded from our analysis. In addition, because SEER registries are selected based on data quality and the ability to represent minority populations, non-SEER registries may capture fewer cases. Finally, our analysis uses more recent data (2012–2016 as compared to 1999).

Our analyses have several limitations. Comparisons of cancer incidence may be limited by suppressed counties, particularly for Asian patients in rural counties (78% of rural counties suppressed for this group). Other racial minorities were better represented (53% and 66% of rural counties suppressed for Black and Hispanic patients, respectively), and the number of suppressed counties was similar in the SEER and non-SEER groups. Travel time was measured to the nearest CoC hospital. CoC hospitals report cancer cases to the National Cancer Database (NCDB), which has previously been shown to capture approximately 70% of all cancer cases in the US [36]. However, some census tracts, particularly rural ones, may be closer to hospitals that offer cancer services but are not CoC-certified; these hospitals were not accounted for in our analysis. We also could not account for patient insurance or preferences for particular providers or hospitals, so the hospital to which each tract mapped may not be where patients actually receive care. Our objective, however, was to measure the geographic availability of services.

The SEER database is a vital, high-quality resource for the study of cancer in the United States. Rural SEER areas are representative of the broader US rural population across many demographic, socioeconomic, health behavior, and health access measures. Some differences remain, however, particularly in travel time: the SEER-Medicare database may underestimate rural travel time to the nearest specialized cancer center compared with rural non-SEER areas. Researchers should be aware of this potential bias when generalizing findings from studies of travel distance that use the SEER-Medicare database.

Availability of data and material

All data sources (U.S. Census data, County Health Rankings, Area Health Resource File, Centers for Disease Control State Cancer Profiles Tool, Commission on Cancer certified facilities, and National Program of Cancer Registries Cancer Surveillance System) are publicly available.