Introduction

The Surveillance, Epidemiology, and End Results (SEER) program, originally funded by the National Cancer Institute (NCI) in 1973, collects information on cancer incidence and survival from strategically selected cancer registries in the USA. The SEER program began by including nine registries and has expanded over time to now include 18 registries. Currently, SEER registries cover 28 % of the US population [1]. SEER registries have been the foundation for decades of population-based cancer-related research. In addition, SEER–Medicare linked databases are also widely used for cancer studies related to the population age 65 and older [2]. These data have been available for decades to researchers under certain data use agreements and have been used in numerous peer-reviewed studies.

By contrast, fewer researchers have used the more recently available United States Cancer Statistics (USCS) database of cancer registries that combines data from the National Program of Cancer Registries (NPCR) and SEER registries. The NPCR has been funded by CDC since 1992 to support central cancer registries and covers 96 % of the US population including 45 states, the District of Columbia, Puerto Rico, and the US Pacific Island Jurisdictions [3]. The combination of databases from SEER and NPCR covers the entire nation and provides a single, pooled-state database of reconciled, comparable cancer information geocoded at the local level to facilitate cancer control planning and evaluation [4]. Using these data, CDC and NCI, in collaboration with the North American Association of Central Cancer Registries (NAACCR), certify registries that meet high data quality criteria and jointly publish official federal statistics on cancer incidence from registries that meet the USCS publication standard [5]. This comprehensive database is now available inside National Center for Health Statistics (NCHS) and Census Research Data Centers (RDCs) to qualified researchers [6]. This combined database has been used by only a few researchers because it has not been broadly available until recently [7].

With the newly available USCS database, it is important to know if studies using this database produce similar results to the estimates from the widely used SEER data, which is a subset of it. In preparing the joint publication from NCI and CDC for the official federal cancer statistics, Wingo et al. [4] used USCS data in 1999 to compare the cancer incidence rates from SEER and NPCR data. They found that cancer incidence rates for specific sites varied by registries and that the SEER under-represented rates for all sites combined. They also found that the SEER over-represented the rate for breast cancer (BC) but under-represented the rate for colorectal cancer (CRC).

In addition, Henley et al. [7] used the USCS data before they became publicly available to examine cancer incidence during 2004–2006, for persons aged >50 at time of diagnosis. They used the SEER summary stage [8] to define late stage as including both regional and distant stages. The study examined two outcomes: the age-adjusted incidence rate per 100,000 of the general population and the proportion of cancer cases that were diagnosed at late stage. Henley et al. [7] found that age-adjusted incidence rates of late-stage cancer differed by age groups and by race or ethnicity. Rates of late-stage CRC incidence increased with age and were highest among black men and women. Late-stage BC incidence rates increased with age only through age 79 and were highest among the age 60–79 group and black women. The study also showed that roughly a third of new BC cases and roughly half of CRC cases were diagnosed at a late stage in the USA between the years 2004–2006. Similarly, Mobley et al. [9] using SEER registry cancer population data from 11 states (CA, CT, GA, IA, KY, LA, MI, NJ, NM, WA, and UT) over 2000–2005 found comparable results for the proportions of late-stage BC and CRC. Specifically, Mobley et al. found that the proportion of late-stage CRC ranged from 49 to 58 %, while that of late-stage BC ranged from 26 to 33 % across the 11 SEER registry states.

In addition, other studies compared cancer mortality rates between US and SEER registries [10, 11]. For example, Frey et al. [10] used the age-adjusted mortality rate at the county level in 1975–1988 and compared race-, sex-, and cancer site-specific trends and levels between the SEER areas and the entire nation. Overall, although most of the trends and levels of mortality rates were similar between the two areas, some comparisons showed that SEER coverage areas were not representative of the US population. Specifically, SEER areas over-represented BC mortality rates in black female populations and under-represented CRC mortality rates for both white male and female populations. Using more recent data (i.e., 1992–2000), Merrill and Dearden [11] also found that SEER registries under-represented US cancer mortality for CRC in white males and females and black males. In general, they found SEER tended to under-represent the US mortality in several cancer sites and across the majority of the SEER registries, with the worst under-representation in Utah and New Mexico.

The geographic areas in the SEER registries were strategically selected by the SEER program to ensure a high-quality cancer reporting system and to include epidemiologically significant population subgroups [1]. What is not known is the extent to which SEER regions are representative of the nation in terms of population demography and geographic socioeconomic characteristics. This is the main contribution of this paper, which is important because such factors have been demonstrated to relate strongly and significantly to cancer incidence and stage of diagnosis [9, 12–15].

Two rather dated studies have addressed the socioeconomic and demographic representativeness of the SEER population. One focused on the general population [16] while the other focused on the elderly population [2]. Nattinger et al. [16] used 1990 data to examine the generalizability of SEER for factors relevant to epidemiologic and healthcare research by classifying 198 counties as the SEER coverage areas and 2882 as the non-SEER areas. They found the SEER areas to be more urban, more affluent, and with lower unemployment rates than the USA as a whole.

The Warren et al. [2] study compared socio-demographic characteristics between persons 65 years and older in the SEER coverage counties and the US population age 65 and older, using 1990 census data. They found that the age and sex distributions for individuals 65 years and older in the SEER areas were comparable with that of the US elderly population. However, the SEER areas had a lower proportion of white persons and a higher proportion of persons of other races (persons who were not white, African-American, or unknown race). In addition, these older persons in the SEER areas were much more likely to have resided in an urban setting, a more affluent area, or an area with higher managed care penetration among the Medicare enrollee population than the average US residents age 65 years and older.

Our paper seeks to make an important contribution, because the two studies regarding the representativeness of the SEER registry populations and their geographic settings are quite dated [2, 16]. Many socioeconomic and demographic changes have occurred in the USA over the past 20 years. According to the US Census Bureau, the percentage of the white population decreased from 80.3 % in 1990 to 72.4 % in 2010, whereas the percentage of the black population increased only slightly (12.1 to 12.6 %) and the percentage of other nonwhite populations increased from 7.6 to 12.1 % [17]. The entire population is also aging, with an increasing percentage of older people and a decreasing percentage of younger people as time passes [18]. Also, there was a decline in the middle class with the Great Recession during 2007–2009, which affected the mix of socioeconomics across the nation. In addition, the SEER registries have expanded to include all of Georgia and California as compared to including only the Atlanta and San Francisco areas in these states, which was the SEER coverage for the study by Nattinger et al. [16]. Furthermore, the states of Kentucky, Louisiana, and New Jersey also joined the SEER program after 2000. Thus, the SEER areas today include many more rural areas and minority populations than the earlier SEER areas. What is not known is whether the cancer population data obtained from SEER registries today are still generalizable and representative of the entire US cancer population, and whether the geographic settings of the SEER areas are representative of the USA, in terms of population demographics and socio-ecological factors. Finally, it is well documented that the SEER registries were originally selected for better data quality as compared to the other cancer registries. Over the years CDC, NCI, and NAACCR have worked together to help all cancer registries achieve standards that reflect high-quality data. It is unknown to what extent the data quality differs between SEER and non-SEER registries in current data.

The purpose of this study is to use the newly available USCS database and stratify the data into SEER areas and non-SEER areas, to determine to what extent the SEER registries are generalizable to the nation, and in which dimensions they are not fully generalizable. Specifically, this paper provides an update and current comparison of the cancer cases, demographics, socioeconomic factors, and data quality in SEER areas as compared with the rest of the USA. For the cancer case comparison, we examined the comparability of late-stage cancer diagnosis, focusing on BC and CRC, the two cancers that are preventable through screening but remain among the top four most commonly diagnosed cancers in the USA [19]. We also compared population demographics and socioeconomic factors in SEER and non-SEER areas. For dimensions of data quality, we focus on the key components of (1) completeness of case ascertainment and (2) missing data on race, sex, and age. This study answers the following questions: Are the SEER registries representative of the US cancer population in terms of late-stage BC and CRC diagnoses? Are they representative of the current socioeconomic and demographic characteristics which underpin the observed health outcomes in national statistics? Is the data quality comparable between SEER registries and the rest of the registries in the USA?

Methods

Study sample

We obtained data from the USCS Restricted Access Dataset and selected people of all ages with a first breast or colorectal cancer diagnosed in 2004–2009. BC was identified by the third edition of International Classification of Diseases for Oncology (ICD-O-3) code of C50, and CRC was identified by the ICD-O-3 codes of C18, C19.9, C20.9, and C26.0 [20]. The dataset was prepared by NCHS, which excluded BC or CRC cases that were not primary cancers, records with unknown cancer stage or unstaged cancer in the SEER summary stage variable, and records with missing data on age, race, or gender. From this dataset, we further excluded records when diagnosis was by autopsy or from death certificates (<1 % of all cases). For BC, we also excluded males. Using the SEER summary stage 2000 variable provided in the USCS database, we coded regional or distant diagnosis as late stage, and in situ or localized diagnosis as early stage. We then created a binary indicator for each individual specifying whether their cancer was diagnosed at a late stage or not.
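The stage coding described above can be sketched as follows. This is a minimal illustration, not the authors' SAS code: the stage labels are hypothetical stand-ins for however the SEER summary stage 2000 variable is actually encoded in the USCS file.

```python
# Hypothetical text labels for the SEER summary stage 2000 variable;
# the real USCS file may use numeric codes instead.
LATE_STAGES = {"regional", "distant"}
EARLY_STAGES = {"in situ", "localized"}


def late_stage_indicator(summary_stage: str) -> int:
    """Return 1 for a late-stage diagnosis and 0 for an early-stage one."""
    stage = summary_stage.strip().lower()
    if stage in LATE_STAGES:
        return 1
    if stage in EARLY_STAGES:
        return 0
    # Unknown or unstaged records were excluded before analysis,
    # so anything else is treated as an error in this sketch.
    raise ValueError(f"unexpected stage value: {summary_stage!r}")
```

Raising an error on unrecognized values, rather than silently coding them as early stage, mirrors the exclusion of unstaged and unknown-stage records.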

Definition of SEER and non-SEER areas

Three states (Kansas, Maryland, and Minnesota) were not present in the USCS database over the timeframe of this study. We excluded an additional state, Virginia, because data were not available until 2007. Among the included 46 states, 10 are completely covered by SEER registries (HI, CA, UT, NM, IA, KY, GA, CT, NJ, and LA) and two others (WA and MI) have partial coverage over some counties by SEER registries (including Clallam, Grays Harbor, Island, Jefferson, King, Kitsap, Mason, Pierce, San Juan, Skagit, Snohomish, Thurston, and Whatcom counties in WA; Macomb, Oakland, and Wayne counties in MI). To provide a clear comparison for the population characteristics between SEER and non-SEER areas, we focused on the SEER areas that represent entire populations; thus, the SEER registries in Alaska, Oklahoma, and Arizona that cover Native Americans and Alaska Natives only were not counted as ‘SEER areas’ in our study. Therefore, we define SEER areas as the above-mentioned 10 SEER states and the counties of WA and MI that are covered by SEER registries. The rest of the nation’s counties were defined as non-SEER areas. Figure 1 shows the geographic areas of the SEER and non-SEER areas in this study. We assigned cancer patients to the appropriate group based on their county or state of residence at the time of diagnosis.
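The area definition above amounts to a simple lookup on state and county of residence. The sketch below assumes two-letter state abbreviations and plain county names as inputs; the function and constant names are hypothetical and not taken from the study's code.

```python
# States fully covered by SEER registries in this study.
SEER_STATES = {"HI", "CA", "UT", "NM", "IA", "KY", "GA", "CT", "NJ", "LA"}

# States with partial SEER coverage, listed by covered county.
SEER_COUNTIES = {
    "WA": {
        "Clallam", "Grays Harbor", "Island", "Jefferson", "King", "Kitsap",
        "Mason", "Pierce", "San Juan", "Skagit", "Snohomish", "Thurston",
        "Whatcom",
    },
    "MI": {"Macomb", "Oakland", "Wayne"},
}

# States absent from the analysis (no usable USCS data for 2004-2009).
EXCLUDED_STATES = {"KS", "MD", "MN", "VA"}


def classify_area(state: str, county: str):
    """Return 'SEER', 'non-SEER', or None for states outside the analysis."""
    if state in EXCLUDED_STATES:
        return None
    if state in SEER_STATES:
        return "SEER"
    if county in SEER_COUNTIES.get(state, set()):
        return "SEER"
    # Everything else, including Alaska, Oklahoma, and Arizona,
    # counts as non-SEER under the study's definition.
    return "non-SEER"
```

For instance, `classify_area("WA", "King")` falls in the SEER group, while a Washington county outside the listed thirteen falls in the non-SEER group.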

Fig. 1 SEER areas and non-SEER areas in the USA

To assess the representativeness of SEER registries for the nation, we first compared the similarity of cancer burden between SEER and non-SEER areas. Specifically, we compared the proportions of late-stage BC and CRC diagnosis between SEER and non-SEER coverage areas, across subgroups of age and racial or ethnic groups. We defined five age groups (age <40, 40–49, 50–64, 65–74, and 75+) and five racial or ethnic groups (white, African-American, Asian, Hispanic, and other). We also provided comparisons for each racial or ethnic group by the five age groups.
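As an illustration of the age stratification just described, the following sketch maps an age at diagnosis to one of the five groups. It assumes age is recorded in completed years; the function name and the group labels are hypothetical.

```python
def age_group(age: int) -> str:
    """Map age at diagnosis (completed years) to one of five study age groups."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 40:
        return "<40"
    if age <= 49:
        return "40-49"
    if age <= 64:
        return "50-64"
    if age <= 74:
        return "65-74"
    return "75+"
```

Crossing these five age groups with the five racial or ethnic groups gives the cells in which the SEER and non-SEER proportions are compared.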

Next, we assessed the comparability of the underlying population demographics and socioeconomic characteristics measured at county level, using census 2000 and 2010 data. We identified 612 counties in the SEER areas and 2529 (or 2531 in year 2010) counties in the non-SEER areas. We then compared county-level population demographic information, poverty, health insurance, educational attainment, unemployment rate, and percent of population living in urban areas, between SEER areas and non-SEER areas. Demographic data in 2000 were obtained from the Area Health Resource Files (AHRF), whereas the 2010 data were obtained from census 2010; poverty data were from the Small Area Income and Poverty Estimates (SAIPE); health insurance information was from the Small Area Health Insurance Estimates (SAHIE). Data for educational attainment and the unemployment rate in 2000 were available in the AHRF, whereas these variables were not available from AHRF for 2010. For the comparison of these two variables, we used the similar estimates from the 2005–2009 American Community Survey (ACS) five-year estimates at county level.

For the data quality comparison, we obtained Data Quality Indicators published in Cancer in North America (CINA) for cancer data from 2008 to 2012 [21] to compare case completeness and missing data on race, sex, and age variables. Registries that did not meet publication criteria as indicated in the CINA data were excluded from our analysis (Minnesota in 2008–2012, Arkansas in 2010, and Nevada in 2011–2012). Because the data quality criteria were evaluated annually, we compared the data quality between SEER and non-SEER areas separately for each year from 2008 to 2012 (the most recent data). NAACCR defines the case completeness indicator for each registry as the percentage derived from dividing observed age-adjusted cancer incidence rate by the expected age-adjusted incidence rate, which is a weighted summation of race-specific case completeness ascertainment for white and black races [22]. Missing data on race, sex, and age were defined as the percentage of the data fields that have missing or unknown values.

Statistical analyses

We performed three sets of comparisons, and the analytic units are different for the comparisons. The analysis concerning the difference in late-stage BC and CRC between SEER and non-SEER areas was based on person-level data; the comparison of area-level socio-demographics was based on county-level data; the comparison of registry data quality was based on registry-level data. For the analyses concerning late-stage BC and CRC and area-level socio-demographics, the sample sizes are large and reach statistical significance in conventional tests even with very small differences. Instead of using statistical significance tests to evaluate the differences between SEER and non-SEER areas, we used the ‘standardized difference’ to quantify the similarity (or differences). The standardized difference approach takes the difference in the variables of interest between the SEER and non-SEER areas and divides it by the pooled standard deviation of the two groups (see the equations below). Thus, the standardized difference estimate is expressed per unit of standard deviation, which is scale-free and less sensitive to sample size. The standardized difference is often called the ‘effect size’ in social science and has been widely used in comparative effectiveness research of medical studies to compare the balance of treatment and control groups in order to assess treatment effect using observational data [23].

In this study, we applied the following formula to compute standardized difference for dichotomous variables (whether a person had late-stage cancer diagnosis or not) in Eq. (1) and continuous variables (county-level characteristics and data quality) in Eq. (2).

$${\text{Standardized}}\, {\text{difference}}\, {\text{between two proportions}} = \frac{{\left( {P1 - P2} \right)}}{{\sqrt {\left( {P1\left( { 1 - P1} \right) + P2\left( {1 - P2} \right)} \right)/2} }}$$
(1)

where P1 is the average proportion of late-stage cancer diagnosis for people in SEER areas and P2 is the same measure for people in non-SEER areas.
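Eq. (1) translates directly into a few lines of code. The sketch below is an illustrative Python version (the study itself used SAS); the function name is hypothetical.

```python
import math


def std_diff_proportions(p1: float, p2: float) -> float:
    """Standardized difference between two proportions, following Eq. (1)."""
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0):
        raise ValueError("proportions must lie in [0, 1]")
    # Average of the two binomial variances, then its square root.
    pooled_sd = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    if pooled_sd == 0:
        raise ValueError("standardized difference is undefined when both variances are zero")
    return (p1 - p2) / pooled_sd
```

Because the denominator depends only on the two proportions and not on the group sizes, the result does not shrink or grow as the number of cases increases, which is the property that motivates its use with very large samples.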

$$\text{Standardized difference between two continuous variables} = \frac{\overline{x_{1}} - \overline{x_{2}}}{\sqrt{\left( s_{1}^{2} + s_{2}^{2} \right)/2}}$$
(2)

where \(\overline{{x_{i} }}\) is the mean and \(s_{i}^{2}\) is the variance for group i (SEER areas or non-SEER areas).

Because the samples of SEER areas and non-SEER areas differ in size, the pooled standard deviation (i.e., denominator of Eq. 2) can be estimated using \(\sqrt{\frac{\left( n_{1} - 1 \right)s_{1}^{2} + \left( n_{2} - 1 \right)s_{2}^{2}}{n_{1} + n_{2} - 2}}\), where \(n_{i}\) is the sample size of group i (SEER or non-SEER).
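A minimal Python sketch of Eq. (2), with the sample-size-weighted pooled standard deviation as an option, is given below. The function name and the `weighted` flag are hypothetical, and the sketch assumes each group supplies at least two observations so that a sample variance exists.

```python
import math
from statistics import mean, variance


def std_diff_means(x1, x2, weighted=False):
    """Standardized difference between two group means, following Eq. (2).

    With weighted=True the denominator is the sample-size-weighted pooled
    standard deviation; otherwise the two variances are simply averaged.
    """
    m1, m2 = mean(x1), mean(x2)
    v1, v2 = variance(x1), variance(x2)  # sample variances (n - 1 denominator)
    if weighted:
        n1, n2 = len(x1), len(x2)
        pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    else:
        pooled_var = (v1 + v2) / 2
    if pooled_var == 0:
        raise ValueError("standardized difference is undefined when both variances are zero")
    return (m1 - m2) / math.sqrt(pooled_var)
```

When the two groups have equal size the two denominators coincide; they diverge only when group sizes and variances both differ, as with 612 SEER counties versus roughly 2,530 non-SEER counties.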

Although the standardized difference provides an intuitive estimate for comparability between groups, there is no consensus about what threshold of standardized difference indicates imbalance or meaningful difference. An absolute value of the standardized difference of <0.1 (an effect that is <1/10 of a standard deviation in the covariate of interest) is often suggested by researchers to indicate a negligible difference in proportions or means between groups [23]. Therefore, we used the 0.1 criterion for the standardized difference as a cutoff point to evaluate our findings. All analyses were conducted using SAS software, version 9.3 (SAS Institute, Cary, NC, USA).

Results

We identified 1,151,096 cases of BC and 658,470 cases of CRC in the 46 states with data available in 2004–2009. The SEER registry coverage areas contained less than one-third of BC or CRC total cancer cases. Among the total cancer cases, 30.8 % of BC cases and 54.2 % of CRC cases were diagnosed at late stage. The proportions of late-stage diagnosis differed slightly between SEER and non-SEER areas for both cancers: 30.7 % in SEER areas and 30.9 % in non-SEER areas for BC; 55.2 % and 53.8 % for CRC, respectively.
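As an illustrative arithmetic check (ours, not a result reported in the study), Eq. (1) can be applied to the overall proportions above, taking P1 as the SEER proportion and P2 as the non-SEER proportion:

$$\text{BC:}\quad \frac{0.307 - 0.309}{\sqrt{\left( 0.307 \times 0.693 + 0.309 \times 0.691 \right)/2}} \approx \frac{-0.002}{0.4617} \approx -0.004$$

$$\text{CRC:}\quad \frac{0.552 - 0.538}{\sqrt{\left( 0.552 \times 0.448 + 0.538 \times 0.462 \right)/2}} \approx \frac{0.014}{0.4979} \approx 0.028$$

Both absolute values fall well below the 0.1 cutoff, consistent with describing these overall differences as slight. The inputs are the rounded percentages quoted in the text, so the results are approximate.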

The comparisons of the proportion of late-stage BC or CRC by age groups and by race or ethnicity are presented in Table 1. Among all 105 comparisons, 10 have an absolute standardized difference greater than 0.1 and are therefore considered not comparable. Thus, in the vast majority of the comparisons, we found that the proportions of late-stage BC or CRC diagnoses were comparable between the cancer populations in the SEER and non-SEER coverage areas. The 10 non-comparable groups are found only among people with CRC. Specifically, for both males and females with CRC, the SEER coverage areas had a meaningfully lower proportion of late-stage diagnoses than the non-SEER areas for people in ‘other’ races. For this racial group, the pattern was found in all age groups except for age 40–49, 75 years and older males, and 65 years and older females. For females with CRC, we also found Asian and Hispanic women aged <40 had a meaningfully higher proportion of late-stage diagnosis in SEER areas than in non-SEER areas. For all women with BC, our data suggested that the proportion of late-stage diagnosis was comparable between SEER and non-SEER areas.

Table 1 Comparisons of proportions of late-stage CRC and BC diagnoses across SEER and non-SEER coverage areas, by race or ethnicity and age

Table 2 presents the results of the analysis of whether the demographic and socioeconomic characteristics were similar between the SEER and non-SEER areas in both 2000 and 2010. As shown in Table 2, overall we found the patterns (i.e., standardized difference between SEER and non-SEER areas) were similar and consistent in both 2000 and 2010. Using the 0.1 criterion as the cutoff for a meaningful difference, we found SEER areas had higher percentages of younger (<age 44) population than the rest of US areas did, in both years. There was also a smaller percentage of whites and American Indians and a higher percentage of African-Americans and Asians in the general population in SEER areas, as compared to the non-SEER areas. SEER areas also had a higher percentage of the population in poverty, lower educational attainment, and a higher unemployment rate, compared to non-SEER areas. For the comparison of health insurance, the SEER areas had a higher percentage of uninsured in 2000, compared to the non-SEER areas. However, the percentage of uninsured population increased and became equivalent in both areas by 2010. Similarly, we found a higher Medicare managed care penetration rate in SEER areas than in non-SEER areas in 2000, whereas this difference disappeared by 2010. Finally, the SEER areas had higher percentages of population living in urban areas than the non-SEER areas, in 2000 and 2010.

Table 2 Comparisons in county-level socio-demographic characteristics between SEER and non-SEER areas, in 2000 and 2010

As for data quality, we found a meaningful difference in case completeness between areas, with SEER areas performing better than non-SEER areas in every year (Table 3). The percentages of missing data on race were small (<4 %) and did not differ between SEER and non-SEER areas (data not shown). Overall, the average percentages of missing data on the age and sex variables were almost zero in the majority of the registries (data not shown), and no difference was found between the two comparison areas.

Table 3 Mean and standard deviation of case completeness between SEER and non-SEER areas by year

Discussion

This paper updates some important comparisons between SEER and non-SEER areas, in several dimensions, to assess the current generalizability of the SEER Registries. First, using USCS data we examined the representativeness of proportions of late-stage BC and CRC diagnoses, by cancer population subgroups. We found males and females of other race (persons who were not white, African-American, Asian, or Hispanic) were less likely to be diagnosed with late-stage CRC in the SEER versus non-SEER areas. On the other hand, Asian and Hispanic females aged <40 were more likely to be diagnosed with late-stage CRC in SEER areas than in non-SEER areas. No difference was found for females with BC between the SEER and non-SEER comparison areas. We concluded that the SEER registry areas slightly over-represent Asian and Hispanic female subgroups with CRC, slightly under-represent people of other race with CRC, and are not different across the vast majority of comparisons. In addition, our findings using cancer cases in 2004 to 2009 for proportion of late-stage BC and CRC were similar to results from Henley et al. [7].

One finding worth noting is the lower proportion of late-stage CRC for several age groups of men and women in the other race or ethnicity group in SEER areas as compared to non-SEER areas. The relatively lower proportion of late-stage CRC in the SEER areas may be partially due to the fact that American Indians and Alaska Natives (AI/AN) were excluded from SEER areas in our study design. Given the current structure of the SEER program that includes only AI/AN from Alaska, Arizona, and Cherokee Tribal Jurisdictional Service Area of Oklahoma, it is not possible to assess the difference between SEER and non-SEER areas for the ‘other’ racial or ethnic group. Future study is needed to better understand the disparities of cancer burden within this racial or ethnic group. Such work requires the USCS database covering the entire nation, because these disparities would not be apparent in studies using SEER registry data alone.

The important focus of this study was the comparison of population demographics and socioeconomic factors between SEER and non-SEER areas. In these comparisons, we found that SEER areas had a higher percentage of younger people and lower percentage of white population than non-SEER areas, which is consistent with the comparison by Warren et al. [2] using 1998 registry data with 1990 census data for the elderly US population. By contrast, we found that SEER areas had higher percentages of poor and a lower average employment rate, which is inconsistent with findings by Warren et al. [2] and Nattinger et al. [16] from earlier time periods. Those studies reported that SEER regions were more affluent and had lower unemployment than the rest of the nation. This inconsistency between our study and earlier studies may be explained by the expansion of SEER registries in 2000 to include Kentucky, Louisiana, New Jersey, and Greater California, as these states (except New Jersey) have higher than national average poverty rates [24]. Furthermore, after this expansion, SEER areas still have more people living in urban areas than the rest of the nation, similar to the previous findings by Warren et al. [2] and Nattinger et al. [16].

Our data also showed that the Medicare managed care penetration rate was higher in SEER areas in 2000 but equivalent in 2010, compared to non-SEER areas. This may be explained by considerable expansion in the managed care program across the US after implementation of the Medicare Prescription Drug, Improvement, and Modernization Act in 2006 [25]. The Act renamed the Medicare + Choice program the Medicare Advantage (MA) program and made it much more attractive to seniors by adding prescription drug coverage to all MA plans. Specifically, the Medicare managed care penetration rate increased from 15 % in 2000 to 24 % in 2010 and continued to increase [26]. Our data showed that non-SEER areas had a larger increase in the Medicare managed care penetration rate than SEER areas, with average increases of 10.7 % and 8.8 %, respectively. Beginning in 2006, more isolated rural areas of the USA had access to a Medicare managed care plan for the first time. Prior to this, Medicare managed care penetration was predominately an urban phenomenon. Because the non-SEER areas are less urban/more rural than the SEER areas, these non-SEER areas saw a greater expansion in Medicare managed care plans than the more urban SEER areas, during the time of our study.

With regard to data quality, our data indicate that SEER areas tend to have more complete data than the non-SEER areas in terms of case completeness. This is not surprising as SEER registries were originally selected for program inclusion because of their relatively higher data quality. However, the conclusion that non-SEER areas have lower data quality needs to be stated with caution. First, the measure of case completeness used in the quality assessment assumes the ratio of age-adjusted cancer incidence rate to age-adjusted mortality rate is constant across geographic areas within cancer sites, race, and gender [22]. In addition, the measure is based on two race groups, white and black, and ignores the population in all other racial or ethnic groups. Thus, the measure may not reflect well the true completeness of case ascertainment in either SEER or non-SEER areas. Although NAACCR uses several adjustments to make the measurement more accurate, other researchers have proposed a new method to evaluate case completeness [27]. The new method, which takes into account geographic heterogeneity and includes more registry-specific data to predict expected incidence rates, produced improved case completeness estimates compared with the NAACCR method. Future research is needed to re-evaluate the differences in case completeness between SEER and non-SEER areas.

Although the USCS data cover the entire nation, our conclusions regarding the late-stage BC and CRC comparisons across SEER and non-SEER areas are based on only 46 of the 50 states (Kansas, Maryland, Minnesota, and Virginia were not included in our analyses). However, the state-level incidence rates of BC and CRC in these four excluded states are similar to the national average [28], so it is unlikely that our findings would have differed meaningfully had these states’ data been available.

In summary, using BC and CRC data our study demonstrated that SEER registries were not significantly different from non-SEER areas in terms of average age–race-specific proportions of late-stage BC or CRC, except for estimates of late-stage CRC for other race and young Asian and Hispanic women. Although case completeness was better in SEER areas than non-SEER areas, SEER areas had greater economic disadvantage and greater minority diversity among the population. Researchers using SEER data who attempt to generalize results to the entire nation need to be aware of these differences.

Our study also demonstrated several advantages to researchers from having available the entire USCS database in the RDCs. It allows researchers to study the entire cancer population of the US, with data quality comparable to the well-known SEER registries. The database which covers the entire nation also allows researchers to conduct geospatial analysis, and to examine local community or contextual predictors of late-stage diagnoses and other cancer outcomes. The database also allows better examination of racial and ethnic disparities in late-stage diagnoses of CRC or other cancers, which would not be possible by using SEER data alone. In addition, for rare cancers, USCS data have the advantage of providing a larger sample size than SEER data—about three times larger for BC and CRC. Larger sample sizes for rare cancers may allow for more rigorous analyses than are possible with SEER registry data alone. These rich opportunities for population-based research were not possible prior to the release of these RDC-enabled combined registry data.