Introduction

Minority race and ethnicity are associated with worse health outcomes compared with non-Hispanic White populations [1, 2]. For example, Black or African American (AA) children with Crohn’s disease, a form of inflammatory bowel disease (IBD), are 30% more likely than White children to develop perianal fistulas and require surgery [1, 3]. Black or AA children with end-stage kidney disease (ESKD) have lower access to transplantation and worse survival [4,5,6,7]. Mortality among Black or AA infants with congenital heart disease (CHD) is 1.4 times higher than among non-Hispanic White infants [8, 9]. Among children with hypoplastic left heart syndrome, Hispanic ethnicity is associated with a 2.6-fold increase in the odds of mortality between palliative surgeries [8, 9]. The reasons for these disparities are often unclear, but are hypothesized to include barriers to health care, differential treatment, and systemic racism [10].

The electronic health record (EHR) houses the patient data used to provide health care. Research studies using these data typically rely on the sociodemographic information recorded in the EHR, which then serves as the foundation for understanding and reporting disparities. Disparity research generally assumes that this recorded information accurately represents the population of interest. Errors in recording sociodemographic information may bias estimates of associations with race or ethnicity and can therefore impede efforts to reduce health disparities. However, the fidelity of EHR-recorded race and ethnicity has not been adequately evaluated, particularly in pediatrics.

To determine whether EHR-reliant disparity research rests on accurate data, we evaluated the accuracy of EHR-recorded race and ethnicity compared with patient/guardian self-report.

Methods

We included pediatric patients who were prospectively enrolled in 2 studies at the University of Michigan across 4 pediatric subspecialties (cardiology, transplant hepatology, nephrology, and pediatric gastroenterology) between 10/2014 and 1/2019. The first study included children who underwent colonoscopy for known or suspected IBD [11]. The second was a registry of pediatric solid organ transplant recipients (heart, liver, and kidney) [12]. In each, patients and/or guardians reported the patient’s race and ethnicity (hereafter “self-report”). Both studies queried race with the same response options (American Indian/Alaska Native, Asian, Black or African American (AA), Native Hawaiian/Pacific Islander, White, multiracial, or other), and both queried Hispanic or Latino ethnicity. The transplant study allowed free-text input to clarify “other” race. The transplant registry questionnaire was available in English and Spanish, whereas the colonoscopy study questionnaire was available only in English. Patients were not excluded for missing data.

We abstracted information from the Epic (Verona, WI) EHR. The method by which patient sociodemographic information was obtained for entry into the EHR is unknown for any given patient and likely varied, because neither the original source of this information nor the route through which it was entered is recorded in the EHR. Under standard procedures at our institution, patients who first present through the emergency department or at hospital admission are registered in person, and registrar staff enter their sociodemographic information into the EHR. For other patients, registrar staff obtain this information by telephone when scheduling the first clinic appointment at the institution. In both instances, registrar staff are expected to request self-reported race and ethnicity, but this is not consistently done. The reasons for not requesting self-report, and how often this occurs, are unknown.

Clinical and sociodemographic information as recorded in the EHR was abstracted by a researcher blinded to self-reported information. We compared EHR data with the self-reported information obtained from the prospective studies described above, treating self-report as the “gold standard.” We calculated descriptive statistics and used Cohen’s kappa to evaluate agreement between self-report and EHR data. This study was approved by the University of Michigan’s Institutional Review Board.

Results

In total, 503 patients were identified (20% colonoscopy, 45% hepatology, 20% nephrology, 15% cardiology); 42% were female, and the median age was 12.8 years (interquartile range 7–16). The age distribution was bimodal, with peaks in the first year of life and at 16 years, reflecting the different study populations. Ninety-six percent self-reported race (N = 484; 73% White, 16% Black or AA, 4% Asian, 5% multiracial, and 2% other). Eighty-two percent self-reported ethnicity (N = 410); of all 503 patients, 8% self-reported Hispanic/Latino, 74% self-reported non-Hispanic/Latino, and 18% were missing self-reported ethnicity. Among those who self-reported both race and ethnicity (N = 392), 337 (86%) self-reported as non-Hispanic White. Nineteen patients (4%) did not self-report race, 93 (18%) did not self-report ethnicity, and one patient self-reported neither. In the EHR, 8 patients (2%) had missing race and 11 (2%) had missing ethnicity. No patient was missing both EHR and self-reported data; 476 (95%) had both self-reported and EHR-recorded race, and 400 (80%) had both self-reported and EHR-recorded ethnicity.

Agreement between self-reported and EHR-recorded race was substantial (kappa = 0.77, 95% confidence interval [CI] 0.72–0.83; Table 1) [13], and race was discordant for 10% of patients (47 of 476). Among patients with both self-reported and EHR-recorded ethnicity (N = 400), agreement was also substantial (kappa = 0.77, 95% CI 0.65–0.89), and 4% (14 of 400) had discordant ethnicity. Among those who self-reported Hispanic/Latino ethnicity and also reported their race (N = 21), self-reported race was less accurately recorded in the EHR (kappa = 0.26, 95% CI 0–0.54): race did not match for 43% (9 of 21 with EHR-recorded race). All race concordance information, including for those with missing self-report, is reported in Table 2. Among all who self-reported non-White race or Hispanic/Latino ethnicity, 12.8% (21 of 164) were misclassified as non-Hispanic White in the EHR. Conversely, few who self-reported as non-Hispanic White (2.7%, 9 of 337) were misclassified in the EHR as Hispanic/Latino and/or non-White.

Table 1 Self-reported vs. EHR-documented race
Table 2 Self-reported vs. EHR-documented race among self-reported Hispanic/Latino

Discussion

We found that self-reported race was accurately recorded in the EHR for most patients at our institution. However, among those who self-reported Hispanic/Latino ethnicity, 43% had inaccurate race recorded in the EHR. In addition, 13% of all patients who self-reported non-White race or Hispanic/Latino ethnicity were misclassified in the EHR as non-Hispanic White.

The literature is replete with examples of disparities in health care and health outcomes [1, 3, 14,15,16]. These studies commonly rely on data sources without verifying the accuracy of the racial or ethnic information. Prospective studies can collect self-reported race and ethnicity, but because they are costly and resource-intensive, patient-based studies often rely on secondary sources of sociodemographic data that may not include self-reported information. Studies that rely on EHR data therefore remain commonplace. To our knowledge, few studies have addressed the accuracy of sociodemographic information in the EHR, and none in pediatrics.

Disparities in access to care, care provided, and outcomes of care have been widely reported. For example, Black or AA children with Crohn’s disease have more than twice the odds of perianal disease complications compared with White patients (odds ratio 2.47) and a shorter time to hospital readmission (hazard ratio 1.16) [1, 3]. Black and Hispanic children with ESKD have less access to deceased donor transplants [17]; this association is mitigated among Hispanic children with private insurance, but not among Black or AA children. Black or AA adolescents are also twice as likely as White children to begin dialysis [18]. Once on dialysis, Black or AA children fare worse and are less likely to meet quality-of-care performance measures [19, 20]. In CHD, mortality rates are 53% higher among Black or AA infants than among White infants [8, 21, 22]. Despite the preponderance of evidence documenting racial and ethnic disparities in health care, no studies have evaluated the accuracy of recorded race and ethnicity in pediatrics, so it is unclear how well these findings reflect the scope and magnitude of disparities in these populations [23].

Few validation studies have been published. The first, by Klinger et al., was an adult smoking cessation study in which the EHR frequently did not reflect self-reported race [24]. More than 30% of patients self-identified with multiple racial or ethnic groups, which was inconsistent with how race was recorded in their medical records. The second, by Magaña López et al., evaluated two adult cohorts, one with rheumatic diseases and the other of stem cell transplant recipients [25]. EHR-recorded race was inaccurate for nearly 40% of those who self-identified as Hispanic/Latino. Notably, their population had a larger proportion of Hispanic patients than ours, and unlike our study, they interviewed patients (about one-third in Spanish) and included patients residing outside the USA. We found no comparable published studies in pediatric populations.

To our knowledge, no EHR-based studies have evaluated the bias that may exist in reports of pediatric disparities. However, administrative claims-based studies have demonstrated how missing self-reported race and ethnicity introduces bias into disparity estimates. Brown et al. used data on pediatric patients from two databases, the Florida Healthy Kids program and the Children’s Medical Services Network, both of which included family self-reported information [26]. They found that excluding patients with missing race/ethnicity information significantly reduced the apparent disparity and biased quality-of-care measures downward relative to non-minority children.

Hospital EHRs provide an opportunity to consistently collect self-reported race and ethnicity, which would yield data that more accurately reflect the true magnitude of disparities. However, methods for collecting race and ethnicity information vary widely across pediatric hospitals [27]. In a survey of 93 children’s hospitals in North America, 95% routinely collected data on patient race and ethnicity. Among hospitals that collected these data, only 68% used standardized categories, and only 13% offered a “multiracial” or “multiethnic” option or allowed selection of multiple race categories. Furthermore, only 70% of surveyed hospitals provided explicit staff training on how to collect patient race/ethnicity data [27].

Our study found rates of missing data similar to those reported by Cowden et al. [27]. We additionally found inaccuracies in the non-missing race and ethnicity data recorded in the EHR. These findings further emphasize the importance of accurately collecting self-reported information: demographic information in the EHR should come directly from patient self-report to ensure it is as accurate as possible. Sitpati et al. provide important recommendations for improving the EHR, such as standardizing complete documentation of demographic information and requiring these fields to be completed [28]. Cowden et al. further recommend standardizing methods for collecting this information [27]. Our study provides further evidence to support such reforms.

Our study has the limitations common to single-institution studies. It is unclear how practices at our institution differ from those at other institutions, or how generalizable the findings are to other pediatric disciplines. However, patient EHR data were collected in the same manner in subspecialty clinics as in general pediatric clinics at our institution, so there is no reason to expect documentation to differ in general pediatrics. The study also could not account for practical or methodological differences in how registrar staff collected or entered information into the EHR; some staff may have been better than others at querying race and ethnicity in a culturally appropriate way. Another limitation is the composition of our patient population: most of our patients are White, and only 28% are non-White or Hispanic. Additionally, although we did not exclude non-English-speaking patients, one of our questionnaires (administered to 20% of participants) was available only in English.

Despite these limitations, our study is important because research on the accuracy of sociodemographic information in EHR systems is scarce, and absent in pediatrics. It demonstrates that EHR-recorded sociodemographic information may be inaccurate for some individuals, especially Hispanic/Latino individuals. The magnitude of information bias introduced by misclassification of race or ethnicity may differ at other institutions; it may vary with local demographics, with the processes by which sociodemographic information is recorded, and with the unconscious bias of the individuals recording this information. This critical gap in the evidence base should therefore be studied more broadly across multiple institutions.

Conclusion

In our study, race and ethnicity were often inaccurately recorded for patients who identify as Hispanic/Latino, who are therefore under-represented in EHR data; this corroborates and extends the limited evidence in the published literature. Inaccurately recorded race and/or ethnicity has important implications for the reliability of disparity research and for informing child health policy. Recording of race and ethnicity has been required since the “meaningful use” provision of the American Recovery and Reinvestment Act of 2009, but the available evidence indicates that this requirement has yet to be fully implemented [29]. Reliable processes are needed at institutional and national levels to incorporate self-reported race and ethnicity into the EHR.