Abstract
Objective
Racial and ethnic disparities are commonplace in health care. Research often relies on sociodemographic information recorded in the electronic health record (EHR). Little evidence is available about the accuracy of EHR-recorded sociodemographic information, and none in pediatrics. Our objective was to determine the accuracy of EHR-recorded race and ethnicity compared to self-report.
Methods
Patients/guardians enrolled in two prospective observational studies (10/2014–1/2019) provided self-reported sociodemographic information. Corresponding EHR information was abstracted. EHR information was compared to self-report, considered “gold standard.” Agreement was evaluated with Cohen’s kappa.
Results
A total of 503 patients (42% female, median age 12.8 years) were identified. Self-reported race (N = 484) was 73% White, 16% Black or African American (AA), 4% Asian, 5% multiracial, and 2% other. Self-reported ethnicity (N = 410) was 9% Hispanic/Latino, and 88% non-Hispanic/Latino. Agreement between self-reported and EHR-recorded race was substantial (kappa = 0.77, 95% CI 0.72–0.83). Race was discordant among 10% (47/476). Hispanic/Latino ethnicity also had strong agreement (kappa = 0.77, 95% CI 0.65–0.89). Among those who self-reported Hispanic/Latino and reported race (N = 21), race was less accurately recorded in the EHR (kappa = 0.26, 95% CI 0–0.54). Race did not match among 43% with recorded race (9/21). Among self-reported racial and/or ethnic minorities, 13% (21/164) were misclassified in the EHR as non-Hispanic White.
Conclusions
We found race and ethnicity are often inaccurately recorded in the EHR for patients who self-identify as minorities, leading to under-representation of minorities in the EHR. Inaccurately recorded race and ethnicity has important implications for disparity research, and for informing health policy. Reliable processes are needed to incorporate self-reported race and ethnicity in the EHR at institutional and national levels.
Introduction
Minority race and ethnicity are associated with worse health outcomes compared to White non-Hispanic populations [1, 2]. For example, Black or African American (AA) children with Crohn’s disease, a form of inflammatory bowel disease (IBD), are 30% more likely to develop perianal fistulas and require surgery than White children [1, 3]. Black or AA children with end-stage kidney disease (ESKD) have lower access to transplantation, and worse survival rates [4,5,6,7]. Mortality among Black or AA infants with congenital heart disease (CHD) is 1.4 times higher than for non-Hispanic White infants [8, 9]. Hispanic ethnicity is also associated with a 2.6 times increased odds of mortality between palliative surgeries among children with hypoplastic left heart syndrome [8, 9]. The reasons for these disparities are often unclear, but are hypothesized to be barriers to health care, differential treatment, and systemic racism [10].
The electronic health record (EHR) houses patient data used for providing health care. Research studies using these data typically rely on sociodemographic information recorded in the EHR, which then serves as the foundation of understanding and reporting disparities. Disparity research is generally conducted under the assumption that this recorded information accurately represents the population of interest. Errors in recording sociodemographic information may lead to biased estimates of associations with race or ethnicity, and therefore can impede efforts to reduce health disparities. However, the fidelity of EHR-recorded race and ethnicity has not been adequately evaluated, particularly in pediatrics.
To have confidence in EHR-reliant disparity research, we investigated the accuracy of recorded race and ethnicity compared to patient/guardian self-report.
Methods
We included pediatric patients who were prospectively enrolled in 2 studies at the University of Michigan across 4 pediatric subspecialties: cardiology, transplant hepatology, nephrology, and gastroenterology (10/2014–1/2019). The first study included children who underwent colonoscopy for either known or suspected IBD [11]. The second was a registry of pediatric solid organ transplant recipients (heart, liver, and kidney) [12]. In each, patients and/or guardians reported the patient’s race and ethnicity (from this point forward called “self-report”). Both studies queried race with the same options (American Indian/Alaska Native, Asian, Black or African American (AA), Native Hawaiian/Pacific Islander, White, multiracial, or other). Both studies queried Hispanic or Latino ethnicity. The transplant study allowed text input to clarify “other” race. The transplant registry questionnaire was available in English and Spanish, while the colonoscopy study questionnaire was available only in English. Patients were not excluded for missing data.
We abstracted information from the Epic (Verona, WI) EHR. The method by which patient sociodemographic information was obtained for entry into the EHR is unknown for any given patient, and likely varied. The standard procedures at our institution involve patients being registered in person where registrar staff enters sociodemographic information into the EHR for patients who first present to the institution through the emergency department or at hospital admission. For others, this information is obtained by the registrar staff via telephone when scheduling the first clinic appointment at the institution. In both instances, the registrar staff are supposed to request self-reported race and ethnicity, but this is not consistently done. The reasons for not requesting self-report and frequency with which this occurs are unknown. The original source of sociodemographic information and route through which it was entered into the EHR is not recorded in the EHR, and therefore is unknown.
Clinical and sociodemographic information as recorded in the EHR was abstracted by a researcher blinded to self-reported information. We compared EHR data to the self-reported information obtained from the prospective studies described above. Self-reported information was considered the “gold standard.” We produced descriptive statistics. Cohen’s kappa was calculated to evaluate agreement between self-report and EHR data. This study was approved by the University of Michigan’s Institutional Review Board.
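As an illustration of the agreement statistic named above, the following is a minimal sketch of Cohen’s kappa for paired categorical labels. It is not the authors’ analysis code; the function name `cohens_kappa` and the toy labels are assumptions introduced here, and the toy data are invented for demonstration only.

```python
from collections import Counter


def cohens_kappa(self_report, ehr):
    """Cohen's kappa for two equal-length sequences of category labels."""
    if len(self_report) != len(ehr) or not self_report:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(self_report)
    # Observed agreement: share of pairs where both sources give the same label.
    p_o = sum(a == b for a, b in zip(self_report, ehr)) / n
    # Chance agreement expected from each source's marginal label counts.
    m1, m2 = Counter(self_report), Counter(ehr)
    p_e = sum(m1[c] * m2[c] for c in m1.keys() | m2.keys()) / (n * n)
    if p_e == 1.0:
        # Degenerate case (one identical label throughout); returning 1.0
        # is a convention chosen for this sketch, kappa is undefined here.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Invented toy labels for illustration only (NOT study data).
toy_self = ["White", "White", "Black", "Asian", "White", "Black"]
toy_ehr = ["White", "White", "Black", "White", "White", "White"]
kappa = cohens_kappa(toy_self, toy_ehr)
```

In this toy example four of six pairs agree, yet kappa is well below the raw agreement because most of that agreement is expected by chance when one category dominates. The same effect can arise in a cohort where most patients fall into a single category.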
Results
In total, 503 patients were identified (20% colonoscopy, 45% hepatology, 20% nephrology, 15% cardiology), 42% female, median age 12.8 years (interquartile range 7–16). The age distribution was bimodal, with peaks in the first year and at 16 years, reflecting different study populations. Ninety-six percent self-reported race (N = 484; 73% White, 16% Black or AA, 4% Asian, 5% multiracial, and 2% other). Eighty-two percent self-reported ethnicity (N = 410; 8% Hispanic/Latino, 74% non-Hispanic/Latino, and 18% missing). Among those who self-reported both race and ethnicity (N = 392), 337 (86%) self-reported as non-Hispanic White. Nineteen (4%) did not self-report race, 93 (18%) did not self-report ethnicity, and one person self-reported neither race nor ethnicity. In the EHR, 8 people (2%) had missing race, and 11 (2%) had missing ethnicity data. No patients were missing both EHR and self-reported data. In total, 476 (95%) had both self-reported and EHR-recorded race, and 400 (80%) had both self-reported and EHR-recorded ethnicity.
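The counts in this paragraph can be cross-checked with simple arithmetic. The sketch below assumes that the one patient missing both fields is already included in the 19 and the 93; the variable names are introduced here for illustration.

```python
# Counts taken from the paragraph above.
total = 503
missing_race = 19        # did not self-report race
missing_ethnicity = 93   # did not self-report ethnicity
missing_both = 1         # assumed to be counted in both figures above
non_hispanic_white = 337

with_race = total - missing_race
with_ethnicity = total - missing_ethnicity
# Inclusion-exclusion: subtract everyone missing at least one field.
with_both = total - (missing_race + missing_ethnicity - missing_both)

pct_with_race = round(100 * with_race / total)
pct_with_ethnicity = round(100 * with_ethnicity / total)
pct_nhw_of_both = round(100 * non_hispanic_white / with_both)
```

Under that assumption the derived values match the reported N = 484, N = 410, and N = 392.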
Agreement between self-reported and EHR-recorded race was substantial (kappa = 0.77, 95% confidence interval [CI] 0.72–0.83; Table 1) [13]. Race was discordant among 10% (47 of 476). Among the patients who self-reported ethnicity and had ethnicity recorded in the EHR (N = 400), EHR-recorded ethnicity generally had strong agreement with self-report (kappa = 0.77, 95% CI 0.65–0.89), while 4% (14 of 400) had discordant ethnicity information. Among those who self-reported Hispanic/Latino ethnicity and reported their race (N = 21), self-reported race was less accurately recorded in the EHR (kappa = 0.26, 95% CI 0–0.54). Race did not match among 43% (9 of 21 with EHR-recorded race). All race concordance information is reported in Table 2, including those with missing self-report. Among all who self-reported non-White race and/or Hispanic/Latino ethnicity, 12.8% (21 of 164) were misclassified as non-Hispanic White in the EHR. Conversely, few who self-reported as non-Hispanic White were misclassified in the EHR as Hispanic/Latino and/or non-White (2.7%, 9 of 337).
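The percentages and the verbal agreement labels in this paragraph can be reproduced from the reported counts. The sketch below uses the conventional Landis and Koch bands from the cited reference [13]; the helper names `landis_koch_label` and `percent` are assumptions introduced here, not part of the study.

```python
def landis_koch_label(kappa):
    """Map a kappa value to the conventional Landis and Koch descriptor."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


def percent(numerator, denominator):
    """Proportion expressed as a percentage."""
    return 100 * numerator / denominator


# Counts and kappa values as reported in the paragraph above.
race_discordant = percent(47, 476)
race_mismatch_hispanic = percent(9, 21)
misclassified_as_nhw = percent(21, 164)
nhw_misclassified = percent(9, 337)
race_label = landis_koch_label(0.77)
hispanic_race_label = landis_koch_label(0.26)
```

By these bands a kappa of 0.77 is "substantial" while 0.26 is only "fair", which is the contrast the paragraph draws between the full cohort and the Hispanic/Latino subgroup.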
Discussion
We found that self-reported race was accurately recorded in the EHR in the majority of cases at our institution. However, among the subgroup of those who self-reported as Hispanic/Latino ethnicity, 43% had inaccurate race information recorded in the EHR. In addition, 13% of all patients who self-reported non-White or Hispanic/Latino were misclassified in the EHR as non-Hispanic White.
The literature is replete with examples of disparities in health care and health outcomes [1, 3, 14,15,16]. These studies commonly rely on data sources without verifying racial or ethnic information accuracy. Prospective studies can collect self-reported race and ethnicity. However, because prospective studies are costly and resource-intensive, patient-based studies often rely on secondary sources of sociodemographic data which may not include self-reported information. Patient studies that rely on EHR data therefore remain commonplace. To our knowledge, few studies have addressed the accuracy of sociodemographic information in the EHR, and none in pediatrics.
Disparities in access to care, care provided, and outcomes of care have been widely reported. For example, Black or AA children with Crohn’s disease have more than twice the incidence of perianal disease complications compared with White patients (odds ratio 2.47) and shorter time to hospital readmission (hazard ratio 1.16) [1, 3]. Black and Hispanic children with ESKD have less access to deceased donor transplants [17]. This association is mitigated among Hispanic children who have private insurance, but not among Black or AA children. Black or AA adolescents are also twice as likely to begin dialysis as White adolescents [18]. Once on dialysis, Black or AA children do more poorly and are less likely to meet quality of care performance measures [19, 20]. In CHD, mortality rates are 53% higher among Black or AA infants than White infants [8, 21, 22]. Despite the preponderance of evidence documenting racial and ethnic disparities in health care, there have been no studies evaluating the accuracy of recorded race and ethnicity in pediatrics, so it is unclear how well these findings reflect the scope and magnitude of disparities in these populations [23].
Few validation studies have been published. The first, by Klinger et al., was an adult smoking cessation study in which the EHR often inaccurately reflected self-reported race [24]. They found that more than 30% of patients self-reported identification with multiple racial or ethnic groups, which was inconsistent with how race was recorded in their medical records. The other study, by Magaña López et al., evaluated two cohorts of adults, one with rheumatic diseases and the other of stem cell transplant recipients [25]. They found that race recorded in the EHR was inaccurate among nearly 40% of those who self-identified as Hispanic/Latino. Notably, their population had a larger proportion of Hispanic patients than ours. Unlike our study, they interviewed patients (about 1/3 in Spanish), and included patients residing outside the USA. We could find no published comparable studies in pediatric populations.
To our knowledge, there are no EHR-based studies evaluating any bias that may exist in reports of pediatric disparities. There are administrative claims-based studies that demonstrate how missing self-reported race and ethnicity introduces bias into disparity estimates. A study by Brown et al. used data from pediatric patients included in two databases from the Florida Healthy Kids program and Children’s Medical Services Network, both of which included family self-reported information [26]. This study found that excluding patients with missing race/ethnicity information significantly reduced the apparent disparity, and biased quality of care measures downward relative to non-minority children.
Hospital EHRs provide an opportunity to consistently collect self-reported information on patient race and ethnicity, which would then yield data that more accurately reflect the true magnitude of disparities. However, there is wide variation in the methods by which race and ethnicity information is collected across pediatric hospitals [27]. For example, in a survey of 93 children’s hospitals in North America, 95% of hospitals routinely collected data on patient race and ethnicity. Among the hospitals that collected these data, only 68% used standardized categories, and only 13% had an option for “multiracial” or “multiethnic,” or allowed for selection of multiple race categories. Furthermore, only 70% of hospitals surveyed had explicit training for staff on how to collect patient race/ethnicity data [27].
Our study found missing data similar to that reported by Cowden et al. We additionally found inaccuracies in the non-missing race and ethnicity data that were recorded in the EHR. Our findings further emphasize the importance of accurate collection of self-reported information. Demographic information in the EHR should be provided directly from patient self-report to ensure the most accurate information. Sitapati et al. provide important recommendations for improving the EHR, such as standardizing complete documentation of demographic information, and requiring these fields to be completed [28]. Cowden et al. further recommend standardizing methods for collecting this information [27]. Our study provides further evidence to support such reforms.
The limitations of our study include those common to single institution studies. It is unclear how practice patterns at our institution differ from other institutions. It is also unclear how generalizable the findings are to other pediatric disciplines. However, the patient EHR data were collected in the same manner in subspecialty clinics as in general pediatric clinics within our institution, so there is no reason to expect documentation should differ from that of general pediatrics. The study also could not account for practical or methodological differences in how registrar staff collected or entered information into EHR—it may be that some staff were better at querying race and ethnicity in a culturally appropriate way. Another limitation is the composition of our patient population. The majority of our patients are White, and only 28% are non-White or Hispanic. Additionally, while we did not exclude non-English speaking patients, one of our surveys (administered to 20% of participants) was only available in English.
Despite these limitations, our study is important as there is a paucity of research on the accuracy of sociodemographic information in EHR systems, and none in pediatrics. This study demonstrates that the EHR recording of sociodemographic information may not be accurate for some individuals, especially Hispanic/Latino individuals. It is important to consider that the magnitude of information bias introduced by mischaracterization of race or ethnicity may differ at other institutions. This bias may vary with local demographics, with the processes by which sociodemographic information is recorded, and with the unconscious bias of individuals recording this information. It is therefore important to study this critical gap in the evidence base more broadly across multiple institutions.
Conclusion
In conclusion, EHR-recorded race and ethnicity were often inaccurate for patients in our study who identify as Hispanic/Latino, and these patients are therefore under-represented in the EHR data. This finding corroborates and expands on the limited evidence from the published literature. Inaccurately recorded race and/or ethnicity has important implications for the reliability of disparity research and for informing child health policy. Information on race and ethnicity has been required since the implementation of the “meaningful use” provision within the American Recovery and Reinvestment Act in 2009, but the available evidence indicates this has yet to be fully implemented [29]. Reliable processes are needed to incorporate self-reported race and ethnicity information in the EHR at institutional and national levels.
Data Availability
Not applicable.
Code Availability
Not applicable.
References
Adler J, Dong S, Eder SJ, Dombkowski KJ, ImproveCareNow Pediatric IBD LHS. Perianal Crohn disease in a large multicenter pediatric collaborative. J Pediatr Gastroenterol Nutr. 2017;64:e117–24.
Okafor PN, Stobaugh DJ, Van Ryn M, Talwalkar JA. African Americans have better outcomes for five common gastrointestinal diagnoses in hospitals with more racially diverse patients. Am J Gastroenterol. 2016;111:649–57.
Dotson JL, Kappelman MD, Chisolm DJ, Crandall WV. Racial disparities in readmission, complications, and procedures in children with Crohn’s disease. Inflamm Bowel Dis. 2015;21:801–8.
Amaral S, McCulloch CE, Black E, Winnicki E, Lee B, Roll GR, et al. Trends in living donation by race and ethnicity among children with end-stage renal disease in the United States, 1995–2015. Transplant Direct. 2020;6:e570.
Amaral S, Patzer R. Disparities, race/ethnicity and access to pediatric kidney transplantation. Curr Opin Nephrol Hypertens. 2013;22:336–43.
Ku E, McCulloch CE, Grimes BA, Johansen KL. Racial and ethnic disparities in survival of children with ESRD. J Am Soc Nephrol. 2017;28:1584–91.
Ng DK, Moxey-Mims M, Warady BA, Furth SL, Munoz A. Racial differences in renal replacement therapy initiation among children with a nonglomerular cause of chronic kidney disease. Ann Epidemiol. 2016;26:780–7.e1.
Collins JW Jr, Soskolne G, Rankin KM, Ibrahim A, Matoba N. African-American: White disparity in infant mortality due to congenital heart disease. J Pediatr. 2017;181:131–6.
Ghanayem NS, Allen KR, Tabbutt S, Atz AM, Clabby ML, Cooper DS, et al. Interstage mortality after the Norwood procedure: results of the Multicenter Single Ventricle Reconstruction Trial. J Thorac Cardiovasc Surg. 2012;144:896–906.
Chen J, Vargas-Bustamante A, Mortensen K, Ortega AN. Racial and ethnic disparities in health care access and utilization under the Affordable Care Act. Med Care. 2016;54:140–6.
Adler J, Eder SJ, Gebremariam A, French KR, Moncion I, Singer AAM, et al. Development and testing of a new simplified endoscopic mucosal assessment for Crohn’s disease: the SEMA-CD. Inflamm Bowel Dis. 2021;27:1585–92.
Bilhartz JL, Lopez MJ, Magee JC, Shieck VL, Eder SJ, Fredericks EM. Assessing allocation of responsibility for health management in pediatric liver transplant recipients. Pediatr Transplant. 2015;19:538–46.
Landis JR, Koch GG. Measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74.
Singer AAM, Bloom DA, Adler J. Factors associated with development of perianal fistulas in pediatric patients with Crohn’s disease. Clin Gastroenterol Hepatol. 2021;19:1071–3.
Sewell JL, Velayos FS. Systematic review: the role of race and socioeconomic factors on IBD healthcare delivery and effectiveness. Inflamm Bowel Dis. 2013;19:627–43.
Kandavel P, Eder SJ, Adler J, ImproveCareNow Network Pediatric ILHS. Reduced systemic corticosteroid use among pediatric patients with inflammatory bowel disease in a large learning health system. J Pediatr Gastroenterol Nutr. 2021;73:345–51.
Patzer RE, Amaral S, Klein M, Kutner N, Perryman JP, Gazmararian JA, et al. Racial disparities in pediatric access to kidney transplantation: does socioeconomic status play a role? Am J Transplant. 2012;12:369–78.
Ferris ME, Miles JA, Seamon ML. Adolescents and young adults with chronic or end-stage kidney disease. Blood Purif. 2016;41:205–10.
Atkinson MA, Neu AM, Fivush BA, Frankenfield DL. Ethnic disparity in outcomes for pediatric peritoneal dialysis patients in the ESRD Clinical Performance Measures Project. Pediatr Nephrol. 2007;22:1939–46.
Leonard MB, Stablein DM, Ho M, Jabs K, Feldman HI, North American Pediatric Renal Transplant Cooperative Study. Racial and center differences in hemodialysis adequacy in children treated at pediatric centers: a North American Pediatric Renal Transplant Cooperative Study (NAPRTCS) report. J Am Soc Nephrol. 2004;15:2923–32.
Lopez KN, Morris SA, Sexson Tejtel SK, Espaillat A, Salemi JL. US mortality attributable to congenital heart disease across the lifespan from 1999 through 2017 exposes persistent racial/ethnic disparities. Circulation. 2020;142:1132–47.
Peyvandi S, Baer RJ, Moon-Grady AJ, Oltman SP, Chambers CD, Norton ME, et al. Socioeconomic mediators of racial and ethnic disparities in congenital heart disease outcomes: a population-based study in California. J Am Heart Assoc. 2018;7:e010342.
Sofia MA, Rubin DT, Hou N, Pekow J. Clinical presentation and disease course of inflammatory bowel disease differs by race in a large tertiary care hospital. Dig Dis Sci. 2014;59:2228–35.
Klinger EV, Carlini SV, Gonzalez I, Hubert SS, Linder JA, Rigotti NA, et al. Accuracy of race, ethnicity, and language preference in an electronic health record. J Gen Intern Med. 2015;30:719–23.
Magaña López M, Bevans M, Wehrlen L, Yang L, Wallen GR. Discrepancies in race and ethnicity documentation: a potential barrier in identifying racial and ethnic disparities. J Racial Ethn Health Disparities. 2017;4:812–8.
Brown DP, Knapp C, Baker K, Kaufmann M. Using Bayesian imputation to assess racial and ethnic disparities in pediatric performance measures. Health Serv Res. 2016;51:1095–108.
Cowden JD, Flores G, Chow T, Rodriguez P, Chamblee T, Mackey M, et al. Variability in collection and use of race/ethnicity and language data in 93 pediatric hospitals. J Racial Ethn Health Disparities. 2020;7:928–36.
Sitapati AM, Berkovich B, Arellano AM, Scioscia A, Friedman LS, Millen M, et al. A case study of the 1115 waiver using population health informatics to address disparities. JAMIA Open. 2020;3:178–84.
Blumenthal D, Tavenner M. The “meaningful use” regulation for electronic health records. N Engl J Med. 2010;363:501–4.
Funding
This work was supported in part by the Pediatric Research Organization for Kids with Intestinal Disorders (PRO-KIIDS).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Joann Samalik and Jeremy Adler. The first draft of the manuscript was written by Joann Samalik. All authors approved the final manuscript.
Ethics declarations
Ethics Approval
The study was approved by the Institutional Review Board of the University of Michigan.
Consent to Participate and for Publication
Written consent and assent was obtained from the patients and/or parents/guardians for participation and publication. No images were included in this manuscript.
Competing Interests
The authors declare no competing interests.
About this article
Cite this article
Samalik, J.M., Goldberg, C.S., Modi, Z.J. et al. Discrepancies in Race and Ethnicity in the Electronic Health Record Compared to Self-report. J. Racial and Ethnic Health Disparities 10, 2670–2675 (2023). https://doi.org/10.1007/s40615-022-01445-w
DOI: https://doi.org/10.1007/s40615-022-01445-w