Introduction

The assessment of resource utilization is increasingly important in this era of rising health care costs. The value of health care has been defined as outcomes achieved relative to dollars spent [1], so methods that effectively measure the costs of health care delivery are critical to assessing value and quality across institutions [2].

Despite decades of conversation concerning quality of care, a wide variety of institutional practice habits persist. As early as the 1990s, Wennberg unveiled a pattern of tremendous regional variation in medical practice in the USA [3]. His team detailed geographic variances in many areas ranging from physician supply to diagnostic testing to intervention rates and determined that different regions have a twofold variation in per capita Medicare spending. Of greater concern, the areas of increased expenditure did not confer improved outcomes for the patients [4]. Similar examples seen in many individual medical subspecialties have led to a concerted turn toward value in the medical arena, resulting in the use of large datasets to examine institutional variation in both outcomes and costs in the practice of medicine. Table 1 details those databases referenced throughout this document.

Table 1 Referenced datasets to ascertain resource utilization across institutions in pediatric medicine

Collaboration in pediatric clinical research

The challenges involved in conducting clinical research studies in children are well described. First, there are relatively few children with serious medical problems. For example, the American Cancer Society estimated 1.8 million new cancer cases in 2016, but only 10,380 involving patients aged 0–14 [5]. Second, there are unique ethical and regulatory restrictions that apply to the pediatric population [6]. These challenges have created a relative paucity of published data related to children compared to their adult counterparts [6].

The relative scarcity of potential participants makes it particularly challenging to design a study that is adequately powered to answer a given question. Campbell et al. reviewed randomized clinical trials in the Archives of Disease in Childhood from 1982 to 1996 and reported that half of the studies had fewer than 40 enrollees [7]. To combat these challenges, several pediatric research collaborative networks have been developed. The Children’s Oncology Group has been a model for decades in enrolling large numbers of member institutions to expand participant enrollment for trials that a single institution could not adequately power. The Cystic Fibrosis Foundation formed a clinical trials network in 1998 that has served to generate trials for the development of novel therapies for cystic fibrosis [8]. The Pediatric Heart Network was established by the National Heart, Lung, and Blood Institute in 2001 and has been a robust source of collaboration for clinical trials in congenital and acquired pediatric heart disease since its inception.

Big data in pediatric medicine

The development of clinical trial networks has pushed forward multicenter evidence-based research in pediatrics. However, multicenter trials carry enormous logistical and financial challenges. They require significant resources to generate a trial that may be adequately powered to answer a single query.

The rise of electronic medical records has coincided with an increase in development of administrative databases. These data sets have the potential to aggregate hundreds of thousands of people, far outnumbering any feasible clinical trial. In addition, data entry does not require the time or expertise of clinical health care staff. These databases are relatively inexpensive and often efficient at answering a number of clinical questions.

These types of data can be useful when directed at problems where clinical trials are underpowered or underfunded. For example, they may uncover subtle effects that cannot be elucidated with a limited sample size. Administrative datasets provide information on a large number of unselected patients across the country, often providing a more objective data sample than single institution studies, which may be biased toward publication of favorable outcomes. The regional and size variation present in institutions contributing to administrative datasets may also provide a sample more representative of nationwide practice, when compared to single center reports or clinical trials performed at a network of large academic centers.

It is important to recognize that administrative datasets are not without their limitations. They often rely on International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9) codes, which may not be nearly as detailed or current as coding systems designed for specific medical subspecialties. In addition, these datasets lack detailed patient-level information, limiting the ability to risk adjust for baseline patient characteristics. Administrative databases also generally do not allow the follow-up of individual patients over time. Finally, they are more prone to coding errors and inaccurate case ascertainment than data entry performed by clinical staff [9–11].

Examples of the use of datasets to describe institutional variation in pediatrics

There has been a proliferation of published pediatric research using large datasets in the past 3 years. For example, Sun and colleagues utilized the US Nationwide Inpatient Sample (NIS) to identify 12,512 patients undergoing tonsillectomy in 2009, finding a median cost of $4393 and mean cost of $7525. They found that the need for mechanical ventilation had a marked effect on the total encounter cost [12]. Meier et al. studied hospital costs for same-day pediatric adenotonsillectomy surgery within a multihospital network. They identified 26,602 cases from 18 hospitals from 1998 to 2012, ultimately showing significant variation in costs at different facilities (range $1029–$2385/case) [13].

Rice-Townsend and colleagues have performed several studies utilizing the Pediatric Health Information System (PHIS) database to examine practice variation and costs in pediatric surgery. Utilizing a cohort of 13,328 patients with appendicitis from 34 children’s hospitals from 2010 to 2011, they concluded that median hospital costs differed fourfold for patients with uncomplicated disease, suggesting significant variation in practice [14]. This group also examined 2544 patients with intussusception using PHIS and found significant practice and cost variation among hospitals for this procedure as well [15].

The PHIS database, run by the Children’s Hospital Association (CHA), has also been employed by several other investigators to explore cost variation. Kharbanda et al. used the PHIS database to examine variation in resource utilization among pediatric emergency departments. Using diagnoses of asthma, gastroenteritis, or simple febrile seizure, they compared more than 250,000 Emergency Department (ED) visits at 21 institutions. Although practice and resource utilization varied across institutions, higher costs were not associated with lower rates of hospitalization or repeat ED visits [16]. In addition, Tieder et al. identified 24,890 admissions for diabetic ketoacidosis at 38 children’s hospitals using the PHIS database. They compared resource utilization, length of stay, and readmission rates, finding a wide variance among institutional practices [17]. Finally, Brimley et al. evaluated the variation of costs and mortality among children with leukodystrophy in US children’s hospitals using the PHIS database. They identified 122 patients and described a wide variety of costs at different institutions, although they did find a correlation between higher volumes of patients and increased cost efficiency [18].

Derrington and colleagues utilized the Massachusetts Pregnancy to Early Life Longitudinal Data System (PELL) to investigate the cost variation among different racial and ethnic backgrounds for children with Down syndrome. This was achieved with data collected on 504 children with Down syndrome and 468,000 in the control cohort. The study revealed higher costs in the birth hospitalization for both non-Hispanic black and Hispanic children when compared to non-Hispanic whites [19].

The Society of Thoracic Surgeons Congenital Heart Surgery Database (STS CHSD) is a voluntary registry that contains clinical outcome information on all congenital and pediatric cardiovascular operations performed at participating centers. Husain et al. accessed the STS CHSD to evaluate geographic variation among infants undergoing cardiac repair. They identified 23,379 patients in 94 centers with significant regional variation in all seven diagnostic groups examined [20]. The STS CHSD was also used by Jacobs et al. to describe variation in cost and outcomes among congenital heart centers [21]. They queried eight benchmark pediatric cardiac operations performed from 2005 to 2009 and ultimately examined 18,375 index operations at 74 centers. They found significant interinstitutional variation in postoperative length of stay, which was most prominent for more complex operations.

Merging of clinical and administrative datasets

The linkage of clinical and administrative datasets has recently been utilized as a method to overcome the limited clinical details available in administrative datasets. Clinical registries and clinical trials contain detailed, adjudicated, clinical information on the patient level, whereas valuable cost data is only included in the administrative datasets. The method of linking datasets via indirect patient identifiers has been championed by Dr. Sara Pasquali in the field of congenital heart surgery. The linked clinical and administrative datasets allow for the examination of costs for congenital heart surgery, adjusted for important underlying preoperative risk factors [22].

For example, Pasquali et al. linked clinical data from the STS Congenital Heart Surgery Database with resource utilization data in PHIS to describe the cost variation for nine operations of differing complexity. A significant variation across centers in adjusted hospital costs per patient (up to ninefold) was observed for each operation. Differences in length of stay and complication rates explained 28% of the between-center variation, and high-volume centers had lower costs for the most complex operations [23•].
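
To make the idea of linkage via indirect patient identifiers concrete, the following is a minimal sketch in Python. It is not the published method: the choice of identifier fields (center, birth date, admission date, discharge date, sex), the function names, and the sample records are all assumptions made for illustration. The sketch links a clinical record to an administrative record only when the combination of indirect identifiers is unique in both sources.

```python
from collections import defaultdict

# Hypothetical indirect identifiers; the fields used in a real study may differ.
KEY_FIELDS = ("center", "birth_date", "admit_date", "discharge_date", "sex")


def link_key(record):
    """Build the composite key of indirect identifiers for one record."""
    return tuple(record[field] for field in KEY_FIELDS)


def link_records(clinical, administrative):
    """Return (linked, unmatched) using exact, one-to-one key matches only."""
    admin_index = defaultdict(list)
    for row in administrative:
        admin_index[link_key(row)].append(row)

    clinical_counts = defaultdict(int)
    for row in clinical:
        clinical_counts[link_key(row)] += 1

    linked, unmatched = [], []
    for row in clinical:
        key = link_key(row)
        candidates = admin_index.get(key, [])
        # Accept the link only if the key is unique in both datasets.
        if len(candidates) == 1 and clinical_counts[key] == 1:
            linked.append({**candidates[0], **row})
        else:
            unmatched.append(row)
    return linked, unmatched


# Invented example records (not real patient data).
clinical = [
    {"center": "A", "birth_date": "2011-02-03", "admit_date": "2011-02-10",
     "discharge_date": "2011-03-01", "sex": "M", "operation": "Norwood"},
    {"center": "B", "birth_date": "2012-05-06", "admit_date": "2012-05-07",
     "discharge_date": "2012-05-20", "sex": "F", "operation": "VSD repair"},
]
administrative = [
    {"center": "A", "birth_date": "2011-02-03", "admit_date": "2011-02-10",
     "discharge_date": "2011-03-01", "sex": "M", "cost": 41000},
    # Discharge date differs by one day, so this row does not link exactly.
    {"center": "B", "birth_date": "2012-05-06", "admit_date": "2012-05-07",
     "discharge_date": "2012-05-21", "sex": "F", "cost": 9000},
]

linked, unmatched = link_records(clinical, administrative)
```

In this sketch, records that fail exact matching are returned separately rather than dropped, so the linkage success rate can be reported and the unmatched records can be reviewed for systematic differences.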

Recently, the novel strategy of linking administrative datasets with clinical trial data has been explored in pediatric oncology and congenital heart surgery. The Children’s Oncology Group (COG) merged clinical trial data from a phase III COG trial for patients with de novo acute myeloid leukemia at 43 centers with resource utilization data from PHIS using probabilistic matching. There was a 94% success in linkage, and the standardized costs, blood product usage, and anti-infective exposures were described across centers [24].

Similar methods were employed to examine the impact of postoperative complications on costs for the Norwood operation. Detailed, prospectively collected clinical information and postoperative complications data from the Pediatric Heart Network’s Single Ventricle Reconstruction (SVR) Trial, a trial evaluating shunt types in patients with hypoplastic left heart syndrome, were linked at the patient level with cost data from the Case Mix administrative database. There was successful linkage of 98% of eligible patient records, resulting in a study population of 334 patients. The adjusted hospital costs were demonstrated to increase with the number of postoperative complications [25], and the hospital costs varied nearly fivefold across centers [26].
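
The probabilistic matching mentioned above for the COG and PHIS linkage can be illustrated with a small sketch. The cited study's exact procedure is not described here, so this example swaps in a generic Fellegi-Sunter style scoring scheme as an assumption. The field names, the m- and u-probabilities, the threshold, and the sample records are all hypothetical values chosen for illustration.

```python
import math

# Assumed per-field probabilities (illustrative only):
#   m = P(field agrees | records belong to the same patient)
#   u = P(field agrees | records belong to different patients)
FIELD_PARAMS = {
    "birth_date": (0.98, 0.001),
    "admit_date": (0.95, 0.005),
    "sex": (0.99, 0.5),
    "zip_code": (0.90, 0.01),
}


def match_weight(rec_a, rec_b):
    """Sum log2 agreement or disagreement weights over all fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if rec_a.get(field) == rec_b.get(field):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def best_match(trial_record, admin_records, threshold=10.0):
    """Return the highest-scoring administrative record, or None if below threshold."""
    if not admin_records:
        return None
    scored = [(match_weight(trial_record, rec), rec) for rec in admin_records]
    weight, record = max(scored, key=lambda pair: pair[0])
    return record if weight >= threshold else None


# Invented example records (not real patient data).
trial = {"birth_date": "2010-03-01", "admit_date": "2014-06-10",
         "sex": "F", "zip_code": "19104"}
admin_a = {"birth_date": "2010-03-01", "admit_date": "2014-06-10",
           "sex": "F", "zip_code": "19143", "cost": 52000}  # zip code differs
admin_b = {"birth_date": "2010-03-11", "admit_date": "2014-06-10",
           "sex": "F", "zip_code": "19104", "cost": 8000}   # birth date differs

match = best_match(trial, [admin_a, admin_b])
```

Unlike exact matching, this approach tolerates disagreement on a weakly discriminating field (here, a changed zip code) while still penalizing disagreement on a strongly discriminating one such as birth date. In practice the probabilities and threshold would need to be estimated and validated against records of known match status.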

Conclusions

The assessment of resource utilization is increasingly important in this era of rising health care costs and amid efforts to accurately ascribe value in medicine. Large administrative datasets are commonly used to measure health care costs. The studies identified above represent the result of a search of studies using large datasets to examine cost variation published in a 3-year span from 2012 to 2015. A wealth of outcomes and comparative cost analyses have been acquired across the medical spectrum by investigating readily available data in administrative datasets.

Although administrative datasets can efficiently provide cost information on a large number of patients across centers, administrative data does have its limitations. These shortcomings are particularly related to data entry errors, limited ability to track patients over time, and absence of detailed clinical information to control for patient-level risk factors which can clearly affect hospital stay and costs. It is the authors’ belief that most of these limitations can be mitigated with the linkage of clinical and administrative databases, resulting in valuable datasets that could not be established individually. Identification of practice variation utilizing these linked clinical and administrative datasets can help direct initiatives to both improve outcomes and reduce costs across hospitals, improving the value of pediatric medicine.