Background and significance

Cerebral palsy (CP) is eventually diagnosed in approximately 5% of preterm infants and can have significant consequences for quality of life.1,2 Historically, diagnosis of CP was delayed until after 1 year of age, but since the publication of early detection guidelines in 2017, the age at CP diagnosis has decreased. This is important since early targeted therapies during periods of rapid brain development can improve long-term functional outcomes.3 More work is needed to identify preterm infants in the neonatal intensive care unit (NICU) at high risk of CP to facilitate individualized therapies before and after discharge home.4

Atypical heart rate (HR) patterns in infants in the NICU can reflect central nervous system injury and dysfunction and predict a wide range of adverse outcomes.5,6,7,8 Our group has focused on the discovery of simple HR metrics using time series analysis, and we previously reported that low variability of HR, particularly in the first week after birth, is associated with severity of intracranial hemorrhage, hypoxic-ischemic brain injury, and long-term neurodevelopmental challenges.9,10 More recently, we studied early HR patterns in NICU patients of all gestational ages with or without eventual diagnosis of CP, and found that simple HR metrics from the first week after birth, such as low standard deviation of every-2-s HR, added to clinical risk factors (lower maternal age, outborn delivery, lower 5-min Apgar score) for CP prediction.11 Since gestational age impacts both HR characteristics and risk for CP, in the current work we focused on preterm infants and tested both early (first week) and later (37 weeks postmenstrual age) HR patterns. We additionally sought to determine whether highly comparative time series analysis (HCTSA), a methodology that we previously used to identify HR patterns predicting mortality, could reveal novel HR patterns that provide even better risk stratification for CP.12,13,14

HCTSA was first described in 2013 by Fulcher et al. and involves analyzing time series data using hundreds of algorithms derived from multiple interdisciplinary fields.14 The algorithms are deployed using many sets of parameter values, resulting in thousands of operations. In medicine, HCTSA has been used primarily for analysis of electroencephalogram data.13 Our group has used the HCTSA approach to discover previously unknown patterns of HR predicting NICU mortality.12 The long-term objective of our group is to develop parsimonious models that may eventually be useful in the clinical setting, and the objective of the work presented here is to use HCTSA to discover new HR metrics early and later in the NICU stay for prediction of later diagnosis of CP. We note that in our current work, we do not use traditional analysis of HR variability, which requires storage and analysis of electrocardiogram waveforms to measure inter-heartbeat intervals. Rather, we are using a simplified approach of analyzing the bedside monitor vital sign data files, with average HR displayed every 2 s.

Methods

Design

This is a retrospective cohort study and was approved by the UVA Institutional Review Board with a waiver of consent.

Patients

We considered for inclusion all patients born at <37 weeks’ gestation admitted to a level IV NICU at UVA Children’s Hospital from 2009 to 2019 with at least 4 h of archived HR data within the first 7 days after birth and followed at our center beyond 3 years of age.

High-risk infants cared for in the NICU at our center are followed after discharge by Neurodevelopmental and Behavioral Pediatrics through at least 2 years of age, and longer if developmental delays are identified. During this 10-year study period, the median age of CP diagnosis for the sample was 16 months (range 4–105 months). The diagnosis of CP was confirmed by an experienced neurodevelopmental clinician (L.L.), by reviewing the records of every child with an ICD-9 or ICD-10 code associated with CP.

The no CP group included all infants <37 weeks’ gestational age admitted to the NICU 2009–2019, with at least 4 h of HR data in the first week after birth and seen at our institution beyond age 3 with no ICD code indicating diagnosis of CP. We excluded a small number of late preterm infants with hypoxic-ischemic encephalopathy treated with therapeutic hypothermia (1 CP case and 10 no CP) since HR is impacted by low core temperature.15 We also excluded a small number of infants with later diagnoses of autism spectrum disorder (10 CP cases and 55 no CP) since, in our recent work, we discovered aberrant HR patterns during the NICU stay in those infants.16 As in our recent work, our comparison “no CP” group includes all eligible preterm NICU patients with data available and known to not have CP. Thus, we have a large comparison group which increases our ability to detect smaller differences between the two groups, increases precision by reducing sampling error, and improves generalizability with a large and diverse group of infants without CP.11

Clinical variables

Demographic and clinical characteristics and diagnoses during the NICU stay were obtained from the NICU database (NeoData, Isoprime, Chicago, IL) and electronic medical records (Epic).

Early clinical variables (known at birth) included maternal age, delivery mode (vaginal or C-section), outborn status, multiple gestation, gestational age, birthweight and head circumference, sex, race, and 5-min Apgar score. Later clinical variables included intraventricular hemorrhage (IVH), post-hemorrhagic hydrocephalus requiring a ventriculoperitoneal shunt, seizures, administration of diazepam for hypertonicity, need for gastrostomy tube for feeding, and length of NICU stay. Variables that had at least 5 cases in either the no CP or CP groups were included in the modeling.

Heart rate data collection and analysis

As part of the standard of care at our institution, all NICU patients have bedside monitor vital sign data collected and stored for research with waiver of consent due to the observational, non-interventional nature of this work. For the current work, we used HR data displayed on bedside monitors every 2 s and archived using the BedMaster system (Hillrom™).

HR data were initially analyzed in all 10-min segments in two 7-day periods. Building on our prior work, we analyzed four mathematical moments of HR data (mean, SD, skewness, kurtosis). In addition, we applied nearly 2000 HCTSA operations to HR data from all 1176 infants included in the current study to identify those with the best ability to distinguish between CP and no CP infants in univariate analysis. To reduce the computational burden, we first focused on HR data from the first day after birth and identified the 12 HCTSA metrics with the highest univariate AUC for CP prediction to move to the next phase of analysis on week-long HR data. We chose to focus on first-day data for two reasons: HR patterns soon after birth better reflect insults occurring in the peripartum period, and using the full 24 h period after birth minimizes the impact of interventions that might transiently affect the HR. The 12 HCTSA metrics derived from this preliminary analysis are listed in Table 1, together with a brief description. Finally, also building on prior work, we analyzed 20 additional HCTSA algorithms called “medoids.” In our mortality prediction study, we used unsupervised clustering to identify 20 mutually exclusive subsets of HCTSA algorithms, with relatively high correlation within a cluster and low correlation outside the cluster, that explained the largest amount of variance (81%) found in the full HCTSA algorithm set. The center algorithm from each of the 20 clusters is termed a medoid and represents the point in the cluster from which the sum of distances to the other data points in the cluster is minimal.
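The medoid definition above (the cluster member with the smallest summed distance to the other members) can be illustrated with a minimal sketch. The function name `medoid_index` and the toy distance matrix are hypothetical and are not taken from the study; how distances between HCTSA algorithms were actually computed is not stated here, so the matrix is simply assumed to be given.

```python
def medoid_index(dist):
    """Return the index of the cluster member whose summed distance
    to all other members is smallest (the medoid).

    `dist` is a symmetric matrix (list of lists) of pairwise distances
    between the members of one cluster; the diagonal is zero.
    """
    totals = [sum(row) for row in dist]
    return min(range(len(dist)), key=totals.__getitem__)


# Toy cluster of three algorithms with made-up pairwise distances.
toy_dist = [
    [0.0, 0.2, 0.5],
    [0.2, 0.0, 0.3],
    [0.5, 0.3, 0.0],
]
print(medoid_index(toy_dist))  # summed distances are 0.7, 0.5, 0.8
```

Unlike a centroid, a medoid is always an actual member of the cluster, which is why each medoid here corresponds to one concrete algorithm that can be run on HR data.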

Table 1 Heart rate HCTSA variables with highest CP prediction.

For the current work, we selected 24 HR metrics to move to the second phase of analysis on week-long data. These are shown in Supplementary Table 1 and include 4 mathematical moments, 12 HCTSA top CP predictors, and 8 medoid top CP predictors from univariate first-day analysis. These 24 metrics were applied to HR data from two 7-day periods: the first week after birth (week 1) and at 37 weeks PMA (week 37). The early time period was expected to capture autonomic dysfunction associated with perinatal distress, inflammation, and hypoxia-ischemia, and the later time period at term corrected age to capture sustained nervous system dysfunction and cumulative brain insults that may occur during the NICU stay. For infants discharged from the NICU prior to 37 weeks PMA, we analyzed HR in the 7-day period before discharge.

Every 10-min HR segment was analyzed and the median value for each of the 24 features was identified for the two 7-day periods, week 1 and week 37.
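As a rough illustration of this aggregation step, the sketch below splits an every-2-s HR series into 10-min segments (300 samples each), computes the four moments per segment, and takes the median of each moment across segments. The function names are hypothetical, and several details are assumptions not specified in the text: population (not sample) SD, non-excess kurtosis, non-overlapping segments, and skipping of perfectly flat segments. Handling of missing data is not shown.

```python
import statistics

SAMPLES_PER_SEGMENT = 300  # 10 min of HR values reported every 2 s


def moments(segment):
    """Mean, SD, skewness and kurtosis of one segment (population formulas)."""
    n = len(segment)
    mean = sum(segment) / n
    m2 = sum((v - mean) ** 2 for v in segment) / n
    m3 = sum((v - mean) ** 3 for v in segment) / n
    m4 = sum((v - mean) ** 4 for v in segment) / n
    sd = m2 ** 0.5
    return mean, sd, m3 / sd ** 3, m4 / sd ** 4


def median_moments(hr, seg_len=SAMPLES_PER_SEGMENT):
    """Median of each moment over all complete, non-flat segments."""
    segments = [hr[i:i + seg_len] for i in range(0, len(hr) - seg_len + 1, seg_len)]
    # Assumption: flat segments (zero variance) are skipped because
    # skewness and kurtosis are undefined for them.
    per_segment = [moments(s) for s in segments if max(s) > min(s)]
    return tuple(statistics.median(column) for column in zip(*per_segment))


# Toy series: two 10-min segments with different levels and spreads.
toy_hr = [100, 102] * 150 + [120, 124] * 150
print(median_moments(toy_hr))
```

The same pattern (compute per segment, then take the median over the week) would apply to any of the 24 metrics; only the per-segment function changes.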

Statistical analysis and modeling

Demographic and clinical characteristics, including medical/neurological comorbidities and HR variables for infants with and without CP, were compared to evaluate baseline differences using a t-test or nonparametric Mann–Whitney U test for continuous variables and χ2 or Fisher’s exact test for categorical variables.

Early and late predictive models were developed for week 1 and week 37 time frames. We tested parsimonious models, including no more than five variables (one predictor for every 10 CP cases in the sample), to avoid overfitting.

Optimal multivariable models were selected based on best fit as measured by the minimum Akaike’s Information Criterion (AIC). Fit is reported as McFadden’s adjusted R-squared ([1],[2]), which transforms the AIC to a value between 0 and 1. A stepwise algorithm was used to find the initial best-fit models of increasing size. Best-fit models were identified that were locally optimal in that the fit was reduced if any of three modifications occurred: (1) any feature not in the model was added; (2) any feature in the model was removed; or (3) any feature in the model was replaced with a feature not in the model.
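The local-optimality criterion described above can be sketched as a small search over feature sets. This is an illustration under stated assumptions, not the authors' implementation: `local_search`, `neighbors`, `toy_aic` and the `TOY_GAIN` table are hypothetical, the scoring function is a made-up stand-in for the AIC of a fitted model, and the cap on model size is not enforced. The helper `mcfadden_adjusted_r2` uses the standard textbook definition, 1 - (lnL - k) / lnL0, which is assumed to be the one meant here; since AIC = 2k - 2 lnL, this equals 1 + AIC / (2 lnL0).

```python
def aic(loglik, n_params):
    """Akaike's Information Criterion; lower values indicate better fit."""
    return 2 * n_params - 2 * loglik


def mcfadden_adjusted_r2(loglik, loglik_null, n_params):
    """McFadden's adjusted pseudo R-squared (standard definition, assumed)."""
    return 1.0 - (loglik - n_params) / loglik_null


def neighbors(model, candidates):
    """All models reachable by one addition, one removal, or one swap."""
    outside = set(candidates) - model
    added = [model | {f} for f in outside]
    removed = [model - {f} for f in model]
    swapped = [(model - {f}) | {g} for f in model for g in outside]
    return added + removed + swapped


def local_search(start, candidates, score):
    """Move to the best neighbor until no add/remove/swap lowers the score."""
    current = frozenset(start)
    while True:
        best = min(neighbors(current, candidates), key=score, default=current)
        if score(best) >= score(current):
            return current  # locally optimal
        current = best


# Toy stand-in for AIC: each feature lowers the score by a made-up amount,
# and each included feature costs a penalty of 2.
TOY_GAIN = {"birthweight": 20.0, "robust_sd_w1": 10.0,
            "long_sd3_w37": 6.0, "sex": 0.5, "race": 1.0}


def toy_aic(model):
    return 100.0 - sum(TOY_GAIN[f] for f in model) + 2 * len(model)


best = local_search([], TOY_GAIN, toy_aic)
print(sorted(best))
```

In a real analysis, `score` would refit a logistic regression for each candidate feature set and return its AIC, which is far more expensive than the toy function used here.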

Ten-fold cross-validated AUC (cvAUC) was used as the primary performance measure. The ten folds were fixed for all calculations and stratified to have equivalent numbers of CP cases. Variable importance was measured as the drop in cvAUC when the variable was removed from the model. For models of size greater than 4, the final model was selected as the largest model with all variables having importance of at least 0.005. The lift of the model was calculated as the positive predictive value (PPV) at a threshold equal to the event rate, divided by the event rate. At this threshold, the number of positives is equal to the number of CP cases (57), so the PPV is also equal to the sensitivity. We display the predictive performance of individual variables and combined models in Fig. 1 and Supplementary Figs. S1–S3 as fold-increased risk of CP compared to the overall risk of 1 for the entire cohort.
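The performance measures described above can be sketched in a few lines. All function names are hypothetical and the toy scores are invented. Two assumptions are made: folds are assigned deterministically in round-robin order within each class (the text says only that folds were fixed and stratified), and the "threshold equal to the event rate" is interpreted as flagging the top-scoring fraction of infants equal to the event rate, so that the number flagged equals the number of cases.

```python
def auc(scores, labels):
    """Rank-based AUC: chance that a random case outscores a random
    non-case, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_at_event_rate(scores, labels):
    """Flag as many top-scoring subjects as there are cases, then return
    (lift, PPV). With this choice PPV equals sensitivity."""
    n_cases = sum(labels)
    event_rate = n_cases / len(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ppv = sum(y for _, y in ranked[:n_cases]) / n_cases
    return ppv / event_rate, ppv


def stratified_folds(labels, k=10):
    """Assign each subject a fold index, spreading each class evenly."""
    folds = [0] * len(labels)
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        for position, i in enumerate(members):
            folds[i] = position % k
    return folds


# Toy example: 10 subjects, 2 cases (event rate 0.2).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(auc(toy_scores, toy_labels))                 # 15 of 16 pairs ordered correctly
print(lift_at_event_rate(toy_scores, toy_labels))  # 1 of the top 2 is a case
```

Variable importance as described would then be the cvAUC of the full model minus the cvAUC of the model refit without that variable, using the same fold assignment for both.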

Fig. 1: CP predictors in the final model.

a HR and clinical variables were analyzed in 57 preterm NICU patients with eventual diagnosis of CP and 1119 without CP. The optimized model included one clinical variable (birthweight, grams BWT) and three HR metrics, one from week 1 HR data (RobustSD week 1) and two from week 37 HR data (LongSD3 and Symb3Diffacb week 37). For the four variables in the final model, the relative risk for CP is shown on the y-axis and the value of the variable on the x-axis. The gray shading shows the 95% confidence intervals and the red lines indicate a significant difference (confidence interval not including one) in infants with and without CP. The black dotted lines indicate where 95% of the values lie. HR metrics are shown in units for each variable. b Fold-increased risk of CP based on percentile for each variable and for the final combined model.

Results

Cohort characteristics

In the years of the study (2009–2019), 3726 infants <37 weeks’ GA were admitted to the UVA NICU. Of these, 1176 met inclusion criteria described in “Methods.” A diagnosis of CP was made for 57 (4.8%), and 1119 patients without CP were identified as a comparison group.

Univariate analysis of clinical variables predicting CP

Demographics and clinical characteristics of the cohorts are displayed in Table 2. The groups differed significantly with respect to gestational age, birthweight and head circumference, outborn status and 5-min Apgar score (all p values <0.05). White race characterized 75% of infants with CP and 67% of those without CP, a difference that was not statistically significant (p = 0.195). Fifty-four percent of infants without CP were male compared to 49% in the CP group (p = 0.496). IVH was associated with CP diagnosis, and all three patients who required a ventriculoperitoneal shunt for post-hemorrhagic hydrocephalus were diagnosed with CP. Length of NICU stay was longer for the CP group compared to the no CP group (median 88 days IQR 49–122 versus 25 days IQR 14–54, p < 0.001).

Table 2 Cohort characteristics.

Univariate analysis of HR metrics predicting CP

In the first 24 h after birth, 1061 infants (90%) had HR data, and the median duration of data available for the initial HCTSA analysis was 23 h (IQR 16–24). For the HCTSA analysis on HR data from the first 7 days after birth, the median duration of HR data available was 159 h (IQR 116–167), and for the 7 days at 37 weeks PMA it was 155 h (IQR 131–163). Thus, HR data availability was 95 and 98% of the 7-day periods, respectively.

Univariate risk of CP based on each of the 24 metrics described in methods was analyzed for week 1 and week 37 HR data. Of note, the eight medoids identified by unsupervised clustering from our previously published HCTSA mortality prediction work12 were outperformed for CP prediction in multivariate analyses described below by the 12 HCTSA operations identified by deploying the ~2000 HCTSA operations on the first-day HR data. We therefore show univariate results only for the mathematical moments (Supplementary Fig. S1) and the 12 HCTSA top performers for week 1 HR and week 37 HR (Supplementary Figs. S2 and S3, respectively).

Multivariable modeling of early and late clinical and HR data for CP prediction

The performance of six multivariate models for CP prediction is shown in Table 3. The early clinical model included three variables known at birth (birthweight, race, and 5-min Apgar score) and had cvAUC of 0.784 (0.721, 0.836), with birthweight being the most predictive variable. A discharge clinical model included NICU length of stay and had only slightly improved cvAUC of 0.791 (0.721, 0.839).

Table 3 Performance of multivariate models to predict CP.

A model adding HR metrics from the first week after birth to birthweight led to a cvAUC of 0.827 (0.773, 0.868), with the most predictive HR metrics being low RobustSD and high LongSD3. Week 37 HR data had additive value, with the optimum model combining birthweight, week 1 HR, and week 37 HR metrics having cvAUC 0.853 (0.805, 0.892).

Figure 1a shows the fold-increased risk of CP of each of the four optimal model features along with 95% confidence intervals. Figure 1b shows these curves as a function of the percentile of the values. The output of the optimum parsimonious multivariate model is also plotted.

Figure 2a shows 1 h of HR (averaged every 10 min) from the first week after birth for the six infants in the CP cohort with the lowest values of the HR metric we call “RobustSD.” RobustSD is a measure of variability of HR calculated using a moving threshold algorithm and was also identified in our prior work to provide a good prediction of NICU mortality.12 For week 1 RobustSD, 13 infants had median values <0.32, and 6 of these had CP, giving a PPV of 46%. Figure 2b shows 1 h of 10-min averaged HR from 37 weeks PMA for the six infants in the CP cohort with the highest values of the HR metric we call LongSD3. This reflects the long-term variability of HR and is the cubed value of the difference in HR values 100 s apart. For week 37 LongSD3, 17 infants had a median value >2000, and 6 of these were later diagnosed with CP, giving a PPV of 35%.

Fig. 2: HR patterns illustrating RobustSD and LongSD3 HR metrics for CP prediction.

One hour of 10-min average HR is shown for six preterm NICU patients eventually diagnosed with CP who had the most extreme values of the first week and 37 weeks PMA HR metrics included in the final prediction model. a A low value for “RobustSD” in the first week after birth had a positive predictive value (PPV) for CP of 6/13 = 46%. b A high value for LongSD3 at 37-week PMA had PPV for CP of 6/17 = 35%. Note that these HR data are from 12 infants later diagnosed with CP (there was no overlap in infants with the lowest RobustSD week 1 and the highest LongSD3 week 37).

Another HR metric that was retained in the final model is symb3_diff_acb_w37, which measures the frequency of ACB patterns when the difference of the HR is quantized to a three-letter alphabet. A high frequency of this HR pattern predicts low CP risk.
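To make the symbolic metric more concrete, the following sketch computes the frequency of the three-letter word "ACB" in a quantized, differenced HR series. This is an illustration only: the function name `acb_frequency` is hypothetical, and the binning rule (approximate tercile cut points, with A for the lowest third of differences and C for the highest) is an assumption, since the exact quantization used by the HCTSA operation is not described in the text.

```python
def acb_frequency(hr):
    """Fraction of consecutive three-symbol words equal to 'ACB' after
    quantizing successive HR differences into a three-letter alphabet.

    Assumed binning: A = lowest third of differences, B = middle third,
    C = highest third, using approximate tercile cut points.
    Requires at least 4 HR values.
    """
    diffs = [b - a for a, b in zip(hr, hr[1:])]
    ranked = sorted(diffs)
    n = len(ranked)
    low_cut, high_cut = ranked[n // 3], ranked[(2 * n) // 3]
    symbols = "".join(
        "A" if d < low_cut else "B" if d < high_cut else "C" for d in diffs
    )
    words = [symbols[i:i + 3] for i in range(len(symbols) - 2)]
    return words.count("ACB") / len(words)


# Toy series whose differences repeat the pattern -2, +2, 0.
toy_hr = [100, 98, 100, 100, 98, 100, 100, 98, 100, 100]
print(acb_frequency(toy_hr))  # 3 of the 7 words are "ACB"
```

Under this assumed binning, "ACB" corresponds to a relatively large decrease, followed by a relatively large increase, followed by a mid-range change. Because monitor HR values are often repeated, many differences are tied at zero, so real data would need a more careful rule for placing the cut points than this sketch uses.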

Discussion

In this large single-center retrospective study of preterm NICU patients, we aimed to develop predictive models for eventual diagnosis of CP using HR data from early (first week after birth) and late (37 weeks postmenstrual age) in the NICU stay. Using HCTSA methodology, we identified a measure of low variability of every-2-s HR (RobustSD) in the first week after birth and a measure of high variability (LongSD3) at term corrected age (37 weeks’ PMA) that added to clinical variables for CP prediction. Multivariate modeling that included week 1 clinical and HR analysis gave good prediction (AUC 0.827), and adding week 37 HR data significantly improved prediction (AUC 0.853).

Low variability of HR is a sign of autonomic nervous system dysfunction and has been shown by us and others to be associated with the severity of acute brain injury and risk for adverse outcomes in neonates.6,9,17,18,19,20,21 The current study builds on our prior work on CP prediction in which we used simple HR metrics (the four mathematical moments of mean, standard deviation, skewness and kurtosis) in a cohort of both preterm and term NICU patients.11 In that work, we found that low standard deviation and negative skewness of HR (toward decelerations) in the first week after birth had AUC for CP prediction. We note that in both that study and the current work, we are not using traditional methods of measuring HR variability by analysis of inter-heartbeat intervals from the electrocardiogram waveform. Rather, we are using a simplified approach of analyzing the bedside monitor vital sign data files, with average HR displayed every 2 s. This method has the disadvantage of not informing on sympathetic-parasympathetic balance, which requires analysis of RR intervals from the ECG waveform and is complicated in neonates with fast and irregular respiratory rates.22,23 An advantage of our method is that it is more likely to be generalizable to centers that are unable to analyze ECG waveforms but can collect and analyze HR displayed as beats per minute updated every 2 s on NICU bedside monitors.

In an attempt to discover novel atypical HR patterns reflecting risk for long-term neuromotor dysfunction, we deployed thousands of operations on first-day HR data using HCTSA to home in on metrics with the greatest ability to distinguish between infants with and without eventual diagnosis of CP. In our prior work on 6000 NICU patients, HCTSA was computationally intensive and we therefore performed clustering analysis to identify 20 medoid algorithms that accounted for 81% of the variance in the sample for predicting NICU mortality. In the current work, we used those medoids but also used HR data from the first 24 h after birth for the first pass at HCTSA to identify algorithms that predicted CP. We chose to focus on the first day after birth for this initial analysis in order to capture pathologic patterns reflecting acute inflammatory or ischemic insults in the immediate antepartum and intrapartum periods. By analyzing both previously identified mortality-predicting medoid algorithms and newly identified CP-predicting algorithms from first-day HR data, we were able to greatly reduce the computational burden of HCTSA. Another important consideration is that preterm infants with severe insults and instability, at the highest risk for CP, have a higher chance of mortality, and the current work was limited to NICU survivors. It is likely that model performance would be even greater for predicting a combined outcome of death or CP.9,12

In contrast to our prior study, which included both preterm and term infants, we focused the current analyses on preterm infants since gestational age impacts HR characteristics and CP risk. We found that birthweight was the top clinical predictor of eventual diagnosis of CP, and longer length of stay in the NICU was also associated with CP diagnosis but did not add significantly to the final model. In our cohort, sex was not a significant predictor of CP, in contrast to other studies that have found higher CP risk among male preterm infants.24,25,26 We found that clinical variables provided good prediction of CP, with birthweight retained in the final model. Adding HR analytics from both early and later in the NICU stay greatly strengthened predictive performance, with the optimum model having higher cvAUC, better fit, and higher PPV at a threshold equal to the event rate. We excluded a small number of preterm infants undergoing hypothermia therapy for HIE; in the future, the HCTSA approach could be applied to late preterm and term infants undergoing hypothermia therapy, since finding early biomarkers for adverse outcomes in this population is an important research priority.

We discovered several new HR metrics that reflect HR variability and have high predictive performance for CP. The first and strongest is a metric we call “RobustSD,” with a low value reflecting a paucity of HR accelerations and decelerations (Fig. 2a). This finding of low variability of HR in week 1 predicting an adverse neurodevelopmental outcome is not surprising and is supported by prior work by us and others.6,9,10,18,19,20,27,28,29 An unexpected discovery that predicted CP at week 37 was very high values of a metric we call LongSD3, which reflects long-term, extremely low-frequency variability. The metric is the cubed value of the difference between HR values 100 s apart. Figure 2b shows 1 h of HR at week 37 for the six babies with CP with the highest values of LongSD3. While the physiology of this HR pattern cannot be known from this retrospective analysis, we speculate that infants at high risk for CP may have atypical autonomic control and abnormal state regulation, leading to frequent spikes in HR.

A major strength of our study is the availability of continuous HR data from a large number of NICU patients and our ability to deploy thousands of time series operations to discover novel HR patterns predicting long-term neuromotor dysfunction. A limitation of the study is that the monitors used at our site only display the average HR every 2 s. Other sites use monitors that display HR every second, which would require some minor modification of the algorithms before implementation. To overcome this site variability and concerns about aliasing, models could be developed directly from the interbeat RR intervals derived from ECG waveforms. The tradeoff is the significant amount of additional data processing and management required and this may not be practical in resource-limited settings. Several other limitations deserve consideration. First, given the retrospective nature of this work, we are not able to discern whether the atypical HR pattern of high LongSD3 represents abnormal state regulation or response to external stimuli. Second, we only have standard neuroimaging (head ultrasound) and not MRI, which has some predictive value for CP. Third, we did not account for all the clinical variables that may contribute to CP risk either prenatally (intrauterine inflammation, hypoxia/ischemia, genetics), or postnatally (hypocapnia, hypoxia/ischemia, sepsis, dexamethasone exposure).30 And finally, we did not include physical examination as an assessment for CP risk. 
Our NICU only recently implemented routine General Movement Assessment for preterm infants.31 Presence of cramped synchronized general movements has a good ability to discriminate CP risk, but we do not have general movement assessment results for the majority of infants included in this study.32 It is possible that, since the performance of general movement assessment is not feasible for all infants in all NICUs, discovering biomarkers such as atypical HR patterns may identify the highest-risk infants that might benefit from this specialized assessment. Another potential use of HR pattern analysis is for risk stratification in clinical trials of neuroprotective agents.

Conclusions

Using HCTSA of every-2-s HR data from NICU bedside monitors, we discovered metrics in preterm NICU patients that reflect atypical HR variability and add to clinical variables to predict CP. With further development and external validation in large multicenter cohorts, HR analysis may serve as a biomarker for CP prediction, identifying the highest-risk patients for targeted therapies prior to and after NICU discharge. Earlier therapies during periods of rapid brain development may, in turn, improve long-term functional outcomes.