Introduction

Approximately 400,000 infants are born preterm in the United States annually and 15 million worldwide.1,2 Over the last 50 years, preterm mortality and morbidity have improved dramatically but remain high for the most preterm infants.3,4,5 Optimum care practices to maximize survival and improve neurodevelopmental outcomes continue to require exploration. Intraventricular hemorrhage (IVH) is a significant morbidity among preterm infants and is highly correlated with mortality. IVH occurs in approximately 25% of infants born under 1500 g and may lead to long term neurodevelopmental impairment, decreased survival and long, difficult, costly hospital stays.5,6,7,8 Over half of IVH episodes are silent and most others have only subtle signs to alert the clinician.9,10 Prediction of IVH may help identify higher risk newborns, possibly allowing corrective action such as reduction in hypoxemia and hypercarbia, more detailed blood pressure measurement, reduction in handling and stress and provision of preventative medications.7

Heart rate variability (HRV) analyzes the subtle differences in time intervals between heartbeats and can be extracted from data collected, routinely and noninvasively, by bedside monitors. HRV is thought to retain information about the various physiological controls on the heart rate including the autonomic inputs to the heart and respiratory cadence.11 HRV has been associated with multiple neonatal illnesses, notably including sepsis and overall mortality, and so seems a promising avenue for research as a diagnostic tool.12,13

Prediction of IVH to date has typically relied on combinations of clinical variables although no easily accessible clinical tool exists to guide routine care.14,15,16,17 Shifts in cardiovascular dynamics are widely believed to contribute to the pathogenesis of IVH, and so it may well be expected that such shifts would leave a recognizable signature in HRV.18,19,20 Indeed, previous work has shown some predictive power of HRV for IVH.21,22,23,24 Notably, the Heart Rate Characteristic (HRC), a proprietary algorithm marketed as the HeRO system and extensively studied for prediction of neonatal sepsis, correlated with severe IVH and mortality at 24 hr of life (HOL) and at 7 days of life (DOL).25 This time spread is intriguing as most IVH is believed to occur within 48–72 HOL and suggests there may be some detectable perturbation in HRV both around the time of and hours to days after the hemorrhage occurs.7,26 While HRC was associated with IVH, it did not improve prediction beyond clinical models.

A particularly interesting component of HRC is sample entropy (SE).27 Stemming from information theory research, SE measures the randomness of a series of numbers. A series with predictable patterns would have a low entropy while a truly random series would have very high entropy. This gives SE useful statistical properties that are relatively more stable compared to other HRV measures, such as frequency spectral analyses, which tend to show rapid swings, broad variance and very high skewness especially in long time recordings.28,29,30 As SE is part of the standardized, open source PhysioNet HRV software, it can be calculated readily from any high resolution bedside cardiac monitor.31 Thus, in the current study, we sought to characterize SE in premature infants during the first week of life and to explore its association with IVH and, given its extremely strong correlation with IVH, mortality.
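For readers less familiar with the metric, the commonly used definition of sample entropy can be written compactly. The notation below is ours, not the paper's: x is a series of N values, m is the embedding dimension, r is the radius of similarity, and template indices i and j both run from 1 to N − m so that the two counts are taken over the same set of templates. Whether r is expressed in raw units or as a fraction of the series' standard deviation depends on the implementation and is not specified here.

```latex
\begin{align}
B^{m}(r) &= \#\bigl\{(i,j) : i \neq j,\ \max_{0 \le k < m} \lvert x_{i+k} - x_{j+k} \rvert \le r \bigr\} \\
A^{m}(r) &= \#\bigl\{(i,j) : i \neq j,\ \max_{0 \le k \le m} \lvert x_{i+k} - x_{j+k} \rvert \le r \bigr\} \\
\mathrm{SampEn}(m, r, N) &= -\ln \frac{A^{m}(r)}{B^{m}(r)}
\end{align}
```

A series in which templates that match for m points almost always also match at the next point has a ratio A/B close to 1 and thus SE close to 0. A series in which such matches rarely continue has a small ratio and thus high SE.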

Methods

Patient population and clinical outcomes

The Stanford University Institutional Review Board approved this study. Clinical and cardiac monitoring data for all infants born at Lucile Packard Children’s Hospital from June 2017 through December 2021 with a gestational age (GA) of less than 32 weeks at birth, a common threshold for risks of premature birth, were reviewed.5 Eleven infants were excluded as no cardiac monitoring data were available for them due to death in the delivery room or prior to being placed on monitors in the neonatal intensive care unit (NICU). Ten infants who died after 14 DOL were excluded so as to avoid confounding from later complications that we would not expect to have a physiological signature in the first week of life.

Head ultrasound (HUS) results were reported by board certified pediatric radiologists with grade (Gr) determined using Papile designation.32 Unless clinically indicated due to critical illness or an acute change, head ultrasounds were routinely obtained on DOL 7. When serial HUS were performed on the same infant, the most severe grade was reported. Nine infants who died before a HUS were excluded from the IVH specific analyses. Infants with IVH Gr3 and 4 were grouped as severe IVH.17 Given both the strong correlation between mortality and IVH early in life in extremely premature infants and the likelihood some such infants may have died before IVH could be diagnosed by HUS, a composite outcome group of infants who either had severe IVH or died was constructed.4

As both IVH and mortality are much more likely at lower GA, to facilitate data exploration, a control group of infants was defined who survived without IVH or with IVH Gr1 only and who were born before 29 wk GA. This group had demographic characteristics very similar to those of the severe IVH and mortality groups (Table 1).

Table 1: Clinical characteristics of the study population.

In order to explore how HRV may complement known clinical predictors of IVH, clinical data were gathered from the electronic medical record. A clinical predictive model for IVH was adapted from Luque et al.16 Clinical variables included GA at birth, birthweight, delivery mode, infant sex, one minute APGAR score, receipt of at least one dose of antenatal steroids at least 24 hr before delivery and infant need for mechanical ventilation. Given the use of time-specific models, an infant was coded as mechanically ventilated if ventilation had begun by the relevant time point. As intubation-surfactant-extubation (INSURE) was part of clinical practice in this study period, mechanical ventilation was defined by occurrence for at least 4 hr contiguously.33

HRV data

Stanford University has established a warehouse of bedside monitor data.34 Monitor data were obtained using Philips MX800 monitors and Philips PIC iX Data Warehouse Connect. Beat locations and classifications were computed by a real-time arrhythmia algorithm analyzing 500 sample-per-second diagnostic quality ECG waveforms with 0.05–150 Hz bandwidth. R-R intervals were measured with 0.5 ms resolution via parabolic ECG waveform interpolation. The open source benchmark PhysioNet Cardiovascular Signal Toolbox (https://www.physionet.org/content/pcst/1.0.0/) was used for R-R interval time series preprocessing and SE computation.31 The preprocessing module retained the default parameters apart from the missing data tolerance which was increased to 50% given the prevalence of rejected, and thus missing, beats in noisy infant ECG signals. Using a radius of similarity (r) of 0.2 and an embedding dimension (m) of 3, the Toolbox converted R-R intervals into SE over 5-minute computation windows stepped at 30-second intervals.35
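The windowed SE computation described above can be illustrated with a minimal Python sketch. This is not the PhysioNet Toolbox code (which runs in MATLAB) and is not guaranteed to reproduce its output. The function names, the scaling of r by the standard deviation of each window, and the brute-force pair counting are all assumptions made for illustration.

```python
import math
import random
from statistics import pstdev


def sample_entropy(x, m=3, r=0.2):
    """Sample entropy of x; tolerance is r times the SD of x (an assumption).

    Returns None when the estimate is undefined (too few points or no matches).
    """
    n = len(x)
    if n <= m + 1:
        return None
    sd = pstdev(x)
    if sd == 0.0:
        return None
    tol = r * sd
    n_templates = n - m  # same template count for lengths m and m + 1
    a = b = 0
    for i in range(n_templates):
        for j in range(i + 1, n_templates):
            # Chebyshev distance between the two length-m templates
            if max(abs(x[i + k] - x[j + k]) for k in range(m)) <= tol:
                b += 1
                # the pair also matches at length m + 1 if the next points agree
                if abs(x[i + m] - x[j + m]) <= tol:
                    a += 1
    if a == 0 or b == 0:
        return None
    return -math.log(a / b)


def windowed_sample_entropy(beat_times, rr, window_s=300.0, step_s=30.0, m=3, r=0.2):
    """Return (window_start_seconds, SE) pairs for sliding windows over R-R intervals."""
    results = []
    if not beat_times:
        return results
    start, end = beat_times[0], beat_times[-1]
    while start + window_s <= end:
        segment = [v for t, v in zip(beat_times, rr) if start <= t < start + window_s]
        results.append((start, sample_entropy(segment, m, r)))
        start += step_s
    return results


# Synthetic illustration only: a periodic series versus Gaussian noise.
rng = random.Random(0)
regular = [math.sin(2 * math.pi * i / 10) for i in range(400)]
noisy = [rng.gauss(0.0, 1.0) for _ in range(400)]
se_regular = sample_entropy(regular)
se_noisy = sample_entropy(noisy)

# Short windows here (60 s) keep the demo fast; the study used 300 s windows.
times = [i * 0.4 for i in range(400)]
windows = windowed_sample_entropy(times, noisy, window_s=60.0, step_s=30.0)
```

The pair counting is quadratic in window length, which is tolerable for a few hundred beats per window but would be slow over a full week of monitoring without a vectorized or compiled implementation.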

Statistical analysis

SE was averaged by hour of life for each infant. Plots by clinical outcome group were smoothed with a 1-dimensional Gaussian kernel (sigma = 1).36
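As a rough illustration of these two steps, the following sketch averages SE values into completed hours of life and then applies a one-dimensional Gaussian kernel with sigma = 1. The function names, the truncation of the kernel at four sigma and the renormalization of weights at the series edges are assumptions for this example and may differ from the library routine actually used.

```python
import math
from collections import defaultdict


def hourly_mean(ages_hours, values):
    """Average values within each completed hour of life, skipping missing entries."""
    buckets = defaultdict(list)
    for age, value in zip(ages_hours, values):
        if value is not None:
            buckets[int(age)].append(value)
    return {hour: sum(vs) / len(vs) for hour, vs in sorted(buckets.items())}


def gaussian_smooth(series, sigma=1.0, truncate=4.0):
    """Smooth a list with a Gaussian kernel, renormalizing weights near the edges."""
    radius = int(truncate * sigma + 0.5)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    n = len(series)
    for i in range(n):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < n:  # ignore kernel taps that fall outside the series
                num += w * series[j]
                den += w
        smoothed.append(num / den)
    return smoothed


# Synthetic illustration: three SE estimates falling in the first two hours of life.
by_hour = hourly_mean([0.1, 0.5, 1.2], [1.0, 3.0, 5.0])
flat = gaussian_smooth([2.0] * 10, sigma=1.0)
spike = gaussian_smooth([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], sigma=1.0)
```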

Logistic regression models were constructed to better characterize the potential predictive power of sample entropy relative to the clinical outcomes of IVH and mortality. Given substantial inter-individual variability, each infant’s baseline SE was first calculated as the average in their first 4 HOL. The main predictive variable was the difference between baseline SE and SE averaged over a 4-hour window before various HOL censor points. Baseline SE itself was used for the time point at 4 HOL.
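The predictor construction described above can be sketched as follows. The names are hypothetical, hour h is assumed to cover the interval from h to h + 1 HOL, and the sign convention (recent SE minus baseline, so that a rise from baseline is positive) is our reading of the text rather than a documented detail.

```python
def mean_se(hourly_se, start_hol, end_hol):
    """Mean of hourly SE over hours start_hol..end_hol-1; None if no data."""
    values = [hourly_se[h] for h in range(start_hol, end_hol) if h in hourly_se]
    return sum(values) / len(values) if values else None


def se_feature(hourly_se, censor_hol, window=4, baseline_hours=4):
    """Predictor at a censor point: baseline SE at 4 HOL, else change from baseline."""
    baseline = mean_se(hourly_se, 0, baseline_hours)
    if baseline is None:
        return None
    if censor_hol <= baseline_hours:
        return baseline  # at 4 HOL the baseline itself is the predictor
    recent = mean_se(hourly_se, censor_hol - window, censor_hol)
    if recent is None:
        return None
    return recent - baseline  # positive when SE has risen from baseline


# Synthetic infant: SE of 1.0 in the first 4 hours and 1.3 during hours 20 to 23.
hourly = {h: 1.0 for h in range(4)}
hourly.update({h: 1.3 for h in range(20, 24)})
feature_at_4 = se_feature(hourly, 4)
feature_at_24 = se_feature(hourly, 24)
```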

Regression models were tested using SE alone, the clinical model alone and the combination of SE with the clinical model. Predictive power was assessed using the area under the receiver operating characteristic curve (ROC AUC). To obtain confidence intervals (CI) and estimate generalizability, a bootstrapped test and train approach was used: in each of 500 iterations, a random 20% of the sample was held out of the regression training data for testing.37 The PhysioNet Cardiovascular Signal Toolbox ran in MATLAB version R2021b (https://www.mathworks.com/products/matlab.html). All other analyses were performed using Python version 3.7.6 with logistic regression from Statsmodels version 0.13.5 (https://www.statsmodels.org/).
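A simplified sketch of this evaluation loop is given below. It is one possible reading of the approach, implemented as repeated random 80/20 splits without replacement. A with-replacement bootstrap would differ, and the study's Statsmodels fit is replaced here by a small one-feature gradient descent so that the example has no dependencies. All names and the synthetic data are assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, lr=0.5, epochs=200):
    """Fit intercept and slope of a one-feature logistic model by gradient descent."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def roc_auc(scores, labels):
    """AUC as the fraction of case/control pairs ranked correctly (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def holdout_auc(x, y, n_iter=500, test_frac=0.2, seed=0):
    """Mean and 2.5th/97.5th percentile of out-of-sample AUC over random splits."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    n_test = max(1, int(round(test_frac * len(x))))
    aucs = []
    for _ in range(n_iter):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        b0, b1 = fit_logistic([x[i] for i in train], [y[i] for i in train])
        auc = roc_auc([sigmoid(b0 + b1 * x[i]) for i in test], [y[i] for i in test])
        if auc is not None:  # skip splits whose test set lacks cases or controls
            aucs.append(auc)
    if not aucs:
        return None
    aucs.sort()
    low = aucs[int(0.025 * (len(aucs) - 1))]
    high = aucs[int(0.975 * (len(aucs) - 1))]
    return sum(aucs) / len(aucs), low, high


# Synthetic data: 20 cases with a higher feature mean than 80 controls.
rng = random.Random(1)
labels = [1] * 20 + [0] * 80
feature = [rng.gauss(0.3 if label else 0.0, 0.2) for label in labels]
mean_auc, ci_low, ci_high = holdout_auc(feature, labels, n_iter=50)
```

With few adverse outcomes, some held-out samples contain no cases at all. The sketch skips those splits, which is one of several defensible choices and would be worth reporting explicitly.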

Results

Demographics and outcomes

In total, 389 infants were included in this study. Demographic data are available in Table 1. Supplemental Table 1 shows how cohort sizes were impacted by mortality over time. Among infants with known IVH Gr3-4, seven died by 7 DOL. Among the infants with mortality by 14 DOL, 24 died by 7 DOL. Among infants in the entire cohort who required any mechanical ventilation by 7 DOL, it was initiated at a median of 2 h of life (IQR < 1–13 h). Among those who had severe IVH or died, median intubation was within the first hour of life. As expected, adverse outcomes were highly correlated with lower GA and were broadly comparable to published incidence rates.4,6

Sample entropy by clinical group

Fig. 1 plots SE, averaged by HOL, across the various clinical outcome groups over the first 7 DOL. SE appears generally higher in the IVH and mortality groups relative to controls albeit with considerable variance. Fig. 2 better illustrates this dynamic for the main adverse outcome groups and their matched controls with shading of the population distribution. Starting around 8 HOL, while it covered a broad total range, mean SE was noticeably and consistently higher in the adverse outcome groups relative to controls. While more variance was expected in the smaller adverse outcome groups, SE was also skewed higher in those groups with much of the 68th to 95th percentile region well above the 95th percentile for controls. Strikingly, before approximately 8 HOL, the means were reversed with the adverse outcome groups lower albeit with extensive overlap in their distributions.

Fig. 1: Mean SE by hour of life grouped according to clinical outcomes.

a Infants grouped according to maximum grade of IVH on cranial ultrasound. b Infants grouped according to mortality within 14 days of life. c Infants grouped according to composite outcome of either IVH Gr3-4 or mortality within 14 days of life. While not part of subsequent statistical analyses, to illustrate SE trends in a subset of controls with a mean GA similar to the affected groups (approximately 26 wk, see Table 1), the Control curves show infants born < 29 wk who are controls for each outcome. For visualization only, curves were smoothed with a one-dimensional Gaussian kernel with sigma = 1.

Fig. 2: Distribution of SE by hour of life grouped according to clinical outcomes depicting only the cases and gestational age-matched controls.

a Infants grouped according to maximum grade of IVH on cranial ultrasound. b Infants grouped according to mortality within 14 days of life. c Infants grouped according to composite outcome of either IVH Gr3-4 or mortality within 14 days of life. For visualization only, curves were smoothed with a one-dimensional Gaussian kernel with sigma = 1. Lines depict the mean SE by hour of life. Dark shading depicts the 32–68 percentile range, and light shading the 5–95 percentile range. All groups initially showed a slightly lower mean in the adverse outcome groups, a situation which reversed around 12 HOL.

Predictive performance of SE

In univariate logistic regressions, SE was highly associated with all outcomes from 24 to 96 HOL. The associations were strongest at 72 HOL with p-values on the order of 10⁻⁴. Odds ratios (OR) were generally 1.2–1.5 per 0.1 units of SE increase from the infant’s baseline.
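To make the scaling of these odds ratios concrete, the short sketch below converts between a logistic regression coefficient and an OR for a given change in SE. It relies only on the standard property that log-odds are linear in the predictor under a logistic model. The helper names are ours, and the value of 1.5 is simply the upper end of the reported range used as an example.

```python
import math


def odds_ratio(beta_per_unit, delta):
    """Odds ratio implied by a logistic coefficient for a change of delta units."""
    return math.exp(beta_per_unit * delta)


def beta_from_or(or_value, delta):
    """Coefficient per unit implied by an odds ratio reported per delta units."""
    return math.log(or_value) / delta


# An OR of 1.5 per 0.1 units of SE corresponds to a coefficient of ln(1.5) / 0.1.
beta = beta_from_or(1.5, 0.1)
# Under the same model, a 0.2 unit rise compounds to 1.5 squared.
or_for_0_2 = odds_ratio(beta, 0.2)
```

This compounding holds only insofar as the linear log-odds assumption is reasonable over the range of SE changes considered.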

In multivariate logistic regressions using the clinical model as covariates, results were very similar. Consistent with the findings in Figs. 1 and 2, there were also significant associations of SE with mortality and with the composite outcome at 4 HOL but in the opposite direction. In the first few hours of life, an overall decrease of 0.1 units in SE was associated with an OR of 1.5 for mortality (p < 0.01) and 1.3 for the composite outcome (p = 0.04).

We used ROC AUC with a bootstrap approach to measure the predictive utility of SE. Fig. 3 summarizes the results using the training data. Addition of SE generally yielded a modest increase in AUC compared to the model with clinical variables only. For all 3 outcome groups at 96 HOL, the 95% CI of the clinical plus SE model did not overlap with that of the clinical only model. Similar results were observed in the out of sample tests, but, given the relatively small N values in the hold out samples, CIs overlapped among the 3 models (data not shown). Given the small remaining number of infants who died between DOL 5 and 14 (see Supplemental Table 1), the results for mortality at HOL 120 and after were omitted.

Fig. 3: Additive impact of SE on logistic models at different HOL censor points measured by ROC AUC mean and 95% confidence intervals estimated from bootstrap analysis, 500 iterations on random 80% samples of the training data.

a IVH Gr3-4 as the predicted outcome. b Mortality as the predicted outcome. c Composite of IVH Gr3-4 or mortality as the predicted outcome. Three different logistic models analyzed different predictive variables - sample entropy (SE) alone, the clinical model alone and SE combined with the clinical model - for each outcome. Mortality data are omitted for 120–168 HOL as the shrinking size of the adverse outcome group made the analysis unreliable.

Discussion

Our study showed a marked difference in SE, an HRV metric which has shown promise in several neonatal studies to predict adverse outcomes, between control infants and those who either developed IVH or died in the first week of life.35,38,39,40 SE showed significant correlation with eventual mortality as early as 4 HOL and with severe IVH by 24 HOL. When combined with a clinical prediction model that itself achieved AUCs around 0.83, adding SE improved prediction performance. At 96 HOL when it performed best, the combination of SE with the clinical model yielded AUCs of 0.9 and above for all three outcome groups and performed significantly better than the clinical model alone, a performance not seen in previous studies of HRV to predict IVH.22,25

The pattern of SE differences between infants with and without adverse outcomes was surprising and somewhat unique compared to previous reports in the literature. As expected, we saw lower SE in infants with adverse outcomes relative to controls particularly early in life when risk of IVH is believed to be highest.7 Several similar studies have used lower SE alone or in combination with other HRV metrics to predict sepsis.28,35,41,42 Sullivan et al. used the HRC algorithm, marketed commercially as HeRO, to predict IVH and mortality.25 While that algorithm is proprietary and includes various features, it seems from published descriptions to be predicting adverse outcomes from decreases in SE.35,43 Lower SE has also been reported recently among more premature and lower birthweight infants generally.38 Indeed, the observation that gravely ill neonates tend to show a fixed heart rate was one of the earliest insights into HRV as a field of study.44 Thus, while we did not observe strong predictive power at 4 HOL, observing a significant association between lower SE and mortality was consistent with similar literature. That the association was significant so early in life suggests many of the contributing factors of mortality are detectable at or shortly after birth.

Interestingly, we found significantly higher SE in infants with adverse outcomes beyond approximately 24 HOL. To our knowledge, this is the first such report in a neonatal population. While there was substantial variance in the overall population, mean SE was consistently higher across the three adverse outcome groups. That this result was maintained through bootstrap test and train iterations suggests it is not merely a function of a few outlier infants. To our knowledge, the only factor that is documented to increase SE is administration of dexamethasone.45 No infants in our population received dexamethasone during the study period given its potential association with poorer neurodevelopmental outcomes.46

On manual review of the R-R tracings around the 24 HOL mark where the SE rise was most pronounced, we noted that transient decelerations seemed less common. Periods with variability around a steady baseline seemed unusually common and, as expected, these periods produced high SE.35 An example from the IVH group that juxtaposes brief decelerations more typical of the overall population sample with one of these high SE periods is shown in Supplemental Fig. 1. We also observed spikes – unusually short or long intervals – that may represent artifact or, in some cases, premature beats. As the preprocessing step of the PhysioNet Cardiovascular Signal Toolbox eliminated beats 20% longer or shorter than their predecessor, most of these did not influence our data. As an added precaution, we analyzed a sample of our data with such spikes using a more stringent preprocessing filter that eliminated beats 10% longer or shorter than their predecessor and then eliminated the top and bottom 0.5% of the remaining intervals. As Supplemental Fig. 2 shows, the impact on SE was modest even in areas with spikes. As expected, retaining some spikes reduced SE relative to the more stringent filter. Thus, to be confounding our result of increased SE, there would have to have been a dearth of such spikes within 20% of their predecessors in the IVH and mortality groups. Such a dearth would itself be a striking finding, but we saw spikes scattered across the population, including the much larger control population, and so do not believe this is a major factor.
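The two filtering rules compared above can be sketched in a few lines. This is an illustration under stated assumptions, not the Toolbox's preprocessing code: here each interval is compared with the immediately preceding raw interval (rather than the last retained one), and the percentile trim removes a fixed count of the smallest and largest remaining values. Function names are hypothetical.

```python
def drop_relative_jumps(rr, max_change=0.20):
    """Drop intervals that differ from the preceding raw interval by more than max_change."""
    if not rr:
        return []
    kept = [rr[0]]
    for prev, cur in zip(rr, rr[1:]):
        if abs(cur - prev) <= max_change * prev:
            kept.append(cur)
    return kept


def trim_extremes(rr, frac=0.005):
    """Remove the lowest and highest frac of intervals by value, preserving order."""
    k = int(len(rr) * frac)
    if k == 0:
        return list(rr)
    order = sorted(range(len(rr)), key=lambda i: rr[i])
    drop = set(order[:k]) | set(order[-k:])
    return [v for i, v in enumerate(rr) if i not in drop]


def stringent_filter(rr):
    """More stringent variant: 10% jump threshold followed by a 0.5% trim at each tail."""
    return trim_extremes(drop_relative_jumps(rr, max_change=0.10), frac=0.005)


# Synthetic R-R intervals in ms with one long and one short spike.
example = [400, 410, 405, 800, 400, 395, 200, 400]
default_kept = drop_relative_jumps(example)
trimmed = trim_extremes(list(range(1000)))
```

Note that, under the raw-predecessor convention assumed here, the normal beat following a spike is also rejected because it differs sharply from the spike itself. A last-retained-beat convention would keep it.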

A key limitation of our findings is that the exact timing of IVH is not known. Literature suggests most IVH in premature infants takes place in the first 48–72 HOL.5,8 As such, the subtly lower SE observed in the first 4–8 HOL in infants with adverse outcomes may be a marker for impending hemorrhage, an acute symptom of hemorrhage or a reflection of the clinical illnesses that tend to be found in cases of IVH. That this effect was so strong in the mortality group who may or may not have IVH as well suggests this difference in SE is a nonspecific reflection of clinical illness present early in life and correlated with both IVH and mortality.

The prominent increase we observed in SE around 72–96 HOL may then reflect a prolonged physiological reaction to the acute injury. That a rise in SE has also been documented from dexamethasone administration could hint at the underlying mechanism.45 Perhaps by this time, many infants who have suffered severe injuries are mounting an endogenous anti-inflammatory process that has a physiologic signature similar to a glucocorticoid effect. Potentially consistent with this hypothesis, a secondary analysis of the PREMILOC trial has shown endogenous cortisol levels were elevated around 24 HOL in infants with severe IVH relative to controls.47

There is no reason to expect IVH events to cluster around a specific HOL point, nor is there strong evidence about the shape of their statistical distribution over time. This variability in IVH timing may then explain the transition in our adverse outcome groups from slightly lower SE very early in life to notably and persistently higher SE by 48–72 HOL. Under this hypothesis, in IVH groups at the earliest HOL, most infants would be in the acute or premonitory phase with lower SE. Around 8–24 HOL, the population would be mixed with some infants already recovering from IVH or similar illness with higher SE and some still in the lower SE acute phase. By 72 HOL, the majority of IVH events may have occurred and so the population average would more consistently reflect higher SE.

Indeed, this uncertainty of IVH timing across our population filters the type of HRV changes detectable by these methods. Similar studies on the acute deterioration of sepsis hinge on the ability to identify a specific clinically relevant time point and align HRV data to it.12,48 Even if most IVH events in our population did occur in the first 48–72 HOL, that still leaves a broad range of time for a fairly brief clinical event. As such, by these methods, we could only have detected a change that is both prominent and persistent through an affected infant’s first days of life. A shorter lived pattern in SE or other HRV metrics could well exist around the time of or even before IVH events, but, being dispersed over different HOL points in different infants, such patterns would be averaged into the background in our current data. If the true timing of IVH events could be more narrowly defined, future research may find such detectable patterns in HRV around the time of IVH events that could be diagnostically useful.

To this end, we selected simple model features - the difference between latest SE and baseline or, for 4 HOL, just the average SE - for this analysis. A simple logistic regression and a bootstrapped test and train approach to estimate AUC were used to reduce the scope for overfitting and to improve the generalizability of findings for future research. Retrospective HRV studies often yield vast amounts of data with two to three R-R intervals per infant per second and various HRV metrics commonly estimated every 30 s. Datasets typically require tens of gigabytes or beyond and leave broad scope for overfitting when multiple features or complex models are used.49,50

Historically, HRV studies have struggled with a lack of standardized methods, which has made comparison among studies challenging.51 While a widely adopted, proprietary system could be one solution, another would be an open source standard like the PhysioNet Cardiovascular Signal Toolbox.31 It can be used to calculate a very wide range of HRV metrics including time domain, frequency domain and phase rectified signal averaging. It allows easy customization to a wide range of input data from neonates to adults. Broad adoption of this standard could allow HRV researchers to more readily collaborate.

To our knowledge, this is the first reported use of the PhysioNet HRV benchmark in young infants. Intriguing preliminary work has deployed the benchmark for intrauterine diagnosis.52,53 Entropy measures have gained great attention recently for their potential power in this population and favorable statistical properties.54,55 More widespread adoption of benchmark algorithms throughout the field would be welcome.

Limitations

This study had important limitations. It was conducted retrospectively at a single center with potentially limited variation in clinical care. Despite a fairly large sample size, there was only a modest incidence of the outcomes of severe IVH and mortality. This is an especially important limitation in our out of sample ROC AUC testing where a typical hold out sample had only a few infants with adverse outcomes and so limited the ability to estimate generalizability of HRV relative to the clinical model.

Not knowing the specific timing of IVH events is a clear limitation that biases our approach to detect only HRV changes that are persistent through time. We speculate but cannot confirm that our strongest predictive power from elevated SE at 96 HOL is likely after most IVH events have already occurred. It is possible lower SE may presage IVH or other acute events, but better determination of IVH timing would be needed to assess this.

Only the highest reported IVH grade was used as the outcome. Some infants may have had progressive hemorrhages over time. If acute events are associated with lower SE, it would be expected to drop toward the end of the expected time window for IVH, that is, roughly 48 HOL and beyond. If true, this would bias our results to under-reporting a potentially true rise in SE after early IVH events.

Timing of initiation of end of life care may also be a potential limitation. Severe HRV depression has long been associated with impending mortality.44 Infants transitioned fully to comfort care would have been disconnected from monitors and not contribute to HRV data. However, this practice and its timing relative to severe HRV depression was likely to have been highly variable. Half the infants in our sample died by the third DOL and therefore large HRV changes in a few infants could be skewing the results especially at later HOL.

In conclusion, HRV and, in particular, SE continue to show promise as diagnostic tools in neonatal populations. To our knowledge, this is the first report of elevated SE relative to controls in infants who experienced the adverse outcomes of IVH or mortality. While previous work with HRV has also shown some predictive power for IVH, we observed improvement on the predictions of common clinical models not seen in those studies.22,25 The uncertain timing of IVH is a major limitation which may be addressed in the future with more frequent HUS monitoring and larger populations. Future research would benefit from continued use of standardized, open-source HRV software like the PhysioNet Cardiovascular Signal Toolbox to facilitate collaboration among different centers.31 As bedside technology delivery improves, more research into algorithms that incorporate both clinical variables and high frequency data such as HRV is needed. NICU care of the future will include clinical decision making tools which integrate individual patient factors with multiple streams of physiologic data, of which HRV will be a core component. Use of standardized, open source HRV measures correlating with patient outcomes is a vital step in the innovation of patient care.