Introduction

Recent advances in neonatal care have led to a significant rise in the survival rate of preterm infants and improvements in their quality of life [25]. However, despite these advances, a significant proportion of very preterm infants (< 32 weeks) continue to have neurodevelopmental disability such as delayed neurodevelopment, cerebral palsy and sensory deficits [4]. Ongoing milder neurological impairment, reduced IQ, linguistic and motor skills, poor attention span and reduced social interaction abilities may also occur [5, 15, 17, 21, 27]. White matter injury is thought to be an important determinant of an adverse neurodevelopmental outcome [4]. Very preterm infants are at high risk of haemorrhagic, ischaemic or inflammation-induced white matter injury (WMI) and other brain damage [29]. Therefore, better detection and characterisation of these pathologies in early life may help to establish prognosis, anticipate needs and devise appropriate early interventions.

The most severe form of neonatal WMI is cystic periventricular leukomalacia (cPVL), which consists of localised cystic necrotic lesions and results in significant neurological deficits [4, 8]. During recent decades, the characteristics of WMI have shifted from classic PVL to more subtle or diffuse WMI. These lesions are detected radiologically on cranial ultrasound (cUS) as echogenicities, or on MRI as diffuse excessive high signal intensity (DEHSI) regions [28]. Echogenicity is defined as areas of ‘brightness’ of higher intensity than the choroid plexus, while DEHSI is white matter signal intensity greater than that of normal unmyelinated white matter on T2-weighted MRI images [30]. It is thought that these injuries are the result of damage to the late precursor oligodendrocytes and subsequent loss of axonal myelination [9, 29]. Although echogenicities on cUS have been shown to correlate with DEHSI on MRI, it is still unknown whether both are representative of the same phenomenon [30].

MRI is the gold-standard method for detecting neonatal brain injuries [4]. However, with recent improvements in machine capabilities, cUS has become an accurate and cost-effective technique for detecting most cystic lesions and is routinely used for sequential bedside screening without exposure to ionising radiation [4]. Previous studies have shown that MRI is more sensitive in detecting WMI than cUS [19, 28]. However, the prognostic accuracy of subtle white matter-related MRI abnormalities for long-term developmental outcomes is debatable [24, 30]. The aim of this study was to assess the reliability of a classification system for WMI in extremely preterm infants < 28 weeks gestation on both sequential cUS and term equivalent age (TEA) cUS, using MRI as a reference standard [29, 30]. We also investigated the accuracy of both cUS and MRI WMI scores in predicting long-term neurodevelopmental outcomes at 1 year and 3 years corrected age.

Materials and methods

This study was a retrospective cohort study conducted at a single tertiary Neonatal Intensive Care Unit (NICU) in Australia. The study was approved by ACT Health Human Research Ethics Committee (ETHR.14.194).

Patients

Extremely preterm infants < 28 weeks gestation, born between 1 January 2006 and 30 June 2014, were eligible for the study. Only those infants who had an MRI scan at term equivalent gestation were included in the study. It was routine practice to perform MRI at term equivalent age in all neonates < 28 weeks in our unit. General information, including date of birth, birth weight, gestational age (GA), date and GA of worst and term-equivalent cUS, date and GA of last brain MRI, and Bayley’s neurodevelopmental scores, was obtained for each patient from the NICU electronic database and patient medical records. Morbidity data including culture-positive sepsis, necrotising enterocolitis (NEC) (stage 2 and above), bronchopulmonary dysplasia (BPD), retinopathy of prematurity (ROP) needing laser treatment and patent ductus arteriosus (PDA) were also collected from the database.

Image collection and analysis

The list of eligible infants was obtained from the neonatal database of our unit. From this list, babies who had an MRI scan done at term equivalent age were included for final analysis. Serial cUS and MRI images for these patients were obtained from the radiology information system and picture archiving communication system (RIS-PACS) and reviewed by the investigators.

Ultrasound was performed on either a Philips iU22 using a curved array transducer probe 8/5 MHz, or a GE LOGIQ using a broad-spectrum linear transducer 9L-D 2–8 MHz. For all cranial ultrasounds, images were acquired through the anterior fontanel. At least eight images were acquired on coronal planes and eight on sagittal planes. MRI was performed on a Siemens MAGNETOM Avanto, 1.5 T, using the standard protocol for neonatal brains. The standard neonatal brain protocol is composed of a transverse T2-weighted (T2-W) sequence, T1-W sagittal (TR 1500 ms), fluid-attenuated inversion recovery (FLAIR; slice thickness 4 mm), T1-W transverse (TR 1500 ms), transverse susceptibility weighted imaging (SWI), diffusion weighted imaging (DWI) and space 3D T1 inversion recovery. Sagittal and transverse T1-W images allow assessment of midline brain structures, particularly the corona radiata and corpus callosum. Transverse T2-W and FLAIR images have been shown to be complementary in children. SWI is sensitive to changes in local field inhomogeneity and is valuable in trauma and vascular malformations. DWI is useful in examining acute white matter changes. All radiology images were reviewed by two primary investigators, Drs OK and RJ, with more than 15 years combined radiology experience. The radiologists were blinded to the babies’ gestation and long-term outcomes. Any disagreements in grading were resolved by consensus.

Only those with both an ultrasound and MRI were graded. Both the worst cUS (chosen from review of all serial cranial ultrasound scans in NICU) and term-corrected cUS were graded for all patients. The images were graded as per the classification system proposed by Leijser et al. [18]. The grading system is well acknowledged, and for consistency with established literature, no modifications were made (Table 1).

Table 1 Grading system of different brain injuries for cranial ultrasound and magnetic resonance imaging

Hyper-echogenicity visualised on ultrasound was defined as regions hyper-echoic to the choroid plexus. Ventricular index (VI) was measured from the lateral wall of the frontal horn of the lateral ventricles to the septum pellucidum.

Neurodevelopmental assessment

The Bayley Scales of Infant Development, Third Edition (BSID-III), was used for developmental assessment at 1 and 3 years of age. This assessment consists of five scales (cognition, receptive language, expressive language, fine motor and gross motor) and has been validated in the USA. It is currently used in the neonatal follow-up clinics in our unit. The composite scores of the BSID-III combine expressive and receptive language together, and gross and fine motor together. Abnormal neurodevelopmental outcome was defined as a diagnosis of cerebral palsy (any grade), sensorineural or conductive deafness requiring bilateral hearing aids or cochlear implants, bilateral blindness (vision < 6/60 in the better eye) or developmental delay on BSID-III. Developmental delay was defined as a scaled score < 7 on any of the five subscales of the BSID-III assessment [16]. The five subscales were used as they identify differences between the subsections of the language and the motor scales that may not be evident with the composite scores. Patients who were not testable on BSID-III due to significant neurosensory impairment and/or global developmental delay were given an overall score < 7. Where a BSID-III assessment was not available, a diagnosis of developmental delay by the developmental paediatrician at 1 and 3 years was considered an abnormal neurodevelopmental outcome. We also compared cUS and MRI grades with each of cognitive, motor and language delay (defined as a composite score < 85), a combined Bayley’s - CB III score < 80 (average of cognitive and language scores), global developmental delay (composite scores < 70 in all domains) and a diagnosis of cerebral palsy. Median composite scores for each domain were also compared between those with and without brain injury on worst and TEA cUS and TEA MRI scans.
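The outcome definitions above can be expressed as a few simple rules. The following Python sketch is illustrative only and is not the code used in this study; the function names, dictionary keys and example scores are assumptions introduced here, while the cut-offs (scaled score < 7, composite score < 85, CB III < 80, composite scores < 70 in all domains) are those stated in the text.

```python
from typing import Mapping, Optional

# Hypothetical key names for the five BSID-III subscales and three composites.
SUBSCALES = ("cognition", "receptive_language", "expressive_language",
             "fine_motor", "gross_motor")
DOMAINS = ("cognitive", "language", "motor")


def subscale_delay(scaled: Mapping[str, float]) -> bool:
    """Developmental delay: scaled score < 7 on any of the five subscales."""
    return any(scaled[name] < 7 for name in SUBSCALES)


def domain_delays(composite: Mapping[str, float]) -> dict:
    """Per-domain delay: composite score < 85."""
    return {domain: composite[domain] < 85 for domain in DOMAINS}


def cb3_score(composite: Mapping[str, float]) -> float:
    """Combined Bayley (CB III): average of cognitive and language composites."""
    return (composite["cognitive"] + composite["language"]) / 2


def global_delay(composite: Mapping[str, float]) -> bool:
    """Global developmental delay: composite score < 70 in all domains."""
    return all(composite[domain] < 70 for domain in DOMAINS)


def abnormal_outcome(scaled: Optional[Mapping[str, float]],
                     cerebral_palsy: bool = False,
                     deafness: bool = False,
                     blindness: bool = False,
                     paediatrician_delay: bool = False) -> bool:
    """Composite abnormal outcome as defined in the text.

    When no BSID-III assessment is available (scaled is None), the
    developmental paediatrician's diagnosis is used instead.
    """
    if cerebral_palsy or deafness or blindness:
        return True
    if scaled is None:
        return paediatrician_delay
    return subscale_delay(scaled)


# Made-up example scores, not patient data.
example_scaled = {"cognition": 9, "receptive_language": 8,
                  "expressive_language": 6, "fine_motor": 10, "gross_motor": 9}
example_composite = {"cognitive": 95, "language": 83, "motor": 97}
```

In this made-up example, the expressive language scaled score of 6 alone is enough to classify the infant as delayed, even though only the language composite falls below 85. This is the kind of subscale-level difference the five-subscale definition is intended to capture.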

Statistical analysis

Sensitivity, specificity, PPV and NPV of individual cUS grades in comparison with MRI grades (gold standard) were calculated using 2 × 2 tables. Sensitivity, specificity, PPV and NPV were also calculated for individual cUS and MRI grades and long-term neurodevelopmental disability at both 1 and 3 years. Chi-square test was used to compare cUS and MRI grades to delay in individual developmental domains. Median composite scores of babies with no brain injury were compared to those with any injury using Mann-Whitney test. Multivariate logistic regression analysis was used to determine independent association of radiological injury on cUS or MRI on neurodevelopment after adjusting for known confounding factors including gestation < 26 weeks, weight < 750 g, culture-positive sepsis, NEC stage two and above, ROP needing laser and haemodynamically significant PDA. Statistical software used was SPSS (version 25.0). p < 0.05 was considered significant.
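For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, PPV and NPV follow from a 2 × 2 table of index test (e.g. cUS) against reference standard (e.g. MRI). It is a minimal illustration, not the SPSS procedure used in this study; the class and function names are assumptions, and the counts in the example are invented and do not correspond to any table in this paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # index test positive, reference positive
    fp: int  # index test positive, reference negative
    fn: int  # index test negative, reference positive
    tn: int  # index test negative, reference negative

    @property
    def sensitivity(self) -> Optional[float]:
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def specificity(self) -> Optional[float]:
        return _ratio(self.tn, self.tn + self.fp)

    @property
    def ppv(self) -> Optional[float]:
        return _ratio(self.tp, self.tp + self.fp)

    @property
    def npv(self) -> Optional[float]:
        return _ratio(self.tn, self.tn + self.fn)


def tabulate(index_test: Sequence[bool], reference: Sequence[bool]) -> TwoByTwo:
    """Build the 2 x 2 table from paired per-patient results."""
    if len(index_test) != len(reference):
        raise ValueError("index_test and reference must have the same length")
    pairs = list(zip(index_test, reference))
    return TwoByTwo(
        tp=sum(1 for i, r in pairs if i and r),
        fp=sum(1 for i, r in pairs if i and not r),
        fn=sum(1 for i, r in pairs if not i and r),
        tn=sum(1 for i, r in pairs if not i and not r),
    )


# Invented counts for illustration only (not study data).
table = TwoByTwo(tp=8, fp=2, fn=4, tn=26)
```

With these invented counts, sensitivity is 8/12, specificity is 26/28, PPV is 8/10 and NPV is 26/30. When only a handful of infants fall in a given cell, as with the severe grades in a small cohort, each of these proportions can change substantially with the reclassification of a single patient.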

Results

Patients

Eighty-six patients were eligible for the study, having received both a cUS and MRI. Only 75 of these cases were available for grading. In the rest, the MRI images could not be located. Table 2 shows the general characteristics of the neonates included in this study.

Table 2 Characteristics of study population

Abnormalities were identified on the worst of the sequential cUS in 37 of the 75 cases and on term cUS in 29 cases, while 24 had abnormal MRI changes. The images were analysed and the grades are shown in Table 3. Of the patients with TEA ultrasound abnormalities, all had increased periventricular echogenicity, nine patients had periventricular cysts/cavities and three patients had a ventricular index (VI) > 13 mm. Of the 24 patients with MRI abnormalities, all had white matter changes, 10 had periventricular cysts and 3 had VI > 13 mm (as seen on cUS). Comparative graded injuries of the same patients using cUS and MRI are shown in Fig. 1.

Table 3 Comparison of number of neonates with brain injury detected on cranial ultrasound and/or magnetic resonance imaging

Fig. 1 Grade 1 injury as visualised on coronal cranial ultrasound (a) demonstrating asymmetrical lateral ventricles (arrows). T2-weighted magnetic resonance imaging (b) from a different patient showing normal appearing white matter and prominent lateral ventricles (arrows). Grade 2 injury on coronal cranial ultrasound (c) demonstrating small, localised periventricular cysts (arrow). T2-weighted magnetic resonance imaging (d) from the same patient showing inhomogeneous DEHSI and a VI of 14 mm, as indicated (arrows). Grade 3 injury on coronal cranial ultrasound (e) demonstrating multicystic lesions (arrow), grade IV germinal matrix haemorrhage and enlarged lateral ventricles. T2-weighted magnetic resonance imaging (f) from the same patient showing haemorrhagic and cystic lesions (arrows), periventricular leukomalacia and ventriculomegaly (VI > 15 mm)

Predictive value of cUS for MRI changes

The sensitivity, specificity, positive and negative predictive values for cUS as compared to MRI are shown in Table 4. The table shows a high PPV for severely abnormal term-equivalent age (TEA) cUS, but a lower PPV in infants with mild to moderately abnormal TEA cUS. The predictive value of severe injuries on the worst cUS was lower than that of severe injuries on TEA cUS. Absence of a moderate to severe injury was a good predictor of having a normal MRI (NPV). Overall, any injury on serial cUS or term cUS had limited value in predicting an abnormal MRI.

Table 4 Predictive values of cranial ultrasound brain injury compared to magnetic resonance imaging (gold standard), n = 75

Predictive value of cUS and MRI for neurodevelopmental outcome

Neurodevelopmental outcomes at 1 and 3 years corrected age were available for 68 and 57 of the neonates, respectively. The remaining neonates (including one with grade 3 change on both ultrasound and MRI, one with grade 2 changes on cUS and grade 3 change on MRI, and several with grade 1 or 2 changes on either cUS or MRI) did not have follow-up at The Canberra Hospital, and their records were not available for this study. Thirty (45%) infants had abnormal neurodevelopment at 12 months, while 17 (29.8%) infants were abnormal at 36 months. The sensitivity, specificity, positive and negative predictive values for different grades of cUS and MRI in predicting abnormal neurodevelopment are shown in Table 5. Severe grade 3 injuries on TEA cUS had high predictive value for abnormal neurodevelopment at both 1 and 3 years of age. All grades of MRI and worst serial cUS injuries poorly predicted abnormal neurodevelopment at both 1 and 3 years. Absence of an injury on either cranial ultrasound or MRI did not predict a normal outcome.

Table 5 Predictive value of different grades of cUS and MRI lesions in determining neurodevelopmental outcome

Bivariate analysis comparing individual grades, as well as any injury on worst serial cUS, TEA cUS and MRI, did not show any statistically significant association with neurodevelopmental outcomes, including composite scores, at either 1 or 3 years. There was no statistically significant difference in median composite scores between those with and without brain injury on worst and TEA cUS and TEA MRI scans. Multiple logistic regression also did not show a significant association between imaging injury and neurodevelopmental outcomes at 1 and 3 years. The only factor significantly associated with abnormal neurodevelopment was ROP needing laser surgery (p = 0.001).

Discussion

This study retrospectively assessed and compared cranial ultrasound at near-term corrected age and MRI at term corrected age using a grading system that included white matter changes as well as other changes thought to be related to white matter injury. Sensitivity, specificity and predictive values of serial cUS and TEA cUS in detecting brain injuries were compared to the gold standard, MRI. Additionally, the predictive values of cUS and MRI findings for long-term neurodevelopmental outcome were assessed to establish whether cUS is sufficient for predicting neurodevelopmental outcome without the need for MRI. We used a classification system based on the one used by Leijser et al. [18].

We found that term cUS had good PPV in predicting MRI brain injury compared to worst cUS. All grade 3 injuries at near-term cUS had similar injuries on MRI. Three out of five grade 3 injuries on serial cUS improved or resolved by term, thus decreasing the PPV of early cUS. This also highlights the importance of doing a repeat cUS scan at TEA, as many white matter changes in preterm neonates are transient in nature. All grades of term cUS and worst cUS had good NPV for finding changes on MRI, suggesting that most neonates who had normal cUS also had normal MRI. Our findings are similar to previously published studies. Leijser et al. [18] compared sequential cUS from birth and TEA with MRI performed at term in 110 preterm infants (< 32 weeks). They found that PPV for TEA cUS was high for severely abnormal brain injury on MRI, but not for mild/moderate injury. Similarly, Horsch et al. [12] compared paired cUS and MRI done at term-equivalent age and showed that all severe abnormalities identified on MRI were also detected by cUS at term. Moreover, Rademaker et al. [23] found that subtle WMIs are more detectable on MRI than cUS, but that a normal ultrasound excluded a severe MRI lesion in almost all cases. There are several explanations for these results. MRI is a high-resolution technique that systematically acquires a series of images of the whole brain. In contrast, ultrasound images are highly operator dependent. Although there is a standard protocol for performing cranial ultrasounds, there are variations in technique between sonographers. The appearance of the images may be susceptible to the settings used on the ultrasound machine, especially the gain and time gain compensation, which may significantly alter the appearance of white matter. This makes assessment of subtle white matter abnormalities less reliable.

With regard to neurodevelopmental outcomes, we found that all grades of MRI, including grade 3 injury, were poorly predictive of neurodevelopmental impairment at both 1 and 3 years. We saw similar results for sequential cUS. Interestingly, severe (grade 3) injury on term cUS was highly predictive of abnormal neurodevelopment at both 1 and 3 years. The negative predictive value for MRI was also low compared to cUS, suggesting that several babies had abnormal neurodevelopment despite a normal MRI. Nineteen babies with no changes on MRI had abnormal neurodevelopmental outcome at 1 year, 16 of whom also had a normal term equivalent cranial ultrasound. These figures were 12 and 8 respectively at 3 years. This is consistent with previous studies [6, 10, 23, 24, 31]. It is possible that imaging only evaluates visible abnormal anatomical morphology and does not account for radiologically occult factors (such as gestation and comorbidities) as well as later childhood influences when assessing outcomes. Furthermore, brain growth in extreme preterm infants may be globally delayed even without overt WMI [1, 13, 14].

Other studies [12, 20] have also found cranial ultrasound and MRI to be equally predictive of cerebral palsy and early childhood neurodevelopmental outcomes in preterm infants. More recently, Edwards et al. [7], in a large cohort study of 511 preterm infants less than 33 weeks gestation, showed that MRI predicted abnormal neurodevelopment at 20 months only slightly better than cUS (0.74 vs 0.64). Skiöld et al. [26] also found that while MRI is sensitive in detecting brain injuries, very preterm infants had poorer performance overall on BSID-III at 30 months than term-born controls regardless of whether a brain injury was seen on MRI or not. We chose to use a BSID composite score < 85 as one of the outcome measures, as a previous study showed that BSID scores < 70 could underestimate neurological disability [16].

Other studies [2, 3] have shown a correlation between abnormal MRI at term equivalent age and abnormal neurodevelopment. Cheong et al., in a cohort of 197 moderate and late preterm infants, showed that larger total brain tissue, white matter and cerebellar volumes at term-equivalent age are associated with better neurodevelopment. Similarly, Brouwer et al. demonstrated that higher global brain abnormality scoring was associated with poor motor outcome and learning performance. These studies used volumetric brain measurements including both subcortical and deep grey matter as well as posterior limb of internal capsule (PLIC) myelination. Our scoring did not use volumetric measurements, which may explain the difference. We did, however, look at PLIC myelination separately but did not find any difference in neurodevelopmental outcomes between babies with no/sparse myelination and those with normal or moderate myelination. Most NICUs around the world would not have access to these advanced volumetric measurements or paediatric neuroradiology expertise; hence, we believe that our findings are relevant to most units.

There are several limitations of our study. Of the 156 eligible babies, nearly half did not have a term MRI due to death or transfer to regional hospitals, and hence were excluded from the study. Whilst several babies had grade 2 injury on both cUS and MRI, only two babies had grade 3 injury on TEA cUS and five babies had grade 3 injury on TEA MRI, which makes it difficult to draw a definite conclusion. Secondly, this was a retrospective study limited to a single centre. In this study, the cUS and MRI were not obtained on the same day; however, our radiologists examined serial cUS, identifying both the worst cUS and the term cUS to allow for the most accurate assessment. Furthermore, BSID-III scores were absent for several of the infants, due to failure to return for neurodevelopmental assessment, incomplete Bayley’s examination or infant mortality. For the infants with incomplete Bayley’s tests, the clinical notes, letters and other medical records were consulted to identify any physical or cognitive developmental abnormalities. In almost all cases where Bayley’s examinations were incomplete due to non-compliance of the infant, it was noted that there was some degree of developmental delay and this was considered as a positive result. In this study, the outcomes were assessed at a relatively young age (12–36 months). Some babies with subtle white matter changes may have mild cognitive deficits and impaired school performance, which were not considered in our investigation. Cerebellar injury in preterm infants is increasingly recognised as being associated with neuro-motor, behavioural and cognitive delays [10]. The cerebellum can be visualised on cUS with mastoid views. None of our cUS scans showed cerebellar bleeds; however, the majority of our images did not have mastoid views, as these did not become part of routine views until 2015. Nevertheless, we did not find any cerebellar bleeds even on term MRI in our cohort.

Despite these limitations, our study had many strengths. We evaluated extremely preterm neonates < 28 weeks gestational age—infants with the greatest risk of neurodevelopmental abnormalities. Our grading system enabled accurate comparison of cUS and MRI images, using a simple and validated classification that also included other changes related to WMI. All images were scored by experienced radiologists and double-checked to ensure grading consistency, and both the radiologists were blinded to neurological outcomes.

Conclusions

This study demonstrates that TEA cUS can reliably identify severe brain abnormalities that would be seen on MRI and positively predict abnormal neurodevelopment at both 1 and 3 years. Although MRI can pick up more subtle abnormalities that may be missed on cUS, their predictive value for neurodevelopmental impairment is poor. There is insufficient evidence that the routine use of term-equivalent or discharge screening brain MRIs in preterm infants improves long-term outcome [11]. The predictive uncertainty of these tests can potentially have a significant mental and social impact on the parents of these vulnerable infants [22]. Advanced neuroimaging techniques like diffusion tensor imaging, cortical surface area and cerebral volumetric measures [10] may improve prognostic abilities in very preterm infants. However, they are not widely available or easily interpretable, and hence their use should be limited to research settings only. Alternative tools like general movement assessment, ongoing assessment of infants at high risk of abnormal neurodevelopment until school age, and timely referral and initiation of early intervention provide the most value for our vulnerable preterm population.