Introduction

For nearly three decades, the rate of preterm birth (before 37 weeks of gestation) and/or low birth weight (below 2500 g) steadily increased worldwide [1,2,3]. Currently, 10% of all live births in developed countries are preterm [1, 4, 5]. Concomitantly, survival rates for very early preterm births have risen, and radiologists increasingly encounter adult survivors of preterm birth [6]. This matters particularly because survivors of very preterm birth or very low birth weight (VP/VLBW, < 32 weeks of gestation and/or birth weight < 1500 g) are predisposed to long-term neurodevelopmental disabilities that persist into adulthood [6,7,8,9,10,11,12]. Therefore, knowledge of brain structural changes due to preterm birth is crucial to avoid misdiagnoses.

The immature brain is susceptible to the consequences of preterm birth, and abnormal brain development may contribute to neurodevelopmental symptoms that manifest throughout life after preterm birth [13]. MRI detects structural and functional brain alterations related to preterm birth [6, 8, 14,15,16]. In children, conventional MRI uncovers abnormal signal intensity in white matter, particularly in periventricular areas and along the visual pathways, undulating ventricular borders, ventriculomegaly, cerebral atrophy, thinning of the corpus callosum, and delayed myelination [15, 17, 18]. Special MRI methods such as relaxometry, water fraction, or diffusion measurements relate the white matter lesions to axonal or oligodendrocyte injury and abnormal structural connectivity [19,20,21], while functional MRI relates these structural findings to impaired functional connectivity [22]. Volpe [23] termed the sum of these white and gray matter abnormalities “the encephalopathy of prematurity” and identified it as the principal determinant of neurodevelopmental outcome.

Brain abnormalities are well documented in newborns, infants, and children and persist into adult life [12]. Compared with their peers born with normal birth weight, VLBW adults have a higher incidence of neurosensory deficits, a greater burden of illness, lower IQ scores, and poorer educational achievement [12, 24,25,26], as well as a lower quality of life [27]. However, few studies describe the adult preterm brain from a neuroradiological perspective [28,29,30,31]. We hypothesized that the brain abnormalities still present in adulthood would form a specific MRI pattern related to premature birth. Therefore, we aimed to investigate the structural brain abnormalities and the diagnostic accuracy of their qualitative and quantitative analysis in term born and VP/VLBW adults.

Methods

Subjects

This study is part of the Bavarian Longitudinal Study (BLS), a prospective, geographically defined whole-population study of neonatal at-risk children who were followed up from birth into early adulthood [32,33,34,35,36]. The assessment in adulthood included brain MRI, where feasible, performed between Jan 2011 and Dec 2013 at two sites. The present work is a cross-sectional analysis of the MRI data of all 67 subjects scanned at one of the sites: 40 VP/VLBW and 27 term born controls (Table 1). Allocation to one of these groups was based on information collected at birth. By design, birth weight and gestational age were significantly lower in the VP/VLBW group. Full-scale IQ was assessed in adulthood with the German version of the Wechsler Adult Intelligence Scale (WAIS III) [37].

Table 1 Background characteristics of VP/VLBW and term control study participants. p values are derived from comparing control and preterm groups by t tests or Fisher’s exact test (for sex)

The study was approved by the local ethics committees and all participants gave written informed consent.

Image acquisition

MRI assessments were carried out on a 3-T scanner (Philips Achieva or Ingenia, Philips Healthcare) with an 8-channel head coil. Whole brain, high-resolution isotropic T1-weighted (T1-w), 3D fluid-attenuated inversion recovery (FLAIR), and 3D T2-weighted (T2-w) images were acquired (Supplementary Table 1). For both FLAIR and T2-w sequences, we used a fast spin echo acquisition based on the variable refocusing flip angle technique (sweep technique) [38].

Data analysis

We evaluated the MR images quantitatively, through automatic segmentation and manual measurements, and qualitatively, through visual ratings of the white matter and lateral ventricle aspect. All evaluations were done blinded to group membership (subject’s birth status) (Fig. 1).

Fig. 1

Analysis summary. On T1-w (a) and FLAIR images (c), we visually noted the presence of deformed lateral ventricles with billowed or bulged lateral walls of the posterior lateral ventricle (arrows in (a)) as well as the presence of white matter lesions or of regions with “dirty” appearing white matter (with intermediate signal intensity between lesions and normal-appearing white matter) (arrowheads in (c)). We also manually measured the widths of splenium (sw), isthmus (iw), body (cw), and genu (gw) of the corpus callosum (d) and automatically measured the volumes (b) of deep (red) and cortical gray matter (blue), white matter (yellow), lateral ventricles (green) as well as anterior and posterior corpus callosum (orange). Dotted vertical line in (d) delimits the anterior and posterior corpus callosum

For the quantitative analysis, a trained neuroscientist (AJ with 14 years of experience) automatically segmented and analyzed the volumes of lateral ventricles, white matter and gray matter (cortical and deep), and midline corpus callosum (anterior and posterior halves), all normalized for individual head size. This was done using FSL [39] (FMRIB Software Library v5.0, https://fsl.fmrib.ox.ac.uk/fsl) and its tools SIENAX and FIRST. The main steps involved extracting brain and skull images from the single whole-head 3D T1-weighted data, affine-registration to Montreal Neurological Institute (MNI) space (using the skull image to determine the registration scaling), tissue-type segmentation with partial volume estimation, masking with standard space masks (for lateral ventricles and corpus callosum), and, finally, extraction of normalized segmented volumes.
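The head-size normalization described above can be illustrated with a minimal Python sketch. It assumes that the skull-constrained affine registration yields one volumetric scaling factor per subject and that a normalized volume is simply the native-space volume multiplied by that factor. The function names and the numbers are hypothetical and are not study data.

```python
def normalize_volume(native_volume_mm3: float, scaling_factor: float) -> float:
    """Scale a native-space volume to standard space (head-size normalization)."""
    if native_volume_mm3 < 0 or scaling_factor <= 0:
        raise ValueError("volume must be non-negative and scaling factor positive")
    return native_volume_mm3 * scaling_factor


def mm3_to_cm3(volume_mm3: float) -> float:
    """Convert cubic millimetres to cubic centimetres."""
    return volume_mm3 / 1000.0


# Hypothetical subject with a small head: the scaling factor is above 1,
# so the normalized volume is larger than the native-space volume.
native_ventricles_mm3 = 20000.0
scaling = 1.25
normalized_cm3 = mm3_to_cm3(normalize_volume(native_ventricles_mm3, scaling))
print(normalized_cm3)
```

Normalizing in this way lets volumes be compared between groups even when absolute brain size differs, which is relevant here because the VP/VLBW group had smaller absolute brain volumes.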

Manual measurements and qualitative analysis were done on the local PACS, on CE certified diagnostic monitors. Blinded to subject’s birth status, a board-certified neuroradiologist (EH with 16 years of experience) and a resident radiologist (MM with 5 years of experience) evaluated the T1w, FLAIR, and T2w 3D images in consensus with regard to the width of the corpus callosum, the aspect of the lateral ventricles (normal or deformed/enlarged), and the presence/absence of white matter lesions or abnormalities (Fig. 1). They manually measured the widths of the splenium, isthmus, body, and genu of the corpus callosum. They also visually noted the presence of deformed lateral ventricles with billowed or bulged lateral walls of the trigone and posterior horn of the lateral ventricle, as well as the presence of white matter lesions or of regions with “dirty”-appearing white matter, defined according to Ge et al. [40] as diffuse or patchy moderate T2-hyperintensity with a signal intensity between that of lesions and that of normal-appearing white matter.

To compute inter-rater agreement, a third rater (VK, a board-certified neuroradiologist with 8 years of experience) performed a separate additional reading of the 3D T1w, T2w, and FLAIR images from all the subjects.

Model selection and specification

As the dependent variable was binary (preterm birth), the appropriate generalized linear model was logistic regression. The quantitative and qualitative variables resulting from automated segmentation, manual measurements, and qualitative ratings were used as explanatory variables in several prediction models. A detailed description of all the models is presented in Supplementary Table 2. As the goodness of fit of such a model should not be measured by any form of R-squared ([41], page 425), the area under the curve (AUC) was chosen as the more appropriate measure for model comparison. As the explanatory variables might be correlated, a model had to be found that yields an acceptable goodness of fit while avoiding collinearity, because collinearity would disguise potentially relevant factors as “not statistically significant.”
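Because model comparison rests on the AUC, a short sketch of how it can be computed may help. The AUC equals the probability that a randomly chosen VP/VLBW subject receives a higher predicted probability than a randomly chosen control, with ties counted as one half. The function name and the scores below are illustrative assumptions, not the study's code or data.

```python
def auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison of predicted scores."""
    if not scores_positive or not scores_negative:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical predicted probabilities of preterm birth from a fitted model.
preterm_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2]
print(auc(preterm_scores, control_scores))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.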

Model sensitivity

The final model comprised the smallest set of explanatory variables yielding a goodness of fit that was not statistically significantly different from that of the full model, and thus fulfilled the “law of parsimony” (Occam’s razor).

Statistics

To test for differences between VP/VLBW adults and controls, we used independent samples t tests (continuous variables) and Fisher’s exact test (categorical variables). Except for the volume of the lateral ventricles, we could not find any statistically significant deviation from a normal distribution (Shapiro-Wilk tests) in any of the variables involved in the modeling. For the volume of the lateral ventricles, the Mann-Whitney U test gave results similar to those of the t test. We evaluated the inter-rater agreement by means of Krippendorff’s alpha coefficient. We analyzed diagnostic accuracy with logistic regression and receiver operating characteristic (ROC) analysis and report performance at an optimal cut-point of 0.5, chosen to maximize sensitivity and specificity simultaneously. The models’ accuracy and data robustness were validated via bootstrapping [42] with 10,000 resamples. We used R Statistics (version 3.3.2 for Mac OS X, R Core Team 2016 www.R-project.org) and STATA software (StataIC 14, StataCorp 2017). p values are presented both uncorrected and corrected for multiple comparisons with the false discovery rate method [43]. For all analyses, we set statistical significance at p value < 0.05, and we report means and ranges unless otherwise specified.
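The false discovery rate correction can be sketched as follows. The sketch assumes the Benjamini-Hochberg step-up procedure, which is the most widely used FDR method; the text does not state which variant was applied, so this is an assumption. The function name and the p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical uncorrected p values from four comparisons.
raw = [0.01, 0.04, 0.03, 0.5]
print(benjamini_hochberg(raw))
```

A result that is significant before correction (for example 0.04 above) can lose significance after adjustment, which is the pattern reported for the manual measurements and visual evaluations.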

Results

Figure 2 depicts the results of the visual rating, of manual measurements, and of automatic segmentation. Exemplary images of lateral ventricles from all VP/VLBW and control participants are presented in Fig. 3.

Fig. 2

Results of the automatic segmentation (volumes), manual measurements (width) and visual rating analyses (aspect) including uncorrected p values from t tests or, for proportions, Fisher exact tests

Fig. 3

Exemplary images of lateral ventricles from all controls (on the right) and VP/VLBW patients (on the left). To enable easy visual evaluation, all images have been registered to a standard brain space, the Montreal Neurological Institute (MNI) space

Visual evaluation showed that, compared with controls, VP/VLBW subjects significantly more often had enlarged and deformed lateral ventricles (65% of VP/VLBW and 33% of controls, odds ratio = 3.31, p = 0.036), with lateral walls of the posterior lateral ventricle (posterior cella media, trigonum, and occipital horn) billowed or bulged towards the white matter, and tended to have more white matter alterations (52% of VP/VLBW and 26% of controls, odds ratio = 3.08, p = 0.06). Manual measurements revealed a thinner isthmus of the corpus callosum in VP/VLBW subjects (p = 0.048). After correcting for multiple comparisons, manual measurements and visual evaluations were no longer significant.
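For readers who want to see how proportions such as these are compared, the following is a minimal sketch of a two-sided Fisher's exact test and a simple cross-product odds ratio for a 2x2 table. The counts used are hypothetical and are not the study's data; the function names are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k counts in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(k) for k in range(lowest, highest + 1)
                if prob(k) <= p_observed * (1 + 1e-7))
    return min(1.0, total)


def sample_odds_ratio(a, b, c, d):
    """Cross-product odds ratio; requires non-zero off-diagonal cells."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical table: rows are groups, columns are finding present/absent.
print(fisher_exact_two_sided(3, 1, 1, 3), sample_odds_ratio(3, 1, 1, 3))
```

Some statistical software reports a conditional maximum-likelihood estimate of the odds ratio, which can differ from the simple cross-product ratio computed here.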

The inter-rater analysis revealed poor agreement for the visual evaluation: 49% agreement (Krippendorff’s alpha −0.018, p = 0.881) on rating the lateral ventricles as deformed and 63% agreement (Krippendorff’s alpha 0.214, p = 0.085) on rating the white matter aspect as altered. However, we found an excellent agreement for the manual measurements: 99%, 97%, 96%, and 97% agreement, with Krippendorff’s alpha of 0.832, 0.467, 0.434, and 0.672 (all p values < 0.001), for the width of the genu, anterior part, posterior part (isthmus), and splenium of the corpus callosum, respectively.
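To illustrate why percent agreement and Krippendorff's alpha can diverge, the following sketch computes alpha for nominal ratings from two raters without missing values. Alpha corrects the observed disagreement for the disagreement expected by chance, so a moderate percent agreement can still yield an alpha near zero. The function name and the ratings are toy assumptions, not the study's readings.

```python
from collections import Counter


def krippendorff_alpha_nominal(rater_a, rater_b):
    """Krippendorff's alpha for two raters, nominal data, no missing values."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of units")
    n_values = 2 * len(rater_a)  # total number of pairable ratings
    # Each disagreeing unit contributes two ordered mismatching pairs.
    observed = 2 * sum(a != b for a, b in zip(rater_a, rater_b))
    totals = Counter(rater_a) + Counter(rater_b)
    # Number of mismatching ordered pairs expected from the category totals.
    expected = sum(totals[c] * totals[k]
                   for c in totals for k in totals if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1.0 - (n_values - 1) * observed / expected


# Toy binary ratings (1 = finding present) for ten subjects.
rater_1 = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
rater_2 = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
percent_agreement = sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)
print(percent_agreement, krippendorff_alpha_nominal(rater_1, rater_2))
```

In this toy example the raters agree on 60% of subjects, yet alpha is below 0.1 because most of that agreement is expected by chance given how often each category is used.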

All the preterm born subjects (n = 40) and two term born subjects (n = 2) underwent cranial ultrasound at birth for detection of perinatal brain injury. Of these, four subjects (all preterm) had signs of intracranial hemorrhage on the ultrasound images: two of grade III, one of grade II, and one of grade I. The two subjects with grade III intracranial hemorrhage showed the following associated alterations on the adulthood MRI: in one subject, right periventricular T2-hyperintense lesions with marginal gliosis and consequent compensatory enlargement of the right lateral ventricle towards this defect; in the other subject, prominent enlargement of the lateral ventricles and overall thinned periventricular white matter. Adulthood brain imaging of the other two preterm born subjects, with grade I and II intracranial hemorrhage, did not indicate any associated structural or ventricular alterations.

In adulthood, none of the subjects in our cohort had any neurological symptoms of CSF circulation disturbance. The neuro-cognitive performance of the VP/VLBW individuals in adulthood was the subject of another work [25] but we can confirm in our sample that VP/VLBW had significantly lower IQ scores compared to term born controls (p = 0.002, Table 1).

Automatic segmentation (normalized for head size) revealed higher lateral ventricle volume (p = 0.003), lower posterior callosal volume (p = 0.001), and lower deep gray matter volume (p = 0.001) in the VP/VLBW group. White matter volume, cortical gray matter volume, and the size of the anterior corpus callosum did not differ between VP/VLBW and controls (Table 2). Absolute (non-normalized) brain volumes were lower in VP/VLBW than in controls (1181 cm3 in VP/VLBW versus 1280 cm3 in term born controls, p = 0.0007).

Table 2 MRI findings. p values are derived from comparing control and preterm groups by t tests (continuous variables) and Fisher’s exact tests (categorical variables) and are presented as values uncorrected and corrected for multiple comparisons with the false discovery rate method [43]

A logistic regression model including all variables correctly classified 87% of cases, with 88% sensitivity and 85% specificity. Models that included only automatic segmentation, or automatic segmentation and manual measurements, performed similarly to the full model (p = 0.32 and p = 0.97, respectively, Table 3), with automatic segmentation alone achieving the second best performance among all models with an accuracy of 84%. Visual inspection alone had the lowest classification accuracy (63%) and a specificity of 41%. Based on ROC comparison, models that included only neuroradiological parameters (visual ratings and/or manual measurements) were significantly weaker (p < 0.05, Table 3).
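The relation between accuracy, sensitivity, and specificity at the 0.5 cut-point can be made concrete with a short sketch. The confusion-matrix counts used below (35 of 40 VP/VLBW and 23 of 27 controls correctly classified) are back-calculated from the reported percentages for the full model; they are an assumption and are not stated in the text. The function names are illustrative.

```python
def metrics_from_counts(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def metrics_at_cut_point(probabilities, labels, cut_point=0.5):
    """Classify as positive when the predicted probability reaches the cut-point."""
    tp = fn = tn = fp = 0
    for p, y in zip(probabilities, labels):
        predicted = 1 if p >= cut_point else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return metrics_from_counts(tp, fn, tn, fp)


# Assumed counts, back-calculated from the reported 88% / 85% / 87%.
full_model = metrics_from_counts(tp=35, fn=5, tn=23, fp=4)
print({name: round(100 * value, 1) for name, value in full_model.items()})
```

With 40 VP/VLBW subjects and 27 controls, accuracy is a weighted average of sensitivity and specificity, which is why a model with low specificity can still reach a moderate overall accuracy.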

Table 3 Diagnostic performance of models including MRI variables; optimal cut-point was set to 0.5 to maximize simultaneously both sensitivity and specificity

The variance inflation factor, an index that measures the severity of collinearity (linear relationship) among explanatory variables, showed excessive collinearity, which can complicate or prevent the identification of an optimal set of explanatory variables for a statistical model. Including only a subset of representative variables to reduce collinearity and performing a forward stepwise multivariate logistic regression resulted in a model with three independent variables (volume of deep gray matter, ventricular volume, and white matter aspect) that correctly classified 81% of cases with 85% sensitivity and 74% specificity. However, this model was significantly weaker than the full model (p = 0.02, Table 3).
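As a reminder of what the variance inflation factor measures, the sketch below computes it as 1 / (1 - R²), where R² comes from regressing one explanatory variable on the others. For brevity it covers only the special case of two explanatory variables, where R² equals the squared Pearson correlation between them. The function names and values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_from_r_squared(r_squared):
    """Variance inflation factor for a predictor with the given R-squared."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R-squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


# Hypothetical values of two correlated explanatory variables.
variable_1 = [1.0, 2.0, 3.0, 4.0]
variable_2 = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(variable_1, variable_2)
print(r, vif_from_r_squared(r ** 2))
```

A factor of 1 indicates no collinearity, and the factor grows without bound as a variable becomes a linear combination of the others.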

Results from the bootstrap validation confirmed the robustness of all our logistic regression models (Table 3). Details of all the logistic regression models analyzed and the variables in the models, including the odds ratio, are presented in Supplementary Table 2.
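A bootstrap check of this kind can be sketched as follows. The text does not specify the bootstrap scheme, so the sketch assumes a percentile interval for the AUC with resampling done separately within each group; the function names, seed, and scores are hypothetical.

```python
import random


def auc(scores_positive, scores_negative):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_positive) * len(scores_negative))


def bootstrap_auc_interval(scores_positive, scores_negative,
                           n_resamples=10000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC, resampling within each group."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Sample with replacement, keeping the original group sizes.
        pos = rng.choices(scores_positive, k=len(scores_positive))
        neg = rng.choices(scores_negative, k=len(scores_negative))
        estimates.append(auc(pos, neg))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Hypothetical predicted probabilities; 1,000 resamples keep the demo fast.
preterm_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
control_scores = [0.7, 0.5, 0.3, 0.2]
print(bootstrap_auc_interval(preterm_scores, control_scores, n_resamples=1000))
```

A narrow interval around the original estimate suggests that the model's performance does not hinge on a few individual subjects.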

The proportion of males did not differ significantly between the groups (control, 8 female of 27 = 30%; preterm, 16 male of 40 = 40%; Fisher’s exact test, p = 0.4438). A model including sex as a confounder was not statistically significantly different from the model without sex (likelihood ratio test, p = 0.1408).
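The likelihood ratio test used for this comparison can be sketched as follows for two nested models that differ by a single parameter (here, sex). The test statistic is twice the difference in log-likelihood and is referred to a chi-squared distribution with one degree of freedom. The log-likelihood values below are hypothetical, and the function name is illustrative.

```python
from math import erfc, sqrt


def likelihood_ratio_test_df1(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one parameter."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("the full model cannot fit worse than the nested model")
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods of the models without and with sex.
statistic, p_value = likelihood_ratio_test_df1(loglik_reduced=-25.0,
                                               loglik_full=-24.0)
print(statistic, p_value)
```

A large p value, as reported for sex, indicates that adding the variable does not improve the fit beyond what chance would explain.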

Discussion

This study describes and quantifies the main brain abnormalities in VP/VLBW born adults compared to term born controls focusing on the diagnostic accuracy of the qualitative and quantitative analysis with regard to these brain abnormalities.

More than half of VP/VLBW subjects showed a deformed aspect of the lateral ventricles and alterations of white matter: the lateral wall of the dorsal lateral ventricles was deformed and the ventricles were enlarged, and the white matter appeared “dirty” (with intermediate signal intensity between lesions and normal-appearing white matter). A similar pattern was previously found in preterm born newborns, infants, children [23, 44], adolescents [45, 46], and adults [31, 34].

Apart from the qualitative alterations apparent on visual inspection, in the VP/VLBW subjects we measured significantly larger lateral ventricles and a thinner midline posterior corpus callosum, and observed more white matter alterations. Although these findings are prevalent in the VP/VLBW population, the accuracy of visual inspection alone was only 63% in our study, and the addition of manual measurements improved the accuracy to 73%.

Although the inter-rater agreement was excellent for manual measurements, the categorical variables showed poor inter-rater agreement. Together with the poor accuracy, these results suggest that visual assessment is not accurate enough to reliably predict premature birth in young adults. The absence of a specific preterm MRI pattern was previously suggested by a study [47] comparing extremely preterm born children and very preterm 19-year-olds with age-matched controls. Like the present study, that study found ventricular dilatation, white matter involvement, and thinning of the corpus callosum in the preterm born groups, but all three groups shared the same morphological pathology, albeit with higher frequencies in the premature cohorts [47].

Although statistically not reliable enough to diagnose prematurity in young adults, knowledge of some morphological features in the brains of adults born preterm could save the radiologist from wrong diagnostic conclusions. Subjects with a thin corpus callosum, “dirty” white matter, and a deformed lateral wall of the dorsal lateral ventricle should be asked about prematurity. Without this knowledge, the bulged ventricles could lead to the false diagnosis of CSF circulation disturbance, which together with a small corpus callosum could lead the radiologist to consider a malformation. None of the subjects in our cohort had any neurological symptoms of CSF circulation disturbance. Nevertheless, compensated chronic hydrocephalus, which may be caused by perinatal intraventricular hemorrhage in premature newborns (4/40 in our cohort), cannot be ruled out. The prematurity-related ventricle deformation described here does, however, differ from ventricle enlargement in conditions with hydrocephalus. In the former, damage of the parietal white matter near the lateral ventricles causes an ex vacuo deformation of the lateral ventricle wall in this region. In contrast, increased inner wall tension in hydrocephalus often leads to a ballooning of the ventricle horns instead of the lateral walls; moreover, the horns are often surrounded by T2-hyperintense halos and the third ventricle is ballooned, which was not the case in the subjects investigated here.

Similarly, the “dirty” white matter can lead to the false diagnosis of a white matter disease, such as MS or cerebral small vessel disease. If such findings are present in an adult, simply asking about preterm delivery or low birth weight could avoid misdiagnosis and its consequences.

Other signs suggestive of preterm birth have been described in the literature. Some authors found that preterm born infants have a higher prevalence of head shape abnormalities such as elongated head shape (dolichocephaly) [48]. Additionally, regional biometric differences reflecting impaired cerebellar size or deviating head diameters were reported in children [49] and adults [28] born preterm and may be associated with cognitive and motor outcomes [49]. Of these, our study confirmed the smaller brain size of VP/VLBW subjects, which was the reason why we normalized all brain volumes analyzed in this study.

The automatic segmentation, with its subsequent analysis of normalized volumes, had a better accuracy (84%) than the visual evaluation and confirmed that VP/VLBW adults have significantly higher volumes of the lateral ventricles and smaller volumes of the posterior corpus callosum and deep gray matter. However, automatic segmentation is time consuming and, as of now, clinically impractical.

Our study confirms the results of other studies that describe the adult preterm brain from a neuroradiological perspective [28,29,30,31]. We could confirm the smaller head size and the posterior dilatations of the lateral ventricles described by Aukland et al. [28] and Bjuland et al. [29], the lower volume of the posterior corpus callosum reported by Bjuland et al. [29], the thinner corpus callosum and the white matter alterations described by Odberg et al. [31]. Also, we could confirm previous reports of reduced deep gray matter volume in preterm born subjects [14, 29, 50, 51].

In contrast to some previous studies [14, 30, 50, 51], no decrease of white matter or cortical gray matter volumes could be found in our study. This is, however, not surprising, as previous studies used finer volume measurements that highlight the regional distributions of white and gray matter. These studies reported both increases and decreases of regional volumes, which would cancel each other out in the coarse measures of whole gray or white matter volumes used in the present study. Studies that used coarse measures similar to those reported here, such as Bjuland et al. [29], were also unable to find differences in normalized gray and white matter volumes between preterm and term born adults.

Our results confirm that in VP/VLBW, the structural brain alterations related to preterm birth persist into adulthood. In our study, the highest diagnostic accuracy was achieved by adding the results of automatic and manual measures to the visual rating. The model using all quantitative and qualitative variables correctly classified 87% of cases. These results confirm the notion that the more parameters are considered in combination, the better the diagnostic accuracy.

The rapid development of computer-assisted methods of analysis and decision support through machine learning will augment and optimize human decision-making and ultimately allow for precision medicine [52]. For instance, converting images to minable data and extracting image features with deep learning algorithms, a field called radiomics, may ultimately generate predictive image-based phenotypes of disease and support precision medicine. Our study demonstrates that the additional consideration of features like automatically determined volumes of lateral ventricles, deep gray matter, and posterior corpus callosum improves the accuracy of visual assessment.

Last but not least, the IQ of the VP/VLBW group was lower than that of term born controls, which is in line with the results of Madzwamuse et al. [25], who described the neuro-cognitive profile of the full BLS sample in adulthood. Taken together, these results suggest that by adulthood, VP/VLBW as a group do not outgrow their general cognitive deficits [25] nor, as our study confirmed, do they outgrow the structural brain alterations.

Limitations

One limitation of our study is the inclusion of only a subset of the full cohort. However, as the main aim of the study was to investigate the preterm adult brain from a neuroradiological perspective, we chose to use all data collected at only one of the two study sites in order to avoid site effects. Although the moderate sample size limits the statistical power to detect minor group differences, the aim of the present study was to evaluate the utility of these group differences from a neuroradiological point of view rather than to find group differences.

Further, our analysis included parameters most often affected in preterm born infants, children, and adolescents, namely the brain with its ventricles, the white matter, and the gray matter. However, future studies may include other parameters (biometric, cerebellar measurements, etc.) and use more complex machine learning algorithms to aid clinical decision-making.

Conclusion

Although prevalent, visual MR findings have low accuracy in diagnosing preterm birth-related brain alterations in young adults. Automatic segmentation measurements (of lateral ventricles and deep gray matter), alone or in combination with manual measurements, can improve diagnostic accuracy, but they are time consuming and clinically impractical. It may be useful to ask about prematurity before initiating further diagnostics in subjects with a thinner corpus callosum, a dorsally deformed lateral wall of the lateral ventricles, and “dirty” white matter.