Introduction

Rates of spontaneous cerebrospinal fluid (sCSF) leaks in the United States are increasing [1, 2]. sCSF leak and encephalocele of the lateral skull base are thought to be due to chronically increased intracranial pressure causing erosion of the tegmen tympani and tegmen mastoideum, and have been associated with obstructive sleep apnea and idiopathic intracranial hypertension (IIH, previously known as pseudotumor cerebri or benign intracranial hypertension) [3,4,5,6]. Diagnosis of IIH by the modified Dandy criteria requires lumbar puncture to confirm elevated intracranial pressure and neuroimaging that shows no other etiology for intracranial hypertension, such as a mass or structural lesion [7,8,9,10,11]. This requires patients to undergo an invasive procedure, and it can be difficult to identify an elevated pressure if a patient is actively leaking CSF, because egress of fluid temporarily lowers the pressure and acts as a form of “auto diversion” of CSF. Patients with sCSF leak may not develop the typical signs, symptoms, and imaging findings, such as papilledema or magnetic resonance imaging (MRI) findings including posterior globe flattening, optic nerve ectasia and tortuosity, and empty sella [12,13,14,15,16,17,18,19,20] until after skull base repair, which can result in delayed or missed diagnosis [21,22,23,24,25,26].

Prior studies have found associations between skull base features identifiable on imaging studies and IIH, sCSF leak, and encephalocele, including thinning of the calvarium and skull base [27,28,29,30,31,32,33,34] and an enlarged foramen ovale (FO) [35, 36]. Manual measurement of these features requires skill in interpreting neuroimaging, is time-consuming, and is prone to inter-observer variation.

U-Net is a convolutional neural network (CNN) that is commonly used for automated image segmentation, which is the process of precisely delineating a structure on an image [37]. U-Net can perform both 2D (one slice at a time) and 3D (entire volume) predictions. 2D predictions have the benefit of being performed at full spatial resolution (~ 0.5 mm in-plane), while 3D predictions have increased accuracy due to improved spatial awareness but are performed at a lower spatial resolution (~ 1.5 mm) due to memory constraints. Previous researchers have described 2D–3D hybrid prediction techniques that combine the full resolution of 2D prediction with the accuracy of 3D prediction [38, 39]. Others have also described a “cascaded” segmentation technique, first identifying a structure of interest on a whole volume, then zooming in on that structure to make additional predictions [40]. We hypothesized that a cascaded, or iterative, prediction technique could provide accurate full-resolution segmentations of the FO on axial computed tomography (CT) images for area determination.

The future value of this automated FO measurement lies in the development of a predictive model of sCSF leak and encephalocele that would be non-invasive, time-efficient, and not require user expertise. This could be used to identify patients at increased risk of sCSF leak or encephalocele who may benefit from additional diagnostic testing and/or treatment. Among patients who undergo operative repair, this would also enable determining which patients are at increased risk of surgical failure (recurrent leak) or contralateral disease (new onset sCSF leak and/or encephalocele).

Methods

Training data gathering

After approval from the institutional review board at the University of Nebraska Medical Center (IRB# 412–19-EX), training data was gathered from a dataset of 295 CT head studies [41]. These were performed on a variety of General Electric (GE, Boston, MA) CT scanners at our hospital and surrounding outpatient clinics. All CTs were obtained as axial/contiguous acquisitions, with 0.625 mm slice thickness and reconstructed in standard algorithm with iterative reconstruction technique. All CTs were obtained with a 512 × 512 imaging matrix, and a field-of-view of either 250 or 320 mm. Exclusion criteria were excessive motion or streak artifact obscuring measurement of the FO.

Training data labeling

CT studies were first aligned based on landmarks of the bilateral cochleas and nasal bridge approximating the anterior commissure–posterior commissure line (AC-PC line), in a technique described in a submitted, unpublished manuscript. Manual FO labels were placed for each CT image using 3D Slicer (version 4.11.20210226) [42, 43]. Labels were placed over the entire FO at the cranio-caudal center of the FO using a 3 mm spherical brush and editable intensity range of -3000–350 Hounsfield units (HU). Manual segmentation was performed by a single author (S.C.), with another author (J.C.) verifying the accuracy of the manual labels.

U-Net training

The base CT head studies and labels were augmented with two additional versions of each CT head: one rotated by up to 30 degrees in the sagittal plane, and one rotated by up to 30 degrees in the sagittal plane and scaled by a factor of 0.5–1.0. Rotation was only done in the sagittal plane because axial and coronal alignment was performed prior to predictions. Cropped images and labels with a spacing of 0.5 × 0.5 × 0.625 mm and size of 128 × 128 × 64 voxels were generated around each FO for training of a second full resolution model. Each of these 3 paired versions was then flipped. Of the 3 versions for the original and flipped images, 1 was selected from each (so that each FO was selected once), for a total of 2 per subject and 590 total images. 90% of the images were used for training, and 10% were used for validation of the CNN.

U-Net training was performed using the Project MONAI Python toolkit [44]. Spacing for training of the full CT head images was 1.5 × 1.5 × 1.5 mm in x,y,z dimensions, resampled from 0.5 × 0.5 × 0.625. Spacing for training of the cropped images was at the original resolution of 0.5 × 0.5 × 0.625. A model for only the right FO was trained. Prediction for the left FO was accomplished by flipping the image and using the right FO model. Training was stopped when there was no improvement in mean Dice coefficient for 50 epochs.
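The Dice coefficient and the stopping rule described above can be illustrated with a minimal sketch. This is not the MONAI code used in the study: the function names `dice_coefficient` and `stopping_epoch` are hypothetical, masks are assumed to be flattened binary sequences, and the convention that two empty masks score 1.0 is an assumption.

```python
from typing import Iterable, Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


def stopping_epoch(val_dice_per_epoch: Iterable[float], patience: int = 50) -> int:
    """Return the 1-based epoch at which training would stop.

    Training stops once `patience` consecutive epochs pass without the
    validation Dice exceeding the best value seen so far. If that never
    happens, the number of the last epoch is returned.
    """
    best = float("-inf")
    epochs_without_gain = 0
    epoch = 0
    for epoch, dice in enumerate(val_dice_per_epoch, start=1):
        if dice > best:
            best = dice
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                return epoch
    return epoch
```

With `patience=50`, the second function mirrors the rule of stopping after 50 epochs without improvement in mean Dice.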

Iterative prediction algorithm design

An iterative prediction technique was implemented. First, prediction of a FO was performed on the full head CT at 1.5 mm spacing. Then, the CT head was cropped around the FO at a size of 128 × 128 × 64 voxels and spacing of 0.5 × 0.5 × 0.5 mm. Prediction was then performed on that image with the separately trained cropped model. The full-resolution prediction was then transferred back onto a full resolution version of the CT head. Failures were detected by absence of prediction data, or abnormal spacing of the predicted foramina (< 35 or > 65 mm apart, > 6 mm difference on z-axis, > 5 mm difference on y-axis). Figure 1 shows an example of full head and cropped sizes and spacings used for training and prediction.
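The failure criteria can be written as a short check. The sketch below is illustrative only and makes several assumptions not stated in the text: each predicted foramen is summarized by its centroid in millimetres as (x, y, z), "apart" is taken as the Euclidean distance between centroids, and the spacing criterion is read as less than 35 mm or more than 65 mm. The name `prediction_failed` is hypothetical.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float, float]  # assumed (x, y, z) centroid in mm


def prediction_failed(right: Optional[Point], left: Optional[Point]) -> bool:
    """Flag a pair of FO predictions as a failure using the stated criteria."""
    # Absence of prediction data on either side.
    if right is None or left is None:
        return True
    # Abnormal separation between the two foramina (assumed Euclidean).
    separation = math.dist(right, left)
    if separation < 35.0 or separation > 65.0:
        return True
    # Foramina should lie at a similar cranio-caudal (z) level.
    if abs(right[2] - left[2]) > 6.0:
        return True
    # Foramina should lie at a similar antero-posterior (y) level.
    if abs(right[1] - left[1]) > 5.0:
        return True
    return False
```

For example, centroids at (-25, 0, 0) and (25, 0, 0) are 50 mm apart at the same level and pass the check, whereas a missing prediction on either side is flagged.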

Fig. 1

Example of full head and cropped sizes and spacings used for training and prediction. Training was performed on both the full head and cropped labels. Prediction was done on the full head 1.5 mm spacing images, (a) axial and (b) coronal. Then the image was cropped around the prediction, (c) axial and (d) coronal, and prediction was done again on the cropped 0.5 mm spacing image. The full-resolution cropped prediction was then transferred back to the whole head image

Model testing

Testing of the model was performed on a separate dataset of 554 patients, including 34 patients with known sCSF leak or encephalocele confirmed at the time of surgical repair of the skull base, and 520 control patients. This dataset included CT head, CT temporal bone, CT face, and CT sinus studies. For each subject, the CT was first aligned using the same landmark-based technique. If the nasal bridge was not visible, such as on a temporal bone CT, the image was registered to a template CT head, then just the cochleas were aligned along the z- and y-axes. Predictions were then made for the right and left FO as detailed above. FO area was calculated for each predicted slice (number of voxels * voxel area), and the slice with the largest area was selected and recorded.
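The area calculation (number of voxels multiplied by voxel area, keeping the axial slice with the largest area) can be sketched as follows. The sketch assumes the predicted segmentation is available as a nested list indexed `[z][y][x]` of 0/1 values with 0.5 × 0.5 mm in-plane spacing; the function name and data layout are hypothetical and not taken from the study code.

```python
from typing import List, Tuple

Mask3D = List[List[List[int]]]  # assumed layout: mask[z][y][x], values 0 or 1


def largest_slice_area(
    mask: Mask3D, spacing_x_mm: float = 0.5, spacing_y_mm: float = 0.5
) -> Tuple[int, float]:
    """Return (slice index, area in mm^2) of the axial slice with the largest area.

    Returns (-1, 0.0) if no slice contains any labelled voxel.
    """
    voxel_area = spacing_x_mm * spacing_y_mm  # in-plane area of one voxel
    best_index, best_area = -1, 0.0
    for z, axial_slice in enumerate(mask):
        voxel_count = sum(1 for row in axial_slice for value in row if value)
        area = voxel_count * voxel_area
        if area > best_area:
            best_index, best_area = z, area
    return best_index, best_area
```

At 0.5 mm in-plane spacing each voxel contributes 0.25 mm², so an FO of about 22 mm² corresponds to roughly 88 labelled voxels on its largest slice.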

Processing was performed on a computer with an Intel Core i7-9700K CPU (Intel Corporation, Santa Clara, CA), and an NVIDIA GeForce GTX 1080 Ti GPU (Nvidia Corporation, Santa Clara, CA).

Statistical analysis

Demographic and clinical variables including age at time of imaging, sex, race, ethnicity, height, and weight were collected. Patient charts were reviewed for pertinent comorbidities, and the following were recorded as present or absent: diabetes mellitus (DM), osteoporosis, hypertension (HTN), chronic kidney disease (CKD), obstructive sleep apnea (OSA), idiopathic intracranial hypertension (IIH), and hydrocephalus. Patients in the sCSF leak/encephalocele group were matched 5:1 to control patients on age and sex using a nearest neighbor matching algorithm.
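One simple way to perform this kind of matching is a greedy nearest-neighbor search, sketched below. The study does not specify its matching implementation, so everything here is an assumption: exact matching on sex, nearest age within sex, matching without replacement, and the hypothetical name `match_controls`.

```python
from typing import Dict, List, Tuple

Patient = Tuple[str, float, str]  # assumed record: (identifier, age in years, sex)


def match_controls(
    cases: List[Patient], controls: List[Patient], ratio: int = 5
) -> Dict[str, List[str]]:
    """Greedily assign up to `ratio` controls to each case.

    Controls must share the case's sex; among those, the closest in age
    are chosen. Each control is used at most once.
    """
    available = list(controls)
    matches: Dict[str, List[str]] = {}
    for case_id, case_age, case_sex in cases:
        candidates = [c for c in available if c[2] == case_sex]
        candidates.sort(key=lambda c: abs(c[1] - case_age))
        chosen_ids = [c[0] for c in candidates[:ratio]]
        matches[case_id] = chosen_ids
        # Remove the chosen controls so they cannot be matched again.
        chosen = set(chosen_ids)
        available = [c for c in available if c[0] not in chosen]
    return matches
```

A greedy pass depends on the order in which cases are processed; optimal matching procedures avoid this but are beyond the scope of this sketch.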

Categorical variables were reported as counts and percentages, and continuous variables were reported as mean and standard deviation, or median and interquartile range (IQR) if skewed. Associations between categorical variables were assessed using chi-square tests, or Fisher’s exact test when expected cell sizes were low. Associations between group and continuous variables were assessed using independent samples t-tests. Receiver operating characteristic (ROC) curves were constructed to evaluate the diagnostic ability of the FO size as a binary classifier, and the area under the curve (AUC) was calculated for each ROC curve. Statistical analysis was performed in R version 4.1.2 (R Foundation for Statistical Computing, Vienna, Austria). All tests were two-tailed with statistical significance defined as α = 0.05.
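For readers less familiar with the AUC, it equals the probability that a randomly chosen case has a larger FO area than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition. It is written in Python for illustration and is not the R code used for the analysis; the function name `roc_auc` is hypothetical.

```python
from typing import Sequence


def roc_auc(case_values: Sequence[float], control_values: Sequence[float]) -> float:
    """AUC as the fraction of case/control pairs in which the case value is larger.

    Ties contribute one half, which makes this equivalent to the
    Mann-Whitney U statistic divided by the number of pairs.
    """
    if not case_values or not control_values:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))
```

An AUC of 0.5 corresponds to chance discrimination and 1.0 to perfect separation of the two groups.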

Results

Training data gathering

295 CT head studies on unique patients were selected for CNN training data creation. The minimum age was < 1 year, maximum age was 92 years, and mean (SD) age was 54.5 (19.3) years. 51% were male.

U-Net training

After augmentation by rotation, scaling, and flipping, 590 studies were semi-randomly selected for U-Net training (90%) and validation (10%) as detailed in the methods section. Mean Dice scores were 0.66 for the full head / 1.5 mm spacing right FO and 0.81 for the cropped / 0.5 mm spacing right FO.

The FO areas predicted by the CNN in this training set compared favorably with the manual ground truth measurements on the same imaging studies. The mean (SD) manually measured right FO area was 22.8 (7.1) mm², and the left FO area was 22.7 (6.8) mm². The mean (SD) predicted right FO area was 22.4 (6.4) mm², and the predicted left FO area was 22.2 (6.6) mm². The difference in the sample mean foramen size between measured and predicted values was 0.4 mm² for the right foramen (1.8% of the foramen area), and 0.5 mm² for the left foramen (2.2% of the foramen area). At the level of individual foramina, there was a mean (SD) difference of 1.52 (2.35) mm² between the manually measured and predicted right FO, and a mean difference of 1.73 (3.28) mm² for the left FO, with an overall mean difference of 1.62 (2.66) mm² (Table 1).

Table 1 Comparison of ground truth (manual measurements) with predicted area for training CT dataset (n = 295)

Model application

A separate dataset of 554 CT studies was used for evaluation of the segmentation algorithm in identifying patients with lateral skull base sCSF leak or encephalocele. The mean (SD) age of patients at the time of imaging was 63.6 (17.9) years and ranged from 19 to 101 years. 49% were male. This included 34 imaging studies from patients with known lateral skull base sCSF leak or encephalocele that was confirmed at the time of surgical repair via a middle cranial fossa approach and 520 controls without IIH, CSF leak, or encephalocele.

Segmentation failed 9 times, including 8 times on control patients. Seven failures were due to excessive motion during image acquisition (an example is shown in Fig. 2A), and one failure was due to an uncorrectable gantry tilt issue. Segmentation failed once on a patient in the sCSF leak/encephalocele group who had numerous arachnoid granulations along the skull base obscuring the FO (Fig. 2B).

Fig. 2

Segmentation failures: A Coronal CT image showing excess motion during image acquisition, which was responsible for the majority of segmentation prediction failures. B Failure of segmentation in patient with known idiopathic intracranial hypertension and CSF leak. Note the numerous arachnoid granulations along the central skull base obscuring the foramen ovale (arrow)

Patients in the sCSF leak/encephalocele group were matched 5:1 to controls using a nearest neighbor algorithm. Mean (SD) age was 53.3 (11.6) years in the sCSF leak/encephalocele group, and 53.2 (20.2) years in the control group. There were no significant differences in age (P = 0.95) or sex (P = 0.37) between the matched groups. Body mass index (BMI) was significantly higher in the sCSF leak/encephalocele group (median 37.0 kg/m² vs. 24.4 kg/m², P < 0.001). The rates of diabetes mellitus (44.1% vs. 20.6%, P = 0.004) and OSA (29.4% vs. 8.8%, P = 0.002) were higher in the sCSF leak/encephalocele group compared to the control group. The rates of osteoporosis (14.1% vs. 0%, P = 0.002) and CKD (8.8% vs. 0%, P = 0.08) were higher in the control group compared to the sCSF leak/encephalocele group. No significant differences were found for the rates of HTN (P = 0.35), IIH (P = 0.17), or hydrocephalus (no patients in either group).

The mean (SD) FO area was significantly higher in the sCSF leak/encephalocele group: 25.4 (6.1) mm² versus 22.2 (6.2) mm² (P = 0.008) (Table 2). There was a larger difference between left and right FO among patients in the sCSF leak/encephalocele group, with a mean (SD) difference of 7.2 (5.3) mm² between the 2 sides, compared to the control group, which had a mean (SD) difference of 4.2 (4.7) mm², but this did not reach statistical significance (P = 0.56). Comparison of the larger of the 2 foramina for each patient yielded an even greater difference between the 2 groups: 29.0 (7.7) mm² versus 24.3 (7.6) mm² (P = 0.002) (Fig. 3). A binomial test of the patients with measurable foramina showed that in neither the sCSF leak/encephalocele group (n = 33, P = 0.49, 95% CI 0.39–0.75) nor the control group (n = 179, P = 0.18, 95% CI 0.48–0.63) was the left or right foramen significantly larger than the contralateral side. The laterality of the larger FO was ipsilateral to the laterality of the CSF leak or encephalocele in only 17 (50%) patients, which was not significantly different from chance (P = 0.52, 95% CI 0.34–0.69).

Table 2 Patient Demographics and Baseline Characteristics of patients with CSF leak or encephalocele compared to matched controls
Fig. 3

Comparison of mean areas of the foramen ovale for control patients versus patients with sCSF leak or encephalocele. Significant differences in the means were found between the two groups for the left FO, right FO, mean (mean value comparing each patient’s left and right side) FO, and the larger (the larger of the two comparing each patient’s left versus right side) FO. Error bars are the standard error of the mean

ROC curves were constructed to assess the diagnostic ability of the FO area as a binary classifier to identify patients with sCSF leak/encephalocele. Figure 4 shows ROC curves created to show the predictive value in identifying patients in the sCSF leak/encephalocele group compared to age- and sex- matched controls using (A) the mean FO size (mean of left and right foramen) for each patient, and (B) using the larger FO (left or right) for each patient. The area under the ROC curve (AUC) was 0.65 when using the area of the average (mean of left and right) FO for each patient. This increased to an AUC of 0.69 when using the area of the larger of the 2 foramina (left or right) for each patient in both the sCSF leak/encephalocele and control groups. Using a cutoff value of 30 mm² to diagnose an enlarged FO, as described by Butros et al. in patients with IIH [19], gives a sensitivity of 30% and a specificity of 79% for the sCSF leak/encephalocele group.
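Applying a fixed area cutoff turns the FO measurement into a binary test whose sensitivity and specificity follow directly from counts. The sketch below shows that calculation. It assumes that an area at or above the cutoff is classified as enlarged (the handling of values exactly at the cutoff is not stated in the text), and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    case_areas_mm2: Sequence[float],
    control_areas_mm2: Sequence[float],
    cutoff_mm2: float = 30.0,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for an 'enlarged FO' cutoff.

    Assumption: an area >= cutoff is called enlarged (test positive).
    """
    if not case_areas_mm2 or not control_areas_mm2:
        raise ValueError("both groups must contain at least one value")
    true_positives = sum(1 for area in case_areas_mm2 if area >= cutoff_mm2)
    true_negatives = sum(1 for area in control_areas_mm2 if area < cutoff_mm2)
    sensitivity = true_positives / len(case_areas_mm2)
    specificity = true_negatives / len(control_areas_mm2)
    return sensitivity, specificity
```

Each cutoff yields one point on the ROC curve, so sweeping `cutoff_mm2` across the observed range of areas traces out the full curve.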

Fig. 4

ROC curves were constructed to show the predictive value in identifying patients in the sCSF leak/encephalocele group compared to age- and sex- matched controls using (A) the mean foramen ovale size (mean of left and right foramen) for each patient, and (B) using the larger foramen ovale (left or right) for each patient

Discussion

Our data show the utility of using a CNN with an iterative technique to produce accurate full-resolution 3D segmentation of the FO utilizing U-Net. This allows the segmentation and analysis of smaller detailed structures such as the FO, circumventing memory restrictions previously encountered with full-resolution prediction. It also continues to benefit from the spatial awareness of 3D as opposed to 2D prediction, further increasing accuracy. While the increased accuracy of segmentation at full resolution is intuitive, the Dice score increase from 0.66 to 0.81 also objectively demonstrates an improvement. The model was also quite robust, only failing to segment the FO on CT studies for clear reasons (e.g., motion artifact, unidentifiable FO). Finally, no extra manual effort was required for this implementation, as the cropped images and labels were automatically generated. Differences between the manual segmentations and the predicted (deep learning algorithm) measurements were < 2 mm² on average. These differences are not simply due to errors of the CNN; they also reflect the inherent challenge of interpreting voxels at the edge of the foramen, which have intermediate opacity between bone and soft tissue, a challenge that is always present in manual measurements as well.

The clinical utility of this technique lies in its ability to predict the risk of patients developing a lateral skull base sCSF leak or encephalocele, using data (CT images) that are already readily available for many patients. There is currently no agreement on screening or management of sCSF leak in patients with IIH or other pathologies associated with elevated intracranial pressure [45]. While reported rates of surgical repair are high, there is room for improvement in identifying patients at high risk of sCSF leak or encephalocele, which could potentially be prevented through targeted interventions, and among patients who undergo surgical repair, identifying patients at risk of surgical failure due to increased intracranial pressure that was not identified prior to repair due to a CSF fistula [46,47,48,49].

Prior studies have shown the application of deep learning algorithms for automated detection of intracranial hemorrhage, calvarial fracture, midline shift, and mass effect in patients with head trauma or stroke symptoms using datasets of head CT scans [50,51,52]. A deep learning framework has also been used for differentiating patients with normal cognition, mild cognitive impairment, Alzheimer’s disease (AD), and non-AD dementias using MRI brain imaging together with demographic and clinical data [53]. Similarly, beyond the usefulness of a non-invasive imaging-based diagnostic tool to identify patients at risk of sCSF leak or encephalocele, the methods used in the present study could be used together with other demographic and clinical variables as part of a larger model to stratify and direct patients to undergo further diagnostic workup with lumbar puncture to confirm IIH or to suggest treatment with weight loss interventions, acetazolamide, or ventriculoperitoneal shunting to reduce intracranial pressure [54, 55].

The present study has several limitations. First, alignment of non-head CT studies (e.g., CT sinus, CT temporal bone) could be somewhat inconsistent if all landmarks were not present for alignment in the axial plane, which could slightly alter the area measurement for that study. Registration to an aligned template was meant to mitigate that inconsistency. Also, the precise areas reported in this study are specific to the training labels and model utilized for prediction. Labeling was done with a -3000–350 HU editable intensity range. As the interface between bone and soft tissue is somewhat indistinct, a higher range would have led to larger FO areas. Precise reproduction of study results would need to adhere to the above editable intensity range for labeling. Finally, generalizability of the findings may be reduced since the training dataset was derived from a single CT scanner vendor (GE), though from a variety of scanner models. Wider use of the prediction model would likely require additional training data from more vendors.

Conclusion

An iterative prediction technique allows for accurate and full resolution 3D segmentation of medical images. We described the use of an automated method of measuring the FO and the application of this technique for predicting patients with sCSF leak or encephalocele. Future applications combining these measurements with other clinically relevant data may further increase the predictive power and clinical utility.