Introduction

Computed tomography (CT) has traditionally played a substantial role in neuroimaging and continues to do so. The technique has excellent spatial resolution, reasonable contrast resolution, speed, the potential for performing contrast studies, and 3D reconstruction. Despite these facts, magnetic resonance imaging (MRI) is often the method of choice [1, 2]. One important task neuroimaging researchers have sought to solve is skull-stripping or brain extraction, that is, removal of the skull, neck, background and eye-balls. Skull-stripping can be thought of as an instance of global brain image segmentation, which represents the correct identification/classification of individual image pixels into a set of definite categories without prior knowledge or human input. If done successfully, one can then quantify total intracranial volume (tICV: the sum of cerebrospinal fluid, gray matter and white matter volumes). tICV is a morphometric measure of interest in brain disorders and cognitive aging [3, 4], and is used as a proxy for head size. The volume of the intracranial cavity is directly related to brain growth and is considered to remain relatively stable after brain development ceases in youth [5, 6]. For neuroimaging research, there exist a number of open-source software packages for quantitative MRI evaluation including FreeSurfer (http://surfer.nmr.mgh.harvard.edu), FSL (http://www.fmrib.ox.ac.uk/fsl) and SPM (http://www.fil.ion.ucl.ac.uk/spm). Despite these advances, automated computation of tICV from MRI or CT images remains an open area of neuroimaging research [7, 8].

The aim of this study was to automatically segment tICV from CT images by combining morphological operations (opening: erosion followed by dilation using an elliptical structuring element), intensity thresholding, robust statistics and mixture modelling based on maximum likelihood estimation.

Material and methods

All studies were approved by the Regional Ethical Committee in Stockholm and the Radiation Protection Committee at Karolinska University Hospital in Huddinge, and participants were informed and provided written consent for inclusion.

Subjects

Eighteen patients (age range 63 to 81 years with a mean of 73 years and a standard deviation (SD) of 5.8 years), 8 women (age range 70 to 81 years with a mean age of 77 years and an SD of 4.5 years) and 10 men (age range 63 to 79 years with a mean age of 71 years and an SD of 5.3 years), referred to a local Memory Clinic (Karolinska University Hospital, Huddinge, Sweden) were retrospectively selected. The selection was made from those who had undergone imaging examination of the brain in the context of memory investigation and were examined using the same protocol. Patients with other pathologies, such as intracranial tumours or infarcts were excluded. Ten out of the 18 patients underwent imaging twice the same day. The remaining eight patients were examined using both CT and MRI within a seven-day period, and six months later, a second examination was performed using the same protocol. This resulted in a total of 36 CT examinations from 18 patients examined twice, and 16 MRIs from 8 patients examined twice.

Acquisition: CT and MRI

A 64-channel CT system (GE Medical Systems, LightSpeed VCT) was used without intravenous contrast (orientation, axial; scan type, helical; tube voltage, 120 kV; tube current, 100–300 mA; detector area, 20 mm × 0.625 mm; voxel size, 0.4199 × 0.4199 × 2.5 mm³; rotation time, 0.5 s; effective radiation dose, 1.7 mSv; pitch 0.531:1).

For MRI acquisition, a 1.5 Tesla system (Siemens, Avanto) was used. The protocol included a T1 Magnetization-Prepared Rapid Acquisition with Gradient Echo (MPRAGE) coronal pulse sequence (TR, 1910 ms; TE, 3.14 ms; flip angle, 15 degrees; voxel size, 0.449 × 0.449 × 1.4 mm³; 160 slices).

For both acquisitions, full brain coverage was required with at least one slice totally above and one totally below to ensure total intracranial volume was fully included. Each image was checked visually for quality control to assess whole brain coverage and that no major artefacts were present [9, 10].

Phantom

A Siemens phantom was examined on both MR and CT systems to check that the resulting images were not distorted or magnified by the acquisition parameters, imaging modality or segmentation algorithm. The phantom had a true volume of 2570 mL and was filled with a fluid similar to cerebrospinal fluid (CSF).

Manual delineation on CT brain images and CT/MRI phantom images

Manual measurements were performed on the CT images using ITK-SNAP (http://www.itksnap.org) [11]. All delineations were performed by a single trained radiologist following anatomical landmarks and avoiding major veins (the intensity of the major veins overlaps with that of intracranial structures, creating bridges between brain parenchyma and eyeballs, which is why veins were removed as part of the automated segmentation). The procedure consisted of manually drawing the boundary between the brain and the skull on the original slices presented in the axial plane (ITK-SNAP allows tracing in axial, sagittal and coronal projections simultaneously). Realignment was not necessary due to the large size of the intracranial cavity, but reorientation to follow radiological convention was performed using fslswapdim from FSL. Brightness was increased to improve identification of the boundary of the dura mater. The slice in which the brain first appeared was selected as the starting point, and from that point every slice was traced. Tracing was stopped at the inferior part of the brain at the level of the foramen magnum, which was localized on the sagittal projection. All slices in between were traced, although it has been reported that sparser sampling schemes (e.g. tracing every tenth slice) would also yield accurate estimates [12]. tICV was computed by adding the traced volumes from each segmented slice.
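The summation of traced slice volumes can be sketched as follows. This is a minimal illustration rather than the procedure used in ITK-SNAP: the function name mask_volume_ml and the toy mask are hypothetical, and the voxel size is taken from the CT protocol described above.

```python
import numpy as np

# Voxel size reported for the CT protocol, in millimetres (x, y, z).
CT_VOXEL_MM = (0.4199, 0.4199, 2.5)


def mask_volume_ml(mask, voxel_mm=CT_VOXEL_MM):
    """Return the volume of a binary mask in millilitres.

    mask: boolean array shaped (slices, rows, cols); True marks traced voxels.
    """
    mask = np.asarray(mask, dtype=bool)
    voxel_mm3 = float(np.prod(voxel_mm))
    # Sum slice by slice, mirroring the addition of traced slice volumes.
    per_slice_mm3 = mask.reshape(mask.shape[0], -1).sum(axis=1) * voxel_mm3
    return float(per_slice_mm3.sum()) / 1000.0  # 1 mL = 1000 mm^3


# Hypothetical toy mask: 10 slices, each with a 100 x 100 traced square.
toy = np.zeros((10, 256, 256), dtype=bool)
toy[:, 50:150, 50:150] = True
print(mask_volume_ml(toy))
```

Because each voxel contributes the same volume, the result depends only on the voxel count and the voxel dimensions, so an error in the slice thickness scales the estimate proportionally.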

For phantom images, manual delineation was performed on T1-weighted MRI and CT images. The spherical shape of the phantom made it easier to delineate the boundary.

MR image processing

Structural MR images were skull-stripped using the Brain Extraction Tool (BET) from FSL [13]. Briefly, this method fits a deformable surface model to the brain surface. Raw T1-weighted images were reoriented to follow radiological convention using fslswapdim. Applying BET to the reoriented images gave unsatisfactory results due to inclusion of non-brain structures. Thus, these images were cropped using fslroi to remove the neck, and the fractional intensity threshold (‘-f’ parameter of BET) was set to 0.1 instead of the default 0.5 [7, 14]. Once the BET parameters had been tuned, the reoriented and cropped images were fed into sienax for automated segmentation [15, 16], keeping the previously determined BET parameters.

Mixture modelling based on maximum likelihood estimation

Mixture modelling refers to a statistical procedure for estimating the parameters of a linear mixture of statistical distributions from a finite sample. For estimating parameters of a known model from its sample, the method of maximum likelihood looks for the parameter point for which the observed sample is most likely (plausible) [17].

The distribution of pixel intensities expressed in Hounsfield units (HU) within the range -50 to 200 is composed of two separate frequency curves: CSF and brain tissue. Such frequency curves are discrete (histograms), given that HU values are integers. Thus, a suitable way to model such curves is with a binomial distribution. For an intensity histogram modelled by a binomial distribution, there is one parameter to be determined: p (success probability). Fitting a mixture of binomial distributions with unknown parameters p to an intensity histogram can be done numerically by maximum likelihood estimation (Bregman soft-clustering [18]), resulting in the most likely model for the frequency curves contained in the intensity histogram.
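A minimal sketch of such a fit is given below. It is not the authors' implementation: a plain expectation-maximization (EM) iteration is used as a stand-in for the Bregman soft-clustering of [18], to which it is closely related for exponential-family mixtures. The shift of the HU range -50 to 200 onto bins 0 to 250 (so that n = 250), the starting values and the synthetic histogram are all assumptions made for illustration.

```python
from math import lgamma

import numpy as np


def log_binomial_coefficients(n):
    """log C(n, k) for k = 0..n."""
    return np.array([lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                     for k in range(n + 1)])


def binomial_log_pmf(n, p, log_comb):
    """Log-probabilities of Binomial(n, p) for k = 0..n."""
    k = np.arange(n + 1)
    p = min(max(float(p), 1e-9), 1.0 - 1e-9)  # keep both logarithms finite
    return log_comb + k * np.log(p) + (n - k) * np.log1p(-p)


def fit_binomial_mixture(counts, p_init=(0.2, 0.5), n_iter=300):
    """Fit mixture weights and success probabilities to a histogram by EM.

    counts[k] is the number of pixels in intensity bin k (k = 0..n).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.size - 1
    k = np.arange(n + 1)
    log_comb = log_binomial_coefficients(n)
    weights = np.full(len(p_init), 1.0 / len(p_init))
    probs = np.array(p_init, dtype=float)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each intensity bin.
        log_joint = np.stack([np.log(w) + binomial_log_pmf(n, p, log_comb)
                              for w, p in zip(weights, probs)])
        log_joint -= log_joint.max(axis=0, keepdims=True)
        resp = np.exp(log_joint)
        resp /= resp.sum(axis=0, keepdims=True)
        # M-step: count-weighted updates of the weights and of each p.
        mass = resp * counts
        comp_mass = np.maximum(mass.sum(axis=1), 1e-12)
        weights = comp_mass / comp_mass.sum()
        probs = (mass * k).sum(axis=1) / (n * comp_mass)
    return weights, probs


# Synthetic histogram (hypothetical values): 30% of pixels from p = 0.24
# and 70% from p = 0.36, with n = 250 bins above the lower HU bound.
n = 250
lc = log_binomial_coefficients(n)
true_pmf = (0.3 * np.exp(binomial_log_pmf(n, 0.24, lc))
            + 0.7 * np.exp(binomial_log_pmf(n, 0.36, lc)))
counts = 1e6 * true_pmf
weights, probs = fit_binomial_mixture(counts)
print(weights, probs)
```

With n fixed by the intensity range, each component is described by its weight and its single parameter p, which matches the one-parameter description above.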

Automated tICV segmentation

The following image processing procedure was implemented based on open-source libraries for reading DICOM medical files (Grassroots DICOM - http://gdcm.sourceforge.net), programming real-time computer vision applications (OpenCV – http://opencv.org) and writing interpreted scripts (Lua – http://www.lua.org). This algorithm was designed to automatically perform skull stripping and eye-ball removal on axial CT head examinations (Fig. 1).

Fig. 1 Diagram illustrating the proposed algorithm for automated CT image segmentation. CT = computed tomography

  • Pixels between -50 and 200 HU were assigned to a “tissue-mask” (for calibrated CT scanners, brain tissue and CSF information lies within this range). Pixels with values greater than 200 HU contain mainly bone information and were assigned to a bone-mask.

  • Starting from the apex of the skull and following the superior to inferior direction, the slice in which the brain initially appeared was selected as the starting point and from that point every slice was traced by binary operations until the eye-balls were first visible.

  • Starting at the level of the eye-balls, information from the traced slice just above the current one was used to estimate a region-of-interest (ROI) and correctly extract the brain tissue (at this level, the brain inside the skull does not form a closed connected component).

  • A probability distribution function (PDF) was generated by accumulating intensity histograms from each slice. Using maximum likelihood estimation (Bregman soft-clustering [18]) the parameters of a mixture of binomial distributions were estimated.

  • Robust estimates of the minimum (Imin) and maximum (Imax) intensities were obtained from the fitted distribution. Imin/Imax corresponds to the minimum/maximum intensity value having a probability greater than 10⁻⁵ of belonging to the fitted distribution (due to floating-point arithmetic, any probability less than 10⁻⁵ was set to zero).

  • Pixels between Imin and Imax were selected and the resulting mask was processed by applying an opening morphological operation (erosion followed by dilation using an elliptical structuring element) and kept as “most-probable-brain-mask.” For successful eye-ball removal, information from the traced slice just above the current one was used as prior spatial information and only overlapping ROIs were retained.
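The thresholding, robust-range and opening steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the original used OpenCV, whereas this sketch uses NumPy only; the HU bounds are treated as inclusive; the structuring-element radii are hypothetical because the paper does not state them; and pixels outside the image are treated as background during erosion.

```python
import numpy as np


def tissue_and_bone_masks(hu, lo=-50, hi=200):
    """Split a CT slice (in HU) into a tissue mask and a bone mask."""
    hu = np.asarray(hu)
    tissue = (hu >= lo) & (hu <= hi)  # inclusive bounds are an assumption
    bone = hu > hi
    return tissue, bone


def robust_range(pmf, intensities, eps=1e-5):
    """Smallest and largest intensity whose fitted probability exceeds eps."""
    keep = np.flatnonzero(np.asarray(pmf) > eps)
    return intensities[keep[0]], intensities[keep[-1]]


def elliptical_element(radius_y, radius_x):
    """Boolean elliptical structuring element (both radii must be >= 1)."""
    y, x = np.ogrid[-radius_y:radius_y + 1, -radius_x:radius_x + 1]
    return (y / radius_y) ** 2 + (x / radius_x) ** 2 <= 1.0


def _morph(mask, element, erode):
    """Binary erosion (erode=True) or dilation with a symmetric element."""
    mask = np.asarray(mask, dtype=bool)
    ry, rx = element.shape[0] // 2, element.shape[1] // 2
    padded = np.pad(mask, ((ry, ry), (rx, rx)), constant_values=False)
    h, w = mask.shape
    out = np.ones_like(mask) if erode else np.zeros_like(mask)
    for dy, dx in zip(*np.nonzero(element)):
        window = padded[dy:dy + h, dx:dx + w]
        out = (out & window) if erode else (out | window)
    return out


def opening(mask, element):
    """Erosion followed by dilation: removes specks and thin bridges."""
    return _morph(_morph(mask, element, erode=True), element, erode=False)


# Toy slice: a 9 x 9 block of tissue plus one isolated pixel.
demo = np.zeros((20, 20), dtype=bool)
demo[5:14, 5:14] = True
demo[2, 2] = True
opened = opening(demo, elliptical_element(1, 1))
print(int(demo.sum()), int(opened.sum()))
```

The opening removes the isolated pixel while keeping the bulk of the block, which is the behaviour relied upon to break thin connections between the brain and extracranial structures.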

The success of the algorithm was assessed by visually inspecting the resulting masks and looking for missing regions or incomplete removal of non-brain tissue. Manual corrections can be performed if necessary, and in case of failure, the algorithm can be run again with different parameters. Optionally, a refinement can be achieved by applying the watershed algorithm. We did not observe any failure, and manual corrections were not necessary.

Statistical analysis

Pearson’s correlation coefficient was computed to test for linear correlation. P-values less than 0.05 were considered statistically significant. Agreement was compared in terms of volume difference and average volume as shown in a Bland-Altman analysis graph. Reliability assessment based on volume differences was complemented by computation of the Dice coefficient [19] to assess the overlap between manual and automated segmentation masks. The Dice coefficient (DC) is defined as \( \mathrm{DC}=\frac{2\,\left|V_1\cap V_2\right|}{\left|V_1\right|+\left|V_2\right|} \), where \( V_1 \) and \( V_2 \) are the two segmented volumes being compared.
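These agreement measures can be sketched as follows. The function names and toy inputs are hypothetical, and the limits of agreement are drawn at the mean ± 2 SD to match the Bland-Altman graph; this is an illustration, not the analysis code used in the study.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice overlap: 2 |A and B| / (|A| + |B|) for two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = int(a.sum()) + int(b.sum())
    if total == 0:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * int(np.logical_and(a, b).sum()) / total


def bland_altman(x, y):
    """Bias, SD of the differences and the mean +/- 2 SD limits."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))  # sample standard deviation
    return bias, sd, (bias - 2.0 * sd, bias + 2.0 * sd)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two sets of volumes."""
    return float(np.corrcoef(x, y)[0, 1])


# Toy masks: two 8-pixel bands on a 4 x 4 grid sharing 4 pixels.
a = np.zeros((4, 4), dtype=bool)
b = np.zeros((4, 4), dtype=bool)
a[0:2, :] = True
b[1:3, :] = True
print(dice_coefficient(a, b))
print(bland_altman([10, 12, 14], [9, 12, 15]))
```

The Dice coefficient measures spatial overlap and the Bland-Altman statistics measure volume agreement, so two masks can agree closely in volume while overlapping poorly, which is why both are reported.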

Results

Eight out of 18 patients were used to validate our algorithm against manual delineation. The remaining ten were used to assess the robustness of our algorithm. Finally, the eight patients used to validate our algorithm also had an MRI performed within seven days and were thus used to estimate the comparability of tICV measurements between CT and MRI. Table 1 presents all resulting volumes obtained and considered for this analysis. A schematic representation of our proposed segmentation algorithm is shown in Fig. 1. After every image was processed, the resulting segmented images were visually inspected (Fig. 2). Typical segmentation masks for manual and automated CT segmentation and automated MRI segmentation are shown in Fig. 3.

Table 1 Intracranial volumes from eight patients imaged twice six months apart and ten patients imaged twice the same day

Fig. 2 Visual inspection of automated CT image segmentation by the proposed algorithm

Fig. 3 Visual comparisons of tICV segmentation results. (a) Original CT axial slice displaying a manual delineation result and (d) reconstructed sagittal slice, (b) automated segmentation original axial mask using our proposed algorithm and (e) reconstructed sagittal slice, (c) automated segmentation reconstructed axial mask of a T1w MRI scan using sienax from FSL, and (f) reconstructed sagittal slice. T1w = T1 weighted; CT = computed tomography; MRI = magnetic resonance imaging; tICV = total intracranial volume

tICV from CT

The average intracranial volume obtained by applying the developed algorithm was smaller than the average volume obtained from manual delineation by a trained radiologist (Student’s t-test one-tailed, paired means, P = 0.011; Fig. 4a). The difference in tICV between images showed less variability when applying the developed algorithm than when tracing manually (F statistic = 0.004; P < 10⁻⁷). Bland-Altman analysis showed a bias of 31 mL and an SD of 30 mL over a range of 1265 to 1526 mL. Linear regression analysis showed a good correlation (Fig. 5a):

$$ \mathrm{tICV}[\mathrm{manual}] = 1.05 \times \mathrm{tICV}[\mathrm{automated}] - 33.78 \quad \left(R^2 = 0.88;\ \text{in milliliters}\right) $$

Fig. 4 Comparisons of tICV segmentation results. (a) Eight patients were segmented by our proposed algorithm, manually traced, and had their MRI scans processed using sienax from FSL, (b) the remaining ten patients were segmented by our proposed algorithm. A pair of consecutive points represents a single patient with [B] baseline and [R] repeated scans. tICV = total intracranial volume; CT = computed tomography; MRI = magnetic resonance imaging

Fig. 5 Scatter plots and correlations for tICV calculations. (a) Comparison between our proposed algorithm vs. manual delineation, (b) comparison between our proposed algorithm vs. MRI segmentation using sienax from FSL. tICV = total intracranial volume; CT = computed tomography; MRI = magnetic resonance imaging

The degree of correlation between the algorithm and manual tracing, as measured by Pearson’s correlation coefficient, was r = 0.94. The overlap between manually and automatically segmented volumes (average Dice coefficient 0.90 ± 0.02) was indicative of good reliability.

Agreement between repeated CTs on the same day (Fig. 4b) obtained by applying our algorithm was very good, as shown by a Bland-Altman analysis graph (Fig. 6). Automated measurements showed a bias of -1.5 mL and an SD of 6.4 mL over a range of 1010 to 1520 mL (one-sample Student’s t-test, one-tailed, P = 0.76).

Fig. 6 Bland-Altman analysis comparing automated tICV calculation for two images (same day). Blue horizontal line represents the mean and green horizontal lines represent the mean ±2 SD; SD = standard deviation; tICV = total intracranial volume

tICV from CT compared to tICV from MRI

The average tICV obtained by applying the developed algorithm was smaller than the average volume obtained from automated segmentation on MRI (Student’s t-test one-tailed, paired means, P = 1 × 10⁻⁵; Fig. 4a). Comparison of the volumes generated by FSL on MRI and our algorithm on CT showed a bias of 116.9 mL and an SD of 33.4 mL over a range of 1309 to 1558 mL. The degree of correlation between automatically calculated volumes in MRI and CT, as measured by Pearson’s correlation coefficient, was r = 0.92 (linear regression R² = 0.84; Fig. 5b).

Phantom measurements

Phantom volumes assessed with CT/MRI differed from the actual volume of the phantom (MRI: 3% larger; CT: 1.5% smaller); results are presented in Table 2. A reason for this may be that BET (the MRI-based skull-stripping algorithm from FSL) was optimized to follow a brain-like shape instead of a spherical contour, while our proposed algorithm was optimized to find intensity differences but not to compensate for partial volume artefacts.

Table 2 Segmented volumes of a phantom scanned on both MRI and CT systems

Discussion

In this study, we developed and validated a new algorithm to automatically compute tICV from CT images by modelling the intensity histogram using maximum likelihood estimation, while removing extracranial tissue and the eye-balls. The algorithm is fully automated, and when applied to our dataset, resulted in successful segmentation without requiring manual post-processing. The algorithm demonstrated good agreement with manual segmentation performed by a trained radiologist, and high correlation with the FSL pipeline. Moreover, the observed subject variability was low on repeated acquisitions performed the same day or six months apart, suggesting good consistency.

There are examples in the literature of successful morphometric analysis based on CT images. However, they are based on manual segmentation, only single-slice segmentation [20–22], the use of MRI templates, linear measurements [23, 24], or require user interaction [25]. We have opted for a fully automated, template-free approach, since manual segmentation of the brain is time consuming and requires training and care; single-slice segmentation suffers from low reliability; and the use of MRI templates requires optimal registration. Although linear measurements have been previously explored and used as an indirect marker of brain atrophy [23], their accuracy and reliability are subject-dependent.

Our study adds to the current literature by providing results of successful automated brain segmentation on CT from a retrospective selection of elderly patients referred to a local memory clinic. We took the approach of using manual delineation by a trained radiologist as the gold-standard, repeated measurements on the same day and six months apart and combined MRI and CT acquisitions. This enabled assessment of reliability and consistency and a comparison of tICVs obtained from MRI to those obtained from CT.

Advantages of this algorithm are computation of robust intensity ranges by modelling the intensity histogram as a mixture of binomial distributions and fitting the sample via maximum likelihood estimation.

We believe this new algorithm has clinical importance because CT remains widely used in primary care for dementia investigation [26]. The likelihood of developing dementia is increased in subjects with low tICV [27]. Also, in morphometric studies, tICV is a measure of interest since models of brain regions associated with cognitive measures vary depending on how tICV is included [28]. Future work to extend the algorithm to estimate brain and CSF volumes could potentially be clinically relevant, since the ratio between brain volume and CSF volume has been shown to be a sensitive measure of atrophy [29].

The significant differences observed between modalities (MRI vs. CT) suggest that comparison between modalities in absolute terms is not possible, since MRI-derived volumes were systematically larger than the CT-derived ones. While CT allows excellent delineation of the skull, beam-hardening and partial volume artefacts contaminate the pixels at the boundary between bone and brain tissue/CSF. We believe this is the main reason why CT volumes are systematically smaller than MRI-derived ones (a phenomenon also observed in phantom measurements). Although the FSL pipeline was used since it is fully automated, other software libraries might be used instead, yielding different results [7, 8]. On MRI, T1w alone gives a good estimate of tICV but the precise boundary cannot be reliably delineated. The presence of artefacts reduces the size of the intracranial cavity; MRI-based software corrects this by registration to a standard space where prior spatial information helps guide and compensate intensity-based segmentation. Our proposed algorithm was based on intensity information and neighbouring spatial information to guide the segmentation and no prior information was used.

This study has some limitations. First, the sample population was relatively small but the encouraging results demonstrated smaller variation in volumes obtained by using automated methods as opposed to manual tracing. Second, other sources of measurement error on CT exist and were not considered, including beam-hardening artefacts, partial volume-averaging, and slice thickness and separation. Third, the presented algorithm is based on pixel intensity, and is, therefore, highly sensitive to beam-hardening artefacts, resulting in distortions that create attenuation values higher than the actual ones. However, the application of robust statistics by construction of the probability distribution function and further maximum likelihood estimation allowed for adjustment to the degree of intensity variability seen in this sample. Fourth, the absence of large, manually segmented CT datasets prevents the creation of spatial priors that could be used to further refine the segmentation as done for MRI.

Conclusion

This study demonstrates that it is possible to estimate tICV in an unbiased and automated way. Although CT imaging restricts the quantitative analyses that can be performed, given the great contrast between bone and brain, it seems sufficient to give a reliable and consistent estimate of tICV. Our results suggest that measures of tICV depend on the modality used for acquiring the image and that results might not be comparable among different modalities. The simplicity and practicality of automated software and the clinical availability of CT highlight the need to develop more tools and algorithms that could be used in clinical research.