1 Introduction

Accurate segmentation of the pediatric brain MR images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is one of the most pivotal steps to characterize early brain development [1, 2]. However, compared with adult brain MR images, pediatric brain MR images exhibit low tissue contrast caused by the ongoing myelination and severe imaging artifacts caused by head motion, creating challenging tasks for tissue segmentation [3]. Therefore, existing tools developed for adult brains, e.g., BrainSuite [4], FSL [5], FreeSurfer [6], and HCP pipeline [7], often perform poorly on the pediatric brain MR images.

Recently, many efforts based on convolutional neural networks have been devoted to pediatric brain segmentation and have achieved encouraging results. For instance, Wang et al. [8] designed an anatomy-guided densely-connected U-Net architecture for infant brain segmentation, using an anatomical prior as guidance to improve the segmentation accuracy. Nie et al. [9] proposed to train a fully convolutional network for each imaging modality and then fuse their high-layer features to obtain the final segmentation. Zöllei et al. [10] proposed an automated segmentation and surface extraction pipeline (named Infant FreeSurfer) for T1-weighted (T1w) neuroimaging data of infants aged 0–2 years. However, these previous works may not handle motion/Gibbs artifacts well. For example, as shown in Fig. 1, Infant FreeSurfer achieves an accurate tissue segmentation result on an artifact-free image (Fig. 1(a)) but fails on an image with motion/Gibbs artifacts (Fig. 1(b)).

Fig. 1.

Impact of artifacts for tissue segmentation. Results by Infant FreeSurfer [10] on pediatric brain T1w MR images without/with motion/Gibbs artifacts at 24 months of age. The red and orange dashed ellipses indicate some unreliable results, which are mainly caused by motion/Gibbs artifacts as shown in (b).

Moreover, the collaborative use of multi-domain images (acquired from different imaging sites) has recently become more prevalent, which makes the segmentation task more difficult. Generally, a model trained on a dataset from a specific site often performs well on testing subjects from the same site, but poorly on subjects from other sites with different protocols/scanners. This is called the “multi-site issue” or “domain-shift issue” in medical image analysis. For instance, a MICCAI grand challenge on 6-month infant brain MRI segmentation from multiple sites (i.e., iSeg-2019, https://iseg2019.web.unc.edu/) reported and discussed this critical issue [1].

To accurately segment multi-site pediatric images with artifacts, we present a multi-scale self-supervised learning (M-SSL) framework in this paper. In the training stage, inspired by [8], we first train a segmentation model based on the downsampled images to estimate coarse tissue probabilities and build a global anatomic guidance. We then train another segmentation model based on the original images to estimate fine tissue probabilities. The global anatomic guidance and the fine tissue probabilities are integrated as inputs to train a final segmentation model. In the testing stage, to alleviate the multi-site issue, we propose an iterative self-supervised learning (SSL) strategy to train a site-specific segmentation model based on a set of reliable training samples, which are automatically generated and iteratively updated, for a to-be-segmented site. The main contributions of this paper are summarized as follows:

  1. We propose a framework to accurately segment multi-site pediatric brain MR images with motion/Gibbs artifacts.

  2. We leverage downsampled tissue segmentations to build a global anatomic guidance, which alleviates the motion/Gibbs artifacts.

  3. We propose an iterative SSL strategy to train site-specific segmentation models to minimize the multi-site issue.

2 Dataset and Proposed M-SSL Method

Dataset and Preprocessing.

T1w pediatric brain MR images used in this study for training were from the UNC/UMN Baby Connectome Project (BCP) [11]. They were acquired at around 24 months of age on Siemens Prisma scanners with 160 sagittal slices using parameters: TR/TE = 2400/2.2 ms and voxel resolution = 0.8 × 0.8 × 0.8 mm³. We randomly selected 5 subjects with manual labels as a training dataset. For validation, T1w MR images with real artifacts were from the University of Houston, acquired with 160 sagittal slices using parameters: TR/TE = 1900/2.98 ms and voxel resolution = 1.0 × 1.0 × 1.0 mm³. For image preprocessing, all images were resampled to a resolution of 0.8 × 0.8 × 0.8 mm³, and in-house tools were then used to perform skull stripping, intensity inhomogeneity correction, and cerebellum removal.

2.1 The Proposed Method

We propose a multi-scale self-supervised learning (M-SSL) framework to accurately segment tissues for multi-site pediatric MR images with motion/Gibbs artifacts, consisting of training and testing stages as shown in Fig. 2. We first elaborate on the training stage, which consists of training three segmentation models and a confidence model that detects the reliability of automated segmentation results. Then, we design the testing stage to train a site-specific segmentation model based on a set of reliable training samples for the to-be-segmented site. Finally, we introduce the implementation details of the proposed method.

Fig. 2.

Illustration of our M-SSL method for brain segmentation of pediatric MR images affected by motion/Gibbs artifacts. Training stage: Downsampled and original images with simulated artifacts are input to train SegM-A, which builds a global anatomical guidance in the downsampled image space, and SegM-B, which produces 3 tissue probability maps in the original image space, respectively. Then a four-channel input (one signed distance map from SegM-A and three probability maps from SegM-B) is automatically generated for the training of SegM-C. Finally, a ConM is trained to evaluate the reliability of those automated segmentations at the voxel level. Testing stage: After inputting testing subjects into the trained SegM-A, -B and -C, we obtain automated segmentation results. Then an iterative SSL strategy is proposed to train a site-specific segmentation model SegM-D for the to-be-segmented site.

2.2 Training Stage

The architecture of a segmentation model can be chosen from U-Net [12], V-Net [13], U-Net++ [14], ADU-Net [8], and nnU-Net [15], among others. In this paper, we adopt ADU-Net as the segmentation architecture, as it demonstrates outstanding performance on pediatric brain segmentation. As shown in Fig. 2, in the training stage, benefiting from downsampling and simulated motion/Gibbs artifacts, we first train two segmentation models (named SegM-A and SegM-B) to generate the global anatomic guidance and the fine tissue probabilities, which are later integrated as inputs to train the third segmentation model (named SegM-C). Next, an error map, defined as the difference between the ground truth and the automated segmentation from SegM-C, is used as the target to train a confidence model (named ConM) that automatically detects the reliability of automated segmentation results at the voxel level. Figure 3 illustrates the effectiveness of the confidence map on the automated segmentation of a testing subject that is not included in the training dataset of ConM. It can be seen that some gyral shapes (circled by a yellow dotted ellipse) in the right figure of Fig. 3(d) are not reasonable. The ConM can effectively detect these unreasonable regions, as indicated by the red color in Fig. 3(c) and in the left figure of Fig. 3(d).
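The error-map target for ConM can be read as a voxel-wise disagreement mask between the ground truth and the SegM-C output. The following is a minimal sketch under that reading; the integer label encoding, the binary (0/1) target, and the function name `error_map` are assumptions made for illustration and are not specified in the text.

```python
import numpy as np

def error_map(ground_truth, prediction):
    """Return a binary map: 1 where the predicted label differs from the ground truth."""
    ground_truth = np.asarray(ground_truth)
    prediction = np.asarray(prediction)
    if ground_truth.shape != prediction.shape:
        raise ValueError("label volumes must have the same shape")
    return (ground_truth != prediction).astype(np.uint8)

# Toy 2D "volumes"; assumed encoding: 0 = background, 1 = CSF, 2 = GM, 3 = WM.
gt = np.array([[0, 1, 2],
               [2, 3, 3]])
pred = np.array([[0, 1, 3],
                 [2, 3, 2]])
err = error_map(gt, pred)  # voxels flagged 1 would be treated as unreliable
```

A confidence model trained on such targets learns to flag voxels where the segmentation is likely wrong, without needing ground truth at test time.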

Fig. 3.

The effectiveness of the confidence map on a testing subject acquired at 24 months of age. (a) T1w testing image, (b) segmentation result by SegM-C, (c) the corresponding confidence map (generated by ConM, where some unreliable (darker) regions of WM are marked in red), and (d) the 3D WM rendering result (the red region corresponds to the red region of (c)). Note that the testing subject is excluded from the training dataset of ConM.

Global Anatomic Guidance.

Prior knowledge, e.g., the cortical thickness is within a certain range, could be employed as an anatomical guidance for the tissue segmentation [1, 8]. Considering the artifacts as high-frequency noise, we can alleviate the artifacts by simply downsampling the original images. Moreover, the downsampled images allow for a large receptive field during the network training. Therefore, instead of estimating the anatomical guidance from the original images with artifacts, we work on the downsampled images to train the SegM-A. Based on the trained model SegM-A, we then upsample the segmentation result into the original image space and construct a signed distance map with respect to the boundary of WM/GM to incorporate the cortical thickness as an anatomical guidance. In detail, after upsampling the result from SegM-A (see Fig. 4(a)), we can derive a label image (see Fig. 4(b)). Based on the label image, it is straightforward to construct a signed distance map with respect to the boundary of WM/GM, as shown in Fig. 4(c). Basically, the function value at each voxel is the shortest distance to its nearest point on the boundary of WM/GM, taking positive value for voxels inside of WM, and negative value for voxels outside of WM.
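The signed distance map described above can be approximated with two Euclidean distance transforms on a binary WM mask. This is a sketch rather than the authors' implementation: it assumes the WM mask has already been derived from the upsampled label image, measures distances to the nearest voxel of the opposite class (a voxel-level approximation of the WM/GM boundary), and uses the 0.8 mm isotropic spacing from the preprocessing step. The function name `signed_distance_map` is hypothetical.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_map(wm_mask, spacing=(0.8, 0.8, 0.8)):
    """Signed distance (in mm) to the WM boundary: positive inside WM, negative outside."""
    wm_mask = np.asarray(wm_mask, dtype=bool)
    # For WM voxels: distance to the nearest non-WM voxel (zero elsewhere).
    inside = distance_transform_edt(wm_mask, sampling=spacing)
    # For non-WM voxels: distance to the nearest WM voxel (zero elsewhere).
    outside = distance_transform_edt(~wm_mask, sampling=spacing)
    return inside - outside

# Toy example: a 3 x 3 x 3 block of "WM" inside a 7 x 7 x 7 volume.
wm = np.zeros((7, 7, 7), dtype=bool)
wm[2:5, 2:5, 2:5] = True
sdm = signed_distance_map(wm)
```

In this toy volume the block center lies two voxels (1.6 mm) from the nearest non-WM voxel, so its value is positive, while voxels outside the block receive negative values.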

Fig. 4.

(a) The probability maps estimated by SegM-A, upsampled to the original size. (b) The label image generated from the upsampled probability maps. (c) The signed distance map with respect to the WM/GM boundary.

2.3 Testing Stage

Due to the multi-site issue, a model trained on the source site cannot be directly applied to the to-be-segmented site. To alleviate this issue, we propose an iterative self-supervised learning (SSL) strategy to train a site-specific segmentation model for the to-be-segmented site. Based on the idea that the better the probability maps input to SegM-C, the better its outputs, we replace the three tissue probability maps (part of the input of SegM-C) with the output of SegM-C. By iteratively updating the probability maps in this way, the results of SegM-C are gradually refined over N rounds. Then, we apply the SSL method [2] to further refine the results from Round N, which can effectively refine the segmentation results of testing subjects from multiple sites. In detail, we utilize the SSL method to automatically generate a set of reliable training samples from the testing subjects, which are used to train a site-specific segmentation model (i.e., SegM-D). Finally, the testing subjects are directly input to the trained SegM-D to derive the final results.
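The iterative update of the probability channels can be sketched as a simple loop. Everything named here is an assumption for illustration: `seg_c` stands for any callable that maps a four-channel array (one signed distance map plus three tissue probability maps) to three probability maps, and `toy_seg_c` is only a stand-in that sharpens probabilities so the loop can be run; it is not the trained SegM-C.

```python
import numpy as np

def iterative_refinement(seg_c, distance_map, initial_probs, num_rounds=3):
    """Feed the model its own output as the tissue-probability channels for num_rounds rounds."""
    probs = np.asarray(initial_probs, dtype=np.float32)
    for _ in range(num_rounds):
        # Four-channel input: signed distance map + three tissue probability maps.
        x = np.concatenate([distance_map[np.newaxis], probs], axis=0)
        probs = seg_c(x)
    return probs

def toy_seg_c(x):
    """Hypothetical stand-in for SegM-C: sharpens the probability channels via a softmax."""
    logits = 2.0 * np.log(x[1:] + 1e-6)
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
sdm = rng.normal(size=(4, 4, 4)).astype(np.float32)
# Random initial probabilities with shape (3, 4, 4, 4) that sum to 1 over the channel axis.
p0 = rng.dirichlet(np.ones(3), size=(4, 4, 4)).transpose(3, 0, 1, 2)
refined = iterative_refinement(toy_seg_c, sdm, p0, num_rounds=3)
```

In this reading the signed distance map stays fixed across rounds while only the three probability channels are replaced, which matches the description that the probability maps are the part of the input being updated.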

2.4 Implementation Details

In our experiment, we set 1.0 and 0.35 as the weight parameters for simulated motion and Gibbs artifacts [16], respectively, to preprocess MR images in the training stage of Fig. 2. Then, we randomly extract 1,000 patches (size: 32 × 32 × 32) from each training subject. The loss for the segmentation models (i.e., SegM-A, SegM-B, and SegM-C) is cross-entropy, and a spatially-weighted cross-entropy loss [2] is used for training SegM-D. The loss of ConM is multi-task cross-entropy. The kernels are initialized by Xavier [38], and we use the SGD optimization strategy. The learning rate is 0.005 and is multiplied by 0.1 after each epoch.
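Two of the settings above, random patch extraction and the stepwise learning-rate decay, can be sketched as follows. The uniform sampling of patch corners, the seed, and the helper names `sample_patches` and `learning_rate` are assumptions; the text does not state how patch locations are drawn.

```python
import numpy as np

def sample_patches(volume, num_patches=1000, patch_size=32, seed=0):
    """Randomly crop cubic patches from a 3D volume (corner positions drawn uniformly)."""
    rng = np.random.default_rng(seed)
    volume = np.asarray(volume)
    max_start = np.array(volume.shape) - patch_size
    if np.any(max_start < 0):
        raise ValueError("volume is smaller than the patch size")
    patches = np.empty((num_patches,) + (patch_size,) * 3, dtype=volume.dtype)
    for i in range(num_patches):
        z, y, x = rng.integers(0, max_start + 1)  # upper bound is exclusive
        patches[i] = volume[z:z + patch_size, y:y + patch_size, x:x + patch_size]
    return patches

def learning_rate(epoch, base_lr=0.005, decay=0.1):
    """Learning rate after `epoch` completed epochs: base_lr multiplied by decay each epoch."""
    return base_lr * decay ** epoch

# Small demonstration volume; 5 patches instead of 1,000 to keep the example light.
volume = np.zeros((40, 48, 36), dtype=np.float32)
patches = sample_patches(volume, num_patches=5)
rates = [learning_rate(e) for e in range(3)]
```

With a decay factor of 0.1 per epoch, the step size shrinks by two orders of magnitude within two epochs, so most of the learning happens early in training.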

3 Experimental Results

To demonstrate the performance of our proposed M-SSL method, we first conduct ablation studies to verify the importance of each component, namely downsampling, the signed distance map, and the proposed iterative SSL strategy. Then, we validate our method on pediatric MR images with real motion/Gibbs artifacts. Finally, the method is applied to multi-site pediatric brain subjects from the iSeg-2019 challenge [1] for quantitative analysis.

3.1 Ablation Study

Influence of Downsampling and the Signed Distance Map.

To validate the effectiveness of downsampling and the signed distance map, we conduct an ablation study comparing the results obtained by SegM-C trained without/with the signed distance map (generated from the original image space or the downsampled image space), as shown in Fig. 5. Compared with the results in the second and third columns of Fig. 5, the WM gyri in the fourth column are clearer and more reasonable with the guidance of the signed distance map generated from the downsampled image space.

Fig. 5.

The importance of downsampling and the signed distance map. From left to right: T1w image and the WM results obtained by SegM-C trained without/with the signed distance map (generated from original image space or downsampled image space).

Importance of Simulated Artifacts and Iterative SSL Strategy.

In the training stage of Fig. 2, we use simulated motion/Gibbs artifacts to preprocess the intensity images to train SegM-B. Then, we propose an iterative SSL strategy to refine the segmentation results during the testing stage, as discussed in Sect. 2.3. We investigate whether these two steps help to improve the accuracy of the testing results. Figure 6 shows the results generated by SegM-B and the iterative SSL strategy for a testing subject with real motion/Gibbs artifacts. First, we can see that the SegM-B trained on images with simulated artifacts is more robust on testing images with real artifacts (the second and third columns in Fig. 6). Second, by leveraging the proposed iterative SSL method, some ring-like tissue results caused by artifacts, indicated by red arrows, are gradually alleviated (see the fourth to sixth columns in Fig. 6). Therefore, both the simulated artifacts added to the training subjects and the proposed iterative SSL strategy help to improve the accuracy of the testing results.

Fig. 6.

Comparison of segmentation results for testing subjects with real artifacts. From left to right: T1w image, the results generated by SegM-B (trained on images without/with simulated artifacts), SegM-C and SegM-D (by iterative SSL strategy). Some regions are indicated by red arrows to show the difference, and the corresponding 3D rendering WM results are circled by red dotted circles.

3.2 Comparison Results on Pediatric Brain Images with Real Artifacts

We first verify the performance of our method on 9 brain T1w images with real motion and Gibbs artifacts at 24 months of age. In this experiment, we compare with three state-of-the-art pipelines/tools: 1) FreeSurfer [6], 2) Infant FreeSurfer [10], and 3) volBrain [17]. Because these pipelines/tools have been freely released by their developers, we can directly apply them to derive tissue segmentation results according to their manuals. Figure 7 presents exemplary tissue segmentation results of one testing image with severe artifacts, obtained by the three competing methods and our method. We can observe that most results generated by the competing methods show ring-like tissue shapes, circled by green dotted ellipses, which are mainly caused by artifacts. In contrast, our results show smoother and more reasonable tissue segmentations, as shown in the last column of Fig. 7. This qualitative comparison demonstrates the advantage of the proposed method in terms of accuracy.

Fig. 7.

Tissue segmentation comparison between FreeSurfer [6], Infant FreeSurfer [10], volBrain [17] and our proposed method on a pediatric brain MR image with severe artifacts. The first column shows pediatric T1w images with artifacts. The second to fifth columns are the corresponding segmentation results generated by three competing methods and our method, which are shown with 2D slices and 3D rendering WM tissues, respectively.

3.3 Comparisons on Multi-site Infant Subjects in the iSeg-2019 Challenge

To quantitatively validate our proposed method, we test it on the multi-site brain images in the iSeg-2019 challenge. According to the review article [1], we choose one testing site (i.e., Stanford University, which exhibits a different distribution from the other sites) to test our method. This site consists of five testing subjects, one of which is significantly affected by motion/Gibbs artifacts. We compare our method with the top 3 methods in the challenge (i.e., QL111111, Tao SMU, and FightAutism), as reported in Table 1, in terms of CSF, GM and WM results. As shown in Table 1, our method achieves the highest Dice ratio for GM and WM, with a significant difference compared with the others (p-value < 0.05).

Table 1. Dice ratio (%) of cross-site brain segmentation results on five testing subjects from the iSeg-2019 challenge. “+” indicates that our proposed method is significantly better than the top three methods with p-value < 0.05.
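For reference, the Dice ratio for one tissue is twice the overlap between the predicted and reference masks divided by the sum of their sizes. The sketch below computes it in percent from two label volumes; the integer label encoding and the convention of returning 100 when both masks are empty are assumptions for illustration.

```python
import numpy as np

def dice_ratio(pred, truth, label):
    """Dice ratio (%) for one tissue label between two label volumes."""
    p = np.asarray(pred) == label
    t = np.asarray(truth) == label
    denom = p.sum() + t.sum()
    if denom == 0:
        return 100.0  # assumed convention: two empty masks agree perfectly
    return 100.0 * 2.0 * np.logical_and(p, t).sum() / denom

# Toy labels; assumed encoding: 1 = CSF, 2 = GM, 3 = WM.
truth = np.array([[3, 3, 2],
                  [2, 1, 1]])
pred = np.array([[3, 2, 2],
                 [2, 1, 1]])
scores = {name: dice_ratio(pred, truth, lab)
          for name, lab in [("CSF", 1), ("GM", 2), ("WM", 3)]}
```

In the toy example a single mislabeled voxel lowers both the GM and WM scores, since it counts as a false positive for one tissue and a false negative for the other.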

4 Conclusion

To conclude, we propose a multi-scale self-supervised learning (M-SSL) framework to accurately segment tissues for multi-site pediatric brain MR images with motion/Gibbs artifacts. According to the above experiments, the M-SSL method can achieve encouraging results compared with several state-of-the-art methods. In future work, we will further improve our method and test on more multi-site subjects with artifacts.