1 Introduction

In healthy subjects, the human brain is approximately bilaterally symmetric and divided into two cerebral hemispheres that are separated by the ideal midline on the axial plane of CT images. However, various pathological conditions, such as traumatic brain injuries, strokes, and tumors, can break this symmetry by distorting the ideal midline (IML) into a deformed midline (DML), leading to brain midline shift (MLS). As a sign of increased intracranial pressure, the degree of MLS can serve as a quantitative indicator that helps physicians make more accurate diagnoses and outcome predictions. For example, the guidelines of the Brain Trauma Foundation recommend emergency surgery for any traumatic epidural, subdural, or intracerebral hematoma causing an MLS larger than 5 mm [5]. Since quantitative analysis of MLS is challenging and time-consuming for neurologists, computer-aided brain midline delineation could not only improve the accuracy and efficiency of MLS estimation [11] but also reduce the inter-rater variability among neurologists [8].

Traditional methods for brain midline delineation fall into two types: symmetry-based [1, 6] and landmark-based [2, 7]. For example, Liao et al. [6] decomposed the deformed midline into three segments and formulated the central curved segment as a quadratic Bezier curve, which is fitted using local symmetry. Liu et al. [7] proposed to build the deformed midline by localizing anatomical points. However, these traditional methods may fail on largely deformed brains for two reasons: (1) the midline is relatively difficult to identify given the low soft-tissue contrast; (2) the predefined anatomical points or parts may not be visible due to large deformation [11].

Recently, approaches based on deep learning [8, 10, 11] have been applied to brain midline delineation and can overcome the above issues to some extent. Hao et al. [11] formulated brain midline delineation as a regression task and proposed a regression-based line detection network. Pisov et al. [8] introduced a two-head convolutional neural network with shared input layers to predict the midline limits and regress the midline coordinates. However, the performance of such regression-based methods is limited in the following respects: (1) They ignore the structural connectivity prior that the midline is a connected and smooth curve. (2) The feature extraction network is not well designed for a largely deformed midline, or is hard to train due to its high complexity. (3) They all assume that for each vertical coordinate y there is at most one horizontal coordinate x of a midline pixel, which may fail for some extreme poses of the brain. To take the structural connectivity prior into account, Wang et al. [10] proposed a post-processing stage called pathfinding, which builds the midline from the segmentation probability map. Their method cannot be trained end-to-end, which is sub-optimal.

To address these issues, this paper proposes a context-aware refinement network (CAR-Net) to enhance the feature extraction ability and introduces a novel connectivity regular loss (CRL) to incorporate prior knowledge of midline structural connectivity. Specifically, the main contributions are summarized as follows: (1) We propose a context-aware refinement network (CAR-Net) that refines and integrates the base feature pyramid to obtain more discriminative contextual features and a larger receptive field. (2) We introduce a novel connectivity regular loss (CRL) to model the connectivity prior explicitly and promote the connectivity of the predicted midline. (3) We address the prerequisite ignored by previous regression-based methods and present a simple pose rectification module to satisfy it. The proposed method is evaluated on the CQ dataset and one in-house dataset, and the results show that it outperforms three state-of-the-art methods with fewer parameters.

Fig. 1. Illustration of the pipeline of our proposed method for brain midline delineation, which consists of three parts: (a) rectification, (b) localization and (c) regression.

2 Method

Figure 1 shows the pipeline of our proposed method for brain midline delineation, which consists of three parts: (a) rectification, (b) localization and (c) regression. First, we present a pose rectification network to align the source CT image \(I_{S}\) to a canonical pose image \(I_{A}\). Second, the proposed context-aware refinement network (CAR-Net) takes the aligned CT image \(I_{A} \in R^{H \times W}\) as input and generates the midline limits \(\hat{Y}_{L} \in R^{H}\) (the vertical range of midline coordinates) through the limits head [8], together with the segmentation probability map \(\hat{Y}_{B} \in R^{H \times W}\) of the midline band (the width-expanded midline). Finally, the regression head [11] takes the segmentation probability map \(\hat{Y}_{B}\) as input and outputs the midline coordinates \(\hat{Y}_\text {C} \in R^{H}\). In addition, the midline coordinates \(\hat{Y}_\text {C}\) are multiplied by the transformation matrix \(\varPhi \) to obtain the adjacent coordinate difference vector \(\varDelta \hat{Y}_{C} \in R^{H}\), which is used to compute the connectivity regular loss \(L_{CR}\). We adopt the same structure as the limits head of [8] and the regression head of [11].

2.1 Pose Rectification Module

Previous methods share the assumption that, for each vertical coordinate y, there is at most one horizontal coordinate x of a midline pixel. This assumption may fail for extreme poses of the brain caused by an improper distance, angle or displacement between the scanner and the patient, especially in real clinical applications. Thus, we present a pose rectification network that aligns the images to the standard pose so that the assumption holds.

As shown in Fig. 1(a), the anterior falx point \(P_{1}\) and posterior falx point \(P_{2}\) of the ground truth midline are used to calculate the rotational angle and the brain center, which together define a rigid transformation. With this transformation, we align the source CT image \(I_{S}\) to form the target image \(I_{T}\). Given \(I_{S}\), the pose rectification network \(\phi \) transforms \(I_{S}\) to \(\phi (I_{S})\). Specifically, we use a lightweight ResNet-18 [3] as the backbone of \(\phi \) and minimize the loss \(L_{2}(\phi (I_{S}), I_{T})\). The output of the pose rectification network is a group of rigid transformation parameters (\(t_{x}\), \(t_{y}\), \(\theta \)), where \(t_{x}\) and \(t_{y}\) are the horizontal and vertical displacements and \(\theta \) is the rotational angle. With these parameters, \(I_{S}\) is transformed to \(I_{A}\) as follows:

$$\begin{aligned} I_{A} = \phi (I_{S})=B\left( \left( \begin{array}{ccc} { \cos \theta } &{} {\quad - \sin \theta } &{} {\quad t_{x}} \\ { \sin \theta } &{} { \quad \cos \theta } &{} {\quad t_{y}} \end{array}\right) G(I_{S}),\quad I_{S}\right) \end{aligned}$$
(1)

where B stands for a bilinear interpolating function, and G represents a regular grid function. Furthermore, the aligned images are center cropped to a uniform size for the midline delineation.
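As a rough illustration of how the two midline endpoints \(P_{1}\) and \(P_{2}\) can determine the rigid parameters (\(t_{x}\), \(t_{y}\), \(\theta \)), the following sketch computes a rotation that makes the \(P_{1} \rightarrow P_{2}\) line vertical and a translation that moves its midpoint to a chosen center. The function names `rigid_params` and `apply_rigid` are hypothetical, and the sketch maps points forward, whereas Eq. (1) applies the matrix to a sampling grid, so the sign and direction conventions of the original implementation may differ.

```python
import math

def rigid_params(p1, p2, center):
    """Return (t_x, t_y, theta) so that p -> R(theta) p + t makes the line
    p1 -> p2 vertical and places its midpoint on `center`.
    Points are (x, y) pixel coordinates with y growing downward (assumption)."""
    (x1, y1), (x2, y2) = p1, p2
    # Angle of the p1 -> p2 direction measured from the +y axis.
    theta = math.atan2(x2 - x1, y2 - y1)
    c, s = math.cos(theta), math.sin(theta)
    # Rotate the midpoint, then translate it onto the requested center.
    mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    t_x = center[0] - (c * mx - s * my)
    t_y = center[1] - (s * mx + c * my)
    return t_x, t_y, theta

def apply_rigid(point, t_x, t_y, theta):
    """Apply the 2x3 rigid matrix of Eq. (1) to a single (x, y) point."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + t_x, s * x + c * y + t_y)

# A tilted midline whose endpoints should end up on the vertical line x = 100.
params = rigid_params((90.0, 20.0), (110.0, 180.0), (100.0, 100.0))
print(apply_rigid((90.0, 20.0), *params))
print(apply_rigid((110.0, 180.0), *params))
```

In the paper these parameters are predicted by the network rather than computed from annotations at test time; the sketch only shows how the ground-truth alignment could be derived.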

2.2 Context-Aware Refinement Network

For the midline delineation task, the normal parts of the midline are easy to process. However, it is difficult to accurately locate the shifted parts of a largely deformed midline, which requires a larger receptive field and more discriminative contextual information. As shown in [12], low-level and high-level features are complementary by nature: low-level features are rich in spatial details, while high-level features are rich in semantic concepts. Therefore, on top of the feature pyramid representation \(\{ f_{i}\mid i=1,2,3,4,5\} \) generated by U-Net [9], we attach a context-aware feature refinement module, which refines the features at each scale and integrates them adaptively to obtain more discriminative contextual features and a larger receptive field for the harder shifted parts of the deformed midline.

Specifically, as shown in Fig. 1(b), we first refine each scale feature map \(f_{i}\) to obtain the locally refined feature representation \({f_{i}}^{l}\) by applying several basic convolution blocks. To balance effectiveness and efficiency, more basic convolution blocks are stacked at deeper layers. We then adopt the SE block [4] as channel-wise attention, which recalibrates the locally refined representation \({f_{i}}^{l}\) to extract more discriminative features \({f_{i}}^{a}\) for a specific scale. Finally, the representative features \({f_{i}}^{a}\) of different levels are integrated via bilinear interpolation upsampling, concatenation and one basic convolution block to form the context-aware refinement representation \(f^{R}\). Compared to the feature representation \(f_{1}\) of the original U-Net, the context-aware refinement representation \(f^{R}\) has a larger receptive field and more discriminative contextual information.

2.3 Connectivity Regular Loss

For the supervision of the midline coordinates, previous regression-based methods only used the mean square error (MSE) loss [8, 11]. They ignored the structural connectivity prior that the brain midline is a continuous curve, which may lead to discontinuities in the predicted midline. The segmentation-based method [10] proposed a post-processing stage, which relies heavily on the segmentation probability map of the midline and cannot be optimized end-to-end. Based on these observations, we propose a novel connectivity regular loss (CRL) to incorporate the structural connectivity prior, which keeps the morphology of the predicted midline consistent with that of the ground truth midline.

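Specifically, we first define midline connectivity. For the midline coordinates \(X=(x_{1}, x_{2}, ..., x_{n})^\mathrm {T}\), if \(|x_{i} - x_{i-1}| \le \delta \) holds for every \(i = 2,3,...,n\), we say that the midline coordinates X satisfy \(\delta \)-connectivity. We then denote \(\varDelta X=(0, \varDelta x_{2}, \varDelta x_{3}, ..., \varDelta x_{n})^\mathrm {T}\), where \(\varDelta x_{i}=x_{i} - x_{i-1}\) for every \(i = 2,3,...,n\). The relation between X and \(\varDelta X\) is as follows:

$$\begin{aligned} \varDelta X = \left( \begin{array}{c} 0 \\ x_{2}-x_{1} \\ \vdots \\ x_{n}-x_{n-1} \end{array}\right) = \left( \begin{array}{ccccc} 0 & 0 & \cdots & 0 & 0 \\ -1 & 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & -1 & 1 \end{array}\right) \left( \begin{array}{c} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array}\right) = \varPhi \cdot X \end{aligned}$$
(2)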

where \(\varPhi \) is the transformation matrix. We then define the CRL as follows. It penalizes adjacent coordinate differences that exceed the margin \(\delta \), which encourages the predicted midline coordinates \(\hat{Y}_{C}\) to satisfy \(\delta \)-connectivity:

$$\begin{aligned} L_{C R}(\hat{Y}_{C}) = f(\varDelta \hat{Y}_{C}) = f(\varPhi \cdot \hat{Y}_{C}),\ \text {where} \, f(\textit{\textbf{x}}) = \sum _{i=1}^{n} \max \left( 0,|x_{i}|-\delta \right) \end{aligned}$$
(3)
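The loss in Eq. (3) can be sketched in a few lines of plain Python. This is only an illustration of the formula on lists of numbers, not the authors' implementation (which would operate on tensors during training); the function names are hypothetical, and the adjacent differences are computed directly instead of forming the matrix \(\varPhi \) explicitly.

```python
def adjacent_differences(coords):
    """Delta X = Phi . X: a leading 0 followed by x_i - x_{i-1}."""
    return [0.0] + [coords[i] - coords[i - 1] for i in range(1, len(coords))]

def connectivity_regular_loss(coords, delta=1.0):
    """Hinge penalty of Eq. (3): sum of max(0, |dx_i| - delta)."""
    return sum(max(0.0, abs(d) - delta) for d in adjacent_differences(coords))

def is_delta_connected(coords, delta=1.0):
    """True if every adjacent difference stays within the margin delta."""
    return all(abs(d) <= delta for d in adjacent_differences(coords))

smooth = [100.0, 100.5, 101.0, 101.75, 102.0]   # all jumps <= 1 pixel
broken = [100.0, 100.5, 104.5, 105.0]           # one jump of 4 pixels

print(connectivity_regular_loss(smooth))  # 0.0
print(connectivity_regular_loss(broken))  # 3.0
```

A curve that already satisfies \(\delta \)-connectivity contributes nothing to the loss, so the term only acts on rows where the predicted midline jumps by more than the margin.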

2.4 Loss Function and Optimization

Except for the pose rectification network, the whole framework is trained end-to-end. The loss function \(\mathcal {L}_{\text {limits}}\) of the midline limits \(\hat{Y}_{L}\) is the binary cross-entropy loss, and the loss function \(\mathcal {L}_{\text {seg}}\) of the segmentation probability map \(\hat{Y}_{B}\) is the weighted cross-entropy loss. For the supervision of the midline coordinates \(\hat{Y}_{C}\), we take the \(L_{1}\) loss as the regression loss \(\mathcal {L}_{\text {reg}}\) and the connectivity regular loss \(L_{CR}\) as the regularization term. The total loss function of the midline delineation is defined as:

$$\begin{aligned} \mathcal {L}_{total}=\lambda \mathcal {L}_{\text{ limits }}+\gamma \mathcal {L}_{\text{ seg }} + \xi \mathcal {L}_{reg} + \mu \mathcal {L}_{CR} \end{aligned}$$
(4)

where \(\lambda ,\gamma , \xi \) and \(\mu \) denote the balanced weights of different parts.

In the inference phase, the source input CT image \(I_{S}\) is first aligned to the standard pose image \(I_{A}\) by the pose rectification network. The aligned image \(I_{A}\) is then passed through the CAR-Net and the regression head successively to obtain the midline limits \(\hat{Y}_{L}\) and the midline coordinates \(\hat{Y}_{C}\). The midline limits \(\hat{Y}_{L}\) are binarized with a suitable threshold, and the midline coordinates \(\hat{Y}_{C}\) are multiplied element-wise (Hadamard product) by the binarized limits to form the final midline coordinates. Finally, we draw the final midline coordinates on the aligned image \(I_{A}\), as shown in Fig. 1(c).
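The last two inference steps, thresholding the limits and masking the coordinates, can be sketched as follows. The threshold of 0.5 and the function name `mask_midline` are assumptions made for illustration; the text only states that a suitable threshold is used.

```python
def mask_midline(coords, limit_probs, threshold=0.5):
    """Binarize the per-row limit probabilities and apply them to the
    per-row midline x-coordinates with an element-wise (Hadamard) product.
    Returns the masked coordinates and the (row, x) points to draw."""
    binary = [1 if p >= threshold else 0 for p in limit_probs]
    masked = [x * b for x, b in zip(coords, binary)]
    points = [(row, x) for row, (x, b) in enumerate(zip(coords, binary)) if b]
    return masked, points

coords = [150.0, 151.0, 152.5, 153.0]   # predicted x per image row
limits = [0.1, 0.8, 0.9, 0.3]           # predicted probability that a row contains midline

masked, points = mask_midline(coords, limits)
print(masked)  # [0.0, 151.0, 152.5, 0.0]
print(points)  # [(1, 151.0), (2, 152.5)]
```

Rows outside the predicted vertical range end up with a coordinate of zero and are simply not drawn.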

3 Experiments

Dataset and Evaluation Metric. We evaluate our method on the CQ dataset and one in-house dataset. The CQ dataset is a subset of the CQ500 dataset, which consists of 63 midline shift subjects and the same number of healthy subjects. 59% of the subjects have a significant midline shift (\(\ge \)5 mm) and the mean MLS is \(7.59 \pm 5.16\) mm. Our in-house dataset consists of 203 CT series with different degrees of MLS caused by cerebral hemorrhage. 78% of the subjects have a significant midline shift (\(\ge \)5 mm) and the mean MLS is \(9.04 \pm 5.54\) mm. For both datasets, a total of 10 CT slices with the largest brain area in each subject were selected and manually delineated by doctors as the midline gold standard. We randomly split the CQ dataset and our in-house dataset into train/validation/test sets of 76/20/30 and 120/30/53 subjects, respectively. We employ four metrics to evaluate the midlines delineated by different methods: line distance error (LDE) [11], max shift distance error (MSDE) [11], Hausdorff distance (HD) [10] and average symmetric surface distance (ASD) [10].

Implementation Details. For data pre-processing, each CT slice is resampled to a uniform resolution (\(0.5 \times 0.5\) mm\(^{2}\)), aligned by the pose rectification network and then center cropped into a patch of size \(400 \times 304\) for the CQ dataset and \(400 \times 336\) for our in-house dataset. Random horizontal flipping is applied as an inexpensive data augmentation. The proposed model is implemented in PyTorch. We train the model with Adam, setting \(\beta _1\) = 0.9 and \(\beta _2\) = 0.99, with an initial learning rate of \(1 \times 10^{-3}\). The poly learning rate policy is employed. The batch size for training is set to 24, and the maximum number of epochs is set to 200. In Eq. (4), we set \( \lambda = \gamma = \xi = 1\) and \(\mu = 0.5\). The margin \(\delta \) in Eq. (3) is set to 1. The results and training details of the pose rectification network are presented in the supplementary material.
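The poly learning rate policy is commonly written as \(lr = lr_{0} \cdot (1 - t/T)^{p}\). A minimal sketch under that common formulation is given below; the exponent of 0.9 and the per-epoch update are assumptions, since the text does not specify either, and `poly_lr` is a hypothetical helper name.

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """Polynomial decay from base_lr at step 0 to 0 at max_steps.
    `power` is an assumed value; the source does not state it."""
    return base_lr * (1.0 - step / max_steps) ** power

base_lr, max_epochs = 1e-3, 200   # values taken from the implementation details
for epoch in (0, 50, 100, 150, 199):
    print(epoch, poly_lr(base_lr, epoch, max_epochs))
```

The schedule starts at the initial learning rate and decays monotonically to zero at the final epoch.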

Effect of Context-Aware Refinement Network. We replace the CAR-Net with a plain U-Net in our pipeline as the baseline model. To obtain more contextual features, we attach a context-aware refinement module to the feature pyramid generated by the U-Net. To verify the effectiveness of the CAR-Net, we perform an ablation study under two loss conditions: training with CRL and training without CRL. As shown in the last four rows of Table 2, under both loss conditions, CAR-Net consistently yields better performance than the baseline model in all four evaluation metrics on both datasets. As shown in Fig. 2, the segmentation probability map of the midline generated by the CAR-Net is more accurate, especially in the shifted parts of a largely deformed midline. The quantitative and qualitative results indicate that the proposed CAR-Net obtains more contextual features and therefore predicts largely deformed midlines better.

Fig. 2. Qualitative comparison of segmentation probability maps between the baseline model and the proposed CAR-Net.

Fig. 3. Qualitative comparison between the CAR-Net with and without CRL.

Effect of Connectivity Regular Loss. To verify the effectiveness of the proposed CRL, we conduct experiments with the baseline model and CAR-Net. The last four rows of Table 2 show that employing CRL achieves better performance than the model without CRL supervision, especially in the MSDE and HD metrics, which indicates that the proposed CRL reduces the error of the maximum shift significantly. Furthermore, in the inference stage, the CRL value of the predicted midline can also serve as a connectivity indicator to verify the gain in structural connectivity. As shown in Table 1, the connectivity indicator of the model with CRL is far smaller than that of its counterpart without CRL. In summary, the proposed CRL improves not only the distance performance of the midline delineation but also the structural connectivity of the midline. Some qualitative comparisons are shown in Fig. 3, which further demonstrate the effectiveness of the CRL.

Table 1. Quantitative results of the connectivity indicator in terms of mean (std) on the inhouse dataset and the CQ500 dataset.

Comparisons to State-of-the-Art. We provide qualitative and quantitative comparisons to three state-of-the-art algorithms for brain midline delineation, RLDN [11], Pisov et al. [8] and MD-Net [10], on our in-house dataset and the CQ dataset. All experiments take the aligned image \(I_{A}\) as input for a fair comparison. As shown in Table 2, our proposed model performs better than all three methods in the four evaluation metrics on both datasets, except for the ASD on the in-house dataset, where it is comparable to MD-Net. The experiments suggest good generalization capability and promising effectiveness of our proposed method. Figure 4 shows some delineation results on challenging deformed brain midlines. Compared to the other methods, our proposed method delineates a more accurate and smoother midline, which can support more accurate clinical judgement of pathological brain deformation. Furthermore, our proposed model has 3.90 M parameters, fewer than each of the other three methods, which better meets the needs of practical application.

Table 2. Quantitative results on the in-house dataset and the CQ dataset. “*” means that MD-Net is combined with a post-processing stage and is therefore not an end-to-end method.

Fig. 4. Qualitative comparison between the baseline, RLDN, Pisov et al., MD-Net and CAR-Net with CRL, showing two examples of midline delineation.

4 Conclusions

We propose a context-aware refinement network (CAR-Net) to obtain more discriminative contextual features and a larger receptive field, which is crucial for the shifted parts of a largely deformed midline. In addition, a novel connectivity regular loss (CRL) is introduced to promote structural connectivity. Moreover, we address the prerequisite that the brain CT image must be in the standard pose, which is ignored by previous regression-based methods, by presenting a simple pose rectification network that aligns the source input image to the standard pose image. The proposed method is evaluated on the CQ dataset and one in-house dataset, and the results show that it outperforms three state-of-the-art methods with fewer parameters.