1 Introduction

Breast cancer is the most common cancer in women and has a high mortality rate worldwide. Early detection and diagnosis are of great significance for breast cancer treatment and help raise the survival rate. Ultrasound imaging is one of the most efficient and widely used diagnostic methods for breast cancer due to its non-radioactive nature, ease of operation, and low cost. In addition, the Breast Imaging Reporting and Data System (BI-RADS) [1] provides standard terminology for describing breast masses as well as a classification system for ultrasound. BI-RADS has been shown to aid the doctor's diagnosis and the subsequent therapeutic plan.

Image segmentation is an important procedure in research and clinical practice. Accurate segmentation results benefit downstream tasks such as diagnosis and treatment planning. Recently, much attention has been paid to deep learning based segmentation methods. For example, aggregation methods have been employed to boost information flow in a proposal-based instance segmentation framework [2], and a fast-scanning deep convolutional neural network has been proposed to segment breast tumor regions in histopathological images [3]. Moreover, a number of research efforts have been devoted to breast ultrasound image segmentation in order to further improve the performance of computer-aided diagnosis (CAD) systems for breast cancer. These works can be broadly divided into three major types. The first uses convolutional neural networks (CNNs) such as FCN and U-Net to segment the mass in breast ultrasound images directly [4]. The second applies traditional image processing techniques, e.g., thresholding, clustering, and active contour models, to segment the breast mass [5, 6]. The third integrates domain knowledge into CNNs or traditional methods to further improve the accuracy of the results [7].

It should be noted that the diagnosis results of CAD systems are highly correlated with the accuracy of breast mass segmentation. However, segmentation of breast ultrasound images remains challenging for three reasons. (I) Recent CNN-based segmentation methods usually treat breast masses as benign or malignant, whereas they should be classified into four major types (BI-RADS 2, 3, 4, and 5); in other words, the domain knowledge of clinical diagnosis encoded in BI-RADS is not fully used. Another reason for using BI-RADS categories instead of benign/malignant labels is that lesions graded lower than 4A are not recommended for biopsy, so pathological diagnosis results are unavailable for some cases. (II) Artifacts in breast ultrasound images mislead the algorithm in finding the real mass, especially for malignant masses with shadowing behind the posterior border. (III) The masks produced by CNNs often have rough borders, which are not precise enough to characterize the specific local appearance of malignancy, such as microlobulated and spiculated margins.

In view of these issues, we propose a U-Net [8] based network to segment breast masses accurately and efficiently in ultrasound images. The main contributions of our work are three-fold. (I) A classification branch is added to U-Net to integrate domain knowledge and supervise the detection and segmentation of the mass. (II) An embedded weighted aggregation module is introduced to fuse multi-scale attention information in the decoding layers, improving the segmentation of malignant masses. (III) A fully connected conditional random field (CRF) module is appended at the end of the network, which further increases the segmentation accuracy for masses with indistinct boundaries.

2 Method

The architecture of the proposed framework is illustrated in Fig. 1. The network is primarily based on U-Net, incorporating a classification branch to integrate clinical diagnosis knowledge. In the last few layers of the network, a weighted feature aggregation module and a CRF module are embedded. With these components, the presented work not only increases the accuracy of segmenting masses with regular boundaries and distinct margins, but also improves the detection of malignant masses with artifacts in ultrasound images.

Fig. 1.

Illustration of the proposed network, which includes a U-Net based encoder-decoder, a domain knowledge integration branch, a weighted feature aggregation module, and a CRF module. 'Conv', 'BN', 'FC', and '+' denote the convolutional layer, batch normalization, fully connected layer, and addition operation, respectively.

2.1 Domain Knowledge Integration Branch

Inspired by multi-task learning strategies [9], a classification branch is introduced into U-Net. Adding this joint learning branch improves the generalization ability of breast mass segmentation. It is worth noting that BI-RADS divides masses in breast ultrasound images into several grades according to their likelihood of malignancy, and masses of different grades differ in boundary appearance: generally, the lower the likelihood of malignancy, the more regular and smoother the boundary. Therefore, to integrate this domain knowledge, we add a classification branch after the final convolutional layer of the top-down pathway to predict the BI-RADS category of the mass, instead of using benign/malignant labels as in [7]. As illustrated in Fig. 1, the inputs of this branch go through a stack of \( 3 \times 3 \) Conv + BatchNorm + ReLU layers, after which a global convolutional layer generates a \( 1 \times 1 \times C \) feature vector. Finally, a fully connected layer with softmax activation produces the classification probabilities, and softmax cross entropy is used as the loss function. With the BI-RADS information as the supervision label, breast mass detection and segmentation achieve better performance.
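To make the branch concrete, below is a minimal PyTorch sketch of such a classification head. It is an illustration under stated assumptions rather than the exact implementation: the channel widths and number of Conv + BN + ReLU blocks are invented, and the global convolutional layer is approximated here by global average pooling.

```python
import torch
import torch.nn as nn

class BiradsBranch(nn.Module):
    """Classification branch sketch: 3x3 Conv + BN + ReLU blocks, global
    pooling to a 1x1xC vector, and a fully connected classifier over the
    four BI-RADS categories. Channel widths and block count are assumed."""

    def __init__(self, in_channels: int = 64, width: int = 128, num_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, width, kernel_size=3, padding=1),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, width, kernel_size=3, padding=1),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # stands in for the global conv layer
        self.fc = nn.Linear(width, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(self.features(x)).flatten(1)  # (N, C) feature vector
        return self.fc(x)  # logits; softmax is folded into the loss below

# Softmax cross entropy supervises the branch with BI-RADS labels.
criterion_cls = nn.CrossEntropyLoss()
```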

2.2 Weighted Feature Aggregation Module

Recently, the Feature Pyramid Network (FPN) [10] has become one of the most widely used approaches to the multi-scale problem in object detection and segmentation. Usually, feature maps of the same scale are summed along the channel dimension in the FPN module. However, not all features in high-level layers are effective for locating objects. Moreover, artifacts in ultrasound images can make the boundary of a mass unclear or invisible, such as posterior shadowing, where the area posterior to the mass appears darker. As a result, it is difficult for a CNN, and even for clinical experts, to find the correct edge of a malignant mass. A novel architectural unit called the Squeeze-and-Excitation (SE) block, which can distinguish the importance of different channels of a neural network, was introduced in [11]. Inspired by this view, we propose a weighted feature aggregation module that extracts and aggregates features from multi-scale layers to improve mass segmentation. As illustrated in Fig. 1, the outputs of the last four convolutional layers in the decoding path are fed into SE blocks to extract the important information of each layer. The details of the SE block are shown in Fig. 2, and the reduction ratio \( r \) is set to 16. The output of the SE block in each layer is then passed to a \( 2 \times 2 \) upsampling layer and a \( 3 \times 3 \) convolutional layer with ReLU activation to match the dimensions of the feature maps at the next stage. Finally, all feature maps are summed along the channel dimension to form the input of the last step. In short, our aggregation module extracts useful information and combines it with the features of each layer through multi-scale information fusion, so that the mass region and its edges are emphasized in the final feature maps; a code sketch is given after Fig. 2.

Fig. 2.

Illustration of the SE block. '\( \times \)' denotes the channel-wise multiplication operation.
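The following PyTorch sketch shows the SE block of Fig. 2 and one stage of the weighted aggregation path. The reduction ratio follows the text (\( r = 16 \)); the channel sizes and stage wiring are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation [11]: squeeze by global average pooling,
    excite through a bottleneck MLP, then rescale the channels."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        w = self.excite(self.squeeze(x).view(n, c)).view(n, c, 1, 1)
        return x * w  # the channel-wise multiplication in Fig. 2

class AggregationStage(nn.Module):
    """One stage of the weighted aggregation path: SE reweighting, 2x
    upsampling, and a 3x3 conv + ReLU so the result matches the feature
    maps of the next decoder stage before summation."""

    def __init__(self, in_ch: int, out_ch: int, reduction: int = 16):
        super().__init__()
        self.se = SEBlock(in_ch, reduction)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(self.up(self.se(x)))
```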

Finally, a \( 1 \times 1 \) convolutional layer followed by sigmoid activation is applied to generate the output. Instead of using only the cross-entropy loss for segmentation, we jointly optimize the Dice loss and the cross-entropy loss during training:

$$ \mathcal{L}_{\mathrm{seg}} = \lambda_{1} \mathcal{L}_{\mathrm{Dice}} + \lambda_{2} \mathcal{L}_{\mathrm{CE}} $$
(1)

where \( \lambda_{1} \) and \( \lambda_{2} \) are the weights of \( \mathcal{L}_{\mathrm{Dice}} \) and \( \mathcal{L}_{\mathrm{CE}} \), set to \( \lambda_{1} = 0.6 \) and \( \lambda_{2} = 0.4 \) in this paper. The cross-entropy loss penalizes pixel classification errors, while the Dice loss measures the overlap between the predicted areas and the ground truth. Finally, the training loss of the whole network is

$$ \mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{cls}} + \alpha \mathcal{L}_{\mathrm{seg}} $$
(2)

where \( \alpha \) is a hyper-parameter balancing \( \mathcal{L}_{\mathrm{cls}} \) and \( \mathcal{L}_{\mathrm{seg}} \); in our experiments, \( \alpha \) is set to 1.
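As a concreteness check, here is a minimal PyTorch sketch of the joint loss of Eqs. (1) and (2), assuming a binary mass mask with sigmoid probabilities; the helper names are ours.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss over sigmoid probabilities; eps guards empty masks."""
    pred = pred.flatten(1)
    target = target.flatten(1)
    inter = (pred * target).sum(dim=1)
    union = pred.sum(dim=1) + target.sum(dim=1)
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

def total_loss(seg_prob, seg_mask, cls_logits, cls_label,
               lam1=0.6, lam2=0.4, alpha=1.0):
    """Eq. (1) and Eq. (2): L_seg = lam1*L_Dice + lam2*L_CE, L_total = L_cls + alpha*L_seg.
    seg_prob/seg_mask are float tensors in [0, 1]; cls_label holds class indices."""
    l_seg = lam1 * dice_loss(seg_prob, seg_mask) \
            + lam2 * F.binary_cross_entropy(seg_prob, seg_mask)
    l_cls = F.cross_entropy(cls_logits, cls_label)
    return l_cls + alpha * l_seg
```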

2.3 CRF Refine Module

In practice, a malignant mass and the tissues around it often have a similar appearance in breast ultrasound images. That is, the margin of a malignant mass often appears indistinct, which reduces the accuracy of the mask generated directly by the network. To address this problem, we add a fully connected CRF module at the end of the proposed network. It improves the continuity and integrity of the mass contour by imposing spatial constraints between different objects. Given the probability map from the U-Net and the input ultrasound image of the same size, we formulate the final result as the inference of the CRF model. The energy function of our CRF model is defined as:

$$ E(\mathbf{x}) = \sum\nolimits_{i} \psi_{u} \left( x_{i} \right) + \sum\nolimits_{i < j} \psi_{p} \left( x_{i}, x_{j} \right) $$
(3)

where \( \psi_{u}(x_{i}) \) is the unary potential, computed independently for each pixel by a classifier that produces a distribution over the label assignment \( x_{i} \), and \( \psi_{p}(x_{i}, x_{j}) = \mu(x_{i}, x_{j}) \sum\nolimits_{m=1}^{K} w^{(m)} k^{(m)}(\mathbf{f}_{i}, \mathbf{f}_{j}) \) is the pairwise potential measuring the compatibility of neighboring pixel pairs, where \( k^{(m)}(\mathbf{f}_{i}, \mathbf{f}_{j}) \) is a Gaussian kernel, \( \mathbf{f}_{i} \) and \( \mathbf{f}_{j} \) are the feature vectors of pixels \( i \) and \( j \), \( w^{(m)} \) are linear combination weights, and \( \mu(x_{i}, x_{j}) \) is the label compatibility function. We minimize the energy function with 5 iterations of the mean-field approximation algorithm [12] and take the resulting label assignment \( x_{i} \) as the final label.
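A post-processing sketch of this step using the pydensecrf library is shown below. The 5 mean-field iterations follow the text, whereas the Gaussian and bilateral kernel parameters (sxy, srgb, compat) are illustrative assumptions, not values reported in the paper.

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image: np.ndarray, prob: np.ndarray, n_iters: int = 5) -> np.ndarray:
    """Refine a U-Net probability map with a fully connected CRF.
    image: H x W x 3 uint8 ultrasound frame; prob: H x W foreground probability."""
    h, w = prob.shape
    prob = np.clip(prob, 1e-6, 1.0 - 1e-6)
    softmax = np.stack([1.0 - prob, prob], axis=0).astype(np.float32)  # background, mass

    d = dcrf.DenseCRF2D(w, h, 2)
    d.setUnaryEnergy(unary_from_softmax(softmax))
    # The Gaussian kernel enforces spatial smoothness; the bilateral kernel
    # ties labels to image appearance. Parameter values are illustrative only.
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=60, srgb=10, rgbim=np.ascontiguousarray(image), compat=5)

    q = d.inference(n_iters)  # 5 mean-field iterations, as in the text
    return np.argmax(np.array(q).reshape(2, h, w), axis=0).astype(np.uint8)
```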

3 Experiments

Datasets.

We conducted experiments on 3341 two-dimensional breast ultrasound images collected from different hospitals using the Mindray Resona 7 Ultrasound Imaging System (Mindray, Shenzhen, China). All data were reviewed by several experienced ultrasonic physicians, and the final diagnosis was obtained by majority voting. Each image in the dataset contains at least one mass. According to the BI-RADS guideline, the dataset falls into six categories, i.e., categories 2, 3, 4A, 4B, 4C, and 5, with 702, 883, 358, 356, 291, and 753 images, respectively. Masses of grades 4A, 4B, and 4C are similar to each other in texture and shape and, in most cases, are not easy to distinguish even for clinical experts. Moreover, the data suffer from a critical class imbalance, which can cause divergence during training. Considering these problems, we merged the data into 4 categories, i.e., categories 2, 3, 4, and 5, and randomly split them into training and testing sets in the proportion of 80% and 20%, respectively.

Implementation Details.

We automatically cropped all images with Otsu's thresholding method and retained only the image content, in order to remove irrelevant regions such as the background, probe information, and imaging parameters. Data augmentations including rotation, shifting, cropping, zooming, and flipping were then employed, and the input images were resized to \( 256 \times 256 \). The backbone network of our method was initialized with ResNet-50 [13] weights pre-trained on ImageNet [14], and the parameters of the other layers were randomly initialized. The whole framework was trained on an NVIDIA Titan Xp GPU with a batch size of 16. The Adam optimizer with a momentum of 0.9 and a weight decay of 0.001 was used to optimize our models. We trained the network for 100 epochs, stopping when the validation loss no longer decreased significantly; training took approximately 10 h on our breast dataset.
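As an illustration of the cropping step, the following OpenCV sketch applies Otsu's thresholding and keeps the bounding box of the detected content. The function name and the exact cropping rule are assumptions, not the precise pipeline used here.

```python
import cv2
import numpy as np

def crop_scan_area(img: np.ndarray, size: int = 256) -> np.ndarray:
    """Crop to the imaging content found by Otsu's thresholding, then resize."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    ys, xs = np.nonzero(mask)  # foreground pixel coordinates
    y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
    cropped = img[y0:y1 + 1, x0:x1 + 1]
    return cv2.resize(cropped, (size, size), interpolation=cv2.INTER_LINEAR)
```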

Evaluation Metrics.

We choose U-Net as the baseline network and adopt the Jaccard Index, Matthews correlation coefficient (Mcc), and Dice coefficient for quantitative evaluation. These three metrics are defined as

$$ \mathrm{Jaccard\;Index} = \frac{\mathrm{TP}}{\mathrm{FP} + \mathrm{FN} + \mathrm{TP}} $$
(4)
$$ \mathrm{Mcc} = \frac{\mathrm{TP} \times \mathrm{TN} - \mathrm{FP} \times \mathrm{FN}}{\sqrt{\left( \mathrm{TP} + \mathrm{FP} \right)\left( \mathrm{TP} + \mathrm{FN} \right)\left( \mathrm{TN} + \mathrm{FP} \right)\left( \mathrm{TN} + \mathrm{FN} \right)}} $$
(5)
$$ \mathrm{Dice\;coefficient} = \frac{2 \times \mathrm{TP}}{2 \times \mathrm{TP} + \mathrm{FP} + \mathrm{FN}} $$
(6)

where TP refers to true positives, FP refers to false positives, TN refers to true negatives, and FN refers to false negatives.
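These metrics can be computed directly from binary masks; the NumPy sketch below implements Eqs. (4)-(6) (the function name is ours).

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray):
    """Compute Jaccard Index, Mcc, and Dice (Eqs. 4-6) from binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    jaccard = tp / (tp + fp + fn)
    # Cast to float to avoid integer overflow in the product under the root.
    mcc = (tp * tn - fp * fn) / np.sqrt(
        float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    dice = 2 * tp / (2 * tp + fp + fn)
    return jaccard, mcc, dice
```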

Quantitative Analysis.

We report the segmentation results in Table 1. Our model outperforms Mask R-CNN [15] and U-Net in all three evaluation metrics on the breast ultrasound image segmentation task. Each model was trained and tested three times with the same hyper-parameters to eliminate the influence of random factors. Moreover, Fig. 3 presents qualitative results of the different methods on five ultrasound images. As shown in the figure, all methods perform well on the mass with a smooth and regular boundary in the first row. For masses with irregular and indistinct borders (second to fourth rows), the results of Mask R-CNN and U-Net exhibit under- or over-segmentation in the indistinct areas. Neither of the first two methods detects the small mass in the last row, while our method detects and segments it accurately.

Table 1. Segmentation performance comparison on the breast dataset.
Fig. 3.

Qualitative segmentation results. From left to right: Ground Truth, Mask R-CNN, U-Net, U-Net + Domain Knowledge, and Ours.

Ablation Study.

We conducted a set of ablation experiments to evaluate the contribution and effectiveness of each component of the proposed method: (i) U-Net (baseline), (ii) U-Net with the domain knowledge branch, (iii) U-Net with the domain knowledge branch and the aggregation module, (iv) U-Net with the aggregation and CRF modules, and (v) our full method. The results are shown in Table 2. The U-Net baseline achieves 80.29%, 86.89%, and 86.71% in terms of Jaccard Index, Mcc, and Dice coefficient, respectively. Appending the proposed domain knowledge integration yields 81.64%, 88.50%, and 88.59% (Jaccard, Mcc, and Dice). Adding the weighted aggregation module brings further improvements of 0.86%, 0.74%, and 0.49% in Jaccard, Mcc, and Dice. When all three components are adopted, the performance improves by 4.56%, 4.04%, and 4.15% over the baseline in the three metrics, which shows the effectiveness of the proposed method.

Table 2. Ablation studies on our network measured by Jaccard Index, Matthew Correlation Coefficient (Mcc) and Dice coefficient.

4 Conclusion

In this paper, we proposed a U-Net based approach for the challenging task of breast mass segmentation in ultrasound images. The proposed method takes advantage of both domain knowledge integration and weighted feature aggregation, and achieves improved performance on malignant mass segmentation. The experimental results demonstrate that our techniques can tackle the segmentation problems of irregular boundaries, indistinct margins, and posterior shadowing in breast ultrasound images. Our method provides a fast and accurate ultrasound image processing tool and can be applied to other instance segmentation tasks in the medical field. In the future, specific BI-RADS information and biopsy results will be investigated and utilized to fine-tune the network as more data are collected.