Introduction

Breast cancer (BC) is one of the most frequently diagnosed cancers in middle-aged women around the globe, with the second highest death rate after lung cancer. According to a recent BC survey, about 287,850 new cases were expected to be diagnosed in 2022 in the United States, and approximately 43,250 women were expected to die of BC in that year. About 82% of women diagnosed with BC are above the age of 50, and the highest share of deaths, 91%, is observed in this age group [1, 2]. Early detection of BC with proper treatment can help reduce the mortality rate. Currently, different imaging technologies, including mammography, ultrasonography, and biopsy, are utilized by pathologists to detect BC manually. Manual BC detection is a complex and laborious task due to the inherent noise in the images; it is also subjective, and the same case may be diagnosed differently by different pathologists [3]. Computer-aided diagnosis of BC can play a vital role in normalizing these detection differences, and deep learning is playing a pivotal role in automating the manual diagnosis of BC [4, 5]. Many deep learning-based methods are being proposed at large scale for BC segmentation, as studies show that ninety percent of individuals positively diagnosed with BC may be cured with proper treatment [6].

Breast ultrasound (U/S) imaging is a cheap, low-radiation technology that can effectively be used for BC detection at an early stage. Manual diagnosis of BC from breast U/S imaging is a difficult task and needs expert clinicians for the final diagnosis. With advancements in technology, deep learning is being used to aid the diagnosis of different diseases [7] and has shown exceptional performance in medical image segmentation, specifically BC detection and segmentation [8, 9]. Recently, Vigil et al. [10] proposed a BC segmentation method by applying a dual-intended mechanism on a U/S imaging dataset; their method achieved an accuracy of 78.5% in classifying BC. Wang et al. automated the segmentation of BC from U/S images using different variants of existing ResNet models, utilizing a unique combination of pre-trained models to produce the automatic segmentation method [11]. Ayana et al. [12] introduced another transfer learning-based method for BC segmentation from U/S images and reported a classification accuracy of 99% on the U/S dataset. U-net-based models have recently gained increased attention in BC segmentation due to their high performance; for instance, Umer et al. [13] proposed BC detection using a feature selection method. A connected U-net-based automated method for BC segmentation is presented in [14], in which two U-net models are used in a concatenated fashion with the help of skip connections.

This work proposed a U-shaped autoencoder-based multi-attention triple decoder (MATD) convolution network for BC segmentation from U/S imaging. The proposed method introduces a multi-scale convolution-based encoder network for diverse spatial image feature extraction. The input image features learned at different scales are then processed through a multi-scale triple decoder network for BC segmentation. In each decoder network, an attention mechanism is utilized to highlight the tumor region for the best segmentation performance. The outputs of the multi-scale decoder blocks are concatenated for accurate segmentation of BC. The main contributions of the proposed multi-attention triple decoder convolution network are given below.

  • A single multi-scale encoder block is introduced in the contracting path to capture diverse image features at different scales.

  • To transform the multi-scale learned image features, a triple decoder network is introduced to process the learned image features separately.

  • To highlight the tumor region at different scales, a multi-attention mechanism is implemented in each decoder network for accurate segmentation of BC.

  • The output of the multi-attention triple decoder network is concatenated to predict the segmentation mask of input breast U/S images.

  • A comparison of the proposed multi-attention triple decoder convolution network with existing methods is also carried out.

Related Work

BC detection and segmentation are active areas of research due to the very high death rate of this disease, and both tasks are being automated using deep learning-based models. Deep learning is gaining increased attention in the medical field due to its outstanding detection performance, and many deep learning-based models have been presented for the detection and segmentation of BC from the U/S modality. Luo et al. [15] implemented two parallel networks for the segmentation-guided classification of BC using a U/S imaging dataset and an attention mechanism; their segmentation-guided classification achieved an accuracy of 90.78%. Yan et al. [16] proposed a method for BC segmentation that implements an attention mechanism to reduce spatial information loss during feature learning; a hybrid method with a dilation factor was introduced for efficient feature learning, and their method achieved a segmentation IoU of 81.8%. In another recent work, a double attention-based, globally and locally guided U-net method with cascaded convolutions and residual connections was introduced for accurate BC segmentation [17]. Attention mechanisms in segmentation methods have gained increased attention due to their higher performance, and many BC segmentation methods in the literature utilize them. A dual attention mechanism with a multi-scale convolution process for BC segmentation from U/S images is proposed in [18], and another attention-based method for BC segmentation is presented by Farooq et al. [19].

Due to their high performance in segmentation tasks, autoencoder-based U-shaped methods with different implementations are being utilized to automate BC segmentation from U/S images. Tong et al. [20] introduced a novel loss function with an embedded attention-based deep learning model for BC segmentation from a U/S imaging dataset; in their method, the conventional convolution operation was replaced with residual convolution for performance enhancement. In another recent work, a quantization-assisted BC segmentation method with a fusion mechanism was presented [21]. Vianna et al. [22] presented a comparative analysis of the U-net and SegNet deep CNN models for BC segmentation from U/S images and reported a dice of 86.3% with U-net and 81.1% with SegNet. A mixed attention loss-based U-net model with four attention loss functions and a selective kernel method was proposed in [23]; their modified version for BC segmentation produced a dice of 92.2%. A fusion method for BC segmentation from U/S images implementing a residual convolutional attention method was introduced in [24]; a fusion attention mechanism was combined with a channel attention method, and a dice of 92.1% was reported for the BC segmentation task. Li et al. [25] introduced multi-scale fusion and a focal loss function in U-net for BC segmentation and reported a dice of 95.35%; in their method, dilated convolution operations at different scales were introduced to enhance the segmentation performance.

BC segmentation from U/S images is being automated using deep learning-based semantic image segmentation methods. A spatial attention mechanism for BC classification from U/S images using a deep CNN is presented by Lu et al. [26]; in their method, a pre-trained ResNet model was utilized for feature learning. Punn et al. [27] proposed an inception-based model using a cross-spatial attention method for BC segmentation from a U/S imaging dataset, with residual connections utilized in the attention mechanism. Wang et al. recently introduced a breast lesion localization method using a U/S dataset [28]; an enhancement method was first applied, and the segmented data was then processed through a classification model. A contour optimization method in a deformed U-net with an adversarial mechanism is presented in [29] for BC segmentation from a U/S imaging dataset; their modified segmentation model achieved a dice of 89.7%. Cho et al. [30] proposed a new method for BC localization and classification consisting of multi-stage segmentation with a residual fusion mechanism; classification is first performed to check the malignancy of input images, and the classified images are then passed to the segmentation network for BC localization. Gong et al. [31] recently presented a U/S imaging-based BC classification method. BC segmentation using a breast mammography dataset was proposed by Peng et al. [32], in which a hybrid loss function was introduced. Lou et al. [33] introduced a U-shaped deep CNN model for BC segmentation from U/S images; a context-aware fusion mechanism with a residual pyramid was introduced to overcome the contextual gap during feature learning, and their method achieved higher performance.

From the above discussion of previous deep learning-based methods for BC segmentation, it can be concluded that most previous studies utilized models with a fixed receptive field: the encoder paths of existing methods are built with single or fixed-size convolution operations. Furthermore, a multi-scale encoder combined with multi-scale decoding mechanisms is rarely discussed. To enhance segmentation performance in this regard, this work proposed a single multi-scale encoder-based multi-attention triple decoder convolution network for BC segmentation from the U/S imaging dataset.

Proposed Method

This work proposed a single multi-scale encoder-based multi-attention triple decoder convolution network for BC segmentation from the U/S imaging dataset. The proposed MATD convolution network is presented in Fig. 1. The proposed BC segmentation model is composed of a single multi-scale encoder and a multi-decoder network. The multi-decoder network is designed to tackle the problem of fixed receptive fields and is developed using three decoders with three different receptive fields: a 3 × 3, a 5 × 5, and a 7 × 7 decoder. The contracting path of the proposed multi-attention triple decoder convolution network utilizes multi-scale convolution operations. The encoder of the proposed convolution network is composed of a 3 × 3 convolution block containing a convolution operation with a 3 × 3 kernel followed by 30% dropout, batch normalization, and ReLU activation. The result of the 3 × 3 convolution block is passed to the 5 × 5 convolution block, and finally, the result of the 5 × 5 convolution block is passed to the 7 × 7 convolution block. After the multi-scale convolution operation, a max pooling layer is implemented. The convolution operation, activation operation, and max pooling method are mathematically defined in Eqs. 1, 2, and 3, respectively, where W is the weight and \(\varphi\) is the bias factor.

Fig. 1
figure 1

The general workflow of the proposed U-shaped multi-attention triple decoder convolution network for BC segmentation from U/S images

$${Conv}_{img}=\sum_{-k}^{k}{kernel}_{stride}^{size}*\left({Input}_{img}\cdot W\right)+\varphi$$
(1)
$${Activation}_{ReLU}= f\left({Conv}_{img}\right)=\left\{\begin{array}{ll}0, & {Conv}_{img}<0\\ {Conv}_{img}, & {Conv}_{img}\ge 0\end{array}\right.$$
(2)
$${Max\_pooling}_{\mathrm{img}}= {MAX}_{stride}^{win}* {Input}_{img}$$
(3)
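As an illustration, a minimal TensorFlow/Keras sketch of the three operations in Eqs. 1–3 is given below. The filter count and the 2 × 2 pooling window are illustrative assumptions, as the layer widths are not specified in this section.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sketch of Eqs. 1-3: a 3 x 3 convolution, ReLU activation, and
# 2 x 2 max pooling. The filter count (64) is an assumption.
inputs = tf.keras.Input(shape=(128, 128, 1))                      # grayscale U/S image
conv = layers.Conv2D(64, kernel_size=3, padding="same")(inputs)   # Eq. 1
act = layers.ReLU()(conv)                                         # Eq. 2
pooled = layers.MaxPooling2D(pool_size=2, strides=2)(act)         # Eq. 3
```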

The contracting path of our convolution network is implemented with four multi-scale encoder blocks. Each encoder block utilizes a batch normalization operation to reduce the overfitting problem. The batch normalization procedure is mathematically described in Eqs. 4–6. For batch normalization of the input U/S image dataset, the mean of the input images is calculated using Eq. 4, their variance is computed using Eq. 5, and finally, to normalize the input, a batch-level normalization operation is applied using Eq. 6, where \(\omega\) and \(\tau\) are learnable parameters.

$${\overline{Mean} }_{m\_batch}=\frac{1}{N}\sum_{1}^{N}{Input}_{img}$$
(4)
$${{V}_{\sigma}^{2}}_{m\_batch}=\frac{1}{N}\sum_{1}^{N}\left({Input}_{img}-{\overline{Mean}}_{m\_batch}\right)^{2}$$
(5)
$${\widehat{Batch}}_{normal}=\frac{{Input}_{img}-{\overline{Mean}}_{m\_batch}}{\sqrt{{V_\sigma^2}_{m\_batch}+\varepsilon}}\xrightarrow{produced}\tau{\widehat{Batch}}_{normal}+\omega$$
(6)
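For concreteness, the sketch below implements Eqs. 4–6 in NumPy. In the model itself this computation is performed by the framework's built-in batch normalization layer; the reduction over the batch axis only, matching the equations, is an interpretation of the notation.

```python
import numpy as np

def batch_normalize(x, tau=1.0, omega=0.0, eps=1e-5):
    """Mini-batch normalization following Eqs. 4-6.
    x: batch of feature maps, shape (N, H, W, C).
    tau (scale) and omega (shift) are the learnable parameters."""
    mean = x.mean(axis=0)                    # Eq. 4: mean over the N inputs
    var = ((x - mean) ** 2).mean(axis=0)     # Eq. 5: variance over the N inputs
    x_hat = (x - mean) / np.sqrt(var + eps)  # Eq. 6: normalize ...
    return tau * x_hat + omega               # ... then scale and shift
```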

After the encoder path, a multi-scale bridge block was implemented to transform the learned image features. The highly discriminative image features learned at different scales are transferred to the multi-decoder network by using skip connections. In this work, three different receptive fields, with 3 × 3, 5 × 5, and 7 × 7 kernels, were utilized to capture diverse spatial features, and three decoders were utilized to regenerate the BC-segmented images from these features. For each convolution scale, a separate decoder was implemented to enhance the segmentation performance, and in each decoder network, an attention mechanism was used to highlight the tumor region at a different scale. Finally, the outputs of the multi-scale decoders were concatenated and a 1 × 1 convolution operation was carried out to obtain the segmented output of breast U/S images. The expansion path of the proposed convolution network is implemented by using a 3 × 3 decoder, a 5 × 5 decoder, and a 7 × 7 decoder. In each multi-scale decoder network, the up-convolution operation of the respective receptive field was carried out. The image features learned by the 3 × 3 convolution block of the encoder were transferred to the 3 × 3 decoder network by using skip connections; the learned features from the 5 × 5 encoder block were transferred to the 5 × 5 decoder network, and similarly, the learned image features from the 7 × 7 convolution block of the encoder were transferred to the 7 × 7 decoder network. Each decoder network contains four decoder blocks of the respective receptive field, and in each decoder block a transpose convolution operation, an attention mechanism, and a concatenation operation followed by the respective convolution operation are implemented. The BC U/S dataset is passed to the encoder of the MATD convolution model, and the segmented BC images are output from the 1 × 1 output layer. The input image size of the proposed multi-attention triple decoder convolution network is 128 × 128.

Single Multi-scale Encoder Network

This work introduced a MATD convolution network for BC segmentation from U/S images. For this purpose, a single multi-scale encoder network is designed for diverse spatial feature learning. Most previous BC segmentation methods utilize a fixed-size convolution mechanism that is unable to accurately segment tumors of different sizes; moreover, a fixed receptive field may miss spatial information that can be captured using multi-scale convolution operations. In this regard, we proposed a multi-scale single-encoder network for feature learning. The proposed network includes four encoder blocks, and in each block, a 3 × 3 convolution, a 5 × 5 convolution, and a 7 × 7 convolution are implemented to accurately segment diversely shaped breast tumors. Each multi-scale convolution block is implemented using a convolution operation of the respective kernel size, followed by a 30% dropout layer, batch normalization at the mini-batch level, ReLU activation, and a max pooling operation. The learned multi-scale spatial features are transferred to the decoder network using skip connections, and for each scale of the encoder convolution operation, a separate decoder of the respective scale is designed to enhance the BC segmentation performance.
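The sketch below outlines one such multi-scale encoder block in TensorFlow/Keras, following the order described above (convolution, 30% dropout, batch normalization, ReLU at each scale, then max pooling). The filter counts are assumptions, as they are not reported here.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_unit(x, filters, kernel_size):
    """One scale of the encoder: convolution followed by 30% dropout,
    batch normalization, and ReLU, as described in the text."""
    x = layers.Conv2D(filters, kernel_size, padding="same")(x)
    x = layers.Dropout(0.3)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def multiscale_encoder_block(x, filters):
    """Chained 3x3 -> 5x5 -> 7x7 convolutions; each scale's output is
    kept as a skip connection for the matching decoder. The filter
    count is an illustrative assumption."""
    s3 = conv_unit(x, filters, 3)
    s5 = conv_unit(s3, filters, 5)
    s7 = conv_unit(s5, filters, 7)
    pooled = layers.MaxPooling2D(2)(s7)
    return pooled, (s3, s5, s7)
```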

Multi-scale Triple Decoder Network

This work introduced a single multi-scale encoder and a multi-scale triple decoder network for BC segmentation from the U/S imaging dataset. The multi-scale triple decoder network processes the learned multi-scale features separately. For this purpose, three decoder networks with different receptive fields, a 3 × 3 decoder, a 5 × 5 decoder, and a 7 × 7 decoder, are developed. The input features learned in the encoder at scale 3 × 3 are transferred to the 3 × 3 decoder network using skip connections; likewise, the features learned at scales 5 × 5 and 7 × 7 are transferred to the 5 × 5 and 7 × 7 decoder networks, respectively. To obtain the final predicted segmentation mask, the outcomes of the multi-scale decoder networks are combined using a concatenation operation. Each multi-scale decoder network was developed using four decoder blocks; in each decoder block, a transpose convolution operation of the respective scale, a concatenation operation to merge the transpose convolution output with the learned image features transferred through skip connections, and a convolution operation of the respective scale were carried out. The experimental outcomes validated that the proposed single multi-scale encoder and multi-attention triple decoder network achieved the best segmentation results due to the separate processing of multi-scale learned features in the triple decoder network.
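A sketch of one decoder block at a single scale, and of the final fusion of the three decoders, is given below. It reuses conv_unit from the encoder sketch above; attention_gate is sketched in the next subsection, and the filter counts are assumptions.

```python
from tensorflow.keras import layers

def decoder_block(x, skip, filters, kernel_size):
    """One decoder block at a single scale (3, 5, or 7): transpose
    convolution to upsample, attention-gated concatenation with the
    encoder skip features, then a convolution at the same scale."""
    up = layers.Conv2DTranspose(filters, kernel_size,
                                strides=2, padding="same")(x)
    gated = attention_gate(up, skip, filters)  # highlight the tumor region
    merged = layers.Concatenate()([up, gated])
    return conv_unit(merged, filters, kernel_size)

def fuse_decoders(d3, d5, d7):
    """Concatenate the outputs of the three decoders and apply a 1 x 1
    convolution with sigmoid to predict the segmentation mask."""
    merged = layers.Concatenate()([d3, d5, d7])
    return layers.Conv2D(1, 1, activation="sigmoid")(merged)
```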

Multi-attention in Triple Decoder Network

In this work, a multi-attention triple decoder network is proposed in which three multi-scale decoder networks retain the spatial information of features learned at different scales. The proposed multi-attention triple decoder mechanism is presented in Fig. 2. The multi-scale spatial image features learned in the encoder blocks are passed to the triple decoder network through skip connections. To accurately segment BC from U/S images, the learned multi-scale feature maps are processed separately by three decoders. For multi-scale attention on the breast tumor region, an attention mechanism is introduced in each decoder block to achieve high segmentation performance. The inputs of the 3 × 3 decoder attention mechanism are the outcome of the transpose convolution operation of the decoder network and the spatial image features of the 3 × 3 encoder convolution block, transferred through skip connections. To highlight the tumor region, 1 × 1 convolution operations are performed on both inputs of the 3 × 3 decoder block. Next, an element-wise summation is applied, followed by ReLU activation, one more 1 × 1 convolution, and sigmoid activation. The output of this activation is multiplied with the 3 × 3 learned encoder features to suppress irrelevant information and attend more strongly to the tumor region. The attention mechanism is implemented analogously in the 5 × 5 and 7 × 7 decoder blocks to capture tumor-region spatial features at different scales.

Fig. 2
figure 2

Multi-attention mechanism of the proposed U-shaped multi-attention triple decoder convolution network for BC segmentation from U/S images

The multi-attention mechanism of the proposed segmentation model is implemented separately in each multi-scale decoder network, and the outputs of the multi-decoder networks are then concatenated to retain the multi-scale learned spatial feature information.
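A minimal sketch of this attention gate, following the description above and Fig. 2, is shown below; the intermediate channel count is an assumption.

```python
from tensorflow.keras import layers

def attention_gate(gate, skip, inter_filters):
    """Attention mechanism of one decoder block (Fig. 2): 1 x 1
    convolutions on both inputs, element-wise summation, ReLU, another
    1 x 1 convolution with sigmoid, and multiplication of the resulting
    map with the encoder skip features to suppress irrelevant regions."""
    g = layers.Conv2D(inter_filters, 1)(gate)  # decoder transpose output
    s = layers.Conv2D(inter_filters, 1)(skip)  # encoder skip features
    att = layers.ReLU()(layers.Add()([g, s]))
    att = layers.Conv2D(1, 1, activation="sigmoid")(att)  # attention map
    return layers.Multiply()([skip, att])
```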

Datasets

This work utilized two publicly available breast U/S datasets to evaluate the proposed MATD convolution network for BC segmentation. All results were computed using a 70:30 split for training and testing the multi-attention triple decoder convolution network (a split sketch is given after the dataset descriptions below). Details of each dataset are presented below.

  • BUSI dataset: The first dataset used to evaluate the proposed MATD convolution network for BC segmentation from U/S images is the BUSI dataset. The BUSI dataset is freely available for research purposes and is utilized extensively for BC segmentation and classification tasks. It was contributed by Dhabyani et al. [34], and its collection was carried out at Baheya Hospital. A total of 780 breast U/S images were collected and annotated by radiologists. The dataset is available with ground truth images and has an average image size of 500 × 500.

  • UDIAT dataset: The second dataset, the open-source UDIAT dataset, was utilized to test the proposed MATD convolution network for BC segmentation from U/S images. The UDIAT U/S dataset was contributed by Yap et al. [35] and is a collection of only 163 BC U/S images with ground truths. The data in this repository are available in PNG format with an average size of 500 × 500. The UDIAT breast U/S imaging dataset was prepared and annotated at the UDIAT Diagnostic Centre.
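The sketch below shows an assumed 70/30 split procedure; the directory layout, file naming, and fixed random seed are hypothetical, as they are not specified here.

```python
import glob
from sklearn.model_selection import train_test_split

# Hypothetical directory layout; images and ground-truth masks are
# assumed to be paired by sorted filename order.
images = sorted(glob.glob("BUSI/images/*.png"))
masks = sorted(glob.glob("BUSI/masks/*.png"))

# 70% training / 30% testing split, as used for all reported results.
train_imgs, test_imgs, train_masks, test_masks = train_test_split(
    images, masks, test_size=0.30, random_state=42)
```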

Evaluation Metrics

Performance evaluation of the proposed U-shaped multi-attention triple decoder convolution network for BC segmentation from the U/S imaging dataset was carried out using different metrics: dice similarity coefficient (DC), recall (Re), precision (Pr), Jaccard coefficient (JC), and accuracy (Ac). The formulation of each metric is given in Eqs. 7–11, where \({ground}_{truth}\) denotes the ground truth masks, \({predicted}_{seg}\) denotes the predicted masks, and \({True}_{Positive}\), \({True}_{Negative}\), \({False}_{Positive}\), and \({False}_{Negative}\) denote the true positives, true negatives, false positives, and false negatives, respectively.

$$Dice\;Coefficient\;({ground}_{truth},{predicted}_{seg})=\frac{2\left|{{ground}_{truth}}_{(i)} \cap {{predicted}_{seg}}_{(i)}\right|}{\left|{{ground}_{truth}}_{(i)}\right|+\left|{{predicted}_{seg}}_{(i)}\right|}$$
(7)
$$Jaccard({ground}_{truth},{predicted}_{seg})=\frac{\left|{{ground}_{truth}}_{(i)} \cap {{predicted}_{seg}}_{(i)}\right|}{\left|{{ground}_{truth}}_{(i)}\cup { {predicted}_{seg}}_{(i)}\right|}$$
(8)
$$Recall=\frac{{True}_{Positive}}{{True}_{Positive}+{False}_{Negative}}$$
(9)
$$Precision=\frac{{True}_{Positive}}{{True}_{Positive}+{False}_{Positive}}$$
(10)
$$Accuracy=\frac{{True}_{Positive}+{True}_{Negative}}{{True}_{Positive}+{True}_{Negative}+{False}_{Positive}+{False}_{Negative}}$$
(11)
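For reference, a NumPy sketch of the two overlap metrics (Eqs. 7 and 8) on binary masks follows; the small epsilon guarding empty masks is an implementation assumption.

```python
import numpy as np

def dice_coefficient(gt, pred, eps=1e-7):
    """Dice similarity coefficient (Eq. 7) for binary masks."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    inter = np.logical_and(gt, pred).sum()
    return 2.0 * inter / (gt.sum() + pred.sum() + eps)

def jaccard_coefficient(gt, pred, eps=1e-7):
    """Jaccard coefficient / IoU (Eq. 8) for binary masks."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    inter = np.logical_and(gt, pred).sum()
    union = np.logical_or(gt, pred).sum()
    return inter / (union + eps)
```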

Results

The experimental outcomes of the proposed U-shaped multi-attention triple decoder convolution network for BC segmentation from the U/S imaging dataset are presented in this section. Our segmentation methodology was implemented using the TensorFlow library in Python 3.6. The proposed MATD convolution network was run on a Dell Precision M4800 Core i7 workstation with 20 GB of RAM and a 2 GB NVIDIA graphics card. Two publicly available breast U/S image datasets were utilized for evaluation. The proposed BC segmentation method was trained from scratch on both U/S datasets for 90 epochs using an initial learning rate of 0.0001, the Adam optimizer, and a mini-batch size of 8.
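A minimal sketch of this training configuration in TensorFlow/Keras follows; the binary cross-entropy loss is an assumption, as the loss function is not stated in this section.

```python
import tensorflow as tf

def train(model, train_x, train_y, val_x, val_y):
    """Training configuration reported in the text: Adam optimizer with
    an initial learning rate of 1e-4, mini-batch size 8, 90 epochs.
    The binary cross-entropy loss is an assumption."""
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model.fit(train_x, train_y, batch_size=8, epochs=90,
                     validation_data=(val_x, val_y))
```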

Experiment 1 on UDIAT Dataset

In the first experiment, the BC segmentation results were computed by applying the proposed MATD convolution network to the UDIAT breast U/S imaging dataset. The segmentation results are tabulated in Table 1. In this experiment, the BC segmentation results of the U-shaped multi-attention triple decoder convolution network were computed under different configurations. A DC of 66.34% was recorded with a single decoder network, a DC of 84.48% with a double decoder network, a DC of 88.83% with a triple decoder network, and the highest DC of 90.45% with the proposed MATD convolution network. The multi-attention triple decoder network thus significantly improved the segmentation dice scores. The Jaccard index for BC segmentation using a single decoder was 50.65%, which improved to 73.85% with a double decoder network; the triple decoder network improved the Jaccard index to 80.84%, and the proposed multi-scale attention-based triple decoder network achieved the highest Jaccard index of 83.40%. This experiment validated that the proposed MATD convolution network performs better with the multi-attention triple decoder implementation on the UDIAT BC U/S dataset.

Table 1 Segmentation outcomes using MATD convolution network for BC segmentation from U/S images on the UDIAT dataset

The visual segmentation performance of the proposed U-shaped multi-attention triple decoder convolution network on the UDIAT dataset is presented in Fig. 3. To show image-level BC segmentation from U/S images using the proposed MATD convolution network, these results were computed on 24 breast U/S images chosen randomly from the testing set of the UDIAT repository. The results are presented as the predicted BC segmentations of the MATD convolution network alongside the respective ground truth images. For a better understanding of the visual outcomes, the DC score of each tested image is also shown.

Fig. 3
figure 3

Visual segmentation results of the MATD convolution network for BC segmentation from U/S images on the UDIAT dataset

Experiment 2 on BUSI Dataset

In the second experiment, the BC segmentation results were computed by applying the proposed MATD convolution network to the BUSI breast U/S imaging dataset. The segmentation outcomes on the BUSI repository are given in Table 2. In this experiment, the segmentation outcomes of the proposed U-shaped multi-attention triple decoder convolution network were again computed under different configurations. A DC of 64.88% was recorded with a single decoder network, a DC of 83.01% with a double decoder network, a DC of 86.12% with a triple decoder network, and the highest DC of 89.13% with the proposed MATD convolution network. The multi-attention triple decoder network significantly improved the segmentation dice scores on the BUSI dataset. The Jaccard index for BC segmentation using a single decoder was 49.58%, which improved to 71.55% with a double decoder network; the triple decoder network improved the Jaccard index to 79.01%, and the proposed multi-scale attention-based triple decoder network achieved the highest Jaccard index of 82.31% on the BUSI dataset. This experiment validated that the proposed MATD convolution network performs better with the multi-attention triple decoder implementation on the BUSI BC U/S dataset.

Table 2 Segmentation outcomes of the proposed U-shaped multi-attention triple decoder convolution network for BC segmentation from U/S images using the BUSI dataset

The visual image-level localized breast tumor segmentation results of the proposed U-shaped multi-attention triple decoder convolution network on the BUSI dataset are given in Fig. 4. These results were computed on 24 breast U/S images chosen randomly from the testing set of the BUSI repository. The results are presented as the predicted BC segmentations of the MATD convolution network alongside the respective ground truth images. For a better understanding of the visual outcomes, the DC score of each tested image is also shown.

Fig. 4
figure 4

Segmentation visual results of the proposed U-shaped multi-attention triple decoder convolution network for BC segmentation from U/S images using the BUSI dataset

Comparison with Existing Methods Using BUSI Dataset

The comparison of BC segmentation results obtained with the U-shaped multi-attention triple decoder convolution network against existing techniques on the BUSI repository is given in Table 3. For a fair comparison, different well-known image segmentation methods were selected. The comparison was carried out by implementing five existing methods on the BUSI dataset: U-Net by Ronneberger et al. [36], U-Net++ by Zhou et al. [37], DeepLabv3+ by Chen et al. [38], PSP-Net by Zhao et al. [39], and MSU-Net by Su et al. [40]. To further highlight the contributions of this work, three state-of-the-art methods [41,42,43] are used to compare the performance of our model against their reported results on the BUSI dataset. This comparison shows that the proposed MATD convolution network achieved the best DC of 89.13% on the BUSI repository. Bar graphs for the proposed MATD convolution network are presented in Fig. 5 for a better understanding of the comparison. The visual predictions, in terms of localized tumors on output images, of the proposed U-shaped multi-attention triple decoder convolution network, U-Net, U-Net++, PSP-Net, MSU-Net, and DeepLabv3+ are given in Fig. 6; this comparison was conducted using four random U/S images from the BUSI repository and showed that the proposed MATD convolution network achieved the highest DC score. From the comparison with existing methods, it can be concluded that the proposed multi-scale encoder, not introduced in earlier studies, played a vital role in extracting diverse features, and the newly introduced multi-scale decoder network with the multi-attention mechanism significantly improved the segmentation performance. The importance of the different components of the proposed multi-decoder network on the BUSI dataset is shown in the ablation study in Table 2; the addition of each decoder in the MATD network significantly improved the segmentation performance, which demonstrates the effectiveness of the proposed model.

Table 3 Comparison of the proposed U-shaped multi-attention triple decoder convolution network for BC segmentation from U/S images with existing methods on the BUSI repository
Fig. 5
figure 5

Bar graph comparison of the proposed MATD convolution network for BC segmentation from U/S images using the BUSI dataset

Fig. 6
figure 6

Visual segmentation including dice scores comparison of the U-shaped multi-attention triple decoder convolution network for BC segmentation from U/S images with existing methods on the BUSI repository

Comparison with Existing Methods Using UDIAT Dataset

The comparison of BC segmentation results obtained with the proposed U-shaped multi-attention triple decoder convolution network against existing techniques on the UDIAT repository is tabulated in Table 4. For a fair comparison, different well-known image segmentation methods were selected. The comparison was carried out by implementing five existing methods on the UDIAT dataset: U-Net by Ronneberger et al. [36], U-Net++ by Zhou et al. [37], DeepLabv3+ by Chen et al. [38], PSP-Net by Zhao et al. [39], and MSU-Net by Su et al. [40]. To further highlight the contributions of this work, four state-of-the-art methods [41,42,43, 46] are used to compare the performance of our model against their reported results on the UDIAT dataset. This comparison shows that the proposed MATD convolution network achieved the best DC of 90.45% on the UDIAT dataset. Line graphs for the proposed MATD convolution network are presented in Fig. 7 for a better understanding of the comparison. The visual predictions, in terms of localized tumors, of the proposed U-shaped multi-attention triple decoder convolution network and the five implemented methods, U-Net, U-Net++, PSP-Net, MSU-Net, and DeepLabv3+, are presented in Fig. 8; this comparison was conducted using four random U/S images from the UDIAT repository and showed that the proposed MATD convolution network achieved the highest DC score. From the comparison with existing methods, it can be concluded that the proposed multi-scale encoder, not introduced in earlier studies, played a vital role in extracting diverse features, and the newly introduced multi-scale decoder network with the multi-attention mechanism significantly improved the segmentation performance. The importance of the different components of the proposed multi-decoder network on the UDIAT dataset is shown in the ablation study in Table 1; the addition of each decoder in the MATD network significantly improved the segmentation performance, which demonstrates the effectiveness of the proposed model.

Table 4 Comparison of the proposed U-shaped multi-attention triple decoder convolution network for BC segmentation from U/S images with existing methods using the UDIAT repository
Fig. 7
figure 7

Line graph comparison of the proposed MATD convolution network for BC segmentation from U/S images using the UDIAT dataset

Fig. 8
figure 8

Comparison of the proposed U-shaped multi-attention triple decoder convolution network for BC segmentation from U/S imaging with existing techniques including dice scores using the UDIAT repository

Discussion

BC segmentation from U/S images is being automated through deep learning-based methods, and many autoencoder-based U-net methods have recently been presented to automate the task. Most existing methods apply fixed receptive field-based encoder networks with single decoder networks, which are unable to capture multi-scale features with large spatial information, and most recent BC segmentation methods utilize only a single attention mechanism to enhance the segmentation performance. This work proposed a MATD convolution network for BC segmentation from U/S images. A multi-scale convolution block with different receptive fields is implemented in the contracting path of the MATD model to capture diverse high-level spatial image features at different scales. The captured features are then transferred to the multi-scale triple decoder networks, each of which utilizes an attention mechanism to highlight the tumor region at its scale. The results show that the multi-attention triple decoder network significantly improves the segmentation outcomes: the proposed MATD convolution network produced the highest DCs of 90.45% and 89.13% on the UDIAT and BUSI datasets, respectively. A real-life application of the U-shaped multi-attention triple decoder convolution network is its deployment in hospitals for BC segmentation from U/S images. The computational efficiency of the proposed model compared with competing methods is presented in Table 5; the comparison is based on the number of training parameters, inference time, trained model size, and mean frames per second, and it shows that the MATD method is well suited for BC segmentation at low computational cost.

Table 5 Computation efficiency comparison of the proposed MATD model with existing methods

Conclusion

This work proposed a MATD convolution network for BC segmentation from U/S images. The proposed segmentation method introduced a multi-scale convolution method in the encoder path of the model. The multi-scale learned spatial image features are transferred to a triple decoder network for predicted segmentation mask regeneration; the triple decoder network was implemented at different scales to handle each scale of spatial image features learned in the encoder path, and a multi-attention mechanism was introduced in each decoder block to highlight the tumor region. The proposed MATD convolution network is composed of four encoder blocks and, in each decoder network, four decoder blocks, with skip connections used to transfer the learned multi-scale spatial image features from the contraction path. The segmentation results showed that the proposed multi-attention triple decoder network produced the highest segmentation DC. Two publicly available BC U/S image datasets, BUSI and UDIAT, were used to test the performance of the proposed MATD convolution network; it produced a DC score of 90.45% on the UDIAT U/S image dataset and 89.13% on the BUSI U/S dataset. In the future, this work will be further enhanced by implementing a multi-encoder framework for BC segmentation using more U/S imaging datasets.