Rethinking U-Net Deep Neural Network for Spine Radiographic Images-Based Spine Vertebrae Segmentation

Tavana, Parisa; Akraminia, Mahdi; Koochari, Abbas; Bagherifard, Abolfazl

doi:10.1007/s40846-023-00828-6

Original Article
Published: 13 October 2023

Volume 43, pages 574–584, (2023)

Journal of Medical and Biological Engineering

Abstract

Objective

Today, lifestyle changes cause spine abnormalities. Radiographic images are used to diagnose scoliosis, a common spine abnormality. Existing machine-learning algorithms are essential to assist doctors in diagnosis, treatment planning, and interventional guidance. As a subset of machine learning algorithms, deep neural networks can automatically extract features from images to segment and classify medical problems. To enhance the interpretability of radiographic images, this study used a deep, improved U-Net neural network to automatically segment spinal vertebrae.

Method

By modifying the architecture and loss function, the improved U-Net network could be more accurate. To enhance the interpretability of radiographic images and expedite evaluation, this study focuses on the automated segmentation of spinal vertebrae using the improved U-Net neural network. In this machine-learning algorithm, for every deviation between the target label and the network output, the penalty of the loss function is taken into account during the back-propagation of the partial derivative of the loss function.

Results

The new architecture of the improved U-Net network, combined with a loss function that joins the weighted cross-entropy loss and the dice loss to handle imbalanced and overlapping data, increases accuracy by 4% compared to the conventional U-Net network with a binary cross-entropy loss function for the segmentation of the vertebrae of the spinal column. This supports a better and more accurate interpretation of the spine by doctors.

Conclusion

This improves the interpretation of the vertebral column for medical professionals and has a positive impact on orthopedists’ assessments of patients with spinal deformities.

1 Introduction

A common form of scoliosis affecting adolescents is adolescent idiopathic scoliosis (AIS). Scoliosis is a medical condition characterized by lateral curvature of the spine. Spine curves are three-dimensional and can be described as “S” and “C” shapes. In some patients the condition is not stable, and the curvature of the spine increases over time. Mild scoliosis usually causes no problems, but severe cases can cause heart and lung-related problems. An accurate quantitative assessment of spinal curvature is essential for the clinical evaluation and treatment planning of AIS [1].

Radiographic images of the anterior-posterior spine are a standard method to analyze AIS. There is an urgent need for advanced computerized methods that support physicians in diagnosing, planning therapy, and guiding interventional procedures in light of the growing volume of imaging examinations and the complexity of their assessment. To improve the accuracy and speed of treatment plans, machine learning algorithms should be used to improve the interpretability of the results. Machine learning algorithms are used to segment radiographic images for faster diagnosis, better interpretability, and noise removal. The process of segmenting an image involves dividing it into regions with similar characteristics. Machine learning algorithms such as deep convolutional neural networks (CNNs) are important for the analysis of medical images due to their automatic feature extraction capabilities. U-Net is one of the deep CNNs which focuses on segmenting small data sets without requiring complex hardware due to its special architecture. A combined loss function and modified architecture have been presented in this article based on the advantages of the U-Net network to improve the conventional U-Net network in the extraction of spine vertebrae in terms of edge detection, overlapping of spine vertebrae, and segmentation accuracy.

The remainder of the article is organized as follows: Sect. 2 reviews the literature, Sect. 3 describes the general framework of the article and the proposed method, Sect. 4 describes the dataset and the results of the study, and Sect. 5 presents the discussion and conclusion.

2 Literature Review

Recent advances in deep learning have demonstrated the advantages of automatic spinal curvature assessment. Tavana et al. studied classical machine learning methods and pre-trained deep neural networks to classify the type of spinal curvature [2]. An ensemble voting approach was presented by Tavana et al. to improve the classification of spinal curvature types [3]. Wu et al. [4, 5], Sun et al. [6], and Galbusera et al. [7] extracted vertebral landmarks and Cobb angles directly from spine images based on one-stage bottom-up approaches [8]. Zhong et al. used deep learning to automatically detect the Cobb angle from X-ray CT images [1]. Wang et al. first designed a segmentation network that accurately segments the two spine boundaries; the resulting score map is then fed, together with the original X-ray image of the spine, into a second angle estimation network so that the Cobb angle can be predicted with high precision using regression [9]. A multi-view extrapolation network was used by Wang et al. [10] to predict the Cobb angle directly. Nicolaes et al. [11, 12] used 3D convolutional networks to diagnose vertebral fractures from CT scans. Lin et al. used radiographic images to segment the spine and estimate its curvature with a deep neural network called Seg4Reg [13]. Khanal et al. used two deep neural networks (Faster R-CNN and DenseNet) to identify vertebra and landmark locations and then fitted the landmark curves to estimate the Cobb angle [14].

Due to the necessity of segmenting spinal vertebrae for better interpretation of radiographic images and for helping doctors plan more effective and accurate treatment, this study aims to discuss the improved architecture of the U-Net network as well as the combination loss function for more accurate spinal vertebrae segmentation.

3 General Framework

Segmentation is an intermediate step in image analysis [15], which involves segmenting an image into different parts with a strong correlation in the region of interest (ROI) within the image [9, 16]. In medical image segmentation [17], the aim is to represent a given input image in a meaningful manner that allows for the study of anatomy, the identification of the region of interest (ROI), and the development of treatment plans. Medical image segmentation assists in the analysis of medical images by highlighting the region of interest within the image [18].

Due to the method of preparing medical images, the type of pathology, and various biological changes, segmenting medical images is a challenging task [15]. Analyzing medical images requires a medical imaging specialist, but such specialists are few [19]. In recent years, deep learning networks have contributed to newer image segmentation models with improved performance, and deep neural networks are highly accurate in segmenting popular datasets [18]. Whereas image classification assigns a single label to the whole image, image segmentation assigns a class label to every pixel within the image. As a result, segmentation is prone to class imbalance problems. Classes with a large number of pixels can achieve high accuracy while classes with a small number of pixels are less accurate [20]. For the segmentation of medical images, U-Net [21] is a well-known convolutional neural network. It combines a symmetric contracting path and an expansive path, which results in a more efficient and faster segmentation process. Moreover, it has been widely used for various special problems and has proven to be very effective in segmenting data [22]. In this paper, an improved architecture of the U-Net network and a combination of the dice loss function and the weighted cross-entropy loss function are presented to improve the performance of the conventional U-Net network. Figure 1 shows the general framework of this paper.

3.1 The Proposed Modified U-Net Network

In this paper, an improved U-Net is presented that focuses on segmenting spinal vertebrae. In this network, the architecture has been modified and expanded so that it requires very few training images and provides more accurate segmentation. Initially, the radiographic images were entered into the network as input and then in the end, the segmentation of the vertebrae of the spine was extracted from the network. In the right corner, the most important details about the U-Net network are indicated. As with the original U-Net, this network consists of a contracting path (left) and an expansive path (right), as well as a bottleneck [21].

However, batch normalization after each convolutional layer, as well as a 0.2 dropout layer after each convolutional block on the contracting path and before each convolutional block on the expansive path were utilized in this study. In the batch normalization process, the outputs of the convolutional layers are normalized to have a mean of zero and a standard deviation of one, and the dropout layer is used to deactivate some neurons of the hidden layer to prevent the network from overfitting. The other modification has been made to the number of filters in each convolutional block to make it more efficient. In the first convolutional block, there are 32 filters, which will double in the following four blocks and increase to 512 at the end. Figure 2 illustrates the modified network architecture in detail.
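The normalization step described above can be illustrated with a minimal sketch. It only standardizes a list of activations to zero mean and unit standard deviation; the learnable scale and shift parameters and the running statistics that framework implementations of batch normalization also maintain are deliberately left out. The function name `batch_normalize` and the stabilizing constant `eps` are assumptions made for illustration.

```python
import math


def batch_normalize(values, eps=1e-5):
    """Standardize activations to zero mean and (almost exactly) unit variance.

    Illustrative only: no learnable scale/shift, no running averages.
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # eps keeps the division stable when the variance is close to zero.
    return [(v - mean) / math.sqrt(variance + eps) for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_normalize(activations)
print(normalized)
```

The normalized values keep their ordering and relative spacing; only their location and scale change.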

3.1.1 Down-Sampling or Contracting Path

This path is composed of five blocks. Each block of the down-sampling path consists of:

• 2 × (Convolution Layer (3 × 3) with ReLU activation function and batch normalization).

• Max Pooling (2 × 2) and Drop out layer (0.2).

A contracting path is used to capture the semantics or context of the input image for its segmentation. By using convolutional and pooling layers, it extracts features that describe what is in an image.

There is a bottleneck between the expanding and contracting paths of the network. The bottleneck consists of two convolutional layers with batch normalization.

3.1.2 Expanding or Up-Sampling Path

The expanding path, also known as the decoder, consists of five blocks. In the up-sampling path, the following elements are present:

• Deconvolution layer with stride 2 and drop out layer (0.2).

• Concatenation with the corresponding copied feature map from the contracting path.

• 2 × (Convolution Layer (3 × 3) with ReLU activation function and batch normalization).

• Convolution layer (1 × 1) at the end of the expansive path.

This expanding path restores the feature map size and adds spatial information for the segmentation image by using up-convolution layers. Through skip connections, coarse contextual information from the contracting path is transferred to the up-sampling path.

It is possible to maintain the dimensions of the image by adding zero padding layers. The zero padding method adds zero rows and columns to the input matrix to control the size of the output feature map [23].
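To make the block structure concrete, the following sketch traces feature-map sizes and filter counts through a U-Net of this kind. It is not the authors' implementation. It assumes a hypothetical 256 × 256 input, zero ("same") padding so that convolutions keep the spatial size, and that the fifth filter count (512) belongs to the bottleneck, which implies four pooling steps. The text leaves the exact split between path blocks and bottleneck open, so these choices are illustrative only.

```python
def unet_shape_trace(input_size=256, base_filters=32, levels=5):
    """Trace (stage, spatial size, filters) through a U-Net with zero padding.

    input_size and the placement of the 512-filter block in the bottleneck
    are assumptions for illustration, not values taken from the paper.
    """
    # Filter counts double from block to block: 32, 64, 128, 256, 512.
    filters = [base_filters * 2 ** i for i in range(levels)]
    trace = []
    size = input_size
    # Contracting path: convolutions keep the size, 2x2 max pooling halves it.
    for f in filters[:-1]:
        trace.append(("down", size, f))
        size //= 2
    trace.append(("bottleneck", size, filters[-1]))
    # Expanding path: a stride-2 deconvolution doubles the size, after which
    # the equally sized encoder feature map is concatenated (skip connection).
    for f in reversed(filters[:-1]):
        size *= 2
        trace.append(("up", size, f))
    return filters, trace


filters, trace = unet_shape_trace()
for stage, size, f in trace:
    print(f"{stage:<10} {size:>3} x {size:<3} x {f}")
```

Because every pooling step is mirrored by one stride-2 deconvolution, the output of the last decoder block has the same spatial size as the input, which is what allows a final 1 × 1 convolution to produce a per-pixel mask.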

3.2 Loss Functions

In neural network training, the cost function plays a crucial role in adjusting the weights of a neural network to create a better-fitting machine learning model. In feedforward propagation, the neural network is run on training set data, and outputs are generated in the case of classification, indicating the probability or confidence in possible labels. By comparing these probabilities to the target labels, the loss function calculates a penalty for any deviation between the target label and the neural network’s output. The partial derivative of the loss function is calculated for each trainable weight during backpropagation. These partial derivatives are used to adjust the weights. Under normal conditions, backpropagation iteratively adjusts the trainable weights of a neural network to produce a model with a lower loss [24].

For segmentation, weighted cross-entropy loss is a loss function that classifies each pixel in an image, with an additional weight that adjusts the importance of positive labels. The dice and intersection over union (IOU) losses, in contrast, are calculated as a ratio between the prediction result and the ground truth, which measures the overlap between the two. That is, they evaluate the prediction over the entire image. Different loss functions can therefore be applied to the U-Net network to assess segmentation results from different perspectives, and combining loss functions is expected to improve the output of the modified U-Net network shown in Fig. 2. For this reason, the modified U-Net is trained with each of these loss functions so that their outputs can be compared.

The model is built on the U-Net network and is capable of segmenting the spine. As part of the study, the following loss functions have been explored for the improved U-Net network: binary cross-entropy, weighted binary cross-entropy (WCE), the dice loss function, the IOU loss function, and finally the combination of weighted binary cross-entropy and dice loss.

3.2.1 Binary Cross Entropy

Generally, cross-entropy [25] refers to the difference between two probability distributions for a given random variable or set of events. It is widely used for classification and, since segmentation is pixel-level classification, it applies to segmentation as well [26].

$$L_{BCE}(y,\widehat{y})=-\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}\log(\widehat{y}_{i})+(1-y_{i})\log(1-\widehat{y}_{i})\right)$$

(1)

The sum runs over the $n$ pixels of the image. For each pixel, $i$ denotes the position of the pixel, $y_{i}$ is the ground-truth value, and $\widehat{y}_{i}$ is the predicted value of the pixel [27].
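As a worked illustration of Eq. 1, the sketch below evaluates the binary cross-entropy on a flattened list of pixels. The clipping constant `eps` is an added assumption that keeps the logarithm finite when a prediction is exactly 0 or 1; it is not part of the equation.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over n pixels, following Eq. 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip predictions away from 0 and 1 so that log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# An uninformative prediction of 0.5 for every pixel costs log(2) per pixel.
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
```

Confident correct predictions drive the loss toward zero, while confident wrong predictions are penalized heavily because of the logarithm.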

3.2.2 Weighted Binary Cross-Entropy

A variant of binary cross-entropy is weighted binary cross-entropy (WCE) [28], in which positive examples are weighted by a coefficient. It is commonly used in cases of skewed data [24]. Weighted cross-entropy is defined as follows:

$$L_{WBCE}(y,\widehat{y})=-\frac{1}{n}\sum_{i=1}^{n}\left(\beta y_{i}\log(\widehat{y}_{i})+(1-\beta)\left[(1-y_{i})\log(1-\widehat{y}_{i})\right]\right)$$

(2)

The trade-off between false negatives and false positives can be tuned with $\beta$: setting $\beta > 1$ reduces the number of false negatives, while setting $\beta < 1$ reduces the number of false positives [26].
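The following sketch evaluates Eq. 2 exactly as written, with the positive term weighted by $\beta$ and the negative term by $(1-\beta)$. One consequence of this particular form is that $\beta$ has to stay between 0 and 1 for both terms to remain penalties; values above 1 only make sense in the variant without the $(1-\beta)$ factor. The function name and the clipping constant `eps` are illustrative assumptions.

```python
import math


def weighted_bce(y_true, y_pred, beta, eps=1e-7):
    """Weighted binary cross-entropy following Eq. 2 term by term."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip predictions away from 0 and 1 so that log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        positive_term = beta * y * math.log(p)
        negative_term = (1.0 - beta) * (1.0 - y) * math.log(1.0 - p)
        total += positive_term + negative_term
    return -total / len(y_true)


# With beta = 0.75, a missed vertebra pixel (false negative) costs three
# times as much as an equally confident false positive.
missed_vertebra = weighted_bce([1], [0.1], beta=0.75)
false_alarm = weighted_bce([0], [0.9], beta=0.75)
print(missed_vertebra, false_alarm)
```

With $\beta = 0.5$ both terms receive the same weight, and the loss reduces to half of the unweighted binary cross-entropy of Eq. 1.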

3.2.3 Intersection-Over Union (IOU)

According to [29], the IOU loss can solve the imbalance between the two classes (foreground and background) in the segmentation problem. Its function ${L}_{IOU}$ is defined by:

$$L_{IOU}=1-\frac{\sum_{i=1}^{N}\sum_{j=1}^{C}y_{i,j}\widehat{y}_{i,j}+\epsilon}{\sum_{i=1}^{N}\sum_{j=1}^{C}\left(y_{i,j}+\widehat{y}_{i,j}-y_{i,j}\widehat{y}_{i,j}\right)+\epsilon}$$

(3)

In this formula, $N$ is the number of pixels, $C$ is the number of classes, and $\epsilon$ is a smoothing constant that prevents the denominator from being zero.

3.2.4 Dice Loss

The dice loss is proposed in reference [30] to solve the medical image segmentation problem where the foreground occupies only a small region of the background. It is defined by:

$$L_{Dice}=1-\frac{2\sum_{i=1}^{N}\sum_{j=1}^{C}y_{i,j}\widehat{y}_{i,j}+\epsilon}{\sum_{i=1}^{N}\sum_{j=1}^{C}\left(y_{i,j}+\widehat{y}_{i,j}\right)+\epsilon}$$

(4)
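Equations 3 and 4 can be compared side by side with a small sketch. For brevity it treats the double sum over pixels and classes as a single sum over a flattened list of values, which is equivalent for the binary vertebra-versus-background case. The function names and the default value of the smoothing constant are assumptions made for illustration.

```python
def iou_loss(y_true, y_pred, smooth=1e-6):
    """Soft IOU loss following Eq. 3 on flattened values."""
    intersection = sum(y * p for y, p in zip(y_true, y_pred))
    union = sum(y + p - y * p for y, p in zip(y_true, y_pred))
    return 1.0 - (intersection + smooth) / (union + smooth)


def dice_loss(y_true, y_pred, smooth=1e-6):
    """Soft dice loss following Eq. 4 on flattened values."""
    intersection = sum(y * p for y, p in zip(y_true, y_pred))
    total = sum(y + p for y, p in zip(y_true, y_pred))
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)


# Two vertebra pixels in the ground truth, only one of them detected.
truth = [1, 1, 0, 0]
prediction = [1, 0, 0, 0]
print(iou_loss(truth, prediction), dice_loss(truth, prediction))
```

For the same prediction the dice loss is never larger than the IOU loss, because the dice ratio counts the intersection twice.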

3.2.5 Combo Loss

Some studies have combined a distribution-based loss with a region-based loss for the segmentation of small diffuse structures, because dice loss alone is unsuitable for them. Combo loss is a weighted sum of a dice term and a WCE term that benefits from both [31, 32], and it is defined as follows:

$$L_{WDSC}=\alpha L_{WBCE}-(1-\alpha)\,DSC$$

(5)

$$L_{WDSC}=\alpha\left(-\frac{1}{n}\sum_{i=1}^{n}\left(\beta y_{i}\log(\widehat{y}_{i})+(1-\beta)\left[(1-y_{i})\log(1-\widehat{y}_{i})\right]\right)\right)-(1-\alpha)\frac{2\sum_{i=1}^{N}\sum_{j=1}^{C}y_{i,j}\widehat{y}_{i,j}+\epsilon}{\sum_{i=1}^{N}\sum_{j=1}^{C}\left(y_{i,j}+\widehat{y}_{i,j}\right)+\epsilon}$$

(6)

In Eq. 5, $L_{WBCE}$ is the weighted cross-entropy loss of Eq. 2, and $DSC=1-L_{Dice}$ is the dice similarity coefficient that underlies the dice loss of Eq. 4. Minimizing $L_{WDSC}$ therefore lowers the cross-entropy term while raising the overlap term. The hyper-parameter $\alpha$ controls the relative contribution of the weighted cross-entropy and dice terms.
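A minimal sketch of the combined loss, following Eq. 6 term by term: $\alpha$ times the weighted cross-entropy of Eq. 2, minus $(1-\alpha)$ times the dice overlap ratio. The defaults $\alpha = 0.5$ and $\beta = 0.75$ are the values reported in Sect. 4.4; the helper names, the clipping constant and the smoothing constant are assumptions, and the flattened single sum again stands in for the sum over pixels and classes.

```python
import math


def weighted_bce(y_true, y_pred, beta, eps=1e-7):
    """Weighted binary cross-entropy (Eq. 2)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += beta * y * math.log(p)
        total += (1.0 - beta) * (1.0 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def dice_coefficient(y_true, y_pred, smooth=1e-6):
    """Dice overlap ratio, i.e. the fraction that appears in Eq. 4 and Eq. 6."""
    intersection = sum(y * p for y, p in zip(y_true, y_pred))
    return (2.0 * intersection + smooth) / (sum(y_true) + sum(y_pred) + smooth)


def combo_loss(y_true, y_pred, alpha=0.5, beta=0.75):
    """Combo loss as written in Eq. 6: alpha * WCE - (1 - alpha) * dice ratio."""
    return (alpha * weighted_bce(y_true, y_pred, beta)
            - (1.0 - alpha) * dice_coefficient(y_true, y_pred))


truth = [1, 0, 0, 0]
print(combo_loss(truth, [1, 0, 0, 0]))  # perfect mask: close to -(1 - alpha)
print(combo_loss(truth, [0, 1, 0, 0]))  # wrong pixel: much larger loss
```

Because the dice ratio enters with a negative sign, the minimum of this loss is $-(1-\alpha)$ rather than zero; only the ordering of loss values matters for training.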

Spinal vertebra segmentation is also a pixel classification problem, and the cross-entropy loss term evaluates each pixel individually. Vertebrae usually occupy a small area in anterior-posterior radiographic images, so a segmentation network trained with a plain cross-entropy loss function is biased toward the background rather than the vertebrae. Like dice loss, combo loss is capable of handling input class imbalance, such as segmenting vertebrae from a background. Furthermore, the weighted cross-entropy term can penalize the network for false positives and false negatives, forcing it to learn better parameters. Experimental results show that the combo loss function is more robust than the weighted cross-entropy loss function and the dice loss function [33].

4 Experimental Results

This section presents the experiments and evaluation techniques used to test the performance of the proposed model. The open-source Keras deep learning framework was used with TensorFlow as the backend. All the experiments were conducted on an HP Pavilion Power laptop with an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80 GHz processor. The remaining hardware specifications of the laptop used for the experiments can be seen in Table 1.

Table 1 Hardware specifications of the computer used for training

4.1 Experimental Dataset

The spine data set used for segmentation is available at http://spineweb.digitalimaginggroup.ca/ [5]. It contains 609 anterior-posterior radiographic images. The four corners of each vertebra were annotated by two health professionals at the London Health Sciences Center. Each radiographic image contains 17 vertebrae, each defined by its four corners. The data set is divided into a training section of 481 images and a testing section of 128 images (Fig. 3).

4.2 Pre-Processing and Data Augmentation

Each image must be pre-processed to match the requirements of the deep neural network. Resizing and normalizing are the two crucial steps in this process. Because the architecture is fixed, the neural network requires radiographic images of a fixed size as input. For the modified U-Net network, each image is therefore resized and normalized to the input format the network expects.
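The two pre-processing steps can be sketched without any imaging library. The example below uses nearest-neighbour resizing and scaling of 8-bit intensities to the range [0, 1]. Both choices are assumptions for illustration, since the text does not state the interpolation method, the target size, or the normalization range used in the study.

```python
def resize_nearest(image, new_height, new_width):
    """Nearest-neighbour resize of a 2D list of pixel intensities."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[r * old_height // new_height][c * old_width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]


def normalize(image, max_value=255.0):
    """Scale intensities to [0, 1], assuming 8-bit input (0 to 255)."""
    return [[pixel / max_value for pixel in row] for row in image]


tiny_image = [[0, 255],
              [255, 0]]
prepared = normalize(resize_nearest(tiny_image, 4, 4))
for row in prepared:
    print(row)
```

In practice the same target size has to be applied to the ground-truth masks, so that every input pixel still lines up with its label.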

The training of a neural network requires a large amount of data. With a small and limited dataset, the parameters are poorly estimated and the trained network does not generalize well. This problem can be partially addressed by augmenting the existing data. The dataset includes 481 training images and 128 testing images. Data augmentation settings for the images are shown in Table 2. In total, 102400 images were obtained.

Table 2 Data augmentation techniques used in the proposed method

4.3 Label Setting

Since the learning process is supervised, image segmentation requires a label or ground truth. In this dataset, two specialized physicians determined the four corners of each of the 17 spine vertebrae (white points in Fig. 4). The label of each vertebra is derived from its four corners. For the modified U-Net network, all information other than the vertebrae is removed, leaving only the position of the vertebrae visible in the ground truth image (the area of each vertebra is marked in white, while everything outside is marked in black).
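One way to turn four annotated corners into a binary ground-truth mask is to fill the quadrilateral they enclose. The sketch below is not the authors' labelling code. It assumes each vertebra is a convex quadrilateral whose corners are listed in order around its perimeter as `(x, y)` pixel coordinates, and it uses a same-side (cross product) test for each pixel; the function names are hypothetical.

```python
def point_in_convex_quad(x, y, corners):
    """True if (x, y) lies inside or on the border of a convex polygon.

    corners must be ordered around the perimeter (an assumption about how
    the annotations are stored).
    """
    sign = 0
    for i in range(len(corners)):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % len(corners)]
        cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
        if cross != 0:
            current = 1 if cross > 0 else -1
            if sign == 0:
                sign = current
            elif current != sign:
                return False  # the point is on different sides of two edges
    return True


def corners_to_mask(height, width, vertebrae):
    """Build a mask with 1 (white) inside each vertebra and 0 (black) outside."""
    mask = [[0] * width for _ in range(height)]
    for corners in vertebrae:
        for row in range(height):
            for col in range(width):
                if point_in_convex_quad(col, row, corners):
                    mask[row][col] = 1
    return mask


mask = corners_to_mask(6, 6, [[(1, 1), (4, 1), (4, 3), (1, 3)]])
for row in mask:
    print(row)
```

A full-resolution radiograph would normally be rasterized with an imaging library's polygon fill instead of this per-pixel loop, which is kept here only for clarity.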

4.4 Hyper-Parameter Tuning

This network architecture is based on the original U-Net architecture. In this research, additional batch normalization and drop-out layers have been added to the network architecture, and the number of filters in each convolutional block has been changed. Therefore, the network must be trained from scratch, using input images and segmentation masks. Several experiments were conducted during the training process by tuning the hyper-parameters of the network.

As with other hyper-parameters of deep learning models, such as the learning rate and the number of filters, the optimal values were determined by grid search to optimize results on the validation set (e.g., one round of cross-validation). The best result was obtained with equal contributions (i.e., $\alpha = 0.5$) of the dice and weighted cross-entropy terms. False positives need to be penalized more for the model to better detect vertebral boundaries and intervertebral distances (i.e., $\beta = 0.75$). In addition, the batch size is 64, the number of epochs is 50, and the drop-out rate is 0.2; Adam optimization with a learning rate of 0.001 was used. Table 3 presents all the details of the proposed network architecture.

Table 3 The tuning of hyper-parameters for the proposed network architecture

4.5 Performance Metrics

An output image is generated, separating vertebral areas from the background for each input image. The network output is also compared with a ground truth image.

The first criterion for evaluating the results’ accuracy is the total accuracy criterion. This indicates how well the network produced the output image according to the diagnosis of a radiologist:

$$Accuracy=\frac{TP+TN}{N}$$

(7)

In Eq. 7, TP is the number of correctly detected vertebra pixels (i.e., pixels that belong to a vertebra both in the neural network’s output and in the ground truth image). TN represents the number of correctly detected non-vertebral pixels, and N indicates the total number of pixels in the input image.

Since an image contains many more non-vertebral pixels than vertebral pixels, accuracy alone can be misleading in this case. Precision, recall, and dice similarity coefficient are therefore used as well.

$$Precision=\frac{TP}{TP+FP}$$

(8)

$$Recall=\frac{TP}{TP+FN}$$

(9)

$$DSC=\frac{2TP}{2TP+FP+FN}$$

(10)

In Eqs. 8, 9, and 10, TP represents the number of correctly detected vertebra pixels. FP is the number of non-vertebra pixels that were wrongly detected as vertebrae (i.e., in the neural network output the pixels are vertebrae, whereas in the ground truth they are non-vertebrae). FN represents the number of vertebra pixels that failed to be identified (i.e., in the neural network output the pixels are non-vertebrae, whereas in the ground truth they are vertebrae).
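The four metrics of Eqs. 7 to 10 can be computed directly from the pixel counts. The sketch below works on flattened binary masks (1 for vertebra, 0 for background). The function names are illustrative, and returning 0.0 when a denominator is zero is an added convention rather than part of the equations.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN and TN over two flattened binary masks."""
    tp = fp = fn = tn = 0
    for truth, predicted in zip(y_true, y_pred):
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 0 and predicted == 1:
            fp += 1
        elif truth == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(y_true, y_pred):
    """Accuracy, precision, recall and DSC as in Eqs. 7 to 10."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)

    def ratio(numerator, denominator):
        # Convention for this sketch: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "dsc": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Two vertebra pixels among eight; one is found and one background pixel
# is wrongly marked as vertebra.
truth = [1, 1, 0, 0, 0, 0, 0, 0]
prediction = [1, 0, 1, 0, 0, 0, 0, 0]
print(segmentation_metrics(truth, prediction))
```

In this small example the accuracy is 0.75 while precision, recall and DSC are all 0.5, which illustrates how the many background pixels inflate accuracy.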

4.6 Results

As part of the training process, parameters and hyper-parameters were tuned to test and evaluate the performance of the proposed network. As with the training dataset, the test dataset is pre-processed and normalized. As shown in Fig. 5, the proposed loss functions result in the following outputs. The size of all the images in Fig. 5 is the same.

According to the proposed loss functions, Table 4 displays the accuracy, precision, recall, and Dice similarity coefficient for the improved U-Net network for spine vertebrae segmentation and compares this network with conventional U-Net [21], MultiResNet [34], Pre-trained Mask-RCNN101 [35], U-Net++ [36], and Dense-U-net [37].

Table 4 Comparing the improved U-net with conventional U-net, pre-trained Mask-RCNN, MultiResNet, U-Net++, and dense U-net using the different loss functions for segmentation of spinal vertebrae

As shown in Fig. 5 and Table 4, the improved U-Net network has a better architecture for spine vertebrae segmentation than the conventional U-Net [21], MultiResNet [34], Pre-trained Mask-RCNN101 [35], U-Net++ [36], and Dense U-Net [37] networks. This results from changing the number of filters in each convolutional block, increasing the number of convolutional blocks, applying batch normalization to the convolutional blocks, and adding drop-out layers after the max-pooling and up-convolution layers. In addition, the combo loss function ($L_{WDSC}$), which combines the dice loss function and the weighted cross-entropy loss function through the $\alpha$ value and tunes false positives and false negatives through the $\beta$ value, has a significant effect on performance and on accurately detecting the position of the vertebrae.

5 Discussion and Conclusion

Spinal abnormalities are increasing significantly as a result of changes in people’s lifestyles. Scoliosis is a spinal deformity characterized by an abnormal structure in the spine. The gold standard for diagnosing spine abnormalities is radiographic imaging. The large number of patients, the time-consuming examinations, and the small number of doctors call for the use of machine learning algorithms to help doctors and speed up evaluations. This article uses the improved U-Net neural network to better interpret radiographic images and improves the segmentation results by pairing the improved U-Net with a combined loss function.

With the IOU loss function, false positive predictions incur a lower error than false negative predictions. The probability of receiving a false positive prediction is higher for a category with few pixels, so the network might settle for false negatives. False negatives and false positives are penalized less by the dice loss than by the IOU loss, and the difference between the penalties for false positives and false negatives is smaller. Accordingly, the imbalance in probabilities is compensated slightly, leading to improved performance. The two-class formulation penalizes all errors from both perspectives, resulting in a smaller difference between false negatives and false positives. Additionally, weighted cross-entropy loss classifies the pixels in an image and adjusts the importance of positive labels through an additional weight. Combining the dice loss function with weighted cross-entropy has a positive effect on the network’s final results, and the network can readily extract the spine vertebrae for the doctor to evaluate.

Data Availability

The data that support the findings of this study are available from http://spineweb.digitalimaginggroup.ca [5], but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission from http://spineweb.digitalimaginggroup.ca [5].

Abbreviations

Adam:: Adaptive moment estimation
AIS:: Adolescent idiopathic scoliosis
CNN:: Convolutional neural networks
CT:: Computerized tomography
DSC:: Dice similarity coefficient
Faster R-CNN:: Faster region-based convolutional neural network
FP:: False positive
FN:: False negative
IOU:: Intersection-over union
Mask-RCNN:: Mask region-based convolutional neural networks
ReLU:: Rectified linear unit
ROI:: Region of interest
TP:: True positive
TN:: True negative
WCE:: Weighted binary cross entropy
y_i :: The ground-truth value
ŷ_i :: The predicted value
α:: The amount of weighted cross entropy loss and dice loss terms contribution to building the proposed loss function
β:: The value for tuning false negatives and false positives
C:: The number of classes
N:: The number of pixels
L_BCE :: Binary cross-entropy loss function
L_Dice :: Dice loss function
L_IOU :: IOU loss function
L_WBCE :: Weighted cross-entropy loss function
L_WDSC :: Combo loss function (the proposed loss function)

References

Zhong, Z., Li, J., Zhang, Z., Jiao, Z., & Gao, X. (2019). “A coarse-to-fine deep heatmap regression method for adolescent idiopathic scoliosis assessment,” in International Workshop and Challenge on Computational Methods and Clinical Applications for Spine Imaging, pp. 101–106.
Tavana, P., Akraminia, M., Koochari, A., & Bagherifard, A. (2023). Classification of spinal curvature types using radiography images: deep learning versus classical methods. Artif. Intell. Rev., 56, 1–33.
Tavana, P., Akraminia, M., Koochari, A., & Bagherifard, A. (2023). An efficient ensemble method for detecting spinal curvature type using deep transfer learning and soft voting classifier. Expert Systems with Applications, 213, 119290.
Wu, H., Bailey, C., Rasoulinejad, P., & Li, S. (2018). Automated comprehensive adolescent idiopathic scoliosis assessment using MVC-Net. Medical Image Analysis, 48, 1–11.
Wu, H., Bailey, C., Rasoulinejad, P., & Li, S. (2017). “Automatic landmark estimation for adolescent idiopathic scoliosis assessment using BoostNet,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 127–135.
Sun, H., Zhen, X., Bailey, C., Rasoulinejad, P., Yin, Y., & Li, S. (2017). “Direct estimation of spinal cobb angles by structured multi-output regression,” in International conference on information processing in medical imaging, pp. 529–540.
Galbusera, F., Bassani, T., Costa, F., Brayda-Bruno, M., Zerbi, A., & Wilke, H. J. (2018). Artificial neural networks for the recognition of vertebral landmarks in the lumbar spine. Comput Methods Biomech Biomed Eng Imaging Vis, 6(4), 447–452.
Tao, R., Xu, S., Wu, H., Zhang, C., & Lv, C. (2019). “Automated spinal curvature assessment from X-ray images using landmarks estimation network via rotation proposals,” in International Workshop and Challenge on Computational Methods and Clinical Applications for Spine Imaging, pp. 95–100.
Wang, S., Huang, S., & Wang, L. (2019). “Spinal curve guide network (SCG-Net) for accurate automated spinal curvature estimation,” in International Workshop and Challenge on Computational Methods and Clinical Applications for Spine Imaging, pp. 107–112.
Wang, L., Xu, Q., Leung, S., Chung, J., Chen, B., & Li, S. (2019). Accurate automated Cobb angles estimation using multi-view extrapolation net. Medical Image Analysis, 58, 101542.
Nicolaes, J. (2019). “Detection of vertebral fractures in CT using 3D convolutional neural networks,” in International Workshop and Challenge on Computational Methods and Clinical Applications for Spine Imaging, pp. 3–14.
Nicolaes, J. (2020). Automated detection of vertebral fractures in CT using 3D convolutional neural networks. In European Calcified Tissue Society 2020.
Lin, Y., Zhou, H. Y., Ma, K., Yang, X., & Zheng, Y. (2019). “Seg4Reg networks for automated spinal curvature estimation,” in International Workshop and Challenge on Computational Methods and Clinical Applications for Spine Imaging, pp. 69–74.
Khanal, B., Dahal, L., Adhikari, P., & Khanal, B. (2019). “Automatic cobb angle detection using vertebra detector and vertebra corners regression,” in International Workshop and Challenge on Computational Methods and Clinical Applications for Spine Imaging, pp. 81–87.
Olabarriaga, S. D., & Smeulders, A. W. M. (2001). Interaction in the segmentation of medical images: A survey. Medical Image Analysis, 5(2), 127–142.
Thoma, M. (2016). “A survey of semantic segmentation,” arXiv preprint arXiv:1602.06541.
Sharma, N., & Aggarwal, L. M. (2010). Automated medical image segmentation techniques. Journal of Medical Physics, 35(1), 3.
Malhotra, P., Gupta, S., Koundal, D., Zaguia, A., & Enbeyle, W. (2022). Deep neural networks for medical image segmentation. J. Healthc. Eng. https://doi.org/10.1155/2022/9580991
Rajpurkar, P., et al. (2018). Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Medicine, 15(11), e1002686.
Fujii, H., Tanaka, H., Ikeuchi, M., & Hotta, K. (2021). X-net with different loss functions for cell image segmentation, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3793–3800.
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation, in International Conference on Medical image computing and computer-assisted intervention, pp. 234–241.
Liu, H. (2019). “Semi-supervised Semantic Segmentation of Multiple Lumbosacral Structures on CT,” in International Workshop and Challenge on Computational Methods and Clinical Applications for Spine Imaging, pp. 47–59.
Albawi, S., Bayat, O., Al-Azawi, S., & Ucan, O. N. (2018). Social touch gesture recognition using convolutional neural network. Comput. Intell. Neurosci. https://doi.org/10.1155/2018/6973103
Ho, Y., & Wookey, S. (2019). The real-world-weight cross-entropy loss function: Modeling the costs of mislabeling. IEEE Access, 8, 4806–4813.
Yi-de, M., Qing, L., & Zhi-Bai, Q. (2004). Automated image segmentation using improved PCNN model based on cross-entropy, in Proceedings of 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, 2004, pp. 743–746.
Jadon, S. (2020). “A survey of loss functions for semantic segmentation,” in 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pp. 1–7.
Zhao, H., & Sun, N. (2017). “Improved U-net model for nerve segmentation,” in International Conference on Image and Graphics, pp. 496–504.
Pihur, V., Datta, S., & Datta, S. (2007). Weighted rank aggregation of cluster validation measures: A monte carlo cross-entropy approach. Bioinformatics, 23(13), 1607–1615.
Rahman, M. A., & Wang, Y. (2016). “Optimizing intersection-over-union in deep neural networks for image segmentation,” in International symposium on visual computing, pp. 234–244.
Milletari, F., Navab, N., & Ahmadi, S. A. (2016). V-net: Fully convolutional neural networks for volumetric medical image segmentation, in fourth international conference on 3D vision (3DV), 2016, pp. 565–571.
García-Lorenzo, D., Francis, S., Narayanan, S., Arnold, D. L., & Collins, D. L. (2013). Review of automatic segmentation methods of multiple sclerosis white matter lesions on conventional magnetic resonance imaging. Medical Image Analysis, 17(1), 1–18.
Zhang, Y., Liu, S., Li, C., & Wang, J. (2021). Rethinking the dice loss for deep learning lesion segmentation in medical images. J Shanghai Jiaotong Univ, 26(1), 93–102.
Thanh, N. C., & Long, T. Q. (2021). CRF-EfficientUNet: An improved UNet framework for polyp segmentation in colonoscopy images with combined asymmetric loss function and CRF-RNN layer. IEEE Access, 9, 156987–157001.
Ibtehaz, N., & Rahman, M. S. (2020). MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Networks, 121, 74–87.
Ullo, S. L., et al. (2021). A new mask R-CNN-based method for improved landslide detection. IEEE J Sel Top Appl Earth Obs Remote Sens, 14, 3799–3810.
Zhao, C., Shuai, R., Ma, L., Liu, W., & Wu, M. (2022). Segmentation of skin lesions image based on U-Net++. Multimed Tools Appl, 81(6), 8691–8717.
Wu, Y., Wu, J., Jin, S., Cao, L., & Jin, G. (2021). Dense-U-net: Dense encoder–decoder network for holographic imaging of 3D particle fields. Optics Communications, 493, 126970.

Acknowledgements

The authors are sincerely grateful to Dr. M. R. Chehrassan, H. Mohseni, and M. Taghavi for answering numerous questions related to scoliosis and spinal disorders with great patience.

Funding

The authors declare that no funds, grants or other support were received during the preparation of this manuscript.

Author information

Authors and Affiliations

Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
Parisa Tavana
Mechanical Rotary Equipment Research Department, Niroo Research Institute, Tehran, Iran
Mahdi Akraminia
Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
Abbas Koochari
Bone and Joint Reconstruction Research Center, Shafa Orthopedic Hospital, Iran University of Medical Sciences, Tehran, Iran
Abolfazl Bagherifard

Contributions

All authors contributed to the study’s conception and design. Material preparation, data collection and analysis were performed by PT, MA, AK and AB. The first draft of the manuscript was written by PT and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mahdi Akraminia.

Ethics declarations

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Ethics approval

The data set was obtained from http://spineweb.digitalimaginggroup.ca/, whose protocols are in accordance with the ethical standards of our institution and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Because no personal patient information was used in this study and patient privacy was respected, an ethics approval declaration is not required.

Consent to Participate

The data that support the findings of this study are available from http://spineweb.digitalimaginggroup.ca [5], but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available.

Consent to Publish

The data that support the findings of this study are available from http://spineweb.digitalimaginggroup.ca [5], but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 34.4 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tavana, P., Akraminia, M., Koochari, A. et al. Rethinking U-Net Deep Neural Network for Spine Radiographic Images-Based Spine Vertebrae Segmentation. J. Med. Biol. Eng. 43, 574–584 (2023). https://doi.org/10.1007/s40846-023-00828-6

Received: 13 June 2023
Accepted: 20 September 2023
Published: 13 October 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s40846-023-00828-6