
1 Introduction

Adolescent idiopathic scoliosis (AIS) is a spinal deformity in which the spine curves abnormally from side to side and rotates. The Cobb measuring method is the gold standard for quantifying the curve. The Cobb angle (CA) is measured between the most tilted vertebrae (end vertebrae) above and below the apex (the most laterally displaced vertebra) of the curve, on radiographs taken in either the anterior–posterior or the posterior–anterior view in the coronal plane [1]. In the manual procedure, lines are drawn on hard-copy radiographic films and the angle between the two lines is measured with a protractor (Fig. 1).

Fig. 1

Measurement of Cobb angle (CA)

In addition to being time-consuming, manual CA measurement is unreliable [2] and is subject to inter-observer and intra-observer variation [3]. Reported measurement accuracies vary from 2° to 11° [4–6], and measurements can differ by up to 5° even when the same end vertebrae are selected [4, 7]. Computerized digitization of radiographs has enabled semi-automatic CA assessment: the picture archiving and communication system (PACS) provides a built-in function that lets the user digitally draw lines along the end vertebrae, after which the system measures the CA automatically. This approach has been shown to have good reliability and less variation than the manual method [6, 8]. However, it still depends on manual input from the user.

An accurate and reproducible CA measurement method is important because the CA is used to diagnose AIS and to guide decisions regarding curve progression and therapeutic options, including surgical intervention. In addition, the method has to be user-friendly and should take less time than the manual and semi-automatic methods.

The development of computer vision technologies [9], machine learning methods [10, 11], and deep learning methods [12–14] has led to attempts to move from traditional and semi-automatic CA assessment to automated CA measurement, in which X-ray images are processed by computerized learning methods to measure and predict the CA. The various machine learning methods used in scoliosis clinical practice, including screening, diagnosis, and classification, have previously been reported [11]. Vertebra detection is an important stage, as it identifies the landmarks of interest on the spine image for CA measurement. Studies of vertebra and spine detection based on machine learning methods [10, 11] and deep learning methods [12–14] have been reported previously. Bernstein et al. showed that a neural network (NN) can be trained to detect vertebra centroids automatically [10]. However, automatic detection of vertebrae in X-ray images can be difficult. Computer vision tasks are more challenging on X-rays than on computed tomography (CT) and magnetic resonance imaging (MRI) images because of multiple overlapping shadows of the ribs and pelvis, as well as differences in contrast between the thoracic and lumbar vertebral regions [15, 16]. Convolutional neural network (CNN) architectures can help overcome this problem [10]. Researchers in the field of deep learning (DL) have previously developed fully automated methods for Cobb angle measurement (Table 1). These methods successfully detected and segmented the vertebrae and the spine. However, most of these studies were not conducted in collaboration with clinicians, and their results were not compared with measurements made in a real-world setting.

Table 1 Brief summary of selected papers in CA measurement using deep learning methods

In this paper, we propose a CNN for spinal vertebra detection, CA evaluation, and curve severity classification in AIS. The main objective is to automate and augment (1) the detection and assessment of the CA and (2) the confirmation of the presence of scoliosis, based on standard spine X-ray images. The severity of scoliosis is also classified as mild (10°–25°), moderate (>25° to 40°), or severe (>40°).

2 Proposed Method

2.1 Datasets

The spinal images and their labels were taken from the public AASCE MICCAI 2019 anterior–posterior X-ray dataset [17]. The input images vary in size from 359 × 973 to 1427 × 3755 pixels. Each image contains 17 vertebrae from the thoracic (upper) and lumbar (lower) spine. The input resolution is set to 1024 × 512 for algorithm development. A total of 962 images are used: 481 for training, 323 for validation, and 158 for testing. Each vertebra is localized by four corner landmarks, and the ground truth for the 68 landmarks (17 vertebrae × 4 corners) in each image is provided by the dataset.
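Because the images vary in size while the network input is fixed, the ground-truth landmarks have to be rescaled together with the image. The sketch below illustrates this under stated assumptions: 1024 × 512 is read as (height, width), the quoted image sizes are read as width × height, and the landmarks are stored as (x, y) pixel coordinates in a (68, 2) array. The function name `rescale_landmarks` is hypothetical, and the image resizing itself is omitted.

```python
import numpy as np

def rescale_landmarks(landmarks, orig_hw, target_hw=(1024, 512)):
    """Map (x, y) landmarks from the original image size to the network input size.

    landmarks: (68, 2) array of pixel coordinates (17 vertebrae x 4 corners).
    orig_hw, target_hw: (height, width); 1024 x 512 is assumed to be (height, width).
    """
    pts = np.asarray(landmarks, dtype=float)
    if pts.shape != (68, 2):
        raise ValueError("expected 17 vertebrae x 4 corners = 68 (x, y) points")
    orig_h, orig_w = orig_hw
    target_h, target_w = target_hw
    # scale x by the width ratio and y by the height ratio
    return pts * np.array([target_w, target_h]) / np.array([orig_w, orig_h])

# Example: the largest image size in the dataset, read as width 1427 x height 3755.
pts = np.tile([[713.5, 1877.5]], (68, 1))  # the image centre, repeated 68 times
print(rescale_landmarks(pts, orig_hw=(3755, 1427))[0])  # [256. 512.]
```

Scaling x and y independently matches a plain (non-aspect-preserving) resize; if the images were padded or cropped instead, the offsets would also need to be applied.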

2.2 System Overview

A 50-layer ResNet (ResNet50) [18] is used to predict the 68 landmarks and obtain the corner offsets of the spine. This CNN consists of several convolutional layers that learn local image features and generate the classifications. The proposed network (Fig. 2) includes pooling layers (average pooling and max pooling), classification, and corner offset outputs. Pooling combines semantically similar features into a single feature, reducing the dimensionality of the extracted features, and the fully connected layers give a final probability value for each class. Network depth has previously been shown to benefit classification accuracy [19]. However, as a network becomes deeper, its accuracy can saturate and then degrade rapidly. The ResNet framework [20] addresses this issue by adding a shortcut connection across every three convolutional layers of the deep network. These shortcut connections perform identity mapping without adding extra parameters or computational complexity. This simplifies network optimization during training and enables ResNet to achieve higher accuracy from deeper networks in image classification tasks.
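To make the "shortcut across every three convolutional layers" concrete, the following NumPy sketch implements one ResNet-style bottleneck block (1×1 reduce, 3×3, 1×1 expand) with an identity shortcut, computing ReLU(F(x) + x). It is an illustration of the general technique, not the network used in this study: batch normalization is omitted for brevity, the weights are random placeholders, and the helper names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1x1(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def conv3x3(x, w):
    """3x3 convolution, stride 1, zero padding 1: w is (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero padding keeps H and W unchanged
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out

def bottleneck_block(x, w1, w2, w3):
    """Three convolutions plus an identity shortcut. The shortcut adds x
    unchanged, so it introduces no extra parameters."""
    f = relu(conv1x1(x, w1))  # reduce channels
    f = relu(conv3x3(f, w2))  # spatial 3x3 convolution
    f = conv1x1(f, w3)        # expand channels back to C_in
    return relu(f + x)        # residual output: ReLU(F(x) + x)

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8, 4))
w1 = 0.1 * rng.standard_normal((16, 64))
w2 = 0.1 * rng.standard_normal((16, 16, 3, 3))
w3 = 0.1 * rng.standard_normal((64, 16))
y = bottleneck_block(x, w1, w2, w3)
print(y.shape)  # (64, 8, 4)
```

If the residual branch learns F(x) = 0 (for example, `w3` all zeros), the block reduces to ReLU(x), which is why adding such blocks should not make a deeper network harder to fit than a shallower one.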

Fig. 2

Vertebrae detection network

The ResNet50 architecture is mainly composed of residual blocks (Fig. 2). The residual connections preserve the flow of information during training, which speeds up training while allowing a larger network capacity. Each convolutional layer is followed by batch normalization and ReLU activation. Bicubic interpolation is used as the upscaling method. Skip connections are used to combine high-level semantic information with low-level fine details to improve model performance.
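The upscaling and skip-connection steps can be sketched as follows: deep, low-resolution feature maps are upsampled with bicubic interpolation and then combined with shallow, high-resolution feature maps. This is a minimal sketch under assumptions not stated in the text: the Keys cubic kernel with a = −0.5, half-pixel alignment, edge clamping, and fusion by channel concatenation (element-wise addition would be an equally plausible choice). All function names are hypothetical.

```python
import numpy as np

def cubic_kernel(t, a=-0.5):
    """Keys cubic convolution kernel (a = -0.5 is a common choice)."""
    t = np.abs(t)
    near = (a + 2) * t**3 - (a + 3) * t**2 + 1
    far = a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a
    return np.where(t <= 1, near, np.where(t < 2, far, 0.0))

def resize_matrix(n_in, n_out, a=-0.5):
    """Matrix M (n_out x n_in) such that M @ signal resamples a 1-D signal."""
    m = np.zeros((n_out, n_in))
    scale = n_in / n_out
    for o in range(n_out):
        src = (o + 0.5) * scale - 0.5  # half-pixel alignment
        base = int(np.floor(src))
        for k in range(base - 1, base + 3):  # the 4 nearest input samples
            m[o, min(max(k, 0), n_in - 1)] += cubic_kernel(src - k, a)  # clamp at edges
    return m

def bicubic_upsample(x, factor=2):
    """Separable bicubic upsampling of a (C, H, W) feature map."""
    _, h, w = x.shape
    mh = resize_matrix(h, h * factor)
    mw = resize_matrix(w, w * factor)
    return np.einsum("oh,chw,pw->cop", mh, x, mw)

def fuse_skip(deep, shallow):
    """Upsample deep (semantic) features to the shallow (detailed) resolution
    and concatenate them along the channel axis."""
    up = bicubic_upsample(deep, factor=shallow.shape[1] // deep.shape[1])
    return np.concatenate([up, shallow], axis=0)

deep = np.ones((256, 16, 8))      # hypothetical deep features (constant for the demo)
shallow = np.zeros((64, 32, 16))  # hypothetical shallow features
fused = fuse_skip(deep, shallow)
print(fused.shape)  # (320, 32, 16)
```

Because the Keys kernel weights sum to one, a constant feature map stays constant after upsampling, which is a quick sanity check for the interpolation.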

During training, fine-tuning is applied: the connection weights of the pre-trained model are transferred to our model, which is then retrained for the current task. The model accepts an image as input and uses a fully connected layer for the final assessment. Finally, the model outputs the bounding box of each target object together with the corresponding category label.

The X-ray images used contain 17 vertebrae, each with four corner landmarks: top-left, top-right, bottom-left, and bottom-right, giving a total of 68 landmarks per image. The order of the landmarks is used to localize each vertebra accurately so that its slope can be determined. The landmarks are separated into groups to obtain an output feature map with 68 channels. A center-point heat map [21] is then constructed, and convolutional layers are used to obtain corner offset maps for landmark localization.
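The landmark grouping and center-point heat map can be illustrated with a short NumPy sketch in the spirit of center-point detectors [21]: the 68 landmarks are reshaped into 17 vertebrae × 4 corners, each vertebra center is taken as the mean of its corners, and a Gaussian peak is rendered at each center on a down-sampled grid. The vertebra-major ordering (top-left, top-right, bottom-left, bottom-right), the output stride of 4, and σ = 2 are illustrative assumptions, not values given in the text; the function names are hypothetical.

```python
import numpy as np

def vertebra_centers(landmarks):
    """Group 68 (x, y) landmarks into 17 vertebrae x 4 corners and average them."""
    corners = np.asarray(landmarks, dtype=float).reshape(17, 4, 2)
    return corners.mean(axis=1)  # (17, 2): centre of each vertebra

def center_heatmap(centers, out_hw=(256, 128), stride=4, sigma=2.0):
    """Render one Gaussian peak per vertebra centre on a down-sampled grid.
    stride and sigma are illustrative assumptions."""
    h, w = out_hw
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w))
    for cx, cy in np.asarray(centers) / stride:
        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma**2))
        heat = np.maximum(heat, g)  # keep the strongest peak where Gaussians overlap
    return heat

# Synthetic, axis-aligned vertebrae stacked vertically in a 1024 x 512 image.
offsets = np.array([[-20, -10], [20, -10], [-20, 10], [20, 10]], dtype=float)  # TL, TR, BL, BR
true_centers = np.stack([np.full(17, 256.0), 40.0 + 56.0 * np.arange(17)], axis=1)
landmarks = (true_centers[:, None, :] + offsets[None, :, :]).reshape(68, 2)
heat = center_heatmap(vertebra_centers(landmarks))
```

At inference time, the peaks of such a heat map give the vertebra centers from which the corner offsets are read out.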

The landmarks at each corner of the vertebrae are obtained from the corner offsets. A corner offset is the displacement from the vertebra center (the heat-map peak) to a corner on the vertebra margin, and the offsets are optimized at the center point using an L1 loss.
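The corner-offset encoding and its L1 loss can be written out in a few lines. This is a minimal sketch assuming that offsets are plain (dx, dy) pixel displacements from the vertebra center to each of the four corners; any normalization used by the actual network is not stated in the text, and the function names are hypothetical.

```python
import numpy as np

def corner_offsets(corners, center):
    """Offsets from a vertebra centre to its 4 corners: (4, 2) -> (4, 2)."""
    return np.asarray(corners, dtype=float) - np.asarray(center, dtype=float)

def decode_corners(center, offsets):
    """Recover corner positions by adding (predicted) offsets to the centre."""
    return np.asarray(center, dtype=float) + np.asarray(offsets, dtype=float)

def l1_offset_loss(pred, target):
    """Mean absolute error between predicted and ground-truth offsets."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(target))))

corners = np.array([[236.0, 30.0], [276.0, 30.0], [236.0, 50.0], [276.0, 50.0]])  # TL, TR, BL, BR
center = corners.mean(axis=0)  # [256., 40.]
target = corner_offsets(corners, center)
pred = target + 1.0  # hypothetical prediction, off by 1 px in x and y
print(l1_offset_loss(pred, target))  # 1.0
print(np.allclose(decode_corners(center, target), corners))  # True
```

The L1 loss penalizes errors linearly, so a few badly placed corners influence training less than they would under a squared loss.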

2.3 Cobb Angle Measurement for Classification

A review of AIS classification is presented in [22], covering the clinical classifications of AIS proposed in several previous studies. The review notes that classification provides a better and more reliable tool to help surgeons determine the appropriate treatment for a given curve pattern. In addition, developing methods in 3D reconstruction may serve as a basis of classification for new therapeutic concepts [22].

In this study, a CNN is applied to AIS classification. The steps used to calculate the CA from the detected corner positions are shown in Fig. 3. After objects are detected on the X-ray image, the detected bounding boxes are displayed on the spine, and boxes with a score above 0.5 are retained. The center point of each vertebra is computed from the box locations and used to remove outliers, based on the anatomical constraint that adjacent vertebrae should not be far apart. If the x-coordinate of a box center lies more than half the box width from the x-coordinates of the centers of its two closest neighbors (above and below), the box is rejected as an outlier. Otherwise, the position of the box is re-evaluated based on the positions of the nearest boxes.
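The score threshold and lateral-outlier rule can be sketched as follows. The text does not say whether a box must be far from one or from both neighbors to be rejected; this sketch assumes both, so that a single false detection does not also cause its correct neighbors to be discarded. Boxes are assumed to be (x1, y1, x2, y2) in pixels, the position re-evaluation step for retained boxes is not modeled, and `filter_boxes` is a hypothetical name.

```python
import numpy as np

def filter_boxes(boxes, scores, score_thresh=0.5):
    """Keep confident boxes, then drop lateral outliers.

    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) detection scores.
    A box is rejected when its x-centre is more than half its width away from
    the x-centres of both vertical neighbours (one neighbour at the ends).
    """
    boxes = np.asarray(boxes, dtype=float)[np.asarray(scores) > score_thresh]
    boxes = boxes[np.argsort((boxes[:, 1] + boxes[:, 3]) / 2)]  # order top to bottom
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    half_w = (boxes[:, 2] - boxes[:, 0]) / 2
    keep = []
    for i in range(len(boxes)):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(boxes)]
        far = [abs(cx[i] - cx[j]) > half_w[i] for j in neighbours]
        if not neighbours or not all(far):
            keep.append(i)
    return boxes[keep]

boxes = [[80, 50 * i, 120, 50 * i + 40] for i in range(5)]
boxes[2] = [280, 100, 320, 140]     # laterally displaced false detection
boxes.append([80, 250, 120, 290])   # low-confidence detection
scores = [0.9, 0.9, 0.9, 0.9, 0.9, 0.3]
kept = filter_boxes(boxes, scores)
print(len(kept))  # 4
```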

Fig. 3

CA measurement for classification

Next, the depth of the curve is calculated at the detected corner positions. For each pair of adjacent vertebrae, two distances are calculated: between the bottom-left corner of the upper box and the top-left corner of the lower box, and between the bottom-right corner of the upper box and the top-right corner of the lower box. The apex of the spinal curve is taken as the deepest part of the curve.
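The gap computation described above is straightforward to express in code; how the "deepest part of the curve" is derived from it is not fully specified, so the apex rule in this sketch is a swapped-in stand-in: the vertebra whose center lies farthest from the straight line joining the top and bottom vertebra centers. Corners are assumed to be an (N, 4, 2) array ordered top-left, top-right, bottom-left, bottom-right and sorted top to bottom; the function names are hypothetical.

```python
import numpy as np

TL, TR, BL, BR = range(4)  # assumed corner order within each vertebra

def intervertebral_gaps(corners):
    """Left and right gaps between each pair of adjacent vertebrae.
    corners: (N, 4, 2) array of (x, y) corners, ordered top to bottom."""
    c = np.asarray(corners, dtype=float)
    left = np.linalg.norm(c[:-1, BL] - c[1:, TL], axis=1)
    right = np.linalg.norm(c[:-1, BR] - c[1:, TR], axis=1)
    return left, right

def apex_index(corners):
    """Stand-in apex rule: the vertebra whose centre lies farthest from the
    straight line joining the top and bottom vertebra centres."""
    centers = np.asarray(corners, dtype=float).mean(axis=1)
    p, q = centers[0], centers[-1]
    d = q - p
    # perpendicular distance of each centre from the line p -> q (2-D cross product)
    dist = np.abs(d[0] * (centers[:, 1] - p[1]) - d[1] * (centers[:, 0] - p[0]))
    return int(np.argmax(dist / np.linalg.norm(d)))

# Synthetic C-shaped curve of 17 axis-aligned vertebrae.
i = np.arange(17)
cx = 100 + 30 * np.sin(np.pi * i / 16)
cy = 50.0 * i
offsets = np.array([[-15, -10], [15, -10], [-15, 10], [15, 10]], dtype=float)  # TL, TR, BL, BR
corners = np.stack([cx, cy], axis=1)[:, None, :] + offsets[None, :, :]
left, right = intervertebral_gaps(corners)
print(apex_index(corners))  # 8
```

On real, rotated vertebrae the left and right gaps differ, narrowing on the concave side of the curve; with the axis-aligned boxes of this toy example they are equal.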

For each box above the apex, the slope of the vertebra is measured from its top-left and top-right corners to find the most-tilted vertebra above the apex. For each box below the apex, the slope is measured from its bottom-left and bottom-right corners to find the most-tilted vertebra below the apex. The Cobb angle is then calculated as the angle of intersection between the lines through the most-tilted vertebrae above and below the apex.
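The end-vertebra selection and angle calculation can be sketched directly from the description: upper endplates are used above the apex, lower endplates below it, the vertebra with the largest absolute inclination is chosen on each side, and the Cobb angle is the difference between the two inclinations. This assumes the same (N, 4, 2) corner layout as before and an apex that is neither the first nor the last vertebra; the function names and the synthetic tilts are hypothetical.

```python
import numpy as np

TL, TR, BL, BR = range(4)  # assumed corner order within each vertebra

def endplate_angle(left_pt, right_pt):
    """Inclination (degrees) of the line from left_pt to right_pt."""
    dx, dy = np.asarray(right_pt, dtype=float) - np.asarray(left_pt, dtype=float)
    return np.degrees(np.arctan2(dy, dx))

def cobb_angle(corners, apex):
    """corners: (N, 4, 2), top to bottom; apex: index of the apical vertebra."""
    c = np.asarray(corners, dtype=float)
    upper = [endplate_angle(v[TL], v[TR]) for v in c[:apex]]      # upper endplates
    lower = [endplate_angle(v[BL], v[BR]) for v in c[apex + 1:]]  # lower endplates
    top_tilt = max(upper, key=abs)     # most-tilted vertebra above the apex
    bottom_tilt = max(lower, key=abs)  # most-tilted vertebra below the apex
    return float(abs(top_tilt - bottom_tilt))  # angle between the two endplate lines

def make_vertebra(cx, cy, tilt_deg, w=30.0, h=20.0):
    """Corners (TL, TR, BL, BR) of a w x h box rotated by tilt_deg about its centre."""
    t = np.radians(tilt_deg)
    rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    local = np.array([[-w / 2, -h / 2], [w / 2, -h / 2], [-w / 2, h / 2], [w / 2, h / 2]])
    return local @ rot.T + [cx, cy]

tilts = [0, 5, 12, 8, 0, -6, -15, -10, 0]  # synthetic tilts; apex at index 4
corners = np.stack([make_vertebra(100, 40 * k, a) for k, a in enumerate(tilts)])
print(round(cobb_angle(corners, apex=4), 1))  # 27.0
```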

3 Results

The network was trained on an RTX 2060 GPU with an Intel Core i7 processor. Figure 4 shows the performance on the training and validation datasets during network training. The models were initialized from weights pre-trained on ImageNet. The network was trained with a learning rate of 0.0001 using the Adam optimizer. The batch size and number of epochs were set to 2 and 100, respectively.
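For reference, the reported training settings can be collected in a small configuration object, which also makes the implied number of optimization steps explicit. The field names are hypothetical, and the step count assumes that the final, incomplete batch of each epoch is kept rather than dropped.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Training settings reported in the text; field names are hypothetical."""
    optimizer: str = "adam"
    learning_rate: float = 1e-4
    batch_size: int = 2
    epochs: int = 100
    train_images: int = 481  # size of the training split (Section 2.1)

    @property
    def steps_per_epoch(self) -> int:
        # assumes the final, smaller batch is kept rather than dropped
        return math.ceil(self.train_images / self.batch_size)

    @property
    def total_steps(self) -> int:
        return self.steps_per_epoch * self.epochs

cfg = TrainConfig()
print(cfg.steps_per_epoch, cfg.total_steps)  # 241 24100
```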

Fig. 4

Performance of the dataset in network training

Figure 5 shows the vertebra detection results on spine X-ray images. A patient's spine is classified as normal if the measured CA is less than 10°. Mild, moderate, and severe AIS correspond to CAs of 10° to 25°, >25° to 40°, and >40°, respectively.
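The severity thresholds above translate directly into a classification rule. The sketch below encodes them as stated (normal below 10°, mild 10° to 25°, moderate above 25° up to 40°, severe above 40°); `classify_severity` is a hypothetical name.

```python
def classify_severity(cobb_deg: float) -> str:
    """Map a Cobb angle in degrees to the classes used in this study."""
    if cobb_deg < 10:
        return "normal"
    if cobb_deg <= 25:
        return "mild"
    if cobb_deg <= 40:
        return "moderate"
    return "severe"

for ca in (5.0, 10.0, 25.0, 25.5, 40.0, 81.0):
    print(ca, classify_severity(ca))
```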

Fig. 5

Detection results: a normal, b mild, c moderate, and d severe

Table 2 Performance metrics comparison

The performance metrics for the four classes are compared in Table 2. The precision rate (PR), recall, and F1-measure [23] are computed as follows:

$$\mathrm{PR} = \frac{TP}{TP + FP}$$
(1)
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
(2)
$$\text{F1-measure} = 2 \times \frac{\mathrm{PR} \times \mathrm{Recall}}{\mathrm{PR} + \mathrm{Recall}}$$
(3)

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively. A TP is a detected vertebra region that corresponds to the associated class, an FP is a detected region that is not associated with a vertebra, and an FN is a vertebra region that is not detected.
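Equations (1) to (3) can be computed from detection counts as follows. The counts in the usage line are hypothetical and do not come from Table 2, and returning 0.0 when a denominator is zero is a convention assumed here rather than one stated in the text.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision rate, recall and F1-measure as in Eqs. (1)-(3).
    Returns 0.0 for any ratio whose denominator is zero (assumed convention)."""
    pr = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pr * recall / (pr + recall) if pr + recall else 0.0
    return pr, recall, f1

pr, recall, f1 = detection_metrics(tp=90, fp=10, fn=30)  # hypothetical counts
print(round(pr, 3), round(recall, 3), round(f1, 3))  # 0.9 0.75 0.818
```

The F1-measure is the harmonic mean of PR and recall, so it is pulled toward the lower of the two; a class with many missed vertebrae (high FN) scores poorly even when its detections are precise.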

Severe AIS has the lowest PR, recall, and F1-measure of all classes. The increased curvature makes the vertebrae in the curved region of the spine difficult to detect, and in some cases vertebrae are completely undetected or incorrectly detected.

Mild AIS has the highest PR, recall, and F1-measure of all classes. The network detects the vertebrae well because the spine is not strongly curved; in addition, the X-ray images of this class in our dataset have good lighting and contrast. The normal class has lower accuracy than mild AIS because some images in this class have suboptimal image conditions, which resulted in a high number of FPs and FNs.

4 Discussion

The proposed CNN architecture accurately detected the location of each of the 17 vertebrae in the spine X-rays. In addition, the bounding boxes agreed sufficiently well with the vertebra positions. The detection performance was sufficient to provide the information needed to identify the superior and inferior end vertebrae, enabling the CA to be evaluated correctly.

The detection results also showed that the proposed architecture can identify vertebrae in X-ray images with different contrast and lighting conditions; our tests on several images with poor contrast and lighting yielded good results. Importantly, CA measurement and curve classification were accomplished accurately even when the detection process failed to identify one or two vertebrae. This is a key property of the algorithm, because X-ray images in the clinical setting vary in contrast and lighting quality depending on the severity of the curve and the patient's body habitus.

Previous studies using CNNs [12–14] focused on vertebra detection and CA measurement under certain conditions but did not classify the severity of scoliosis. The proposed method was able to measure the CA across the range from normal to severely scoliotic spines (up to 81°). This is an important step, as severely abnormal curvatures are often difficult to detect.

Our study has some limitations. With the proposed CNN, more detection errors occurred in X-ray images of differing sizes and in images covering larger areas, from the neck to the hip, that include regions irrelevant to vertebra detection. Work on automatic image cropping to meet the conditions for optimal vertebra detection is ongoing. Finally, the results of this CNN were not validated against clinicians' CA measurements, which remain the gold standard. This final step will be crucial in confirming that the CNN can augment the specialist clinician's ability to measure the CA accurately and that it may be used as a tool by non-specialist clinicians and nurses to assess the CA in AIS patients.

5 Conclusions

This paper proposes a convolutional neural network for spinal vertebra detection, Cobb angle measurement, and curvature severity classification in X-ray images of adolescent idiopathic scoliosis. Vertebra detection and classification achieved an accuracy of 0.9 (90%). Upon clinical validation, this architecture may be used as a tool to augment Cobb angle measurement in X-ray images of patients with adolescent idiopathic scoliosis in a real-world clinical setting. The developed CNN method could also be implemented for real-time assessment or monitoring of scoliosis patients in the future [24].