
1 Introduction

Adolescent idiopathic scoliosis (AIS) causes lateral curvature of the spine and rotation of the thorax, and it usually occurs in adolescents at or around puberty [1]. The diagnosis of AIS relies on accurate measurement of the Cobb angle, the angle between the upper endplate of the upper end vertebra and the lower endplate of the lower end vertebra of a curve. Manual clinical measurement requires the radiologist to measure the inclination of each vertebra on the patient's anterior-posterior (AP) radiograph. This is time-consuming, and its accuracy is affected by end vertebra selection as well as intra- and inter-observer variability. An accurate and robust automatic Cobb angle measurement method is therefore needed.

With the development of deep learning, many methods for automatic Cobb angle measurement have been proposed. We roughly divide them into two categories. (1) Direct estimation methods regress the Cobb angle [2,3,4,5] either from the original image [5] or from coarse intermediate results such as coarse segmentations [2, 3] or the spine centerline [4]. They achieve end-to-end Cobb angle measurement but ignore vertebral structure and lack interpretability. (2) Curvature-based methods rely on vertebral features [6,7,8,9,10,11]: a neural network first extracts vertebral features, and the Cobb angle is then computed from these features by curvature post-processing. For example, some works detect landmarks and compute the Cobb angle from them [6,7,8, 10], while others segment the vertebrae and locate their upper and lower endplates [9, 11]. These methods provide richer information about spinal structure for the subsequent treatment of scoliosis, but they usually rely on a single type of vertebral feature, which may not be accurate enough for the subsequent curvature calculation.

Fig. 1. An overview of the proposed method.

In this study, to obtain interpretable Cobb angle measurements, we do not regress the angle end to end; instead, we compute the Cobb angle by post-processing the network outputs. Given the respective advantages and disadvantages of landmark detection and contour segmentation, the proposed MVIE-Net combines the two approaches. Because directly regressing landmark coordinates is less accurate and less robust, we locate landmarks with the more robust confidence maps: we generate confidence maps at the landmark locations and obtain the landmark coordinates by segmenting and parsing these maps. MVIE-Net adopts a dual-task encoder-decoder structure in which the two tasks share the same encoder and have independent decoders, and the tasks interact with each other through a skip connection. Finally, we pair the landmarks using the location information obtained from vertebral contour segmentation.

In summary, the main contributions of this study are as follows:

  • We locate landmarks with confidence maps, transforming the traditional point regression task into a confidence map segmentation task.

  • We propose MVIE-Net, a simple and efficient multi-task learning framework that simultaneously segments the vertebral contours and the landmark confidence maps.

  • We combine vertebral contour information with landmark information and match the landmarks of each vertebra according to the relative positions of the contours.

2 Method

As shown in Fig. 1, spine X-ray images are fed into the proposed MVIE-Net, which produces segmentations of the vertebral contours and of the landmark confidence maps. Next, the landmarks are parsed from the confidence maps and paired into left and right points according to the relative configuration of the vertebrae. Finally, Cobb angles are calculated from the paired landmarks.

2.1 Confidence Map of Landmarks

The vectors \(T=(t_1, t_2, \dots, t_{34})\), \(B=(b_1, b_2, \dots, b_{34})\) and \(C=(c_1, c_2, \dots, c_{17})\) denote the upper landmarks, the lower landmarks and the centroids of the 12 thoracic and 5 lumbar vertebrae, respectively. For the \(j\)-th vertebra, \(j=1,\dots,17\), the points \(t_{2j-1}, t_{2j}, b_{2j-1}, b_{2j}\) are its four corner points (ordered clockwise) and \(c_j\) is its centroid; each point is given by its image coordinates, e.g. \(t_{2j-1}=(t_{(2j-1,x)}, t_{(2j-1,y)})\), \(b_{2j-1}=(b_{(2j-1,x)}, b_{(2j-1,y)})\), \(c_j=(c_{(j,x)}, c_{(j,y)})\).

Fig. 2. Confidence maps. (a)–(c) show the confidence maps generated at the 68 landmarks for σ = 2, 4, 6. With σ = 2 the confidence maps are hard to segment, and with σ = 6 the maps of adjacent landmarks overlap, so we set σ = 4. Because merging all key points into a single confidence map makes the segmented points difficult to parse, (d)–(f) show three separate confidence maps generated at the upper landmarks, the lower landmarks and the vertebral centroids.

To estimate the locations of 85 key points (68 landmarks and the 17 vertebral centroids), we use confidence maps [12, 13] to represent the belief that each pixel location \(x = \left( {x^{\prime}, y^{\prime}} \right), \,x \in I\) corresponds to a landmark or centroid. Because overlapping confidence maps of different key points interfere with landmark parsing, we generate three separate confidence maps, at the upper landmarks, the lower landmarks and the vertebral centroids, as segmentation labels for landmark detection. They are defined by Eqs. (1), (2) and (3), respectively:

$${\Psi }_{t}(x)=(exp(-\frac{\parallel {\text{x}}-{\text{t}}_{1}{\parallel }^{2}}{2{\sigma }^{2}}),...,exp(-\frac{\parallel {\text{x}}-{\text{t}}_{34}{\parallel }^{2}}{2{\sigma }^{2}})),$$
(1)
$${\Psi }_{b}(x)=(exp(-\frac{\parallel {\text{x}}-{\text{b}}_{1}{\parallel }^{2}}{2{\sigma }^{2}}),...,exp(-\frac{\parallel {\text{x}}-{\text{b}}_{34}{\parallel }^{2}}{2{\sigma }^{2}})),$$
(2)
$${\Psi }_{c}(x)=(exp(-\frac{\parallel {\text{x}}-{\text{c}}_{1}{\parallel }^{2}}{2{\sigma }^{2}}),...,exp(-\frac{\parallel {\text{x}}-{\text{c}}_{17}{\parallel }^{2}}{2{\sigma }^{2}})), $$
(3)

where \(\sigma \) is the standard deviation of the Gaussian, which controls the spread of the confidence region around each key point. We tested several values of \(\sigma \) and set \(\sigma =4\), as shown in Fig. 2.
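A minimal NumPy sketch of how label maps following Eqs. (1)–(3) could be rendered. The text does not say how the per-point Gaussians are merged into a single-channel map, so the pixel-wise maximum used here is an assumption, as are the function name `confidence_map` and the (x, y) point layout.

```python
import numpy as np

def confidence_map(points, height, width, sigma=4.0):
    """Render one confidence map from a list of (x, y) key points.

    Each point contributes exp(-||x - p||^2 / (2 sigma^2)); overlapping
    Gaussians are merged with a pixel-wise maximum (an assumption).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    out = np.zeros((height, width), dtype=np.float64)
    for px, py in points:
        d2 = (xs - px) ** 2 + (ys - py) ** 2
        out = np.maximum(out, np.exp(-d2 / (2.0 * sigma ** 2)))
    return out

# Hypothetical usage: one map each for T, B and C of an image
# upper = confidence_map(T, 768, 256); lower = confidence_map(B, 768, 256)
# centre = confidence_map(C, 768, 256)
```

Keeping upper landmarks, lower landmarks and centroids in separate maps means that, at σ = 4, the Gaussians inside one map stay separated and can be parsed as distinct regions.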

2.2 The Proposed MVIE-Net

As shown in Fig. 3, because vertebral contour segmentation and key point detection are similar tasks, our multi-task learning framework uses hard parameter sharing [14]: the two tasks share one encoder, and each has its own decoder, the two decoders being symmetric. The basic convolution module consists of two \(3\times 3\) convolutions together with batch normalization and ELU activation. Skip connections between the encoder and the decoders fuse low-level spatial location features with high-level semantic features [15].

Fig. 3. MVIE-Net architecture. The proposed MVIE-Net performs vertebral contour segmentation and key point detection simultaneously.

In the encoder, image features are extracted by the basic convolution modules, each followed by a max pooling layer that halves the feature map size; after five down-sampling steps, the feature map is reduced to 1/32 of the original size.
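As a quick arithmetic check, the sketch below lists the spatial size at each encoder level, assuming 2 × 2 max pooling with stride 2 and treating the 768 × 256 input of Sect. 3.2 as (height, width). The helper `encoder_sizes` is hypothetical; the decoders' transposed convolutions retrace these sizes in reverse.

```python
def encoder_sizes(height, width, n_pool=5):
    """Feature-map sizes after each 2x2, stride-2 max pooling in the encoder."""
    sizes = [(height, width)]
    for _ in range(n_pool):
        # an odd size would make up-sampled decoder maps mismatch the skip features
        if height % 2 or width % 2:
            raise ValueError("input size must be divisible by 2**n_pool")
        height, width = height // 2, width // 2
        sizes.append((height, width))
    return sizes

print(encoder_sizes(768, 256))
# [(768, 256), (384, 128), (192, 64), (96, 32), (48, 16), (24, 8)]
```

Both 768 and 256 are divisible by 2^5 = 32, so each decoder level can be concatenated with its encoder counterpart without cropping or padding.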

In each decoder, a \(2\times 2\) transposed convolution halves the number of channels and doubles the feature map size. The features of the last layers of the two decoders are fused by concatenation, which allows the two tasks to interact. At the end of the two decoders, binary cross-entropy losses \({L}_{c}\) and \({L}_{p}\) are used as the loss functions of the two tasks. The loss of the whole network is \(L = \lambda {L}_{c} + {L}_{p}\), and we set \(\lambda =0.3\) experimentally.
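A NumPy sketch of the combined objective \(L = \lambda L_c + L_p\), assuming (as the subscripts suggest) that \(L_c\) is the contour loss and \(L_p\) the key point loss, and that the networks output probabilities. The function names are hypothetical; in a PyTorch training loop the same would typically be built from `torch.nn.BCELoss` (or `torch.nn.BCEWithLogitsLoss` on raw logits).

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy; targets may be 0/1 masks or soft maps in [0, 1]."""
    p = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))

def mvie_loss(contour_pred, contour_gt, point_pred, point_gt, lam=0.3):
    """L = lam * L_c + L_p (contour term weighted by lam = 0.3)."""
    return lam * bce(contour_pred, contour_gt) + bce(point_pred, point_gt)
```

BCE accepts soft targets, so the Gaussian confidence maps could be used directly as labels; whether the original labels were binarized first is not stated.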

2.3 Cobb Angle Measurement

Because regressing the Cobb angle directly from the model lacks interpretability, we compute the Cobb angle geometrically. We first extract the information needed for the calculation from the network's segmentation results and then compute the Cobb angle according to its definition.

Post-processing Contour Segmentation Results

As shown in Fig. 4, we binarize the vertebral segmentation result and compute the minimum bounding rectangle (MBR) of each vertebral contour. A line through the vertebral centroid, parallel to the left-right sides of the MBR, intersects the contour at its left and right midpoints. We keep the middle 2/3 of the segment connecting these two midpoints and draw a line perpendicular to it through each endpoint of the retained segment. Fitting a straight line separately to the upper and to the lower boundary points lying between these two perpendicular lines yields upper and lower endplates similar to those manually annotated by physicians.
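The endplate-fitting steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the contour is an (N, 2) array of (x, y) points in image coordinates (y pointing down), the centroid and the MBR orientation `theta_deg` are given (the latter could come from OpenCV's `cv2.minAreaRect`, whose angle convention should be checked), the left and right midpoints are approximated by the extreme contour points within a small band (`band`, hypothetical) around the centroid line, and each boundary is fitted with `np.polyfit`. The name `fit_endplates` is hypothetical.

```python
import numpy as np

def fit_endplates(contour, centroid, theta_deg, band=1.5):
    """Return inclinations (degrees) of the fitted upper and lower endplates.

    contour:   (N, 2) array of (x, y) contour points, image coordinates (y down).
    centroid:  (x, y) centroid of the vertebra mask.
    theta_deg: orientation of the MBR sides that run left-right (assumed given).
    """
    pts = np.asarray(contour, dtype=float)
    t = np.deg2rad(theta_deg)
    u_dir = np.array([np.cos(t), np.sin(t)])   # left-right axis of the MBR
    v_dir = np.array([-np.sin(t), np.cos(t)])  # perpendicular axis, positive = downward
    d = pts - np.asarray(centroid, dtype=float)
    u, v = d @ u_dir, d @ v_dir

    # Left/right midpoints: extreme contour points on the line through the centroid
    on_line = np.abs(v) <= band
    u_left, u_right = u[on_line].min(), u[on_line].max()

    # Keep the middle 2/3 of the left-right segment; the perpendiculars through
    # its endpoints bound the boundary points used for fitting.
    span = u_right - u_left
    inside = (u >= u_left + span / 6) & (u <= u_right - span / 6)
    upper, lower = pts[inside & (v < 0)], pts[inside & (v > 0)]

    # Least-squares line y = k x + m for each boundary
    k_up = np.polyfit(upper[:, 0], upper[:, 1], 1)[0]
    k_low = np.polyfit(lower[:, 0], lower[:, 1], 1)[0]
    return float(np.degrees(np.arctan(k_up))), float(np.degrees(np.arctan(k_low)))
```

Fitting `y = kx + m` in image coordinates is adequate here because endplates are far from vertical; working in the MBR frame only to select the points keeps the selection independent of vertebral tilt.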

Fig. 4. Post-processing of the vertebral contour segmentation results. (a) Vertebral contour segmentation result; (b) vertebral contour point set (green) and vertebral centroid (red); (c) the four corners of the minimum bounding rectangle (yellow); (d) left and right midpoints of the contour (dark red); (e) upper and lower endplate point sets (purple); (f) straight lines fitted to the upper and lower boundary point sets (blue). (Colour figure online)

Post-processing Key Points Segmentation Results

As shown in Fig. 5, we take the coordinates of each key point as the location of the maximum within the confidence region it generates. After the upper and lower landmarks of the vertebrae have been obtained, the two upper (and the two lower) landmarks belonging to the same vertebra are paired according to the left and right midpoints of the vertebral contours.
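A minimal sketch of both steps, under assumptions the text does not fix: confidence regions are taken as 8-connected pixels above a threshold of 0.5, and pairing assigns each parsed landmark to the nearest contour midpoint, so the landmark nearest vertebra j's left midpoint becomes its left point. The names `parse_peaks` and `pair_landmarks` are hypothetical, and the pairing would be run separately on the upper and the lower maps.

```python
import numpy as np
from collections import deque

def parse_peaks(prob, thresh=0.5):
    """Return the (x, y) maximum of each 8-connected region where prob > thresh."""
    mask = prob > thresh
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    peaks = []
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        # breadth-first flood fill of one connected confidence region
        queue = deque([(sy, sx)])
        seen[sy, sx] = True
        best, best_val = (sx, sy), prob[sy, sx]
        while queue:
            y, x = queue.popleft()
            if prob[y, x] > best_val:
                best, best_val = (x, y), prob[y, x]
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
        peaks.append((int(best[0]), int(best[1])))
    return peaks

def pair_landmarks(points, left_mids, right_mids):
    """Assign each landmark to its nearest contour midpoint.

    Returns {vertebra_index: {"left": (x, y), "right": (x, y)}}.
    """
    mids = [(j, "left", m) for j, m in enumerate(left_mids)]
    mids += [(j, "right", m) for j, m in enumerate(right_mids)]
    pairs = {}
    for p in points:
        j, side, _ = min(mids, key=lambda t: (t[2][0] - p[0]) ** 2 + (t[2][1] - p[1]) ** 2)
        pairs.setdefault(j, {})[side] = p
    return pairs
```

Separating the three maps is what makes this simple parsing workable: within one map the regions rarely touch, so each connected component holds exactly one key point.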

Fig. 5. Post-processing of the key point segmentation results. (a) Segmentation of the upper-landmark confidence map; (b) segmentation of the lower-landmark confidence map; (c) coordinates parsed from the two confidence maps (green: upper, red: lower); (d) positions of the left and right midpoints of the contours (dark red hollow points); (e) landmark matching result. (Colour figure online)

Fig. 6. Cobb angle measurement using landmarks. (a) The vectors A and B used to calculate the angle; (b) calculating the angle using the properties of parallel lines; (c) approximating the upper and lower endplates by the lines connecting the upper and the lower landmarks.

Methods of Calculating the Cobb Angle

The Cobb angle labels of the public AASCE Challenge dataset [16, 17] are calculated by Eq. (4):

$$angle=\mathrm{arccos}(\frac{{\varvec{A}}\cdot {\varvec{B}}}{\parallel {\varvec{A}}\parallel \cdot \parallel {\varvec{B}}\parallel })$$
(4)

As shown in Fig. 6(a), for any two different vertebrae, the vectors \(A={p}_{2}-{p}_{1}\) and \(B={p}_{4}-{p}_{3}\) each point from the midpoint of a vertebra's two right landmarks (\(p_1\), \(p_3\)) to the midpoint of its two left landmarks (\(p_2\), \(p_4\)).
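Eq. (4) can be sketched in NumPy as below. The corner order (upper-left, upper-right, lower-left, lower-right, in image coordinates) and the names `side_midpoints` and `cobb_eq4` are assumptions for illustration; how the two end vertebrae are chosen is not shown.

```python
import numpy as np

def side_midpoints(v):
    """v: (4, 2) corners of one vertebra, assumed ordered UL, UR, LL, LR."""
    v = np.asarray(v, dtype=float)
    left = (v[0] + v[2]) / 2   # midpoint of the two left landmarks
    right = (v[1] + v[3]) / 2  # midpoint of the two right landmarks
    return left, right

def cobb_eq4(v1, v2):
    """Angle (degrees) between the right-to-left midpoint vectors, Eq. (4)."""
    l1, r1 = side_midpoints(v1)
    l2, r2 = side_midpoints(v2)
    a, b = l1 - r1, l2 - r2   # A = p2 - p1, B = p4 - p3
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))  # clip guards rounding
```

The `np.clip` guards against round-off pushing the cosine slightly outside [-1, 1], which would make `arccos` return NaN.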

We found that the algorithm provided with the dataset approximates the vertebral endplates by the line connecting the midpoints of the left and right landmark pairs, which differs from the upper and lower endplates that physicians select in clinical practice. To investigate how the choice of information affects the Cobb angle measurements, we propose the following three ways to calculate the Cobb angle:

  1. Midpoint: To obtain the inclination of each vertebra, let \(p_i=(x_i, y_i)\), \({k}_{a}=(y_2-y_1)/(x_2-x_1)\), \({k}_{b}=(y_4-y_3)/(x_4-x_3)\), \(\alpha =\mathrm{arctan}({k}_{a})\) and \(\beta =\mathrm{arctan}({k}_{b})\). Using the properties of parallel lines (Fig. 6(b)), the angle between the two vertebrae is \(angle=|\alpha -\beta |\), which equals \(\alpha +\beta \) when the vertebrae tilt in opposite directions and \(\alpha\), \(\beta\) denote the magnitudes of their inclinations. Because this method uses the same landmark midpoints, it gives the same Cobb angle measurements as the method provided with the dataset.

  2. Endpoint: As shown in Fig. 6(c), we follow more closely the way clinicians locate the endplates: the line connecting the two upper landmarks of the upper vertebra approximates its upper endplate, and the line connecting the two lower landmarks of the lower vertebra approximates its lower endplate.

  3. Straight-line fit: To mimic as closely as possible how physicians mark the upper and lower endplates in clinical practice, we compute the Cobb angle from the contour information alone, taking the straight lines fitted to the upper and lower boundaries of the contours as the upper and lower endplates of the vertebrae.
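The slope-based calculations of the midpoint and endpoint methods can be sketched as follows. Inclinations come from the arctangent of each line's slope, and the angle is the absolute difference of the signed inclinations, which equals the sum of their magnitudes when the two vertebrae tilt in opposite directions. The corner order (upper-left, upper-right, lower-left, lower-right) and all function names are hypothetical; lines are assumed non-vertical, as endplates are. The straight-line fit method would plug fitted endplate slopes into the same difference.

```python
import numpy as np

def inclination(p, q):
    """Signed inclination (degrees) of the line through p and q."""
    (x1, y1), (x2, y2) = p, q
    return float(np.degrees(np.arctan((y2 - y1) / (x2 - x1))))

def midpoint_angle(v1, v2):
    """Midpoint method: lines through the right and left landmark-pair midpoints."""
    v1, v2 = np.asarray(v1, float), np.asarray(v2, float)
    a = inclination((v1[1] + v1[3]) / 2, (v1[0] + v1[2]) / 2)
    b = inclination((v2[1] + v2[3]) / 2, (v2[0] + v2[2]) / 2)
    return abs(a - b)

def endpoint_angle(upper_v, lower_v):
    """Endpoint method: upper landmarks of the upper vertebra vs lower landmarks of the lower one."""
    a = inclination(upper_v[0], upper_v[1])  # UL -> UR
    b = inclination(lower_v[2], lower_v[3])  # LL -> LR
    return abs(a - b)
```

For rectangular vertebrae both methods give the same value; they differ on real, wedge-shaped vertebrae, where the upper and lower endplates are not parallel to the midpoint line.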

3 Experimental Details

3.1 Dataset

The public AASCE Challenge dataset used in the experiments contains 609 labeled anterior-posterior radiographs, divided by the provider into 481 training images and 128 test images. In each image, a clinician manually annotated 68 landmarks on the 12 thoracic and 5 lumbar vertebrae. The images vary in size (approximately \(2500\times 1000\) pixels).

3.2 Implementation

We manually annotated the contours of the 17 vertebrae with the labelme annotation tool to obtain contour segmentation labels, and generated the key point segmentation labels by Eqs. (1), (2) and (3). To alleviate overfitting on this small dataset, we augmented the data with rotation, mirroring and gamma transformation (see Fig. 7).
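A NumPy-only sketch of such an augmentation step, applying the same geometric transform to the image and all label maps and the gamma change to the image only. The rotation range, gamma range, flip probability and the function names `rotate_nn` and `augment` are assumptions; "mirroring" is read here as a left-right flip, which leaves the upper/lower landmark maps consistent, whereas an up-down flip would swap them.

```python
import numpy as np

def rotate_nn(arr, angle_deg):
    """Rotate a 2-D array about its centre with nearest-neighbour sampling (zeros outside)."""
    h, w = arr.shape
    t = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.mgrid[0:h, 0:w]
    # inverse mapping: source coordinates for every output pixel
    xr = np.cos(t) * (xs - cx) + np.sin(t) * (ys - cy) + cx
    yr = -np.sin(t) * (xs - cx) + np.cos(t) * (ys - cy) + cy
    xi, yi = np.rint(xr).astype(int), np.rint(yr).astype(int)
    valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.zeros_like(arr)
    out[valid] = arr[yi[valid], xi[valid]]
    return out

def augment(image, labels, rng, max_angle=10.0, gamma_range=(0.7, 1.5)):
    """One random rotation / left-right mirroring / gamma transform.

    image:  2-D float array scaled to [0, 1].
    labels: list of 2-D label maps (contour mask, confidence maps).
    """
    angle = rng.uniform(-max_angle, max_angle)
    flip = rng.random() < 0.5
    gamma = rng.uniform(*gamma_range)
    out = [rotate_nn(a, angle) for a in [image, *labels]]
    if flip:
        out = [np.fliplr(a) for a in out]
    out[0] = np.clip(out[0], 0.0, 1.0) ** gamma  # intensity change on the image only
    return out[0], out[1:]
```

Nearest-neighbour sampling keeps contour masks binary; for the confidence maps, an alternative is to transform the landmark coordinates and re-render Eqs. (1)–(3), which avoids sub-pixel shifts of the peaks.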

We trained the proposed MVIE-Net on a Tesla T4 GPU using PyTorch. All images are resized to a fixed size of \(768\times 256\), so that every input has the same height and width. The network was trained for up to 500 epochs with SGD and stopped when the validation loss no longer decreased significantly.
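The stopping rule can be expressed as a small patience-based helper. The class name `EarlyStopping` and the `patience` and `min_delta` values are illustrative assumptions; the text gives no thresholds.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved by at least `min_delta`
    for `patience` consecutive epochs (both values are illustrative)."""

    def __init__(self, patience=20, min_delta=1e-4):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Hypothetical training loop:
# stopper = EarlyStopping(patience=20)
# for epoch in range(500):
#     ...  # one SGD epoch, then compute val_loss
#     if stopper.step(val_loss):
#         break
```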

3.3 Evaluation Metrics

We evaluated the vertebral segmentation results quantitatively using the Dice coefficient and the Intersection over Union (IoU), defined in Eqs. (5) and (6):

Fig. 7. Data augmentation. (a) Resized image with its contour segmentation label and upper-landmark confidence map; (b) after rotation; (c) after vertical mirroring.

$$Dice=2\frac{|{V}_{seg}\cap {V}_{gt}|}{|{V}_{seg}|+|{V}_{gt}|}=\frac{2TP}{FP+2TP+FN},$$
(5)
$$IoU=\frac{|{V}_{seg}\cap {V}_{gt}|}{|{V}_{seg}\cup {V}_{gt}|}=\frac{|{V}_{seg}\cap {V}_{gt}|}{|{V}_{seg}|+|{V}_{gt}|-|{V}_{seg}\cap {V}_{gt}|}=\frac{TP}{FP+TP+FN}.$$
(6)
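A direct NumPy transcription of Eqs. (5) and (6) for binary masks (the function name `dice_iou` is hypothetical). Both metrics are undefined when the two masks are empty.

```python
import numpy as np

def dice_iou(seg, gt):
    """Dice and IoU of two binary masks, following Eqs. (5) and (6)."""
    seg, gt = np.asarray(seg, bool), np.asarray(gt, bool)
    tp = np.logical_and(seg, gt).sum()
    fp = np.logical_and(seg, ~gt).sum()
    fn = np.logical_and(~seg, gt).sum()
    dice = 2 * tp / (fp + 2 * tp + fn)
    iou = tp / (fp + tp + fn)
    return float(dice), float(iou)
```

The two metrics are monotonically related, Dice = 2 IoU / (1 + IoU), so they rank models identically; reporting both mainly eases comparison with prior work.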

Following the AASCE Challenge, we use the symmetric mean absolute percentage error (SMAPE) and the mean absolute error (MAE) to evaluate the accuracy of the Cobb angle measurements, computed as in Eqs. (7) and (8):

$$SMAPE=\frac{1}{N}\sum\nolimits_{i=1}^{N}\frac{\sum_{j=1}^{3}|{X}_{ij}-{Y}_{ij}|}{\sum_{j=1}^{3}|{X}_{ij}+{Y}_{ij}|}\times 100\mathrm{\%},$$
(7)
$$MAE=\frac{1}{N}\sum\nolimits_{i=1}^{N}(\frac{1}{3}\sum\nolimits_{j=1}^{3}|{X}_{ij}-{Y}_{ij}|),$$
(8)

where \({X}_{ij}\) and \({Y}_{ij}\) are the estimated and ground-truth values of the \(j\)-th Cobb angle of test image \(i\), and \(N\) is the number of test images.
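Eqs. (7) and (8) in NumPy, assuming the estimates and ground truth are stored as (N, 3) arrays of Cobb angles in degrees (the function name `smape_mae` is hypothetical). Note that SMAPE normalizes per image by the sum of all three angles, not per angle.

```python
import numpy as np

def smape_mae(X, Y):
    """X, Y: (N, 3) arrays of estimated and ground-truth Cobb angles (degrees)."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    smape = np.mean(np.abs(X - Y).sum(axis=1) / np.abs(X + Y).sum(axis=1)) * 100.0
    mae = np.mean(np.abs(X - Y).mean(axis=1))
    return float(smape), float(mae)
```

For a single image with estimates (10, 20, 30) and ground truth (12, 18, 30), SMAPE = 4/120 × 100% ≈ 3.33% and MAE = 4/3 ≈ 1.33°.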

4 Results and Analysis

To evaluate the proposed method, we assessed the segmentation results of the proposed network and examined how the different Cobb angle calculation methods affect the Cobb angle measurements.

4.1 Segmentation Results of MVIE-Net

We compare the segmentation results of each of the two tasks of MVIE-Net with those of efficient medical image segmentation networks. Table 1 shows the quantitative vertebral segmentation results: the proposed model obtains better vertebral segmentation than U-Net and U-Net++. Although the proposed model has more parameters, it handles two tasks simultaneously; performing both tasks with U-Net or U-Net++ would require two networks and thus twice the parameters, so our model uses 4.71M fewer parameters than a pair of U-Nets performing the two tasks. Figure 8 shows the qualitative vertebral segmentation results; as the red circles show, the proposed network clearly improves on U-Net and U-Net++ and produces fewer false segmentations. Table 2 shows the quantitative key point segmentation results. These metrics are generally low because the confidence regions generated by the key points are small relative to the image, but Fig. 9 shows that the network's segmentations achieve the desired results, and the red circles show that the proposed network produces clearer confidence map segmentations in which neighboring regions do not merge.

Fig. 8. Qualitative results of vertebra segmentation. GT refers to the ground-truth landmarks. The red circle in U-Net marks an incorrect segmentation, and the red circle in U-Net++ marks missed vertebrae. (Colour figure online)

Fig. 9. Qualitative results of key point segmentation. The 7 images from left to right are the input image, the confidence map of the upper landmarks, the confidence map of the lower landmarks, the confidence map of the vertebral centroids, and the visualization of the confidence maps on the original image. The red circle in U-Net marks two connected points, and the red circle in U-Net++ marks a segmentation anomaly of the points. (Colour figure online)

Fig. 10. Comparison of the Cobb angle estimations with the Cobb angle ground truth, and their differences.

Fig. 11. Visualization of the straight-line fit method on four images.

Table 1. Quantitative vertebral segmentation results
Table 2. Quantitative key point segmentation results
Table 3. Cobb angle calculation results of the three methods
Table 4. Comparison with state-of-the-art methods

4.2 Cobb Angle Measurement Results

We tested the three Cobb angle calculation methods (midpoint, endpoint and straight-line fit); the results are shown in Table 3. The midpoint method achieved the best results because it matches how the dataset's Cobb angle labels were computed: both approximate the vertebral endplates by the line connecting the midpoints of the left and right landmark pairs.

To assess the accuracy of the proposed method, we compared its Cobb angle estimates with those of other methods (Table 4). The proposed method obtains the best results among all methods that compute the Cobb angle by curvature post-processing and achieves state-of-the-art MAE. Although the direct regression method [4] achieves the best SMAPE, it focuses only on the Cobb angle value and cannot provide information about the end vertebrae, which is equally important for diagnosing scoliosis. Figure 10 shows the histogram of the differences and the scatter plot between the Cobb angles estimated by the proposed method and the ground truth.

Table 5. Cobb angle estimations for 4 scoliosis radiographs

We also tested our method on four scoliosis radiographs whose Cobb angles had been manually measured by a physician (see Fig. 11). The results are shown in Table 5: the straight-line fit method has the smallest SMAPE (3.88, compared with 5.51 for the midpoint method and 5.98 for the endpoint method), because it determines the endplates in a way more similar to clinicians. We therefore conclude that, although the midpoint method performs better on the public AASCE Challenge dataset, the estimates of the straight-line fit method are closer to the physician's manually measured Cobb angles.

5 Conclusion

This paper presents a new method for automatic Cobb angle measurement in scoliosis that uses a network to extract spinal structure information and post-processing to calculate the Cobb angle; compared with direct Cobb angle regression, it provides more comprehensive vertebral contour information for visualizing the spine. First, the proposed multi-task learning network MVIE-Net simultaneously performs vertebral contour segmentation and key point detection. It adopts a single-encoder, dual-decoder structure, and the symmetric decoders and the skip connection between them improve the generalization ability of the network. Second, we proposed and tested three angle calculation methods based on the definition of the Cobb angle: the midpoint method used by the public AASCE Challenge dataset, the extended endpoint method, and the straight-line fit method, which simulates how physicians locate the endplates. MVIE-Net with the midpoint method achieved state-of-the-art MAE and the best SMAPE among known post-processing methods on this dataset. On the images with physician-measured Cobb angles, the SMAPE of all three methods was below 6, indicating that the proposed method can serve as an aid to physicians in clinical Cobb angle measurement of scoliosis.