
1 Introduction

The Cobb angle is a measurement of the bending deformity of the vertebral column. It is widely used in the assessment and treatment of scoliosis, a structural, lateral, and rotated curvature of the spine. Large-scale reports show a sustained increase in the prevalence of scoliosis. Reliable estimation of Cobb angles is therefore essential.

In clinical practice, each X-ray covers seventeen vertebrae. For each vertebra, clinicians locate four landmarks and draw its outline accordingly. They then identify the key vertebrae and calculate the angles. The main drawback is that this procedure is highly time-consuming and error-prone.
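The geometric core of this manual procedure can be sketched as follows. This is a simplified illustration, not the clinical or challenge definition: it assumes the four landmarks of each vertebra are ordered top-left, top-right, bottom-left, bottom-right, treats each pair of landmarks as an endplate line, and reports only the single largest angle between any two endplates (the clinical procedure instead selects key vertebrae for each of the three curves). The function names `endplate_tilts` and `max_cobb_angle` are hypothetical.

```python
import math
from itertools import combinations


def endplate_tilts(landmarks):
    """Tilt of every endplate in degrees, measured from the image x-axis.

    Assumes `landmarks` is a flat list of (x, y) points, four per vertebra,
    ordered top-left, top-right, bottom-left, bottom-right (an assumed convention).
    """
    tilts = []
    for k in range(0, len(landmarks), 4):
        tl, tr, bl, br = landmarks[k:k + 4]
        # Upper endplate first, then lower endplate.
        for (x0, y0), (x1, y1) in ((tl, tr), (bl, br)):
            tilts.append(math.degrees(math.atan2(y1 - y0, x1 - x0)))
    return tilts


def max_cobb_angle(landmarks):
    """Largest angle between any two endplate lines.

    Returns (angle_in_degrees, i, j), where i and j index the two endplates;
    endplate index // 2 gives the vertebra index.
    """
    tilts = endplate_tilts(landmarks)
    best = (0.0, 0, 0)
    for i, j in combinations(range(len(tilts)), 2):
        angle = abs(tilts[i] - tilts[j])
        if angle > best[0]:
            best = (angle, i, j)
    return best
```

The two endplates returned by `max_cobb_angle` play the role of the end vertebrae that a clinician would otherwise pick by eye.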

To address this problem, many automated frameworks have been proposed. Two-stage methods such as [9] simulate clinical practice: they first segment anatomical structures and then estimate the measurement from the segmentation. Some recent studies use a detector, instead of a segmentation model, to locate the key structures [5]. However, these studies are limited by the selection of key vertebrae and by bias between operators. Direct estimation methods [6,7,8] aim to learn a functional mapping from medical images to clinical measurements. However, these methods rely on multiple views (anterior-posterior and lateral X-ray images); with only a single view, they perform much worse because they make insufficient use of global information.

To make full use of global information, we propose a direct estimation method based on pyramidal feature aggregation. Unlike earlier direct methods, and inspired by [1], we add an extra branch that predicts spinal masks, which benefits angle estimation. To further enhance feature extraction, we fuse the decoder's feature maps across multiple scales. Overall, our method achieves a symmetric mean absolute percentage error (SMAPE) of 12.8 on the 125 scans from the challenge.

Fig. 1.

Overview of our framework. The network learns multi-scale features from a single view (AP) via pyramidal feature aggregation and multi-task learning.

2 Methods

2.1 System Framework

As shown in Fig. 1, the network is based on an encoder-decoder structure [3]. The last feature map of the encoder is transformed into a vector via global average pooling (GAP), and the largest feature map of the decoder produces the mask through a Softmax activation. To make full use of the decoder, we also apply GAP to each decoder stage except the last one. Because these feature vectors have different lengths, they are simply concatenated, and a dense layer maps the result to the three angles.
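The angle head described above can be sketched in NumPy as follows. This is a minimal, framework-free illustration of the data flow only (no training): feature maps are assumed to be arrays of shape (C, H, W), the decoder stages are assumed to be ordered from coarse to fine, and the channel counts and spatial sizes in the usage lines are hypothetical, not taken from the paper.

```python
import numpy as np


def gap(feature_map):
    """Global average pooling: (C, H, W) -> (C,)."""
    return feature_map.mean(axis=(1, 2))


def pfa_head(encoder_last, decoder_stages, weight, bias):
    """Pyramidal feature aggregation head (sketch).

    encoder_last:   last encoder feature map, shape (C_e, H, W).
    decoder_stages: decoder feature maps, coarse to fine; the last (largest)
                    one feeds the mask branch, so it is excluded here.
    weight, bias:   parameters of a single dense layer mapping the
                    concatenated vector to the three Cobb angles.
    """
    pooled = [gap(encoder_last)] + [gap(f) for f in decoder_stages[:-1]]
    features = np.concatenate(pooled)  # vectors of different lengths, simply concatenated
    return weight @ features + bias


# Hypothetical shapes: 512-channel bottleneck, decoder stages with 256/128/64/32 channels.
rng = np.random.default_rng(0)
encoder_last = rng.standard_normal((512, 16, 8))
decoder = [rng.standard_normal((c, 16 * 2**k, 8 * 2**k))
           for k, c in enumerate((256, 128, 64, 32), start=1)]
weight = 0.01 * rng.standard_normal((3, 512 + 256 + 128 + 64))
angles = pfa_head(encoder_last, decoder, weight, np.zeros(3))  # shape (3,)
```

Because GAP removes the spatial dimensions, stages of different resolutions can be combined without any resizing; only the channel counts determine the length of the concatenated vector.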

For network training, the loss function is the sum of a mask loss and an angle loss:

$$\begin{aligned} loss = loss_{mask} + loss_{angle} \end{aligned}$$
(1)

The Dice similarity coefficient (DSC) measures the overlap between two masks and is widely used in medical image segmentation, so we compute the mask loss from the DSC. For the Cobb angles, we use SMAPE, the metric commonly used to evaluate angle estimation.

$$\begin{aligned} loss_{mask} = 1-\frac{2\left| X \cap Y \right| }{\left| X\right| +\left| Y\right| } \end{aligned}$$
(2)
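$$\begin{aligned} loss_{angle} = \frac{1}{N}\sum _{i=1}^{N}\frac{\sum _{j=1}^{3}\left| A_{ij}-B_{ij}\right| }{\sum _{j=1}^{3}\left( A_{ij}+B_{ij}\right) }\times 100\% \end{aligned}$$
(3)

Here X and Y are the ground-truth and predicted masks, \(A_{ij}\) and \(B_{ij}\) are the ground-truth and predicted values of the \(j\)-th Cobb angle of the \(i\)-th image, and N is the number of images.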
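The loss terms can be checked numerically with a small NumPy sketch. It is illustrative only: an actual training loop would express the same formulas in the deep-learning framework so that gradients flow. The smoothing constant `eps` is an added assumption to avoid 0/0 for empty masks or all-zero angles, and the SMAPE term is returned as a fraction (multiply by 100 to report it as a percentage), so that both terms in the sum lie in a comparable range.

```python
import numpy as np


def dice_loss(pred_mask, true_mask, eps=1e-6):
    """Mask loss: 1 - DSC. Accepts binary masks or soft (Softmax) probabilities."""
    inter = np.sum(pred_mask * true_mask)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred_mask) + np.sum(true_mask) + eps)


def smape(true_angles, pred_angles, eps=1e-6):
    """SMAPE as a fraction; inputs have shape (N, 3): N images, three angles each.

    Absolute errors are summed over the three angles of an image, divided by the
    summed angles of that image, and the ratios are averaged over images.
    """
    num = np.abs(true_angles - pred_angles).sum(axis=1)
    den = (true_angles + pred_angles).sum(axis=1) + eps
    return float(np.mean(num / den))


def total_loss(pred_mask, true_mask, true_angles, pred_angles):
    """Unweighted sum of the mask loss and the angle loss."""
    return dice_loss(pred_mask, true_mask) + smape(true_angles, pred_angles)
```

Both terms are zero for a perfect prediction and grow as the prediction degrades, so the sum can be minimized directly.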

2.2 Image Pre-processing

Previous direct estimation methods have not taken into account that changing the aspect ratio (the ratio of height to width) changes the Cobb angles (see Fig. 2). Resizing all images to the same size is therefore problematic. For instance, if the original ratio is smaller than the model's input ratio, the Cobb angles in the resized image become smaller. Hence, we preserve the aspect ratio when resizing and pad the image to fit the input shape. In addition, we augment the training data by randomly shifting the resized images.
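A uniform scale factor on both axes is a similarity transform and therefore preserves angles, whereas different factors on the two axes change line slopes. The pre-processing above can be sketched as follows. This is an illustrative version, not the authors' code: it uses nearest-neighbour resampling to stay dependency-free (a real pipeline would use an image library's interpolation), pads with zeros, and assumes that the random shift places the resized image anywhere within the padding budget; the paper does not specify the shift range. The function name `letterbox` is hypothetical.

```python
import numpy as np


def letterbox(image, out_h, out_w, shift=False, rng=None):
    """Resize `image` keeping its aspect ratio, then zero-pad to (out_h, out_w).

    With shift=True the resized image is placed at a random offset inside the
    canvas (random-shift augmentation); otherwise it is centred.
    Returns (canvas, scale, (top, left)).
    """
    h, w = image.shape[:2]
    scale = min(out_h / h, out_w / w)  # one factor for both axes preserves angles
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # Nearest-neighbour resampling via index maps.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows[:, None], cols[None, :]]
    free_h, free_w = out_h - new_h, out_w - new_w  # padding budget per axis
    if shift:
        rng = rng if rng is not None else np.random.default_rng()
        top = int(rng.integers(0, free_h + 1))
        left = int(rng.integers(0, free_w + 1))
    else:
        top, left = free_h // 2, free_w // 2
    canvas = np.zeros((out_h, out_w) + image.shape[2:], dtype=image.dtype)
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, scale, (top, left)
```

When the image already has the model's input ratio, the padding budget is zero and the shift has no effect, so the augmentation mainly acts on images whose ratio differs from the input shape.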

Fig. 2.

Purpose of the pre-processing method. Changing the original image to a different aspect ratio introduces errors, which may lead to a loss of information.

3 Experiments

3.1 Datasets and Implementation Details

The dataset is collected from the challenge (Accurate Automated Spinal Curvature Estimation, MICCAI 2019) and consists of three parts, of which only the training and validation sets come with annotations. We therefore conduct experiments on these two sets only. Fig. 3 shows histograms of the three angles' values, numbered from top to bottom. The upper angles tend to be larger than the lower ones, and most angles are small.

Fig. 3.

Distribution of the three angles' values. From top to bottom, the angles are numbered ANGLE 1–3. Blue bars represent the histogram and the red line is the kernel density estimate. The x-axis is the angle value; the histogram uses 25 bins. (Color figure online)

Although some samples in the training set may come from the same patient, we consider that the interval between scans still introduces variation. We therefore randomly select \(10\%\) of the public training set as our validation set and use the public validation set as our test set. All experiments use the same optimizer (Adam) and learning rate (\(10^{-4}\)). Training runs for at most 100 epochs and is terminated by early stopping.
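The data split and the stopping rule can be sketched in plain Python. This is an assumption-laden illustration: the paper does not state the early-stopping patience, minimum improvement, or random seed, so `patience`, `min_delta`, and `seed` below are hypothetical parameters, and `split_train_val` and `EarlyStopping` are illustrative names rather than a specific library's API.

```python
import random


def split_train_val(sample_ids, val_fraction=0.1, seed=0):
    """Randomly hold out `val_fraction` of the public training set for validation."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_val = max(1, round(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]  # (train, validation)


class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop capped at 100 epochs, `step` would be called once per epoch after validation, and the loop would break as soon as it returns True.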

3.2 Results

Using the settings described above, we conducted a series of experiments. First, we examine the impact of the encoder: Vgg11 [4] versus ResNet-50 [2]. Because X-ray images have a single channel, weights pre-trained on ImageNet (three-channel RGB images) cannot be transferred directly to this task. We therefore train both networks from scratch and do not adopt deeper encoders. As the second and third rows of Table 1 show, there is a significant difference between the two encoders. Since Vgg11 performs better, we adopt it as the encoder when adding pyramidal feature aggregation (PFA) and the segmentation branch. The last two rows show that our method yields more precise estimates.

Table 1. Experimental results for Cobb angle estimation.

In more detail, Fig. 4 shows the correlation coefficients between each of the three predicted angles and the ground truth. The first angle is estimated much better than the other two, and the second better than the third. This follows the same trend as the angles' range: smaller angles are estimated less accurately (see Fig. 3).
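A per-angle correlation of this kind can be computed as sketched below. The paper does not state which correlation coefficient Fig. 4 reports; this sketch assumes the Pearson coefficient (via NumPy's `np.corrcoef`), and the function name is hypothetical.

```python
import numpy as np


def per_angle_correlation(true_angles, pred_angles):
    """Pearson r between prediction and ground truth, separately for each angle.

    Both arrays have shape (N, 3); returns an array of three coefficients.
    """
    return np.array([
        np.corrcoef(true_angles[:, k], pred_angles[:, k])[0, 1]
        for k in range(true_angles.shape[1])
    ])
```

Computing the coefficient per column keeps the three angles separate, which is what allows the top angle to be compared with the other two.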

Fig. 4.

Correlation coefficients between the three angles predicted by the proposed method and the ground truth. The angles are numbered from top to bottom; our method performs much better on the first angle.

4 Conclusion

In this paper, we have proposed a multi-task network with pyramidal feature aggregation to estimate Cobb angles automatically. Inspired by multi-task learning, we add a segmentation branch to enhance feature extraction. Aggregating pyramidal features captures features at different levels. Both strategies yield more precise estimates of Cobb angles, and our method achieves a SMAPE of 12.97.

However, the estimation is imbalanced: our method estimates the top angle much better than the other two. There are several possible explanations for this result. Since large Cobb angles have a distinct appearance, the model can estimate them accurately more easily. In contrast, for a nearly straight spine with small Cobb angles, even a small absolute error produces a large relative fluctuation.
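To make the last point concrete, consider a single SMAPE-style term \(\left| A-B\right| /(A+B)\) with the same hypothetical absolute error of \(2^\circ\) on a large and on a small angle (illustrative values, not taken from the dataset):

$$\begin{aligned} A=40^\circ ,\ B=42^\circ :&\quad \frac{2}{82}\approx 0.024\ (2.4\%)\\ A=5^\circ ,\ B=7^\circ :&\quad \frac{2}{12}\approx 0.167\ (16.7\%) \end{aligned}$$

The same absolute error thus weighs roughly seven times more on the small angle, which is consistent with the weaker results on the lower, typically smaller angles.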