1 Introduction

Idiopathic scoliosis (IS) is a complex three-dimensional (3D) deformity of the spine for which there is no known cause [1]. Surgery is usually recommended if the spinal deformity is severe or progressive. Classification of the scoliotic curve pattern is very important for preoperative surgical planning to select fusion levels. Currently, the King classification [2], the Lenke classification [3], and the Peking Union Medical College (PUMC) classification [4] are the most widely used classification methods for that purpose. Qiu et al. [5] compared the reliability of these classification methods and reported that the PUMC classification had higher reliability (interobserver agreement of 84.3% with a kappa coefficient (κ) of 0.819, intraobserver agreement of 92.0% with a κ of 0.907) than the King classification (interobserver agreement of 81.2% with a κ of 0.773, intraobserver agreement of 91.5% with a κ of 0.897) or the Lenke classification (interobserver agreement of 60.5% with a κ of 0.560, intraobserver agreement of 81.8% with a κ of 0.796).

Many computerized systems have been developed to improve the accuracy, convenience, or repeatability of diagnostic classifications [6,7,8,9]. For idiopathic scoliosis, some studies have developed computer-aided methods to classify spinal deformity on radiographs. Stokes et al. [10] developed a computer program that applied a rule-based algorithm to automatically identify the King types based on the Cobb angles calculated from the coordinates of vertebral landmarks on the posteroanterior (PA) radiograph. Their approach required the manual identification of numerous landmarks (70 landmarks per radiograph), and the inherent variability in landmark identification might make the measurement and classification unreliable. Phan et al. [11] proposed a rule-based program for the Lenke-type classification.

All these classification methods heavily rely on the Cobb angle measured from radiographs. Recently, Anitha et al. [12] proposed an automated system to detect the vertebral endplates by using a customized filter that was a combination of anisotropic, sigmoid and differential filters. Automatic identification of these endplates by their system reduced the variability of the Cobb angle measurement. Utilizing curve fitting methods to determine the spinal curvature, Sardjono et al. [13] implemented the automatic Cobb measurement with the best results, namely, a mean absolute error of 3.91°. In our previous study [14], a computer-aided approach that automatically measured vertebral inclination was developed to improve the reliability of the Cobb measurement. We achieved average intraobserver and interobserver errors of <2° and 3°, respectively. Based on the computer-aided Cobb measurement, we have developed computer-aided systems to improve the reliability of the King classification [15] and the Lenke classification [16]. With the computer-aided King system, the average intraobserver and interobserver κ values were improved from 0.75 to 0.90 and from 0.66 to 0.86, respectively [15]. With the computer-aided Lenke system, the average intraobserver and interobserver κ values were improved from 0.77 to 0.89 and from 0.65 to 0.81, respectively [16]. In the present study, we propose a computer-aided PUMC classification system using the computerized Cobb measurement to improve the reliability of the PUMC classification.

2 Methods

In this study, radiographs obtained from 65 patients (51 girls, 14 boys; 12.5 ± 3.5 years of age) with idiopathic scoliosis were used. The selection criteria were (1) diagnosis of idiopathic scoliosis, (2) age between 9 and 18 years, (3) no prior spine surgery, (4) Cobb angle <90°, and (5) visibility of the pelvis and T1–L5 vertebral levels on radiographs. The exclusion criteria were other musculoskeletal or neurological disorders. Informed consent was obtained from all patients/parents. The institutional review board of Yunnan University approved the study.

In the PUMC classification, the most important measurement is the Cobb angle. The Cobb angle can be manually measured by calculating the angle between the lines drawn along the upper endplate of the superior end-vertebra and the lower endplate of the inferior end-vertebra. Studies of intra- and interobserver variability of manual Cobb measurements have revealed that the errors are typically ±5° [17], which introduces variability into the classification method. In our previous study [14], an approach based on the fuzzy Hough transform (FHT) was developed for Cobb measurements of a spinal curve. We used this approach in this study to identify the inclination of each vertebral endplate for the automatic PUMC classification. This technique is described in the following subsection.
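As a minimal sketch of the geometric definition above, the Cobb angle can be computed as the angle between two endplate lines once their inclinations are known. The function name and the convention of signed inclinations (degrees from the horizontal) are assumptions made for illustration; they are not taken from the system described in [14].

```python
def cobb_angle(upper_endplate_deg: float, lower_endplate_deg: float) -> float:
    """Angle (degrees) between two endplate lines.

    Both arguments are signed inclinations to the horizontal, e.g. +20 for
    a line tilted one way and -15 for a line tilted the other way.
    Lines have no direction, so the result is folded into [0, 90].
    """
    diff = abs(upper_endplate_deg - lower_endplate_deg) % 180.0
    return min(diff, 180.0 - diff)


# Upper end-vertebra tilted +20 deg, lower end-vertebra tilted -15 deg.
print(cobb_angle(20.0, -15.0))
```

With this convention, an error of a few degrees in either endplate inclination carries over directly into the Cobb angle, which is why the repeatability of the inclination measurement matters for the classification.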

2.1 Measurement of vertebral endplate inclination

We normalized each radiograph that contained vertebrae from T1 to L5 to a standard height of 1000 pixels. After image enhancement by histogram equalization, the user successively selected vertebrae from T1 to L5 by clicking the mouse on each vertebra. Once the user clicked on a vertebra, an initial rectangle of 100 × 80 pixels was created and displayed. The user could adjust the rectangle to fit the vertebra by clockwise or anticlockwise rotation, magnification or minification as well as up, down, left, or right movement of the rectangle. As an example, Fig. 1 shows the selected rectangles on a PA radiograph. The user-assigned rectangle defined the region of interest (ROI) for a vertebra.

Fig. 1 A spinal radiograph with user-assigned rectangles and the central sacral vertical line

For each ROI, the Canny edge detector was used to obtain the edge image required for the FHT. Figure 2 shows an ROI (Fig. 2a) and its edge image (Fig. 2b). As Fig. 2b shows, the vertebral radiograph contains a complex background due to various types of artifacts, so it was necessary to delete the artifacts and noise. Based on the user-assigned rectangle, the algorithm automatically defined an inner rectangle and an outer rectangle, as shown in Fig. 2a. The distance between the inner rectangle and the user-assigned rectangle was 1/6 of the width of the user-assigned rectangle, and the distance between the outer rectangle and the user-assigned rectangle was 1/8 of that width. The ROI just covered the outer rectangle. The noise and artifacts inside the inner rectangle and outside the outer rectangle were then deleted from the ROI. Figure 2c shows the edge image with the noise and artifacts deleted. The FHT was then applied to each edge image of the ROI. In the Hough space, vertebral shape priors were used to select the candidate peaks corresponding to the lines that best fitted a vertebra (i.e., a pair of endplates and a pair of vertical edges): (1) the distance between the two endplates of a vertebra was in the range of 30–60 pixels, and the distance between the two vertical edges was in the range of 40–80 pixels; (2) the average inclination of the two endplates was <45°, and the average inclination of the vertical edges was between 45° and 90°; (3) the angle difference between the two endplates or between the two vertical edges of a vertebra was <10°; and (4) the endplates and the vertical edges were close to being perpendicular to each other. Using these shape priors, the four lines fitted to a vertebra were identified by selecting peaks in the Hough space. Figure 2d shows the detected lines for the edge image shown in Fig. 2c. More details of this technique are given by Zhang et al. [14]. The inclination of each line fitted to a vertebral endplate was recorded automatically. For each vertebra, the rectangle formed by the four detected lines, i.e., the vertebral rectangle, was also recorded.
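The four shape priors can be read as a filter on candidate line quadruples. The sketch below is illustrative only: the class and function names, the angle conventions, and the perpendicularity tolerance `perp_tol_deg` are assumptions (the text says only "close to being perpendicular"), and the peak selection in the Hough space described in [14] is not reproduced.

```python
from dataclasses import dataclass
from typing import Tuple


def line_angle_diff(a_deg: float, b_deg: float) -> float:
    """Smallest angle between two undirected lines, in [0, 90] degrees."""
    d = abs(a_deg - b_deg) % 180.0
    return min(d, 180.0 - d)


@dataclass
class VertebraCandidate:
    # Hypothetical container for one candidate set of four lines.
    endplate_angles: Tuple[float, float]  # signed degrees from horizontal, in (-90, 90)
    endplate_gap: float                   # pixels between the two endplates
    edge_angles: Tuple[float, float]      # degrees from horizontal, in [0, 180)
    edge_gap: float                       # pixels between the two vertical edges


def satisfies_shape_priors(c: VertebraCandidate, perp_tol_deg: float = 15.0) -> bool:
    mean_endplate = sum(c.endplate_angles) / 2.0
    mean_edge = sum(c.edge_angles) / 2.0
    return (
        # (1) distances between paired lines (image normalized to 1000 px height)
        30.0 <= c.endplate_gap <= 60.0
        and 40.0 <= c.edge_gap <= 80.0
        # (2) endplates tilted < 45 deg, vertical edges between 45 and 90 deg
        and abs(mean_endplate) < 45.0
        and 45.0 <= line_angle_diff(mean_edge, 0.0) <= 90.0
        # (3) paired lines nearly parallel
        and line_angle_diff(*c.endplate_angles) < 10.0
        and line_angle_diff(*c.edge_angles) < 10.0
        # (4) endplates roughly perpendicular to edges (tolerance is assumed)
        and abs(90.0 - line_angle_diff(mean_endplate, mean_edge)) <= perp_tol_deg
    )


good = VertebraCandidate((5.0, 8.0), 45.0, (95.0, 98.0), 60.0)
too_flat = VertebraCandidate((5.0, 8.0), 20.0, (95.0, 98.0), 60.0)
print(satisfies_shape_priors(good), satisfies_shape_priors(too_flat))
```

The pixel ranges only make sense for radiographs normalized to the standard height of 1000 pixels used in this study.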

Fig. 2 Vertebral endplate detection: a region of interest (ROI) with the inner and outer rectangles, b edge image of the ROI, c edge image with noise deleted, d detected endplates

2.2 Computer-aided PUMC classification

The PUMC classification recognizes three main types according to the number of apexes: type I for one, type II for two, and type III for three apexes. There are a number of subtypes for each curve type, with a total of 13 subtypes [4], depending on the characteristics of 3D deformities and the flexibility of the curvature. For type I, subtype Ia has a thoracic curve; Ib has a thoracolumbar curve; and Ic has a lumbar curve. For type II, subtype IIa has double thoracic curves; IIb has a thoracic curve plus a thoracolumbar/lumbar curve, with the former being at least 10° higher than the latter; IIc has a thoracic curve plus a thoracolumbar/lumbar curve, with the curve magnitude difference being <10°; and IId has a thoracic curve plus a thoracolumbar/lumbar curve, with the former being at least 10° smaller than the latter. For type III, there are two subtypes. The distal curve of subtype IIIa must meet all 4 criteria: (1) without lumbar kyphosis, (2) a Cobb angle of the lumbar curve ≤45°, (3) rotation of the lumbar curve being <2°, and (4) flexibility of the lumbar curve ≥70%. According to the curve flexibility, rotation of the apical vertebra, Cobb angle on the convex bending radiograph, and thoracolumbar/lumbar kyphosis on the sagittal radiograph, subtypes IIb, IIc, and IId can be further divided into 2, 3, and 2 subtypes, respectively: IIb1 and IIb2 of IIb; IIc1, IIc2, and IIc3 of IIc; and IId1 and IId2 of IId.
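The part of these rules that depends only on the standing PA radiograph can be sketched as a small decision function. This is a simplified reading of the text above, not the authors' flowchart: the function name and the region labels are hypothetical, the assignment of a curve to a region is taken as given, and the further subtypes (IIb1, IIc1, IIIa, and so on) are left out because they need flexibility, rotation, and kyphosis data.

```python
def pumc_standing_type(curves):
    """Return the PUMC type from standing Cobb angles.

    curves: list of (region, cobb_deg) tuples, one per apex, ordered from
    cranial to caudal; region is "thoracic", "thoracolumbar", or "lumbar".
    """
    n = len(curves)
    if n == 1:
        return {"thoracic": "Ia", "thoracolumbar": "Ib", "lumbar": "Ic"}[curves[0][0]]
    if n == 2:
        thoracic = [angle for region, angle in curves if region == "thoracic"]
        distal = [angle for region, angle in curves if region != "thoracic"]
        if len(thoracic) == 2:
            return "IIa"  # double thoracic curves
        if len(thoracic) == 1 and len(distal) == 1:
            diff = thoracic[0] - distal[0]
            if diff >= 10:
                return "IIb"  # thoracic curve at least 10 deg larger
            if diff <= -10:
                return "IId"  # thoracic curve at least 10 deg smaller
            return "IIc"      # magnitudes differ by < 10 deg
        raise ValueError("unsupported two-curve pattern")
    if n == 3:
        return "III"  # IIIa/IIIb need bending and sagittal data
    raise ValueError("expected one to three apexes")


print(pumc_standing_type([("thoracic", 50), ("lumbar", 35)]))
```

Because the IIb/IIc/IId boundaries sit at a 10° difference, a measurement error of a few degrees in either Cobb angle can move a borderline case into a neighboring subtype.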

To implement the PUMC classification, the user assigned two symmetrical landmarks on the sacrum of the PA radiograph by mouse clicks. According to the accepted definitions [18], the lateral edge of the radiograph was used to define the vertical direction. The central sacral vertical line (CSVL) was determined as the vertical line passing through the midpoint of the two sacral landmarks, as shown in Fig. 1. The apical vertebrae were then identified based on the detected endplate inclinations and vertebral positions. The apical region that was first located included the two vertebrae above and the two vertebrae below a disc separating two vertebrae tilting in opposite directions to the horizontal. The vertebra in this region that had the greatest horizontal distance from the CSVL was then identified as the apical vertebra. Based on the detected apical vertebra, the upper vertebra with the greatest inclination of the superior endplate and the lower vertebra with the greatest inclination of the inferior endplate were selected as the two end-vertebrae for a curve. The angle between the superior endplate of the upper end-vertebra and the inferior endplate of the lower end-vertebra was automatically calculated as the Cobb angle.
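The apex and end-vertebra selection described above might be sketched as follows for a single curve. All names are hypothetical, the vertebral tilt is taken as the mean of the two endplate inclinations, and the end-vertebra search simply covers all vertebrae at or above and at or below the apex; these are simplifying assumptions, since the text does not state how the search range is bounded when several curves are present.

```python
from typing import List, NamedTuple, Optional, Tuple


class Vertebra(NamedTuple):
    name: str
    superior_deg: float  # signed inclination of the superior endplate
    inferior_deg: float  # signed inclination of the inferior endplate
    center_x: float      # horizontal position of the vertebral rectangle


def find_apex(vertebrae: List[Vertebra], csvl_x: float) -> Optional[int]:
    """Index of the apical vertebra of the first curve found, or None."""
    tilt = [(v.superior_deg + v.inferior_deg) / 2.0 for v in vertebrae]
    for i in range(len(vertebrae) - 1):
        if tilt[i] * tilt[i + 1] < 0:  # disc separating opposite tilts
            # Apical region: two vertebrae above and two below that disc.
            region = range(max(0, i - 1), min(len(vertebrae), i + 3))
            return max(region, key=lambda k: abs(vertebrae[k].center_x - csvl_x))
    return None


def cobb_from_apex(vertebrae: List[Vertebra], apex: int) -> Tuple[int, int, float]:
    """Return (upper end index, lower end index, Cobb angle in degrees)."""
    upper = max(range(0, apex + 1), key=lambda k: abs(vertebrae[k].superior_deg))
    lower = max(range(apex, len(vertebrae)), key=lambda k: abs(vertebrae[k].inferior_deg))
    angle = abs(vertebrae[upper].superior_deg - vertebrae[lower].inferior_deg)
    return upper, lower, angle


# Made-up inclinations for a single curve; the CSVL is at x = 100.
spine = [
    Vertebra("T5", 20, 18, 105), Vertebra("T6", 15, 12, 115),
    Vertebra("T7", 8, 5, 125), Vertebra("T8", 2, -1, 130),
    Vertebra("T9", -4, -8, 128), Vertebra("T10", -12, -16, 118),
    Vertebra("T11", -18, -22, 108),
]
apex = find_apex(spine, csvl_x=100)
print(spine[apex].name, cobb_from_apex(spine, apex))
```

The values in `spine` are invented for illustration and do not come from the study data.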

According to the PUMC classification criteria, the computerized classification algorithm was implemented, as shown in Fig. 3, to perform the automatic PUMC classification. After the Cobb angles on the standing radiograph were obtained, the system automatically differentiated Ia, Ib, Ic, IIa, IIb, IIc, IId, and III. Further classification of the subtypes of IIb (IIb1 and IIb2), IIc (IIc1, IIc2, and IIc3), IId (IId1 and IId2), and III (IIIa and IIIb) required the rotation of the apical vertebra together with the flexibility and kyphosis parameters, which were calculated from the Cobb angles on the convex bending and sagittal radiographs. Once these parameters were entered after the Cobb measurement, the system automatically classified the subtypes of IIb, IIc, IId, and III.

Fig. 3 Flowchart of the PUMC classification rule

2.3 Evaluation

In this study, two groups of observers participated in the experiments. The first group included two pediatric orthopedic surgeons with 12 and 10 years of experience in a scoliosis clinic. The second group included two orthopedic residents. With the aid of the computer, each observer performed the PUMC classification twice over a period of three weeks. Without aid from the computer, the four observers also classified spinal curve patterns twice using the traditional PUMC method [4]. The kappa statistic in SPSS software (SPSS Inc., Chicago, IL) was used to assess the reliability of the PUMC classification under the conditions with and without the aid of the computer. Under each of these two conditions, the κ values were calculated for paired sets of classifications by each observer (intraobserver repeatability) and between observers (interobserver reliability) using all combinations of paired observations. The resulting values were averaged over combinations of pairs (intraobserver or interobserver) to provide an overall assessment of reliability.
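For readers unfamiliar with the statistic, Cohen's κ for one pair of classification sets can be computed as below. This is a plain-Python sketch of the standard unweighted formula, κ = (p_o − p_e) / (1 − p_e); it is not the SPSS routine used in the study, and the function name and example labels are illustrative.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of cases given the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from the marginal label frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    # Undefined (division by zero) if both raters use one single label.
    return (observed - expected) / (1.0 - expected)


trial_1 = ["Ia", "Ia", "IIb", "IIb"]
trial_2 = ["Ia", "Ia", "IIb", "Ia"]
print(cohen_kappa(trial_1, trial_2))
```

An intraobserver κ compares two trials by the same observer, and an interobserver κ compares trials by different observers; the overall values are averages of such pairwise results.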

Cases that were consistently classified by the two surgeons twice without the computer’s aid were used as the ground truth. The surgeons’ first trials with and without the computer’s aid were compared to evaluate the accuracy of the proposed system.

3 Results

In the experiments, we only required the observers to differentiate cases into subtypes Ia, Ib, Ic, IIa, IIb, IIc, IId, IIIa, and IIIb because for some cases, only the standing radiographs were available. Although the proposed Cobb measurement can be applied to the convex bending and sagittal radiographs, and the functions of further classification were implemented in the system, further classification of these subtypes was not tested in the experiments. In addition, no cases were classified as type III by any of the observers.

The intraobserver reliabilities are shown in Table 1. With the computer’s aid, the κ values were in the range from 0.90 to 0.94, which were in the excellent range (≥0.90). For each observer, the κ value was improved by using the computerized method. The average intraobserver κ value and classification consistency were improved to 0.93 and 95%, respectively. Table 1 also shows that larger improvements were obtained for the resident group than for the expert group.

Table 1 Intraobserver reliability

The interobserver reliabilities are shown in Table 2. With the computer’s aid, the average interobserver κ value was improved from 0.83 to 0.90 for the expert group and from 0.77 to 0.85 for the resident group. For all observers, the average interobserver consistency was improved from 79 to 88% and the average κ value was improved from 0.75 to 0.86. The results also demonstrated that the use of the computer reduced the difference between experts and residents in the classification.

Table 2 Interobserver reliability

Without the computer’s aid, the two surgeons consistently classified 53 cases in both trials. For these 53 cases, the surgeons’ first trials with and without the aid of the computer were compared. Under the two conditions, one surgeon’s classifications were consistent in all cases (100% accuracy) and the other surgeon’s classifications were consistent in 52 cases (98% accuracy). The one inconsistent case was classified as type IId under one condition and as type IIc under the other.

To evaluate the importance of the repeatability of the Cobb measurement and objective judgments in the classification, we used the Cobb angles manually measured by one surgeon (with 12 years of experience) as the standard values. With the standard values of the Cobb angles, the system produced consistent classifications in all cases. Without the computer’s aid, the two residents were asked to make classifications based on the standard Cobb angles. The two residents’ classifications were consistent in 63 cases (97% consistency). The average judgment time of the residents for one case was 35 s. These results demonstrate the importance of the Cobb measurement in the classification. In addition, the system can aid in the decision: given the parameters, it can perform consistent classifications in real time.

4 Discussion

The reliability of spinal deformity classification is an important topic in the orthopedic community. Although the PUMC classification has higher reliability than the King or Lenke classifications [5], improving reliability in preoperative surgical planning is still necessary. The variability is mainly due to human technical and judgment errors. This article proposes a computer-aided system to reduce variability in the PUMC classification. The results of this study indicated that the tasks of measuring scoliosis curves on radiographs and subsequently classifying curve types were made more reliable with the aid of the proposed system. First, the reliability of identifying the radiographic features that provided the measurements used for classification was improved by the computer-aided Cobb measurement method. In our previous study [14], the accuracy and reliability of the proposed Cobb measurement had been demonstrated (<3°, comparable to the 3.91° that was obtained with the system from Sardjono et al. [13]). The higher repeatability of the Cobb measurement results in the higher reliability of the PUMC classification. Second, judgment errors were reduced by using a computerized classification algorithm. With the computer’s aid, the variability produced by the interpretation and memory of the classification criteria was avoided. Although the proposed method still requires user interaction, only limited user skills are required. The results show that the system can be used equally well by individuals with less clinical experience. Because the only hardware requirement was a personal computer and the software environment was Visual C++, the developed software could be easily implemented in a clinic.

Scoliosis classification is used to facilitate the objective assessment of scoliosis, i.e., to make it possible to “speak the same language” in the assessment. There is no gold standard for the classification results before surgery. Therefore, the accuracy of the system was evaluated by comparing the surgeons’ classifications with and without the aid of the computer, using the cases that were consistently classified by the two surgeons twice without the computer’s aid as the reference standard. The comparative results demonstrated the accuracy of the proposed method (accuracy of at least 98%).

The consistent classifications obtained with the given standard Cobb angles demonstrated the importance of the repeatability of the Cobb measurement in the classification. In the experiments, the further classification of the subtypes of IIb, IIc, and IId was not tested, which might be one of the reasons for the good performance of the two residents even without the computer’s aid (97% consistency), although each judgment took more than half a minute. Another reason might be that the given Cobb angles reduced the effort required for the PUMC classification. When the Cobb measurement and further classifications are required, the system would be expected to play a more important role in facilitating the classification.

In conclusion, the proposed computerized system may be of value to assist a surgeon in the PUMC classification of scoliosis. It can reduce the Cobb measurement variability and human judgment errors in the PUMC classification. The computer’s aid was especially useful for inexperienced observers.