Introduction

Spinal balance is critical for physiologic function and low energy expenditure [1]. Sagittal cervical alignment is one of the most important parameters in the management of cervical spine disorders [2]. It is thus crucial to use a reliable and reproducible measurement method, one that allows proper assessment of the course of the disease and the results of treatment [2]. Cervical lordosis (CL) is the cervical parameter most commonly used by surgeons and researchers [3]. Three distinct methods of assessing cervical lordosis have been described [4]. The Cobb method, which uses lines perpendicular to the inferior end plate lines of C2 and C7, was originally described for the evaluation of scoliotic curves [4–6]. In some cases, CL is also measured as the Cobb angle between C1 and C7 [4]. The Harrison posterior tangent method calculates an overall cervical curvature angle as the sum of the segmental angles measured between lines parallel to the posterior surfaces of the vertebral bodies from C2 to C7 [4, 7]. In the Jackson method, the angle between lines parallel to the posterior surfaces of the C2 and C7 vertebral bodies is measured [4, 8]. This method was also used by Gore et al. and is often referred to as the Gore method [9].

Although Harrison et al. suggested that their method may provide the best measurement of CL [7], the Cobb method is still the most widely used [4]. The reliability of these three methods has previously been evaluated on lateral cervical radiographs, and attempts have been made to establish the best measurement method [2, 3, 7]. However, whole-spine lateral radiographs are key images in the evaluation of global sagittal spinal alignment [10], and according to Park et al., radiological parameters measured on lateral cervical radiographs and on whole-spine lateral radiographs may differ [10].

The aim of this study was to evaluate the agreement between the three methods of CL measurement, as well as their reliability and reproducibility, on standing long-cassette lateral radiographs of the spine.

Methods

After obtaining Institutional Review Board approval, a database of standard standing digital long-cassette lateral radiographs of the whole spine taken between June 2009 and June 2014 was retrospectively reviewed. Forty-four standing lateral radiographs were randomly chosen from the radiograph database.

A similar radiologic protocol was used throughout the study period. Lateral radiographs were obtained with each subject standing in a natural position with horizontal gaze (patients looked at a point on the wall at eye level, 2 m in front of them), shoulders flexed 30°–45°, and elbows slightly flexed with the hands resting on a support. Hips and knees were fully extended. The radiographs covered the pelvis and hips, the whole spine and the cranium up to the level of the external auditory meatus and the lower margin of the orbit.

On each of the lateral radiographs the angle of CL was measured using the following methods:

  1. Cobb method C2–C7 (CM): the angle between the inferior end plate of C2 and the inferior end plate of C7 [4–6] (Fig. 1a).

    Fig. 1

    Three methods of cervical lordosis measurement: a Cobb method C2–C7 (CM); b C2–C7 posterior tangent method (PTM); c sum of posterior tangents method (SPTM) for segments C2–C3, C3–C4, C4–C5, C5–C6, C6–C7

  2. C2–C7 posterior tangent method (PTM), described as the Jackson method or the Gore angle [4, 8, 9]: the angle between the lines tangent to the posterior margins of the C2 and C7 vertebral bodies (Fig. 1b).

  3. Sum of posterior tangents method (SPTM) for segments C2–C3, C3–C4, C4–C5, C5–C6, C6–C7, described as the Harrison method [4, 7]: the sum of the angles measured with the sagittal tangent method at these five levels (Fig. 1c).

Lordotic CL angles were presented as positive values, and kyphotic CL angles were presented as negative values.
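The three definitions can be made concrete with a small geometric sketch. It is not how Surgimap computes the angles; it assumes a hypothetical landmark representation (two corner points per posterior tangent or end plate line) in a coordinate frame where x increases anteriorly and y increases cranially, and all function names and tilt values are illustrative. Angles follow the sign convention above, with lordosis positive.

```python
import math
from typing import Dict, Tuple

# Assumed frame: x grows anteriorly, y grows cranially (flip image rows if needed).
Point = Tuple[float, float]
Line = Tuple[Point, Point]

LEVELS = ["C2", "C3", "C4", "C5", "C6", "C7"]


def tangent_tilt(posterior_superior: Point, posterior_inferior: Point) -> float:
    """Forward tilt (degrees) of a posterior tangent; positive if its upper end is more anterior."""
    dx = posterior_superior[0] - posterior_inferior[0]
    dy = posterior_superior[1] - posterior_inferior[1]
    return math.degrees(math.atan2(dx, dy))


def endplate_slope(posterior_corner: Point, anterior_corner: Point) -> float:
    """Slope (degrees) of an end plate line; positive if its anterior corner is higher."""
    dx = anterior_corner[0] - posterior_corner[0]
    dy = anterior_corner[1] - posterior_corner[1]
    return math.degrees(math.atan2(dy, dx))


def cobb_c2_c7(c2_inferior: Line, c7_inferior: Line) -> float:
    """CM: angle between the inferior end plates of C2 and C7 (lordosis > 0)."""
    return endplate_slope(*c2_inferior) - endplate_slope(*c7_inferior)


def ptm_c2_c7(tangents: Dict[str, Line]) -> float:
    """PTM: angle between the C2 and C7 posterior tangents (lordosis > 0)."""
    return tangent_tilt(*tangents["C7"]) - tangent_tilt(*tangents["C2"])


def sptm_c2_c7(tangents: Dict[str, Line]) -> Tuple[float, Dict[str, float]]:
    """SPTM: sum of the five segmental posterior tangent angles, C2-C3 to C6-C7."""
    segments = {
        f"{upper}-{lower}": tangent_tilt(*tangents[lower]) - tangent_tilt(*tangents[upper])
        for upper, lower in zip(LEVELS, LEVELS[1:])
    }
    return sum(segments.values()), segments


def idealized_vertebra(tilt_deg: float, y_center: float, height: float = 15.0) -> Tuple[Line, Line]:
    """Hypothetical rectangular vertebra: returns (posterior tangent, inferior end plate)."""
    t = math.radians(tilt_deg)
    up = (math.sin(t), math.cos(t))    # unit vector along the posterior wall, cranially
    ant = (math.cos(t), -math.sin(t))  # unit vector along the end plate, anteriorly
    ps = (up[0] * height / 2, y_center + up[1] * height / 2)
    pi = (-up[0] * height / 2, y_center - up[1] * height / 2)
    ai = (pi[0] + ant[0] * height, pi[1] + ant[1] * height)
    return (ps, pi), (pi, ai)


# Invented forward tilts in degrees; in a lordotic neck C7 leans forward more than C2.
tilts = {"C2": -5.0, "C3": -2.0, "C4": 1.0, "C5": 4.0, "C6": 8.0, "C7": 12.0}
tangents, endplates = {}, {}
for i, level in enumerate(LEVELS):
    tangents[level], endplates[level] = idealized_vertebra(tilts[level], y_center=-20.0 * i)

cm = cobb_c2_c7(endplates["C2"], endplates["C7"])
ptm = ptm_c2_c7(tangents)
sptm, segments = sptm_c2_c7(tangents)
print(round(cm, 6), round(ptm, 6), round(sptm, 6))  # 17.0 17.0 17.0
```

In this idealized model of rectangular vertebrae with shared tangent lines, the three methods return the same value, so any difference between them on real radiographs has to come from departures from the idealization, such as end plates that are not perpendicular to the posterior walls or differences in landmark placement.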

All radiographs were downloaded from the Centricity PACS system (General Electric Medical Systems, Centricity PACS Radiology RA1000 Workstation; General Electric Healthcare, Barrington, IL) as bitmap images and analyzed quantitatively with Surgimap Spine software (Surgimap, New York, USA).

Evaluation of the intraobserver reproducibility of the three methods of CL measurements

The measurements were performed on 44 radiographs by one researcher (orthopedic spine surgeon with 5 years of experience) 3 times at 4-week intervals. The order of the radiographs in the second and third series of measurements was different and random. The intraobserver reproducibility was tested and quantified by intraclass correlation coefficient (ICC) and median error for a single measurement (SEM) [11].
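The text does not state which ICC form was used. As an assumption, the sketch below computes the two-way random-effects, absolute-agreement, single-measure ICC(2,1) of Shrout and Fleiss from a radiographs × sessions (or × observers) matrix, and takes the measurement error as the within-subject standard deviation (the square root of the mean per-radiograph variance). That is one common definition and may differ from the one in [11]; the function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence


def icc_2_1(data: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    Rows are radiographs (subjects); columns are sessions or observers."""
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in data)
    ss_cols = n * sum((mean(row[j] for row in data) - grand) ** 2 for j in range(k))
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


def within_subject_sd(data: Sequence[Sequence[float]]) -> float:
    """Measurement error taken as the within-subject SD: sqrt of the mean per-row variance."""
    return math.sqrt(mean(variance(row) for row in data))


# Six targets rated by four judges: the worked example in Shrout and Fleiss (1979).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # 0.29
```

For the intraobserver analysis each row would hold one radiograph's three repeated measurements; for the interobserver analysis, the three observers' measurements.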

Evaluation of the interobserver reliability of the three methods of CL measurements

Three independent researchers (orthopedic spine surgeons with 10, 6 and 5 years of experience) measured CL once on the same 44 radiographs with each of the three methods. The interobserver reliability was tested and quantified by intraclass correlation coefficient (ICC) and median error for a single measurement (SEM) [11].

Evaluation of agreement between the three methods of CL measurements

The evaluation of agreement between the three methods of CL assessment was based on the measurements performed on the 44 radiographs by one randomly chosen researcher (orthopedic spine surgeon with 6 years of experience).

Agreement between the methods was quantified by the intraclass correlation coefficient (ICC) and the median error for a single measurement (SEM) [11].

Statistical analysis

The data were analyzed using JMP 10.0.2 (SAS Institute Inc, Cary, NC) statistical software and Microsoft Office Excel 2007 (Microsoft, Redmond, WA). ICC values below 0.40 indicated poor agreement, values of 0.40–0.75 indicated fair to good agreement, and values greater than 0.75 reflected excellent agreement [12]. To estimate the sample size needed to test the agreement between the three methods, as well as the intraobserver reproducibility and interobserver reliability of each method, we treated an ICC greater than 0.7 (95 % confidence interval 0.55–0.85) as acceptable reproducibility for a research tool [13]. The minimum number of subjects needed to test agreement, intraobserver reproducibility and interobserver reliability in our setting was 44 [14]. Randomizations were performed using the RAND function in Microsoft Office Excel 2007.
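The interpretation bands [12] and a randomized reading order can be sketched as below. Sorting items on a uniform random key mimics the Excel RAND-and-sort approach; the function names, the radiograph IDs and the seed are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def icc_category(icc: float) -> str:
    """Bands used in this study [12]: <0.40 poor, 0.40-0.75 fair to good, >0.75 excellent."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:
        return "fair to good"
    return "excellent"


def random_reading_order(ids: Sequence[int], seed: Optional[int] = None) -> List[int]:
    """Shuffle radiograph IDs by sorting them on a uniform random key (like Excel RAND + sort)."""
    rng = random.Random(seed)
    keys = {item: rng.random() for item in ids}
    return sorted(ids, key=keys.__getitem__)


print([icc_category(v) for v in (0.96, 0.92, 0.672)])  # ['excellent', 'excellent', 'fair to good']
second_series = random_reading_order(range(1, 45), seed=2)  # hypothetical IDs 1..44
```

Fixing the seed only makes the illustration repeatable; a real randomization would not reuse it across series.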

For each parameter, the mean, standard deviation and range were calculated. Normality of the data distribution was assessed with the Shapiro–Wilk test. The results were compared with repeated-measures ANOVA with Bonferroni correction, with p < 0.05 considered significant.
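A minimal sketch of the repeated-measures comparison, assuming a patients × methods matrix with complete data. It computes only the F statistic and its degrees of freedom with the standard library; the p-values would come from a statistics package (for example `scipy.stats.f.sf` for the F test and `scipy.stats.ttest_rel` for the paired comparisons), after which the Bonferroni adjustment multiplies each pairwise p-value by the number of comparisons. The function names and the toy data are invented for illustration.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def rm_anova_f(data: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """One-way repeated-measures ANOVA. Rows are patients, columns are methods (CM, PTM, SPTM).
    Returns (F, df_methods, df_error)."""
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    ss_methods = n * sum((mean(row[j] for row in data) - grand) ** 2 for j in range(k))
    ss_patients = k * sum((mean(row) - grand) ** 2 for row in data)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_error = ss_total - ss_methods - ss_patients  # method x patient interaction as error term
    df_methods, df_error = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_methods / df_methods) / (ss_error / df_error)
    return f_stat, df_methods, df_error


def bonferroni(p_values: Dict[str, float]) -> Dict[str, float]:
    """Bonferroni adjustment: multiply each p-value by the number of comparisons, cap at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# Invented CL values (degrees) for four patients; columns are CM, PTM, SPTM.
toy = [[10, 17, 18], [5, 12, 12], [20, 26, 27], [0, 8, 8]]
f_stat, df1, df2 = rm_anova_f(toy)
print(round(f_stat, 2), df1, df2)  # 316.5 2 6
```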

Results

Among the evaluated patients there were 13 males and 31 females, with a mean age of 15.8 ± 3.7 years.

Evaluation of the intraobserver reproducibility of the three methods of CL measurements

Intraobserver reproducibility was excellent for all methods tested: ICC = 0.96 and SEM = 2.06° for CM, ICC = 0.96 and SEM = 1.99° for PTM, and ICC = 0.96 and SEM = 1.98° for SPTM (Table 1).

Table 1 Intraobserver reproducibility and interobserver reliability of the three methods of cervical lordosis measurement

Evaluation of the interobserver reliability of the three methods of CL measurements

Interobserver reliability was excellent for all methods tested: ICC = 0.92 and SEM = 2.71° for CM, ICC = 0.94 and SEM = 2.62° for PTM, and ICC = 0.93 and SEM = 2.78° for SPTM (Table 1).

Both the intraobserver and the interobserver reliability of segmental CL measured with SPTM were excellent at all levels from C2 to C7, with the lowest values at the C4–C5 level (Table 2).

Table 2 Reliability of segmental cervical lordosis measurements according to Harrison method

Evaluation of agreement between the three methods of CL measurements

The overall agreement between the three methods tested was excellent, with ICC = 0.89 and SEM = 3.44°. Pairwise comparison revealed excellent agreement for all pairs of methods, with ICC ≥ 0.80 and SEM ≤ 4.80° (Table 3).

Table 3 Agreement between three methods of cervical lordosis measurements

Mean CL values for the Cobb method, the tangent method and the tangent sum method were 10.5° ± 13.9°, 17.5° ± 15.6° and 17.7° ± 15.9°, respectively. The CL values of each patient measured with the three methods are presented in Fig. 2.

Fig. 2

CL values of each patient measured with the three methods

The difference between the three methods was statistically significant (F = 48.43, p < 0.0001). Pairwise comparisons with Bonferroni correction revealed significant differences between the Cobb method and the tangent method (p < 0.0001) and between the Cobb method and the tangent sum method (p < 0.0001), but not between the tangent method and the tangent sum method (p > 0.05).

Discussion

We present a comparison of three methods of CL measurement on standing long-cassette radiographs of the spine. Such an analysis has not been published previously.

Long-cassette lateral radiographs are important in the assessment of global sagittal balance [10]. On such radiographs, cervical alignment can be measured and the relationship between CL and other spinal segments can be established. Evaluating CL on long-cassette lateral radiographs may avoid the additional radiation exposure associated with dedicated cervical radiographs. This is important for every patient, but especially for children and adolescents [15].

Discrepancies in spinal parameters between lateral cervical radiographs and long-cassette whole-spine radiographs have been reported. Park et al. described a significant difference between CL values on lateral cervical radiographs and on long-cassette whole-spine radiographs of the same individuals [10]. Body position, arm placement and focus distance usually differ between plain cervical and long-cassette whole-spine radiographs [10, 16]. Given these reports, and because previous studies of the reliability of CL measurement methods were based on lateral cervical radiographs, an evaluation of measurement reliability on long-cassette whole-spine radiographs was needed. The three evaluated methods of measuring CL, namely the Cobb (CM), Jackson (PTM) and Harrison (SPTM) methods, proved reliable, in line with data reported for lateral cervical radiographs [3, 7]. Neither the intraobserver nor the interobserver evaluation showed any of the methods to be superior. In the segmental analysis, all intra- and interobserver measurements showed excellent agreement, with slightly lower ICC values at the C4–C5, C3–C4 and C2–C3 levels. This is partially in line with Harrison et al., who reported lower reliability at the C2–C3, C3–C4 and C6–C7 levels [7].

We initially expected that SPTM might have a slightly lower ICC, because it sums five separately measured segmental angles, each with its own possibility of error. Nevertheless, its ICC values fell within the excellent range.
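Two idealized cases bracket this expectation. If $\alpha_i$ denotes the forward tilt of the posterior tangent of vertebra C$i$ (so that each segmental angle is $\alpha_{\text{lower}} - \alpha_{\text{upper}}$, lordosis positive) and the same tangent line were used for both segments adjacent to a vertebra, the segmental angles telescope and SPTM equals PTM exactly. In the opposite, assumed case in which each of the five segmental angles carries an independent error with standard deviation $\sigma$, the errors add in variance. Neither case is a model fitted to these data.

$$\mathrm{SPTM} = \sum_{i=2}^{6} \left( \alpha_{i+1} - \alpha_{i} \right) = \alpha_{7} - \alpha_{2} = \mathrm{PTM}$$

$$\operatorname{SD}\!\left( \sum_{s=1}^{5} \varepsilon_s \right) = \sqrt{5}\,\sigma \approx 2.24\,\sigma$$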

Park et al. calculated intraclass correlation coefficients for the Cobb and Gore (PTM) methods, with measurements performed by two researchers on lateral cervical radiographs and whole-spine lateral radiographs; both methods demonstrated excellent ICCs [10]. In an interobserver evaluation, Cote et al. reported an ICC of 0.96 for the Cobb method [17]. Two other studies describing CL measurements on plain lateral radiographs were performed by Ohara et al. [3] and Silbert et al. [2]. The conclusions of both studies are in line with ours; however, because they used a different statistical method (Pearson correlation coefficient), a direct comparison of results is not possible.

In studies focusing on clinical results rather than on measurement methodology, ICC values may be lower than those reported in studies of measurement methods [7, 10]. Park et al. measured cervical lordosis with the Cobb method in different age groups on full-length spine radiographs [18]. The ICC for the Cobb angle was 0.777 for intraobserver reliability and 0.672 for interobserver reliability [18].

The complex shape of the cervical vertebrae, the curvature of the vertebral end plate surfaces and the presence of the uncinate processes can be confusing on radiographs, where a three-dimensional structure is presented as a two-dimensional image and all structures overlap [19, 20]. In this situation, the radiological image of the posterior surface of the vertebral bodies appears clearer and less affected by overlapping structures. We therefore expected the Cobb method to be less reproducible than the posterior tangent methods. However, this was not reflected in our results. Moreover, whereas in studies based on lateral cervical radiographs the ICC for the Cobb method was slightly lower than for the Jackson method [7, 10], in our study the ICC values for the Cobb method and the other methods were similar.

Currently, there is no standard method of assessing cervical alignment. Each method has proponents and opponents, and all methods are used in the published literature [2, 3, 7, 10]. It is important to know whether results obtained with different methods can be compared reliably or used interchangeably without significant bias. We therefore assessed the agreement between the evaluated methods.

When evaluating the results, it is important to consider not only the ICC value but also graphical analysis and the SEM. Considering the ICC alone could lead to the improper conclusion that all methods are in excellent agreement and can be used interchangeably. Evaluating the SEM, however, changes this perception. For the agreement between SPTM and PTM, the SEM is low (1.10°), and we can assume that this value would not have an important clinical effect. In the agreement analysis between the Cobb method and both tangent methods, however, the SEM is four times larger. Such a high SEM, especially relative to the mean CL, suggests that the Cobb method should not be used interchangeably with either tangent method, because doing so could be a source of substantial error.

Harrison et al. suggested that the Cobb method underestimates CL [7]. In published data, CL measured with the Cobb method is lower than CL measured with posterior tangent methods in the same patients [3, 7, 10]. In our study, the mean Cobb angle was approximately 7° lower than the mean values obtained with the posterior tangent methods, whereas the difference between the two tangent methods was 0.2°. We therefore assessed whether CL values obtained with the different methods differed significantly. The repeated-measures ANOVA confirmed our concerns.
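The approximately 7° and 0.2° differences follow directly from the mean values reported in the Results (the mean of the paired differences equals the difference of the means):

$$\overline{\mathrm{PTM}} - \overline{\mathrm{CM}} = 17.5^\circ - 10.5^\circ = 7.0^\circ, \qquad \overline{\mathrm{SPTM}} - \overline{\mathrm{CM}} = 17.7^\circ - 10.5^\circ = 7.2^\circ, \qquad \overline{\mathrm{SPTM}} - \overline{\mathrm{PTM}} = 17.7^\circ - 17.5^\circ = 0.2^\circ$$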

One limitation of this study could be that the radiographs were not divided into subgroups according to age or disease. However, our intention was to assess random radiographs typical of everyday practice, regardless of these factors. Further studies are needed within specialized subgroups. Another limitation is that the ideal measurement method requires very distinct vertebral contours, which is not always the case on lateral long-cassette radiographs. Moreover, vertebral borders are often more blurred on lateral long-cassette radiographs than on lateral cervical radiographs. Indeed, this study was motivated by the possible differences between these two types of radiographs and their possible consequences for measurement results.

To our knowledge, this is the first study comparing CL measurement methods on standing long-cassette whole-spine radiographs. A strength of this study is that the analysis was based on the intraclass correlation coefficient, the median error for a single measurement, and the Bland and Altman approach to comparing measurement methods [21, 22]. The fact that measurements were performed with widely used free software (Surgimap Spine) can be an additional advantage for surgeons and researchers. Analyzing published results without taking the measurement method into consideration might lead not only to scientific bias but also to therapeutic miscalculation.
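The Bland and Altman comparison [21, 22] can be sketched as follows: for each patient, the difference between two methods is plotted against their mean, and the limits of agreement are the mean difference ± 1.96 standard deviations of the differences. The function name and the example values are hypothetical, and plotting is left out.

```python
from statistics import mean, stdev
from typing import List, NamedTuple, Sequence, Tuple


class BlandAltman(NamedTuple):
    points: List[Tuple[float, float]]  # (mean of the two methods, difference) per patient
    bias: float                        # mean difference between the methods
    lower_loa: float                   # bias - 1.96 SD of the differences
    upper_loa: float                   # bias + 1.96 SD of the differences


def bland_altman(method_a: Sequence[float], method_b: Sequence[float]) -> BlandAltman:
    """Bland-Altman analysis of two methods measured on the same patients."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need paired measurements for at least two patients")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    points = [((a + b) / 2, d) for a, b, d in zip(method_a, method_b, diffs)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return BlandAltman(points, bias, bias - 1.96 * sd, bias + 1.96 * sd)


# Invented CL values (degrees) for four patients: Cobb method vs posterior tangent method.
cobb = [10.0, 12.0, 14.0, 16.0]
tangent = [12.0, 13.0, 17.0, 18.0]
result = bland_altman(cobb, tangent)
print(result.bias)  # -2.0
```

A bias far from zero, as between the Cobb and tangent methods here, is exactly what a high ICC can hide, which is why the graph and the error term are read alongside it.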

Conclusions

All three methods appeared to be highly reliable. Although high agreement between all measurement methods was shown, we do not recommend using the Cobb method interchangeably with PTM or SPTM within a single study, as this could lead to error; interchangeable use of the two tangent methods, however, can be considered.