Introduction

Patella alta is a well-known anatomic risk factor for recurrent patellar instability which may contribute to patellofemoral pain [12, 15, 34]. In patients with patellar instability or patellofemoral pain, a measurement of patellar height should be included in the workup for treatment because surgical correction of patellar height might be indicated. Several measurement methods and imaging modalities are in use to measure patellar height. Measurement methods include the Insall–Salvati ratio (IS) [21], Blackburne–Peel ratio (BP) [7], Caton–Deschamps ratio (CD) [10], modified Insall–Salvati ratio (MIS) [19] and patellotrochlear index (PTI) [5] (Fig. 1). Unfortunately, there is still no consensus in the literature on the measurement method or cut-off value, as shown by Biedert et al. [6] in their recent review of patellar height measurement methods.

Fig. 1

Patellar height measurement methods. a Insall–Salvati ratio: ratio of the length of the patellar tendon (measured from the distal pole of the patella to the tibial tuberosity) (A) to the maximum length of the patella (measured from the distal pole to the proximal pole of the patella) (B). b Blackburne–Peel ratio: ratio of the height of the distal pole of the patellar articular surface above a tibial plateau line (A) to the articular surface length of the patella (B). c Caton–Deschamps ratio: ratio of the distance between the anterosuperior point of the tibial plateau and the distal pole of the patellar articular surface (A) to the articular surface length of the patella (B). d Modified Insall–Salvati ratio: ratio of the distance between the distal pole of the patellar articular surface and the tibial tuberosity (A) to the articular surface length of the patella (B). e Patellotrochlear index: overlap percentage of the trochlear cartilage (measured from the superior-most aspect of the trochlear cartilage with respect to the inferior-most aspect of the articular patellar cartilage using a right angle and parallel lines) (A) and the articular cartilage of the patella (B).

Widely used imaging modalities are conventional radiography (CR), computed tomography (CT) and magnetic resonance imaging (MRI). Recently, however, Giovagnorio et al. [17] proposed ultrasound as a good imaging modality to measure patellar height, reducing the need for other imaging techniques. IS, BP, CD and MIS were originally designed for measurement on CR. These methods could also be applied to CT or MRI images, in which case different normal values might be used. Lee et al. [23] present normal values for IS and BP for different imaging modalities. To our knowledge, these have not been described for MIS and CD. Apart from the limited amount of literature describing normal values for CT and MRI, there is also a lack of descriptions of standardized measurements of patellar height on these imaging modalities. The measurement technique differs from measurement on CR, among other things because CT and MRI consist of multiple slices. Shabshin et al. [32] describe a method to standardize the choice of slice for measuring patellar length and patellar tendon length. It is not known whether this method is reliable between observers or applicable to all measurement methods, and it is time consuming to apply in daily practice. Barnett et al. [3] describe good inter- and intra-observer reliability for IS, BP, CD and PTI on MRI. However, their choice of slice on MRI is based on the PTI, which was specifically designed for the measurement of patellar height on MRI using the femur as a reference point. This is in contrast to IS, MIS, CD and BP, where the tibia is used as reference. Ali et al. [1] and Barnett et al. [3] describe that the PTI does not correlate well with other patellar height measurements. Therefore, this might not be a good way to choose a slice on which to perform other patellar height measurements.

The aim of this study was to determine the intra- and interrater reliability for different patellar height measurement methods (IS, BP, CD, MIS originally designed for CR) on CR, CT and MRI. This includes the intra- and interrater reliability for the PTI on MRI. It was hypothesized that there might be significant variability in measurement results between measurements and between imaging modalities.

Materials and methods

All patients over 18 years of age who were treated in our hospital for patellar instability between May 2015 and April 2017 and who had pre-operative CR, CT and MRI imaging of the knee were included in this study. This resulted in 48 patients. Forty-six patients had a pre-operative CR, CT scan and MRI of one knee and two patients had all imaging done on both knees pre-operatively. Eight patients had patella alta according to all measurements on radiographs, 19 patients had normal patellar height and 21 patients had a patellar height classification that varied between the four measurement methods using radiography.

Measurements

Five different methods for measuring patellar height were used: the Insall–Salvati ratio (IS) [21], the Blackburne–Peel ratio (BP) [7], the Caton–Deschamps ratio (CD) [10], the modified Insall–Salvati ratio (MIS) [19] and the patellotrochlear index (PTI) [5] (Fig. 1). Measurements were performed as follows:

Insall–Salvati ratio:

Ratio of the length of the patellar tendon (measured from the distal pole of the patella to the tibial tuberosity) to the maximum length of the patella (measured from the distal pole to the proximal pole of the patella).

Blackburne–Peel ratio:

Ratio of the height of the distal pole of the patellar articular surface above a tibial plateau line to the articular surface length of the patella.

Caton–Deschamps ratio:

Ratio of the distance between the anterosuperior point of the tibial plateau and the distal pole of the patellar articular surface to the articular surface length of the patella.

Modified Insall–Salvati ratio:

Ratio of the distance between the distal pole of the patellar articular surface and the tibial tuberosity to the articular surface length of the patella.

Patellotrochlear index:

Overlap percentage of the trochlear cartilage (measured from the superior most aspect of trochlear cartilage with respect to the inferior most aspect of the articular patellar cartilage using a right angle and parallel lines) and the articular cartilage of the patella.

IS, BP, CD and MIS were measured on conventional radiographs, CT and MRI. The PTI was measured on MRI as it was specifically developed for this imaging modality.
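The ratio definitions above can be illustrated with a minimal Python sketch. It assumes that each anatomical landmark has already been marked as an (x, y) point on a single lateral image or sagittal slice, in consistent units. All function and parameter names are hypothetical and chosen for illustration only; this is not the measurement software used in the study.

```python
import math

def distance(p, q):
    """Euclidean distance between two (x, y) landmark points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def insall_salvati(distal_pole, proximal_pole, tibial_tuberosity):
    """IS: patellar tendon length / maximum patellar length."""
    return distance(distal_pole, tibial_tuberosity) / distance(distal_pole, proximal_pole)

def modified_insall_salvati(art_distal, art_proximal, tibial_tuberosity):
    """MIS: distal articular pole to tibial tuberosity / articular surface length."""
    return distance(art_distal, tibial_tuberosity) / distance(art_distal, art_proximal)

def caton_deschamps(art_distal, art_proximal, tibial_anterosuperior):
    """CD: distal articular pole to anterosuperior tibial point / articular surface length."""
    return distance(art_distal, tibial_anterosuperior) / distance(art_distal, art_proximal)

def blackburne_peel(art_distal, art_proximal, plateau_a, plateau_b):
    """BP: perpendicular height of the distal articular pole above the
    tibial plateau line (through plateau_a and plateau_b) / articular surface length."""
    (x1, y1), (x2, y2) = plateau_a, plateau_b
    x0, y0 = art_distal
    # Standard point-to-line distance formula.
    height = abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1)) / distance(plateau_a, plateau_b)
    return height / distance(art_distal, art_proximal)

def patellotrochlear_index(overlap_length, patellar_cartilage_length):
    """PTI: trochlear cartilage overlap as a percentage of patellar cartilage length.
    Both lengths are assumed to be measured beforehand along parallel lines."""
    return 100.0 * overlap_length / patellar_cartilage_length

# Illustrative (made-up) landmarks in millimetres: tendon 48 mm, patella 40 mm.
example_is = insall_salvati((0, 0), (0, 40), (0, -48))  # 48 / 40 = 1.2
```

Working from landmark coordinates rather than pre-measured lengths makes it explicit that every ratio depends on where the observer places each point, which is the source of the observer variability examined in this study.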

All measurements and ICCs were reported with two decimals. Bland–Altman results were reported with three decimals.

Observers

Four observers with different levels of medical experience performed all of the measurements mentioned above on radiograph, CT and MRI in two cycles at least 4 weeks apart. The four observers were an orthopaedic surgeon, a radiologist, an orthopaedic resident and a medical student.

IRB approval was received from the review board of Canisius Wilhelmina Hospital, Nijmegen, The Netherlands, ID number: 039-2017.

Statistical analysis

For each measurement (IS, BP, CD and MIS on CR, CT and MRI), the intra- and interrater reliability was determined by calculating the intra-class correlation coefficient (ICC). To compare results of the IS, BP, CD and MIS measurements between CR, CT and MRI (inter-method reliability), an ICC was also calculated. The ICC estimates the average correlation among pairs of data and typically gives a value between 0 and 1.
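As an illustration of how an ICC can be derived from a table of ratings, the sketch below computes a two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1), from its ANOVA mean squares. The ICC model is an assumption made for this example, since the text does not state which form was selected in SPSS, and the function name `icc_2_1` is hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: list of rows, one row per subject (knee), each row holding
    one rating per rater (or per measurement cycle).
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Made-up ratings for illustration: 6 subjects rated by 4 raters.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)
```

Because the absolute-agreement form penalizes systematic offsets between raters, it can come out much lower than a consistency-type ICC on the same data; a different model choice would therefore give different numbers.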

Scores were interpreted as follows: a score of 0–0.50 indicating poor reliability, 0.50–0.75 indicating moderate reliability, a score of 0.75–0.90 indicating good reliability, and a score higher than 0.90 indicating excellent reliability [29].
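The interpretation bands above can be written as a small lookup. This is a sketch only: the stated ranges share their end points, so the handling of values exactly at 0.50 and 0.75 (assigned here to the higher category) is an assumption, while 0.90 is treated as good because only scores higher than 0.90 are called excellent. The function name `interpret_icc` is hypothetical.

```python
def interpret_icc(icc):
    """Map an ICC value to the reliability category used in this study."""
    if icc > 0.90:
        return "excellent"
    if icc >= 0.75:   # boundary handling at 0.75 is an assumption
        return "good"
    if icc >= 0.50:   # boundary handling at 0.50 is an assumption
        return "moderate"
    return "poor"

# Example with two inter-observer values from the Results: IS on CR and BP on MRI.
labels = [interpret_icc(0.80), interpret_icc(0.09)]  # ["good", "poor"]
```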

The ICC is a qualitative measure of reproducibility. To further quantify the reliability of the measurements, the Bland–Altman analysis was used to assess agreement [2]. It evaluates the mean difference in measurements and a range of agreement within which 95% of the differences between one measurement and the other are included. A Bland–Altman analysis was performed for measurements with an ICC ≥ 0.70. A mean difference < 0.20 on the different measurement methods was deemed acceptable for clinical use.
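A minimal sketch of the Bland–Altman calculation described above is given below. It assumes paired measurements of the same knees by two observers (or on two modalities) and uses the conventional limits of agreement, the mean difference plus or minus 1.96 times the sample standard deviation of the differences. The function name `bland_altman` and the example values are hypothetical.

```python
import statistics

def bland_altman(first, second):
    """Return (mean difference, lower limit, upper limit) for paired measurements.

    The limits of agreement are mean_diff +/- 1.96 * SD of the differences,
    the range expected to contain about 95% of the differences.
    """
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)   # sample standard deviation
    half_width = 1.96 * sd_diff
    return mean_diff, mean_diff - half_width, mean_diff + half_width

# Made-up IS ratios from two observers on the same four knees.
observer_a = [1.0, 1.2, 1.1, 0.9]
observer_b = [0.9, 1.1, 1.2, 0.8]
mean_diff, lower, upper = bland_altman(observer_a, observer_b)

# Acceptance criterion from the text: mean difference below 0.20.
acceptable = abs(mean_diff) < 0.20
```

The mean difference captures systematic bias between the two sets of measurements, whereas the width of the limits reflects random disagreement, so a small mean difference can coexist with wide limits of agreement.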

SPSS software (version 24.0, SPSS Inc., Chicago, IL, USA) was used to calculate the intra-class correlation coefficients and Microsoft Office Excel (version 14.0, Microsoft Corp., Redmond, WA, USA) was used for all Bland–Altman analyses calculating mean differences and limits of agreement.

We included all patients who had all three imaging modalities available in the selected time frame. In a review article, Bujang et al. [9] provide a guide to determine the minimum sample size required for estimating the desired effect size of ICC. According to this guide, the minimum sample size requirement for our study is 25 subjects when alpha is pre-specified to be 0.05, power to be 0.90, an acceptable ICC of 0.70 and an expected ICC of 0.9.

Results

Of the 48 patients, 11 were male and 37 were female. Median age was 22 (18–51) years. The radiographs and scans were of variable quality because some of these patients were referrals from other hospitals where imaging had already been performed. However, none of the scans needed to be rejected due to poor quality.

Table 1 shows the minimum and maximum ICC values for intra-observer reliability of each measurement.

Table 1 Intra-observer reliability: ICC min/max values (CI 95%)

The IS was the only measurement that had a moderate to good or excellent intra-observer reliability on CR (0.72–0.91), CT (0.78–0.83) and MRI (0.70–0.85). The intra-observer reliability of the PTI for MRI was good to excellent for all observers (0.81–0.91). The lowest intra-observer reliability was seen with the other MRI measurements (BP 0.42–0.73, CD 0.26–0.84, MIS 0.18–0.76). Of all observers, the radiologist scored the best overall intra-observer reliability. The overall ICCs of the medical student, however, were clearly lower than those of the other three observers. Including these measurements in the calculation of inter-observer reliability would significantly alter and cloud the final inter-observer reliability results. For this reason, and because in everyday practice these measurements are not done by students, the results of the medical student were excluded from calculating the ICCs for inter-observer reliability.

Table 2 shows the ICCs for inter-observer reliability of each measurement. The inter-observer reliability was good for IS measurements on CR (0.80), CT (0.75) and MRI (0.78). The overall ICC was moderate to good for CR and CT. The PTI showed good inter-observer reliability (0.80); however, the ICC for BP, CD and MIS was poor on MRI (0.09, 0.41 and 0.27, respectively).

Table 2 Inter-observer reliability: ICC (CI 95%)

The ICC for inter-method reliability was calculated using measurements of the radiologist because those had the highest intra-observer reliability. However, when using the measurements of the orthopaedic surgeon or resident the results were similar. The IS method showed a moderate to good ICC for comparison of all three modalities with the best agreement between radiograph and MRI. The MIS method showed a poor agreement between CR, CT and MRI (Table 3).

Table 3 Inter-method reliability: ICC (CI 95%)

Cut-off points were used to classify the ICCs for intra- and inter-observer and inter-method reliability, but an ICC remains a qualitative measure. Regarding inter-observer and inter-method reliability, it was hypothesized that an ICC of 0.70 could also be high enough to ensure good reliability. To quantify this, a Bland–Altman analysis was performed for all measurements with an ICC of 0.70 or higher. Results are shown in Table 4.

Table 4 Degree of agreement between observers according to Bland–Altman analysis: mean differences (limits of agreement)

The results of Bland–Altman analyses for inter-method reliability of the IS measurement are not displayed in a table but were 0.084 (± 0.182) for CR versus CT, 0.059 (± 0.122) for CR versus MRI and 0.101 (± 0.209) for CT versus MRI.

Discussion

The most important finding of the present study was that the inter- and intra-rater reliability was good for the Insall–Salvati (IS) ratio on all imaging modalities and for the patellotrochlear index (PTI).

Smith et al. [33] researched intra-observer reliability for different patellar height measurements on CR and found the reliability of CD to be better than BP and IS. This is in contrast to the current study where the IS method had the best intra-observer reliability compared to CD, BP and MIS. This was not only the case for CR but for CT and MRI as well. Smith et al. [33] also propose that the intra-observer reliability of a measurement method may be related to experience, which is what this study showed as well. The ICCs for intra-observer reliability of the different measurements conducted by the medical student were generally lower than those of the orthopaedic resident, the orthopaedic surgeon and the radiologist.

Barnett et al. [3] in their study of different patellar height measurements on MRI found a good intra-observer reliability for IS, BP, CD and PTI on MRI. The current study also showed good intra-observer reliability for IS and PTI on MRI; however, BP, CD and MIS showed poor reliability. Different observers may choose a different sagittal slice on different occasions, which leads to a decreased intra- and inter-observer reliability. Most authors report the use of the mid-sagittal slice to perform measurements [3, 23, 24], but in patellar instability patients this is rarely the slice with the maximal length of the patellar bone, tendon or cartilage. Because the patella is often lateralized, observers may differ in their interpretation of which slice is the most accurate for measurement, for example when the cartilage is thickest on one slice and the patellar length is greatest on another. Differences in the interpretation of cartilage thickness, patellar length and 3D configuration might therefore give rise to different measurement results. Having more experience with these measurements will increase the consistency of the observer and, as a consequence, increase the intra-observer reliability. The results of the medical student, later excluded from the results, confirm that experience is needed for adequate measurements; these measurements cannot be done reliably by untrained personnel.

With regard to the inter-observer reliability, both Van Duijvenbode et al. [13] and Gracitelli et al. [18] found the IS to have the best agreement on CR compared to CD, BP and MIS, which is what this study showed as well, although Van Duijvenbode et al. [13] advise using the MIS rather than the IS because of better validity. Chareancholvanich et al. [11] described ICC values for inter-observer reliability similar to this study, with the IS method being the most reproducible. Kar et al. [22] even found that a clinical measurement of the Insall–Salvati index was not statistically significantly different from a radiological IS measurement. Lee et al. [23] found excellent ICCs not only for IS but also for BP on all imaging modalities. However, both Seil et al. [31] and Berg et al. [4] found the BP ratio to be the most reliable on CR. Alternatively, Philips et al. [27] in their extensive review of patellar height measurements judge the BP and CD to be the most reliable radiographic techniques.

Barnett et al. [3] describe good inter-observer reliability for IS, BP, CD and PTI on MRI and Munch et al. [25] found high inter-observer reliability for BP, CD and MIS on MRI (0.78–0.87). In the current study, however, only IS and PTI showed good inter-observer reliability on MRI. These observations show that there are quite a few discrepancies in the literature about the reliability of measuring patellar height. IS and BP seem to alternate as most reliable; however, in this study IS was undoubtedly better based on the ICC.

Biedert et al. [5] also found the inter-observer correlation of the PTI to be high and significant. This is supported by Ali et al. [1] and Barnett et al. [3]; however, they found that the PTI did not correlate well with other patellar height measurements on MRI. Munch et al. [25] found a good correlation between PTI and CD, and between PTI and BP. The current study was not designed to look at the correlation between different measurement methods on MRI, but it seems literature is divided on this matter.

The second aim of this study was to investigate whether established normal values of patellar height measurements for CR can be applied to CT and MRI. The IS method showed a moderate to good reliability for comparison of all three modalities, with the best agreement being between radiography and MRI. The other patellar height measurements showed only poor to moderate agreement between CR, CT and MRI. One explanation for this could be that the IS is a measurement method with only bony references. Therefore, it might be more consistently measured on different imaging modalities, in contrast to cartilage, which is not visible on either CR or CT.

If a correlation between methods is found, it does not necessarily mean that measurement methods agree in outcome. The Bland–Altman analysis evaluates if there is a bias between mean differences between measurement methods and estimates a limit of agreement [16]. When looking at the mean differences between the radiologist and the orthopaedic surgeon, the largest mean difference is 0.079, for BP on CR, which is within the acceptable range (< 0.2). However, the limits of agreement are > 0.2 for all measurements except IS on MRI, but they all include 0. In conclusion, the mean differences between experienced observers are acceptable for clinical use (< 0.2), but one should be aware of a possible disagreement when interpreting IS, BP, CD, MIS on CT or MRI.

Lee et al. [23] described a good correlation between CT and MRI for the IS method. The current study showed good correlation and acceptable limits of agreement for MRI but unacceptable limits of agreement for CT, although this was still better than the BP, CD and MIS methods. Based on these findings, it would be unwise to apply normal values for radiographs to CT, but they could be used for MRI. If one wishes to use the CD, BP and MIS methods, both conventional radiographs and a scan have to be ordered to fully assess the patellofemoral anatomic morphology.

Both Lee et al. [23] and Miller et al. [24] describe an adjustment on established normal values for radiographs to create new normal values that can be used for CT and MRI. A new set of normal values could not be calculated in this study because instead of a normal population the current study group consists only of patients with patellofemoral symptoms and pathological patellofemoral anatomy.

Over the years, numerous methods of measuring patellar height have been developed. Consequently, there is a fair amount of literature available comparing the various measurement methods and testing the reliability of these methods [6]. With easier access to CT and MRI nowadays, new methods are being developed specifically for MRI, and options for transferring normal values for radiographs to CT or MRI are being explored.

The measurement of patellar height is widely researched in literature but with varying outcomes. However, in the current study all the above-mentioned factors have been combined and everything was tested under the same circumstances, making the outcomes more reliable. Only Lee et al. [23] so far have compared patellar height methods between X-ray, CT and MRI, but they only tested the Blackburne–Peel and Insall–Salvati indices. Yue et al. [35] in their recent study did compare the Insall–Salvati, modified IS, Caton–Deschamps and Blackburne–Peel indices but used only radiograph and MRI as imaging modalities.

One promising patellar height measurement that was not included in this study is the patella-plateau angle [28]. Its reliability and reproducibility in patients with patellofemoral instability have been published recently [8]. Previously, it was only used in osteoarthritic patients [14] and in patients with a total knee arthroplasty [30].

Also, Nizić et al. [26] propose a new reference line for diagnosing patella alta that is simple, accurate and reproducible, with a 100% binary intra- and inter-observer agreement. Hanada et al. [20] found the Modified Blumensaat line to be a valid and applicable patellar height measurement with a knee flexion angle of 30°–40° on conventional X-ray. They state that a patellar height measurement that utilizes a femoral reference point is better than a tibial-based method when patella alta is suspected, and would be more suitable when patellofemoral joint pathology due to patellar height is considered.

A possible limitation of this study is that many of the conventional radiographs were not perfect lateral views. Also, the scans were of variable quality because they came from a variety of hospitals. This could result in imperfect measurements; however, it was decided not to change this because it best represents daily practice in most hospitals.

With the literature being divided on the best way to measure patellar height, clinicians are using many different methods. As for any clinical measurement, it is important to have a gold standard in order to communicate accurately between clinicians, define indications for surgery or do clinical research. With most clinicians having access to CT and/or MRI nowadays, it is important for that gold standard measurement method to be applicable to multiple imaging modalities.

Conclusion

In this study, the Insall–Salvati ratio shows better intra- and inter-observer reliability than the Blackburne–Peel ratio, the Caton–Deschamps ratio and the modified Insall–Salvati ratio on all imaging modalities. Radiography and CT seem to have better reliability than MRI. The patellotrochlear index, however, shows good inter- and intra-observer reliability on MRI.

Only for the IS method is there acceptable agreement between CR and MRI. This means the established Insall–Salvati normal values could be used for MRI as well.

This study shows that the most reliable method to measure patella height is the Insall–Salvati ratio measured on conventional radiographs or the patellotrochlear index on MRI.