Introduction

Impaired ankle dorsiflexion (ADF) is hypothesized to increase forefoot pressure [1–3]. This can lead to various problems of the foot and ankle, including metatarsalgia [4, 5], plantar fasciitis [6], Achilles tendinopathy [7], and plantar ulceration [8]. The most common cause of reduced ADF is tightness of the M. gastrocnemius (MGT) [9, 10]. MGT has gained increasing attention in the literature. Despite the rising number of studies reporting on treatment strategies [11], there is no consensus on how to diagnose MGT. Currently, there is neither a standardized examination procedure nor a consistent definition of physiological norm values [12–15]. Both, however, are prerequisites for diagnosing impaired ADF and MGT and for initiating treatment.

MGT is characterized by impaired ADF with the knee extended; knee flexion results in an increase in ADF. Although this principle forms the basis of any test addressing MGT, published examination procedures differ significantly. They vary in several aspects, including the force applied to the ankle [16–18], the measurement device [19], and the anatomical measurement landmarks used [12, 20, 21]. Each of these parameters has a pronounced impact on the measured degree of ADF. Because of these variations, a consistent definition of physiological and pathological norm values is lacking. Currently, definitions of impaired ADF range from less than 0° to less than 10° [14, 22–24].

In short, a consistent diagnostic algorithm for MGT is lacking, and consequently the indication for treatment is unclear. As an example, we analyzed the indications for therapy in the 18 trials included in the most recent systematic review on the effectiveness of M. gastrocnemius recession. One-third of the studies did not state the degree of ADF necessitating therapy, and only three papers gave details on the testing procedure [11, 15]. Therefore, the validity and comparability of outcome studies on gastrocnemius recession are limited. We strongly believe that any study on the efficacy of gastrocnemius recession must first clearly define the indication. This requires a standardized examination procedure, on the basis of which norm values can be defined.

The aim of this study was to apply a standardized examination procedure [12] to a large cohort of young, asymptomatic individuals to identify norm values for ADF. Based on these values, we intended to define a decision pathway to diagnose impaired ankle dorsiflexion and M. gastrocnemius tightness.

Materials and methods

Study population

Sixty-four healthy, asymptomatic individuals meeting the criteria presented in Table 1 were included. The local ethics committee approved the study (#007–14).

Table 1 Inclusion and exclusion criteria for study participation

Testing procedure

The standardized testing procedure was defined in a previous study [12]. ADF was assessed bilaterally in each subject, non-weight bearing and weight bearing, with the knee both extended and flexed. Three investigators performed all measurements independently: a senior consultant in foot and ankle surgery (HP), a fifth-year resident in orthopedic surgery (SFB), and a final-year medical student (FS). The investigators were blinded to each other's results. The order of examination was randomized between subjects. A standard 20-cm goniometer with 2° increments (MDF Instruments USA, Inc., Malibu, CA, USA) was used.

Measurement landmarks

Similar anatomical measurement landmarks were applied for both non-weight bearing and weight bearing examination:

Y-axis: Distal long axis of the fibula, marked prior to testing (a line connecting the center of the lateral malleolus with the center of the fibula 10 cm proximally) [25, 26].

X-axis: Plantar aspect of the foot (non-weight bearing)/the floor (weight bearing).

Non-weight bearing

Two examiners conducted the test. The subject was placed in the supine position. One investigator applied maximum force to the forefoot with one hand to achieve maximum ADF, while ensuring a neutral subtalar position with the other hand. The other examiner measured ADF with the knee both fully extended (Fig. 1A1) and flexed 90° (Fig. 1A2).

Fig. 1

Illustration of the non-weight bearing and weight bearing testing procedure. A1 Non-weight bearing examination with the knee extended; A2 non-weight bearing examination with the knee flexed (approx. 90°, min 20°); B1 weight bearing examination with the knee extended; B2 weight bearing examination with the knee at least 20° flexed

Weight bearing

One examiner conducted the test. The subject performed a lunge stance in front of a wall, with the foot to be measured as the rear foot. This foot was centered (second toe and heel) on a previously marked line perpendicular to the wall. The subject was allowed to hold onto the wall to stabilize the stance. To achieve maximum ADF, the participant was asked to move the hip toward the wall until just before heel lift-off. ADF was measured with the knee both fully extended (Fig. 1B1) and flexed at least 20° (Fig. 1B2) [12].

Data assessed

Standard demographics (gender, age, BMI) were recorded after informed consent had been obtained. ADF was measured bilaterally with the knee extended and flexed, both non-weight bearing and weight bearing. Three investigators conducted the measurements separately, and the inter-rater reliability was assessed. ADF differences between the right and left sides and between the knee extended and flexed were calculated.

Outcome variables

The primary outcome parameters were the ADF measurements (norm values). Secondary outcome parameters were the inter-rater reliability, values derived from the ADF measurements (i.e., side and knee-position differences), and the influence of demographic parameters on the ADF norm values.

Statistics

Inter-rater reliability was assessed using the intraclass correlation coefficient (ICC(1,1)). ICC values typically range from 0 to 1, with 1 indicating perfect agreement; values greater than 0.7 are generally considered acceptable [27, 28]. The ICC was calculated for all possible combinations of investigators.
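For readers who want to reproduce the reliability analysis outside SPSS, the following is a minimal sketch of ICC(1,1), assuming the one-way random-effects, single-measure form of Shrout and Fleiss: ICC(1,1) = (MSB − MSW) / (MSB + (k − 1)·MSW), where MSB and MSW are the between-subjects and within-subjects mean squares for n subjects and k raters. The function name and data layout are illustrative assumptions, not the authors' code.

```python
from statistics import fmean


def icc_1_1(ratings):
    """One-way random-effects, single-measure ICC(1,1) (Shrout and Fleiss).

    ratings: one row per subject, one column per rater, e.g. the ADF in
    degrees measured by each investigator for a single test condition.
    """
    n = len(ratings)  # number of subjects
    k = len(ratings[0]) if n else 0  # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects, >= 2 raters and a complete table")

    row_means = [fmean(row) for row in ratings]
    grand_mean = fmean([x for row in ratings for x in row])

    # Between-subjects mean square, df = n - 1
    msb = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-subjects mean square, df = n * (k - 1); in the one-way model any
    # systematic offset between raters is absorbed into this error term
    msw = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row) / (n * (k - 1))

    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical ADF values (degrees): 3 subjects, 2 raters
print(round(icc_1_1([[31, 32], [33, 34], [35, 36]]), 3))  # 0.882
```

Pairwise values (e.g. HP-SFB) would be obtained by passing only the two corresponding rater columns.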

To obtain reliable ADF norm values, subjects demonstrating disproportionate inter-rater variance were excluded from further analysis. To this end, the maximum difference in ADF between the three investigators was calculated for each measurement, and the 95 % CI of these values was determined. Subjects were excluded if any measurement exceeded the 95 % CI of the inter-rater variance.
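The exclusion rule can be illustrated with a short sketch. The text does not state exactly how the 95 % CI of the maximum inter-rater differences was computed; the code below assumes the upper bound is the mean plus 1.96 SD of all per-measurement maximum differences, pooled across subjects and test conditions. The function names, data layout, and example values are hypothetical.

```python
from statistics import fmean, stdev


def max_rater_difference(values):
    """Largest disagreement (degrees) between investigators for one measurement."""
    return max(values) - min(values)


def exclusion_cutoff(data, z=1.96):
    """Assumed cutoff: mean + z * SD of all maximum inter-rater differences.

    data: {subject_id: [(rater1, rater2, rater3), ...]}, one tuple per measurement
    """
    diffs = [max_rater_difference(m) for ms in data.values() for m in ms]
    return fmean(diffs) + z * stdev(diffs)


def subjects_to_exclude(data, z=1.96):
    """Subjects with at least one measurement above the cutoff."""
    cutoff = exclusion_cutoff(data, z)
    return {
        subject
        for subject, ms in data.items()
        if any(max_rater_difference(m) > cutoff for m in ms)
    }


# Hypothetical data: 6 subjects x 2 measurements x 3 raters (degrees)
data = {
    "s1": [(10, 10, 12), (30, 30, 30)],
    "s2": [(11, 11, 11), (31, 32, 31)],
    "s3": [(9, 10, 10), (29, 29, 30)],
    "s4": [(10, 10, 10), (30, 30, 31)],
    "s5": [(12, 12, 12), (28, 30, 28)],
    "s6": [(8, 20, 10), (30, 30, 30)],
}
print(round(exclusion_cutoff(data), 1))  # 8.2
print(subjects_to_exclude(data))  # {'s6'}
```

If the authors instead used, for example, the 95th percentile of the differences, only `exclusion_cutoff` would need to change.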

Subsequently, standard descriptive statistics were calculated. Possible gender, side, and test (non-weight bearing/weight bearing) differences were assessed using Student's t test (independent or paired, as appropriate). Pearson correlation coefficients were used to test the influence of age and BMI on ADF. The symmetry between the left and right ankle (knee extended) was assessed using Bland–Altman limits of agreement analysis. ADF differences between the knee extended and flexed, for both non-weight bearing and weight bearing examination, were assessed using standard descriptive statistics and a MANOVA. p values less than 0.05 indicated statistical significance. Statistics were computed using SPSS version 21 (IBM Corp.).
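The side-symmetry analysis follows the standard Bland–Altman computation: for each subject, the right-minus-left difference is plotted against the mean of both sides, the bias is the mean difference, and the bounds are the bias ± 1.96 SD of the differences. The sketch below is an illustration of that computation, not the authors' SPSS analysis; the sign convention (positive means more ADF on the right) follows the caption of Fig. 2, and the function name and example values are hypothetical.

```python
from statistics import fmean, stdev


def bland_altman(right, left, z=1.96):
    """Bias and +/- z*SD bounds for paired right/left ADF values (degrees)."""
    if len(right) != len(left) or len(right) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [r - l for r, l in zip(right, left)]  # y-axis: > 0 means more ADF on the right
    means = [(r + l) / 2 for r, l in zip(right, left)]  # x-axis of the plot
    bias = fmean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": bias - z * sd,
        "upper": bias + z * sd,
    }


# Hypothetical knee-extended ADF values (degrees) for four subjects
result = bland_altman([10, 12, 14, 16], [9, 12, 15, 14])
print(result["bias"])  # 0.5
print(round(result["lower"], 2), round(result["upper"], 2))  # -2.03 3.03
```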

Results

Sixty-four healthy, asymptomatic subjects with a mean age of 28.3 ± 4.0 years (58 % female) were included. The mean BMI was 22.7 ± 3.0 kg/m².

Inter-rater test reliability (ICC)

The overall ICC ranged from 0.876 to 0.915 for non-weight bearing and from 0.851 to 0.901 for weight bearing examination (Table 2). Calculating ICC values separately for each rater pair (HP-SFB, HP-FS, SFB-FS) revealed higher values for the more experienced raters, with a mean ICC (±SD) of 0.903 ± 0.023 (HP-SFB). The mean ICC for pairs including the student was 0.862 ± 0.035 (FS-HP) and 0.867 ± 0.029 (FS-SFB). The complete analysis is presented in Supplement 1.

Table 2 ICC values of all three investigators (mean and 95 % CI)

Norm values

As described above, subjects demonstrating an inter-rater variance exceeding the 95 % CI were excluded from further analysis. This corresponded to an inter-rater difference of ≥10°, which was observed in 5 subjects. To determine the physiological ADF of the remaining 59 subjects, each measurement was calculated as the mean of the values of all three investigators.

The descriptive statistics for the mean ADF are presented in Table 3. No gender differences were found. No side differences were found, except for the non-weight bearing measurements with the knee flexed (p = 0.002). Weight bearing examination resulted in significantly higher ADF than non-weight bearing examination for all measurements. Age did not influence ADF. Higher BMI was associated with significantly lower ADF only in non-weight bearing testing. The corresponding Pearson correlation coefficients for the right leg were −0.389 (p = 0.002) with the knee extended and −0.313 (p = 0.016) with the knee flexed. For the left leg, the correlation coefficients were −0.410 (p = 0.001) with the knee extended and −0.304 (p = 0.019) with the knee flexed.

Table 3 Mean descriptive statistics for each measurement of ADF

Symmetry right and left ankle

The Bland–Altman analysis is summarized in Table 4, and Figure 2 illustrates the Bland–Altman plots. Overall, the average bias toward the right limb was 0.6° for non-weight bearing and 0.3° for weight bearing examination. Considering the 95 % confidence interval, a side difference in ADF of less than 6° with the knee extended can be considered physiological.

Table 4 Summary of the Bland–Altman limits of agreement analysis for the symmetry of the right and left ankle
Fig. 2

Bland–Altman plot representing the asymmetry of dorsiflexion between the right and left ankle with the knee extended. Each subject is represented by a data point. The y-axis shows the difference between the right and left ankle; the x-axis shows the mean ADF of the right and left ankle. Values above zero indicate greater ADF on the right side. The thin solid line represents the mean difference (bias) between the right and left ankle, the dashed lines represent ±1 SD, and the thick solid lines represent the lower and upper bounds of the 95 % confidence interval

Differences of ADF with the knee extended and flexed

The differences between the knee extended and flexed for non-weight bearing and weight bearing examination are presented in Table 5. Overall, flexing the knee resulted in a mean gain in ADF of approximately 10°. The MANOVA showed no significant influence of the testing condition (non-weight bearing/weight bearing), but revealed a side effect and an interaction effect on ADF. Nevertheless, the mean difference remained approximately 10°.

Table 5 ADF differences between the knee extended and flexed

Discussion

The key to diagnosing impaired ADF and MGT is the application of a standardized examination protocol. To date, a multitude of testing procedures has been described. They differ in various aspects, all of which have a pronounced impact on the measured ADF. The most distinct aspects are the force applied to the ankle, the anatomical measurement landmarks, and the measurement device. ADF can be assessed non-weight bearing [16, 29], weight bearing [17, 30], or with instrumented devices [18, 31]. Landmarks applied for the y-axis included the fibula [17, 32], the tibia [33, 34], and the Achilles tendon [35, 36]. The x-axis was defined either by the plantar surface of the foot [32, 37] or by the fifth metatarsal bone [29, 38]. Finally, devices used to measure ADF included a mobile app [35], digital inclinometers [34, 35], measuring tapes [39, 40], custom-made devices [18, 31, 41], and standard goniometers [16, 42, 43]. This heterogeneity hinders the definition of physiological and pathological norm values. Therefore, the authors strongly believe that the community should agree on one standardized examination procedure.

In a previous study, the authors laid the foundation for a standardized non-weight bearing and weight bearing examination procedure [12], which was applied herein. The advantages of this procedure have been discussed in detail [12]. In brief, the examination protocol is highly standardized and demonstrates high inter-rater reliability. Moreover, it uses reproducible landmarks, requires no special equipment, and can be conducted quickly. It is therefore well suited to daily clinical routine.

To put our results into perspective, we identified studies that used comparable landmarks and performed non-weight bearing or weight bearing tests. Non-weight bearing measurements revealed significantly lower ADF values [14, 21, 42]. Kim et al. [21] examined individuals in the supine position and reported mean ADF values of approximately 11° ± 3° with the knee extended and 17° ± 4° with the knee flexed; a single investigator applied the force and conducted the measurements. Slightly higher values (knee extended: 13° ± 8°; knee flexed: 22° ± 11°) were reported by DiGiovanni and colleagues [14], who used an equinometer and applied a standardized torque of 10 Nm. Rabin et al. [42] conducted non-weight bearing measurements with the subject in the prone position and reported ADF values of 25° ± 5° with the knee flexed. Again, one investigator both applied the force and conducted the measurements. It must be assumed that the applied force varied between these studies. When the same investigator applies the torque while measuring, the force produced is likely limited. Furthermore, greater force can be applied with the subject in the prone position. A custom-made device that applies a defined torque standardizes the force but is not available to most physicians. Overall, these examples highlight how sensitive non-weight bearing measurements are to the force applied to the ankle. This not only explains the diverging values between studies but again emphasizes the necessity of a standardized examination procedure.

In line with previous studies [32, 42], our study showed significantly higher ADF values for weight bearing than for non-weight bearing examination. This is most likely due to the greater torque acting on the ankle. The few studies applying a comparable weight bearing examination reported similar values [17, 43]. Munteanu et al. [17] conducted measurements using a clear acrylic plate apparatus and similar landmarks and reported mean ADF values of 36° ± 5° with the knee extended. Konor et al. [43] reported mean values of 43° ± 6° with the knee flexed. In contrast to the non-weight bearing procedure, weight bearing examination with similar landmarks produces comparable results across studies. Taken together, the weight bearing procedure appears to be more reliable. Moreover, previous studies demonstrated significantly higher ICC values for weight bearing measurements [17, 30, 32, 42, 44]. Therefore, the weight bearing examination appears superior to the non-weight bearing examination.

A further strength of this study was the large number of subjects enrolled; several previous studies assessed ADF in fewer than 30 healthy subjects [17, 20, 30, 35, 43]. Moreover, we applied an extensive systematic approach to define ADF norm values. Not only did we assess ADF bilaterally with the knee extended and flexed, but we also investigated the symmetry between the right and left ankle. The authors are not aware of any study applying a similar systematic approach. Only one study has assessed side symmetry, using a different test [45]. Hoch et al. [45] investigated ADF side differences in healthy volunteers using the lunge test (weight bearing, knee flexed, toe-to-wall distance) and reported a bias of 0.1 cm (±2.8 cm; 95 % CI) on the right limb. Similarly small physiological differences, with a mean bias of 0.3° (±5.3°; 95 % CI), were found in our study. Consequently, a unilateral ADF reduction of more than 6° on the symptomatic side has to be considered highly suspicious for impairment.

Several limitations should be discussed. First, movements of adjacent joints (e.g., the subtalar and midtarsal joints) can bias ADF measurements. Therefore, care was taken to ensure a neutral position of the subtalar and talonavicular joints during testing, as recommended in the literature [46–49]. Second, the goniometer used herein had 2° increments, which could add to measurement inaccuracy. Although other devices might be more accurate, the goniometer has to be considered the clinical gold standard [28]. Moreover, given the high ICC values and narrow confidence intervals observed, we believe this tool to be sufficiently accurate. A final, practical limitation is that the non-weight bearing examination requires two investigators, whereas in clinical routine usually only one physician examines the patient. A non-weight bearing examination conducted by a single investigator might be less accurate, because adjacent joint movements cannot be controlled and applying maximum force while simultaneously measuring is difficult.

As outlined above, we applied a highly reliable, standardized test to a large cohort of asymptomatic subjects. Bilateral ADF values were measured with the knee both extended and flexed, non-weight bearing and weight bearing. Based on this extensive examination procedure, we aimed to define a decision pathway to diagnose impaired ankle dorsiflexion and M. gastrocnemius tightness (Fig. 3). The presented pathway is only valid if the examination procedure and landmarks outlined herein are applied. Any patient presenting with unilateral pathologies associated with impaired ADF (for example, plantar ulceration [8], metatarsalgia [4, 5], plantar fasciitis [6], or Achilles tendinopathy [7]) should be examined according to the following three steps:

Fig. 3

Decision pathway to diagnose impaired ankle dorsiflexion and M. gastrocnemius tightness

  1. Measurements: Bilateral ADF should be measured with the knee both fully extended and flexed. For the reasons discussed above, the authors recommend using the weight bearing protocol. Non-weight bearing examination should only be conducted if the weight bearing test cannot be performed.

  2. Impaired ADF (knee extended): ADF of less than 10° on the symptomatic side has to be considered impaired. This is in line with the literature [14] and the minimum values observed herein. Notably, the physiological lower limit (95 % CI) of ADF was 30° weight bearing (20° non-weight bearing). Consequently, values between 10° and 30° weight bearing (10°–20° non-weight bearing) have to be considered restricted. In those cases, side symmetry should be taken into account. Side differences of less than 6° were found to be physiological (95 % CI); therefore, if a side asymmetry of more than 5° is present, ADF should be considered impaired. ADF greater than 30° is not impaired.

  3. M. gastrocnemius tightness: Once impaired ADF is diagnosed, its cause must be identified. MGT is present if knee flexion results in an increase in ADF of approximately 10°. If knee flexion does not increase ADF, MGT is not present, and further diagnostics have to be conducted to identify the underlying pathology.

Overall, the present study provides, for the first time, a standardized and detailed examination protocol to identify impaired ADF and MGT. Future studies should apply this decision pathway to diagnose impaired ADF and MGT and to initiate treatment. This systematic approach will allow comparison of studies on the effectiveness of treatment strategies for M. gastrocnemius tightness.