Introduction

Hallux valgus (HV) is a common foot deformity that causes significant pain and functional disability. HV, also commonly referred to as bunion deformity, presents with a prominent head of the first metatarsal, medial deviation of the first metatarsal with axial plane pronation, and lateral deviation of the hallux [1, 2]. Epidemiological data on HV are generally lacking, but one systematic review found that HV affects about 30% of females and 13% of males across various regions of the world, including the USA, UK, and Germany. The same review found that the prevalence of HV increased with age: approximately 23% of adults aged 18 to 65 years and 35.7% of elderly people over 65 years of age were affected by this disabling condition [3]. Patients with HV may present with aching pain at the deviated metatarsal head that eases when they remove their shoes. A variety of painful sequelae are associated with HV, including but not limited to the following: synovitis of the 1st metatarsophalangeal (MTP) joint, plantar-central osteochondral lesion of the metatarsal head, plantar plate degeneration and tearing, entrapment of the medial dorsal cutaneous nerve with burning or tingling pain, and inflammation of the medial bursa [4, 5]. On physical examination, the great toe is visibly deformed and may present with swelling, skin discoloration or callus, and sharp tenderness at the metatarsal and/or metatarsal-sesamoid joint during ambulation [6].

Imaging is commonly used in addition to history and physical examination to determine the severity of HV and to plan its management. Plain radiographs are the primary imaging modality for diagnosing HV and are mainly obtained in the anteroposterior (AP) weight-bearing dorsoplantar, lateral, and axial views [7]. Quantitative and qualitative radiographic foot evaluations are made on these views, and the AP projection is typically the primary view used for radiographic severity assessment. Cross-sectional imaging of the foot can be performed using magnetic resonance imaging (MRI), typically to further evaluate articular wear of the MTP and metatarsal-sesamoid joints, intermetatarsal and adventitial bursitis, or other causes of similar foot pain, e.g., Joplin’s neuroma or Heuter’s neuroma. Quantitative radiographic measurements used to define HV severity on the AP view include intermetatarsal angle (IMA), HV angle (HVA), tibial sesamoid position (TSP), metatarsus adductus angle (MAA), transverse osseous foot width, 1st MT length, and distal metatarsal articular angle (DMAA). Qualitative assessments include MTP osteoarthritis (OA) and the lateral round sign of the first metatarsal head. Table 1 provides a summary of these measurements [7, 8].

Table 1 Definitions and normal values for quantitative foot measurements on the AP view

Inter-reader reliability (IRR) quantifies the agreement between quantitative foot measurements made by multiple readers. A retrospective study of 56 patients found that IRR was excellent for HVA at 0.94 and was 0.76 for IMA [9]. Another retrospective study reported that HVA measurements were reliable on both radiographs and MRI [6]. These two studies were limited by their retrospective nature, small sample size, and assessment of only HVA and IMA among the measurements described above.

A prospective multicenter study of HV treatment was led by our institution [10], and standardized AP weight-bearing radiographs were obtained in all HV patients. The primary aim of this report was to assess the inter-reader reliability of the above-described measures for HV assessment and to report the mean differences. The secondary aim was to correlate the consensus reads with patient-reported outcome measures (PROMs) collected at the initial presentation for pre-operative assessment. We hypothesized that expert reads show good inter-reader reliability and that worsening measures correlate with poorer PROMs.

Materials and methods

This report extracts and analyzes data from a prospective, single-arm, level 3, multicenter clinical trial. Informed consent was obtained from all patients as they enrolled for the surgical treatment of their HV deformity, and local institutional review board (IRB) approvals were in place.

Patients

The patients were recruited from seven US-based centers by 13 foot and ankle surgeons. All patients in this study underwent 1st TMT joint realignment arthrodesis for symptomatic HV deformity. The trial was designed around this procedure because an IMA above 12–13° requires proximal realignment, and this biplanar procedure allows earlier weight-bearing than the standard Lapidus procedure. Inclusion criteria were symptomatic HV in patients between 14 and 58 years of age, intermetatarsal angles between 10.0 and 22.0°, and hallux valgus angles between 16.0 and 40.0°. The age range was set as an enrollment criterion for this multicenter trial because older patients may have other confounding factors, such as osteoarthritis and hallux rigidus, leading to rigid and potentially uncorrectable deformity. Exclusion criteria consisted of prior history of HV surgery, BMI > 40 kg/m², diabetes mellitus with HbA1c ≥ 7, evidence of peripheral neuropathy, significant metatarsus adductus (≥ 23°), moderate to severe osteoarthritis of the first metatarsophalangeal (MTP) joint complex, and current use of nicotine products (Fig. 1 shows the flow chart with study sample). Additionally, patients not meeting the age, IMA, or HVA inclusion criteria were excluded (patients older than 58, patients with IMA over 22°, and patients with HVA above 40°). Patient demographics recorded included age, gender, BMI, and laterality of the foot.

Fig. 1

Inclusion and exclusion flowchart. A total of 183 patients consented. Ten patients were screened out due to the use of allogenic bone graft at the time of index, pregnancy after consent, MAA > 23°, and age > 55 at the time of consent. Six patients consented but did not have the surgery within 90 days and were considered screen failures

Radiographs

Radiographs were obtained pre-operatively and at regular intervals up to 36 months post-operatively. The imaging used for this study was the baseline weight-bearing AP view, obtained in a pre-defined standardized manner covering the whole foot from the distal phalanges to the calcaneus and proximal ankle joint. The tube-film distance was 100 cm (40″), and the beam was centered at the base of the 3rd metatarsal/3rd cuneiform and directed 15° posteriorly towards the heel. No filter or bucky/grid was used. Exposure factors were 55 kVp and 3.2 mAs, and the beam was collimated to the outer skin margins of the foot on all four sides. The images and patient information were uploaded as anonymized cases to the electronic data capture software (ClindexLive, Fortress Medical Systems, Hopkins, MN).

Measurements

Two experienced, fellowship-trained musculoskeletal (MSK) radiologists reviewed all images on thin-client PACS software (AG Mednet, Inc., Boston, MA). The readers had a training session on 20 HV images, and a manual describing and illustrating all of the above-described measurements was shared with them (Fig. 2 shows the measurements). Following image quality assessment, the measurements were performed prospectively and independently at two different institutions, with each reader blinded to the other reader’s findings and to the clinical findings. Cases in which the readers differed by more than 3 mm or 3° were re-read by the senior MSK reader. The administrative team presented these discrepant cases to the senior reader, who evaluated the studies again in light of both readers' previous interpretations, and the consensus reads were used for the final correlation. The consensus read was done more than 6 months after the initial reads. The measurements were recorded as follows:

  • HVA (hallux valgus angle): in degrees

  • IMA (intermetatarsal angle): in degrees

  • Lateral round sign present: positive/negative

  • TSP (tibial sesamoid position): 1–7

  • MAA (metatarsus adductus angle): in degrees

  • Transverse osseous foot width: in mm

  • 1st metatarsal length: in mm

  • Distal metatarsal articular angle (DMAA): in degrees

  • MTP (metatarsophalangeal) osteoarthritis (AP view): none/mild-moderate/severe
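
The re-read rule described above (a difference of more than 3 mm or 3° between readers) can be sketched as a simple check. This is an illustration only, not the trial's software: the field names and example values are hypothetical, and the text does not say how disagreements on the categorical measures (lateral round sign, TSP, MTP OA) were handled, so they are left out.

```python
# Hypothetical field names; the 3-unit threshold follows the text above.
THRESHOLD = 3.0  # applies to degrees (angles) and millimetres (lengths)

CONTINUOUS_FIELDS = {
    "HVA": "deg",
    "IMA": "deg",
    "MAA": "deg",
    "DMAA": "deg",
    "foot_width": "mm",
    "mt1_length": "mm",
}


def needs_consensus(read_a, read_b, threshold=THRESHOLD):
    """Return the continuous fields whose two reads differ by more than threshold."""
    flagged = []
    for field in CONTINUOUS_FIELDS:
        if abs(read_a[field] - read_b[field]) > threshold:
            flagged.append(field)
    return flagged


# Illustrative reads for one foot (made-up values, not study data).
reader_1 = {"HVA": 28.0, "IMA": 14.5, "MAA": 15.0, "DMAA": 9.0,
            "foot_width": 92.0, "mt1_length": 61.0}
reader_2 = {"HVA": 32.5, "IMA": 13.0, "MAA": 16.0, "DMAA": 12.0,
            "foot_width": 93.5, "mt1_length": 60.0}

print(needs_consensus(reader_1, reader_2))
```

In this example only HVA is flagged: the DMAA reads differ by exactly 3°, which does not exceed the threshold under the "more than" wording.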

Fig. 2

Hallux valgus assessment on AP view. A HVA. B IMA. C DMAA. D MAA. E 1st MT length. F TSP. G Transverse osseous foot width. H Lateral round sign of 1st MT

PROMs

The PROMs were collected at the initial clinic visit, when the radiographs were also obtained. They included the visual analog scale (VAS), the Manchester-Oxford Foot Questionnaire (MOxFQ), and the Patient-Reported Outcomes Measurement Information System (PROMIS-29). These were stored in the ClindexLive software for future correlations.

Statistical analysis

The patient demographics were expressed as means ± standard deviations. The intraclass correlation coefficient (ICC) and kappa were obtained for inter-reader analysis. Bland–Altman plots were generated, and mean differences for the various measurements were calculated. A partial Spearman rank-order correlation, controlling for age and BMI, was performed between the consensus reads and PROMs (Manchester-Oxford Foot Questionnaire, PROMIS subscales). PROMs were scaled to a 100-point scale. Correlation coefficients were interpreted as negligible (0–0.1), weak (0.1–0.39), moderate (0.4–0.69), strong (0.7–0.89), and very strong (0.9–1). False discovery rate (FDR)-adjusted p values were flagged as p < 0.0001 (****), p < 0.001 (***), p < 0.01 (**), and p < 0.05 (*); no asterisk indicates p > 0.05. All analyses were done in R version 4.1.1 (R Core Team, Vienna, Austria).
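
For readers unfamiliar with partial rank correlation, the idea can be sketched as follows. This is a minimal Python illustration rather than the R code used in the study (which is not shown in the text). It assumes the standard recursive partial-correlation formula applied to ranks; the function names and example values are hypothetical.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_spearman(x, y, z1, z2):
    """Spearman correlation of x and y, controlling for covariates z1 and z2."""
    rx, ry, r1, r2 = ranks(x), ranks(y), ranks(z1), ranks(z2)
    # First-order partials, each controlling for z1.
    xy_1 = partial(pearson(rx, ry), pearson(rx, r1), pearson(ry, r1))
    x2_1 = partial(pearson(rx, r2), pearson(rx, r1), pearson(r2, r1))
    y2_1 = partial(pearson(ry, r2), pearson(ry, r1), pearson(r2, r1))
    # Second-order partial: additionally control for z2.
    return partial(xy_1, x2_1, y2_1)


# Made-up values for eight patients (not study data).
foot_width = [88, 92, 95, 90, 99, 101, 94, 97]   # mm
moxfq = [55, 48, 40, 52, 35, 38, 45, 30]         # 0-100 scale
age = [34, 41, 29, 50, 38, 45, 27, 56]           # years
bmi = [24.1, 27.3, 22.8, 30.2, 25.5, 28.9, 23.4, 26.7]

print(round(partial_spearman(foot_width, moxfq, age, bmi), 3))
```

The sketch returns only the coefficient; confidence intervals and p values, as reported in the Results, would need additional steps.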

Results

The mean age of the study cohort was 40.77 years, and the mean body mass index was 26.11 kg/m². Females made up 91.2% of the cohort and males 8.7%. Of the 183 patients in the study cohort, 48.63% underwent imaging of the left foot and 51.36% underwent imaging of the right foot (Table 2). All AP views were of good quality and were used for making quantitative foot measurements. The mean and standard deviation of quantitative foot measurements are summarized in Table 3.

Table 2 Study cohort characteristics
Table 3 The mean and standard deviation of quantitative foot measurements. Calculated using consensus reads

The inter-reader reliability was 0.92 for IMA (excellent agreement), 0.96 for HVA (excellent agreement), 0.32 for lateral round sign (poor agreement), 0.73 for TSP (good agreement), 0.67 for MAA (good agreement), 0.99 for transverse osseous foot width (excellent agreement), 0.8 for DMAA (excellent agreement), and 0.48 for MTP OA (fair agreement) (Table 4). Overall, this cohort did not have severe symptoms. Bland–Altman plots for IMA, HVA, and MAA are shown in Figs. 3, 4, and 5.
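
As a rough illustration of how an ICC of this kind is computed from paired reads, the sketch below implements the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1). The text does not state which ICC form was used, so this choice is an assumption. The cut-offs in `agreement_label` are also an assumption: they are not given in the text, but they reproduce every label reported above and match the commonly cited bands (below 0.40 poor, 0.40–0.59 fair, 0.60–0.74 good, 0.75 and above excellent). The paired IMA values are made up.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per subject, one column per reader.
    """
    n = len(ratings)       # subjects
    k = len(ratings[0])    # readers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def agreement_label(icc):
    """Assumed cut-offs, chosen to be consistent with the labels reported above."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Made-up paired IMA reads in degrees (reader 1, reader 2), not study data.
ima_reads = [[12.0, 12.5], [15.5, 15.0], [18.0, 17.0], [10.5, 11.0],
             [14.0, 14.5], [20.0, 19.5], [13.0, 13.0], [16.5, 17.5]]

value = icc_2_1(ima_reads)
print(round(value, 3), agreement_label(value))
```

Kappa, which the Methods mention alongside the ICC, would be the usual choice for categorical items such as the lateral round sign and is not covered by this sketch.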

Table 4 Inter-reader reliability as measured using intraclass correlation coefficient (ICC)
Fig. 3

Bland–Altman plot: IMA. The x axis represents the average of the two paired measures, and the y axis represents the difference between them. The dashed lines represent two standard deviations on either side of the mean difference, and the solid line represents the mean difference across all patients

Fig. 4

Bland–Altman plot: HVA

Fig. 5

Bland–Altman plot: MAA
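
The quantities plotted in the Bland–Altman figures above can be computed from paired reads as sketched below. The limits use two standard deviations, following the Fig. 3 caption; many analyses use 1.96 instead, which the hypothetical `n_sd` parameter allows. The example values are made up, and plotting is omitted.

```python
import statistics


def bland_altman(reads_a, reads_b, n_sd=2.0):
    """Per-pair means and differences, plus bias and limits of agreement."""
    diffs = [a - b for a, b in zip(reads_a, reads_b)]
    means = [(a + b) / 2 for a, b in zip(reads_a, reads_b)]
    bias = statistics.mean(diffs)   # solid line: mean difference
    sd = statistics.stdev(diffs)    # sample standard deviation of the differences
    return {
        "means": means,             # x axis
        "diffs": diffs,             # y axis
        "bias": bias,
        "lower": bias - n_sd * sd,  # lower dashed line
        "upper": bias + n_sd * sd,  # upper dashed line
    }


# Made-up paired angle reads in degrees (not study data).
reader_a = [10.0, 12.0, 14.0, 16.0]
reader_b = [9.0, 13.0, 13.0, 17.0]

result = bland_altman(reader_a, reader_b)
print(result["bias"], round(result["lower"], 3), round(result["upper"], 3))
```

A bias near zero with narrow limits indicates that neither reader measures systematically higher than the other.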

Spearman rank correlation between radiographic reads and PROMs detected a negative correlation between transverse osseous foot width and MOxFQ (R = −0.20, 95% CI [−0.35, −0.06], p = 0.006), PROMIS physical (R = −0.15, 95% CI [−0.29, 0.0], p = 0.048), and VAS (R = −0.089, 95% CI [−0.24, 0.06], p = 0.02). This indicates that increasing transverse osseous foot width (a larger measurement) correlated with worsening PROMIS physical, better MOxFQ, and better VAS. Additionally, increased 1st MT length was negatively correlated with PROMIS sleep (R = −0.18, 95% CI [−0.32, −0.03]). This means that a longer 1st MT correlated with better sleep, as HV leads to shortening of the 1st MT. There were no other significant correlations between the consensus reads of quantitative foot measurements and PROMs (supplemental).
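
To put these coefficients in context, the sketch below applies the strength bands and significance flags defined in the statistical analysis section. Two points are assumptions. First, the text says only that p values were FDR-adjusted, so the Benjamini–Hochberg procedure shown here is an assumed, commonly used choice. Second, the bands in the text leave the boundaries open (0.1 appears in both negligible and weak), so the sketch treats each lower bound as inclusive. The p values in the example are made up.

```python
def strength(r):
    """Strength label for a correlation coefficient, using the bands in the text."""
    a = abs(r)
    if a < 0.1:
        return "negligible"
    if a < 0.4:
        return "weak"
    if a < 0.7:
        return "moderate"
    if a < 0.9:
        return "strong"
    return "very strong"


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p value down, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def stars(p):
    """Significance flag for an (adjusted) p value."""
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Coefficients reported above for foot width vs MOxFQ and PROMIS physical,
# and for 1st MT length vs PROMIS sleep.
for r in (-0.20, -0.15, -0.18):
    print(r, strength(r))

# Made-up raw p values (not study data) to show the adjustment.
raw = [0.01, 0.04, 0.03, 0.005]
for p_raw, p_adj in zip(raw, bh_adjust(raw)):
    print(p_raw, round(p_adj, 4), stars(p_adj))
```

Under these bands, the three coefficients shown all fall in the weak range.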

Discussion

Hallux valgus is a common foot deformity that causes significant pain and functional disability. Due to the tri-planar nature of the HV malalignment, many different quantitative foot measurements (mentioned above) are used to determine severity and for surgical planning [7]. This is the largest sample to date illustrating the inter-reader reliability of multiple quantitative foot measurements in patients with HV, and several of these measurements, such as the lateral round sign, were studied in this manner for the first time. There was excellent IRR for HVA, IMA, transverse osseous foot width, and DMAA, good agreement for TSP and MAA, fair agreement for MTP OA, and poor agreement for lateral round sign. This is also the first report of the correlation of these qualitative and quantitative parameters with PROMs. We report that increasing transverse osseous foot width correlated with worsening PROMIS physical, better MOxFQ, and better VAS. Additionally, we report that a longer 1st MT correlated with better sleep.

Our findings of excellent inter-reader reliability for HVA, DMAA, and IMA are in line with previous small-scale studies [6, 11, 12]. To the best of our knowledge, no previous study has measured inter-reader reliability for transverse osseous foot width. Our finding that there is excellent reliability for this measurement is promising and validates this measurement as a reproducible way to measure hallux valgus severity and for surgical planning. A major strength of this study is that standardized radiographs were obtained in weight-bearing positions at different institutions with strict quality controls. This may have facilitated excellent reproducibility in multiple measurements.

MAA, along with HVA and IMA, is one of the common measurements used in HV patients, as increasing metatarsus adductus can lead to underestimation of IMA and potential undercorrection of the HV deformity. The good agreement for MAA is in line with previous literature. A study on intra- and interobserver reliability found the interobserver reliability of MAA to be 0.62 (95% CI: 0.452–0.760) [13]. The lower reliability for MTP OA and TSP may be due to their ordinal nature. The poor agreement for lateral round sign may be due to the binary nature of this measurement and the difficulty of subjectively assessing the distal plantar projection of the 1st metatarsal head in the presence of metatarsal pronation. The poor agreement for lateral round sign leads us to conclude that, in isolation, it should not be used as an indicator of HV severity or for pre-operative planning. Inter-reader reliability for MTP OA and lateral round sign also has not been studied previously.

To the best of our knowledge, no studies have examined how pre-operative radiographic foot measurements correlate with pre-operative PROMs. On the other hand, some studies have correlated radiographic foot measurements with post-operative PROMs. For example, in a retrospective study of 80 patients presenting to a single urban foot and ankle specialty clinic, Matthews et al. correlated pre- and post-operative radiographic foot measurements with post-operative Foot and Ankle Outcome Score (FAOS) subscales. This study found only a minimal correlation between radiographic foot measurements and post-operative FAOS subscales [14].

Increased transverse osseous foot width was correlated with worsening PROMIS physical but with better MOxFQ and VAS scores. Because these findings contradict one another, we believe these significant correlations are likely spurious. Interpreted in the broader context of there being no appreciable trends between pre-operative radiographic foot measurements and pre-operative PROMs, these findings suggest that pre-operative radiographic foot measurements do not effectively measure the quality of life in patients presenting with HV deformity before surgery.

This study had a few limitations. First, the cohort comprised pre-operative patients, the majority of whom were female, and is not entirely representative of the general population. Thus, the results should be considered in that context. Second, patients above the age of 58 were excluded from the study, which limits its generalizability. Patients older than 58 represent one-third of the population with HV [10], and PROMs have also been shown to vary with aging [15]. Third, there was a tenfold female predominance, which also limits the generalizability of our study. The strengths of the study are standardized radiographic positioning, radiographic measurements by two fellowship-trained radiologists, a large consecutive sample of patients referred for HV surgery, and PROMs that were prospectively collected in a uniform manner. In the future, a wider spectrum of patients may be studied to examine whether the results are similar and whether PROMs improve with improving measurements.

In conclusion, we report good to excellent IRR for the most commonly used measurements for HV assessment on the dorsoplantar AP view radiograph and poor agreement for the lateral round sign. We report no major trends in the correlation between the quantitative radiographic foot measurements and PROMs.