Introduction

The proliferation of patient-focused questionnaires for measuring health status and health-related quality of life reflects growing international interest in surgical outcomes assessed from the perspective of patients or their non-clinician caregivers (e.g., parents of patients too young to self-report), for use in clinical research evaluating surgical effectiveness.

Orthopedics covers a broad anatomical range and therefore a wide range of functional problems. Although generic instruments are potentially responsive to clinically important changes in health, disease-specific measures can give a more detailed picture of the symptoms and impairments related to a specific disease. Further challenges arise when clinical outcomes are assessed in children: instruments originally developed for adults may not capture the pediatric patient’s perspective and may not accurately reflect how children relate to their usual environments. While adult patient-reported outcome measures (PROMs) already exist for foot and ankle disease, only one questionnaire has been validated for pediatric foot and ankle disorders: the Oxford Ankle Foot Questionnaire (OAFQ) [1, 2].

The purpose of this study was to translate the OAFQ into Italian, adapt it cross-culturally and evaluate the psychometric properties of the Italian version.

Materials and methods

This study was performed according to the guidelines for Good Clinical Practice and the Declaration of Helsinki. All participants and their parents gave informed consent to participate. A license for the use of the OAFQ was obtained from the copyright holder (Isis Outcomes, Oxford, UK). Translation followed the guidelines proposed by Guillemin et al. [3] for the cross-cultural adaptation of health-related quality of life (HRQL) measures, with forward and backward translation and counter-translation. The OAFQ was independently translated into Italian by two informed translators, both orthopedic surgeons who were native Italian speakers fluent in English. The first Italian version was agreed at a consensus meeting of the two translators. This provisional version was translated back into English by three translators who were naive to the outcome measure: two native Italian speakers fluent in English with a medical background, and one native English speaker fluent in Italian. A second consensus meeting of all translators was held to check for discrepancies or other problems. The original English version and the three back-translations were found to be semantically similar, and no further adjustments were required. The final Italian version was obtained after pretesting on fifteen pediatric patients with foot and ankle disease and their parents, who discussed any perceived difficulties of comprehension. These patients were asked to rate the importance of each item (very important, important, unimportant or indifferent). None of them reported problems completing the questionnaires because of language or redundancy.

Patients

Children aged 8 to 13 years were eligible for this study if they had been diagnosed with symptomatic flexible flatfoot by an orthopedic surgeon and referred for surgical correction. Clinical diagnosis of flatfoot was based on a valgus position of the heel (>10°) and collapse of the medial longitudinal arch [4, 5]. Weight-bearing anteroposterior (AP) and lateral radiographs were used to confirm the clinical diagnosis, assessing the talus-second metatarsal angle, the talonavicular coverage angle, the calcaneal pitch angle, the talo-first metatarsal angle and the medial cuneiform-fifth metatarsal height.

Children and parents were asked to complete the separate “child/adolescent” and “parent” versions of the Italian OAFQ. In addition, children completed the Child Health Questionnaire–Child Form 87 (CHQ-CF87) and parents completed the Child Health Questionnaire–Parent Form 50 (CHQ-PF50) [6].

All patients underwent surgical correction of the deformity by subtalar arthroereisis [7]. At the pre-admission visit, patients and their parents were asked to fill out the Italian versions of the OAFQ, the CHQ-CF87 and the CHQ-PF50. All questionnaires were scored as recommended for the original versions.

The OAFQ includes 15 items, 14 of which are grouped into three subscales: physical (6 items), emotional (4 items) and school and play (4 items). Each item has a five-point Likert-type response format ranging from 0 (poor function) to 4 (good function). The last item, which belongs to no subscale, records the patient’s satisfaction with the footwear they can wear. Missing OAFQ data were imputed if half or more of the items in a subscale were present; in these cases, each missing item was given the mean of the subject’s own item scores on that subscale [1]. Item scores were summed separately for the three subscales and transformed to a percentage scale (0–100), where a higher score represents better function.

The CHQ is a generic health instrument developed to assess the physical and psychosocial well-being of children independently of the underlying disease. The parent version (CHQ-PF50) and the child version (CHQ-CF87) consist of 50 and 87 items, respectively, and cover concepts of physical and psychosocial health and the disabilities associated with them. These concepts are: global health (GGH), general health perceptions (GH), physical functioning (PF), role functioning–emotional (RE), role functioning–behavioral (RB), role functioning–physical (RP), bodily pain (BP), behavior (BE), global behavior (GBE), mental health (MH), self-esteem (SE), family activities (FA), family cohesion (FC), parental impact–emotional (PE), parental impact–time (PT) and change in health (CH). In the CHQ-PF50, RE and RB are combined into a single concept, role functioning–emotional and behavioral (REB).
PE and PT are not applicable to the CHQ-CF87, because that questionnaire is answered by the children themselves. Each health concept comprises either a single item or multiple items, and items have four to six response levels. For concepts with multiple items, the item responses are summed and transformed to a scale ranging from 0 (lowest possible score, indicating the worst health) to 100 (highest possible score, indicating the best health). The CHQ-PF50 and CHQ-CF87 therefore cover 15 and 14 health concepts, respectively. The two forms contain similar but not identical items, and using both is advised to capture children’s perspectives on their general health status.
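Both scoring procedures above end with a summed score rescaled to 0–100. The minimal Python sketch below assumes a linear transformation, (raw − lowest possible) / (highest possible − lowest possible) × 100. For an OAFQ subscale with items scored 0–4, this reduces to raw / (4 × number of items) × 100. The sketch also applies the OAFQ imputation rule described above. The helpers `linear_transform` and `oafq_subscale_score` are hypothetical names. The official OAFQ and CHQ scoring manuals take precedence over this sketch, for example on any item recoding.

```python
from statistics import mean
from typing import Optional, Sequence


def linear_transform(raw: float, lowest: float, highest: float) -> float:
    """Rescale a raw summed score to 0-100 (assumed linear transformation)."""
    return 100 * (raw - lowest) / (highest - lowest)


def oafq_subscale_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Hypothetical helper: score one OAFQ subscale on a 0-100 scale.

    `responses` holds the item answers (0 = poor function ... 4 = good
    function), with None marking a missing answer. Returns None when fewer
    than half of the items were answered, because imputation is not allowed.
    """
    if not responses:
        raise ValueError("a subscale needs at least one item")
    answered = [r for r in responses if r is not None]
    if 2 * len(answered) < len(responses):
        return None  # fewer than half the items present: no score
    own_mean = mean(answered)
    # Each missing item takes the mean of the subject's own answered items
    completed = [own_mean if r is None else r for r in responses]
    n_items = len(responses)
    return linear_transform(sum(completed), lowest=0, highest=4 * n_items)


# Example: a 4-item emotional subscale with one missing answer
print(oafq_subscale_score([4, None, 2, 2]))        # about 66.67
print(oafq_subscale_score([4, None, None, None]))  # None (too many missing)
```

For a multi-item CHQ scale, `lowest` and `highest` would be the minimum and maximum possible raw sums of that scale, taken from the CHQ manual. This sketch does not assume any CHQ-specific values.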

The OAFQ was re-administered to patients and parents in the hospital on the day before surgery, approximately 1 to 2 weeks after the pre-admission visit. An interval of about one week was considered acceptable. It was short enough for the clinical situation to remain unchanged, yet long enough for patients not to remember their answers to the first questionnaire.

Approximately 6 months after surgery, the OAFQ was administered again to patients and parents at a follow-up visit. The CHQ-PF50 and CHQ-CF87 were administered only once, at the pre-admission visit. The OAFQ was administered three times to all patients and parents (pre-admission visit, day before surgery and last follow-up visit).

Statistical analyses

Data were entered into a Microsoft Excel spreadsheet (Microsoft Corporation, Redmond, WA) and analyzed using SPSS 20.0 (SPSS Inc., Chicago, IL). Normal distribution of the scores was tested using the Kolmogorov–Smirnov test. All tests were two-tailed and conducted at a 5 % level of significance.

Reliability

Test–retest reliability was evaluated by comparing domain scores from questionnaires completed at baseline and again within approximately 2 weeks, using the intraclass correlation coefficient (ICC; model: two-way random; type: absolute agreement). An ICC above 0.80 is usually considered to indicate good reproducibility [8, 9]. Child–parent consistency for each domain score was also assessed with the ICC. The smallest detectable difference (SDD) was calculated as 1.96 × √2 × SEM. The standard error of measurement (SEM) is defined as S√(1 − r), where S is the standard deviation of the test results and r is the test–retest reliability [10]. Internal consistency was assessed by calculating Cronbach’s α, and the consistency of each item with its subscale total was evaluated using item-total correlations [11]. Cronbach’s α values between 0.7 and 0.9 are considered acceptable for psychometric scales [12]. For a scale to be considered sufficiently reliable for use in groups of patients, its item-total correlations should be above 0.4 [13].
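The test–retest quantities defined above can be sketched in Python using only the standard library. The sketch rests on two assumptions. First, the ICC is taken to be the single-measure form of the two-way random-effects, absolute-agreement model (ICC(2,1) in Shrout and Fleiss's notation), computed from two-way ANOVA mean squares; the text does not state whether single or average measures were reported. Second, S in the SEM is assumed to be the standard deviation of the baseline scores. The study itself used SPSS, and the function names here are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def icc_2way_random_absolute(scores: Sequence[Sequence[float]]) -> float:
    """Single-measure ICC, two-way random effects, absolute agreement.

    scores[i][j] is the score of subject i on occasion j (e.g. test, retest)
    or from rater j (e.g. child, parent).
    """
    n = len(scores)     # subjects
    k = len(scores[0])  # occasions or raters
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, occasions, residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    # Mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: S * sqrt(1 - r)."""
    return sd * sqrt(1 - reliability)


def sdd(sd: float, reliability: float) -> float:
    """Smallest detectable difference: 1.96 * sqrt(2) * SEM."""
    return 1.96 * sqrt(2) * sem(sd, reliability)


# Toy data (not study data): three subjects, test and retest
print(round(icc_2way_random_absolute([[1, 2], [3, 3], [5, 6]]), 3))  # 0.923
# Toy values: S = 10 points, r = 0.91
print(round(sdd(10.0, 0.91), 2))  # 8.32
```

The absolute-agreement form penalizes systematic shifts between occasions through the `ms_cols` term. A consistency-type ICC would ignore such shifts.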

Convergent validity

Convergent validity was evaluated for each domain with Spearman’s correlation coefficient, using a priori hypotheses about the correlations with CHQ domains. For convergent validity, we hypothesized moderate to high correlations between the OAFQ subscales and the CHQ physical components (PF, RP, BP). For divergent validity, we hypothesized lower correlations between the OAFQ subscales and the CHQ mental health components (GH, GGH, REB, GBE, MH, SE, FA, FC). According to Dawson et al. [14], a correlation coefficient of 0.60–0.79 indicates a strong correlation and 0.40–0.59 a moderate correlation. These psychometric properties were evaluated on the baseline data.
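The validity check pairs a Spearman coefficient with the bands cited from Dawson et al. The dependency-free sketch below averages ranks over ties and computes ρ as Pearson's r on the ranks. The labels for |ρ| < 0.40 and |ρ| ≥ 0.80 are placeholders of our own, because the text defines only the moderate and strong bands. In practice, `scipy.stats.spearmanr` returns the same coefficient together with a p value. The helper names are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as Pearson's r computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def dawson_band(rho: float) -> str:
    """Bands from the text; labels outside 0.40-0.79 are placeholders."""
    r = abs(rho)
    if r >= 0.80:
        return "above strong (not banded in the text)"
    if r >= 0.60:
        return "strong"
    if r >= 0.40:
        return "moderate"
    return "below moderate"
```

Under the a priori hypotheses, OAFQ vs PF/RP/BP pairs would be expected to fall in the moderate or strong bands. OAFQ vs the mental health components would be expected to fall lower.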

Feasibility was assessed from the number of missing responses and from floor and ceiling effects. Floor and ceiling effects were considered present if more than 15 % of participants achieved the lowest or highest possible score, respectively [15].
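The 15 % floor and ceiling criterion can be expressed as a short check. This sketch assumes subscale scores are already on the 0–100 scale, and `floor_ceiling` is a hypothetical helper. The demo list is synthetic and matches only the reported count of 14 of 61 children scoring at the maximum of the school and play domain.

```python
from typing import Dict, Sequence

THRESHOLD = 0.15  # effect present if MORE than 15 % of participants are at the limit


def floor_ceiling(scores: Sequence[float], lowest: float = 0.0,
                  highest: float = 100.0) -> Dict[str, object]:
    """Share of participants at the scale limits and whether each exceeds 15 %."""
    n = len(scores)
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return {
        "floor_pct": 100 * floor_share,
        "ceiling_pct": 100 * ceiling_share,
        "floor_effect": floor_share > THRESHOLD,
        "ceiling_effect": ceiling_share > THRESHOLD,
    }


# Synthetic scores: 14 of 61 at the maximum, the rest arbitrary
demo = [100.0] * 14 + [75.0] * 47
result = floor_ceiling(demo)
print(round(result["ceiling_pct"], 1), result["ceiling_effect"])  # 23.0 True
```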

Responsiveness

Responsiveness to surgical treatment was evaluated using Student’s t test to compare pre- and postoperative scores. Score changes were interpreted using the standardized effect size (ES) and the standardized response mean (SRM), both calculated as described by Husted et al. [16]; values >0.20, >0.50 and >0.80 were considered small, moderate and large, respectively.
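The sketch below computes the responsiveness statistics using the conventional definitions, which are assumed here rather than confirmed from Husted et al. [16]:

- ES is the mean change divided by the SD of the baseline scores.
- SRM is the mean change divided by the SD of the individual changes.
- The t test is assumed to be the paired form, because the same patients were measured twice.

The p value would come from a t distribution with n − 1 degrees of freedom, for example via `scipy.stats.ttest_rel`. The standard library does not provide that distribution. Function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def responsiveness(pre: Sequence[float],
                   post: Sequence[float]) -> Tuple[float, float, float]:
    """Return (effect size, standardized response mean, paired t statistic)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must pair the same patients")
    changes = [b - a for a, b in zip(pre, post)]
    mean_change = mean(changes)
    sd_change = stdev(changes)
    es = mean_change / stdev(pre)   # change relative to baseline spread
    srm = mean_change / sd_change   # change relative to spread of changes
    t = mean_change / (sd_change / sqrt(len(changes)))
    return es, srm, t


def magnitude(value: float) -> str:
    """Thresholds from the text: >0.20 small, >0.50 moderate, >0.80 large."""
    v = abs(value)
    if v > 0.80:
        return "large"
    if v > 0.50:
        return "moderate"
    if v > 0.20:
        return "small"
    return "below small"  # placeholder label; not defined in the text


# Toy scores (not study data)
es, srm, t = responsiveness([50, 60, 70], [60, 75, 80])
print(round(es, 2), round(srm, 2), round(t, 2))  # 1.17 4.04 7.0
```

ES and SRM can diverge. ES is small when baseline scores vary widely. SRM is large when all patients improve by a similar amount.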

Results

The forward and back-translations of the OAFQ posed no major linguistic difficulties. A total of 61 consecutive patients (28 girls, 33 boys; mean age 10.9 ± 1.6 years) were enrolled. Each participant and one parent completed the questionnaires before surgery and at the follow-up visit (Table 1). The mean interval between completion of the baseline and retest questionnaires was 10.9 days (SD 4.0; range 6–20). All 61 patients and their parents took part in the reliability and responsiveness assessments. No data were missing at any of the three assessments.

Table 1 Baseline and follow-up scores from the Oxford Ankle Foot Questionnaire

Reliability

Table 2 shows the reliability parameters of the questionnaire. High ICC values confirmed test–retest reliability for both child and parent subscales. The SDD of child-reported scores was 17.0, 7.7 and 7.0 for the physical, school and play, and emotional subscales, respectively. For parent-reported scores, it was 3.3, 12.9 and 7.8, respectively.

Table 2 Reliability coefficients for internal consistency, child- and parent-reported scores and stability (test–retest)

Cronbach’s α was greater than 0.7 in all OAFQ domains for both the child and parent forms. All items in the three subscales had item-total correlations greater than 0.4, except for the last item of the “school and play” subscale in the child form (Table 3).
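The two internal-consistency checks reported here can be sketched as follows. Cronbach’s α uses the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the subscale total). For the item-total correlations, we assume the corrected form, in which each item is correlated with the sum of the other items in its subscale. The text does not say whether the corrected or uncorrected form was used. Function names and demo data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """items[j][i] is the score of respondent i on item j of one subscale."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(items: Sequence[Sequence[float]]) -> List[float]:
    """Correlate each item with the total of the remaining items."""
    totals = [sum(person) for person in zip(*items)]
    return [pearson(item, [t - v for t, v in zip(totals, item)])
            for item in items]


# Toy responses (not study data): 2 items x 4 respondents
demo = [[1, 2, 3, 4], [2, 1, 4, 3]]
print(round(cronbach_alpha(demo), 2))                       # 0.75
print([round(r, 2) for r in corrected_item_total(demo)])    # [0.6, 0.6]
# Items failing the > 0.4 criterion would be listed here (none in the toy data)
flagged = [j for j, r in enumerate(corrected_item_total(demo)) if r <= 0.4]
```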

Table 3 Item-total correlations of the four domains

At baseline, no floor effect was detected in any domain of either version. A ceiling effect was observed in the school and play domain for 14 children (23 %) and 10 parents (16.4 %), and in the emotional domain for 17 children (27.9 %).

Convergent validity

Tables 4 and 5 show the correlations between the OAFQ and the CHQ for the child and parent forms, respectively. For both forms, the OAFQ domains correlated significantly with CHQ scales of related content, particularly in the areas of physical function and pain.

Table 4 Correlation (Spearman’s ρ) between the OAFQ (child) subscales and CHQ-CF87 subscales
Table 5 Correlation (Spearman’s ρ) between the OAFQ (parent) subscales and CHQ-PF50 subscales

Responsiveness

The mean follow-up after surgery was 7.2 months (SD 1.6; range 4–12). Mean OAFQ scores improved in all domains after subtalar arthroereisis, on both the child and parent scales (p < 0.01). ES and SRM values were small to moderate for almost all domains (Table 6).

Table 6 Mean OAFQ scores and responsiveness of Italian OAFQ subscales

Discussion

This study presents the cross-cultural adaptation of the OAFQ into Italian. The Italian version showed satisfactory psychometric properties in pediatric patients with flatfoot deformity treated with subtalar arthroereisis. After adaptation, the Italian OAFQ appears to be a feasible instrument: there were no missing data, which reflects good acceptance of the questionnaire. The absence of floor effects in the preoperative evaluation agrees with the original version and supports the feasibility of this version. However, a ceiling effect was detected in the school and play domain for children and parents and in the emotional domain for children (27.9 %). Internal consistency (Cronbach’s α) was satisfactory for all subscales analyzed and similar to that of the original version. The high ICC values for child–parent and test–retest scores indicate good concordance and high reliability, respectively.

Regarding construct validity, moderate to strong correlations were found between the subscales of the Italian OAFQ and the corresponding domains of the Child Health Questionnaire. This was particularly true of the main physical and pain domains, for both children and parents. In the parent version, the emotional subscale showed lower correlations with all CHQ domains, a potential bias that should be taken into account when interpreting the results. Parents may not perceive a link between their child’s foot or ankle condition and the child’s emotions. This is even more evident in the correlation between pain and emotion, as these two domains may correlate only in the most severe cases. Interestingly, in the child version, the emotional domain correlated more strongly with the CHQ domains than the school and play domain did. This supports the importance of asking children directly about their symptoms and health-related quality of life, given the well-known discordance between child and parent reports of quality of life [17].

Further studies are needed to better define the role of the emotional subscale in pediatric patients with flexible flatfoot. Despite these observations, both the convergent and the divergent a priori validity hypotheses were satisfied, confirming construct validity.

The responsiveness assessment showed that the tool can identify significant changes associated with surgical treatment (subtalar arthroereisis). In the original study by Morris, patients treated for trauma reported mean improvements of about 40 points in the physical subscale, 50 points in the school and play subscale and 20 points in the emotional subscale [2]. Our improvements were substantially smaller than in that subgroup but similar to those of the elective-patient subgroup in the original study. Score changes exceeded the corresponding SDD only for the emotional subscale in children and the physical subscale in parents. For the other subscales, we therefore cannot exclude that the observed changes fall within measurement error. Further studies are needed to assess the responsiveness of the questionnaire with other treatments. Moreover, the low ES for the physical and emotional subscales could have two explanations. The surgical treatment may have been relatively ineffective, or surgery may have been indicated in patients with relatively good preoperative scores.

This study has limitations. The population considered is not representative of all pediatric patients with foot and ankle disorders. Our results should therefore be confirmed in patients with other conditions to better define the reliability of the questionnaire. Moreover, it was not possible to stratify patients by disease severity, because no validated clinical or radiological classification is available for pediatric flexible flatfoot.

To date, no suitable score for evaluating the foot and ankle in pediatric patients exists in Italy. The present work is the first attempt to provide an effective and reliable tool for evaluating these patients. Although it is still under investigation, this tool could be used in future research to promote a common language in the international setting.