Introduction

Patient-reported outcome (PRO) measures provide insights from the patient’s perspective into the impact of disease and treatment on their health and quality of life. PRO measures are categorized as generic or disease- or joint-specific. Generic measures often reflect health-related quality of life questions that are relevant across different diseases and populations. In contrast, specific measures include areas of importance related to a specific disease. In clinical studies, both generic and disease-specific measures are often included, with disease-specific measures often considered the primary outcome [1].

Numerous PRO measures to evaluate elbow dysfunction have been described, but there is no universal agreement regarding which PROs should be used because many of them lack reliability data [2]. This problem may arise because it is difficult for any single scoring system to adequately capture the impact of disease and treatments across the full spectrum of elbow pathology. The PROs that have been used to assess elbow diseases include the Mayo Elbow Performance Score (MEPS), Oxford Elbow Score (OES), Disabilities of the Arm, Shoulder and Hand (DASH), Visual Analog Scale (VAS) and the Patient-Rated Tennis Elbow Evaluation (PRTEE) [3–6]. The Short-Form Health Survey (SF-36) is a generic score that can be used to establish a health profile of patients with elbow pathology [7]. The MEPS, designed to measure pain, stability, range of motion and the patient’s ability to accomplish functional tasks, is one of the most commonly used physician-based and joint-specific elbow rating systems [3].

Before instruments that evaluate outcome measures can be used in different regions of the world, they must be translated, culturally adapted, and retested to ensure the validity of the revised instruments [8]. In addition, the cross-cultural adaptations may contribute to a better understanding of the measurement properties of the outcome measures. Therefore, the purpose of this study was to translate and culturally adapt the English version of the MEPS into Turkish and investigate the reliability, validity and responsiveness of the translated version.

Methods

Translation and cross-cultural adaptation

Translation and cross-cultural adaptation of the MEPS was performed in five stages, as described by Beaton [8]. The Turkish version of the MEPS was named “MEPS-T.”

Participants

Informed consent was obtained from all of the participants in the study; the informed consent form was approved by Istanbul University Research Foundation (Ethics committee approval date: December 23, 2011, IRB study protocol: 2011/2092-880). The study included patients seen between March 2012 and January 2013 at Istanbul University Department of Orthopedics and Traumatology. The eligibility criteria were as follows: (1) 18 years of age or older; (2) elbow pathology including lateral and medial epicondylitis, bursitis, contractures, osteoarthritis or radial head fracture and (3) the ability to read and write in Turkish. The diagnoses were established by a physician based on the patient history, physical examination and diagnostic imaging results. Maudsley’s and Cozen’s tests for lateral epicondylitis and the golfer’s elbow test for medial epicondylitis were performed. X-rays were taken to diagnose arthritis and fractures of the elbow joint. Range of motion was evaluated for the presence of contracture (Table 1). Patients with a history of inflammatory arthritis, neuropathic pain, gross structural abnormality of the elbow or any acute condition were excluded. In the first assessment, 91 patients with elbow pathology completed the MEPS-T (see the Appendix) and the previously validated Turkish versions of the DASH, SF-36 and VAS [9, 10].

Table 1 Demographics of the patients

Administration of PRO measures

The physical therapists administered the questionnaires in a random order to the patients in a waiting room after the patient’s appointment with an orthopedic surgeon. The “range of motion” and “instability” subscales of the MEPS-T were assessed by the same physical therapist in the first and second assessments. The second assessment, in which the patients were asked to complete the MEPS-T again, occurred 7–14 days after the first MEPS-T to determine the test–retest reliability of the MEPS-T. To minimize the risk of short-term clinical change, no treatment was provided during this period. Responsiveness was assessed in a subgroup of 46 patients diagnosed with lateral epicondylitis who had conservative treatment for 6 weeks at the clinic. The patients were assessed at baseline and after 6 weeks of treatment.

Statistical analysis

All statistical analyses were performed with Stata version 11 (StataCorp LP, TX, USA). Descriptive statistics were calculated for all variables. These included frequency counts and percentages for nominal variables and measures of central tendency (means and medians) and dispersion (standard deviations and ranges) for continuous variables. The measurement properties analyzed in this study for the instruments included internal consistency, test–retest reliability, construct validity and ceiling and floor effects.

Test–retest reliability

Test–retest reliability represents a scale’s capability of yielding consistent results when administered on separate occasions during a period when an individual’s status has remained stable [11]. The patients who reported “no change” in their condition between the first and second assessments were included in the analysis of test–retest reliability. The intraclass correlation coefficient (ICC) was calculated using a 2-way mixed model ANOVA. Values of 0.4 or greater were considered satisfactory (specifically, r = 0.81–1.0 was excellent, 0.61–0.80 was very good, 0.41–0.60 was good, 0.21–0.40 was fair and 0.00–0.20 was poor) [12, 13].
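As an illustration of the calculation described above, the following is a minimal sketch of a two-way mixed-model ICC for test–retest data, together with the interpretation bands listed in the text. The text does not state which ICC form was used, so the single-measurement consistency form, often written ICC(3,1), is an assumption here, and the function names `icc_consistency` and `interpret_icc` are hypothetical. The scores in the usage lines are made up for demonstration and are not study data.

```python
def icc_consistency(scores):
    """ICC(3,1): two-way mixed model, consistency, single measurement.

    scores: one row per patient; each row holds that patient's score on
    every occasion (for test-retest data: [first, second]).
    The choice of the ICC(3,1) form is an assumption, not stated in the text.
    """
    n = len(scores)        # number of patients
    k = len(scores[0])     # number of occasions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def interpret_icc(icc):
    """Map a coefficient to the bands given in the text."""
    if icc > 0.80:
        return "excellent"
    if icc > 0.60:
        return "very good"
    if icc > 0.40:
        return "good"
    if icc > 0.20:
        return "fair"
    return "poor"


# Made-up example: every patient scores exactly 5 points higher at retest.
demo = [[60, 65], [70, 75], [80, 85]]
demo_icc = icc_consistency(demo)
demo_label = interpret_icc(demo_icc)
```

Because the consistency form ignores a constant shift between occasions, the made-up data above give a coefficient of 1; an absolute-agreement form would penalize that shift.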

Agreement

Agreement was assessed with the standard error of measurement (SEM) and minimal detectable change (MDC). The ICC was used to calculate the SEM, which is an index of measurement precision. The SEM was calculated as SD × √(1−ICC). The MDC refers to the minimal amount of change that is not attributable to measurement error. The SEM was used to determine the MDC at the 95 % confidence level (MDC95 %), which was calculated using the formula 1.96 × √2 × SEM [14].
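The two agreement indices can be sketched in a few lines, assuming the standard forms SEM = SD × √(1−ICC) and MDC95 = 1.96 × √2 × SEM. The function names `sem` and `mdc95` are hypothetical. The SD and ICC passed to `sem` below are illustrative values only, and the input to `mdc95` is the rounded SEM of 4.1 reported in the Results.

```python
import math


def sem(sd, icc):
    """Standard error of measurement: SD x sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem_value):
    """Minimal detectable change at the 95 % confidence level.

    The sqrt(2) factor accounts for measurement error being present
    in both the first and the second assessment.
    """
    return 1.96 * math.sqrt(2.0) * sem_value


# Illustrative values, not study data: SD = 10 points, ICC = 0.75
demo_sem = sem(10.0, 0.75)

# Using the rounded SEM reported in the Results (4.1)
demo_mdc = mdc95(4.1)
```

With the rounded SEM of 4.1, this formula gives approximately 11.4, close to the reported MDC of 11.3; the small gap is presumably due to rounding of the SEM before reporting.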

Validity

Validity is represented by the extent to which a score retains its intended meaning and interpretation [15]. In this study, we examined three aspects of validity: construct, convergent/divergent and content validity. Evidence for construct validity of the MEPS-T was provided by determining its relationship with the DASH, VAS and the PCS of the SF-36. The PF, RP and PCS domains of the SF-36 were used to assess convergent validity. Evidence for divergent validity was provided by determining the relationships with the MH, RE and MCS domains of the SF-36. Pearson correlation coefficients were calculated to assess construct and convergent/divergent validity. Content validity was assessed by the distribution of the scores and the occurrence of ceiling and floor effects. Floor and ceiling effects of the MEPS-T at the first and second completion of the form were assessed by calculating the proportion of patients scoring the minimum or maximum values on the scale relative to the total number of patients. We considered scores between 0 and 10 % to be minimum scores and scores between 90 and 100 % to be maximum scores. Floor and ceiling effects were considered to be relevant if more than 30 % of the patients had a score at the limits of the scale [16].
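The correlation and floor/ceiling calculations described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names `pearson_r` and `floor_ceiling` are hypothetical, a 0–100 scoring range is assumed for the total score, and the example scores are made up.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def floor_ceiling(scores, min_score=0.0, max_score=100.0,
                  band=0.10, relevant=0.30):
    """Proportion of patients in the bottom and top 10 % of the scale.

    An effect is flagged as relevant when more than 30 % of patients
    fall in the band, following the thresholds given in the text.
    The 0-100 default range is an assumption.
    """
    span = max_score - min_score
    floor_cut = min_score + band * span
    ceiling_cut = max_score - band * span
    n = len(scores)
    floor = sum(1 for s in scores if s <= floor_cut) / n
    ceiling = sum(1 for s in scores if s >= ceiling_cut) / n
    return {
        "floor": floor,
        "ceiling": ceiling,
        "floor_relevant": floor > relevant,
        "ceiling_relevant": ceiling > relevant,
    }


# Made-up scores for ten patients (not study data)
demo_scores = [5, 50, 60, 95, 100, 70, 80, 40, 55, 65]
demo_effects = floor_ceiling(demo_scores)

# Made-up paired scores with a perfect inverse relationship
demo_r = pearson_r([1, 2, 3], [6, 4, 2])
```

A negative coefficient is expected when a scale on which higher scores mean better function is compared with one on which higher scores mean more disability or pain.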

Responsiveness

Responsiveness determines whether an instrument can detect clinical changes. Effect size (ES) was determined by calculating the difference between the means of the baseline and follow-up data, divided by the standard deviation at baseline. Values between 0.20 and 0.50 were considered small effects, values between 0.51 and 0.80 moderate effects, and values higher than 0.80 large effects [14].
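The effect size calculation can be sketched as below, using the baseline and post-treatment means and the baseline standard deviation reported in the Results for the lateral epicondylitis subgroup. The function names `effect_size` and `interpret_es` are hypothetical, and the label "trivial" for values below 0.20 is an assumption, since the text does not name that range.

```python
def effect_size(baseline_mean, followup_mean, baseline_sd):
    """Difference in means divided by the baseline standard deviation."""
    return (followup_mean - baseline_mean) / baseline_sd


def interpret_es(es):
    """Classify an effect size using the bands given in the text."""
    magnitude = abs(es)
    if magnitude > 0.80:
        return "large"
    if magnitude > 0.50:
        return "moderate"
    if magnitude >= 0.20:
        return "small"
    return "trivial"  # assumed label; this range is not named in the text


# Summary values reported in the Results: 68.7 +/- 14.4 at baseline,
# 76.0 after 6 weeks of treatment
es = effect_size(68.7, 76.0, 14.4)
label = interpret_es(es)
```

Computed from these rounded summary values, the effect size is just above 0.50, which falls in the moderate band and is in line with the reported ES of 0.50.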

Results

Translation and cross-cultural adaptation

No difficulties were encountered in translating the questionnaire, and the back translation corresponded very well to the original version. The questions were very simple to understand for the patients, so there was no need for cultural adaptation.

Measurement properties and testing

Table 1 provides the demographic and clinical characteristics of the patients. The descriptive statistics for the scores at baseline and at the second assessment of the MEPS-T are provided in Table 2. The mean ± SD duration of symptoms was 8.1 ± 1.2 months. Ninety-one patients (42 males; mean ± SD age: 49.2 ± 11.9 years; range 18–67 years) completed all of the questionnaires at the first assessment. Thirty-two of these patients did not return to the clinic for the second assessment. Therefore, of the 91 patients who participated at the first assessment, 59 patients (28 males; mean age: 42.8 ± 10.6 years; range 20–65 years) participated in the second assessment for the test–retest reliability analysis. Responsiveness was analyzed in the 46 patients (23 males; age: 42.8 ± 8.0 years; range 31–58) diagnosed with lateral epicondylitis.

Table 2 Descriptive statistics for the patient-reported outcome measures

Test–retest reliability

The average ± SD interval between the two assessments was 9.4 ± 2.4 days. The test–retest assessment had an ICC of 0.89, indicating excellent reliability.

Agreement

The SEM and MDC were 4.1 and 11.3, respectively.

Construct validity

The MEPS-T results correlated well with the results obtained using the DASH and VAS (r = −0.61 and r = −0.53, respectively; p < 0.001). The correlations between the results using the MEPS-T and the SF-36 are presented in Table 3. The MEPS-T was most strongly associated with the BP and MCS scales (r = 0.58 and r = 0.43, respectively; p < 0.05) of the SF-36. However, the MEPS-T showed poor and fair correlation with the PF and RP scales of the SF-36 (r = 0.18 and r = 0.25, respectively).

Table 3 Correlation between MEPS and other outcome measures in the literature and present study

Floor and ceiling effects

The floor and ceiling effects and the number of items answered were identical during the test and retest examinations. None of the patients’ scores were at the maximal or minimal value of the overall MEPS-T, indicating that there was no floor or ceiling effect. However, the subscales of the MEPS-T that were analyzed depended on the diagnosis. The “range of motion” and “stability” subscales of the MEPS-T showed high ceiling effects in patients with lateral epicondylitis. Of the 55 patients in the subgroup, 31 and 42 % reported maximal scores in the “range of motion” and “stability” subscales, respectively.

Responsiveness

For the 46 patients with lateral epicondylitis, the baseline scores of the MEPS-T were compared with the scores obtained after 6 weeks of treatment. The mean ± standard deviation of the baseline and post-treatment MEPS-T scores were 68.7 ± 14.4 and 76.0 ± 14.0, respectively, which resulted in a moderate effect size (ES of 0.50, 95 % CI 0.33–0.62).

Discussion

This study provides test–retest reliability, validity and responsiveness data for the MEPS-T. Based on our sample, the MEPS-T demonstrated acceptable levels of reliability, validity and responsiveness as a PRO questionnaire for Turkish-speaking individuals.

The test–retest reliability of the MEPS-T was excellent (ICC = 0.89), comparable to that reported previously by Cusick et al. [17]. The time interval between repeat measurements is an important issue when determining test–retest reliability. In general, the interval between repeat administrations for a PRO measure should be relatively brief (3–7 days) when the condition being measured is expected to change rapidly [11]. However, short test–retest intervals carry the risk of patients “becoming familiar with the questions” and simply answering based on memory of the first assessment. Although longer intervals can decrease this possibility, other factors need to be considered to prevent bias in such studies. Because the pain and function subscales of the MEPS consist of only nine questions, patients could easily remember the questions over a short time interval. In this study, an interval of 7–14 days was chosen to decrease the likelihood of recall and also to ensure that an individual’s condition had not changed. Similarly, Cusick et al. used a 2- to 3-week interval for retest assessment for the MEPS. The MDC was determined to be 11.3, indicating that a change of less than this value on repeated administrations of the MEPS-T should be considered a reflection of measurement error rather than a true change in the patient’s condition.

Recent studies attempting to validate the MEPS have focused on determining the relationship of the MEPS with other PROs, including the OES, subjective elbow value (SEV) and American Shoulder and Elbow Surgeons (ASES) score [17–19]. In these studies, the highest levels of association were with the ASES and the function and social–psychological domains of the OES (r = 0.83, r = 0.77, r = 0.77, respectively). Schneeberger et al. [19] used the SEV for validity and found a very good correlation value (r = 0.59). In the present study, the DASH and the VAS were used for validity estimation and were found to have a very good (r = −0.61) and good (r = −0.53) correlation, respectively. To determine convergent and divergent validity, we determined the level of association between the scores on the MEPS-T and the eight domains and two summary scores of the SF-36. The MEPS-T was more strongly related to concurrent measures of MCS (r = 0.43) and BP (r = 0.58) than to concurrent measures of PF (r = 0.18) and PCS (r = 0.33). There is no literature with which to compare our results.

Ceiling effects occur when a measure’s highest score is unable to assess a patient’s level of ability. This can be especially common for PROs used on multiple occasions, thereby decreasing the likelihood that the testing instrument has accurately measured the intended subscales. In this study, the patients’ “range of motion” and “instability” subscale scores were already high at baseline because these symptoms are not typical in patients with lateral epicondylitis. Although many recent studies have used the MEPS to assess lateral epicondylitis [20–23], we believe that the MEPS is of limited use for this condition and is not the best tool for assessing these patients. A disease-specific PRO such as the PRTEE should be considered for assessment of lateral epicondylitis.

Responsiveness, based on the completion of the MEPS-T at baseline and after 6 weeks of treatment, indicated an ES of 0.50 (95 % CI 0.33–0.62). Responsiveness has previously been reported after different elbow surgeries with a standardized response mean (SRM) of 1.26 and ES between 0.98 and 2.71 [19, 24], which is considered high compared to our result. These findings also suggest that MEPS-T is not the ideal PRO measure to assess patients with lateral epicondylitis.

One limitation of the study is that this is the first translation and cross-cultural adaptation study using the MEPS. In addition, psychometric properties of the original English version of the MEPS have not been reported. Therefore, we could not compare our results with those of previous studies.

Conclusion

The MEPS-T is brief and easy to administer and interpret, with a minimal investment of time required for the clinician or researcher. The MEPS-T is a reliable, valid and moderately responsive instrument that can be used as a PRO measure for Turkish-speaking individuals with elbow disease.

Clinical messages

The MEPS-T has sufficient reliability, validity and responsiveness, with values similar to those reported in the literature. The MEPS-T can be used as a PRO measure for Turkish-speaking individuals with various elbow pathologies.