Introduction

Low back pain (LBP) is recognized as an important health issue worldwide. Although no published study has documented the exact prevalence of LBP in Indonesia, the condition is considered the second most frequent reason for pain-related hospital visits [1]. Published prevalence estimates are highly heterogeneous: point prevalence ranges from 1.0 to 58.1% (mean 18.1%) and 1-year prevalence from 0.8 to 82.5% (mean 38.1%) [2]. In 2010, the Global Burden of Disease Study showed that the global point prevalence of LBP was 9.4%, with the global burden increasing from 52.8 million in 1990 to 83.0 million in 2010. The study also reported that LBP ranked highest in terms of disability and sixth in terms of overall burden. These data underline the need for further studies of LBP across different settings [3].

Specific tools have been developed to quantify functional status, as it cannot be assessed adequately by clinical assessment alone. One of the most commonly used patient-reported outcome measures (PROMs) for LBP is the Oswestry Disability Index (ODI) [4,5,6,7]. The ODI, first established by Fairbank in 1980, consists of ten questions that are categorized into two sections: the intensity of the pain and its disabling effect on personal and social life. The validity, reliability, and responsiveness of the original English version of the ODI have been shown to be satisfactory. A cross-cultural adaptation of the ODI is required before it is used in a different language or culture [8]. Currently, the ODI has been translated, culturally adapted, and validated in more than 20 languages. Moreover, the ODI has been included as a PROM in the International Consortium for Health Outcomes Measurement standard set for LBP [9].

The Indonesian language is one of the most commonly spoken languages in the world, as it is the national language of Indonesia. Although over 300 different native languages are spoken in Indonesia, Indonesian is used by more than 200 million people as either their first or second language [10]. However, because no Indonesian version of the ODI is available, it cannot yet be used in Indonesia. The aim of this study was to perform a cross-cultural adaptation of the ODI into the Indonesian language and to assess its psychometric properties.

Materials and methods

The UK English version of the ODI (version 2.1a) was used in this study. Approval for the adaptation was obtained from the MAPI Research Trust. The Institutional Review Board approved the study before it began (No.47/TU/DM/IX/2016). The adaptation was performed according to the MAPI guidelines and Beaton’s cross-cultural adaptation guidelines [8].

Translation and cross-cultural adaptation

Phase 1: Forward translation

Two native Indonesian speakers independently translated the original questionnaire into Indonesian. The first translator (naïve translator, FT1) was familiar with neither the concept of the ODI questionnaire nor LBP. The second translator (FT2), in contrast, had a medical background and was aware of the concepts being evaluated. The two forward translations were then compared. Differing or ambiguous terms were documented and resolved in a discussion between the two translators, resulting in a combined forward translation (FT12).

Phase 2: Back translation

Two translators separately translated the FT12 version of the ODI back into English (BT1 and BT2). Neither translator had a medical background, and both were unaware of the original version. Both back-translations (BT1 and BT2) were then compared with the original version of the questionnaire to verify that the translated version reflected the same content as the original.

Phase 3: The expert committee

The back-translations were reviewed by the MAPI project manager, the four translators, and the principal investigators. This review aimed to highlight any discrepancies in meaning or terminology and to obtain the best possible translation, which became the pre-final version. Each issue, rationale, and decision raised during the discussions was documented.

Phase 4: Pilot Test of the Pre-Final Version

The comprehensibility of the pre-final questionnaire was tested in 30 LBP patients to ensure that the adapted version was understandable. After completing the questionnaire, the subjects were interviewed to explore their understanding of each question and response. The result of this test was then re-evaluated by the committee, and the final form of the questionnaire (ODI-ID) was established (Supplemental Data File 1).

Phase 5: Test of the Final Version

The questionnaire (ODI-ID) was field-tested to confirm that its validity and other psychometric properties remained intact. Consecutive sampling was conducted in an outpatient spine clinic in a tertiary referral general hospital from November 2016 to February 2017. The sample size (115 patients) was predetermined based on a subject-to-item ratio ≥ 10 [11]. The inclusion criteria were: LBP ≥ 6 weeks; adult (older than 17 years); able to read and write in Indonesian fluently. The exclusion criteria were: acute LBP; the presence of a neurological deficit; an incidental event (including surgery) between the two observations that might significantly increase or reduce back pain. Each respondent completed the booklet (consisting of the ODI-ID, the Short Form-36 (SF-36) questionnaire, and a visual analog scale (VAS)) twice, one week apart. Both baseline and follow-up assessments were performed in the clinic. Afterward, a short interview was performed to detect any changes in clinical condition during this period. Additionally, we measured the changes in VAS and SF-36 (bodily pain and physical function subscales). Changes of ≤ 1 in VAS, ≤ 3 in bodily pain (BP), and ≤ 3 in physical function (PF) were regarded as clinically stable [12]. If an answer was missing, the total ODI-ID score was adjusted and calibrated on a scale of 0 to 100, in accordance with the original version of the ODI [5].
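The adjustment for missing answers can be illustrated with a minimal Python sketch. It assumes the commonly described ODI scoring rule, in which each item is scored 0 to 5 and the sum is divided by the maximum attainable score for the items actually answered; the function name `odi_percent` is hypothetical and the code is not taken from the study.

```python
from typing import Optional, Sequence


def odi_percent(item_scores: Sequence[Optional[int]]) -> float:
    """Return the ODI score on a 0-100 scale.

    Each item is scored 0-5; unanswered items are passed as None and
    are left out of both the sum and the maximum possible score.
    """
    answered = [s for s in item_scores if s is not None]
    if not answered:
        raise ValueError("at least one item must be answered")
    if any(not 0 <= s <= 5 for s in answered):
        raise ValueError("item scores must lie between 0 and 5")
    return 100.0 * sum(answered) / (5 * len(answered))


# All ten items answered: 20 points out of a possible 50.
print(odi_percent([2, 2, 2, 2, 2, 2, 2, 2, 2, 2]))

# Item 8 (sex life) skipped: 18 points out of a possible 45.
print(odi_percent([3, 2, 1, 2, 3, 2, 1, None, 2, 2]))
```

With this rule, a respondent who skips one item is scored against 45 rather than 50 points, so the percentage stays comparable across respondents.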

The assessment of psychometric properties (internal consistency, test–retest reliability, measurement errors, and construct validity) was performed and presented based on COSMIN (COnsensus-based Standards for the selection of health status Measurement INstruments) guidelines [13].

Statistical analysis was performed using IBM SPSS software version 22.0 (SPSS Inc., Chicago, Illinois).

Factor analysis

Confirmatory factor analysis (CFA) was performed using AMOS software version 26.0 (SPSS Inc., Chicago, Illinois). Two models were evaluated from the baseline data: (1) one-dimensional and (2) two-dimensional, comprising static activities (pain, sleep, standing, traveling, sitting) and dynamic activities (personal care, lifting, walking, sex, social), as proposed in previous studies [14,15,16]. The following model fit indicators were calculated: standardized root-mean-square residual (SRMR), goodness-of-fit index (GFI), adjusted goodness-of-fit index (AGFI), comparative fit index (CFI), and root-mean-square error of approximation (RMSEA). We used the following cut-off values to indicate a good fit: SRMR < 0.08 [17]; GFI and AGFI > 0.9 [18]; CFI > 0.9 [19]; and RMSEA < 0.05 [20].
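As an illustration of how these cut-offs are applied together, the following Python sketch checks a set of fit indices against the thresholds listed above. The function name `assess_fit` and the example values are hypothetical; they are not the values reported in Table 2.

```python
# Cut-off values for a good fit, as listed in the text.
# "<" means the index must be below the cut-off, ">" above it.
CUTOFFS = {
    "SRMR": ("<", 0.08),
    "GFI": (">", 0.90),
    "AGFI": (">", 0.90),
    "CFI": (">", 0.90),
    "RMSEA": ("<", 0.05),
}


def assess_fit(indices):
    """Return a dict mapping each fit index to True if it meets its cut-off."""
    result = {}
    for name, value in indices.items():
        direction, cutoff = CUTOFFS[name]
        result[name] = value < cutoff if direction == "<" else value > cutoff
    return result


# Hypothetical fit indices for illustration only.
example = {"SRMR": 0.06, "GFI": 0.88, "AGFI": 0.81, "CFI": 0.93, "RMSEA": 0.09}
flags = assess_fit(example)
print(flags)
print("good fit on every index:", all(flags.values()))
```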

Reliability

Internal consistency of the ODI-ID was evaluated by calculating Cronbach’s alpha (CA) of the baseline questionnaires [21, 22]. The correlation between each item and the whole instrument (ODI-ID) was calculated using Pearson correlation. An item-total correlation was considered significant if the correlation coefficient exceeded 0.576 (95% critical value of the sample correlation coefficient for 10 items) [23].
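For readers unfamiliar with the statistic, Cronbach’s alpha can be sketched in a few lines of Python using the standard formula based on item variances and the variance of the total score. This is a minimal illustration that assumes complete data (no skipped items); the function name `cronbach_alpha` and the toy data are hypothetical, and the study itself used SPSS.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("every respondent needs the same number (>= 2) of items")
    item_variances = [variance([r[j] for r in rows]) for j in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: 5 respondents, 4 items scored 0-5 (illustrative only).
toy = [
    [1, 2, 1, 2],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
    [0, 1, 1, 0],
    [2, 3, 3, 2],
]
print(round(cronbach_alpha(toy), 3))
```

Re-running the function with one item column removed gives the "alpha if item deleted" value that is often reported alongside the overall coefficient.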

Test–retest reliability was determined using the intraclass correlation coefficient (ICC) between the first and second tests. The ICC used was a single-measurement, absolute-agreement, 2-way mixed-effects model with 95% confidence intervals. An ICC > 0.70 was regarded as good reliability [24].
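The single-measurement, absolute-agreement ICC can be sketched from the mean squares of a two-way ANOVA without replication. The Python code below uses the formula commonly given for this form, ICC(A,1) = (MSR − MSE) / (MSR + (k − 1)·MSE + k·(MSC − MSE)/n), where n is the number of subjects and k the number of administrations. The function name `icc_absolute_single` and the toy scores are hypothetical, confidence intervals are omitted, and the study itself relied on SPSS.

```python
def icc_absolute_single(data):
    """ICC for absolute agreement, single measurement.

    data: list of subjects, each a list of k scores (e.g. [test, retest]).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))

    denominator = msr + (k - 1) * mse + k * (msc - mse) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (msr - mse) / denominator


# Toy test-retest scores on a 0-100 scale (illustrative only).
toy = [[40, 42], [22, 20], [60, 58], [34, 36], [50, 50]]
print(round(icc_absolute_single(toy), 3))
```

Because the absolute-agreement form includes the between-occasion mean square, a systematic shift between the first and second test lowers the coefficient even when the ranking of patients is unchanged.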

Measurement error

Standard error of measurement (SEM) was determined from the error variance of the ANOVA associated with determination of the ICC [25].

Minimum detectable change (MDC95%) was calculated by multiplying the SEM by 2.77, where 2.77 is the Z value for the 95% CI (1.96) multiplied by √2, which accounts for the measurement error present in each of the two measurements [25, 26].
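The arithmetic behind these two quantities can be written as a short Python sketch. It takes the SEM as the square root of the ANOVA error mean square and the MDC95% as 1.96 × √2 × SEM, as described above; the function names are hypothetical.

```python
from math import sqrt


def sem_from_error_variance(mean_square_error: float) -> float:
    """Standard error of measurement from the ANOVA error mean square."""
    return sqrt(mean_square_error)


def mdc95(sem: float) -> float:
    """Minimum detectable change at the 95% confidence level."""
    return 1.96 * sqrt(2) * sem


# The multiplier 1.96 * sqrt(2) is approximately 2.77.
print(round(1.96 * sqrt(2), 2))

# With the SEM of 3.35 reported for the ODI-ID.
print(round(mdc95(3.35), 2))
```

A change in an individual patient's score smaller than the MDC95% cannot be distinguished from measurement error with 95% confidence.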

Floor and ceiling effects (the percentage of the sample achieving the lowest and highest possible scores, respectively) were also determined for both the baseline test and the follow-up test. Test instruments should exhibit minimal floor and ceiling effects (less than 15% of the respondents) to be considered reliable [27].
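A floor or ceiling check of this kind reduces to counting respondents at the extremes of the scale. The Python sketch below assumes that the floor is the minimum possible total score (0) and the ceiling the maximum (100), and applies the 15% criterion stated above; the function names and the toy scores are hypothetical.

```python
def floor_ceiling_percentages(scores, lowest=0.0, highest=100.0):
    """Return (floor %, ceiling %): the share of respondents at each extreme."""
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    floor_pct = 100.0 * sum(1 for s in scores if s == lowest) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == highest) / n
    return floor_pct, ceiling_pct


def has_floor_or_ceiling_effect(scores, threshold=15.0):
    """True if 15% or more of respondents sit at either extreme."""
    floor_pct, ceiling_pct = floor_ceiling_percentages(scores)
    return floor_pct >= threshold or ceiling_pct >= threshold


# Toy total scores (illustrative only): nobody at 0 or 100.
toy = [12, 24, 36, 44, 48, 52, 60, 68, 72, 80]
print(floor_ceiling_percentages(toy))
print(has_floor_or_ceiling_effect(toy))
```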

Construct validity

Hypothesis testing

A total of nine hypotheses (Table 1) were tested to evaluate the construct validity of the ODI-ID, using the standard hypothesis testing methodology [28]. The Pearson correlation coefficient was used to assess the association of the baseline ODI-ID with all subscales of the SF-36 questionnaire and with the visual analog scale for pain. Pearson correlations (r) of > 0.50, 0.36–0.49, and < 0.35 were considered strong, moderate, and weak, respectively. The results are reported as hypothesis confirmed or not confirmed, and the number of confirmed hypotheses is reported as a percentage. If more than 75% of the hypotheses were confirmed, we considered the construct validity of the ODI-ID to be confirmed [28].
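The decision rule can be sketched in Python as follows. The sketch classifies the absolute value of r, since correlations with SF-36 subscales are negative, and ignores the expected direction for brevity. The stated bands leave 0.35 and the interval between 0.49 and 0.50 unassigned; as an assumption made only for this illustration, values of at least 0.50 are treated as strong and values below 0.36 as weak. The function names and the hypothesis list are hypothetical and do not reproduce Table 1.

```python
def classify_strength(r: float) -> str:
    """Classify a Pearson correlation by its absolute magnitude."""
    magnitude = abs(r)
    if magnitude >= 0.50:      # assumption: the boundary value counts as strong
        return "strong"
    if magnitude >= 0.36:
        return "moderate"
    return "weak"              # assumption: anything below 0.36 counts as weak


def percent_confirmed(hypotheses):
    """hypotheses: list of (expected strength, observed r) pairs."""
    met = sum(1 for expected, r in hypotheses if classify_strength(r) == expected)
    return 100.0 * met / len(hypotheses)


def construct_validity_confirmed(hypotheses, threshold=75.0):
    """True if more than 75% of the hypotheses are met."""
    return percent_confirmed(hypotheses) > threshold


# Illustrative list: the first three r values are those reported for VAS, BP
# and PF; the last one is a made-up value for a "moderate" hypothesis.
example = [("strong", 0.85), ("strong", -0.76), ("strong", -0.72), ("moderate", -0.45)]
print(percent_confirmed(example))
print(construct_validity_confirmed(example))
```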

Table 1 Hypotheses testing for construct validity of ODI-ID

Results

Translation and cross-cultural adaptation

Several noteworthy issues arose during the translation phase of the cross-cultural adaptation and were resolved in an expert committee meeting.

  1. Translation of “personal care”. The phrase can be translated into two similar terms: “asuhan pribadi” or “perawatan pribadi”. However, neither is clear enough to describe personal care activities in the Indonesian language. During the pretesting phase, some respondents were confused by this phrase. Thus, we decided to add a description of the activities included, such as “washing, dressing, etc.”

  2. Translation of walking distance. The units of distance used in the original version differ from those used in Indonesian culture. Imperial units such as miles and yards are rarely used in Indonesia. The original English version used 1 mile, quarter-mile, and 100 yards. The direct conversions of these distances are 1.6 km, 400 m, and 91 m, respectively. After discussion, we decided to round these distances to “1.5 km”, “500 m”, and “100 m”. We expected that the rounding would not affect validity or reliability.

  3. Translation of “prevents”. The direct translation of the word “prevents” in Indonesian is “mencegah”, which means keeping something negative from occurring. Thus, after discussion, we felt that another word (“menghalangi”), whose direct translation is “hinder/limit”, was more suitable in our language to describe the disabilities that occurred.
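The distance conversions described in item 2 can be checked with a short Python sketch. It uses the exact definitions of the international mile (1609.344 m) and yard (0.9144 m) and reports how far each rounded Indonesian value lies from the original distance; the variable and function names are hypothetical.

```python
METERS_PER_MILE = 1609.344   # international mile, exact by definition
METERS_PER_YARD = 0.9144     # international yard, exact by definition

# Distances in the original English version, converted to meters.
ORIGINAL_M = {
    "1 mile": 1 * METERS_PER_MILE,
    "quarter mile": 0.25 * METERS_PER_MILE,
    "100 yards": 100 * METERS_PER_YARD,
}

# Rounded distances chosen for the Indonesian version, in meters.
ADAPTED_M = {"1 mile": 1500.0, "quarter mile": 500.0, "100 yards": 100.0}


def relative_change(original_m: float, adapted_m: float) -> float:
    """Percentage difference of the adapted distance from the original."""
    return 100.0 * (adapted_m - original_m) / original_m


for label, original in ORIGINAL_M.items():
    adapted = ADAPTED_M[label]
    print(f"{label}: {original:.1f} m -> {adapted:.0f} m "
          f"({relative_change(original, adapted):+.1f}%)")
```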

Patient demographics and scores distribution

The study initially included 115 respondents; 4 dropped out because they were unable to return for the second test, and 15 were considered unstable because their treatment course and/or clinical condition changed. This left 96 respondents for further analysis. Their average age was 40.1 ± 12.6 years, and 48 respondents (50%) were male. The average time to complete the ODI-ID was 4.4 ± 1.5 min (2.5–6.5 min), and the average interval between the first and second questionnaires was 6.0 ± 1.3 days.

The average ODI-ID score on the first and second administrations was 44.7 ± 13.8 and 44.5 ± 12.6, respectively. Twenty-eight respondents did not answer the question about sex life. No floor or ceiling effects were observed for the total ODI-ID. The average VAS score was 3.8 ± 1.5 (moderate pain). The average SF-36 score was 50.4 ± 23.

Confirmatory factor analysis

The baseline ODI-ID data had a significant yet poor fit with both the 1-factor and 2-factor (static-dynamic) models, as shown in Table 2. The chi-square difference between these two models was not significant (p > 0.05); therefore, we used the 1-factor model to calculate Cronbach’s alpha.

Table 2 Confirmatory factor analysis of the ODI-ID

Internal consistency

The CA for the ODI-ID was 0.90, suggesting good internal consistency. Every item correlated significantly with the total ODI-ID except for social life and traveling, whose correlations with the whole scale were 0.55 and 0.52, respectively. The CA of the remaining items did not exceed 0.90 when any single item was removed.

Test–retest reliability and measurement error

The intraclass correlation coefficient for ODI-ID was 0.97 (95% CI 0.96–0.99) showing good reliability. Standard error of measurement was 3.35. The minimum detectable change was 9.

Construct validity

The ODI-ID had statistically significant correlations with VAS and all SF-36 subscales (Table 3). All 9 a priori hypotheses were met, confirming the construct validity of the ODI-ID. It correlated strongly with VAS, SF-36 physical functioning (PF), and bodily pain (BP), and moderately with the other SF-36 subscales.

Table 3 Construct validity: correlation coefficient between ODI-ID, VAS and SF-36

Discussion

The aim of this study was to adapt the ODI questionnaire for Indonesian-speaking patients and to evaluate the psychometric properties of the Indonesian version. The translation process followed the established guideline for cross-cultural adaptation [8] to obtain a reliable and valid adaptation of the questionnaire.

The ODI-ID was easily administered and understood by the patients. This was shown by the average completion time of 4.4 min, which is comparable to other ODI versions (range: 3.4–6.6 min) [29,30,31,32]. Twenty-eight patients did not answer the “sex life” question. This finding is similar to other studies [30, 31, 33] and is most likely attributable to cultural issues. Additionally, since ODI version 2.1a was used as the reference, question number 8 comes with the additional statement “if applicable”, which may have led patients to skip it. There was also some concern that the sex life question was not answered accurately. Some authors have suggested that patients leave this question unanswered because of psychosocial factors (not having a partner, other conditions that do not allow them to have sex, etc.) rather than pain-related factors [34]. However, we decided to keep this question in the ODI-ID since its content was considered important. Furthermore, as stated in the original version, the scoring method is adjusted if a question is missing or inapplicable [4].

The confirmatory factor analysis showed a significant but poor fit for both tested models. There was also no statistically significant difference in chi-square between the models, indicating that the 2-factor model did not provide a better fit than the original 1-factor ODI model. Previous studies have reported inconsistent factor structures for the ODI. While the Italian [31], Slovenian [35], Polish [36], Hungarian [37], and Dutch [38] versions of the ODI revealed a 1-factor structure, 2-factor structures were shown by the German [33], Spanish (Colombia) [39], Finnish [40], and Arabic [29] versions. Sample size might play a role in these results, as suggested by Gabel et al., who performed a factor analysis in 35,623 LBP patients and verified the one-factor structure proposed by the original author [14].

The Indonesian version of the ODI demonstrated good internal consistency (CA = 0.9), which even exceeded that of the original English version (range 0.71 to 0.87) [4, 41,42,43]. CA values are considered high if they range from 0.70 to 0.90. If CA is too high (> 0.90), it may suggest that some items are redundant because they test the same question [22]. Similar internal consistencies were found in most translated ODI versions, especially the Danish (CA = 0.88) [44], Turkish (CA = 0.9) [45], German-Swiss (CA = 0.9) [46], German (CA = 0.89) [33], Slovenian (CA = 0.9) [35], French-Canadian (CA = 0.88) [47], Polish (CA = 0.9) [36], Hungarian (CA = 0.89) [37], Indian Marathi (CA = 0.88) [48], and Arabic (CA = 0.89) [29] versions. The CA of the other versions ranged from 0.75 to 0.99 (Table 4). Furthermore, the alpha obtained after removing any single item of the ODI-ID did not exceed the alpha of the full scale, which indicates the homogeneity of the questionnaire.

Table 4 Characteristics of the published cross-cultural adaptation studies of the Oswestry Disability Index (updated from Domazet et al. [30])

Test–retest reliability reflects the stability of a questionnaire over time, with the ICC quantifying the agreement between the two administrations. A value of 0.97 showed that this version has excellent test–retest reliability. One limitation of this study is that we did not include a transition question to confirm whether patients perceived any change in their condition during this period. However, by conducting an interview and evaluating VAS and SF-36 (bodily pain and physical function subscales) at the second visit, we confirmed that their condition had not changed. The 7-day interval was chosen because the interval should be short enough to avoid natural changes in the condition, but not so short that patients recall their previous answers. An interval of 1 to 2 weeks is reported as the best time to measure the reproducibility of functional status questionnaires [49]. In the original ODI study, the ICC reached 0.99; however, recall bias might have been present, since an interval of 1 day between measurements was used [4].

As for measurement error, only a few ODI validation studies have published the standard error of measurement (SEM) and minimum detectable change (MDC), which are important features of a questionnaire. For the Indonesian version, the MDC95% was 9, which is similar to the German (Swiss) version [46]. The SEM in previously validated versions ranged from 3.4 to 4.8, resulting in an MDC95% of around 9–13 (Table 4).

To assess construct validity, we performed hypothesis testing as recommended by the COSMIN guideline [28]. All a priori hypotheses were met, which supports the construct validity of the ODI-ID. We selected the SF-36 questionnaire and the visual analog scale as the comparison instruments because they are available in the Indonesian language. Besides the Roland Morris Disability Questionnaire (RMDQ), which is unavailable in Indonesian, SF-36 and VAS have commonly been used to assess the validity of other cross-cultural adaptations of the ODI [31, 32, 39, 44, 50,51,52].

The hypotheses were developed based on the direction and magnitude of the correlations obtained from previous cross-culturally adapted versions. Because the correlations reported in previous studies spanned a wide range, the expected magnitudes in our hypotheses were also defined broadly. Although the hypotheses on the association of the ODI-ID with all SF-36 subscales and VAS were met, the strength of the correlations varied. The correlation of the ODI-ID with the BP subscale (r = − 0.76) was higher than in the Italian (r = − 0.69) [31], Iranian (r = − 0.68) [32], Norwegian (r = − 0.64) [50], and Brazilian-Portuguese versions (r = − 0.58) [52]. The correlation of the ODI-ID with the PF subscale (r = − 0.72) was similar to the Italian (r = − 0.75), Norwegian (r = − 0.77), and Iranian versions (r = − 0.68) but lower than the Brazilian-Portuguese version (r = − 0.83). Similar discrepancies were found for the correlations with other SF-36 subscales. Meanwhile, the ODI-ID showed a stronger correlation with VAS (r = 0.85) than the other adapted versions (r = 0.37–0.78; as shown in Table 4).

Our study has several limitations. First, as mentioned earlier, we did not include a transition question. Second, we did not measure the responsiveness of the ODI-ID, which is necessary to evaluate the ability of the questionnaire to detect small but clinically important changes. Lastly, exploratory factor analysis (EFA) was not performed since the sample size was inadequate. Around 500–1000 respondents are required for an EFA to obtain a good cumulative explained variance [53].

Conclusion

The translation and cultural adaptation of the ODI into Indonesian was successful. The Indonesian version of the ODI maintained the reliability, validity, and psychometric characteristics of the original ODI. This questionnaire will be a suitable instrument for assessing LBP-related disability in Indonesian-speaking patients.