Introduction

Patient-Reported Outcome Measures (PROMs) have become standard measurement tools in research to measure health and quality of life as perceived by the patient. PROMs are increasingly used in clinical practice to monitor patients and to assist patient–doctor communication [1]. Also, routinely collected PROM data in hospitals and other health care settings are increasingly used to benchmark outcomes and to improve quality of care [2, 3].

However, PROMs are not free of problems and challenges. For example, many instruments of varying quality have been developed for many constructs. Validity, measurement error and responsiveness are often undocumented or disappointing (for example [4–10]), which makes it hard for researchers and clinicians to choose the best PROM for a specific purpose. Some PROMs are burdensome for patients because they are too long, or contain irrelevant, incomprehensible or poorly formulated questions. Scores are often difficult to interpret and cannot be compared between instruments due to the use of sum scores of ordinal response options. This is especially problematic for benchmarking (comparing performance of health care providers). Finally, many PROMs have large measurement error, which precludes their use for assessing and monitoring individual patients in clinical practice [1].

To deal with these problems, the National Institutes of Health (NIH) invested heavily in the development of a new measurement system, the Patient-Reported Outcomes Measurement Information System (PROMIS®) [11, 12]. Clinicians, researchers and statisticians have joined forces to collect, combine and transform all existing PROMs into a new, state-of-the-art assessment system for measuring patient-reported health of adults and children that is more valid, reliable and responsive than the existing PROMs [13–15].

PROMIS consists of a dynamic set of item banks. An item bank is a set of questions that all measure the same construct (or domain), e.g. pain, depression, or the ability to participate in social roles and activities. Each item bank comprises 6–121 items. The constructs were chosen from the domains of physical, mental and social health, to represent general constructs relevant for measuring health and well-being of adults and children, regardless of disease [16]. The items from an item bank can be administered in short questionnaires with fixed items (short forms) or, more efficiently, through computerized adaptive testing (CAT) [17]. A CAT is a computer-administered test in which, after the first item, the presentation of items is determined by the person’s responses to previous ones. After each question, the person’s latent trait or domain level (the score on the instrument) is estimated, and when the estimate reaches a pre-defined precision (usually after about 5–7 items), the computer stops asking questions.
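The CAT procedure described above can be illustrated with a minimal sketch. It is not the PROMIS implementation: it assumes a simplified two-parameter logistic model with yes/no items (PROMIS items have multiple ordinal response options), a hypothetical item bank with made-up parameters, maximum-information item selection, and a posterior-mean (EAP) estimate under a standard normal prior. The names `ITEM_BANK`, `run_cat` and the stopping value `se_target=0.4` are illustrative assumptions.

```python
import math

# Hypothetical item bank: (discrimination a, difficulty b). Values are made up.
ITEM_BANK = [(1.8, -2.0), (2.0, -1.5), (2.2, -1.0), (2.5, -0.5), (2.4, 0.0),
             (2.5, 0.5), (2.2, 1.0), (2.0, 1.5), (1.8, 2.0), (2.1, -0.25),
             (2.3, 0.25), (1.9, 0.75)]

# Grid of trait levels used for numerical integration of the posterior.
GRID = [-4.0 + 0.1 * i for i in range(81)]


def prob_endorse(theta, a, b):
    """Probability of a positive response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(responses):
    """Posterior mean and SD of theta given [(item_index, 0 or 1), ...]."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # standard normal prior (unnormalized)
        for idx, x in responses:
            p = prob_endorse(theta, *ITEM_BANK[idx])
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answer, se_target=0.4, max_items=12):
    """Administer items until the standard error drops to se_target.

    answer: callable taking an item index and returning 0 or 1.
    Returns (theta estimate, standard error, indices of items asked).
    """
    responses = []
    theta, se = eap_estimate(responses)  # start from the prior
    while len(responses) < min(max_items, len(ITEM_BANK)) and se > se_target:
        asked = {idx for idx, _ in responses}
        remaining = [i for i in range(len(ITEM_BANK)) if i not in asked]
        # Select the unused item that is most informative at the current estimate.
        nxt = max(remaining, key=lambda i: item_information(theta, *ITEM_BANK[i]))
        responses.append((nxt, answer(nxt)))
        theta, se = eap_estimate(responses)
    return theta, se, [idx for idx, _ in responses]


# Example: a simulated respondent who endorses every item with difficulty below 1.0.
theta, se, asked = run_cat(lambda i: 1 if ITEM_BANK[i][1] < 1.0 else 0)
print(f"theta={theta:.2f}, SE={se:.2f}, items asked={asked}")
```

The stopping rule is what distinguishes a CAT from a short form: the number of items is not fixed in advance but depends on how quickly the estimate becomes precise enough for the respondent at hand.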

It is expected that PROMIS will be implemented worldwide and that PROMIS instruments will experience rapid adoption once their cross-cultural validity is documented [18–20]. A Spanish translation of 21 item banks for adults and 9 item banks for children was performed by FACITtrans for use in the US and in Spanish-speaking countries. Other major translation efforts are currently in progress but, for the most part, already completed translations include only short forms or parts of item banks. In 2009, the Dutch–Flemish PROMIS Group was established with the aim of implementing PROMIS in the Netherlands and Flanders (the Dutch-speaking part of Belgium). The first step concerns the translation and cultural adaptation of the PROMIS items from English into Dutch–Flemish. This paper describes the translation of 17 PROMIS item banks for adults into Dutch–Flemish. These 17 adult item banks were chosen because they were publicly available at the time of the translation. The translation of nine paediatric item banks will be described elsewhere. This study presents the first completed large-scale PROMIS translation performed outside the US. The methodology used and experience gained in this study can serve as an example for researchers in other countries interested in translating PROMIS.

Methods

Table 1 lists the 17 item banks that were translated. The item banks comprise 6–121 items each. In total, 563 items were translated. The Dutch–Flemish version was obtained using a universal approach to translation based on the Functional Assessment of Chronic Illness Therapy (FACIT) multilingual translation methodology [21–23]. This methodology consists of a translation phase and pilot testing with cognitive debriefing. Overall, the goal of this methodology is to attain five dimensions of cross-cultural equivalence:

  1. Semantic/linguistic: the meaning of the item is the same in the source and translated language;

  2. Content: the item is relevant to both cultures (cultural appropriateness);

  3. Conceptual: the translated item measures the same theoretical constructs as the source item;

  4. Criterion: when compared to a known or standardized measurement, the translation exhibits similar measurement properties to the source;

  5. Technical: the method of assessment results in comparable measurements in both cultures [24].

Table 1 Seventeen translated PROMIS adult item banks

In this project, we addressed the first three dimensions. The last two will need to be checked by additional psychometric validation. We strived to obtain one uniform Dutch–Flemish translation for all items. Separate translations for Dutch and Flemish were produced only when necessary due to irreconcilable differences in the Dutch and Flemish dialects.

Translation

The translation team implemented specific steps in order to develop precise and culturally appropriate translations of the English source. Documentation of this process can be found in the item histories (available upon request), and the steps involved are itemized below.

  1. Forward translation: Source items in English were translated into Dutch–Flemish by four native-speaking, independent professional translators: two from the Netherlands (one living in the US, one living in the Netherlands) and two from Flanders (one living in the US, one living in Flanders), all with a college degree and experienced in the field of PROM survey research. The translators were instructed to use simple language and to capture the meaning of the item rather than perform a literal translation. Furthermore, the translators were encouraged to complete (i.e. give a response to) the items for themselves to get a better understanding of the meaning and interpretation of the items.

  2. Reconciliation: A third independent, native-speaking professional translator from the Netherlands reconciled the four forward translations by choosing the best translation after resolution of discrepancies. The translator was instructed to try to produce a universal translation and thus avoid region-specific or overly colloquial language. The translator could also provide alternative translation(s), if necessary.

  3. Back-translation: The reconciled version was then back-translated by two English-speaking translators, one fluent in Dutch and one in Flemish, both with a college degree and experienced in the field of PROM survey research. The back-translators were blind to the original English source version.

  4. Quality control (comparing the back-translations with the source document): FACITtrans staff compared the source and back-translated English versions to identify discrepancies in the back-translations and provided clarification to the reviewers on the intent behind the items. This step also resulted in a preliminary assessment of harmonization between languages.

  5. Independent reviews: Three to five bilingual experts from the Dutch–Flemish PROMIS group (at least three Dutch and one Flemish) examined all of the preceding steps (including the four forward translations, the reconciled version and comments from the translator who carried out the reconciliation, the two back-translations, and any additional comments or questions from FACITtrans staff) and selected the most appropriate translation for each item or provided alternative translations if the previous translations were found to be unacceptable. In this step, a previous Dutch translation of the physical functioning item bank [25] (not approved by the PROMIS Statistical Center) was also considered as one of the possible translations.

  6. Pre-finalization review: FACITtrans staff evaluated the merit of the expert reviewers’ comments, identified potential problems in their recommended translations and formulated questions and comments to guide the Dutch–Flemish language coordinator.

  7. Finalization process: The Dutch–Flemish language coordinator determined the final translation by reviewing all of the preceding steps and addressing FACITtrans staff’s comments. Along with the final translation, the language coordinator also provided a literal back-translation and a polished back-translation.

  8. Harmonization and quality control: FACITtrans staff, in collaboration with the PROMIS Statistical Center, assessed the equivalence of the final translation and verified that documentation of the decision-making process was complete. The quality assurance also addressed issues of consistency with previous translations (e.g. the Spanish translation) as well as between the items. The Dutch–Flemish language coordinator was consulted again for additional input as necessary.

  9. Formatting and proofreading: All items were checked for spelling and grammatical issues by two proofreaders working independently, and reconciliation of the proofreading comments was carried out.

Testing of translations

First, the target language version was pilot-tested with 70 native Dutch- or Flemish-speaking participants. Study participants represented a convenience sample recruited from the general population in the Netherlands and Belgium. Potential participants were approached and deemed eligible if they were aged 18 years or older, were native speakers of Dutch or Flemish, and were able to provide verbal consent. Each item bank was tested in 6–18 adults, so that each item would be reviewed by at least 6 participants. Two interviewers participated: a Dutch health care professional and a Flemish linguist. Both interviewers had many years of experience carrying out cognitive interviews of this type. Training, based on a study-specific interviewing protocol, was carried out via teleconference with FACITtrans. Prior to the start of each administration, the interviewer fully explained the study to the participant. Respondents completed the translated version of the items on their own and then participated in a cognitive debriefing interview that followed a specific script. In addition to questions about comprehensibility and general relevance, probes were designed to elicit feedback on the phrasing of each translated item. Each item, answer category, instruction and recall period was discussed. Participants were asked to paraphrase items, to define specific words and phrases, and to describe their decision-making process when choosing their response. Overall, the interviews aimed to confirm that the translations had been understood in accordance with the intended meaning as given by the item’s definition, which in turn allowed FACITtrans to assess the linguistic validity and acceptability of the Dutch–Flemish items.

Second, a reading difficulty assessment was carried out for the physical function item bank because there was concern that some items might be somewhat difficult. A Dutch–Flemish system developed to determine the reading ability of school children in the Netherlands and Flanders [Analyse van Individualiseringsvormen (AVI)] was used for this evaluation [26].

Results

Translation

For most items, an acceptable Dutch–Flemish translation was obtained. An example of the entire translation process for one item is provided in Table 2 in “Appendix” section. Ten items from five item banks required separate translations for Dutch and Flemish: physical function (five items), pain behaviour (two items), pain interference (one item), social isolation (one item) and global health (one item) (10/563 items (2 %) in total). For example, the word “walking” was translated as “lopen” in Dutch, but had to be translated as “stappen” in Flemish because “lopen” means running in Flemish (“hardlopen” in Dutch) and “stappen” means going out in Dutch.

Table 2 Example of the translation process of one item from the pain interference item bank

Other challenges faced in the translation process included:

  • Scarcity or overabundance of possible translations: We provide three examples. First, in Dutch–Flemish, no distinction is made between tired and fatigued, whereas both words are used in the Fatigue item bank to indicate different nuances of the fatigue experience, and in English, the word “tired” is more easily endorsed than “fatigue”. Second, the phrase “When I was in pain…” can be translated as “Wanneer ik pijn had…” (‘when I had pain’), “Toen ik pijn had…” (‘at the time I had pain’) or “Als ik pijn had…” (‘if I had pain’). The first option was chosen because it was back-translated as ‘when I had pain’. Third, in Dutch–Flemish, the phrase “Are you able to…” can be translated in two different ways: “Bent u in staat om…” or “Kunt u…” (‘can you…’). There is no conceptual difference between these two phrases. The latter was chosen because it is more often used in everyday speech.

  • Broader or narrower construct in the target language: For example, the literal Dutch–Flemish translation of the word “exercise” would be “oefenen” (‘practice’) or “sporten” (‘sport’), but these constructs are narrower than the construct of exercise in the English source. Eventually the word “lichaamsbeweging” (‘body movement’) was chosen, which captures the meaning of the item, although this word is less commonly used in the Dutch–Flemish language.

  • The ordering of items relative to others: For example, it was difficult to find and order eight different Dutch–Flemish words to describe the level of fatigue as used in the Fatigue item bank.

  • Different units of measurement: In the Netherlands and Flanders, kilos and metres are used instead of pounds and miles. As a consequence, in some items, the quantities referred to are somewhat uncommon. For example, the item “are you able to run five miles” was translated as “kunt u ongeveer 8 km hardlopen” (‘are you able to run approximately 8 km’), which is a somewhat uncommon distance to run. Some quantities were rounded during translation. For example, 100 yards was translated as 100 m (which is 109 yards).

  • Irrelevant items: For example, the item “Does your health now limit you in putting a trash bag outside?” was considered irrelevant because trash bags are hardly used anymore in the Netherlands and Flanders. The alternative that was used in the Dutch–Flemish version was: “Wordt u door uw gezondheid op dit moment beperkt in het buitenzetten van het vuilnis?” (Does your health now limit you in putting the trash outside?).

  • Different performance: For example, the item “Are you able to push open a door after turning the knob?” was translated as “Kunt u een deur openduwen nadat u de klink naar beneden heeft gedaan?” (‘are you able to push open a door after pushing down the latch’) because round operating mechanisms are quite uncommon on Dutch and Flemish doors.

  • Suboptimal source items: In some cases, the formulation of the original English item was considered suboptimal. For example, the question “How often did pain prevent you from walking more than 1 mile?” was considered too difficult. Also, the item “On how many days was your fatigue worse in the morning?” was considered unclear (worse than what?). To achieve semantic equivalence, we decided to translate these items literally as “Hoe vaak weerhield de pijn u ervan om meer dan anderhalve kilometer te lopen?” and “Op hoeveel dagen was uw vermoeidheid ’s morgens erger?”.

Finally, translators and reviewers had to consider whether or not to retain the exact wording of previously translated legacy instruments (e.g. items from the Health Assessment Questionnaire). Where possible, the exact wording of items from prior translations of legacy instruments was retained, except in cases where a new translation was considered better, or for reasons of consistency with other items in the item bank.

Testing of translations

The Dutch–Flemish PROMIS item banks were tested at various locations in the Netherlands and Flanders. In total, 70 adults participated, 35 from the Netherlands and 35 from Flanders. The average age was 49 years (range 20–77), and 58 % were female. In 53 items (9.2 %), slight changes in the wording of questions were made. For example, the phrase “In hoeverre …” was changed into “In welke mate…” in 36 items (both are formulations of ‘to what extent’). A few items were rephrased to represent more everyday speech. For example, “Kunt u een hamer gebruiken om op een spijker te slaan?” (‘are you able to use a hammer to pound a nail?’) was rephrased into “Kunt u met een hamer op een spijker slaan?” (‘can you pound a nail with a hammer?’). The reading difficulty of the physical function item bank was AVI–E5, equivalent to the average reading level of 9-year-old children.

Discussion

This study presents the first completed large-scale PROMIS translation performed outside the US. Seventeen PROMIS item banks for adults were translated into Dutch–Flemish. Difficulties were encountered in the translation of some items, but an equivalent translation was eventually obtained for all items. No major problems were identified in the pilot testing and cognitive debriefing. Only 10 items (2 %) required separate Dutch and Flemish translations.

A strong point of the study was the methodology used for translation. The translation was produced using a standardized methodology as approved by the PROMIS Statistical Center. The PROMIS translation methodology was developed through substantial research in the health-related quality of life (HRQOL) field to ensure that translations reflect conceptual equivalence with the English source and are rendered in language that is culturally acceptable and relevant to the target population. This procedure is consistent with previously published guidelines for the translation of PROMs and existing industry guidance for translation and validation of PROMs for non-English-speaking populations [27–29] and with current recommendations from the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) for the translation and cultural adaptation of PROMs [30].

In initial item bank development, PROMIS investigators devoted substantial effort to ensure that each item was understood by diverse English- and Spanish-speaking individuals [14, 16]. Items were written at a grade school level and tested for comprehensibility among low-literacy populations. This facilitated the translations into Dutch–Flemish.

We strived to obtain one uniform Dutch–Flemish translation for all items. The goal of a universal approach to translation is to arrive at one version for multiple countries instead of country-specific versions of the same language. This approach was chosen for practical reasons and to avoid unnecessary language bias introduced by multiple translations, and there is also policy support for one official language [31]. Success in obtaining a uniform translation was not guaranteed in view of the differences between Dutch and Flemish in vocabulary, style, meaning of words and grammar. However, it has been estimated that the difference in vocabulary is limited to a few thousand words [32]. Therefore, a uniform Dutch–Flemish translation was deemed possible.

Nevertheless, for ten items from five item banks, conceptual differences mandated separate translations for Dutch and Flemish. This means that for five item banks country-specific CATs (and perhaps short forms) will be required.

Translating PROMs always requires a trade-off between a linguistic (literal) and conceptual translation. For example, for the item “are you able to run 5 miles” the final translation chosen was the literal translation “kunt u 8 km hardlopen” (‘are you able to run 8 km’) to retain equivalence in physical performance. However, as this is an uncommon distance to run (competitions usually involve 5 or 10 km), respondents may have more difficulty answering this question correctly. Also, the item “On how many days was your fatigue worse in the morning?” was translated as “Op hoeveel dagen was uw vermoeidheid ’s morgens erger?” although the item was considered unclear (worse than what?).
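The distance conversions behind items such as the 5-mile run can be checked with a short calculation. The sketch below uses the standard definitions of the international mile and yard; the function names are illustrative and not part of any PROMIS tooling.

```python
# Exact definitions of the international mile and yard.
KM_PER_MILE = 1.609344
M_PER_YARD = 0.9144


def miles_to_km(miles):
    """Convert a distance in miles to kilometres."""
    return miles * KM_PER_MILE


def metres_to_yards(metres):
    """Convert a distance in metres to yards."""
    return metres / M_PER_YARD


# Distances that appear in the translated items.
print(f"5 miles = {miles_to_km(5):.2f} km")        # translated as 8 km
print(f"1 mile  = {miles_to_km(1):.2f} km")        # translated as 1.5 km
print(f"100 m   = {metres_to_yards(100):.0f} yd")  # source item says 100 yards
```

The size of the rounding differs per item: 8 km is within about 1 % of 5 miles, whereas 100 m is about 9 % longer than 100 yards, which is one reason the item difficulties may need to be checked empirically.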

The goal of the translation methodology was to attain five dimensions of cross-cultural equivalence. The first three dimensions (semantic/linguistic, content and conceptual equivalence) were checked via the cognitive debriefing. The last two (criterion and technical equivalence) will need to be checked by additional psychometric validation.

We highly recommend the use of PROMIS instruments in future Dutch and Flemish studies. The use of PROMIS has clear advantages over traditional PROMs [18, 33]. PROMIS instruments have better content validity than existing PROMs because they are based on a well-developed conceptual model, years of experience with existing PROMs and extensive participant input. PROMIS instruments have the potential to demonstrate better responsiveness than existing measures, which may lead to reductions in sample sizes for clinical studies [13, 15]. PROMIS instruments (in particular CAT) have small measurement error, which makes them more suitable for use in daily clinical practice and for benchmarking purposes (discriminating between health care organizations). PROMIS scores are easier to interpret than scores of other PROMs because item response theory (IRT) methods result in scores on an interval level. In addition, all PROMIS instruments are expressed on a common metric: as T scores with a mean score of 50 (representing the mean score of the reference population) and a standard deviation of 10. PROMIS short forms have already been translated into many languages and more large-scale translations are planned. It is expected that PROMIS will be used worldwide, which will facilitate comparison with international research. Finally, the PROMIS system is a dynamic system, which means that newly developed items and item banks can be easily incorporated with no need to change the entire metric.
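The T-score metric mentioned above is a linear rescaling. As a minimal sketch, assuming the IRT trait estimate (theta) is standardized to mean 0 and standard deviation 1 in the reference population, the conversion can be written as follows; the function names are illustrative.

```python
def theta_to_t_score(theta):
    """Rescale a standardized trait estimate to the T-score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta


def se_to_t_units(se_theta):
    """Rescale the standard error of theta to T-score units."""
    return 10.0 * se_theta


# A respondent one standard deviation above the reference mean.
print(theta_to_t_score(1.0))   # 60.0
# A respondent one and a half standard deviations below the reference mean.
print(theta_to_t_score(-1.5))  # 35.0
```

Because every item bank is reported on this same scale, a score of 60 carries the same relative meaning (one standard deviation above the reference population mean) regardless of which domain was measured.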

Patient-Reported Outcomes Measurement Information System instruments can be administered in short forms or through CAT (or a combination of both). CAT has great advantages over traditional paper questionnaires. Multiple studies have shown that CATs have consistently better precision than short forms [34, 35]. The increasing computer and Internet accessibility of the general public and the increasing sophistication of hand-held computer devices have enhanced the feasibility of using CAT in outcome assessment both for clinical research and for use in daily clinical practice. However, future studies should show whether CAT is feasible in several settings (from trials to practice) [36].

PROMIS is currently developing standards for the release of PROMIS instruments after translation. These standards will strike a balance that not only encourages use but also emphasizes collaboration in the ongoing evaluation of measurement invariance (equivalence) across languages. While we are confident that these translations are linguistically equivalent, the extent to which they are psychometrically comparable remains to be determined. For example, the translation of eight different words to describe increasing fatigue levels may have resulted in a different ordering (item difficulties) of the fatigue items, potentially introducing differential item functioning (DIF). In addition, the use of different units of measurement may have affected the item difficulty and may therefore introduce DIF. If important language DIF is found, language-specific item calibrations may need to be developed.

Further research on these translations will increase confidence in their use. This includes calibration of the Dutch–Flemish translations and examination of their content validity, test–retest reliability, construct validity and responsiveness in relevant patient populations (for example, validation of the pain and physical functioning item banks in patients with osteoarthritis) and in the general population. This can be realized by including PROMIS instruments in ongoing or planned research projects. Furthermore, to facilitate the interpretation of PROMIS scores, obtaining country-specific reference scores is recommended.

Finally, some practical steps will facilitate use of PROMIS instruments in a new country such as the Netherlands. Country-specific PROMIS short forms can be considered. CAT software needs to be obtained (e.g. from PROMIS) or developed, and a CAT user interface must be translated or developed. Ideally, short forms, CAT software and scoring manuals can be made available through a country-specific website, and CAT software can be incorporated in existing infrastructure for data collection or made available through a country-specific server through which CATs can be administered and on which data can be stored.

Dutch–Flemish PROMIS short forms of the 17 item banks translated in this project will soon be made available through the PROMIS Assessment Center (www.assessmentcenter.net). Cross-cultural validation studies of entire item banks in different patient populations are ongoing. Since initiating this project, new item banks have been developed that will also be translated into Dutch–Flemish in the future.

In conclusion, for all items, an acceptable translation was obtained. The Dutch–Flemish PROMIS items are linguistically equivalent. Short forms will soon be available for use and entire item banks are ready for cross-cultural validation in the Netherlands and Flanders.