Plain English summary

Most questionnaires designed to evaluate patient-reported outcomes regarding scarring are available in English. The objective was to generate a validated French version of the SCAR-Q questionnaire. The SCAR-Q questionnaire (including Appearance, Symptom and Psychological impact scales) was translated into French using a translation-back-translation process in accordance with international guidelines. For validation, two hundred patients completed the questionnaire. Four steps were required to obtain a translation consistent with the original version. Statistical analyses showed that our French version was reliable and valid. A French version of the SCAR-Q questionnaire is now validated and ready for use.

Introduction

Scars are natural consequences of cicatrization following surgery, trauma or burn injury. It has been shown that scars of poor appearance can result in serious mental disorders or poor self-esteem [1, 2].

Patient-reported outcome and experience measures (PROMs and PREMs) are now part of the quality assessment of the overall care of patients [3]. Klassen et al. recently developed a specific PRO instrument for scar evaluation [4]. The SCAR-Q self-questionnaire is composed of three scales evaluating scar appearance (12 items), symptoms (12 items) and psychological impact (5 items). The SCAR-Q was designed to evaluate all scar types in children and adults. It was considered useful in research where appearance is an important outcome. Indeed, when treatments aim specifically to improve the appearance of scars, asking patients what they think about how their scar looks seems a fundamental and practical measure [4]. After rigorous development, a study among 731 patients validated the scale [5]. Like most questionnaires designed to evaluate scarring from the patient's point of view, the SCAR-Q was developed in English [6, 7]. Unfortunately, such a questionnaire cannot be used in patients with another mother tongue. To our knowledge, no patient-reported questionnaire for scar evaluation is available in French.

The objective of our study was to translate the SCAR-Q instrument into French [8,9,10,11] using international guidelines issued by the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) and by the World Health Organization (WHO) [12, 13], and then to test the reliability and validity of the translated version.

Materials and methods

Ethical considerations

We obtained permission to use the SCAR-Q self-questionnaire from the original development team. All participating patients gave their written consent before joining this study, which was carried out in accordance with the Declaration of Helsinki (1983). We obtained an Ethical Committee Authorization (Authorization N° 2020-67, Assistance Publique des Hôpitaux de Marseille, Marseille, France).

Translation

ISPOR and WHO recommendations were used to carry out the translation process [12, 13].

The translation process comprised four steps: (1) forward translation, (2) back translation, (3) back translation review, and (4) patient interviews. It required three individuals who were fluent in both English and French. Two individuals whose mother tongue was French served as forward translators (English to French): one was a surgeon and the other a professional translator specialized in medical translation. Once the two forward translations were completed, cognitive debriefing was conducted to reach consensus and merge the two translations into a single version. The third individual, a professional translator whose mother tongue was English and who was fluent in French, served as back translator. The back translator did not see or review the original English version of the SCAR-Q. Once the back translation was complete, the questionnaire was returned to the original development team and analyzed by an expert review panel comprising four medical specialists from various fields and two paramedics. The development team provided feedback and instructions before the cognitive debriefing interviews were conducted. These interviews involved ten patients from the target patient population and were used to assess the quality of the translation. The final version was not intended to be a literal translation, but rather a conceptually equivalent version worded in language that patients can easily understand. As suggested by the developers of the scale, SCAR-Q scale scores were transformed to range from 0 (worst) to 100 (best), based on logits from Rasch measurement theory analysis [5].

Population

Inclusion criteria were: French nationality and French as their first language; age ≥ 18 years; having a visible scar, whatever the location, size (centimeters), date or etiology. We did our best to cover the variability of location, size of scars and the demographic variability. Exclusion criteria were: illiterate patients or those unable to understand and respond to the survey, and persons with cognitive limitation.

Patients were recruited during dermatology or surgery consultations. The reason for consultation was not always the scar, but the presence of a visible scar made it possible to include patients if they agreed and if the inclusion and exclusion criteria were met. Statistical power and minimum sample size were calculated for the RMSEA: with 200 patients and an RMSEA of 0.065, we had 95.7% power (with an alpha risk of 5%) to show that this RMSEA was lower than 0.08. The first 200 patients who fully answered the questionnaire were included in the study. The majority of the patients approached to complete the survey participated; however, we do not have demographic information on persons who declined to participate.

Statistical validation

We used questionnaires completed by patients and stored in a secure database. All analyses were conducted using IBM SPSS Statistics 20.0 (IBM Inc., New York, USA) and the lavaan package for R 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria). All tests were two-sided and p-values < 0.05 were considered significant.

Continuous and categorical variables were described using, respectively, means (± standard deviation) and counts (percentages). At scale level, floor and ceiling effects were considered to be present if more than 15% of respondents achieved, respectively, the lowest or the highest possible score [14]. At item level, these effects were considered to be present if more than 95% of respondents answered in the lowest or highest response category [15]. Items were considered redundant if the polychoric inter-item correlation was > 0.7 and irrelevant if < 0.2 [15]. Reliability was assessed using ordinal Cronbach’s alpha (α) [16] and considered satisfactory if ≥ 0.7 [17]. Structural validity was tested using confirmatory factor analysis (CFA) with the robust weighted least squares (WLSMV) estimator and Delta parameterization, based on the 3-factor structure of the original version. Model fit of a correlated 3-factor structure was examined using the root mean square error of approximation (RMSEA; good fit if < 0.06, poor fit if ≥ 0.10, acceptable otherwise), and the comparative fit index (CFI) and the Tucker-Lewis index (TLI) (for both, good fit if > 0.95, poor fit if < 0.90, acceptable otherwise) [18].
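The decision rules above can be illustrated with a minimal Python sketch. It is not the code used in the study (the analyses were run in SPSS and lavaan); the function names and the example scores are hypothetical, and only the cut-offs are taken from the text.

```python
def classify_rmsea(rmsea):
    """Label an RMSEA value using the cut-offs quoted in the text."""
    if rmsea < 0.06:
        return "good"
    if rmsea >= 0.10:
        return "poor"
    return "acceptable"


def classify_cfi_tli(index):
    """Label a CFI or TLI value using the cut-offs quoted in the text."""
    if index > 0.95:
        return "good"
    if index < 0.90:
        return "poor"
    return "acceptable"


def extreme_score_shares(scores, lowest=0, highest=100):
    """Return the shares of respondents at the lowest and highest possible score.

    The 0-100 range is the transformed SCAR-Q score range described earlier.
    """
    n = len(scores)
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return floor_share, ceiling_share


def has_scale_level_effect(share, threshold=0.15):
    """A floor or ceiling effect is flagged when MORE than 15% sit at the extreme."""
    return share > threshold


if __name__ == "__main__":
    # Hypothetical scale scores, for illustration only (not study data).
    example_scores = [100, 100, 82, 64, 100, 45, 0, 73, 91, 100]
    floor_share, ceiling_share = extreme_score_shares(example_scores)
    print("floor effect:", has_scale_level_effect(floor_share))
    print("ceiling effect:", has_scale_level_effect(ceiling_share))
    print("RMSEA 0.065:", classify_rmsea(0.065))
    print("CFI 0.974:", classify_cfi_tli(0.974))
```

The comparisons are strict where the text is strict ("more than 15%", "< 0.06") and inclusive where it is inclusive ("≥ 0.10"), so a value falling exactly on a cut-off is classified as the text specifies.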

The Spearman rank coefficient (ρ) was used to assess correlations between subscales. Three hypotheses were identified based on previous findings to determine known-group validity: higher SCAR-Q scores were expected when the scars were bigger, more recent, and localized on the face [5]. Univariate analyses were performed using ANOVAs.
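As an illustration of the Spearman rank coefficient mentioned above, the following Python sketch ranks two score lists (using average ranks for ties) and then computes the Pearson correlation of the ranks. It is a simplified stand-in for the statistical software used in the study; the function names and example values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1  # mean of positions pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical subscale scores for five patients (not study data).
    appearance = [35, 60, 72, 88, 100]
    symptoms = [50, 61, 77, 90, 77]
    print(round(spearman_rho(appearance, symptoms), 3))
```

Computing the Pearson correlation on average ranks handles ties correctly, which matters here because many respondents share the maximum score.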

To test the repeatability of this new scale, test–retest reliability was assessed in 10 patients. Given the estimated ICC (≥ 0.94), we had 87% power to detect an ICC higher than a lowest acceptable ICC of 0.6 with a two-sided test. We considered that no clinical changes would occur in 1 month, given the long-standing scars of the included patients (75% of scars were older than 12 months, and scars were more than 13 years old on average). Patients answered the questionnaire again under the same conditions 1 month later. Consistency between responses was evaluated using the intra-class correlation coefficient (ICC). The closer the coefficient is to 1, the higher the repeatability. This test was performed for each subscale.
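The text does not state which form of the intra-class correlation coefficient was used. Purely as an illustration, the sketch below assumes a two-way, single-measure, consistency ICC (often written ICC(3,1)), computed from the mean squares of a patients × occasions table. The function name and the data are hypothetical.

```python
def icc_consistency(table):
    """Two-way, single-measure, consistency ICC (ICC(3,1)) -- an assumed form.

    `table` is a list of rows, one per patient; each row holds that patient's
    scores on the k measurement occasions (here k = 2: test and retest).
    """
    n = len(table)
    k = len(table[0])
    if n < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need at least 2 patients and 2 occasions, no gaps")

    grand_mean = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    # Sums of squares for patients (rows), occasions (columns) and total.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


if __name__ == "__main__":
    # Hypothetical test and retest scores for three patients (not study data).
    scores = [[10, 12], [20, 18], [30, 30]]
    print(round(icc_consistency(scores), 3))
```

A consistency ICC ignores a uniform shift between the two occasions, whereas an absolute-agreement form would penalize it, so the choice of form should follow the original analysis plan.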

Before initiating the study, all researchers were trained in patient interviewing. As all questionnaires were completed, there were no concerns regarding missing data.

Results

Translation

Some differences were found between the two forward translations, regarding both instructions and items. For example, “How does your scar look?” was translated as “À quoi ressemble votre cicatrice?” by one translator (literally “what does your scar look like?”) and as “Quel est l’aspect de votre cicatrice?” (literally “what is the aspect of your scar?”) by the other. A “reconciled” version containing both instructions and items was submitted to the back translator. No major comments were made by the development team concerning the back translation. Patient interviews included ten patients (mean age = 37 years, range 25–62) with different types of scars (burns, surgical or traumatic scars). This step led to minor modifications of item wording. For example, for item n° 6 of the SCAR-Q Appearance scale, the word “irrégulière” was added to the scar description.

Statistical validation

Two hundred patients completed the questionnaire. Population characteristics are reported in Table 1. We included 109 females (54.5%) and 91 males (45.5%). Mean age (± SD) was 53.3 (± 18) years (min = 18; max = 84). Mean scar size was 6.8 (± 6.3) cm (min = 1; max = 35) and mean scar age was 156.7 (± 169) months (min = 1; max = 651).

Table 1 Sample characteristics

Regarding item description, no floor or ceiling effect was found for any item (max = 85%). Regarding scale description, we found a ceiling effect for all 3 subscales: in 34.0%, 33.0%, and 56.0% of patients for the Appearance, Symptom, and Psychological impact subscales, respectively (Fig. 1). Forty-one patients (20.5%) were not impaired at all in the three dimensions. Internal consistency was considered satisfactory, with a Cronbach’s alpha > 0.7 for all subscales (0.97, 0.90 and 0.97, respectively, for the SCAR-Q Appearance, Symptom and Psychological impact scales). Several items from the Appearance and Psychological impact scales showed redundancy, with many inter-item correlations above 0.7. The CFA of the original structure displayed reasonable fit, with RMSEA = 0.065 (90% confidence interval: 0.057–0.072), CFI = 0.974, and TLI = 0.972. Subscales were positively but not strongly correlated (0.45 < ρ < 0.65, p < 0.001).

Fig. 1 Bar graph showing scores of patients for all subscales. A ceiling effect was observed in all subscales

Repeatability was tested on a randomly selected subgroup of 10 patients who did not differ from the other patients regarding their main characteristics and initial SCAR-Q scores. Intra-class correlation coefficients were 0.95 (95% confidence interval: 0.83–0.99), 0.94 (0.59–0.99), and 0.99 (0.94–1.00) for the Appearance, Symptom and Psychological impact scales, respectively.

Discussion

Modern practitioners are committed to the diagnosis and treatment of patients and to addressing the patient’s objectives and assessing patient satisfaction, as part of a comprehensive approach to care [19]. Scarring is part of the surgical outcome. Efforts to assess the patient’s perspective will result in better therapeutic interventions to improve scar quality and acceptance [20]. For that purpose, an instrument like the SCAR-Q must be available in languages other than English, to allow much broader use. French alone is estimated to be spoken daily in Metropolitan France by 76 million native speakers and by 235 million fluent speakers worldwide.

Our translation process has already been used in other studies [21, 22]. It ensures a translation of the idea or concept, which is more meaningful and accurate than a mere literal translation [23]. ISPOR and WHO recommendations were chosen for the translation process in order to obtain a culturally adapted French version of the SCAR-Q questionnaire. By combining forward and back translation methods, we reduced potential biases in the process. Two-way translation, combined with an expert panel review and patient interviews, is intended to yield a culturally and socially adapted version. No translation can perfectly match the original document, because of conceptual differences arising from the diverging histories of languages. However, back translation into the original language helps to ensure a conceptually valid translation [23]. By incorporating different medical specialties and paramedics in the panel of experts, we tried to make the questionnaire easy to understand. Actual patients were integrated into this process to check whether the proposed version was also suitable for non-medical or non-paramedical users. To cover the variability of situations, we included patients with different types (keloid, contracture, hypertrophic) and locations (face and neck, chest, upper and lower limbs) of scars, to be as representative as possible of target patients.

Cronbach’s alphas of the French version of the SCAR-Q were high (0.97, 0.90 and 0.97 for the SCAR-Q Appearance, Symptom, and Psychological impact scales, respectively), thus demonstrating a high level of internal reliability. This is consistent with the Cronbach’s alpha values of the original scale [4], i.e., 0.96, 0.91, and 0.95, respectively. The ceiling effect found on each subscale was probably related to the fact that scars were not the main reason for consultation in our cohort, which explains the large number of high scores (Fig. 1). Moreover, the majority of patients had had their scars for more than 1 year, which probably increased patients’ acceptance and thus the scores. This must be confirmed by evaluating patients repeatedly at different times. As expected, subscales were positively correlated, but not too strongly (0.45 < ρ < 0.65), which supports the idea that they measure distinct dimensions. The repeatability analysis supports the reliability of the questionnaire, even though it was generated by translation into French.

No translation process is perfect and conceptual differences may remain. However, the combination of two methodologies aimed to minimize bias, and statistical validation provided acceptable results. Another limitation of the study is that we do not have demographic information on persons who declined to participate.

Generalizability of the study findings may be limited due to the sample characteristics. The sample included a high percentage of persons whose scars resulted from surgery or trauma. Our sample, smaller than that of Ziolkowski et al. [5], showed high ceiling effects across the 3 scales. Furthermore, our sample only included 2 patients with burn scars, and our results may not be generalizable to patients with burns. While these findings demonstrate the initial validity of the French translation of the SCAR-Q, future studies are required to examine its reliability and validity in other samples, in particular among persons with more recent and severe scars.

In further studies, the French SCAR-Q could also be used simultaneously with the English version in a study including both English- and French-speaking patients. This would allow examination of measurement equivalence.

Conclusion

The SCAR-Q questionnaire is a reference in the field of patient-reported outcomes regarding scar evaluation. The 4-step translation-back translation process made it possible to obtain a high-quality French version in line with the original document. This translated version is now usable in France.