Introduction

Pelvic floor dysfunction (PFD) manifests as a variety of symptoms that can adversely affect a woman’s quality of life (QOL). The aim of any treatment plan for such a woman is the restoration of function. Objective, standardized assessment of symptoms is therefore vital, both to guide treatment planning and to evaluate functional outcomes. This assessment can be strengthened by the use of condition-specific health-related QOL questionnaires [1]. The Pelvic Floor Distress Inventory-20 (PFDI-20) and the Pelvic Floor Impact Questionnaire-7 (PFIQ-7) are validated short-form questionnaires that measure symptom severity and its impact on QOL, respectively [2]. These questionnaires have been linguistically and culturally validated in diverse languages and cultures [3–12], a process that also allows international comparison of symptom severity and treatment response. However, they have never been validated in any African language or culture. The primary objective of this study was the cultural and linguistic validation of these questionnaires for Afrikaans and Sesotho, two languages spoken in Southern Africa.

Materials and methods

This research was performed at a referral urogynecology unit in Universitas Academic Hospital and at a general gynecology outpatient clinic in Pelonomi Hospital, Bloemfontein, South Africa. The research was approved by the local ethics committee (HSREC 51/2016). The PFDI-20 measures symptom severity and consists of three scales: the Pelvic Organ Prolapse Distress Inventory-6 (POPDI-6), the Colorectal–Anal Distress Inventory-8 (CRADI-8) and the Urinary Distress Inventory-6 (UDI-6). Each item is scored from 0 to 4, and each scale score is expressed as a percentage (0–100). The PFDI-20 summary score is the sum of the three scale scores (range 0–300). The PFIQ-7 measures the impact of symptoms on QOL and also consists of three scales: the Urinary Impact Questionnaire-7 (UIQ-7), the Colorectal–Anal Impact Questionnaire-7 (CRAIQ-7) and the Pelvic Organ Prolapse Impact Questionnaire-7 (POPIQ-7). Each item is scored from 0 to 3, and each scale score is again expressed as a percentage (0–100). The PFIQ-7 summary score is the sum of the three scale scores (range 0–300). Higher scores on either questionnaire indicate greater symptom bother.
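The scoring described above can be sketched as follows. This is a minimal illustration that assumes the commonly used convention in which each scale score is the mean of its item responses rescaled to 0–100 (mean × 25 for PFDI-20 items scored 0–4, mean × 100/3 for PFIQ-7 items scored 0–3). The function names and responses are hypothetical, and rules for unanswered items are not covered.

```python
from statistics import mean
from typing import Sequence

def scale_score(items: Sequence[int], max_item: int) -> float:
    """Rescale the mean item response (0..max_item) to a 0-100 scale score.

    Assumed convention: mean response * 100 / max_item, i.e. x25 for
    PFDI-20 items (0-4) and x100/3 for PFIQ-7 items (0-3).
    """
    if not items:
        raise ValueError("a scale needs at least one item response")
    if any(not 0 <= x <= max_item for x in items):
        raise ValueError(f"item responses must lie in 0..{max_item}")
    return mean(items) * 100 / max_item

def summary_score(scales: Sequence[Sequence[int]], max_item: int) -> float:
    """Sum of the three scale scores, giving a 0-300 summary score."""
    if len(scales) != 3:
        raise ValueError("both questionnaires consist of three scales")
    return sum(scale_score(s, max_item) for s in scales)

# Hypothetical PFDI-20 responses: POPDI-6, CRADI-8, UDI-6 (items scored 0-4)
pfdi_20 = summary_score([[0, 1, 2, 3, 4, 2], [4] * 8, [0] * 6], max_item=4)
# Hypothetical PFIQ-7 responses: UIQ-7, CRAIQ-7, POPIQ-7 (items scored 0-3)
pfiq_7 = summary_score([[3] * 7, [1] * 7, [0] * 7], max_item=3)
print(pfdi_20, round(pfiq_7, 1))  # 150.0 133.3
```

Passing the maximum item score as a parameter lets one function serve both questionnaires, since they differ only in their response scale.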

Linguistic and cultural validation

The questionnaires were translated into Afrikaans and Sesotho by means of three forward translations by clinicians and a backward translation by a native speaker who was not involved in the forward translation [13, 14]. The translated versions were then tested on a sample of ten women with PFD, who completed the questionnaires and were interviewed afterwards. The interviews revealed that public awareness of PFD was very low and that the phrasing of the questions needed to be simplified to reflect cultural concepts of pelvic floor function. The questionnaire layout was therefore redesigned to promote accurate completion. The revised questionnaires were tested on a further sample of ten women, which confirmed comprehension and ease of completion, and they were thereafter rolled out in full.

Study design and population

Women were eligible for inclusion if they were older than 18 years, did not suffer from any chronic pain syndrome and were literate in the language being evaluated. The study group consisted of women referred to the urogynecology unit with complaints of PFD. The control group consisted of women seen at a general gynecology outpatient clinic who had not been referred because of PFD. The sample size was based on the number of items in the questionnaire, with a minimum of five participants required per item [15]. Because the PFDI-20 had the most items, the aim was to recruit 100 study and 100 control participants per language group. Definitions of pelvic organ prolapse (POP), urinary incontinence (UI) and anal incontinence (AI) were those of the International Urogynecological Association (IUGA)/International Continence Society (ICS) report on terminology for POP [16]. AI was recorded as present if fecal and/or flatus incontinence was reported at baseline data collection.

Participants completed both questionnaires at the baseline visit and again 1 week later. The study group also completed the questionnaires 6 months later to allow evaluation of the response to intervention. The 1-week questionnaires were administered by telephone [17]. Although self-administration is the preferred modality, telephone administration was the only feasible way to permit test–retest assessment in this population because of the vast referral area, the cost of transport, the lack of internet access and the lack of an effective postal service. These factors are common in many African countries. Because of financial constraints, only the first 20 participants of the study groups and the first 10 participants of the control groups took part in this test–retest round. Any treatment was deferred during this week to avoid test–retest changes related to clinical improvement. In the third round, the study group also completed the Patient Global Impression of Improvement (PGI-I) [18]. The PGI-I was dichotomized into “improved” (scores 1–3) and “not improved” (scores 4–7).

Measurement properties

The psychometric domains of reliability, validity and responsiveness were assessed for each questionnaire as recommended by the consensus-based standards for the selection of health measurement instruments (COSMIN) checklist [19]. Internal consistency, the extent to which the items of a scale measure the same concept, was assessed with Cronbach’s alpha; values between 0.70 and 0.95 were considered acceptable [20]. Test–retest reliability was assessed by having participants complete the questionnaires twice, 1 week apart. This interval was considered long enough to prevent recall bias but short enough that relevant clinical change was unlikely to occur. Reliability was quantified with the intraclass correlation coefficient (ICC) between the baseline and second rounds, calculated for all individual scales as well as for the summary scores. Values of ≥0.70 were considered to indicate adequate reliability [21]. Measurement error was expressed as the limits of agreement (LOA) around the mean change in scores over the test–retest period; it reflects the random error of scores that is not attributable to true clinical change [22].
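The two reliability statistics above can be sketched in plain Python. Cronbach’s alpha follows the usual formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score). The text does not state which ICC model was used; purely as an assumption, the sketch computes ICC(2,1) (two-way random effects, absolute agreement, single measurement) from a two-way ANOVA decomposition. Function names and data are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence

def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one respondent's item scores."""
    k = len(responses[0])  # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def icc_2_1(test: Sequence[float], retest: Sequence[float]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    Assumption: this model is chosen for illustration only; the source
    does not specify which ICC form was used.
    """
    n, k = len(test), 2
    values = list(test) + list(retest)
    grand = mean(values)
    ss_total = sum((x - grand) ** 2 for x in values)
    # Between-subject variation (each row is one participant's two rounds)
    ss_rows = k * sum((mean(pair) - grand) ** 2 for pair in zip(test, retest))
    # Between-round variation (systematic shift from test to retest)
    ss_cols = n * sum((mean(col) - grand) ** 2 for col in (test, retest))
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical data: three respondents answering a two-item scale,
# and three participants whose retest score is shifted up by 1 point.
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
print(round(icc_2_1([1, 2, 3], [2, 3, 4]), 3))  # 0.667
```

In the second example the test and retest scores are perfectly correlated, yet the absolute-agreement ICC falls below 1 because of the systematic shift; a consistency-type ICC would not penalize that shift.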

Construct validity is the degree to which the questionnaires measure the construct they are intended to measure, i.e. PFD [21]. We hypothesized that: (1) women with symptoms of POP would score higher on the POPDI-6 and POPIQ-7 scales than those without; (2) women with symptoms of UI would score higher on the UDI-6 and UIQ-7 scales than those without; and (3) women with symptoms of AI would score higher on the CRADI-8 and CRAIQ-7 scales than those without. Construct validity was considered adequate when at least 75% of these hypotheses were confirmed [21, 23].
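The decision rule above can be sketched as follows. As an assumption for illustration, a hypothesis is counted as confirmed when the group with the symptom has a higher mean scale score and a two-sided test of the difference gives p < 0.05. Because the Python standard library has no t distribution, the sketch uses Welch’s standard error with a large-sample normal approximation, which is reasonable only for groups of roughly the size studied here. All names and scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance
from typing import Sequence

def two_sided_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Approximate two-sided p value for a difference in means.

    Welch standard error with a normal (z) approximation; an assumption
    that is only reasonable for large groups.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def hypothesis_confirmed(with_symptom: Sequence[float],
                         without_symptom: Sequence[float],
                         alpha: float = 0.05) -> bool:
    """Symptomatic women score higher, and the difference is significant."""
    return (mean(with_symptom) > mean(without_symptom)
            and two_sided_p(with_symptom, without_symptom) < alpha)

def construct_validity_adequate(confirmed: Sequence[bool],
                                threshold: float = 0.75) -> bool:
    """Adequate when at least 75% of the hypotheses are confirmed."""
    return sum(confirmed) / len(confirmed) >= threshold

# Hypothetical POPDI-6 scores for women with and without POP symptoms
with_pop = [50, 60, 70, 55, 65] * 10
without_pop = [10, 20, 15, 25, 5] * 10
print(hypothesis_confirmed(with_pop, without_pop))  # True
print(construct_validity_adequate([True] * 5 + [False]))  # True (5 of 6)
```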

Responsiveness is the ability of a questionnaire to detect change in the clinical condition experienced by the individual; here, it refers to the ability of the questionnaires to detect change that occurs as a result of treatment. It was assessed in the study population during the third round. The relationship between the mean change in scores and the dichotomized PGI-I classification was summarized. Baseline scores were also compared with third-round scores using the paired t test, with p < 0.05 considered to indicate a significant improvement. The standardized response mean (SRM) was used to evaluate whether the questionnaires were responsive to change at the group level in each language [24, 25]. Several statistical methods have been used to evaluate responsiveness, but the SRM is considered appropriate for evaluating responsiveness in a single group before and after an intervention, as was the case in this population [26, 27]. The difference in mean PFDI-20 and PFIQ-7 scores between participants who classified themselves as “improved” and those who classified themselves as “not improved” on the PGI-I was used to define a cut-off value indicative of improvement.
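The responsiveness statistics can be sketched as below. Change is taken here as baseline minus third-round score, so a positive value means improvement (lower scores mean less bother). The SRM is the mean change divided by the standard deviation of the change, and the paired t statistic is the same ratio multiplied by √n, which is why the two move together. The sketch also groups mean change by the dichotomized PGI-I (1–3 improved, 4–7 not improved). P values would require a t distribution, which the standard library lacks, so only the statistic is returned; names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List, Sequence

def changes(baseline: Sequence[float], follow_up: Sequence[float]) -> List[float]:
    """Baseline minus follow-up; positive values indicate improvement."""
    if len(baseline) != len(follow_up):
        raise ValueError("paired scores required")
    return [b - f for b, f in zip(baseline, follow_up)]

def srm(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Standardized response mean: mean change / SD of change."""
    d = changes(baseline, follow_up)
    return mean(d) / stdev(d)

def paired_t(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Paired t statistic (df = n - 1); equals SRM * sqrt(n)."""
    d = changes(baseline, follow_up)
    return mean(d) / (stdev(d) / sqrt(len(d)))

def pgi_improved(score: int) -> bool:
    """PGI-I dichotomy used in the study: 1-3 improved, 4-7 not improved."""
    if not 1 <= score <= 7:
        raise ValueError("PGI-I scores range from 1 to 7")
    return score <= 3

def mean_change_by_pgi(baseline: Sequence[float], follow_up: Sequence[float],
                       pgi: Sequence[int]) -> Dict[str, float]:
    """Mean change within the 'improved' and 'not improved' PGI-I groups."""
    groups: Dict[str, List[float]] = {"improved": [], "not improved": []}
    for d, score in zip(changes(baseline, follow_up), pgi):
        groups["improved" if pgi_improved(score) else "not improved"].append(d)
    return {name: mean(vals) for name, vals in groups.items() if vals}

# Hypothetical PFDI-20 summary scores at baseline and in the third round
base, third = [100, 80, 60, 40], [60, 50, 40, 30]
print(round(srm(base, third), 2), round(paired_t(base, third), 2))  # 1.94 3.87
print(mean_change_by_pgi(base, third, [1, 2, 5, 6]))
```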

The remaining results are summarized categorically as frequencies and percentages. The chi-squared test and Fisher’s exact test were used to evaluate univariate associations between categorical variables, and the unpaired t test was used for continuous variables. Statistical significance was set at p < 0.05.

Results

Of a possible 330 women, 213 (64.5%) were eligible for the study groups of the Afrikaans and Sesotho questionnaire validation. Of these 213 women, 208 (97.6%; 104 in each language group) consented to participate in the three rounds, and 200 (96.1%) completed the required questionnaires at the specified time points. Of 231 women approached for the control groups, 206 (89.2%; 103 in each language group) consented to participate, and complete data were available for 200 (97.1%) of these. The baseline characteristics are summarized in Table 1. The mean age of the study groups was higher than that of the control groups in both languages (p < 0.0001). Pelvic floor symptoms differed significantly between the study and control groups in both languages (p < 0.0001), and the majority of participants in the study groups experienced more than one symptom.

Table 1 Baseline characteristics and questionnaire scores of participants

Internal consistency

The PFDI-20 demonstrated good internal consistency, with Cronbach’s alpha values of 0.89 and 0.84 in the Afrikaans study and control groups, and acceptable consistency in the Sesotho study (0.71) and control (0.75) groups. The PFIQ-7 demonstrated good consistency in the Afrikaans study group (0.88) but poor consistency in the Afrikaans control group (0.54). A similar pattern was found among the Sesotho study (0.81) and control (0.64) participants. These results are presented in Table 2.

Table 2 Internal consistency (Cronbach’s alpha)

Reliability

The test–retest ICC for both participant groups in both languages ranged from 0.89 to 0.99, indicating very good reliability. These results are presented in Table 3. The high correlation also indicates that telephone follow-up for these questionnaires was feasible and reliable in the population studied. This represents a potentially significant cost saving in a resource-limited environment.

Table 3 Test–retest reliability (intraclass correlation coefficient)

Measurement error

The LOA for the different language groups are summarized in Table 4. The overall magnitude of the measurement error was calculated by dividing the LOA by the range of all measures. Among the study participants in both language groups, the measurement error was 8.8% for the PFDI-20 and 3.2–8.4% for the PFIQ-7, indicating low measurement error in both language groups.
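The Bland–Altman limits of agreement and the relative measurement error can be sketched as follows. The LOA are the mean test–retest difference ± 1.96 × the standard deviation of the differences. How the LOA were divided by the range is not fully specified in the text; as an assumption, the sketch divides the LOA half-width (1.96 × SD) by the possible score range (0–300 for the summary scores) and expresses the result as a percentage. Names and data are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple

def limits_of_agreement(test: Sequence[float],
                        retest: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean difference, lower LOA, upper LOA) for paired scores."""
    diffs = [r - t for t, r in zip(test, retest)]
    bias = mean(diffs)  # systematic change between rounds
    half_width = 1.96 * stdev(diffs)  # random error around that change
    return bias, bias - half_width, bias + half_width

def relative_error_pct(test: Sequence[float], retest: Sequence[float],
                       score_range: float = 300.0) -> float:
    """LOA half-width as a percentage of the score range (assumed definition)."""
    _, lower, upper = limits_of_agreement(test, retest)
    return (upper - lower) / 2 / score_range * 100

# Hypothetical summary scores from the baseline and 1-week rounds
test_scores, retest_scores = [10, 20, 30, 40], [12, 18, 33, 39]
bias, lower, upper = limits_of_agreement(test_scores, retest_scores)
print(round(bias, 2), round(lower, 2), round(upper, 2))
print(round(relative_error_pct(test_scores, retest_scores), 2))
```

If the LOA were instead taken as the full width (upper minus lower), the percentage would simply double, so the convention should be stated alongside any reported value.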

Table 4 Limits of agreement

Construct validity

Figures 1 and 2 illustrate the construct validity. The Afrikaans subgroup with POP reported higher mean scores for the POPDI-6 (56.7 ± 20.2) and POPIQ-7 scales (50.2 ± 26.0) than those without POP (18.7 ± 15.4 and 10.4 ± 13.7, respectively; p < 0.0001). The Sesotho subgroup with POP reported higher mean scores for the POPDI-6 (57.8 ± 18.8) and POPIQ-7 scales (48.4 ± 24.9) than those without POP (18.7 ± 22.3 and 20.2 ± 18.4, respectively; p < 0.0001). The Afrikaans subgroup with UI reported higher mean scores for the UDI-6 (57.1 ± 24.0) and UIQ-7 scales (52.6 ± 27.7) than those without UI (19.8 ± 15.6 and 26.3 ± 23.8, respectively; p < 0.0001). The Sesotho subgroup with UI reported higher mean scores for the UDI-6 (54.6 ± 18.2) and UIQ-7 scales (51.7 ± 25.0) than those without UI (15.0 ± 10.7 and 26.3 ± 24.7, respectively; p < 0.0001). The Afrikaans subgroup with AI reported higher mean scores for the CRADI-8 (53.7 ± 24.5) and CRAIQ-7 scales (41.1 ± 29.9) than those without AI (21.1 ± 17.2 and 18.2 ± 20.2, respectively; p < 0.0001). The Sesotho subgroup with AI reported higher mean scores for the CRADI-8 (47.1 ± 18.7) and CRAIQ-7 scales (40.9 ± 25.8) than those without AI (18.5 ± 14.2 and 21.0 ± 23.7, respectively; p < 0.0001). These findings support the prespecified hypotheses for POP, UI and AI.

Fig. 1

PFDI-20 scale scores. Afr Afrikaans, So Sesotho, POP pelvic organ prolapse, UI urinary incontinence, AI anal incontinence; boxes mean and standard deviation, whiskers minimum and maximum scores (n = 100 for each variable)

Fig. 2

PFIQ-7 scale scores. Afr Afrikaans, So Sesotho, POP pelvic organ prolapse, UI urinary incontinence, AI anal incontinence; boxes mean and standard deviation, whiskers minimum and maximum scores (n = 100 for each variable)

Responsiveness

The third completion round took place after a mean interval of 5.2 months. In each language group, 100 participants completed the third round of questionnaires (Table 5). Scores improved significantly across all domains in both language groups. On the PGI-I, 79% of the Afrikaans group and 81% of the Sesotho group reported improvement. A cut-off score distinguishing women with improvement in symptoms from those without improvement was determined by comparing the summary scores of the questionnaires between these PGI-I groups. The cut-off scores were <35 for the PFDI-20 and <24 for the PFIQ-7 in the Afrikaans group, and <31 and <30, respectively, in the Sesotho group.

Table 5 Responsiveness

Discussion

The objective of this study was to validate the PFDI-20 and PFIQ-7 in African women for the Afrikaans and Sesotho languages. The assessment of reliability, validity and responsiveness supported the validity of these questionnaires in this population and healthcare setting. The original PFDI and PFIQ showed very good internal consistency, with alpha values of 0.88 and 0.97, respectively, and this was confirmed during the development of the short-form versions [2]. In the present study, Cronbach’s alpha values in the study groups of the two languages ranged from 0.71 to 0.89 for the PFDI-20 and from 0.81 to 0.88 for the PFIQ-7. Similar alpha values have been reported for the Spanish, Turkish, Japanese, Chinese, Hebrew, Dutch and Swedish versions of the PFDI-20 and PFIQ-7 questionnaires [3–5, 7–10].

The questionnaires had to be altered after the initial pilot phase because of difficulties described by the participants. A more illustrative layout and specific instructions for completion in the first round were required, which likely reflected the low level of education and awareness of PFD. Secondary school had not been completed by 81% of the Afrikaans and 92% of the Sesotho study participants, which is not unusual among African women [28]. This observation emphasizes the importance of public education on PFD among African women, which remains greatly neglected.

Telephone interviews, as described by Geller et al. [17], were used for the second round. This raised a potential concern about reliability, but it was the only feasible method of evaluating test–retest reliability in this type of healthcare environment. The observed ICC values of 0.98–0.99 were not inferior to those documented in the validation of these questionnaires by self-administration [3, 5, 8]. The high correlation may be explained by the fact that the women had completed the questionnaires themselves the week before and were thus familiar with the questions asked over the telephone. This might not have been the case had the questionnaires initially been administered by telephone, and this requires further research.

Responsiveness is an essential element in the assessment of any intervention. It is of particular importance in urogynecology, where the primary aim of an intervention is mostly to improve the individual’s QOL [29]. The PGI-I was used as the gold standard for evaluating the response to an intervention. Score changes were statistically significant across all scales (p < 0.0001) in this population and for both languages. A similar degree of responsiveness has been shown in the validation of these questionnaires in Turkish and Dutch populations [3, 9]. However, responsiveness was not evaluated in the majority of PFDI-20 and PFIQ-7 validation studies [4–6, 8, 10, 30]. The initial validation of these short-form questionnaires showed that the PFDI-20 is more responsive than the PFIQ-7 [2]. This was not observed in the present population, and there is currently no apparent explanation for this finding.

There were some limitations to this study. Internal consistency was evaluated in a smaller sample of women, and Cronbach’s alpha might have differed had a larger sample been analyzed. The second completion round was conducted by telephone and, although the format was structured, it was not possible to ensure that the questionnaires were administered consistently in all interviews. Furthermore, telephone administration of the questionnaires was not specifically validated in this population before this study, although the test–retest reliability results are reassuring. Education levels in both language groups were generally limited, and for the majority of patients we had to explain and demonstrate verbally how to complete the questionnaires and score symptom severity in the first round. This was in addition to having redesigned the questionnaires after the pilot phase to ease completion. The process was therefore labor intensive, which also emphasizes the importance of short-form questionnaires in an environment with limited human resources. Longer questionnaires would very likely not have been practically feasible in our environment and could have resulted in high rates of incomplete questionnaires, which would have limited the clinical value of the study. Pelvic floor symptom screening was not repeated in the third round, and only the PGI-I was used to evaluate the responsiveness of the questionnaires. This lack of additional clinical information could have influenced the evaluation of responsiveness and, consequently, the determination of improvement or deterioration in function.

The strengths of this study were the adequate sample sizes, based on the number of items in the questionnaires, for both the study and the control groups. The methodological design followed the recommendations of the COSMIN initiative, ensuring adequate evaluation of the different psychometric measurement properties of these questionnaires. This is particularly important for the validation of patient-reported outcome measures. The control group provided insight into the score distribution in women who had not been identified with PFD. The results also reveal the importance of public education regarding the symptoms of PFD. This is particularly relevant to Sesotho-speaking women, among whom an unexpectedly high proportion (26%) were identified with symptoms of PFD. This contrasts with the low prevalence expected among ethnic black African women based on observations in the referral urogynecology unit. The role of the questionnaires as a routine screening tool should therefore be further explored in this population to allow the identification, appropriate referral and management of PFD. The findings also highlight the lack of robust epidemiological data on PFD among African women.