Introduction

Pelvic floor disorders (PFD), including pelvic organ prolapse (POP) and urinary and fecal incontinence, are common and negatively affect the lives of women. A community-based survey in the USA found that 23.7 % of women suffered from at least one disorder [1]. PFDs affect women’s social interactions, including intimacy with sexual partners, and treatments of PFDs have been associated with both beneficial [2] and deleterious effects on sexual health [3]. In 2001, Rogers et al. published a new measure of sexual health in women with urinary incontinence and/or pelvic organ prolapse, the Pelvic Organ Prolapse/Urinary Incontinence Sexual Questionnaire (PISQ), which is accepted as the standard measure of sexual health in women with pelvic floor dysfunction [4].

Although the PISQ is widely used to assess sexual function in women with pelvic floor disorders, clinically important differences in scores have not been determined. The smallest change in score associated with a clinically meaningful change in a questionnaire has been called the minimum important difference (MID) [5]. The MID is the difference in score that is representative of what patients and physicians perceive as beneficial or harmful and that would mandate, in the absence of troublesome side effects or excessive cost [6], a change in the patient’s management. MID provides interpretation of treatment effectiveness, linking clinical indicators with patient-reported outcomes (PROs) in health-related quality of life (QOL) measures. Our objective was to determine the MID of the PISQ using a dual-cohort methodology: one cohort comprising women with overactive bladder treated with tolterodine, and another cohort comprising women treated surgically for prolapse and/or incontinence.

Materials and methods

This is an ancillary analysis of a randomized clinical trial on the efficacy of tolterodine with regard to overactive bladder symptoms and sexual and emotional quality of life in sexually active women (cohort I; Tolterodine Efficacy in Sexually Active Women [TESA trial]) [7] and a trial that reported sexual function changes after surgery for urinary incontinence and/or prolapse (cohort II) [2]. Since the de-identified data were already available from the TESA trial, this ancillary analysis did not require an Institutional Review Board review. In the study of cohort I, sexually active women with urgency urinary incontinence (UUI) were randomized to placebo or tolterodine extended release for treatment of overactive bladder (OAB) symptoms for 12 weeks. Eligible women had OAB for at least 3 months, were sexually active in a stable heterosexual relationship, had a mean of ≥8 micturitions, ≥ 0.6 urgency incontinence episodes and ≥3 OAB micturitions per 24 h, as recorded in 5-day bladder diaries at baseline, and reported at least “some moderate problems” on the Patient Perception of Bladder Condition (PPBC) questionnaire. In the study for cohort II, women who underwent reconstructive surgery with placement of a suprapubic catheter for postoperative bladder management were randomized to nitrofurantoin monohydrate crystals or placebo for urinary tract infection prophylaxis [8]. A subset of randomized women who were sexually active in a heterosexual relationship participated in the sexual function trial, and the data from these women were used in the current analysis. The surgical trial evaluated sexual function changes in women who underwent incontinence and/or prolapse surgery and measured outcomes at baseline and again at 12 weeks following surgery.

Calculation of MID may be performed using anchor-based or distribution-based methods. Anchor-based approaches use an external indicator, either clinical or patient-reported, to assign subjects into groupings reflecting no change, small or large positive changes, or small or large negative changes in clinical or health status. Anchors include global ratings of change or actual changes in patient-reported outcome measures that have ideally demonstrated MID values in the target population. For our anchor-based evaluation of the PISQ, we used multiple anchors in cohort I including the Overactive Bladder Questionnaire (OAB-q), the Patient Perception of Bladder Condition (PPBC), the Patient Perception of Treatment Benefit Questionnaire (PPTBQ), and bladder diaries. The Incontinence Impact Questionnaire-7 (IIQ-7) was our anchor for cohort II. To evaluate whether an anchor was appropriately correlated with the PISQ, Pearson’s correlation was used to assess the associations of total PISQ scores and the three PISQ domain scores (Behavioral/Emotive, Physical, and Partner-Related) with the various anchors. A correlation of r ≥ 0.3 was considered acceptable using Cohen’s guidelines [9].
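The anchor screening step described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names and the example data are hypothetical, and taking the absolute value of r is an assumption made here because some anchors (e.g., the IIQ-7) are scored so that improvement is a negative change.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def anchor_is_acceptable(pisq_change, anchor_change, threshold=0.3):
    """True when |r| reaches the threshold (0.3 in the text).

    Using |r| rather than r is an assumption of this sketch, so that
    anchors scored in the opposite direction are not rejected by sign alone.
    """
    return abs(pearson_r(pisq_change, anchor_change)) >= threshold


# Hypothetical change scores for five women (not study data).
example_pisq_change = [2, 5, 7, 1, 9]
example_anchor_change = [-5, -12, -20, 0, -25]
example_ok = anchor_is_acceptable(example_pisq_change, example_anchor_change)
```

In practice a statistics package would normally be used for the correlation; the hand-written version is shown only to keep the sketch self-contained.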

Distribution-based methods assess the MID by using the standard deviation (SD) of observed scores in a relevant sample. An estimate of one-half the SD or one standard error of measurement (SEM) may be appropriate for approximating the MID for some PRO instruments. We chose to use half the SD, rather than the SEM, to approximate the MID by the distribution-based method [10]. For both the anchor- and distribution-based methods, we evaluated the change in scores from baseline to 12 weeks.
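As a minimal sketch of the half-SD calculation applied to change scores, assuming the sample (n − 1) standard deviation is intended (the text does not specify) and using hypothetical scores:

```python
import statistics


def distribution_based_mid(baseline, week12):
    """Half the SD of the change in score from baseline to week 12.

    Assumes the sample standard deviation (n - 1 denominator); this choice
    is an assumption of the sketch, not stated in the text.
    """
    if len(baseline) != len(week12):
        raise ValueError("baseline and week12 must have the same length")
    changes = [after - before for before, after in zip(baseline, week12)]
    return 0.5 * statistics.stdev(changes)


# Hypothetical PISQ total scores for four women (not study data).
example_baseline = [90, 85, 100, 95]
example_week12 = [94, 85, 108, 99]
example_mid = distribution_based_mid(example_baseline, example_week12)
```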

We employed a triangulation method of estimating the MID that entailed integrating global ratings with clinical benchmarks of change and statistical methods for estimating magnitude. The quantitative approach can be complemented by qualitative data from clinical experts or patients to provide insight into factors that must be considered when recommending guidelines for interpretation [11].

Measures

For both cohorts, demographic information was collected at baseline. For cohort I, 5-day bladder diaries recorded the time of each micturition and UUI episode at both baseline and the 12-week follow-up. Validated QOL measures were completed in both cohorts at baseline and at follow-up.

The PISQ measures sexual function in women with pelvic floor disorders and consists of 31 questions in three domains: Behavioral/Emotive, Physical, and Partner-Related. Responses for each question range from 0 (“always”) to 4 (“never”), with the exception of one question regarding the frequency of climax with masturbation with responses from 0 (“do not masturbate”) and 1 (“never”) to 5 (“always”). Total scores range from 0 to 125 with higher scores indicating better sexual function. The joint IUGA/ICS definition and terminology has been used [4].

The Patient Perception of Treatment Benefit Questionnaire (PPTBQ) [12] is a global measure of treatment benefit and satisfaction. It consists of two questions. Responses categorize the degree of benefit and satisfaction into “little” or “much” benefit or “little” or “very” satisfied with treatment.

The Patient Perception of Bladder Condition (PPBC) questionnaire is a validated and responsive global measure of bladder condition [13]. The PPBC consists of a single item that assesses the patient’s subjective impression of her current urinary problems. Patients are asked to rate their perceived bladder condition on a six-point scale ranging from 1 (“no problem at all”) to 6 (“many severe problems”). Changes in scores are negative if patients improve and positive if their bladder condition deteriorates.

The OAB-q [14] is a validated condition-specific instrument that comprises an eight-item Symptom Bother scale and a 25-item health-related QOL (HRQL) scale divided into Concern, Coping, Sleep, and Social Interaction aspects for assessment. Responses for each item are scored on a six-point Likert scale. Scores for each scale are normalized to a 100-point scale. Higher Symptom Bother scores indicate greater symptom bother; higher HRQL scores indicate better HRQL. The MID of both the Symptom Bother and the HRQL scales of the OAB-q has been determined to be 10 [15].

The IIQ-7 is a seven-item validated, reliable, and responsive questionnaire [16]. Responses are on a four-point Likert scale; higher scores represent worse symptoms. Improved symptoms result in negative changes in scores. Although the MID for the long form version has been determined to be a difference of 16 points [6], the MID for the short form version of the questionnaire has not been determined.

Cohort I anchors included the change from wet to dry on the week 12 bladder diaries, as well as changes in the PPBC, PPTBQ, and OAB-q questionnaires. For the PPTBQ, we analyzed the difference in means of the PISQ scores between patients who had reported “little” benefit or satisfaction (minimum change) and patients who had reported “much” benefit or satisfaction (maximum change). Using the PPBC, we dichotomized the population into women who reported a score change equal to 0 or −1 (minimum change) and those with a score change ≤−2 (maximum change) to determine differences in mean PISQ scores. For the OAB-q, the population was likewise divided into two groups: those with an absolute change of less than 10 (minimum change) and those with an absolute change of greater than or equal to 10 (maximum change). The anchor in cohort II was the IIQ-7. The IIQ-7 is scored so that lower scores indicate better quality of life. IIQ scores remained the same (a change equal to 0), improved (changes ranging from −1 to −2) or deteriorated (changes in scores ranging from 1 to 2). To define minimum and maximum change for the IIQ, we dichotomized women into those who had a score change equal to 0 or −1 (minimum change) and those who had a score change less than or equal to −2 (maximum change); women with deterioration in their scores were not included in the analyses. Distribution-based MIDs for both cohorts were determined by calculating half the standard deviation of the change in PISQ total and domain scores from baseline to week 12.
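The dichotomization described for the IIQ-7 (and, with the same cut-points, the PPBC) can be illustrated with a short sketch. The function names and data are hypothetical, and integer change scores are assumed, as in the ranges given above.

```python
from statistics import mean


def classify_anchor_change(change):
    """Label an anchor score change using the cut-points in the text.

    Returns "maximum" for a change <= -2, "minimum" for a change of 0 or -1,
    and None for deterioration (positive change), which is excluded.
    Integer change scores are assumed.
    """
    if change <= -2:
        return "maximum"
    if change in (0, -1):
        return "minimum"
    return None


def mean_pisq_change_by_group(anchor_changes, pisq_changes):
    """Mean PISQ change within the minimum and maximum change groups."""
    groups = {"minimum": [], "maximum": []}
    for anchor_change, pisq_change in zip(anchor_changes, pisq_changes):
        label = classify_anchor_change(anchor_change)
        if label is not None:
            groups[label].append(pisq_change)
    return {label: mean(values) for label, values in groups.items() if values}


# Hypothetical data for five women (not study data); the last woman
# deteriorated on the anchor and is therefore excluded.
example_means = mean_pisq_change_by_group([0, -1, -2, -2, 1], [1, 3, 8, 10, -4])
```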

Statistical analyses

Differences in the PISQ total and domain scores over 12 weeks were assessed using paired t tests, and differences in the change in PISQ scores over 12 weeks between categories of improvement in the anchor were assessed using two-sample t tests. The MID was considered “not defined” for a given anchor if the two-sample t test indicated no difference in PISQ changes. If a statistically significant difference was found, the MID was defined as the difference in mean PISQ score change between the maximum and minimum change groups. If there was a statistically significant difference, but the paired t test indicated no change in PISQ for the minimal improvement category of the anchor, the MID was defined as the mean change in PISQ for the maximum improvement category of the anchor.
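The decision rule above can be written as a small function. This is a sketch under stated assumptions: the p-values are taken as inputs (computed elsewhere by the two-sample and paired t tests), the function and parameter names are hypothetical, and the significance level of 0.05 is assumed because the text does not state one.

```python
def anchor_based_mid(mean_change_min, mean_change_max,
                     p_between_groups, p_paired_min, alpha=0.05):
    """Apply the anchor-based MID rule to one anchor.

    mean_change_min / mean_change_max: mean PISQ change in the minimum and
        maximum change groups of the anchor.
    p_between_groups: p-value of the two-sample t test comparing the groups.
    p_paired_min: p-value of the paired t test for PISQ change within the
        minimum change group.
    alpha: significance level (0.05 is an assumption of this sketch).

    Returns None when the MID is "not defined" for the anchor.
    """
    if p_between_groups >= alpha:
        return None  # groups do not differ: MID not defined
    if p_paired_min >= alpha:
        # No PISQ change in the minimum group: use the maximum group mean.
        return mean_change_max
    return mean_change_max - mean_change_min


# Hypothetical inputs (not study data).
example_undefined = anchor_based_mid(1.0, 6.0, p_between_groups=0.20, p_paired_min=0.50)
example_max_only = anchor_based_mid(1.0, 6.0, p_between_groups=0.01, p_paired_min=0.50)
example_difference = anchor_based_mid(1.0, 6.0, p_between_groups=0.01, p_paired_min=0.01)
```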

Results

In cohort I, 163 of the 202 subjects randomized to tolterodine (81 %) gave data through week 12. In cohort II, 75 of the 102 enrolled subjects whose condition improved or remained unchanged (75 %) completed questionnaires at both baseline and at 12 weeks. The mean age of women in the two cohorts was 49 and 47.1 years respectively, and the majority of women were White (Table 1). Thirty-two women in cohort II had prolapse beyond the hymen; women with stage 3 prolapse were excluded from participation in cohort I. More women in cohort II than in cohort I were premenopausal.

Table 1 Demographics

The correlation between all QOL measures and the PISQ was equal to or greater than 0.3 with the exception of the PPTBQ and the PPBC. For these two measures, a correlation coefficient (r) was not calculated. Both measures have questions that are reported as “yes/no” responses, which does not allow for correlation statistics. In the anchor-based analysis (Table 2), the MID values for changes in PISQ total scores at week 12 in cohort I were 4.7 points using the UUI anchor (diary-dry women at week 12), 5 points using the PPBC anchor, 5 points using the PPTBQ anchor, and 8.7 points using the OAB-q anchor. In cohort II, the MID at week 12 in PISQ total scores was 7 points in women with improved IIQ-7 scores. The distribution-based MID (Table 3) in PISQ total scores (i.e., half the SD in score change at week 12) was 5.3 points in cohort I and 5.8 points in cohort II. The range of MID values for total PISQ scores was narrow, from 4.7 to 8.7, using a variety of anchor- and distribution-based methods. In general, when choosing an MID value from a range, it is recommended that anchor-based methods be given more weight than distribution-based methods, because anchor-based methods reflect patient-rated and disease-specific variables [10]. MID estimates obtained from a bladder diary represent a more conservative estimate of clinically meaningful change than estimates based on the patient’s perspective, which is subject to recall bias [17]. Since the anchor-based values were lower, we propose an MID for total PISQ scores of 6.
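For reference, the total-score estimates reported above can be collected and their range recomputed. The values are those stated in the text; the dictionary keys are labels chosen for this sketch, and the code only summarizes the estimates rather than reproducing how the proposed MID of 6 was chosen.

```python
# Anchor-based MID estimates for PISQ total scores, as reported in the text.
anchor_estimates = {
    "bladder diary (cohort I)": 4.7,
    "PPBC (cohort I)": 5.0,
    "PPTBQ (cohort I)": 5.0,
    "OAB-q (cohort I)": 8.7,
    "IIQ-7 (cohort II)": 7.0,
}

# Distribution-based estimates (half the SD of the change at week 12).
distribution_estimates = {
    "0.5 SD (cohort I)": 5.3,
    "0.5 SD (cohort II)": 5.8,
}

all_estimates = {**anchor_estimates, **distribution_estimates}
low = min(all_estimates.values())
high = max(all_estimates.values())

proposed_mid = 6
within_range = low <= proposed_mid <= high
```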

Table 2 Anchor-based assessment of the minimum important difference (MID) of the Pelvic Organ Prolapse-Urinary Incontinence Sexual Function Questionnaire (PISQ)
Table 3 Distribution-based analyses of ΔPISQ for cohorts I and II

The MIDs for the three PISQ domains were also analyzed using anchor- and distribution-based methods. As above, if the major change in an anchor was not significantly different from the minor change in an anchor, then the MID using that anchor was considered undefined. For the behavioral/emotive domain, the MID using the anchor of resolution of UUI episodes on bladder diary (change from wet to dry in the diary at 12 weeks) was 2.3, and for all other anchors it was undefined. The distribution-based MID for the behavioral/emotive domain, calculated as half the standard deviation of the behavioral/emotive domain scores, was 3 for cohort I and 3.5 for cohort II. Therefore, the MID for the behavioral/emotive domain ranged from 2.3 to 3.5. For the physical domain, the MID was 6 using the IIQ as the anchor and 2.2 using the bladder diary; for all other anchors, the MID for the physical domain was undefined (Table 2). The distribution-based MID for the physical domain was 2.5 for cohort I and 3 for cohort II, giving a range of MID estimates for the physical domain of 2.2–6. For the partner-related domain, anchor-based methods yielded an undefined value in all instances. Distribution-based analysis showed an MID of 1 for cohort I and 1.5 for cohort II. We gave more weight to the MID estimates obtained from the bladder diary for each of the domain estimates because we felt that the diary data represented a more conservative estimate of clinically meaningful change than the patient’s perspective as measured by questionnaires, which are subject to recall bias [16]. We therefore propose an MID of 2 for the behavioral/emotive domain, 1 for the partner-related domain, and 3 for the physical domain.

Discussion

This study used five anchor-based approaches and a distribution-based approach to establish the MID for the PISQ and its three domains. The anchor-based approaches included four measures from the patient’s perspective (the PPBC, PPTBQ, OAB-q, and IIQ-7) and one clinical measure (the bladder diary). Using the anchor-based method, the range of MID for total PISQ scores was narrow, from 4.7 to 8.7. These data were supported by the distribution-based analysis, with the MID for each cohort falling within this range.

When multiple approaches are used to determine the MID of a scale, the result is a range of values rather than a single-point estimate, as was seen in this study. The narrow range of MID estimates identified by multiple anchors, and confirmed by our distribution-based analysis, adds strength to our final MID estimate of 6.

In MID determination, distribution-based methods can be used in situations where anchor-based approaches are not available. There are increasing data and a growing consensus that an effect size of 0.5 (or a change of 0.5 SD) is a conservative estimate of the MID that is likely to be clinically significant across different patient-reported questionnaires. A weakness of using the 0.5 SD approach is that the estimates may not represent the minimally significant change [10]. In our study, the 0.5 SD represented the lower boundary of the range of estimates, adding strength to our final MID estimates.

The strengths of the study include the use of multiple anchors in both surgical and nonsurgical cohorts, and the inclusion of women with stress urinary incontinence (with or without prolapse) and OAB. Multiple approaches were then used to triangulate MID estimates following the recommended guidelines. All anchors used in the estimates were at least moderately correlated with the PISQ.

The limitations are that the computation of MID is limited by the strength of the validity and the reliability of the instruments involved. Use of anchors is limited by the degree of correlation they may have with the PISQ.

In conclusion, our estimate of the MID for the PISQ total score using anchor- and distribution-based methods is 6 points. Improvements that meet this threshold may be considered clinically important.