Introduction

Overactive bladder (OAB) is a condition characterized by urinary urgency, with or without urgency urinary incontinence (UUI), usually accompanied by frequency and nocturia [1]. It is associated with a high degree of bother and has a negative impact on health-related quality of life (HRQL) [2–4]. The prevalence of OAB increases with age [5]. In a large survey conducted in Western Europe and Canada, OAB symptoms were reported by approximately 20% of men and women aged ≥60 years, with prevalence increasing further with advancing age [6]. In the United States, the prevalence of OAB approaches 40% in adults aged ≥40 years [7]. Because comorbidities and polypharmacy are more likely in elderly patients [8, 9], elderly patients with OAB form a medically complex population whose treatment should take these factors into account [10].

Patient-reported outcomes (PROs) are tools used by physicians in clinical practice or as outcome measures in clinical trials evaluating the efficacy of a given treatment [11, 12]. For example, the Vulnerable Elders Survey (VES-13) is a 13-item questionnaire that assesses the risk of functional decline and death in community-dwelling older adults according to age, self-reported health status, and physical and functional limitations [13]. Individuals with a VES-13 score of ≥3 are 4.2 times more likely to die or experience functional decline in the next 2 years than those with a score of <3.

The OAB-q is a 33-item, self-administered, validated questionnaire that assesses symptom bother and the effect of symptoms on HRQL. Questions 1–8 constitute the Symptom Bother component, with higher scores indicating greater symptom bother. Questions 9–33 constitute the HRQL component, with higher scores indicating better HRQL. A total HRQL score and scores for four HRQL domains (that is, Coping, Concern, Sleep, and Social Interaction) are obtained. The OAB-q was originally developed and validated using a 4-week recall period [14, 15]. In response to US Food and Drug Administration guidelines calling for shorter recall periods for PRO measures [11, 15], a 1-week recall version of the OAB-q was initially validated in patients enrolled in three randomized controlled trials of fesoterodine for the treatment of OAB [16]. The mean age of the patients in these trials was 59 years, and the most common comorbidity across the three trials was musculoskeletal disorders. The 1-week recall version of the OAB-q demonstrated good internal consistency and validity and better (higher) responsiveness to treatment compared with the 4-week recall version. In the present analysis, we evaluated the factor structure and the reliability, validity, and treatment responsiveness of the OAB-q 1-week recall version in a medically complex elderly (≥65 years) population with OAB.
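The text above does not give the OAB-q scoring algorithm. As a minimal sketch, the Python below assumes that each item is answered on a 1–6 scale and that raw sums are linearly rescaled to 0–100, with the HRQL score reversed so that higher values mean better HRQL, as described above. The item split (items 1–8 vs. 9–33) comes from the text; the response range and the transformation are assumptions, and neither missing responses nor the assignment of items to the four HRQL domains is modelled.

```python
"""Hypothetical sketch of OAB-q Symptom Bother and total HRQL scoring.

Assumptions (not stated in the article): every item is answered on a 1-6
scale, and raw sums are rescaled to 0-100; the HRQL score is reversed so
that higher values indicate better HRQL.
"""

ITEM_MIN, ITEM_MAX = 1, 6  # assumed response range for every item


def transform(raw_items, reverse=False):
    """Rescale the sum of item responses to 0-100 (optionally reversed)."""
    if any(not ITEM_MIN <= r <= ITEM_MAX for r in raw_items):
        raise ValueError("item response out of range")
    n = len(raw_items)
    lowest = n * ITEM_MIN                  # lowest possible raw sum
    score_range = n * (ITEM_MAX - ITEM_MIN)  # possible raw-sum range
    score = (sum(raw_items) - lowest) / score_range * 100
    return 100 - score if reverse else score


def score_oabq(responses):
    """responses: the 33 item responses in questionnaire order."""
    if len(responses) != 33:
        raise ValueError("expected 33 responses")
    symptom_bother = transform(responses[0:8])             # items 1-8
    total_hrql = transform(responses[8:33], reverse=True)  # items 9-33
    return {"symptom_bother": symptom_bother, "total_hrql": total_hrql}


# A respondent choosing the lowest option everywhere (no bother, no impact):
print(score_oabq([1] * 33))  # {'symptom_bother': 0.0, 'total_hrql': 100.0}
```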

Materials and methods

Patients

Eligible subjects were men and women aged ≥65 years with a VES-13 score of ≥3, a mean of 2–15 UUI episodes per 24 hours at baseline, and a Patient Perception of Bladder Condition (PPBC) score of 4–6. Patients assigned to fesoterodine started on a 4-mg dose, with the option to increase to 8 mg at week 4 only; they could revert to the 4-mg dose at any time during the subsequent 8 weeks of the study. Patients receiving placebo underwent sham dose escalation or de-escalation. Each patient provided written informed consent before enrollment, and the study protocol was reviewed by the institutional review board (IRB) at each study site.
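The inclusion criteria above can be expressed as a simple screening check. This is a hypothetical sketch: the `Screening` record and `is_eligible` function are illustrative names, all bounds are assumed to be inclusive, and exclusion criteria (not listed in this section) are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Screening:
    """Hypothetical screening record for one patient."""
    age: int
    ves13: int               # Vulnerable Elders Survey score
    mean_uui_per_24h: float  # from the baseline bladder diary
    ppbc: int                # PPBC scored 1-6


def is_eligible(s: Screening) -> bool:
    """Apply the inclusion criteria (bounds assumed inclusive)."""
    return (
        s.age >= 65
        and s.ves13 >= 3
        and 2 <= s.mean_uui_per_24h <= 15
        and 4 <= s.ppbc <= 6
    )


print(is_eligible(Screening(age=78, ves13=4, mean_uui_per_24h=3.3, ppbc=5)))  # True
print(is_eligible(Screening(age=78, ves13=2, mean_uui_per_24h=3.3, ppbc=5)))  # False
```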

Patient-reported outcome measures

The OAB-q (1-week recall version) was administered at baseline, week 4, and week 12/early termination. Patients also completed the PPBC, a single-item, self-administered, validated questionnaire [17] that asks subjects to rate their bladder-related problems on the following six-point scale: 1 = my bladder condition does not cause me any problems at all; 2 = my bladder condition causes me some very minor problems; 3 = my bladder condition causes me some minor problems; 4 = my bladder condition causes me (some) moderate problems; 5 = my bladder condition causes me severe problems; and 6 = my bladder condition causes me many severe problems. The PPBC has previously been scored as 0–5, but this difference in numbering (1–6 vs. 0–5) has no bearing on the present study because the response options are the same. The PPBC was administered at screening, week 4, and week 12/early termination. Patients were also instructed to complete a bladder diary on three consecutive days before the baseline, week-4, and week-12 visits. For each 24-hour period, the diary recorded the time and frequency of micturitions, UUI episodes, the urgency associated with each micturition (rated on the five-point Urinary Sensation Scale [18], from 1, "no feeling of urgency", to 5, "unable to hold; leak urine"), and the type and number of protective undergarments used because of urine leakage.
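The bladder diary yields daily counts that are summarized as a mean per 24 hours. The sketch below assumes that each diary day covers one 24-hour period and that the per-24-hour value is the simple mean over the diary days; it also shows the one-point shift between the 0–5 and 1–6 PPBC numberings. Function names and data are hypothetical.

```python
from statistics import mean


def mean_per_24h(daily_counts):
    """Mean count per 24 hours over the diary days (assumed 1 day = 24 h)."""
    if not daily_counts:
        raise ValueError("diary has no days")
    return mean(daily_counts)


def ppbc_from_zero_based(score_0_to_5):
    """Convert a PPBC scored 0-5 to the 1-6 numbering (same response options)."""
    if not 0 <= score_0_to_5 <= 5:
        raise ValueError("PPBC score out of range")
    return score_0_to_5 + 1


uui_days = [3, 5, 2]  # hypothetical UUI counts on three consecutive diary days
print(round(mean_per_24h(uui_days), 2))  # 3.33
print(ppbc_from_zero_based(3))           # 4, i.e. "(some) moderate problems"
```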

Data analyses

Analyses were conducted to examine the factor structure, internal consistency, concurrent and discriminant validity, and treatment responsiveness of the OAB-q (1-week recall version) in this population of medically complex elderly patients with OAB. Second-order confirmatory factor analysis (CFA) was used to confirm the five-factor structure of the OAB-q, with the Coping, Concern, Sleep, Social Interaction, and Symptom Bother subscales as first-order domains and total HRQL as the second-order domain. The model was considered to fit the data if Bentler’s comparative fit index (CFI) was >0.9, the path coefficients were statistically significant (t values >1.96), and the standardized path coefficients were ≥0.4 [12].
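To illustrate the fit criteria, the sketch below computes Bentler’s CFI from the chi-square statistics of the fitted model (M) and of the baseline independence model (B), using the standard definition CFI = 1 − max(χ²_M − df_M, 0) / max(χ²_B − df_B, χ²_M − df_M, 0), and then applies the three criteria listed above. It does not fit the CFA itself (the analyses used SAS), the t-value check is assumed to be two-sided, and the chi-square values in the example are hypothetical.

```python
def bentler_cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Bentler's comparative fit index from model and baseline chi-square values."""
    d_model = max(chi2_model - df_model, 0.0)
    # d_model is already >= 0, so this is max(chi2_B - df_B, chi2_M - df_M, 0)
    d_base = max(chi2_baseline - df_baseline, d_model)
    if d_base == 0:
        return 1.0  # no misfit in either model; CFI is conventionally 1
    return 1.0 - d_model / d_base


def model_fits(cfi, t_values, std_loadings):
    """Apply the three fit criteria described in the text."""
    return (
        cfi > 0.9
        and all(abs(t) > 1.96 for t in t_values)  # two-sided significance (assumed)
        and all(b >= 0.4 for b in std_loadings)   # standardized path coefficients
    )


cfi = bentler_cfi(chi2_model=900, df_model=490, chi2_baseline=5000, df_baseline=528)
print(round(cfi, 3))                               # 0.908
print(model_fits(cfi, [2.5, 3.1], [0.45, 0.70]))   # True
```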

Because data used for the CFA from one study (NCT00928070) were skewed and had a restricted range, they were combined with data from the fesoterodine arm of a second flexible-dose study (NCT00798434) [19]. Week 12 data from 220 elderly patients with fewer than two UUI episodes per 24 hours were pooled with those from the original study. This second trial was a multicenter 24-week study consisting of a 12-week, randomized, double-blind, placebo-controlled, parallel-group phase followed by a 12-week open-label phase to evaluate the efficacy and safety of a fesoterodine flexible-dose regimen in elderly patients with OAB. The pooled data from the two trials provided a total of 786 observations for the CFA. Data from the second trial were not used in any of the other analyses.

Several other measurement techniques were employed [12]. Internal consistency (the reliability of the OAB-q in measuring the concept it is purported to measure) was assessed using the Cronbach coefficient α (CCA), with CCA values >0.7 taken as evidence of internal consistency. Convergent validity (the extent to which OAB-q scores are related to scores from other, conceptually related instruments) was examined using Spearman’s rank-order correlation coefficients. Known-groups validity (the ability of the OAB-q to distinguish between theoretically distinct groups) was assessed by comparing OAB-q scores across patients categorized by UUI severity (mild, 0–2 episodes per 24 hours; moderate, 3 or 4 episodes per 24 hours; severe, 5–15 episodes per 24 hours) and by PPBC severity (some moderate problems, a score of 4; severe to many severe problems, a score of 5 or 6). Inclusion criteria at screening required a PPBC score of 4 to 6 (that is, subjects had to have at least moderate problems with their bladder condition). Effect size (ES) was used to examine the responsiveness of the OAB-q to treatment (its ability to detect change). The ES of the change in mean OAB-q scores from baseline to week 12 (the difference in means divided by the pooled standard deviation at baseline) was considered small at |ES| = 0.20, medium at |ES| = 0.50, and large at |ES| = 0.80 [12].

The clinically important difference (CID) was estimated using an anchor-based approach, with the PPBC as the anchor and each OAB-q domain score as the outcome in a repeated measures model. PPBC was entered as a continuous predictor in the main analysis and as a categorical predictor in a sensitivity analysis to check the linearity assumption of the main analysis. The differences in OAB-q Symptom Bother and HRQL domain scores corresponding to a one-category difference on the PPBC were taken as the respective CID estimates. Responder analysis used separate repeated measures models, with the change from baseline in each OAB-q domain score at week 4 and week 12 as the outcome (dependent variable). In each model, the PPBC scores at week 4 and week 12 were collapsed into three categories relative to screening (worse, the same, or better [11]) and entered as a continuous anchor predictor; in a sensitivity analysis, this anchor was instead treated as a categorical predictor.
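The anchor-based idea can be illustrated with a deliberately simplified sketch: the slope of an OAB-q domain score on the PPBC gives the score difference per one-category PPBC difference. The study used a repeated measures model; this sketch substitutes ordinary least squares at a single time point, so it ignores within-patient correlation and would not reproduce the reported estimates. The numeric coding of the worse/same/better categories is a hypothetical choice, and all data are invented.

```python
from statistics import mean

# Hypothetical numeric coding of the three collapsed PPBC categories.
CATEGORY_CODE = {"worse": -1, "same": 0, "better": 1}


def ppbc_change_category(screening, followup):
    """Collapse PPBC change relative to screening (lower PPBC = fewer problems)."""
    if followup < screening:
        return "better"
    if followup > screening:
        return "worse"
    return "same"


def ols_slope(x, y):
    """Least-squares slope of y on x; ignores within-patient correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical single-visit data: PPBC (1-6) and Symptom Bother (0-100).
ppbc = [2, 3, 4, 5, 6]
bother = [20, 33, 45, 58, 70]
print(round(ols_slope(ppbc, bother), 2))  # 12.5 points per PPBC category
category = ppbc_change_category(screening=5, followup=3)
print(category, CATEGORY_CODE[category])  # better 1
```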

Statistical tests were performed at a significance level of 0.05 unless otherwise noted. SAS software version 9.2 (SAS Institute, Cary, NC) was used for all psychometric analyses except the CFA of the pooled data, which was performed using SAS version 9.4.

Results

Patient demographics

Data from 566 patients in NCT00928070 were included in all analyses except the CFA, which used pooled data from NCT00928070 and NCT00798434. Participants were mainly female (82.2%) and white (85.9%), with a mean age of 75.0 years (range 65–91 years; Table 1). The pooled data used in the CFA included a total of 786 observations. Patients from the second study had a mean age of 72 years (range 65–90 years), were almost exclusively white (99.6%), and were fairly balanced with regard to gender. In this pooled dataset, approximately 50% of the data represented the population of interest (that is, medically complex elderly patients).

Table 1 Demographic and clinical characteristics of the patients at baseline

Validity

Bentler’s CFI was satisfactory (0.90), path coefficients were statistically significant (t values >1.96), and all standardized path coefficients were >0.40 at baseline, week 4, and week 12. These results confirm the second-order measurement model, with four HRQL domains and one Symptom Bother domain as first-order domains and an aggregate domain (representing the total score) as the second-order domain, in this medically complex elderly population (Fig. 1).

Fig. 1

Confirmatory factor analysis (CFA) model for OAB-q (1-week recall) data. Second-order CFA was applied to confirm the five-factor structure (that is, the four HRQL domains Coping, Concern, Sleep, and Social Interaction, and one Symptom Bother domain) in a medically complex elderly patient population. The four HRQL domains are represented by latent (unobserved) variables F2, F3, F4, and F5, respectively. The Symptom Bother domain is represented by manifest (observed) variable F1. The second-order aggregate latent factor F6 subsumes all other factors. The factor loadings are represented by lvf and lv path coefficients (e.g., lvf2f6 represents the path coefficient, or loading, from F6 to F2, and lv30f4 represents the path coefficient from factor F4 to item 30, or variable V30). The disturbance terms for the factors are represented by d (e.g., d4 represents the disturbance term associated with factor F4). The error terms for the observed items are represented by e (e.g., e30 represents the error term associated with item, or variable, V30).

Convergent validity was supported by moderate correlations (absolute values of 0.4–0.7) between OAB-q domain and PPBC scores after baseline, indicating that the OAB-q and PPBC are closely related but do not measure exactly the same concept (Table 2). Weaker correlations (absolute values of approximately 0.3 or less) were obtained at baseline because the range of PPBC scores was restricted by the study inclusion criterion of a PPBC score ≥4.

Table 2 Spearman correlation coefficients between OAB-q domain and PPBC scores

Significant differences in OAB-q scores between patient groups with different UUI and PPBC severities provide evidence of known-groups validity. Mean OAB-q Symptom Bother scores at week 4 and week 12 followed the expected direction across UUI severity categories, being lowest in patients with mild UUI and highest in those with severe UUI. At week 4, OAB-q scores (Symptom Bother, the four HRQL domains, and total HRQL) discriminated (P < 0.05) between the mild and moderate, the mild and severe, and the moderate and severe UUI groups. At week 12, OAB-q scores discriminated between the mild and moderate and between the mild and severe UUI groups, but only Symptom Bother discriminated between the moderate and severe UUI groups (Fig. 2). Scores for Symptom Bother, the four HRQL domains, and total HRQL also discriminated between the two PPBC severity groups (based on a screening score of 4, some moderate problems, or 5 or 6, severe to many severe problems) at both week 4 and week 12 (P < 0.0001; week 12 results are shown in Fig. 3).

Fig. 2

OAB-q scores in relation to UUI severity at week 12 (HRQL health-related quality of life, OAB-q Overactive Bladder questionnaire, UUI urgency urinary incontinence). *P < 0.02 mild vs. moderate, **P < 0.01 mild vs. severe, ***P < 0.01 moderate vs. severe

Fig. 3

OAB-q scores by PPBC severity at week 12 (HRQL health-related quality of life, OAB-q Overactive Bladder questionnaire, PPBC Patient Perception of Bladder Condition). *P < 0.001

Reliability

As shown in Table 3, all OAB-q domains demonstrated good internal consistency, with CCA values of ≥0.88 at every time point, well above the 0.7 threshold taken as evidence of internal consistency. At baseline, CCA values ranged from 0.88 (Social Interaction) to 0.96 (total HRQL). At week 4, CCA values ranged from 0.90 (Social Interaction) to 0.97 (total HRQL), and at week 12, they ranged from 0.92 (Social Interaction) to 0.97 (total HRQL).

Table 3 Cronbach coefficient α values for OAB-q domains

Responsiveness

Treatment responsiveness was demonstrated by statistically significant and clinically relevant changes in mean OAB-q scores from baseline to week 12, with medium-to-large ES (|ES| = 0.50–0.80) for Symptom Bother, the four HRQL domains, and total HRQL (Table 4). Both treatment groups (fesoterodine and placebo) improved, but the fesoterodine group generally improved more than the placebo group (see cumulative distribution functions in Supplementary Material). With PPBC collapsed into three categories (worse, same, and better), the responder definition estimates were 16.77 (Symptom Bother), 14.37 (Concern), 15.38 (Coping), 12.75 (Sleep), 7.44 (Social Interaction), and 12.91 (total HRQL). The linearity assumption of the model was supported, as treating PPBC as a categorical (anchor) predictor gave similar results.

Table 4 Effect size for mean change from baseline in OAB-q scores at week 12

Discussion

The results of the analyses demonstrated that the OAB-q (1-week recall version) is internally consistent, valid, and responsive to treatment effects in medically complex elderly patients with OAB. Internal consistency at week 12 was nearly identical to that previously reported for a younger and generally healthy sample of patients with OAB treated with fesoterodine [16]. After baseline, Spearman correlations between OAB-q domain scores and PPBC scores exceeded 0.40 in absolute value, indicating good convergent validity. At baseline, convergent validity differed from that in the previous validation sample [16] for Symptom Bother (r = 0.30 in the current study vs. 0.58), Concern (r = 0.24 in the current study vs. −0.55), and total HRQL (r = −0.32 in the current study vs. −0.50), probably owing to the restricted range of PPBC values permitted at enrollment. As shown in Table 2, after treatment with fesoterodine (as early as week 4), the correlations were very close to those in the previous study.

At baseline, the previous validation study [16] found stronger correlations for each OAB-q domain, and the direction of the relationship was opposite for the Concern domain. One possible explanation is that, in this medically complex elderly population, HRQL was very low at baseline and patients were near the ceiling of Concern, so the relationships would become more apparent as symptoms improved with treatment. As bladder problems increase, Symptom Bother would be expected to increase (a strong positive relationship) and HRQL scores to decrease (a strong negative relationship). The two samples also differed in comorbidity. In the previous validation sample (N = 1,839), musculoskeletal and connective tissue disorders, immune system disorders, nervous system disorders, and eye disorders were the most common comorbid conditions [16]. In the current sample, the most common comorbid conditions were musculoskeletal and connective tissue disorders, hypertension, metabolism and nutrition disorders, and gastrointestinal disorders [20]. Moreover, 67.3% of the patients had six or more comorbid conditions and 68.3% were taking six or more concomitant medications at baseline [20]. In the current analysis, the direction of each relationship remained the same from baseline throughout the treatment period, and the relationships strengthened with treatment (Table 2).

The OAB-q was highly responsive to treatment, as demonstrated by statistically significant and clinically relevant changes in mean scores for the OAB-q domains from baseline to week 12. The mean change and ES for each OAB-q domain score with treatment were similar to those in the previous OAB-q validation study [16].

A CID of 10 points for the Symptom Bother and HRQL domains was established in previous validation studies with other samples of patients with OAB [21]. The aim of this analysis was to determine whether this CID holds for the medically complex elderly patient population. The estimated differences in mean OAB-q domain scores corresponding to a one-point change on the PPBC were 13.28 (Symptom Bother), 12.47 (Concern), 12.30 (Coping), 10.87 (Sleep), 6.54 (Social Interaction), and 10.88 (total HRQL). All of these estimates except that for Social Interaction exceed the previously established 10-point CID, suggesting that changes of this magnitude represent clinically meaningful improvements in this population.

Possible limitations of the current analyses should be considered. First, the OAB-q (1-week recall version) was validated in medically complex elderly patients with OAB using post hoc analyses of previous clinical trial data. Because the trials were not designed for this purpose, test–retest reliability of the OAB-q could not be assessed. Second, because the patient population evaluated was predominantly white, the results may not be generalizable to nonwhite medically complex elderly patients with OAB.

Over the last 10 years, it has been recognized that PROs provide valuable information that complements objective assessments of treatment benefit in patients with OAB [22]. Because the prevalence of OAB increases with age, it is important to evaluate patient-reported treatment benefit in medically complex elderly patients. In this population, the OAB-q (1-week recall version) is reliable, shows concurrent and discriminant validity with other measures, and is responsive to treatment; it is therefore psychometrically sound for use in medically complex elderly patients with OAB.