Introduction

Questionnaires are increasingly used to measure health-related quality of life in research. Although these measures may have appeal for use in clinical practice, their application in this context is inherently problematic: generic measures often lack sufficient depth (e.g. they may measure overall quality of life but not specific conditions), whereas detailed condition-specific questionnaires are limited for general use by their necessarily narrow spectrum.

Recognising the potential benefits of a comprehensive approach to the evaluation of pelvic floor symptomatology, we have developed, validated and introduced to clinical practice an electronic personal assessment questionnaire for pelvic floor disorders in women (ePAQ-PF) [1, 2]. The rationale for the creation of this instrument centred on improving communication and patient assessment, particularly as many of the symptoms associated with pelvic disorders are of a personal and sensitive nature [3]. A major advantage of electronic systems relates to the practicalities of data capture. Typically, quality-of-life questionnaires are administered on paper. However, the use of batteries of paper questionnaires is excessively burdensome (for clinicians and patients) when applied clinically. The one-item-per-page, touch-screen format of ePAQ-PF offers a practical solution: it combines interactivity with automatic skipping of irrelevant items, whilst allowing exploration, documentation and analysis of relevant conditions in some detail.

Our initial research demonstrated good reliability, validity and acceptability for ePAQ-PF [2]. However, it is also important to assess the data quality generated by an instrument and the assumptions underlying its scoring and structure. To evaluate the data quality of health status instruments, most studies have focused on internal consistency reliability, secondary factor analysis and item–total correlations. However, other variables have been identified which influence scoring and structure of a questionnaire’s domains, including levels of missing data and floor and ceiling effects [4–6].

Factor analysis is a statistical procedure enabling the underlying dimensions of a questionnaire to be determined and was carried out in the original study to establish the domain structure of ePAQ-PF from the initial pool of questionnaire items [2]. Factor analysis simplifies complicated sets of data into factors using methods such as principal component analysis, which is a technique used to reduce a large number of items on a questionnaire into a smaller number of dimensions. It does this by statistically determining which items are related to others. Each factor that is produced is therefore an indication of the relationships between a set of variables. The factors can then be rotated using varimax rotation, the most commonly used method, which maximises the variance of the squared loadings within each factor so that each item loads strongly on as few factors as possible. The replication of factor analysis in a new and different data set should verify the original factor structure and compositions of the domains produced from the first analysis [7].

Floor and ceiling effects refer to the extent to which patients score at the extreme ends of a questionnaire, i.e. lowest (floor) or highest scores (ceiling). If respondents tend to score at the extremes, then the extent of ill health in the sample may be over- or under-represented, and consequently, it may not be possible to report on improvements or deteriorations in health status with subsequent assessments [8, 9]. A high percentage of missing data in an item or scale may indicate problems, such as an item being confusing, offensive or inappropriate [10, 11]. In addition to internal consistency reliability, it is important to evaluate item–total correlation (i.e. the extent to which there is a linear relationship between an item and its scale total, which has been corrected for overlap) [10]. To correct for overlap, the item to be correlated is omitted from the scale total.

The aim of this study was to test the data quality, scaling assumptions and scoring algorithms underlying the ePAQ-PF. These results were then compared with the secondary care data from the initial ePAQ-PF validation study [2].

Materials and methods

Women in the Urogynaecology Unit completed ePAQ-PF as part of their routine clinical care, prior to clinical consultation. Data were entered directly by the patients themselves into a secure password-protected National Health Service database using one of several touch-screen computer terminals located in the clinic. Introductory pages provided subjects with instruction in the use of ‘Help,’ ‘Back,’ ‘Next’ and ‘Skip’ functions, which assisted unsupervised completion.

From an initial data set of 701, 81 (11.6%) women did not give consent for their data to be used for evaluation of the questionnaire and evaluation of the service, and therefore these questionnaires were excluded from the analysis. Women were also excluded if they were completing the questionnaire for the second time as part of their follow-up (n = 18) or did not have a questionnaire ID number (n = 3). These three questionnaires were ‘test’ completions unrelated to any clinical material or patient data. This provided a sample of 599 for analysis.

ePAQ-PF scoring algorithms

Domains

ePAQ-PF comprises four dimensions: urinary, bowel, vaginal and sexual (consisting of 35, 33, 22 and 28 items, respectively). Within these four dimensions are 19 scored domains (Fig. 1). All items that contribute to these domains score between 0 and 3 (0 indicating best and 3 indicating worst health status). Each domain is scored by dividing the sum of all item scores in the domain by the total possible item score and multiplying this by 100, to produce a scale ranging from 0 to 100. On this scale, a score of 0 indicates the best and 100 indicates the worst possible health status.
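The scoring rule described above can be illustrated with a minimal sketch. It assumes that the responses for one domain are available as a list of integers from 0 to 3; the function name `domain_score` and the example values are hypothetical and are not taken from the ePAQ-PF software.

```python
def domain_score(item_scores, max_item_score=3):
    """Return a 0-100 domain score: sum of item scores divided by the
    maximum possible sum, multiplied by 100 (0 = best, 100 = worst)."""
    if not item_scores:
        raise ValueError("a domain needs at least one scored item")
    if any(s < 0 or s > max_item_score for s in item_scores):
        raise ValueError("item scores must lie between 0 and max_item_score")
    total_possible = max_item_score * len(item_scores)
    return 100 * sum(item_scores) / total_possible


# Hypothetical five-item domain: 6 of a possible 15 points gives a score of 40.
example = domain_score([0, 1, 2, 3, 0])
```

Dividing by the maximum possible sum, rather than by the number of items alone, keeps domains with different numbers of items on the same 0–100 scale.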

Fig. 1 The questionnaire provides a report that summarises symptoms in 19 domain scores on a scale of 0–100 (0 representing best possible and 100 representing worst possible health). The clock-face icons to the right show the maximum impact that the constituent items cause in each of these domains (empty circle = not a problem, one-third full circle = a bit of a problem, two-thirds full circle = quite a problem, completely full circle = a serious problem)

Impact or ‘bother’

When subjects respond affirmatively to an item about a particular symptom, an impact question is automatically presented relating to the ‘bothersomeness’ of that symptom: ‘How much of a problem is this for you?’ These supplementary questions relating to bother are only displayed if symptoms are present and responses are scored on a four-point scale (0 = not a problem, 1 = a bit of a problem, 2 = quite a problem and 3 = a serious problem). The maximum bother score attributed to any symptom in a domain is presented as the overall bother score for that domain.
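As a rough sketch of the bother rule described above, the overall bother score for a domain is simply the maximum of the impact responses recorded in that domain. The function name `domain_bother`, the use of `None` for symptoms that were absent (so that no impact question was shown) and the choice to report 0 when no impact question was answered are all assumptions made for illustration.

```python
IMPACT_LABELS = {
    0: "not a problem",
    1: "a bit of a problem",
    2: "quite a problem",
    3: "a serious problem",
}


def domain_bother(impact_scores):
    """Return the maximum impact score (0-3) recorded in a domain.

    impact_scores holds one entry per symptom item: an integer 0-3 when the
    symptom was present and the impact question was answered, or None when
    the symptom was absent and no impact question was displayed.
    """
    answered = [s for s in impact_scores if s is not None]
    # Assumption: a domain with no reported symptoms is treated as 0.
    return max(answered, default=0)


# Hypothetical domain: one symptom absent, two present with impacts 2 and 1.
label = IMPACT_LABELS[domain_bother([None, 2, 1])]
```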

All questionnaire responses are stored as numeric code in the secure password-protected central database. On completion, a printout is available, presenting responses to individual items in each of the four dimensions (bowel, urinary, vaginal and sexual). The patient’s most bothersome symptoms are highlighted, and domain scores are computed. These scores are presented both graphically and numerically on a single-page report, providing an overview as well as details of pelvic floor problems (Fig. 1).

The instrument uses two or more screening questions for each domain (including the first item in any domain). Subjects who respond in the negative to all screening items for a domain automatically skip subsequent items in that domain, and their domain score is reported as ‘screen negative.’ Accidental non-response (a fundamental problem with paper-based instruments) is largely eliminated, as progression is only permitted once a response has been selected. Items or whole domains may be skipped depending on responses to earlier screening questions or if the subject prefers not to answer.
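The screening logic described above might be sketched as follows. This is an illustration only: the function `run_domain`, its arguments and the representation of screening answers as booleans are hypothetical, and the real instrument also allows a subject to decline to answer, which is not modelled here.

```python
SCREEN_NEGATIVE = "screen negative"


def run_domain(screening_responses, present_remaining_items):
    """Apply the skip rule for one domain.

    screening_responses: list of booleans, True where a screening item was
        answered affirmatively.
    present_remaining_items: callable that displays the remaining items and
        returns their responses; it is only called when at least one
        screening item is positive.
    """
    if not any(screening_responses):
        # All screening items negative: skip the rest of the domain.
        return SCREEN_NEGATIVE
    return present_remaining_items()


# Hypothetical usage: both screening items negative, so nothing else is asked.
result = run_domain([False, False], lambda: [1, 2, 0])
```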

Statistical methods

The following criteria were used to evaluate the data quality of ePAQ-PF: (1) secondary factor analysis, (2) internal consistency reliability, (3) descriptive statistics of the data including floor and ceiling effects and skewness, (4) frequency of each impact or ‘bother’ score, (5) data completeness, (6) corrected item–total correlations, and (7) item discriminant and convergent validity.

Secondary factor analysis was performed (principal component analysis, varimax rotation) to verify the scales produced from the first analysis in the development of the questionnaire [2]. For the purposes of this analysis, each dimension was factor analysed separately, and only those items obtaining a loading of 0.50 or more on any of the factors were retained. Internal consistency reliability represents the extent to which items within a scale are associated with each other, i.e. the homogeneity of the items [12, 13], and was evaluated using Cronbach’s alpha statistic [14]. It has been argued that for analysis at the group level, a minimum alpha co-efficient of 0.70 is acceptable; however, this increases to a reliability co-efficient of 0.90 for analysis at the individual level [15].
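For readers unfamiliar with Cronbach’s alpha, the following minimal sketch computes it from the standard formula, alpha = k / (k − 1) × (1 − sum of item variances / variance of the total score), where k is the number of items. It is not the SPSS procedure used in this study; the function names, the complete-case input format and the example data are assumptions for illustration. The 0.70 and 0.90 thresholds are those cited above.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one scale.

    rows: one list of item scores per respondent, all of equal length,
    with no missing values (complete cases only).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def interpret_alpha(alpha):
    """Classify alpha using the thresholds cited in the text."""
    if alpha >= 0.90:
        return "acceptable for individual-level analysis"
    if alpha >= 0.70:
        return "acceptable for group-level analysis"
    return "below the minimum of 0.70"


# Hypothetical responses from five women to a three-item domain (scores 0-3).
responses = [[0, 0, 1], [1, 1, 1], [2, 1, 2], [3, 3, 2], [0, 1, 0]]
alpha = cronbach_alpha(responses)
```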

Descriptive statistics and score distributions for the 19 domains of ePAQ-PF were calculated. Included in this analysis were the percentages of patients scoring at the ‘floor’ and ‘ceiling’ of each domain and the skewness of the data. Skewness provides a measure of the extent to which a distribution is non-symmetrical. For example, when a data set is normally distributed and therefore symmetrical, the skewness value will be zero and the mean and median will be close. The data are said to be ‘positively skewed’ when most of the scores are to the left of the mean; that is, the median is less than the mean, and there are fewer values to the right of the mean [16]. To determine the acceptability of the instrument, response rates were also calculated. Non-response is minimised with this instrument, as progress through the questionnaire demands a response (which may include ‘decline to answer’ or ‘quit,’ both of which are coded in the database); the percentage of women who declined to answer or who quit the instrument was nevertheless calculated.
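The floor and ceiling percentages and the skewness described above can be sketched as follows. The function names and example scores are hypothetical. The skewness shown is the simple moment coefficient (third central moment divided by the 1.5 power of the second); statistical packages may apply a small-sample adjustment, so their values can differ slightly from this sketch.

```python
from statistics import mean


def floor_ceiling(scores, lowest=0, highest=100):
    """Return (floor %, ceiling %): the share of respondents scoring at the
    lowest and at the highest possible domain score."""
    n = len(scores)
    floor_pct = 100 * sum(s == lowest for s in scores) / n
    ceiling_pct = 100 * sum(s == highest for s in scores) / n
    return floor_pct, ceiling_pct


def skewness(scores):
    """Moment coefficient of skewness; positive values indicate a long tail
    towards high scores, zero indicates a symmetrical distribution."""
    m = mean(scores)
    n = len(scores)
    m2 = sum((s - m) ** 2 for s in scores) / n
    m3 = sum((s - m) ** 3 for s in scores) / n
    return m3 / m2 ** 1.5


# Hypothetical domain scores for ten women: most score 0 (best health).
scores = [0, 0, 0, 0, 0, 0, 25, 50, 75, 100]
floor_pct, ceiling_pct = floor_ceiling(scores)
skew = skewness(scores)
```

In this hypothetical sample, six of ten women score at the floor and one at the ceiling, and the skewness is positive, which is the pattern expected when most respondents report few or no symptoms.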

To measure item–total correlation (corrected for overlap), the item to be correlated was omitted from the scale total. As the data were non-normally distributed, Spearman’s rho correlation co-efficient was used. It has been suggested that a correlation co-efficient of at least 0.40 indicates adequate item–total consistency [17]. To further evaluate the scaling assumptions of the instrument, it has been argued that each item should be more highly correlated with its own scale (item convergent validity) than with the other scales within the questionnaire (item discriminant validity) [6]. Data analysis was carried out using SPSS v14 (Statistical Package for the Social Sciences).
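The correction for overlap can be illustrated with a short sketch: the item is removed from the scale total before the two are correlated. Spearman’s rho is computed here as the Pearson correlation of average ranks, which handles the tied scores that are common with a 0–3 response scale. This is a plain-Python illustration rather than the SPSS procedure used in the study, and all function names and example data are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def corrected_item_total(rows, item_index):
    """Correlate one item with the scale total that omits that item."""
    item = [r[item_index] for r in rows]
    rest = [sum(r) - r[item_index] for r in rows]
    return spearman_rho(item, rest)


# Hypothetical three-item domain answered by four women (scores 0-3).
rows = [[0, 0, 1], [1, 1, 0], [2, 2, 2], [3, 3, 3]]
rho = corrected_item_total(rows, 0)
meets_threshold = rho >= 0.40  # minimum suggested in the text
```

The same `spearman_rho` function could be applied between an item and the totals of other domains to compare item convergent and discriminant correlations.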

Results

Response rates and demographics

Of the 599 completed ePAQ-PFs, 17 (2.8%) were excluded because of incomplete data, i.e. the woman quit the questionnaire during completion. Few women ‘declined to answer’ the urinary, bowel or vaginal dimensions: 0.5% (n = 3), 3.1% (n = 18) and 3.4% (n = 20), respectively. Ninety (15.5%) women ‘declined to answer’ the sexual dimension. Of the women who answered the screening question at the start of each dimension, 70.3% reported having some bladder problems or concerns, 45.9% some bowel problems or concerns and 50.2% some vaginal problems or concerns.

The sample of women in this study was similar to the secondary care data from the initial ePAQ-PF validation study. The mean age at the time of questionnaire completion was 52.6 years (SD = 15.65; range = 16–97 years, n = 582). In comparison, the mean age of the original secondary care sample was 52 years (SD = 14, n = 228). Three hundred and sixty-seven women (63.1%) were married, 51 were divorced (8.8%), 54 were widowed (9.3%), 69 were single (11.9%) and eight were separated (1.4%) at the time of completion. Thirty-three women (5.7%) chose not to disclose their marital status.

Secondary factor analysis

Secondary factor analysis grouped the five domains of the urinary dimension into four factors (Table 1), as the stress urinary incontinence and urinary quality-of-life domains both loaded onto factor 1. Similarly, the ‘bowel continence’ and ‘bowel quality of life’ loaded onto factor 1 in the factor analysis of the bowel dimension, and the ‘vaginal prolapse’ and ‘vaginal quality of life’ domains also loaded onto factor 1 in the factor analysis of the vaginal dimension. The sexual domains loaded onto three different factors, with sex urinary and general sex life loading onto the same factor and sex vaginal and dyspareunia loading onto another.

Table 1 Secondary factor analysis on ePAQ-PF

Internal consistency

Table 2 shows the internal consistency reliability for the 19 domains of ePAQ-PF. The internal reliability exceeded 0.70 for all of the original 14 domains and the five newly created domains. Four domains (urinary stress incontinence, bowel quality of life, sex and bowel symptoms and sex and vaginal symptoms) reached an internal reliability of greater than or equal to 0.90, which indicated that these domains could be used for comparison at an individual level, although further research would be needed to establish the suitability of ePAQ-PF in clinical assessments. The highest α score was achieved by the ‘sex and vaginal symptoms’ domain (0.93), followed by the ‘urinary stress incontinence’ and ‘bowel quality of life’ domains (0.91) and the ‘sex and bowel symptoms’ domain (0.90).

Table 2 Internal consistency reliability (Cronbach’s alpha) for the 19 domains of ePAQ-PF questionnaire

Score distributions, skewness and floor and ceiling effects

Descriptive statistics, score distributions, skewness and floor and ceiling effects are displayed in Table 3. As found in the initial validation study, urinary quality of life had the highest mean (32.3), thereby indicating worst health, followed by general sex life (27.2) and urinary stress incontinence (25.9). The new vaginal capacity domain had the lowest mean score (7.0), closely followed by sex and bowel (7.7). Although the vaginal capacity domain was not included in the initial validation study, the findings were similar, as the sex and bowel domain also had the lowest mean score in that analysis [2].

Table 3 Descriptive statistics and score distributions for the 19 domains of the ePAQ-PF questionnaire (n = 582)

All the domains were positively skewed, indicating a trend towards best rather than worst health status. The most positively skewed domains, vaginal capacity and sex and bowel (with values of 3.2 and 3.0, respectively), were those with the lowest mean scores. The domain with the lowest skewness was urinary quality of life (0.7), which was the closest to a normal distribution and had the highest mean score. Whilst very small ceiling effects were found (the largest, 6.4%, was in the urinary quality of life domain), large floor effects were observed, with 7 of the 19 domains (36.8%) demonstrating a floor effect of greater than 50%; the vaginal capacity domain had the largest floor effect (80.8%).

The extent to which women felt their symptoms were ‘not a problem,’ ‘a bit of a problem,’ ‘quite a problem’ or ‘a serious problem’ for each domain of ePAQ-PF is shown in Table 4. For all 19 domains, more patients had an impact score of zero (‘not a problem’) than any other score. The domains with the highest percentage of patients scoring 0 were sex and bowel (85.1%) and vaginal capacity (83.3%), and these domains also had the lowest percentage of patients scoring 3 (‘a serious problem’; 3.3% and 2.8%, respectively). This may be a reflection of the fact that fewer women suffered with the symptoms associated with these domains or that the detrimental effect on quality of life caused by these symptoms was less than for other domains. The domains with the highest percentages of people with an impact score of 3 were general sex life (15.1%), dyspareunia (14.1%) and urinary stress incontinence (13.1%). This could be due to the fact that these symptoms were more prevalent than the others measured by ePAQ-PF or to them having a greater effect on quality of life or a combination of the two.

Table 4 Impact or ‘bother’ of conditions for each domain of ePAQ-PF

Item–total correlation

Tables 5, 6, 7 and 8 summarise the item–total correlations (corrected for overlap) for the urinary, bowel, vaginal and sexual dimensions, respectively. The correlations between items and their parent domains all met or exceeded the minimum correlation coefficient of 0.40 in all four dimensions (range 0.40–0.90, p < 0.05), indicating good item internal consistency for all 19 domains. Although item scores should correlate with the overall scale score, it has been argued that they should not provide exactly the same information, and therefore the mean scores for each item on the questionnaire should differ [18]. As shown in Tables 5, 6, 7 and 8, the means and standard deviations of item scores varied, although the item means ranged only between 0.08 and 1.34. This is reflected in the response patterns to domains as reported in Table 3, whereby high floor effects were found; that is, overall, most women reported good health status in each area of pelvic floor symptoms.

Table 5 Descriptive statistics for each item on the urinary domains and the Spearman correlation of each item with the total score for the domain to which it contributes (i.e. item–total correlations corrected for overlap): urinary domain
Table 6 Descriptive statistics for each item on the bowel domains and the Spearman correlation of each item with the total score for the domain to which it contributes (i.e. item–total correlations corrected for overlap): bowel domain
Table 7 Descriptive statistics for each item on the vaginal domains and the Spearman correlation of each item with the total score for the domain to which it contributes (i.e. item–total correlations corrected for overlap): vaginal domain
Table 8 Descriptive statistics for each item on the sexual domains and the Spearman correlation of each item with the total score for the domain to which it contributes (i.e. item–total correlations corrected for overlap): sexual domain

Item discriminant and convergent validity

Overall, each item correlated more strongly with its parent domain than with the other domains in that dimension. However, the third item of the constipation domain (bowel dimension) correlated with the evacuation domain just as well as it did with its parent (coefficient 0.68), and the fourth vaginal pain and sensation question correlated more highly with the prolapse and quality-of-life domains than with the vaginal pain and sensation score (0.57 and 0.55, respectively, in comparison with 0.47). The fifth dyspareunia question (‘Do you feel that something is in the way when you have sex?’) from the sex domain also correlated with the sex vaginal domain more highly than it did with the dyspareunia total (coefficients 0.59 and 0.57, respectively). In the general sex life domain, the first question correlated better with the sex vaginal domain (0.80), and the fourth question correlated better with the sex urinary domain (0.46) than with the general sex life score (0.66 and 0.45, respectively).

Discussion

The aim of this research was to test the data quality of ePAQ-PF using seven criteria: secondary factor analysis, internal consistency reliability, descriptive statistics of the data including floor and ceiling effects and skewness, the frequency of each impact or ‘bother’ score, data completeness, item–total correlation (corrected for overlap) and item discriminant and convergent validity.

Overall, it appears that the quality of the data and scaling assumptions of ePAQ-PF are acceptable. The results produced in the secondary factor analysis of this new sample verified the structure of the original 14 domains. However, within each of the four dimensions, two domains loaded on one factor. This showed that stress urinary incontinence was most strongly associated with urinary quality of life, bowel continence was most strongly associated with bowel quality of life, vaginal prolapse was most strongly associated with vaginal quality of life and the sex and vaginal symptoms domain was most strongly associated with dyspareunia. These results are perhaps not surprising given that these domains were associated with the most bothersome of the reported symptoms in their respective dimension and related to conditions of incontinence and prolapse, for which patients are most commonly referred to urogynaecology services.

Since the original analysis, five new domains have also been created (urinary dimension: voiding, bowel dimension: irritable bowel symptoms, vaginal dimension: capacity, sexual dimension: dyspareunia and general sex life). The psychometric properties of ePAQ-PF were also re-evaluated in this study. The internal consistency reliability analysis verified the structure of these five new domains; all exceeded the minimum accepted value of 0.70, indicating the appropriateness of the scales for analysis at the group level. In the original validation study, only 11 of the 14 ePAQ-PF domains identified were found to have internal consistency alpha values of 0.70 or more. In the present study, all 14 of these domains were found to have an internal consistency of at least 0.70, and four of these had alpha values of 0.90 or more. This suggests that these four scales would be suitable for an analysis of patients at an individual level. Adequate item–total correlation was also demonstrated, as the minimum accepted correlation co-efficient of 0.40 was met or exceeded for all items on the instrument.

Non-response is minimised with ePAQ-PF as progress through the questionnaire demands either selecting a specific response or ‘decline to answer’ or ‘quit,’ all of which are coded in the database. High rates of data completeness were found for the urinary, bowel and vaginal dimensions, indicating that ePAQ-PF is both understandable and acceptable to respondents. The exception was the sexual dimension, which around 15% of women chose not to answer. It has been argued that including questions that are of a sensitive or difficult nature (such as those relating to socially taboo topics or highly personal or private matters) may reduce the response rate for a questionnaire or produce incomplete or dishonest answers [11]. However, other studies within the field of gynaecology that have included questions about sexual intercourse in questionnaires have had good compliance rates (>80%) and low levels of missing data, suggesting that these items were not seen as intrusive by the participants [19, 20].

The likely explanation for the lower rate of data completeness for the sexual dimension, compared with the urinary, bowel and vaginal dimensions, is therefore that the women who completed ePAQ-PF felt that these questions were not relevant or important to them (9.3% of women were widows, 11.9% were single, and 1.4% were separated). The questionnaire is optional throughout and becomes more sensitive and potentially challenging as it progresses. Women are not routinely screened prior to using the questionnaire and to an extent screen themselves. It may be inappropriate to exclude widows from in-depth assessment of sexual symptoms; the use of interactive skipping and the option of declining to answer an entire dimension appear to have worked well in this context. A previous study has shown a median completion time of 15 min [2]. Inevitably, some questionnaire fatigue may result in some loss of data, though these data compare very favourably with the response rates of other studies using touch-screen questionnaires [21].

The positive skew of data distribution for all 19 domains indicates a trend towards good health status. Fourteen of the 19 domains had a range of 0–100, indicating that the patient sample included groups of patients with very different levels of symptom severity. Floor and ceiling effects have been found to be a problem for some existing health status instruments. In particular, the majority of people who complete the Nottingham Health Profile questionnaire score zero for most and sometimes all of the six domains because it was designed to detect the severe end of illness [22]. Consequently, it does not reflect the health states for respondents with mild to moderate disease. All ceiling effects were found to be low, with the highest value being 6.4%. This indicates that ePAQ-PF is a good instrument for detecting improvements or deteriorations in health status in future assessments of patients with high levels of symptom severity. However, most floor effect values were quite large, and whilst the mean scores for each item varied, the range was small (0.08–1.34), thus indicating that for many of the women, the symptoms were either ‘never’ or ‘occasionally’ present. In an unscreened population, it is not surprising that a large proportion of women were asymptomatic in a number of domains. In the clinical setting, however, this information is extremely valuable. For example, vaginal prolapse surgery may itself create problems with dyspareunia and sexual dysfunction [23], and corrective surgery for stress urinary incontinence may result in de novo overactive bladder symptoms [24]. Thus, baseline data on health status, even if showing an absence of symptoms, are important and valuable.

A limitation of this study is that the sample was limited to a secondary care urogynaecology clinic, and therefore the results might not be generalisable to other areas of medicine. Whilst the reliability and validity of ePAQ-PF have been established, the application of health status instruments as outcome measures means they also need to be evaluated in terms of their ability to detect change. In clinical settings especially, it is important to establish the ability of such measures to detect and describe changes in patients’ health status over time and to show whether these changes are clinically relevant [25]. This is often referred to as the ‘responsiveness’ or ‘sensitivity to change’ of an instrument [13]. The responsiveness of ePAQ-PF is currently being evaluated in women undergoing treatments for pelvic floor disorders including tension-free vaginal tape and surgical correction of prolapse.

Conclusion

The quality of the data obtained using ePAQ-PF and the composition of its domains have been verified and suggest that, psychometrically, the instrument is a valid, reliable and valuable tool for measuring symptom severity and health-related quality of life associated with pelvic floor dysfunction. This research has verified that the new structure of 19 domains has a high level of internal consistency, good item–total correlations and a low rate of missing data, indicating that patients do not find the questionnaire complicated, inappropriate or offensive. This compares favourably with the findings of the initial validation study on the original 14 domains of the ePAQ-PF [2]. Low ceiling effects indicate that ePAQ-PF offers a useful tool for detecting changes in symptoms for women with pelvic floor symptoms. High floor effects mean that in many domains of the questionnaire, women had low or absent symptoms. Further research is needed to evaluate the feasibility, acceptability and psychometric properties of ePAQ-PF in other clinically related areas of medicine, e.g. colorectal surgery and physiotherapy, and to establish the responsiveness of the instrument.