Introduction

Kidney stones are associated with significant morbidity [1] and represent a global challenge because their prevalence is rising worldwide [2]. Each stone episode leaves its mark on daily life by impairing quality of life.

Health-related quality of life (HRQOL) is a key consideration in clinical research [3, 4]. Assessing HRQOL not only provides valuable insight into the course of a disease, but also offers a better understanding of the impact of the disease on the individual concerned. Therefore, HRQOL deserves recognition and careful consideration in clinical practice.

Validated HRQOL instruments cover various aspects of physical and mental health and can therefore be used to track the progress, success, and efficacy of an ongoing treatment [5]. While instruments measuring general HRQOL provide a broad perspective [6], they might not be ideal for gaining a thorough understanding of the impact of a specific disease on patients.

Recently, a disease-specific HRQOL measure for kidney stones was developed, the “Wisconsin Stone Quality of Life questionnaire” (WisQoL) [7]. It has been validated [8] and utilized [9], and has been translated into and validated in Spanish, Russian and Turkish [10,11,12]. The purpose of this study was to develop and validate a German version of the WisQoL.

Materials and methods

Translation process and pilot testing

Translation and linguistic validation were conducted in accordance with the multistep process established by Hutchinson [13]. The final consensus version was checked for readability [14] and successfully pilot tested, including semi-structured interviews, in 10 patients [15] suffering from urolithiasis.

Patient selection and data collection

Patients were recruited in Sindelfingen, Germany, and St. Gallen, Switzerland. Inclusion criteria were patient age ≥ 18 years, written informed consent, patient scheduled for urinary stone treatment, and sufficient German language skills.

Patient characteristics were collected, and the German WisQoL and the validated Short Form Health Survey (SF-36v2) were administered, prior to stone surgery (baseline) as well as 1, 3 and 6 months thereafter. As the final part of the WisQoL asks for the presence of urinary stones, stone-related symptoms or interventions and other events interfering with quality of life during the last 4 weeks, this repeated administration allowed for analyses of intra-individual changes including test–retest reliability and sensitivity to change.

Statistical analyses

Statistical analyses were performed with R, version 3.5.1 (R Core Team 2018), and the package psych, version 1.8.12.

WisQoL total score was calculated as the sum of scores of all items from 1a to 7f (28 items). Domain scores were calculated in accordance with the original publication [8]. Sum scores were calculated only from complete data.
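The complete-data rule for sum scores can be illustrated with a minimal Python sketch. This is not the authors' R code; the function name `sum_score`, the three-item domain and the example responses are hypothetical and serve only to show the rule that a score is returned only when every contributing item has been answered.

```python
from typing import Dict, List, Optional


def sum_score(responses: Dict[str, Optional[int]], items: List[str]) -> Optional[int]:
    """Return the sum of the listed items, or None if any item is missing."""
    values = [responses.get(item) for item in items]
    if any(v is None for v in values):
        return None  # incomplete data: no sum score is calculated
    return sum(values)


# Hypothetical three-item domain, for illustration only
# (the real domain assignment follows the original publication [8]).
domain_items = ["1a", "1b", "1c"]
complete = {"1a": 4, "1b": 5, "1c": 3}
incomplete = {"1a": 4, "1b": None, "1c": 3}

print(sum_score(complete, domain_items))    # 12
print(sum_score(incomplete, domain_items))  # None
```

With 28 items each scored from 1 to 5, a total score computed this way ranges from 28 to 140.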

Patients’ characteristics were evaluated using descriptive statistics. Approximate 95% confidence intervals for Spearman’s rank correlations (rho) were obtained through Fisher’s transformation. Statistical significance was evaluated at the 5% level without correction for multiple testing across items or domains.
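As an illustration of the confidence interval calculation, the following Python sketch applies Fisher's z-transformation to a correlation coefficient. It assumes the simplest standard error, 1/sqrt(n − 3); the text does not state which variant was used, and other approximations exist for rank correlations. The function name `fisher_ci` and the example values (rho = 0.70, n = 60) are hypothetical.

```python
import math
from typing import Tuple


def fisher_ci(rho: float, n: int, z_crit: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval for a correlation via Fisher's z."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    z = math.atanh(rho)               # Fisher transformation
    se = 1.0 / math.sqrt(n - 3)       # assumption: simplest standard error
    # Back-transform the limits from the z scale to the correlation scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical example: rho = 0.70 observed in 60 patients.
lower, upper = fisher_ci(0.70, 60)
print(round(lower, 2), round(upper, 2))
```

Because the interval is symmetric on the z scale and then back-transformed, it is asymmetric around rho and always stays within −1 and 1.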

Reliability was assessed by internal consistency (Cronbach’s α) separately for each visit and considering either the items of one domain or all items together. Test–retest reliability was assessed by correlation of domain and total scores between two consecutive visits (either the 1-month and 3-month, or the 3-month and 6-month follow-up) at which patients reported unchanged health state in the final items of the WisQoL (8.1 to 8.5). Two measures of correlation were used: the intraclass correlation (ICC), i.e. the fraction of total variation in scores that is attributable to differences among patients, and Spearman’s rank correlation (rho).
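The two reliability measures can be sketched in Python as follows. This is an illustration under stated assumptions rather than the analysis code: Cronbach's α uses the standard variance formula, and the ICC shown is the one-way random-effects form, chosen here because it matches the description "fraction of total variation attributable to differences among patients"; the text does not specify which ICC variant was reported. Function names and all data are hypothetical.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """rows: one sequence of item scores per patient (complete data only)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_oneway(scores: Sequence[Sequence[float]]) -> float:
    """One-way ICC; scores holds k repeated measurements per patient."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(s) for s in scores) / (n * k)
    means: List[float] = [sum(s) / k for s in scores]
    # Between-patient and within-patient mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for s, m in zip(scores, means) for x in s) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical item scores of five patients on a three-item domain.
items = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 5]]
alpha = cronbach_alpha(items)

# Hypothetical total scores of four patients at two consecutive visits.
visits = [[100, 104], [120, 118], [135, 131], [90, 96]]
icc = icc_oneway(visits)
print(round(alpha, 2), round(icc, 2))
```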

To assess convergent validity, relationships between pairs of items in the WisQoL and SF-36v2 expected to measure the same component of quality of life were evaluated with Spearman’s rank correlations (rho).
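For readers unfamiliar with the statistic, Spearman's rho is the Pearson correlation of the ranked data, with tied values receiving the average of their ranks. A minimal Python sketch is given below; the function names and the item scores are hypothetical, and ties are frequent with five-point items, which is why average ranks matter here.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upwards; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of positions i+1..j+1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed on the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of six patients on two corresponding items.
item_a = [1, 2, 2, 3, 4, 5]
item_b = [2, 1, 3, 3, 5, 4]
rho = spearman_rho(item_a, item_b)
print(round(rho, 2))
```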

Sensitivity to change was measured by the mean difference in domain and total scores between two consecutive visits chosen such that the patient reported having stones or stone-related symptoms at the first visit but not at the second (WisQoL item 8.1 and 8.2). The significance of score improvement was analyzed using Wilcoxon signed-rank tests. For comparison, scores were also compared between two consecutive visits without change in stone presence or with unchanged health state to check that there was no (or only little) score improvement in this case.
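The sensitivity-to-change analysis combines a mean paired difference with a Wilcoxon signed-rank test. The Python sketch below illustrates the idea on hypothetical scores. It is a simplification: zero differences are dropped, tied absolute differences receive average ranks, and the two-sided p-value uses a normal approximation without tie or continuity correction, which may differ from the exact procedure applied in R.

```python
import math
from typing import Sequence, Tuple


def signed_rank(before: Sequence[float], after: Sequence[float]) -> Tuple[float, float, float]:
    """Return (W+, W-, approximate two-sided p) for paired scores."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # average rank for ties
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Normal approximation (assumption: no tie or continuity correction).
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    return w_plus, w_minus, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical total scores at two consecutive visits (symptoms resolved in between).
first_visit = [95, 100, 110, 88, 120, 105, 99, 101]
second_visit = [120, 118, 125, 110, 119, 130, 117, 121]

mean_change = sum(a - b for b, a in zip(first_visit, second_visit)) / len(first_visit)
w_plus, w_minus, p_value = signed_rank(first_visit, second_visit)
print(mean_change, w_plus, w_minus, round(p_value, 3))
```

A large W+ relative to W− indicates that improvements outweigh deteriorations in rank terms, while the mean change conveys the size of the improvement on the score scale.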

Construct validity was tested by comparing patients with and without stones or stone-related symptoms (WisQoL item 8.1 or 8.2). Means were compared between groups with one-way analysis of variance followed by Tukey’s honest significant difference tests for pairwise comparisons.
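The first step of the group comparison, the one-way analysis of variance, can be sketched in Python as below. Only the F statistic and its degrees of freedom are computed; the p-value and the Tukey post hoc step require the F and studentized range distributions and are omitted here. The function name and the group data are hypothetical.

```python
from typing import Sequence, Tuple


def anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Sum of squares between groups and within groups.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical total scores for patients with and without stone-related symptoms.
with_symptoms = [92, 101, 88, 110, 97]
without_symptoms = [125, 131, 118, 136, 129]
f_stat, df_b, df_w = anova_f([with_symptoms, without_symptoms])
print(round(f_stat, 1), df_b, df_w)
```

With only two groups, the F statistic equals the square of the pooled-variance t statistic, so the ANOVA and a two-sample t-test lead to the same conclusion.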

Results

A total of 227 patients were recruited between February 2018 and April 2019, with 19 dropouts due to non-adherence and insufficient language skills. Data for the 208 patients (140 men and 68 women; median age 54.9 years, range 18.1–86.4) who completed the questionnaires at recruitment were included (baseline, Suppl. Tab. 1).

Descriptive statistics for single items, domain scores, and total scores

All WisQoL items had scores spanning the full range from 1 to 5, and all items had similar variability of scores (SD) at an individual assessment (Suppl. Tab. 2). This indicates that all items assessed life quality on the same scale, i.e. a difference of ± 1 unit means the same for all items. Domain scores had different means and SD because they are calculated as the sum of different numbers of items.

As expected, total WisQoL scores varied widely among individual patients at baseline and after 1 month, whereas most patients had total scores between 120 and 140 after 3 and 6 months (Suppl. Fig. 1a). Most patients reached their individual maximum either after 1 month or after 3 months, but a few did so only after 6 months or stayed at low values. The contrasting time courses of total scores at the level of individual patients were reflected by an increase in mean total score up to the 3-month visit and no further change up to the 6-month visit (Suppl. Fig. 1b).

Associations between single items, domain scores and total score

Spearman’s rank correlations between all pairs of individual items were positive (rho > 0), meaning that all items measured quality of life in the same sense, i.e. higher scores always reflect higher quality of life (Fig. 1). Together with the similar variability of item scores (Suppl. Tab. 2) this means that the calculation of simple sum scores based on the raw item scores is a meaningful way of summarizing the information.

Fig. 1 Spearman rank correlations (rho) between all pairs of individual items. Each cell of the image corresponds to one pair of items (rows and columns), and the tone indicates the strength of the correlation. Items are ordered according to the four domains, and thick lines indicate the subdivisions.

Items of the same domain were on average more strongly correlated with each other than with items of the other domains (Fig. 1, Suppl. Tab. 3). Nevertheless, domain scores consistently showed substantial inter-domain correlations (rho between 0.62 and 0.78), indicating a close conceptual relationship between the different domains (Table 1). Accordingly, all domain scores were also strongly correlated with the total score (rho between 0.79 and 0.93, Table 1).

Table 1 Inter-domain associations: Spearman rank correlations between all pairs of domain scores, and correlations of each domain score with the total score, given with 95% CI

Internal consistency and test–retest reliability

Internal consistency was excellent (alpha > 0.90) and similar at baseline and at all follow-ups (all timepoints assessed: Suppl. Tab. 4; baseline and three-month follow-up: Table 2). The exception was domain D3, which had only ‘good’ internal consistency at baseline (alpha = 0.87); this increased progressively during follow-up.

Test–retest reliability was calculated for patients with unchanged health state. Repeated assessments were correlated rather moderately with each other (Table 2).

Table 2 Internal consistency and test–retest reliability

Convergent validity

Relationships between corresponding items in the WisQoL and SF-36 questionnaires were evaluated with Spearman rank correlations. The item pairs compared addressed fatigue (WisQoL 1b and SF-36 9g/i), social impact of disease (WisQoL 3a and SF-36 6), concentration level (WisQoL 3e and SF-36 4d) and levels of pain (WisQoL 5b and SF-36 7). All correlations were significantly positive (lower limit of 95% CI > 0), even if only moderately so (Table 3).

Table 3 Spearman rank correlations between corresponding items in the WisQoL and SF-36 questionnaires (approximate 95% CI are provided)

Sensitivity to change

Domain and total scores consistently increased between two consecutive visits chosen such that either stone presence or current perception of stone-related symptoms improved between these visits. All changes were significant at the 5% level in Wilcoxon signed-rank tests (Suppl. Tab. 5), meaning that positive score changes were always more frequent than negative changes. In fact, scores improved even in the absence of a change in self-reported stone presence or symptoms (Suppl. Tab. 5). However, mean changes were much larger after stone removal and after symptom disappearance than in situations of unchanged self-reported condition.

Construct validity

Mean total WisQoL scores did not differ between patients who did and those who did not report currently having stones in their urinary system (Suppl. Fig. 2a), but were significantly different between patients who did and did not report currently having stone-related symptoms (Suppl. Fig. 2b).

Discussion

Urinary stones are a common disease with reported prevalence rates of up to 20% [16], and prevalence has been steadily increasing over the past years [17,18,19]. Urinary stones and their treatments can negatively affect patients and their quality of life [20]. In 2016, Penniston et al. developed and validated the WisQoL, thereby creating a stone-specific instrument with robust psychometric properties [8].

The German questionnaire showed excellent psychometric properties, with high internal consistency (alpha > 0.9), good sensitivity to change, and satisfactory convergent and construct validity. Minor exceptions were limited to single parameters and can be explained by the study’s methodology. Test–retest reliability for patients with unchanged self-reported health state varied considerably across the single domains (Spearman’s rho 0.40–0.82), but was overall considered satisfactory (Spearman’s rho for total score 0.70 [95% CI 0.55 to 0.80]). In principle, the uniformity of scales might promote a tendency to mechanically check similar scores for all items, especially towards the end of the questionnaire, leading to reduced differentiation. However, most pairs of items were only moderately correlated (rho < 0.7), suggesting that they were rated separately by the patients, and therefore not redundant (Fig. 1, Suppl. Tab. 3).

Repeated assessments were only moderately correlated with each other (Suppl. Tab. 5). Even in the absence of self-reported changes in condition, there could be distinct score improvements or score reductions. However, mean changes were much larger after stone removal and after symptom disappearance than in situations of unchanged condition, showing the ability of all scores to reflect changes in subjective stone-related condition over time.

Valid scores should not only reflect temporal changes in individual patients, but also differences between patients, so that they could, for example, be used to compare two treatment groups after treatment. In the original English WisQoL [8], this was tested by comparing mean scores of patients with and without stones, or with and without stone-related symptoms. Interestingly, our results showed that mean total scores differed between patients who did and those who did not report currently having stone-related symptoms (Suppl. Fig. 2b), but not between patients who did and did not report currently having stones in their urinary system (Suppl. Fig. 2a). This is particularly relevant, as it underlines the ability of the WisQoL to capture the impact of stone disease on health-related quality of life.

We conducted our study in accordance with the standard linguistic validation process [13]. Owing to the large number of patients and the repeated follow-up, we were able to assess changes over a period of time, even in patients who required repeated interventions.

Our study has some limitations. Patients were enrolled according to availability at both recruiting sites, which may have introduced selection bias. Less than half of the initially recruited patients completed all items at all four visits. This loss to follow-up is in part due to the repeated questionnaires patients received. Of note, this close follow-up allowed us to assess the test–retest reliability of the WisQoL, which had not been assessed in the original WisQoL validation study.

Conclusions

The German version of the WisQoL is a robust measure of health-related quality of life in patients with urinary stones. The instrument is internally consistent and can satisfactorily discriminate between patients with different stone and symptom status. Thus, the German WisQoL will be a helpful patient-reported outcome measure in clinical studies assessing the impact of urinary stones and stone treatment on quality of life. The questionnaire is readily available from the authors.