Introduction

Urolithiasis has a global prevalence of around 10% [1]. The burden of the disease is increasing, along with its associated economic impact [2]. In the UK, the healthcare cost of treating urolithiasis is comparable to the combined cost of bladder and prostate cancer [3]. Most patients with urolithiasis undergo long-term follow-up involving regular clinic reviews and imaging to prevent complications or identify them early. This is resource-intensive, involves exposure to ionising radiation and has diagnostic limitations. Furthermore, practice varies widely. Deciding the optimal frequency and duration of follow-up is a longstanding problem, with little supporting evidence and few alternatives. The National Institute for Health and Care Excellence (NICE) has stated that no recommendations can currently be made regarding follow-up and that more research is needed [4]. The European Association of Urology (EAU) urolithiasis guidelines panel has published guidance on the follow-up of patients undergoing definitive treatment, based on stone, patient and imaging factors, but notes that the evidence base remains limited owing to high heterogeneity and a lack of comparative data [5].

Recently, there has been a shift towards patient-centred healthcare and, as part of this, Patient Reported Outcome Measures (PROMs) have been developed to capture patients’ views of their symptoms, functional status and well-being [6]. Recent studies support the use of PROMs in clinical practice to improve shared decision-making and patient self-management [7]. They have been used to “identify triggers for surgery and potentially reduce the burden on services by limiting unnecessary or ineffectual procedures” [8]. When used longitudinally, PROMs can track the progression and severity of disease and serve as an adjunct when changing treatment and follow-up [9]. The American Urological Association (AUA) guidelines state that treatment decisions about urinary calculi should incorporate patient preferences informed by the impact on Health-Related Quality of Life (HRQoL), rather than being limited to clinical and radiological outcomes [10].

The Urinary Stones and Intervention Quality of Life (USIQoL) questionnaire has been developed specifically as a core PROM for patients with upper urinary tract stones. It is scored in three sections covering the domains of Pain and Physical Health (PPH, 6 items), Psycho-Social Health (PSH, 7 items) and Work performance (W, 2 items). It takes 3–4 min to complete and is well suited to longitudinal application [11]. Patients rate the amount of bother attributed to each item on a 4-point scale (1 = not at all, 2 = a little, 3 = quite a bit, 4 = a lot). Scale scores are generated by simple summation of the item scores within each domain (score range: PPH 6–24, PSH 7–28, W 0–8 points), with higher scores indicating greater patient bother. The validation study demonstrated that the USIQoL is reliable (r ≥ 0.8), internally consistent (α ≥ 0.7), has good construct validity (hypothesised correlations, r > 0.3) and is sensitive to change (p < 0.01). All scales demonstrated unidimensionality with good item fit and person separation indices.
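
The summation scoring described above can be illustrated with a short Python sketch. It assumes, hypothetically, that item responses arrive as lists of integers coded 1–4 in questionnaire order; the names `score_domain` and `score_range` are illustrative and not part of any published USIQoL software, and the work domain is omitted.

```python
# Item counts per domain, as stated in the text: PPH has 6 items, PSH has 7.
DOMAIN_ITEMS = {"PPH": 6, "PSH": 7}
VALID_RESPONSES = {1, 2, 3, 4}  # 1 = not at all, 2 = a little, 3 = quite a bit, 4 = a lot


def score_domain(domain: str, responses: list[int]) -> int:
    """Return the summated score for one USIQoL domain (higher = more bother)."""
    expected = DOMAIN_ITEMS[domain]
    if len(responses) != expected:
        raise ValueError(f"{domain} expects {expected} items, got {len(responses)}")
    if any(r not in VALID_RESPONSES for r in responses):
        raise ValueError("each response must be coded 1-4")
    return sum(responses)


def score_range(domain: str) -> tuple[int, int]:
    """Minimum and maximum possible summated score for a domain."""
    n = DOMAIN_ITEMS[domain]
    return n * min(VALID_RESPONSES), n * max(VALID_RESPONSES)


if __name__ == "__main__":
    pph = score_domain("PPH", [2, 1, 3, 2, 1, 2])     # hypothetical responses
    psh = score_domain("PSH", [1, 1, 2, 1, 3, 2, 1])  # hypothetical responses
    print(pph, psh, score_range("PPH"), score_range("PSH"))
```

With a 1–4 scale, the possible range of each domain is simply the item count multiplied by 1 and by 4.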

When a patient with urolithiasis attends a clinic, the key question is whether additional tests, and possibly intervention, are needed. This is particularly important for patients with small and recurrent stones attending regular follow-up. In this setting, it would be valuable to know whether adopting the USIQoL as a monitoring tool in routine practice can assist clinical decision-making, and whether its results correlate well with those of traditional outpatient review, so that it could serve as an alternative way to manage these patients.

Our study had the following objectives:

  1) To develop the first PROM-based (USIQoL) prediction model in endourology to identify patients at risk of needing intervention, and to conduct a preliminary validation of the model for outpatient use.

  2) To develop USIQoL cut-off scores for the physical and psycho-social domains that reliably differentiate patients needing traditional evaluation and possible intervention(s) from the low-risk, stable group suitable for PROM-only follow-up.

  3) To establish the Minimal Clinically Important Difference (MCID) for the USIQoL, defined as “the minimal change in the score considered to be relevant by patients and physicians” [12].

Materials and methods

The study received ethical approval from the Southeast Wales research ethics committee (17/WA/0195) and the local governance committee for service evaluation. It was performed in accordance with the Declaration of Helsinki. Established methodology for development of a prediction model was followed to derive clinically significant changes in the USIQoL scores and cut-off values [13].

The study involved the administration of two PROMs, the USIQoL (disease-specific) and the EuroQol EQ-5D-3L (generic), to patients during their outpatient attendance, with the results analysed using summary index scores [14]. We report the results according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement [15].

The study was conducted in 2 phases.

Phase I—development of a USIQoL-based prediction model

This phase involved analysis of PROM scores and clinical outcomes from an existing dataset, collected in a prospective multicentre cohort study (4 secondary care urology departments) during the final, already published phase of the USIQoL validation [11]. Recruitment ran from March 2018 to June 2019, with follow-up completed in September 2019. The study included patients with urolithiasis in both outpatient and inpatient (including emergency) settings who completed the PROM questionnaires before their face-to-face assessments. The analysis included evaluation of the outcomes stated below (A & B) and the development of suitable cut-off scores to distinguish between patients requiring intervention (high risk) and those who did not.

Phase II—prospective single-blind preliminary external validation of the prediction model

We tested the decision model for outpatient application in a blinded, single-centre prospective study, with data collected over 10 months (December 2021–October 2022). This external validation used a sample separate from, and later than, that of the development phase. Patients with urolithiasis attending urology outpatient clinics were invited to complete both questionnaires just before their clinic review. The completed questionnaires were collected by an independent member of clinic staff not involved in the subsequent outpatient review. Patients underwent a complete assessment by the clinician, including a review of their most recent imaging. A final management decision was formulated through shared decision-making and documented in the case notes. The clinicians were blinded to the questionnaire scores. The validity of the Phase I model, including the proposed USIQoL cut-off scores, was subsequently evaluated. The eligibility criteria, predictors and outcome data assessed were similar to those of the development phase.

The predictors for which data were collected, together with the inclusion and exclusion criteria, are shown in Table 1. These predictors are part of the standard protocols followed in daily practice across all the urological departments.

Table 1 Inclusion/exclusion criteria and predictor data

Our outcomes involved looking at the relationship between the USIQoL domain scores (PPH and PSH) and the outcomes from:

  1) Traditional face-to-face assessments using two key clinical parameters (A1 and A2 below); and

  2) EQ-5D-3L scores when administered simultaneously.

The clinical predictors were:

  A1) Decision regarding clinical management: this was based on face-to-face patient assessment, involving history taking and clinical examination, as well as imaging studies. It was categorised into outcome A (active interventional treatment, including shockwave lithotripsy, ureteroscopy, percutaneous nephrolithotomy or medical therapy with curative intent, such as dissolution treatment) and outcome B (no intervention, suitable for simple follow-up).

  A2) We separately assessed an additional predictor: the presence, at the face-to-face assessment, of ongoing symptoms attributable to the stones.

  B) The EQ-5D-3L index and EQ-5D-3L Visual Analogue Scale (VAS) scores served as an external independent variable to assess global health.

For subjective measures in general, including studies and models applying PROMs, the Food and Drug Administration (FDA) recommends several types of anchors, or predictors, as external criteria approximating the truth. These also help to generate relevant thresholds for meaningful within-patient change, serve as tools for internal validation and are useful for cross-checking outcomes. Those used here were (1) established clinical outcomes (outcome A or B in our case), (2) a global impression of change (development of stone-related symptoms, a binary ‘yes or no’ variable) and (3) a current-state global impression of severity (EQ-5D PROM index scores in this study, a rank variable with higher scores indicating a better health state) [16]. For the purposes of this model, USIQoL scores from the two major domain scales (PPH and PSH, 13 questions) were considered. The work domain (paid employment) was not included because it did not apply uniformly to all patients.

Sample size considerations

We followed two rule-of-thumb sample size recommendations: (a) for psychometric analyses of summated scales (PROMs), based on guidance for their traditional validity assessments, a minimum of 10 subjects per item of the total scale items (13 items for the 2 USIQoL domains, i.e. at least 130 patients); and (b) at least 10 events for each predictor parameter considered for inclusion in the prediction model, widely referred to as events per variable (EPV) [17].

For the development phase, we had data from 305 patients from the 4 centres, a sample representative of the secondary care urolithiasis population (minimum requirement 130 patients). We used all available data to maximise power and generalisability and to capture more outcome events. For the preliminary validation, we included data from 150 patients (above the minimum of 130). As patients completed the PROMs in hospital before the face-to-face interaction, there were no missing PROM data. Data for all other predictors were available as part of normal clinical practice. The number of patients who did not enter the study remained small.
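
The two rules of thumb reduce to simple arithmetic, sketched below in Python. The 13-item figure reproduces the 130-patient minimum quoted above; the event counts, outcome fraction and predictor number in the usage lines are hypothetical values chosen only to show how the EPV rule constrains a model, and "events" is taken, as is conventional, to mean patients in the less frequent outcome category.

```python
import math

ITEMS_SCORED = 13           # PPH (6 items) + PSH (7 items)
SUBJECTS_PER_ITEM = 10      # psychometric rule of thumb for summated scales
EVENTS_PER_VARIABLE = 10    # EPV rule of thumb for prediction models


def min_sample_psychometric(n_items: int = ITEMS_SCORED,
                            per_item: int = SUBJECTS_PER_ITEM) -> int:
    """Minimum sample size under the 10-subjects-per-item rule."""
    return n_items * per_item


def max_candidate_predictors(n_events: int, epv: int = EVENTS_PER_VARIABLE) -> int:
    """Largest number of candidate predictors that keeps EPV at or above `epv`."""
    return n_events // epv


def min_sample_for_epv(n_predictors: int, event_fraction: float,
                       epv: int = EVENTS_PER_VARIABLE) -> int:
    """Smallest sample expected to supply `epv` events per predictor,
    given the (hypothetical) fraction of patients with the outcome."""
    return math.ceil(n_predictors * epv / event_fraction)


# Usage with hypothetical inputs:
print(min_sample_psychometric())          # 130
print(max_candidate_predictors(45))       # 4
print(min_sample_for_epv(4, 0.30))        # 134
```

The psychometric rule depends only on the number of items, whereas the EPV rule depends on how common the outcome is, so the binding constraint can differ between cohorts.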

Statistical analysis

The analysis was undertaken using SPSS (version 29). Spearman’s rank correlation coefficients were used to assess the correlations between the PPH and PSH scores and EQ-5D scores. We used binomial logistic regression to assess the strength of the relationship between the two domain scores (PPH and PSH) and the clinical outcomes (A or B), stone-related symptoms and radiological parameters (stone site and size). We used Receiver Operating Characteristic (ROC) analysis to assess discrimination between outcomes A and B using the area under the curve (AUC); an AUC of 0.7–0.8 is considered acceptable [18]. We also calculated sensitivities and specificities for different cut-off scores.
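
The ROC quantities named above can be computed without SPSS; the pure-Python sketch below is a stand-in for illustration, not the procedure the authors ran. It relies on the fact that the empirical AUC equals the probability that a randomly chosen outcome-A patient scores higher than a randomly chosen outcome-B patient (ties counted as one half). It assumes that a score strictly above the cut-off counts as a positive result, and the score lists are invented.

```python
def empirical_auc(scores_a: list[float], scores_b: list[float]) -> float:
    """Empirical ROC AUC (Mann-Whitney form): P(A score > B score), ties = 1/2.
    scores_a: outcome A (intervention); scores_b: outcome B (no intervention)."""
    total = 0.0
    for a in scores_a:
        for b in scores_b:
            if a > b:
                total += 1.0
            elif a == b:
                total += 0.5
    return total / (len(scores_a) * len(scores_b))


def sensitivity_specificity(scores_a: list[float], scores_b: list[float],
                            cutoff: float) -> tuple[float, float]:
    """Treat 'score above cutoff' as a positive (high-risk) result."""
    true_pos = sum(s > cutoff for s in scores_a)
    true_neg = sum(s <= cutoff for s in scores_b)
    return true_pos / len(scores_a), true_neg / len(scores_b)


# Hypothetical domain scores, for illustration only.
intervention = [12, 15, 9, 18, 11]
no_intervention = [7, 10, 8, 12, 6, 9]

print(round(empirical_auc(intervention, no_intervention), 3))   # 0.867
for c in range(8, 13):
    sens, spec = sensitivity_specificity(intervention, no_intervention, c)
    print(c, round(sens, 2), round(spec, 2))
```

Stepping through candidate cut-offs in this way produces the sensitivity/specificity table from which a threshold is then chosen.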

We then selected the optimal cut-off score, first evaluating Youden’s index (sensitivity + specificity − 1) [19], which identifies the best trade-off between sensitivity and specificity. If the sensitivity at the highest Youden’s index was below 0.70, we deemed it unacceptable and chose the nearest threshold with a sensitivity above 0.70. We calculated cut-off scores for the two domains independently as well as in combination (the most clinically relevant application). We determined the MCID using a combination of anchor-based and distribution-based statistical methods [20].
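
The selection rule above can be written as a small function. This is a sketch under stated assumptions: the candidate table is a hypothetical list of (cut-off, sensitivity, specificity) rows, "nearest" is interpreted as nearest in cut-off value to the maximum-Youden threshold, a sensitivity of exactly 0.70 is treated as acceptable, and ties in distance are broken in favour of the higher Youden's index.

```python
def youden(sensitivity: float, specificity: float) -> float:
    """Youden's index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def choose_cutoff(table: list[tuple[float, float, float]],
                  min_sens: float = 0.70) -> tuple[float, float, float]:
    """Return the row with the highest Youden's J unless its sensitivity is
    below min_sens; otherwise the acceptable row whose cut-off is nearest
    to the max-J cut-off (boundary handling is an assumption)."""
    best = max(table, key=lambda r: youden(r[1], r[2]))
    if best[1] >= min_sens:
        return best
    eligible = [r for r in table if r[1] >= min_sens]
    if not eligible:
        raise ValueError("no cut-off reaches the required sensitivity")
    return min(eligible, key=lambda r: (abs(r[0] - best[0]), -youden(r[1], r[2])))


# Hypothetical (cut-off, sensitivity, specificity) rows, for illustration only.
rows = [(8, 0.90, 0.35), (9, 0.82, 0.47), (10, 0.74, 0.55),
        (11, 0.66, 0.70), (12, 0.55, 0.78)]

print(choose_cutoff(rows)[0])   # 10: max-J cut-off 11 fails the 0.70 floor
```

Here the maximum Youden's index occurs at 11, but its sensitivity (0.66) is below the floor, so the adjacent threshold with acceptable sensitivity is returned instead.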

Results

Of the 503 patients invited to take part during the 2 phases (Phase I = 345, Phase II = 158), 455 (90.5%) participated (Phase I = 305, Phase II = 150). The TRIPOD 22-item checklist is presented in Appendix I. Descriptive statistics are shown in Table 2. There was a male preponderance in both phases, with a similar mean age and age range. In phase I, more patients required intervention. The differences between the two phases in the distribution of stone site and in the proportion of clinical outcomes were attributed to phase II focusing on outpatients, with relatively fewer patients presenting with an acute stone episode.

Table 2 Descriptive statistics

Table 3 shows the distribution of the domain scores across phases for both clinical outcome groups (A and B). As expected, the mean and median PPH and PSH scores in both data sets were higher in the intervention group than in the no-intervention group. The difference in scores between the intervention and no-intervention groups was significant for both domains (p < 0.05, Table 3). In phase II, 88% of renal stones were under 11 mm and 12% were above 11 mm (overall renal stone size range 3 mm to partial staghorn). Of the ureteric stones, 42% were under 5 mm, 42% were 5–10 mm and 16% were above 10 mm (range 3–15 mm). Both domain scores were higher for patients with ureteric stones than for those with renal stones: the mean PPH score was 13.8 for ureteric and 10.6 for renal stones, and the mean PSH score was 14.9 and 12.5, respectively.

Table 3 Statistical analysis of domain scores

The relationships between the PPH and PSH domain scores and the key clinical predictor anchors (outcome A or B and the presence of symptoms) were statistically significant (Table 3). The odds of a patient reporting stone symptoms, and of needing full clinical evaluation and subsequent active intervention, increased with increasing USIQoL scores. The results confirmed good correlation between the two domains and their unidimensionality.
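
The statement that the odds rise with the score follows from the form of a binomial logistic model. The Python sketch below uses invented coefficients (an intercept of -4.0 and a slope of 0.3 per PPH point), not the study's estimates, to show how a per-point odds ratio exp(β) compounds across a score difference and how predicted probabilities increase with the score.

```python
import math

# Hypothetical coefficients for illustration only (not the study's estimates).
INTERCEPT = -4.0
SLOPE_PER_POINT = 0.3   # log-odds increase per one-point rise in the PPH score


def odds_ratio(points: float, slope: float = SLOPE_PER_POINT) -> float:
    """Odds ratio for an increase of `points` in the domain score: exp(slope * points)."""
    return math.exp(slope * points)


def predicted_probability(score: float, intercept: float = INTERCEPT,
                          slope: float = SLOPE_PER_POINT) -> float:
    """Logistic model: P(intervention) = 1 / (1 + exp(-(b0 + b1 * score)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


print(round(odds_ratio(1), 2))                 # 1.35
print(round(predicted_probability(10), 3))     # 0.269
print(round(predicted_probability(15), 3))     # 0.622
```

Because the odds ratio multiplies per point, a 5-point difference corresponds to exp(5β), which is why moderate per-point effects can separate low- and high-scoring patients substantially.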

Figures 1 and 2 show the ROC curves and derived AUC values for PPH and PSH in phase II. All potential cut-off values, including those for Phase I PPH and PSH, with their accompanying sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR) and Youden’s index, are shown in Table 4. The AUC values in phase II were 0.75 (PPH) and 0.76 (PSH), demonstrating an acceptable ability of the model to discriminate between the two clinical outcomes.

Fig. 1 ROC curve for Phase II PPH

Fig. 2 ROC curve for Phase II PSH

The ROC analysis results, together with the sensitivity, specificity, Youden’s index, PLR and NLR for a range of cut-off scores, were used to determine the most appropriate cut-offs for PPH and PSH. We initially assessed Youden’s index, considering each domain score alone as well as a combined rule for both domains together. The cut-off values with the highest Youden’s index, 11 (PPH) and 13 (PSH), had sensitivities below 0.7; this was deemed clinically unacceptable, and a more appropriate cut-off was chosen (sensitivity of at least 0.8 and specificity around 0.5). The chosen cut-off scores were the combined rule of PPH 9 and PSH 10 (sensitivity 0.815, specificity 0.468, PLR 1.53, NLR 0.40, Youden’s index 0.283). Under the combined rule, the threshold is reached if either the PPH or the PSH domain score, or both, is above its chosen cut-off. The full range of cut-off values with the associated sensitivity and specificity is shown in Table 4.
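
The likelihood ratios and Youden's index of a dichotomised rule follow directly from its sensitivity and specificity, using the standard definitions PLR = sensitivity / (1 − specificity) and NLR = (1 − sensitivity) / specificity. The short Python check below applies them to the sensitivity (0.815) and specificity (0.468) reported for the combined cut-off; it is an arithmetic illustration, not a re-analysis of the data.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (positive LR, negative LR) for a dichotomised test."""
    plr = sensitivity / (1.0 - specificity)
    nlr = (1.0 - sensitivity) / specificity
    return plr, nlr


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


plr, nlr = likelihood_ratios(0.815, 0.468)
print(round(plr, 2), round(nlr, 2), round(youden_index(0.815, 0.468), 3))
# 1.53 0.4 0.283
```

A PLR above 1 and an NLR below 1 are expected whenever sensitivity + specificity exceeds 1, as it does here.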

Application of the cut-offs to the phase II data showed good and consistent model fit. There was a very good correlation between the EQ-5D utility scores and the USIQoL domains, providing evidence of internal validity for their use in outpatient settings (Table 3). In the outpatient setting (Phase II), 12 of the 150 patients had ureteric stones, and the clinical decision was to intervene in 6 of them. All 6 had PPH scores above the cut-off (sensitivity 100%) and 5 of the 6 had PSH scores above the cut-off (sensitivity 83%). For renal stones in this cohort, the sensitivities of the PPH and PSH cut-offs were 72% and 88%, respectively, in keeping with the multifactorial decision-making for renal stones. The distribution of domain scores and the respective clinical outcomes suggested two groups among patients with scores above the cut-offs. In the validation sample, 65% of patients with a PPH score above 14, and 55% of those with a PSH score above 14, required intervention. The corresponding intervention rates for scores of 10–14 were 26% (PPH) and 33% (PSH). Patients with domain scores of 10–14 are better served by a full clinical assessment, including up-to-date imaging, to inform a management plan. Once the domain scores were above 14, however, the need for intervention was much higher, defining this as the high-risk group.
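
The three-tier interpretation described above can be sketched as a simple rule. This is illustrative only and not a validated clinical decision tool: it assumes that "above" means strictly greater than, that either domain alone can trigger a tier (as in the combined rule), and it uses the hypothetical function name `triage`.

```python
PPH_CUTOFF = 9        # combined-rule cut-off for Pain and Physical Health
PSH_CUTOFF = 10       # combined-rule cut-off for Psycho-Social Health
HIGH_RISK_ABOVE = 14  # scores above this marked the high-risk group


def triage(pph: int, psh: int) -> str:
    """Map USIQoL domain scores to an illustrative follow-up tier."""
    if pph > HIGH_RISK_ABOVE or psh > HIGH_RISK_ABOVE:
        return "high"      # intervention much more likely
    if pph > PPH_CUTOFF or psh > PSH_CUTOFF:
        return "assess"    # full clinical assessment with up-to-date imaging
    return "low"           # candidate for PROM-only follow-up


for pph, psh in [(8, 9), (12, 8), (9, 16)]:
    print(pph, psh, triage(pph, psh))
```

Checking the high-risk tier first means a single domain score above 14 is enough to escalate, regardless of the other domain.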

Table 4 Phase I and II ROC curve cut-offs for PPH and PSH

Using different methods, the MCID analysis, which helps to interpret the magnitude of QoL impact and change, yielded a 3–4 point difference for each domain score (Table 5). The MCID appeared higher in the stable outpatient population, which had a lower prevalence of interventions, and was similar across calculation methods.
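
Two widely used distribution-based MCID estimates are half the standard deviation of the scores and the standard error of measurement, SEM = SD × √(1 − r), where r is a reliability coefficient; a simple cross-sectional anchor-based estimate is the difference in mean scores between anchor groups. The Python sketch below computes all three on invented data, with r = 0.8 taken from the reliability threshold reported for the USIQoL; it is not necessarily the set of variants reported in Table 5.

```python
import math
import statistics


def half_sd(scores: list[float]) -> float:
    """Distribution-based MCID: 0.5 x sample standard deviation."""
    return 0.5 * statistics.stdev(scores)


def sem(scores: list[float], reliability: float) -> float:
    """Distribution-based MCID: standard error of measurement, SD * sqrt(1 - r)."""
    return statistics.stdev(scores) * math.sqrt(1.0 - reliability)


def anchor_difference(anchor_positive: list[float], anchor_negative: list[float]) -> float:
    """Cross-sectional anchor-based estimate: difference in mean scores between
    patients with and without the anchor (e.g. stone-related symptoms)."""
    return statistics.mean(anchor_positive) - statistics.mean(anchor_negative)


scores = [6, 8, 10, 12, 14]          # hypothetical domain scores
print(round(half_sd(scores), 2))     # 1.58
print(round(sem(scores, 0.8), 2))    # 1.41
print(anchor_difference([12, 14, 16], [9, 10, 11]))  # 4
```

Comparing estimates from several methods, as done in the study, guards against the known sensitivity of the MCID to the method chosen.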

Table 5 Calculation of the minimal clinically important difference (MCID) for the two domains using anchor- and distribution-based methods

Discussion

This two-phase study on the development and preliminary validation of a PROM-based prediction model demonstrates the clinical suitability of adopting the USIQoL to aid the management of urolithiasis in outpatients. It provides an alternative, patient-centred approach to evaluation that is practical for daily practice. The results show good correlation between USIQoL scores and the outcomes of traditional assessments. The proposed cut-off scores indicate an ability to discriminate between key clinical decisions and to identify patients likely to need intervention(s).

There is evidence for the usefulness of PROMs in clinical practice. At the micro-level, PROMs facilitate the detection of physical or psychological problems and adherence to treatment [21]. PROMs compare favourably with other common clinical measures in terms of reliability [22]. Real-time access to PROM data helps clinicians prioritise topics for discussion and improves patient-clinician communication [23]. At the meso-level, PROM data can support comparative effectiveness research and evaluation of the impact of interventions [24]. PROM-based follow-up has been used in other specialties: the National Health Service (NHS) England national PROMs programme has used PROMs in the follow-up of patients undergoing hip and knee replacements since 2009 [25]. Within urology, there are established PROMs for monitoring patients with lower urinary tract symptoms. Formal assessment of factors affecting a patient’s quality of life should be incorporated into clinical care and used to guide treatment decisions, as demonstrated in the treatment of localised prostate cancer [26].

PROMs are rarely applied in urolithiasis, especially in routine clinical practice. In addition, variation in clinical practice and the lack of standard follow-up strategies are longstanding problems. Evaluation and follow-up of patients have implications for resources and outpatient waiting times. The latest Urology Outpatient Transformation guide in the UK highlights “personalised follow up—patient initiated follow up” and “using remote monitoring” as two key components with scope for improved PROM-based follow-up [27]. Following the COVID-19 pandemic, there is pressure to change outpatient practice and greater acceptance of alternative methods of follow-up.

With these considerations in mind, we developed and applied a USIQoL-based prediction model for outpatient settings. We hypothesised that its use would provide additional valid and reliable data on the impact of the disease on patients, and that PROM scores would correlate with clinical outcomes. The results could then guide the triage of higher-risk patients who need detailed assessment and are likely to need intervention. We designed a two-phase study: development of a USIQoL-based diagnostic prediction model from existing data (phase I), followed by a hypothesis-driven, prospective, outpatient-based preliminary validation (phase II) assessing the suitability of the model and its cut-off scores.

The ability of PROMs to improve decision-making relies on their accurately capturing the burden of disease or treatment. To be useful in the follow-up of chronic conditions, PROMs must be relevant and actionable: it should be clear what small changes in scores mean and when action or a change in management plan is needed [21]. Hence, we designed and reported the study according to the 22-item TRIPOD checklist. One way to lend meaning to PROM scores is to dichotomise between values at which within-patient changes are considered clinically important (“responders”) and those at which they are not [28]. We followed this approach to define the principal clinical outcomes, A or B.

The predictors (anchors) used in our study were widely accepted and clinically relevant. We carefully constructed the cut-off scores (phase I) with appropriate sensitivity analysis. After confirming that the data satisfied the requirements for ROC analysis, we undertook a prospective preliminary validation study with an adequate sample size. The logistic regression analysis confirmed a satisfactory relationship between USIQoL scores and clinical outcomes, supporting reliable conclusions.

The principles behind satisfactory sensitivity and specificity values, traditionally applied to laboratory tests, also apply in this setting. The high sensitivities and acceptable specificities for both PPH and PSH support the potential of these cut-off scores. Previous PROM studies have shown that AUC values of around 0.6–0.7 are common and satisfactory for patient evaluation [29]. We considered a sensitivity above 0.70 necessary, because a cut-off with lower sensitivity risks an unacceptably large number of false negatives (potential stone-related complications despite low PROM scores). We assumed that the risks of false-negative results outweigh those of false positives (the need for additional consultation and/or imaging). Hence, we chose cut-off scores with higher sensitivity while accepting relatively modest specificity. The results for patients with ureteric stones needing intervention, a common and pressing clinical question, supported the validity of this approach. Ureteric stones with USIQoL scores above the cut-offs carried the highest risk of intervention, followed by symptomatic stones with significant QoL impact. The PROM performed well and consistently in identifying this high-risk group.

Analysis of the cut-off scores identified cohorts of patients with an increasing likelihood of needing detailed investigation (scores above the cut-off) and of progressing to active intervention (scores above 14).

There is considerable clinical interest in defining the MCID for a given PROM so that the magnitude of clinical impact, or change, can be understood and standardised. The MCID is a complex, multifaceted concept whose estimates vary with the methods used. We therefore combined anchor- and distribution-based methods to obtain robust estimates. Our anchors were easily interpretable, widely used and well correlated with USIQoL outcomes, and the results were fairly consistent across methods [30].

The study has limitations. It was conducted in an English-speaking population; use of the USIQoL in other languages and healthcare settings will need further evaluation. Although the study included a satisfactory number of patients according to sample size requirements, it remains relatively small compared with a final, large-scale formal validation. In patients at clinically high risk of stone recurrence (cystinuria, uncontrolled metabolic disease) or in whom PROM assessments may be unreliable (neuropathic conditions), the PROM may need to be used only as an adjunct to traditional assessments until more focussed data are available. The USIQoL was administered in paper format when patients physically attended the clinic, which helped achieve high response rates and no missing data. Sound strategies will be needed to achieve this in other settings, such as ePROMs or virtual clinics, and strategies for handling missing data (imputation methods) and their full impact need further exploration.

Our findings have multiple applications that would benefit different stakeholders. The study establishes, for the first time, a role for PROMs in urolithiasis in outpatient settings. Their use would improve patient engagement through patient-initiated follow-up strategies and, for patients, would reduce frequent hospital visits and imaging. The PROM would replace a blanket policy of traditional follow-up for all with alternative pathways incorporating e-PROMs, nurse-led services and, possibly, artificial intelligence.

In conclusion, this study has demonstrated the usefulness of the USIQoL as an aid to outpatient management. The cut-off scores identify at-risk patients with potential problems. In lower-risk patient groups, the USIQoL provides a reliable tool for patient-centric evaluation and an alternative to traditional follow-up. The prediction model is a useful triage tool suitable for patient-initiated follow-up. The results offer a framework for large-scale validation of the model.