Introduction

Autosomal dominant polycystic kidney disease (ADPKD) is the most common inherited kidney disease worldwide [1]. The theoretical lifetime cumulative risk of ADPKD has been estimated to be 1/1000, while a minimal point prevalence is estimated to be ~3–5/10,000 from European population-based studies [2]. ADPKD is characterized by focal development and enlargement of cysts with increasing age, leading to distortion of kidney architecture and, ultimately, end-stage renal disease (ESRD) in a majority of patients. Mutations of two genes (i.e. PKD1 and PKD2) account for 75–85 and 15–25 % of patients, respectively [3–5]. The field of therapeutics in ADPKD has expanded significantly in recent years, as major clinical trials have provided promising evidence in favor of new disease-modifying drugs. Though these trials are encouraging, methodological issues restrict the interpretation of their results. In this review, we critique the methodological weaknesses of high-profile randomized controlled trials (RCTs) of novel drugs targeting ADPKD that have been published since 2009. A summary of the key characteristics and outcomes of these RCTs is shown in Table 1, while Table 2 shows the principal drawbacks of each individual trial. Our goal is to provide investigators with insight into some of the pitfalls to be avoided when designing future ADPKD trials. The following themes are discussed: study design; patient sample size; patient selection and outcome measures; study duration, dropout, and compliance; and outcome report and analyses.

Table 1 Summary characteristics of high-profile clinical trials
Table 2 Methodological critique of the trials

Study design

Perico et al. conducted a two-period cross-over study to assess the effect of rapamycin (sirolimus) vs. conventional treatment in 16 patients with ADPKD [6]. Two groups of patients were randomized to 6 months of treatment with sirolimus added to conventional therapy (period 1), followed by 6 months of conventional therapy alone (period 2), or vice versa. This study design provides repeated measurements of the outcome for both the experimental and control treatment in each patient and is intuitively attractive because it reduces the required patient sample size [7]. However, the premise of this study design is that the condition being measured is chronic and stable (i.e. once a treatment is withdrawn, after an appropriate wash-out period, the outcome will return to a baseline value). Examples of diseases that may satisfy this key assumption include asthma, irritable bowel syndrome, and depression. Such is not the case for ADPKD, which is a slowly progressive disease. Furthermore, in short-term studies such as this one, the treatment effect from period 1 can be carried over to period 2. While the authors observed that total kidney volume (TKV) tended to increase less on sirolimus than on conventional therapy alone, the difference between the two treatment sequences was not statistically significant, although the sample size was very small. Moreover, a carry-over effect from sirolimus (i.e. patients receiving sirolimus in period 1 tended to have less TKV expansion in period 2 under control therapy) was detected, further confounding the interpretation of the results. To prevent the carry-over effect, the investigators might have included a wash-out period during which no treatment was given [7]. In this case, for example, since sirolimus has a mean half-life of 60 h and tends to accumulate in solid organs following repeated oral administration, a wash-out period of a few weeks could have been considered [8]. In general, this study design is not recommended for ADPKD.
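
The wash-out arithmetic can be sketched as follows. This is a minimal illustration that assumes simple first-order (single-compartment) elimination with the 60 h mean half-life quoted above. It ignores the accumulation in solid organs mentioned in the text, so it gives only a lower bound on a pharmacological wash-out and says nothing about a biological carry-over effect on TKV. The function names are hypothetical.

```python
import math

SIROLIMUS_HALF_LIFE_H = 60.0  # mean half-life quoted in the text [8]


def fraction_remaining(hours_elapsed: float,
                       half_life_h: float = SIROLIMUS_HALF_LIFE_H) -> float:
    """Fraction of drug left after `hours_elapsed`, assuming first-order elimination."""
    return 0.5 ** (hours_elapsed / half_life_h)


def hours_until_fraction(target_fraction: float,
                         half_life_h: float = SIROLIMUS_HALF_LIFE_H) -> float:
    """Time needed for the remaining fraction to fall to `target_fraction`."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    return half_life_h * math.log2(1.0 / target_fraction)


if __name__ == "__main__":
    # Remaining drug after candidate wash-out periods of 1 to 4 weeks.
    for weeks in (1, 2, 3, 4):
        frac = fraction_remaining(weeks * 7 * 24)
        print(f"{weeks} week(s): {frac:.4%} of drug remaining")
```

Under these assumptions, five half-lives (300 h, about 12.5 days) leave roughly 3 % of the drug, which is consistent with a wash-out on the order of a few weeks once slower clearance from solid organs is allowed for.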

An “intention-to-treat analysis” accounts for every randomized patient according to protocol assignment, including those who are withdrawn due to personal reasons, noncompliance, deviations from protocol, adverse events, and severe adverse events [9]. The main advantages of this approach are that it gives an unbiased estimate of the treatment effect, preserves the sample size, and minimizes type I error; however, it is conservative and carries the risk of increasing type II error [9]. Modified intention-to-treat is a variation that allows for the exclusion of a number of randomized patients if justified (e.g. participants considered ineligible after randomization) [9], as in Caroli et al. [10]. Other examples of the modified intention-to-treat approach include the TEMPO 3:4 and everolimus trials, in which only participants with at least one MRI measurement of TKV after baseline were kept in the primary efficacy analysis [11, 12]. While many studies reported the use of intention-to-treat or modified intention-to-treat, some did not state explicitly how the analyses were conducted. We believe the modified intention-to-treat approach provides a reasonably robust analysis of the RCT results without being excessively conservative. All randomized controlled trials should use this approach, as recommended by CONSORT guidelines [13].
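
The difference between the analysis populations can be made concrete with a small sketch. The record layout, the field names, and the four example participants below are hypothetical. The modified intention-to-treat rule shown (at least one TKV measurement after baseline) mirrors the rule described above for the TEMPO 3:4 and everolimus trials, but it is not taken from either trial's analysis code.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Participant:
    patient_id: str
    arm: str                      # arm assigned at randomization
    completed: bool               # finished the study per protocol
    post_baseline_tkv_ml: List[float] = field(default_factory=list)


def itt_population(randomized):
    """Intention-to-treat: every randomized participant, in the assigned arm."""
    return list(randomized)


def mitt_population(randomized):
    """Modified ITT (illustrative rule): at least one post-baseline TKV value."""
    return [p for p in randomized if len(p.post_baseline_tkv_ml) >= 1]


def per_protocol_population(randomized):
    """Per-protocol: only participants who completed the study as planned."""
    return [p for p in randomized if p.completed]


def arm_counts(population):
    """Number of analyzed participants in each randomized arm."""
    return Counter(p.arm for p in population)


# Hypothetical toy cohort, for illustration only.
EXAMPLE_COHORT = [
    Participant("P1", "treatment", True, [1510.0, 1560.0]),
    Participant("P2", "treatment", False, [1320.0]),   # withdrew after one scan
    Participant("P3", "treatment", False, []),         # withdrew before any scan
    Participant("P4", "placebo", True, [980.0, 1040.0]),
]

if __name__ == "__main__":
    for name, select in (("ITT", itt_population),
                         ("mITT", mitt_population),
                         ("per-protocol", per_protocol_population)):
        print(name, dict(arm_counts(select(EXAMPLE_COHORT))))
```

In this toy cohort the treatment arm shrinks from three participants (intention-to-treat) to two (modified) to one (per-protocol) while the placebo arm is unchanged, which is the kind of differential loss that should be reported explicitly.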

The choice of an appropriate intervention for the control arm can sometimes be challenging. In the HALT-PKD (study A) trial, participants were randomized to either lisinopril plus telmisartan or lisinopril plus placebo, with second-, third-, and fourth-line antihypertensive agents added as needed to achieve the targeted blood pressure goals [14]. However, no control arm was included to test antihypertensive drugs that do not block the renin-angiotensin-aldosterone system (RAAS). The rationale of the study was based on animal data suggesting a role for the RAAS in the promotion of renal cyst growth through its mitogenic effects. The authors stated clearly that this hypothesis had not been adequately tested in patients with ADPKD. However, without a non-RAAS treatment arm they would not be able to address this question. In HALT-PKD (study A and B), lisinopril combined with telmisartan did not show a benefit in changing estimated glomerular filtration rate (GFR) in comparison with lisinopril alone [14, 15]. However, it remains unknown whether RAAS blockade would have had a class-specific benefit on the study outcomes compared to other classes of antihypertensives that do not target the RAAS. A second example of issues related to study control measures is the TEMPO 3:4 trial, in which patients in both the tolvaptan and placebo groups were asked to maintain good hydration to ensure that blinding remained optimal [11]. Suppressing the release of vasopressin in the placebo group may have attenuated the differences in outcomes between the two groups. The authors acknowledged that the rates of kidney growth in the placebo group were lower than in previous ADPKD trials.

Patient sample size

Adequacy of patient sample size is critically important for any clinical trial to ensure proper statistical power; an underpowered negative study is not informative. Our review of the ADPKD clinical trial literature indicates that a majority of RCTs reported to date were likely conducted with inadequate sample size, with several high-profile studies having total cohorts of fewer than 100 patients [6, 10, 16, 17]. RCTs with small patient numbers have limited power to detect even a modest treatment effect but are at the same time prone to confounding due to imbalanced randomization, that is, unequal allocation of patients with similar risk characteristics to the treatment and control groups. This problem is further compounded in studies with multiple outcome measures in which borderline p-values were reported without adjusting for multiple comparisons, potentially resulting in spurious associations and increasing the likelihood of type I error [6, 10, 17, 18]. It should also be noted that sample size calculation requires stipulating a mean treatment effect and its variance on the outcome measure of interest. In this regard, the rate of change in TKV, which is used as the primary outcome for all the RCTs published to date, presents a challenge since it is a non-linear trait and may not be normally distributed. Thus, to provide a reliable estimate of the sample size for this outcome, an additional step such as log-transformation is required.
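
A sample size calculation of the kind described above can be sketched as follows. This is a minimal sketch, not the method of any cited trial: it assumes a two-sided, two-sample comparison of mean annual slopes of log-transformed TKV, equal allocation, a common standard deviation, and the standard normal approximation. The planning values (5 % vs. 2.5 % annual growth, a standard deviation of 0.04 on the log scale) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def n_per_arm(delta: float, sigma: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm for a two-sided, two-sample comparison of means
    (normal approximation, equal allocation, common SD `sigma`)."""
    if delta == 0 or sigma <= 0:
        raise ValueError("delta must be non-zero and sigma must be positive")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_beta) * sigma / delta) ** 2)


def log_scale_effect(growth_control: float, growth_treated: float) -> float:
    """Difference in annual log-TKV slope implied by two annual growth rates."""
    return math.log1p(growth_control) - math.log1p(growth_treated)


if __name__ == "__main__":
    # Hypothetical planning values, not taken from any trial.
    delta = log_scale_effect(growth_control=0.05, growth_treated=0.025)
    sigma = 0.04  # assumed SD of the annual log-TKV slope
    print("single primary outcome:", n_per_arm(delta, sigma))
    # Bonferroni correction for three outcomes tested at an overall alpha of 0.05.
    print("three outcomes:", n_per_arm(delta, sigma, alpha=0.05 / 3))
```

Dividing alpha by the number of outcomes is the simplest (Bonferroni) adjustment for multiple comparisons; the second call shows how much larger a trial must be when several outcomes are tested at a controlled overall type I error rate. The figures do not account for dropout, which would inflate them further.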

Patient selection and outcome measures

Several studies have sought to combine assessment of interventions in the setting of ADPKD and autosomal dominant polycystic liver disease (ADPLD). Combining patients with these two different cystic diseases can potentially confound the interpretation of the treatment effect if they respond differently. Moreover, it makes the selection of an appropriate primary outcome challenging as some patients would be excluded when TKV is selected as an outcome measure. For instance, in the study of octreotide in 42 patients with either ADPKD or ADPLD by Hogan et al. [16], 13 of these patients were excluded from the TKV analysis, rendering the RCT design invalid for this outcome.

Another important issue is whether the patients recruited have a reasonable likelihood of benefiting from the intervention. In the setting of ADPKD, this means that selecting patients with disease that is too mild or too advanced is not ideal and can reduce the study power. The disease in the former patients is unlikely to demonstrate any measurable changes during the trial period. By contrast, the latter patients are unlikely to respond to even an effective treatment due to the lack of significant functional kidney parenchyma. Most RCTs have set a lower TKV limit (e.g. >750 ml) to minimize enrolling patients with mild disease. On the other hand, none have set an upper TKV limit to minimize enrolling patients with advanced disease. In some studies, patients with TKV as high as 7 L were included [12]. More importantly, excluding atypical cases of PKD based on their imaging pattern and using an age-adjusted TKV-based risk classification as proposed by the Mayo Clinic have the potential to improve the homogeneity of the study population [19]. For example, enrolling patients with class 1D/1E will identify a high-risk and more homogeneous cohort. Such an approach is expected to reduce the patient sample size for the RCT while minimizing exposure of low-risk patients to experimental treatments with potentially harmful effects. Similarly, given the importance of specific PKD1/PKD2 mutation classes for delineating patient risk groups for progression [3–5], they can be used as an entry criterion in future RCTs to select a more homogeneous study population. In this context, most patients with protein-truncating PKD1 mutations will also correspond to those with Mayo class 1D/1E. Homogenization of study patients to select a high-risk cohort for an RCT can increase the power to detect a treatment effect and increase the robustness of the study design.

With respect to the choice of primary outcome measure, the use of TKV as a surrogate biomarker for progression of ADPKD is supported by the Consortium for Radiologic Imaging Study of Polycystic Kidney Disease (CRISP), which showed that baseline TKV strongly predicts subsequent loss of GFR [20]. TKV is now widely used in RCTs for ADPKD and provided the critical data for the recent approval of tolvaptan by regulatory agencies in Canada, Europe, and Japan, but not in the US. Renal blood flow from MRI is also a promising biomarker if it can be shown to be highly reproducible in the RCT setting [21]. Additionally, the inclusion of a measure of non-cystic volume that reflects the relatively normal kidney parenchyma has the potential to improve the correlation of TKV with renal function decline. In this regard, the intermediate volume defined by contrast-enhanced CT scan has been reported to correlate with progressive loss of renal function in a small cohort of patients with ADPKD [22]. However, its validity needs to be confirmed by studies that include a larger number of patients, and the requirement for contrast may limit its utility, especially in patients with impaired renal function. Another promising approach to improve the utility of imaging-based biomarkers for ADPKD is magnetization transfer mapping using non-contrast MRI, which has recently been shown to define both cystic and fibrotic compartments that were highly correlated with the histological findings in a PKD1 mouse model [23]. These developments have the potential to refine and improve the next generation of imaging-based volumetric biomarkers for evaluation of therapeutic efficacy in ADPKD. By contrast, the use of traditional outcomes such as CKD stage 4 and ESRD in RCTs will be limited given the long follow-up time necessary for the development of these events.

Study duration, dropout, and compliance

Few published RCTs in ADPKD assessed treatment outcomes beyond 6 to 12 months. For instance, in their randomized, crossover trial, Perico et al. saw no significant differences in either TKV or GFR between 6 months of sirolimus treatment and conventional therapy [6]. Similarly, the duration of the RCT by Van Keimpema et al. was also only 6 months [17]. In the latter RCT, there was a significant treatment effect in reducing total liver volume (TLV), but not TKV. It is unclear whether a longer treatment duration would have allowed the detection of a beneficial treatment effect for TKV as well. Another case illustrating the importance of adequate follow-up is the Walz trial, in which there was a significant slowing of TKV expansion with everolimus (compared to placebo) at 12 months which was not sustained at 24 months [12]. In the same study, the estimated GFR increased initially with everolimus (compared to placebo), suggesting a positive effect of treatment, but then declined more than with placebo from 6 to 18 months. For more traditional clinical outcomes (such as progression to ESRD), a much longer study duration would be needed; however, this may not be practical.

High dropout rates may also complicate any RCT. In general, a 10 % dropout rate is not unusual, particularly in long trials, due to patient wishes, serious adverse effects, non-compliance, protocol violation, and death. However, a higher dropout rate may seriously affect the validity of the clinical trial. Significant patient dropout can create an imbalance of patients with different risk characteristics for disease progression in the treatment and control arms, resulting in spurious associations, and can potentially confound the interpretation of the study results. For example, in the 2-year RCT reported by Walz et al., 33 % of the patients treated with everolimus vs. 15 % in the control group did not complete the study. The percentages of missing MRI measurements at 12 and/or 24 months were 44.9 % and 31.3 %, respectively, and imputation was used to provide some of these missing values [12]. In the end, 79 % of the patients in the treatment arm and 81 % in the control arm with at least 1 year of data were analyzed. The results of this study suggested that everolimus treatment was associated with a significant reduction of TKV expansion on the one hand, but a more pronounced rate of decline in eGFR on the other [12]. While these discordant results might reflect a true treatment effect, the high patient dropout could have resulted in an imbalance between treatment and control patients in baseline risk characteristics (e.g. height-adjusted TKV, eGFR, and proportion with PKD1 vs. PKD2), rendering the trial no longer randomized.

Problems with patient compliance may limit the maximal treatment effect in an RCT. For example, the mean rate of adherence in HALT-PKD (study A) was significantly lower in the lisinopril-telmisartan group than in the lisinopril group [14], and this was not due to a difference in the occurrence of adverse events. Achieving optimal blood pressure (BP) control also proved challenging. The systolic and diastolic BP, as measured at home, was on target in 40–66 and 58–75 % of patients in the low BP group, respectively, and in 32–48 and 33–52 % of those in the standard BP group, respectively. Surprisingly, while the number of patients with side effects potentially attributable to the drugs (e.g. dizziness) was greater in the low BP group, the rate of adherence remained superior in that group. Frequently reinforcing the importance of compliance, in addition to standard strategies such as pill counting and timely assessment for potential side effects, may improve adherence.

Outcome report and analyses

Inconsistency in reporting of baseline characteristics of participants is also a common issue. It is generally preferable to provide the baseline characteristics of the analyzed, as opposed to randomized, patients, as shown in the sirolimus trial by Perico et al. [6] and Cadnapaphornchai et al. [18]. By contrast, other RCTs only presented the baseline characteristics of randomized participants [10–12, 14, 16, 17, 24]. Both approaches would provide equivalent results if the patient drop-out rate is not significant. However, in studies with a small sample size or high patient drop-out rates it is important to examine the baseline characteristics (e.g. eGFR, TKV, and proportion with PKD1 vs. PKD2) of the analyzed rather than randomized patients in both the treatment and control groups. Such an exercise can provide critical insight to assess whether the randomization was balanced, as in the RCT reported by Walz et al. [12]. In this regard, two other RCTs are also of interest. First, the HALT-PKD (study A) trial reported that intensive BP lowering (compared to standard BP control) was associated with a lowered annualized percent increase of TKV but no change in eGFR [14]. However, examination of the baseline characteristics of the analyzed patients showed a higher proportion of PKD2 patients who received the intensive (19.8 %) vs. standard (13.1 %) BP treatment. Second, the ALADIN study reported that octreotide-LAR (compared to placebo) treatment slowed TKV increase [10]. However, examination of the baseline characteristics of the analyzed patients in the placebo (compared to octreotide-LAR) treatment arm showed that they had a higher mean baseline serum creatinine (i.e. 108 vs. 92 µmol/L) and TKV (i.e. 2160 vs. 1560 ml). These findings might reflect an imbalance of randomization of patients with different risk characteristics in the placebo vs. treatment arm and provide an alternative explanation for the observed results.
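
One way to quantify such a baseline imbalance in the analyzed patients is the standardized mean difference, sketched below. This is an illustration rather than an analysis performed in any of the cited trials. The only data used are the PKD2 proportions quoted above for HALT-PKD (study A); the function names are hypothetical, and the threshold of 0.1 is a commonly used rule of thumb adopted here as an assumption. The ALADIN means are not used because the corresponding standard deviations are not given in the text.

```python
import math


def smd_proportion(p_a: float, p_b: float) -> float:
    """Standardized difference between two proportions (arm A minus arm B)."""
    pooled_var = (p_a * (1.0 - p_a) + p_b * (1.0 - p_b)) / 2.0
    if pooled_var == 0.0:
        raise ValueError("both proportions are 0 or 1; difference is undefined")
    return (p_a - p_b) / math.sqrt(pooled_var)


def smd_mean(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Standardized difference between two means (arm A minus arm B)."""
    pooled_var = (sd_a ** 2 + sd_b ** 2) / 2.0
    if pooled_var == 0.0:
        raise ValueError("both standard deviations are zero")
    return (mean_a - mean_b) / math.sqrt(pooled_var)


IMBALANCE_THRESHOLD = 0.1  # assumed rule of thumb, not a formal test

if __name__ == "__main__":
    # Proportion of PKD2 patients: intensive vs. standard BP arm [14].
    smd = smd_proportion(0.198, 0.131)
    flag = "imbalanced" if abs(smd) > IMBALANCE_THRESHOLD else "balanced"
    print(f"PKD2 proportion, standardized difference = {smd:.2f} ({flag})")
```

For the HALT-PKD proportions the standardized difference is about 0.18, above the assumed threshold. Unlike a p-value, this measure does not depend on sample size, which makes it convenient for comparing balance across small trials.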

Investigating treatment effects in ADPKD also presents challenges in the measurement and analysis of outcomes, particularly with TKV. Measurement of TKV should be done according to standardized protocols and image analysis [25], and inter-observer variability should be defined. Concurrent use of CT and MRI modalities in the same study, as in Hogan et al., could potentially increase measurement-related variability [16]. In addition, height-adjusted TKV (HtTKV) may help to standardize patients’ TKV to body size [26]. With respect to statistical analyses involving TKV, there are notable lessons to be learned from the recent literature. Mean percentage change in TKV is not a linear trait and, as such, requires log-transformation to provide a more precise estimate of the treatment effect. This was recognized by Serra et al. [23], who added an amendment to the primary efficacy analysis planned for the study.
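
The effect of log-transformation can be illustrated with a short sketch. It assumes that TKV grows approximately exponentially between two measurements, so that the annual growth rate is obtained from the slope of log-transformed TKV and back-transformed. The function names and the example volumes are hypothetical, and height adjustment is shown simply as TKV divided by height.

```python
import math
from statistics import mean


def height_adjusted_tkv(tkv_ml: float, height_m: float) -> float:
    """Height-adjusted TKV (HtTKV) in ml/m."""
    return tkv_ml / height_m


def annualized_growth(tkv_start_ml: float, tkv_end_ml: float, years: float) -> float:
    """Compound annual TKV growth rate from two measurements `years` apart."""
    log_slope = math.log(tkv_end_ml / tkv_start_ml) / years
    return math.expm1(log_slope)


def mean_annual_growth(measurements, years: float) -> float:
    """Average the per-patient log slopes, then back-transform.

    `measurements` is an iterable of (start_ml, end_ml) pairs; the result is
    the geometric-mean annual growth rate of the group.
    """
    slopes = [math.log(end / start) / years for start, end in measurements]
    return math.expm1(mean(slopes))


if __name__ == "__main__":
    # Hypothetical patient: 1000 ml at baseline, 1210 ml after 2 years.
    print(f"annual growth: {annualized_growth(1000.0, 1210.0, 2.0):.1%}")
    print(f"HtTKV at baseline: {height_adjusted_tkv(1000.0, 1.70):.0f} ml/m")
```

A kidney that grows from 1000 to 1210 ml over 2 years has grown by 21 % in total but by 10 % per year on the compounding scale, not 10.5 %; averaging raw percentage changes across patients with different follow-up times would therefore misstate the growth rate.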

Conclusion

In this focused review we have critically examined methodological issues and lessons learned from the published RCTs in ADPKD. From this review, we highlight a number of suggestions for future improvement, including designs that enrich for a more homogeneous patient population (i.e. based on age-adjusted TKV and underlying mutation class) at high risk for disease progression, appropriate study duration and patient sample size that are matched to the disease severity of the study patients, and the use of baseline characteristics (i.e. renal function, TKV, and the proportion of PKD1 and PKD2 patients) of the analyzed patients as a quality control measure to assess any potential imbalance in randomization. Furthermore, the recognition that TKV change is not a linear trait is important in both study design and interpretation. Implementing these lessons from the published trials should enhance the robustness and validity of future clinical trials in ADPKD.