Introduction

Interest in comparative effectiveness research has increased as multiple treatment modalities have become available for common oncologic conditions, including prostate cancer. Over the last decade, there has been an increase in the utilization of minimally invasive radical prostatectomy (MIRP), which includes both robotic and laparoscopic radical prostatectomy (RALP and LRP) [1]. Moreover, direct-to-consumer advertising and the perception that newer technology equates with better outcomes have culminated in the rapid uptake of RALP, which presently comprises an estimated 86% of radical prostatectomies performed in the United States [2]. However, comparative studies assessing the efficacy of the different treatments are lacking. The Agency for Healthcare Research and Quality advocates linking treatment decisions with outcomes under the assumption that meticulous and standardized reporting of outcomes may ultimately lead to improved patient care [3]. Multiple validated instruments have been developed to aid in the assessment and reporting of prostate cancer care outcomes [4]. The Institute of Medicine (IOM) has prioritized comparative effectiveness research dedicated to robotic-assisted surgery because of the prevalence and healthcare costs associated with utilization of this technology [5]. This brings attention to the need for comparative outcomes studies to better define the varying approaches to radical prostatectomy. We discuss the types of studies available, how they contribute to the literature, and their limitations.

Randomized controlled trials

While randomized controlled trials (RCTs) have classically been the gold standard for outcomes analyses, they have notoriously been lacking from the surgical literature [6]. Multiple reviews have found that the RCTs that do exist in the urologic literature are far from ideal [6–8]. Specifically, Scales et al. report that fewer than 50% of urologic RCTs describe crucial methodological study characteristics [6].

The dearth of randomized studies is exemplified in the comparison between minimally invasive and open surgery for the treatment of prostate cancer. Ficarra et al. recently performed a meta-analysis comparing open retropubic radical prostatectomy (RRP), laparoscopic radical prostatectomy (LRP), and RALP [2]. Of the 23 studies examined, only one was an RCT, and it was limited to a single surgeon’s experience of 60 men undergoing RRP and LRP [9]. Ficarra et al. conclude that the ideal study would be a prospective, multi-center comparison involving the best surgeons in RRP, LRP, and RALP, using standardized inclusion and exclusion criteria, surgical techniques, pathologic evaluation, postoperative care, functional questionnaires, and strict follow-up, with evaluation by a third party.

Currently, it is not feasible to randomize men to receive minimally invasive versus open surgery. First, it is not possible to blind patients to the intervention they receive, as the surgical incisions reveal which operation was performed. While it may be possible to compare LRP and RALP, as they utilize similar trocar sites, there are ethical concerns regarding blinding patients to the surgical intervention they have received and so-called “placebo” surgery [10–12]. Additionally, while “placebo” arthroscopic surgery has been performed in the orthopedic literature [12], those surgeries involved minor debridements and did not carry long-term quality-of-life implications such as urinary and sexual function, as well as cancer control, which are important outcomes when performing a radical prostatectomy.

Second, most surgeons preferentially perform either the open or the minimally invasive approach. Although there are some series comparing single-surgeon outcomes with both approaches [13–16], it is rare for one surgeon to provide both operations. Therefore, if an RCT were performed, multiple surgeons would likely provide one treatment or the other rather than a single surgeon providing both. This is an inherent problem, as surgical outcomes rely heavily on individual surgeon technique, which is largely heterogeneous, and subtle variations may lead to significantly different outcomes [17, 18]. Additionally, surgeons may be at different points on the learning curve for their respective approaches. Therefore, differences in outcomes observed in a surgical RCT may be due to variations in technique and surgeon experience rather than a true difference between approaches. Unfortunately, these variables are difficult to adjust for in a surgical RCT.

Finally, in this age of widespread internet access and consumer advertising, patients have become increasingly self-educated with regard to their own medical care, leading to strong preferences for one surgical approach. Therefore, patients and surgeons alike lack adequate equipoise to randomize to one surgical approach over the other [19]. The inability to find enough patients willing to be randomized to RRP versus MIRP substantially limits the ability to perform an adequately powered RCT. As a result, the overwhelming majority of studies are non-randomized, limiting the literature to, at best, level II evidence.
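To illustrate why recruitment is the limiting factor, the sketch below works through a conventional two-proportion sample size calculation in Python. The assumed 12-month continence rates (85% versus 90%) and design parameters are hypothetical values chosen only to convey the order of magnitude of enrollment an adequately powered trial would require; they are not drawn from the cited studies.

```python
# Minimal sample-size sketch for a hypothetical two-arm RCT comparing
# 12-month continence after RRP versus MIRP. Rates are illustrative only.
from scipy.stats import norm

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Classical two-proportion sample size (normal approximation)."""
    z_a = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
    z_b = norm.ppf(power)           # desired power
    p_bar = (p1 + p2) / 2
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return num / (p1 - p2) ** 2

print(round(n_per_arm(0.85, 0.90)))  # ~686 men per arm under these assumptions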

Observational studies

In the absence of RCTs, observational studies are the primary source of contemporary data in the urologic literature, including studies on RRP, LRP, and RALP [20]. Compared with RCTs, observational studies typically provide larger study populations and present fewer ethical conundrums regarding patient enrollment and informed consent. Observational studies, especially those that provide prospectively collected data, have proven to be excellent sources of short-term perioperative outcomes, such as estimated blood loss, postoperative pain, and length of stay. Improvements in all three of these parameters have been shown for LRP and RALP when compared with RRP in observational studies [2, 14, 16]. Observational studies have also incorporated validated instruments to measure long-term postoperative sexual and urinary function, as well as overall quality of life (QOL) [21].

The experiences of several high-volume institutions have shown excellent clinical results for LRP and RALP that are comparable to previously reported RRP series [22–26]. Unfortunately, owing to the limitations of RCTs mentioned above, there are few studies in which modalities are compared head to head. One notable exception is the observational study performed by urologists at Vanderbilt, who presented outcomes of RALP and RRP performed by a single surgeon. These authors noted decreased estimated blood loss, a smaller change in postoperative hematocrit, and lower transfusion requirements in men undergoing RALP versus RRP [16].

Meta-analysis of observational studies, while not ideal, is another means to compare outcomes between modalities. As mentioned, Ficarra et al. performed a meta-analysis consisting largely of single-center observational studies and noted that while RALP and LRP tended to have improved perioperative outcomes, overall outcomes were comparable to those of RRP [2]. A meta-analysis by Coelho et al. revealed lower positive surgical margin rates, higher continence rates, and higher potency rates for men undergoing RALP versus LRP and RRP [27]. However, a meta-analysis is only as good as the data that are collected and is thus limited by the weaknesses of observational data.
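For readers unfamiliar with how such pooled estimates are generated, the sketch below shows a standard DerSimonian-Laird random-effects pooling of odds ratios in Python. The event counts are invented purely for illustration and do not correspond to any of the cited series; the method shown is the generic random-effects approach, not necessarily the exact model used in the cited meta-analyses.

```python
# Minimal DerSimonian-Laird random-effects pooling of positive-surgical-margin
# odds ratios across three hypothetical observational series (invented counts).
import numpy as np

# ((events, n) in the RALP arm, (events, n) in the comparator arm) per study
studies = [((30, 300), (45, 300)),
           ((22, 250), (35, 260)),
           ((15, 180), (20, 170))]

log_or, var = [], []
for (e1, n1), (e2, n2) in studies:
    a, b, c, d = e1, n1 - e1, e2, n2 - e2
    log_or.append(np.log((a * d) / (b * c)))
    var.append(1/a + 1/b + 1/c + 1/d)          # variance of the log odds ratio
log_or, var = np.array(log_or), np.array(var)

w = 1 / var                                     # fixed-effect weights
q = np.sum(w * (log_or - np.sum(w * log_or) / np.sum(w)) ** 2)
tau2 = max(0.0, (q - (len(studies) - 1)) /
           (np.sum(w) - np.sum(w**2) / np.sum(w)))   # between-study variance
w_re = 1 / (var + tau2)                         # random-effects weights
pooled = np.sum(w_re * log_or) / np.sum(w_re)
se = np.sqrt(1 / np.sum(w_re))
print(f"Pooled OR {np.exp(pooled):.2f} "
      f"(95% CI {np.exp(pooled - 1.96*se):.2f}-{np.exp(pooled + 1.96*se):.2f})")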

While helpful and more readily available than RCTs when comparing surgical approaches, observational studies have many drawbacks. First, cohorts are most often drawn from single-center, high-volume, tertiary referral centers. This greatly limits the generalizability of such data to the broader population and thus their applicability to many community centers. Unlike rare conditions that may only be seen at tertiary centers, prostate cancer is the most common non-cutaneous malignancy diagnosed in US men [28], and its treatment is not limited to highly specialized centers. Thus, there remains a need for generalizable data on prostate cancer treatment outcomes, as treatment is frequently performed outside of high-volume tertiary centers.

Second, observational data lack a consistent standard for outcomes reporting. Ficarra et al. illustrated that urinary and sexual functional outcomes among RRP, LRP, and RALP studies often lack standardization, yielding a wide range of results [29]. This makes comparisons between observational studies difficult, if not impossible. Krupski et al. illustrated the difficulty that arises from the use of disparate reporting methods for urinary function [30]. Defining sexual function presents a similar problem, as the term “potent” can mean different things to different people, and sexual function outcomes may depend on which validated scoring system is used, which also differs between studies [19]. Additionally, when assessing the quality and transparency of data reporting in urologic observational studies, Tseng et al. noted major deficiencies in outcomes reporting within the urologic literature, with only half of the studies examined meeting the methodological and reporting standards applied to most RCTs [31].

Finally, another significant drawback of observational data and case series is the lack of consistent long-term cancer control data. The goal of treating men with prostate cancer is to cure them of the disease; however, among high-risk patients with localized disease, up to 40% to 50% may have biochemical recurrence within 10 years of surgery [2, 16]. When dealing with a generally slow-growing malignancy such as prostate cancer, insufficient long-term follow-up may be an obstacle to differentiating the true efficacy of various treatment modalities, with follow-up ranging from several months to 5 years in most studies [32–34].

Secondary datasets

Some of the limitations seen with case series and observational studies are obviated by the use of secondary administrative data, which are more generalizable to current patterns of care and provide larger numbers of subjects for analysis. Major advantages of secondary data include the availability of multiple patient- and hospital-level variables that allow for analyses of specific treatment modalities and their clinical outcomes. Variables of interest generally available in secondary datasets that are not available in case series include surgeon or hospital volume, detailed patient demographics, and comorbid conditions, all of which are factors that may impact clinical outcomes [35]. These data are accessed by identifying ICD-9 diagnostic and CPT procedural codes within the dataset and analyzing the outcomes associated with these codes.
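As a brief illustration of this code-based approach, the sketch below selects a prostatectomy cohort from a hypothetical claims extract using pandas. The file name, column names, and the exact contents of the code lists are assumptions for illustration; real claims extracts differ by dataset and analysts must verify the code definitions for their source.

```python
# Minimal sketch of code-based cohort selection from a hypothetical claims file.
import pandas as pd

claims = pd.read_csv("claims_extract.csv",
                     dtype={"dx_code": str, "cpt_code": str})

PROSTATE_CA_ICD9 = {"185"}               # prostate cancer diagnosis (illustrative)
RRP_CPT = {"55840", "55842", "55845"}    # open radical prostatectomy (illustrative)
MIRP_CPT = {"55866"}                     # laparoscopic/robotic prostatectomy

cohort = claims[claims["dx_code"].isin(PROSTATE_CA_ICD9)
                & claims["cpt_code"].isin(RRP_CPT | MIRP_CPT)].copy()
cohort["approach"] = cohort["cpt_code"].map(
    lambda c: "MIRP" if c in MIRP_CPT else "RRP")

# Outcomes are then tallied by approach, e.g. length of stay:
print(cohort.groupby("approach")["length_of_stay"].describe())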

While a direct comparison of RRP, LRP, and RALP has yet to be made in prospective trials, such comparisons have been made possible using secondary data. Unfortunately, given the lack of a specific CPT code for robotic assistance at this time, LRP and RALP outcomes must be grouped together and referred to collectively as minimally invasive radical prostatectomy (MIRP). Hu et al. utilized the Surveillance, Epidemiology, and End Results registry linked with Medicare claims (SEER-Medicare), which allows pathologic data to be linked to diagnostic and procedural codes [36, 37]. This database proves very useful for analyzing perioperative complications as well as long-term data for assessing cancer recurrence and late complications. While the study supported previous data showing that MIRP confers shorter lengths of stay compared with RRP, the results indicated that men undergoing MIRP were more likely to receive diagnoses of incontinence and sexual dysfunction, which is contrary to the purported benefits of MIRP.

While these registry data provide useful insight into the comparative effectiveness of RRP and MIRP, these unexpected results also highlight some of the limitations of using secondary data. In the case of MIRP versus RRP, one major limitation of registry data is the lack of validated sexual and urinary function measures. Instead of validated instruments measuring these important variables, one is limited to diagnosis and procedure codes to define erectile dysfunction and urinary incontinence. The use of administrative data to quantify diagnoses of incontinence and impotence is less sensitive than self-assessment with validated instruments [38]. This is far from an ideal way to measure these outcomes, as results are often subjective and can vary with patient perception and physician documentation. Additionally, the severity of diagnosed conditions cannot be measured using registry data. In the aforementioned SEER-Medicare study by Hu et al., while men undergoing MIRP were more often diagnosed with and treated for erectile dysfunction and incontinence, there was likely a reporting bias, as these patients may have had higher preoperative expectations or, perhaps, better baseline function and thus more disappointment with their outcomes. Also, patients undergoing MIRP may have been more likely to complain of sexual or urinary dysfunction and, therefore, to be diagnosed more often. These variables simply cannot be controlled for using registry data, making it a poor source for measuring functional outcomes in patients treated for prostate cancer.

Another drawback is that outcomes research from registries is, by definition, only as good as the data actually collected and stored in the registry. Many of the problems in conducting outcomes research from registries stem from a mismatch between the available data and the questions being asked. While this may call into question the validity of the conclusions that can be drawn, Medicare data describing complications have been shown to be quite robust, with 89% concordance with medical chart abstraction [39]. However, the pathologic data provided by SEER and linked to Medicare are not complete. For example, Gleason scores are categorized as “well differentiated,” “moderately differentiated,” or “poorly differentiated” rather than reported as specific scores for each patient. Similarly, specific PSA values are not available; PSA is reported only as elevated or not elevated.
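To make the coarseness of these fields concrete, the sketch below recodes SEER-style categorical grade and PSA variables into analysis strata. The field names and example rows are hypothetical; the point is simply that only broad categories, not exact Gleason scores or PSA values, are recoverable for risk adjustment.

```python
# Minimal sketch (hypothetical field names and rows) of recoding the coarse
# SEER-style categories described above into analysis variables.
import pandas as pd

seer = pd.DataFrame({
    "grade": ["well differentiated", "moderately differentiated",
              "poorly differentiated"],
    "psa_flag": ["not elevated", "elevated", "elevated"],
})

grade_map = {"well differentiated": "low grade",
             "moderately differentiated": "intermediate grade",
             "poorly differentiated": "high grade"}

seer["grade_group"] = seer["grade"].map(grade_map)      # broad strata only
seer["psa_elevated"] = seer["psa_flag"].eq("elevated")  # flag only, no PSA value
print(seer)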

Another limitation of secondary datasets is that they are often restricted to a specific subset of patients. For example, because SEER data are linked with Medicare, the data are limited to patients over the age of 65, which limits the generalizability of the results to younger cohorts. Finally, while these data have the continuity to describe longer-term follow-up, the changing utilization of different treatment modalities currently limits their use. For example, in the case of RRP versus MIRP, the utilization of RALP has increased substantially since the early 2000s. If a study compares RALP with RRP from 1998 to 2008, follow-up may be limited in the RALP group because most RALP cases cluster toward the end of the study window.

Regardless of the drawbacks associated with administrative datasets, they remain important sources of comparative effectiveness data for RRP and MIRP, especially in the absence of adequate RCTs and observational studies. They provide urologists with accurate assessments of hospital stay and perioperative complications representative of both community and academic practices. Additionally, in the near future, CPT codes will differentiate between LRP and RALP, allowing a more accurate comparison between treatment modalities. While not optimal for functional outcomes assessment, secondary datasets provide extremely useful information when comparing surgical treatments for prostate cancer.

Conclusions

In the absence of RCTs, comparative effectiveness data analyzing outcomes of RRP, LRP, and RALP are largely provided by observational studies and secondary administrative datasets. Non-randomized observational studies provide adequate perioperative data from tertiary centers but suffer from observer bias, varying surgeon experience, inconsistent outcome definitions, and limited generalizability to community practice. Secondary datasets provide population-based, objective outcomes assessment applicable to both high-volume academic and community settings but lack functional outcomes assessment. Until standardized prospective comparative analyses of RRP, LRP, and RALP are conducted, comparative outcomes data will remain imperfect. Researchers must strive to provide the best available outcomes data through accurate prospective data collection and consistent outcomes reporting.