Sepsis is a clinical syndrome of systemic inflammation caused by infection and can range in severity depending on the presence of shock and organ failure [1]. It remains a worldwide problem, with over 750,000 cases of sepsis reported annually in the United States [2], and accounts for over 25 % of intensive care unit (ICU) admissions in Europe [3]. Historically, mortality rates for patients with severe sepsis or septic shock have been as high as 60 % [4]. While contemporary treatments have improved mortality rates [5, 6], a substantial number of patients with sepsis still die.

In 2001, enthusiasm for biologic therapy in sepsis peaked with the approval of drotrecogin alfa activated (DAA; trade name Xigris), a recombinant human activated protein C (rhAPC). These regulatory decisions were based on the results of the PROWESS trial, which demonstrated that treatment with DAA led to a 6.1 % absolute risk reduction in 28-day mortality in patients with severe sepsis or septic shock, as compared with placebo [7]. rhAPC appears to modify the disease course by modulating dysregulated coagulation and the resulting microvascular thrombosis, and it also has anti-inflammatory effects [8–10]. DAA was quickly adopted into clinical practice, although debate about its efficacy began almost immediately, for several reasons. First, the PROWESS trial was terminated early because it met a priori stopping criteria for efficacy. Second, the protocol was amended midway through the trial. The amendment changed the inclusion and exclusion criteria for enrollment, the placebo used (saline vs. albumin), and the formulation of the study drug. No benefit of DAA was observed before the amendment, whereas after it, the results favored DAA [11].
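An absolute risk reduction can be restated as a number needed to treat (NNT), which is simply its reciprocal. The short Python sketch below applies this to the 6.1 % figure reported for PROWESS; the function name is illustrative, and by convention the NNT is rounded up to the next whole patient.

```python
import math

def number_needed_to_treat(absolute_risk_reduction: float) -> float:
    """Return the NNT, i.e. 1 / ARR, with ARR given as a proportion (0.061 = 6.1 %)."""
    if absolute_risk_reduction <= 0:
        raise ValueError("NNT is only defined here for a positive risk reduction")
    return 1.0 / absolute_risk_reduction

nnt = number_needed_to_treat(0.061)  # 6.1 % absolute reduction in 28-day mortality
print(f"NNT = {nnt:.1f}, rounded up to {math.ceil(nnt)} patients")
# -> NNT = 16.4, rounded up to 17 patients
```

On the PROWESS estimate, roughly 17 patients would need to be treated with DAA to prevent one death at 28 days.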

Because questions about the efficacy of DAA lingered, especially in less acutely ill subgroups, the ADDRESS trial was performed in adults at lower risk of death from severe sepsis (APACHE II score <25 or single-organ failure); it showed no benefit [12]. Subsequent trials, including a pediatric trial [13] and a trial of an extended DAA infusion [14], also showed no benefit. Amid growing controversy, the PROWESS-SHOCK trial [15••] was designed to provide a definitive answer in the era of modern critical care by randomizing adults with persistent septic shock after protocol-specified early volume resuscitation to either DAA or placebo. In October 2011, Eli Lilly withdrew DAA from the market [16] on the basis of preliminary PROWESS-SHOCK results, which showed no effect on 28-day all-cause mortality. At the time, a multicenter study in French ICUs (APROCCHSS) [17] was assessing the benefit of DAA with and without low-dose corticosteroids in patients with vasopressor-dependent septic shock. APROCCHSS was terminated prematurely when DAA was withdrawn from the market, although it too showed no mortality benefit when data from the patients already enrolled were analyzed (see Fig. 1).

Fig. 1

Forest plot comparing the effect of DAA versus placebo on risk ratio (RR) for 28-day all-cause mortality in all placebo-controlled randomized clinical trials of DAA for severe sepsis and septic shock
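Each row of a forest plot such as this one summarizes a single trial as a risk ratio with a confidence interval. The Python sketch below shows that calculation under the common assumption of a Wald interval on the log scale, where the standard error of log RR is sqrt(1/a - 1/n1 + 1/c - 1/n2) for a deaths among n1 treated and c deaths among n2 controls. The counts are hypothetical and are not taken from any DAA trial.

```python
import math
from typing import NamedTuple

class RiskRatio(NamedTuple):
    rr: float
    lower: float
    upper: float

def risk_ratio(deaths_tx: int, n_tx: int, deaths_ctl: int, n_ctl: int,
               z: float = 1.96) -> RiskRatio:
    """Risk ratio of death (treatment vs. control) with a Wald CI on the log scale."""
    if deaths_tx == 0 or deaths_ctl == 0:
        raise ValueError("zero-event arms need a continuity correction (not handled here)")
    rr = (deaths_tx / n_tx) / (deaths_ctl / n_ctl)
    se_log_rr = math.sqrt(1 / deaths_tx - 1 / n_tx + 1 / deaths_ctl - 1 / n_ctl)
    log_rr = math.log(rr)
    return RiskRatio(rr,
                     math.exp(log_rr - z * se_log_rr),
                     math.exp(log_rr + z * se_log_rr))

# Hypothetical trial: 30/100 deaths with treatment vs. 40/100 with placebo
est = risk_ratio(30, 100, 40, 100)
print(f"RR {est.rr:.2f} (95% CI {est.lower:.2f}-{est.upper:.2f})")
# -> RR 0.75 (95% CI 0.51-1.10)
```

A point estimate below 1 whose interval still crosses 1, as here, is the pattern that distinguishes a suggestive trial from a conclusive one.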

The controversy continued, however, with the publication of several large observational studies arguing that “real-life” use of DAA consistently shows a mortality benefit (see Fig. 2) [18–26]. This view was reinforced by a meta-analysis of observational studies [27•], which found that hospital mortality was reduced by 18 % with DAA use, an effect estimate similar to that observed in PROWESS. Metaregression suggested that higher control-arm mortality and more severe disease, as defined by the APACHE II score, were associated with DAA benefit. In contrast, a meta-analysis of randomized placebo-controlled trials showed no mortality benefit of DAA [28]. Thus, the following questions remain: Is DAA beneficial in patients with severe sepsis and septic shock? If so, why did PROWESS demonstrate a mortality benefit while later randomized placebo-controlled trials did not, and why do observational studies consistently identify a mortality benefit when randomized trials do not?
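Pooled estimates and metaregression of this kind rest on inverse-variance weighting of study-level log risk ratios. The Python sketch below shows a fixed-effect version of both steps using three hypothetical studies. It is not the method of the cited meta-analyses, which may have used random-effects models that add a between-study variance term omitted here.

```python
import math
from dataclasses import dataclass

@dataclass
class Study:
    name: str
    log_rr: float        # natural log of the study's risk ratio
    se: float            # standard error of log_rr
    control_risk: float  # control-arm mortality (proportion), used as the covariate

def pooled_fixed_effect(studies, z=1.96):
    """Inverse-variance fixed-effect pooled RR with a Wald confidence interval."""
    weights = [1 / s.se ** 2 for s in studies]
    total = sum(weights)
    mean = sum(w * s.log_rr for w, s in zip(weights, studies)) / total
    se = math.sqrt(1 / total)
    return math.exp(mean), math.exp(mean - z * se), math.exp(mean + z * se)

def metaregression_slope(studies):
    """Weighted least-squares slope of log RR on control-arm risk (inverse-variance weights)."""
    w = [1 / s.se ** 2 for s in studies]
    x = [s.control_risk for s in studies]
    y = [s.log_rr for s in studies]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

# Hypothetical studies: sicker control arms paired with larger apparent benefit
studies = [
    Study("A", math.log(0.80), 0.10, 0.50),
    Study("B", math.log(0.90), 0.15, 0.40),
    Study("C", math.log(1.00), 0.12, 0.25),
]
rr, lo, hi = pooled_fixed_effect(studies)
print(f"pooled RR {rr:.2f} ({lo:.2f}-{hi:.2f}); slope {metaregression_slope(studies):.2f}")
```

A negative slope means that studies with higher control-arm mortality show lower risk ratios, the same direction as the association reported in the observational meta-analysis.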

Fig. 2

Forest plot comparing the unadjusted effect of DAA versus control on the risk ratio (RR) for mortality (ICU, hospital, or 28-day, as specified) in all observational studies of severe sepsis and septic shock that included a comparison arm for DAA

From a strictly epidemiologic perspective, two major reasons are cited when observational studies disagree with randomized clinical trials. First, only randomization can balance unknown risk factors between treatment groups. This limitation persists even when observational studies are of high quality and known confounders are carefully controlled for with multiple methods yielding consistent results. In critical care, one such factor is confounding by indication, in which the patients thought most likely to benefit from an intervention are also the most likely to receive it. In this setting, indication bias may be impossible to detect, even after adjustment for the propensity to use the intervention. Thus, observational studies may not be able to determine whether an association with treatment reflects the effect of treatment or underlying, often unmeasured, patient characteristics that led clinicians to use the intervention.
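Confounding by indication can be made concrete with a small simulation. The Python sketch below assumes, hypothetically, a treatment with no effect on mortality and an unrecorded “frailty” trait that both raises the risk of death and makes clinicians less likely to treat. Comparing treated with untreated patients then yields a spurious protective risk ratio, whereas random assignment recovers a ratio near 1. All probabilities are illustrative and are not estimates from sepsis data.

```python
import random

def simulated_risk_ratio(n: int, randomized: bool, seed: int = 0) -> float:
    """Risk ratio of death (treated vs. untreated) for a treatment with NO true effect."""
    rng = random.Random(seed)
    deaths = {True: 0, False: 0}
    patients = {True: 0, False: 0}
    for _ in range(n):
        frail = rng.random() < 0.5              # unmeasured prognostic trait (hypothetical)
        if randomized:
            treated = rng.random() < 0.5        # assignment ignores frailty
        else:
            # clinicians preferentially treat patients who appear likely to survive
            treated = rng.random() < (0.2 if frail else 0.6)
        died = rng.random() < (0.5 if frail else 0.2)  # risk depends on frailty only
        patients[treated] += 1
        deaths[treated] += died
    risk_treated = deaths[True] / patients[True]
    risk_untreated = deaths[False] / patients[False]
    return risk_treated / risk_untreated

print(f"observational RR: {simulated_risk_ratio(200_000, randomized=False):.2f}")
print(f"randomized RR:    {simulated_risk_ratio(200_000, randomized=True):.2f}")
```

With these settings the expected observational ratio is 0.275 / 0.40 ≈ 0.69 despite zero treatment effect. Because frailty is never recorded, no adjustment on the recorded variables, including a propensity score built from them, could remove this bias; only if frailty were measured would stratifying on it do so.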

Studies of pulmonary artery catheters (PACs) serve as an instructive example. In the 1990s, PACs were in widespread use in critically ill patients despite a lack of randomized clinical trials demonstrating efficacy, owing to the belief that they were essential to the care of these patients. Connors et al. evaluated PAC use with high-quality data gathered in the observational SUPPORT study [29]. Both propensity-score matching and multivariable regression demonstrated an association between PAC use and an increased risk of death. This finding spurred five large multicenter randomized controlled trials to determine whether PACs increase mortality; none found evidence of harm [30–34]. Only in retrospect can we conclude that the findings of Connors et al. were biased, likely because of confounding by indication. The physician judgment that led to PAC placement to guide therapy was not well captured by the standard variables collected in SUPPORT, and sicker patients were more likely to receive PACs. This propensity was not captured even by traditional measures of disease severity such as APACHE II scores, underscoring how difficult it is for observational studies or databases to capture this element. The “missed” effect size can be quite large. For example, a recent observational study of mechanically ventilated patients with H1N1 influenza found that prone positioning was associated with an odds ratio of 4.07 for hospital mortality, even after adjustment for severity of illness [35]. Yet a subsequent randomized controlled trial found that early prone positioning in ARDS led to a 16.8 % absolute reduction in 28-day mortality and a 17.4 % absolute reduction in 90-day mortality, again demonstrating the difficulty of disentangling the effect of treatment from underlying patient characteristics in observational studies.

In the case of DAA, if unmeasured confounding accounts for the divergent results of randomized clinical trials and observational studies, then subjects who received DAA in observational studies would have been more likely to survive, independent of any treatment effect, than those who did not. DAA has been estimated to add up to $16,000 to overall costs per patient treated [36], and one could argue that, given this high cost, physicians in clinical practice may have consciously or subconsciously selected subjects more likely to survive in order to justify an expensive intervention. One international multicenter observational study described patients who received DAA in clinical practice as younger and having fewer comorbidities [22], suggesting that such a selection process did indeed occur with real-life use of DAA.

However, unmeasured confounding would not explain why early randomized clinical trials of DAA showed a benefit while later ones did not. A second reason cited for discrepant results between randomized trials and observational studies is the limited generalizability of the populations enrolled in clinical trials. Mortality in the control arms of the clinical trials was substantially lower than that reported in the observational studies, averaging 23.4 % versus 48.2 % (see Fig. 1 vs. Fig. 2). Although the observational studies reported different types of mortality (ICU, hospital, or 28-day), it is difficult to imagine that this heterogeneity in reporting accounts for a doubling of mortality relative to the clinical trials. One potential explanation for the large difference in baseline mortality is the change in contemporary sepsis care. In 2001, the same year that PROWESS was published, Rivers et al. demonstrated that early goal-directed therapy decreased hospital mortality from 46.5 % to 30.5 % in patients with severe sepsis or septic shock [37]. Relevant to the question of DAA, subjects receiving early goal-directed therapy had lower prothrombin times, D-dimer levels, and fibrin split-product concentrations in the 6- to 72-h interval following the early resuscitation intervention. This raises the question of whether favorable modulation of the coagulation system by early resuscitation may have reduced or eliminated the potential benefit of DAA. Through the efforts of the Surviving Sepsis Campaign, early identification of severe sepsis, early volume resuscitation, and early appropriate antibiotic therapy have been gradually adopted over the years [5, 6].
An observational study recently demonstrated that the odds of death decreased by 4 % per year from 1997 to 2007, accompanied by a decrease in the time to administration of appropriate antibiotics [26]. This study also found that, although DAA was associated with a 6.1 % absolute reduction in 30-day mortality in patients with septic shock, the mortality benefit was confined to the subgroup who received delayed antibiotics. When DAA-treated patients were compared with propensity-matched controls, no effect of DAA was seen in subjects who received antibiotics within 6 h of shock onset.
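A 4 % annual decline in the odds of death compounds multiplicatively over time. The Python sketch below assumes that the decline corresponds to a constant odds ratio of 0.96 per calendar year over the ten yearly steps from 1997 to 2007, and converts the result back to a risk; the 50 % starting mortality is a hypothetical value chosen only for illustration.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1 - p)

def probability(o: float) -> float:
    """Convert odds back to a probability."""
    return o / (1 + o)

def projected_risk(baseline_risk: float, annual_odds_ratio: float, years: int) -> float:
    """Apply a constant per-year odds ratio to a baseline risk and return the new risk."""
    return probability(odds(baseline_risk) * annual_odds_ratio ** years)

ANNUAL_OR = 0.96  # assumed reading of "odds of death decreased by 4 % per year"
YEARS = 10        # 1997 to 2007
print(f"cumulative odds ratio: {ANNUAL_OR ** YEARS:.3f}")                         # -> 0.665
print(f"risk after {YEARS} years: {projected_risk(0.50, ANNUAL_OR, YEARS):.3f}")  # -> 0.399
```

Under these assumptions, a hypothetical 50 % mortality in 1997 would fall to about 40 % by 2007, a shift in baseline risk large enough to matter when comparing control arms from trials conducted in different eras.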

Further support for the idea that early fluid resuscitation and appropriate antibiotic administration may have modulated the potential benefit of DAA comes from examining these measures in PROWESS-SHOCK and APROCCHSS, the only two clinical trials of DAA that have reported the timeliness and appropriateness of antibiotic administration and volume resuscitation. An important inclusion criterion of PROWESS-SHOCK was the administration of at least 30 ml/kg of intravenous fluid early in the course. Furthermore, patients in PROWESS-SHOCK had been receiving antibiotics for a median of 2.5 h before shock onset, initial antibiotics were subsequently judged appropriate in 84 % of patients, and source control was judged adequate in 90 % of those who needed it. These measures may account for the lower-than-expected mortality in the PROWESS-SHOCK control arm compared with PROWESS (24.2 % vs. 30.3 %). Although control-arm mortality in APROCCHSS (34.5 %) was comparable to that in PROWESS, APROCCHSS, like PROWESS-SHOCK, required vasopressor-dependent shock despite adequate volume resuscitation for inclusion; subjects received, on average, 1,626 ml of crystalloid and/or 909 ml of colloid, and the median time from onset of infection to initial antibiotics was reported as 0 (mean 1 ± 7) h. Therefore, both PROWESS-SHOCK and APROCCHSS were performed in subjects who received early goal-directed therapy and antibiotics. Early goal-directed therapy in these clinical trials, which was likely absent or only variably implemented in the patient care captured by the observational studies (based on their study periods; see Fig. 2), may have reduced or possibly eliminated the potential benefit of DAA.

One additional factor that could explain the striking difference in mortality between randomized trials and observational studies of DAA is simply the population studied. Was there, in fact, a responsive subset receiving DAA in clinical practice that was systematically excluded from clinical trials? All clinical trials of DAA excluded subjects with significant liver disease, dialysis-dependent renal failure, or recent use of high-dose aspirin, glycoprotein IIb/IIIa antagonists, warfarin, or nonprophylactic doses of heparin, but in clinical practice these were relative contraindications left to the discretion of the treating physician. Off-label use of DAA exceeded 10 % in one observational study [19], although it most frequently involved delayed initiation of DAA. Use of DAA in patients with coagulopathy was associated with increased mortality in a Veterans Administration study [38], although that study was small. Patients with significant coagulopathy were, however, also excluded from PROWESS, and yet PROWESS showed a mortality benefit with DAA.

It is also important to note that both PROWESS-SHOCK and APROCCHSS focused on vasopressor-dependent shock, in part due to the observation that this subgroup appeared to be particularly responsive to DAA in PROWESS [7]. While this clinical phenotype of persistent septic shock was thought to be the “optimal” clinical phenotype in which to test the efficacy of DAA 12 years ago, it is not clear that patients with persistent septic shock, especially after early resuscitation, uniformly experience dysregulation of coagulation or inflammation. Consider that only 40 % of patients in PROWESS-SHOCK had protein C activity less than 40 %, even though the median norepinephrine dose was 21–24 mcg/min approximately 17 h after shock onset and after nearly 4 L of fluid was administered. While it remains unclear whether low protein C activity is a good biomarker to identify patients likely to benefit from DAA (since a predefined subgroup analysis in PROWESS-SHOCK did not show evidence of benefit in those with protein C deficiency), this example shows the disconnect between protein C activity as a potential biomarker and the septic shock phenotype targeted in PROWESS-SHOCK [39].

It is possible, then, that the wrong subgroup of patients was selected for the later, definitive trials of DAA efficacy. In retrospect, the regulatory requirement to study less severely ill subjects in ADDRESS, a subgroup with a very tenuous link to DAA responsiveness, was probably a mistake. A confirmatory trial, perhaps combined with a biomarker discovery aim, would have been a more sensible approach. Potential biomarkers of DAA responsiveness that could have been tested include elevated plasma levels of D-dimer or IL-6 [7], or microvascular alterations visualized by orthogonal polarization spectral imaging [40]. Instead, the development and regulatory pathway tilted toward testing in subpopulations defined by clinical scores (APACHE and Sequential Organ Failure Assessment scores) or by catecholamine use for shock. As noted above, these phenotypes are not necessarily linked to the pathophysiologic processes expected to predict DAA responsiveness. However, given the regulatory climate of the time, this developmental path was probably necessary to define the indicated population. Future drug development for patients with severe sepsis should be more closely tied to subpopulations with the specific pathophysiologic alterations targeted by the new therapies, and such targeting will almost certainly require biomarkers.

In conclusion, we return to the past. It has long been recognized that clinical phenotypes of sepsis, including the “systemic inflammatory response syndrome,” have substantial limitations and are not strongly linked to underlying pathophysiologic pathways or alterations. Twenty-six years ago, after yet another failed sepsis trial, Roger Bone invoked Don Quixote, the troubled hero of Cervantes’s 17th-century novel, who returned to his senses and his home after attacking windmills that he imagined were giants [41]. Dr. Bone lamented, “In sepsis, we have had a similar quixotic adventure with a simplistic and elementary understanding of the pathogenesis of sepsis. We tilted at windmills by trying to block ‘evil humors’ with magic potions. . . . We are now searching for the imaginary ‘magic bullet’ without even a semblance of a homogeneous patient population under the categoric definition ‘sepsis syndrome.’ I hope we will return home to our senses as did Don Quixote.” We hope so too.