1 Introduction

Over the last 25 years, oncologists have witnessed an unprecedented development of new drugs with different characteristics. In the 1990s, researchers faced new challenges in the development of molecular targeted agents (MTAs), as some assumptions entrenched in cytotoxic drug development required review [1]. First was the concept of biologically effective doses for demonstrating the proof-of-mechanism of certain MTAs [2]. Second, there was pressure to define new response criteria. Most MTAs act by modulating intracellular signaling pathways, so both the modified Response Evaluation Criteria in Solid Tumors (mRECIST) [3] and the Choi criteria [4] emerged as suitable tools for assessing modifications in tumor biology. Also, the appearance of new and chronic adverse effects not accurately graded in the Common Terminology Criteria for Adverse Events (CTCAE) grading system [5] highlighted the need to develop new scores.

But if MTAs revolutionized early drug development (EDD) and could be considered the second paradigm in cancer treatment (chemotherapy being the first), new immunotherapeutic strategies represent the third paradigm. Cancer immunotherapy is not new [6]. Nevertheless, immunotherapy did not reach its current relevance until 2010, when the Food and Drug Administration (FDA) approved the immunotherapeutic agent sipuleucel-T for metastatic castration-resistant prostate cancer (mCRPC) [7] and the immune-checkpoint inhibitor ipilimumab for advanced melanoma [8]. Since then, we have seen unstoppable progress in this third paradigm [9]. The overwhelming success seen with immune-checkpoint inhibitors in several malignancies that until now carried a dismal prognosis [8, 10–16] and the high cure rates achieved with adoptive T-cell therapies in some hematological conditions [17, 18] have led researchers to consider immunotherapy a breakthrough in cancer treatment. In addition, immunotherapy could shift the treatment paradigm of orphan diseases in which active therapeutic options are scarce.

Immuno-oncology (IO) can be considered a new paradigm since major differences are obvious when compared to cytotoxics and MTAs. Table 3 depicts the main differences across the three paradigms in EDD.

Table 1 Differences in tumor assessment criteria in each of the three paradigms
Table 2 Common toxicity profiles across immunotherapy approaches
Table 3 The three paradigms in early drug development

Herein, we review the controversial issues that shape this new paradigm in cancer treatment.

2 Accuracy of Preclinical Data to Predict Toxicity and Efficacy in Humans

One of the major issues that immunotherapy development has faced is the imprecision of toxicological assumptions extrapolated from animal models. Clinical experience in IO shows that preclinical studies do not accurately predict adverse effects in humans. For instance, certain organ-specific toxicities might be overestimated in animals, making them less reliable predictors of human toxicities [19]. Conversely, preclinical studies may underestimate actual toxicity in humans. As a notorious example, in 2006 six healthy volunteers participating in the first-in-human (FIH) study of TGN1412 developed life-threatening cytokine release syndrome (CRS) after receiving a single dose of this CD28 super-agonist monoclonal antibody, reflecting the unpredictability of immune modulation with current preclinical models [20, 21]. Lessons learned from this incident highlighted that animal models can fail to predict human toxicities because of the particularities of the immune system (IS) of each species [22].

Along these lines, it has been postulated that differences in immune cell migration in response to tissue inflammatory signals, in T-cell recognition molecules and their regulatory signals, and even in cross-linking of surface antigens, among other causes, could have a major role in determining the efficacy patterns and safety profiles of immunotherapeutic agents [21]. However, most trials testing new immunotherapies have enrolled patients irrespective of their human leukocyte antigen (HLA) subtype. Wolchok et al. [23] performed a retrospective analysis of pooled data from four phase II trials testing ipilimumab, stratified by HLA-A*0201 status. The analysis concluded that HLA status had no statistically significant correlation with either toxicity or efficacy. More recent data with other checkpoint inhibitors in non-small cell lung cancer (NSCLC) also support this observation (L Mezquita, et al. Abstract 1223P. ESMO Meeting 2016). However, other reports suggest that specific HLA subtypes could be related to the risk of developing autoimmune toxicities with immunotherapy [24]. Also, for other immunotherapeutic approaches such as vaccines, chimeric antigen receptor (CAR) T-cells or other cellular therapies, the role of HLA in identifying epitopes whose expression is restricted to pathological tissues will be crucial to ensure on-target specificity whilst avoiding toxicities [25].

3 Selection of Patient Populations

It seems crucial to accurately define patient populations most likely to benefit from immunotherapy. Inclusion/exclusion criteria based on a strong biological rationale and pre-selection of candidates based on predictive immune-biomarkers are outstanding issues that merit further research.

Historically, most immunotherapy studies have excluded concomitant steroid treatment on the assumption that steroids may cause immune modulation that diminishes the efficacy of immunotherapy, although there are no conclusive data supporting this [26]. A similar assumption applies to the combination of immunotherapy and cytotoxic drugs, but some recent data suggest that this approach is effective and safe [27]. Also controversial is the potential triggering of an underlying autoimmune disease [28, 29] or the re-activation of attenuated microorganisms in patients who recently received vaccinations. The main limitation in clarifying these questions is that these sub-populations have generally been excluded from immunotherapy trials, and not enough experience “outside” clinical studies has been reported yet. In addition, the influence of previous treatments still needs to be characterized. Gerlinger et al. showed that metastatic clear cell renal carcinomas (mRCC) pre-treated with everolimus present significantly lower T-cell infiltrates and intra-tumoral heterogeneity of T-cell clones, which can lead to reduced efficacy of immunotherapy [30]. Emerging data have also suggested an increased risk of unexpected toxicity during a new line of treatment following previous immunotherapy, probably because the biological effect of the immunotherapy is still present long after the last dose of treatment [31].

Different immunotherapy strategies have been shown to enhance anti-cancer activity, but little is yet known about predictive biomarkers of response to any of these approaches. Consequently, in parallel with the development of IO drugs, there is an intense research program on biomarkers. For instance, important efforts have been made to evaluate the clinical role of programmed death-ligand 1 (PDL1) expression. Development of anti-PDL1/PD1 antibodies has been accompanied by detailed analysis of PDL1 expression in human tumors and immune cells, which has raised unanswered questions. First, it is unclear whether PDL1 expression is better assessed in tumor cells (TCs) or in tumor-infiltrating lymphocytes (TILs) in terms of efficacy prediction. Emerging data suggest that anti-PDL1 therapy is more effective in patients whose TILs express high levels of PDL1 [32], whereas the degree of PDL1 expression in TCs seems more closely correlated with response to anti-PD1 blockade [33]. Interestingly, responses to anti-PDL1/PD1 drugs have been reported among patients with PDL1-negative tumors such as bladder cancer [34] or NSCLC (F Barlesi, et al. Abstract LBA44_PR. ESMO Meeting 2016). On the other hand, data indicate that patients with higher PDL1 expression are more likely to achieve higher response rates (RR), also in bladder cancer [34] and NSCLC [15, 16]. Dynamic changes in PDL1 expression over time, intra-tumoral heterogeneity, changes in epitope conservation, and lack of standardized thresholds among screening tests could account for some of these apparently counterintuitive responses [35]. Also, previous therapies may affect PDL1 expression [36]. Moreover, preclinical data suggest that lack of TILs may predict failure of anti-PD1 therapy [37]. Pre-existing CD8+ TILs located at the invasive tumor margin may in part induce PDL1 expression in TCs, favouring response to immunotherapy. Among responders to PD1 blockade, post-treatment samples have shown proliferation of TILs, which correlates with radiologic responses [38]. Consequently, the combined presence or absence of TILs and PDL1 expression has been proposed as a tool to classify tumor microenvironments [39].

Recent advances in technology have revealed differential patterns of genomic alterations across different tumor types [40]. Although the relationship between genomic landscapes and clinical benefit with immunotherapy is still not completely understood, there is increasing evidence that the higher the mutational load of a tumor, the greater the benefit achieved from immune-checkpoint blockade [41]. Furthermore, identification of tumor antigen signatures present in certain responders to immunotherapy, together with tumor-microenvironment characteristics, widens the horizon for considering that genomics can help to predict the efficacy of immunotherapy [42–45]. In addition, intra-tumor heterogeneity might also have implications for response to checkpoint inhibitors [46]. Figure 1 shows strengths and limitations of three models of potential predictive immune-biomarkers.

Fig. 1 Strengths and weaknesses of potential predictive immune-biomarkers under development

It is even more controversial whether these potential biomarkers may have a predictive value with other immunotherapeutic agents apart from immune-checkpoint inhibitors. Identification of epigenetic modifications in genes involved in immune response, evaluation of T- and B-cell repertoire and cytokine profiling following the administration of immunotherapies, among others, will probably shed some light into this field [47]. Figure 2 depicts some potential biomarkers for personalizing immune strategies.

Fig. 2 Potential biomarkers for personalizing immune strategies. Abbreviations: pMHC, peptide major histocompatibility complex; TLR, Toll-like receptor; IFNα, interferon alfa; TNFα, tumor necrosis factor alfa; VEGF, vascular endothelial growth factor; CRP, C-reactive protein; TILs, tumor-infiltrating lymphocytes; Tregs, regulatory T lymphocytes; Teff/Treg ratio, ratio of effector T lymphocytes to regulatory T lymphocytes; sCD25, soluble CD25; IFNγ, interferon gamma; CTLA-4, cytotoxic T-lymphocyte-associated protein-4; PDL1, programmed death ligand 1; PDL2, programmed death ligand 2; LAG-3, lymphocyte-activation gene 3; TIM-3, T-cell immunoglobulin and mucin domain containing-3 protein; BTLA, B- and T-lymphocyte attenuator protein; sLDH, soluble lactate dehydrogenase; IDO, indoleamine-2,3-dioxygenase enzyme; TGFβ, transforming growth factor beta

4 Definition of Suitable Endpoints

Endpoints that are suitable for cytotoxics or MTAs have not yet been shown to assess trial objectives optimally with the new generation of immunotherapy treatments [48]. The novel mechanism of action (MoA) of these agents confers a series of specific characteristics on these drugs. Careful selection of endpoints that truly reflect the complex interactions between host IS and tumor is needed in order to identify effective new compounds or futile agents early.

Classically, efficacy of cytotoxics has been determined by growth or shrinkage of predefined lesions in serial tumor imaging. World Health Organization (WHO) criteria [49], RECIST [50, 51], or Response Assessment in Neuro-Oncology (RANO) criteria [52] have been used for years to assess tumor response. New MTAs led to the development of new imaging methods of response evaluation, such as mRECIST [3], Choi criteria [4], or Positron Emission Tomography (PET) Response Criteria in Solid Tumors (PERCIST) [53]. These guidelines incorporated not only static tumor measurements, but also functional assessments to better characterize the underlying biology behind tumor response.

While the most appropriate method to evaluate response to MTAs is still not fully established [54], the novel immunotherapy agents have added another layer of complexity to the field. At least four distinct patterns of immune response have been described: shrinkage in baseline lesions without appearance of new lesions; durable stable disease followed by response; shrinkage after an initial increase in total tumor burden; and response in spite of the presence of new lesions [55]. These patterns of response, initially described by Wolchok et al. [55], reflect complex biological processes in which both the host IS and the tumor are involved. For instance, the situation in which patients experience initial tumor progression or stabilization followed by tumor response might be due to the time the IS requires for expansion of T-cells before tumor infiltration [48]. On the other hand, patients whose tumors initially grow or who develop new lesions, followed by a decrease in tumor burden (pseudo-progression), might be experiencing infiltration by TILs preceding a late radiological response [56, 57]. Therefore, it is important to develop reliable radiological criteria able to determine efficacy accurately [58]. Early identification of non-responders is relevant to avoid unnecessary toxicity caused by immunotherapy, but it is equally important to recognize which patients are receiving an efficacious immunotherapy and, therefore, should not be switched to another drug even though the radiological response is not so evident [59]. The immune-related response criteria (irRC) try to address these issues [55]. However, irRC emerged in parallel with the development of anti-CTLA4 drugs, and extrapolation of these criteria to other immunotherapies should be done with caution. Other criteria such as unidimensional irRC or immune-related RECIST (irRECIST) have also been proposed [60, 61]. Table 1 summarizes differences in radiological criteria used for tumor assessment in each of the three paradigms.

Whichever guidelines are used, they should be a reliable tool to reflect the dynamic changes of the host IS and their effect on tumor cells. Some collaborative groups, such as the Cancer Vaccine Consortium and the International Society for Biological Therapy of Cancer, have made a series of recommendations in order to achieve this goal in forthcoming clinical trials: long-term clinical improvements and/or response after progressive disease (PD) can happen with immunotherapies; benefit/risk ratio has to be carefully considered before discontinuation; PD has to be confirmed; and durable stable disease (SD) may represent true benefit [62]. Other authors such as Ribas et al. [59] have proposed prospectively establishing radiologic landmark analyses at pre-specified delayed time-points, which would be a method to assess response caused by delayed expansion of anti-tumor T-cells [63]. However, this could be a problem when treating patients with rapidly growing disease, in whom an early confirmation of drug activity is needed.

Although the real incidence of pseudo-progression remains unclear, it represents a clinical challenge. If not correctly identified, it can mislead the physician’s decision making and, therefore, influence the clinical development of a drug. To differentiate between pseudo-progression and true progression, the irRC propose a confirmation by a consecutive radiological assessment no less than 4 weeks after progression was first documented [55]. Importantly, the clinical status of the patient should also be taken into account. If there is progression by imaging but the patient has a good performance status and/or has symptomatically improved, pseudo-progression has to be considered. On the contrary, if the patient presents clinical deterioration and there is clinical concern that 4 weeks is too long to repeat imaging, then it is probable that it is true progression. If pseudo-progression is suspected, a sensible approach is to perform a biopsy to assess the degree of TILs infiltration [37, 38, 64]. However, this still is an investigational approach and further investigation is needed.
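The confirmation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the irRC publication: the function name, the date-based interface, and the reduction of each scan to a yes/no progression flag are hypothetical simplifications, and the clinical-status considerations discussed in the text are deliberately left out.

```python
from datetime import date, timedelta

# Minimum interval between the first documented progression and the
# confirmatory scan, as stated in the text (no less than 4 weeks).
MIN_CONFIRMATION_INTERVAL = timedelta(weeks=4)


def is_confirmed_progression(first_pd_date, followup_date, followup_shows_pd):
    """Return True only if a follow-up scan taken at least 4 weeks after
    progression was first documented still shows progression.

    Hypothetical helper: a scan taken too early, or a scan that no longer
    shows progression, leaves the progression unconfirmed.
    """
    if followup_date - first_pd_date < MIN_CONFIRMATION_INTERVAL:
        return False  # too early to confirm
    return followup_shows_pd


# Example: progression first seen on 1 March, repeat scan on 5 April.
print(is_confirmed_progression(date(2016, 3, 1), date(2016, 4, 5), True))
```

In practice the decision also depends on performance status and symptoms, so a rule of this kind could only ever be one input to the clinical judgment.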

Accurate survival assessment is another challenge in this new paradigm in EDD. Overall survival (OS), the strongest endpoint, is difficult to measure because it usually implies long-term follow-up and is influenced by subsequent lines of treatment [65]. In an attempt to overcome these obstacles, surrogate endpoints for OS are used, although they do not always correlate with OS [66]. This is a relevant issue with immunotherapy [7, 8, 10, 63, 67–72]. As an example, the randomized phase III study in mRCC that compared nivolumab with everolimus found clear superiority of the immunotherapeutic agent in terms of OS, but not in terms of progression-free survival (PFS) [63]. Although infrequent, the phenomenon of pseudo-progression might affect PFS assessment, making it an unreliable surrogate endpoint for OS. If tumor growth is wrongly classified as PD and the patient is withdrawn from the trial, PFS data will not truly reflect the clinical benefit of the immunotherapeutic drug.

Another outstanding challenge is comparison between survival curves. The most widely used method in oncology to determine whether an investigational agent is superior to another treatment is to compare Kaplan-Meier curves. However, this is not so readily applicable to immunotherapy. Usually, patients who respond to chemotherapy or MTAs do so early in the course of treatment. Since clinical benefit with immunotherapy can be delayed, early interpretation of survival curves might be difficult. For instance, in the IMPACT study comparing sipuleucel-T with placebo in mCRPC, separation of survival curves was only evident after approximately 6 months of treatment [7]. A similar effect is observed with ipilimumab in melanoma [8, 10] or with pembrolizumab in NSCLC [73]. This delay in the separation of survival curves can jeopardize the success of an immunotherapy since it reduces the statistical power to assess the difference between the curves [48]. Indeed, the phase III study of ipilimumab in mCRPC after progression on docetaxel was considered negative because it failed to reach the pre-determined primary endpoint of OS. However, when the survival curves are carefully assessed, a separation favouring ipilimumab after month 10 can be seen [74]. Design of new immunotherapy trials should consider the fact that hazard ratios between curves may not be constant over time, highlighting the need for a better statistical model to assess differences in survival [48]. If RR or early assessment of PFS are not reliable endpoints for determining whether an immunotherapeutic agent is effective, the pace of clinical development of these drugs will necessarily be slow. The current pipeline of early clinical trials with immunotherapy agents is very extensive, so a method to rapidly determine which drugs merit further development is needed.
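The effect of a non-constant hazard ratio on survival curves can be made concrete with a small numerical sketch. The Python snippet below assumes a deliberately simple piecewise-exponential model in which the treatment arm shares the control hazard until a delay and has a reduced hazard afterwards. The model, the function names, and all numerical values are illustrative assumptions, not data from any of the trials cited above.

```python
import math


def survival_control(t, hazard):
    """Exponential survival with a constant hazard (per month)."""
    return math.exp(-hazard * t)


def survival_delayed(t, hazard, delay, late_hazard_ratio):
    """Survival when the treatment arm has the control hazard until
    `delay` months and hazard * late_hazard_ratio afterwards."""
    if t <= delay:
        return math.exp(-hazard * t)
    cumulative = hazard * delay + hazard * late_hazard_ratio * (t - delay)
    return math.exp(-cumulative)


# Hypothetical illustration values, not taken from any trial.
HAZARD = 0.05    # monthly hazard in the control arm
DELAY = 6        # months before any treatment effect appears
LATE_HR = 0.6    # hazard ratio once the effect has started

for month in (3, 6, 12, 24):
    s_c = survival_control(month, HAZARD)
    s_t = survival_delayed(month, HAZARD, DELAY, LATE_HR)
    print(f"month {month:2d}: control {s_c:.3f}  treatment {s_t:.3f}")
```

Under these assumptions the two curves are identical up to the delay and only diverge afterwards, so an analysis that assumes a single constant hazard ratio averages over a period with no effect, which is one way to see the loss of power described in the text.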

5 Emerging Toxicity of Immunotherapeutic Agents

Apart from checkpoint inhibitors, a broad spectrum of immunotherapeutic agents with different MoA is currently under development. Early identification of new and unexpected toxicities and standardized management guidelines are needed to further develop these therapies.

A unique profile of toxicities termed immune-related adverse events (irAEs) has been described with anti-CTLA4, anti-PD1, and anti-PDL1 antibodies [75]. The most frequent irAEs are well characterized and guidelines for their management are widely available [76–83].

Toxicity data from other immunotherapies are still scarce, but new information is increasingly being reported. Infusion of ex vivo expanded tumor-reactive T-cells (adoptive T-cell therapy) has proven to be effective in certain cancers [84, 85]. Prior non-myeloablative chemotherapy is administered to enhance engraftment of transferred cells, followed by T-cell growth factors such as interleukin-2 (IL2) [86]. Although severe events are infrequent and usually manageable, sepsis secondary to immunosuppression caused by conditioning chemotherapy is the cause of the 1-2% rate of death observed with this treatment. Also important is CRS, which can occur shortly after administration of T-cells and can lead to multiorgan failure. In a recent study of CD19 CAR in leukemia, all enrolled patients had some degree of CRS [87]. Of note, interleukin-6 (IL6) has been identified as a key cytokine implicated in CAR T-cell-mediated CRS. Administration of the IL6 receptor-blocking antibody tocilizumab to patients showing early signs of CRS has been reported to achieve symptom control [88]. Autoimmunity can also be induced with adoptive T-cell therapy by targeting a normal self-protein [89, 90], which is a potentially life-threatening toxicity [25, 91]. Another immunotherapeutic approach with a specific toxicity profile is cancer vaccines. A recent review evaluated 239 clinical trials with nearly 5,000 patients who received a cancer vaccine. Remarkably, 162 grade 3, 4, and 5 AEs were attributed to vaccination. Of those toxicities, 60 were local reactions, 40 were constitutional symptoms, and five were related to adjuvants used to enhance vaccine anti-tumor effects. The remaining 62 systemic AEs were reported by investigators to be at least “possibly related” to vaccines [92]. Finally, it is also worth considering interleukins. Although they are not new agents, their side effects need to be well managed since many current immunotherapy strategies use interleukins as part of combination regimens. The toxicity profile of some of these molecules, such as interferon alpha (IFNα), includes fever or fatigue, and depressive symptoms have also been described [93]. Hematological toxicity with thrombocytopenia and leukopenia can occur in up to 10% of patients, as can hyper- or hypothyroidism [94]. High-dose IL2 is recommended to be administered in an inpatient setting by an experienced team with cardiac monitoring and hemodynamic support because it can lead to serious toxicities such as capillary leak syndrome [95].

Table 2 shows common toxicity profiles across different immunotherapy approaches.

6 Need for New Clinical Trial Designs

The frameworks in which cytotoxic agents and MTAs have been developed are not optimal for immunotherapeutic compounds. Appropriate early clinical trial designs that reflect their special features are, therefore, needed (Table 3).

For instance, most new immunotherapeutic approaches are being developed in combination regimens, but the optimal design for multi-agent early clinical trials remains an open question. Evidence suggests that using a checkpoint inhibitor as a backbone and adding another anti-cancer agent is the most sensible strategy [114]. For instance, combination of antiangiogenics with immunotherapy might achieve synergistic effects since the antiangiogenic agent modifies the tumor microenvironment, facilitating T-cell access to malignant cells and, therefore, enhancing their activity [114]. But should drugs be administered concomitantly or sequentially? Some trials of concomitant checkpoint inhibitors and MTAs have shown unacceptable toxicity profiles [115, 116]. Would sequential or intermittent administration of at least one of the compounds have resulted in more tolerable side effects? Lack of predictive preclinical data and absence of a deep biological understanding of some of the toxicities observed [116] limit our ability to answer that question. Is sequential therapy superior to concomitant therapy in terms of activity? Antigen release by chemotherapies that induce immunogenic tumor cell death, followed by checkpoint inhibitors, might be superior to concomitant schedules [117, 118]. Trials aiming to achieve an abscopal effect, a phenomenon in which local radiotherapy triggers tumor response at a distant site [119], merit special mention. In addition to case reports [120–122], some promising clinical trial data have been recently reported [123]. However, the schedule and total dose of radiotherapy, and the most appropriate technique to use, remain to be elucidated [124–126].

Definition of dose-limiting toxicity (DLT) is another challenge. Classically, the appearance of severe toxicity within a defined period of time (the DLT period) is regarded as a DLT. This is crucial in EDD since the classification of a drug as “safe” or “unsafe” depends very much on the DLTs observed. For chemotherapeutic drugs and MTAs the DLT period is usually one or two cycles of treatment. But the definition of the DLT period for immunotherapeutic drugs is less clear because of their MoA. It is not rare for irAEs to appear relatively late in the course of immunotherapy treatment [83]. Therefore, some severe toxicities might occur after the DLT period has ended and never be counted as DLTs, leading to underestimation of the actual toxicity of the immunotherapeutic agent. Also, permanent conditions induced by immunotherapy such as endocrinopathies might have a detrimental effect on patients’ quality of life, but they are usually not recorded as DLTs. New AE criteria and new trial designs that capture long-term toxicities are needed to truly reflect the actual toxicity of immunotherapy.
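How a fixed DLT period can miss late-onset toxicity can be illustrated with a short Python sketch. Everything here is a hypothetical simplification for illustration: real DLT definitions are protocol-specific, and the `AdverseEvent` type, the grade 3 threshold, and the 28-day window are assumptions rather than values taken from the trials discussed.

```python
from dataclasses import dataclass


@dataclass
class AdverseEvent:
    onset_day: int  # days since first dose
    grade: int      # severity grade on a 1-5 scale


def split_by_dlt_window(events, dlt_window_days, min_grade=3):
    """Separate severe events into those inside the DLT window and
    those that start after it and would not be counted as DLTs.

    Hypothetical helper; `min_grade=3` is an assumed severity cutoff.
    """
    severe = [e for e in events if e.grade >= min_grade]
    counted = [e for e in severe if e.onset_day <= dlt_window_days]
    missed = [e for e in severe if e.onset_day > dlt_window_days]
    return counted, missed


# Made-up example: one early severe event, one late severe event,
# and one mild event that is ignored either way.
events = [AdverseEvent(10, 3), AdverseEvent(90, 4), AdverseEvent(5, 1)]
counted, missed = split_by_dlt_window(events, dlt_window_days=28)
print(len(counted), "severe event(s) inside the window,",
      len(missed), "after it")
```

The late severe event in this example never contributes to the DLT count, which mirrors the underestimation of toxicity described above.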

As with DLT, maximum tolerated dose (MTD) determination should also be adjusted. The MTD concept is based on the presumption, inherited from the chemotherapy and MTA paradigms, that efficacy and toxicity increase with dose. Consequently, the MTD and the recommended phase II dose (RP2D) represent a balance between maximum efficacy and acceptable toxicity. This seems to be true with ipilimumab [127], but the dose–response relationship has not proven to be directly proportional for all immunotherapeutic agents [72]. Also, dose and toxicity are not always consistently related either, as demonstrated in several trials in which the MTD was not reached [34, 128, 129]. Moreover, some data suggest that there might be a different toxicity pattern for the same dose of an immunotherapy depending on tumor type [130, 131]. This observation may be the consequence of different effects exerted by distinct histologies on the host IS.

In addition, the pharmacokinetic (PK) and pharmacodynamic (PhD) properties of the new immunotherapeutic drugs also differ significantly from those of chemotherapy or MTAs. The distinguishing feature is that immunotherapy acts on the IS of the cancer patient, whereas chemotherapy or MTAs directly attack tumor cells. So the concept that the higher the concentration of a drug in blood, the higher the efficacy because more drug reaches the tumor, is not applicable to immunotherapy. A relatively low dose of an immunotherapeutic agent can be sufficient to initiate IS activation against the tumor. The IS is able to maintain its activation by itself, so high doses or continuous schedules of an immunotherapeutic drug might not be necessary to achieve long-term anti-tumor response, and biological effects might last longer than PK/PhD analyses are able to reflect [132]. Moreover, this can also have implications for the late appearance of severe toxicities when subsequent drugs are administered. Long-lasting biological effects of immunotherapy might overlap with new lines of treatment, leading to toxicities as if it were a de facto combination regimen [31]. Optimal duration of treatment is also affected by this. With chemotherapy, the longer the patient is able to stay on treatment, the better in terms of efficacy. With MTAs this is not so evident, and intermittent schedules have been proposed. But this issue is even more challenging with immunotherapy. What is the optimal duration of treatment if the biological effects of immunotherapy are not reflected by PK/PhD data? Is it necessary to keep the patient on treatment as long as tumor growth is controlled? Is a predefined number of doses enough to achieve long-lasting responses? The standard schedule of ipilimumab is four doses, whereas other agents such as the anti-PD1 antibodies nivolumab or pembrolizumab are given until tumor progression or intolerance. Smarter new clinical trial designs able to answer all these outstanding questions, among others, need to be developed.

7 Conclusions

After chemotherapy and MTAs, we suggest that immunotherapy has become the third paradigm in cancer. The novel MoA of these new immunotherapeutic compounds entails a whole new series of features that are changing the way we develop anti-cancer drugs. Establishment of preclinical models that accurately predict toxicity and efficacy, selection of patient populations, discovery and validation of biomarkers, assessment of response, determination of suitable endpoints, management of toxicities, and development of new clinical trial designs are among the challenges and obstacles that need to be overcome.

Evidence supports combination regimens as the optimal strategy for further immunotherapy development. There is a solid biological background for combining immunotherapy drugs with MTAs, chemotherapy, radiotherapy, or other immunotherapeutic compounds with different MoA. Clinical results reported to date confirm the rationale behind this strategy [133, 134]. However, we should be cautious about new and unexpected toxicities that may hamper the success of these novel regimens, so adequate EDD is essential. A careful and thoughtful preclinical and clinical approach must be taken to identify early those immunotherapeutic treatments that are unacceptably toxic or insufficiently effective.

The third paradigm of immunotherapy in oncology is still an evolving field. Although important goals have been achieved so far, there are still many questions that remain unanswered. The final role of immunotherapy in cancer will be determined by our capacity to find the right answers to those questions in forthcoming years.