Introduction

High-dose melphalan followed by autologous stem cell rescue (ASCT) is the standard of care for younger patients with newly diagnosed multiple myeloma.1 However, the vast majority of patients relapse, and post-transplant strategies to delay or prevent relapse are needed. Post-transplant maintenance therapy has been explored with controversial results. Corticosteroid maintenance was found to prolong the duration of response, but the effect on overall survival (OS) was unclear.2, 3 Alfa-2b interferon (alfa2-IFN) has been used as maintenance therapy in a number of trials. In two meta-analyses, one of published results4 and the other based on individual patient data,5 alfa2-IFN resulted in a modest but statistically significant benefit in progression-free survival (PFS) of 4–6 months and in OS of 4–7 months. Despite these positive, although modest, results, the benefit of alfa2-IFN maintenance was not considered substantial and this drug has not remained a standard maintenance therapy. The immunomodulatory drug thalidomide has been investigated in six prospective randomized trials6, 7, 8, 9, 10, 11 showing an improvement in PFS in all of them and an OS benefit in three.6, 8, 12 However, the benefit in specific populations such as those with high-risk cytogenetics remains controversial.10, 12 The proteasome inhibitor bortezomib has been tested in the post-transplant maintenance phase only in the HOVON trial.13 In this study, bortezomib was given in both the induction and the maintenance phase, and was compared with a control arm consisting of induction with vincristine, doxorubicin and dexamethasone, followed by maintenance with thalidomide. Because bortezomib was administered not only as maintenance but also during the induction phase and because no random assignment for maintenance therapy was performed, the impact of bortezomib as maintenance was not independently assessed.
Lenalidomide, a more potent and less toxic immunomodulatory drug, has been investigated in three randomized trials14, 15, 16 showing a highly significant prolongation of PFS, but the improvement in survival remains controversial. Moreover, an increased risk of second primary malignancies was of concern with lenalidomide maintenance.

The Spanish Myeloma Group (PETHEMA/GEM) conducted a randomized phase III trial comparing induction with thalidomide/dexamethasone (TD) vs bortezomib/thalidomide/dexamethasone (VTD) vs vincristine, BCNU, melphalan, cyclophosphamide, prednisone/vincristine, BCNU, doxorubicin, dexamethasone/bortezomib (VBMCP/VBAD/B) in patients 65 years old or younger with newly diagnosed symptomatic multiple myeloma (MM) and autologous stem cell transplantation (ASCT) with MEL-200, followed by maintenance with thalidomide/bortezomib (TV) vs thalidomide (T) vs alfa2-IFN. The induction part of the study was published17 and the maintenance results are reported here.

Patients and methods

Eligibility

Patients with newly diagnosed and untreated symptomatic MM who were 65 years or younger with measurable serum and/or urine M-protein were eligible for entering the study. The main inclusion criteria required performance status <3, hemoglobin level ⩾8 g/dl, neutrophil count ⩾1 × 10⁹/l, platelet count ⩾50 × 10⁹/l, liver enzymes <100 IU/l, serum bilirubin <1.5 mg/dl, serum calcium <14 mg/dl and serum creatinine ⩽2 mg/dl. The main exclusion criteria were peripheral neuropathy grade ⩾2, systemic amyloidosis, and a positive serology for HIV or hepatitis B or C. The study was approved by the Spanish National Health Service and by all the local institutional ethics committees, and was conducted in accordance with the principles of the Declaration of Helsinki. All patients gave written informed consent.

Study design and end points

Patients were centrally randomized to receive VBMCP/VBAD/B versus TD versus VTD. Combination chemotherapy with VBMCP/VBAD plus bortezomib consisted of a total of four cycles of alternating VBMCP and VBAD at doses and schedules as previously described,18 followed by two cycles of intravenous bortezomib (1.3 mg/m2 on days 1, 4, 8 and 11 at 3-week intervals). TD consisted of thalidomide 200 mg daily (escalating doses in the first cycle: 50 mg on days 1–14 and 100 mg on days 15–28), and dexamethasone 40 mg orally on days 1–4 and 9–12 at 4-week intervals for six cycles. The VTD arm was identical to TD plus intravenous bortezomib 1.3 mg/m2 on days 1, 4, 8 and 11 of each cycle. The duration of the induction therapy was 24 weeks in all arms. All patients were scheduled to undergo ASCT with high-dose melphalan at 200 mg/m2 followed by stem cell support. Three months after ASCT, patients with at least stable disease were randomized to receive maintenance therapy with alfa2-IFN (starting dose of 1.5 MU subcutaneously three times per week, which could be increased to 3 MU at the investigator's discretion depending on tolerance) versus thalidomide 100 mg per day orally (T) versus thalidomide 100 mg per day orally plus one cycle of intravenous bortezomib on days 1, 4, 8 and 11 every 3 months (TV). Maintenance therapy was planned for 3 years. Treatment was discontinued in case of disease progression, undue toxicity/adverse events, or consent withdrawal. The primary end point was PFS, and the secondary end points were increase in response rate, OS and safety.

Dose adjustment in case of peripheral neuropathy

Patients receiving thalidomide or thalidomide plus bortezomib maintenance were carefully evaluated for neurological toxicity. Thalidomide was reduced to 50 mg per day, and bortezomib to 1 mg/m2 or 0.7 mg/m2 in case of grade 2–3 peripheral neuropathy (PN) and/or neuropathic pain. In case of grade 4 PN, the drug was permanently discontinued. Given that the neurological toxicity of bortezomib is characterized by PN plus neuropathic pain, while that of thalidomide is characterized by PN without pain, the dose modifications in the TV arm were done as follows: in patients with neuropathic pain only, the dose of bortezomib was modified; in patients with neuropathic pain plus PN, the doses of both bortezomib and thalidomide were modified.
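
As an illustration only, the TV-arm rules described above can be written as a small decision function. This is a minimal sketch, not the protocol's own algorithm: the names `Adjustment` and `tv_arm_adjustment` are hypothetical, and two branches are assumptions that the text does not spell out (PN without pain is taken to modify thalidomide only, and grade 4 PN is flagged as permanent discontinuation without specifying which drug is stopped).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjustment:
    """Hypothetical container for the dose action in the TV arm."""
    modify_bortezomib: bool = False
    modify_thalidomide: bool = False
    discontinue_permanently: bool = False


def tv_arm_adjustment(pn_grade: int, neuropathic_pain: bool) -> Adjustment:
    """Map PN grade (0-4) and presence of neuropathic pain to a dose action."""
    if not 0 <= pn_grade <= 4:
        raise ValueError("pn_grade must be between 0 and 4")
    if pn_grade == 4:
        # Text: grade 4 PN leads to permanent discontinuation.
        # ASSUMPTION: which drug is stopped in the TV arm is not specified.
        return Adjustment(discontinue_permanently=True)
    has_pn = pn_grade in (2, 3)
    if neuropathic_pain and has_pn:
        # Text: pain plus PN, both drugs are modified.
        return Adjustment(modify_bortezomib=True, modify_thalidomide=True)
    if neuropathic_pain:
        # Text: pain only, bortezomib is modified.
        return Adjustment(modify_bortezomib=True)
    if has_pn:
        # ASSUMPTION: PN without pain is attributed to thalidomide.
        return Adjustment(modify_thalidomide=True)
    return Adjustment()


print(tv_arm_adjustment(pn_grade=2, neuropathic_pain=True))
```

The reduced dose levels themselves (thalidomide 50 mg per day; bortezomib 1 mg/m2 or 0.7 mg/m2) are left out of the function because the text does not state when each bortezomib level applies.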

Response assessment

Response and progression were assessed according to the criteria of the European Group for Blood and Marrow Transplantation.19 The very good partial response (VGPR) category, as defined in the uniform response criteria for MM, was also assessed.20 In short, complete response (CR) required the disappearance of the original myeloma protein in serum and urine immunofixation, and <5% bone marrow plasma cells. Partial response (PR) required a serum M-protein decrease of 50% or more, and a urine M-protein decrease of 90% or more and/or to <200 mg per 24 h, as well as a reduction of 50% or more in the cross-sectional areas of extraosseous plasmacytomas. VGPR required a decrease of the serum M-protein ⩾90% and a 24 h urine M-protein excretion <100 mg. Progressive disease required a serum M-protein increase of ⩾25% with an absolute increase of at least 5 g/l, and/or a urine M-protein increase of ⩾25% with an absolute increase of at least 200 mg per 24 h. Relapse from CR required the reappearance of serum or urine M-protein by immunofixation or electrophoresis, development of ⩾5% plasma cells in bone marrow, increase in skeletal involvement, hypercalcemia or development of extramedullary plasmacytomas, or appearance of any other sign of progression. Responses reported by the investigators were centrally reassessed.
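
The thresholds summarized above can be sketched as two small functions. This is a simplified illustration under stated assumptions, not the full EBMT or uniform response criteria: the function and parameter names are hypothetical, plasmacytoma size, minimal response, stable disease and confirmation requirements are ignored, and the reference value for progression (for example the nadir) is an assumption because the text does not state it.

```python
def classify_response(serum_reduction_pct: float,
                      urine_reduction_pct: float,
                      urine_mg_24h: float,
                      immunofixation_negative: bool,
                      bm_plasma_cells_pct: float) -> str:
    """Return the best category met among CR, VGPR and PR (simplified)."""
    if immunofixation_negative and bm_plasma_cells_pct < 5:
        return "CR"
    if serum_reduction_pct >= 90 and urine_mg_24h < 100:
        return "VGPR"
    if serum_reduction_pct >= 50 and (urine_reduction_pct >= 90
                                      or urine_mg_24h < 200):
        return "PR"
    return "less than PR"


def is_progression(serum_ref_g_l: float, serum_now_g_l: float,
                   urine_ref_mg_24h: float, urine_now_mg_24h: float) -> bool:
    """Check the M-protein progression thresholds against a reference value.

    Each component needs both a relative increase of at least 25% and the
    absolute increase given in the text (5 g/l serum, 200 mg per 24 h urine).
    """
    serum = (serum_now_g_l - serum_ref_g_l >= 5
             and serum_now_g_l >= 1.25 * serum_ref_g_l)
    urine = (urine_now_mg_24h - urine_ref_mg_24h >= 200
             and urine_now_mg_24h >= 1.25 * urine_ref_mg_24h)
    return serum or urine


print(classify_response(92, 95, 80, False, 3))
print(is_progression(20.0, 26.0, 0.0, 0.0))
```

Writing the relative criterion as a comparison against 1.25 times the reference avoids a division by zero when the reference value is 0.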

Fluorescence in situ hybridization studies

Bone marrow plasma cells were isolated with anti-CD138-coated magnetic beads using the AutoMACs automated separation system (Miltenyi Biotec, Auburn, CA, USA). Interphase fluorescence in situ hybridization analysis was performed with specific probes (Abbott Molecular/Vysis, Des Plaines, IL, USA) for 13q and 17p deletions, and immunoglobulin heavy chain (IGH) translocations including t(11;14), t(4;14) and t(14;16) as previously described.21 All cytogenetic studies were centrally performed at the laboratories of Hospital Clínico of Salamanca and at the Hospital Doce de Octubre in Madrid.

Sample size and statistical analysis

The planned size of 390 patients was calculated for a two-sided α level of 0.05 and a statistical power of 80% to compare induction treatments.

We assumed that 75% (N=292) of the initial patients would reach the second randomization. A total of 220 events would be required to prove a twofold superiority in terms of hazard ratio of the experimental arm (TV) over both standard arms (T and alfa2-IFN), assuming a one-sided alpha of 0.1 and a power (1 − beta) of 80%. Consequently, a follow-up of 5 years was necessary to obtain events in 75% of randomized patients. Comparisons were undertaken in the intention-to-treat population.
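
The arithmetic linking these figures can be checked in a few lines. This is only a sketch of how the stated numbers relate to one another; it does not reproduce the underlying power calculation, and the variable names are illustrative.

```python
planned_patients = 390          # planned enrolment at first randomization
fraction_randomized = 0.75      # assumed to reach the second randomization
required_events = 220           # events stated as required in the text

# 390 * 0.75 = 292.5, reported in the text as N=292.
expected_randomized = int(planned_patients * fraction_randomized)

# Share of randomized patients who must have an event to reach 220 events.
event_fraction = required_events / expected_randomized

print(expected_randomized)        # 292
print(round(event_fraction, 3))   # 0.753
```

The result, about 75%, matches the statement that events were needed in 75% of randomized patients.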

PFS was calculated from the date of randomization to maintenance therapy to the date of relapse, progression or death from any cause. Patients who were removed from the study because of toxicity and received alternative therapy before progression were censored for PFS at the time when the alternative treatment was initiated. OS was calculated from randomization to maintenance therapy until the date of death or last follow-up visit. Survival curves were plotted according to the method of Kaplan and Meier22 and statistically compared by means of the log-rank test.23
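
The PFS censoring rule and the Kaplan-Meier product-limit estimate described above can be sketched as follows. This is a minimal pure-Python illustration with made-up toy data, not the trial data or the software used in the study; the helper names `pfs_observation`, `kaplan_meier` and `median_survival` are hypothetical, and the log-rank comparison is not shown.

```python
from collections import Counter


def pfs_observation(event_month, alt_therapy_month, last_followup_month):
    """Return (time, event_observed) in months from maintenance randomization.

    event_month: month of relapse, progression or death, or None.
    alt_therapy_month: month alternative therapy started, or None.
    """
    if alt_therapy_month is not None and (
            event_month is None or alt_therapy_month < event_month):
        # Censored when alternative treatment starts before progression.
        return alt_therapy_month, False
    if event_month is not None:
        return event_month, True
    return last_followup_month, False


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time."""
    at_risk = len(times)
    exits = Counter(times)
    failures = Counter(t for t, e in zip(times, events) if e)
    survival, curve = 1.0, []
    for t in sorted(exits):
        d = failures.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # events and censored patients leave the risk set
    return curve


def median_survival(curve):
    """First time at which estimated survival is 50% or lower, else None."""
    return next((t for t, s in curve if s <= 0.5), None)


# Toy data only: five hypothetical patients.
times = [6, 12, 12, 20, 30]
events = [True, False, True, True, False]
curve = kaplan_meier(times, events)
print(curve)
print(median_survival(curve))
```

Patients censored at the same time as an event are still counted as at risk for that event, which is the usual convention for tied times.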

Results

Baseline characteristics

Between 6 April 2006 and 5 August 2009, the 390 planned patients were included in the study. A total of 283 patients received the six planned induction cycles and the intensification with high-dose melphalan, followed by hematopoietic stem cell rescue (ASCT).17 Twelve patients were not randomized to maintenance therapy because of progressive disease (5 patients), toxicity (2), patient decision (2), death (2) and loss to follow-up (1). Thus, 271 patients were randomized to the maintenance phase and were evaluable for response. Ninety-two patients were allocated to alfa2-IFN, 88 to T and 91 to TV. The patient characteristics at baseline, including M-protein isotype, Durie and Salmon stage, International Staging System stage, induction regimen, proportion of patients with high-risk cytogenetics and response status at the time of randomization, were well balanced among the three groups (Table 1).

Table 1 Patient characteristics at baseline

Response upgrade during maintenance therapy

A response improvement during maintenance therapy was observed in 20% of the patients. The median time to response improvement was 5.7 months (0.9–24.9). The number of patients who improved the quality of response in the alfa2-IFN arm was 19 (11 patients in VGPR upgraded to CR, 6 patients in PR upgraded to CR and 2 patients from PR to VGPR) after a median of 2.3 months (0.9–10.2). The number of patients who upgraded the response in the T arm was 15 (8 patients in VGPR to CR, 3 patients in PR to CR and 4 from PR to VGPR) after a median time of 5.1 months (1.1–17.2). A total of 22 patients in the TV arm improved the quality of response (11 patients in VGPR upgraded to CR, 9 patients in PR to CR plus 1 to VGPR and 1 patient in minimal response (MR) to CR) with a median time of 6.3 months (0.9–24.9). Overall, the absolute change in the monoclonal component between pre- and post-maintenance was 4.6±0.5 g/l in serum and 0.05 g per 24 h in urine, with no significant difference across the three arms.

The CR rate increased from 51 to 68% with alfa2-IFN, from 49 to 60% with T and from 53 to 74% with TV maintenance. The CR rate attained with TV was significantly higher than that obtained with T (74% versus 60%, P=0.04), but not significantly different from that obtained with alfa2-IFN (74% versus 68%, P=0.5). There were no significant differences between the CR rates obtained with alfa2-IFN and T. The best response rate achieved with maintenance therapy is shown in Table 2.

Table 2 Response rate after maintenance therapy

PFS from randomization to maintenance

After a median follow-up of 58.6 months from the initiation of maintenance therapy, the PFS was significantly longer with TV as compared with the two other arms (50.6 versus 40.3 versus 32.5 months with TV, T and alfa2-IFN, respectively, P=0.03) (Figure 1).

Figure 1 Progression-free survival according to the maintenance arm.

The impact of maintenance according to the induction regimen was analyzed. In patients who received induction with VBMCP/VBAD/B, the PFS from maintenance with TV was 43.7 months versus 34.2 months with T maintenance and 22.2 months with alfa2-IFN maintenance (P=0.2). In patients who received induction with TD, the PFS from maintenance with TV was 48.1 months versus not reached with T and 31.4 months with alfa2-IFN (P=0.1). In the group of patients who received induction therapy with VTD, the PFS from maintenance was also longer with TV, although this difference did not reach statistical significance when compared with T and alfa2-IFN (62.4 months with TV versus 44.7 months with T versus 50.1 months with alfa2-IFN, P=0.4). Thus, the benefit of TV was observed across all induction subgroups, but the study was not powered to detect significant differences within each induction subgroup.

The quality of response had an impact on outcome. Patients who were in CR at any time during the maintenance phase had a significantly longer PFS (median 50 months) than patients achieving VGPR (38.7 months), PR (23.7 months) or MR (16.1 months) (P=0.006), and this translated into a significantly better 5-year OS (78% versus 65% versus 61% versus 68% in patients in CR, VGPR, PR or MR, respectively, P=0.02). Patients who improved the quality of response during the maintenance therapy had a better PFS (59.9 versus 38.9 months, P=0.03) and 5-year OS (88% versus 70%, P=0.002) compared with those patients in whom the response was not upgraded during the maintenance period.

There were no statistically significant differences in OS among the three maintenance regimens, with an estimated 5-year OS of 78%, 72% and 70% in the TV, T and alfa2-IFN arms, respectively (Figure 2).

Figure 2 Overall survival according to the maintenance arm.

Impact of cytogenetic abnormalities

Fluorescence in situ hybridization analyses were available in 230 (85%) of the 271 patients randomized to receive maintenance therapy. Thirty-five of these 230 patients (15%) had high-risk cytogenetics and were similarly distributed among the three maintenance arms: 14 patients were treated with alfa2-IFN, 9 with T and 12 with TV. Nineteen patients had t(4;14) (seven in the alfa2-IFN arm and six each in the T and TV arms), seven patients had t(14;16) (three in the alfa2-IFN arm and two each in the T and TV arms) and 11 had del(17p) (five each in the alfa2-IFN and TV arms, and one in the T arm).

In the alfa2-IFN arm, the CR increased from 50 to 78% in patients with high-risk cytogenetics and from 50 to 64% in patients with low-risk cytogenetics. In the T arm, the CR increased from 44 to 66% and from 53 to 64% in patients with high- and low-risk cytogenetics, respectively, while in the TV arm, the CR increased from 33 to 58% and from 52 to 76% in patients with high- and low-risk cytogenetics, respectively.

PFS from the initiation of maintenance was longer in patients with low-risk cytogenetics than in patients with high-risk cytogenetics (45.7 versus 28.8 months, P=0.1), with the difference likely not reaching statistical significance because of the low number of patients with high-risk cytogenetics. The OS rate at 5 years was significantly higher in patients with low-risk cytogenetics than in patients with high-risk cytogenetics (74% versus 56%, P=0.005). No firm conclusions can be drawn regarding high-risk cytogenetics, since the study was not powered for this purpose.

Toxicity

Regarding hematological toxicity, 10 patients developed grade 3–4 thrombocytopenia in the TV arm, while only 2 patients developed grade 3–4 thrombocytopenia with T (11% versus 2.2%, P=0.01) and 4 patients (4.4%) developed grade 3 thrombocytopenia with alfa2-IFN (11% versus 4.4%, P=0.08). The incidence of grade 3–4 neutropenia was similar in the three arms (17.7% versus 16% versus 13.3% in the alfa2-IFN, T and TV arms, respectively).

Concerning non-hematological toxicity, the incidence of grade 2–3 peripheral neuropathy was higher in the TV arm (48.8% with TV versus 34.4% with T and 1% with alfa2-IFN). Grade 3 peripheral neuropathy was observed in 15.5% and 13.7% of patients treated with TV and T maintenance, respectively, while 33.3 and 20.6% of the patients treated with TV and T developed grade 2 peripheral neuropathy. No patient had grade 4 peripheral neuropathy. Three patients in the TV arm reported grade 3 asthenia. Five patients developed grade 2–4 depression and four patients grade 2–4 arthralgia under alfa2-IFN maintenance. The most relevant toxicities are summarized in Table 3.

Table 3 Most relevant toxicities during maintenance

Dose reductions due to toxicity were needed in 30 patients (33.7%), 24 patients (27.5%) and 10 patients (11.1%) in the TV, T and alfa2-IFN arms, respectively. In the TV arm, 29 patients (32.5%) required thalidomide reduction, while 18 patients (20.2%) needed dose reduction of bortezomib. Discontinuation of the maintenance therapy due to toxicity was required in 21.9%, 39.7% and 20% of the patients in the TV, T and alfa2-IFN arms, respectively. Of interest, 14.6% of the patients in the TV arm discontinued thalidomide but remained on bortezomib maintenance until completion of the 3-year maintenance period. In addition, discontinuation due to progressive disease occurred in 21.9%, 26% and 39% of the patients in the TV, T and alfa2-IFN arms, respectively. Thus, the median duration of maintenance was 24.7 cycles (2.05 years) in the TV arm, 19.4 cycles (1.6 years) in the T arm and 18.7 cycles (1.55 years) in the alfa2-IFN arm. The percentage of delivered dose of T compared with the planned dose was 55% in the TV arm and 52% in the T arm, while the percentage of delivered dose of bortezomib in the TV arm was 70.8%.

Discussion

Maintenance therapy is being actively investigated in order to delay or even prevent relapse after ASCT in patients with MM. Among novel drugs, thalidomide and more recently lenalidomide have been used.6, 7, 8, 9, 10, 11, 12, 14, 15, 16 In contrast, the experience with bortezomib maintenance is limited.13 Our results showed that maintenance combining thalidomide with the proteasome inhibitor bortezomib (TV) resulted in a significant prolongation of PFS in comparison with T alone or IFN, with manageable toxicity. Of interest, the CR rate increased in all treatment arms, with a response improvement in about 20% of the patients, even in patients allocated to the IFN arm. This improvement in response rate with maintenance therapy has been reported in other studies in the ASCT and non-ASCT setting.6, 12, 13, 14, 24, 25 In previous maintenance trials with thalidomide, the CR rate was improved in about 17%.6 In the study by Barlogie et al.12 the cumulative frequency of CR was significantly higher in the thalidomide arm than in the non-thalidomide cohort (62% versus 43%). Of interest, this improvement in CR was mainly observed in the first year of maintenance, a finding also observed in our study, with a median time to response upgrade of 5.1 months (1.1–17.2). In a Spanish trial24 comparing maintenance therapy with VT versus VP in elderly patients, the CR rate increased from 24% at the end of induction to 42% at the end of maintenance. In the HOVON-65/GMMG-HD4 trial,13 a response upgrade during post-ASCT maintenance was observed in 24% of patients treated with thalidomide and 23% of patients treated with bortezomib.

Post-transplant maintenance therapy with lenalidomide showed a CR rate improvement from 8 to 24% in three different trials.14, 16, 25 According to these data, and as previously suggested by others,14 the benefit of maintenance therapy could be explained, at least in part, by the increase in the response rate rather than by a pure maintenance effect.

Deeper responses (that is, immunofixation negative CR with negative minimal residual disease) are associated with better survival outcomes.26 However, in the maintenance phase of the present trial, no minimal residual disease studies were planned.

Although it has been previously published that a delayed response after ASCT can occur in a significant proportion of patients in the absence of additional therapy,27 in our experience this phenomenon is exceedingly rare and the upgrade in response is almost exclusively observed in patients who receive maintenance therapy.28

Maintenance therapy with TV resulted in a longer PFS in comparison with T and IFN, including in patients who received induction therapy with VTD, although the difference did not reach statistical significance, probably because of the sample size. Patients who were given induction with VBMCP/VBAD/B and maintenance with TV had a PFS of 43.7 months, versus 48.1 months for patients receiving induction with TD and 62.4 months for patients receiving induction with VTD. This is in agreement with the HOVON-65/GMMG-HD4 trial,13 which showed that bortezomib during induction and maintenance resulted in a better response rate, quality of response, PFS and OS. TV has also been beneficial in elderly people.24 In this regard, maintenance therapy with TV in a similar scheme (that is, one conventional cycle of bortezomib 1.3 mg/m2 on days 1, 4, 8 and 11 every 3 months combined with a reduced dose of thalidomide, 50 mg) was superior to VP (bortezomib in the same schedule and prednisone 50 mg every other day). There is growing clinical evidence that proteasome inhibitors and immunomodulatory drugs have a synergistic effect, as shown by the CR rate increasing over time across the six cycles of VTD induction,17 and by the unprecedented efficacy in terms of CR rate and PFS of carfilzomib plus lenalidomide and dexamethasone in the relapse setting (ASPIRE trial).29 Thus, a combination of a proteasome inhibitor plus an immunomodulatory drug could result in an effective maintenance therapy. In this regard, a current Spanish trial is investigating the efficacy of lenalidomide/dexamethasone plus the oral proteasome inhibitor ixazomib versus lenalidomide and dexamethasone (no. EUDRA 2014-000554-10) in the post-ASCT setting.

Regarding hematological toxicity, thrombocytopenia was more frequent in the TV arm, while there were no significant differences in the incidence of neutropenia among the three arms. Concerning extrahematological toxicity, PN was the most relevant, with almost half of the patients developing grade 2–3 PN in the TV arm and one-third in the T arm. PN was mainly related to T, as demonstrated by the fact that the discontinuation rate due to toxicity was higher in the T arm, and by the fact that 15% of the patients in the TV arm discontinued only thalidomide and completed the maintenance period with bortezomib alone. Similarly, in the HOVON-65/GMMG-HD4 trial, thalidomide-related toxicity also resulted in a significantly higher premature discontinuation rate when compared with bortezomib.13 Since PN is rarely completely reversible, particularly when it is due to thalidomide, it is important to titrate doses when the first symptoms appear. As in the HOVON trial, the long-term use of bortezomib as maintenance therapy was feasible. Furthermore, subcutaneous administration may improve its tolerability. On the other hand, the proteasome inhibitor ixazomib may be preferable because of its lower neurotoxicity and its oral administration. Recently, significant prolongation of PFS and/or OS has been reported with lenalidomide maintenance after ASCT.14, 15, 16

In summary, our results show the superiority, in terms of PFS, of a 3-year maintenance therapy with the combination of thalidomide and bortezomib over IFN or thalidomide alone.