Introduction

Several measures are employed for risk stratification after allogeneic hematopoietic stem cell transplantation (allo-HSCT), including chimerism examination, minimal residual disease (MRD) monitoring, and less frequently used approaches such as quantitation of hematogones and evaluation of bone marrow cytomorphology after transplant [1–4]. Various studies have demonstrated the potential of mixed chimerism kinetics to predict leukemia relapse, and CD34+-specific chimerism could make chimerism analysis more specific [4, 5]. However, a decrease in chimerism is not always related to relapse of the underlying disease, and the analysis is laborious when cell sorting is required. Quantitation of hematogones and evaluation of bone marrow cytomorphology after allo-HSCT can predict prognosis to some extent, but the sensitivity and specificity of these predictions still need to be verified.

MRD monitoring is one of the most essential methods for predicting leukemia relapse, especially in pediatric patients with acute lymphoblastic leukemia (ALL) [6–8]. Accumulating evidence has confirmed that MRD monitoring provides a credible and effective early warning of leukemia relapse [9–12]. However, MRD evaluation and its clinical impact after allo-HSCT remain to be discussed and investigated. MRD-guided risk assessment after transplant differs from risk evaluation at diagnosis or prior to any therapy [4, 13]. It focuses on detecting posttransplant relapse as early as possible and intervening promptly so that the relapse rate can be reduced.

MRD measurement consists of the detection of leukemia-related genes by molecular techniques and the examination of leukemia-associated immunophenotypes (LAIPs) by multiparameter flow cytometry (FCM). For patients who lack leukemia-specific molecular abnormalities, WT1 is a candidate marker because its expression can be found in 70 % to 90 % of patients with acute leukemia (AL) [14–17]. We previously conducted small-scale studies and demonstrated that, in addition to 0.6 %, 1.0 % would also be a suitable cutoff value for WT1 expression to predict leukemia relapse after transplantation [18, 19]. Nevertheless, a small percentage of patients with only a single elevated WT1 result do not undergo hematologic relapse even without any intervention. Thus, the specificity of WT1 monitoring for predicting leukemia relapse is not satisfactory. Consistent with other reports, we have also demonstrated that LAIP examination with FCM is an important prognostic factor for relapse, although it lacks sufficient sensitivity [18].

Both WT1 and LAIP data are now available for the majority of patients. We therefore speculated that combining the WT1 and FCM assays might increase the sensitivity for leukemia relapse without affecting the specificity. MRDco+, a comprehensive positive MRD standard consisting of multiple criteria based on WT1 and LAIP data, has been employed as the threshold for posttransplant intervention. Our previous work demonstrated that modified donor lymphocyte infusion (mDLI) treatment in MRDco+ AL patients of standard risk after HSCT might reduce the cumulative incidence of relapse (CIR) to levels similar to those of MRD− patients and improve their outcome [20].

In this study, we compared for the first time the clinical value of various positive MRD standards, based on WT1 and LAIP data, for accurate prediction of leukemia relapse in a large sample of adult patients with AL. The aim of the study was to determine suitable criteria for guiding specific intervention measures by coupling these two MRD parameters after allo-HSCT.

Methods

Patient characteristics

All consecutive patients treated with non-T-cell-depleted allo-HSCT at the Peking University Institute of Hematology from January 1, 2006 to November 30, 2011 were enrolled in this study if the following criteria were met: (1) older than 14 years; and (2) having standard risk acute leukemia, defined as first or second complete remission without t(9;22)(q34; q11), t(15;17), inv(16)(p13q22), t(16;16)(p13; q22), or t(8;21)(q22; q22) cytogenetic abnormalities. Patients who received intervention but did not develop hematological relapse were excluded from the study. Six hundred seventy-two of all patients (n = 965) in this study were previously reported in 2012 [20] and further followed here. All patients provided informed consent for treatment under a protocol reviewed and approved by the Peking University Institute of Hematology. The patient and transplant characteristics are summarized in Table 1.

Table 1 Patient characteristics

Transplant protocols

All of the patients in this study received myeloablative conditioning regimens. Transplantations were performed as previously described [21, 22]. Patients who received human leukocyte antigen (HLA)-identical related transplants received busulfan (BU, 0.8 mg/kg i.v., q6h) and cyclophosphamide (CTX, 1.8 g/m²/day for 2 days) or total body irradiation (TBI, 7.7 Gy) given as 1 fraction, followed by CTX. Patients who received HLA-haploidentical related transplants and HLA-matched transplants from unrelated donors were conditioned with BU + CTX + human antithymocyte globulin (ATG; Sang Stat, Lyon, France) 2.5 mg/kg/day i.v. for 4 days or TBI + CTX + ATG. All patients received G-CSF-mobilized bone marrow (BM) and peripheral blood stem cell (PBSC) or G-CSF-mobilized PBSC transfusion, followed by cyclosporine (CsA), mycophenolate mofetil (MMF), and short-term methotrexate (MTX). CsA was started i.v. on −9d at a dosage of 2.5 mg/kg. The CsA dosage was adjusted to maintain a blood concentration of 150–250 ng/ml. The CsA dosage was reduced gradually and discontinued around 4 to 6 months after HSCT. In case of GVHD, CsA was continued. MMF was administered orally (0.5 g every 12 h) from −9d to +30d, then was discontinued upon engraftment in sibling identical HSCT. In haploidentical or unrelated HSCT, 0.25 g MMF was given every 12 h for 1–2 months on the basis of the presence of severe GVHD, infection, and relapse risk. MTX (15 mg/m²) was administered i.v. on +1d, and 10 mg/m² MTX was given on +3d, +5d, and +11d in haploidentical or unrelated HSCT, while 10 mg/m² MTX was given on +3d and +6d in sibling identical HSCT. Posttransplantation immune suppression was immediately tapered and then discontinued in patients who had abnormal MRD (including FCM, WT1, or other acute leukemia-related gene expression) ≤100 days after transplantation. Patients who had abnormal MRD >100 days after transplantation had immune suppression immediately discontinued.

LAIP and WT1 monitoring

Days after the last stem cell infusion are denoted by the prefix “+”. BM samples were obtained from patients for MRD investigation after HSCT. The MRD status of all patients enrolled in this study was examined at regular time points: +1 month, +2 months, +3 months, +4.5 months, +6 months, +9 months, +12 months, and every 6 months thereafter. WT1 expression was evaluated using TaqMan-based RQ-PCR technology as previously described [23]. ABL was selected as a control gene to compensate for variations in the quality and quantity of the RNA and cDNA. The primers and probe for ABL were based on a Europe Against Cancer Program report [24, 25]. The primers and probe used for WT1 detection were based on a report by Tamaki et al. [26]. The transcript level was calculated as target transcript copies/ABL copies, expressed as a percentage. A WT1 transcript level less than 0.6 % was defined as negative. Our previous work also demonstrated through receiver-operating characteristic (ROC) analysis that 1.0 % would be a suitable cutoff value to predict leukemia relapse [23]. LAIPs were detected using four-color FCM (Macs Quant Analyzer). Different antibody combinations were used for B-ALL, T-ALL, and AML as previously described [18]. In most B-ALL cases, antibody combinations of CD34−FITC/CD10−PE/CD45−PerCP/CD19−APC and CD22−FITC/CD20−PE/CD45−PerCP/CD19−APC or CD58−FITC/CD123−PE/CD45−PerCP/CD19−APC were sufficient to identify leukemic cells. In T-ALL cases, antibody combinations of CD7−FITC/CD34−PE/CD45−PerCP/CD3−APC and CD4−FITC/CD8−PE/CD45−PerCP/CD3−APC or TdT−FITC/cCD3−PE/CD45−PerCP/CD5−APC or CD7−FITC/CD10−PE/CD45−PerCP/CD5−APC were used. In most AML cases, antibody combinations of CD7−FITC/CD117−PE/CD45−PerCP/CD33−APC and CD9−FITC/CD56−PE/CD45−PerCP/CD38−APC or CD64−FITC/CD13−PE/CD45−PerCP/CD11b−APC or CD15−FITC/CD123−PE/CD45−PerCP/HLA−DR−APC were used. A total of 1,000,000 events were routinely collected for analysis. When cell numbers were limited, a minimum of 750,000 events were collected. Positive FCM was defined as >0.001 % of cells with an LAIP phenotype in >1 BM samples in ALL patients and >0.01 % in AML patients after transplantation.
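The per-sample thresholds described above can be illustrated with a minimal Python sketch. The function and constant names are hypothetical and are not part of the study's laboratory software. The sketch also assumes that a value exactly at the cutoff counts as negative, and it ignores the clause about the number of BM samples.

```python
# Illustrative sketch only; all names below are hypothetical.

# FCM positivity thresholds (% of cells with an LAIP phenotype), as stated in the text.
FCM_THRESHOLD_PERCENT = {"ALL": 0.001, "AML": 0.01}

# WT1 cutoff (% of ABL copies); the study also examines 1.0 as an alternative.
WT1_CUTOFF_PERCENT = 0.6


def wt1_transcript_level(wt1_copies: float, abl_copies: float) -> float:
    """Return WT1 expression as a percentage of ABL control-gene copies."""
    if abl_copies <= 0:
        raise ValueError("ABL copy number must be positive")
    return 100.0 * wt1_copies / abl_copies


def is_wt1_positive(level_percent: float,
                    cutoff_percent: float = WT1_CUTOFF_PERCENT) -> bool:
    """Assumption: a level exactly equal to the cutoff is treated as negative."""
    return level_percent > cutoff_percent


def is_fcm_positive(laip_percent: float, disease: str) -> bool:
    """Apply the disease-specific FCM threshold to a single BM sample."""
    return laip_percent > FCM_THRESHOLD_PERCENT[disease]
```

For example, a sample with 120 WT1 copies per 10,000 ABL copies has a transcript level of about 1.2 % and is positive at both the 0.6 % and 1.0 % cutoffs, whereas an LAIP fraction of 0.005 % is positive for ALL but not for AML.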

Intervention strategy

The intervention strategy after MRDco+ detection included IL-2 treatment and modified donor lymphocyte infusion (mDLI), which were administered as previously described [20]. mDLI comprised G-CSF-primed peripheral blood cells instead of harvested non-primed donor lymphocytes, together with short-term immunosuppressive agents for prevention of GVHD after infusion. IL-2 was administered subcutaneously at a dose of 1 × 10⁶ U/day for 14 days. One or more subsequent cycles were administered after a 14-day interval. Patients who had an available donor or a frozen G-CSF-mobilized graft were treated with mDLI, in which the common steady-state donor blood lymphocytes were replaced with G-CSF-mobilized PBSC.

Study definitions

The positive MRD standards used in this study are defined based on the first detection of abnormal values of WT1 and FCM and are described in detail below. WT1+ was defined as a transcript level >0.6 %. WT11.0+ was defined as a detectable WT1 expression value higher than 1.0 %. FCM+ was defined as positive FCM, as defined above for ALL and AML patients, respectively. When FCM+ or WT1+ was detected, the bone marrow test was repeated 2 weeks later. The combinative criteria for positive MRD (MRDco+) were defined as two consecutive FCM+ or WT1+ results, or both FCM+ and WT1+ in a single sample, within 1 year after transplantation. MRDco1.0+ was defined as two consecutive WT11.0+ or FCM+ results, or both WT11.0+ and FCM+ in the same sample, within 1 year after transplantation. Leukemia relapse was scored as BM, extramedullary, or both, using previously described common morphological criteria [23]. FCM and RQ-PCR data were not used to define relapse. The diagnosis and grading of graft-versus-host disease (GVHD) were based on published criteria [27]. Overall survival (OS) was calculated from the date of transplantation until death or the last observation of the patient alive. Disease-free survival (DFS) was defined as the probability of being alive and free of disease at a given point in time.
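The combinative rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's actual procedure: the `Sample` type and `meets_mrdco` function are hypothetical, "two consecutive" is read as the same assay being positive in two successive samples, and "within 1 year" is read as day +365 or earlier.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Sample:
    """One posttransplant BM sample (hypothetical structure for illustration)."""
    day: int        # days after the last stem cell infusion ("+" days)
    wt1_pos: bool   # WT1 above the chosen cutoff (0.6 % or 1.0 %)
    fcm_pos: bool   # FCM above the disease-specific LAIP threshold


def meets_mrdco(samples: Iterable[Sample], window_days: int = 365) -> bool:
    """Return True if the sample series satisfies the combinative criteria.

    Assumptions (not stated explicitly in the text): only samples taken on or
    before `window_days` count, and "consecutive" means the same assay is
    positive in two successive samples.
    """
    ordered = sorted((s for s in samples if s.day <= window_days),
                     key=lambda s: s.day)

    # Criterion A: both assays positive in a single sample.
    if any(s.wt1_pos and s.fcm_pos for s in ordered):
        return True

    # Criterion B: the same assay positive in two successive samples.
    for prev, curr in zip(ordered, ordered[1:]):
        if (prev.wt1_pos and curr.wt1_pos) or (prev.fcm_pos and curr.fcm_pos):
            return True
    return False
```

Passing `wt1_pos` flags computed at the 1.0 % cutoff gives the stricter MRDco1.0+ variant with the same function.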

Statistical analysis

The reference date of March 31, 2012 was used to define the end of follow-up. The median follow-up was 14.5 months (range 1.5 to 63 months). Disease-free survival (DFS), transplant-related mortality (TRM), and overall survival (OS) were calculated according to Kaplan–Meier statistics. DFS, TRM, and OS differences between groups were tested using the log-rank test. The cumulative incidence of relapse (CIR) was estimated by partitioning the probability of failure into the probabilities corresponding to each competing event. Differences in CIR between subgroups were tested according to Gray’s method, using R software for statistical computing. A two-sided P value of less than 0.05 was considered significant. The independence of categorical parameters was tested using the chi-squared test or Fisher’s exact test. Cox proportional hazards multivariate regression models were used to identify leukemia relapse predictors. The final multivariate models were determined using a backward selection procedure, and a P value of 0.05 was the threshold for the inclusion and exclusion of variables. Sensitivity measures the proportion of actual positives correctly identified as such (e.g., the percentage of patients who relapsed and were correctly identified as MRD positive). Specificity measures the proportion of negatives correctly identified (e.g., the percentage of patients who did not relapse and were correctly identified as MRD negative). Youden's index = sensitivity + specificity − 1.
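The three accuracy measures defined above can be computed from a 2 × 2 table as in the following minimal Python sketch. The function name and the counts in the usage note are hypothetical and are not data from this study.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute sensitivity, specificity, and Youden's index.

    tp: relapsed patients flagged MRD positive
    fn: relapsed patients not flagged
    tn: non-relapsed patients not flagged
    fp: non-relapsed patients flagged MRD positive
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one relapsed and one non-relapsed patient")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
    }
```

With invented counts of 40 true positives, 60 false negatives, 450 true negatives, and 50 false positives, the sensitivity is 0.40, the specificity is 0.90, and Youden's index is about 0.30.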

Results

Clinical outcomes of patients after transplantation

A total of 905 patients met the study inclusion criteria within the specified time period. Patients who received intervention but did not develop a hematological relapse were excluded from the study. Patients who received intervention but relapsed were included in the study. Abnormal WT1 expression, an LAIP, or both were detected before transplantation in 90.8 % (748 of 824) of the subjects. LAIP data were available in all 748 patients. Both LAIP and abnormal WT1 expression were detected before transplantation in 58.7 % (484/824) of subjects. The study design details are shown in Fig. 1. Of the 824 AL patients enrolled in this study, 155 (18.8 %) cases developed a hematological relapse, including those whose intervention therapy was based on their MRD status after allo-HSCT (median 7 months; range 1 to 27.5 months). Two hundred forty-four (29.6 %) patients died during follow-up; 48.8 % (119/244) died of leukemia relapse, while the others (125/244) experienced transplant-related mortality (TRM). The AL patient group included 323 ALL patients and 501 AML patients. Sixty-three of the 323 (19.5 %) ALL patients and 92 of the 501 (18.4 %) AML patients had a leukemia relapse after transplantation. Ninety-eight (30.3 %) ALL patients and 146 (29.1 %) AML patients died during the follow-up period. Deaths related to leukemia relapse occurred in 45 ALL patients (45.9 % of ALL deaths) and 74 AML patients (50.7 % of AML deaths).

Fig. 1 Study design and overview of consecutive patients who met the enrollment criteria of the study

Accuracy of indicating leukemia relapse based on a single MRD monitoring result or method

Of the 824 AL patients, 149 (18.1 %) had one positive WT1 result, while 50 (6.1 %) had two consecutive positive WT1 results. Because our previous work showed that 1.0 % would be the optimal WT1 cutoff value to indicate relapse [23], WT11.0 was included as another threshold in this analysis. One hundred one (12.3 %) patients had one WT11.0+ result, and 22 (2.7 %) patients had two consecutive WT11.0+ results. The numbers of patients who had one or two successive positive FCM results were 54 (6.6 %) and 17 (2.1 %), respectively. WT1+ and FCM+ were both detected in a single bone marrow sample in 34 (4.1 %) patients. WT11.0+ and FCM+ were detected in the same sample in 28 (3.4 %) cases. We investigated the accuracy of these standards for predicting hematological relapse. As Table 2 shows, the use of different prognostic standards (including one WT1+, two consecutive WT1+, one FCM+, two consecutive FCM+, concurrent WT1+ and FCM+, and concurrent WT11.0+ and FCM+) allowed separation of the patients who relapsed from those who did not. The specificity was relatively high; however, the sensitivity varied widely, from 10.3 % to 56.1 %. Similar data were also obtained for the 748 patients who had at least one positive WT1 or FCM assay result before transplantation, as shown in Table S1.

Table 2 Sensitivity and specificity of various standards used to predict AL patient relapse after allo-HSCT (n = 824)

To further delineate the clinical impact of these MRD criteria, we also investigated their ability to indicate relapse in patients with AML and ALL separately. Among the 484 patients who had positive WT1 and FCM values before transplantation, there were 308 AML patients and 176 ALL patients. In the AML group, 54 (17.5 %) cases developed hematological relapse after allo-HSCT. As Table 3 shows, the specificity was high, ranging from 90.6 % to 100 %. However, a wide range of sensitivity (from 3.7 % to 68.5 %) was observed. Of the 176 ALL patients who had both FCM+ and WT1+ results before transplantation, 37 (21.0 %) eventually underwent hematological relapse during follow-up. The sensitivity and specificity of the aforementioned MRD standards were analyzed in the same way as for the AML patients. Table 3 shows that the specificity was relatively high, while a wide range of sensitivity (from 10.8 % to 62.2 %) was observed.

Table 3 Sensitivity and specificity of various standards used to predict AML (n = 308) and ALL (n = 176) patient relapse after allo-HSCT

Sensitivity and specificity of combinative MRD criteria for indicating leukemia relapse in different groups after allo-HSCT

Because acceptable degrees of sensitivity and specificity could not be obtained simultaneously when we considered only a single test result or a single detection method, we then analyzed the predictive performance of diverse combinative criteria in different groups of patients. In the total group of 824 AL patients, 79 (9.6 %) cases met the MRDco+ criterion, while 53 (6.4 %) cases met the MRDco1.0+ criterion. As expected, both MRDco+ and MRDco1.0+ showed a higher sensitivity for indicating leukemia relapse compared with other MRD standards, including two consecutive WT1+ results, one FCM+ result, two consecutive FCM+ results, concurrent WT1+ and FCM+, and concurrent WT11.0+ and FCM+ (Table 4). More favorably, a high level of specificity (>97.0 %) was observed in each patient group when MRDco+ or MRDco1.0+ was used as the positive MRD standard (Table 4). Although there was no obvious difference in specificity between MRDco+ and MRDco1.0+, MRDco1.0+ lost more sensitivity because it applied stricter screening than MRDco+. Considering both sensitivity and specificity concurrently, MRDco+, with multiple criteria that included both WT1 and FCM data, appeared to be a more practical MRD standard for relapse prognosis after HSCT.

Table 4 Sensitivity and specificity of combinative MRD criteria for predicting leukemia relapse in different groups after allo-HSCT

MRDco+ was an independent risk factor for posttransplantation leukemia relapse

To further analyze whether MRDco+ was an independent predictor of relapse, we conducted a multivariate Cox regression analysis that included age, sex, disease status (CR1/CR2), donor type, number of chemotherapy courses before CR1 (one/two/more than two courses), and MRD status (including the different MRD standards mentioned above) after transplantation. All of these factors were also considered in the univariate analysis of AML and ALL patients. Through this statistical analysis, we noted that MRDco+ after HSCT was associated with a higher relapse rate and was an independent risk factor for relapse in both AML and ALL patients (AML: HR = 12.54, 95 % CI 7.01–22.42, P < 0.001; ALL: HR = 12.84, 95 % CI 6.05–27.25, P < 0.001). A statistical analysis using R software also indicated a significant difference in the CIR when the patients were divided into two groups based on MRD status. Patients with AML who had met the MRDco+ criteria had a CIR of 71.4 ± 9.8 % 2 years after transplantation, while those who remained MRDco− had a CIR of 9.3 ± 2.0 % at that time point (P < 0.001). Significant differences in DFS and OS between these two groups of patients were also observed (DFS, P < 0.001; OS, P < 0.001). AML patients with MRDco+ had a DFS of 46.3 ± 9.0 % and OS of 47.3 ± 9.2 % 2 years after transplantation. However, patients with MRDco− had a DFS of 92.0 ± 1.6 % and OS of 92.3 ± 1.6 % at the same time point. The median OS values of the MRDco+ and MRDco− groups were 18.5 months (range 12.5–24.4 months) and 47.2 months (range 43.8–50.5 months), respectively. There was a significant difference between these two groups in median DFS [MRDco+, 17.0 months (range 11.4–22.7 months); MRDco−, 46.5 months (range 43.1–49.8 months)]. All of the AML patients who met the MRDco+ conditions died of leukemia relapse. These results were significantly different from those of the MRDco− patients who had a TRM of 18.7 ± 2.5 % 2 years after HSCT (P = 0.009).
Consistent with the AML patients’ results, the ALL patients who had met the MRDco+ criteria showed a CIR of 76.7 ± 12.7 % 2 years after transplantation, while those who remained MRDco− had a CIR of 15.4 ± 3.1 % at that time point (P < 0.001). A significant difference in DFS and OS between these two groups was also found (DFS, P = 0.001; OS, P = 0.01). The ALL patients with MRDco+ had a DFS of 29.2 ± 12.8 % and OS of 52.8 ± 15.4 % 2 years after transplantation. However, the patients with MRDco− had a DFS of 63.2 ± 4.1 % and OS of 66.0 ± 4.0 % at the same time point. The median OS values for the MRDco+ and MRDco− groups were 18.3 (10.2–26.5) months and 43.2 (38.8–47.7) months, respectively. There was a clear difference between these patients’ median DFS: 12.4 (5.8–19.0) months in the MRDco+ group and 41.3 (36.7–45.9) months in the MRDco− group. All of the ALL patients with MRDco+ died from leukemia relapse, as did all of the AML patients with MRDco+. Nevertheless, the MRDco− patients had a TRM of 22.2 ± 3.4 % 2 years after HSCT, and there was no significant difference in TRM between the MRDco+ and MRDco− groups (P = 0.044).
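As a reading aid for the hazard ratios reported above, the following Python sketch shows how a 95 % confidence interval relates to its point estimate. It assumes the intervals are Wald intervals that are symmetric on the log scale, which is the usual output of Cox model software but is not stated in the text. The function names are hypothetical.

```python
import math


def ci_log_midpoint(lower: float, upper: float) -> float:
    """Geometric mean of the CI bounds.

    Under the assumed log-symmetric Wald interval, this equals the
    hazard ratio point estimate.
    """
    return math.sqrt(lower * upper)


def log_hr_standard_error(lower: float, upper: float, z: float = 1.959964) -> float:
    """Recover the standard error of log(HR) from a 95 % CI (z is the normal quantile)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Reported intervals: AML 7.01-22.42 (HR 12.54), ALL 6.05-27.25 (HR 12.84).
aml_mid = ci_log_midpoint(7.01, 22.42)
all_mid = ci_log_midpoint(6.05, 27.25)
aml_se = log_hr_standard_error(7.01, 22.42)
all_se = log_hr_standard_error(6.05, 27.25)
```

Under this assumption, both midpoints agree with the reported hazard ratios to within rounding, and the wider ALL interval corresponds to a larger standard error, as would be expected for the smaller ALL group.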

Discussion

Researchers have observed the clinical impact of MRD prior to allo-HSCT and found that MRD was the only significant parameter in the multivariate analysis of outcome evaluation [12, 28, 29]. To date, the optimal threshold of positive MRD for relapse intervention after allo-HSCT has not been identified. In this study, we compared for the first time various positive MRD standards for accurately indicating relapse based on WT1 and FCM data in adult patients with AL. Compared with a single MRD parameter, the combined use of WT1 and LAIP might achieve higher sensitivity without sacrificing specificity, as our results demonstrated.

In this study, the consistent results in all 824 subjects and in the groups of patients with specific leukemia types suggested that combinative MRD monitoring based on FCM and WT1 assays was useful for indicating leukemia relapse after transplantation, even in patients without positive FCM and WT1 data before transplant. Similar sensitivity and specificity of relapse prediction were obtained in different groups of patients when MRDco was used. Thus, a large majority of AL patients would have access to efficient MRD monitoring after HSCT. This is particularly important for a developing country such as China. Many patients come from growing cities or rural areas that lack advanced diagnostic techniques for leukemia and might only have bone marrow morphology results. Under these conditions, FCM-based LAIP and PCR-based WT1 examination could still be considered valid MRD markers after transplantation for patients who lack specific leukemia-associated genes.

The results revealed that a relatively high specificity could be obtained by using combinative standards to indicate relapse; however, the sensitivities in both AML and ALL patients did not reach a satisfactory level. We attributed this less desirable sensitivity to the exclusion of patients who did not relapse after intervention, because the intervention in these patients was guided by the MRDco criteria. If this group of patients had been considered, the sensitivity of MRDco+ to indicate leukemia relapse would have been more than 80 %, which is acceptable. Furthermore, the specificity would have also increased. In clinical practice, MRDco+ results could encourage clinicians to use mDLI, a major intervention method for patients after allo-HSCT. Although mDLI is relatively safe, it can still lead to severe GVHD in a small number of patients [22, 30]. Thus, using a parameter with higher specificity for predicting relapse to guide the administration of a strong intervention therapy could prevent the overtreatment of patients after transplantation and further reduce the incidence of TRM. The data from our previous work have also demonstrated that mDLI treatment in standard-risk MRDco+ AL patients after HSCT might reduce the CIR to levels similar to those of patients with negative MRDco and improve their outcome. The TRM of these patients did not show a significant increase [20], indicating that the MRDco+ criteria were appropriate under this condition. According to our results, the highest sensitivity was obtained when one WT1+ result was used as the intervention criterion. This finding suggests that patients should be closely monitored as soon as abnormal WT1 expression is detected after transplantation and that IL-2 should be considered as a relatively conservative intervention under this condition. Therefore, we might choose different MRD criteria to guide intervention therapy, and the parameter used to monitor residual leukemia should be appropriate to the goal of the therapy. Several of the positive MRD standards mentioned here could serve as a future reference for predicting relapse after transplantation.

To date, no accurate criteria for positive FCM and WT1 levels have been defined for predicting leukemia recurrence after transplant, especially in terms of directing interventions for various AL subtypes. This study compared the diverse standards of positive MRD in AML and ALL after HSCT for the first time. The statistical analysis suggested that MRDco+ had better sensitivity and specificity for both AML and ALL patients. Most studies investigating WT1 have focused on its clinical value in AML and suggested that WT1 was a more effective indicator for AML relapse than for ALL relapse [31–34]. Consistent with previous reports, the data here showed that WT1 values had a higher sensitivity for indicating relapse (cutoff point = either 0.6 % or 1.0 %) in AML compared with ALL. Although the results of our analysis were broadly the same for AML and ALL patients, there were some differences. The different WT1 expression levels in ALL and AML patients might explain these differences. It has been stated previously that WT1 expression is lower in ALL patients than in AML patients [35, 36]. Consequently, WT1 might be a more suitable marker for AML than for ALL. Because LAIP has much more indicative significance in ALL, LAIP might predict ALL recurrence better than WT1 expression.

In summary, our study is the first to use statistical analyses to assess the value of a variety of positive MRD standards for indicating leukemia relapse after transplantation. The successful use of MRDco+ showed that the combined use of WT1 and FCM monitoring indicated relapse with acceptable sensitivity and no loss of specificity. The diverse positive MRD standards examined in this study were helpful for guiding different intervention measures to prevent leukemia relapse after transplantation. Although excess treatment might be avoided to some extent by using the appropriate MRD standard, the sensitivity was not sufficiently high to identify all patients who would relapse. Further work should explore new, more appropriate parameters or new combinative criteria with both high sensitivity and specificity for predicting leukemia recurrence after allo-HSCT.