Introduction

Allogeneic hematopoietic cell transplantation (alloHCT) is the intensive but optimal therapy for higher risk acute types of leukemia (AML, ALL, CML in accelerated phase (AP) or blast crisis (BC)) and myelodysplastic syndrome (MDS) patients. However, relapse is still a frequent cause of treatment failure, although intervention before overt relapse may be beneficial.1, 2, 3, 4 Approximately 35–45% of alloHCT recipients will relapse within 5 years with their original malignancy.5 Technologic advances produced sensitive methods for early recognition of hematologic malignancy relapse. The highest sensitivity is achievable by PCR-based assays detecting recurrent molecular aberrations such as fusion transcripts and mutations. However, not all leukemia and MDS patients have aberrations detectable by PCR, limiting the applicability of such monitoring to only some patient subgroups. In contrast, non-mutated WT1 is overexpressed (5–10 times above background levels) in ⩾86% of patients with AML, MDS and ALL6, 7, 8, 9, 10 and could serve as a universal diagnostic marker for detection of leukemic blasts, despite heterogeneity in the etiology of these diseases. Since 1990, several groups have associated WT1 expression and its elevation with progression and relapse of hematologic malignancies.2,8,10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 While existing literature established the relevance of WT1 for identifying future relapse,2,8,10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 the WT1 test has not yet been validated as a relapse definition across relevant hematologic malignancies.

In a prospective study, we longitudinally evaluated the accumulation of WT1 mRNA transcripts in PB of alloHCT recipients to establish levels of WT1 transcripts (WT1 ratios) that will accurately predict the onset of relapse and to estimate a time interval from molecular (quantitative PCR of WT1) to hematologic (morphology of blasts) relapse.

Materials and methods

Study subjects

This study was conducted under City of Hope IRB-approved protocol no. 09050. Patients gave written informed consent, in accordance with the Declaration of Helsinki, for laboratory-based studies on PB samples obtained prospectively after alloHCT monthly for 6 months, and then alternating between 1 or 2 months until day 780. Patients over 18 years of age with confirmed diagnosis of MDS, AML, ALL and CML undergoing alloHCT at City of Hope after reaching CR (MDS with ⩽20% blasts, AML or ALL in morphologic remission (first or subsequent remission) or CML in chronic phase) were eligible for the study and were enrolled prospectively. Confirmed diagnosis of hematologic relapse, monitored prospectively, was defined as a study end point. Thus, patients who relapsed became ineligible to continue participation in the study.

Cells, RNA purification and cDNA synthesis

PBMCs and BM mononuclear cells were purified by Ficoll-Hypaque density gradient centrifugation from 10 to 40 ml PB or BM. Subsequently, total RNA was isolated from 3 to 5 million PBMCs or BM mononuclear cells using RNeasy (Qiagen, Valencia, CA, USA). cDNA was made from 500 ng of total RNA using the RevertAid First Strand cDNA Synthesis Kit (Fermentas Inc., Glen Burnie, MD, USA). RNA quality was measured by NanoVue (GE Healthcare, Little Chalfont, Buckinghamshire, UK) and its purity was based on its 260/280 ratio.

Quantitative real-time PCR analysis

WT1 transcript levels in PBMCs and BM mononuclear cells were measured in a batch using SYBR Green quantitative real-time PCR (qRT-PCR) on the ABI7300 instrument (Applied Biosystems, Carlsbad, CA, USA). c-ABL gene transcript was used as a recommended internal control.13,21,26 Absolute quantification of the transcript copy number was achieved for WT1 and c-ABL genes from the corresponding standard curves generated from WT1 and c-ABL control genes cloned into plasmids. Plasmid dilutions were generated to span the anticipated transcript copy range (10^1–10^6 copies). For WT110 and c-ABL,26 published sequences were used to generate 89 and 96 bp products, respectively. Results were expressed as a ratio of WT1/c-ABL transcript copy numbers normalized by 10^4 (WT1 ratio: WT1/c-ABL × 10^4).7,27,28 RNA from the K562 cell line served as a positive control. Samples showing a >1 cycle difference in threshold cycle number (Ct) between duplicate wells were retested. If the ratio was inconsistent or if two wells were dissimilar, sample testing was repeated. If ratios were still inconsistent or dissimilar after repeat, the data point(s) were excluded from the analyses. Samples containing <1000 copies of c-ABL were considered degraded and new cDNA was generated. Results were not released to the treating physicians and did not influence their clinical practice.
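The quantification steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the dilution-series Ct values, and the ideal slope of -3.32 cycles per 10-fold dilution are hypothetical and chosen only for illustration. The sketch assumes that Ct is linear in log10(copies) on the plasmid standard curve, and it applies the two quality rules stated in the text (duplicate wells within 1 cycle, at least 1000 c-ABL copies).

```python
import math

def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to get an absolute copy number."""
    return 10 ** ((ct - intercept) / slope)

def duplicates_agree(ct_a, ct_b, max_delta=1.0):
    """Duplicate wells differing by more than 1 cycle call for a repeat."""
    return abs(ct_a - ct_b) <= max_delta

def wt1_ratio(wt1_copies, abl_copies, min_abl_copies=1000):
    """WT1/c-ABL x 10^4, or None when c-ABL suggests degraded material."""
    if abl_copies < min_abl_copies:
        return None
    return wt1_copies / abl_copies * 1e4

# Hypothetical plasmid dilution series spanning 10^1 to 10^6 copies.
std_copies = [10 ** k for k in range(1, 7)]
std_cts = [38.0 - 3.32 * k for k in range(1, 7)]  # idealized, made-up values
slope, intercept = fit_standard_curve(std_copies, std_cts)

print(round(slope, 2))
print(wt1_ratio(wt1_copies=50, abl_copies=10_000))
```

Returning `None` for low c-ABL input, rather than a number, keeps degraded samples from silently entering the longitudinal series.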

Statistical analysis

WT1 mRNA transcript levels of consented patients with at least two PB draws after alloHCT were analyzed. We evaluated longitudinal changes in transcript levels to establish WT1 thresholds that would be likely to indicate impending relapse. We identified patients having WT1 ratios exceeding these thresholds and then determined the sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV) of subsequent relapse. Exact 95% binomial confidence intervals (CIs) were calculated for sensitivities and specificities. Means and s.d.'s for lead time were calculated based on the earliest observed post-alloHCT day on which the patient’s WT1 ratio was greater than or equal to the threshold. To estimate the day on which the patient’s WT1 ratio was equal to the threshold, we used linear interpolation. In other words, we calculated the line between the last measure before crossing the threshold and the first measure greater than the threshold and estimated the day that the threshold was crossed. Cox proportional hazard regression models were used to examine the predictors of time to relapse (days) by univariate and multivariate analysis tools. The predictors included crossing the WT1 ratio of 20 (time dependent, dichotomous: whether or not each WT1 ratio exceeded the 20 ratio), age at transplant (above or below median), patient gender, patient/donor sex match (male/female, others), donor age (above or below median), disease type (AML, ALL/CML, MDS), donor type (related, matched unrelated), stem cell source (BM, PB, cord blood), pre-alloHCT CMV serostatus (negative, positive), donor pre-alloHCT CMV serostatus (negative, positive), disease risk status at transplantation (low, high), conditioning regimen (full intensity, reduced intensity), injected cell dose, acute GVHD grade (none or grade 1, grades 2–4), log-transformed pre-alloHCT WT1 ratio and CMV reactivation within 3 months (yes, no). 
Crossing the WT1 ratio of 50 (time dependent, dichotomous: whether or not each WT1 measure exceeded the 50 threshold) was not considered as a variable in the survival analyses because of its 100% specificity (i.e., patients who did not relapse never had WT1 ratio ⩾50). The multivariate analysis used stepwise regression on the variables that were significant in the univariate analyses. In a secondary analysis, we examined WT1 ratio and the interaction of the WT1 ratio with risk in a Cox proportional hazards model as predictors of time to relapse. We also examined a Cox proportional hazards model with post-alloHCT WT1 ratio and log-transformed pre-alloHCT WT1 ratio as predictors of time to relapse. To test the reliability of the WT1 ratio using PB, we analyzed the association of post-alloHCT WT1 transcript levels measured on PB and BM using a repeated-measures regression model. We also analyzed the association of the PB post-alloHCT WT1 ratio with bcr/abl using a repeated-measures regression model. The bcr/abl data were only available in the subset of Ph+ ALL and CML patients. As we could not derive a correlation coefficient from the output of the repeated-measures model, we converted the PB and BM WT1 ratios and bcr/abl measurements into z-scores and used this standardized coefficient as a proxy for the correlation coefficient. Exact binomial 95% CIs were calculated using StatXact 7. SAS version 9.3 (SAS Institute, Cary, NC, USA) was used to perform all other statistical analyses. The statistical significance level was set at α=0.05. R 3.0.1 was used to generate the figures.
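The linear interpolation used to estimate the day of threshold crossing can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' SAS code: the function name and the example measurements are hypothetical, and a series whose first measurement already meets the threshold simply returns that day because there is no earlier point to interpolate from.

```python
def interpolate_crossing_day(days, ratios, threshold):
    """Estimate the post-alloHCT day on which the WT1 ratio equals the threshold.

    A straight line is drawn between the last measurement below the threshold
    and the first measurement at or above it. Returns None if the threshold
    is never reached.
    """
    for i, (day, ratio) in enumerate(zip(days, ratios)):
        if ratio >= threshold:
            if i == 0:
                return float(day)  # no earlier point to interpolate from
            d0, r0 = days[i - 1], ratios[i - 1]
            # r0 < threshold <= ratio here, so the denominator is positive.
            return d0 + (threshold - r0) * (day - d0) / (ratio - r0)
    return None

# Hypothetical patient: ratio 10 on day 100 and ratio 90 on day 130.
crossing = interpolate_crossing_day([100, 130], [10, 90], threshold=50)
print(crossing)        # 115.0
print(160 - crossing)  # lead time if relapse were diagnosed on day 160: 45.0
```

The interpolated lead time is always at least as long as the observed one, because the estimated crossing falls on or before the first sample at or above the threshold.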

Results

The WT1 transcript ratio level as a highly specific predictor of relapse

We measured the rate of change of WT1 ratio levels (WT1 assay) as a means to assess molecular relapse and as a predictor of clinical relapse. WT1 transcript ratio levels were measured longitudinally in 82 AML, ALL, MDS and CML patients after alloHCT. The median follow-up was 295.5 days (range 57–785). Patient demographic and transplantation characteristics, alloHCT outcomes and the number of samples obtained per patient are summarized in Table 1. Fifty patients were considered as low risk for disease (AML CR1, n=25; ALL CR1, n=18; MDS RA or RARS, n=6; or CML CP1, n=1), whereas the remaining 32 patients were at high risk for disease (AML CR2/3, n=14; ALL CR2/3, n=6; MDS RAEB or RAEBT, n=10; or CML CP2, n=2). Cytogenetic risks for each disease are also detailed in Table 1. As expected in the transplant cohort, many had intermediate- or high-risk cytogenetics. Among 18 AML patients with normal cytogenetics, Flt3 mutation status was available in nine patients (five positive, four negative). The longitudinal patterns of WT1 ratio levels after alloHCT for patients who did not relapse (n=66) are depicted in Figure 1. A reference line was drawn retrospectively at the WT1 ratio of 50, because it is the minimum WT1 ratio level that none of the non-relapsed patients exceeded. This level defines maximum specificity of 100% (95% binomial exact CI: 94.5–100) and can be considered a highly specific threshold level for relapse prediction. Figure 2 shows WT1 ratios over time for the 16 relapsed patients, in whom the ratios increased longitudinally. As opposed to Figure 1, which shows all patients in one plot, Figure 2 has individual panels for each patient so that each patient trend could be separately assessed. A reference line is drawn at WT1 ratio equal to 50 (towards the bottom of the plot) to quantitate how many relapsed patients had WT1 ratios that exceeded this highly specific threshold.
Within each panel, the relapse day (number of days after alloHCT), the patient’s last two WT1 ratio measurements and the number of days before relapse on which they were taken (R-day) are provided. For example, in the case of patient 1, R-74 means 74 days before relapse on day 747 after alloHCT. Four patients (patients 13–16 in Figure 2) never had a WT1 ratio exceeding 50, whereas the remaining 12 patients (12/16) had WT1 levels that exceeded the ratio of 50, providing a sensitivity of 75% for this ratio level (95% binomial exact CI: 48–93). The PPV and the NPV performance parameters for the WT1 ratio of 50 were 100% and 94.4%, respectively. The average number of days between the earliest observed time of crossing the WT1 ratio threshold of 50 and relapse for the 12 patients depicted in Figure 2 was 29.4 (s.d.=19.8). Using the linear interpolation method, the estimated crossing of the WT1 ratio threshold of 50 preceded relapse by an average of 63 days (s.d.=29.3). Thus, the WT1 ratio of 50 is a specific threshold for detection of impending hematologic relapse after alloHCT, crossed an estimated 63 days before diagnosis of morphologic relapse.
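The exact binomial confidence intervals quoted above can be approximated with a short, dependency-free Python sketch. It assumes that "exact" refers to the Clopper-Pearson construction (the text only names StatXact, so this is an assumption), and it solves for the bounds by bisection on the binomial distribution rather than calling a statistics library. Function names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_increasing(g, iters=200):
    """Root of a function that increases from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Two-sided exact (Clopper-Pearson) interval for k successes in n trials."""
    if k == 0:
        lower = 0.0
    else:
        # Lower bound: P(X >= k | p) = alpha / 2
        lower = _solve_increasing(lambda p: 1 - binom_cdf(k - 1, n, p) - alpha / 2)
    if k == n:
        upper = 1.0
    else:
        # Upper bound: P(X <= k | p) = alpha / 2
        upper = _solve_increasing(lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper

sens_ci = clopper_pearson(12, 16)  # sensitivity: 12 of 16 relapsed patients
spec_ci = clopper_pearson(66, 66)  # specificity: 66 of 66 non-relapsed patients
print([round(100 * x, 1) for x in sens_ci])
print([round(100 * x, 1) for x in spec_ci])
```

For the specificity case (66/66) the lower bound has the closed form 0.025^(1/66), roughly 0.945, which agrees with the interval reported in the text.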

Table 1 Patient, disease and transplantation characteristics and overall outcomes (N=82)
Figure 1

WT1 levels in non-relapsed acute leukemia and MDS patients after alloHCT. WT1 transcript levels were measured by qRT-PCR in 82 patients and expressed as a ratio of WT1/c-ABL transcript copy numbers normalized by 10^4 (WT1 ratio). Patients without relapse did not cross the WT1 ratio of 50 illustrated by the horizontal solid line (66/66, 100% specificity).

Figure 2

Time course of WT1 transcript expression levels in acute leukemia and MDS patients (N=16) with relapse after alloHCT. WT1 transcript levels measured by qRT-PCR and expressed as ratios (as in Figure 1) are shown for relapsed patients after alloHCT. Disease diagnosis of each patient is indicated in each individual panel. WT1 ratios crossed the level of 50 (horizontal solid line) and began to increase exponentially in 12 of 16 patients (12/16, 75% sensitivity). The relapse day and the patient’s last two WT1 ratio measurements and the day before relapse in which they were taken are provided (R-day). The y-axis WT1 ratio range for this plot is much larger than that for Figure 1 to accommodate the high levels of WT1 ratios these relapsed patients reached.

Varying WT1 ratio thresholds for relapse prediction

Using the WT1 ratio of 50 as a threshold for detection of impending relapse yielded 100% specificity (no non-relapsed patient had WT1 levels reaching this threshold) and 75% sensitivity (three-quarters of patients who relapsed had reached WT1 levels exceeding this threshold). We further assessed lower WT1 ratios to increase sensitivity and capture more patients with impending relapse while only minimally reducing specificity. Table 2 shows numbers of relapsed and non-relapsed patients with varying WT1 levels, together with sensitivity, specificity, PPV and NPV. Means and s.d.'s of the calculated time to relapse from the patient’s earliest time of having a WT1 level equal to the given threshold are also shown. The thresholds of 40 and 30 decrease the specificity but do not increase sensitivity. Sensitivity increased to 87.5% using a WT1 ratio threshold of 20, whereas the specificity decreased to 84.8%, as expected because of a slightly increased number of false positives. Although the sensitivity exceeds 90% at a threshold of 10, the specificity falls to ~56.1%. Thus, the lower WT1 ratio threshold of 20 provides an improved sensitivity and specificity combination for relapse prediction and a longer interval before morphologic relapse. Specifically, the observed time between patients’ earliest WT1 ratio exceeding the threshold of 20 and the onset of relapse increased to 49.1 days (s.d.=30.8). Using the linear interpolation method, patients crossed the WT1 ratio threshold of 20 an average of 78 days (s.d.=28.8) before relapse diagnosis. Consequently, using a lower WT1 ratio threshold of 20 will improve the sensitivity and specificity combination and increase the time to relapse interval, without producing excessive false positives.
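The performance measures compared across thresholds all derive from a 2×2 table of test result against relapse status. The sketch below recomputes them for the thresholds of 50 and 20. The cell counts are not copied from Table 2; they are back-calculated from the percentages reported in the text (16 relapsed and 66 non-relapsed patients), and the function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }

# Counts implied by the text: 16 relapsed, 66 non-relapsed patients.
m50 = diagnostic_metrics(tp=12, fn=4, fp=0, tn=66)   # threshold 50
m20 = diagnostic_metrics(tp=14, fn=2, fp=10, tn=56)  # threshold 20

for name, m in (("50", m50), ("20", m20)):
    print(name, {k: round(100 * v, 1) for k, v in m.items()})
```

Lowering the threshold moves two relapsed patients from false negative to true positive, at the cost of ten non-relapsed patients becoming false positives.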

Table 2 Characteristics of WT1 ratios for predicting relapse

Approaches to reduce false-positive cases

Using a lower WT1 ratio threshold (20 vs 50) increases sensitivity of detecting patients with impending relapse. However, as a negative consequence, the PPV decreases, thereby increasing the number of false-positive patients who never relapse. The highly specific WT1 ratio threshold of 50 had a PPV of 100% (no false positives detected) and the WT1 ratio threshold of 20 resulted in a decreased PPV of 58.3% with 10 false-positive patients. The absolute number of patients falsely identified as likely to relapse will be dependent on the cohort demographics. The proportion of patients with positive test results for the WT1 threshold of 20 that are going to have a disease relapse (PPV) may improve if we target the test to the patients clinically at higher risk of developing relapse. We evaluated the high-risk group (defined as AML and ALL in CR2/3, CML in CP2 or AP and MDS RAEB or RAEBT; see Table 1) and found that the PPV of the WT1 ratio of 20 improved to 69.2% in the high-risk patients (Table 3) compared with 58.3% for the entire cohort (Table 2). The sensitivity and specificity of the WT1 ratio of 20 for high-risk patients is comparable to those of the whole cohort using the ratio of 20. Additionally, the average time to relapse interval is longer for high-risk patients (58.1 days, s.d.=34.9; Table 3) than for the entire cohort (49.1 days, s.d.=30.8; Table 2). Thus, the lower WT1 ratio of 20 appears to be more valuable for this specific subgroup of patients with a higher prevalence of relapsed disease (11 relapsed patients in the high-risk group vs 5 in the low-risk group), which reduces the number of false-positive cases improving the PPV.
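The dependence of PPV on relapse prevalence follows from Bayes' rule, and a short sketch makes the effect concrete. This is an illustration only: it assumes that the whole-cohort sensitivity and specificity for the ratio of 20 carry over unchanged to the high-risk group, which the text describes as comparable but not identical, so the high-risk value it yields will not match the 69.2% reported in Table 3. The function name is hypothetical.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Whole-cohort performance of the WT1 ratio of 20, as fractions.
sens, spec = 14 / 16, 56 / 66

whole_cohort = ppv_from_prevalence(sens, spec, prevalence=16 / 82)
high_risk = ppv_from_prevalence(sens, spec, prevalence=11 / 32)  # illustrative

print(round(100 * whole_cohort, 1))
print(round(100 * high_risk, 1))
```

With the test characteristics held fixed, raising the prevalence from 16/82 to 11/32 is enough on its own to raise the PPV, which is the mechanism the paragraph above describes.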

Table 3 Characteristics of WT1 ratio of 20 by risk

Subgroup analyses of each disease category

We also assessed the sensitivity and specificity of the WT1 ratio by disease type (AML, ALL/CML, MDS). First, there was no significant association between disease category and relapse (χ2 P=0.3409) or time to relapse (Cox proportional hazards model P=0.3177). Consistent with the analysis of the entire cohort, sensitivity and specificity were found to be optimized using the WT1 ratio of 20 for each disease category. Crossing the WT1 ratio of 20 was associated with sensitivities of 90% for AML, 66.7% for ALL/CML and 100% for MDS. Specificities were 86.2% for AML, 83.3% for ALL/CML and 84.6% for MDS. The average time from crossing the WT1 ratio of 20 to the onset of relapse was 50.8 days (s.d.=31.6 days) in AML, 51.5 days (s.d.=19.1 days) in ALL/CML and 42.3 days (s.d.=43.6 days) in MDS. Thus, compared with the results of the entire cohort (Table 2), the WT1 ratio provided a better sensitivity in AML and MDS patients, whereas better specificity was found for AML patients.

Model of relapse prediction using consecutive pairs of WT1 measurements

To improve the PPV rate without sacrificing sensitivity and specificity, we evaluated WT1 ratio thresholds using multiple measurements as opposed to just one WT1 ratio measurement using patient data shown in Figure 2. We evaluated patients’ consecutive pairs of measurements (immediately following each other by date of acquisition) by identifying increases in WT1 ratio levels from consecutive samples and identified patients whose WT1 ratio levels had a sum greater than or equal to a threshold of 30. With this method, the PPV of detecting impending relapse increases to 73.7% with five false-positive patients. This is an improvement over the 10 false positives found when using the single measurement WT1 ratio threshold of 20. With this method, sensitivity remained at 87.5% with 14 of the 16 patients shown in Figure 2 having two consecutive WT1 ratio measurements that totaled >30. For the 14 patients, the average time to relapse from the day when the patient’s second WT1 measurement totaled 30 was 41.8 days (s.d.=27.1).
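One possible reading of the paired-measurement rule is sketched below: a patient is flagged at the second sample of the first consecutive pair in which the WT1 ratio rose and the two ratios sum to at least 30. The exact rule used in the study may differ in detail (for example, in how ties or non-rising pairs are handled), so the function name, the handling of edge cases and the example values are all assumptions.

```python
def first_paired_positive(days, ratios, sum_threshold=30.0):
    """Day of the second sample in the first rising consecutive pair
    whose WT1 ratios sum to at least sum_threshold, else None."""
    for i in range(1, len(ratios)):
        rising = ratios[i] > ratios[i - 1]
        if rising and ratios[i - 1] + ratios[i] >= sum_threshold:
            return days[i]
    return None

# Hypothetical series: neither value reaches 20, but the rising pair sums to 31.
print(first_paired_positive([30, 60, 90, 120], [5, 4, 12, 19]))  # 120

# A single high value followed by a drop is not flagged under this reading.
print(first_paired_positive([30, 60], [25, 10]))  # None
```

Requiring a rise across two samples is what filters out isolated spikes, which is consistent with the drop from 10 to 5 false positives reported above.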

WT1 expression as a significant and independent predictor of time to relapse

We used Cox proportional hazard regression models to identify risk factors predicting time to relapse in a survival analysis. Potential predictors of time to relapse were first examined individually in univariate Cox regression models. Table 4 lists the predictors that were found to be significant in the univariate Cox models (see Materials and methods section for the complete list of variables analyzed). Crossing the WT1 ratio of 20 (hazard ratio (HR)=58.16, P<0.0001), having high disease risk at transplantation (HR=3.27, P=0.0232) or receiving alloHCT from donors with age above the median age of 34 (HR=5.124, P=0.0109) were found to significantly increase the hazard of relapse (i.e., shorten time to relapse) in the univariate analysis. In the stepwise regression analysis, crossing the WT1 ratio of 20 was the only predictor that remained significantly and independently associated with decreased time to relapse (HR=58.16, P<0.0001; Table 4).

Table 4 Predictors of time to relapse

Our secondary analysis examined WT1 ratio, risk and the interaction of the WT1 ratio and risk as predictors of time to relapse. The interaction of the WT1 ratio and risk was significant (P=0.0141), indicating that the WT1 ratio has a different effect on time to relapse in each risk group. The HR in the high-risk group was not significant (NS), indicating no detectable effect of the WT1 ratio on time to relapse (HR=1.428, P=0.7417); however, the HR in the low-risk group was significant (HR=1.007, P=0.0005). In another Cox model with WT1 ratio and log-transformed pre-alloHCT WT1 ratio as predictors of time to relapse, only the WT1 ratio was significant, whereas the log-transformed pre-alloHCT WT1 ratio was NS (data not shown).

Relationship of transcript levels of WT1 in PB and BM and association with bcr/abl

To further characterize the performance of WT1 expression as a minimal residual disease (MRD) marker, we studied the association of WT1 expression levels in PB and BM specimens. There were a total of 107 time points where both PB and BM samples were available in 61 subjects. We observed a strong positive association between PB and BM WT1 expression levels using the repeated-measures regression model (standardized coefficient=0.9311, P<0.0001). In addition, we examined the association of WT1 with the clinically available bcr/abl transcript levels of the 15 Ph+ ALL and CML patients (75 time points). There was a strong positive association between WT1 and bcr/abl positivity (standardized coefficient=0.6799, P<0.0001). These results emphasize the reliability and validity of measurement of WT1 transcripts for MRD monitoring and relapse prediction.

Discussion

Our prospective study expands on earlier observations evaluating WT1 as a marker for relapse detection in leukemia and MDS patients.2,4,7,11,12,18,20,22,23,29, 30, 31, 32, 33, 34, 35, 36, 37 In contrast to previous studies, all of the enrolled patients were adults (no pediatric patients) undergoing alloHCT after achieving CR. Our approach combines biologic measurement with statistical analytic tools to define quantitatively predictive thresholds for the onset of relapse based on a large and uniform cohort of patients. Assessment of predictive value of WT1 mRNA transcript quantitation was based on sequential measurements by qRT-PCR, an established method.38 An additional advantage to our approach is the sole use of patient PB specimens for measurements rather than relying on the more difficult to obtain BM biopsy specimen used in many previous studies.2,4,18,20,22,24,29,35, 36, 37,39,40

The long-term prognosis for patients with acute leukemia and MDS who relapse after alloHCT is very poor, with median survival of ⩽6 months, with only 25% having longer survival.41 The usefulness of our approach lies in its capacity to detect impending relapse an average of 29 days (or 63 days when estimated by interpolation) before morphologic relapse, based on crossing the WT1 ratio of 50, a relatively low ratio. While our methods provide an estimation of risk, given the relation between disease burden and outcome, achieving a 63-day lead time before morphologic relapse has potential clinical benefits as treatment regimens can be implemented or altered, such as immunotherapeutics targeting WT142,43 while the tumor burden is low.44, 45, 46

The conventional relapse definition is based on BM having >5% blasts on morphologic exam. This approach has major limitations as follows: (1) it is not quantitative; (2) it is incapable of detecting or predicting impending relapse; and (3) it is dependent on BM sampling, requiring patients to undergo invasive procedures that have greater risk compared with PB draws. Our study, similar to others published in the past decade, relied on less-invasive PB sampling to obtain quantitative PCR data for longitudinal analysis of WT1 transcripts.7,16,17,23,25,31, 32, 33, 34,47,48 Compared with conventional morphologic tests, which only alert retrospectively once relapse has already occurred, the WT1 test offers superior prediction of relapse because of its ability to uncover an earlier step in the 'evolution' of relapse.

We acknowledge that there are published reports on the association of WT1 and future relapse. However, many studies have not developed a predictive model for relapse using statistical algorithms,10,25,33, 34, 35, 36, 37,47, 48, 49 use incomplete longitudinal data or have limited clinical relevance because of BM sampling or selection of patients from inhomogeneous cohorts (in terms of therapy, remission status before alloHCT and age).2, 3, 4,31 While an early study derived a 40-day post-transplant prediction window, the underlying data that were used to generate the longitudinal analysis were incomplete and specificity was not calculated.2 Other studies used either cross-sectional or longitudinal measurements to analyze a few representative patients to derive a prediction window that is not well validated for clinical translation.20,22

Several studies have been informative and bolster our initiative, without duplication. A representative example was a study using diverse patient subgroups with acute leukemia that derived a rough estimation of prediction intervals based on a small cohort of relapses, with a minority of patients receiving allo- or autotransplant dependent on age at presentation.16 However, as the data measurements were not at comparable intervals for all patients, extracting a uniform predictive algorithm was not the objective as it was in our study. An exhaustive longitudinal study of AML patients having CR without transplant showed that WT1 transcript levels could be used as a predictive test, yet PB was far less sensitive than BM as a cell source for measurement.32 The comprehensive study sponsored by a European-wide consortium was optimally conducted, and also focused on chemotherapy-treated patients who did not receive alloHCT.7 Interestingly, our proposed WT1 ratio level of 50 as an important threshold is consistent with this large European study.7 The conclusion from all of this work is that a framework exists for using WT1 for diagnosing MRD or confirming relapse, but its use as a prognostic tool has not been fully developed. Our prospective post-HCT cohort study is a valuable addition that could be further refined and generalized for all acute leukemia patients who are at high risk of relapse, despite receiving an alloHCT.50

Our data demonstrated that when the WT1 test is at a ratio of 50, it can serve as a clinically relevant reference value imparting a clear biologic meaning as a biomarker identifying all patients who crossed this ratio as being at risk for impending relapse (PPV=100%). However, it failed to identify 25% of patients who relapsed without crossing this ratio (75% sensitivity). Our investigation seeking a stronger sensitivity level and thus a less stringent lower ratio aimed to enhance the value of the WT1 test by broadening the patient population whose relapse could be better predicted than by the highly specific WT1 ratio of 50. Even with a risk of false positives (15% for the WT1 ratio of 20), an increased sensitivity because of the identification of additional patients with impending relapse may be valuable, especially if less toxic therapies (i.e., histone deacetylase inhibitors, hypomethylating agents, proteasome inhibitors, monoclonal antibodies, bispecific antibodies) were an option to reduce risk of future relapse.51,52 Lower ratios also lead to an earlier detection of relapse, when the disease burden is minimal and therapeutic options are most effective.44, 45, 46

In summary, because of its greater sensitivity the WT1 molecular assay is an alternative to the conventional morphologic approach of enumerating blasts. Ours is the first prospective study of WT1 transcript kinetics in alloHCT recipients that establishes an observed time interval between molecular and hematologic relapse based on longitudinal analysis of data from patient blood specimens. The recognition that a WT1 measurement above a highly specific ratio is an unambiguous indicator of relapse enables the implementation of early interventional therapies. It can also guide mechanistically oriented early-phase clinical research, as it provides an earlier definition of patient clinical status. The objective is to apply WT1 transcript measurements as a diagnostic biomarker for detection of relapse in the highest risk individuals—acute leukemia and MDS patients undergoing alloHCT. Our study design could be easily replicated at other centers to confirm that WT1 is a clinically useful biomarker of relapse after alloHCT that will guide treatment decisions for such patients.