Introduction

Morphologic evaluation of the bone marrow has traditionally been central to assessing treatment response in acute myeloid leukemia (AML). Complete remission (CR) typically requires less than 5% marrow blasts on a differential count, absence of Auer rods, and recovery of absolute neutrophil and platelet counts (> 1.0 × 10⁹/L and > 100 × 10⁹/L, respectively) [1, 2]. The 5% blast threshold reflects the observation that marrow aspirates from normal subjects usually contain less than 5% blasts by morphology. However, morphologic evaluation is inadequate to predict treatment outcome, as most patients who achieve morphologic CR ultimately relapse. In contrast to the 5% sensitivity of morphologic evaluation, detection of an abnormal immunophenotype on myeloid blasts and/or of a genetic abnormality specifically associated with AML can achieve much higher sensitivity and specificity, and thus provides a more accurate and sensitive test for residual disease. Residual leukemia detected by multiparameter flow cytometry (MFC) or molecular techniques in patients in CR is called minimal residual disease or measurable residual disease (MRD). Efforts to detect MRD in AML started in the early 1990s [3, 4], but its clinical importance has gained widespread recognition only in the past several years [5,6,7,8]. The 2017 European LeukemiaNet (ELN) guidelines reflect this by separating CR, for the first time, into CR-MRDnegative and CR-MRDpositive subgroups, the latter carrying a higher risk of relapse [9].

Detection of Leukemic Blasts by Multiparameter Flow Cytometry

Protein expression on the cell surface or in the cytoplasm can be semi-quantified by MFC by measuring the binding of fluorescently labeled antibodies. The pattern of antibody binding collectively constitutes an immunophenotype, which can be unique to a particular cellular lineage, maturational stage, and/or functional state. Immunophenotypes of normal hematopoietic maturation have been well characterized [10, 11]. By identifying immunophenotypic differences between leukemic blasts and normal hematopoietic stem and progenitor cells, MFC can identify leukemic blasts in the vast majority of AML cases at both diagnosis and relapse. MFC is therefore suitable for MRD detection in almost all subtypes of AML, and general protocols and practice guidelines for MFC testing are available in several publications [12,13,14,15,16,17,18,19,20]. The detection limit of MFC in AML typically ranges from 0.1 to 0.01% (10⁻³ to 10⁻⁴) of leukocytes, depending on the type of flow cytometer used, the number of cells collected, the antibodies tested, the immunophenotypic differences between leukemic blasts and regenerating myeloid populations, and operator experience. Four types of immunophenotypic abnormality are typically detected on leukemic blasts: cross-lineage antigen expression, abnormal overexpression, abnormal loss of expression, and asynchronous expression [21, 22]. Antibodies commonly used in the evaluation of myeloid neoplasms include stem cell and progenitor markers (CD34, CD38, CD90, CD117, CD123, CD133, HLA-DR), myelomonocytic markers (CD4, CD13, CD11b, CD11c, CD14, CD15, CD16, CD33, CD36, and CD64), erythroid and megakaryocytic markers (CD41, CD42, CD61, CD71, and CD235a), and lymphoid lineage markers (CD2, CD5, CD7, CD19, and CD56). Newer markers, such as CD96 [23], CLL-1 (hMICL) [24, 25], and TIM3 [26], appear useful in evaluating leukemic stem cells.
In practice, the combination of antibodies tested depends on clinical utility, cost, operator experience, and the diagnostic approach used to identify leukemic blasts. Several antibody panels have been recommended by expert panels [12, 27] or demonstrated in clinical practice [28,29,30], but testing remains largely unstandardized. To achieve a detection sensitivity of 0.01%, typically 250,000 to 1,000,000 leukocytes are acquired for MRD testing.
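The event counts above follow from simple counting arithmetic. A minimal sketch, assuming (hypothetically) that a laboratory requires a fixed minimum number of abnormal events to call a cluster and that rare-event counts follow Poisson statistics; the cluster sizes below are illustrative assumptions, not recommendations from the cited guidelines:

```python
import math

def events_required(one_in: int, min_cluster_events: int) -> int:
    """Leukocyte events to acquire so that a population present at a
    frequency of 1/one_in is expected to yield min_cluster_events events."""
    return one_in * min_cluster_events

def poisson_cv(expected_events: float) -> float:
    """Coefficient of variation of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(expected_events)

# 0.01% sensitivity corresponds to 1 abnormal cell in 10,000 leukocytes.
for cluster in (25, 50, 100):  # hypothetical minimum cluster sizes
    n = events_required(10_000, cluster)
    print(f"{cluster:>3}-event cluster at 0.01%: {n:,} leukocytes, "
          f"counting CV ~ {poisson_cv(cluster):.0%}")
```

Requiring 25 to 100 abnormal events reproduces the 250,000 to 1,000,000 leukocyte range quoted above; larger clusters cost more acquisition time but reduce the relative counting error.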

In general, leukemic blasts are identified by one of two related operational approaches (Table 1). The first uses a large number of antibodies to characterize pretreatment leukemia-associated immunophenotypes (LAIPs) that are not significantly present in normal marrow; after therapy, selected antibody combinations that best represent these LAIPs are used to identify any cells with an immunophenotype identical to the pretreatment LAIPs [30,31,32]. The maximum sensitivity of each LAIP is limited by its background frequency in non-leukemic specimens, which is typically between 0.01 and 0.1%. This LAIP approach has a predefined sensitivity threshold for each LAIP and provides consistent post-test data analysis, but it requires highly harmonized testing protocols [19] and consensus LAIPs when multiple laboratories perform testing [27]. More importantly, the approach is not effective if LAIPs or normal background populations change significantly after therapy, which is not uncommon in AML [33]. In addition to earlier studies showing feasibility [30, 32, 34, 35], the utility of the LAIP approach using 4-color flow cytometry has been demonstrated in clinical studies including DCOG ANLL97/MRC AML12 [36], childhood AML02 [37, 38], MRC AML16 [39], HOVON/SAKK AML 42A [40], and AMLCG [41].
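One way to turn the background frequency of a LAIP into a per-LAIP positivity cutoff is sketched below. The rule used here (mean plus k standard deviations of the LAIP frequency in non-leukemic reference marrows, with k = 3) and the reference values are assumptions for illustration; laboratories define their own criteria.

```python
from statistics import mean, stdev

def laip_threshold(background_fractions, k=3.0):
    """Cutoff for one LAIP: mean + k * SD of its frequency (fraction of
    leukocytes) in non-leukemic reference specimens. k is an assumption."""
    return mean(background_fractions) + k * stdev(background_fractions)

def is_laip_positive(laip_events, total_leukocytes, background_fractions, k=3.0):
    """True if the post-treatment LAIP frequency exceeds the background cutoff."""
    frequency = laip_events / total_leukocytes
    return frequency > laip_threshold(background_fractions, k)

# Hypothetical frequencies of one LAIP in five non-leukemic marrows.
normals = [0.0002, 0.0004, 0.0003, 0.0005, 0.0001]
print(f"cutoff = {laip_threshold(normals):.4%} of leukocytes")
print(is_laip_positive(400, 500_000, normals))  # 0.080% of leukocytes
print(is_laip_positive(300, 500_000, normals))  # 0.060% of leukocytes
```

Because the cutoff is tied to each LAIP's own background, a LAIP with a noisier normal counterpart receives a higher cutoff and therefore a lower sensitivity.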

Table 1 Comparison of two MRD detection approaches using MFC

The second approach, termed difference-from-normal (DfN), identifies blast populations whose immunophenotype differs significantly from that of normal myeloid stem and progenitor cell populations (Fig. 1). It therefore does not require predetermined LAIPs and uses the same antibody panel for diagnosis and MRD detection [42, 43]. This approach is more specific because it takes population density and distribution into account to distinguish background noise from a true abnormal population, and it is resilient to immunophenotypic shift. The DfN approach is also more practical in tertiary hospitals or reference laboratories, where the LAIPs identified at diagnosis may not be available. However, it requires extensive knowledge of immunophenotypic patterns in normal myeloid maturation and can therefore lead to greater interobserver variation, which can be reduced if the pretreatment LAIP is available. The utility of the DfN approach has been demonstrated in the Children's Oncology Group study AAML03P1 [29] and in several studies at the Fred Hutchinson Cancer Research Center/University of Washington [44,45,46,47,48,49]. As higher-order multicolor flow cytometry (≥ 8 simultaneous antigens) is increasingly adopted in clinical laboratories, it becomes more feasible to apply a suitably informative fixed antibody panel at diagnosis and after treatment; combined with greater flexibility in defining leukemic populations during analysis, this can potentially integrate the two approaches.
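In practice, DfN analysis is performed by an experienced analyst who compares each population against known patterns of normal maturation across many marker combinations. The toy sketch below only illustrates the underlying idea of scoring events by how far they fall from a normal reference distribution. It substitutes a Mahalanobis distance in a single two-marker projection, which is an assumption of this example rather than the published method; the marker intensities, reference events, and cutoff are hypothetical.

```python
import math
from statistics import mean

def fit_reference(points):
    """Mean and 2x2 sample covariance of a normal reference population,
    given as (marker1, marker2) intensity pairs."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis(point, ref):
    """Distance of one event from the reference, scaled by its covariance."""
    (mx, my), (sxx, sxy, syy) = ref
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy ** 2
    # Quadratic form d^T * inverse(cov) * d, with the 2x2 inverse written out.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)

def flag_abnormal(events, ref, cutoff=4.0):
    """Events lying outside the normal reference region (cutoff is hypothetical)."""
    return [e for e in events if mahalanobis(e, ref) > cutoff]

# Hypothetical log-scale CD13 and HLA-DR intensities of normal CD34+ progenitors.
normal = [(3.0, 3.0), (3.2, 3.1), (2.8, 2.9), (3.1, 2.8),
          (2.9, 3.2), (3.0, 3.1), (3.1, 3.0), (2.9, 2.9)]
ref = fit_reference(normal)
# The second event mimics blasts with loss of CD13 and HLA-DR, as in Fig. 1.
print(flag_abnormal([(3.05, 2.95), (1.0, 1.0)], ref))
```

Using the covariance rather than a fixed box is one simple way to let the shape of the normal population, not just its center, define what counts as different from normal.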

Fig. 1

An example of AML MRD detection using a difference-from-normal approach. MFC dot plots show AML MRD in a background of marrow regeneration. Cells in the progenitor region defined by CD45 and side scatter gating are displayed as “Blasts”. Subpopulations of maturing hematopoietic cells are color-coded as follows: aqua, hematogones; green, granulocytes; magenta, monocytes; orange, regenerating CD34+ myeloid progenitors; and red, leukemic blasts (0.1% of total leukocytes). The CD117-positive leukemic blasts show increased CD33 and abnormally decreased to absent expression of CD13, CD15, and HLA-DR, an immunophenotype without a normal counterpart.

Irrespective of the approach, the studies above have consistently demonstrated that MRD detected by MFC at any time point after induction therapy is a significant risk factor for relapse. In particular, (1) MRD detected later in therapy has a higher positive predictive value for outcome, and MRD negativity achieved early after induction and maintained after consolidation has a higher negative predictive value [29, 37, 39,40,41]; (2) post-induction MRD positivity, even when reduced or cleared after consolidation, is still associated with a higher risk of relapse [29, 36, 41]; (3) hematopoietic stem cell transplant alone cannot effectively neutralize the risk conferred by MRD [44,45,46, 49]; and (4) MRD detected by MFC is present in only approximately 50% of patients who eventually relapse, highlighting the limitations of the methodology in its current form.
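The positive and negative predictive values referred to above come from a 2×2 table of MRD status against relapse. A minimal sketch with a hypothetical cohort (the counts are invented for illustration and are not taken from the cited studies), chosen so that MFC detects MRD in half of the patients who relapse, as in point (4):

```python
def predictive_values(tp, fp, fn, tn):
    """PPV, NPV and sensitivity of MRD status for relapse.

    tp: MRD-positive, relapsed     fp: MRD-positive, no relapse
    fn: MRD-negative, relapsed     tn: MRD-negative, no relapse
    """
    ppv = tp / (tp + fp)          # probability of relapse given MRD+
    npv = tn / (tn + fn)          # probability of no relapse given MRD-
    sensitivity = tp / (tp + fn)  # share of relapsing patients who were MRD+
    return ppv, npv, sensitivity

# Hypothetical cohort of 100 patients in CR, 40 of whom relapse.
ppv, npv, sens = predictive_values(tp=20, fp=5, fn=20, tn=55)
print(f"PPV {ppv:.0%}, NPV {npv:.0%}, sensitivity {sens:.0%}")
```

Even with a high PPV, the 20 MRD-negative patients who relapse keep the NPV well below 100%, which is the practical consequence of the limited sensitivity noted in point (4).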

Detection of AML-Associated Genetic Abnormalities by Quantitative Reverse Transcription Polymerase Chain Reaction

AML is driven by heterogeneous genetic abnormalities, and the presence of AML-specific genetic abnormalities after treatment can serve as a surrogate marker for residual disease. The most commonly used molecular technique in AML MRD testing is real-time quantitative reverse transcription polymerase chain reaction (RT-qPCR), which quantifies mRNA transcripts specifically expressed in AML. Detailed RT-qPCR protocols are available [50, 51]. The test consists of two steps: total RNA extracted from a specimen is first converted to cDNA by reverse transcription; then the quantity of specific cDNA is measured in a real-time qPCR reaction using oligonucleotides that specifically hybridize to the sequences of interest. The copy number of the tested cDNA is calculated against a standard curve and then normalized to the transcripts of a housekeeping (control) gene to correct for differences in sample loading. ABL1 is the most commonly used control gene because its expression is relatively constant in normal and leukemic cells [52]. The RT-qPCR result is usually expressed as the copy number of the tested transcript as a percentage of the copy number of ABL1, and measurements made before and after treatment are typically compared as changes on a log scale. The combination of PCR amplification and overexpression of the tested transcripts in leukemia makes RT-qPCR analytically the most sensitive technique for AML MRD detection. It has been applied mostly to AML with recurrent chromosomal translocations, NPM1 mutation, and WT1 overexpression.
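The quantitation steps above (standard curve, normalization to ABL1, and log-scale comparison) can be sketched as follows. The standard-curve slope and intercept and the Ct values are hypothetical; in practice each target and ABL1 has its own plasmid standard curve, and a single curve is shared here only to keep the example short.

```python
import math

def copies_from_ct(ct, slope, intercept):
    """Invert a standard curve Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)

def normalized_percent(target_copies, abl1_copies):
    """Target transcript copies as a percentage of ABL1 copies."""
    return 100.0 * target_copies / abl1_copies

def log_reduction(baseline_percent, followup_percent):
    """Log10 fall in the normalized level from diagnosis to follow-up."""
    return math.log10(baseline_percent / followup_percent)

# Hypothetical curve; a slope near -3.32 corresponds to ~100% PCR efficiency.
SLOPE, INTERCEPT = -3.32, 40.0

dx = normalized_percent(copies_from_ct(20.08, SLOPE, INTERCEPT),   # ~1e6 target
                        copies_from_ct(23.40, SLOPE, INTERCEPT))   # ~1e5 ABL1
fu = normalized_percent(copies_from_ct(33.36, SLOPE, INTERCEPT),   # ~1e2 target
                        copies_from_ct(23.40, SLOPE, INTERCEPT))   # ~1e5 ABL1
print(f"diagnosis {dx:.1f}%, follow-up {fu:.3f}%, "
      f"reduction {log_reduction(dx, fu):.1f} log")
```

The same log-reduction arithmetic underlies the 3-log and 4-log cutoffs discussed in the following sections.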

Detection of Gene Fusion Transcripts in AML-Associated Recurrent Chromosomal Translocations

Approximately 15–20% of AML cases harbor recurrent chromosomal translocations, including t(8;21)(q22;q22), inv(16)(p13;q22), and t(15;17)(q22;q21). Efforts to monitor treatment response in AML using the resulting gene fusion transcripts started in the 1990s [4, 53,54,55,56,57,58]. In 2003, the Europe Against Cancer program published standardized RT-qPCR protocols for gene fusion transcripts for residual disease detection in acute leukemia [51]. The standardized tests for t(8;21)(q22;q22) (RUNX1-RUNX1T1) and inv(16)(p13;q22) (CBFB-MYH11) have a lower limit of quantitation of 10 copies of the fusion transcript per reaction and a lower limit of detection of approximately 0.001% (fusion transcript copies/ABL1 copies), depending on the expression level of the tested fusion transcript. The clinical utility of RT-qPCR testing for RUNX1-RUNX1T1 and/or CBFB-MYH11 after induction/consolidation has been demonstrated in CETLAM/LAM-99 [59], the German-Austrian AML Study Group [60], MRC AML-15 [61], AML05 [62], and CBF-2006 [63, 64]. These studies show consistent findings: (1) a less than 3-log reduction in fusion transcripts in the bone marrow at the end of induction/intensification is the most important independent risk factor for relapse; (2) persistence of low-level fusion transcripts in the bone marrow after therapy is not associated with an increased risk of relapse; and (3) molecular relapse in peripheral blood (PB) after treatment is associated with a high risk of hematologic relapse, with a median interval of approximately 4–5 months. Furthermore, molecular relapse in the bone marrow detected in the first 3 months after allogeneic hematopoietic stem cell transplant is an independent risk factor for relapse in patients with AML with t(8;21) [65].
Notably, fusion transcript levels measured by RT-qPCR do not correlate with the morphologic blast percentage [66], and there is no correlation between the kinetics of fusion transcript reduction and the risk of relapse [60].
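The two thresholds in this section (the 10-copy lower limit of quantitation and the 3-log reduction at the end of induction/intensification) can be combined into a simple reporting rule. The sketch below is hypothetical: the category labels and the decision order are assumptions for illustration, not the Europe Against Cancer reporting scheme.

```python
import math

LOQ_COPIES = 10  # lower limit of quantitation per reaction, as stated in the text

def classify_followup(fusion_copies, followup_percent, baseline_percent):
    """Categorize one post-induction/intensification marrow sample.

    fusion_copies: fusion transcript copies measured in the reaction
    followup_percent, baseline_percent: fusion/ABL1 (%) now and at diagnosis
    """
    if fusion_copies == 0:
        return "not detected"
    if fusion_copies < LOQ_COPIES:
        return "detected, not quantifiable"  # below LoQ: no log reduction computed
    reduction = math.log10(baseline_percent / followup_percent)
    return ">=3-log reduction" if reduction >= 3 else "<3-log reduction"

for copies, level in [(0, 0.0), (5, 0.002), (50, 0.1), (200, 2.0)]:
    print(copies, level, classify_followup(copies, level, baseline_percent=500.0))
```

Checking the LoQ before computing a reduction avoids reporting a log value derived from a copy number too low to be quantified reliably.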

Detection of Mutated NPM1 in AML Associated with NPM1 Mutation

NPM1 mutation occurs in approximately 25–30% of adult AML and consists of recurrent frameshift insertions in exon 12 that disrupt the nucleolar localization signal at the NPM1 C-terminus [67]. Because NPM1 mutation is retained at relapse in more than 95% of NPM1-mutated AML, it is a suitable target for MRD testing [68, 69]. An allele-specific qPCR test for mutated NPM1 was first developed in 2006; it has a lower limit of quantitation of 10 plasmid copies and can detect mutated NPM1 at 10⁻⁴ to 10⁻⁵ in genomic DNA and at 10⁻⁵ to 10⁻⁶ in transcripts [70]. Although the clinical utility of testing for mutated NPM1 in genomic DNA has been demonstrated [71], testing for mutated NPM1 transcripts has been preferred in several studies, including the German Study Groups [68], German-Austrian AML Study Group [72], Study Alliance Leukemia [73], AMLCG [74], NCRI AML17 [69], and ALFA-0702 [75]. These studies demonstrate that (1) the presence of mutated NPM1 (≥ 1% mutated NPM1/ABL1) in the bone marrow at the end of therapy is associated with a significantly higher risk of relapse [73]; (2) the absence of detectable mutated NPM1 in the bone marrow at the end of induction predicts a significantly lower risk of relapse [72]; (3) a less than 4-log reduction, or a level > 0.1% (mutated NPM1/ABL1), of mutated NPM1 in peripheral blood after induction therapy is more predictive than testing peripheral blood at any other time point or bone marrow at any time point during therapy [69, 75]; and (4) a significant rise of mutated NPM1 in peripheral blood after completion of therapy predicts relapse, with a median interval of approximately 3 months [69, 72].
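Point (4) implies serial peripheral blood monitoring with a rule for what counts as a significant rise. A minimal sketch, assuming (hypothetically) that a rise is either conversion from undetectable to detectable or an increase of at least 1 log10 between consecutive positive samples; the cited studies may define the trigger differently.

```python
import math

def first_significant_rise(series, min_log_rise=1.0):
    """Index of the first sample showing a significant rise, else None.

    series: serial mutated NPM1/ABL1 values (%) in peripheral blood,
            in time order; 0 means not detected.
    A rise is a conversion from 0 to a positive value, or an increase of
    at least min_log_rise log10 over the immediately preceding sample
    when that sample is also positive.
    """
    for i in range(1, len(series)):
        prev, cur = series[i - 1], series[i]
        if prev == 0 and cur > 0:
            return i
        if prev > 0 and cur > 0 and math.log10(cur / prev) >= min_log_rise:
            return i
    return None

print(first_significant_rise([0.0, 0.0, 0.002, 0.005]))  # conversion
print(first_significant_rise([0.01, 0.012, 0.2]))        # >1-log rise
print(first_significant_rise([0.01, 0.02, 0.03]))        # stable low level
```

Comparing consecutive samples, rather than each sample against a fixed cutoff, is what allows a rising trend to be recognized while absolute levels are still low.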

Detection of Abnormal WT1 Gene Overexpression

Aberrant gene expression detected after treatment can also be used as a surrogate marker of residual disease. The most extensively studied gene is WT1, which encodes a zinc finger transcription factor overexpressed in a subset of AML cases at levels approximately 10³ times higher than in normal bone marrow (BM) and more than 10⁵ times higher than in normal peripheral blood [76]. WT1 expression is typically measured by RT-qPCR, with a sensitivity ranging from 10⁻² to 10⁻⁴ in a dilution study of AML cell lines [77]. Using a standardized assay within the European LeukemiaNet, Cilloni and colleagues demonstrated WT1 expression above normal background at AML diagnosis in 86% of bone marrow and 91% of peripheral blood samples; however, WT1 expression was elevated enough to allow detection of at least a 2-log reduction after treatment in only 13% of marrow and 46% of peripheral blood samples [77]. Although several studies in AML have demonstrated that residual WT1 expression above background after therapy is associated with a higher risk of relapse [78,79,80,81], the value of WT1-based MRD testing remains debated because of its limited sensitivity and specificity.
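The informativeness criterion above reduces to comparing two logarithms: a 2-log reduction can be measured only if diagnostic WT1 expression sits at least 2 logs above the normal background of the same tissue. A minimal sketch; the normalized expression values are hypothetical:

```python
import math

def measurable_log_span(diagnostic_level, normal_background):
    """Log10 distance between diagnostic WT1 expression and normal
    background; a larger reduction cannot be distinguished from background."""
    return math.log10(diagnostic_level / normal_background)

def allows_two_log_reduction(diagnostic_level, normal_background):
    return measurable_log_span(diagnostic_level, normal_background) >= 2

# Hypothetical normalized WT1 levels (same units for both arguments).
print(allows_two_log_reduction(5_000, 250))    # span ~1.3 log
print(allows_two_log_reduction(40_000, 250))   # span ~2.2 log
```

A lower normal background widens the measurable span for a given diagnostic level, which is why the tissue tested matters.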

Digital PCR

Real-time qPCR is simple and sensitive but requires a standard curve for quantitation. Comparing results generated by different laboratories, or at the same laboratory over different time periods, requires considerable effort to standardize the test protocol [66, 82]. In addition, real-time qPCR is vulnerable to background noise generated by nonspecific primer cross-hybridization. These technical shortcomings can be overcome by digital PCR (dPCR), a technique derived from qPCR [83] and recently adopted by a few clinical molecular laboratories [84]. Instead of the bulk reaction used in conventional (analog) qPCR, the dPCR reaction is divided into thousands to millions of partitions in microfluidic chambers or water-in-oil emulsion droplets, each containing the necessary reagents and, ideally, zero or one template molecule; because molecules are distributed randomly, some partitions receive more than one, which is corrected for using Poisson statistics. After the reactions are complete, the number of partitions whose fluorescence from labeled PCR products exceeds a threshold is counted. Because dPCR uses endpoint detection of the amplified product to count the absolute number of template molecules, amplification efficiency is less of a concern, and plasmid standards or calibration curves are not necessary. The digital nature of the measurement also improves precision near the lower limit of detection by reducing low-level noise from nonspecific cross-hybridization.
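The Poisson correction is what allows dPCR to report absolute concentrations without a standard curve. A minimal sketch; the partition count and the 0.85 nL partition volume are hypothetical inputs, and instrument software typically performs this calculation internally:

```python
import math

def mean_copies_per_partition(positive, total):
    """Poisson estimate of template copies per partition.

    With random loading, P(partition empty) = exp(-lam), so
    lam = -ln(fraction of negative partitions)."""
    if positive >= total:
        raise ValueError("all partitions positive: sample too concentrated")
    return -math.log(1 - positive / total)

def copies_per_microliter(positive, total, partition_volume_nl):
    lam = mean_copies_per_partition(positive, total)
    return lam / (partition_volume_nl * 1e-3)  # 1 nL = 1e-3 uL

positive, total = 2_000, 20_000
lam = mean_copies_per_partition(positive, total)
print(f"naive count {positive}, Poisson-corrected {lam * total:.0f} copies")
print(f"{copies_per_microliter(positive, total, 0.85):.1f} copies/uL")
```

The corrected total (about 2,107 copies) exceeds the 2,000 positive partitions because some partitions held two or more templates; the correction grows as the positive fraction rises.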

In the detection of BCR-ABL1 fusion transcripts, dPCR has a lower limit of detection close to 0.001% on the International Scale (IS), comparable to conventional RT-qPCR [85, 86]. dPCR has also been explored for detecting AML hot-spot mutations in DNMT3A and IDH1/2, with a detection sensitivity of 10⁻³ mutant allele frequency [87, 88], and for a wide variety of mutated NPM1 transcript subtypes, with a detection sensitivity of 10⁻⁵ [89, 90]. However, the application of dPCR to AML MRD detection is still at an early stage of development, and its utility relative to next generation sequencing technologies remains unclear.
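For mutation detection, a duplex assay typically reads mutant-specific and wild-type-specific probes in two fluorescence channels of the same partitions, and the mutant allele frequency is the ratio of the Poisson-corrected concentrations. A minimal sketch with hypothetical partition counts, treating each channel independently; in practice the lower limit of detection is also bounded by false-positive partitions observed in wild-type-only controls.

```python
import math

def poisson_lambda(positive, total):
    """Mean copies per partition from the fraction of negative partitions."""
    return -math.log(1 - positive / total)

def mutant_allele_fraction(mut_positive, wt_positive, total):
    """Mutant copies / (mutant + wild-type copies), each channel corrected
    separately for partitions that received more than one molecule."""
    mut = poisson_lambda(mut_positive, total)
    wt = poisson_lambda(wt_positive, total)
    return mut / (mut + wt)

# Hypothetical run: 20,000 partitions, 12 mutant-positive, 10,000 wild-type-positive.
maf = mutant_allele_fraction(12, 10_000, 20_000)
print(f"mutant allele frequency ~ {maf:.1e}")
```

With only 12 mutant-positive partitions, Poisson sampling noise on that count is already close to 30% (1/√12), one reason detection near 10⁻³ requires many partitions or replicate wells.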

Detection of AML-Associated Genetic Abnormalities by Next Generation Sequencing

Quantitative PCR-based MRD detection requires a consistent abnormal sequence for hybridization with the corresponding oligonucleotide probe. Although such tests are analytically sensitive and specific, they are suitable only for the less than 50% of AML cases associated with recurrent gene fusions or mutations. With the recent characterization of the comprehensive genetic landscape of AML [91,92,93], next generation sequencing (NGS) has been explored for monitoring response in AML after therapy [94,95,96,97,98,99,100,101]. In principle, NGS-based MRD detection is similar to PCR-based MRD detection, except that genetic abnormalities are detected directly by DNA resequencing, providing increased specificity. In an NGS MRD test, DNA fragments from the regions of interest are captured and amplified by PCR, the PCR products are sequenced in a massively parallel fashion, and the reads are aligned and compared with the expected reference sequences. The fraction of reads carrying a specific abnormal sequence out of all reads covering the tested region is commonly expressed as the variant allele fraction (VAF) and corresponds to the level of underlying disease. Unlike quantitative PCR, NGS does not require oligonucleotides that hybridize specifically to a particular variant; thus, in theory, NGS can detect any sequence variation in the tested regions and permits parallel testing of multiple genetic abnormalities. The advantage of NGS in AML MRD testing is best demonstrated in the detection of mutated NPM1 exon 12 [94, 95]: the test has a linear dynamic range and a lower limit of detection comparable to qPCR, and it can detect all subtypes of NPM1 mutation without allele-specific oligonucleotide probes.
However, mutated NPM1 likely represents a best-case scenario for NGS-based AML MRD testing: all NPM1 mutation subtypes consist of ≥ 4-base-pair insertions in exon 12, so such a variant is exceedingly unlikely to arise by chance from sequencing error, and NPM1 mutation is a leukemia-driver event occurring late in leukemogenesis.
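The role of sequencing error in setting a detection floor can be illustrated with a simple binomial model: if each read independently carries the same erroneous base at a fixed per-base rate, how likely is it that error alone produces the observed number of variant reads? This is a simplified model assumed for illustration; real error profiles depend on sequence context and position, and the depth and 0.1% error rate below are hypothetical.

```python
from math import exp, lgamma, log

def log_binom_pmf(k, n, p):
    """Natural log of the Binomial(n, p) probability of exactly k events."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def error_tail_probability(alt_reads, depth, error_rate):
    """P(at least alt_reads variant reads at this depth from error alone)."""
    total = 0.0
    for i in range(alt_reads, depth + 1):
        term = exp(log_binom_pmf(i, depth, error_rate))
        total += term
        # Terms shrink beyond the mode (about depth * error_rate); stop once negligible.
        if i > depth * error_rate and term <= total * 1e-15:
            break
    return total

depth, error_rate = 10_000, 0.001  # hypothetical coverage and per-base error
for alt in (12, 30):
    print(f"VAF {alt / depth:.2%}: P(error only) = "
          f"{error_tail_probability(alt, depth, error_rate):.2g}")
```

At this depth and error rate, 12 variant reads (0.12% VAF) are compatible with error alone, whereas 30 reads (0.30% VAF) are not; lowering the effective error rate, for example with the molecular barcoding described next, lowers the VAF at which this separation occurs.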

NGS as a platform for AML MRD testing faces three critical challenges: (1) technically, distinguishing random sequencing error from true genetic abnormality; (2) clinically, distinguishing genetic abnormalities in a pre-leukemic background [102,103,104] from those present in leukemia; and (3) confounding by dynamic changes in leukemic clonal heterogeneity during disease progression [104, 105]. The first challenge has largely been addressed through molecular barcoding. The sequencing error rate of NGS is approximately 0.05 to 1%, and 2% VAF is the commonly accepted limit of detection. By using molecular barcodes to label each individual molecule in the starting material, the sensitivity of NGS for detecting rare mutations can be significantly improved [106,107,108]. Recent studies have demonstrated that multiplexed NGS testing with molecular barcodes allows simultaneous detection of mutations in several hundred amplicons with a sensitivity ranging from 0.1 to 0.001% VAF [101, 109, 110]. The wide variation in detection sensitivity may be due in part to variable tagging efficiency of the molecular barcodes and to the balance between test multiplexity and depth of sequencing coverage.
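Molecular barcoding works because every read derived from the same starting molecule carries the same barcode (often called a unique molecular identifier, UMI), so a random error in one read can be outvoted by its siblings. A simplified single-strand consensus sketch at one genomic position follows; the family-size and agreement thresholds and the reads are hypothetical, and production pipelines also handle barcode errors, strand information, and alignment.

```python
from collections import Counter, defaultdict

def consensus_calls(reads, min_family_size=3, min_agreement=0.7):
    """Collapse (barcode, base) observations into one base per molecule.

    Families with fewer than min_family_size reads, or whose most common
    base is supported by less than min_agreement of reads, are dropped.
    """
    families = defaultdict(list)
    for barcode, base in reads:
        families[barcode].append(base)
    calls = {}
    for barcode, bases in families.items():
        if len(bases) < min_family_size:
            continue
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            calls[barcode] = base
    return calls

def vaf(bases, alt):
    """Fraction of observations equal to the alternate base."""
    bases = list(bases)
    return sum(b == alt for b in bases) / len(bases) if bases else 0.0

# Hypothetical reads at one position: one true mutant molecule (AAA) and
# wild-type molecules carrying scattered single-read errors (C -> T).
reads = ([("AAA", "T")] * 4
         + [("CCC", "C")] * 3 + [("CCC", "T")]
         + [("GGG", "C")] * 4 + [("TTT", "C")] * 4
         + [("ACG", "C"), ("ACG", "T")])
raw = vaf((b for _, b in reads), "T")
consensus = vaf(consensus_calls(reads).values(), "T")
print(f"raw read VAF {raw:.3f}, consensus molecule VAF {consensus:.3f}")
```

Here the consensus VAF (0.250, one mutant molecule among four retained) is lower than the raw read VAF (0.333), because the single-read errors in wild-type families are outvoted or their under-sized family is discarded.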

Compared with the technical solutions for minimizing sequencing error, interpreting the clinical significance of mutations detected in AML after therapy is more challenging. Some patients clear essentially all mutations at remission, others clear only a subset, and in still others mutations persist in virtually all cells despite normal morphology [98]. Leukemia-initiating mutations, especially those in epigenetic modifier genes (DNMT3A, TET2, ASXL1, IDH1/2), can persist in stable remission, whereas mutations in NPM1 or in genes of proliferative signaling pathways usually disappear in patients in stable remission. Nevertheless, detection of leukemia-associated mutations in more than 5% of bone marrow cells [98], or of more than one leukemia-associated mutation in more than 0.8% of bone marrow cells (> 0.4% VAF) [101], is associated with an increased risk of relapse. This finding is similar to that seen with standard cytogenetic techniques [111]. Despite these promising early results, the significance of residual leukemia-associated mutations after AML therapy remains to be demonstrated in a large study using a high-sensitivity, comprehensive NGS assay.
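The conversion between VAF and the fraction of cells carrying a mutation, used in the 0.4% VAF to 0.8% of cells correspondence above, assumes a heterozygous mutation at a diploid locus. A minimal sketch that makes this assumption explicit; copy-number changes, loss of heterozygosity, or X-linked genes in male patients change the conversion factor:

```python
def cell_fraction_from_vaf(vaf, mutant_copies=1, locus_copies=2):
    """Fraction of cells carrying a variant, assuming every mutant cell has
    mutant_copies mutant alleles out of locus_copies copies of the locus.
    Defaults describe a heterozygous mutation at a diploid locus."""
    return min(1.0, vaf * locus_copies / mutant_copies)

print(cell_fraction_from_vaf(0.004))                   # heterozygous, diploid
print(cell_fraction_from_vaf(0.004, locus_copies=1))   # hemizygous locus
```

Capping the result at 1.0 guards against impossible cell fractions when the copy-number assumption is wrong for a given sample.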

Perspectives

Evidence accumulated over the last 25 years has firmly established that the presence of MRD after induction and/or consolidation is a significant risk factor for relapse in AML. The recent ELN recommendation to establish CR-MRDnegative as a separate category of treatment response is a major step toward integrating MRD testing into standard clinical practice. The slow acceptance of MRD testing as standard of care is due in part to a lack of standardized methodology and guidelines for MRD assessment, especially regarding the clinically relevant detection sensitivity, the optimal methods of evaluation, and the timing of MRD testing. MRD-focused assessment of treatment response will likely require an integrated approach combining immunophenotyping and molecular detection techniques (Table 2), with evaluation extending beyond the single time point of hematologic recovery recommended by current NCCN guidelines.

Table 2 Highlights of AML MRD detection

Molecular techniques, such as RT-qPCR, offer the highest analytical sensitivity and specificity for detecting genetic biomarkers of leukemic cells, but technical sensitivity may not translate directly into clinical prognostic utility. In core-binding factor AML, fusion transcript levels at diagnosis do not correlate with the level of leukemic blasts, because leukemic blasts can undergo myelomonocytic maturation to varying degrees. Although maturing leukemic populations may increase the sensitivity of MRD detection, they can also complicate clinical interpretation, particularly when the signal is confined to maturing cells that lack intrinsic leukemic potential. In one study of pediatric AML, molecular MRD in core-binding factor AML was largely uninformative after adjustment for MRD detected by MFC [38]. On the other hand, MFC directly detects the immunophenotypic signature of leukemic blasts and is a more direct and integrated measurement of the underlying AML, albeit less sensitive than molecular assays; it therefore provides a higher positive predictive value for the likelihood of, and time interval to, relapse. Unlike MRD detected by RT-qPCR, MRD detected by MFC with a DfN approach has no clear clinical threshold, as all patients with detectable MRD are at higher risk of relapse [47]. Growing evidence also suggests that AML patients in CR with MRD detected by MFC carry a risk of disease progression similar to that of patients in partial response (PR) [49][EHA 2017 abstract 3496]. Indeed, it may be prudent to classify MRD associated with impending relapse, such as higher-level MRD detected by MFC, as partial response-MRD (PR-MRD), in contrast to low-level molecular MRD, which is usually associated with a longer interval to relapse. In this context, additional studies are needed to better understand the positive and negative predictive values of MRD detected by different techniques during and after therapy.
Such knowledge will become more important once high-sensitivity NGS panels are implemented for MRD detection. In particular, many recurrent mutations detected in AML at diagnosis, such as hot-spot mutations, also accumulate in age-related clonal hematopoiesis and may be observed in AML patients in stable remission. These are considered founder mutations arising in pre-leukemic clones, which are insensitive to conventional chemotherapy but can be eliminated by allogeneic stem cell transplant. Nevertheless, evidence suggests that persistence of founder mutations, such as those in DNMT3A, is associated with a higher risk of relapse after therapy. Distinguishing persistent leukemia that has survived induction therapy from pre-leukemic clones with a higher potential for secondary malignant transformation will be critical in evaluating treatment response and will likely require a multimodality approach using MFC and high-sensitivity molecular techniques, novel single-cell molecular approaches, or both.

The current NCCN guidelines recommend bone marrow assessment after induction at the time of hematologic recovery. This morphology-focused schedule is not ideal for monitoring therapeutic response. Studies using MFC and RT-qPCR have consistently demonstrated that (1) sustained absence of MRD after induction and at the end of therapy is more predictive of stable remission; (2) the presence of MRD at the end of therapy is more predictive of relapse; and (3) clearance after induction of MRD detected by MFC may not effectively neutralize the risk of relapse. These findings indicate that MRD assessment is best performed at multiple time points, at minimum at the end of first induction and at the end of therapy. In addition, the high sensitivity of molecular techniques allows MRD monitoring of peripheral blood after completion of therapy, an approach whose utility has already been demonstrated in core-binding factor AML and in AML with mutated NPM1. Although MRD testing of peripheral blood is less sensitive than testing of bone marrow, the ease of obtaining specimens allows more frequent monitoring, so peripheral blood testing is likely to play a larger role going forward.

Conclusion

In summary, the recent ELN recommendation to establish CR-MRDnegative as a separate category of treatment response represents a milestone in integrating MRD research into clinical practice. In addition, RT-qPCR and the advent of NGS-based techniques promise high-sensitivity molecular MRD detection for the large majority of AML cases. Together with MFC immunophenotyping, comprehensive MRD evaluation during and after therapy is already providing much improved clinical assessment of treatment response and will play an increasingly important role in guiding disease management.