Introduction

Acute myeloid leukemia (AML) that develops from a pre-existing hematologic disease, rather than arising de novo, is known as secondary AML (sAML). A number of hematologic malignancies can progress to sAML, including but not limited to myelodysplastic syndromes (MDSs), chronic myelomonocytic leukemia and atypical chronic myeloid leukemia.1, 2, 3 Roughly one-third of patients diagnosed with MDSs or other related hematologic malignancies (henceforth collectively referred to as MDS) progress to sAML.1 MDS are heterogeneous clonal hematopoietic disorders characterized by dysplastic changes in one or more cellular lineages, causing impaired bone marrow function.1 The transformation from a normal stem cell into a preleukemic or leukemic state involves the accumulation of genetic abnormalities.

High-throughput sequencing technology has led to a number of discoveries in AML biology, including the existence of recurrently mutated genes, their prognostic value and their use in genomic classification.4, 5 For example, commonly mutated genes and the biological pathways they affect have been identified in a number of different tumors, including AML.6, 7 The Cancer Genome Atlas (TCGA) consortium compiled a list of the eight most commonly mutated pathways in de novo AML, which together account for over 99% of the adult AML cases in their cohort.6 In line with this, a number of recent studies have also improved our understanding of the pathophysiology of MDS and its prognostic factors.8, 9, 10, 11, 12 In particular, studies using next-generation sequencing have identified commonly mutated genes with prognostic value, as well as extensive genetic heterogeneity that accounts for much of the clinical heterogeneity of MDS.10, 11, 12, 13, 14, 15, 16, 17, 18 Mutations in genes affecting the splicing machinery and chromatin modifiers have been shown to be overrepresented in sAML compared with de novo AML or therapy-related AML.19

Although a number of studies have investigated the patterns of somatic mutations in various myeloid neoplasms, the complete molecular and genetic characteristics of sAML remain largely unclear. Genes involved in splicing machinery, DNA methylation and chromatin modification have been shown to be commonly mutated in MDS.10, 11, 19, 20, 21, 22, 23 On the other hand, mutations in activated signaling pathways, such as RAS mutations, have been associated with the leukemic progression of MDS.19, 20, 24, 25, 26, 27, 28, 29, 30 Although the relative timing and pattern of mutation acquisition in sAML can be inferred computationally from variant allele frequency (VAF), such inferred models of clonal evolution still require validation. Furthermore, the highly heterogeneous mutational profiles of MDS and sAML complicate the search for general patterns of sAML progression. Previous studies have used serial sequencing to show that sAML progression is associated with the acquisition of new mutations.19, 30, 31 In particular, Lindsley et al.19 demonstrated that sAML progression is associated with the acquisition of mutations in genes involved in activated signaling pathways and myeloid transcription factors. A recent study by Makishima et al. reported a similar observation.30

To address these questions and dissect the order of mutation acquisition over the course of the disease, we performed whole-exome sequencing and/or targeted deep sequencing on serial samples from 31 sAML patients, as well as targeted deep sequencing on an additional 93 patients comprising a progressed and a non-progressed cohort. Analyzing genes grouped by the pathways they affect helps deconvolute this heterogeneity and reveals generalized patterns of the disease. Here, we present the dynamics of the VAFs of somatic variants across disease stages and, using serial samples, demonstrate correlations between the associated biological pathways and leukemic progression.

Subjects and methods

Cohorts

This study examined several cohorts with a combined total of 124 patients (Table 1, Supplementary Table S1 and Supplementary Methods, Section A.1). The discovery cohort (C1) consists of 31 patients diagnosed with antecedent hematologic malignancies, all of whom progressed to sAML. For each case in C1, whole-exome sequencing was performed on a fractionated T-cell sample (CD3+ fraction) as a control and on bone marrow samples taken at diagnosis of the antecedent malignancy and after sAML progression. The other cohorts comprise 72 non-progressed MDS patients (C2a, median follow-up 3.5 years) and an additional, non-overlapping set of 21 patients who progressed from MDS to sAML (C2b) and for whom samples from the MDS stage were not available. Targeted deep sequencing was performed on all cohorts.

Table 1 Summary and cohort division of 124 patients involved in this study

Sample preparation, sequencing, variant calling and computational analysis

The study was approved by the Research Ethics Boards at Chonnam National University (Hwasun, Korea), Kyungpook National University (Daegu, Korea) and Yonsei University (Seoul, Korea). All sequencing data are deposited at the European Nucleotide Archive (Study Accession: PRJEB18698). T-cell (CD3+ fraction) fractionation was performed using MACS separation columns (25 MS Columns; Miltenyi Biotec, Bergisch Gladbach, Germany). DNA from all 258 bone marrow mononuclear cell samples was extracted using the QIAamp DNA Blood Mini Kit (Qiagen, Venlo, Netherlands). Whole-exome sequencing (Agilent SureSelect v4) was performed on the 93 samples from the discovery cohort according to the manufacturer's protocol using an Illumina HiSeq 2000 sequencer (Santa Clara, CA, USA). Exome sequencing reads were processed for each stage using the methodology described in our previous case study,32 and a list of significant variants was generated for each case in the discovery cohort. We selected 92 genes (Supplementary Table S2) for validation by targeted deep sequencing of all 258 samples from all cohorts. Targeted sequencing was performed using an Agilent custom probe set covering the 92 selected genes. Samples were multiplexed and sequenced on an Illumina HiSeq 2000, and the reads were processed in the same way as the whole-exome sequencing data. T-cell samples were used as matched controls for disease samples when calling variants for all cases in the whole-exome sequencing data and for all cases except C2b in the targeted sequencing data; they were also used to call CHIP (clonal hematopoiesis of indeterminate potential) variants (Supplementary Methods, Section A.3 and Supplementary Figure S1). Sequencing metrics are detailed in Supplementary Tables S3 and S4. Our in-house variant calling pipeline and algorithms, pathway analysis and subclonal analysis are described in Supplementary Methods, Sections A.2–A.5.

Results

Exome sequencing reveals somatic variants in MDS and secondary AML

Whole-exome sequencing was performed on a trio of T-cell, MDS and sAML samples for each patient in a discovery cohort of 31 patients (C1, median time to sAML progression 1.2 years) to profile the origin and landscape of somatic variants (Figure 1 and Supplementary Table S5). The average read depth over target regions was 73×, and at least 80% of the captured regions had a read coverage of 20× or greater in all 93 samples (Supplementary Table S4). After calling and prioritizing variants as described in Subjects and methods, we found a mean of 7.7 and a median of 6 significant variants per patient at the time of initial diagnosis, and a mean of 12.4 and a median of 10 variants after sAML progression (the mean depth of all significant variants was 109×). The mean VAF of the variants increased from 18.8% at the MDS stage to 31.2% at the sAML stage (Supplementary Figure S2). In total, we detected 399 variants (see Supplementary Methods, Section A.2 for variant calling details). These 399 somatic mutations comprise 261 non-synonymous single nucleotide variants, 87 synonymous single nucleotide variants, 25 stop-gain mutations, 9 frameshift deletions, 5 frameshift insertions, 2 non-frameshift deletions and 10 splicing variants in 340 genes. Among these, 31 genes were mutated in multiple patients. A total of 89 variants in 48 genes commonly mutated in myeloid disorders were further validated with a true positive rate of 100% using targeted deep sequencing of all 93 C1 samples (mean depth 1224.24×; VAF correlation between the two platforms, Pearson's r ≈ 0.96; Supplementary Table S5 and Supplementary Figure S3). Using Genomon-ITD, we also detected FLT3-ITD emerging in two patients (SAML-04 and SAML-07) after sAML progression.33
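The cross-platform concordance reported above is a Pearson correlation between the VAFs of the same variants measured by whole-exome and by targeted deep sequencing. A minimal pure-Python sketch of that calculation follows; the function name `pearson_r` and the example VAFs are hypothetical illustrations, not the study's data or pipeline.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired VAF measurements (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances share the same 1/n factor, so it cancels out.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical VAFs (%) of the same five variants on each platform.
wes_vaf = [12.0, 25.5, 40.1, 8.3, 31.0]
targeted_vaf = [11.2, 27.0, 38.9, 9.1, 30.4]
print(round(pearson_r(wes_vaf, targeted_vaf), 3))
```

A value close to 1 indicates that the two platforms rank and scale the VAFs consistently; it does not by itself show that the absolute VAFs agree.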

Figure 1

Gene-wise mutational landscape. Clustered by gene using the whole-exome data of (a) samples taken after sAML progression, (b) samples taken at MDS and (c) T-cell samples. (d) The change in VAF from MDS to sAML. The color scale corresponds to VAFs in (a–c) and to ΔVAFs in (d). Red ID labels indicate the patients who gained FLT3-ITD at sAML. Only genes mutated in at least two patients are shown for visual clarity.

Pathway analyses dissect the order of stepwise somatic variant acquisition in relevant biological pathways

We compared the proportion of the 31 cases carrying variants in genes affecting each pathway (see Table 2 for the gene lists per pathway) at each stage (Figure 2a). Consistent with previous studies, a substantial proportion of cases carry variants in genes involved in DNA methylation and/or splicing machinery (35.5% and 48.3%, respectively).10, 11 However, the proportion of cases with variants affecting these pathways increases significantly by the MDS stage but changes little by the sAML stage. This suggests that variants in DNA methylation and splicing machinery genes contribute to the development of MDS but not to sAML progression. In contrast, variants in genes involved in activated signaling pathways show a distinct pattern: the proportion of cases with such variants increases noticeably at the sAML stage (from 25.8 to 54.8%). Within-case changes in the VAFs of these variants between stages reveal a similar pattern (Figure 2b). Indeed, for DNA methylation, splicing machinery and signaling pathway variants, the distribution of VAF changes from T cells to MDS differs significantly from the distribution of changes from the antecedent malignancy to sAML (two-sample Kolmogorov–Smirnov test, P<0.002, P<0.004 and P<0.013, respectively), although for signaling pathways the difference between time points runs in the opposite direction from that of the other two pathways. Mutations affecting DNA methylation or splicing machinery were also detected in the T-cell samples of five patients, and their VAFs all expanded by the diagnosis of MDS (Figure 2c). The distinct behaviors of these pathways suggest a general trend in the multi-step mutation acquisition process leading to sAML progression.
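The comparison of early (T cell to MDS) and late (MDS to sAML) VAF changes rests on the two-sample Kolmogorov–Smirnov statistic, the largest vertical gap between the two empirical distribution functions. The sketch below computes that statistic in plain Python from per-variant VAF triples; the helper names and the input layout are assumptions for illustration, not the study's code. For P-values, SciPy's `scipy.stats.ks_2samp` is one readily available implementation.

```python
def vaf_changes(variants):
    """Split (T-cell, MDS, sAML) VAF triples into early and late changes.

    Hypothetical input layout: one tuple of three VAFs per variant.
    """
    early = [mds - tcell for tcell, mds, _ in variants]  # T cell -> MDS
    late = [saml - mds for _, mds, saml in variants]     # MDS -> sAML
    return early, late


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk the pooled values in order; after consuming every value <= x,
    # i/n and j/m are the empirical CDFs of a and b evaluated at x.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


# Hypothetical VAFs (fractions) of three variants at T cell, MDS and sAML.
early, late = vaf_changes([(0.00, 0.20, 0.24), (0.01, 0.35, 0.36), (0.02, 0.15, 0.18)])
print(ks_statistic(early, late))
```

In this toy input every early change exceeds every late change, so D equals 1.0, its maximum possible value.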

Table 2 Eight commonly mutated pathways in AML and genes involved in each of them
Figure 2

Pathway hierarchy in the discovery cohort. A case was considered to have an affected pathway if it had at least one variant with a VAF of at least 5% in a gene associated with that pathway. (a) Proportion of the cohort with mutated pathways at each stage. (b) Change in VAF between time points for variants in mutated pathways. (c) Expansion of CHIP variants by the MDS stage.
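The pathway-marking rule in the legend above (at least one variant with a VAF of at least 5% in a pathway gene) and the per-stage proportions of panel (a) can be sketched as follows. The gene-to-pathway map is a hypothetical, partial stand-in for the full lists in Table 2, and the nested-dictionary cohort layout is an assumption.

```python
# Hypothetical, partial gene-to-pathway map; Table 2 lists the genes actually used.
PATHWAYS = {
    "DNA methylation": {"DNMT3A", "TET2", "IDH1", "IDH2"},
    "Splicing machinery": {"SRSF2", "SF3B1", "U2AF1", "ZRSR2"},
    "Activated signaling": {"NRAS", "KRAS", "FLT3", "PTPN11"},
}
VAF_THRESHOLD = 0.05  # minimum VAF (as a fraction) for a variant to count


def mutated_pathways(variants, threshold=VAF_THRESHOLD):
    """Return the pathways hit by at least one variant with VAF >= threshold."""
    hit = set()
    for gene, vaf in variants:
        if vaf < threshold:
            continue
        for pathway, genes in PATHWAYS.items():
            if gene in genes:
                hit.add(pathway)
    return hit


def stage_proportions(cohort, stage):
    """Fraction of cases with each pathway mutated at the given stage.

    cohort maps case ID -> {stage: [(gene, vaf), ...]}; a missing stage
    counts as no variants.
    """
    counts = dict.fromkeys(PATHWAYS, 0)
    for stages in cohort.values():
        for pathway in mutated_pathways(stages.get(stage, [])):
            counts[pathway] += 1
    return {pathway: n / len(cohort) for pathway, n in counts.items()}


# Hypothetical two-case cohort.
cohort = {
    "P1": {"MDS": [("TET2", 0.20)], "sAML": [("TET2", 0.30), ("NRAS", 0.12)]},
    "P2": {"MDS": [("SRSF2", 0.04)], "sAML": [("SRSF2", 0.25)]},
}
print(stage_proportions(cohort, "MDS"))
print(stage_proportions(cohort, "sAML"))
```

In this toy cohort the SRSF2 variant at 4% VAF falls below the threshold at the MDS stage, so splicing machinery is counted as mutated in P2 only at sAML.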

These findings are consistent with previous studies showing that somatic variants in genes involved in activated signaling pathways are enriched in de novo AML.20 Changes in the VAFs of splicing machinery variants (Figure 3a) and DNA methylation variants (Figure 3b) across disease stages follow very similar patterns (general linear model with repeated measures, P ≈ 0.357, not significant), whereas both differ from the change in VAFs of activated signaling variants (general linear model with repeated measures, P ≈ 0.044 and P ≈ 0.012; Figure 3c). The VAF changes of variants associated with chromatin modifiers show a mixed pattern, suggesting that they can be either early or late events (Figure 3d).

Figure 3

VAF dynamics of pathways across stages. VAFs at three time points of variants associated with (a) splicing machinery, (b) DNA methylation, (c) activated signaling and (d) chromatin modifiers. White points mark the median at each stage, and the spread marks indicate the standard error.

Tracing the origin of somatic variants using serial samples

In our pathway analysis, we used a VAF threshold of 5% to mark a pathway as 'mutated' in a given sample at a given stage. We also investigated the origin of somatic variants in the three pathways with distinct inter-stage patterns. Following Klco et al.,34 we applied a lower threshold of 2.5% VAF (corresponding to 5% of cells carrying a heterozygous variant) to search for evidence that a mutation was already present at a stage before it first became significant. For each variant, we examined all disease stages prior to the stage at which it reached a clear detection level (VAF >5%). Importantly, none of the 20 mutations related to activated signaling pathways had detectable origins in the T-cell samples (measured VAFs ranged from 0 to 0.82%), and only one variant not already deemed significant by the MDS stage showed evidence of originating at that stage (NRAS-G13D; VAF 4.81% in whole-exome sequencing and 3.84% in targeted sequencing). Among the eight mutations related to DNA methylation (not counting four additional variants that were already significant in the T-cell samples), three were detected at more than 2.5% VAF (3.16%, 4.76% and 3.5% in TET2, DNMT3A and IDH2, respectively). All three also had VAFs above 2.5% in the targeted deep sequencing data (4.00%, 3.55% and 7.25%). Similarly, 2 of the 16 mutations related to splicing machinery were detected in T-cell samples with VAFs above 2.5% (4.35% and 4.55%, both in SRSF2); three additional variants had already been deemed significant in the T-cell samples in our pathway analysis. Overall, these tendencies show that mutations related to activated signaling pathways occur at a later stage than those in the other two commonly mutated pathways. sAML progression is therefore unlikely to originate directly from the genetic characteristics of preleukemic clones.
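The two thresholds above follow from a simple assumption: in a diploid region, a heterozygous variant carried by a fraction f of cells has an expected VAF of f/2, so a VAF of 2.5% corresponds to 5% of cells. The sketch below applies that reasoning to trace when a variant first shows evidence of being present. The stage labels, helper names and dictionary layout are hypothetical, and copy-number changes and loss of heterozygosity are ignored.

```python
STAGES = ("T-cell", "MDS", "sAML")  # ordered sampling time points
CLEAR_VAF = 0.05      # clear detection level (VAF > 5%)
EVIDENCE_VAF = 0.025  # evidence of earlier presence (VAF > 2.5%)


def cell_fraction(vaf):
    """Fraction of cells carrying a heterozygous variant in a diploid region."""
    return min(1.0, 2.0 * vaf)


def trace_origin(vafs):
    """Return (first clearly detected stage, earliest stage with evidence).

    vafs maps stage -> VAF as a fraction; missing stages count as 0.
    Returns (None, None) if the variant never exceeds CLEAR_VAF.
    """
    clear = next((s for s in STAGES if vafs.get(s, 0.0) > CLEAR_VAF), None)
    if clear is None:
        return None, None
    # Only stages before clear detection are searched for weaker evidence.
    earlier = STAGES[: STAGES.index(clear)]
    origin = next((s for s in earlier if vafs.get(s, 0.0) > EVIDENCE_VAF), clear)
    return clear, origin


# NRAS-G13D MDS-stage VAF from the text; the sAML VAF here is hypothetical.
print(trace_origin({"T-cell": 0.0, "MDS": 0.0481, "sAML": 0.30}))
print(cell_fraction(0.025))
```

For the NRAS-G13D example the function returns `('sAML', 'MDS')`: the variant is clearly detected only at sAML, but its 4.81% VAF at MDS exceeds the 2.5% evidence threshold.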

Mutation profiles of progressed and non-progressed MDS patients support the disease model

We further validated our observations of stage-specific genetic traits using two independent, non-overlapping cohorts of progressed and non-progressed MDS patients (C2b and C2a). Specifically, 72 additional MDS patients who did not progress to sAML (C2a) and 21 sAML patients who progressed from MDS (C2b) underwent targeted deep sequencing (Supplementary Figures S4 and S5 and Supplementary Tables S4 and S6; mean on-target coverage 808×). Fisher's exact test showed no significant differences in proportion between the two groups for DNA methylation (23.8% vs 20.8% in C2b vs C2a) or splicing machinery (23.8% vs 20.8% in C2b vs C2a); however, variants in activated signaling pathway genes were significantly more frequent in C2b than in C2a (19.0% vs 2.8%, P<0.03). Similar to the discovery cohort, in which 16% of patients harbored variants in T-cell samples, we found 20 of 72 patients in the non-progressed cohort (C2a) with variants in the T-cell samples; 14 of these carried variants in DNA methylation or splicing machinery genes (7/20 in DNA methylation and 7/20 in splicing machinery, mutually exclusive), and one carried a variant in an activated signaling pathway gene (KRAS). Eighteen of these 20 patients carried only one detected variant at this stage (Supplementary Figure S6).
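The cohort comparison above uses Fisher's exact test on a 2×2 table of counts. A minimal two-sided implementation from hypergeometric probabilities is sketched below; the function name is hypothetical, and SciPy's `scipy.stats.fisher_exact` provides an established implementation. The counts are an assumption back-calculated from the reported percentages (19.0% of 21 C2b patients is 4; 2.8% of 72 C2a patients is 2), not values taken from the study's tables.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - (n - row1)), min(row1, col1)
    # Relative tolerance so floating-point ties with p_obs are included.
    p = sum(prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Rows: C2b (progressed), C2a (non-progressed);
# columns: with / without an activated signaling variant (counts assumed).
print(round(fisher_exact_two_sided(4, 17, 2, 70), 4))
```

With these assumed counts the two-sided P-value is approximately 0.022, in line with the reported P<0.03.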

Discussion

Our study describes the general pattern by which antecedent hematologic malignancies progress to sAML, using mutational profiles of serial samples. sAML progression is clearly associated with an increased mutation burden in terms of the number of variants and/or their VAFs (Figure 1). In particular, the mutation burden affecting activated signaling pathways increases significantly as MDS progresses to sAML. In contrast, mutations associated with DNA methylation and splicing machinery, including preleukemic mutations, increase significantly with the development of MDS but not during sAML progression.

Clonal origin of sAML from a pathway perspective

Many of the clonal analysis results for cases in the discovery cohort followed similar patterns. Based on their dynamics across disease stages, we can posit two general clonal categories. The first comprises pathways correlated with the development of MDS and includes clones driven by mutations in splicing machinery and DNA methylation genes. The second is characterized by the pattern shown by activated signaling variants, which expand during sAML progression. These two clear patterns are consistent with previous studies.19, 30 Pathways such as chromatin modification, which show mixed patterns across cases with different variants, may have subdivisions that belong to either category.

Ten distinct clonal evolution patterns can be postulated based on the acquisition or expansion of mutations related to the three signature pathways (Figure 4, Supplementary Figure S7 and Supplementary Methods, Section A.5). Sixteen percent of patients develop MDS from preleukemic mutations associated with DNA methylation or splicing machinery, and in a further 26% clones of this category are first seen at the MDS stage (42% in total). Forty-eight percent of patients show growth or development of clones containing activated signaling pathway variants at the sAML stage. A further 10% have clones with activated signaling pathway variants that do not grow significantly at the sAML stage; notably, these patients lack observed variants in any of the other seven pathways. Twenty-two percent of patients carry only mutations associated with other commonly mutated pathways in AML, or no mutations, but these cases were not observed frequently enough to generalize their patterns.

Figure 4

Clonal evolution of sAML. (a) Clonal evolution in a single sAML patient (SAML-12). (b) Generalized pattern of clonal evolution observed in this study.

De novo AML vs secondary AML

Previous studies of de novo AML have shown that mutations in activated signaling pathways are notably frequent.6, 11 Recent work by Lindsley et al.19 concluded that variants in splicing machinery and chromatin modifiers are sAML-specific when compared with de novo AML and therapy-related AML. In our study, we also noted frequent splicing machinery mutations in the MDS clone, which persist at a similar rate in the sAML clone. Notably, however, the VAFs of DNA methylation and splicing machinery variants plateaued during progression, implying a relatively weak association with sAML progression (Figures 3a and b).

Patients with preleukemic mutations

The presence of variants in the T-cell samples of 5 C1 and 20 C2a patients provides evidence about the relative timing of early events in a subset of patients. The T-cell variants with pathway associations are all involved in DNA methylation (DNMT3A, IDH1/2 and TET2) or splicing machinery (SRSF2 and SF3B1). Both cohorts notably lack activated signaling pathway variants at this stage (with one exception in C2a). These patients show evidence of CHIP, a preleukemic condition recently proposed by Steensma et al.35 CHIP refers to the condition in which a somatic variant is present but the diagnostic criteria for a hematologic malignancy are not met. A recent large-scale study demonstrated that variants associated with CHIP significantly increase the risk of hematologic cancer (hazard ratio=11.1).36 Most variants we found in T-cell samples are consistent with this study (TET2, DNMT3A, IDH1/2, U2AF1/2, SRSF2, SF3B1, ASXL1, BCOR, JAK2 and TP53).35 Interestingly, 24% of the patients in our cohorts (25/103) develop MDS from such variants, suggesting a possible link between preleukemic mutations and hematologic malignancies. However, the VAFs of these variants expand significantly only up to the time of initial diagnosis of the myeloid disorder, suggesting that they may be involved in the development of MDS but might not play a direct role in sAML progression.

Overview and future direction

Comprehensive analyses in this work present a novel overview of sAML progression by investigating groups of genes associated with biological pathways in serial samples, rather than focusing only on individual variants at single disease stages from unrelated individuals. Our analyses show that antecedent malignancies can be thought of as a subtype of preleukemic clones that are also themselves diseases. Our work shows clear patterns of clonal evolution from a pathway perspective and distinct roles of different biological pathways at different disease stages.

Multi-omic profiling of other genetic or epigenetic features, for example by RNA sequencing and bisulfite sequencing, could provide further insight into progression, particularly for patients who do not acquire new mutations between stages. In addition, further validation in independent cohorts is an important next step in expanding on these discoveries. Our findings also call for functional studies on how somatic variants in genes affecting activated signaling pathways are related to sAML progression. Since sAML ultimately appears to be distinguished from de novo AML by the variants acquired during the development of MDS, functional studies might also offer insight into how sAML clones containing variants from the MDS stage differ from de novo AML clones that lack them, as well as into the poor prognosis of sAML.

Conclusion

Our study dissects the distinct genetic characteristics of the development of MDS and of its progression to sAML. Progression to sAML is associated with an additional mutation burden in genes affecting activated signaling pathways, regardless of the subtype of the antecedent hematologic malignancy. This is consistent with the known biology of de novo AML; thus, sAML progression of antecedent malignancies is likely to be driven by altered activated signaling pathways.