
1.1 Introduction

Despite extensive research, the etiology of childhood acute lymphoblastic leukemia (ALL) remains largely unknown. There is growing evidence that this cancer may arise from in utero chromosomal abnormalities that can lead to clonal expansion of pre-leukemic precursor cells. The risk factors for ALL in children are multiple, most notably common germline polymorphisms and rare genetic syndromes that directly influence hematopoiesis and cell cycling, as well as possibly infection-related aberrant DNA editing.

1.2 General Epidemiology

The incidence of ALL varies by age, ethnicity, and geographic region, and also differs by immunologic and molecular subtype. In both the United States and the Nordic countries, the overall incidence rate is 3.9 per 100,000/year before the age of 15 years [1, 2]. The incidence is higher in Hispanic Americans (4.1 per 100,000/year) and lower in African American children (2.1 per 100,000/year) [3, 4]. In general, low-income countries have lower incidences of ALL than high-income countries, with a few exceptions such as Costa Rica (4.6 per 100,000/year); however, these differences may be the result of incomplete registration [4–7]. The incidence of ALL shows a characteristic peak between 2 and 5 years of age [2, 7], but age-related ALL risk differs substantially by cytogenetic subtype (Fig. 1.1). ALL in infants (<1 year) is in most cases characterized by MLL gene rearrangements (rMLL), which are rare in older children [8–10]. Among 2- to 5-year-olds, ALL is dominated by high-hyperdiploid (HeH, modal chromosome number >50) and t(12;21)[ETV6-RUNX1] karyotypes, while T-cell ALL has a less pronounced peak around 4–9 years [2, 10, 11]. In low-income countries, the 2–5 year age peak is much less obvious, with a higher proportion of T-ALL [5–7, 12–15]. Interestingly, some studies noted an incremental increase in ALL incidence specifically in this age range as a function of economic growth and improving living conditions [16–19]. Taken together, these observations (1) suggest that different ALL subtypes may have distinctive etiological mechanisms and (2) point toward possible effects of economic development-related environmental factors on ALL risk.

Fig. 1.1 Age distribution of childhood ALL cases by immunologic and molecular subtypes. Numbers represented are all children diagnosed with ALL in Denmark, Sweden, Norway, Finland, and Iceland between 1992 and 2007. Upper panel: bar heights represent the number of cases in each age group relative to the total number of cases between 0 and 14 years. Lower panel: relative distribution of subtypes within each age group. The testing for t(12;21)[ETV6-RUNX1] by fluorescence in situ hybridization was gradually introduced during this period, and accordingly some amp(21) patients have been missed. Ph+, Philadelphia chromosome-positive; HeH, high-hyperdiploid

1.3 Natural History

Monozygotic twins have a 10–20% concordance rate for ALL, and concordant cases have been shown to harbor identical and clonotypic molecular signatures (e.g. ETV6-RUNX1 fusion sequence, or T-cell receptor (TCR), immunoglobulin (IGH) gene and MLL rearrangements), possibly because a leukemic or preleukemic clone arose prenatally in one twin and spread to the other through placental vascular anastomoses [20–25]. Further evidence for a prenatal initiation is provided by studies backtracking disease-specific molecular markers in both twin and singleton leukemias in dried blood spot samples (DBSS) from birth (Table 1.1).

Table 1.1 Backtracking studies and their findings

For infant rMLL ALL, rearrangement has been identified in DBSS in the vast majority of cases, suggesting that this disease almost always arises prenatally. Older patients with rMLL are usually DBSS negative, but the translocation has been successfully backtracked in one case diagnosed at 6 years. Similar findings have been reported for ETV6-RUNX1 ALL. This translocation causes a fusion of the ETV6 and RUNX1 genes, and the resulting chimeric protein has been shown to promote cell survival in mice and human cells [44–46]. Three studies on concordant (monozygotic) twins revealed identical ETV6-RUNX1 fusion sequences in both twins and clonal expansion of fusion-positive precursors at a minimum level of 10⁻⁴ preleukemic cells at birth. Prenatally initiated ETV6-RUNX1+ cases have had a latency of up to 14 years before overt leukemia occurred [47]. Furthermore, two ALL-discordant twin pairs have been described in which the healthy twin also harbored an ETV6-RUNX1+ clone at birth [30] or at age 3 [44], suggesting that the translocation in itself is insufficient for leukemia development. Leukemic ETV6-RUNX1+ cells harbor a variable number of additional mutations, often a deletion of the wildtype ETV6 allele or of other genes involved in B-lymphocyte development and differentiation [48–51]. Molecular studies of concordant monozygotic twins showed that these mutations are unique to each twin and thus occur as secondary postnatal events [51, 52]. An often cited study found that 1% of all healthy newborns harbored ETV6-RUNX1 at birth (i.e. 100-fold the incidence of ETV6-RUNX1 ALL) [53], but subsequent validation studies have raised questions about the reliability of the initial finding [54–58]. Thus, while healthy children in general may harbor ETV6-RUNX1+ cells without developing ALL, the exact prevalence of such an event has yet to be determined.

HeH ALL cases frequently have detectable clonotypic IGH rearrangements in neonatal blood spots (17 of 29) and the hyperdiploidy in itself can also arise prenatally [38]. Importantly, a hyperdiploid clone has been found in a healthy twin sibling of a child with HeH ALL [59]. Recently, a whole-genome sequencing approach has further supported the notion that gross chromosomal gains occur early in life as the sentinel event and that additional postnatal events are also necessary for leukemia development [60].

Clonal development of T-ALL is much less understood and leukemic genomic aberration is rarely detected at birth in children with this ALL subtype, suggesting an entirely different etiology compared to B-ALL.

In summary, rMLL, ETV6-RUNX1+, and HeH ALL show the most convincing evidence of prenatal initiation, while other subtypes such as T-ALL, BCR-ABL and TCF3-PBX1 are less frequently or never prenatally initiated.

1.4 Environmental Risk Factors

1.4.1 Infectious Disease and Immune Stimulation

It has long been hypothesized that infectious disease plays a role in the development of ALL. In 1988, Leo Kinlen postulated that mixing of previously isolated populations could cause epidemics of an unidentified pathogen to which leukemia was a rare response [61]. This hypothesis was based on observed spatial and temporal clustering of ALL cases, which occurs only very rarely [62]. The same year, Mel Greaves suggested that children with little early-life immune stimulation can develop leukemia as an aberrant response to a delayed exposure to common infections [63]. This ‘delayed-infection hypothesis’, which in many ways is similar to the ‘hygiene hypothesis’ concerning allergies and atopic disease, is particularly relevant to ALL risk in the 2–5 year age peak [64–67]. In these cases, the prenatal formation of a preleukemic clone may constitute a commonly occurring ‘first hit’. An aberrant immune response caused by delayed immune maturation, with uncontrolled proliferative stress on later exposure to a common childhood infection, will then in rare cases cause a second hit and initiate malignant transformation [64, 65].

A substantial body of evidence has been gathered in support of an association between infections and ALL risk. Since the actual number of childhood infections is difficult to measure, proxy measures such as daycare attendance (children in daycare are more exposed to common infections early in life) are typically examined [68]. A meta-analysis from 2010 by Urayama et al. included 14 studies and a total of 6108 cases and found a significantly reduced risk of ALL among children in daycare (OR = 0.76; 95% CI: 0.67–0.87) [69]. A recent study confirmed this finding and furthermore indicated that the protective effect of daycare is even stronger with earlier start of attendance [70]. Another measure of early immune stimulation is breastfeeding, for which two meta-analyses consistently found an association with a reduced risk of ALL; subsequently a large case-control study with 7,399 ALL cases and 11,181 controls also reported an OR of 0.86 (95% CI: 0.79–0.94) for children breastfed for 6 months or more [70–72]. Other proxies for immune stimulation include birth order and vaccinations, but epidemiological findings on these exposures are inconsistent [70, 73–79]. More direct attempts at measuring the actual number of infections during early childhood have included patient registries [80–84], questionnaires [70, 85, 86], and interviews [87–89]. Generally, studies using parentally reported measures found an inverse or no association between infections and ALL risk, while the patient registry-based methods, which have the strength of eliminating recall bias, found either positive or null associations. Interpreting data from these studies is difficult for a number of reasons, most notably the heterogeneity of exposure definitions and the timing of infections in relation to ALL diagnosis.
According to the delayed infection hypothesis, children prone to ALL development should have fewer infections in early life and subsequently start developing aberrant responses to common infections, most likely resulting in symptomatic infectious disease. However, in the months leading up to ALL diagnosis the disease itself also becomes a risk factor for infections, and thus the expected direction of causality between infection and leukemia becomes difficult to identify in such epidemiologic studies [84].
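
For readers less familiar with the effect measures quoted above, the following minimal sketch shows how an odds ratio and its 95% confidence interval are obtained from a single case-control 2×2 table, using the standard log-odds (Woolf) approximation. The counts are hypothetical and chosen only so that the result lands near the reported daycare estimate; they are not taken from any cited study, and a pooled meta-analytic OR additionally weights the individual studies, which is not shown here.

```python
import math

def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio with a Woolf (log-scale Wald) confidence interval."""
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR) from the four cell counts
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts (assumption): daycare attendance among cases and controls
or_, lo, hi = odds_ratio_ci(exposed_cases=300, unexposed_cases=700,
                            exposed_controls=360, unexposed_controls=640)
print(f"OR = {or_:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```

An interval that lies entirely below 1, as in this illustration, corresponds to what the text calls a significantly reduced risk.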

Recent molecular studies have shed new light on the role of infection in ALL development. Whole-genome sequencing of ETV6-RUNX1 ALL cells revealed that most of the somatic deletions commonly seen in this subtype are mediated by the RAG enzymes, the main function of which is V(D)J recombination in normal pre-B cells [90], potentially as a result of infection-related hyperactivation of RAG. Subsequently, Swaminathan et al. showed that premature activation of the AID enzyme (which normally mediates somatic hypermutation and class-switch recombination in mature B-cells), resulting in inappropriate, synchronous activation of AID and RAG, increases genetic instability in pre-B cells, especially those with the ETV6-RUNX1 fusion [91]. The authors furthermore showed that while infectious stimuli (mimicked by lipopolysaccharide) could induce leukemic transformation of ETV6-RUNX1+ cells, this development was delayed or prevented in mice without functional AID or RAG, respectively. Another example highlighting a molecular mechanism involved in infection-mediated ALL development is PAX5, a gene commonly mutated in B-ALL. A recent study showed that PAX5 heterozygous mice were prone to develop ALL, but only if they were exposed to common infections [92]. It is important to note that these molecular studies show that infections are likely involved in ALL development, but provide no direct evidence of how early vs. late infection alters the risk of ALL during childhood.

1.4.2 Other Risk Factors

Despite a large number of epidemiological studies and meta-analyses, most findings regarding proposed environmental risk factors remain inconclusive. The only confirmed association is high birth weight, although the underlying mechanism is unknown [93]. Other factors such as ionizing radiation, electromagnetic fields and maternal smoking during pregnancy remain uncertain (Table 1.2). A common limitation is that the vast majority of these studies address ALL as a single disease entity and thus may have missed associations with specific ALL subtypes.

Table 1.2 Non-infectious environmental risk factors

1.5 Heritability of ALL

Studies addressing the risk of leukemia among offspring of childhood leukemia survivors have been hampered by small sample sizes [123–126]. More reliable estimates of ALL heritability come from studies on risk in siblings of affected children. These studies have two important limitations: first, because of preleukemic cells’ ability to spread in utero, twins with leukemia need to be excluded before estimating disease heritability, and second, it is difficult to distinguish genetic effects from shared environmental risk factors between siblings. A recent Nordic population-based study reported a standardized incidence ratio (SIR) of 3.2 for ALL risk among siblings [127]. Furthermore, one study investigating 54 sibships with two or more cases of ALL found an unexpectedly high subtype concordance, pointing to a genetic basis of ALL etiology [128]. On the basis of genome-wide SNP data, it was estimated that inherited genetic polymorphisms account for at least 24% (95% CI: 6–42%) of variation in ALL risk [129]. In conclusion, these reports provide evidence for a genetic component in disease susceptibility, although reliable quantitative estimates of genetic contribution to ALL risk are not available.
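
As an illustration of how a standardized incidence ratio such as the one quoted above is constructed, the sketch below divides the observed number of cases in a cohort by the number expected from age-specific reference rates. All person-years, rates and case counts are hypothetical placeholders, not data from the Nordic study, and the interval uses Byar's approximation to the exact Poisson limits.

```python
import math

def expected_cases(person_years, rates_per_100k):
    """Expected cases: sum over age strata of person-years x reference rate."""
    return sum(py * rate / 100_000
               for py, rate in zip(person_years, rates_per_100k))

def sir_with_ci(observed, expected, z=1.96):
    """SIR = observed / expected, with Byar's approximate 95% limits."""
    sir = observed / expected
    lower = observed * (1 - 1 / (9 * observed)
                        - z / (3 * math.sqrt(observed))) ** 3 / expected
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3 / expected
    return sir, lower, upper

# Hypothetical sibling cohort (assumption): strata for ages 0-4, 5-9, 10-14
person_years = [20_000, 25_000, 20_000]
reference_rates = [6.0, 3.5, 2.0]   # assumed cases per 100,000 person-years
expected = expected_cases(person_years, reference_rates)
sir, lo, hi = sir_with_ci(observed=8, expected=expected)
print(f"expected = {expected:.3f}, SIR = {sir:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```

With so few observed cases the interval is wide, which is one reason sibling-based estimates need large population registries.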

1.6 High-Penetrance Genetic Predisposition

Out of more than 125 known cancer predisposition genes (CPGs), only 27 genes (associated with 9 rare syndromes and two non-heritable congenital disorders) are convincingly linked to childhood ALL (Table 1.3 and Fig. 1.2) [130, 131].

Table 1.3 ALL syndromes
Fig. 1.2 Effect sizes and frequencies for known ALL genetic risk factors. Syndrome risks and frequencies are based on best available evidence as described in Table 1.3; SNP odds ratios are based on references in Table 1.4, and SNP risk allele frequencies are based on worldwide populations from the 1000 Genomes Project. CMMRD, constitutional mismatch repair deficiency; AT, ataxia-telangiectasia; LFS, Li-Fraumeni syndrome; DS, Down syndrome; FA, Fanconi anemia; NF, neurofibromatosis type 1

In a 2015 registry study of 4939 childhood ALL cases, only 29 subjects were diagnosed with non-Down syndrome (DS) predisposition syndromes (0.6%) [161]. However, a recent comprehensive study of whole genome or whole exome sequencing in 588 non-DS childhood leukemia cases found germline mutations in known CPGs in 26 cases (4.4%) [161, 162]. This suggests that high-penetrance Mendelian genetics, discussed in detail below, may play a larger role in ALL etiology than previously appreciated.
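
The two proportions above can be reproduced directly from the reported counts. The sketch below also attaches Wilson score intervals, which are an addition made here for illustration and are not reported in the cited studies. Note that the two studies ascertained predisposition differently (clinical syndrome diagnosis versus germline sequencing), so the comparison is indicative rather than a formal test.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Proportion with a Wilson score confidence interval."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, center - half_width, center + half_width

# Counts as given in the text
registry = wilson_interval(29, 4939)    # clinically diagnosed syndromes
sequencing = wilson_interval(26, 588)   # germline mutations in known CPGs

for label, (p, lo, hi) in (("registry", registry), ("sequencing", sequencing)):
    print(f"{label}: {100 * p:.1f}% (95% CI: {100 * lo:.1f}-{100 * hi:.1f}%)")
```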

1.6.1 Syndromes Where ALL Is a Dominant Cancer Phenotype

DS is one of the most common congenital abnormalities (1 in 691 live births) and also the most recognizable ALL-predisposition syndrome [140, 163]. ALL and AML risk is significantly increased, with SIR before 30 years of 24.4 and 20.3, respectively. Interestingly, individuals with DS have a significantly lower incidence of solid cancers than the background population [142, 164]. DS-associated ALL is more likely to have somatic rearrangements involving the CRLF2 gene and almost always has a B-cell immunophenotype. DS patients represent the only known group where ALL is the most common malignancy at any age. Taken together, DS-ALL constitutes 2–3% of ALL [131, 165].

While the driver of leukemogenesis remains uncertain for DS, it is likely that chromosome 21 is involved, as an acquired extra copy of chromosome 21 is also seen in hyperdiploid ALL, and intrachromosomal amplification of chromosome 21 is seen in the iAMP21-ALL subtype [139].

In fact, iAMP21-ALL has recently been found to be more frequent in individuals with the germline translocation rob(15;21)(q10;q10)c, a rare constitutional genetic abnormality. Amplification of the genes involved in the translocation duplicates the entire abnormal chromosome and confers an estimated 2,700-fold increased risk of iAMP21-ALL [138]. However, considering the rarity of both iAMP21-ALL and rob(15;21)c, <1 in 1,500 ALL cases are likely to be rob(15;21)c-associated.

PAX5 is known to be somatically mutated or deleted in approximately 30% of B-ALL cases [166]. In 2013, one germline PAX5 mutation was found in three kindreds of familial ALL [159, 160]. The 3 families had 18 documented and 3 obligate mutation carriers with 11 cases of B-ALL, with another 2 ALLs in untested children. These PAX5 mutations may be exclusively related to ALL risk, but further study is warranted.

ETV6, like PAX5, is known to be recurrently mutated or translocated in leukemic cells [166, 167]. In 2015, three studies independently reported nine families with ETV6 germline mutations, all having a dominantly heritable thrombocytopenia and a high incidence of ALL among mutation carriers [143, 144, 146]. Collectively, 35 documented and 4 obligate carriers have developed a total of 14 leukemias (mostly ALL), with another 2 occurring in untested children. One systematic sequencing study targeting germline ETV6 in 4,405 ALL cases identified 31 ETV6 variants potentially related to 35 ALL cases, with carriers found to be significantly older than non-carriers (mean age 10.2 vs. 4.7 years) [144]. Thus, ETV6 mutations may be present in nearly 1% of all ALL cases, and perhaps in a higher proportion of patients over 5 years of age.

1.6.2 Syndromes Where ALL Is Part of a Mixed Cancer Phenotype

Li-Fraumeni Syndrome (LFS) is a rare cancer predisposition syndrome, in which germline TP53 mutation confers a ~90% lifetime risk of developing cancer in a spectrum of tissues with one third being diagnosed before 18 years of age. The increased ALL risk is largely restricted to cases with low hypodiploid leukemia karyotype (underlying TP53 mutation present in 43.3% of low hypodiploid ALL) [168].

Ataxia-telangiectasia (A-T) is a rare syndrome caused by recessive mutations in the ATM gene and typically presents with progressive cerebellar ataxia before 4 years of age [134]. A-T patients have a high risk of leukemias (especially T-cell ALL) and lymphomas, as well as hypersensitivity to ionizing radiation and chemotherapy related to the role of ATM in DNA repair [132, 134].

Bloom Syndrome is characterized by pre- and postnatal growth deficiency (stature typically <1.5 m), skin lesions and a high risk of ALL, AML, lymphoma, and epithelial carcinomas [135]. Twelve ALLs were found among the fewer than 300 cases registered worldwide, and in at least two cases ALL preceded the diagnosis of Bloom Syndrome [169, 170].

Nijmegen Breakage Syndrome (NBS) is another very rare recessive syndrome, which mainly occurs in Slavic populations [157] (a Slavic founder deletion of five bases in the NBN gene is found in >90% of NBS cases), yet NBS has also been described in >8 other countries with private mutations [157, 158, 171, 172]. Patients display microcephaly, intrauterine growth retardation with short stature, recurrent sinopulmonary infections and increased risk of cancers, especially lymphoma and leukemia [157, 173].

Fanconi Anemia (FA) is a rare recessive syndrome with a high risk of AML, MDS and other hematological diseases set at ~10%/year [147, 174]. In a registry with 1300 FA patients only 7 ALLs were reported and FA-leukemias are predominantly myeloid (96%) [147, 148]. While skeletal deformations and classic hematological findings often lead to diagnosis early in life, malignancies including ALL can be the presenting feature [175, 176].

There is a long string of genetic syndromes for which sporadic reports described ALL as a possible cancer manifestation, although the matter has not been systematically examined. The most common are RASopathies (e.g. NF1) [177–179], where 6 ALLs were seen among 1176 mutation carriers in 1 study [180]. Others include: Bruton’s Agammaglobulinemia [181], Familial Platelet Disorder with Associated Myeloid Malignancies [182, 183], Weaver syndrome [184], Sotos syndrome [185], Rubinstein-Taybi syndrome [186], Börjeson-Forssman-Lehmann Syndrome [187] and SH2B3 deficiency [188].

It should be noted that ALL predisposition syndromes may not be overtly symptomatic before leukemia is diagnosed, presenting with only non-specific clinical features such as growth failure and microcephaly. Family history needs to be carefully examined to identify possible underlying genetic causes in a pediatric oncology setting.

1.7 Low-Penetrance Genetic Predisposition

Emerging from the ‘common disease—common variant’ hypothesis, the past two decades have seen the application of first candidate gene-driven and later genome-wide association studies in ALL etiology research [189, 190].

Single nucleotide polymorphism (SNP)-based candidate gene studies (CGSs) have explored ALL etiology by focusing on genes involved in carcinogen metabolism, folate metabolism, and DNA repair pathways. A 2010 systematic review identified 47 CGSs on 25 variations in 16 genes, all tested for association with ALL, showing pooled significance (P < 0.05) in only 8 variants (OR range: 0.73–1.78), with apparent false-positive report probabilities of at least 20% [191]. Other studies have focused on human leukocyte antigen (HLA) genes, particularly class II loci HLA-DR and HLA-DP, with the latter showing evidence of significantly different associations between ALL subtypes as well as interactions with proxies for immune stimulation [192, 193]. However, a larger study has cast doubt on the validity of these findings [194].

The first two genome-wide association studies (GWAS) were published in 2009, independently demonstrating associations between ALL susceptibility and SNPs in ARID5B, IKZF1 and CEBPE [195, 196]. Subsequently, SNPs in four other genes have been found to be associated with either overall ALL risk or subtype-specific risk, with a total of 13 SNPs in 6 genes having been widely validated thus far (Table 1.4) [197–201].

Table 1.4 GWAS results

The heterogeneity of ALL is reflected in the GWAS findings, with some SNPs showing a stronger association with specific subtypes. ARID5B, for instance, is most strongly associated with HeH ALL. SNPs in TP63 and GATA3, on the other hand, show isolated associations with ETV6-RUNX1 ALL and Ph-like ALL, respectively [196, 199, 201, 204, 205].

While there is little doubt that the GWAS findings identified genuine inherited risk factors for ALL, there is a paucity of studies describing the molecular mechanisms underlying these associations. Somatic deletions in both CDKN2A and IKZF1 are frequent in ALL, and these two genes play important roles in tumor suppression and lymphocyte development, respectively [200, 206]. In one recent study, 35 tumors from CDKN2A risk variant rs3731217 carriers preferentially retained the risk allele, suggesting that the SNP is advantageous during tumor growth [204]. ARID5B is also involved in lymphocyte differentiation, but its mechanism in ALL development is poorly understood.

Within the validated risk variants, no significant gene–gene interactions have been reported [195, 196, 203]. The effects of these risk alleles are relatively stable across ethnicities, and risk allele frequencies correlate well with population differences in ALL incidence [197]. One pathway-based GWAS on ALL risk was recently described, but these results have yet to be reproduced [207]. Inspired by the observations that ALL subtypes differ across both environmental and genetic risk factors, other researchers have attempted to identify interactions between the two by combining genotypes with data on various environmental exposures [208–211]. These studies, however, have so far failed to reliably identify gene-environment interactions.

Studies on childhood ALL etiology will improve knowledge of the pathogenesis, predict disease risk, and provide new targets for treatment.

The low-penetrance genetic predispositions discussed above (e.g., risk alleles identified by GWAS) confer a minor increase in the absolute risk of developing ALL, e.g. from 1 in 2,000 to 1 in 1,500. While the effects of these variants individually are modest with limited clinical implication, their cumulative impact can be comparable to that of the highly penetrant genetic predisposition syndromes. However, it is debatable whether early diagnosis of an aggressive cancer like ALL can lead to improved outcome [212]. Hence, clinical surveillance aimed at early diagnosis of ALL may not necessarily benefit at-risk subjects and may in fact lead to uncertainty and anxiety for the families [213].
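
The arithmetic behind the example above (1 in 2,000 versus 1 in 1,500) can be made explicit. The sketch below derives the relative risk and the absolute excess risk from those two figures, and then illustrates how several modest effects could accumulate. The multiplicative combination and the per-variant ratios are assumptions made purely for illustration (independent effects, and odds ratios treated as risk ratios because the disease is rare); they are not estimates from Table 1.4.

```python
from math import prod

baseline_risk = 1 / 2000   # absolute risk without the risk allele (from the text)
carrier_risk = 1 / 1500    # absolute risk with the risk allele (from the text)

relative_risk = carrier_risk / baseline_risk
excess_per_100k = (carrier_risk - baseline_risk) * 100_000

def combined_risk(baseline, risk_ratios):
    """Absolute risk if per-variant ratios multiply (assumed independence)."""
    return baseline * prod(risk_ratios)

# Hypothetical per-variant risk ratios (assumption, not taken from any study)
hypothetical_ratios = [1.3, 1.5, 1.6, 1.7]
cumulative = combined_risk(baseline_risk, hypothetical_ratios)

print(f"relative risk: {relative_risk:.2f}")
print(f"excess risk: {excess_per_100k:.1f} per 100,000 children")
print(f"cumulative risk: 1 in {round(1 / cumulative)}")
```

Even a several-fold cumulative relative risk leaves the absolute risk well below 1% under these assumptions, which is consistent with the caution expressed above about surveillance.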

Still, many of the genetic syndromes discussed above may modify health conditions other than the risk of developing ALL. Preemptive surveillance for non-ALL cancers (e.g. TP53 carriers) and/or treatment modification (e.g. avoidance of radiation therapy in cases with A-T) can lead to lower mortality and morbidity for the children and their at-risk family members [214–218]. For this reason, recognition and diagnosis of predisposition syndromes in pediatric oncology is crucial. In fact, it has been suggested that pediatric cancer patients under the age of five should be evaluated for A-T before starting chemotherapy and/or radiotherapy because of potentially fatal adverse effects of conventional doses due to defective DNA repair in these cases [134, 219].

1.8 Future Directions

While substantial progress has been made in identifying risk factors for ALL (especially the role of inherited genetic variants), our understanding of ALL etiology is far from complete. An important field of research in the coming years will be to identify gene-gene and gene-environment interactions that contribute to leukemogenesis, and to determine whether approaches can be developed to target these processes and reduce disease risk and burden in genetically predisposed children.