Introduction

Mendelian diseases, also known as monogenic diseases, are considered to be rare individually, but collectively occur at a rate of 40–82 per 1000 live births, with an estimated 7.9 million children being born annually with a serious birth defect of genetic or partially genetic origin [1, 2]. Mendelian diseases are disorders caused by variations at a single genetic locus and include diseases such as tuberous sclerosis complex and cystic fibrosis, among others. These variations may be inherited or occur de novo. Inherited mutations are present in all the cells of the affected individual and are present in one or both parents. De novo mutations are present in the proband but absent in the parents. Typically, these de novo mutations occur for the first time in a parental germ cell prior to fertilization and are therefore present in all cells of the offspring. Somatic mutations, a sub-type of de novo mutations, are post-zygotic mutational events that lead to an individual having two or more populations of cells with distinct genotypes, despite developing from a single fertilized egg [3••, 4••].

Somatic mutations lead to an individual being mosaic, with only a subset of cells harboring the mutation. These mutations are absent in the parents, and depending on the timing of mutation with respect to embryonic development, may affect cells across all cell types or may be present in only a few specific cell types [3••, 5, 6•]. While the role of somatic mutation in cancer cells is well established [7], the appreciation of an analogous role for somatic mutations that occur randomly during the normal mitotic cell divisions of embryonic development, and hence present in clones of cells in one or more tissues of the body, has been relatively recent.

With advances in genomic technologies, there has been an expansion in our knowledge of types of tissues in which somatic mutations have been detected as well as the kind of mutations being detected. Somatic mutations could reflect point mutations (also known as single nucleotide variants or SNVs), insertions/deletions (indels), copy number variants (CNVs) (including chromosomal aneuploidy), short tandem repeats (STRs), and transposable element variants. Here we will review the recent literature about the emerging role of somatic mutations in Mendelian disease.

Embryology: Development and Distribution of Somatic Mutations

As mosaicism is caused by mitotic error post fertilization, we will briefly discuss the events of early fetal development as this is essential to interpreting and understanding mosaicism (Fig. 1). Post fertilization, the zygote undergoes successive mitoses to produce a ball of cells, which then produces the blastocyst. The outermost layer of the blastocyst produces the placental tissue and the inner cell mass gives rise to the embryo. Initially, the embryo consists of two different cell layers—epiblast and hypoblast. The epiblast differentiates into the ectoderm, mesoderm, and endoderm, while hypoblast gives rise to the amniotic cavity. Structures of ectodermal origin include the central nervous system, facial structures, and the extremities, including skin. The mesoderm differentiates into the cardiac, abdominal, and urogenital structures and hematopoietic cells. The endoderm gives rise to epithelial tissues, including the lining of the urinary tract and lung [8].

Fig. 1 Early human fetal development from zygote (day 0) to the formation of embryonic mesoderm, endoderm, and ectoderm (day 16). The mesoderm differentiates into the cardiac, abdominal, and urogenital structures and hematopoietic cells. The endoderm gives rise to epithelial tissues, including the lining of the urinary tract and lung. The ectoderm differentiates into the central nervous system, facial structures, and the extremities, including skin. Timing of the mutational event determines the distribution of the somatic mutation across different tissues.

Somatic mutations that develop in the early post-zygotic stage will be distributed across different tissue types. The mutations will likely be detected by analysis of more accessible tissue, such as the peripheral blood leukocytes. On the other hand, some mutations develop later in fetal development and hence, direct examination of the affected tissue would provide the best chance of detecting the mutation. In these instances, analysis of peripheral blood leukocytes would be unrevealing.
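
The relationship between timing and tissue distribution can be illustrated with a deliberately simplified calculation. The sketch below assumes an idealized embryo in which all cells divide symmetrically and in synchrony and every lineage contributes equally to the sampled tissue; none of these assumptions holds strictly in vivo, and the function name is illustrative only. Under this model, a heterozygous mutation arising in one cell after k divisions (when 2^k cells exist) is carried by 1/2^k of all cells, and its expected variant allele fraction is half of that.

```python
def expected_mosaic_fractions(divisions_before_mutation: int) -> tuple[float, float]:
    """Return (cell_fraction, allele_fraction) for a heterozygous mutation
    arising in a single cell after the given number of symmetric divisions.

    Idealized model (an assumption): synchronous symmetric divisions and
    equal contribution of every lineage to the sampled tissue.
    """
    if divisions_before_mutation < 0:
        raise ValueError("divisions_before_mutation must be >= 0")
    cells_at_mutation = 2 ** divisions_before_mutation
    cell_fraction = 1 / cells_at_mutation   # fraction of cells carrying the variant
    allele_fraction = cell_fraction / 2     # one of two alleles is mutated
    return cell_fraction, allele_fraction


# Later mutational events give progressively smaller expected fractions.
for k in range(6):
    cell_frac, allele_frac = expected_mosaic_fractions(k)
    print(f"divisions={k}  cells={cell_frac:.4f}  alleles={allele_frac:.4f}")
```

In practice, lineage restriction and unequal clonal expansion mean that a late mutation can be abundant in one tissue and absent from blood, which this uniform model does not capture.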

Somatic Mutations in Mendelian Diseases

“Single Hit” Somatic Mutations

For Mendelian diseases that manifest in the haploinsufficient state, damaging protein-altering mutations in one allele (single hit) can lead to disease. When these mutations develop post fertilization, they lead to somatic mosaicism. Some of the de novo somatic mutations are referred to as “obligatory” somatic mutations, as they are lethal when present in the germline and have only been described in the mosaic state. These disorders tend to be sporadic with no evidence of familial clustering. Obligatory somatic mutations have traditionally been described in cancers, where the somatic mutation confers a growth advantage on the tumor tissue, leading to uncontrolled proliferation. A tumor may have thousands of mutations: only a subset of these, referred to as “driver” mutations, have been causally implicated in oncogenesis, while the remainder, referred to as “passenger” mutations, do not directly contribute to oncogenesis [9, 10]. On the other hand, “obligatory” somatic mutations in Mendelian disorders typically involve only one genetic locus and include disorders such as McCune–Albright syndrome (caused by activating mutations in GNAS) [11], Proteus syndrome (caused by activating mutations in AKT1) [12], and Sturge–Weber syndrome (caused by mutations in GNAQ) [13].

Somatic mutations have also been observed in a fraction of patients with traditional Mendelian diseases such as tuberous sclerosis complex (caused by mutations in TSC1/TSC2) [14], incontinentia pigmenti (caused by mutations in NEMO) [15], double cortex (caused by mutations in DCX, LIS1) [16, 17], and periventricular nodular heterotopia (caused by mutations in FLNA) [18, 19]. Similarly, a case of sporadic, early onset Alzheimer's disease has been attributed to a somatic mosaic presenilin-1 mutation present in the brain [20] and another case of sporadic Creutzfeldt–Jakob disease was caused by an early embryonic somatic mutation identified by the presence of three alleles for the gene encoding the major prion protein PrP [21]. Somatic mutations in alpha-synuclein have been reported in association with Parkinson's disease [22]. These individuals have a milder phenotype compared to the individuals with germline mutations in the same genes [6•, 18].

“Second Hit” Mutations Produce Mosaicism

In several dominantly inherited conditions, an individual carries a heterozygous germline variant, present in all cells, with a somatic second mutation leading to overgrowth of specific tissues, as per the “two hit” model of Knudson [23]. This “second hit” could account for the variable expressivity in several of the dominantly inherited disorders. One of the classic examples of this form of second hit somatic mutation is neurofibromatosis type 1 (NF1), which is characterized by germline mutations in the NF1 gene. Neurofibromas in affected individuals have been reported to develop when a second mutation arises in the other NF1 allele [24]. Similarly, in tuberous sclerosis complex (TSC), somatic second mutations have been shown in non-nervous system tumors, in subependymal giant cell astrocytomas, and in non-cancerous cortical tubers [14, 25].

Diseases Modulated by Somatic Mutations

In addition to directly causing disease, somatic mutations have been shown to modify the disease process in some cases. Some neurodegenerative diseases, including Huntington disease, dentatorubral-pallidoluysian atrophy, and Fragile X syndrome, are caused by inheritance of microsatellite repeats that are highly unstable and can exhibit marked somatic heterogeneity in repeat lengths across brain regions and tissues of affected individuals [26–28]. DEPDC5 is a well-recognized cause of autosomal dominant focal epilepsy and imaging of individuals with mutations in DEPDC5 is typically normal. Recently, Scheffer and colleagues described a series of individuals with DEPDC5-related focal epilepsy who also had a range of brain malformations [29]. The presence of brain malformations in individuals with DEPDC5-related epilepsy could be partially explained by the “second hit” phenomenon with a somatic mutation in the other allele of DEPDC5 or another gene in the related pathway causing the malformations. Direct analysis of the brain tissues of these individuals could provide support for this hypothesis [30].

Gonadal Mosaicism—Mutation in the Parental Germline Cells

While somatic mutations are typically post-fertilization de novo events, in some instances, the de novo event occurs in the parent gonadal tissue after the germline cells have separated from the somatic cells during their embryological development. This leads to a mutation that is confined to the germline cells and may be present in only some of the germline cells. This is referred to as germline or gonadal mosaicism. In this scenario, although the parent is healthy (as his/her somatic cells are unaffected), he or she is at risk of having another affected child with the same disorder. Recurrence risk estimates due to gonadal mosaicism vary across different diseases and are typically around 2–5 % [31, 32], but may be higher in some conditions (such as Duchenne muscular dystrophy) [33, 34].

Types of Somatic Variants

Large-Scale Chromosomal Abnormalities

Chromosomal alterations such as whole chromosome aneuploidy, segmental aneuploidy, and structural alterations have been historically identified by cytogenetic analyses (Table 1). Chromosomal abnormalities are a fairly common cause of developmental disorders and affect 1 in 200 liveborn individuals and up to 50 % of spontaneous miscarriages [4••].

Table 1 List of large chromosomal aberrations associated with somatic mosaicism

Aneuploidy of chromosomes 13, 18, 21 and sex chromosomes accounts for nearly all the aneuploidy-related live births. Sex chromosome mosaicism usually arises from post-zygotic mitotic non-disjunction. Aneuploidy of other chromosomes is usually lethal to the developing fetus and is only observed in the mosaic state for some of the chromosomes (including 1, 2, 5, 8, 9, 16, 17, and 22) [3••]. These individuals have a variable phenotype as the severity of the disorder is determined by the particular chromosome involved as well as by the proportion of cells in the body carrying the aneuploidy. For example, mosaic trisomy 21, which accounts for 1 % of the cases of Down syndrome, leads to a less severe phenotype compared to germline trisomy 21 [3••, 4••].

Mosaicism for structural abnormalities is less common than aneuploidy, but has been identified in the form of translocations, deletions, duplications, inversions, ring chromosomes, and isochromosomes [35–37]. Isochromosomes are structurally abnormal chromosomes created by the presence of two copies of one of the arms of a chromosome, while the other arm is missing. Four well-recognized isochromosome syndromes include Pallister–Killian syndrome (isochromosome 12p), cat-eye syndrome (isochromosome 22q), isochromosome 15q11, and isochromosome 18p. As the level of mosaicism in different tissues varies across individuals, all have been associated with a variable degree of developmental delay and intellectual disability [4••].

Copy Number Variants

Recent advances in genomic tools such as microarray analysis have improved the resolution of cytogenomic imbalances to be detected at a submicroscopic level and, in some instances, to individual exons. These aberrations are referred to as copy number variants (CNVs). Germline CNVs have been reported in 10–15 % of patients with developmental delay and/or autism spectrum disorder [38], while mosaic CNVs have been reported in 0.5–3.74 % of patients with congenital and developmental anomalies [39•, 40, 41]. Mosaic CNV involving chromosome 1q has been well described in individuals with hemimegalencephaly [42, 43] and more recently in focal cortical dysplasia [44, 45]. Copy number variation has been noted across different tissues from the same individual [46] and mosaic CNVs have been observed in monozygotic twins with discordant phenotypes [47].

Single Nucleotide Variants

Mutations at the level of the nucleotide, either base substitutions or small insertions or deletions (indels), in the genomic DNA may lead to abnormal mRNA and consequent abnormal protein production. Single nucleotide variants (SNVs) can lead to loss of function (missense or splicing variants), gain of function (missense variants), or absence of protein (nonsense, frameshift, or canonical splice-site variants). Mosaic SNVs have been reported in a range of disorders (Table 2), including those associated with brain malformations [16–19, 43, 48, 49], Proteus syndrome [12], CLOVES syndrome [50], Sturge–Weber syndrome [13], and verrucous venous malformation [51], among others. Mosaic SNVs in the PI3K-AKT-mTOR pathway result in overgrowth and have been hypothesized to be lethal when constitutional as they disrupt early embryonic development. On the other hand, mosaic variants in DCX, LIS1, TSC1, FLNA, etc. produce milder phenotypes than the corresponding germline variants [3••]. Mosaicism has also been described in disorders related to developmental delay and/or epilepsy syndromes such as Pitt–Hopkins syndrome (caused by mutations in TCF4) [52], Rett syndrome (caused by mutations in MECP2) [53], Cornelia de Lange syndrome (caused by mutations in NIPBL, SMC1A, SMC3) [54, 55], X-linked Charcot–Marie–Tooth type 1 (caused by mutations in GJB1) [56], FOXG1 syndrome [57], benign familial neonatal seizures (KCNQ2) [58], and Temple–Baraitser syndrome (KCNH1) [59]. More recently, mosaicism has been described in X-linked Alport syndrome (COL4A5) [60], a form of nephropathy, where individuals with the mosaic variant were either asymptomatic or very mildly affected.

Table 2 List of genes where single nucleotide variants have been associated with somatic mosaicism

Recurrence Risk and Genetic Counseling

Management of individuals with Mendelian diseases is restricted to anticipation of potential complications with limited ability to intervene to alter the natural history of the disease. Hence, recurrence risk counseling is the most effective intervention in clinical genetics. Detection of somatic mutation as a cause of disease in an individual suggests a post-fertilization event and hence, the risk of recurrence in the siblings is low and similar to the general population. The risk to the offspring of the proband depends on the timing of the somatic mutation. If the mutation developed prior to separation of germinal cells and the somatic cells, then there is a 50 % risk to the proband of having an affected child. For conditions which are lethal when constitutional, the pregnancy will be non-viable if the fetus inherits the mutated allele. On the other hand, if the mutation developed after the germinal cells had separated and is confined to the somatic cells, then there is no increased risk to the offspring.

Detection of Somatic Variants

The tools required to detect somatic variants depend on the type of variant (Table 3).

Table 3 Tools to detect somatic variants

Copy Number Variants

Karyotype and Fluorescence In Situ Hybridization

Cytogenetic analysis is the study of the banding pattern of chromosomes during metaphase of the cell cycle. Because it is carried out on a cell-by-cell basis, mosaicism can be recognized even when only a minority of cells carry the chromosomal abnormality. Detection of low-level mosaicism nevertheless remains a challenge, as an adequate number of cells must be counted. This analysis can detect large-scale chromosomal abnormalities, most commonly aneuploidy. However, the limit of detection of chromosomal aberrations is around 5 Mb, and CNVs below 5 Mb will be missed on routine cytogenetic analysis [4••].
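
The number of cells that must be counted can be estimated with a simple binomial argument. The sketch below assumes that cells are sampled independently and that cell culture does not select for or against the abnormal line; both are simplifying assumptions, and the function name is illustrative. If a fraction f of cells is abnormal, the probability that n counted cells are all normal is (1 - f)^n, so mosaicism at level f is excluded with confidence c once (1 - f)^n <= 1 - c.

```python
import math


def cells_needed(mosaic_fraction: float, confidence: float = 0.95) -> int:
    """Smallest number of cells to count so that mosaicism at
    `mosaic_fraction` would yield at least one abnormal cell with
    probability >= `confidence` (independent sampling assumed)."""
    if not 0 < mosaic_fraction < 1:
        raise ValueError("mosaic_fraction must be between 0 and 1 (exclusive)")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    # Solve (1 - f)^n <= 1 - c for n and round up to a whole cell.
    return math.ceil(math.log(1 - confidence) / math.log(1 - mosaic_fraction))


for level in (0.50, 0.10, 0.05):
    print(f"mosaicism {level:.0%}: count at least {cells_needed(level)} cells")
```

The required count rises steeply as the mosaic fraction falls, which is consistent with the difficulty of detecting low-level mosaicism by karyotype.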

Fluorescence in situ hybridization (FISH), on the other hand, utilizes labeled probes that bind to a specific chromosomal region of interest and allows for detection of numeric and structural abnormalities, including submicroscopic (below 5 Mb) copy number changes. An advantage of FISH over chromosome analysis is that FISH does not require actively dividing cells and hence can be performed on cells in both interphase and metaphase. However, hybridization-related artifacts are common, and the high rates of mosaic aneuploidy reported with the use of karyotyping and FISH [61] have not been replicated in recent single-cell studies [42, 62••].

Chromosome Microarray Analysis

Microarray-based techniques have replaced conventional cytogenetic analysis in many instances as they are able to detect submicroscopic genomic imbalances or CNVs and, unlike FISH, allow for interrogation of CNVs across the entire genome. In addition, samples do not require culturing and cells in all cell cycle phases are analyzed and hence, the technique is less prone to artifacts [4••]. Types of microarray analyses include array comparative genomic hybridization (aCGH) which detects copy number aberrations, genome-wide single nucleotide polymorphism (SNP) arrays, which can analyze both SNPs and CNVs, and targeted high-density arrays focusing on a few known and candidate regions. aCGH is able to detect somatic CNVs when variant cells constitute >10 % of the total cell population [40]. SNP arrays are much more sensitive for mosaicism detection and mosaicism involving <5 % of cells has been detected using these arrays [41, 63, 64]. As SNP arrays are able to analyze the zygosity of the SNPs, they aid in understanding the genetic mechanism by which mosaicism has occurred. Targeted high-density arrays allow for greater precision in detection and quantification of mosaicism [65•].
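
To illustrate how small the array signal becomes at low levels of mosaicism, the following sketch computes the expected log2 intensity ratio for a mosaic single-copy gain or loss. It assumes a normally two-copy region, a diploid reference, and an ideal linear array response; real arrays typically compress this range, so the values are upper bounds on signal rather than predictions. The function name and parameters are illustrative.

```python
import math


def expected_log2_ratio(cell_fraction: float, copy_change: int = 1) -> float:
    """Expected log2(test/reference) for a CNV present in `cell_fraction`
    of cells, where `copy_change` is +1 for a gain or -1 for a loss
    relative to two copies (ideal linear response assumed)."""
    if not 0 <= cell_fraction <= 1:
        raise ValueError("cell_fraction must be between 0 and 1")
    mean_copies = 2 + copy_change * cell_fraction  # average copies per cell
    if mean_copies <= 0:
        raise ValueError("mean copy number must be positive")
    return math.log2(mean_copies / 2)


for fraction in (1.0, 0.5, 0.1, 0.05):
    gain = expected_log2_ratio(fraction, +1)
    loss = expected_log2_ratio(fraction, -1)
    print(f"cells={fraction:.0%}  gain={gain:+.3f}  loss={loss:+.3f}")
```

SNP arrays add allele-frequency information on top of this intensity signal, which is one reason they can resolve lower mosaic fractions than intensity alone.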

Single-Cell Copy Number Analyses

Single-cell analysis allows one to isolate single nuclei [66•] that can then be subjected to amplification followed either by microarray or low-coverage whole-genome sequencing for CNV analysis. Single-cell genomic analyses have demonstrated that somatic aneuploidy in adult neurons is less common than previously estimated, while somatic CNVs are not that rare [67].

Digital Droplet PCR

Digital droplet PCR (ddPCR) provides the ability to quantify nucleic acids with high precision and sensitivity [68]. This method has been used successfully to reproducibly and reliably detect the presence of trisomy 21 down to 2 % mosaicism [69]. ddPCR was used to confirm the presence of low-frequency mosaic CNVs in the induced pluripotent stem cells derived from fibroblasts, when standard techniques were unable to detect these CNVs [70].
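
The quantification principle behind ddPCR can be sketched as follows. This is a generic illustration of Poisson-corrected droplet counting, not the specific protocol of the studies cited above; the function names, the use of a two-copy reference locus measured in the same droplets, and the example counts are all assumptions. Because a droplet may contain more than one template, the mean number of copies per droplet is estimated as -ln(1 - p), where p is the fraction of positive droplets. For a mosaic trisomy present in a fraction f of cells, the target-to-reference ratio is expected to be 1 + f/2.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean template copies per droplet."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log(1 - positive / total)


def trisomy_cell_fraction(target_positive: int, reference_positive: int,
                          total: int) -> float:
    """Estimate the fraction of trisomic cells from a duplex assay in which
    the target and a two-copy reference are read in the same droplets."""
    reference = copies_per_droplet(reference_positive, total)
    if reference == 0:
        raise ValueError("reference locus has no positive droplets")
    ratio = copies_per_droplet(target_positive, total) / reference
    return 2 * (ratio - 1)  # ratio = 1 + f/2 for a mosaic trisomy


# Hypothetical droplet counts, for illustration only.
estimate = trisomy_cell_fraction(target_positive=5100,
                                 reference_positive=5000, total=20000)
print(f"estimated trisomic cell fraction: {estimate:.3f}")
```

Because the estimate depends on a small difference between two counts, precision at low mosaic fractions depends on the number of droplets analyzed.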

Single Nucleotide Variants, Including Insertions and Deletions

Sanger Sequencing

Sanger sequencing is the most widely used method for detection of SNVs but is unable to detect mosaic alleles below a threshold of 15–20 % [71] and can miss a significant proportion of low-level mosaic mutations [6•]. In addition, mosaic mutations at higher allele fractions are miscalled as germline, highlighting the limitations of Sanger sequencing in detecting mosaicism on both ends of the spectrum [6•].

Next-Generation Sequencing

Next-generation sequencing (NGS) is a high-throughput technique that allows parallel analysis of multiple regions of the genome [72, 73]. Whole exome and genome sequencing (WES/WGS) technologies have allowed an exponential increase in our understanding of human genetic disorders and have been reported to detect somatic mutations in rare instances [74–76]. However, due to inhomogeneities in library preparation and an average depth of 40–80×, WES/WGS may miss somatic mutations, especially when the read depth is low. On the other hand, deep-targeted sequencing allows one to interrogate a smaller portion of the genome to much greater depths and has been used successfully to detect mosaicism as low as 5 % [6•, 49]. In addition, as many as 30 % of the mutations associated with unexplained brain malformations are somatic, the majority of which would be missed on Sanger sequencing and even on routine WES [6•, 77]. An alternative to performing deep sequencing is bioinformatic alteration of exome sequencing pipelines or performing next-generation sequencing on paired samples (affected and unaffected tissue), both of which have been used successfully to detect somatic mutations in cancerous [7] as well as non-cancerous disorders [12, 48].
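
The dependence of sensitivity on read depth can be illustrated with a binomial model. The sketch below assumes that reads sample the two alleles independently and without bias, and it ignores sequencing error and mapping artifacts; the threshold of three supporting reads is a hypothetical calling rule, not one taken from the studies cited above. Under these assumptions, the probability of observing at least k variant reads at depth n for a variant allele fraction p is 1 minus the binomial probability of observing fewer than k.

```python
from math import comb


def detection_probability(depth: int, allele_fraction: float,
                          min_alt_reads: int = 3) -> float:
    """Probability of seeing at least `min_alt_reads` variant reads at the
    given depth, for a variant at `allele_fraction` (binomial sampling,
    no sequencing error; both are simplifying assumptions)."""
    if depth < 0 or min_alt_reads < 0:
        raise ValueError("depth and min_alt_reads must be non-negative")
    if not 0 <= allele_fraction <= 1:
        raise ValueError("allele_fraction must be between 0 and 1")
    p_too_few = sum(
        comb(depth, i) * allele_fraction ** i * (1 - allele_fraction) ** (depth - i)
        for i in range(min(min_alt_reads, depth + 1))
    )
    return 1 - p_too_few


# Compare exome-like depths with a deep targeted panel at 5 % allele fraction.
for depth in (40, 80, 500, 1000):
    print(f"depth={depth:>4}  P(detect)={detection_probability(depth, 0.05):.3f}")
```

A real pipeline must also separate true low-fraction variants from sequencing errors, so practical sensitivity is lower than this idealized figure.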

Subcloning Followed by Sanger Sequencing

Subcloning of the amplified PCR products into a vector followed by transformation into a bacterium such as E. coli leads to formation of multiple colonies of the bacteria, each containing either the wild type or the mutant allele. Sanger sequencing of multiple individual colonies not only allows for confirmation of the presence of the mosaic variant, but also allows for quantification of the level of mosaicism [6•, 43].
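
Quantification from colony counts amounts to estimating a proportion. The sketch below is a minimal illustration that assumes each colony derives from a single amplicon and that neither allele is favored during PCR or cloning; the function name and the example counts are hypothetical. It reports the mutant allele fraction with a Wilson score interval and, for a heterozygous autosomal variant, the implied fraction of cells carrying the mutation (twice the allele fraction, capped at 1).

```python
import math


def mosaic_estimate(mutant_colonies: int, total_colonies: int, z: float = 1.96):
    """Return (allele_fraction, (ci_low, ci_high), cell_fraction) from
    colony counts; z = 1.96 gives an approximate 95 % Wilson interval."""
    if total_colonies <= 0 or not 0 <= mutant_colonies <= total_colonies:
        raise ValueError("need 0 <= mutant_colonies <= total_colonies, total > 0")
    n = total_colonies
    allele_fraction = mutant_colonies / n
    # Wilson score interval for a binomial proportion.
    denom = 1 + z ** 2 / n
    center = (allele_fraction + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(
        allele_fraction * (1 - allele_fraction) / n + z ** 2 / (4 * n ** 2)
    ) / denom
    # Heterozygous variant: each mutant cell contributes one mutant allele of two.
    cell_fraction = min(1.0, 2 * allele_fraction)
    return allele_fraction, (center - half_width, center + half_width), cell_fraction


af, (low, high), cells = mosaic_estimate(mutant_colonies=10, total_colonies=100)
print(f"allele fraction {af:.2f} (95% CI {low:.3f}-{high:.3f}), cells {cells:.2f}")
```

The width of the interval shows why a large number of colonies must be sequenced before a low level of mosaicism can be quantified with confidence.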

Single-Cell Sequencing

Isolation, genome amplification, and sequencing of single cells allows for quantification of the level of mosaicism as well as to determine which cell lineage is affected. In an individual with hemimegalencephaly, bulk tissue analysis showed somatic point mutation in AKT3 at ~35 % mosaicism, while single-cell sequencing revealed the mutation in 39 % of neuronal nuclei and 27 % of non-neuronal nuclei. This suggests that the mutational event occurred in an early neocortical progenitor cell [66•]. This was consistent with the radiological phenotype of the individual that showed involvement of both gray and white matter [43].

Mass Spectrometry

The mass spectrometry-based Sequenom MassARRAY iPLEX platform provides a high-throughput method of validating and quantifying somatic point mutations in a given sample [78, 79].

Conclusion

The role of somatic mutations in Mendelian disease is still underappreciated. Somatic mutations can cause disease by a “single hit” mechanism, by a “two hit” mechanism, and by modulation of the disease process. The mutations can be large chromosomal abnormalities, copy number variants, or single nucleotide variants. The disease presentations are wide ranging, but the phenotype tends to be less severe than when the mutation is constitutional. Newer genomic technologies such as single-cell sequencing and deep next-generation sequencing will allow systematic measurement of somatic mutation rates in different Mendelian diseases. This will allow us to estimate the prevalence of somatic mutations as a cause of Mendelian disorders, and to understand to what extent somatic mutations modify the pathogenesis of Mendelian diseases. With this knowledge, we can counsel families more precisely.