Introduction

Cancer is fundamentally a genetic disease, and mutations (pathogenic variants) are pivotal to its etiology and progression. Carcinogenesis proceeds through the accumulation of numerous genetic and epigenetic abnormalities [1,2,3,4]. As a result, cancer cells acquire the following characteristics: sustained proliferative signaling, evasion of growth suppressors, resistance to cell death, replicative immortality, induction of angiogenesis, and activation of invasion and metastasis [5]. Elucidating the etiology of cancer and developing therapeutic measures is therefore essential [6]. Although rare, hereditary (familial) cancer syndromes are observed in cancers derived from any organ. In individuals with a hereditary cancer syndrome, the initial cancer-causing mutation is inherited through the germline and is therefore already present in every cell of the body. Lynch syndrome (MIM# 120435) is a highly penetrant autosomal-dominant syndrome in which several family members are affected by colorectal cancer (CRC) or extracolonic tumors of the endometrium, stomach, small bowel, ureter, renal pelvis, ovary, and hepatobiliary tract [7]. Lynch syndrome results from loss of function of the mismatch repair mechanism that corrects genomic replication errors. This article outlines the molecular genetic basis of Lynch syndrome.

DNA repair system

A large number of cell divisions is required to produce an individual with an estimated 37 trillion cells from a single-cell zygote. The frequency of replication errors is 10⁻¹⁰ per base of DNA per cell division, and over an estimated 10¹⁵ cell divisions during an individual’s lifetime, replication errors cause thousands of new DNA mutations in the genome of every cell. Eukaryotes possess multiple repair systems to correct replication errors (Table 1). Protecting genome integrity through repair prevents the development and progression of cancer driven by genomic abnormalities. Genes encoding molecules involved in genome repair are referred to as DNA repair genes, or as “caretaker tumor suppressor genes”.
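The scale of the per-division error load can be checked with a short calculation. The sketch below is illustrative only: the diploid genome size of about 6 × 10⁹ bases is an assumption that is not stated in the text, and the function and constant names are hypothetical.

```python
# Back-of-the-envelope estimate of replication errors per cell division.
ERROR_RATE_PER_BASE = 1e-10   # from the text: errors per base per cell division
DIPLOID_GENOME_BASES = 6e9    # assumption: approximate diploid human genome size


def expected_errors_per_division(error_rate=ERROR_RATE_PER_BASE,
                                 genome_bases=DIPLOID_GENOME_BASES):
    """Expected number of replication errors introduced in one cell division."""
    return error_rate * genome_bases


if __name__ == "__main__":
    print(f"{expected_errors_per_division():.1f} expected errors per division")
```

Under these assumptions, the expected load is below one error per division, so the mutation burden of a cell reflects accumulation over the many divisions in its lineage rather than any single division.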

Table 1 DNA repair systems and predisposition to cancer

The mismatch repair system was first recognized in 1961, with the proposal that correction of DNA base-pair mismatches within recombination intermediates is the basis for gene conversion [8]. The system was subsequently elucidated through fundamental research in Escherichia coli [9]. The methyl-directed pathway depends on the products of four E. coli mutator genes: mutH, mutL, mutS, and uvrD [10,11,12]. Inactivation of any of these genes increases the generation of mutations in the E. coli cell by 50- to 100-fold, indicating the importance of this pathway in mutation avoidance and genetic stability. The reduction in mutability afforded by the E. coli methyl-directed system has been attributed to its role in the strand-specific elimination of DNA errors (Table 2) [6, 13,14,15,16,17,18]. Research on the mismatch repair system has since advanced extensively, clarifying its mechanism and its role both in maintaining genome integrity and in predisposition to cancer.

Table 2 DNA repair system for replication errors in Escherichia coli

Genes responsible for Lynch syndrome

Lynch syndrome (alias: hereditary nonpolyposis colorectal cancer—HNPCC) is an autosomal-dominant inherited disorder caused by germline mutations in DNA mismatch repair (MMR) genes. Patients with Lynch syndrome are at an increased risk of developing tumors from a young age and throughout their lifetime. Most of them suffer from multiple synchronous and/or metachronous primary tumors. Colorectal cancer and endometrial cancer (female) are well known in the tumor spectrum of Lynch syndrome. In addition, patients with Lynch syndrome have high potential for developing cancer of the urinary tract, the stomach, the small intestine, the biliary tract, the skin, the brain, and others.

Many human mismatch repair (MMR) proteins are known, and several of the encoding genes have been isolated so far. Currently, four MMR genes, MLH1 (MIM# 120436), MSH2 (MIM# 609309), MSH6 (MIM# 600678), and PMS2 (MIM# 600259), are used in clinical applications related to Lynch syndrome. An outline of the responsible genes is shown in Table 3 and Fig. 1. EPCAM, which encodes a cell adhesion molecule, is not an MMR gene. However, because it is adjacent to the MSH2 gene, a structural abnormality in EPCAM may cause Lynch syndrome [19].

Table 3 Mismatch repair genes
Fig. 1 The genes responsible for Lynch syndrome

In 1993, two research groups independently isolated MSH2, a human mismatch repair gene that is highly homologous to mutS, a mutator gene of E. coli [20, 21]. Genomic MSH2 spans approximately 73 kb, contains 16 exons, and maps to chromosome 2p22-p21 [22, 23]. In 1994, MLH1, the homologue of E. coli mutL, was isolated from 3p22.2 as the second gene responsible for Lynch syndrome, following the mapping reported in the previous year [24, 25]. Human MLH1 consists of 19 coding exons spanning approximately 100 kb and is highly conserved, especially in exons 1–7 [26]. In 1995, the mismatch-binding factor was found to be a heterodimer of the 100 kDa MSH2 and a 160 kDa polypeptide called GTBP (for G/T binding protein). Sequence analysis showed GTBP to be a new member of the MutS homologue family [27, 28]. MSH6 (GTBP) was first reported by Japanese researchers as a gene responsible for Lynch syndrome [29, 30]. In 1994, a germline deletion of PMS2 was also identified in families with Lynch syndrome. Moreover, additional deletions in tumor samples with high microsatellite instability (MSI-high) demonstrated the presence of two hits [31]. Pseudogenes corresponding to PMS2 exist, so careful consideration is required for genetic testing [31, 32].

Structure and function of MMR proteins

Each MMR protein encoded by the corresponding MMR gene has a unique function in repairing replication errors. Therefore, MMR proteins possess unique functional domains. When mutations of MMR genes occur in the DNA site corresponding to the functional domain, DNA repair function may be impaired. Schematic representations of MLH1, MSH2, MSH6, and PMS2 proteins are shown in Fig. 2 [33,34,35,36,37]. Both MLH1 and PMS2 have an ATP binding domain and require ATP molecules for the endonuclease function.

Fig. 2 Structure of mismatch repair proteins: a MLH1, b MSH2, c MSH6, d PMS2

Many human MMR-related proteins have been identified as homologues of E. coli MMR proteins (Table 4) [21,22,23,24,25,26,27,28, 38,39,40,41,42,43,44,45,46,47,48]. These include human homologues of MutS, MutL, ExoI, DNA polymerase δ (pol δ), proliferating cell nuclear antigen (PCNA), replication factor C (RFC), and DNA ligase I. Although the MutS and MutL proteins of E. coli form homodimers to perform their DNA repair functions, functional heterodimer formation is necessary in humans. MSH2 heterodimerizes with MSH6 or MSH3 to form MutSα or MutSβ, respectively. These are involved in mismatch recognition and initiation of repair [49,50,51,52,53]. In particular, MutSβ recognizes insertion/deletion loops. Similarly, MLH1 heterodimerizes with PMS2, PMS1, or MLH3 to form MutLα, MutLβ, or MutLγ, respectively [36, 37, 39, 50, 51, 53,54,55,56,57,58,59]. MutLα is a latent endonuclease that forms a complex with the MutS heterodimer and nicks one strand of the heteroduplex DNA containing the mismatch [57]. The DQHA(X)2E(X)4E motif of PMS2 is probably involved in this nicking function. MutLβ is one of the endonucleases acting on single-strand breaks in DNA, but its specific function is still unclear. MutLγ is an endonuclease targeting single-strand breaks in supercoiled DNA and plays an important role in meiosis [60,61,62].
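The heterodimer pairings described above can be summarized as a small lookup structure. This is a minimal sketch that encodes only the pairings named in the text; the names `MMR_HETERODIMERS`, `heterodimer_name`, and `partners_of` are hypothetical and introduced for illustration.

```python
# Human MMR heterodimers as described in the text (partner pairs are unordered).
MMR_HETERODIMERS = {
    frozenset({"MSH2", "MSH6"}): "MutSα",
    frozenset({"MSH2", "MSH3"}): "MutSβ",
    frozenset({"MLH1", "PMS2"}): "MutLα",
    frozenset({"MLH1", "PMS1"}): "MutLβ",
    frozenset({"MLH1", "MLH3"}): "MutLγ",
}


def heterodimer_name(protein_a, protein_b):
    """Return the complex formed by two MMR proteins, or None if the text lists none."""
    return MMR_HETERODIMERS.get(frozenset({protein_a, protein_b}))


def partners_of(protein):
    """Return the sorted heterodimer partners of a given MMR protein."""
    return sorted(
        other
        for pair in MMR_HETERODIMERS
        if protein in pair
        for other in pair
        if other != protein
    )


if __name__ == "__main__":
    print(heterodimer_name("MSH6", "MSH2"))
    print(partners_of("MLH1"))
```

The lookup makes explicit that MSH2 and MLH1 are the common partners in every complex, whereas MSH6, MSH3, PMS2, PMS1, and MLH3 each appear in only one.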

Table 4 Human MMR components

Mechanisms of mismatch repair

The mismatch repair (MMR) system consists of sequential steps for the recognition, removal, and re-synthesis of the mismatch site in DNA. This system, which maintains DNA fidelity, is well conserved from E. coli to eukaryotes. A schematic diagram of the pathway is shown in Fig. 3 [52, 57, 59, 61,62,63,64,65,66,67,68,69,70,71,72]. Base–base mismatches in double-strand DNA are recognized by MutSα (heterodimer of MSH2-MSH6). MutSα binds as a sliding clamp around the double-strand DNA. In this step, MSH2 requires ATP for sliding of the MutSα clamp along the double-strand DNA [73]. In its ATP-activated state, MutSα can interact with MutLα (heterodimer of MLH1-PMS2) to form a tetrameric complex [74,75,76]. The tetrameric complex slides up and down the double-strand DNA and searches for a single-strand DNA gap on the nascent (daughter) strand, which recruits proliferating cell nuclear antigen (PCNA) and replication factor C (RFC). MutLα can incise the nascent (daughter) strand upon activation by PCNA [57, 77]. Then, exonuclease 1 (Exo 1) is recruited and removes the nascent (daughter) strand around the error region. The re-synthesis step is accomplished by DNA polymerase (Polδ or Polε) and Ligase 1.

Fig. 3 Mechanistic model of mismatch repair

MSH2 and MLH1 each have an ATPase domain that drives their function through ATP hydrolysis (Fig. 2). ATP hydrolysis is necessary when MutSα recognizes a mismatch site or when MutLα forms a nick in the DNA strand [78,79,80,81]. Completion of the MMR pathway is therefore presumed to consume energy.

Relationship between the MMR system and DNA damage

Specific mismatch repair molecules and complexes are involved depending on the pattern of DNA damage (Fig. 4) [49, 63, 65, 82,83,84,85]. MutSα (heterodimer of MSH2-MSH6) recognizes single-nucleotide mismatches (e.g., a G:T mismatch pair) and small insertion–deletion loops (IDLs, e.g., errors in the repeat number of adenine clusters), whereas MutSβ (heterodimer of MSH2-MSH3) contributes to the repair of small loops and of relatively large loops of up to about 10 nucleotides. Recently, the function of MutSβ has attracted attention in relation to the biological characteristics and prognosis of colorectal cancer with elevated microsatellite alterations at selected tetranucleotide repeats (EMAST), which shows instability in tetranucleotide repeat sequences [86,87,88,89,90]. The clinical characteristics of EMAST are presumed to involve an MSH3-deficient state.
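The division of labor between MutSα and MutSβ can be sketched as a simple rule on loop size. This is an illustration rather than a quantitative model: the text does not define "small", so the cutoff of 2 nucleotides used here is an assumption, as are the function and parameter names. The upper limit of about 10 nucleotides is taken from the text.

```python
def recognition_complexes(loop_size_nt, small_loop_max=2, large_loop_max=10):
    """Return the MutS complexes expected to recognize a lesion.

    loop_size_nt: 0 for a base-base mismatch, otherwise the number of
        unpaired nucleotides in the insertion-deletion loop.
    small_loop_max: assumed upper bound of a "small" loop (not from the text).
    large_loop_max: approximate upper bound handled by MutSβ (from the text).
    """
    if loop_size_nt < 0:
        raise ValueError("loop_size_nt must be non-negative")
    if loop_size_nt == 0:
        return ["MutSα"]            # single-nucleotide mismatch, e.g. G:T
    if loop_size_nt <= small_loop_max:
        return ["MutSα", "MutSβ"]   # small loops: both complexes contribute
    if loop_size_nt <= large_loop_max:
        return ["MutSβ"]            # larger loops up to about 10 nucleotides
    return []                       # beyond the range described in the text


if __name__ == "__main__":
    for size in (0, 1, 4, 12):
        print(size, recognition_complexes(size))
```

Under this rule, a tetranucleotide loop falls to MutSβ alone, which is consistent with the presumed link between MSH3 deficiency and EMAST.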

Fig. 4 Schematic of DNA damage recognized by the mismatch repair pathway

MutL function mainly involves MutLα, a heterodimer of MLH1 and PMS2. However, MutLγ, a heterodimer of MLH1 and MLH3, is involved in repair in the case of instability greater than a trinucleotide repeat.

EPCAM as the gene responsible for Lynch syndrome

EPCAM is located at 2p21, immediately upstream (5′) of MSH2, and encodes the EpCAM protein, which is expressed on the membrane of epithelial cells and plasma cells and is closely involved in cell–cell interaction [91, 92]. Although EPCAM is not directly responsible for Lynch syndrome, its position is significant, as it lies 17 kb upstream of MSH2. Monoallelic cis-deletions of the last exons of EPCAM result in loss of its polyadenylation signal, transcriptional read-through into MSH2 with mosaic promoter methylation, and the generation of fused EPCAM-MSH2 transcripts (Fig. 5) [19]. The cis-deleted alleles inhibit MSH2 expression and ultimately cause Lynch syndrome in 1–3% of the affected families [19, 93].

Fig. 5 A cis-deletion of EPCAM gene causes an epimutation of the MSH2 gene

In addition, biallelic inactivation of EPCAM is responsible for congenital tufting enteropathy (CTE, MIM# 613217), with an estimated incidence of one in 50,000–100,000 births in Western Europe [94,95,96]. CTE presents within the first months of life with severe chronic watery diarrhea and growth restriction. EPCAM abnormalities responsible for CTE are usually missense mutations, nonsense mutations, minute insertions/deletions, and splicing errors, which differ in type from the extensive deletions that cause EPCAM-associated Lynch syndrome [97]. Interestingly, a single gene thus underlies two unrelated hereditary gastrointestinal disorders, each associated with a different type of abnormality.

Constitutional mismatch repair deficiency syndrome

Constitutional mismatch repair deficiency syndrome (CMMR-D) is caused by biallelic (homozygous or compound heterozygous) pathogenic germline mutations of MMR genes, and is a distinct childhood cancer predisposition syndrome (MIM# 276300) with autosomal recessive inheritance [98]. This condition was clarified by two separate reports on children from consanguineous marriages within families with Lynch syndrome and MLH1 germline mutations who developed malignancies in early childhood (age range 14 months to 6 years) [99, 100]. In 1959, a condition strongly suspected to be CMMR-D was reported by Turcot et al. [101] in two siblings with numerous colorectal adenomatous polyps, colorectal carcinoma, and malignant brain tumors. Later, this condition was considered a subtype of familial adenomatous polyposis (FAP) called Turcot’s syndrome [102].

In biallelic germline mutation carriers of MMR genes, hematological malignancies, brain/central nervous system (CNS) tumors, and Lynch syndrome-associated carcinomas develop frequently [98]. In the gastrointestinal tract, adenomatous polyposis of the bowel is often observed as a premalignant lesion that requires differential diagnosis from FAP. The median ages at diagnosis of hematological malignancies and brain/CNS tumors were 6.6 (age range 1.2–30.8) and 10.3 (age range 3.3–40) years, respectively. However, Lynch syndrome-associated tumors developed later [median age at diagnosis 21.4 years (age range 11.4–36.6)] and were mostly colorectal cancers [103]. Various non-neoplastic features are related to CMMR-D, including café-au-lait spots (NF1-like), skin hypopigmentation, mild defects in immunoglobulin class switch recombination, agenesis of the corpus callosum, cavernous brain hemangioma, capillary hemangioma of the skin, combinations of various congenital malformations, and lupus erythematosus.

Lynch syndrome-associated tumors from patients with CMMR-D are considered to reflect DNA replication errors, as in Lynch syndrome. They therefore often, but not always, present with MSI-H findings [103].

Genetic testing for Lynch syndrome

To select individuals at high risk of Lynch syndrome from among patients with colorectal cancer and to increase the efficiency of detecting germline mutations, microsatellite instability (MSI) testing and/or immunohistochemical (IHC) staining of MMR proteins is recommended as universal tumor screening and should be conducted first [104,105,106]. MSI testing uses simple repeated microsatellite sequences to readily identify tumors in which genome integrity has been compromised by failure to repair DNA replication errors [107,108,109,110,111]. Five repeat markers, including mononucleotide and dinucleotide repeats, have been used, but mononucleotide repeat markers have recently been preferred. A marker is considered positive when the number of repeats differs between normal tissue-derived DNA and cancer-derived DNA [112]. If two or more of the five markers show instability, the tumor is evaluated as MSI-high (MSI-H). An example of MSI-H colorectal cancer is shown in Fig. 6. If only one marker shows instability, the tumor is considered MSI-low (MSI-L). If no positive markers are observed, the mismatch repair system is evaluated as proficient and the tumor is called microsatellite stable (MSS).
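The classification rule for a five-marker panel can be written as a short function. This is a minimal sketch, assuming that the MSI-H threshold means two or more unstable markers (consistent with Fig. 6, where four of five unstable markers are read as MSI-H); the function name `classify_msi` is hypothetical.

```python
def classify_msi(unstable_markers, total_markers=5):
    """Classify a tumor from the number of unstable microsatellite markers.

    Assumes the five-marker panel described in the text:
    two or more unstable markers -> MSI-H, exactly one -> MSI-L, none -> MSS.
    """
    if not 0 <= unstable_markers <= total_markers:
        raise ValueError("unstable_markers must be between 0 and total_markers")
    if unstable_markers >= 2:
        return "MSI-H"
    if unstable_markers == 1:
        return "MSI-L"
    return "MSS"


if __name__ == "__main__":
    for count in range(6):
        print(count, classify_msi(count))
```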

Fig. 6 Analytic image of MSI testing: 4 out of 5 markers show microsatellite instability

Immunohistochemical staining of MMR proteins can reveal damaged molecules using specific antibodies. Staining with four antibodies—MLH1, MSH2, MSH6, and PMS2—can predict the gene causing Lynch syndrome, because the mismatch repair proteins form heterodimeric complexes (Table 5) [113,114,115,116,117,118,119,120].

Table 5 IHC findings associated with MLH1, MSH2, MSH6, and PMS2 mutations
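Because the MMR proteins act as heterodimers, the pattern of protein loss on IHC points to a candidate gene. Table 5 is not reproduced in this text, so the mapping below is a sketch of the commonly used interpretation rather than a transcription of that table; the function name and the returned labels are hypothetical.

```python
# Commonly used interpretation of IHC loss patterns (assumption: this mirrors,
# but is not copied from, Table 5). MLH1 and MSH2 are the common partners, so
# their loss is accompanied by loss of PMS2 and MSH6, respectively.
IHC_PATTERNS = {
    frozenset(): "no loss: MMR proteins retained",
    frozenset({"MLH1", "PMS2"}): "MLH1",  # or MLH1 promoter methylation (sporadic)
    frozenset({"PMS2"}): "PMS2",
    frozenset({"MSH2", "MSH6"}): "MSH2",  # or EPCAM deletion silencing MSH2
    frozenset({"MSH6"}): "MSH6",
}


def predict_gene_from_ihc(lost_proteins):
    """Return the candidate gene for a set of proteins showing loss of staining."""
    pattern = frozenset(lost_proteins)
    return IHC_PATTERNS.get(pattern, "atypical pattern: interpret individually")


if __name__ == "__main__":
    print(predict_gene_from_ihc({"MLH1", "PMS2"}))
    print(predict_gene_from_ihc({"MSH6"}))
```

Isolated loss of PMS2 or MSH6 points to that gene itself, whereas paired loss points to the common partner, MLH1 or MSH2.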

For MSI testing, sensitivity ranged from 66.7 to 100.0% and specificity ranged from 61.1 to 92.5%, whereas for IHC staining, sensitivity ranged from 80.8 to 100.0% and specificity ranged from 80.5 to 91.9% [121].

Approximately 10–15% of sporadic colorectal cancers show MSI-H findings. The cause is mostly loss of the MLH1 protein due to methylation of the MLH1 gene promoter region. About half of MSI-H sporadic colorectal cancers carry the BRAF V600E mutation, which is not detected in colorectal cancers from patients with Lynch syndrome. MLH1 methylation analysis and BRAF V600E mutation testing in colorectal cancers can reduce the number of samples that proceed to genetic testing for Lynch syndrome and simplify the process, leading to cost and time savings [35, 122].
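The exclusion of likely sporadic tumors before germline testing can be expressed as a simple triage step. This is a simplified sketch of the logic in this paragraph, not a clinical algorithm: it assumes that only MSI-H tumors are considered and that either a BRAF V600E mutation or MLH1 promoter methylation marks a tumor as likely sporadic. The function name and return labels are hypothetical.

```python
def triage_msi_tumor(msi_status, braf_v600e=False, mlh1_methylated=False):
    """Triage a colorectal tumor for Lynch syndrome germline testing.

    msi_status: "MSI-H", "MSI-L", or "MSS".
    braf_v600e: True if the BRAF V600E mutation was detected in the tumor.
    mlh1_methylated: True if MLH1 promoter methylation was detected.
    """
    if msi_status not in {"MSI-H", "MSI-L", "MSS"}:
        raise ValueError(f"unknown MSI status: {msi_status!r}")
    if msi_status != "MSI-H":
        return "not selected by MSI testing"
    if braf_v600e or mlh1_methylated:
        # Either finding suggests sporadic MLH1 silencing rather than Lynch syndrome.
        return "likely sporadic"
    return "candidate for germline testing"


if __name__ == "__main__":
    print(triage_msi_tumor("MSI-H", braf_v600e=True))
    print(triage_msi_tumor("MSI-H"))
```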

Definitive genetic testing for Lynch syndrome is performed by DNA sequencing in the cases that remain after sporadic colorectal cancers have been excluded. For a long time, genetic testing was performed mainly by Sanger sequencing, with multiplex ligation-dependent probe amplification (MLPA) adopted for large-scale abnormalities such as large deletions/insertions [123]. Clinical genetics is currently transitioning from phenotype-directed single-gene testing to multigene panels [124]. Multigene panel testing using next-generation sequencing for hereditary colorectal cancer has been evaluated as a feasible, timely, and cost-effective approach compared with single-gene testing [125]. Previously, germline mutations in Lynch syndrome were thought to occur predominantly in MSH2 and MLH1, and less frequently in MSH6 and PMS2. However, using multigene panel testing without universal tumor screening, Espenschied et al. reported that MSH6 mutations were the most frequent, followed by PMS2, MSH2, MLH1, and EPCAM (Table 6a) [123, 126,127,128]. About 12% of individuals carrying MMR gene mutations have breast cancer alone. Moreover, MMR gene mutation carriers do not necessarily meet the criteria for Lynch syndrome or the BRCA1/BRCA2 testing criteria. Nevertheless, MSH6 and PMS2 germline pathogenic variants are associated with an increased risk of breast cancer [126, 129]. Figure 2 shows the gene-specific distributions of germline variants by type of abnormality in the mismatch repair genes. Most MSH2, MLH1, and MSH6 pathogenic variants were truncating variants such as nonsense or frameshift mutations (Table 6b) [130]. Large-scale rearrangements were detected at frequencies of 10, 7, and 10% for MSH2, MLH1, and PMS2, respectively. Therefore, an appropriate analysis method must be selected for genetic testing.

Table 6 Germline mutation analyses in the responsible genes in Lynch syndrome

Effectiveness of immune checkpoint blockade and a hypermutable state (high tumor mutational burden)

Cancer cells escape the host immune system by suppressing T cell activation, an immunosuppressive function attributed to immune checkpoint molecules. These molecules include cytotoxic T-lymphocyte-associated protein 4 (CTLA-4) and programmed cell death protein 1 (PD-1, CD279) [131, 132], which were found to negatively regulate the immune system [133, 134]. In human cancer treatment, anti-PD-1 antibody was found to be effective against non-small cell lung cancer, malignant melanoma, and renal cell cancer, with safety acceptable for clinical application [135]. The clinical efficacy of PD-1 inhibitors was found to be higher in mismatch repair-deficient colorectal and non-colorectal cancers than in mismatch repair-proficient cancers [136]. According to recent findings, high tumor mutational burden (TMB) is an excellent biomarker for predicting the efficacy of immune checkpoint inhibitors (ICIs) [137, 138], and colorectal cancer patients with deficient mismatch repair (dMMR) respond significantly better to ICIs than those with proficient mismatch repair (pMMR) [136, 139]. In gastrointestinal cancer, the MSI-H state has been shown to correlate well with high TMB based on an analysis of many cancer genomes [140]. MSI testing is used as a standard biomarker to predict the response to ICIs [141, 142].

Future directions

Long-term, detailed research on two families with familial accumulation of various cancers, begun about 100 years ago, subsequently led to the establishment of Lynch syndrome. In parallel, mismatch repair genes were elucidated as part of the genome integrity system in E. coli and yeast. Together, these lines of research have advanced the understanding of the clinical, genetic, and molecular biological aspects of Lynch syndrome. With its natural history and molecular biological characteristics clarified, pre-symptomatic diagnosis by genetic testing for at-risk family members and appropriate medically actionable interventions, such as early diagnosis, are becoming possible.

The development of ICIs is a major milestone in the treatment of Lynch syndrome, in which almost all associated cancers are MSI-H. These studies have shown new possibilities for the treatment of familial (hereditary) tumor syndromes. In the future, we hope that advances in the integrated understanding of the clinical and molecular biology of Lynch syndrome will lead to the development of new effective treatments.