1 Introduction

In human cells, mitochondria are the only organelles outside the nucleus that contain their own genetic material. Most human cells contain hundreds to thousands of mitochondria [1], each of which contains multiple copies of the 16,569 bp circular double-stranded mitochondrial DNA (mtDNA) molecule. The number of mitochondria per cell depends on the energy demand of the specific tissue. Because there are multiple copies of mtDNA, a mutant mtDNA often coexists with the wild-type mtDNA when a mutation occurs, a phenomenon called “heteroplasmy.” The degree of heteroplasmy of a mutation, the nature of the specific mutation, and its tissue distribution determine the clinical phenotype of the affected patient [2, 3]. The phenotype may also be modified by genetic background and environmental factors.

Unlike the nuclear genes, the mitochondrial genome contains no introns in the protein-coding regions. The entire mitochondrial genome, encoding a total of 37 genes, is efficiently utilized. Polycistronic messages are produced from the mtDNA. Genes reside on both strands of the circular mitochondrial genome. The ATP6 and ATP8 genes even share part of their coding regions in different reading frames [4].

The 13 proteins encoded by the mtDNA are all components of the respiratory chain complexes. The mitochondrial genome also encodes two ribosomal RNAs and 22 tRNAs. Mutations in the rRNA and tRNA genes may also cause disease (http://www.mitomap.org/MITOMAP). Indeed, the majority of pathogenic mutations reside in the tRNA genes. For example, the common m.3243A > G mutation in the tRNALeu(UUR) gene is the most frequent cause of MELAS (mitochondrial encephalomyopathy, lactic acidosis, and stroke-like episodes) syndrome. The bacterial-like mitochondrial rRNAs are sensitive to some antibiotics that target bacterial ribosomes. Thus, the m.1555A > G mutation in the 12S rRNA gene is associated with aminoglycoside-induced hearing loss.

In addition to the dense coding regions, there is also an approximately 1.1 kb noncoding displacement loop (D-loop) region where the origins of replication are located.

The purpose of molecular diagnosis of mtDNA disorders is to identify the deleterious changes of mtDNA sequences that contribute to the disease [5]. Two types of mtDNA mutations are usually analyzed: point mutations and large deletions. While there are common recurrent point mutations, rare or novel pathogenic mutations [5] do occur. Therefore, the diagnostic analysis of the mtDNA usually includes the whole mitochondrial genome.

The mtDNA deletions may be a single large deletion or multiple deletions. Since these mutations lead to malfunction of the electron transport chain, there is frequently multisystem involvement. Historically, different methods have been required for the detection of point mutations and deletions, the quantification of mutation heteroplasmy, and the determination of deletion breakpoints [3, 5]. This chapter briefly reviews the conventional molecular diagnostic methods employed in the analysis of mtDNA disorders and then describes the comprehensive one-step approach enabled by the application of next-generation sequencing (NGS) technology.

2 Conventional Methods for the Diagnosis of Mitochondrial DNA Disorders

The methods used for the detection of point mutations are usually PCR-based, while the detection of large deletions is usually achieved by traditional Southern blotting methodology [2]. Since the degree of mutation heteroplasmy is critical in disease diagnosis, prognosis, and genetic counseling, various quantification techniques are used for the measurement of heteroplasmy after the detection of a deleterious mutation [6]. Table 11.1 lists the current stepwise molecular procedures that are required for a comprehensive analysis of the mitochondrial DNA [5]. Pitfalls of each method are also listed (Table 11.1).

Table 11.1 Current molecular procedures used for the diagnosis of the mitochondrial genome

2.1 Common Point Mutations: Detection and Quantification

Patients suspected of having maternally inherited mtDNA disorders are usually first screened for the common point mutations by PCR-based assays, including RFLP [2, 3, 7] and allele-specific oligonucleotide (ASO) dot blot hybridization methods, as described in Chap. 2 [7]. Using radioactive probes, the ASO dot blot hybridization method is sensitive enough to detect point mutations at heteroplasmy levels as low as 1 % [7, 8]. If a common point mutation is identified, the level of heteroplasmy can then be measured, usually by amplification refractory mutation system-based quantitative PCR (ARMS qPCR) [6, 8]. These methods require validation and are limited to the analysis of known point mutations [6, 8].
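As an illustration of how an allele-specific qPCR result can be converted into a heteroplasmy estimate, the following minimal sketch uses the standard relationship between threshold cycle (Ct) and starting template amount. It is not taken from the cited ARMS qPCR protocols: the function name `heteroplasmy_from_ct` is hypothetical, and the calculation assumes that the mutant-specific and wild-type-specific reactions amplify with the same per-cycle efficiency and share the same detection threshold.

```python
def heteroplasmy_from_ct(ct_mutant: float, ct_wild_type: float,
                         efficiency: float = 2.0) -> float:
    """Return the mutant fraction (0..1) implied by two allele-specific Ct values.

    Assumption (not from the chapter): both reactions amplify with the same
    per-cycle efficiency; 2.0 means perfect doubling in every cycle.
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    # Starting template amount is proportional to efficiency ** (-Ct).
    mutant = efficiency ** (-ct_mutant)
    wild_type = efficiency ** (-ct_wild_type)
    return mutant / (mutant + wild_type)


if __name__ == "__main__":
    # Mutant allele crossing the threshold one cycle later than wild type.
    print(f"{heteroplasmy_from_ct(26.0, 25.0):.3f}")
```

In practice a laboratory would calibrate such a calculation against reference mixtures of known heteroplasmy, since real assays rarely reach identical efficiencies for both alleles.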

2.2 Detection of Unknown mtDNA Point Mutations

If the common point mutations and large deletions (see Sect. 2.3 below) are not detected, and maternal inheritance of the disease is still suspected, the whole mitochondrial genome is analyzed by Sanger sequencing, which is performed following PCR amplification of the entire mitochondrial genome with multiple pairs of overlapping primers [9, 10]. Sanger sequencing was the gold standard for the identification of unknown mutations for many years before the advent of massively parallel sequence analysis. However, Sanger sequencing is not a quantitative method: it does not detect mutations at low heteroplasmic levels, and it does not detect large deletions [11–13]. In addition, PCR-based methods will not accurately detect the sequence under the primer binding sites.

2.3 Detection of mtDNA Deletions

Large deletions in mtDNA are detected by Southern blot analysis. Unfortunately, this is a tedious procedure that does not provide deletion breakpoints or the degree of deletion heteroplasmy. These deficiencies can be addressed by array-comparative genome hybridization (aCGH), which not only detects the deletion but also provides deletion breakpoints and an estimate of deletion heteroplasmy [11–13].

2.4 Detection of mtDNA Multiple Deletions

By their very nature, multiple mtDNA deletions are difficult to detect. The individual molecular species are often present at low levels, which challenges methods with low sensitivity, such as Southern analysis. Conversely, PCR-based assays will amplify molecules to detectable levels but are very dependent upon the choice of primer sites and may fail in the presence of sequence variations. In the PCR-based approach, multiple pairs of primers are employed in order to evaluate suspected regions of the mitochondrial genome. Primer pairs and PCR conditions are selected such that amplification will only occur when a primer pair encompasses a deletion. If multiple deletions are present, it is possible to amplify multiple fragments.

2.5 Determination of Deletion Junctions

Primers outside the approximate deletion regions are designed. Since the exact deletion breakpoints are usually not defined before the junction is sequenced, several pairs of primers covering different ranges of possible deletions have to be tested to find the approximate limits of the deletion [14]. PCR products are then purified followed by Sanger sequencing [14]. These procedures are time consuming and labor intensive even for single deletions. The presence of multiple deletions exacerbates this situation; more primer pairs must be tested, and more PCR products must be purified and sequenced. Even after extensive efforts, these procedures may not determine all of the different breakpoints. The detection and characterization of multiple deletions is greatly simplified by the adoption of massively parallel sequencing of the entire mitochondrial genome with uniformly deep coverage [15].

3 NGS-Based Analyses

3.1 Target Gene Enrichment

Methods for target gene enrichment, both PCR-based and capture-based, have been described in previous chapters in this book. Since the mitochondrial genome is small (16.6 kb) and does not contain any introns, the enrichment is usually achieved by PCR, which may use 24–36 pairs of primers [10, 16, 17] to amplify short overlapping regions or 2–3 pairs of primers for long-range PCR (LR-PCR) [18–20]. Recently, we have designed LR-PCR primers to generate a single amplicon of the entire mitochondrial genome [15, 21].

Enrichment of the mitochondrial genome by capture in solution using RNA or DNA probes has been reported [20, 22–24]. However, the coverage profile showed that different parts of the mitochondrial genome are not captured and sequenced uniformly [15, 22]. Therefore, it is not possible to reliably detect large deletions or low heteroplasmic variants from these sequence data. Multiple copies of mitochondrial pseudogenes are scattered across the nuclear chromosomes [25–27]. These nuclear mitochondrial sequences (NUMTs) are subject to genetic drift and therefore produce a significant background of sequence variants that must be contended with in order to discern the true mtDNA sequence. In addition, due to the abundance of NUMTs, exome capture/sequencing will co-capture NUMTs even in the absence of mtDNA-specific probes. Thus, interference from NUMT sequences may result in incorrect sequence information and/or errors in the quantification of mtDNA heteroplasmy [25–27].

3.2 Platforms of Massively Parallel Sequencing

Massively parallel sequencing (MPS) can be performed using various platforms, including 454, SOLiD, Affymetrix re-sequencing chip, Illumina, and Ion Torrent. The utilization of different MPS sequencing chemistries and machine hardware configurations has been reviewed [21, 28–32]. Each method has its own advantages and disadvantages [21], and the properties of the mitochondrial genome influence selection of MPS methodology.

The mitochondrial genome contains a number of homopolymeric stretches, high GC content regions, and short tandem repeats. Since low heteroplasmy of deleterious mutations, including small indels in repeat regions, can be clinically significant, it is important to understand the limitations of each sequencing method. Different platforms may also affect the depth of coverage and the ability to multiplex. Proper controls should be included and analyzed together with each indexed specimen to ensure accuracy [15]. The limit of detection of NGS-based assays should be assessed, since quantification of mtDNA mutation heteroplasmy is an important analytical component. Different platforms provide different depths of sequence coverage, which may limit heteroplasmy detection [15, 18–20, 22].

4 Reported Studies of the Mitochondrial Genome by MPS

Although this chapter focuses on the translation of massively parallel sequencing (MPS) to the clinical diagnosis of mtDNA disorders, various studies have utilized high throughput MPS analyses of the mitochondrial genome for different purposes (Table 11.2).

Table 11.2 Analyses of the human mitochondrial genome using massively parallel sequencing (MPS)

4.1 Detection of Pathogenic Point Mutations and Evaluation of Heteroplasmy

For the purpose of translation to molecular diagnosis, MPS was validated for its ability to simultaneously detect and quantify mtDNA variants of the entire mitochondrial genome by comparing the MPS results to those of Sanger sequencing [15, 19]. MPS has also been used to identify mtDNA variants in mtDNA-related disorders, including left ventricular noncompaction (LVNC) [24], maternally inherited cardiomyopathy [20], and Leigh syndrome [18]. However, these studies were limited to the detection of mtDNA variants at >5 % heteroplasmy in the analyzed tissue.

Due to the nonuniformity of mtDNA coverage, detection of mtDNA deletions was not possible in the previously reported studies [18–20, 24]. An mtDNA single deletion at high heteroplasmy (94 %) and >25,000X coverage was detected by capture using RNA probes followed by MPS [22], in which coding regions of 1,300 nuclear genes involved in mitochondrial production and function were also captured and sequenced [22, 38]. Simultaneous detection of mutations in both the mitochondrial and the nuclear genes is the main goal of MPS-based comprehensive diagnosis for dual genome dysfunction, that is, conditions in which nuclear gene mutations perturb the mitochondrial genome. However, the application of the dual genome MPS approach in one step has not been fully validated for clinical diagnosis [22, 38].

Since the human mitochondrial genome is only 16.6 kb, high throughput MPS can usually provide excessive depth of coverage. Therefore, if only the mitochondrial genome is to be sequenced, multiple samples are usually multiplexed in order to efficiently use the capacity of the high throughput instrument (Table 11.2). The read length and depth of coverage vary according to the NGS platform used.
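The relationship between read count, read length, and depth of coverage that motivates multiplexing can be made concrete with a short calculation. The sketch below is illustrative only: the function names and the lane yield of 60 million reads are hypothetical values chosen for the example, not figures reported in the cited studies, and the model assumes that reads are spread evenly over the 16,569 bp genome.

```python
import math

MTDNA_LENGTH_BP = 16_569  # length of the human mtDNA molecule given in the Introduction


def expected_mean_depth(reads_per_lane: int, read_length_bp: int,
                        samples_per_lane: int,
                        on_target_fraction: float = 1.0) -> float:
    """Mean depth per base for one sample, assuming perfectly even coverage."""
    reads_per_sample = reads_per_lane / samples_per_lane
    sequenced_bases = reads_per_sample * on_target_fraction * read_length_bp
    return sequenced_bases / MTDNA_LENGTH_BP


def reads_needed(target_depth: float, read_length_bp: int,
                 on_target_fraction: float = 1.0) -> int:
    """Number of reads one sample needs to reach a target mean depth."""
    usable_bases_per_read = read_length_bp * on_target_fraction
    return math.ceil(target_depth * MTDNA_LENGTH_BP / usable_bases_per_read)


if __name__ == "__main__":
    # Hypothetical lane yield of 60 million reads shared by 12 samples.
    print(round(expected_mean_depth(60_000_000, 76, 12)))
    print(reads_needed(20_000, 76))
```

Under these assumptions, roughly 4.4 million mapped 76-base reads per sample are enough for a mean depth of 20,000X, which is why a single lane can carry many mitochondrial genomes at once.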

4.2 Comprehensive One-Step Analysis of the Whole Mitochondrial Genome

Our laboratory recently developed a one-step MPS approach that provides quantitative base calls, detection of large deletions, and the exact deletion breakpoints [15]. This approach uses one pair of primers (Fig. 11.1) – mt16426F-5′ccgcacaagagtgctactctcctc3′ and mt16425R-5′gatattgatttcacggaggatggtg3′ – for the long-range PCR (LR-PCR) amplification of the entire mitochondrial genome as a single amplicon, followed by library preparation and sequencing on Illumina HiSeq 2000 [15]. Since the whole mitochondrial genome is amplified as one single amplicon, every base is presumably represented in proportion to its occurrence in the starting population of molecules (Fig. 11.2a). However, molecules containing deletions can have a replicative advantage in limiting conditions. Therefore, while deletions are readily detected (Fig. 11.2b), their heteroplasmy level may be overestimated. By aligning the unmatched sequences to the mitochondrial reference sequence (rCRS) with less stringent parameters to allow >80 % match in half of the sequence read, the deletion breakpoints can be precisely mapped (Fig. 11.2c). This is a great advantage in contrast to conventional Southern analysis of large deletions where the detection of deletion and determination of the breakpoints are two separate tedious procedures.

Fig. 11.1 The primer positions for the LR-PCR of the whole mitochondrial genome

Fig. 11.2 The uniform coverage of the whole mitochondrial genome (a), the sharp deletion boundaries detected by MPS (b), and the deletion breakpoint sequences (c). Provided by Dr. Hui Yu
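The idea of mapping a breakpoint from reads that fail to align end to end can be sketched with a toy split-read search. This is a simplified illustration, not the authors' pipeline: the function name `find_deletion_breakpoint` and the `min_anchor` parameter are hypothetical, and the two halves of the read are required to match the reference exactly, whereas the approach described above tolerates mismatches (>80 % match in half of the read). Reads that end in sequence shared by both flanks (microhomology) can legitimately be split at more than one position; the sketch returns the first valid split.

```python
from typing import Optional, Tuple


def find_deletion_breakpoint(reference: str, read: str,
                             min_anchor: int = 8) -> Optional[Tuple[int, int]]:
    """Locate a deletion junction inside a read by exact split matching.

    Returns 1-based reference positions (last base before the deletion,
    first base after the deletion), or None if no split explains the read.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_start = reference.find(left)
        if left_start == -1:
            continue
        left_end = left_start + len(left)  # 0-based, exclusive
        # The right half must start at least one base further downstream,
        # otherwise the read is simply contiguous with the reference.
        right_start = reference.find(right, left_end + 1)
        if right_start != -1:
            return left_end, right_start + 1
    return None


if __name__ == "__main__":
    # Toy 60-base reference built from labelled pieces (all sequences invented).
    upstream, left_flank = "ACGTACGTAC", "GGGATCCTTGCA"
    deleted, right_flank = "TTAACGTTGAGATCTGAT", "CCCGAATTCAGT"
    ref = upstream + left_flank + deleted + right_flank + "ATATATAT"
    junction_read = left_flank + right_flank
    print(find_deletion_breakpoint(ref, junction_read))
```

A production workflow would work from aligner output (for example, soft-clipped or supplementary alignments) and handle the circular origin of the genome, both of which this sketch ignores.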

By multiplexing 12 samples per lane of the flow cell and using 76 cycles of sequencing, an average coverage per base of ~20,000X can be routinely achieved [15]. At this depth of coverage and an experimental error rate of 0.326 % ± 0.335 %, heteroplasmies >1.5 % are easily detected. For the purpose of quality control, a sample with 1.1 % heteroplasmic m.3243A > G mutation and a series of reference DNA samples containing 1 %, 5 %, 10 %, 20 %, and 50 % of known variants have always been included and analyzed in exactly the same way as the indexed patient samples [15]. The 1.1 % m.3243A > G control has consistently been detected at 1.14 % ± 0.09 % (unpublished observation). To date, more than 800 samples have been analyzed using this comprehensive one-step approach. Numerous homoplasmic or heteroplasmic variants have been detected. Most of these variants are reported benign SNPs. About 6 % of the samples analyzed harbored reported pathogenic mutations, and <1 % carried novel variants that are likely to be deleterious. Since not all heteroplasmic novel variants are clinically significant, other genetic, biochemical, pedigree, and clinical information, as well as results obtained from in silico analyses using protein structural/functional prediction algorithms, are used to help with the interpretation of these variants [39, 40].
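To show how read counts, error rate, and a reporting cut-off relate to one another, the sketch below computes heteroplasmy from allele counts and derives a cut-off from the error rate quoted above. The rule used here (mean error plus three standard deviations) is an assumption made for illustration and is not stated in the source as the way the 1.5 % figure was obtained; the function names are likewise hypothetical.

```python
def heteroplasmy_percent(variant_reads: int, total_reads: int) -> float:
    """Percentage of reads at one position that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * variant_reads / total_reads


def detection_cutoff(mean_error_pct: float, sd_error_pct: float,
                     k: float = 3.0) -> float:
    """Assumed cut-off: mean error rate plus k standard deviations."""
    return mean_error_pct + k * sd_error_pct


if __name__ == "__main__":
    cutoff = detection_cutoff(0.326, 0.335)       # error rate quoted in the text
    observed = heteroplasmy_percent(300, 20_000)  # 300 variant reads at 20,000X
    print(f"cut-off {cutoff:.3f} %, observed {observed:.2f} %")
    print("reportable" if observed > cutoff else "within noise")
```

Under this assumed rule the cut-off is about 1.33 %, which lies below the 1.5 % level quoted above; at 20,000X a 1.5 % variant corresponds to roughly 300 supporting reads.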

The translation of this next-generation sequencing approach to the diagnosis of mtDNA-related disorders has been fully validated according to the regulatory criteria for the clinical diagnostic laboratories set forth by CLIA (Clinical Laboratory Improvement Amendments) and CAP (College of American Pathologists). All necessary quality and quantity reference samples are incorporated with the analyses of every test sample. All variants detected by Sanger methodology have been detected by this one-step comprehensive MPS method. In addition, this method detects low heteroplasmy changes that are not detected by Sanger sequencing. Thus far, this MPS strategy is the most comprehensive approach for the provision of accurate, reproducible heteroplasmy measurements of variants at every nucleotide position of the entire mitochondrial genome. Furthermore, large mtDNA deletions are detected and the deletion breakpoints are easily mapped [15].

4.3 MPS Investigation of mtDNA Variations in Cancers, Various Tissues, and Among Different Populations

In addition to molecular diagnosis of mtDNA disorders, MPS has also been employed to analyze the whole mitochondrial genome for the purposes of disease prognosis in cancers. Somatic mtDNA alterations in tumor cells can serve as biomarkers for monitoring disease progression [41–44]. Traditionally, mtDNA alterations in cancer cells were analyzed by Sanger sequencing of overlapping short PCR fragments [41–44]. This is tedious if a large number of tumor samples are to be analyzed. A recent report took advantage of massively parallel sequencing to analyze mtDNA of ten colon cancers, nine different tissues of one patient, and members of two CEPH families [33]. Their results revealed variable heteroplasmic mtDNA alterations in colon cancers, in different tissues of the same individual, and in different matrilineal relatives [33]. The degree of heteroplasmic changes varies from 1.6 % to 57 %. These studies provide insights into the nature and variability of mtDNA sequences during embryogenesis and cancer development and demonstrate that human individuals are characterized by a complex mixture of related mitochondrial genotypes rather than a single genotype. However, these studies were performed for research purposes, not for molecular diagnostics.

Similarly, MPS was used to study the effects of radiation therapy on mtDNA alteration [35] and to investigate the dynamics of mtDNA heteroplasmy in maternal transmission [34]. The latter study showed that frequencies of heteroplasmic changes may be lower than previously estimated but agreed with the results of He and coworkers [35] that mtDNA heteroplasmy varies among different tissues of an individual and between mother and child [34]. Furthermore, studies of mtDNA in 131 individuals from five Eurasian populations [36] and 147 individuals from the Caucasus and West Asia [37] revealed that mtDNA heteroplasmies are common and variable among populations. These studies involved a large number of samples and the detection of low level heteroplasmies. Only high throughput, deep coverage sequencing techniques allow these types of studies to be performed in a cost- and time-efficient fashion. These results also suggest caution when mtDNA variants are used for forensic identity verification purposes due to the dynamic occurrence of heteroplasmic changes [34]. However, since the NGS technologies used in these studies have not been evaluated with forensic standards, the application of MPS to forensic investigation requires further assessment [45]. A major concern of these MPS-based applications is that these studies did not discuss the potential interference of nuclear mtDNA homologues (NUMT), which may result in incorrect variant calls or inaccurate heteroplasmy measurements.

5 Regulatory Requirements for the Application of NGS-Based Tests to the Clinical Diagnosis of mtDNA Disorders

Testing of human specimens for diagnostic purposes must follow the regulatory procedures defined by the Clinical Laboratory Improvement Amendments (CLIA), which requires the assessment and documentation of performance characteristics including sensitivity, specificity, accuracy, reproducibility, and any other unique procedures applicable to the analytic validity of the test results. Due to the enormous amount of data produced on each test and the complex laboratory and computational analytical procedures involved, it is difficult to define the standards required for compliance of this newly developed technique with the CLIA regulation. An NGS guideline work-group has been actively exploring these issues. In general, before applying NGS-based tests clinically, the tests must be validated. In particular, for the diagnosis of mtDNA disorders by MPS, the specific parameters for the evaluation of the analytical performance of an NGS run should include depth of coverage, uniformity of distribution of read coverage, poorly covered regions, or base positions (e.g., small indels, repeat and homopolymer regions), quality of base calls, ability to detect mtDNA large deletions, and limit of detection of mutation heteroplasmy. Review of the published papers on the NGS-based analyses of the mtDNA (Table 11.2) revealed that most of these NGS-based assays were not fully validated for clinical diagnosis except for the one-step comprehensive approach reported by Zhang et al. [15].

Zhang and coworkers assessed the performance of NGS-based analysis of mtDNA by comparing the results to those obtained from Sanger sequencing and demonstrated 100 % sensitivity and specificity [15]. Since the measurement of the degree of mutation heteroplasmy is critical in result interpretation and genetic counseling, the limit of detection of the NGS approach and the reproducibility of the quantified results from various runs were evaluated. Most importantly, this report described the incorporation of quality and quantity reference specimens with each indexed sample for simultaneous evaluation to ensure the accuracy and reproducibility of the results [15]. Furthermore, a “deep sequencing index” (DSI) formula was developed to evaluate the performance of each sequencing run and to compare the quality of sequencing results among different gene enrichment methods. This equation contains six parameters: (i) the mean number of reads mapped to quality control (QC) DNA, (ii) the mean number of sample reads normalized to the average number of reads of QC DNA, (iii) the correlation coefficient of the expected versus observed proportion of six QC DNA variants mixed at known ratios, (iv) the ratio of the standard deviation of the mean number of reads to the average number of reads mapped to sample DNA, (v) the analytical specificity, and (vi) the analytical sensitivity of a run determined from the reads mapped to mtDNA. The analytical specificity of a run was defined as the percentage of reads mapped to the target mtDNA reference sequence relative to the total reads generated for the sample, which is ~20 % for capture-based enrichment and ~99 % for single amplicon LR-PCR-based enrichment. The analytical sensitivity of a run was defined as the percentage of bases of the reference sequence covered by MPS reads. The analytical sensitivity should be 100 % to achieve a 0 % false-negative rate. Thus, the performance parameters specific to the novel NGS technology are all included in this formula for quality assessment. Each laboratory can define its own minimal passing score that represents the acceptable quality of performance. As a result, this numerical assessment can be used to compare and help standardize inter-laboratory performance.
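The two run-level definitions given above, analytical specificity and analytical sensitivity, translate directly into simple ratios. The following sketch computes only these two quantities; it does not attempt to reproduce the DSI formula itself, which is not given here. The function names and the `min_depth` parameter are hypothetical.

```python
from typing import Sequence


def analytical_specificity(reads_mapped_to_mtdna: int, total_reads: int) -> float:
    """Percentage of a sample's reads that map to the mtDNA reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * reads_mapped_to_mtdna / total_reads


def analytical_sensitivity(depth_per_base: Sequence[int],
                           min_depth: int = 1) -> float:
    """Percentage of reference positions covered by at least min_depth reads."""
    if not depth_per_base:
        raise ValueError("depth_per_base must not be empty")
    covered = sum(1 for depth in depth_per_base if depth >= min_depth)
    return 100.0 * covered / len(depth_per_base)


if __name__ == "__main__":
    # Invented numbers: 990 of 1,000 reads on target, one of four bases uncovered.
    print(analytical_specificity(990, 1_000))
    print(analytical_sensitivity([5, 0, 3, 7]))
```

A laboratory might raise `min_depth` above 1 so that a position counts as covered only when the depth supports its stated limit of detection; that choice is a local validation decision rather than something defined in the text.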

6 Caveats in Making Diagnosis

The most difficult tasks in the molecular diagnosis of mtDNA disorders are (i) simultaneous detection and quantification of heteroplasmic mtDNA point mutations, (ii) simultaneous detection and mapping of mtDNA large deletions, and (iii) simultaneous detection and quantification of mtDNA point mutations and large deletions. The MPS approach can simultaneously accomplish each of these goals if it is performed in such a way that it (i) avoids the interference of NUMTs and mtSNPs and (ii) provides even coverage of all nucleotide positions. Since NUMTs are present throughout the nuclear chromosomes and mtSNPs are distributed throughout the mitochondrial genome, the only way to avoid their interference is to amplify the whole mitochondrial genome as a single piece using a pair of primers whose binding sites contain the fewest mtSNPs at the lowest frequency. This approach will also provide even coverage of every single nucleotide position [15]. Alternative primers should also be validated for deployment when the first chosen primers fail to amplify efficiently. A set of inwardly facing primers that bind outside of the single amplicon primers is also needed in order to evaluate for SNPs at the main primer binding sites.

Most of the reported studies used at least two pairs of primers to amplify overlapping regions [18, 19, 24, 33, 34, 36, 37]. Therefore, these MPS approaches are not designed to detect large mtDNA deletions. Although the coverage depth may be sufficient to provide heteroplasmic measurements, the variant calls and heteroplasmy measurements must be interpreted cautiously since the presence of SNPs at the primer binding sites and the co-amplification of NUMTs can potentially skew both variant calls and the heteroplasmy quantification.

7 Conclusions

The translation of NGS to the clinical diagnosis of mitochondrial DNA-related disorders has already occurred. One laboratory [15] has fully validated the NGS-based assay by the documentation and implementation of quality control and quality assurance procedures that are required by CLIA and CAP. Experimental errors and limit of detection should be defined before offering the NGS-based quantification of mtDNA mutation heteroplasmy as a clinical test.