
1 Introduction

Mitochondrial diseases are clinically and genetically heterogeneous disorders caused by defects in either the nuclear genome (three billion bp in 23 pairs of chromosomes) or the mitochondrial genome (16.6 kb). Because the mitochondrial genome is small, traditional diagnosis of mitochondrial disorders focused on Sanger analysis of the tiny, circular, double-stranded mitochondrial DNA (mtDNA). This chapter describes the next generation sequencing (NGS)-based molecular diagnosis of mitochondrial disorders caused by mtDNA defects.

Mitochondria are the only cellular organelles that contain their own genetic material. Most human cells contain hundreds to thousands of mitochondria [2], each of which contains multiple copies of mtDNA. The number of mitochondria and mtDNA molecules per cell depends on the energy demand of the specific tissue. Because there are multiple copies of mtDNA, a mutation, when it occurs, often co-exists with wild type mtDNA, a phenomenon called “heteroplasmy”. The degree of heteroplasmy of a pathogenic variant, the nature of the specific variant, and its tissue distribution determine the clinical phenotype of the affected patient, including variable penetrance, expressivity, and age of onset [3, 4]. The phenotype may also be modified by genetic background and environmental factors.

Unlike nuclear genes, mtDNA contains no introns in the protein coding regions. The entire mtDNA, encoding a total of 37 genes, is efficiently utilized. Polycistronic messages are produced from both strands of the mtDNA. Two genes, ATP6 and ATP8, share part of their coding regions in different reading frames [5].

The mtDNA encodes two ribosomal RNAs, 22 tRNAs, and 13 respiratory chain complex protein subunits. Besides the protein coding regions, mutations in the rRNA and tRNA genes can also cause disease (http://www.mitomap.org/MITOMAP). The most frequently occurring mtDNA pathogenic variants reside in the tRNA genes, such as m.3243A>G in tRNALeu(UUR), the most common cause of mitochondrial encephalopathy, lactic acidosis, and stroke-like episodes (MELAS), and m.8344A>G in tRNALys, associated with mitochondrial epilepsy and ragged red fibers (MERRF). In addition to the dense coding regions, there is an approximately 1.1 kb non-coding displacement loop (D-loop) region where the origins of replication are located. The polymorphic nature of this non-coding region plays a critical role in forensic science and studies of human evolution; however, pathogenic variants and large deletions in the D-loop region have not been reported.

Traditional molecular diagnosis of mtDNA disorders, which aims to identify deleterious changes in the mtDNA sequence, was based on Sanger sequencing for single nucleotide variants (SNVs) and Southern blot analysis for large deletions [6]. However, these methods have several intrinsic technical limitations: (a) Sanger sequencing does not detect low levels of heteroplasmy and does not provide reliable quantification; (b) Southern blot analysis is not sensitive enough to detect all deletions, and it alone does not determine deletion junction sequences; (c) it is difficult to distinguish single and multiple deletions unequivocally; (d) the presence of nuclear mtDNA homologs interferes with the accurate detection and quantification of SNVs; (e) the polymorphic variations across the entire mtDNA make it difficult to select multiple primer regions free of mtSNPs for amplification. These pitfalls can be resolved by applying the one-step, comprehensive, deep next generation sequencing (NGS) approach described in the sections below.

2 The Design of One-step Amplification Coupled with High Coverage Depth Sequencing

2.1 Rationales for the Enrichment of the Whole mtDNA in One Piece

The absence or presence of various degrees of heteroplasmic pathogenic variants is critical in disease diagnosis, prognosis, and genetic counseling. The unambiguous detection of single or multiple mtDNA deletions in different tissues is also important in disease diagnosis [3]. However, the traditional methods of amplicon based Sanger sequencing and Southern blot analysis have serious limitations in their application to molecular diagnosis (Table 1). The key to achieving a comprehensive molecular diagnosis of mtDNA disorders that solves these issues is to prepare the authentic, entire mtDNA molecule for sequencing [1, 7]. The strategy uses a pair of back-to-back primers near the origin of mtDNA replication to enrich the entire mtDNA in one piece (Fig. 1). Since the back-to-back primers (m.16426F and m.16425R) are oriented in opposite directions, only genuine circular mtDNA molecules, which are not found in the nuclear genome, can be amplified into one piece of DNA template, thus avoiding interference from the linear, nuclear-embedded mtDNA homologs (NUMTs) (Table 1, #5). An advantage of this approach is that the linearized mtDNA products include molecules with and without deletions or duplications (Table 1, #4) [1, 7]. In addition, every single nucleotide of the coding regions of the entire 16,569 bp mtDNA is equally represented for uniform coverage, facilitating reliable detection and quantification of heteroplasmic variants, as well as the determination of deletions and breakpoint sequences (Table 1, #1, 2 and 3).
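The geometric argument for back-to-back primers can be illustrated with a minimal sketch. It assumes, as the primer names suggest but the text does not state explicitly, that the 5′ ends of the forward and reverse primers lie at m.16426 and m.16425; the function name and coordinate convention are hypothetical and only serve the illustration.

```python
MTDNA_LENGTH = 16569  # length of the human mtDNA given in the text


def amplicon_length(fwd_5prime, rev_5prime, template_len, circular):
    """Return the product length for one primer pair, or None if no product.

    Assumed convention: the forward primer extends toward higher
    coordinates and the reverse primer toward lower coordinates.
    """
    if fwd_5prime <= rev_5prime:
        # Conventional inward-facing pair: the product lies between them.
        return rev_5prime - fwd_5prime + 1
    if circular:
        # Outward-facing (back-to-back) pair on a circle: extension runs
        # from the forward primer to the end, then on from position 1.
        return (template_len - fwd_5prime + 1) + rev_5prime
    # Outward-facing pair on a linear template (e.g. a NUMT): the primers
    # extend away from each other, so no exponential product forms.
    return None


print(amplicon_length(16426, 16425, MTDNA_LENGTH, circular=True))
print(amplicon_length(16426, 16425, MTDNA_LENGTH, circular=False))
```

Under these assumptions the circular template yields a full-length 16,569 bp product, whereas a linear homolog containing the same primer sites yields none.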

Table 1 One-step NGS solutions to issues of traditional approaches in the diagnosis of mtDNA related disorders
Fig. 1 The primer positions for the LR-PCR of the mitochondrial genome

2.2 Procedures for Single-Amplicon, LR-PCR Based mtDNA Enrichment

Successful amplification of the entire circular mtDNA into a single LR-PCR product is essential for high quality NGS analysis [1, 7]. The PCR products need to be checked on an agarose gel for their presence and quality before proceeding to the library preparation step. The forward and reverse primers are mt16426F (5′-ccgcacaagagtgctactctcctc-3′) and mt16425R (5′-gatattgatttcacggaggatggtg-3′). PCR is performed using the TaKaRa LA Taq Hot Start polymerase kit (TaKaRa Bio Inc., Madison, WI, USA) with 100 ng of total genomic DNA isolated from blood, or 15 ng from skeletal muscle, as template in a 50 μl reaction. An initial 2 min incubation at 95 °C is followed by 30 cycles of 20 s of denaturation at 95 °C and 18 min of annealing and extension at 68 °C. The reaction is completed by a final extension at 68 °C for 20 min. A 1.5 μl aliquot of the PCR product is analyzed on a 1.5% agarose gel with the 1 kb Plus DNA ladder (Invitrogen, Carlsbad, CA, USA).

3 The New Gold Standard for Molecular Diagnosis of mtDNA Related Disorders

3.1 Uniform Coverage: The Necessity of Amplification of the mtDNA in One Piece

The mtDNA must be enriched in one single LR-PCR step such that every nucleotide is equally represented and sequenced to ensure uniform coverage. The coverage depth profiles of the mtDNA obtained with three different enrichment methods are compared in Fig. 2. The coverage profile of mtDNA enriched by in-solution capture using RNA or DNA probes [8,9,10] shows that different segments of the mtDNA are not captured and sequenced uniformly (Fig. 2a) [1, 9]. Therefore, it is not possible to detect large deletions or low level heteroplasmic variants with equal sensitivity from these sequence data. Multiple copies of mitochondrial pseudogenes/homologs are located on each of the nuclear chromosomes [11,12,13]. These nuclear mitochondrial sequences (NUMTs) are subject to genetic drift and therefore produce a significant background of sequence variants that must be contended with in order to discern the true mtDNA sequence. In addition, due to the abundance of NUMTs, in-solution probe hybridization and sequencing will co-capture NUMTs to a significant extent, even in the absence of mtDNA specific probes. Thus, interference from NUMT sequences may result in incorrect sequence information and/or errors in the quantification of mtDNA heteroplasmy [11,12,13].

Fig. 2 Comparison of the coverage depth of the mitochondrial genome obtained by different enrichment methods. Enrichment of mtDNA by the capture method (a), by PCR amplification (b), and by LR-PCR of the whole circular mitochondrial genome using one pair of back-to-back primers (c). The X axis is the nucleotide position from m.1 to m.16,569. The Y axis is the read depth at each nucleotide position

Since the mtDNA is small (16.6 kb) and does not contain any introns, enrichment has historically been achieved by PCR, using 24–36 pairs of primers [14,15,16] to amplify short overlapping regions, or 2–3 pairs of primers for long range PCR (LR-PCR) [8, 17, 18]. However, the coverage profile of mtDNA enriched by multiplex amplification reveals uneven coverage depth among amplicons due to different PCR efficiencies, with spiky, over-covered primer positions (Fig. 2b). The inconsistency in amplicon coverage depth prevents the detection of large deletions and renders the detection sensitivity unreliable.

In contrast to capture based or multiple amplicon approaches, enrichment of the entire mtDNA using only one pair of back-to-back LR-PCR primers generates a linearized mtDNA with every single nucleotide equally covered [1, 19], providing a uniform coverage profile (Fig. 2c) that can readily detect mtDNA deletions (Fig. 3) [1, 19]. Thus, it is clear that in order to detect large mtDNA deletions, the entire mtDNA must be enriched evenly throughout.

Fig. 3 Detection of single and multiple mtDNA large deletions and mapping of breakpoints. Coverage depth pattern of no (a), single (b), and multiple mtDNA large deletions (c) and mapping of corresponding breakpoints for single deletion (d), and multiple deletions (e)

3.2 Detection and Mapping of mtDNA Deletions

Uniform coverage of the entire mtDNA by single amplicon LR-PCR/NGS allows the detection and mapping of mtDNA deletions. Sequencing of the mtDNA enriched by oligonucleotide probe-hybridization capture or multiplexed PCR does not provide uniform coverage throughout the genome [1]. Variable coverage depth among different regions or amplicons of the mtDNA makes it impossible to detect large mtDNA deletions reliably. In contrast, enrichment of the mtDNA with single amplicon LR-PCR amplification provides even coverage of the entire mtDNA (Figs. 2c and 3a), allowing accurate detection of large deletions.

Figure 3b shows LR-PCR/NGS analysis of a DNA sample from the blood specimen of a 10-year-old boy presenting with encephalopathy, exercise intolerance, easy fatigability, and sensorineural hearing loss. The sharp decrease in read coverage from m.7638 to m.15434 indicates a heteroplasmic large deletion of 7797 bp [7]. The degree of deletion heteroplasmy can be estimated by comparing the coverage of deleted versus non-deleted regions. The deletion breakpoint, at a single base resolution, is clearly revealed by this method (Fig. 3d). The deletion junction sequence can be confirmed by conventional PCR using primers flanking the deletion region followed by Sanger sequencing through the deletion junction [3].
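The size calculation and the coverage comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the example read depths are hypothetical, and the estimate assumes that deleted and non-deleted molecules are amplified with equal efficiency, which LR-PCR does not guarantee because shorter molecules amplify preferentially.

```python
def deletion_size(first_deleted, last_deleted):
    """Size in bp of a deletion spanning the two positions inclusively."""
    return last_deleted - first_deleted + 1


def deletion_heteroplasmy(depth_inside, depth_outside):
    """Estimate the fraction of molecules carrying the deletion.

    depth_inside:  mean read depth within the deleted region
    depth_outside: mean read depth in the flanking, non-deleted region
    Reads inside the deleted region can only come from non-deleted
    molecules, so the deleted fraction is 1 - inside/outside.
    """
    if depth_outside <= 0:
        raise ValueError("depth_outside must be positive")
    return 1.0 - depth_inside / depth_outside


# Positions from the case in the text; the depths are made-up examples.
print(deletion_size(7638, 15434))
print(deletion_heteroplasmy(depth_inside=6000, depth_outside=15000))
```

With the positions reported for this patient, the size function reproduces the 7797 bp deletion; the heteroplasmy value depends entirely on the illustrative depths supplied.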

The coverage profile of a muscle sample from a 70-year-old man with myopathy is shown in Fig. 3c. Instead of sharp deletion junctions, an arch-shaped coverage pattern was observed, suggesting multiple mtDNA deletions [7]. Realignment of the unmapped sequences to the reference sequence with less stringent parameters revealed multiple deletion junctions (Fig. 3e). A total of 48 junction sequences were identified. The majority of the deletion breakpoints that have been confirmed by targeted PCR followed by Sanger sequencing are shown in Fig. 3e.

3.3 Reliable Quantification of Heteroplasmy and Detection Limit

The mtDNA contains a number of homopolymeric stretches, high GC content regions, and short tandem repeats. Low heteroplasmy of deleterious mutations, including small indels in some repeat regions, can be clinically significant. Thus, it is important to assess the limit of detection of NGS-based assays [1, 7]. Different platforms may affect the depth of coverage and the sequence error rate differently. Here, we focus on the analysis of mtDNA LR-PCR products by Illumina HiSeq sequencing. Several factors, such as DNA quality, the amount of DNA template loaded onto the sequencer, cluster generation, the number of sequencing cycles, and the degree of multiplexing, can affect read depth. Therefore, proper controls should be included and analyzed together with each indexed specimen to ensure accurate and reproducible quantification of heteroplasmy and of the sequence error rate [1]. Different sequencing platforms provide different depths of read coverage, which may limit heteroplasmy detection [1, 8, 9, 17, 18].

3.3.1 Deep Coverage for More Accurate Quantification and Detection of Variants

Because the coverage depth depends on several factors, including the sequencing platform, the amount of DNA template loaded, and the number of multiplexed samples, it should be evaluated and optimized in individual laboratories. Although it may seem that deeper coverage always gives better detection sensitivity and accuracy, overly deep coverage may cause a concurrent increase in background noise, thus worsening the limit of detection. The coverage depth should be optimized by balancing sensitivity and specificity. In general, a coverage depth of greater than 10,000× will confidently detect the presence of 100 variant reads (1% heteroplasmy). However, the limit of detection also depends on the error rates.
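The relationship between depth and variant read count can be made concrete with a short sketch. It assumes simple binomial sampling of reads and ignores sequencing and library errors, which the text notes also affect the limit of detection; the function names are hypothetical.

```python
import math


def expected_variant_reads(depth, heteroplasmy):
    """Expected number of variant reads at one position.

    depth:        total read depth at the position
    heteroplasmy: variant fraction between 0 and 1
    """
    return depth * heteroplasmy


def sampling_sd_reads(depth, heteroplasmy):
    """Standard deviation of the variant read count under binomial sampling."""
    return math.sqrt(depth * heteroplasmy * (1.0 - heteroplasmy))


depth = 10000
fraction = 0.01  # 1% heteroplasmy
print(expected_variant_reads(depth, fraction))
print(sampling_sd_reads(depth, fraction))
```

At 10,000× and 1% heteroplasmy the expected count is 100 variant reads, with a sampling standard deviation of roughly 10 reads, so sampling noise alone is small relative to the signal at this depth.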

3.3.2 Evaluation of Experimental and Analytical Errors

To ensure that the quantification of heteroplasmy by NGS is reliable, a set of cloned synthetic 150 bp control DNAs (Fig. 4a) is spiked into each indexed sample as external quality controls (“ExQC”). To mimic a range of heteroplasmy, different proportions of control DNAs with different nucleotide changes at specific positions are mixed to form a series of synthetic heteroplasmies (Fig. 4b). To evaluate the reliability of quantitative measurements, the spiked-in control DNA is indexed together with each sample using the same barcode during library preparation. The control DNA sequences, except for the nucleotide positions marked for heteroplasmy measurements, are used to calculate error rates. The instrument sequencing error is determined as the number of incorrect nucleotides in the control DNA sequences divided by the total number of nucleotides mapped to the control DNA sequences [1]. The sample DNA has an analytical error rate of 0.326+/−0.335%, as compared to 0.151+/−0.394% for the control DNAs, indicating that the Illumina NGS platform has a relatively low sequence error, but that library construction may introduce significant error. Thus, the limit of detection, calculated as three standard deviations above the mean error (0.326 + 3 × 0.335), is 1.33% under the described experimental, instrumental, and analytical conditions [1]. The experimental procedures can introduce errors, which, in general, are higher for mtDNA samples than for the synthetic ExQC samples. The observed and the expected percentages of the variants at specific positions exhibit an excellent correlation [1].
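The two calculations in this paragraph, the error rate from control reads and the limit of detection as the mean error plus three standard deviations, can be written out as a minimal sketch. The function names and the base counts in the first example are hypothetical; the mean and standard deviation are the values reported above.

```python
def error_rate_percent(incorrect_bases, total_bases):
    """Sequencing error rate (%) from bases mapped to the control DNAs."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return 100.0 * incorrect_bases / total_bases


def limit_of_detection(mean_error, sd_error, k=3.0):
    """Limit of detection as the mean error plus k standard deviations."""
    return mean_error + k * sd_error


# Hypothetical counts, for illustration only.
print(error_rate_percent(incorrect_bases=1510, total_bases=1000000))

# Values reported in the text for sample DNA (in percent).
lod = limit_of_detection(mean_error=0.326, sd_error=0.335)
print(round(lod, 2))
```

Using the reported sample error of 0.326% with a standard deviation of 0.335%, the function returns 1.331, which rounds to the 1.33% limit of detection quoted in the text.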

Fig. 4 Spiked-in quality control samples. (a) The synthetic controls with the variant positions indicated. (b) The mixing ratios, indicated in the table, used to generate the desired percentage of artificial heteroplasmy

3.3.3 Reproducibility

In addition to the spike-in ExQC samples, a DNA sample with a known percentage of heteroplasmy should always be used as an additional batch control to ensure the reproducibility of heteroplasmy quantification. For example, a sample with a 1.1% heteroplasmic m.3243A>G variant is included in every batch to make sure that the percentage of heteroplasmy is consistently reproduced within 2 SD. As shown in Table 2, the sample analyzed in 16 different batches over a period of 2 years yielded an average of 1.14+/−0.09% across these independent measurements, with a tight range of 0.94–1.35% and errors within 2 SD, suggesting reproducible quantification. Two independent measurements of four other variants at different degrees of heteroplasmy showed similarly reproducible results (Table 2).
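A batch acceptance check of this kind could be sketched as below. This is an illustrative assumption about how such a rule might be coded, not the laboratory's actual procedure; the function name and the two example measurements are hypothetical, while the mean and SD are those reported above.

```python
def within_k_sd(measured, mean, sd, k=2.0):
    """Return True if a control measurement lies within mean +/- k * sd."""
    return abs(measured - mean) <= k * sd


# Historical control statistics reported in the text (percent heteroplasmy).
CONTROL_MEAN = 1.14
CONTROL_SD = 0.09

# Hypothetical measurements of the batch control from two new batches.
for measured in (1.20, 1.40):
    status = "pass" if within_k_sd(measured, CONTROL_MEAN, CONTROL_SD) else "fail"
    print(measured, status)
```

A batch whose control falls outside the window would be flagged for review before any patient heteroplasmy values from that batch are reported.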

Table 2 Reproducibility of heteroplasmy quantification

3.3.4 The New Gold Standard for Molecular Diagnosis of mtDNA Disorders

Zhang and coworkers were the first to evaluate the clinical performance of NGS analysis of mtDNA generated by long range PCR with a set of non-overlapping back-to-back primers. This “Zhang-Wong method” enriches each nucleotide of the entire 16.6 kb mtDNA equally, with an average coverage depth of 5,000–20,000×, providing uniform coverage and sufficient depth for the quantification of heteroplasmy. Analysis of a total of 12 samples that had been thoroughly analyzed by Sanger sequencing revealed that all variants identified by Sanger sequencing were also detected by the NGS approach. However, in three samples, heteroplasmic variants had been missed by Sanger sequencing. One of them is a novel m.1630A>G (tRNA Val) variant at 3.7% heteroplasmy, too low to be detected by Sanger sequencing, in the asymptomatic mother of a 2-year-old affected child who harbored 33% heteroplasmy for the same variant. Studies of matrilineal family members and clinical correlation revealed that m.1630A>G co-segregated with the disease and was likely to be a causative mutation. The detection of low level heteroplasmy of the m.1630A>G mutation in the mother indicates that this is an inherited case; thus, the recurrence risk is higher than for a de novo case. In addition to the low heteroplasmy m.1630A>G, an m.16193insC variant present at 15% heteroplasmy and an m.303insC/CC variant at 27% heteroplasmy were identified by NGS, but were detected as homoplasmic by Sanger sequencing. These results indicate that small insertions and deletions within a homopolymeric stretch, such as m.303_309insC, m.311_315insC and m.16193insC, may be difficult to detect by Sanger methods, but can be accurately identified by NGS. Thus, in the molecular diagnosis of mtDNA disorders, the NGS approach described here becomes the new gold standard for detecting low heteroplasmy and small indels in homopolymer stretches, in addition to the large mtDNA deletions described in Sect. 3.2.

4 Clinical and Genetic Significance of Low Level Heteroplasmic Mutations

In general, Sanger sequencing does not reliably detect heteroplasmy of less than 15% [20]. While specific primers or probes can be designed for quantitative PCR of target positions [21, 22], it is laborious to validate the method for every novel mtDNA variant. In addition, primers may contain nucleotide modifications, and probes are specific for either the wild type or the mutant allele [23,24,25]; thus, a difference in PCR amplification efficiency between the two alleles is expected, leading to inherent inaccuracies in the measurement of the degree of heteroplasmy. As described in Sect. 3 above, for mtDNA analysis at 5,000–20,000× coverage, the experimental error rate of NGS on the Illumina HiSeq2000 platform was 0.326+/−0.335%, with a limit of detection of 1.33% [1]. The examples below demonstrate the importance of the ability of deep NGS to detect low levels of heteroplasmy accurately and reproducibly for precise risk assessment and genetic counseling.

4.1 Heteroplasmy May Be Low in Blood but High in Affected Tissues

The proband in Fig. 5a had a history of hearing loss and vision problems. Although screening for common mtDNA point mutations by allele-specific oligonucleotide (ASO) hybridization detected m.3243A>G in the tRNALeu(UUR) gene [26], the heteroplasmy was too low to be detected by Sanger sequencing. The LR-PCR/NGS approach detected a heteroplasmy of 7%. Her daughter carried the same mutation at 9.3% heteroplasmy in her blood sample and showed symptoms of mitochondrial myopathy. Since there is selection against the m.3243A>G mutation in rapidly dividing blood cells [23, 27], the degree of heteroplasmy could be higher in the daughter's muscle, causing the myopathy phenotype. Sanger sequencing would have missed this low heteroplasmic pathogenic variant in both the proband and the daughter. However, LR-PCR/NGS readily detected and accurately quantified the pathogenic variant and provided a quick, definitive diagnosis in one single step. This case illustrates the importance of employing the one-step NGS analysis for the diagnosis of mtDNA disorders: while heteroplasmy levels may be low in blood, they may be higher in affected tissues.

Fig. 5 Examples of pedigrees illustrating the implications of heteroplasmy for genetic counseling. (a) A definitive diagnosis may be missed by the traditional Sanger method due to low heteroplasmy. (b) A somatic variant in the affected tissue may not be transmitted to offspring. (c) Low heteroplasmy in an asymptomatic mother implies a high risk of intergenerational transmission. A low level of heteroplasmy is different from none at all. (d) The mother does not carry the pathogenic variant at all in her blood, nor does the asymptomatic child. It is sporadic in the affected child

4.2 Sporadic Somatic Mutation in Affected Tissue

The proband of the family in Fig. 5b is a 65-year-old woman with peripheral neuropathy, muscle weakness, ptosis, abnormal muscle histological findings, and an abnormal EMG. A heteroplasmic m.7222A>G (p.Y440C in COI) variant was detected in this patient’s muscle specimen by Sanger sequencing, and was subsequently verified and quantified by LR-PCR/NGS at 37% heteroplasmy in her muscle sample, but it was not detected in her blood or in blood samples from her sister and daughter. These results suggest that the mutation most likely arose sporadically in the proband’s muscle tissue. Therefore, the risk of passing this likely somatic mutation to her children is low.

4.3 Low Heteroplasmy in Two Tissues May Suggest Low Heteroplasmy in the Germline

The family in Fig. 5c has been described in Sect. 3.3.4. The mother carried the m.1630A>G mutation at low heteroplasmy in both blood and muscle, and the affected child inherited the mutation at much higher heteroplasmy in both tissues. However, the variant was not detected by Sanger sequencing in the blood sample of the affected child's asymptomatic older sibling. This case suggests that the mother could have germline mosaicism. The affected child inherited a higher level of heteroplasmy, which was probably present in all tissues, while the asymptomatic child inherited a much lower level of heteroplasmy.

4.4 De Novo and Near Homoplasmic Transmission in One Generation from Low Heteroplasmic Mother

The example in Fig. 5d shows a 2-year-old boy who presented with developmental delay, seizures, hypotonia, hearing loss, failure to thrive, abnormal brain MRI, and lactic acidemia. NGS detected an m.8969G>A (p.S148N, ATP6) pathogenic variant at 99.5% and 96% heteroplasmy in muscle and blood, respectively. Since this mutation was not detected in the blood specimens from the proband’s asymptomatic mother and an older sibling, it likely occurred de novo in the proband; however, germline mosaicism cannot be excluded. The determination of whether an mtDNA mutation is inherited or sporadic clearly depends on the sensitivity of the detection method used. Accurate detection of low-level heteroplasmy therefore plays an important role in the molecular diagnosis and counseling of mtDNA-related disorders. Transmission from apparently 0% to near homoplasmy in one generation has been reported [28].

These four examples illustrate the power of LR-PCR/NGS-based analysis of mtDNA in detecting low-level heteroplasmy in various tissues and in assessing whether a mutation is inherited, de novo, sporadic, somatic, or possibly due to germline mosaicism.

5 Notable Advances

5.1 An Innovative Comprehensive Solution for a Complex, Multi-step Disease Diagnosis

Traditionally, diagnosis of mtDNA disorders first involved the analysis of common mutations by various screening methods and Southern blot analysis for large deletions. If negative, the entire mtDNA was studied by PCR based Sanger sequencing of 24–36 overlapping amplicons. These procedures are tedious step-by-step approaches, and require additional steps for deletion detection and quantification of variant heteroplasmy, which is typically inaccurate and not reproducible. The application of LR-PCR using a pair of back-to-back primers (Fig. 1) for specific amplification of the circular double stranded mtDNA followed by deep NGS is an innovative approach that not only can simultaneously detect point mutations and large deletions, but can also reliably quantify variant heteroplasmy as low as 1%, in addition to the capability and sensitivity of detecting and distinguishing single and multiple mtDNA deletions. This LR-PCR based NGS analysis is a significant improvement over previous methodologies in terms of simplicity, speed, accuracy, and reliability, making it the new gold standard for mtDNA analysis.

5.2 NUMT, The Old Problem Has a New Effective Solution

Computational in silico searches for homologous mtDNA sequences in the nuclear genome (NUMTs) have identified more than a thousand NUMT sites with greater than 80% homology to the mtDNA sequence [7, 29]. Mapping capture-enriched mtDNA sequencing data from an mtDNA-depleted rho zero cell line readily revealed the scope of NUMTs. It is estimated that 0.1% of sequence reads from total genomic DNA map to the mtDNA reference sequence. The presence of NUMTs resulted in many false positive calls and numerous low heteroplasmic variant calls. Software has been developed to distinguish low-level heteroplasmy from sequencing error and to allow the removal of NUMTs [30]. However, the false negative and false positive rates were still significantly above acceptable levels for application in clinical diagnostic laboratories [29, 31]. Although reads generated from NUMT regions can be partially separated out by a stringent alignment algorithm, regions that are nearly identical to mtDNA are practically impossible to remove. Therefore, regardless of stringency, sequence enrichment using specific mtDNA probes, or as non-specific by-products of exome capture, will inevitably co-capture NUMTs, which will then confound mutational analyses and lead to inaccurate clinical diagnosis.

mtDNA enrichment using multiple amplicons not only fails to address NUMT problems but also introduces additional errors, including SNPs within primer binding sites, resulting in reduced amplification efficiency, masked SNP data, and possibly preferential amplification of NUMTs. The LR-PCR approach uses carefully designed primers near the origin of replication, which contains the least number of reported SNPs occurring at lowest frequencies. Moreover, the primer sites are usually Sanger sequenced separately for each sample to ensure that no SNPs are sitting at the primer sites. If SNPs are found at the primer sites, alternative LR-PCR primers without SNPs can be used to repeat the LR-PCR/NGS. The major advantage of the “Zhang-Wong method” enrichment approach over the conventional Sanger sequencing, capture based or amplicon based method is the unbiased enrichment of the entire mtDNA while avoiding the interference of NUMTs and SNPs [1, 7].

5.3 Distinguishing Single and Multiple mtDNA Deletions

LR-PCR amplification of the entire mtDNA as a whole allows accurate detection of large mtDNA deletions with unequivocally mapped breakpoints in one single step (Fig. 2). This approach eliminates the drawbacks of the Southern blot technique, namely its low sensitivity in detecting low heteroplasmic mtDNA deletions and its inability to map breakpoints. Since LR-PCR preferentially amplifies smaller circular mtDNA molecules, the ability to detect low levels of multiple mtDNA deletions is greatly enhanced, leading to an increased clinical diagnostic yield. Multiple mtDNA deletions are usually associated with aging and oxidative damage. Most importantly, however, genetic defects in nuclear genes responsible for mtDNA biosynthesis and integrity maintenance, such as POLG, TWINKLE, OPA1, RRM2B, and TK2 [32,33,34,35,36], often cause secondary multiple mtDNA deletions in muscle samples. Further studies of 15 muscle samples with multiple mtDNA deletions revealed that all patients were adults. Five of them had mutations identified in the nuclear genes POLG, RRM2B, and OPA1 [7]. Nine of the 15 (60%) were older than 50 years [7]. These observations suggest that the accumulation of multiple mtDNA deletions may be secondary to nuclear gene defects and aging. The LR-PCR based NGS approach described here thus facilitates the diagnosis of mitochondrial disorders caused by nuclear genes, by detecting multiple mtDNA deletions and prompting the subsequent identification of the responsible nuclear gene defects.

5.4 Accurate and Reliable Quantification of Heteroplasmy Facilitates Genetic Counseling

The degree of heteroplasmy of a pathogenic variant and its tissue distribution contribute to disease phenotype and severity. The quantification of heteroplasmy was first developed based on restriction fragment length polymorphism (RFLP) generated by the discriminating target variant. However, agarose gel analysis is not sensitive. Sensitivity was improved by labeling with a radioactive 32P nucleotide in the last cycle of PCR [3, 4]. More recently, real time amplification refractory mutation system (ARMS) qPCR and pyrosequencing [21, 22] were validated for a limited number of common pathogenic variants. These methods share several drawbacks. They can only be applied to known and validated target variants, and they have low sensitivity, high variability, and limited reproducibility. This is because these methods are all based on short PCR. PCR efficiencies are greatly influenced by the discriminating nucleotide in the mutant and wild type alleles, which affects their ratio and thus the measured heteroplasmy. Subtle differences in PCR conditions can change the amplification efficiency and the allele ratio dramatically, which in turn affects reproducibility. Another pitfall is that, due to the high frequency of mtDNA SNPs along the entire mtDNA, it is difficult to select primers free of SNPs.

The LR-PCR/NGS approach, which enriches all nucleotides in the mtDNA without bias, provides accurate quantification. In addition, with 5,000–20,000× deep coverage, it is sensitive enough to detect heteroplasmy as low as 1% at every nucleotide position of the mtDNA without the interference of NUMTs and SNPs. Accurate and reliable quantification is critical because it gives a “yes” or “no” answer to a carrier mother, which has a great impact on the risk of transmitting a pathogenic variant to her offspring. For example, consider the mother of a child affected with m.3243A>G, in whom NGS analysis showed 0.9% heteroplasmy for m.3243A>G. Normally, a 0.9% heteroplasmy would be filtered out as negative since it is below the 1% cut-off. However, in this case, since the child is affected with m.3243A>G, it is important to check whether the mother truly has zero or low heteroplasmy. To do so, first, the raw NGS data of all samples from the same batch were examined for heteroplasmy at the m.3243 position; all showed less than 0.01%, yet the mother had 0.9%, which is significant and suggests that it is not a chance error. Second, the analysis was repeated with a second DNA extraction to show that the result is reproducible, ruling out a sample mix-up and a batch effect. Thus, the mother carried a low heteroplasmy of m.3243A>G, and the risk of transmitting this variant to the next fetus is significant. On the other hand, if the mother of an affected child with a novel pathogenic variant at 20% heteroplasmy had 0.1% heteroplasmy, and all samples analyzed in the same batch showed heteroplasmy ranging from 0.08% to 0.11% at this rare variant position, then the mother did not carry the rare pathogenic variant, and it likely occurred de novo. Without reliable quantification of heteroplasmy, it is not possible to provide accurate genetic counseling to patients and families.
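The first step of this reasoning, comparing one sample against the background of its batch at the same position, can be sketched as follows. The decision rule (more than three standard deviations above the batch mean) and the individual batch values are assumptions chosen for illustration; the text itself only reports the ranges, and the function name is hypothetical.

```python
import statistics


def exceeds_batch_background(sample_pct, batch_pcts, k=3.0):
    """Return True if a sample's heteroplasmy stands out from its batch.

    sample_pct: heteroplasmy (%) at the position in the sample of interest
    batch_pcts: heteroplasmy (%) at the same position in the other samples
                of the same batch (at least two values)
    k:          assumed threshold in standard deviations above the mean
    """
    background_mean = statistics.mean(batch_pcts)
    background_sd = statistics.stdev(batch_pcts)
    return sample_pct > background_mean + k * background_sd


# Scenario 1: mother at 0.9%, batch background below 0.01% (made-up values).
print(exceeds_batch_background(0.9, [0.004, 0.006, 0.008, 0.005]))

# Scenario 2: mother at 0.1%, batch background 0.08-0.11% (made-up values).
print(exceeds_batch_background(0.1, [0.08, 0.09, 0.10, 0.11]))
```

In the first scenario the mother's value stands far above the batch background, consistent with true low-level heteroplasmy; in the second it is indistinguishable from background. Confirmation with a second DNA extraction, as described in the text, remains necessary in either case.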

6 Conclusion

LR-PCR using one pair of back-to-back primers to amplify the mtDNA as a whole without bias, followed by deep NGS analysis, not only eliminates potential NUMT and SNP interference but also provides uniform coverage of the entire circular 16.6 kb mtDNA, which is essential for reliable and accurate detection of single and multiple large mtDNA deletions. Deep coverage and stringent quality control allow the detection of low-level heteroplasmy with high confidence. With higher sensitivity, specificity, and accuracy in the detection of a much broader spectrum of mutation types than ever before achieved, the LR-PCR/NGS “Zhang-Wong method” becomes the new gold standard for comprehensive analysis of the mitochondrial genome.