Introduction

Next generation sequencing (NGS) has taken the fast lane from the bench to the bedside. What was once a “last resort” technique for exceptional cases in which a molecular diagnosis could not be achieved by standard means has become a routine clinical test allowing physicians to translate genomic information into clinically actionable decisions (Goodwin et al. 2016). NGS-based testing has thus largely replaced Sanger sequencing, long considered to be the gold standard of sequencing tests. This shift has occurred despite a lack of bidirectional performance comparisons of Sanger sequencing and NGS. Data on such comparisons can directly impact the decision-making process of clinicians aiming to determine the genetic basis of various conditions.

NGS has been applied for a variety of purposes, including exome/genome sequencing for diagnosis of genetic diseases (Gilissen et al. 2014; Lee et al. 2014), cancer genomics (Shen et al. 2015), discovery of transcription factor binding sites (Mundade et al. 2014), transcriptome profiling (Li et al. 2014), DNA methylation sequencing (Brunner et al. 2009) and noncoding RNA expression profiling (Ching et al. 2015). In the clinical context, the wide scope of testing enabled by NGS has facilitated the deciphering of the genetic bases of various human diseases for which the molecular mechanism was unknown (Ng et al. 2010), has expanded the number of genes known to be associated with defined phenotypes (De Rubeis and Buxbaum 2015) and has led to detection of novel mutations in genes with well described functions (Lerat et al. 2016; Maksemous et al. 2016).

Molecular investigation of genetically heterogeneous phenotypes often mandates the sequential or simultaneous sequencing of multiple genes, which can be achieved through Sanger sequencing or NGS. Sanger sequencing entails the design of specific primers to guarantee successful amplification of regions of interest and may overcome difficulties such as sequencing within GC-rich regions and separating genes from their pseudogenes (Wenzel et al. 2009; Sumner et al. 2014). However, Sanger sequencing of multiple genes for diagnostic purposes is costly and time-consuming, whereas it is eminently feasible using NGS (Rabbani et al. 2016). The advantages of NGS platforms over Sanger sequencing with respect to genotyping capacity and cost have been demonstrated repeatedly (Goodwin et al. 2016). Indeed, NGS applications are readily available as gene panels or through exome sequencing (Iglesias et al. 2014) for analysis of various phenotypes, e.g. non-syndromic deafness, neuro-muscular diseases and Noonan-spectrum syndromes (Ankala et al. 2015; Neveling et al. 2013; Wang et al. 2016; Yang et al. 2013). However, NGS also has shortcomings. NGS solutions have been previously associated with higher error rates as compared to Sanger sequencing (Liu et al. 2012), which led to the common practice of confirming NGS results by Sanger sequencing. Other common problems include unequal coverage throughout the targeted region, sequencing of pseudogenes, and difficulty in sequencing GC-rich regions (Chen et al. 2013). It is therefore surprising that unbiased, two-way systematic comparisons of performance between various NGS platforms and Sanger sequencing are still rare, despite the important implications of such comparisons for genetic diagnostics.

Previous studies examined the ability of NGS to detect variants identified by Sanger sequencing. For a panel comprising a limited number of genes for hereditary colon cancer, arrhythmias, cardiomyopathies, and other cardiovascular‐related genes, it was shown that all 919 variants previously observed by Sanger sequencing were also identified by NGS (Baudhuin et al. 2015). Another study, however, demonstrated that up to 18% of 137 pathogenic variants identified by Sanger sequencing, in the context of neuromuscular diseases, were not identified by exome sequencing due to low coverage (Ankala et al. 2015). In another study, variants identified by clinical Sanger sequencing were compared to results obtained from exome sequencing for the same patients (Hamilton et al. 2016). Exome sequencing identified 97.3% of the coding variants and 81.8% of the non-coding variants detected by Sanger sequencing. One of the discordant variants was proven to be a Sanger-sequencing false-positive call, while the rest were exome-sequencing false-negative calls, despite sufficient coverage (30–60×). Nine genes were excluded from the comparison due to consistently low coverage in exome sequencing. These results demonstrated molecular circumstances in which Sanger sequencing performed better than NGS (Hamilton et al. 2016). None of these studies examined the performance of NGS independently of the Sanger-sequencing results, e.g. whether there were additional variants identified by NGS that were missed by Sanger sequencing. In fact, all these studies were one-way comparisons that assessed sensitivity of NGS for variants identified by Sanger sequencing. None provided data regarding NGS false positives or Sanger-sequencing false negatives.

We present a comparative, unbiased evaluation of sequencing results of 258 genes comprising ~ 1.3% of the human exome. DNA from a single individual was extracted from peripheral blood leukocytes and buccal swabs. The platforms assessed were Sanger sequencing of the coding exons ± 8 bp of the 258 genes, the Agilent SureSelectQXT exome capture kit (SureSelect) and the Illumina Nextera Rapid Capture Expanded Exome (Nextera). We aimed to shed light on the performance and accuracy of each of the sequencing methods, from the two DNA sources, and to assess the implications of replacing Sanger sequencing with exome sequencing, using standard capture kits. Although only one individual was sequenced, to ensure a single reference sequence, we performed 11 different sequencing experiments for each of 258 genes, using different sequencing strategies and different DNA sources. This enabled in-depth comparisons between Sanger sequencing and NGS results.

Materials and methods

Samples and sequencing

Samples from blood and buccal cells were taken the same day from one healthy individual following informed consent. All DNA handling and sequencing procedures were completed at Gene by Gene’s CAP accredited laboratory in Houston, Texas. DNA was extracted from blood (peripheral leukocytes) and buccal swabs following standard protocols, and quantitated with a SpectraMax190 (Molecular Devices, Sunnyvale, CA). Sanger sequencing was completed for 258 genes on DNA extracted from buccal cells by amplifying the coding exons and their flanking regions (approximately 20 bp from each side) using conventional PCR techniques. PCR products were purified using magnetic-particle technology (Seradyn, Inc.). After purification, all fragments were sequenced with forward and reverse primers. Sequencing was performed on a 3730xl DNA Analyzer (Applied Biosystems), and the resulting sequences were analyzed with the Sequencher software (Gene Codes Corporation). Variants were analyzed relative to the reference sequences deposited in the National Center for Biotechnology Information.

Exome sequencing was performed using the Agilent enrichment capture kit (SureSelectQXT) and the Nextera enrichment rapid capture kit (FC-140-1006) on DNA extracted from both peripheral leukocytes and buccal cells. Exome-sequencing data were generated using the Illumina HiSeq2500 following the manufacturers’ protocols for all runs. A coverage of 70× was considered the minimal threshold for clinical grade sequencing.

Sequencing quality reassurance

For Sanger sequencing, each nucleotide was covered by at least two different sequences, preferably one forward and one reverse sequence. In cases where this was not possible due to poly-regions, two sequences from the same direction were generated independently. The data were interpreted by two operators independently. We used a Phred score of 30 as our confidence threshold.

Exome-sequencing data analysis and quality reassurance

Each exome-sequencing sample was analyzed using the Genoox platform (http://www.genoox.com). The NGS pipeline was based on the BWA aligner (Li and Durbin 2009) and two variant callers: GATK HaplotypeCaller (McKenna et al. 2010) and FreeBayes (Garrison and Marth 2012). Coverage reports were obtained from the Genoox platform. The minimal Mapping Quality (MQ) was MQ > 0 for each read. The main parameters used for filtering were identical for both variant callers and comprised a quality score > 100, a depth (DP) > 4 and a quality/depth ratio (QD) threshold of 7.
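The hard filters described above can be sketched as a simple predicate. This is a minimal illustration and not the Genoox pipeline itself: the record layout (a dict with `QUAL`, `DP` and `QD` keys) and the function name `passes_filters` are hypothetical, and QD is assumed to act as a strict lower bound (QD > 7), which the text implies but does not state outright.

```python
# Thresholds taken from the Methods text.
# Assumption: QD is applied as a strict lower bound, like QUAL and DP.
MIN_QUAL = 100.0
MIN_DP = 4
MIN_QD = 7.0


def passes_filters(call: dict) -> bool:
    """Return True if a variant call clears all three hard filters."""
    return (
        call["QUAL"] > MIN_QUAL
        and call["DP"] > MIN_DP
        and call["QD"] > MIN_QD
    )


# Hypothetical calls for illustration only (not data from the study).
calls = [
    {"id": "var_a", "QUAL": 850.0, "DP": 60, "QD": 14.2},
    {"id": "var_b", "QUAL": 95.0, "DP": 12, "QD": 7.9},   # fails QUAL
    {"id": "var_c", "QUAL": 130.0, "DP": 25, "QD": 5.2},  # fails QD
]
kept = [c["id"] for c in calls if passes_filters(c)]
print(kept)
```

Under these assumptions only `var_a` is retained; the other two calls each fail a single threshold.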

Variants comparison

We included 258 genes in our analysis (Table S1). The list of genes was assembled from the genes previously Sanger-sequenced at Gene by Gene’s CAP accredited laboratory. The only genes excluded were those expected to perform less efficiently in NGS, due to issues involving pseudogenes or GC-rich regions (i.e. GBA (OMIM 606463), MAPT (OMIM 157140), PMS2 (OMIM 600259)). The analyzed regions included the coding regions (exons) ± 8 bp of flanking intronic sequence, in order to cover the splice sites. Following quality reassurance, we established the list of all the variants identified by any of the different platforms and experiments, and checked systematically how many experiments detected each variant. Variants that were detected by all experiments were classified as true variants. Variants that were not concordant between all experiments were considered potential false positives or false negatives. The Sanger sequencing suspected false positive or negative variants were re-sequenced with an alternative primer pair. The NGS suspected false positive variants were checked for their quality parameters (DP < 20; QD < 7; QUALITY < 100).
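The comparison step described above, taking the union of all reported variants and counting how many experiments detected each one, can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_by_concordance`, the experiment labels and the tuple representation of a variant are hypothetical and do not reflect the actual tooling used in the study.

```python
from collections import Counter


def split_by_concordance(calls_by_experiment: dict) -> tuple:
    """Split the union of variants into (concordant, discordant) sets.

    calls_by_experiment maps an experiment name to the set of variants
    that experiment reported after quality filtering.
    """
    n_experiments = len(calls_by_experiment)
    # Count how many experiments reported each variant.
    detections = Counter(
        variant
        for variants in calls_by_experiment.values()
        for variant in variants
    )
    concordant = {v for v, n in detections.items() if n == n_experiments}
    discordant = set(detections) - concordant
    return concordant, discordant


# Hypothetical toy input: variants are (chrom, pos, ref, alt) tuples.
toy = {
    "sanger": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")},
    "sureselect_bl": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")},
    "nextera_bl": {("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")},
}
concordant, discordant = split_by_concordance(toy)
print(sorted(concordant))
print(sorted(discordant))
```

In this toy input, the variant seen by all three experiments is treated as concordant, while the two variants missing from at least one experiment are flagged for follow-up.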

Results

We compared the results of Sanger sequencing and exome sequencing for 258 genes. We analyzed two exome-sequencing runs. In the first run, exome sequencing was performed on DNA from both sources (leukocyte and buccal), once using the SureSelect kit and once using the Nextera kit. Each exome sequencing was performed in duplicate, so this run included eight exome-sequencing tests, enabling within-run comparisons. The second run included two exome-sequencing tests performed on leukocyte-extracted DNA, one using the SureSelect kit and one using the Nextera kit. This enabled between-run comparisons.

Targeted genomic region

The targeted genomic region, as noted above, included 258 genes (Table S1), with a total of 4629 exons. The genomic region encompassed by these exons is 729,724 bp, or 803,788 bp when including ± 8 bp at the intron/exon boundaries. Table 1 summarizes the targeted genomic region, the expected overlap according to the SureSelect and Nextera BED files, and the actual adequately enriched regions (> 20×) obtained from the experiments. More than 98% and 91% of the region was covered at > 20× for the SureSelect and Nextera experiments, respectively.
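The two region sizes quoted above are consistent with each other, as the short check below shows. It assumes that every exon gains 8 bp on each side and that padded exons do not overlap; the variable names are illustrative only.

```python
N_EXONS = 4629       # exons across the 258 genes (from the text)
EXONIC_BP = 729_724  # exonic bases only (from the text)
FLANK_BP = 8         # intronic bases added on each side of every exon

# Assumption: every exon gains FLANK_BP bases on both sides, with no
# overlap between neighbouring padded exons.
padded_bp = EXONIC_BP + N_EXONS * 2 * FLANK_BP
print(padded_bp)  # 803788
```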

Table 1 NGS results: expected versus actual enriched genomic regions

Overall NGS performance

We performed ten NGS experiments. Table 2 highlights the key metrics, including the mean coverage obtained for the whole exome and for the targeted region, and the percentage of bp obtained at a coverage of 0×, ≤ 10×, ≤ 20× and > 20×.

Table 2 NGS coverage: comparison of capture-kits, DNA

Comparison of duplicate experiments within run 1, and comparison of run 1 and run 2, demonstrated no statistically significant differences for the major displayed metrics. For the entire exome, mean coverage was 166× with the SureSelect kit and 90× with the Nextera kit. In the targeted region (258 genes, including the exon ± 8 bp intron/exon boundaries), mean coverage of the SureSelect kit was about double that of the Nextera kit (Table 2). Notably, the total number of aligned reads was similar in both capture kits (> 99%), but the SureSelect kit mean coverage was about double that of the Nextera kit mean coverage since a large proportion of Nextera reads did not align to the target region. For both the SureSelect and the Nextera capture kits, less than 1% of the nucleotides had 0× coverage in any of the experiments. The mean proportion of nucleotides with 0× coverage across all experiments of the same kit was 0.1% and 0.62% for SureSelect and Nextera, respectively.

Identified variants

We analyzed the exome-sequencing data using standard NGS analytic pipelines with predefined quality thresholds (Methods). A total of 449 variants were identified at least once, in a total of 258 genes (range of 0–13 variants per gene) (Table 3, Table S2, Table S3). The majority of those variants, 407/449 (90.6%), were detected by all platforms and experiments. The remaining 42/449 (9.4%) discordant variants were examined further to determine whether they represent true or false variants (Table 3, Table S3).

Table 3 Discordant variants

Discordant variants were classified into two groups: variants discordant between Sanger sequencing and NGS (Sanger–NGS discordance) and variants discordant within NGS experiments (within-NGS discordance). Sanger–NGS discordance was defined as discordance between Sanger sequencing and most (≥ 5) NGS experiments. Within-NGS discordance was defined as discordance between NGS experiments, where the majority of NGS results (> 5) were concordant with Sanger-sequencing results. Thirteen variants were Sanger–NGS discordant and 29 variants were within-NGS discordant (Table 3, Table S3). Among the 13 Sanger–NGS discordant variants, four variants (in the DLL3 (OMIM 602768), CACNA1A (OMIM 601011) and PIK3R2 (OMIM 603157) genes) represent NGS false negatives due to coverage failure (< 10×), since they were identified by Sanger sequencing and by all NGS experiments with sufficient coverage. In the other nine Sanger–NGS discordant variants, there was sufficient coverage in all NGS experiments. For these variants, we repeated Sanger sequencing using an alternative primer pair. Discrepancies were resolved for seven of the nine variants, showing that the original Sanger sequencing included four false-negative and three false-positive calls. All three false positives were in the same PCR amplicon of the same gene (ABCC6 (OMIM 603234)), which lies within a segmental duplication region. The two remaining variants (CFTR (OMIM 602421) and NOTCH3 (OMIM 600276) genes) were not detected by repeated Sanger sequencing despite several primer redesign attempts, and were thus considered as NGS false positive calls.

Twenty-nine variants were within-NGS discordant. Of these, 14 variants were not detected by Sanger sequencing and detected by only one of all NGS experiments: eight in leukocyte-derived DNA and four in buccal-swab DNA with Nextera capture; one in leukocyte DNA and one in buccal-swab DNA with SureSelect capture. These variants barely passed the quality filters, and had coverage < 20 and/or Quality < 100 and/or QD < 7. Therefore, they are considered as NGS false positives due to low quality. Of the remaining 15 variants, 13 were detected by the Sanger sequencing, the SureSelect experiments and at least one Nextera experiment. The other two variants were detected by the Sanger sequencing, the Nextera experiments and at least one SureSelect experiment. All these 15 variants are thus considered as NGS false negatives. In 12 of them, lack of detection is explained by low coverage (< 20×).
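The two discordance classes defined above can be expressed as a small decision rule. The sketch below is one possible reading of those definitions, not the authors' code: the function name `classify` and the boolean encoding of calls (True for detected, False for not detected) are assumptions made for illustration. With ten NGS experiments, "≥ 5 disagree with Sanger" and "> 5 agree with Sanger" are complementary, so the rule partitions all discordant variants.

```python
N_NGS = 10  # number of NGS experiments in the study


def classify(sanger_call: bool, ngs_calls: list) -> str:
    """Classify a variant by how the NGS experiments relate to Sanger."""
    if len(ngs_calls) != N_NGS:
        raise ValueError("expected one call per NGS experiment")
    # Number of NGS experiments whose call differs from the Sanger call.
    disagree = sum(call != sanger_call for call in ngs_calls)
    if disagree == 0:
        return "concordant"
    if disagree >= 5:
        return "sanger_ngs_discordant"
    return "within_ngs_discordant"


# Hypothetical call patterns for illustration only.
print(classify(True, [True] * 10))             # seen by everything
print(classify(True, [False] * 10))            # Sanger-only call
print(classify(False, [True] + [False] * 9))   # single NGS-only call
```

A variant called by a single NGS experiment and absent from Sanger sequencing therefore lands in the within-NGS class, matching the 14 singleton calls described above.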

Cumulatively, we determined that of the 42 discordant variants, 23 were true variant calls, including four Sanger-sequencing false negatives and 19 NGS false negatives. The remaining 19 variants were false positives, including three Sanger-sequencing calls and 16 NGS calls (Table 3, Table S3). False-positive NGS calls included 9/16 with poor coverage (< 20×), and 7/16 variants with > 20× coverage, but borderline quality (QD < 7). Notably, 12 of the 16 NGS false positives (75%) have rs IDs.

Thus, together with the 407 variants detected by all experiments, a total of 430/449 (95.8%) variants were defined as “true variants”, and regarded as the truth set for further analysis.
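The bookkeeping behind the truth set can be checked in a few lines. All counts below are taken from the text; only the variable names are introduced here for illustration.

```python
# Counts as reported in the text.
total_variants = 449
concordant = 407

true_discordant = {"sanger_false_negative": 4, "ngs_false_negative": 19}
false_discordant = {"sanger_false_positive": 3, "ngs_false_positive": 16}

n_true_discordant = sum(true_discordant.values())    # 23
n_false_discordant = sum(false_discordant.values())  # 19

# Every variant is either concordant, a true discordant call or a false call.
assert concordant + n_true_discordant + n_false_discordant == total_variants

truth_set = concordant + n_true_discordant
print(truth_set, f"{100 * truth_set / total_variants:.1f}%")  # 430 95.8%
```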

Comparative sequencing performance

As shown in Fig. 1, overall, sensitivity was > 97% in all platforms and experiments. Sanger sequencing had a detection rate of 99%. In NGS experiments, higher coverage resulted in better detection rates. The mean sensitivity of SureSelect and Nextera experiments was 99.5% and 98.5%, respectively.

Fig. 1 Sensitivity across all platforms and experiments. The percentage of 430 “true variants” detected in each experiment. bu: buccal; bl: blood; D: duplicate

Figure 2 presents the false-positive and false-negative calls for each of the platforms and experiments. In general, the number of false-negative calls in the NGS experiments decreased with higher coverage. None of the differences was statistically significant using the chi-square test (p > 0.05). The PPV (positive predictive value) and NPV (negative predictive value) for all experiments were both > 0.99.

Fig. 2 Number of false calls. The number of false-positive and false-negative variants is shown for each experiment. False-negative calls (solid bars). False-positive calls (striped bars). Number of each type of call is indicated within each bar. Bu: buccal; bl: blood; D: duplicate. None of the differences are statistically significant

Zygosity discordance between platforms and experiments

We observed zygosity discordance for a total of ten variants (Table 4). In 5/10 cases, the discordance was between Sanger sequencing and all NGS experiments (Sanger–NGS discordance). In four of those five cases, Sanger sequencing suggested a homozygous state while all NGS experiments suggested heterozygosity. In the fifth case, the opposite was noted. The NGS coverage for these variants ranged from 27 × to 307× (average 183×). For the other 5/10 variants there was within-NGS zygosity discordance: Sanger sequencing and SureSelect experiments were concordant but there was discordance with one of the Nextera experiments. Coverage was < 20 × in four of the cases, and 41 × in the fifth case (Table 4).

Table 4 Zygosity discordance between Sanger and NGS

We repeated Sanger sequencing using an alternative primer pair for all 10 variants. The zygosity status changed for the first 5 variants, which indicates Sanger sequencing error due to primer design in the first sequencing. The zygosity status of the other five variants did not change, which indicates a single NGS experiment error.

Discussion

Accuracy of NGS platforms and the ability to obtain full exome/genome-sequencing information are critical elements of clinical genetic testing and of precision medicine initiatives (Ashley 2016). We compared the performance of Sanger sequencing and NGS for 258 genes that are commonly sequenced in a commercial laboratory. Sequencing these genes is generally requested as part of routine clinical testing for well-defined OMIM (https://omim.org/) phenotypes or during the investigation of less defined, orphan phenotypes with unknown molecular bases. OMIM phenotypes are assigned to 257 of 258 genes included in our study and 30 are among the 59 genes listed in the American College of Medical Genetics and Genomics recommendations for reporting of incidental findings in clinical exome and genome-sequencing (Kalia et al. 2017). Accordingly, the genes we examined can be regarded as representative of common clinical scenarios mandating single or multiple gene sequencing. These scenarios often present the dilemma of choosing between Sanger sequencing of specific gene(s), NGS of a gene panel including the requested gene(s), or full exome sequencing that might serve both immediate and future needs of the patient. Compared to exome sequencing, sequencing specific gene(s) by either Sanger sequencing or NGS involves higher sequencing costs per gene and incurs additional costs for further genetic testing if a molecular diagnosis is not confirmed in the first round of testing. However, sequencing of single genes or gene panels is more likely to provide complete coverage of the targeted genes.

Although we only sequenced one individual in this study, the study we performed is unique in providing bidirectional comparison of Sanger sequencing and NGS. Using different sequencing strategies, different NGS solutions and different DNA sources, each gene was sequenced a total of 11 times. This way we could determine with confidence whether each variant observed represents a true or false call, examine false negative and false positive rates of the methods, and evaluate characteristics of both true and false calls.

We first examined the effect of the DNA source and capture kit on NGS results, as well as consistency of NGS results between duplicates and between different runs. NGS experiments were performed using the exact guidelines of the respective manufacturers. Within each of the kits, it is evident that no statistically significant differences were noted between duplicates in the same run or different sequencing runs (Table 2). Differences in performance between capture kits have been previously reported (Asan et al. 2011; Chilamakuri et al. 2014; Clark et al. 2011). Our results demonstrate that the mean coverage obtained by the SureSelect kit was higher, reaching 182×/183 × for DNA extracted from leukocytes or buccal cells, respectively. There were no significant differences in the performance of NGS on DNA extracted from these two different sources.

We then determined the rates and types of false calls in both Sanger sequencing and NGS. The detection rate of true variants was 99.1% (426/430) for Sanger sequencing, and ranged between 97.9 and 100% in the NGS experiments (Fig. 1). Although the sensitivity, false-positive and false-negative rates calculated for all experiments were not statistically different, probably due to the small number of discordant variants, the differences observed offer important insights into the strengths and weaknesses of each sequencing method.

For Sanger sequencing, the false-positive rate was 3.7E-6, and the false-negative rate was 0.009. Sanger false positives all occurred in a single amplicon of one gene (ABCC6 (OMIM 603234)), whereas false negatives occurred in three different genes. All false Sanger calls were resolved using alternative primer pairs. The false-negative calls could be idiosyncratic to polymorphisms in the tested individual, while the false-positive calls are most likely explained by insufficiently specific primers in an amplicon located in a segmental duplication region.
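The Sanger rates quoted above can be reproduced with a short calculation. The denominators are not spelled out in the text, so the sketch below assumes that the false-positive rate is expressed per targeted, non-variant base pair and the false-negative rate per true variant; under these assumptions the reported values are recovered. The function name `error_rates` is hypothetical.

```python
TARGET_BP = 803_788   # targeted bases, exons ± 8 bp (from the text)
TRUE_VARIANTS = 430   # size of the truth set (from the text)


def error_rates(false_pos: int, false_neg: int) -> dict:
    """Per-base false-positive rate and per-variant false-negative rate.

    Assumption: each true variant occupies one targeted position, so the
    remaining positions are treated as true negatives.
    """
    negatives = TARGET_BP - TRUE_VARIANTS
    true_pos = TRUE_VARIANTS - false_neg
    return {
        "fpr": false_pos / negatives,
        "fnr": false_neg / TRUE_VARIANTS,
        "sensitivity": true_pos / TRUE_VARIANTS,
        "ppv": true_pos / (true_pos + false_pos),
    }


# Sanger sequencing: three false positives and four false negatives.
sanger = error_rates(false_pos=3, false_neg=4)
print(f"{sanger['fpr']:.1e} {sanger['fnr']:.3f} {sanger['sensitivity']:.3f}")
```

With these inputs the sensitivity works out to 426/430, in line with the 99.1% detection rate reported for Sanger sequencing.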

For NGS, the mean false-positive rate was 2.5E-6 in the SureSelect experiments, and 5.2E-6 in the Nextera experiments. The mean false-negative rates were 0.005 for SureSelect experiments, and 0.01 for Nextera experiments. Most false-positive NGS calls were singular events, i.e. they occurred in only one of 10 NGS experiments. These variants had barely passed the quality filter, and failed one or more of the basic quality parameters (DP > 20; QUALITY > 100; QD > 7). Tightening the quality filter thresholds would likely decrease the number of false positives, but this would come at the cost of an increase in false negatives. Interestingly, 15/19 (78.9%) of false-positive calls, including all three Sanger-sequencing false positives, have rs IDs. This may indicate that current databases contain false-positive calls presented as true variants. Conversely, all 430 true variants were detectable by NGS, as long as coverage was adequate and quality parameters were fulfilled, consistent with previous reports that there is no need for Sanger-sequencing confirmation of such high-coverage/high-quality variants (Goldfeder et al. 2016).

NGS false negatives were all associated with low (< 20×) coverage. In some cases, low coverage was a singular event (e.g. COL5A (OMIM 120215) or COL5A2 (OMIM 120190)), but in other cases low coverage was systematic to a specific kit (e.g. PIK3R2 (OMIM 603157) in SureSelect or FRMD7 (OMIM 300628) in Nextera), or to both capture methods (e.g. DLL3 (OMIM 602768)). Systematic false negatives indicate problematic genomic regions in which variant detection is hampered. This is a concern, since it was previously shown that large areas of medically actionable genes fall within low confidence regions (Baudhuin et al. 2015), which might explain consistently low coverage for some variants. In this study, using exome sequencing to target a specific set of genes encompassing ~ 1.3% of the exome resulted in adequate coverage (> 20×) of > 98% and > 91% of exons ± 8 bp with the SureSelect and the Nextera kits, respectively (Table 1). This suggests that ~ 16,075 and ~ 72,340 bp (of a total of 803,788 bp in 258 genes) would not be adequately covered. Clearly, disease-causing variants could be located in such regions with insufficient coverage. Taken together, our data suggest that exome sequencing could miss ~ 2–3% of the coding variants and up to 7% of the non-coding variants. These results are in agreement with Hamilton et al. (2016), who demonstrated a 92.3% concordance between exome sequencing and Sanger sequencing in the ± 20 bp intronic/exon boundaries.
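The base-pair estimates in this paragraph follow directly from the coverage percentages. The short check below uses integer arithmetic and treats the reported > 98% and > 91% figures as exact, so the results are upper bounds on the inadequately covered bases; variable names are illustrative.

```python
TARGET_BP = 803_788  # targeted bases, exons ± 8 bp (from the text)

# Percentage of the target covered at > 20x, as reported in the text.
covered_pct = {"SureSelect": 98, "Nextera": 91}

# Integer arithmetic avoids floating-point rounding surprises.
uncovered_bp = {
    kit: TARGET_BP * (100 - pct) // 100
    for kit, pct in covered_pct.items()
}
print(uncovered_bp)  # {'SureSelect': 16075, 'Nextera': 72340}
```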

We also found zygosity status errors for both Sanger sequencing and NGS. As presented in Table 4, Sanger sequencing determined a false zygosity status of five variants (four false homozygotes and one false heterozygote) that was correctly assessed in repeated Sanger sequencing using an alternative primer pair. In five additional variants, there was a zygosity status error in a single NGS experiment, which in 4/5 of cases was associated with low coverage (< 20×).

In the clinical setting, false negatives are more concerning than false positives. Whereas false positives can be re-evaluated, detecting false negatives requires complete re-testing, which is unlikely to be routinely performed. The number of false negatives indicated is for ~ 1.3% of the exome, so over an entire exome the absolute number of false negatives will be correspondingly higher. We also note that these results overestimate the performance of NGS over the entire exome, since we purposely did not include genes in which NGS is expected a priori to be less effective, e.g. genes that are GC rich or known to have pseudogenes.

In summary, our results confirm the notion that neither Sanger sequencing nor NGS can be regarded as a “gold-standard” method, as both had false-positive and false-negative calls. We have demonstrated that the performances of both NGS and Sanger sequencing are satisfactory, and that no statistically significant differences were noted between the two sequencing methods with respect to the ability to sequence the targeted regions, and the observed, high (> 98%) variant detection rates. Within NGS, it is evident that while clear differences were noted in the obtained sequencing coverage for the different capture kits, the detection rates of the true calls were not significantly different, at least when reaching the preset threshold of 70× coverage. Bearing in mind the potential caveats that might be associated with pseudogenes or GC-rich regions, our data cautiously suggest that off-the-shelf exome-sequencing solutions might serve as a viable and practical alternative to gene(s) or gene-panel sequencing, albeit with a false-negative rate of at least 1–3%. Nevertheless, physicians must be aware of the limitations of both Sanger sequencing and NGS. As demonstrated herein, Sanger-sequencing primer binding-site polymorphisms and chance or systematic NGS-coverage failure are integral hurdles of these methods. We show that Sanger sequencing and different NGS solutions are synergistic, as any combination of the Sanger sequencing, SureSelect and Nextera experiments yielded an overall greater detection rate. Accordingly, a high index of suspicion for a given clinical diagnosis must overcome negative molecular results of either Sanger sequencing or NGS, and repeated investigation with an alternative method should be considered.