INTRODUCTION

Accurate equimolar pooling is important for the equal distribution of reads among samples in a single batch [1]. Unequal combination of libraries leads to the biased representation of some libraries over others. Underrepresented libraries must be resequenced, costing time and money. Overrepresentation of libraries results in the generation of more sequence data than required, wasting sequencing capacity and decreasing the number of samples per batch. Given a fixed price per sequencing run, it is economically sound to pool as many samples as possible in each target sequencing run, which requires that their concentrations be as close to equal as possible.

Post-pooling target enrichment is more cost-effective than pre-pooling enrichment, but it can cause unpredictable shifts in the ratios between samples in the same enrichment batch. Bacterial contamination of the initial samples (e.g., DNA extracted from saliva), differences in the library insert length distribution and many other factors cannot all be accounted for simultaneously by common library quantification methods.

The current methods for DNA library quantification use a variety of techniques including UV absorption (e.g., Nanodrop, Thermo Fisher Scientific, USA) [2, 3], intercalating dyes (e.g., Qubit, Invitrogen, USA) [4, 5], capillary electrophoresis (Agilent Bioanalyzer 2100, Agilent Technologies Inc, USA) [6], 5'-hydrolysis probes (e.g., Taqman probes) coupled with quantitative PCR (e.g., qPCR assays by Roche) [7, 8] or droplet digital emulsion PCR (ddPCR, Bio-Rad Inc, USA) [9]. These common methods have several limitations and may provide inaccurate results [10]. For example, UV spectrophotometers detect not only DNA but also other UV-absorbing materials such as RNA, protein and phenol, and they are not sensitive enough to detect small amounts of DNA [11]. Fluorometric methods that detect only double-stranded DNA, such as Qubit, can overestimate the actual concentration of a library because the dyes also bind to partially ligated double-stranded libraries and adapter dimers. PicoGreen also binds to dsDNA, but this method is not specific for human DNA; any animal, bacterial or fungal DNA co-purified with the human DNA of interest will contribute to the final reading and could give a falsely high DNA quantification. Several studies indicate that qPCR is the most effective method for library quantification [12–16].

Because the economic outcome of post-pooling capture target sequencing experiments depends on the accuracy of library quantification, it is crucial to choose the most accurate, reliable and reproducible method. In this study, we compared several library quantification methods by accuracy and cost in order to select the best method for quantifying libraries prior to pooling, target capture and Illumina sequencing.

MATERIALS AND METHODS

DNA was extracted from both blood and saliva samples of patients using the QIAamp DNA Mini Kit (Qiagen, Germany) according to the manufacturer’s instructions. All samples were obtained with informed consent. DNA libraries were prepared using the NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs, MA, USA). The study design was a comparison of several techniques used for the quantification of libraries prior to pooling and target sequencing, including LabChip (PerkinElmer Inc., MA, USA), Qubit 3.0 (Thermo Fisher Scientific, MA, USA), several qPCR approaches, and Illumina MiSeq (with and without insert size correction according to our study [17]).

Qubit 3.0

Quantification using Qubit 3.0 was performed according to the manufacturer’s recommended protocol.

LabChip

Quantification using LabChip was performed according to the manufacturer’s recommended protocol. We estimated library concentrations from fragments in the 200 to 1000 bp size range only, which excluded fragments that were too short or too long. Fragments that are too short drop out during enrichment, while fragments that are too long do not contribute to sequencing because of the way clusters are generated during Illumina sequencing.

qPCR Quantification

Library quantification was performed using the StepOnePlus Real-Time PCR System (Thermo Fisher Scientific, MA, USA) with SYBR Green I. The cycling conditions were 95°C for 5 min followed by 45 cycles of 95°C for 20 s, 62°C for 20 s and 72°C for 55 s. We used the following amplification primers:

(1) P5/P7—This primer set was used to detect Illumina-compatible libraries irrespective of their insert size and sequence. This is the most common principle for quantifying sequencing libraries, such as in the QIAseq™ Library Quant Assay Kit, NEBNext Library Quant Kit for Illumina, KAPA Library Quantification Kit Illumina platforms, PerfeCTa NGS Library Quantification Kit for Illumina and other commercially available kits.

P5 AATGATACGGCGACCACCGA

P7 CAAGCAGAAGACGGCATACGAGAT

(2) GHRf/GHRr—Both primers anneal to the human GHR gene; thus, we detected the amount of human DNA irrespective of the presence of Illumina sequencing adapters.

GHRf CCCCTCTAAGGAGTGTAGCA

GHRr CTTTTGGTGCCTGGTAAGTT

(3) P5/GHRf—The P5 primer anneals to the Illumina adapter, and GHRf anneals to the human GHR gene. This allowed us to detect Illumina-compatible library fragments containing GHR gene fragments.

P5 AATGATACGGCGACCACCGA

GHRf CCCCTCTAAGGAGTGTAGCA

Ultra-Low Coverage Illumina Sequencing

The libraries were sequenced on a MiSeq [18] with 150 bp paired-end reads. Only reads that mapped to the human genome were counted. From these counts, we calculated the relative concentration of each sample in the pool.
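The relative-concentration step can be illustrated with a minimal Python sketch. It assumes that the relative concentration of a sample is simply its share of the human-mapped reads in the pool; the function name and the read counts are hypothetical and are not data from this study.

```python
def relative_concentrations(mapped_reads):
    """Return each sample's share of the human-mapped reads in the pool."""
    total = sum(mapped_reads.values())
    if total <= 0:
        raise ValueError("no mapped reads in the pool")
    return {sample: count / total for sample, count in mapped_reads.items()}


# Hypothetical mapped-read counts per sample (illustrative values only).
example_counts = {"S1": 12000, "S2": 8000, "S3": 20000}
shares = relative_concentrations(example_counts)

for sample, share in shares.items():
    print(f"{sample}: {share:.3f}")
```

The shares sum to 1 by construction, so they can be compared directly with the read shares obtained from a later sequencing run of the same pool.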

Ultra-Low Coverage Illumina Sequencing with Insert Size Correction

Fragments with different insert lengths are enriched with different efficiencies [17, 19]. Therefore, we corrected the number of reads obtained for each sample by MiSeq sequencing using coefficients reflecting the enrichment efficiency of fragments with specific lengths.
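One way such a correction could be applied is sketched below in Python. This is an illustration under stated assumptions, not the procedure from [17]: the efficiency coefficients, the 50 bp bin width and the insert sizes are all hypothetical, and reads whose insert size falls outside the coefficient table are assumed to contribute nothing.

```python
def corrected_read_count(insert_sizes, efficiency_by_bin, bin_width=50):
    """Sum per-read weights, where each weight is the (assumed) enrichment
    efficiency of the insert-size bin the read pair falls into."""
    total = 0.0
    for size in insert_sizes:
        bin_start = (size // bin_width) * bin_width
        # Assumption: bins missing from the table get zero weight.
        total += efficiency_by_bin.get(bin_start, 0.0)
    return total


def corrected_shares(insert_sizes_by_sample, efficiency_by_bin, bin_width=50):
    """Normalize corrected counts so that the shares sum to 1."""
    corrected = {
        sample: corrected_read_count(sizes, efficiency_by_bin, bin_width)
        for sample, sizes in insert_sizes_by_sample.items()
    }
    total = sum(corrected.values())
    if total <= 0:
        raise ValueError("no reads fall inside the coefficient table")
    return {sample: value / total for sample, value in corrected.items()}


# Hypothetical efficiency coefficients keyed by the start of each 50 bp bin.
efficiency = {150: 0.5, 200: 1.0, 250: 1.5}

# Hypothetical insert sizes (bp) of mapped read pairs for two samples.
inserts = {
    "A": [160, 210, 260, 270],
    "B": [155, 170, 220, 230],
}

print(corrected_shares(inserts, efficiency))
```

In this toy example both samples have the same number of reads, so their uncorrected shares are equal; after weighting, sample A, which has longer inserts, receives the larger predicted share.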

Post-Pooling Capture and Target Sequencing

After quantification, the libraries were pooled, and enrichment was performed with SureSelectXT2 Focused Exome (Agilent Technologies, CA, USA). Exome sequencing was performed using a HiSeq 2500 (Illumina, CA, USA). Reads were filtered and mapped to the human genome. The final distribution of reads was taken as the reference standard, as the purpose of this work was to determine which method most accurately predicts the data output from exome sequencing.
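For illustration, the arithmetic behind equal-representation pooling can be sketched in Python as follows. This is a generic calculation and not the pooling protocol used in this study; the function name, the concentration estimates and the total pool volume are hypothetical. Each library receives a volume inversely proportional to its estimated concentration, so that every library contributes the same amount to the pool.

```python
def pooling_volumes(concentrations, total_volume):
    """Split total_volume so that volume * concentration is equal for all samples."""
    if any(c <= 0 for c in concentrations.values()):
        raise ValueError("all concentration estimates must be positive")
    inverse = {sample: 1.0 / c for sample, c in concentrations.items()}
    scale = total_volume / sum(inverse.values())
    return {sample: inv * scale for sample, inv in inverse.items()}


# Hypothetical concentration estimates in arbitrary but consistent units.
estimates = {"S1": 1.0, "S2": 2.0, "S3": 4.0}
volumes = pooling_volumes(estimates, total_volume=7.0)

for sample, volume in volumes.items():
    contribution = volume * estimates[sample]
    print(f"{sample}: volume {volume:.2f}, contribution {contribution:.2f}")
```

Any error in the concentration estimates carries over directly into unequal contributions, which is why the accuracy of the quantification method matters for the final read distribution.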

Statistical Analysis

We used log-transformation to reduce the skewness. We applied the Shapiro–Wilk test to ensure the data had a normal distribution after outlier removal. We used the Student’s t-test to check for bias. To estimate the accuracy, we compared the quantification results from the studied methods with the HiSeq results. The associations between the relative HiSeq concentration and the quantification methods were evaluated by Pearson correlation and linear regression.
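The correlation and regression part of this analysis can be sketched in plain Python as shown below. The function names and input values are hypothetical and are not data from this study. The Shapiro–Wilk test, the t-test and the p-values are omitted because they would normally come from a statistics package. For a simple linear regression with one predictor, R² equals the square of the Pearson correlation coefficient, which is what the sketch reports.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def compare_to_reference(method_values, reference_values):
    """Log-transform both series, then compute r, R^2 and the regression line."""
    log_method = [math.log(v) for v in method_values]
    log_reference = [math.log(v) for v in reference_values]
    r = pearson_r(log_method, log_reference)
    slope, intercept = linear_fit(log_method, log_reference)
    return {"r": r, "r_squared": r * r, "slope": slope, "intercept": intercept}


# Hypothetical relative concentrations: one method's estimates vs. the reference.
method_estimates = [0.08, 0.11, 0.09, 0.14, 0.10]
reference_shares = [0.07, 0.12, 0.10, 0.13, 0.09]
result = compare_to_reference(method_estimates, reference_shares)
print(result)
```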

RESULTS

In this study, we compared several methods for library quantification, including LabChip, Qubit 3.0, qPCR with three primer sets, Illumina MiSeq and Illumina MiSeq with insert size correction. For each method, we analyzed the accuracy (Fig. 1), cost per sample and time (Table 1).

Fig. 1.

Scatter plots representing the accuracies of the library quantification methods. The p-values test the null hypothesis that the correlation is equal to 0.

Table 1.   Comparison of cost and time for the studied library quantification methods

We used the library concentration determined by HiSeq as the reference library concentration. All methods were compared by their ability to predict this concentration. Correlation analysis showed that for four quantification methods (GHR qPCR, Qubit, MiSeq and MiSeq with insert size correction) the p-value was less than 0.05, indicating a statistically significant association with the HiSeq results (Fig. 1).

Generally, Qubit and MiSeq were better than qPCR and LabChip at predicting the final concentration. Thus, these methods were chosen for further comparison.

In an additional analysis, the data from Qubit and MiSeq were analyzed using linear regression. The best correlation with HiSeq was observed for MiSeq with insert size correction (R² = 85.63%, P < 0.001). There was also a strong correlation between HiSeq data and both MiSeq data without insert size correction (R² = 80.48%, P < 0.001) and Qubit data (R² = 81.12%, P < 0.001).

By comparing the accuracy of the different quantification methods, we found that MiSeq with insert size correction was the most accurate method for library quantification prior to post-pooling capture exome sequencing.

DISCUSSION

The various instruments for library quantification vary in accuracy, reproducibility and sensitivity, as well as in labor intensity, speed and cost. The basic chemistry of NGS requires that the amount of library prepared for sequencing fall within a narrow input range. A reliable and accurate quantification strategy therefore permits investigators to fully utilize sequencer capacity, further reducing the costs of sequencing.

Many studies have compared different NGS library quantification methods and have reported contradictory results [16, 20–25]. Hussing and colleagues quantified dsDNA oligos and found that the BioAnalyzer, TapeStation and Qubit instruments gave concentrations closest to the expected values [21]. Katsuoka and colleagues showed that MiSeq works as an effective quantification method, but the authors did not compare MiSeq with other methods [22]. There is no comparative analysis of methods for library quantification prior to pooling before target capture and Illumina sequencing.

To identify the most accurate and suitable methods for library quantification prior to target sequencing, we compared four quantification methods: LabChip, Qubit, quantitative PCR (qPCR) with three primer sets and Illumina MiSeq. Quantification using MiSeq was performed in two ways, with and without insert size correction. In total, we applied seven different approaches for estimating the quantity of reads and compared these estimates with the HiSeq data. We found that MiSeq data correlated most strongly with those obtained by HiSeq. This was confirmed by the linear regression analysis. Combining MiSeq with insert size correction further improved the correlation with HiSeq data.

In addition to the actual library quantification, low-depth MiSeq sequencing allows us to determine the library insert size distribution in fine detail; we have previously shown that this distribution affects the library enrichment efficiency, and therefore the relative library representation in the resulting enriched pool [17]. Accounting for the differences in enrichment efficiency caused by the insert length distribution allowed us to further improve the accuracy with which the library concentration in the final pool is predicted.

In terms of cost and time, MiSeq is costlier and more time-consuming than the other quantification methods. However, more hands-on time and a higher price for more accurate quantification may be preferable to a higher risk of large variations in library coverage, especially in clinical and forensic genetic laboratories.

Using MiSeq to quantify NGS libraries decreases overall sequencing costs by ensuring accurate quantification upfront, which minimizes the need to re-run or repeat sequencing of samples. Nevertheless, we also obtained results of comparable quality with the Qubit assay, suggesting that this method can be used when one has a clean and homogeneous library with no primer-dimer problems.

CONCLUSIONS

In summary, this work offers a comparative analysis of library quantification methods and reveals that MiSeq sequencing is the most accurate, reliable and reproducible method for library quantification prior to post-pooling capture target sequencing.