TruSPAdes: barcode assembly of TruSeq synthetic long reads

  • Brief Communication

From Nature Methods

The recently introduced TruSeq synthetic long read (TSLR) technology generates long and accurate virtual reads from an assembly of barcoded pools of short reads. The TSLR method provides an attractive alternative to existing sequencing platforms that generate long but inaccurate reads. We describe the truSPAdes algorithm for TSLR assembly and show that it results in a dramatic improvement in the quality of metagenomics assemblies.

Figure 1: The TSLR technology.
Figure 2: Contig length.

Accession codes

Primary accessions

European Nucleotide Archive


We are indebted to V. Montel, J. Stuzka and O. Schulz-Trieglaff at Illumina for many helpful discussions, sample preparation and TSLR data. We thank J. Banfield and I. Sharon for providing their metagenomics TSLR data. This study was supported by the Russian Science Foundation (grant 14-50-00069 to A.B. and P.A.P.).

Author information

Authors and Affiliations



A.B. developed and implemented the truSPAdes algorithm and performed benchmarking. A.B. and P.A.P. conceived the study, designed the computational experiments and wrote the manuscript.

Corresponding author

Correspondence to Anton Bankevich.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 k-mer coverage histograms.

Histograms of k-mer coverage (k = 55) for the E. coli standard isolate dataset from Bankevich et al. (ref. 16) (a), the E. coli MDA-amplified single-cell dataset from Bankevich et al. (ref. 16) (b), one of the barcodes of TSLR data (c) and a single 10-kb fragment of a barcode (d). Conventional assemblers select a coverage threshold to separate correct from erroneous k-mers. The histogram for the standard isolate data features a smaller peak on the left (formed largely by erroneous k-mers with low coverage) and a larger peak on the right (formed largely by correct k-mers with high coverage). Thus, one can choose a threshold that separates correct from erroneous k-mers (ref. 38). However, for both MDA and TSLR data, no threshold separates correct from erroneous k-mers.
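To make the thresholding idea concrete, here is a minimal Python sketch that counts k-mers, builds a coverage histogram and picks the first valley of the histogram as the cutoff. The valley rule and the function names (`kmer_counts`, `coverage_histogram`, `valley_threshold`) are illustrative assumptions, not the procedure of any particular assembler, and reverse complements are ignored for simplicity.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer in the reads (forward strand only, for simplicity)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def coverage_histogram(counts):
    """Map coverage c -> number of distinct k-mers observed exactly c times."""
    return Counter(counts.values())


def valley_threshold(hist):
    """Return the coverage at the first local minimum of the histogram.

    Returns None when the histogram has no valley (no second peak),
    i.e. when no cutoff cleanly separates low- and high-coverage k-mers.
    """
    if not hist:
        return None
    max_cov = max(hist)
    values = [hist.get(c, 0) for c in range(1, max_cov + 1)]
    for i in range(1, len(values) - 1):
        if values[i] <= values[i - 1] and values[i] < values[i + 1]:
            return i + 1  # list index i corresponds to coverage i + 1
    return None


# Hypothetical histograms: bimodal (isolate-like) vs. no second peak.
iso_like = {1: 50, 2: 20, 3: 5, 4: 8, 5: 30, 6: 40, 7: 25}
no_valley = {1: 50, 2: 40, 3: 30, 4: 20}
print(valley_threshold(iso_like))   # 3
print(valley_threshold(no_valley))  # None
```

For the bimodal, isolate-like histogram the cutoff falls between the two peaks; for a histogram without a separate high-coverage peak the function returns `None`, which mirrors the observation that no single threshold works for MDA or TSLR data.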

Supplementary Figure 2 Barcode span.

Construction of the barcode span: red regions have fairly uniform read coverage and a length close to 10 kb. Black reads do not belong to the selected barcode spans; they represent read-mapping artifacts and are ignored.
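One way to picture span construction is to cluster the mapped start positions of a single barcode's reads: consecutive reads closer than a gap cutoff extend the current span, and a span is kept only if it holds enough reads and its length is close to 10 kb. The sketch below assumes reads have already been mapped to a reference. The function `barcode_spans` and its parameters (`max_gap`, `tolerance`, `min_reads`, the 100-bp read length) are hypothetical choices, and the gap cutoff is only a crude stand-in for the uniform-coverage criterion.

```python
def barcode_spans(positions, read_len=100, max_gap=500,
                  target=10_000, tolerance=0.3, min_reads=5):
    """Split one barcode's mapped read starts into kept spans and artifacts.

    Hypothetical parameters: reads whose starts are within `max_gap` of the
    previous read extend the current cluster; a cluster is kept as a span if
    it has at least `min_reads` reads and its length is within
    `tolerance * target` of `target`. All other reads are returned as artifacts.
    """
    kept, artifacts = [], []

    def flush(cluster):
        start, end = cluster[0], cluster[-1] + read_len
        if len(cluster) >= min_reads and abs((end - start) - target) <= tolerance * target:
            kept.append((start, end))
        else:
            artifacts.extend(cluster)

    cluster = []
    for pos in sorted(positions):
        if cluster and pos - cluster[-1] > max_gap:
            flush(cluster)
            cluster = []
        cluster.append(pos)
    if cluster:
        flush(cluster)
    return kept, artifacts


# Hypothetical mapping: 50 evenly spaced reads plus two stray reads.
positions = list(range(1000, 11000, 200)) + [50_000, 80_000]
print(barcode_spans(positions))  # ([(1000, 10900)], [50000, 80000])
```

Here the evenly spaced reads form one span of about 9.9 kb, while the two isolated reads play the role of the black mapping artifacts and are discarded.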

Supplementary Figure 3 Typical misassemblies.

Two common types of misassemblies: false connections (a–c) and chimeric connections (d–f). (a) Two unrelated instances of the blue repeat are located in the red (left) and yellow (right) genome fragments. These instances are flanked by short dotted segments (b). These short dotted segments correspond to short dotted edges (tips) in the de Bruijn graph. (c) Tip trimming results in a single (misassembled) edge in the de Bruijn graph that represents a false connection. (d) A region of the genome formed by consecutive yellow and green segments. (e) Since the yellow fragment has been erroneously amplified from the opposite strand, its reverse-complement copy is appended to the end of this region, resulting in a chimeric fragment (f). In the de Bruijn graph, the corresponding solid yellow edge has two outgoing edges, one for each connection between the yellow and green parts of the genome fragment. One of these connections is an erroneous chimeric connection (the transition from solid yellow to dashed green). We note that our explanation of the experimental cause of chimeric connections is only a hypothesis, although it accurately reflects the computational artifacts we observe.
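The false-connection scenario in panels a–c can be reproduced on a toy edge list. The sketch below uses a simplified tip rule that is an assumption, not truSPAdes' exact criterion: a short edge is a tip if it starts at a vertex with no incoming edges (or ends at a vertex with no outgoing edges) while its other endpoint has an alternative edge in the same direction. After trimming, chains through vertices with one incoming and one outgoing edge are merged. Edge names, vertex names and lengths are hypothetical, and merged lengths are simply summed (k-mer overlaps are ignored).

```python
from collections import Counter, defaultdict

# An edge is (name, source_vertex, target_vertex, length); lengths are toy units.


def trim_tips(edges, max_tip_len):
    """Drop short dead-end edges that hang off a branching vertex."""
    indeg, outdeg = Counter(), Counter()
    for _, src, dst, _ in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    kept = []
    for edge in edges:
        _, src, dst, length = edge
        # Source tip: nothing enters src, and dst has another incoming edge.
        source_tip = indeg[src] == 0 and indeg[dst] > 1
        # Sink tip: nothing leaves dst, and src has another outgoing edge.
        sink_tip = outdeg[dst] == 0 and outdeg[src] > 1
        if length <= max_tip_len and (source_tip or sink_tip):
            continue
        kept.append(edge)
    return kept


def compress(edges):
    """Merge edge chains through vertices with in-degree 1 and out-degree 1."""
    edges = list(edges)
    merged = True
    while merged:
        merged = False
        incoming, outgoing = defaultdict(list), defaultdict(list)
        for e in edges:
            outgoing[e[1]].append(e)
            incoming[e[2]].append(e)
        for v in list(incoming):
            if len(incoming[v]) == 1 and len(outgoing[v]) == 1:
                e_in, e_out = incoming[v][0], outgoing[v][0]
                if e_in is e_out:  # self-loop, nothing to merge
                    continue
                new = (e_in[0] + "+" + e_out[0], e_in[1], e_out[2], e_in[3] + e_out[3])
                edges = [e for e in edges if e is not e_in and e is not e_out] + [new]
                merged = True
                break
    return edges


# Hypothetical graph for panels a-c: the blue repeat collapses into one edge,
# with red and yellow flanks plus two short dotted tips.
edges = [
    ("red", "a", "v1", 5000),
    ("dot1", "d0", "v1", 40),
    ("blue", "v1", "v2", 3000),
    ("dot2", "v2", "d3", 40),
    ("yellow", "v2", "b", 6000),
]
print(compress(trim_tips(edges, max_tip_len=100)))
# [('red+blue+yellow', 'a', 'b', 14000)]
```

Removing the two short tips leaves red, blue and yellow as a non-branching path, so compression glues two unrelated genome fragments into one edge, which is the false connection shown in panel c.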

Supplementary Figure 4 Iterative assembly.

A fragment of a genome along with four reads (1st panel) and de Bruijn graphs of these reads constructed for k = 3 (2nd panel), k = 4 (3rd panel), and k = 5 (4th panel). The parameter k = 4 represents the “sweet spot” in the iterative assembly since the de Bruijn graph for k = 3 is over-tangled while the de Bruijn graph for k = 5 is over-fragmented.
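The trade-off between over-tangled and over-fragmented graphs can be checked numerically. The sketch below builds a de Bruijn graph (nodes are (k-1)-mers, edges are distinct k-mers in the reads) for several values of k and reports the number of branching nodes and weakly connected components. The four reads are hypothetical, drawn from the made-up genome `ACGTACCTGATTGGCAA` so that one pair of consecutive reads overlaps by only 3 nucleotides; they are not the reads shown in the figure.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: edges are distinct k-mers, nodes are (k-1)-mers."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for kmer in kmers:
        out_nb[kmer[:-1]].add(kmer[1:])
        in_nb[kmer[1:]].add(kmer[:-1])
    nodes = set(out_nb) | set(in_nb)
    return nodes, out_nb, in_nb


def graph_stats(reads, k):
    """Return (number of branching nodes, number of weakly connected components)."""
    nodes, out_nb, in_nb = de_bruijn(reads, k)
    branching = sum(1 for v in nodes if len(out_nb[v]) > 1 or len(in_nb[v]) > 1)
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:  # iterative DFS ignoring edge direction
            v = stack.pop()
            for w in out_nb[v] | in_nb[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return branching, components


reads = ["ACGTACC", "ACCTGAT", "TGATTGG", "TTGGCAA"]  # hypothetical reads
for k in (3, 4, 5):
    print(k, graph_stats(reads, k))
# 3 (2, 1)
# 4 (0, 1)
# 5 (0, 2)
```

With k = 3 the repeated 2-mers `AC` and `TG` create branching nodes; with k = 5 the first read no longer shares a 4-mer with the second, so the graph splits into two components; k = 4 gives a single unbranched path, the analogue of the "sweet spot" in this toy case.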

Supplementary Figure 5 TruSPAdes pseudocode.

Outline of the truSPAdes pipeline. TruSPAdes-specific modifications are highlighted in blue.
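Because TSLR virtual reads come from assembling each barcoded pool of short reads separately, the outer structure of such a pipeline can be sketched as "group by barcode, then assemble each pool". The sketch below is hypothetical and does not reproduce the pipeline in Supplementary Figure 5: reads arrive as assumed `(barcode, sequence)` pairs, and each pool is assembled with greedy suffix-prefix merging, a deliberately simple swapped-in technique, whereas the captions above describe de Bruijn graph assembly.

```python
from collections import defaultdict


def group_by_barcode(tagged_reads):
    """Map barcode -> list of read sequences from (barcode, sequence) pairs."""
    pools = defaultdict(list)
    for barcode, seq in tagged_reads:
        pools[barcode].append(seq)
    return dict(pools)


def greedy_merge(reads, min_overlap=3):
    """Toy assembler: repeatedly merge the pair with the longest suffix-prefix overlap."""
    unique = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read.
    contigs = [r for r in unique if not any(r != o and r in o for o in unique)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                # Longest proper suffix of a that equals a prefix of b.
                for olen in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
                    if a.endswith(b[:olen]):
                        if olen > best_len:
                            best_len, best_i, best_j = olen, i, j
                        break
        if best_i is None:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)] + [merged]


def virtual_reads(tagged_reads, min_overlap=3):
    """Assemble each barcode pool independently; reads never mix across barcodes."""
    return {bc: greedy_merge(pool, min_overlap)
            for bc, pool in group_by_barcode(tagged_reads).items()}


# Hypothetical input: two barcodes, each pool assembled on its own.
tagged = [("BC1", r) for r in ["ACGTACC", "ACCTGAT", "TGATTGG", "TTGGCAA"]]
tagged += [("BC2", "GGGTTTCC"), ("BC2", "TTCCAAAG")]
print(virtual_reads(tagged))
# {'BC1': ['ACGTACCTGATTGGCAA'], 'BC2': ['GGGTTTCCAAAG']}
```

Keeping pools separate is the point of the design: each barcode covers only a few long fragments, so per-pool assembly faces far fewer repeats than assembling all reads together.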

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5 (PDF 598 kb)

Bankevich, A., Pevzner, P. TruSPAdes: barcode assembly of TruSeq synthetic long reads. Nat Methods 13, 248–250 (2016).

