Abstract
A high-quality, annotated genome assembly is the foundation for many downstream studies. However, obtaining such an assembly is a complex, iterative process that requires high-quality data and combines different approaches and data types. While some software packages that incorporate multiple steps of genome assembly are commercially available, they may not be flexible enough to be applied routinely to all organisms, particularly to nonmodel species such as pathogenic oomycetes and fungi. Researchers who understand and apply the most appropriate currently available tools for each step can customize parameters and optimize results for their organism of study. Based on our experience of de novo assembly and annotation of several oomycete species, this chapter provides a modular workflow that runs from processing of raw reads, through initial assembly generation and optimization, to chromosome-scale scaffolding and annotation. For each step it outlines the input and output data and gives examples of the software used. The accompanying Notes provide background information for each step as well as alternative options. The final result of this workflow can be either an annotated, high-quality, validated, chromosome-scale assembly or a draft assembly of sufficient quality to meet the specific needs of a project.
References
Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, Louis EJ, Mewes HW, Murakami Y, Philippsen P, Tettelin H, Oliver SG (1996) Life with 6000 genes. Science 274(5287):546–567. https://doi.org/10.1126/science.274.5287.546
Tyler BM, Tripathy S, Zhang X, Dehal P, Jiang RH, Aerts A, Arredondo FD, Baxter L, Bensasson D, Beynon JL (2006) Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis. Science 313. https://doi.org/10.1126/science.1128796
Bussey H, Kaback DB, Zhong W, Vo DT, Clark MW, Fortin N, Hall J, Ouellette BF, Keng T, Barton AB et al (1995) The nucleotide sequence of chromosome I from Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 92(9):3809–3813
Klosterman SJ, Subbarao KV, Kang S, Veronese P, Gold SE, Thomma BPHJ, Chen Z, Henrissat B, Lee Y-H, Park J, Garcia-Pedrajas MD, Barbara DJ, Anchieta A, de Jonge R, Santhanam P, Maruthachalam K, Atallah Z, Amyotte SG, Paz Z, Inderbitzin P, Hayes RJ, Heiman DI, Young S, Zeng Q, Engels R, Galagan J, Cuomo CA, Dobinson KF, Ma L-J (2011) Comparative genomics yields insights into niche adaptation of plant vascular wilt pathogens. PLoS Pathog 7(7):e1002137. https://doi.org/10.1371/journal.ppat.1002137
Cuomo CA, Güldener U, Xu J-R, Trail F, Turgeon BG, Di Pietro A, Walton JD, Ma L-J, Baker SE, Rep M, Adam G, Antoniw J, Baldwin T, Calvo S, Chang Y-L, DeCaprio D, Gale LR, Gnerre S, Goswami RS, Hammond-Kosack K, Harris LJ, Hilburn K, Kennell JC, Kroken S, Magnuson JK, Mannhaupt G, Mauceli E, Mewes H-W, Mitterbauer R, Muehlbauer G, Münsterkötter M, Nelson D, Donnell K, Ouellet T, Qi W, Quesneville H, Roncero MIG, Seong K-Y, Tetko IV, Urban M, Waalwijk C, Ward TJ, Yao J, Birren BW, Kistler HC (2007) The Fusarium graminearum genome reveals a link between localized polymorphism and pathogen specialization. Science 317(5843):1400
Ma L-J, van der Does HC, Borkovich KA, Coleman JJ, Daboussi M-J, Di Pietro A, Dufresne M, Freitag M, Grabherr M, Henrissat B, Houterman PM, Kang S, Shim W-B, Woloshuk C, Xie X, Xu J-R, Antoniw J, Baker SE, Bluhm BH, Breakspear A, Brown DW, Butchko RAE, Chapman S, Coulson R, Coutinho PM, Danchin EGJ, Diener A, Gale LR, Gardiner DM, Goff S, Hammond-Kosack KE, Hilburn K, Hua-Van A, Jonkers W, Kazan K, Kodira CD, Koehrsen M, Kumar L, Lee Y-H, Li L, Manners JM, Miranda-Saavedra D, Mukherjee M, Park G, Park J, Park S-Y, Proctor RH, Regev A, Ruiz-Roldan MC, Sain D, Sakthikumar S, Sykes S, Schwartz DC, Turgeon BG, Wapinski I, Yoder O, Young S, Zeng Q, Zhou S, Galagan J, Cuomo CA, Kistler HC, Rep M (2010) Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium. Nature 464(7287):367–373
Lamour KH, Mudge J, Gobena D, Hurtado-Gonzales OP, Schmutz J, Kuo A, Miller NA, Rice BJ, Raffaele S, Cano LM (2012) Genome sequencing and mapping reveal loss of heterozygosity as a mechanism for rapid adaptation in the vegetable pathogen Phytophthora capsici. Mol Plant Microbe Interact 25
Shen R, Fan JB, Campbell D, Chang W, Chen J, Doucet D, Yeakley J, Bibikova M, Wickham Garcia E, McBride C, Steemers F, Garcia F, Kermani BG, Gunderson K, Oliphant A (2005) High-throughput SNP genotyping on universal bead arrays. Mutat Res 573(1–2):70–82. https://doi.org/10.1016/j.mrfmmm.2004.07.022
Pacific Biosciences (PacBio). http://www.pacb.com/. Accessed 29 Sept 2017
Oxford Nanopore Technologies. https://nanoporetech.com/. Accessed 29 Sept 2017
Bionano Genomics. https://bionanogenomics.com/. Accessed 29 Sept 2017
Dovetail Genomics. https://dovetailgenomics.com/. Accessed 29 Sept 2017
Phase Genomics. https://phasegenomics.com/. Accessed 29 Sept 2017
Weisenfeld NI, Kumar V, Shah P, Church DM, Jaffe DB (2017) Direct determination of diploid genome sequences. Genome Res 27(5):757–767. https://doi.org/10.1101/gr.214874.116
Derevnina L, Chin-Wo-Reyes S, Martin F, Wood K, Froenicke L, Spring O, Michelmore R (2015) Genome sequence and architecture of the tobacco downy mildew pathogen Peronospora tabacina. Mol Plant-Microbe Interact 28(11):1198–1215. https://doi.org/10.1094/MPMI-05-15-0112-R
Bradnam K, Korf I (2012) UNIX and Perl to the rescue!: a field guide for the life sciences (and other data-rich pursuits). Cambridge University Press, Cambridge
Software Carpentry Foundation; The Unix Shell. http://swcarpentry.github.io/shell-novice/. Accessed 13 Nov 2017
Köster J, Rahmann S (2012) Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28(19):2520–2522. https://doi.org/10.1093/bioinformatics/bts480
Jupyter. https://jupyter.org/. Accessed 7 Nov 2017
Kushwaha SK, Vetukuri RR, Grenville-Briggs LJ (2017) Draft genome sequence of the mycoparasitic oomycete Pythium periplocum strain CBS 532.74. Genome Announc 5(12):e00057-00017
Berger H, Yacoub A, Gerbore J, Grizard D, Rey P, Sessitsch A, Compant S (2016) Draft genome sequence of biocontrol agent Pythium oligandrum strain Po37, an oomycota. Genome Announc 4(2):e00215-00216
Kushwaha SK, Vetukuri RR, Grenville-Briggs LJ (2017) Draft genome sequence of the mycoparasitic oomycete pythium oligandrum strain CBS 530.74. Genome Announc 5(21). https://doi.org/10.1128/genomeA.00346-17
Kemen E, Gardiner A, Schultz-Larsen T, Kemen AC, Balmuth AL, Robert-Seilaniantz A, Bailey K, Holub E, Studholme DJ, MacLean D, Jones JDG (2011) Gene gain and loss during evolution of obligate parasitism in the white rust pathogen of Arabidopsis thaliana. PLoS Biol 9(7):e1001094. https://doi.org/10.1371/journal.pbio.1001094
Pais M, Win J, Yoshida K, Etherington GJ, Cano LM, Raffaele S, Banfield MJ, Jones A, Kamoun S, Saunders DGO (2013) From pathogen genomes to host plant processes: the power of plant parasitic oomycetes. Genome Biol 14(6):211. https://doi.org/10.1186/gb-2013-14-6-211
Haas BJ, Kamoun S, Zody MC, Jiang RHY, Handsaker RE, Cano LM, Grabherr M, Kodira CD, Raffaele S, Torto-Alalibo T, Bozkurt TO, Ah-Fong AMV, Alvarado L, Anderson VL, Armstrong MR, Avrova A, Baxter L, Beynon J, Boevink PC, Bollmann SR, Bos JIB, Bulone V, Cai G, Cakir C, Carrington JC, Chawner M, Conti L, Costanzo S, Ewan R, Fahlgren N, Fischbach MA, Fugelstad J, Gilroy EM, Gnerre S, Green PJ, Grenville-Briggs LJ, Griffith J, Grunwald NJ, Horn K, Horner NR, Hu C-H, Huitema E, Jeong D-H, Jones AME, Jones JDG, Jones RW, Karlsson EK, Kunjeti SG, Lamour K, Liu Z, Ma L, MacLean D, Chibucos MC, McDonald H, McWalters J, Meijer HJG, Morgan W, Morris PF, Munro CA, O’Neill K, Ospina-Giraldo M, Pinzon A, Pritchard L, Ramsahoye B, Ren Q, Restrepo S, Roy S, Sadanandom A, Savidor A, Schornack S, Schwartz DC, Schumann UD, Schwessinger B, Seyer L, Sharpe T, Silvar C, Song J, Studholme DJ, Sykes S, Thines M, van de Vondervoort PJI, Phuntumart V, Wawra S, Weide R, Win J, Young C, Zhou S, Fry W, Meyers BC, van West P, Ristaino J, Govers F, Birch PRJ, Whisson SC, Judelson HS, Nusbaum C (2009) Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans. Nature 461(7262):393–398
Ramezani-Rad M, Hollenberg CP, Lauber J, Wedler H, Griess E, Wagner C, Albermann K, Hani J, Piontek M, Dahlems U, Gellissen G (2003) The Hansenula polymorpha (strain CBS4732) genome sequencing and analysis. FEMS Yeast Res 4(2):207–215
Gregory TR, Nicol JA, Tamm H, Kullman B, Kullman K, Leitch IJ, Murray BG, Kapraun DF, Greilhuber J, Bennett MD (2007) Eukaryotic genome size databases. Nucleic Acids Res 35(Database issue):D332–D338. https://doi.org/10.1093/nar/gkl828
Egertová Z, Sochor M (2017) The largest fungal genome discovered in Jafnea semitosta. Plant Syst Evol 303(7):981–986. https://doi.org/10.1007/s00606-017-1424-9
Andrews S (2010) FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc. Accessed 13 Sept 2017
Hannon Lab FASTX Toolkit. doi:citeulike-article-id:9103573
Petersen KR, Streett DA, Gerritsen AT, Hunter SS, Settles ML (2015) Super deduper, fast PCR duplicate detection in fastq files. In: Proceedings of the 6th ACM conference on bioinformatics, computational biology and health informatics. ACM, pp 491–492
Bushnell B (2016) BBMap short read aligner. University of California, Berkeley, CA URL: http://sourceforgenet/projects/bbmap
Xu H, Luo X, Qian J, Pang X, Song J, Qian G, Chen J, Chen S (2012) FastUniq: a fast de novo duplicates removal tool for paired short reads. PLoS One 7(12):e52249. https://doi.org/10.1371/journal.pone.0052249
Magoč T, Salzberg SL (2011) FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27(21):2957–2963. https://doi.org/10.1093/bioinformatics/btr507
Streett DA (2015) FLASH2. https://github.com/dstreett/FLASH2. Accessed 29 Sept 2017
Zhang J, Kobert K, Flouri T, Stamatakis A (2014) PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30(5):614–620. https://doi.org/10.1093/bioinformatics/btt593
Liu B, Yuan J, Yiu S-M, Li Z, Xie Y, Chen Y, Shi Y, Zhang H, Li Y, Lam T-W (2012) COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly. Bioinformatics 28(22):2870–2874
Buffalo V (2014) Scythe [Software]. https://github.com/vsbuffalo/scythe. Accessed 29 Sept 2017
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15):2114–2120. https://doi.org/10.1093/bioinformatics/btu170
Schubert M, Lindgreen S, Orlando L (2016) AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res Notes 9:88. https://doi.org/10.1186/s13104-016-1900-2
Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17(1):10–12. https://doi.org/10.14806/ej.17.1.200
Joshi NA, Fass, J.N. (2011) Sickle: a sliding-window, adaptive, quality-based trimming tool for FastQ files (Version 1.33) [Software]. https://github.com/najoshi/sickle. Accessed 13 Sept 2017
Fletcher K (2017) FastqFilter.sh [Software]. https://github.com/kfletcher88/FastqFilter. Accessed 29 Sept 2017
Li H (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:13033997
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25(16):2078–2079. https://doi.org/10.1093/bioinformatics/btp352
Staton SE (2013) Pairfq [Software]. https://github.com/sestaton/Pairfq. Accessed 29 Sept 2017
Leggett RM, Clavijo BJ, Clissold L, Clark MD, Caccamo M (2014) NextClip: an analysis and read preparation tool for Nextera Long Mate Pair libraries. Bioinformatics 30(4):566–568. https://doi.org/10.1093/bioinformatics/btt702
Kelley DR, Schatz MC, Salzberg SL (2010) Quake: quality-aware detection and correction of sequencing errors. Genome Biol 11(11):R116. https://doi.org/10.1186/gb-2010-11-11-r116
Marçais G, Yorke JA, Zimin A (2015) QuorUM: an error corrector for Illumina reads. PLoS One 10(6):e0130821
Marçais G, Kingsford C (2011) A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27(6):764–770. https://doi.org/10.1093/bioinformatics/btr011
Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, Schatz MC (2017) GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33(14):2202–2204. https://doi.org/10.1093/bioinformatics/btx153
Chikhi R, Medvedev P (2013) Informed and automated k-mer size selection for genome assembly. Bioinformatics 30(1):31–37
Simpson J, Wong K, Jackman S, Schein J, Jones S, Birol I (2009) ABySS: a parallel assembler for short read sequence data. Genome Res 19. https://doi.org/10.1101/gr.089532.108
Ye C, Hill CM, Wu S, Ruan J, Ma ZS (2016) DBG2OLC: efficient assembly of large genomes using long erroneous reads of the third generation sequencing technologies. Sci Rep 6:31900
Zimin AV, Marçais G, Puiu D, Roberts M, Salzberg SL, Yorke JA (2013) The MaSuRCA genome assembler. Bioinformatics 29(21):2669–2677. https://doi.org/10.1093/bioinformatics/btt476
Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18(5):821–829
Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, Tang J, Wu G, Zhang H, Shi Y, Liu Y, Yu C, Wang B, Lu Y, Han C, Cheung DW, Yiu SM, Peng S, Xiaoqian Z, Liu G, Liao X, Li Y, Yang H, Wang J, Lam TW, Wang J (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1(1):18. https://doi.org/10.1186/2047-217x-1-18
MacCallum I, Przybylski D, Gnerre S, Burton J, Shlyakhter I, Gnirke A, Malek J, McKernan K, Ranade S, Shea TP, Williams L, Young S, Nusbaum C, Jaffe DB (2009) ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads. Genome Biol 10. https://doi.org/10.1186/gb-2009-10-10-r103
Goltsman E, Ho I, Rokhsar D (2017) Meraculous-2D: haplotype-sensitive assembly of highly heterozygous genomes. arXiv preprint arXiv:170309852
Kajitani R, Toshimoto K, Noguchi H, Toyoda A, Ogura Y, Okuno M, Yabana M, Harada M, Nagayasu E, Maruyama H, Kohara Y, Fujiyama A, Hayashi T, Itoh T (2014) Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. https://doi.org/10.1101/gr.170720.113
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA (2012) SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19. https://doi.org/10.1089/cmb.2012.0021
Weisenfeld NI, Yin S, Sharpe T, Lau B, Hegarty R, Holmes L, Sogoloff B, Tabbaa D, Williams L, Russ C, Nusbaum C, Lander ES, MacCallum I, Jaffe DB (2014) Comprehensive variation discovery in single human genomes. Nat Genet 46:1350. https://doi.org/10.1038/ng.3121
Peng Y, Leung HC, Yiu SM, Chin FY (2012) IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28. https://doi.org/10.1093/bioinformatics/bts174
Fletcher K (2017) AssemblyFilter.sh [Software]. https://github.com/kfletcher88/AssemblyFilter. Accessed 29 Sept 2017
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410. https://doi.org/10.1016/s0022-2836(05)80360-2
Gurevich A, Saveliev V, Vyahhi N, Tesler G (2013) QUAST: quality assessment tool for genome assemblies. Bioinformatics 29(8):1072–1075. https://doi.org/10.1093/bioinformatics/btt086
Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31(19):3210–3212. https://doi.org/10.1093/bioinformatics/btv351
Parra G, Bradnam K, Korf I (2007) CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23. https://doi.org/10.1093/bioinformatics/btm071
Quinlan AR (2014) BEDTools: the Swiss-army tool for genome feature analysis. Current Protoc Bioinformatics 47:11.12.11–11.12.34. https://doi.org/10.1002/0471250953.bi1112s47
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5. https://doi.org/10.1186/gb-2004-5-2-r12
Soderlund C, Bomhoff M, Nelson WM (2011) SyMAP v3.4: a turnkey synteny system with application to plant genomes. Nucleic Acids Res 39. https://doi.org/10.1093/nar/gkr123
Lyons E, Freeling M (2008) How to usefully compare homologous plant genes and chromosomes as DNA sequences. Plant J 53(4):661–673
Mapleson D, Garcia Accinelli G, Kettleborough G, Wright J, Clavijo BJ (2017) KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics 33(4):574–576. https://doi.org/10.1093/bioinformatics/btw663
Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W (2011) Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27. https://doi.org/10.1093/bioinformatics/btq683
Simpson JT, Durbin R (2012) Efficient de novo assembly of large genomes using compressed data structures. Genome Res 22(3):549–556. https://doi.org/10.1101/gr.126953.111
Dayarian A, Michael TP, Sengupta AM (2010) SOPRA: scaffolding algorithm for paired reads via statistical optimization. BMC Bioinformatics 11:345–345. https://doi.org/10.1186/1471-2105-11-345
Adey A, Kitzman JO, Burton JN, Daza R, Kumar A, Christiansen L, Ronaghi M, Amini S, Gunderson KL, Steemers FJ, Shendure J (2014) In vitro, long-range sequence information for de novo genome assembly via transposase contiguity. Genome Res. https://doi.org/10.1101/gr.178319.114
Yeo S, Coombe L, Chu J, Warren RL, Birol I (2017) ARCS: Assembly Roundup by Chromium Scaffolding. bioRxiv. https://doi.org/10.1101/100750
Warren RL, Yang C, Vandervalk BP, Behsaz B, Lagman A, Jones SJ, Birol I (2015) LINKS: scalable, alignment-free scaffolding of draft genomes with long reads. GigaScience 4:35. https://doi.org/10.1186/s13742-015-0076-3
English AC, Richards S, Han Y, Wang M, Vee V, Qu J, Qin X, Muzny DM, Reid JG, Worley KC (2012) Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One 7. https://doi.org/10.1371/journal.pone.0047768
Boetzer M, Pirovano W (2014) SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information. BMC Bioinformatics 15:211. https://doi.org/10.1186/1471-2105-15-211
Boetzer M, Pirovano W (2012) Toward almost closed genomes with GapFiller. Genome Biol 13(6):R56. https://doi.org/10.1186/gb-2012-13-6-r56
Paulino D, Warren RL, Vandervalk BP, Raymond A, Jackman SD, Birol I (2015) Sealer: a scalable gap-closing application for finishing draft genomes. BMC Bioinformatics 16(1):230. https://doi.org/10.1186/s12859-015-0663-4
Pacific Biosciences (2017) SMRT-Link. https://github.com/PacificBiosciences/SMRT-Link. Accessed 29 Sept 2017
Chakraborty M, Baldwin-Brown JG, Long AD, Emerson JJ (2016) Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage. Nucleic Acids Res 44(19):e147–e147. https://doi.org/10.1093/nar/gkw654
Huang S, Kang M, Xu A (2017) HaploMerger2: rebuilding both haploid sub-assemblies from high-heterozygosity diploid genome assembly. Bioinformatics. https://doi.org/10.1093/bioinformatics/btx220
Pryszcz LP, Gabaldon T (2016) Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res 44(12):e113. https://doi.org/10.1093/nar/gkw294
Hunt M, Kikuchi T, Sanders M, Newbold C, Berriman M, Otto TD (2013) REAPR: a universal tool for genome assembly evaluation. Genome Biol 14(5):R47. https://doi.org/10.1186/gb-2013-14-5-r47
Ghurye J, Pop M, Koren S, Bickhart D, Chin C-S (2017) Scaffolding of long read assemblies using long range contact information. BMC Genomics 18(1):527. https://doi.org/10.1186/s12864-017-3879-z
Burton JN, Adey A, Patwardhan RP, Qiu R, Kitzman JO, Shendure J (2013) Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat Biotechnol 31. https://doi.org/10.1038/nbt.2727
Smit A, Hubley R (2008–2015) RepeatModeler Open-1.0
Smit A, Hubley R, Green P (2013–2015) RepeatMasker open-4.0
Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, Moore B, Holt C, Sánchez Alvarado A, Yandell M (2008) MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res 18(1):188–196. https://doi.org/10.1101/gr.6743907
Korf I (2004) Gene finding in novel genomes. BMC Bioinformatics 5(1):59. https://doi.org/10.1186/1471-2105-5-59
Slater GSC, Birney E (2005) Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6:31. https://doi.org/10.1186/1471-2105-6-31
Papanicolaou A (2013) Just annotate my genome (JAMg) v. RC1. http://jamg.sourceforge.net/. Accessed 1 Oct 2017
Stanke M, Morgenstern B (2005) AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res 33(Web Server issue):W465–W467. https://doi.org/10.1093/nar/gki458
Besemer J, Borodovsky M (2005) GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res 33(Web Server issue):W451–W454. https://doi.org/10.1093/nar/gki487
Majoros WH, Pertea M, Salzberg SL (2004) TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20(16):2878–2879. https://doi.org/10.1093/bioinformatics/bth315
Chin C-S, Peluso P, Sedlazeck FJ, Nattestad M, Concepcion GT, Clum A, Dunn C, O’Malley R, Figueroa-Balderas R, Morales-Cruz A, Cramer GR, Delledonne M, Luo C, Ecker JR, Cantu D, Rank DR, Schatz MC (2016) Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods 13(12):1050–1054. https://doi.org/10.1038/nmeth.4035
Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM (2017) Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. https://doi.org/10.1101/gr.215087.116
Xiao C-L, Chen Y, Xie S-Q, Chen K-N, Wang Y, Luo F, Xie Z (2016) MECAT: an ultra-fast mapping, error correction and de novo assembly tool for single-molecule sequencing reads. bioRxiv. https://doi.org/10.1101/089250
Jayakumar V, Sakakibara Y (2017) Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data. Brief Bioinform:bbx147. https://doi.org/10.1093/bib/bbx147
R Development Core Team (2012) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
Wickham H (2016) ggplot2: elegant graphics for data analysis. Springer, New York
Kent WJ (2002) BLAT—the BLAST-like alignment tool. Genome Res 12(4):656–664. https://doi.org/10.1101/gr.229202 Article published online before March 2002
Baxter L, Tripathy S, Ishaque N, Boot N, Cabral A, Kemen E, Thines M, Ah-Fong A, Anderson R, Badejoko W (2010) Signatures of adaptation to obligate biotrophy in the Hyaloperonospora arabidopsidis genome. Science 330. https://doi.org/10.1126/science.1195203
Edge P, Bafna V, Bansal V (2016) HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies. Genome Res. https://doi.org/10.1101/gr.213462.116
Putnam NH, O’onnell BL, Stites JC, Rice BJ, Blanchette M, Calef R, Troll CJ, Fields A, Hartley PD, Sugnet CW, Haussler D, Rokhsar DS, Green RE (2016) Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res 26(3):342–350. https://doi.org/10.1101/gr.193474.115
Durand NC, Shamim MS, Machol I, Rao SSP, Huntley MH, Lander ES, Aiden EL (2016) Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst 3(1):95–98. https://doi.org/10.1016/j.cels.2016.07.002
Cao H, Hastie AR, Cao D, Lam ET, Sun Y, Huang H, Liu X, Lin L, Andrews W, Chan S, Huang S, Tong X, Requa M, Anantharaman T, Krogh A, Yang H, Cao H, Xu X (2014) Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology. GigaScience 3(1):34. https://doi.org/10.1186/2047-217x-3-34
Nagarajan N, Read TD, Pop M (2008) Scaffolding and validation of bacterial genome assemblies using optical restriction maps. Bioinformatics 24(10):1229–1235. https://doi.org/10.1093/bioinformatics/btn102
Neely RK, Deen J, Hofkens J (2011) Optical mapping of DNA: single-molecule-based methods for mapping genomes. Biopolymers 95(5):298–311. https://doi.org/10.1002/bip.21579
Fierst JL (2015) Using linkage maps to correct and scaffold de novo genome assemblies: methods, challenges, and computational tools. Front Genet 6(220). https://doi.org/10.3389/fgene.2015.00220
Nayaka SC, Shetty HS, Satyavathi CT, Yadav RS, Kishor PBK, Nagaraju M, Anoop TA, Kumar MM, Kuriakose B, Chakravartty N, Katta AVSKM, Lachagari VBR, Singh OV, Sahu PP, Puranik S, Kaushal P, Srivastava RK (2017) Draft genome sequence of Sclerospora graminicola, the pearl millet downy mildew pathogen. Biotechnol Rep 16(Suppl C):18–20. https://doi.org/10.1016/j.btre.2017.07.006
Fletcher K (2017) runHM2.sh [Software]. https://github.com/kfletcher88/HM2-RunLight. Accessed 14 Nov 2017
Morgulis A, Gertz EM, Schaffer AA, Agarwala R (2006) WindowMasker: window-based masker for sequenced genomes. Bioinformatics 22(2):134–141. https://doi.org/10.1093/bioinformatics/bti774
Reyes-Chin-Wo S, Wang Z, Yang X, Kozik A, Arikit S, Song C, Xia L, Froenicke L, Lavelle DO, Truco M-J, Xia R, Zhu S, Xu C, Xu H, Xu X, Cox K, Korf I, Meyers BC, Michelmore RW (2017) Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce. Nat Commun 8:14953. https://doi.org/10.1038/ncomms14953
Peichel CL, Sullivan ST, Liachko I, White MA (2016) Improvement of the threespine stickleback (Gasterosteus aculeatus) genome using a Hi-C-based Proximity-Guided Assembly method. bioRxiv:068528
Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, Shamim MS, Machol I, Lander ES, Aiden AP, Aiden EL (2017) De novo assembly of the Aedes aegypti; genome using Hi-C yields chromosome-length scaffolds. Science 356(6333):92–95
Bickhart DM, Rosen BD, Koren S, Sayre BL, Hastie AR, Chan S, Lee J, Lam ET, Liachko I, Sullivan ST, Burton JN, Huson HJ, Nystrom JC, Kelley CM, Hutchison JL, Zhou Y, Sun J, Crisa A, Ponce de Leon FA, Schwartz JC, Hammond JA, Waldbieser GC, Schroeder SG, Liu GE, Dunham MJ, Shendure J, Sonstegard TS, Phillippy AM, Van Tassell CP, Smith TPL (2017) Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat Genet 49(4):643–650. https://doi.org/10.1038/ng.3802
Jiao W-B, Accinelli GG, Hartwig B, Kiefer C, Baker D, Severing E, Willing E-M, Piednoel M, Woetzel S, Madrid-Herrero E (2017) Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data. Genome Res 27(5):778–786
Mohr DW, Naguib A, Weisenfeld N, Kumar V, Shah P, Church DM, Jaffe D, Scott AF (2017) Improved de novo genome assembly: linked-read sequencing combined with optical mapping produce a high quality mammalian genome at relatively low cost. bioRxiv:128348
Earl D, Bradnam K, St. John J, Darling A, Lin D, Fass J, Yu HOK, Buffalo V, Zerbino DR, Diekhans M, Nguyen N, Ariyaratne PN, Sung W-K, Ning Z, Haimel M, Simpson JT, Fonseca NA, Birol İ, Docking TR, Ho IY, Rokhsar DS, Chikhi R, Lavenier D, Chapuis G, Naquin D, Maillet N, Schatz MC, Kelley DR, Phillippy AM, Koren S, Yang S-P, Wu W, Chou W-C, Srivastava A, Shaw TI, Ruby JG, Skewes-Cox P, Betegon M, Dimon MT, Solovyev V, Seledtsov I, Kosarev P, Vorobyev D, Ramirez-Gonzalez R, Leggett R, MacLean D, Xia F, Luo R, Li Z, Xie Y, Liu B, Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Yin S, Sharpe T, Hall G, Kersey PJ, Durbin R, Jackman SD, Chapman JA, Huang X, DeRisi JL, Caccamo M, Li Y, Jaffe DB, Green RE, Haussler D, Korf I, Paten B (2011) Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Res 21(12):2224–2241. https://doi.org/10.1101/gr.126599.111
Bradnam KR, Fass JN, Alexandrov A, Baranay P, Bechner M, Birol I, Boisvert S, Chapman JA, Chapuis G, Chikhi R, Chitsaz H, Chou W-C, Corbeil J, Del Fabbro C, Docking TR, Durbin R, Earl D, Emrich S, Fedotov P, Fonseca NA, Ganapathy G, Gibbs RA, Gnerre S, Godzaridis É, Goldstein S, Haimel M, Hall G, Haussler D, Hiatt JB, Ho IY, Howard J, Hunt M, Jackman SD, Jaffe DB, Jarvis ED, Jiang H, Kazakov S, Kersey PJ, Kitzman JO, Knight JR, Koren S, Lam T-W, Lavenier D, Laviolette F, Li Y, Li Z, Liu B, Liu Y, Luo R, MacCallum I, MacManes MD, Maillet N, Melnikov S, Naquin D, Ning Z, Otto TD, Paten B, Paulo OS, Phillippy AM, Pina-Martins F, Place M, Przybylski D, Qin X, Qu C, Ribeiro FJ, Richards S, Rokhsar DS, Ruby JG, Scalabrin S, Schatz MC, Schwartz DC, Sergushichev A, Sharpe T, Shaw TI, Shendure J, Shi Y, Simpson JT, Song H, Tsarev F, Vezzi F, Vicedomini R, Vieira BM, Wang J, Worley KC, Yin S, Yiu S-M, Yuan J, Zhang G, Zhang H, Zhou S, Korf IF (2013) Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. GigaScience 2(1):10. https://doi.org/10.1186/2047-217X-2-10
Hunt M, Newbold C, Berriman M, Otto TD (2014) A comprehensive evaluation of assembly scaffolding tools. Genome Biol 15(3):R42. https://doi.org/10.1186/gb-2014-15-3-r42
Dekker J, Rippe K, Dekker M, Kleckner N (2002) Capturing chromosome conformation. Science 295(5558):1306
Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore RW, Eisen JA, Darling AE (2014) Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ 2:e415. https://doi.org/10.7717/peerj.415
Burton JN, Liachko I, Dunham MJ, Shendure J (2014) Species-level deconvolution of metagenome assemblies with Hi-C–based contact probability maps. G3 (Bethesda) 4(7):1339–1346. https://doi.org/10.1534/g3.114.011825
Paulsen J, Sekelja M, Oldenburg AR, Barateau A, Briand N, Delbarre E, Shah A, Sørensen AL, Vigouroux C, Buendia B, Collas P (2017) Chrom3D: three-dimensional genome modeling from Hi-C and nuclear lamin-genome contacts. Genome Biol 18(1):21. https://doi.org/10.1186/s13059-016-1146-2
Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT (2014) A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159. https://doi.org/10.1016/j.cell.2014.11.021
Howe K, Wood JMD (2015) Using optical mapping data for the improvement of vertebrate genome assemblies. GigaScience 4(1):10. https://doi.org/10.1186/s13742-015-0052-y
Schwartz DC, Li X, Hernandez LI, Ramnarain SP, Huff EJ, Wang YK (1993) Ordered restriction maps of Saccharomyces cerevisiae chromosomes constructed by optical mapping. Science 262(5130):110–114
Ananiev GE, Goldstein S, Runnheim R, Forrest DK, Zhou S, Potamousis K, Churas CP, Bergendahl V, Thomson JA, Schwartz DC (2008) Optical mapping discerns genome wide DNA methylation profiles. BMC Mol Biol 9:68. https://doi.org/10.1186/1471-2199-9-68
Dong Y, Xie M, Jiang Y, Xiao N, Du X, Zhang W, Tosser-Klopp G, Wang J, Yang S, Liang J (2013) Sequencing and automated whole-genome optical mapping of the genome of a domestic goat (Capra hircus). Nat Biotechnol 31(2):135–141
Jiang N (2013) Overview of repeat annotation and de novo repeat identification. In: Peterson T (ed) Plant transposable elements: methods and protocols. Humana Press, Totowa, NJ, pp 275–287. https://doi.org/10.1007/978-1-62703-568-2_20
Campbell MS, Holt C, Moore B, Yandell M (2014) Genome annotation and curation using MAKER and MAKER-P. Curr Protoc Bioinformatics 48:4.11.11–14.11.39. https://doi.org/10.1002/0471250953.bi0411s48
Adhikari BN, Hamilton JP, Zerillo MM, Tisserat N, Lévesque CA, Buell CR (2013) Comparative genomics reveals insight into virulence strategies of plant pathogenic oomycetes. PLoS One 8(10):e75072
Levesque CA, Brouwer H, Cano L, Hamilton JP, Holt C, Huitema E, Raffaele S, Robideau GP, Thines M, Win J (2010) Genome sequence of the necrotrophic plant pathogen Pythium ultimum reveals original pathogenicity mechanisms and effector repertoire. Genome Biol 11. https://doi.org/10.1186/gb-2010-11-7-r73
Rujirawat T, Patumcharoenpol P, Lohnoo T, Yingyong W, Lerksuthirat T, Tangphatsornruang S, Suriyaphol P, Grenville-Briggs LJ, Garg G, Kittichotirat W, Krajaejun T (2015) Draft genome sequence of the pathogenic oomycete Pythium insidiosum strain Pi-S, isolated from a patient with pythiosis. Genome Announc 3(3):e00574-15. https://doi.org/10.1128/genomeA.00574-15
Mestre P, Carrere S, Gouzy J, Piron MC, Tourvieille de Labrouhe D, Vincourt P, Delmotte F, Godiard L (2016) Comparative analysis of expressed CRN and RXLR effectors from two Plasmopara species causing grapevine and sunflower downy mildew. Plant Pathol 65(5):767–781. https://doi.org/10.1111/ppa.12469
Sharma R, Xia X, Cano LM, Evangelisti E, Kemen E, Judelson H, Oome S, Sambles C, van den Hoogen DJ, Kitner M, Klein J, Meijer HJG, Spring O, Win J, Zipper R, Bode HB, Govers F, Kamoun S, Schornack S, Studholme DJ, Van den Ackerveken G, Thines M (2015) Genome analyses of the sunflower pathogen Plasmopara halstedii provide insights into effector evolution in downy mildews and Phytophthora. BMC Genomics 16(1):741. https://doi.org/10.1186/s12864-015-1904-7
Hall B, DeRego T, Geib S (2014) GAG: the genome annotation generator (version 1.0) [Software]. http://genomeannotation.github.io/GAG. Accessed 26 Oct 2017
Mondo SJ, Dannebaum RO, Kuo RC, Louie KB, Bewick AJ, LaButti K, Haridas S, Kuo A, Salamov A, Ahrendt SR, Lau R, Bowen BP, Lipzen A, Sullivan W, Andreopoulos BB, Clum A, Lindquist E, Daum C, Northen TR, Kunde-Ramamoorthy G, Schmitz RJ, Gryganskyi A, Culley D, Magnuson J, James TY, O'Malley MA, Stajich JE, Spatafora JW, Visel A, Grigoriev IV (2017) Widespread adenine N6-methylation of active genes in fungi. Nat Genet 49(6):964–968. https://doi.org/10.1038/ng.3859
Flusberg BA, Webster D, Lee J, Travers K, Olivares E, Clark TA, Korlach J, Turner SW (2010) Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat Methods 7(6):461–465. https://doi.org/10.1038/nmeth.1459
Rand AC, Jain M, Eizenga JM, Musselman-Brown A, Olsen HE, Akeson M, Paten B (2017) Mapping DNA methylation with high-throughput nanopore sequencing. Nat Methods 14(4):411–413. https://doi.org/10.1038/nmeth.4189
Fraser J, Ferrai C, Chiariello AM, Schueler M, Rito T, Laudanno G, Barbieri M, Moore BL, Kraemer DC, Aitken S (2015) Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation. Mol Syst Biol 11(12):852
Olivares-Chauvet P, Mukamel Z, Lifshitz A, Schwartzman O, Elkayam NO, Lubling Y, Deikus G, Sebra RP, Tanay A (2016) Capturing pairwise and multi-way chromosomal conformations using chromosomal walks. Nature 540(7632):296–300. https://doi.org/10.1038/nature20158
Smith DR (2017) Goodbye genome paper, hello genome report: the increasing popularity of ‘genome announcements’ and their impact on science. Brief Funct Genomics 16(3):156–162. https://doi.org/10.1093/bfgp/elw026
Acknowledgments
We would like to thank Sebastian Reyes-Chin-Wo (UC Davis) and Lida Derevnina (now The Sainsbury Laboratory, Norwich, UK) for their contributions to the initial setup of the workflow, and William Palmer and Kelsey Wood (both UC Davis) for their reviews during preparation. We thank the UC Davis Bioinformatics Core for their computational and software support. We also thank Diane Saunders and members of her lab, Guru Radhakrishnan, Daniel C.E. Bunting, and Antoine Persoons, for their helpful comments. The work was supported by The Novozymes Inc. Endowed Chair in Genomics to RWM.
Conflicts of interest: The statements regarding Hi-C are based on experience resulting from collaboration with Dovetail Genomics. The authors declare that there are no other potential conflicts of interest.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Fletcher, K., Michelmore, R. (2018). From Short Reads to Chromosome-Scale Genome Assemblies. In: Ma, W., Wolpert, T. (eds) Plant Pathogenic Fungi and Oomycetes. Methods in Molecular Biology, vol 1848. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8724-5_13
DOI: https://doi.org/10.1007/978-1-4939-8724-5_13
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8723-8
Online ISBN: 978-1-4939-8724-5
eBook Packages: Springer Protocols