Abstract
Phytoplasma genome drafts can be assembled rapidly and efficiently from NGS sequence data alone, provided that appropriate bioinformatic tools are used and the samples are properly collected. Here, we describe the use of the Phytoassembly pipeline (https://github.com/cpolano/phytoassembly), a fully automated tool that takes as input raw Illumina data from two samples (a phytoplasma-infected sample and a healthy reference sample) and produces a phytoplasma genome draft, using the healthy plant host genome as a filter and exploiting the difference in read coverage between the genome of the pathogen and that of the host. For phytoplasma-infected samples containing >2% pathogen DNA and with an isogenic healthy reference sequence, the resulting assemblies span almost the entire genome.
References
Tran-Nguyen LTT, Gibb KS (2007) Optimizing phytoplasma DNA purification for genome analysis. J Biomol Tech 18:104–112
Saeed E, Seemüller E, Schneider B et al (1994) Molecular cloning, detection of chromosomal DNA of the Mycoplasmalike organism (MLO) associated with Faba bean (Vicia faba L.) phyllody by Southern blot hybridization and the polymerase chain reaction (PCR). J Phytopathol 142:97–106. https://doi.org/10.1111/j.1439-0434.1994.tb04519.x
Oshima K, Kakizawa S, Nishigawa H et al (2004) Reductive evolution suggested from the complete genome sequence of a plant-pathogenic phytoplasma. Nat Genet 36:27–29. https://doi.org/10.1038/ng1277
Bai X, Zhang J, Ewing A et al (2006) Living with genome instability: the adaptation of phytoplasmas to diverse environments of their insect and plant hosts. J Bacteriol 188:3682–3696. https://doi.org/10.1128/JB.188.10.3682-3696.2006
Kube M, Schneider B, Kuhl H et al (2008) The linear chromosome of the plant-pathogenic mycoplasma “Candidatus Phytoplasma mali”. BMC Genomics 9:306. https://doi.org/10.1186/1471-2164-9-306
Tran-Nguyen LTT, Kube M, Schneider B et al (2008) Comparative genome analysis of “Candidatus Phytoplasma australiense” (subgroup tuf-Australia I; rp-a) and “Ca. Phytoplasma asteris” strains OY-M and AY-WB. J Bacteriol 190:3979–3991. https://doi.org/10.1128/JB.01301-07
Andersen MT, Liefting LW, Havukkala I, Beever RE (2013) Comparison of the complete genome sequence of two closely related isolates of “Candidatus Phytoplasma australiense” reveals genome plasticity. BMC Genomics 14:529. https://doi.org/10.1186/1471-2164-14-529
Mitrovic J, Siewert C, Duduk B et al (2014) Generation and analysis of draft sequences of “Stolbur” phytoplasma from multiple displacement amplification templates. J Mol Microbiol Biotechnol 24:1–11. https://doi.org/10.1159/000353904
Lee I-M, Shao J, Bottner-Parker KD et al (2015) Draft genome sequence of “Candidatus Phytoplasma pruni” strain CX, a plant-pathogenic bacterium. Genome Announc 3:e01117-15. https://doi.org/10.1128/genomeA.01117-15
Kakizawa S, Makino A, Ishii Y et al (2014) Draft genome sequence of “Candidatus Phytoplasma asteris” strain OY-V, an unculturable plant-pathogenic bacterium. Genome Announc 2:e00944-14. https://doi.org/10.1128/genomeA.00944-14
Fischer A, Santana-Cruz I, Wambua L et al (2016) Draft genome sequence of “Candidatus Phytoplasma oryzae” strain Mbita1, the causative agent of Napier grass stunt disease in Kenya. Genome Announc 4:e00297-16. https://doi.org/10.1128/genomeA.00297-16
Saccardo F, Martini M, Palmano S et al (2012) Genome drafts of four phytoplasma strains of the ribosomal group 16SrIII. Microbiology 158:2805–2814. https://doi.org/10.1099/mic.0.061432-0
Chung W-C, Chen L-L, Lo W-S et al (2013) Comparative analysis of the peanut witches’-broom phytoplasma genome reveals horizontal transfer of potential mobile units and effectors. PLoS One 8:e62770. https://doi.org/10.1371/journal.pone.0062770
Davis RE, Zhao Y, Dally EL et al (2013) “Candidatus Phytoplasma pruni”, a novel taxon associated with X-disease of stone fruits, Prunus spp.: multilocus characterization based on 16S rRNA, secY, and ribosomal protein genes. Int J Syst Evol Microbiol 63:766–776. https://doi.org/10.1099/ijs.0.041202-0
Quaglino F, Zhao Y, Casati P et al (2013) “Candidatus Phytoplasma solani”, a novel taxon associated with stolbur- and bois noir-related diseases of plants. Int J Syst Evol Microbiol 63:2879–2894. https://doi.org/10.1099/ijs.0.044750-0
Chen W, Li Y, Wang Q et al (2014) Comparative genome analysis of wheat blue dwarf phytoplasma, an obligate pathogen that causes wheat blue dwarf disease in China. PLoS One 9:e96436. https://doi.org/10.1371/journal.pone.0096436
Quaglino F, Kube M, Jawhari M et al (2015) “Candidatus Phytoplasma phoenicium” associated with almond witches’-broom disease: from draft genome to genetic diversity among strain populations. BMC Microbiol 15:148. https://doi.org/10.1186/s12866-015-0487-4
Tritt A, Eisen JA, Facciotti MT, Darling AE (2012) An integrated pipeline for de novo assembly of microbial genomes. PLoS One 7:e42304. https://doi.org/10.1371/journal.pone.0042304
Aziz RK, Bartels D, Best AA et al (2008) The RAST server: rapid annotations using subsystems technology. BMC Genomics 9:75. https://doi.org/10.1186/1471-2164-9-75
Glass EM, Meyer F (2011) The Metagenomics RAST server: a public resource for the automatic phylogenetic and functional analysis of metagenomes. In: Metagenomics and complementary approaches, Handbook of molecular microbial ecology, vol 8, pp 325–331. https://doi.org/10.1002/9781118010518.ch37
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760. https://doi.org/10.1093/bioinformatics/btp324
Appendix: How Phytoassembly Works
1. If no healthy plant assembly is provided, the procedure uses the A5 pipeline to assemble the healthy plant sequence reads (Healthy.contigs.fasta); the remaining output files are archived. Next, the diseased plant reads are assembled (producing the file Diseased.contigs.fasta). One step of the A5 pipeline produces error-corrected reads (Diseased.ec.fastq), which are used in all subsequent steps.
2. The assembled reference sequence file is then indexed with the BWA tool [21] (index command), and the error-corrected reads are aligned to it (mem command). Using the samtools (http://www.htslib.org/doc/samtools.html) commands view, sort, index, and idxstats, a summary of statistics is produced (Diseased.sorted.csv), consisting of the reference sequence name, the sequence length, and the numbers of mapped and unmapped reads.
3. This summary is passed to phytocount.pl to estimate a cutoff value: the pipeline is run once with cutoff 0, and the cutoff is then set to a fraction of the ratio between the sum of the lengths of the non-mapping reads at cutoff 0 (Stage2.0.nonmatch.fastq, see below) and the sum of the lengths of the error-corrected reads of the diseased plant (Diseased.ec.fastq). Alternatively, if the user supplies a range of fixed cutoff values, the pipeline repeats the following steps for each value, from the lowest to the highest (represented here as $cutoffval).
4. From the summary statistics (Diseased.sorted.csv), per-contig coverages are calculated and saved in a text file (Diseased.sorted.cov.csv).
5. The contigs with a coverage higher than $cutoffval are exported (Diseased.cutoff.$cutoffval.fasta, where $cutoffval is, e.g., “10”). The error-corrected reads from the diseased plant (Assembly.ec.fastq) are then aligned to the contigs in this file using BWA (Stage1.$cutoffval.match.sam).
6. Using phytofilter.pl, the reads that align to the contigs above the cutoff are extracted from the alignment file and exported (Stage1.$cutoffval.match.fastq), using SAM flag 4 (“the query sequence itself is unmapped”) as the filter.
7. These reads are then aligned with BWA against the healthy plant reference (Healthy.contigs.fasta), and the reads that do not align are exported (Stage2.$cutoffval.nonmatch.fastq). These non-aligned reads are assembled with the A5 pipeline (Stage3.$cutoffval.contigs.fasta).
8. Using phytoblast.pl, a BLAST nucleotide database is created from the healthy plant reference file (Healthy.contigs.fasta, which could also be a combination of different references) and searched with tblastx, using the contigs output by the previous stage (Stage3.$cutoffval.contigs.fasta) as queries. The results are saved in a table (Stage3.$cutoffval.contigs.csv), which is then filtered by identity percentage (IP): entries with an IP greater than 95% are attributed to the plant (Stage3.$cutoffval.contigs.plant.csv), while those with a lower IP are attributed to the phytoplasma (Stage3.$cutoffval.contigs.phyto.csv). Using this last file, the contigs belonging to the phytoplasma are extracted from the query (Stage3.$cutoffval.phyto.fasta).
9. Lastly, the main outputs are archived and moved to a folder (Results_$timestamp), statistics such as contig size and number are calculated, and intermediate files are moved to a subfolder (Other_files).
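The filtering logic of steps 4, 5, 6, and 8 can be illustrated with a minimal Python sketch. It is not the code of phytocount.pl, phytofilter.pl, or phytoblast.pl: the function names are hypothetical, the input is assumed to be tab-separated rows in the samtools idxstats layout (name, length, mapped, unmapped), and coverage is approximated as mapped reads × read length / contig length, where the read length is an assumed parameter that idxstats does not report. Treating an identity of exactly 95% as phytoplasma is likewise an assumption.

```python
import csv
import io


def per_contig_coverage(idxstats_text, read_length):
    """Approximate coverage per contig from idxstats-style rows.

    Assumption: coverage ~ mapped_reads * read_length / contig_length.
    """
    coverage = {}
    for row in csv.reader(io.StringIO(idxstats_text), delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed rows
        name, length, mapped = row[0], int(row[1]), int(row[2])
        if name == "*" or length == 0:
            continue  # idxstats lists unplaced reads under "*"
        coverage[name] = mapped * read_length / length
    return coverage


def contigs_above_cutoff(coverage, cutoff):
    """Names of contigs whose coverage is strictly higher than the cutoff."""
    return sorted(name for name, cov in coverage.items() if cov > cutoff)


def is_mapped(sam_flag):
    """True when SAM flag bit 0x4 (query unmapped) is not set."""
    return (sam_flag & 0x4) == 0


def split_by_identity(hits, threshold=95.0):
    """Split (contig, percent_identity) hits into plant and phytoplasma lists.

    Hits with identity above the threshold go to the plant list;
    all others (including exactly 95%, an assumption) go to phytoplasma.
    """
    plant, phyto = [], []
    for contig, identity in hits:
        (plant if identity > threshold else phyto).append(contig)
    return plant, phyto


if __name__ == "__main__":
    # Made-up example rows: name, length, mapped reads, unmapped reads.
    stats = "contig_1\t1000\t50\t0\ncontig_2\t2000\t4000\t0\n*\t0\t0\t12\n"
    cov = per_contig_coverage(stats, read_length=100)
    print(contigs_above_cutoff(cov, cutoff=10))
    print(split_by_identity([("c1", 99.2), ("c2", 71.5)]))
```

Because the phytoplasma genome is typically covered far more deeply than the host genome, a coverage cutoff keeps mostly pathogen contigs, and the identity filter then removes residual plant sequences. The example values above are invented for illustration only.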
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Polano, C., Firrao, G. (2019). Assembly of Phytoplasma Genome Drafts from Illumina Reads Using Phytoassembly. In: Musetti, R., Pagliari, L. (eds) Phytoplasmas. Methods in Molecular Biology, vol 1875. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8837-2_16
DOI: https://doi.org/10.1007/978-1-4939-8837-2_16
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8836-5
Online ISBN: 978-1-4939-8837-2
eBook Packages: Springer Protocols