Abstract
High-throughput sequencing is an increasingly accessible tool for cataloging the gene complements of plant pathogens and their hosts. It has had great impact in plant pathology, enabling rapid acquisition of data for a wide range of pathogens and hosts and leading to the selection of novel candidate effector proteins, their associated host targets, or both (Bart et al., Proc Natl Acad Sci U S A doi:10.1073/pnas.1208003109, 2012; Agbor and McCormick, Cell Microbiol 13:1858–1869, 2011; Fabro et al., PLoS Pathog 7:e1002348, 2011; Kim et al., Mol Plant Pathol 12:715–730, 2011; Kimbrel et al., Mol Plant Pathol 12:580–594, 2011; O’Brien et al., Curr Opin Microbiol 14:24–30, 2011; Vleeshouwers et al., Annu Rev Phytopathol 49:507–531, 2011; Sarris et al., Mol Plant Pathol 11:795–804, 2010; Boch and Bonas, Annu Rev Phytopathol 48:419–436, 2010; McDermott et al., Infect Immun 79:23–32, 2011).
Identifying candidate effectors from genome data is, in principle, no different from classification in any other high-content or high-throughput experiment. The primary aim is to discover a set of qualitative or quantitative sequence characteristics that discriminate between proteins previously identified as either “effector” (positive) or “not effector” (negative). Combining these characteristics in a mathematical model, or classifier, enables prediction, with a defined level of certainty, of whether a protein is or is not an effector. The gene complement is then screened at high throughput to identify candidate effectors. This may seem straightforward, but it is unfortunately very easy to identify seemingly persuasive candidate effectors that are, in fact, entirely spurious.
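One way to see how a screen can yield mostly spurious candidates is simple base-rate arithmetic. The sketch below is illustrative only: the function name `screen_outcome` and every number in it (proteome size, number of true effectors, sensitivity, specificity) are hypothetical assumptions, not values taken from this chapter.

```python
def screen_outcome(n_proteins, n_true_effectors, sensitivity, specificity):
    """Expected counts when a classifier screens a whole predicted proteome.

    All arguments are hypothetical inputs chosen by the reader:
      sensitivity = fraction of true effectors the classifier calls positive
      specificity = fraction of non-effectors the classifier calls negative
    """
    n_non_effectors = n_proteins - n_true_effectors
    true_pos = sensitivity * n_true_effectors
    false_pos = (1.0 - specificity) * n_non_effectors
    predicted_pos = true_pos + false_pos
    if predicted_pos == 0:
        raise ValueError("classifier predicts no positives; PPV is undefined")
    # Positive predictive value: fraction of candidates that are real effectors.
    ppv = true_pos / predicted_pos
    return true_pos, false_pos, ppv


# Hypothetical screen: 5000 proteins, 50 of which are truly effectors,
# classifier with 90% sensitivity and 95% specificity.
tp, fp, ppv = screen_outcome(5000, 50, 0.90, 0.95)
print(f"expected true positives:  {tp:.1f}")
print(f"expected false positives: {fp:.1f}")
print(f"positive predictive value: {ppv:.2f}")
```

Under these assumed values, a classifier that looks accurate on paper would still return several false candidates for every true effector, because non-effectors vastly outnumber effectors in the proteome.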
The main pitfalls in this area of statistical modeling are not entirely independent of each other. They include: inappropriate choice of classifier model; poor selection of reference sequences (known positive and negative examples); poor definition of classes (what is, and what is not, an effector); inadequate training sample size; poor model validation; and lack of adequate model performance metrics (Xia et al., Metabolomics doi:10.1007/s11306-012-0482-9, 2012). Many studies fail to take these issues into account, and thereby fail to discover anything of true significance or, worse, report spurious findings that are impossible to validate. Here we summarize the impact of these issues and present strategies to assist in improving design and evaluation of effector classifiers, enabling robust scientific conclusions to be drawn from the available data.
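Two of the issues listed above, model validation and performance metrics, can be sketched with stratified k-fold cross-validation and the area under the ROC curve (AUC). The following is a minimal sketch, not the chapter's method: the nearest-centroid classifier, the function names, and the toy two-feature data are all assumptions made for illustration, and real effector features would replace them.

```python
import random
from statistics import mean


def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds with similar class proportions in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_validated_auc(features, labels, k=5, seed=0):
    """Per-fold AUC of a nearest-centroid classifier (assumed stand-in model).

    Centroids are fitted on the training folds only; the held-out fold is
    scored but never used for fitting. Each fold needs both classes.
    """
    aucs = []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        pos_c = centroid([features[i] for i in train if labels[i] == 1])
        neg_c = centroid([features[i] for i in train if labels[i] == 0])

        def score(x):
            # Higher score = closer to the positive centroid than the negative.
            return sq_dist(x, neg_c) - sq_dist(x, pos_c)

        pos = [score(features[i]) for i in held_out if labels[i] == 1]
        neg = [score(features[i]) for i in held_out if labels[i] == 0]
        aucs.append(roc_auc(pos, neg))
    return aucs


# Toy data (hypothetical): 20 "effectors" and 20 "non-effectors", two features.
rng = random.Random(42)
features = [[rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)] for _ in range(20)]
features += [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20)]
labels = [1] * 20 + [0] * 20
fold_aucs = cross_validated_auc(features, labels, k=5)
print("per-fold AUC:", [round(a, 2) for a in fold_aucs])
print("mean AUC:", round(mean(fold_aucs), 2))
```

Reporting the spread of per-fold values alongside the mean gives some indication of how strongly the estimate depends on the particular training sample, which matters most when the number of known effectors is small.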
References
Bart R, Cohn M, Kassen A, McCallum EJ, Shybut M et al (2012) High-throughput genomic sequencing of cassava bacterial blight strains identifies conserved effectors to target for durable resistance. Proc Natl Acad Sci U S A. doi:10.1073/pnas.1208003109
Agbor TA, McCormick BA (2011) Salmonella effectors: important players modulating host cell function during infection. Cell Microbiol 13:1858–1869. doi:10.1111/j.1462-5822.2011.01701.x
Fabro G, Steinbrenner J, Coates M, Ishaque N, Baxter L et al (2011) Multiple candidate effectors from the oomycete pathogen Hyaloperonospora arabidopsidis suppress host plant immunity. PLoS Pathog 7:e1002348. doi:10.1371/journal.ppat.1002348
Kim J-G, Taylor KW, Mudgett MB (2011) Comparative analysis of the XopD type III secretion (T3S) effector family in plant pathogenic bacteria. Mol Plant Pathol 12:715–730. doi:10.1111/j.1364-3703.2011.00706.x
Kimbrel JA, Givan SA, Temple TN, Johnson KB, Chang JH (2011) Genome sequencing and comparative analysis of the carrot bacterial blight pathogen, Xanthomonas hortorum pv. carotae M081, for insights into pathogenicity and applications in molecular diagnostics. Mol Plant Pathol 12:580–594. doi:10.1111/j.1364-3703.2010.00694.x
O'Brien HE, Desveaux D, Guttman DS (2011) Next-generation genomics of Pseudomonas syringae. Curr Opin Microbiol 14:24–30. doi:10.1016/j.mib.2010.12.007
Vleeshouwers VGAA, Raffaele S, Vossen JH, Champouret N, Oliva R et al (2011) Understanding and exploiting late blight resistance in the age of effectors. Annu Rev Phytopathol 49:507–531. doi:10.1146/annurev-phyto-072910-095326
Sarris PF, Skandalis N, Kokkinidis M, Panopoulos NJ (2010) In silico analysis reveals multiple putative type VI secretion systems and effector proteins in Pseudomonas syringae pathovars. Mol Plant Pathol 11:795–804. doi:10.1111/j.1364-3703.2010.00644.x
Boch J, Bonas U (2010) Xanthomonas AvrBs3 family-type III effectors: discovery and function. Annu Rev Phytopathol 48:419–436. doi:10.1146/annurev-phyto-080508-081936
McDermott JE, Corrigan A, Peterson E, Oehmen C, Niemann G et al (2011) Computational prediction of type III and IV secreted effectors in gram-negative bacteria. Infect Immun 79:23–32. doi:10.1128/IAI.00537-10
Xia J, Broadhurst DI, Wilson M, Wishart DS (2012) Translational biomarker discovery in clinical metabolomics: an introductory tutorial. Metabolomics. doi:10.1007/s11306-012-0482-9
Cornelis GR (2006) The type III secretion injectisome. Nat Rev Microbiol 4:811–825. doi:10.1038/nrmicro1526
Whisson SC, Boevink PC, Moleleki L, Avrova AO, Morales JG et al (2007) A translocation signal for delivery of oomycete effector proteins into host plant cells. Nature 450:115–118. doi:10.1038/nature06203
Löwer M, Schneider G (2009) Prediction of type III secretion signals in genomes of gram-negative bacteria. PLoS ONE 4:e5917. doi:10.1371/journal.pone.0005917
Arnold R, Brandmaier S, Kleine F, Tischler P, Heinz E et al (2009) Sequence-based prediction of type III secreted proteins. PLoS Pathog 5:e1000376. doi:10.1371/journal.ppat.1000376
Sui T, Yang Y, Wang X (2013) Sequence-based feature extraction for type III effector prediction. Int J Biosci Biochem Bioinforma 3:246–251. doi:10.7763/IJBBB.2013.V3.206
Liu C, Che D, Liu X, Song Y (2013) Applications of machine learning in genomics and systems biology. Comput Math Methods Med 2013:587492. doi:10.1155/2013/587492
Broadhurst D, Kell DB (2006) Statistical strategies for avoiding false discoveries in metabolomics and related experiments. Metabolomics 2:171–196
O'Brien HE, Thakur S, Gong Y, Fung P, Zhang J et al (2012) Extensive remodeling of the Pseudomonas syringae pv. avellanae type III secretome associated with two independent host shifts onto hazelnut. BMC Microbiol 12:141
McNally RR, Toth IK, Cock PJA, Pritchard L, Hedley PE et al (2012) Genetic characterization of the HrpL regulon of the fire blight pathogen Erwinia amylovora reveals novel virulence factors. Mol Plant Pathol 13:160–173. doi:10.1111/j.1364-3703.2011.00738.x
Arnold DL, Jackson RW (2011) Bacterial genomes: evolution of pathogenicity. Curr Opin Plant Biol 14:385–391. doi:10.1016/j.pbi.2011.03.001
Haas BJ, Kamoun S, Zody MC, Jiang RHY, Handsaker RE et al (2009) Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans. Nature 461:393–398. doi:10.1038/nature08358
Win J, Morgan W, Bos JIB, Krasileva KV, Cano LM et al (2007) Adaptive evolution has targeted the C-terminal domain of the RXLR effectors of plant pathogenic oomycetes. Plant Cell 19:2349–2369. doi:10.1105/tpc.107.051037
Bhattacharjee S, Hiller NL, Liolios K, Win J, Kanneganti T-D et al (2006) The malarial host-targeting signal is conserved in the Irish potato famine pathogen. PLoS Pathog 2:e50. doi:10.1371/journal.ppat.0020050
Petnicki-Ocwieja T, Schneider DJ, Tam VC, Chancey ST, Shan L et al (2002) Genomewide identification of proteins secreted by the Hrp type III protein secretion system of Pseudomonas syringae pv. tomato DC3000. Proc Natl Acad Sci U S A 99:7652–7657. doi:10.1073/pnas.112183899
Greenberg JT, Vinatzer B (2003) Identifying type III effectors of plant pathogens and analyzing their interaction with plant cells. Curr Opin Microbiol 6:20–28
Bogdanove AJ, Schornack S, Lahaye T (2010) TAL effectors: finding plant genes for disease and defense. Curr Opin Plant Biol 13:394–401. doi:10.1016/j.pbi.2010.04.010
Boch J, Scholze H, Schornack S, Landgraf A, Hahn S et al (2009) Breaking the code of DNA-binding specificity of TAL-type III effectors. Science. doi:10.1126/science.1178811
Yang Y (2012) Identification of novel type III effectors using latent Dirichlet allocation. Comput Math Methods Med 2012:696190. doi:10.1155/2012/696190
Wang Y, Zhang Q, Sun M-A, Guo D (2011) High-accuracy prediction of bacterial type III secreted effectors based on position-specific amino acid composition profiles. Bioinformatics 27:777–784. doi:10.1093/bioinformatics/btr021
Macho AP, Ruiz-Albert J, Tornero P, Beuzón CR (2009) Identification of new type III effectors and analysis of the plant response by competitive index. Mol Plant Pathol 10:69–80. doi:10.1111/j.1364-3703.2008.00511.x
Xu S, Zhang C, Miao Y, Gao J, Xu D (2010) Effector prediction in host-pathogen interaction based on a Markov model of a ubiquitous EPIYA motif. BMC Genomics 11(Suppl 3):S1. doi:10.1186/1471-2164-11-S3-S1
Jehl M-A, Arnold R, Rattei T (2010) Effective – a database of predicted secreted bacterial proteins. Nucleic Acids Res. doi:10.1093/nar/gkq1154
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23:2507–2517. doi:10.1093/bioinformatics/btm344
Eriksson L, Johansson E, Kettaneh-Wold N, Wold S (2001) Multi- and megavariate data analysis: principles and applications. Umetrics AB, Umeå
Brereton RG (2003) Chemometrics: data analysis for the laboratory and chemical plant. Wiley, Chichester, UK
Efron B, Tibshirani R (1997) Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc 92:548–560. doi:10.1080/01621459.1997.10474007
Obuchowski NA, Lieber ML, Wians FH (2004) ROC curves in clinical chemistry: uses, misuses, and possible solutions. Clin Chem 50:1118–1125. doi:10.1373/clinchem.2004.031823
Zweig MH, Campbell G (1993) Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 39:561–577
Lasko TA, Bhagwat JG, Zou KH (2005) The use of receiver operating characteristic curves in biomedical informatics. J Biomed Inform 38:404–415
Copyright information
© 2014 Springer Science+Business Media, New York
About this protocol
Cite this protocol
Pritchard, L., Broadhurst, D. (2014). On the Statistics of Identifying Candidate Pathogen Effectors. In: Birch, P., Jones, J., Bos, J. (eds) Plant-Pathogen Interactions. Methods in Molecular Biology, vol 1127. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-986-4_4
DOI: https://doi.org/10.1007/978-1-62703-986-4_4
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-985-7
Online ISBN: 978-1-62703-986-4
eBook Packages: Springer Protocols