Abstract
Post-genomic science is producing bounteous data floods, and as the quotation below indicates, the extraction of the most meaningful parts of these data is key to the generation of useful new knowledge. A typical metabolic fingerprint or metabolomics experiment is expected to generate thousands of data points (samples times variables), of which only a handful might be needed to describe the problem adequately. Evolutionary algorithms are ideal strategies for mining such data to generate useful relationships, rules and predictions. This chapter describes these techniques and highlights their exploitation in metabolomics.
The fewer data needed, the better the information. And an overload of information, that is, anything much beyond what is truly needed, leads to information blackout. It does not enrich, but impoverishes.
Peter F. Drucker - Management: Tasks, Responsibilities, Practices
Copyright information
© 2003 Springer Science+Business Media New York
About this chapter
Cite this chapter
Goodacre, R., Kell, D.B. (2003). Evolutionary Computation for the Interpretation of Metabolomic Data. In: Harrigan, G.G., Goodacre, R. (eds) Metabolic Profiling: Its Role in Biomarker Discovery and Gene Function Analysis. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0333-0_13
DOI: https://doi.org/10.1007/978-1-4615-0333-0_13
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5025-5
Online ISBN: 978-1-4615-0333-0
eBook Packages: Springer Book Archive