Abstract
Predicting toxicity is important because measuring it experimentally is typically time-consuming and expensive. In this paper, the Recursive Partitioning (RP) method was used to select descriptors. RP and Support Vector Machines (SVM) were then each used to construct a structure–toxicity relationship model, giving an RP model and an SVM model. The two models differ in performance, with the SVM model more accurate on all three data sets. The prediction accuracies of the RP model are 80.2% for mutagenic compounds in MDL's toxicity database, 83.4% for compounds in CMC, and 84.9% for agrochemicals in an in-house database. Those of the SVM model are 81.4%, 87.0%, and 87.3%, respectively.
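As a rough illustration of how recursive partitioning can double as a descriptor selector, the sketch below grows a small binary tree on toy data and reports which descriptor columns were used in splits. It is a minimal sketch under stated assumptions, not the authors' implementation: the Gini impurity criterion, the stopping parameters, the toy data, and the function names `gini`, `best_split`, and `select_descriptors` are all hypothetical choices made for this example. The subsequent SVM step is not shown.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (descriptor index, threshold) with the lowest weighted Gini
    impurity, or None if no split improves on the unsplit node."""
    n = len(rows)
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # a split must send compounds to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (j, t), score
    return best


def select_descriptors(rows, labels, max_depth=3, min_size=2):
    """Grow a tree recursively and collect the descriptor indices it splits on."""
    used = set()

    def grow(node_rows, node_labels, depth):
        if depth >= max_depth or len(node_rows) < min_size:
            return
        split = best_split(node_rows, node_labels)
        if split is None:
            return  # node is pure or no split helps
        j, t = split
        used.add(j)
        pairs = list(zip(node_rows, node_labels))
        left = [(r, y) for r, y in pairs if r[j] <= t]
        right = [(r, y) for r, y in pairs if r[j] > t]
        for part in (left, right):
            grow([r for r, _ in part], [y for _, y in part], depth + 1)

    grow(rows, labels, 0)
    return sorted(used)


# Toy data (invented): three descriptors per compound, label 1 = mutagenic.
# Only the second descriptor (index 1) separates the two classes.
X = [
    [0.3, 1.0, 5.0],
    [0.9, 2.0, 5.0],
    [0.1, 8.0, 5.0],
    [0.7, 9.0, 5.0],
    [0.5, 1.5, 5.0],
    [0.2, 8.5, 5.0],
]
y = [0, 0, 1, 1, 0, 1]

selected = select_descriptors(X, y)
print(selected)  # [1]
```

In a workflow like the one the abstract describes, the selected columns would then be passed to an SVM classifier; which SVM implementation and kernel to use is left open here.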
Abbreviations
- SVM: Support Vector Machines
- RP: Recursive Partitioning
Acknowledgments
The authors thank Dr. R. Bursi, Dr. S. S. Young, and Dr. C. Helma for supplying the data sets. This work was supported in part by the National Basic Research Program of China (also called the 973 Program) through Grant 2003CB114400; by the National High-Tech Program (also called the 863 Program) through Grant 2006AA02Z39; by the National Natural Science Foundation of China through Grants 20473112 and 20572120; and by the Chinese Academy of Sciences through Grants KGCX2-SW-213-05 and KGCX2-SW-213-01.
Cite this article
Liao, Q., Yao, J. & Yuan, S. Prediction of mutagenic toxicity by combination of Recursive Partitioning and Support Vector Machines. Mol Divers 11, 59–72 (2007). https://doi.org/10.1007/s11030-007-9057-5
DOI: https://doi.org/10.1007/s11030-007-9057-5