Abstract
This chapter presents, as best practices in the field, the fundamental and most critical steps in the development and validation of QSAR models. The procedures are discussed in the context of predictive QSAR modelling, which aims at models of the highest statistical quality that also have external predictive power. The most important and most widely used statistical parameters for verifying the actual performance of QSAR models, for both linear regression and classification, are presented. Special emphasis is placed on internal and external validation of models and on the need to define their applicability domains, which should be done whenever models are used to predict new external compounds.
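As a rough illustration of the three ideas named above (internal validation, external validation, and an applicability domain check), the following minimal sketch works on a multiple linear regression model fitted to synthetic data. The abstract does not name specific parameters, so everything chosen here is an assumption: leave-one-out Q², an external Q² referenced to the training-set mean, the concordance correlation coefficient, and a leverage cut-off of 3p'/n (p' = number of model variables plus one). All function names are hypothetical and the data are randomly generated, not taken from the chapter.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones so the model has an intercept term."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])


def fit_mlr(X, y):
    """Ordinary least-squares coefficients (intercept first)."""
    coef, *_ = np.linalg.lstsq(add_intercept(X), np.asarray(y, dtype=float), rcond=None)
    return coef


def predict(coef, X):
    return add_intercept(X) @ coef


def leverages(X_train, X_query):
    """Hat values h = x (X'X)^-1 x' of query rows relative to the training set."""
    A = add_intercept(X_train)
    Q = add_intercept(X_query)
    xtx_inv = np.linalg.pinv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)


def q2_loo(X, y):
    """Internal validation: leave-one-out Q2 = 1 - PRESS / TSS.

    Uses the OLS identity e_loo = e / (1 - h), so no refitting loop is needed.
    """
    y = np.asarray(y, dtype=float)
    residuals = y - predict(fit_mlr(X, y), X)
    h = leverages(X, X)
    press = np.sum((residuals / (1.0 - h)) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def q2_f1(y_ext, y_ext_pred, y_train_mean):
    """External validation: Q2 referenced to the training-set mean (assumed variant)."""
    y_ext = np.asarray(y_ext, dtype=float)
    y_ext_pred = np.asarray(y_ext_pred, dtype=float)
    return 1.0 - np.sum((y_ext - y_ext_pred) ** 2) / np.sum((y_ext - y_train_mean) ** 2)


def ccc(y_obs, y_pred):
    """Concordance correlation coefficient between observed and predicted values."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    cov = np.mean((y_obs - y_obs.mean()) * (y_pred - y_pred.mean()))
    return 2.0 * cov / (y_obs.var() + y_pred.var() + (y_obs.mean() - y_pred.mean()) ** 2)


# Synthetic stand-in for a descriptor matrix and a measured response (illustrative only).
rng = np.random.default_rng(0)
true_coef = np.array([1.5, -2.0, 0.5])
X_train = rng.normal(size=(40, 3))
y_train = X_train @ true_coef + 0.3 + rng.normal(scale=0.2, size=40)
X_ext = rng.normal(size=(15, 3))
y_ext = X_ext @ true_coef + 0.3 + rng.normal(scale=0.2, size=15)

coef = fit_mlr(X_train, y_train)
y_ext_pred = predict(coef, X_ext)

# Assumed leverage cut-off: h* = 3 p' / n, with p' = descriptors + 1.
h_star = 3 * (X_train.shape[1] + 1) / X_train.shape[0]
inside_domain = leverages(X_train, X_ext) <= h_star

print("Q2 LOO:", q2_loo(X_train, y_train))
print("Q2 F1 :", q2_f1(y_ext, y_ext_pred, y_train.mean()))
print("CCC   :", ccc(y_ext, y_ext_pred))
print("External compounds inside the leverage domain:", int(inside_domain.sum()), "of", len(inside_domain))
```

Predictions for compounds whose leverage exceeds the cut-off are extrapolations of the model and would normally be reported as less reliable rather than silently accepted.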
Acknowledgments
I wish to thank Dr. Nicola Chirico for his collaboration in preparing the Tables and Figures and for the implementation of QSARINS software.
Copyright information
© 2013 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Gramatica, P. (2013). On the Development and Validation of QSAR Models. In: Reisfeld, B., Mayeno, A. (eds) Computational Toxicology. Methods in Molecular Biology, vol 930. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-059-5_21
DOI: https://doi.org/10.1007/978-1-62703-059-5_21
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-058-8
Online ISBN: 978-1-62703-059-5
eBook Packages: Springer Protocols