Abstract
Quantitative structure-activity/property relationship (QSAR/QSPR) modeling has been instrumental in unraveling the mechanism of action underlying a biological activity of interest by formulating that activity mathematically as a function of the physicochemical description of chemical structures. Of the growing number of QSAR models published in the literature, it is estimated that the majority are not reproducible, owing to the heterogeneity of the components of the QSAR model setup (e.g., descriptors, learning algorithms, learning parameters, open-source and commercial software, and differing software versions) and to the limited availability of the underlying raw data and analysis source code used to construct these models. This makes it inherently difficult for newcomers and practitioners in the field to reproduce or make use of published QSAR models. However, this is expected to change in light of the growing momentum for open data and data sharing, which is encouraged by funders, publishers, and journals and driven by the next generation of researchers who embrace open science as a way of pushing science forward. This chapter examines these issues and provides general guidelines and best practices for constructing reproducible QSAR models.
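As a rough illustration of the setup components listed above, the following minimal Python sketch records them in a single machine-readable manifest that could be shared alongside a model. It is not taken from the chapter: the function names (`package_version`, `build_manifest`, `manifest_to_json`), the manifest fields, and all example values (descriptor, algorithm, parameters, seed, package names) are hypothetical and serve only as an illustration.

```python
import hashlib
import json
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def build_manifest(raw_data, descriptor, algorithm, params, seed, packages):
    """Collect the model-setup components into one JSON-serializable record."""
    return {
        # Fingerprint of the raw data, so others can confirm they use the same file
        "data_sha256": hashlib.sha256(raw_data).hexdigest(),
        "descriptor": descriptor,
        "algorithm": algorithm,
        "params": params,
        "seed": seed,
        # Interpreter and package versions in use when the model was built
        "python": platform.python_version(),
        "packages": {name: package_version(name) for name in packages},
    }


def manifest_to_json(manifest):
    """Serialize with sorted keys so identical setups give identical text."""
    return json.dumps(manifest, indent=2, sort_keys=True)


# Hypothetical example values, for illustration only
example = build_manifest(
    raw_data=b"smiles,pIC50\nCCO,4.2\n",
    descriptor="Morgan fingerprint, radius 2, 2048 bits",
    algorithm="random forest",
    params={"n_estimators": 500, "max_features": "sqrt"},
    seed=42,
    packages=["rdkit", "scikit-learn"],
)
print(manifest_to_json(example))
```

Packages that are not installed under the given distribution name are recorded as `null` instead of raising an error. Because the keys are sorted, two manifests can be compared with a plain text diff.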
Acknowledgement
This work is supported by the Research Career Development Grant (No. RSA6280075) from the Thailand Research Fund.
Copyright information
© 2020 Springer Science+Business Media, LLC, part of Springer Nature
Cite this protocol
Nantasenamat, C. (2020). Best Practices for Constructing Reproducible QSAR Models. In: Roy, K. (eds) Ecotoxicological QSARs. Methods in Pharmacology and Toxicology. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0150-1_3
DOI: https://doi.org/10.1007/978-1-0716-0150-1_3
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0149-5
Online ISBN: 978-1-0716-0150-1
eBook Packages: Springer Protocols