Abstract
Chemical creativity in the design of new synthetic chemical entities (NCEs) with drug-like properties has been the domain of medicinal chemists. Here, we explore the capability of a chemistry-savvy machine intelligence to generate synthetically accessible molecules. DINGOS (design of innovative NCEs generated by optimization strategies) is a virtual assembly method that combines a rule-based approach with a machine learning model trained on successful synthetic routes described in chemical patent literature. This unique combination enables a balance between ligand-similarity-based generation of innovative compounds by scaffold hopping and the forward-synthetic feasibility of the designs. In a prospective proof-of-concept application, DINGOS successfully produced sets of de novo designs for four approved drugs that were in agreement with the desired structural and physicochemical properties. Target prediction indicated more than 50% of the designs to be biologically active. Four selected computer-generated compounds were successfully synthesized in accordance with the synthetic route proposed by DINGOS. The results of this study demonstrate the capability of machine learning models to capture implicit chemical knowledge from chemical reaction data and suggest feasible syntheses of new chemical matter.
Data availability
The trained machine learning model, CAS numbers of the training data and reaction SMARTS used in this Article are provided in the Code Ocean capsule https://doi.org/10.24433/CO.6930970.v1 (ref. 32). All molecules were preprocessed in accordance with the procedure stated in the Methods (see ‘Molecular building blocks’ section).
Code availability
The code for this Article, along with an accompanying computational environment, is available and executable online as a Code Ocean capsule: https://doi.org/10.24433/CO.6930970.v1 (ref. 32).
References
Shih, H.-P., Zhang, X. & Aronov, A. M. Drug discovery effectiveness from the standpoint of therapeutic mechanisms and indications. Nat. Rev. Drug Discov. 17, 19–33 (2017).
Hartenfeller, M. & Schneider, G. Enabling future drug discovery by de novo design. Wiley Interdiscip. Rev. Comput. Mol. Sci. 1, 742–759 (2011).
Blakemore, D. C. et al. Organic synthesis provides opportunities to transform drug discovery. Nat. Chem. 10, 383–394 (2018).
Schneider, P. & Schneider, G. De novo design at the edge of chaos. J. Med. Chem. 59, 4077–4086 (2016).
Sliwoski, G., Kothiwale, S., Meiler, J. & Lowe, E. W. Computational methods in drug discovery. Pharmacol. Rev. 66, 334–395 (2013).
Chen, H., Engkvist, O., Wang, Y., Olivecrona, M. & Blaschke, T. The rise of deep learning in drug discovery. Drug Discov. Today 23, 1241–1250 (2018).
Merk, D., Friedrich, L., Grisoni, F. & Schneider, G. De novo design of bioactive small molecules by artificial intelligence. Mol. Inform. 37, 1700153 (2018).
Gupta, A. et al. Generative recurrent networks for de novo drug design. Mol. Inform. 37, 1700111 (2018).
Merk, D., Grisoni, F., Friedrich, L. & Schneider, G. Tuning artificial intelligence on the de novo design of natural-product-inspired retinoid X receptor modulators. Commun. Chem. 1, 68 (2018).
Lowe, D. M. Chemical reactions from US patents (1976–Sep2016) (2017); https://figshare.com/articles/Chemical_reactions_from_US_patents_1976-Sep2016_/5104873
Coley, C. W., Rogers, L., Green, W. H. & Jensen, K. F. Computer-assisted retrosynthesis based on molecular similarity. ACS Cent. Sci. 3, 1237–1245 (2017).
Feng, F., Lai, L. & Pei, J. Computational chemical synthesis analysis and pathway design. Front. Chem. 6, 199 (2018).
Szymkuć, S. et al. Computer-assisted synthetic planning: the end of the beginning. Angew. Chem. Int. Ed. 55, 5904–5937 (2016).
Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).
Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).
Grisoni, F. et al. Scaffold hopping from natural products to synthetic mimetics by holistic molecular similarity. Commun. Chem. 1, 44 (2018).
Merk, D., Grisoni, F., Friedrich, L., Gelzinyte, E. & Schneider, G. Scaffold hopping from synthetic RXR modulators by virtual screening and de novo design. Med. Chem. Commun. 9, 1289–1292 (2018).
Grisoni, F., Merk, D., Byrne, R. & Schneider, G. Scaffold-hopping from synthetic drugs by holistic molecular representation. Sci. Rep. 8, 16469 (2018).
MACCS-II (MDL Information Systems, 1987).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In Proceedings of 3rd International Conference on Learning Representations, ICLR 2015, 1–13 (2015).
Gaulton, A. et al. The ChEMBL database in 2017. Nucleic Acids Res. 45, D945–D954 (2017).
ChEMBL Database (EBI, 2017); https://www.ebi.ac.uk/chembl/
Johnson, M. A. & Maggiora, G. M. Concepts and Applications of Molecular Similarity (Wiley, 1990).
Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 23, 3–25 (1997).
Reker, D., Rodrigues, T., Schneider, P. & Schneider, G. Identifying the macromolecular targets of de novo-designed chemical entities through self-organizing map consensus. Proc. Natl Acad. Sci. USA 111, 4067–4072 (2014).
Reutlinger, M. et al. Chemically advanced template search (CATS) for scaffold-hopping and prospective target prediction for ‘orphan’ molecules. Mol. Inform. 32, 133–138 (2013).
Molecular Operating Environment (MOE) (Chemical Computing Group, 2017).
O’Boyle, N. M. & Sayle, R. A. Comparing structural fingerprints using a literature-based similarity benchmark. J. Cheminform. 8, 1–14 (2016).
RDKit: Open-source Cheminformatics (RDKit); www.rdkit.org
Reaxys (Elsevier).
Wolber, G. & Langer, T. LigandScout: 3D pharmacophores derived from protein-bound ligands and their use as virtual screening filters. J. Chem. Inf. Model. 45, 160–169 (2005).
Button, A., Merk, D., Hiss, J. A. & Schneider, G. Automated de novo molecular design by hybrid machine intelligence and rule-driven chemical synthesis. Code Ocean (2019); https://doi.org/10.24433/CO.6930970.v1
Acknowledgements
The authors thank L. Friedrich, C. Brunner, B. Huisman, X. Zhang and R. Byrne for stimulating discussions and technical support. D.M. was financially supported by an ETH Zurich Postdoctoral Fellowship (grant no. 16–2 FEL-07). This research was financially supported by the Swiss National Science Foundation (grant no. 205321_182176 to G.S.).
Author information
Contributions
A.B. programmed the software and performed the computational experiments. A.B., J.A.H. and G.S. designed the algorithm and analysed the data. D.M. supervised the chemical part of the study and, together with A.B., synthesized the compounds. G.S. designed the study. All authors analysed the results and contributed to the manuscript.
Ethics declarations
Competing interests
G.S. declares a potential conflict of interest in his role as life-science industry consultant and cofounder of inSili.com GmbH, Zurich. No other competing interests are declared.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Button, A., Merk, D., Hiss, J.A. et al. Automated de novo molecular design by hybrid machine intelligence and rule-driven chemical synthesis. Nat Mach Intell 1, 307–315 (2019). https://doi.org/10.1038/s42256-019-0067-7
DOI: https://doi.org/10.1038/s42256-019-0067-7
- Springer Nature Limited