Machine learning-aided generative molecular design

  • Review Article

From Nature Machine Intelligence

Machine learning has provided a means to accelerate early-stage drug discovery by combining molecule generation and filtering steps in a single architecture that leverages the experience and design preferences of medicinal chemists. However, designing machine learning models that can achieve this on the fly to the satisfaction of medicinal chemists remains a challenge owing to the enormous search space. Researchers have addressed de novo design of molecules by decomposing the problem into a series of tasks determined by design criteria. Here we provide a comprehensive overview of the current state of the art in molecular design using machine learning models as well as important design decisions, such as the choice of molecular representations, generative methods and optimization strategies. Subsequently, we present a collection of practical applications in which the reviewed methodologies have been experimentally validated, encompassing both academic and industrial efforts. Finally, we draw attention to the theoretical, computational and empirical challenges in deploying generative machine learning and highlight future opportunities to better align such approaches to achieve realistic drug discovery end points.

Fig. 1: Generative ML-assisted molecular design pipeline.
Fig. 2: Illustrations for generative tasks, generative strategies and molecular representations.
Fig. 3: Selected examples of experimentally validated generative designs.

J.G. and P.S. acknowledge support from the NCCR Catalysis (grant number 180544), a National Centre of Competence in Research funded by the Swiss National Science Foundation. A.R.J. is funded by a Biotechnology and Biological Sciences Research Council (BBSRC) DTP studentship (BB/M011194/1). Y.W. acknowledges the support of the Cornell Presidential Life Science Fellowship. We are grateful to K. Atz and A. Mueller for helpful feedback and discussion.

Author information

Y.D., A.R.J. and J.G. contributed equally and led this work under the supervision of P.S. and T.L.B. T.F. and C.H. also contributed equally. All authors contributed ideas and discussions to the writing, reviewing and editing of the paper before submission.

Corresponding authors

Correspondence to Philippe Schwaller or Tom L. Blundell.

Ethics declarations

Competing interests

A.R.J. declares a potential financial conflict of interest due to his role as a machine learning scientist at Prescient Design, Genentech. The other authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks J. B. Brown and Ola Engkvist for their contribution to the peer review of this work.


