Introduction

Computer-Aided Drug Design and Computational Chemistry (here termed CADD) are an integral component of drug discovery programs at multiple Boehringer Ingelheim (BI) research sites. The CADD scientists at BI work across different therapeutic areas at their sites, in close proximity to the medicinal chemists (including combinatorial chemistry). The CADD groups at BI contribute to individual drug discovery projects employing a multitude of approaches extending from chemoinformatics to molecular modeling. These approaches include most aspects of structure-based and ligand-based drug design and predictive modeling, as well as the prioritization and analysis of compound selections through virtual screening, the triaging of screening hit sets, and the design of combinatorial library screening decks. The application of these techniques ranges from target selection to lead discovery and optimization, including toxicity predictions. In addition, providing CADD technology and encouraging the uptake of decision-supporting solutions by project teams is a driving force for CADD activities at BI. Tasks related to bioinformatics, such as pathway or gene data analyses, are typically the remit of the computational biology groups. With very few exceptions, the CADD work at BI focuses on small molecule drug discovery, although in some cases biologics research has been supported [1, 2].

Although the CADD groups mostly support site-specific projects at the three main BI research sites in Ridgefield, Biberach and Vienna, we have also been implementing a global concept for developing key strategies and best practices and for sharing workflows, protocols and software solutions across all sites. We will illustrate the synergy gained from this growing global focus using three examples: the recently established Computational Chemistry Framework (CCFW); an in-house global virtual screening platform for designing libraries for lead identification; and a global infrastructure for deploying numerous predictive models. In addition, we will illustrate how the CADD scientists contribute to the advancement of projects, interact with medicinal chemists and develop technology that impacts project decisions.

The roles of CADD in drug discovery

Usually, CADD scientists at BI join research project teams at the stage of hit identification. A CADD scientist fulfills different roles within a drug discovery project team, which are categorized here as project contributor, data analyst and technical enabler. In the primary role as project contributor, a CADD scientist applies rational computational chemistry approaches to finding novel compounds with improved overall property profiles. To this end, a CADD scientist contributes to devising and executing hit finding plans, often including virtual screening or de novo design, analyzing screening hit sets, and finding and designing hit analogs, as well as target deconvolution and the identification of target-ligand associations in phenotypic screening campaigns. Hypotheses and models are generated that guide further compound optimization and trigger novel design ideas. During lead optimization, CADD influences or guides the team direction by solving project-specific problems that are often associated with target-independent parameters ranging from target selectivity to PK, tolerability and safety related issues. As more and more unprecedented targets are pursued in drug discovery, a number of factors help to inform target selection decisions or to shape research plans for target enablement. Assessing the preferred mode of action (allosteric or orthosteric target modulation), target druggability and opportunities to identify potential tool compounds are typical examples of CADD scientists getting involved early on, before hit identification. All of these tasks are part of the standard repertoire of every CADD scientist at BI.

CADD scientists collaborate very closely with the medicinal chemists and with scientists from other experimental disciplines (e.g., screening and profiling units or, for structure-enabled projects, protein crystallography). BI encourages project team members to compete to generate the best design ideas. All design proposals are collected and shared. CADD scientists work very closely with individual chemists to understand their objectives, plans, strategies, timelines and synthesis conditions. Particularly in structure-based projects, the CADD scientists hold regular 3D design and brainstorming sessions with the entire project team to interactively generate new synthesis suggestions from within the team. The team then comes together and decides which compounds will be synthesized next, based on a transparent, data-driven analysis that is independent of the source of the idea. Open, transparent communication, the sharing of ideas and team spirit are essential for entering into a productive competition of ideas capable of inspiring even better designs.

The second role of a CADD scientist is to act as a data analyst. Within the context of CADD a data analyst is specialized in transforming the relevant experimental data into hypotheses, which are in turn used to drive the discovery and optimization of compounds. This role is supported by the knowledge of the existence, availability, content, integrity [3] and architecture of internal and external data sources, as well as by the expertise in accessing, processing, and analyzing the data. Pipelining tools such as Pipeline Pilot [4] or KNIME [5] are key to extracting, combining and pre-processing data before they are subjected to more sophisticated computational analyses, such as principal component analysis, machine learning or clustering. Very often, the role of a data analyst also includes project team support by compiling the project-relevant data, e.g. for SAR analysis. Additional input to data-driven decision making within project teams is provided by analyzing cross-project data, such as the results from previous screening campaigns for the identification of potential off-targets or the triaging of hit sets [6].
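
To make this pre-processing-then-analysis pattern concrete, the following minimal sketch clusters a small compound set and summarizes activity per cluster, using RDKit and scikit-learn as stand-ins for a Pipeline Pilot or KNIME protocol; the SMILES and activity values are purely illustrative:

```python
# Minimal sketch: featurize, map and cluster compounds, then summarize
# activity per cluster. Toy data; real inputs come from internal databases.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

data = [("CCO", 5.1), ("CCN", 5.3), ("c1ccccc1O", 6.8),
        ("c1ccccc1N", 6.5), ("CCCCCC", 4.2), ("CCCCCCC", 4.0)]

mols = [Chem.MolFromSmiles(smi) for smi, _ in data]
fps = np.array([list(AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=512))
                for m in mols])

coords = PCA(n_components=2).fit_transform(fps)            # 2D map for inspection
labels = KMeans(n_clusters=3, n_init=10).fit_predict(fps)  # series-like groupings

for (smi, act), cl in zip(data, labels):
    print(f"cluster {cl}: {smi} (pIC50 {act})")            # per-cluster SAR summary
```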

The third, and increasing, role of a CADD scientist is to enable medicinal chemists to utilize the computer-aided drug design tools on their own. The medicinal chemists at BI design compounds in collaboration with computational chemists and use some CADD design tools independently. Medicinal chemists have been trained in the use of certain CADD tools and have gained an adequate understanding of the methodologies involved. This stimulates discussions about novel computational tools for compound design and simplifies the alignment between CADD and medicinal chemistry. CADD tools in the hands of the medicinal chemist enable a rapid iteration of design ideas within the context of the constraints imposed by synthetic accessibility and compound profile requirements, leading to overall faster design cycles. Enabling the medicinal chemists in this way frees up time for the CADD scientists, who can then concentrate on more demanding design or analysis tasks or new tool development.

The CADD scientists at BI are permanent or temporary project team members contributing to projects beyond their three core roles described above. Where appropriate, they can also lead the chemistry component of early drug discovery projects in the exploratory and hit identification phases. Due to the close collaboration with structural biologists and assay scientists, as well as the focus on devising hit finding plans tailored to the individual projects, experienced CADD scientists are well suited to this role.

CADD adds value to a project when it drives or influences the decisions taken by a project team, and when it facilitates faster decisions. While it is straightforward to describe the qualitative indicators for value added, a quantitative assessment is much more difficult. Different metrics for CADD performance have been discussed, including categorizing and counting the CADD contributions to individual projects [7] or quantifying the quality of CADD work (e.g., agreement of computations with experimental data [8]). Another way of measuring performance is, of course, the assessment of the customer (project team) satisfaction. At BI we collect internal feedback from project team leaders and other key partners. In our experience, a direct dialogue about the mutual expectations and the subsequent impact of CADD on a project is an appropriate way of assessing the added value. These discussions occur on an ongoing basis throughout the year to ensure an optimal alignment of CADD work and project needs within the context of the project portfolio.

In addition to contributing to drug discovery projects, CADD scientists advance the portfolio of CADD technologies that can be applied to projects. The use of sophisticated and computationally demanding CADD technology is encouraged at BI. While we agree, at least in principle, with the concept of parsimony when applying modeling approaches to projects, as postulated recently by Roche scientists [9], we are also convinced that it is worth investing in computer-intensive technologies when the results lead to new, meaningful and testable hypotheses. Molecular dynamics simulations are performed for multiple purposes at BI, including the analysis of water clusters, thermodynamic integration [10] to compute binding free energies, and simulations to assess the stability of proposed binding modes of compounds or fragments [11]. We use GPU clusters to enable high-speed MD simulations, and we have also started exploring cloud computing as an additional resource for MD simulations and other computer-intensive tasks.
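
For reference, the thermodynamic integration approach mentioned above rests on the standard relation (a textbook formulation, not a BI-specific protocol):

$$ \Delta G = \int_0^1 \left\langle \frac{\partial H(\lambda)}{\partial \lambda} \right\rangle_{\lambda} \, d\lambda $$

where H(λ) is a hybrid Hamiltonian coupling the two end states, the ensemble average is evaluated at fixed λ by MD sampling, and the integral is computed numerically over a series of λ windows. A relative binding free energy is then obtained by performing the transformation in both the bound and the solvated state and taking the difference of the two ΔG values.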

In order to advance new computational chemistry methodologies into productive, value-adding applications in drug discovery projects, the CADD scientists proactively monitor trends and new developments in the field. To keep the in-house effort associated with implementing and maintaining new algorithms to a minimum, we typically take advantage of new functionalities when they are added to our commercial computational chemistry software suites or to robust open-source frameworks. However, the scheduling of new functionalities in commercial or open-source software tools is often not in line with the in-house needs for enhancements as dictated by ongoing drug discovery programs. Tapping into the full potential of new technology requires validation with in-house data sets, preferably in prospective settings, as well as tight integration into internal workflows. Collaborations with academic groups have proven to be an essential element in advancing the technology that is available for in-house applications. Many technologies that are now part of the productive CADD portfolio at BI were initially explored in collaboration with academic partners and summer students. These methods include SAR analyses [12], predictive modeling [13, 14], quantum-chemical calculations of H-bond strengths [15], optimal pi–pi stacking geometries [16], GPCR modeling [17], computational protocols for postulating druggable binding sites at protein–protein interaction interfaces [18], and conformational analyses [19]. We are also engaged in research into methodologies that are still at a very early stage but that we anticipate will have a high impact on future applications. An example is the decomposition of protein–ligand interaction energies employing quantum-chemical calculations [20]. Furthermore, academic collaborations have great value in stimulating discussions between academic groups and the CADD scientists at BI, as current procedures are challenged and ideas for further advancement are generated. Often, the integration of tools from academic collaborations into the in-house IT and high-performance computing environment requires additional effort. We have found that working with widely accessible toolkits or platforms such as Java, Python [21], KNIME [5] or RDKit [22] helps to keep this effort to a minimum. In some cases we have implemented new algorithms from scratch [23, 24] and, more recently, we have employed crowd-sourcing as part of a community challenge to generate predictive models for mutagenicity [25, 26].

A common platform for compound design

A common platform shared by CADD scientists, structural biologists and medicinal chemists strengthens both teamwork and collaboration in compound design. Moreover, it enhances the efficiency and transparency of decision making because hypotheses, such as proposed binding modes and ideas for compounds to be synthesized, can be shared in a common format. Molecular modeling tools such as MOE [27] are complex and powerful expert systems with their own built-in development toolkits. These allow the development and implementation of new modules and the design of highly customized user interfaces for incorporating in-house and external tools. In recent years, medicinal chemists have become more amenable to using such tools and to conducting, for example, structure-based compound design campaigns. However, exploiting the full potential of modeling tools requires skills such as scripting and GUI customization, and an in-depth understanding of the individual functionalities and their underlying computational algorithms. These skills are more likely to reside with computational chemists and IT experts. At BI, a global MOE working group has been established to coordinate the deployment of tools and new features in MOE, as well as the individual customization of the user interface at each of the three BI research sites. An efficient, world-wide deployment procedure has been implemented for new releases of MOE and for updating BI-specific features. Computational chemists and IT experts compile one global MOE package that also allows the inclusion of site-specific settings. For example, there is a site-specific top-level menu for each site that governs access to the available tools. All site-specific customizations are set up independently to avoid cross-site dependencies. Based on a mechanism that determines from which site a MOE session is being started, the appropriate menus and libraries are loaded. MOE has been enhanced by a number of external tools that are invoked via a communication meta-layer (vide infra). These tools include, for example, various property (e.g., logP) and in silico ADME descriptors, including property profile meta-services. The services can be invoked from within the MOE system or from the MOE database viewers (MDB).

Another recent example is the introduction of a DFT-based torsional analysis tool to medicinal chemists. This tool permits an easy, color-coded assessment of optimal compound conformations (Fig. 1).

Fig. 1

View of DFT-based torsion scan results in MOE. The conformations are color-coded by torsion strain energy. Structures with carbon atoms colored in green represent the low energy conformations

Following the interactive selection of the four atoms that define a torsion angle, the molecule, along with the torsion angle specification, is submitted to a calculation service engine (CCFW, vide infra). Input molecules with fixed incremental torsion angles are constructed and subjected to QM calculations on an HPC cluster. Due to the computation times required, the service cannot be operated synchronously in an interactive session. Therefore, the results are collated in MOE, MOE database and Excel spreadsheet formats and are automatically sent to the user by email after the calculation has been completed. Another complex service estimates the mutagenicity potential of compounds based on ab initio calculations of the nitrenium ion stability for aromatic amines [28, 29]. These examples illustrate how fairly complex and CPU-intensive tasks can be transferred to the medicinal chemists at BI, provided that the tasks can be reasonably standardized or formalized as a routine workflow without the need for manual intervention by the user.
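
As an illustration of the input-generation step of such a torsion scan, the following sketch drives a selected torsion through fixed increments using RDKit; the DFT single-point call is a hypothetical placeholder for the HPC backend, and the result collation and email delivery described above are omitted:

```python
# Minimal sketch of torsion-scan input generation, assuming RDKit.
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolTransforms

def torsion_scan_conformers(smiles, atom_ids, step_deg=10):
    """Yield one 3D conformer per fixed torsion increment."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.EmbedMolecule(mol, randomSeed=42)    # initial 3D coordinates
    AllChem.MMFFOptimizeMolecule(mol)            # quick force-field cleanup
    conf = mol.GetConformer()
    i, j, k, l = atom_ids                        # the four user-selected atoms
    for angle in range(0, 360, step_deg):
        rdMolTransforms.SetDihedralDeg(conf, i, j, k, l, float(angle))
        yield angle, Chem.Mol(mol)               # snapshot at this torsion value

for angle, m in torsion_scan_conformers("CCc1ccccc1", (0, 1, 2, 3)):
    # submit_for_dft_singlepoint(m)  # hypothetical call to the HPC service
    print(f"prepared conformer at {angle} degrees")
```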

Another tool frequently used by the medicinal chemistry community is a docking utility that runs GOLD [30] and, optionally, a 2D–3D conversion step using CORINA [31] in the background. The preparation, optimization and provision of the GOLD configuration files and pre-aligned protein structures are the responsibility of the computational chemistry experts. Configuration files controlling the different docking scenarios, such as constrained, unconstrained or covalent docking, are provided according to the individual project requirements. Medicinal chemists can then select the most appropriate docking protocol from a web-based interface that submits all input files (protein structure and ligand, configuration file) to the backend service. Docking poses are returned in SDF or MOE format, ready to be used in further design cycles. The docking results can be automatically combined with property predictions that prompt the medicinal chemists to consider multi-parameter optimization criteria as part of their decision making.
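
From the client side, such a service call can be pictured as below; the endpoint URL, payload fields and response handling are hypothetical illustrations of the pattern, not the actual CCFW interface:

```python
# Hypothetical client-side view of the docking utility.
import requests

payload = {
    "protocol": "constrained_docking",   # one of the curated GOLD configurations
    "project": "example_target",         # selects the pre-aligned protein structure
}
with open("designs.sdf", "rb") as fh:    # 2D input; the CORINA 2D-3D step runs server-side
    resp = requests.post("https://ccfw.example.internal/dock",
                         data=payload, files={"ligands": fh}, timeout=600)
resp.raise_for_status()

with open("poses.sdf", "wb") as out:     # poses ready for the next design cycle
    out.write(resp.content)
```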

An important part of the support given to medicinal chemists in computational compound design is the provision of ready-to-use 3D-structural data. We have established an automated workflow for compiling pre-aligned structural data in so-called project master files that can be customized for each project. The workflow utilizes the MOEProject framework and in-house scripts for calculating the crystallographic packing environment around ligand binding sites, splitting protein multimers into biologically relevant units, protonating the structures, and aligning defined protein chains onto a reference. In addition, structures can be annotated and grouped by chemical scaffold, biological activity data, calculated properties and customized classifications such as, for example, the flip state of a certain side chain, a point mutation, or the cofactor type. This facilitates surveying the available structural information and allows the immediate utilization of the appropriate structures in compound design.

Learning from data: predictive modeling and matched molecular pair (MMP) analyses

Prospectively predicting experimental parameters, or the effect of molecular transformations on molecular properties, significantly improves efficiency and shortens the design-learning cycle. Therefore, predictive modeling for ADMET endpoints has been a growing focus for computational chemists at BI [32, 33]. Unsurprisingly, commercially available models for most endpoints are not as relevant as those built from the vast repository of assay data accumulated over time at BI. We have established the following principles for building in silico models and for sharing them with the medicinal chemists:

  1.

    Frequent, automated re-training and updating of predictive models guarantees the use of all relevant data, including the most recent data, which can be particularly relevant for predictions on new analogs of actively explored compound series. A measurable deterioration in the predictive power of a model can occur within a few weeks [33]. Recently, we introduced a model rebuilding scheme that starts automatically as soon as new data becomes available. Automatically updated models ensure that the best prediction is available at all times. This means that predictions made today can differ from those made yesterday. The idea of giving up prediction consistency in favor of prediction accuracy has been gaining wider acceptance at BI.

  2.

    Each prediction is returned together with a confidence estimate. Providing an easily interpretable way of managing expectations was found to be of great value when discussing prediction results with medicinal chemists. Predicted values with a confidence estimate below a certain threshold are typically not taken into account in the decision making process. In our experience, confidence estimates derived from the prediction agreement within an ensemble of different models perform particularly well (Fig. 2); a minimal sketch of this agreement-based confidence measure is shown after this list. However, applicability domain assessments in descriptor space and confidence estimates based on prediction distribution methods are used as well.

    Fig. 2

    Influence of different confidence measures on the prediction accuracy of test compounds for a predictive human liver microsomal stability model. An ensemble agreement rate for 50 models used as confidence measure (shown in red) generates much higher prediction accuracies for a given fraction of compounds than a k-nearest neighbor (kNN) confidence measure does (shown in blue)

  3.

    Early human dose predictions are used as an alternative to multi-parameter optimization scores. Currently, human dose predictions are triggered when in vitro stability and potency data are collected [34]. Initially, volume of distribution, plasma protein binding, and the efficacious dose are predicted by in silico methods [13] and are subsequently refined as experimental data becomes available.

  4.

    Seamless access to predicted properties and integration with other modeling output, such as docking results, is highly valued. For example, automated docking results are sometimes combined with predicted ADME properties without a specific user request. The properties are then predicted on the fly as chemists draw or modify structures in Marvin or MOE.
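
As a minimal sketch of the agreement-based confidence measure from point 2 (referenced above), the following example trains an ensemble of 50 models on bootstrap resamples, uses the vote agreement as a confidence estimate, and restricts accuracy assessment to the confident subset; the synthetic data, scikit-learn models and 0.8 threshold are illustrative assumptions:

```python
# Ensemble-agreement confidence sketch; synthetic features/labels stand in
# for fingerprint descriptors and a binary stability readout.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=64, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Train an ensemble of 50 models on bootstrap resamples of the training data.
rng = np.random.default_rng(0)
models = []
for _ in range(50):
    idx = rng.integers(0, len(X_tr), len(X_tr))
    models.append(RandomForestClassifier(n_estimators=50).fit(X_tr[idx], y_tr[idx]))

votes = np.array([m.predict(X_te) for m in models])   # shape: (50, n_test)
pred = (votes.mean(axis=0) >= 0.5).astype(int)        # majority vote
confidence = np.abs(votes.mean(axis=0) - 0.5) * 2     # agreement rate, 0..1

mask = confidence >= 0.8                              # keep confident predictions only
print(f"coverage {mask.mean():.2f}, "
      f"accuracy on confident subset {(pred[mask] == y_te[mask]).mean():.2f}")
```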

Although there is a trend towards building more predictive models serving a global (across sites) medicinal chemistry community at BI, many models are still built at the individual sites, primarily for local customers and often for project-specific purposes. However, we have standardized the deployment of and access to models by establishing a meta-layer (vide infra) that permits local model building and global consumption at multiple frontends, including MOE, Marvin, KNIME and Pipeline Pilot. This means that any model can easily be switched from a local to a global model and vice versa. This setup allows the combination of model predictions with the results from other tools, such as automated docking engines. For performance reasons we have moved away from workflow software such as KNIME and Pipeline Pilot for in silico model building, in favor of a Python-based model building framework that is optimized for speed. It enables chemists to generate prediction results using Marvin [35] or MOE as the frontend within a few seconds of drawing or modifying a molecule. This allows predictive models to be integrated into synthesis planning in an interactive way.
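
The load-once, predict-on-the-fly pattern behind such interactive predictions can be sketched as follows, assuming RDKit and scikit-learn; the toy training data stand in for the in-house framework and its corporate training sets:

```python
# Sketch: fit (or, in production, load) the model once at service start-up,
# then keep the per-request path to featurization plus a fast predict call.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestRegressor

def featurize(smiles):
    mol = Chem.MolFromSmiles(smiles)
    return np.array(AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024))

# Done once at start-up (in production: load a pre-trained, versioned model).
train = [("CCO", 0.2), ("c1ccccc1", 2.1), ("CCCCCC", 3.8), ("CCN", 0.1)]
MODEL = RandomForestRegressor(n_estimators=100, random_state=0).fit(
    np.stack([featurize(s) for s, _ in train]), [v for _, v in train])

def predict(smiles: str) -> float:
    """Per-request path: featurize and predict within milliseconds."""
    return float(MODEL.predict(featurize(smiles).reshape(1, -1))[0])

print(predict("CCCO"))   # e.g. called as the chemist modifies a structure in Marvin
```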

Table 1 summarizes the in silico models used productively at BI. Similar to what has been reported for other pharmaceutical research efforts [36], we focus on predicting endpoints for which a sizable number of data points are available. We employ commonly used machine learning methodologies such as random forests and support vector machines to train regression and classification models. Improving the predictive models for the parameters that are most relevant for human dose predictions has become a recent focus, with emphasis on in vitro clearance, volume of distribution, and plasma protein binding models. In addition, we are building target potency and efficacy models based on data automatically extracted from our compound database, and are combining them with models and data from the literature [37].

Table 1 In silico models in production at BI

We are currently working on integrating the growing number of predictive models into the decision workflows of project teams, both for making synthesis decisions and for advancing compounds towards in-depth profiling. One element is the use of in silico models, in parallel with the actual experimental assay, to advance compounds immediately to the next level of a screening cascade, thereby shortening learning cycles. Another element is the exploration of early human dose predictions as a holistic alternative to multi-parameter optimization measures and as a means of simplifying the decision making criteria for the chemists.

While these predictive models are very often black-box models that are difficult to interpret, we have recently enabled more illustrative ways of mining the wealth of in-house data to support SAR analysis and compound design by analyzing matched molecular pair (MMP) transformations [42–44]. Within the context of a specific project, target-related analyses are supported by displaying matched molecular series for each individual structural class. To support the improvement of optimization parameters, such as solubility or metabolic stability, we have provided statistical analyses of all molecular transformations in our corporate database and their effects on the respective parameter. These analyses have been made available to medicinal chemists, who can then apply favorable in silico transformations to ongoing design campaigns. Recently, we have extended the MMP methodology to peptides [45].
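
A minimal sketch of the MMP bookkeeping behind such transformation statistics is shown below, assuming RDKit's rdMMPA fragmentation: each molecule is single-cut, variable fragments are indexed by their constant part, and property deltas are aggregated per transformation; the compounds and property values are toy examples:

```python
# Toy MMP analysis: single-cut fragmentation, pairing on the constant part,
# and aggregation of property deltas per transformation.
from collections import defaultdict
from itertools import combinations
from rdkit import Chem
from rdkit.Chem import rdMMPA

data = {  # SMILES -> measured property (e.g., logD); illustrative values
    "c1ccccc1C(=O)N": 1.2,
    "c1ccccc1C(=O)NC": 1.5,
    "c1ccc(F)cc1C(=O)N": 1.0,
}

index = defaultdict(list)  # constant part -> [(variable part, value)]
for smi, val in data.items():
    mol = Chem.MolFromSmiles(smi)
    for core, chains in rdMMPA.FragmentMol(mol, maxCuts=1, resultsAsMols=False):
        frag_a, frag_b = chains.split(".")
        # treat the larger fragment as the constant part of the pair
        const, var = sorted((frag_a, frag_b), key=len, reverse=True)
        index[const].append((var, val))

transforms = defaultdict(list)  # "A>>B" -> list of property deltas
for const, members in index.items():
    for (va, xa), (vb, xb) in combinations(members, 2):
        if va != vb:
            transforms[f"{va}>>{vb}"].append(xb - xa)

for t, deltas in transforms.items():
    print(t, sum(deltas) / len(deltas))   # mean effect of each transformation
```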

A central hub for integrating calculation engines into medicinal chemistry tools: Computational Chemistry Framework

The provision of powerful computational tools to an extended user community requires CADD and IT expertise to ensure that the workflows and applications implemented are robust and scientifically validated. Success depends on the seamless and user-friendly integration of these tools into the desktop applications that medicinal chemists use in their daily work for drawing molecules or performing SAR analyses, such as Marvin, Spotfire [46] or D360 [47]. These applications are not part of the modeling tools such as MOE, the chemoinformatics packages, or the workflow tools (e.g., Pipeline Pilot or KNIME) normally used in the realm of CADD. Moreover, CPU-intensive workflows cannot be executed on standard PCs, requiring instead complex hardware infrastructure such as HPC clusters or clouds. To bridge the gap between the medicinal chemistry and computational chemistry software worlds we have developed a meta-layer called the Computational Chemistry Framework (CCFW), which allows flexible connection of the frontends used by medicinal chemists with the computational chemistry calculation engines in the backend (Fig. 3). Rather than implementing a single, fully integrated system, the CCFW has been designed as a middle layer between these two worlds, allowing automated CADD tasks to be wrapped in web services with defined parameters and standardized I/O file exchange formats. The CCFW and its backend services are completely client independent. Selected frontends can be independently enabled to trigger the CCFW services by using APIs or plugins. As a result, CCFW services such as property calculators can be called from within MOE, Marvin or other clients without the need to develop and maintain multiple backend services for the same purpose. Since the CCFW calls and results are standardized, amendments, bug fixes or upgrades of the backend services do not require further modification at the frontend. This modular integration of frontend and backend components ensures high flexibility for developing and maintaining the CCFW backend services. In addition to providing automated services to the medicinal chemistry community, the CCFW also offers the opportunity to integrate automated and standardized services into analyses conducted by CADD scientists, thereby increasing efficiency.

Fig. 3

Computational Chemistry Framework (CCFW) as a bridge between the backend CADD calculation engines and user frontends. CCFW services can be invoked via client-independent web service calls from various clients. Calculations in the backend are typically performed via scripts (Python, shell) or pipelining tools

The CADD scientists are responsible for developing the CCFW services in close cooperation with IT and the medicinal chemists. Typical backend engines are command-line based services operated by scripting tools (bash, Python) or workflow protocols (Pipeline Pilot, KNIME) that can be invoked directly from within the CCFW.
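
An illustrative CCFW-style wrapper might look as follows, assuming Flask for the web layer; the route, the "calc_logp.sh" backend script and the I/O layout are hypothetical stand-ins for the standardized service pattern described above:

```python
# Client-independent web endpoint that shells out to a command-line engine.
import os
import subprocess
import tempfile
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/service/logp", methods=["POST"])
def logp_service():
    # Standardized input: an SDF file posted by any enabled frontend.
    fd, in_path = tempfile.mkstemp(suffix=".sdf")
    os.close(fd)
    request.files["molecules"].save(in_path)
    out_path = in_path + ".out"
    try:
        # Backend engine: a command-line script, invoked with defined parameters.
        subprocess.run(["calc_logp.sh", in_path, out_path], check=True)
        with open(out_path) as fh:
            values = [float(line) for line in fh]
    finally:
        for p in (in_path, out_path):
            if os.path.exists(p):
                os.remove(p)
    # Standardized output: the same JSON shape regardless of client.
    return jsonify({"property": "logP", "values": values})
```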

BI global development, maintenance, and application of CADD technology

Virtual screening (VS) is a key technique for hit finding at BI. Fast and robust ligand- and structure-based VS workflows enable the rapid execution of tailored initial hit finding or iterative follow-up VS campaigns. Ligand-based VS and analoging workflows employing multiple complementary similarity search methods and data fusion [48] have been successful in multiple hit finding projects. VS workflows and compound decks can be flexibly adapted to the assays available in a project (biophysical or biochemical format, throughput), which is essential for the swift compilation of screening decks for the diverse early drug discovery target portfolio at BI.
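
The data fusion idea can be sketched with RDKit as follows: a deck is ranked by two complementary fingerprints against a known active and the ranks are fused by the best-rank (MIN) rule, one common fusion scheme; the query, deck and choice of fingerprints are illustrative and not the in-house workflow itself:

```python
# Rank fusion over two complementary 2D similarity measures.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, MACCSkeys

query = Chem.MolFromSmiles("CC(=O)Nc1ccc(O)cc1")   # known active (example)
deck = [Chem.MolFromSmiles(s) for s in ["c1ccccc1O", "CC(=O)Nc1ccccc1", "CCCCCC"]]

def ranks(fp_fn):
    """Rank the deck by Tanimoto similarity to the query for one fingerprint."""
    qfp = fp_fn(query)
    sims = [DataStructs.TanimotoSimilarity(qfp, fp_fn(m)) for m in deck]
    order = sorted(range(len(deck)), key=lambda i: -sims[i])
    return {i: r for r, i in enumerate(order)}     # compound index -> rank

r_morgan = ranks(lambda m: AllChem.GetMorganFingerprintAsBitVect(m, 2))
r_maccs = ranks(MACCSkeys.GenMACCSKeys)
fused = sorted(range(len(deck)), key=lambda i: min(r_morgan[i], r_maccs[i]))
print([Chem.MolToSmiles(deck[i]) for i in fused])  # fused hit list
```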

However, VS is typically restricted to the chemical space defined by the in-house and commercially available compound databases screened [49–52]. Therefore, a focus over the past 10 years has been to expand into the vast virtual chemical space accessible via combinatorial chemistry. To achieve this expansion we have created a global platform called the BI Comprehensive Library of Accessible and Innovative Molecules (BICLAIM) [53]. It consists of putative library cores and reagents that we extract computationally from our corporate compound databases and from commercial sources; by mining electronic lab notebooks, we restrict this to chemical space that we know or assume to be synthetically accessible. The current version of BICLAIM contains almost 90,000 cores and tens of thousands of reagents spanning a combinatorial chemistry space of more than 10¹⁷ compounds. In addition to maintaining and growing BICLAIM, we have been developing search methods that allow us to mine this compound space using various computational techniques to prioritize combinatorial libraries for synthesis. These libraries have the advantage of a reduced synthesis risk that counterbalances the higher risk associated with making de novo compounds without proof of activity against the intended target. We have demonstrated that combining the power of large numbers of de novo compounds from libraries with the uncertainties of virtual screening [50] is an effective way of finding attractive hits.
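
The combinatorial leverage behind such a platform can be illustrated with a toy RDKit enumeration: reagent counts multiply per attachment position, so even modest core and reagent lists span enormous virtual spaces; the amide coupling and reagents below are illustrative and not actual BICLAIM content:

```python
# Toy library enumeration around one "core chemistry".
from rdkit import Chem
from rdkit.Chem import AllChem

# Two-component amide coupling as a stand-in library chemistry.
rxn = AllChem.ReactionFromSmarts("[C:1](=O)[OH].[NX3;H2,H1:2]>>[C:1](=O)[N:2]")
acids = [Chem.MolFromSmiles(s) for s in ["CC(=O)O", "OC(=O)c1ccccc1"]]
amines = [Chem.MolFromSmiles(s) for s in ["NCC", "C1COCCN1"]]

# Reagent counts multiply per position: 2 acids x 2 amines = 4 virtual products.
print("virtual products for this core:", len(acids) * len(amines))
for acid in acids:
    for amine in amines:
        for products in rxn.RunReactants((acid, amine)):
            prod = products[0]
            Chem.SanitizeMol(prod)   # products need sanitization after RunReactants
            print(Chem.MolToSmiles(prod))
```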

Several 2D and 3D workflows have been developed at BI to search the BICLAIM space. If at least one template ligand with validated activity is known, 2D FeatureTree [54] searches followed by ROCS [55] 3D matches are carried out. Several examples have been reported where novel chemical matter has been identified using this approach, including GPR119 agonists [56] and CDK2 inhibitors [57].

In addition, direct 3D searches of the BICLAIM space have been enabled using the PharmShapeCC software, which allows 3D pharmacophore and shape complementarity searches across the entire BICLAIM space [23]. Examples of finding novel MMP13 inhibitors, CCR1 antagonists, CXCR5 antagonists [23] and RORC inverse agonists [58] have been published. More recently, we have enabled 3D searching in partially enumerated BICLAIM subspaces using ROCS. BICLAIM is maintained as a global platform for virtual screening at BI, with regular updates and access to substructure searches from within Pipeline Pilot and D360. After virtual screening results have been generated using the 2D and 3D tools, a final selection of library cores and building blocks is made in collaboration between CADD scientists, combinatorial chemistry experts, and medicinal chemists.

BICLAIM is maintained and developed as a truly global resource at BI with input from all research sites, not only from CADD but also from the medicinal and combinatorial chemistry groups. Other examples of in-house developments that have been made accessible across sites are the de novo design program BiBuilder [24], which has, for example, been applied to identifying a novel CB2 agonist [59], and the Python-based model building framework, which was developed at one site and has been adopted as a global platform for predictive modeling. Workflow scripts (Pipeline Pilot and KNIME) have also been exchanged on various occasions.

Concluding remarks and outlook

There is currently an increased focus in pharmaceutical research on enabling project teams to make earlier decisions in drug discovery projects, and this governs where CADD typically invests time and resources at BI. In recent years, advances in computer power, the flexibility to create intricate workflows and the seamless access to CADD software tools connected through a meta-layer (such as the CCFW at BI) to multiple frontends have given chemists easy access to fairly complex calculation engines and computational services. Even more importantly, the response times of such calculation service layers (e.g., property and model predictions) have decreased to the point where the interactive use of modeling results has become feasible for medicinal and computational chemists alike. As a consequence, the use of novel predictive models in the context of compound profiling has increased significantly and often guides subsequent decisions as to whether or not chemical matter should be progressed in a drug discovery project. The ease of access to computational tools has been changing the way in which CADD scientists and medicinal chemists interact in project teams. An increasing number of automatable CADD-related tasks are becoming accessible to medicinal chemists without compromising the quality of modeling outcomes. An added benefit of medicinal chemists conducting automatable CADD tasks is that it frees up time for the CADD scientists to invest in developing more sophisticated technology, which can then be applied to project advancement and additional design ideas in new ways, or to aspects that traditionally may not have been within the scope of CADD. A unique technology developed and broadly practiced at BI is the use of large-scale combinatorial chemistry combined with virtual screening to identify hits and leads early in drug discovery projects.

As pointed out recently by scientists from Bayer [36], computational design plays a much smaller role in the pharmaceutical industry than in other industries. However, given the challenges facing the pharma industry, which translate into an ever increasing need for speed and efficiency and a higher share of unprecedented targets in the portfolio, we believe that CADD has the potential to influence the way in which drug design and discovery are pursued. The impact of CADD will continue to depend on translating CADD results into insights, along with tangible and reliable recommendations to medicinal chemists as to which compound should be made next. A continued investment in developing more accurate and robust predictive methods is necessary to increase the impact of CADD. This will need to be supported by an increased availability of experimental data. Public data (e.g., ChEMBL [60]) are already being integrated seamlessly into in-house data sources [61]. In addition, the increasing realization by many pharmaceutical companies that they need to share pre-competitive data [62] will open up new opportunities for building predictive models with higher accuracy and wider applicability. We also expect that the computing resources used for CADD work will become a commodity. The increased availability of cloud computing will encourage the development of more accurate, albeit computationally intensive, methods and allow their application on a much larger scale than is currently possible. Collaborations with academic groups will continue to play a key role in strengthening the portfolio of tools and in the exploration of new methodologies, supplemented by crowd-sourcing initiatives that provide easy access to a wealth of scientific talent for solving specific problems on demand. Modern drug discovery aims to tackle unprecedented targets that pose significant challenges to CADD. Such targets, for example protein–protein interactions or RNA binding, usually have low druggability and often require the invention of chemical matter and design modalities beyond the established small-molecule domain. There is clearly a growing need to expand the applicability domain of the CADD technology portfolio into these areas. With a computational infrastructure such as the CCFW, strong external scientific networks and the consistent execution of the three CADD scientist roles described in this paper, these challenges can be overcome.