1 Introduction

Is there a pre-trained model that explores the chemical space of pockets and ligands while considering their complementarity? Recently, many deep learning methods have been proposed to understand the chemical space of protein pockets or drug molecules (also called ligands) and to facilitate drug design in many aspects, e.g., finding hits for a novel target [59], repurposing existing drugs for new targets [25, 57, 67], and searching for similar pockets and molecules [35, 46]. While these models have shown promising potential in learning separate pocket or molecular spaces for specific tasks [17, 21, 31, 47, 71], jointly pre-training pockets and ligands while considering their complementarity remains to be explored.

We propose the co-supervised pre-training (CoSP) framework for understanding the joint chemical space of pockets and ligands. Taking ligands as an example, contrastive self-supervised pre-training [17, 49, 56] has yielded significant achievements in recent years. By identifying well-defined positive and negative ligand pairs via a contrastive loss, the model can learn the underlying knowledge to facilitate downstream tasks. However, these self-supervised methods only capture data dependencies in the "self" domain while ignoring additional information from complementary fields, such as bindable pockets. Meanwhile, previous studies [1, 5, 11, 37] have shown that pocket-ligand complementarity plays a crucial role in determining molecular properties, since chemically similar ligands tend to bind to similar pockets. Inspired by this, we introduce cross-domain dependencies between pockets and ligands to improve molecular representation learning.

We propose the gated geometric message passing (GGMP) layer to extract expressive bio-representations for 3D pockets and ligands. All bio-objects are treated as 3D graphs [20, 24] in which each node contains invariant chemical features (atomic number, etc.) and equivariant geometric features (position and direction). For each bio-object, we optimize the pairwise energy function [22], which considers both chemical and geometric features via a gated operation. By minimizing the energy function, we derive the updating rules for the position and direction vectors. Finally, we combine these rules with classical message passing, resulting in GGMP.

We introduce the ChemInfoNCE loss to reduce negative sampling bias [9, 39]. In contrastive learning, false negative pairs that are actually positive lead to performance degradation, an effect called negative sampling bias. Chuang et al. [9] assume that the label distribution of the classification task is uniform and propose DebiasedInfoNCE to alleviate this problem. Considering the specificity of molecules and extending the setting to continuous property prediction (regression), we introduce chemical similarity-enhanced negative ligand sampling. Interestingly, improving the sampling strategy is equivalent to modifying sample weights; we therefore provide a systematic understanding from the view of loss functions and propose ChemInfoNCE.

We evaluate our model on several downstream tasks, from pocket matching and molecule property prediction to virtual screening. Extensive experiments show that our approach achieves competitive results on these tasks, suggesting that pocket-ligand complementarity can improve bio-representation learning.

2 Related Work

Motivation. Proteins and molecules achieve their biological functions by binding to each other [7]; thus, exploring protein-ligand complexes helps to improve the understanding of proteins, molecules, and their interactions. To improve generalization and reduce complexity, we further consider local patterns, namely the protein pocket x and the bindable ligand \(\hat{x}\). Taking \((x,\hat{x})\) as the positive pair and \((x, \hat{x}^-)\) as the negative pair, where \(\hat{x}^-\) cannot bind to x, we aim to pre-train a pocket model \(f: x \mapsto \boldsymbol{h}\) and a ligand model \(\hat{f}: \hat{x} \mapsto \hat{\boldsymbol{h}}\) such that the mutual information between \(\boldsymbol{h}\) and \(\hat{\boldsymbol{h}}\) is maximized.

Table 1. Protein and molecule pre-training methods

Equivariant 3D GNN. Extensive works have shown that 3D structural conformations can improve the quality of bio-representations with the help of equivariant message passing layers [4, 6, 10, 19, 41, 50]. Inspired by energy analysis [20, 22], we propose a new gated geometric message passing (GGMP) layer that considers not only the node position but also its direction, where the latter can indicate the location of pocket cavities and the angles of molecular bonds.

InfoNCE. The original InfoNCE was proposed by [36] to contrast semantically similar (positive) and dissimilar (negative) pairs of data points, such that the representations of similar pairs \((x, \hat{x})\) are close while those of dissimilar pairs \((x, \hat{x}^-)\) are more orthogonal. By default, negative pairs are uniformly sampled from the data distribution; therefore, false negative pairs lead to a significant performance drop. To address this issue, DebiasedInfoNCE [9] was proposed, which assumes that the label distribution of the classification task is uniform. Although DebiasedInfoNCE has achieved good results on image classification, it is not suitable for direct transfer to regression tasks, as the uniform distribution assumption is too strict. For bio-objects, we discard this assumption, extend the setting to continuous attribute prediction, use fingerprint similarity to measure the probability that a ligand is negative, and propose ChemInfoNCE.

Self Bio Pre-training. Many pre-training methods have been proposed for a single protein or ligand domain; they can be classified as sequence-based, graph-based, or structure-based. We summarize the protein pre-training models in Table 1. Among sequential models, CPCProt [33] maximizes the mutual information between predicted residues and their context. Profile Prediction [48] suggests predicting the MSA profile as a new pre-training task. OntoProtein [70] integrates GO (Gene Ontology) knowledge graphs into protein pre-training. While most sequence models rely on the transformer architecture, CARP [66] finds that CNNs can achieve competitive results with far fewer parameters and lower runtime costs. Recently, GearNet [74] explored the potential of 3D structural pre-training from the perspectives of masked prediction and contrastive learning. We also summarize the molecule pre-training models in Table 1. Among sequential models, FragNet [44] combines a masked language model with multi-view contrastive learning to maximize the mutual information within the same SMILES and the agreement across augmented SMILES. Beyond SMILES, more approaches [17, 29, 40, 49, 56, 71, 73] choose graph representations that better capture structural information. For example, Grover [40] integrates message passing and transformer architectures and pre-trains a super-large GNN on 10 million molecules. MICRO-Graph [71] and MGSSL [73] use motifs for contrastive learning. Incorporating domain knowledge, MoCL [49] uses substructure substitution as a new data augmentation operation and predicts pairwise fingerprint similarities. Although these pre-training methods show promising results, they do not consider 3D molecular conformations. To fill this gap, GraphMVP [31] and 3DInfomax [47] maximize the mutual information between 3D and 2D views of the same molecule and achieve further performance improvements. In addition, GEM [16] proposes a geometry-enhanced graph neural network and pre-trains it via geometric tasks. For pre-training individual proteins or molecules, these methods demonstrate promising potential on various downstream tasks but ignore pocket-ligand complementarity.

Cross Bio Pre-training. In parallel with our study, Uni-Mol [76], probably the first pre-trained model that can handle both protein pockets and molecules, was released as a preprint. However, it pre-trains pockets and ligands separately without considering their interactions, and our approach differs in pre-training data, pre-training strategy, model structure, and downstream tasks.

3 Methodology

3.1 Co-Supervised Pre-training Framework

We propose the co-supervised pre-training (CoSP) framework, as shown in Fig. 1, to explore the joint chemical space of protein pockets and ligands. The methodological innovations include:

  1. We propose the gated geometric message passing layer to model 3D pockets and ligands.

  2. We establish a co-supervised pre-training framework to learn pocket and ligand representations.

  3. We introduce ChemInfoNCE with improved negative sampling guided by chemical similarity.

  4. We evaluate the model on pocket matching, molecule property prediction, and virtual screening tasks.

3.2 Geometric Representation

We introduce the unified data representation and neural network for modeling 3D pockets and ligands. We use structures collected from the BioLip dataset [64] as pre-training data for developing the \(\text {CoSP}_{\text {base}}\) model. Further, we augment the pre-training data with the CrossDock dataset [18], resulting in the \(\text {CoSP}_{\text {large}}\) model. In downstream tasks where ligand conformations are not provided, we generate 3D conformations using MMFF [52] when it succeeds and fall back to 2D conformations otherwise.
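
As an illustration, such a conformer-generation fallback could be implemented with RDKit roughly as below; this is a minimal sketch, and the function name, random seed, and error handling are our assumptions rather than the paper's exact pipeline:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def get_conformation(smiles: str) -> Chem.Mol:
    """Return a molecule with 3D coordinates if MMFF succeeds, else 2D coords."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    # Try to embed and relax a 3D conformer with the MMFF94 force field.
    if AllChem.EmbedMolecule(mol, randomSeed=42) == 0 and \
       AllChem.MMFFOptimizeMolecule(mol) == 0:
        return Chem.RemoveHs(mol)      # success: keep the optimized 3D conformer
    mol = Chem.RemoveHs(mol)
    AllChem.Compute2DCoords(mol)       # failure: fall back to a 2D layout
    return mol
```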

Fig. 1.

Overview of CoSP. We contrast bound pocket-ligand pairs with unbound ones to learn complementarity-aware chemical embeddings. We extract positive pocket-ligand pairs (e) from protein-ligand complexes (d) and augment positive/negative relations of complexes via ligand similarity (f). We pre-train the model on the BioLip dataset (a), followed by finetuning (b) and evaluation (c) on different tasks.

Pocket and Ligand Graph. We represent a bio-object as a graph \(\mathcal {G}(X, \mathcal {V}, \mathcal {E})\), consisting of the coordinate matrix \(X \in \mathbb {R}^{n,3}\), node features \(\mathcal {V} \in \mathbb {R}^{n, d_f}\), and edge features \(\mathcal {E} \in \mathbb {R}^{n, d_e}\), where n, \(d_f\), and \(d_e\) denote the number of nodes, the node feature dimension, and the edge feature dimension. For pockets, the graph nodes are the amino acids within 10 \(\mathring{A}\) of the ligand, X contains the positions of the residues' \(C_{\alpha }\) atoms, and we construct \(\mathcal {E}\) via the k-NN algorithm. For molecules, the graph nodes are all ligand atoms except hydrogens, X contains the atom positions, and we use the molecular bonds as \(\mathcal {E}\).
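
As a sketch, the pocket graph construction could look as follows (NumPy-based; the value of k and the helper name are our illustrative assumptions, while the 10 Å cutoff follows the text):

```python
import numpy as np

def build_pocket_graph(ca_pos, ligand_pos, k=8, cutoff=10.0):
    """Keep residues whose C-alpha lies within `cutoff` Angstrom of any ligand
    atom, then connect each kept node to its k nearest neighbours (n > k)."""
    # Distance from every C-alpha to the closest ligand atom.
    d = np.linalg.norm(ca_pos[:, None, :] - ligand_pos[None, :, :], axis=-1)
    keep = np.where(d.min(axis=1) <= cutoff)[0]
    X = ca_pos[keep]                                    # node coordinates (n, 3)
    # k-NN edges over the retained pocket residues.
    pdist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(pdist, np.inf)
    nbrs = np.argsort(pdist, axis=1)[:, :k]             # (n, k) neighbour indices
    edges = [(i, j) for i in range(len(X)) for j in nbrs[i]]
    return X, edges
```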

Gated Geometric Message Passing. From layer t to \(t+1\), we use the gated geometric message passing (GGMP) layer to update the 3D graph representations, i.e., \([\boldsymbol{v}_i^{t+1}, \boldsymbol{x}_{i}^{t+1} , \boldsymbol{n}_{i}^{t+1}] = \textrm{GGMP}(\boldsymbol{v}_i^{t}, \boldsymbol{x}_{i}^{t}, \boldsymbol{n}_{i}^{t})\), where \(\boldsymbol{n}_{i}\) is the direction vector. For molecules, \(\boldsymbol{n}_{i}\) points away from the neighborhood center of node i; for pockets, \(\boldsymbol{n}_{i}\) indicates the position of pocket cavities. Given 3D conformations, we minimize the pairwise energy function E:

$$\begin{aligned} E(X, \mathcal {V}, \mathcal {E}) = \sum _{(i,j) \in \mathcal {E}} u(\boldsymbol{v}_i, \boldsymbol{v}_j, \boldsymbol{e}_{ij}) g(\langle \boldsymbol{n}_i, \boldsymbol{n}_j \rangle , d_{ij}^2) \end{aligned}$$
(1)

where \(d_{ij}^2 = ||\boldsymbol{x}_i-\boldsymbol{x}_j||^2\), both chemical energy \(u(\cdot )\) and geometric energy \(g(\cdot )\) are considered. By calculating the gradients of \(\boldsymbol{x}_i\) and \(\boldsymbol{n}_i\), we obtain their updating rules:

$$\begin{aligned} \begin{aligned} -\frac{\partial E(X, \mathcal {V}, \mathcal {E})}{\partial \boldsymbol{x}_i} = -\sum _{j \in \mathcal {N}_i} 2 u_{ij} \frac{\partial g_{ij} }{\partial d_{ij}^2} (\boldsymbol{x}_i-\boldsymbol{x}_j) \\ \approx \sum _{j \in \mathcal {N}_i} u(\boldsymbol{v}_i, \boldsymbol{v}_j, \boldsymbol{e}_{ij}) \phi _x(d_{ij}^2, \langle \boldsymbol{n}_{i}, \boldsymbol{n}_{j} \rangle ) (\boldsymbol{x}_{i}-\boldsymbol{x}_{j}) \end{aligned} \end{aligned}$$
(2)
$$\begin{aligned} \begin{aligned} -\frac{\partial E(X, \mathcal {V}, \mathcal {E})}{\partial \boldsymbol{n}_i} = -\sum _{j \in \mathcal {N}_i} u_{ij} \frac{\partial g_{ij} }{\partial \langle \boldsymbol{n}_i, \boldsymbol{n}_j \rangle } \boldsymbol{n}_j \\ \approx \sum _{j \in \mathcal {N}_i} u(\boldsymbol{v}_i, \boldsymbol{v}_j, \boldsymbol{e}_{ij}) \phi _n(d_{ij}^2, \langle \boldsymbol{n}_{i}, \boldsymbol{n}_{j} \rangle ) \boldsymbol{n}_{j} \end{aligned} \end{aligned}$$
(3)

Note that \(\phi _x\) and \(\phi _n\) are approximations of \(\frac{\partial g_{ij} }{\partial d_{ij}^2}\) and \(\frac{\partial g_{ij} }{\partial \langle \boldsymbol{n}_i, \boldsymbol{n}_j \rangle }\). Combining these updating rules with classical graph message passing, we propose the GGMP layer:

$$\begin{aligned}&\boldsymbol{m}_{ij} = \phi _{m}(\boldsymbol{v}_i^t, \boldsymbol{v}_j^t, \boldsymbol{e}_{ij} )\end{aligned}$$
(4)
$$\begin{aligned}&\boldsymbol{g}_{ij} = \phi _{g}(d_{ij}^2, \langle \boldsymbol{n}_{i}^{t}, \boldsymbol{n}_{j}^{t} \rangle )\end{aligned}$$
(5)
$$\begin{aligned}&\boldsymbol{v}_i^{t+1} = \phi _{h}(\boldsymbol{v}_i^t, \sum _{j \in \mathcal {N}_i} \boldsymbol{m}_{ij} \boldsymbol{g}_{ij})\end{aligned}$$
(6)
$$\begin{aligned}&\boldsymbol{x}_{i}^{t+1} = \boldsymbol{x}_{i}^{t} + \lambda \sum _{j\in \mathcal {N}_{i}} u(\boldsymbol{m}_{ij}) \phi _x(\boldsymbol{g}_{ij}) (\boldsymbol{x}_{i}^{t}-\boldsymbol{x}_{j}^{t}) \end{aligned}$$
(7)
$$\begin{aligned}&\boldsymbol{n}_{i}^{t+1} = \boldsymbol{n}_{i}^{t} + \lambda \sum _{j\in \mathcal {N}_{i}} u(\boldsymbol{m}_{ij}) \phi _n(\boldsymbol{g}_{ij}) \boldsymbol{n}_{j}^{t} \end{aligned}$$
(8)

where \(\phi _*\) and u are approximated by neural networks, \(\lambda \) is a hyperparameter, and \(\boldsymbol{n}^0_{i} = -\sum _{j\in \mathcal {N}(i)}{\boldsymbol{x}^0_j}/||\sum _{j\in \mathcal {N}(i)}{\boldsymbol{x}^0_j}||\).
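
To make the layer concrete, here is a minimal PyTorch sketch of one GGMP layer following Eqs. (4)-(8); it assumes a fixed neighbour list, a given edge feature dimension, and small MLPs for \(\phi _*\) and u, all of which are our illustrative choices rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn

class GGMPLayer(nn.Module):
    """Sketch of gated geometric message passing, Eqs. (4)-(8)."""
    def __init__(self, d, d_e=1, lam=0.1):
        super().__init__()
        self.phi_m = nn.Sequential(nn.Linear(2 * d + d_e, d), nn.SiLU())  # Eq. (4)
        self.phi_g = nn.Sequential(nn.Linear(2, d), nn.SiLU())            # Eq. (5)
        self.phi_h = nn.Sequential(nn.Linear(2 * d, d), nn.SiLU())        # Eq. (6)
        self.u = nn.Linear(d, 1)      # chemical gate u(m_ij)
        self.phi_x = nn.Linear(d, 1)  # scalar weight for the coordinate update
        self.phi_n = nn.Linear(d, 1)  # scalar weight for the direction update
        self.lam = lam

    def forward(self, v, x, n, e, nbr):
        # v: (N, d) node features, x: (N, 3) coordinates, n: (N, 3) directions,
        # e: (N, K, d_e) edge features, nbr: (N, K) neighbour indices.
        vj, xj, nj = v[nbr], x[nbr], n[nbr]
        d2 = ((x[:, None] - xj) ** 2).sum(-1, keepdim=True)     # squared distances
        dot = (n[:, None] * nj).sum(-1, keepdim=True)           # <n_i, n_j>
        m = self.phi_m(torch.cat([v[:, None].expand_as(vj), vj, e], -1))  # Eq. (4)
        g = self.phi_g(torch.cat([d2, dot], -1))                          # Eq. (5)
        v_new = self.phi_h(torch.cat([v, (m * g).sum(1)], -1))            # Eq. (6)
        w = self.u(m)                                           # gate per edge
        x_new = x + self.lam * (w * self.phi_x(g) * (x[:, None] - xj)).sum(1)  # Eq. (7)
        n_new = n + self.lam * (w * self.phi_n(g) * nj).sum(1)  # Eq. (8)
        return v_new, x_new, n_new
```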

3.3 Contrastive Loss

In contrastive learning, biased negative sampling impairs model performance because false negative data are sampled during training. Previous methods [9, 39] address this problem under the assumption that false negative samples are uniformly distributed in the classification setting. We propose chemical knowledge-based sampling to better address this issue, where fingerprint similarity is used to measure the probability that a ligand is negative. Interestingly, changing the sampling distribution is equivalent to designing a weighted loss, and we provide a comprehensive understanding from the perspective of contrastive losses.

Uni-contrastive Loss. Given a pocket \(x \sim p\), we draw positive ligands \(\hat{x}^+\) from the distribution \(\hat{p}_x^+\) of bindable molecules and negative ligands \(\{\hat{x}_i^-\}_{i=1}^N \) from the distribution \(\hat{q}\) of non-bindable ones. By default, positive ligands are determined by the pocket-ligand complexes, while negative ones are uniformly sampled from the ligand set. We use the pocket model f and the ligand model \(\hat{f}\) to learn the latent representations \(\boldsymbol{h}\), \(\hat{\boldsymbol{h}}^+\) and \(\{ \hat{\boldsymbol{h}}_i^- \}_{i=1}^N\), where the proxy task is to maximize the positive similarity \(s^+(\boldsymbol{h}, \hat{\boldsymbol{h}}^+)\) against the negative similarities \(s_{i}^-(\boldsymbol{h} ,\hat{\boldsymbol{h}}_i^-), i=1,\cdots ,N\), resulting in:

$$\begin{aligned} L_{\text {Uni}}&= \mathbb {E}_{x \sim p, \hat{x}^+ \sim \hat{p}_x^+, \atop \{\hat{x}_i^-\}_{i=1}^N \sim \hat{q}} \left[ \log {(1 + \frac{Q}{N} \sum _{i=1}^N \frac{s_{i}^-(\boldsymbol{h} ,\hat{\boldsymbol{h}}_i^-)}{s^+(\boldsymbol{h}, \hat{\boldsymbol{h}}^+)})} \right] \end{aligned}$$
(9)

where Q and N are constants. For each data sample x, the gradients with respect to \(s^+\) and \(s_i^-\) are:

$$\begin{aligned} \frac{\partial {L}}{\partial s^+}&= \frac{1}{1+\sum _{i=1}^N s_i^- / s^+} \sum _{i=1}^N {\frac{\partial s_i^-/s^+}{\partial s^+}}\end{aligned}$$
(10)
$$\begin{aligned} \frac{\partial {L}}{\partial s_i^-}&= \frac{1}{1+\sum _{i=1}^N s_i^- / s^+} \frac{\partial s_i^-/s^+}{\partial s_i^-} \end{aligned}$$
(11)

The \(L_{\text {Uni}}\) provides balanced gradients to positive and negative samples, i.e., \(\frac{\partial {L}}{\partial s^+} = \sum _i {\frac{\partial {L}}{\partial s_i^-}}\). One can verify that InfoNCE is a special case of \(L_{\text {Uni}}\) obtained by setting \(s^+(\boldsymbol{h}, \hat{\boldsymbol{h}}^+) = e^{\gamma \boldsymbol{h}^T \hat{\boldsymbol{h}}^+}\) and \(s_{i}^-(\boldsymbol{h} ,\hat{\boldsymbol{h}}_i^-) = e^{\gamma \boldsymbol{h}^T \hat{\boldsymbol{h}}_i^-}\).
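
A minimal PyTorch sketch of Eq. (9) with these exponential similarities (the batch layout, function name, and default constants are our assumptions):

```python
import torch

def uni_contrastive_loss(h, h_pos, h_neg, gamma=1.0, Q=1.0):
    """L_Uni of Eq. (9) with exponential similarities, i.e. InfoNCE.
    h: (B, d) pocket embeddings, h_pos: (B, d) bound-ligand embeddings,
    h_neg: (B, N, d) embeddings of sampled negative ligands."""
    s_pos = torch.exp(gamma * (h * h_pos).sum(-1))                   # (B,)
    s_neg = torch.exp(gamma * torch.einsum('bd,bnd->bn', h, h_neg))  # (B, N)
    N = h_neg.shape[1]
    ratio = (Q / N) * (s_neg / s_pos[:, None]).sum(-1)  # (Q/N) * sum_i s_i^- / s^+
    return torch.log1p(ratio).mean()
```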

DebiasedInfoNCE. Uniformly sampling negative ligands from the data distribution \(\hat{q}\) could mistake positive samples for negative ones. Denoting \(h(\cdot )\) as the labeling function, [9] suggests drawing negative samples from the true negative distribution \(\hat{q}_x^-(\hat{x}^-) = p(\hat{x}^-|h(\hat{x}^-) \ne h(x))\). To handle the event \(\{ h(\hat{x}^-) \ne h(x) \}\), the joint distribution \(p(\hat{x},c)=p(\hat{x}|c)p(c)\) over data \(\hat{x}\) and label c is considered. Assuming the class probability \(p(c)=\tau ^+\) is uniform and letting \(\tau ^-=1-\tau ^+\) be the probability of observing any different class, \(\hat{q}\) can be decomposed as \(\tau ^- \hat{q}_x^-(\hat{x}^-) + \tau ^+ \hat{q}_x^+(\hat{x}^-)\). Therefore, \(\hat{q}_x^- = (\hat{q} - \tau ^+ \hat{q}_x^+)/\tau ^-\), and the DebiasedInfoNCE is:

$$\begin{aligned} L_{\text {Debiased}}&= \mathbb {E}_{x \sim p, \hat{x}^+ \sim \hat{p}_x^+, \atop \{\hat{x}_i^-\}_{i=1}^N \sim \hat{q}_x^-} \left[ \log {(1 + \frac{Q}{N} \sum _{i=1}^N \frac{s_{i}^-(\boldsymbol{h} ,\hat{\boldsymbol{h}}_i^-)}{s^+(\boldsymbol{h}, \hat{\boldsymbol{h}}^+)})} \right] \end{aligned}$$
(12)

where \(s^+(\boldsymbol{h}, \hat{\boldsymbol{h}}^+)=e^{\boldsymbol{h}^T \hat{\boldsymbol{h}}^+}\), \(s_{i}^-(\boldsymbol{h} ,\hat{\boldsymbol{h}}_i^-)=e^{\boldsymbol{h}^T \hat{\boldsymbol{h}}_i^-}\). With mild assumptions, the approximated DebiasedInfoNCE can be written as:

$$\begin{aligned} \mathbb {E}_{x \sim p, \hat{x}^+ \sim p_x^+, \atop \{\hat{x}_i^-\}_{i=1}^N \sim \hat{q}} \left[ \log { (1+ \frac{Q}{\tau ^-} \sum _{i=1}^{N} (e^{\boldsymbol{h}^T \hat{\boldsymbol{h}}_i^- - \boldsymbol{h}^T \hat{\boldsymbol{h}}^+} - \tau ^+) ) }\right] \end{aligned}$$
(13)

ChemInfoNCE. Although DebiasedInfoNCE alleviates sampling bias to some extent, it has some shortcomings. First, for classification with discrete labels, the assumption of uniform class probabilities may be too strong, especially for unbalanced datasets. Second, in regression, molecules have continuous chemical properties, and the event \(\{ h(\hat{x}) \ne h(\hat{x}^-) \}\) cannot describe the validity of negative data. To address these issues, we introduce a new event \(\{ \text {sim}(\hat{x},\hat{x}^-)<\tau \}\) to measure the validity of negative samples, where \(\text {sim}(\cdot , \cdot )\) is a chemical similarity function. The underlying assumption is that molecules with lower chemical similarity to the reference ligand are more likely to be true negative samples.

$$\begin{aligned} \begin{aligned} q_x^-(\hat{x}^-)&:= q(\hat{x}^-| \text {sim}(\hat{x}, \hat{x}^-) < \tau ) \\&\propto \max (1-\text {sim}(\hat{x},\hat{x}^-)-\tau ,0 ) \cdot p(\hat{x}^-) \end{aligned} \end{aligned}$$
(14)

By denoting \(w_i = \max (1-\text {sim}(\hat{x},\hat{x}_i^-)-\tau ,0 )\), the final ChemInfoNCE can be simplified as:

$$\begin{aligned} L_{\text {Chem}}&\approx \mathbb {E}_{x \sim p, \hat{x}^+ \sim p_x^+, \atop \{\hat{x}_i^-\}_{i=1}^N \sim \hat{q}} \left[ \log { (1+ \sum _{i=1}^{N} ( \rho _i e^{\boldsymbol{h}^T \hat{\boldsymbol{h}}_i^- - \boldsymbol{h}^T \hat{\boldsymbol{h}}^+} ) ) }\right] \end{aligned}$$
(15)

where \(\rho _i = \frac{w_i}{\sum _{i=1}^N w_i}\).
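
A minimal PyTorch sketch of Eq. (15), assuming the chemical similarities between the positive ligand and each negative candidate have been precomputed (the function name and batch layout are ours):

```python
import torch

def chem_infonce(h, h_pos, h_neg, sim, tau=0.1):
    """ChemInfoNCE, Eq. (15). h: (B, d) pockets, h_pos: (B, d) positives,
    h_neg: (B, N, d) negatives, sim: (B, N) chemical similarity between the
    positive ligand and each negative candidate, tau: similarity threshold."""
    w = torch.clamp(1.0 - sim - tau, min=0.0)              # w_i of Eq. (14)
    rho = w / w.sum(-1, keepdim=True).clamp(min=1e-8)      # normalised weights
    # h^T h_i^- - h^T h^+ for every negative in the batch.
    logits = torch.einsum('bd,bnd->bn', h, h_neg) - (h * h_pos).sum(-1, keepdim=True)
    return torch.log1p((rho * logits.exp()).sum(-1)).mean()
```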

Table 2. Molecule property prediction. We compare different methods across 9 benchmarks. The best and second-best results are highlighted in bold and underlined, respectively.

4 Experiments

In this section, we conduct extensive experiments to verify the effectiveness of the proposed method from three perspectives:

  1. Ligand: Can the ligand model provide competitive results in predicting molecular properties?

  2. Pocket: How does the pre-trained pocket model perform on pocket matching tasks?

  3. Pocket-ligand: Can the joint model find potential binding pocket-ligand pairs, i.e., perform virtual screening?

4.1 Pre-training Setup

Pre-training Dataset. We adopt the BioLip [64] dataset for pre-training \(\text {CoSP}_\text {base}\); the original BioLip contains 573,225 entries as of 2022.04.01. Compared to PDBBind [54] with 23,496 complexes, BioLip contains many more complexes that lack binding affinity labels and thus provides a more comprehensive view for binding mode analysis. To focus on drug-like molecules and their binding pockets, we filter out unrelated complexes containing peptides, DNA, RNA, single ions, etc. In addition, we augment the pre-training data with the CrossDock dataset [18] to develop \(\text {CoSP}_\text {large}\).

Experimental Setting. We pre-train \(\text {CoSP}_\text {base}\) with 6 GGMP layers via the ChemInfoNCE loss, where the hidden feature dimension is 128. We train the model for 50 epochs using the Adam optimizer on NVIDIA A100s, with an initial learning rate of 0.01 and a batch size of 100. The chemical ligand similarity is calculated by RDKit [28]. To achieve better performance, \(\text {CoSP}_\text {large}\) extends the GNN from 6 to 12 layers, increases the hidden dimension from 128 to 1024, and uses the augmented dataset (BioLip+CrossDock).
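
For illustration, the ligand similarity could be computed with RDKit roughly as below; Morgan fingerprints with radius 2 and Tanimoto similarity are our assumed choices, as the paper only states that RDKit is used:

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def ligand_similarity(smiles_a: str, smiles_b: str) -> float:
    """Tanimoto similarity between Morgan fingerprints of two ligands."""
    fp = lambda s: AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(s), radius=2, nBits=2048)
    return DataStructs.TanimotoSimilarity(fp(smiles_a), fp(smiles_b))
```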

4.2 Downstream Task 1: Molecule Property Prediction

Experimental Setup. Can the model learn expressive features for molecule classification and regression tasks? We evaluate CoSP on 9 benchmarks collected in MoleculeNet [61]. Following previous research, we use scaffold splitting to generate train/validation/test sets with a ratio of 8:1:1. We report AUC-ROC for classification tasks and RMSE for regression tasks. The means and standard deviations of results over three random seeds are reported by default. We finetune the model using code similar to that of MGSSL [73].
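
Scaffold splitting groups molecules by their Bemis-Murcko scaffolds so that test-set scaffolds are unseen during training. A minimal sketch with RDKit follows; the greedy largest-group-first assignment is one common heuristic, not necessarily the paper's exact procedure:

```python
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    """Greedy scaffold split: whole scaffold groups are assigned to one set."""
    groups = defaultdict(list)
    for i, smi in enumerate(smiles_list):
        groups[MurckoScaffold.MurckoScaffoldSmiles(smi)].append(i)
    n = len(smiles_list)
    train, valid, test = [], [], []
    # Assign the largest scaffold groups first so no scaffold spans two sets.
    for idx in sorted(groups.values(), key=len, reverse=True):
        if len(train) + len(idx) <= frac_train * n:
            train += idx
        elif len(valid) + len(idx) <= frac_valid * n:
            valid += idx
        else:
            test += idx
    return train, valid, test
```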

Baselines. We evaluate CoSP against a broad set of baselines, including D-MPNN [65], Attentive FP [63], \(\text {N-Gram}_{\text {RF}}\), \(\text {N-Gram}_{\text {XGB}}\) [30], MolCLR [56], PretrainGNN [23], GraphMVP-G, GraphMVP-C [32], 3DInfomax [47], MICRO-graph [71], \(\text {GROVER}_{\text {base}}\), \(\text {GROVER}_{\text {large}}\) [40], GEM [16], and Uni-Mol [76]. Most of these baselines are pre-training methods, except for \(\text {N-Gram}_{\text {RF}}\) and \(\text {N-Gram}_{\text {XGB}}\). Some methods mentioned in the related work are not included because their experimental setup, e.g., data splitting, differs from ours.

Results and Analysis. We show the results in Table 2. The main observations are: (1) \(\text {CoSP}_\text {large}\) achieves the best results on 4/9 downstream tasks and top-3 results on 9/9 downstream tasks. (2) Pre-training techniques help improve the model's generalization ability, and the model learns expressive molecular features via co-supervised pre-training. By extending the model size and pre-training data volume, \(\text {CoSP}_\text {large}\) achieves non-trivial performance gains over \(\text {CoSP}_\text {base}\). (3) Through ablation studies, we further verify the superiority of ChemInfoNCE over DebiasedInfoNCE, as it achieves consistent performance gains on various datasets.

Table 3. Pocket matching results. We compare different methods across 10 benchmarks.

4.3 Downstream Task 2: Pocket Matching

Experimental Setup. Can the pre-trained model identify chemically similar pockets? We explore the discriminative ability of the pocket model on pocket matching tasks. To comprehensively understand the potential of the proposed method, we evaluate it on 10 benchmarks recently collected in the ProSPECCTs dataset [15]. For each sub-dataset, the positive and negative pairs of pockets are defined differently according to the research objectives. We summarize five research objectives: O1: Is the model robust to the pocket definition? O2: Is the model robust to pocket flexibility? O3: Can the model distinguish between pockets with different properties? O4: Can the model distinguish dissimilar proteins binding to identical ligands and cofactors? O5: How does the model perform in real-world applications? We report AUC-ROC scores on all benchmarks.

Baselines. We compare CoSP with both classical and deep learning baselines. The classical methods can be divided into profile-based, graph-based, and grid-based approaches. Profile-based methods encode topological, physicochemical, and statistical properties in a unified way for comparing pockets, e.g., SiteAlign [42], RAPMAD [27], TIFP [13], FuzCav [58], PocketMatch [69], SMAP [62], TM-align [72], KRIPO [60], and Grim [13]. Graph-based methods adopt isomorphism detection algorithms to find common motifs, e.g., Cavbase [43], IsoMIF [8], and ProBiS [26]. Grid-based methods represent pockets by regularly spaced pharmacophoric grid points, e.g., VolSite/Shaper [12]. Other tools include SiteEngines [45] and SiteHopper [3]. We also compare with a recent deep learning model, DeeplyTough [46].

Results and Analysis. We present the pocket matching results in Table 3, where the pre-trained model achieves competitive results in most cases. Specifically, CoSP is robust to the pocket definition (O1) and achieves the highest AUC scores on D1 and D1.2. The robustness also holds under conformational variability (O2), where \(\text {CoSP}_\text {large}\) achieves a 1.00 AUC score on D2. It should be noted that robustness to homogeneous pockets does not mean that the model lacks discrimination; on the contrary, the model can identify pockets with different physicochemical and shape properties (O3) on D3 and D4. Compared with the previous deep learning method (DeeplyTough), \(\text {CoSP}_\text {large}\) performs better at distinguishing different pockets bound to the same ligands and cofactors (O4); refer to the results on D5, D5.2, D6, and D6.2. Last but not least, \(\text {CoSP}_\text {large}\) shows good potential for practical applications (O5) with a 0.90 AUC score. In addition, we find that the pocket direction plays a key role in extracting pocket features, as it helps indicate the location of the pocket cavity. As shown in Table 3, pocket matching performance degrades if the directional feature \(\boldsymbol{n}\) is removed.

Table 4. Virtual screening results on DUD-E.

4.4 Downstream Task 3: Virtual Screening

Experimental Setup. Can the model distinguish the molecules most likely to bind to a given pocket? We evaluate CoSP on the DUD-E [34] dataset, which consists of 102 targets across different protein families. For each target, DUD-E provides 224 actives (positive examples) and 10,000 decoy ligands (negative examples) on average. The decoys are chosen to be physically similar but topologically different from the actives. During finetuning, we use the same data splitting as GraphCNN [51] and report AUC-ROC and ROC enrichment (RE) scores. Note that \(x\% \text {RE}\) denotes the ratio of the true positive rate (TPR) to the false positive rate (FPR) at \(x\%\) FPR.
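
For reference, the RE score can be computed directly from the ROC curve as sketched below (scikit-learn based; interpolating the TPR at the requested FPR is our choice):

```python
import numpy as np
from sklearn.metrics import roc_curve

def roc_enrichment(y_true, y_score, fpr_level=0.01):
    """RE at a given FPR: TPR(fpr_level) / fpr_level, e.g. fpr_level=0.01 for 1% RE."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    tpr_at = np.interp(fpr_level, fpr, tpr)  # TPR interpolated at the requested FPR
    return tpr_at / fpr_level
```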

Baselines. We compare \(\text {CoSP}_{\text {large}}\) with AutoDock Vina [53], RF-score [2], NNScore [14], 3DCNN [38], GraphCNN [51], DrugVQA [75], GanDTi [55], and AttentionSiteDTI [68]. AutoDock Vina is a commonly used open-source program for molecular docking. RF-score uses random forests to capture protein-ligand binding effects. The other methods use deep learning models to learn protein-ligand binding.

Fig. 2.

Two examples of virtual screening. For each pocket, we show two ligands among the top 1% active molecules identified by the model. We use AutoDock Vina to generate the molecular binding pose and compute the affinity score.

Results and Analysis. We present the results in Table 4 and observe that: (1) The random forest-based RF-score and the MLP-based NNScore achieve results competitive with Vina, indicating the potential of machine learning for virtual screening. (2) The deep learning-based GraphCNN, 3DCNN, DrugVQA, GanDTi, and AttentionSiteDTI significantly outperform both RF-score and NNScore. (3) \(\text {CoSP}_\text {large}\) achieves a competitive AUC score and outperforms all baselines in RE scores. The improvement of \(\text {CoSP}_\text {large}\) suggests that the model can effectively learn protein-ligand interactions from the pre-training data. (4) In addition, we select the top 1% of ligands identified by the model as actives for a given pocket and use AutoDock Vina to validate the docking results. In Fig. 2, the visual results show that our model can identify high-affinity ligands, which is helpful for drug discovery.

5 Conclusion

This paper proposes a co-supervised pre-training framework to learn the joint pocket and ligand space via a chemically inspired contrastive loss. The pre-trained model achieves competitive results on molecule property prediction, pocket matching, and virtual screening. We hope this unified modeling framework can further advance the development of AI-guided drug discovery.