Keywords

1 Introduction

One of the critical requirements for advancing the field of drug discovery and development is identifying potentially harmful interactions between two or more drugs when they are co-administered to a patient. The challenge is to discover a drug-drug interaction (DDI) before the drug is approved to be used for therapeutic cure of a certain disease or a symptom [11]. Computational predictive methods are seen as a faster and safer alternative to the time-consuming and laborious process of identifying interactions between drugs. However, the performance of the methods experimented in the past is typically strongly affected by the quality of manually-engineered features. Various features that require domain-expert knowledge have been experimented with, from the molecular structure of a drug to its side effects.

The development of feasible computational methods to predict unknown association between two drugs have received a heightened attention in recent years. While early DDI prediction methods have relied upon detailed information on the structural properties of the drugs and their targets i.e. proteins, most applications and research studies today rely on extracting information from publicly available repositories i.e. drug-drug interaction networks. Graph representation learning approaches based on random walk and deep learning in DDIs networks have emerged as a new trend. The methods evaluated in this paper fall under this category. The high-dimensional dense matrix of graph data are usually mapped to a low-dimensional vector space using some of the techniques for graph embeddings.

In the last decade, graph-based approaches have achieved unprecedented improvements in performance across a broad spectrum of problems ranging from recommender systems to learning molecular fingerprints. The interest graph-based analysis has attracted in the research on drug discovery have motivated our research in the direction of extracting knowledge from drug-drug and drug-food interaction networks to facilitate the discovery of new interaction that might have unwanted effects on patients.

What early research work in the field has in common is that information from a variety of sources were exploited for discovering new interactions between two drugs. In particular, similarity metrics based on local and global network structures in drug-drug interaction networks [10, 14, 20, 21], chemical molecular drug compounds [12], drug targets, drug’s side effects and other features have been frequently explored [22].

While similarity-based approaches [12, 14, 20, 21] are still in use as baseline methods, representation learning on graphs, that is finding low-dimensional embeddings of nodes that preserve the similarity of nodes and structural properties of entire graphs are hailed as state-of-the art approaches to link prediction. In general, graph embedding techniques belong to three categories: factorization methods, random walk based [10] and deep learning based [11] methods. Identifying the type of a new computationally discovered drug-drug interaction usually accompanies or complements the task of DDI prediction. Knowing the interaction type of a DDI has a potential to enhance our knowledge and understanding of drug interactions, especially those DDI that might cause adverse drug reactions [11,12,13].

The objective of this research was two-fold: 1) to investigate the potential of several methods for predicting hypothetical i.e. unknown drug interactions, and 2) to explore different classification methods for identifying the type a given drug interaction belongs to. We focus our study pertaining to the first objective by formulating the problem of discovering new drug interactions as a link prediction. Link prediction refers to the task of predicting likely but missing links in a graph; for the problem at hand we use the graph that encodes information about previously confirmed interactions between drugs. Regarding our second research objective i.e. determining the type of a newly discovered drug-drug interaction, chemical fingerprint of the pairs involved are utilized.

Creation of benchmarking repositories containing previously confirmed drug-drug interactions, such as DrugBankFootnote 1 gold standard DDI dataset established by Wishart et Al [19] gold standard DDI dataset, SemMedDBFootnote 2, and BioSNAPFootnote 3 have played an important role in advancing the field. A subset of 1024 drugs and their 70,000 interactions generated by a previous computational framework known as DeepDDI [12] have been targeted in our extensive experimentation. The chemical structure in SMILES format was extracted from the DrugBank.

After a brief summary of related research presented in Sect. 2, we introduce the datasets and present the details of the adopted methodology in Sect. 3. The findings and the interpretation of the results are presented in Sect. 4, while the last section concludes the paper and points to direction for future research.

2 Related Work

A review of current research on the topic of computational discovery of new DDIs have revealed common themes and differences, however the studies closely related to ours are discussed.

Drug-drug prediction has been previously approached as a link prediction problem using ensemble-based classifier and two novel matrix factorization methods as proposed by Shtar et al. in [14]. In the study, the authors have used the previous releases of the DrugBank dataset for training their predictive model that was subsequently tested on newer DataBank releases containing more drugs and interactions between drug pairs. Several similarity measures, such as Adamic-Adar, Jaccardi and Katz have been tested.

Several graph embeddings, including DeepWalk, node2vec, LINE and metapath2vec were trained on a drug-target interaction network and later evaluated on several benchmark datasets for DDI prediction [11]. The findings of the study showed that the graph auto-encoder outperformed other classifiers for learning graph embeddings.

A computational framework DeepDDI that uses deep neural network for predicting drug reactions based on the names and structural properties of a pair of drugs is proposed in [12]. Previously confirmed drug reactions between drugs contained in the gold standard drug-drug interaction dataset, DrugBank [19] has been used as a basis for the prediction. Chemical structural information has been represented as a molecular fingerprint of the drug in a simplified SMILES format. The DeepDDI model predicts the type of a new DDI that could belong to one of the 86 interaction types with 92.4% mean accuracy. The output of the model is a human-readable sentence that specifies the pharmacological effects and/or adverse drug reactions between two drugs. The same predictive model DeepDDI has been used for predicting the type of interactions between a pair of drug and food constituent [12]. While similar in the objective to ours, our study differs in the methods used for prediction and classification of new drug-drug interactions and their type.

Many of the early studies on drug-drug and drug-target prediction task, suffered from the fundamental weakness of taking into account only the connectivity between pair of drugs without any regard for the global structural properties of the underlying network. Lee and Nam have proposed employing random walk with restart (RwR) to predict interactions between drugs and targets on the basis of the protein-protein interactome network [10]. The authors have pointed out that reweighting features by the RwR method has led to performance advantage compared to previous research methodologies.

Inspired by previous successes of using GraphSAGE for inductive learning of node representation [7] on similar tasks [5, 6], we have also experimented with its applicability on our dataset comprised of DDIs augmented with molecular structural information on drug pairs.

DECAGON was proposed in [22] as a novel graph convolutional network for predicting polypharmacy side effects that can be used on large multimodal graphs that include various types of information on drugs, proteins and side effects, which are relevant for the task at hand. We are currently extending the information on which the new DDI prediction will rely, by incorporating and fusing information from a variety of sources i.e. existing repositories and knowledge graphs.

The approaches tested in this research have drawn upon the experiences of previous research on related problems and similar datasets.

3 Datasets

The first dataset was based on the network generated by a previous computational framework known as DeepDDI [12]. The output of DeepDDI is a human-readable sentence that describes the drug interaction between a pair of drugs if exists plus the number indicating the interaction type. Previously confirmed drug reactions between drugs contained in the gold standard drug-drug interaction dataset. DrugBank [19] has been used as a basis for the prediction. For our first objective of predicting new interactions between drugs, we have created a dataset that contains 70,000 DDIs generated by the DeepDDI model.

For each drug, its molecular structure has been converted into a molecular fingerprint that represents a one hot encoding vector. The binary value for each element represents the presence or absence of a particular substructure in a molecule. Graph Only Fingerprint (GraphFP), a specialized version of molecular fingerprint that uses 1024 bits to encode the structure of a drug without chemical bond information, has been used for generating our second dataset. The PaDEL-descriptor software was used to create the molecular fingerprints. This software uses the open source Chemistry Development Kit (CDK) [18] to represent the chemical concepts into suitable data structures that can be further manipulated. For each DDI interaction pair, the molecular fingerprint encodings of both drugs in a pair have been concatenated. By doing so, every interaction between a pair of drugs was represented by a vector of 2048 features. In addition, the interaction type was included in the molecular fingerprint feature vector. Graph fingerprints were retrieved in a simplified molecular input line entry system (SMILES) format from the DrugBank site, while the missing molecular structures were manually extracted from the PubChem siteFootnote 4. SMILES format represents the chemical structure in a machine-readable form as a concatenated list of symbols for atomic elements and symbols for the type of bond between them.

4 Methodology

The task of predicting new and unknown drug-drug interactions has been defined as a link prediction on the graph representing the information on previously identified drug interactions. The nodes in the graph represent drugs and the edges between them denote their known interactions. An edge between two nodes was created based on the information in our first dataset containing 1123 drugs and roughly 70,000 interactions between drug pairs. The dataset was split into a training and a testing dataset; 10% of the edges representing existing interactions have been removed to serve as positive samples. An equal number of negative samples were randomly chosen from the set of non-existent edges.

Several approaches have been explored, including similarity-based, random walk-based and deep graph convolutional link prediction methods.

4.1 Similarity Based Link Prediction Methods

The basic formulation of the similarity-based link prediction task between a pair of nodes u and v, uv \(\epsilon \) V, in a graph G, is given by a score S(uv). The underlying assumption according to [2], is that a pair of nodes with higher similarity scores have more chances to be connected i.e., an edge between them exists, compared to the ones with lower values. The existing nodes and edges in a graph, in our case the drug-drug interaction network, are used to calculate the similarity scores of the nodes and predict the possibility of existence of new links that are more likely to happen than others.

We have evaluated three similarity-based methods, namely Jaccard coefficient, Adamic-Adar index and Preferential attachment score that have been widely used in previous research on a number of tasks [2]. Similarity metrics, Jaccard and Adamic-Adar use the similarity of the local neighborhood of a pair of nodes to infer the probability of them being connected by an edge. In contrast, the metrics such as the Preferential attachment score use the global structural properties of the network by assigning similarity scores to each node.

Jaccard coefficient between two nodes u and v is calculated as the ratio of the number of the neighboring nodes they share compared to the total number of their neighboring nodes. Given two nodes u, v \(\epsilon \) V, the Jaccard coefficient is calculated by the following formula:

$$\begin{aligned} S^{Jaccard}(u,v) = \frac{|\varGamma (v) \cap \varGamma (u) |}{\varGamma (v) \cup \varGamma (u)} \end{aligned}$$
(1)

where \(\varGamma |u|\) and \(\varGamma |v|\) represent the sets of neighbours of nodes u and v, respectively.

Adamic-Adar (AA) index is a topology-based similarity measure, initially proposed to predict connections in social networks based on the degree centrality of the common neighbors two nodes share. It is calculated as the sum of the inverse degree centrality of two nodes, with an implication that the more friends two nodes have in common, the lower the AA index value will be. The formula for calculating the AA index between two nodes u and v follows [1]:

$$\begin{aligned} S^{AA}(u,v) = \sum _{w \epsilon \varGamma (v) \cap \varGamma (u)}^{} \frac{1}{log|\varGamma (w)|} \end{aligned}$$
(2)

where, \(\varGamma |w|\) is the set of adjacent nodes of w, which is a common neighbour of the node u and node v.

The underlying premise of the preferential attachment score algorithm is that the more connected a node is, the more likely the node will establish new links to other nodes [3]. The similarity score, based on the preferential attachment algorithm takes into account the degree of both nodes in the pair under investigation, as shown in formula 3. The computational complexity of calculating the scores on a global scale for a given network is the major drawback of this similarity-based measure.

$$\begin{aligned} |\varGamma (u)|*|\varGamma (v)| \end{aligned}$$
(3)

4.2 Random Walk-Based Link Prediction

Network-based methods for link prediction, such as random walks are based on establishing a suitable ranking algorithm for graph entities that is used for prediction of new links. Random walk with restart (RwR) is one of the most popular network propagation algorithm that has shown to be particularly applicable to biomedical problems involving graph-based knowledge [2, 15].

Given a connected weighted graph G(V, E) with a set of nodes V = \(v_1\), \(v_2\), ..., \(v_N\) and a set of links E = \(\{\)(\(v_i\), \(v_j\)) | \(v_i\), \(v_j\) \(\in \) V\(\}\), a set of source/seed nodes S \(\subseteq \) V and a N x N adjacency matrix W of link weights are defined. Random walk with restart is a variation of the original random walk algorithm [16] that assumes a random walker moves from a current node to a randomly selected adjacent node or going back to a seed node source nodes with a back-probability \(\gamma \)(0, 1). The RwR algorithm can be formally defined using the following notations [9]:

$$\begin{aligned} p^{t+1} = (1-\gamma )W^{t}p^{t} + \gamma p^{0} \end{aligned}$$
(4)

\(p^{t}\) is a \(N x 1 \) vector of probabilities for each of the |V| nodes at a time t, where the i-th element denotes the probability that the walker is currently at node \(v_{i}\) \(\in \) V, and \(p^{0}\) is the \(N x 1 \) initial vector of probabilities defined as follows [9]:

$$\begin{aligned} p^{0}=\left\{ \begin{array}{ll} \frac{1}{|S|} , &{} \text{ if } v_{i} \epsilon S.\\ 0, &{} \text{ otherwise }. \end{array} \right. \end{aligned}$$
(5)

A graph transition matrix \(W^{'}\)is defined as follows [9]:

$$\begin{aligned} (W^{'})_{ij} = \frac{(W)_{ij}}{\sum _{j}^{} (W)_{ij}} \end{aligned}$$
(6)

where each element in \(W^{'}\) denotes a probability that a walker at \(v_{i}\) moves to \(v_{j}\) among V\(\backslash \){\(v_i\)}.

4.3 Deep Learning Link Prediction Methods

Motivated by previous success of deep neural networks, in particular Graph Auto-Encoder presented in [8], for learning node and graph representation, we have also evaluated two deep learning approaches on the task of predicting new drug interactions.

Graph Auto-Encoder consists of graph convolutional encoder and decoder networks, that learn in an unsupervised manner the low-dimensional representation of node embeddings. The output of the decoder reconstructs the adjacency matrix of a given graph, G.

Let \(a_{i}\) \(\in \) R\(^{N}\) be an adjacency vector i.e., a row in an adjacency matrix A of a graph G, which represents the local neighborhood of the i-th node. A four-layer auto-encoder architecture we have used to learn the node embeddings from local neighborhood structures is known under the name of LoNGAE (Local Neighborhood Graph Auto-encoder) [17]. The LoNGAE neural model includes a set of non-linear transformations on \(a_i\) by its two components: an encoder g(\(a_{i}\)): R\(^{N}\) \(\rightarrow \) R\(^{D}\) , and decoder f(\(z_{i}\)): R\(^{D}\) \(\rightarrow \) R\(^{N}\). A two-stacked-layers encoder derives a D-dimensional latent feature representation of the i-th node \(z_{i}\) \(\in \) R\(^{D}\), while the two-stacked-layers decoder outputs an approximate reconstruction of the vector, \(\hat{a_{i}}\) \(\in \) R\(^{N}\). The reconstructed output \(\hat{a}\) represents the likelihood score for existence of an edge between two nodes. The hidden distributions for the encoder and decoder are given with the following [17]:

$$\begin{aligned} z_{i} = g(a_i) = ReLU(W \cdot ReLU (Va_i + b^{(1)}) + b^{(2)}) - Encoder Part, \end{aligned}$$
(7)
$$\begin{aligned} \hat{a_i} = f(z_i) = V^{T} \cdot ReLU(W^{T}z_i + b^{(3)}) + b^{(4)} - Decoder Part, \end{aligned}$$
(8)
$$\begin{aligned} \hat{a_i} = h(a_i) = f(g(a_i)) - Auto-encoder. \end{aligned}$$
(9)

The activation function used in our model implementation was the rectified linear unit, ReLU(x) = max(0, x) . The auto-encoder is constrained to be symmetrical with shared parameters for W, V between the encoder and decoder, resulting in almost two times fewer parameters compared to an unconstrained architecture. Note that the bias units denoted by b do not share parameters. \(\{W^{T} , V^{T}\}\) are transposed versions of \(\{W, V\}\). The parameters to be learned are summarized as \(\theta \) = \(\{W , V, b^{(k)}\}\), where k = 1, 2, 3, 4.

A graph-convolutional network (GCN) has been used for learning node representations, for which the initial graph of DDIs was augmented with additional node and edge features i.e. molecular fingerprint of each drug pair and the interaction type accordingly. A variation of Graph Convolutional Networks, very similar to the implementation of GraphSAGE framework [7] was used for the task of predicting drug-drug interactions in [6]. GraphSAGE is framework that has proved to be effective for training on large scale networks to inductively learn the embeddings by its sample and aggregate approach. Sampling refers to the process of drawing node’s neighborhoods of a certain depth k, while an set of aggregator functions incrementally, with each iteration, aggregate features from node’s neighborhoods. The aggregator functions, denoted as \(AGGREGATE_k\), \(\forall \)k \( \epsilon \{1 ,\dots , K\}\) are jointly learned with the weight matrices\(W_k\), \(\forall \)k \( \epsilon \{1, ..., K\}\) by propagating information to node’s neighbors at various depth. The embeddings generation process is described by the following equations [7]:

$$\begin{aligned} h^k_{N(v)}\leftarrow AGGREGATE_k(\{h^{k-1}_u,\forall u\in N(v)\}) \end{aligned}$$
(10)
$$\begin{aligned} h^k_v\leftarrow \sigma \Big ( W^k \cdot CONCAT(h^{k-1}_v, h^{k}_{N(v)}) \Big ) \end{aligned}$$
(11)

where \(h^k\) is the node representation at depth k, vu \(\in \) V, \(N: v \rightarrow 2^{n}\) is a neighbourhood function, \(\sigma \) denotes the non-linearity.

Standard stochastic gradient descent and backpropagation techniques are used for learning in a fully unsupervised manner. The graph based loss function is given by the following formula [7]:

$$\begin{aligned} J_G(z_u) = -log(\sigma (z_u^T z_v)) - Q \cdot E_{v_n \sim P_n(v)} log(\sigma (z_u^T z_{v_n})), \end{aligned}$$
(12)

where v is a node that appears near node u on fixed-length random walk, \(\sigma \) is the sigmoid function, \(P_n\) is a negative sampling distribution, and Q defines the number of negative samples.

There is no ordering of nodes in a graph, so the aggregation function needs to be symmetric. Three aggregator functions have been examined in the original proposal: mean aggregator, LSTM-based (Long Short Term Memory) and pooling aggregator function, with pooling aggregator showing the best results [7].

Our GraphSAGE model was consisted of two GraphSAGE layers, and was implemented using Mean aggregator, as set by default. The embeddings of the nodes within a drug pair were combined by inner product function in order to obtain the final prediction whether a link between them is predicted to exist or not.

4.4 Drug-Drug Interaction Type Classification

The second task we have posed in this research was to identify the interaction type of a newly discovered drug interaction, for which the concatenated feature vectors consisted of molecular structural information have been used. Three methods we have chosen to tackle the second problem of determining the interaction type of a new DDI belong to both, traditional machine learning and deep learning approaches. The traditional classifiers we have chosen as suitable for the task were: k-nearest neighbors (KNN) classifier and an ensemble classifier. The rationale for the first relates to utilizing the similarity of feature vectors as a distance measure, while XGBoost [4] was used as an ensemble classifier because of their superiority in performance across a variety of tasks [4]. The longer training time of the ensemble classifier was expected. A simple deep LSTM (Long Short Term Memory) neural network, implemented as a 6–8 layers architecture including dense and dropout layers with around 300,000 parameters was also employed for classifying the drug-drug interaction types. The second dataset we have used that incorporates a vector of 2049 features as a molecular fingerprint for each pair of drugs is rather unbalanced; the number of positive/negative samples varies across all 86 interaction types (classes).

5 Discussion of Results

5.1 Prediction of Drug-Drug Interactions

Table 1 shows the performance metrics of the similarity-based link prediction methods, namely the accuracy, ROC, precision, recall and F1 that are macro-averaged. The performance analysis of link prediction methods used for discovering new drug-drug interactions have pointed out the advantages and limitations of different models we have employed on the task. The hyperparameters were experimentally set, i.e. the threshold value for the Jaccard coefficient, Adamic-Adar Index and Preferential attachment score was set to 0.001, 0.05 and 150, respectively. Preferential attachment score has obtained the best performance results, with an overall accuracy of 0.69, although it should be noted that the time complexity of this method depends on the size of the network and might be of a great concern for large scale networks. The performance results vary across classes and we could speculate that the unbalanced dataset used in the experimental evaluation has a degrading effect on the performance.

Table 1. Performance results of similarity-based link prediction methods on the task of predicting new drug interactions.

Table 2 presents the comparative performance results of the four different types of methods, Preferential attachment score as the best performing similarity-based link prediction method, random walk-based, Graph Auto-Encoder and GraphSAGE method. The performance results strongly demonstrate that the random ralk with restart model has outperformed the other methods achieving the best ROC value of 0.9 and an average precision score of 0.84. The obtained value for the macro-averaged recall was 0.90, which points out that the RwR model handles better the variance in data. Little evidence was found for the advantages of the graph auto-encoder model compared to the RwR model. We might speculate that augmenting the models like RwR and Graph Auto-Encoder with additional features for drug pairs and not relying only on the network topological properties could lead to some performance gains. Our implementation of GraphSAGE has achieved accuracy value of 0.76, making it the second best model, after RwR in our case. A deeper neural structure of the GraphSAGE model which would require more powerful processing resources is expected to achieve significantly higher performance values. It should be noted that even though this deep learning model was modest in size with around 16 000 parameters, it has achieved very encouraging results.

Table 2. Comparative performance results of the best performing similarity-based method, random walk with restart (RwR), graph auto-encoder (Graph AE) and GraphSAGE on the task of predicting new drug interactions.

5.2 Classifying the Type of Drug-Drug Interaction

The second equally important objective in our study was to evaluate several methods for classifying a newly discovered drug-drug interaction type. Surprisingly, the simple LSTM deep learning network did not lead to any performance gains compared to other methods. Further improvements might be expected for batch sizes larger than 100. Our current research efforts are focused on hyperparameter optimization of the LSTM neural network.

The performance of the KNN model were not satisfactory, slightly above 0.54 for k = 3 neighbors. According to the results summarized in Table 3, the XGBoost ensemble model has obtained an overall accuracy of 0.63. Considering that the dataset used for interaction type classification did not have a uniform class distribution, the varying performance values, especially the precision and recall, were expected. We are currently attempting on improving the quantity and the quality of the data samples to improve the performance on this task.

Table 3. Performance results of the k-nearest neighbor and XGBoost on the task of interaction type classification.

6 Conclusions

Undesirable effects of a medication that occur when simultaneously two or more drugs are administered to a patient need to be identified promptly. Improving our knowledge of potentially dangerous drug interactions is a recurring challenge in the field of drug discovery and development.

Due to considerations of scale, experimental design, and the cost of experimentally verifying interactions between drugs, it is safe to posit that most unknown interactions are unlikely to be experimentally characterized; instead computational and machine learning methods have been successfully deployed. Following the results of previous research in the field, we have experimented with and evaluated several models for the task of identifying new drug interactions and classifying their interaction type. Random walk with restart algorithm, performed best on the task of discovering drug-drug interactions, while the XGBoost ensemble model performed best on the task of classifying the drug-drug interaction type. The GraphSAGE model has achieved promising results on the task of drug interaction prediction and should be considered a method of choice when training of large scale dataset that require significant processing power are available.

Our current research efforts are directed towards extending our models and feature representation learning by including other relevant drug-related attributes, such as drug descriptions, currently available information on potential side effects, etc.