1 Introduction

Proteins play a vital role in biological systems and perform numerous functions, acting as catalysts, transporters, and regulators of signal transduction. They are linear heteropolymers folded into three-dimensional structures. The amino acid residues interact through various covalent and non-covalent bonds in a specific manner to adopt a particular three-dimensional structure, which determines the protein's function. Knowledge of the relationship between protein structure and function is important in drug design, molecular medicine, and biotechnology.

Different computational methods have been used to investigate protein structures and their functions, to find functionally important residues, to predict protein–protein interactions, and to discover new biologically active compounds. In most approaches, protein structures have been viewed as linear sequences of amino acid residues packed into 3D globules. In the last decade, an alternative view has emerged that describes the spatial structure of a protein as a network of interacting amino acid residues.

Network analysis has been successfully used in different fields, such as social networks [1], Internet networks [2], and road networks [3]. In biology, it is widely used to analyze networks of gene regulation, protein–protein interactions, and metabolite flow, to predict drug side effects, etc. [4,5,6,7,8,9]. The application of network methodology to polypharmacology was reviewed in [10].

A network method is based on graph theory and considers a set of entities (nodes) and the relationships (edges) occurring among them. Both nodes and edges can have various attributes. Depending on the object of the study, nodes can represent genes, proteins, or small compounds, and the edges connecting these nodes represent physical interactions, genetic regulatory relationships, or other properties linking the nodes. Edges can carry additional information, such as weights and directions.

In a protein structure, every amino acid residue is considered a “node” or “vertex,” and an interaction between residues is an “edge” (Fig. 1). The existence of an edge between two nodes depends only on their spatial positions in the protein globule and is unrelated to their positions in the primary sequence. An interaction can be defined by the distance between Cα atoms (or any other atoms) of amino acid residues, or by specific non-covalent interactions (electrostatic, hydrophobic, H-bonds) between particular amino acids [11]. Additionally, in a residue interaction network (RIN), the interaction energy between residues can be used to weight the edges [12, 13]. Proteins can also be modeled as subnetworks of amino acid residues with similar physicochemical properties. The RIN method reduces the spatial architecture of a protein to a simple map of nodes (residues) and edges (inter-residue interactions). Analysis of these graphs characterizes the protein's topology and network properties.

Fig. 1

Structure of the SH2 domain of proto-oncogene tyrosine-protein kinase SRC (PDB ID 1o41) in cartoon (A) and RIN (B) representation

Several names are used for the resulting intraprotein amino acid residue interaction networks. They are called residue interaction graphs [14], protein structure graphs [15, 16], protein residue networks [17], protein contact networks [18], protein energy networks [13], amino acid networks [19], protein structure networks [20], and residue interaction networks [21]. In this review, we use the term residue interaction networks (RINs) to distinguish them from networks of protein–protein interactions.

The application of the RIN method in drug design is only beginning. RINs have been used to analyze protein stability and folding [22, 23], for 3D structure modeling [19, 23], to find functionally important amino acid residues and sites [14, 24], and to analyze protein–protein interactions [25], allosteric regulation [26], and the influence of amino acid mutations [27]. These studies show that the RIN method is a valuable approach that can improve the drug discovery process. Several reviews on RINs have been published recently [28,29,30,31].

Herein, we review the construction, analysis, and application of RINs in fields related to drug design.

2 Graph Theory and Residue Interaction Network

Graph theory represents a complex system as a set of elements (called vertices or nodes) and their connections (called edges). A node can be connected to several other nodes. Assigning a direction to each edge gives a directed graph, in which edges are usually drawn as arrows. Assigning quantitative values (weights) to the edges gives a weighted graph. Nodes together with their edges form a network. The network representation helps to analyze the interactions among individual elements and to characterize the whole system.

A residue interaction network is constructed from the three-dimensional atomic coordinates of a protein structure and consists of nodes and edges. Each node represents an amino acid residue (or its Cα atom) and is connected to its neighboring nodes. In the simplest variant, edges are defined by a predefined cutoff on the 3D distance between nodes. The cutoff may vary with the nature of the interaction (van der Waals, hydrophobic, electrostatic, etc.). Covalent backbone links are frequently included as edges. Edges can be weighted by interaction energy, knowledge-based potentials, or residue fluctuations in molecular dynamics simulations [30, 31]. In the differential network (DDN) method, the network is formed by the unique edges that are present in one state but absent in the others [32].
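
As a minimal sketch of the simplest construction scheme, the Python code below links residues whose Cα atoms lie within a distance cutoff, optionally adds covalent backbone edges, and compares two states in the spirit of the DDN method. The 7.0 Å default cutoff, the single-chain assumption, and the function names `build_rin` and `differential_edges` are illustrative choices, not taken from the cited works.

```python
import math
from itertools import combinations


def build_rin(ca_coords, cutoff=7.0, include_backbone=True):
    """Build an undirected residue interaction network from C-alpha coordinates.

    ca_coords: list of (x, y, z) tuples, one per residue, in sequence order
               (the list index is used as the residue/node identifier).
    cutoff:    distance threshold in angstroms (illustrative default).
    Returns a set of edges (i, j) with i < j.
    """
    edges = set()
    for i, j in combinations(range(len(ca_coords)), 2):
        if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
            edges.add((i, j))
    if include_backbone:
        # Covalent backbone links between consecutive residues
        # (assumes a single chain without gaps).
        edges.update((i, i + 1) for i in range(len(ca_coords) - 1))
    return edges


def differential_edges(edges_a, edges_b):
    """Return (edges present only in state A, edges present only in state B)."""
    return edges_a - edges_b, edges_b - edges_a


# Toy example: residue 3 moves away from residues 0 and 1 in state B.
state_a = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (3.8, 5.0, 0)]
state_b = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (3.8, 20.0, 0)]
only_a, only_b = differential_edges(build_rin(state_a), build_rin(state_b))
print(sorted(only_a), sorted(only_b))  # [(0, 3), (1, 3)] []
```

In practice, the coordinates would be parsed from a PDB file, and the cutoff chosen to match the interaction type of interest.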

Networks are described by several common characteristics; those most frequently used for the analysis of biological systems are listed below [28, 31, 33].

The degree of a node is the number of edges that connect it to its neighbors. In a directed network, there are two types of degree, the in-degree and the out-degree, depending on the orientation of the edges. The average degree is the average number of connections per node in a network.
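
The degree definitions can be illustrated with a short Python sketch. It assumes an undirected edge list in which each edge appears once (for example as `(i, j)` with `i < j`) and a separate list of arcs for the directed case; isolated nodes simply do not appear in the returned dictionaries.

```python
from collections import Counter


def degrees(edges):
    """Degree of each node in an undirected graph (each edge listed once)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def in_out_degrees(arcs):
    """In-degree and out-degree of each node for directed (source, target) arcs."""
    indeg, outdeg = Counter(), Counter()
    for src, dst in arcs:
        outdeg[src] += 1
        indeg[dst] += 1
    return dict(indeg), dict(outdeg)


def average_degree(edges, n_nodes):
    """Average degree <k> = 2E / N for an undirected graph with E edges and N nodes."""
    return 2 * len(edges) / n_nodes


edges = {(0, 1), (0, 3), (1, 2), (1, 3), (2, 3)}
print(degrees(edges))            # residues 1 and 3 have degree 3, residues 0 and 2 degree 2
print(average_degree(edges, 4))  # 2.5
```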

The connectivity of a graph is the minimum number of edges that must be removed to make it disconnected. Analysis of the connectivity structure and of node degrees in RINs helps to identify important residues, e.g., those participating in ligand binding sites.
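
Edge connectivity can be computed exactly from maximum flows: for any fixed node `s`, it equals the smallest `s`–`t` maximum flow over all other nodes `t` when every edge has unit capacity. The sketch below uses the Edmonds–Karp (breadth-first augmenting path) method; it is a plain illustration suited to small graphs rather than an optimized implementation, and the function names are hypothetical.

```python
from collections import deque


def _max_flow(adj, s, t):
    """Edmonds-Karp maximum flow between s and t; every undirected edge has capacity 1."""
    # Residual capacity of each arc; an undirected unit edge starts at 1 in both directions.
    cap = {(u, v): 1 for u in adj for v in adj[u]}
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Push one unit of flow along the path and update residual capacities.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def edge_connectivity(nodes, edges):
    """Minimum number of edges whose removal disconnects the graph."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    s = nodes[0]
    return min(_max_flow(adj, s, t) for t in nodes[1:])


print(edge_connectivity(range(4), [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3)]))  # 2
```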

A shortest path between two nodes is the path that connects them through the smallest number of intermediate nodes. The characteristic path length is the number of edges in the shortest path between two nodes, averaged over all pairs of nodes. Residues with small average shortest path lengths to the other residues are often located in the active or ligand binding sites of proteins [17] and participate in allosteric pathways [34, 35].
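
For an unweighted RIN, shortest paths can be found by breadth-first search. The sketch below returns one shortest path between two residues and the characteristic path length, averaged here over all connected pairs (pairs in different components are skipped, which is one of several possible conventions). Storing the graph as a dict that maps each node to a set of neighbors is an assumption of this example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Number of edges on a shortest path from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shortest_path(adj, source, target):
    """One shortest path as a list of nodes, or None if target is unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue and target not in parent:
        u = queue.popleft()
        for v in sorted(adj[u]):  # sorted only for deterministic tie-breaking
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if target not in parent:
        return None
    path = [target]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def characteristic_path_length(adj):
    """Mean shortest-path length over all ordered pairs of distinct, connected nodes."""
    total = pairs = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        total += sum(dist.values())  # the distance from s to itself is 0
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


adj = {0: {1, 3}, 1: {0, 2, 3}, 2: {1, 3}, 3: {0, 1, 2}}
print(shortest_path(adj, 0, 2))         # [0, 1, 2]
print(characteristic_path_length(adj))  # 7/6, about 1.167
```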

The betweenness centrality of a node is the number of times the node lies on the shortest path between pairs of other nodes, normalized by the total number of such pairs.
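
A standard way to compute betweenness centrality is Brandes' algorithm, which runs one breadth-first search per source node and then accumulates pair dependencies backward. The sketch below assumes an unweighted, undirected graph given as a dict of neighbor sets; when several shortest paths connect a pair, each contributes a fraction, and the result is normalized by the number of pairs not involving the node, (n-1)(n-2)/2.

```python
from collections import deque


def betweenness(adj):
    """Normalized betweenness centrality (Brandes' algorithm), unweighted undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    n_pairs = (n - 1) * (n - 2) / 2  # pairs of nodes that do not include the node itself
    # Each unordered pair was counted from both endpoints, hence the division by 2.
    return {v: (b / 2) / n_pairs if n_pairs else 0.0 for v, b in bc.items()}


line = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(betweenness(line))  # inner residues 1 and 2 get 2/3, end residues get 0.0
```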

The closeness centrality of a node is the reciprocal of its average shortest path length to all other nodes.
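
Following the definition above, closeness can be computed from breadth-first distances. This sketch assumes an unweighted graph stored as a dict of neighbor sets; for disconnected graphs it averages only over reachable nodes, which is one convention among several.

```python
from collections import deque


def closeness(adj):
    """Closeness centrality: 1 / (mean shortest-path length to all reachable nodes)."""
    result = {}
    for s in adj:
        # Breadth-first distances from s.
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        others = [d for node, d in dist.items() if node != s]
        result[s] = len(others) / sum(others) if others else 0.0
    return result


path3 = {0: {1}, 1: {0, 2}, 2: {1}}
print(closeness(path3))  # the central residue 1 scores 1.0, the ends 2/3
```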

The network concept is widely used to analyze and predict properties of biological systems at different levels, from intramolecular interactions to whole cells and organisms. Biological networks are small-world networks, meaning that any two nodes are connected via only a few intermediate nodes [23, 30]. Several network parameters characterize different aspects of biological networks.

A hub is a node with a high degree (connectivity) in a network. Hubs may play a structural role in proteins by increasing their thermodynamic stability [14, 36].

A cluster is a set of nodes that have more connections with each other than with the other nodes. Clusters often correspond to protein domains and participate in intramolecular interactions.

A clique is a set of nodes in which every node is connected to every other node of the set. Studies of cliques can help to understand ligand-induced population shifts in proteins [37].
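
Maximal cliques in a RIN can be enumerated with the Bron–Kerbosch algorithm. The version below uses pivoting to prune the search and assumes an undirected graph given as a dict of neighbor sets; it is a generic illustration, not necessarily the procedure used in the cited study.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the Bron-Kerbosch algorithm (with pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already processed nodes.
        if not p and not x:
            cliques.append(r)
            return
        # Choose the pivot with the most neighbors among the candidates.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


adj = {0: {1, 3}, 1: {0, 2, 3}, 2: {1, 3}, 3: {0, 1, 2}}
print(sorted(sorted(c) for c in maximal_cliques(adj)))  # [[0, 1, 3], [1, 2, 3]]
```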

Several software packages, web servers, and plug-ins are available for the construction and analysis of RINs, such as Xpyder (http://xpyder.sourceforge.net/) [38], NetworkView [39], RING (http://protein.bio.unipd.it/ring/) [21, 40], RINalyzer (http://www.rinalyzer.de) [41], and structureViz (http://www.cgl.ucsf.edu/cytoscape/structureViz/) [42].

The RING web server constructs RINs based on physicochemical interactions from PDB files for subsequent visualization in Cytoscape (a software platform for the analysis and visualization of biological networks; http://www.cytoscape.org) or PyMOL (https://pymol.org/). The interactions (edges) include disulfide bonds, salt bridges, hydrogen bonds, aromatic interactions, and van der Waals contacts. Several features can be added to nodes and edges, such as secondary structure, solvent accessibility, energy score, and sequence conservation. Subnetworks can also be constructed.

RINalyzer and structureViz are plug-ins for Cytoscape [43] that link Cytoscape with the molecular viewer UCSF Chimera (http://www.cgl.ucsf.edu/chimera/) [44]. They allow interactive structure analysis of RINs together with the corresponding 3D protein structure.

The NetworkView plug-in for VMD (https://www.ks.uiuc.edu/Research/vmd/) allows users to study allostery and signaling through network models and can display dynamical network representations.

3 RINs Application

3.1 Ligand Binding Sites

Identification of ligand binding sites and functionally important residues of proteins is a crucial first step in drug design. However, this task is difficult when no homologous proteins are available.

Several topological parameters of RINs may be used to predict ligand binding sites. Several studies have shown that the closeness and betweenness values of residues correlate with their location in ligand binding sites [14, 34, 45,46,47,48]. The prediction accuracy can be improved by combining these values with other parameters, such as solvent accessibility. For example, Amitai et al. [14] predicted active site residues in 70% of 178 analyzed enzymes using closeness centrality and solvent accessibility. A similar result was obtained in [49]. Closeness centrality has also been used as a feature in machine learning methods for predicting functionally important residues [50] and as a term in a docking score [25].
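
The idea of combining centrality with solvent accessibility can be sketched as a simple filter: residues with a high closeness z-score that are also sufficiently exposed are flagged as candidate site residues. The z-score cutoff of 2.0 and the relative solvent accessibility threshold of 0.05 are illustrative assumptions, not the parameters of Amitai et al. [14], and both closeness and accessibility values are assumed to be precomputed (accessibility, for example, from a program such as DSSP).

```python
from statistics import mean, pstdev


def predict_site_residues(closeness, rsa, z_cutoff=2.0, rsa_min=0.05):
    """Flag residues that are both highly central and solvent exposed.

    closeness: dict residue -> closeness centrality
    rsa:       dict residue -> relative solvent accessibility (0..1)
    z_cutoff, rsa_min: illustrative thresholds (assumptions, not published values).
    """
    values = list(closeness.values())
    mu, sd = mean(values), pstdev(values)
    hits = []
    for res, c in closeness.items():
        z = (c - mu) / sd if sd > 0 else 0.0
        if z >= z_cutoff and rsa.get(res, 0.0) >= rsa_min:
            hits.append(res)
    return sorted(hits)


# Toy data: one residue is far more central than the other nine.
toy_closeness = {f"R{i}": 0.3 for i in range(9)}
toy_closeness["H57"] = 0.6
print(predict_site_residues(toy_closeness, {"H57": 0.2}))  # ['H57']
```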

However, no correlation between closeness centrality and binding sites has been observed for non-enzyme proteins [34, 51]. In addition, global closeness centrality gave unsatisfactory results for non-globular and oligomeric proteins; for such proteins, better predictions were obtained with local closeness [52]. It seems that ligand binding sites in enzymes correlate with centrality because they are typically located in cavities, whereas protein–protein interfaces in oligomeric proteins are flatter [53], which reduces the centrality of their residues.

Coevolution residue networks, which include information about coevolved residues, have also been used to predict functionally important residues [54, 55]. RIN analysis has been applied to predict the similarity of ligand binding sites in different proteins [56, 57].

A node-weighted RIN, called the node-weighted amino acid contact energy network (NACEN), was developed to predict hotspots, catalytic residues, and allosteric residues. Nodes were weighted by structural, sequence, physicochemical, and dynamic properties of the residues, and a support vector machine (SVM) was used to build a model identifying functionally important residues. The results revealed that parameters from the node-weighted RIN have advantages over those from the unweighted network [58].

Poirrette et al. [56] built a RIN of the zanamivir binding site of influenza sialidase and used it to predict proteins with similar binding sites. Such an approach may be used for drug repurposing or prediction of side effects.

3.2 Protein–Protein Interactions

Protein–protein interactions (PPIs) are crucial for many biological processes and functions; inhibition of PPIs with small molecules is a promising approach in drug design [53]. The RIN method has been used to analyze protein–protein interfaces, predict hotspots, and select poses in protein–protein docking.

Several studies have used RINs to analyze protein–protein interfaces. They showed that hydrophobic and charged residues predominate at dimer interfaces and that arginine, histidine, glutamic acid, phenylalanine, and tyrosine are located in clusters at the interface [59, 60]. Within those clusters, highly connected residues correlate with experimentally identified hotspots in protein complexes [15, 16, 61, 62].

Predicting protein–protein complexes from the structures of the individual proteins by docking is a big challenge, since docking gives many false-positive solutions [63, 64]. Protein–protein complex formation may be viewed as the merging of two RINs, in which additional edges appear between nodes of different subunits. Residues interact in accordance with their properties. Since native protein–protein complexes are far from random, correct and incorrect poses have different network topologies.

Chang et al. [65] built hydrophobic and hydrophilic RINs of protein–protein complexes. Three terms based on these networks (degree, clustering coefficient, and characteristic path length) were calculated and used in the network-based scoring function HPNet. Combining it with the energy terms of RosettaDock [66] resulted in a new combined scoring function, HPNet-combine, which improved the discrimination of the RosettaDock scoring function.
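
The three network terms used by HPNet can be illustrated with a hypothetical sketch: residues are split into hydrophobic and hydrophilic classes, an edge is kept in a class-specific subnetwork only if both of its residues belong to that class, and the mean degree, mean clustering coefficient, and characteristic path length are computed for each subnetwork. The hydrophobic residue set and the averaging choices are assumptions; how HPNet weights and combines these terms is not reproduced here.

```python
from collections import deque

HYDROPHOBIC = set("AVLIMFWC")  # one common choice of hydrophobic residues (assumption)


def subnetwork(edges, residue_type, keep):
    """Keep only edges whose two residues both satisfy `keep` (applied to one-letter codes)."""
    adj = {}
    for u, v in edges:
        if keep(residue_type[u]) and keep(residue_type[v]):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def clustering(adj, v):
    """Local clustering coefficient: fraction of neighbor pairs that are linked."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for a in adj[v] for b in adj[v] if a < b and b in adj[a])
    return 2 * links / (k * (k - 1))


def path_length(adj):
    """Characteristic path length over connected ordered pairs (breadth-first search)."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def network_terms(adj):
    """(mean degree, mean clustering coefficient, characteristic path length)."""
    if not adj:
        return (0.0, 0.0, 0.0)
    n = len(adj)
    return (sum(len(nb) for nb in adj.values()) / n,
            sum(clustering(adj, v) for v in adj) / n,
            path_length(adj))


residue_type = {0: "L", 1: "I", 2: "V", 3: "K", 4: "E"}
edges = {(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)}
hydrophobic_net = subnetwork(edges, residue_type, lambda aa: aa in HYDROPHOBIC)
print(network_terms(hydrophobic_net))  # (2.0, 1.0, 1.0)
```

For docking, such terms would be computed for each candidate pose of a complex, where interface edges between subunits change the values.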

A similar methodology, based on hydrophobic and hydrophilic RINs of protein–protein complexes, was used to develop the NPPD scoring function [67]. The protein–protein docking method HoDock and the scoring function HPNCscore (hydrophobic and polar network combined scoring function) were developed; they showed good results for several targets in Critical Assessment of PRedicted Interactions (CAPRI) rounds [68].

Weighted RINs were used to develop the Sn scoring function [69], which introduced two weighted parameters: strength and weighted average nearest neighbors' degree. Testing this scoring function on 42 protein–protein complexes showed satisfactory performance.
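
Node strength and the weighted average nearest neighbors' degree have standard definitions in weighted-network analysis (introduced by Barrat and co-workers): the strength of residue i is the sum of the weights of its edges, and the weighted average nearest neighbors' degree is the weight-averaged degree of its neighbors. The sketch below computes both under these standard definitions; the exact variants used in the Sn scoring function may differ.

```python
def strength_and_knn_w(weighted_edges):
    """Node strength s_i and weighted average nearest-neighbor degree k_nn,i^w.

    weighted_edges: dict {(u, v): w} for an undirected graph, each edge listed once.
    Assumed (standard) definitions:
        s_i      = sum_j w_ij
        k_nn,i^w = (1 / s_i) * sum_j w_ij * k_j
    """
    nbrs = {}
    for (u, v), w in weighted_edges.items():
        nbrs.setdefault(u, {})[v] = w
        nbrs.setdefault(v, {})[u] = w
    degree = {i: len(nb) for i, nb in nbrs.items()}
    strength = {i: sum(nb.values()) for i, nb in nbrs.items()}
    knn_w = {
        i: sum(w * degree[j] for j, w in nb.items()) / strength[i] if strength[i] else 0.0
        for i, nb in nbrs.items()
    }
    return strength, knn_w


s, knn = strength_and_knn_w({(0, 1): 2.0, (1, 2): 1.0, (1, 3): 1.0})
print(s[1], knn[1], knn[0])  # 4.0 1.0 3.0
```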

A scoring function based on local network patterns, iScore, has been proposed [70]. It achieved 83.6% specificity and 82% sensitivity on a training set of ~1800 two-domain proteins, homodimers, and heterodimers.

3.3 Allosteric Regulation

Allosteric regulation is a common mechanism for controlling protein activity. A perturbation at an allosteric site is transmitted as a signal through the protein structure to other sites, leading to changes in catalytic activity, oligomerization, etc. [71, 72].

Over the last decade, allosteric sites have become attractive targets for drug design. Allosteric drugs have several potential benefits over orthosteric drugs. They may be more specific because allosteric sites are less similar than active sites among homologous proteins; they can either increase or decrease the activity of enzymes and receptors; and partial inhibition by allosteric drugs may cause fewer side effects [73, 72].

Using allosteric sites for drug design requires predicting the allosteric sites and the residues involved in signal transduction pathways to the active sites. The search for allosteric sites by the RIN method is similar to the search for the other sites described above.

Allosteric pathways show how a signal may be transmitted over a long distance from allosteric to active sites within the protein. The RIN method is an accurate and computationally inexpensive way to predict such pathways.

Once the RIN is constructed, several algorithms can be used to find allosteric pathways within it. The common method is to find the shortest paths connecting the allosteric and active sites [34, 35, 74]; these can be determined, for example, with the Floyd–Warshall algorithm. Many proteins have been shown to consist of modules (subgraphs with many internal connections and few connections to other subgraphs), and the residues that mediate interactions between such modules can participate in allosteric pathways [75]. It has been proposed that such residues are conserved, a property that can also be used to predict them [76,77,78]. Proteins can have multiple allosteric pathways, which may preexist without effector binding at the allosteric site [79]. Different pathways may be involved depending on the perturbation at the allosteric site.
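
Shortest allosteric pathways in a weighted RIN can be obtained with the Floyd–Warshall algorithm together with a successor table for path reconstruction. In the sketch below, residue indices, edge weights (a smaller weight meaning stronger coupling), and function names are illustrative assumptions.

```python
import math


def floyd_warshall(nodes, weighted_edges):
    """All-pairs shortest paths with a successor table (undirected graph).

    weighted_edges: dict {(u, v): weight}, each undirected edge listed once.
    Returns (dist, nxt): dist[i, j] is the path weight, nxt[i, j] the next node.
    """
    nodes = list(nodes)
    dist = {(i, j): (0.0 if i == j else math.inf) for i in nodes for j in nodes}
    nxt = {(i, i): i for i in nodes}
    for (u, v), w in weighted_edges.items():
        dist[u, v] = dist[v, u] = w
        nxt[u, v], nxt[v, u] = v, u
    for k in nodes:
        for i in nodes:
            dik = dist[i, k]
            if dik == math.inf:
                continue
            for j in nodes:
                if dik + dist[k, j] < dist[i, j]:
                    dist[i, j] = dik + dist[k, j]
                    nxt[i, j] = nxt[i, k]
    return dist, nxt


def reconstruct(nxt, i, j):
    """Residues on the shortest path from i to j ([] if j is unreachable)."""
    if (i, j) not in nxt:
        return []
    path = [i]
    while i != j:
        i = nxt[i, j]
        path.append(i)
    return path


# Toy RIN: residue 0 is "allosteric", residue 4 is "active site", residue 5 is isolated.
edges = {(0, 1): 1, (1, 2): 1, (2, 4): 1, (0, 3): 1, (3, 4): 5}
dist, nxt = floyd_warshall(range(6), edges)
print(reconstruct(nxt, 0, 4), dist[0, 4])  # [0, 1, 2, 4] 3
```

Floyd–Warshall scales as O(N³) in the number of residues, which is acceptable for single proteins; for repeated single-source queries, Dijkstra's algorithm is a common alternative.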

However, RINs constructed from a single structure do not take into account structural changes in the protein globule. Therefore, molecular dynamics (MD) simulation followed by RIN construction has frequently been used to detect and analyze allosteric pathways. In these cases, the edges of the RIN are defined using various parameters obtained from MD, such as the correlation of residue displacements [74, 80], the fluctuation of distances [81], or the interaction energy [82].
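
One common way to turn an MD trajectory into edge weights, used in dynamical network analysis, is to compute the normalized cross-correlation of residue displacements, C_ij = <Δr_i·Δr_j> / sqrt(<Δr_i²><Δr_j²>), and to give each contacting residue pair the weight -log|C_ij|, so that strongly correlated residues are "close". The sketch below assumes the frames are already superimposed on a reference structure and that the contact list is supplied separately; it illustrates this convention and is not the exact protocol of the cited studies.

```python
import math


def correlation_matrix(traj):
    """Normalized cross-correlation of residue displacements.

    traj: list of frames; each frame is a list of (x, y, z) positions, one per
          residue, assumed already superimposed on a reference structure.
    Returns C with C[i][j] = <dr_i . dr_j> / sqrt(<dr_i^2> <dr_j^2>).
    """
    n_frames, n_res = len(traj), len(traj[0])
    mean_pos = [[sum(f[i][d] for f in traj) / n_frames for d in range(3)]
                for i in range(n_res)]
    # Displacement of every residue from its mean position, frame by frame.
    disp = [[[f[i][d] - mean_pos[i][d] for d in range(3)] for i in range(n_res)]
            for f in traj]

    def avg_dot(i, j):
        return sum(sum(a * b for a, b in zip(fr[i], fr[j])) for fr in disp) / n_frames

    var = [avg_dot(i, i) for i in range(n_res)]
    return [[avg_dot(i, j) / math.sqrt(var[i] * var[j]) if var[i] > 0 and var[j] > 0 else 0.0
             for j in range(n_res)] for i in range(n_res)]


def dynamic_edge_weights(corr, contacts):
    """Weight -log|C_ij| for residue pairs in contact (small weight = strong coupling)."""
    weights = {}
    for i, j in contacts:
        c = abs(corr[i][j])
        if c > 0:
            weights[(i, j)] = -math.log(c)
    return weights


# Toy trajectory: residue 1 moves with residue 0, residue 2 moves against it.
traj = [[(0, 0, 0), (5, 0, 0), (10, 0, 0)],
        [(1, 0, 0), (6, 0, 0), (9, 0, 0)]]
C = correlation_matrix(traj)
print(C[0][1], C[0][2])  # 1.0 -1.0
```

The resulting weights can be passed to any weighted shortest-path routine, such as the Floyd–Warshall algorithm mentioned above, to search for allosteric pathways.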

Aminoacyl-tRNA synthetases are convenient objects for analysis of allosteric communication. The combination of MD with RINs was used to discover pathways from the anticodon region to the aminoacylation region in methionyl-tRNA synthetase [74, 83], glutaminyl-tRNA synthetase [84], cysteinyl-tRNA synthetase [35], and tryptophanyl-tRNA synthetase [85, 86]. In particular, analysis of tryptophanyl-tRNA synthetase showed changes in flexibility around the active site induced by allosteric ligand binding and explained the molecular mechanism of half-of-the-sites reactivity (tryptophanyl-tRNA synthetase is a homodimer).

Another popular object of study is the G protein-coupled receptors (GPCRs) [87,88,89], a large family of membrane receptors that have a ligand binding site on the extracellular side of the membrane and an activation domain on the intracellular side. Using the RIN method, several conserved residues participating in signal transduction were discovered in the lutropin receptor [76] and the A2A adenosine receptor [87] (Fig. 2).

Fig. 2

Structure of the A2A adenosine receptor (PDB ID 2ydv). One of the predicted allosteric pathways is shown in a rainbow color scheme. The synthetic agonist NECA is shown as sticks

3.4 Analyses of Mutations

RIN methods may be used to analyze and predict the effects of amino acid mutations on protein properties, which may be useful for protein design, investigation of disease-associated single nucleotide polymorphisms, or study of drug resistance mechanisms [27, 90,91,92].

Recently, we used RINs to investigate the influence of several mutations on the structure and flexibility of β-lactamase [93]. β-Lactamases are a class of enzymes responsible for bacterial resistance to β-lactam antibiotics. Besides the key mutations responsible for the extended-spectrum β-lactamase or inhibitor-resistant phenotypes, secondary mutations, located far from the active site and with a weak impact on protein structure and enzyme activity, often appear [94]. Analysis of MD trajectories showed that secondary and key mutations can have opposite effects on the flexibility of the Ω-loop of β-lactamase, which participates in antibiotic hydrolysis and transport in the active site [93]. Detailed analysis of RIN maps of successive mutants from wild-type TEM-1 to TEM-72 (carrying two key mutations, G238S and E240K, and two secondary ones, M182T and Q39K) showed that the key mutations (responsible for the extended-spectrum phenotype) weaken the interactions of the Ω-loop with the protein globule. The appearance of the secondary mutation M182T resulted in a dramatic change in the conformation of R65, which began to interact with the Ω-loop and fixed it near the protein globule (manuscript submitted) (Fig. 3).

Fig. 3

Part of the networks near the Ω-loop of β-lactamase TEM-1 and its triple mutant (G238S, E240K, M182T). Additional interactions that appear in the triple mutant and restrict the movement of the Ω-loop are shown in green

4 Conclusion

Herein, we have reviewed the development and current state of RINs and their application to drug discovery.

RINs provide a comprehensive analysis of proteins and their complexes. Residues are in tight contact with each other in protein globules, and RINs allow one to estimate their interdependence and to predict various properties and functions of individual residues and whole proteins. In addition to topology, RIN construction and analysis can use the physicochemical properties of residues and their interaction energies.

Besides their use in investigating protein structure and function, RINs may be applied in drug design in several ways.

Prediction of functionally important residues and sites can help to understand the functions and regulation of uncharacterized proteins and to find active sites as well as allosteric and cryptic ligand binding sites. This may decrease the number of “undruggable” proteins, expanding the scope of drug design. On the other hand, many drug candidates fail in the late and costly stages of clinical trials [95], and side effects are one of the main reasons for drug failure [96]. Detecting similarity in network topologies and ligand interactions across several targets may indicate promiscuity of drug candidates and possibly their side effects.

The development of inhibitors of protein–protein interactions is a promising direction in drug design, and RINs have shown their applicability for this purpose. Network analysis may help to select correct poses in protein–protein docking, which is important for identifying inhibitor binding sites, and incorporating RIN-based terms may improve docking scoring functions.

Allosteric inhibitors have been another mainstream direction in drug design over the last decade. It has been proposed that such inhibitors may regulate cellular processes more precisely. Allosteric regulation is a common property of proteins, which may increase the number of druggable targets. RINs are convenient for finding allosteric sites and investigating the mechanisms of intraprotein signal transmission. Prediction of the effects of amino acid mutations on protein structure and dynamics is crucial for developing drugs to which resistance is likely to emerge, in particular antibacterial, antiviral, and anticancer drugs.

Nowadays, the application of RIN methods to drug discovery is at an early stage, but they already help to understand intrinsic properties of proteins and provide a new perspective on drug discovery.