ACO Based Core-Attachment Method to Detect Protein Complexes in Dynamic PPI Networks

Liang, Jing; Lei, Xiujuan; Guo, Ling; Tan, Ying

doi:10.1007/978-3-319-93815-8_11

Jing Liang¹⁶,
Xiujuan Lei¹⁶,
Ling Guo¹⁷ &
…
Ying Tan¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10941))

Included in the following conference series:

International Conference on Swarm Intelligence

1602 Accesses
1 Citations

Abstract

Proteins complexes accomplish biological functions such as transcription of DNA and translation of mRNA. Detecting protein complexes correctly and efficiently is becoming a challenging task. This paper presents a novel algorithm, core-attachment based on ant colony optimization (CA-ACO), which detects complexes in three stages. Firstly, initialize the similarity matrix. Secondly, complexes are predicted by clustering in the dynamic PPI networks. In the step, the clustering coefficient of every node is also computed. A node whose clustering coefficient is greater than the threshold is added to the core protein set. Then we mark every neighbor node of core proteins with unique core label during picking and dropping. Thirdly, filtering processes are carried out to obtain the final complex set. Experimental results show that CA-ACO algorithm had great superiority in precision, recall and f-measure compared with the state-of-the-art methods such as ClusterONE, DPClus, MCODE and so on.

Access provided by CONRICYT-eBooks. Download conference paper PDF

A Core-Attach Based Method for Identifying Protein Complexes in Dynamic PPI Networks

Detecting protein complexes with multiple properties by an adaptive harmony search algorithm

Article Open access 07 October 2022

Identification of Functional Modules in Dynamic Weighted PPI Networks by a Novel Clustering Algorithm

Keywords

1 Introduction

Protein complex is a basic structural unit that can cooperate with each other to complete the specific biological function [1, 2]. The protein-protein interaction (PPI) network [3] is composed of a number of complexes which are related to each other to perform certain functions.

In recent years, many methods have been proposed to predict protein complexes. These methods greatly promote the progress in the field of complexes prediction. According to the category of complex recognition algorithms, these algorithms can be divided into the following categories: recognition algorithm based on dense sub-graph, recognition algorithm based on hierarchical clustering, recognition algorithm based on core-attachment structure.

Based on the theory of dense sub-graph in PPI networks, lots of algorithms are proposed. In 2003, Spirin et al. [2] proposed the algorithm using the results of traversing the fully connected graphs to identify complexes. Bader et al. [4] proposed a method, called molecular complex detection (MCODE). Palla et al. [5] proposed clique percolation method (CPM) based on the close connection of sub-graph filtering algorithm. And the algorithm’s application software is developed, called CFinder [6]. In 2006, Altaf-UI-Amin et al. [7] proposed the DPClus which can get the overlapping complexes. In 2008, Li et al. [8] proposed IPCA algorithm, which is based on dense sub-graph to identify overlapping complexes. In 2009, Liu et al. [9] proposed clustering based on maximal cliques (CMC) algorithm, which can dig out the dense sub-graph in PPI networks.

Hierarchical clustering theory is also used to predict protein complexes. In 2002, Aivan and Newman proposed GN algorithm [10], which is used to partition the modules in complex networks. In 2004, Hartuv et al. [11] proposed highly connected sub-graph (HCS) algorithm. In 2009, Li et al. [12] proposed a fast hierarchical clustering algorithm based on the local variable and edge clustering coefficient, and redefined the protein complex. In 2012, Wang et al. [13] proposed OMIM algorithm to predict duplicate complexes in hierarchical clustering system.

The core-attachment structure of complex is a significant view to detect protein complexes. Leung et al. [14] designed CORE algorithm which calculates the p-value for all pairs of proteins to detect cores. Wu et al. [15] proposed COACH algorithm.

Recently, swarm intelligence algorithms have been successfully applied to the detection of complexes in PPI networks [20]. In addition, there are many other algorithms, such as Markov Clustering (MCL) [16, 17] algorithm, ClusterONE [18] algorithm, SPICi [19] algorithm and so on.

In this paper, we proposed a protein complex prediction algorithm based on core-attachment structure and ant colony optimization method, CA-ACO. First, we adopt the weighted matrix of the dynamic PPI network as the similarity matrix of the undirected graph. Second, we use the clustering coefficient value of every node to obtain the core proteins. We mark every neighbor node of core protein with unique label through picking and dropping principle of ACO. Third, filtering processes are carried out to obtain the clustering result.

2 Methods

2.1 The ACO Based Core-Attachment Design

Since the protein is not always active in the cell cycle, in order to construct a dynamic model, we integrate the static PPI data and gene expression data because gene expression level and protein expression level are consistent. If the gene expression data at a certain timestamp is better than a threshold, then it can be considered that the protein is active at this timestamp. By using three-sigma [29] principle, active threshold is set. At a certain timestamp, if two proteins are active and interactional, it can be considered that there is an edge between the two proteins at this time. As gene expression data has 12 timestamps, the static network is divided into 12 sub-graphs which correspond to 12 timestamps. Eventually, the dynamic PPI network is constructed. Figure 1 shows a process of dynamic PPI networks construction.

Considering the organization of complexes, we combine the structure of core-attachment [14] with the principle of picking and dropping to predict complexes, and propose a novel algorithm, named CA-ACO. Figure 2 shows a part of the core-attachment formation design. There are two clusters which are in dotted circles of green and blue. Protein p and protein q are seed proteins. The connection between a protein and others is represented by a solid line or dotted line. The full line represents that two proteins belong to a cluster. Otherwise, they don’t belong to a cluster. Taking p’s neighbor protein a as an example, protein a does not belong to the cluster whose seed protein is protein p. Then, we should pick up a, and decide if protein a belongs to other cluster. Protein b is the neighbor protein of protein p and q. As the Fig. 2 shows, the protein b belongs to two clusters whose seed proteins are protein p and protein q. For protein c which is connected with protein p and protein q. Firstly, node c did not belong to cluster p and was picked up. Then, we have to decide its relationship with other seed nodes. There is a full line between c and q. Then, node c needs to be dropped out and put in the cluster whose seed node is protein q.

2.2 Description of CA-ACO Algorithm

The process of CA-ACO algorithm can be divided into 3 steps: similarity matrix initialization, clustering and purification.

In the first step, the similarity matrix of the dynamic PPI network is composed of 12 sub networks’ weighted matrix. The greater the weight value is, the greater the similarity between the nodes is. The similarity matrix of PPI network initialization is shown in Eq. (1).

$$ S({\text{v}}_{i} ,{\text{v}}_{j} ) = \left\{ {\begin{array}{*{20}c} {s({\text{v}}_{i} ,{\text{v}}_{j} )} & {if(v_{i} ,v_{j} ) \in E} \\ 0 & {else} \\ \end{array} } \right. $$

(1)

Where s(v_i, v_j) is weight value of edge (v_i, v_j). It represents the strength of the interaction of proteins v_i and protein v_j in weighted PPI network.

In the second step, we can obtain protein complex set from dynamic PPI network. There are two main processes: seed protein selection and attachment formation. In seed protein selection process, we need to compute the clustering coefficient value [30] of every protein by Eq. (2).

$$ ccv({\text{v}}_{i} ) = \frac{{2 \times n_{i} }}{{|neigh({\text{v}}_{i} )| \times (|neigh({\text{v}}_{i} )| - 1)}} $$

(2)

where neigh(v_i) is neighbor nodes set of a node v_i. A protein whose clustering coefficient value is greater than the threshold will be added to the core protein set. In attachment formation step, we need to access to neighbor protein set of every core protein. By carrying out the picking and dropping operation [21] of ACO, seed proteins’ neighbor proteins can be clustered. The probability of picking is calculated by Eq. (3).

$$ pp({\text{v}}{}_{j}) = (\frac{{k_{p} }}{{k_{p} + s({\text{v}}_{i} ,{\text{v}}_{j} )}})^{2} $$

(3)

where k_p is a picking constant whose range of values is from 0 to 1, s(v_i, v_j) is the similarity between the protein v_j and the current core protein v_i. In the operation of picking, the probability of picking is compared with a random probability. When the probability of picking is more than the random probability, the operation of picking is executed. Otherwise, the protein v_j is labeled the complex whose seed protein is the protein v_i. The probability of dropping is calculated by Eq. (4).

$$ pd({\text{v}}_{j} ) = \left\{ {\begin{array}{*{20}c} {2 \times s({\text{v}}_{i} ,{\text{v}}_{j} )} & {\begin{array}{*{20}c} {if} & {s\text{(}{\text{v}}_{i} \text{,}{\text{v}}_{j} \text{)} < k_{d} } \\ \end{array} } \\ 1 & {else} \\ \end{array} } \right. $$

(4)

where k_d is a dropping constant whose range of values is 0 to 1, s(v_i, v_j) is the similarity between the protein v_j and the current core protein v_i. The probability of dropping is compared with random probability in the operation of dropping. When pd(v_j) is more than a random probability, the operation of dropping is executed. Therefore, the protein v_j is labeled the complex whose seed protein is the protein v_i. Through the clustering process, we can get the initial clustering results where complexes have core-attachment structure.

In the third step, purification process is carried out. In the complex set, the complex with just one protein is deleted, and protein complex which has same proteins as others is removed. The protein complex set is obtained.

The pseudo-code of CA-ACO method is described as follows.

3 Experiments and Results

3.1 Experimental Dataset

In this paper, we adopt the PPI data of S.cerevisiae from DIP [24], MIPS [31] and Krogan database [32]. Dynamic PPI networks [3] at 12 timestamps correspond to 12 static PPI subnets. Different subnets have different size, as shown in Table 1.

Table 1. The number of proteins and interactions in each subnet of different PPI networks

Full size table

In this paper, we use the standard known protein complex set, CYC2008 [25], which contains 408 complexes and 1,628 proteins. The biggest cluster has 81 proteins while the smallest cluster has 2 proteins in protein complexes of CYC2008.

3.2 Evaluation Criteria

The precision [24] indicates the proportion of the predicted protein complexes successfully matched by the standard protein complexes in the prediction of the complex. It can be defined as:

$$ precision = \frac{{N_{cp} }}{|P|} $$

(5)

where |P| represents the number of predicted protein complexes, and N_cp indicates that the number of the predicted complexes successfully matched by the known protein complexes.

The recall [24] indicates the proportion of the known protein complexes successfully matched by the predicted complexes in the standard of the complex. It can be defined as:

$$ recall = \frac{{N_{cb} }}{|B|} $$

(6)

where |B| represents the number of known protein complexes, and N_cb indicates that the number of the standard protein complexes successfully matched by the predicted protein complexes.

The f-measure [24] denotes the harmonic mean of precision and recall. It can be defined as:

$$ f - measure = \frac{2 \times precision \times recall}{precision + recall} $$

(7)

In order to further validate the biological significance of the predicted protein complexes, we need to carry out the functional enrichment analysis by using p-value [36] formulated as follows:

$$ p - value = \sum\limits_{i = m}^{n} {\frac{{\left( {\begin{array}{*{20}c} M \\ i \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {N - M} \\ {n - i} \\ \end{array} } \right)}}{{\left( {\begin{array}{*{20}c} {\text{N}} \\ {\text{n}} \\ \end{array} } \right)}}} $$

(8)

where N is the number of protein in the PPI network, M is the number of proteins in a GO term, n is the number of proteins which are annotated with the same GO term. Generally, the smaller the p-value of a protein complex is, the stronger biological significance the complex processes will be.

3.3 Comparison with Other Methods

To evaluate the performance of CA-ACO algorithm, we compare CA-ACO with CMC [9], MCODE [4], CFinder [6], ClusterONE [18], CORE [14], COACH [15], RNSC [25], DPClus [7], MCL [16, 17], ACO-MCL [26], HC-PIN [27], MOEPGA [28] and FOCA [20] in terms of precision, recall and f-measure in the DIP dataset. It is obvious that the precision value of our method is greater than other methods precision value. The recall values of CMC, MOEPGA and FOCA algorithms are superior to our method, which are 0.5900, 0.6000 and 0.6360 respectively. However, the f-measure value of our method is much higher than other typical algorithms’ f-measure value. Our method’s f-measure value is 0.6653. It indicates that the performance of CA-ACO algorithm is optimal. The above analysis can be shown in Fig. 3.

Moreover, we also compare our method with the following prediction methods: CSO [33], ClusterONE [18], COACH [15], CMC [9], HUNTER [34] and MCODE [4] in terms of precision, recall and f-measure in the MIPS and Krogran dataset. As shown in Fig. 4, our method achieves the highest f-measure of 0.6025, recall of 0.5524 and precision of 0.6665 in MIPS dataset. On the Fig. 5, our method achieves the highest f-measure of 0.5844, recall of 0.4347 and precision of 0.8920 in the Krogan dataset.

We use functional enrichment analysis to validate the biological significance of methods. We calculate the p-value of detected complexes whose size are greater than or equal to 3. A complex is considered significant when its p-value is less than 0.01.

Table 2 lists the number and percentage of the identified complexes whose p-value is in the range of <E−10, [E−10, E−5), [E−5, 0.01), >=0.01. Table 2 shows the comparison of the functional enrichment of complexes detected by CA-ACO, MCL, CORE and ClusterONE on DIP, MIPS and Krogan datasets. As shown in Table 2, we can obtain the number of predicted protein complexes by various methods on different datasets. The percentage and the amount of the predicted protein complexes with p-value fall into corresponding intervals. The percentage of complexes whose p-value is greater than 0.01 in predicted complexes by CA-ACO algorithm is the smallest. So, most of the predicted protein complexes by CA-ACO are meaningful. These illustrate that our proposed algorithm is competent to identified significant protein complexes in dynamic PPI networks.

Table 2. Functional enrichment analysis of complexes detected on DIP, MIPS and Krogan dataset

Full size table

4 Conclusions

Many of the current methods predicting protein complexes are running in a static PPI network, which ignoring the dynamic properties of the PPI network and the inherent organization of the protein complex. In this paper, we proposed a novel method for detecting protein complexes in dynamic protein interaction networks, CA-ACO, which is based on the core-attachment structure of protein complexes. We compare the performance of the CA-ACO algorithm with other state-of-the-art methods in DIP, MIPS and Krogan dataset. Experimental results show that CA-ACO algorithm is obviously superior to other methods. In addition, the shift from static PPI networks to dynamic PPI networks is important to analyze the biological significance of complexes identified from PPI networks. In the future, we will further optimize our algorithm to improve the efficiency of algorithm and the effect of biological research.

References

Gavin, A.C., Bösche, M., Krause, R., Grandi, P., Marzioch, M., Bauer, A., et al.: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141–147 (2002)
Article Google Scholar
Spirin, V., Mirny, L.A.: Protein complexes and functional modules in molecular networks. Proc. Nat. Acad. Sci. U.S.A. 100, 12123–12128 (2003)
Article Google Scholar
Wang, J., Peng, X., Peng, W., Wu, F.X.: Dynamic protein interaction network construction and applications. Proteomics 14, 338–352 (2014)
Article Google Scholar
Bader, G.D., Hogue, C.W.V.: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform. 4, 2–28 (2003)
Article Google Scholar
Palla, G., Derényi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 814–818 (2005)
Article Google Scholar
Adamcsek, B., Palla, G., Farkas, I.J., Vicsek, T.: CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics 22, 1021–1023 (2006)
Article Google Scholar
Altaf-Ul-Amin, M., Shinbo, Y., Mihara, K., Kurokawa, K., Kanaya, S.: Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinform. 7, 207–219 (2006)
Article Google Scholar
Li, M., Chen, J., Wang, J., Chen, G.: Modifying the DPClus algorithm for identifying protein complexes based on new topological structures. BMC Bioinform. 9, 398–413 (2008)
Article Google Scholar
Liu, G., Wong, L., Chua, H.N.: Complex discovery from weighted PPI networks. Bioinformatics 25, 1891–1897 (2009)
Article Google Scholar
Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. Proc. Nat. Acad. Sci. 99, 7821–7826 (2002)
Article MathSciNet Google Scholar
Hartuv, E., Shamir, R.: A clustering algorithm based on graph connectivity. Inf. Process. Lett. 76, 175–181 (2000)
Article MathSciNet Google Scholar
Li, M., Wang, J., Chen, J., Pan, Y.: Hierarchical organization of functional modules in weighted protein interaction networks using clustering coefficient. Bioinform. Res. Appl. 5542, 75–86 (2009)
Article Google Scholar
Wang, X., Li, L., Cheng, Y.: An overlapping module identification method in protein-protein interaction networks. BMC Bioinform. 13, S4 (2012)
Article Google Scholar
Leung, H.C., Xiang, Q., Yiu, S.M., Chin, F.Y.: Predicting protein complexes from PPI data: a core-attachment approach. J. Comput. Biol. 16, 133–144 (2009)
Article MathSciNet Google Scholar
Wu, M., Li, X., Kwoh, C.K.: A core-attachment based method to detect protein complexes in PPI networks. BMC Bioinform. 10, 169–184 (2009)
Article Google Scholar
Leal, J.P., Enright, A., Ouzounis, C.A.: Detection of functional modules from protein interaction networks. Proteins Struct. Funct. Bioinform. 54, 49–57 (2003)
Article Google Scholar
Van Dongen, S.M.: Graph clustering by flow simulation (2001)
Google Scholar
Nepusz, T., Yu, H., Paccanaro, A.: Detecting overlapping protein complexes in protein-protein interaction networks. Nat. Meth. 9, 471–472 (2012)
Article Google Scholar
Jiang, P., Singh, M.: SPICi: a fast clustering algorithm for large biological networks. Bioinform. 26, 1105–1111 (2010)
Article Google Scholar
Lei, X., Ding, Y., Hamido, F., Zhang, A.: Identification of dynamic protein complexes based on fruit fly optimization algorithm. Knowl. Based Syst. 105, 270–277 (2016)
Article Google Scholar
Leumer, E., Faieta, B.: Diversity and adaption in populations of clustering ants. In: Proceedings of the 3rd International Conference on Simulation of Adaptive Behavior: From Animal to Animals, pp. 499–508. MIT Press, Cambridge (1994)
Google Scholar
Xenarios, L., Salwínski, L., Duan, X.J., Higney, P., Kim, S., Eisenberg, D.: DIP: the database of interaction proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 30, 303–305 (2002)
Article Google Scholar
Pu, S., Wong, J., Turner, B., Cho, E., Wodak, S.J.: Up-to-date catalogues of yeast protein complexes. Nucleic Acids Res. 37, 825–831 (2009)
Article Google Scholar
Keretsu, S., Sarmah, R.: Weighted edge based clustering to identify protein complexes in protein–protein interaction networks incorporating gene expression profile. Comput. Biol. Chem. 65, 69–79 (2016)
Article Google Scholar
King, A.D., Pržulj, N., Jurisica, I.: Protein complex prediction via cost-based clustering. Bioinformatics 20, 3013–3020 (2004)
Article Google Scholar
Seçkiner, S.U., Eroglu, Y., Emrullah, M., Dereli, T.: Ant colony optimization for continuous functions by using novel pheromone updating. Appl. Math. Comput. 219, 4163–4175 (2013)
MathSciNet MATH Google Scholar
Wang, J., Li, M., Chen, J., Pan, Y.: A fast hierarchical clustering algorithm for functional modules discovery in protein interaction networks. Comput. Biol. Bioinform. 8, 607–620 (2011)
Google Scholar
Cao, B., Luo, J., Liang, C., Wang, S., Song, D.: MOEPGA: a novel method to detect protein complexes in yeast protein–protein interaction networks based on MultiObjective Evolutionary Programming Genetic Algorithm. Comput. Biol. Chem. 58, 173–181 (2015)
Article Google Scholar
Vlasblom, J., Wodak, S.J.: Markov clustering versus affinity propagation for the partitioning of protein interaction graphs. BMC Bioinform. 10, 99 (2009)
Article Google Scholar
Dimitrakopoulosa, C., Theofilatosa, K., Pegkasb, A., Likothanassis, S., Mavroudi, S.: Predicting overlapping protein complexes from weighted protein interaction graphs by gradually expanding dense neighborhoods. Artif. Intell. Med. 71, 62–69 (2016)
Article Google Scholar
Güldener, U., Münsterkötter, M., Oesterheld, M., Pagel, P., Ruepp, A., Mewes, H., et al.: MPact: the MIPS protein interaction resource on Yeast. Nucleic Acids Res. 34, 436–441 (2006)
Article Google Scholar
Krogan, N., Cagney, G., Yu, H., Zhong, G., Guo, X., Ignatchenko, A., et al.: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440(7084), 637–643 (2006)
Article Google Scholar
Zhang, Y., Lin, H., Yang, Z., Wang, J., Li, Y., Xu, B.: Protein complex prediction in large ontology attributed protein-protein interaction networks. IEEE/ACM Trans. Comput. Biol. Bioinform. 10, 729–741 (2013)
Article Google Scholar
Chin, C., Chen, S., Ho, C., Ko, M., Lin, C.: A hub-attachment based method to detect functional modules from confidence-scored protein interactions and expression profiles. BMC Bioinform. 11, S25 (2010)
Article Google Scholar
Wang, J., Peng, X., Li, M., Pan, Y.: Construction and application of dynamic protein interaction network based on time course gene expression data. Proteomics 13(2), 301–312 (2013)
Article Google Scholar
Shen, X., Yi, L., Jiang, X., Zhao, Y., Hu, X., He, T., Yang, J.: Neighbor affinity based algorithm for discovering temporal protein complex from dynamic PPI network. Methods 110, 90–96 (2016)
Article Google Scholar

Download references

Author information

Authors and Affiliations

School of Computer Science, Shaanxi Normal University, Xi’an, 710119, China
Jing Liang & Xiujuan Lei
College of Life Science, Shaanxi Normal University, Xi’an, 710119, China
Ling Guo
School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
Ying Tan

Authors

Jing Liang
View author publications
You can also search for this author in PubMed Google Scholar
Xiujuan Lei
View author publications
You can also search for this author in PubMed Google Scholar
Ling Guo
View author publications
You can also search for this author in PubMed Google Scholar
Ying Tan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xiujuan Lei .

Editor information

Editors and Affiliations

Peking University, Beijing, China
Ying Tan
Southern University of Science and Technology, Shenzhen, China
Yuhui Shi
Tongji University, Shanghai, China
Qirong Tang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Liang, J., Lei, X., Guo, L., Tan, Y. (2018). ACO Based Core-Attachment Method to Detect Protein Complexes in Dynamic PPI Networks. In: Tan, Y., Shi, Y., Tang, Q. (eds) Advances in Swarm Intelligence. ICSI 2018. Lecture Notes in Computer Science(), vol 10941. Springer, Cham. https://doi.org/10.1007/978-3-319-93815-8_11

Download citation

DOI: https://doi.org/10.1007/978-3-319-93815-8_11
Published: 16 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93814-1
Online ISBN: 978-3-319-93815-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics