1 Introduction

With the completion of human genome project, proteomics has become the frontier of life science and natural science research in the post-genome time. Proteins are the main undertakers of the life activity where they rarely perform a function in an individual way, but do it by interacting with other proteins [1]. All the interactions between proteins in an organism form a kind of molecular networks called protein-protein interaction (PPI) network. The set of proteins that interact with each other to perform a specific biological function in a PPI network is a functional module. Identifying functional modules in PPI networks is an important part of proteomics, which not only helps people understand the mechanism of life at the molecular level, but also has great significance for the diagnosis of diseases and the development of new drugs [2].

The biological experiments, as the original methods of detecting functional modules in PPI networks, often take a lot of time, manpower and material resources, which appears to be quite helpless when facing the explosive growth of PPI data and can not meet the urgent need of life science research in the post-genome time [3]. Fortunately, computational approaches based on clustering obtain a rapid development due to the advantages of short time and low cost, and have become more important means of detecting functional modules in recent years. In general, these computational approaches based on clustering first represent a PPI network by an undirected graph G(V,E), where V is the set of nodes with each node denotes a protein and E is the set of edges with each edge denotes an interaction (link). Then they uses various clustering technologies to divide the network and get corresponding functional modules [2]. So far there have been many kinds of clustering approaches which adopt different ideas and schemes to detect functional modules such as density based clustering approaches [4,5,6], hierarchy based clustering approaches [7, 8], partition based clustering approaches [9,10,11], flow simulation approaches [12, 13], spectral clustering based approaches [14, 15], core attachment approaches [16, 17], and supervised approaches [18, 19]. Besides swarm intelligence and evolutionary algorithms which simulate certain behavioral activities and evolutionary mechanisms of natural organisms have been widely applied into clustering analysis of different fields [20,21,22,23,24,25,26]. It is noted that although functional module detection is a clustering problem in essence, different application fields have their own unique characteristics. It is just for the unique characteristics which make the implementation of swarm intelligence and evolutionary algorithms different. To release potential of swarm intelligence and evolutionary algorithms for functional module detection, some researchers study the individual solution representation and realization of optimization mechanisms. Now swarm intelligence and evolutionary algorithms have also become the new spot in detecting functional modules from PPI networks. For example, Sallim et al. introduced ant colony optimization (ACO) into functional module detection by taking it as the optimization problem of solving traveling salesman problem, which started a new page for detecting functional modules using swarm intelligence and evolutionary based methods [27]. Pizzuti et al. made use of genetic algorithm to mine functional modules (called as GA-FMD) [28]. Ji et al. combined ACO with multi-agent evolution to identify functional modules [29]. Subsequently, they again presented an algorithm based on ant colony clustering for functional module detection (called as ACC-FMD) [30]. Recently, Yang et al. developed an method based on bacterial foraging optimization for detecting functional modules (called as BFO-FMD) [31]. Although many effective swarm intelligence and evolutionary based approaches (as listed above) have been proposed for detecting functional modules, this kind of approaches are still evolving with efforts aimed at detecting functional modules more effectively in face of the explosive growth of PPI data nowadays.

Inspired by explosion of fireworks, Tan and Zhu proposed a new swarm intelligence algorithm called fireworks algorithm (FA) in 2010 [32]. This algorithm takes each firework as a candidate solution in the search space. Each firework would explode and generate a set of sparks. The fireworks and sparks with better fitness values are selected to form the fireworks of the next generation. This explosion process continues until a desired solution is found, or the stopping criterion is met. Although FA does not come early, it have been widely used in many fields due to being able to self-tune local search and global search [33,34,35,36,37,38,39,40]. This good problem-solving capability inspire us to develop its potential to detect functional modules from PPI networks. However, references [41,42,43,44] have found that FA lacks of information communication among firework individuals which easily cause the algorithm to fall into local optima. At the same time, these studies find differential evolution (DE) algorithm can help to enhance information communication among firework individuals. Among them, references [41,42,43] used the hybrid method of FA and DE to solve function optimization problems while reference [44] applied the hybrid method of FA and DE into the design of fuzzy classification system. That is, a hybrid of FA and DE algorithm can enhance their optimization ability on different application areas. In fact, many other studies also agree that a hybrid of different swarm intelligence and evolutionary algorithms can learn from each other and have the advantages that single method do not have [45,46,47,48,49,50,51,52]. Thus, to find a more effective way to detect functional modules, this paper propose a novel hybrid approach of firework algorithm and differential evolution algorithm for functional module detection in PPI networks (called HFADE-FMD). The main contributions of HFADE-FMD are summarized as follows: (1) This paper presents a novel approach which takes advantage of FA and DE for functional module detection. (2) Aimed at the problem of detecting functional modules, HFADE-FMD uses a new individual initialization methods and new implementation method of optimization mechanisms. (3) Systematic experiments have not been only conducted on the Saccharomyces cerevisiae datasets but also on the Homo sapiens datasets. The experimental results show that HFADE-FMD has prominent performance in term of Recall, Sn, PPV, and ACC metrics while performing well on Precision and F-measure metrics. Therefore, this method is highly competent to detect functional modules in PPI networks.

This paper is structured as follows: Section 2 presents the proposed HFADE-FMD algorithm in detail. Section 3 carries out comparative experiments to validate the performance of HFADE-FMD. Finally, Section 4 concludes this paper and outlines some future research directions.

2 The proposed HFADE-FMD algorithm

2.1 Basic idea

The proposed HFADE-FMD algorithm takes functional module detection (i.e., clustering on protein nodes) as an optimization problem. To solve this optimization problem, HFADE-FMD first initialized each firework individual into a candidate functional module partition based on label propagation. Then it optimizes iteratively each firework individual by the explosion operation (the core optimization mechanism of FA), mutation, crossover, and selection operations (three DE strategies). During the optimization process, each firework individual (a firework or a spark) is evaluated according to the modularity density (the fitness function) that is commonly used in the partition of a network into modules [53], and is given below:

$$ f(X) = \sum\limits^{m}_{i = 1} \frac{2l_{i} - \bar{l}_{i}}{n_{i}}, $$
(1)

where X is a candidate module partition represented by a individual solution (a firework or a spark), m is the number of modules, li is the number of edges between nodes in module i, \(\bar {l}_{i}\) is the number of edges between one node in module i and the other node outside it, and ni is the number of nodes in module i. The higher the modularity density, the better the detected functional module partition (i.e., the better the corresponding individual). In the following, the key parts including the explosion operator of FA, and three differential evolution strategies will be explained in detail.

2.2 Initialization of firework individuals

Each firework individual corresponds to a candidate functional module partition, and is encoded as a string of length n where n is the number of protein nodes in a PPI network and the character at each location is the label of the corresponding protein node. Each firework individual is initialized based on label propagation. First, HFADE-FMD numbers all the protein nodes in a PPI network and builds a neighbor order list for each protein node. Then, each firework individual probabilistically selects a neighbor node for each protein node. Now the authors assume a firework individual selects a neighbor node h for a protein node k from its neighbor nodes, then two nodes k and h will be assigned same label according to the following three rules of label propagation:

  1. (1)

    If both protein nodes k and h are not assigned labels yet, then assign them a same and new label.

  2. (2)

    If one of protein nodes k and h is assigned a label, then propagate this label to the other node.

  3. (3)

    If both protein nodes k and h are assigned labels, then do nothing in case of that they have same labels, and otherwise, propagate the label of node k to those nodes that have the same label with node h.

In the above initialization process, the probability that each firework individual selects a neighbor node h for a node k is given as follows:

$$ {P_{k}^{h}} = \frac{S_{kh}}{\sum\limits_{r\in {\Gamma}(k)} S_{kr}}, $$
(2)

where Γ(k) is the neighbor node set of node k, Skh is the similarity between nodes k and h. Here, Skh combines the structural similarity and functional similarity between nodes k and h, and is defined as below:

$$ S_{kh}=\frac{s_{kh}+f_{kh}}{2}, $$
(3)

where skh and fkh are the structural similarity and functional similarity between nodes k and h, respectively.

Given two protein nodes k and h, the structural similarity formula is given as follows [54]:

$$ s_{kh}=\frac{|{\Gamma}(k)\bigcap{\Gamma}(h)|}{\sqrt{|{\Gamma}(k)||{\Gamma}(h)|}}, $$
(4)

where |Γ(k)| is the size of the set Γ(k).

Based on the annotation information of gene ontology (GO), the function similarity is expressed as follow [55]:

$$ f_{kh} = \frac{|g^{k} \bigcap g^{h}|}{|g^{k} \bigcup g^{h}|}, $$
(5)

where gk and gh are GO term sets of nodes k and h, respectively.

To clearly show the solution representation and initialization process, Fig. 1 illustrates an example in a simplified PPI network with 8 nodes. Figure 1a is a PPI network containing 8 nodes marked from 1 to 8. Figure 1b gives the neighbor order lists of 8 nodes. Figure 1c shows a firework individual solution obtained by the above label propagation process. Figure 1d is the functional modules corresponding to the firework individual shown in Fig. 1c.

Fig. 1
figure 1

Firework individual representation and initialization process: a PPI network, b neighbor order lists, c firework individual solution, and d a functional module partition

2.3 Explosion operator

The explosion operator is the core mechanism of FA. Each firework explodes and produces a certain number of sparks. One explosion can be viewed as a search around a firework. The explosion operator is able to self-tune local search and global search. The better the firework individual, the more the sparks and the smaller the amplitude of explosion. The more sparks can make a careful search around the corresponding firework individual in a smaller range, which reflects the local search process. The worse the firework individual, the less the sparks and the bigger the amplitude of explosion. The less sparks will make a extensive exploration in a bigger range, which embodies the global search process. In general, the number of sparks and the amplitude of explosion for firework individual Xi are respectively defined as below:

$$ B_{i}=\check{B}\times \frac{f(X_{i})-f_{\min}+\varepsilon}{\sum\limits_{i=1}^{N} (f(X_{i})-f_{\min})+\varepsilon}, $$
(6)
$$ R_{i}=\check{R}\times \frac{f(X_{i})-f_{\max}+\varepsilon}{\sum\limits_{i=1}^{N} (f(X_{i})-f_{\max})+\varepsilon}, $$
(7)

where N is the size of the firework population, \(f_{\min \limits }\) and \(f_{\max \limits }\) are respectively the highest and lowest modularity density among all the firework individuals, \(\check {R}\) and \(\check {B}\) are two parameters that respectively control the explosion amplitude and the number of sparks generated, and ε is a small constant to avoid zero-division-error.

For each firework individual Xi, the method of generating a spark individual Xsp is described as follows. First, produce a random number q uniformly distributed in (0,1) for each protein node k. If this random number is not less than sigmoid(Ri), the label of node k in the spark individual Xsp is the same as that in the firework individual Xi. Otherwise, the label of node k in the spark individual Xsp is the same as that of the neighbor node which has the largest influence on node k.

The influence that a neighbor node h on node k is defined as follows:

$$ {I_{k}^{h}}=\alpha \times \frac{num_{k}}{|{\Gamma}(k)|} + (1-\alpha)\times \frac{S_{kh}}{\sum\limits_{j\in{\Gamma}(k)} S_{kj}}, $$
(8)

where numk is the number of neighbor nodes which have the same label with node k, α is a parameter that controls the importance of similarity nodes and nodes with the same labels. From this definition, the explosion operation not only considers the number of nodes with same labels, but also the similarity between nodes, which makes full use of interrelationships between nodes in PPI networks.

After each firework individual Xi explodes and produces Bi spark individuals at each iteration, all the firework individuals and the spark individuals form a big population. The best individual (whether firework or spark) with the highest modularity density is chosen to be a firework individual of the next generation. Then N − 1 firework individuals are selected from the remaining big population using the roulette strategy. In this way, HFADE-FMD realizes the explosion operator and produces a new generation of firework population.

2.4 Differential evolution operators

Although the explosion operator combines local search and global search, there is little communication and cooperation among firework individuals, which is very detrimental to find a global optimal functional module partition. Introduced by Storn and Price, DE algorithm has much interaction among individual solutions [56]. Thus, to compensate the disadvantage of explosion operator in FA, three DE operators are used to help firework individuals to search better module partitions. The specific implementation of three DE strategies are as follows.

2.4.1 Mutation operator

According to the mutation probability Pm, each firework individual Xi mutates into a variation individual Vi by the random single point variation in HFADE-FMD. The concrete steps are as follows. First, randomly select a node k for firework individual Xi. Then, probabilistically choose a neighbor node h from neighbor nodes of node k according to (2), and pass the label of node h to node k based on the mutation probability Pm. After the above process is repeated n times (n is the number of nodes in the PPI network), the mutation operator is completed. Figure 2 shows the method of updating the label for one node in the mutation operator. In Fig. 2, the firework individual Xi chooses neighbor node 3 for node 4 according to probability formula (2). The label of node 3 is 2. If a random number in (0,1) is smaller than the mutation probability Pm, pass the label 2 to node 4.

Fig. 2
figure 2

The method of updating the label for one node in mutation operator

2.4.2 Crossover operator

HFADE-FMD makes use of one-way mergence crossover operator to produce a trial individual Ui for the variation individual Vi of firework individual Xi. The detailed procedures are given below. Firstly, randomly select a firework individual as one source individual Y1 and take Vi as the other source individual. Secondly, randomly choose a crossover position k. Let \(D_{Y_{1}}(k)\) denotes the node set of the functional module that node k belongs to in source individual Y1, and \(D_{Y_{2}}(k)\) represents the node set of the functional module that node k belongs to in source individual Y2. At last, change the labels of nodes that are in \(D_{Y_{1}}(k)\) and do not belong to \(D_{Y_{2}}(k)\) into the label of node k in source individual Y2, and generate a trail individual Ui.

To be more clear, Fig. 3 gives the sketch map of one-way mergence crossover operator. As shown in Fig. 3, randomly select a crossover position on node 7. \(D_{Y_{1}}(k)\), i.e., the node set of functional module that node 7 belongs to in source individual Y1, is {7,8}. \(D_{Y_{2}}(k)\), i.e., the node set of functional module that node 7 belongs to in source individual Y2, is {5,7}. The node 8 is in \(D_{Y_{1}}(k)\) and do not belong to \(D_{Y_{2}}(k)\). So change the label of node 8 into the label of node 7 in source individual Y2, and obtain a trial individual Ui.

Fig. 3
figure 3

The sketch map of the crossover operator

2.4.3 Selection operator

After performing the crossover operator, HFADE-FMD makes evaluation on firework individual Xi and its trial individual Ui, and keeps the one with higher modularity density according to the following formula:

$$ X_{i} = \left\{ \begin{array}{ll} U_{i}, & if~f(U_{i}>f(X_{i})\\ X_{i}, & otherwise \end{array}. \right. $$
(9)

2.5 Algorithm description

According to the previously detailed introduction, the pseudocode of HFADE-FMD is given in Algorithm 1. This algorithm first starts with an initial firework population, each of which represents a module partition and is constructed based on label propagation. Next, it iteratively executes explosion, mutation, crossover, and selection operators to search better modularity partitions with higher modularity density.

Based on the description of Algorithm 1, the complexity of HFADE-FMD can be analyzed simply as follows. Let the maximum number of the node degree be \(d_{\max \limits }\). The time complexity of the initialization process is \(O(N \times n \times d_{\max \limits })\). The time complexity of the explosion operator is \(O(T \times N \times n \times d_{\max \limits }\times \check {B}\)). The time complexity of three differential evolution operators is \(O(T \times N \times d_{\max \limits }\times (N+d_{\max \limits }))\). Therefore, the time complexity of the proposed HFADE-FMD algorithm is \(O(T\times N \times d_{\max \limits }\times (\check {B}\times n +n+d_{\max \limits }))\).

figure d

3 Evaluation

In this section, extensive experiments will be taken to evaluate the performance of the proposed HFADE-FMD algorithm. The experimental platform is a PC with Intel Core i5-6500 3.20GHz CPU, 12.0GB RAM and Windows 8.1, and HFADE-FMD is implemented by Java language.

3.1 Datasets

Four public PPI datasets including two Saccharomyces cerevisiae datasets and two Homo sapiens datasets are used in the experiments. Table 1 shows a summary of the datasets, where the 3th and 4th columns provide the number of proteins and the number of interactions respectively. To evaluate the functional modules discovered by a computational method, two set of gold standard functional modules for each species are used as the benchmark. Two benchmarks of Saccharomyces cerevisiae are SGD [57] and CYC2008 [58], while two benchmarks of Homo sapiens are PCDq [59] and CORUM [60]. Table 2 gives a summary of the four benchmarks, where the 3th and 4th columns provide the number of functional modules and the web links respectively.

Table 1 Datasets used in experiments
Table 2 Set of gold standard functional modules used in experiments

3.2 Evaluation metrics

Two types of popular evaluation metrics are used to evaluate the quality of the detected modules [3].

3.2.1 Precision, Recall, F-measure

Precision, Recall, and F-measure are three common evaluation metrics in information retrieval and machine learning. Using the three metrics, it is necessary to define how well a detected module h = (Vh,Eh) matches a standard module s = (Vs,Es). Many researches use a Neighborhood Affinity score (NA) to assess the matching degree, which is given as follows:

$$ NA(h,s) = \frac{|V_{h} \bigcap V_{s}|^{2}}{|V_{h}| \times |V_{s}|}. $$
(10)

If NA(h,s) ≥ ω, two modules h and s are considered to be matched (generally, ω = 0.2). Let H be the set of detected modules and S be the set of gold standard functional modules. The number of the detected modules in H which at least matches one standard module in S is denoted by Nch = |{h|hH,∃sS,NA(h,s) ≥ ω}|, while the number of the standard modules in S which at least matches one detected module in H is indicated by Ncs = |{s|sS,∃hH,NA(h,s) ≥ ω}|. Thus, Precision and Recall is given below:

$$ Precision = \frac{N_{ch}}{|H|}, $$
(11)

and

$$ Recall = \frac{N_{cs}}{|S|}. $$
(12)

F-measure is a harmonic mean of Precision and Recall. So it can be used to evaluate the overall performance, and is expressed as:

$$ F-measure = \frac{2 \times Precision \times Recall}{Precision + Recall}. $$
(13)

3.2.2 Sensitivity, positive predictive value and accuracy

Sensitivity (Sn), Positive Predictive Value (PPV) and Accuracy (Acc) are three usual metrics to evaluate the performance of a detection method. Let Tij be the number of the common proteins in i th standard module and j th detected module, then Sn and PPV are defined as:

$$ Sn = \frac{\sum\limits_{i=1}^{|S|}\max\limits_{j}\{T_{ij}\}}{\sum\limits_{i=1}^{|S|} N_{i}}, $$
(14)

and

$$ PPV = \frac{\sum\limits_{j=1}^{|H|}\max\limits_{i}\{T_{ij}\}}{\sum\limits_{j=1}^{|H|}T_{\cdot j}}, $$
(15)

where Ni is the number of the proteins in the i th standard module, and \(T_{\cdot j} = \sum \limits _{i}^{|S|}T_{ij}\).

In general, high Sn means that the detected results have a good coverage of the proteins in the gold standard modules, while high PPV represents the detected modules are likely to be the standard modules. Acc is a comprehensive metric, and is defined as the geometric mean of Sn and PPV:

$$ Acc = (Sn \times PPV)^{1/2}. $$
(16)

3.3 Effects of DE strategies

In this section, the authors investigated the effects of DE strategies by comparing HFADE-FMD with FA-FMD (without using the DE strategies). In the experiments, the authors chose a set of parameters after doing some preliminary experimentations, and set N = 40, T = 100, \(\check {R}=90\), \(\check {B}=50\), α = 0.4, Pm = 0.1, Pc = 0.8. Table 3 provides the basic information of the detection results for the two algorithms on the four datasets. For each algorithm, the authors have listed the number of detected modules (Number of modules), the average number of proteins in each module (Average size of modules), the number of detected modules which match at least one gold standard module (Nch) and the number of gold standard modules that match at least one detected module (Ncs). Taking HFADE-FMD on Gavin dataset as an example, it has detected 194 modules, and the average size of the 194 detected modules is about nine proteins. Among the 194 detected modules, there are 86 modules matching 134 standard modules using SGD as benchmark, while there are 101 detected modules matching 131 standard modules using CYC2008 as benchmark.

Table 3 The basic results of FA-FMD and HFADE-FMD on four datasets

Figures 456, and 7 show the comparison results of HFADE-FMD and FA-FMD on the four datasets in terms of various evaluation metrics including Precision, Recall, F-measure, Sensitivity, PPV, and Accuracy. From these figures, it is easy to see that no matter which benchmark is used, HFADE-FMD performs better than FA-FMD in terms of nearly all metrics on three out of four datasets (Gavin, DIPCore and DIPFull). As for the one remaining dataset (KroganCore), HFADE-FMD and FA-FMD play equally well using SGD as benchmark, and they both get the best results on three metrics. Only in the case of using CYC2008 as benchmark for KroganCore dataset, HFADE-FMD is slightly worse than FA-FMD. HFADE-FMD obtains the best results on two out of six metrics while FA-FMD gets the best results on four out of six metrics. On the whole, the above experimental results fully proved that the DE strategies are helpful for searching better functional module partitions.

Fig. 4
figure 4

Comparative results of HFADE-FMD and FA-FMD in terms of various evaluation metrics on Gavin dataset

Fig. 5
figure 5

Comparative results of HFADE-FMD and FA-FMD in terms of various evaluation metrics on KroganCore dataset

Fig. 6
figure 6

Comparative results of HFADE-FMD and FA-FMD in terms of various evaluation metrics on DIPCore dataset

Fig. 7
figure 7

Comparative results of HFADE-FMD and FA-FMD in terms of various evaluation metrics on DIPFull dataset

3.4 Comparing with competitive methods

To demonstrate the performance of HFADE-FMD, the authors compared it with ten competitive methods: MCODE [4], CFinder [5], DCAFP [6], NeMo [7], FAG-EC [9], MTGO [11], GA-PPI [28], ClusterEPs [18], ACC-FMD [30] and BFO-FMD [31]. Among them, GA-PPI, ACC-FMD and BFO-FMD are three swarm intelligence and evolution based algorithms. MCODE, CFinder and DCAFP are three density-based algorithms. According to the node’s neighbor local density, MCODE fist picks out seed nodes for initial clusters, and then further augments these clusters to form the final clusters. CFinder first identifies the k-cliques using clique percolation, and then combines the adjacent k-cliques to get the functional modules. DCAFP integrates functional preferences of proteins and the graph topology of PPI network for detecting functional modules. NeMO combines a unique neighbor sharing score with the hierarchical agglomerative clustering to identify functional modules. FAG-EC is a fast agglomerate algorithm based on the edge clustering coefficients. MTGO directly exploits GO terms during the module assembling process, and labels each module with its best fit Go term. ClusterEPs is different from the above methods and is a supervised technology. Its key idea is to discover emerging patterns, which can clearly distinguish true complexes from random subgraphs in a PPI network. In the experiments, GA-PPI, ACC-FMD and BFO-FMD adopted the same parameters as [28, 30] and [31], respectively. For MCODE, CFinder, DCAFP, NeMo, FAG-EC, MTGO, and ClsterEPs, the authors obtained their software implementations, and used the default values for their parameters. The parameters of the proposed HFADE-FMD algorithm is given in the last section.

Table 4 provides the basic information of the detection results for the eleven algorithms on the four datasets. From the table, it is clear that whichever set of gold standard functional modules using, MCODE always obtains the least number of modules (Number of modules), and ACC-FMD and BFO-FMD get the biggest module and the smallest module (Average Size of modules), respectively. HFADE-FMD usually generates more and smaller modules. Besides, the authors find that the number of gold standard modules matching at least one detected module (Ncs) is more than all the other algorithms in most cases, and the number of detected module matching at least one gold standard module (Nch) stands the top two ranks in most situations.

Table 4 The basic results of eight algorithms on four datasets

Figures 8910, and 11 show the overall comparison results of eleven algorithms on four datasets in terms of six evaluation metrics including Precision, Recall, F-measure, Sensitivity, PPV, and Accuracy. For the Gavin dataset shown in Fig. 8, it is can be seen that according to the SGD benchmark, HFADE-FMD ranks second in terms of Recall, F-measure, and PPV, takes third places for Sn and ACC, and is in the fifth place on the Precision metric. Using CYC2008 as benchmark, HFADE-FMD achieves the best results on the Sn and ACC statistics, respectively takes the third and fourth places on the Recall and PPV statistics, and obtains the fourth positions on the Precision and F-measure statistics.

Fig. 8
figure 8

Comparative results of eleven methods in terms of various evaluation metrics on Gavin dataset

Fig. 9
figure 9

Comparative results of eleven methods in terms of various evaluation metrics on KroganCore dataset

Fig. 10
figure 10

Comparative results of eleven methods in terms of various evaluation metrics on DIPCore dataset

Fig. 11
figure 11

Comparative results of eleven methods in terms of various evaluation metrics on DIPFull dataset

On KroganCore dataset in Fig. 9, One can easily finds that when using SGD as benchmark, HFADE-FMD has the best performance in terms of PPV and ACC, has the second-highest top level in term of Recall, F-measure and Sn, and ranks seventh with respect to the precision metric. Using CYC2008 as benchmark, HFADE-FMD is in the first place for Sn and ACC, and respectively ranks second, third, sixth, eighth place in terms of Recall, F-measure, PPV and Precision.

Figure 10 displays the results of the DIPCore dataset. Taking PCDq as the benchmark, HFADE-FMD is first in terms of three out of six metrics: Recall, Sn and ACC, receives the second places with respect to PPV and F-measure measures, and takes fifth place for the Precision metric. Taking CORUM as benchmark, HFADE-FMD ranks number one in regard to ACC metric, ranks second in terms of Recall, Sn and PPV, and respectively takes the third and seventh places on F-measure and Precision metrics.

As for the remaining one dataset in Fig. 11, using PCDq as benchmark, HFADE-FMD stands first on half of metrics: Recall, F-measure and Sn, gets the next best results on PPV and ACC statistics, and ranks ninth with respect to the Precision metric. Using CORUM as benchmark, HFADE-FMD obtains the best performance on ACC metric, receives the second place in terms of Recall, Sn and PPV, and respectively ranks third and eighth in regard to the F-measure and Precision metrics.

According to the above analysis, no matter what PPI datasets and benchmark being used, HFADE-FMD always obtains the superior results in most evaluation metrics. Only on the Precision metric, HFADE-FMD usually does not perform well. However, as previously mentioned, the number of detected modules matching at least one gold standard module (Nch) stands the top two ranks in most situations. So the reason for the bad performance with respect to Precision metric is because HFADE-FMD usually generates more functional modules according to formula (11). Besides, regardless of which PPI datasets and benchmark being used, HFADE-FMD plays better than the other three swarm intelligence and evolution based algorithms (GA-PPI, ACC-FMD and BFO-FMD) on overwhelming majority of metrics. Therefore, these experimental results fully demonstrate that HFADE-FMD is a very promising swarm intelligence and evolution based approach for functional module detection in PPI networks.

4 Conclusions

The identification of functional modules in PPI networks is important for biological knowledge discovery since many important biological processes in the cell are carried out through the formation of protein modules. Nowadays, swarm intelligence and evolutionary based approaches have been a kind of effective way to detect functional modules. In this paper, a novel hybrid approach of FA and differential evolution strategies is proposed for detecting functional modules in PPI networks (called HFADE-FMD). HFADE-FMD uses a new individual initialization method and new implementation methods of optimization mechanisms. To demonstrate the performance of HFADE-FMD, the authors have carried out a series of experiments on four datasets in terms of various evaluation metrics. The empirical results illustrate that HFADE-FMD achieves prominent performance with respective to Recall, Sn, PPV, and ACC while performing well on other metrics. Therefore, the proposed HFADE-FMD algorithm can effectively detect functional modules from PPI networks and can be considered as a complement to help biologists to discover some biological insights.

In spite of HFADE-FMD having superior performance, there are the following two serious limitations that require further study in the future: (1) HFADE-FMD as a swarm intelligence algorithm is an iterative algorithm based on population evolution, which is time consuming. Due to parallel mechanisms are effective in reducing time complexity, thus the systematical research of the parallel algorithms will be a major task to enhance the efficiency of detecting functional modules. (2) At present most studies including this paper consider PPI networks as static graphs and overlook the inherent dynamics of PPI networks. So it it very essential to make clustering analysis on dynamics PPI networks.