1 Introduction

Nowadays, Bayesian networks (BNs) have become prominent probabilistic models in the field of artificial intelligence and are widely applied in areas such as computational biology, medical diagnosis, vision recognition, data mining and information retrieval (Ahmad et al. 2012; Hill 2012; Ji 2011; Li 2011; Murphy and Mian 1999; Yang et al. 2010; Zou and Conzen 2005). A BN is a graphical model represented as a directed acyclic graph (DAG) in which nodes correspond to variables and arcs denote the dependences among variables that encode the joint probability distribution. The ability to represent the uncertainty of real problems, the comprehensibility of the graphical model, the strong mathematical basis of the formalism, and the ability to perform inference over the network and compute the values of unobserved variables from observed ones are some of the main advantages of using BNs (Gámez et al. 2011; Ji 2011).

As the popularity of BNs has increased, their construction has attracted the attention of researchers. Building a BN can be divided into two tasks: structure learning and parameter learning. The former refers to learning the topology of the network from the collected data, and the latter refers to estimating the conditional probabilities of a given structure.

Structure learning can be done manually by an expert, but this is complicated and time-consuming, and such an expert may not be available. Moreover, it has been proved that structure learning is an NP-hard problem (Chickering 1996). Therefore, learning the structure from available data is increasingly becoming a vital factor in using BNs.

Generally, there are two main approaches to structure learning: dependency-based and score-based approaches (Gámez et al. 2011). The first comprises methods that estimate dependencies among variables, and the second refers to approaches that try to find the network structure that best fits the training data. The score-based approach can be cast as an optimization problem because an evaluation function assigns a score to each candidate structure; the goal is to find the structure that maximizes this measure (Khanteymoori et al. 2011). Although dependency-based approaches are relatively simple to implement, the assessment of dependencies becomes complicated and unreliable as the number of variables grows (Ji et al. 2013). Therefore, score-based methods have attracted considerable interest for structure learning.

Since this problem is NP-hard, several authors have attempted to solve it with heuristic methods, including ant colony optimization (Ji 2011), asexual reproduction optimization (ARO) (Khanteymoori et al. 2011), particle swarm optimization (Cowie et al. 2007), the artificial bee colony algorithm (Ji et al. 2013), genetic algorithms (You 2001; Larrañaga 1996; Wong et al. 1999) and others (Alonso-Barba and Puerta 2011; Gámez et al. 2011; Larrañaga 2013; Ziegler 2008).

The aim of this study is to provide a suitable method for structure learning using a hybrid GA/PSO algorithm called Breeding Swarm Particle Swarm Optimization (BSPSO). In this method, a portion of the population, determined by the breeding rate, is reproduced with the crossover operator of the genetic algorithm (Mattew and Terence 2006). The combination of these algorithms increases the chance of reproducing particles with higher scores, while the computational cost does not change significantly. Furthermore, the mutation operator preserves the diversity of particles and prevents premature convergence, so the search strategy is more robust against local maxima.

This paper is organized as follows: In Sects. 2 and 3, some preliminaries and basics concepts about BNs and structure learning are reviewed. In Sect. 4, our method is described in detail. Results are shown in Sect. 5, and our conclusions are drawn in the final section.

2 Bayesian network

BNs, or causal networks, represent the probability distribution, correlations and relations among random variables in real-world problems. A BN can be regarded as a tuple \(\hbox {BN}({S},\theta )\), in which S is the structure, a directed acyclic graph (DAG) that represents the random variables and the relations among them: random variables are nodes and relations are edges. \(\theta \) is a set of local parameters that denote the conditional probability distributions over the values of each variable according to the structure S. These values are stored in a table for each variable, called the conditional probability table (CPT).

Fig. 1
figure 1

An example of a simple Bayesian network structure

Figure 1 shows a simple BN with five binary variables and their CPTs. The set of variables contains Cancer, Smoker, Pollution, X-ray and Dyspnea.

The information about the variables and the dependencies between them can be gained easily from the topology of a BN.

Suppose a BN with a set of random variables \(X_{1}, \ldots ,X_{n}\). The immediate predecessors of variable \(X_{i}\) are referred to as its parents, with values \(\mathrm{parents}(x_{i})\). The joint probability distribution is factored as:

$$\begin{aligned}&P\left( X_{1}=x_{1},\ldots ,X_{n}=x_{n} \right) \nonumber \\&\quad =\prod \limits _{i=1}^{n} {p(X_{i}= x_{i}\vert \mathrm{parents}(x_{i}))} \end{aligned}$$
(1)
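The factorization in Eq. (1) can be sketched in a few lines of code. The two-variable network below, its CPT entries and the variable names are illustrative, not taken from Fig. 1.

```python
# Sketch of Eq. (1): the joint probability of a full assignment
# factorizes into one CPT lookup per variable.
# parents[v] lists the parents of v; cpt[v] maps
# (value_of_v, tuple_of_parent_values) -> conditional probability.
parents = {"Smoker": [], "Cancer": ["Smoker"]}
cpt = {
    "Smoker": {(1, ()): 0.3, (0, ()): 0.7},
    "Cancer": {(1, (1,)): 0.1, (0, (1,)): 0.9,
               (1, (0,)): 0.01, (0, (0,)): 0.99},
}

def joint_probability(assignment):
    """P(X1=x1,...,Xn=xn) = prod_i P(Xi=xi | parents(Xi))."""
    p = 1.0
    for var, val in assignment.items():
        pa_vals = tuple(assignment[pa] for pa in parents[var])
        p *= cpt[var][(val, pa_vals)]
    return p
```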

As mentioned earlier, designing a BN involves two phases: first, searching for a suitable structure, and then computing the CPTs of the variables for the found structure so that the output approximates the distribution of the given set of samples. Both of these phases are important; they are called structure learning and parameter learning, respectively.

The most popular parameter learning method is the expectation maximization (EM) algorithm (Chickering et al. 1995). This paper concentrates on the structure learning and presents a new approach to find an appropriate BN structure which is matched with the given dataset.

Fig. 2
figure 2

The diagram of the proposed method

3 Structure learning

Structure learning for a BN can be stated as follows: X is a set of random variables \(X_{1},\ldots ,X_{n}\), where each variable \(X_{i}\) takes values from its own domain \(val(X_{i})\). Given a training set with M cases, \(D=\{d[1],\ldots , d[M]\}\), where d[i] is an instance over the domain variables \(val\left( X_{1},\ldots ,X_{n} \right) \), the learning goal is to find a network structure G that is a good predictor for the data.

It has been shown that the number of possible structures for a BN with n nodes is given by the following recurrence relation (Robinson 1977):

$$\begin{aligned} r\left( n \right) =\sum \limits _{i=1}^n \left( -1 \right) ^{i+1}\binom{n}{i} 2^{i\left( n-i \right) }r\left( n-i \right) \in n^{2^{O(n)}} \end{aligned}$$
(2)
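Eq. (2) is straightforward to evaluate directly; a sketch, using the convention \(r(0)=1\):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Robinson's recurrence (Eq. 2): number of DAGs on n labeled nodes."""
    if n <= 0:
        return 1  # base case r(0) = 1
    return sum((-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * num_dags(n - i)
               for i in range(1, n + 1))
```

Already for n = 5 there are 29,281 structures, which illustrates the super-exponential growth of the search space.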

The above formula clearly shows that the size of the search space is exponential, so it is impossible to use an exhaustive search to find the best structure. As already mentioned, dependency-based and score-based methods are the two main structure learning approaches, and the reasons why score-based approaches are more popular than dependency-based ones were discussed above. In this article, we are only interested in the score-based approach.

Score-based approaches define structure learning as an optimization problem: to find the structure that best fits the training data, they use a scoring criterion and a search algorithm. Several scoring criteria have been proposed, including the Bayesian information criterion (BIC), the minimum description length (MDL) score and the Bayesian Dirichlet (BDe) score (Heckerman et al. 1995). Since the search space is exponential and the search problem is NP-hard, heuristic methods are popular.

Among the heuristic methods, the genetic algorithm (GA) has received wide attention. However, one major drawback of this algorithm is its high computational complexity, especially as the number of random variables increases. Using PSO can resolve the challenge of computational complexity, but this method suffers from premature convergence when the diversity of the population is low. The proposed method exploits the advantages of GA and PSO together through a hybrid algorithm called the Breeding Swarms PSO (BSPSO) algorithm.

4 Proposed method

The input is a dataset represented as an \(m\times n\) matrix M, where m is the number of records and n is the number of variables. The output is a BN with n nodes that best fits the data. In the first step, the information in matrix M is modified by preprocessing. Next, the Breeding Swarms algorithm is used to find the best BN structure. Figure 2 shows the diagram of the proposed method; the most important steps are described in detail below.

4.1 Preprocessing

In the initial stage of preprocessing, continuous variables are discretized. For example, suppose age is a continuous variable; it can be discretized into less than 20, between 20 and 45, and more than 45. Next, if a field has missing values, the value 'U' is added to it and to the domain of its variable. After these steps, all variables have discrete domains without missing values.
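A minimal sketch of this preprocessing step for the age example; the bin labels and the 'U' marker are illustrative choices, not prescribed by the method.

```python
# Discretize the continuous variable 'age' using the cut points from
# the example above; None stands for a missing value and is mapped to
# the extra state 'U' that is added to the variable's domain.
def discretize_age(age):
    """Map a continuous age (or None for missing) to a discrete state."""
    if age is None:
        return "U"        # extra state for missing values
    if age < 20:
        return "<20"
    if age <= 45:
        return "20-45"
    return ">45"
```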

4.2 Breeding Swarms algorithm

Breeding Swarms, or GA/PSO, is a hybrid algorithm proposed by Settles (Mattew and Terence 2006). This algorithm combines the updating rules of PSO, namely particle velocity and particle position, with the operators of GA. Breeding between particles increases the probability of producing more desirable particles in the search space. Since only a portion of the population is constructed by GA, the computational complexity stays low, while the diversity of the population prevents premature convergence. \(\psi \), the only additional parameter, is called the breeding rate and determines the portion of the population that is bred; its value depends on the problem and is set experimentally. The algorithm is summarized in Fig. 3. Suppose N is the size of the population. The positions of \(N(1-\psi )\) particles are updated by the PSO rules and form \(N(1-\psi )\) particles of the next generation. The remaining particles are produced by GA from the whole current population.

Fig. 3
figure 3

The Breeding Swarm PSO algorithm

4.2.1 Particle formation

Fig. 4
figure 4

The corresponding particle of a simple BN

Suppose there are N random variables. Since each BN structure is a directed acyclic graph, it can be represented by an adjacency matrix; therefore, each particle can be considered an \(N\times N\) matrix. Figure 4 displays the particle corresponding to an example BN.
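The particle encoding can be sketched as follows; the helper name and the example arcs are illustrative.

```python
# A particle is the n x n 0/1 adjacency matrix of a DAG: entry (i, j)
# is 1 exactly when there is an arc from variable i to variable j.
def particle_from_edges(n, edges):
    """Build an adjacency-matrix particle from a list of (i, j) arcs."""
    m = [[0] * n for _ in range(n)]
    for i, j in edges:
        m[i][j] = 1
    return m
```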

4.2.2 Initial population

A population is composed of N particles and can be randomly generated. Each acyclic graph with n nodes can be a candidate structure for the BN.

4.2.3 Selection operator

Binary tournament selection is used to generate \(N\psi \) particles of the next generation. To select each parent, two particles are randomly drawn from the whole population, and the particle with the better score is selected. Under this strategy, weak particles still have a considerable chance of survival.
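Binary tournament selection can be sketched in a few lines; here `score` is a placeholder for the Bayesian scoring metric.

```python
import random

# Binary tournament selection: each parent is the better of two
# particles drawn at random from the whole population.
def tournament_select(population, score, k):
    """Pick k parents by repeated binary tournaments."""
    parents = []
    for _ in range(k):
        a, b = random.sample(population, 2)
        parents.append(a if score(a) >= score(b) else b)
    return parents
```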

4.2.4 Recombination operator (crossover)

In the proposed method, each chromosome is coded in matrix form. Single-point crossover is adopted, with probability \(P_{c}\) determining whether the operation is performed. The operator randomly selects a crossover point, and the bit strings after that point are swapped between the two parents.

This recombination may produce invalid offspring with cycles in the corresponding graphs. Therefore, checking the validity of the two produced offspring and correcting them is indispensable.
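A sketch of this crossover over matrix-coded particles, assuming each matrix is flattened row by row into a bit string before the tails are swapped; the cycle repair described next would then run on the offspring.

```python
import random

# Single-point crossover of two matrix-coded particles: flatten each
# n x n matrix to a bit string, swap the tails after a random cut
# point, and reassemble the two offspring matrices.
def crossover(m1, m2):
    """Single-point crossover of two equally sized 0/1 matrices."""
    n = len(m1)
    a = [bit for row in m1 for bit in row]
    b = [bit for row in m2 for bit in row]
    point = random.randint(1, n * n - 1)   # cut point strictly inside the string
    a[point:], b[point:] = b[point:], a[point:]
    rebuild = lambda flat: [flat[i * n:(i + 1) * n] for i in range(n)]
    return rebuild(a), rebuild(b)
```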

Fig. 5
figure 5

Types of graph cycles. (1) length one, (2) length two and (3) length three and more

Fig. 6
figure 6

An example of computing the new position of a particle

As illustrated in Fig. 5, the cycles in graphs can be classified into three groups. Repairing invalid offspring is done according to the type of cycle:

  • Cycle of length one: a cycle from a node to itself, i.e., some entries on the main diagonal of the adjacency matrix are one. Since both parents are valid and their main diagonals are zero, applying the crossover operator cannot create any cycle of length one in the new offspring.

  • Cycle of length two: as shown in Fig. 5, a bi-directed edge between two nodes, i.e., two entries with value one that are symmetric with respect to the main diagonal. In this case, to improve the efficiency of the proposed method, the score of each edge is computed individually based on the Bayesian criterion, and the edge with the minimum score is deleted.

  • Cycle of length three or more: Warshall's algorithm, of order \(O(n^{3})\), is used to find cycles of length more than two. Although this algorithm was proposed to find the shortest path between two nodes, it can also be used to find the shortest path from a node back to itself. In the proposed method, among the edges that construct the cycle, the edge with the least score is deleted.
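The cycle test underlying the third case can be sketched with Warshall's \(O(n^{3})\) transitive-closure algorithm: a node i lies on a cycle exactly when i can reach itself.

```python
# Warshall's transitive-closure algorithm over a 0/1 adjacency matrix;
# a nonzero diagonal entry of the closure reveals a cycle.
def has_cycle(adj):
    """Return True iff the directed graph given by adj contains a cycle."""
    n = len(adj)
    reach = [row[:] for row in adj]   # reachability matrix, updated in place
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = 1
    return any(reach[i][i] for i in range(n))
```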

4.2.5 Mutation operator

This operator randomly inserts an edge into the structure (graph), deletes an edge, or selects an edge and reverses its direction. The insertion of a new edge may create a cycle in the mutated chromosome; in this case, the inserted edge is deleted.
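A sketch of the mutation operator. The text only specifies undoing a cycle-creating insertion; undoing a cycle-creating reversal as well is an assumption made here to keep every particle a valid DAG. The acyclicity test is passed in (e.g. Warshall's algorithm).

```python
import random

# Apply one random mutation (insert, delete, or reverse an edge) to an
# adjacency matrix in place; changes that create a cycle are undone.
def mutate(adj, is_cyclic):
    n = len(adj)
    i, j = random.sample(range(n), 2)
    op = random.choice(["insert", "delete", "reverse"])
    if op == "insert" and adj[i][j] == 0:
        adj[i][j] = 1
        if is_cyclic(adj):
            adj[i][j] = 0                   # undo cycle-creating insertion
    elif op == "delete":
        adj[i][j] = 0                       # deletion never creates a cycle
    elif op == "reverse" and adj[i][j] == 1:
        adj[i][j], adj[j][i] = 0, 1
        if is_cyclic(adj):
            adj[i][j], adj[j][i] = 1, 0     # undo cycle-creating reversal (assumption)
    return adj
```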

4.2.6 Update the position of particle

This operator computes the new position of each particle by applying its velocity (V) to its current position. Each point in the search space is a directed acyclic graph (DAG) represented in matrix form; hence, its velocity is also denoted as an \(n\times n\) matrix, where n is the number of random variables. Let V be the set of velocities \(V=\{v_{1},v_{2},\ldots ,v_{m}\}\), where each entry \(v_{ij}^{m}\in \{0,1,-1\}\) and m is the size of the population. If \(v_{ij}=1\), an edge is added from i to j; if \(v_{ij}=-1\), the edge from i to j is deleted; and \(v_{ij}=0\) means the edge between i and j is unchanged. \(v_{i}\) is computed from the current velocity, the position of the best particle, and the best position that the ith particle has seen so far.

The insertion of an edge may create a cycle in the corresponding graph of a particle. In this case, in order to decrease the computational complexity, the inserted edge is removed. Figure 6 illustrates an example of applying a velocity to the current position of a particle.
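The position update can be sketched as follows, assuming velocity entries in \(\{-1,0,1\}\) are applied entry by entry; the acyclicity test (e.g. Warshall's algorithm) is passed in.

```python
# Apply a velocity matrix v to an adjacency matrix in place:
# v[i][j] == 1 adds the arc i -> j, v[i][j] == -1 deletes it, and
# v[i][j] == 0 leaves it unchanged; a cycle-creating addition is
# simply discarded, as described above.
def apply_velocity(adj, v, is_cyclic):
    n = len(adj)
    for i in range(n):
        for j in range(n):
            if v[i][j] == 1 and adj[i][j] == 0:
                adj[i][j] = 1
                if is_cyclic(adj):
                    adj[i][j] = 0          # discard cycle-creating addition
            elif v[i][j] == -1:
                adj[i][j] = 0              # delete the arc from i to j
    return adj
```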

Fig. 7
figure 7

The pseudo-code of the BSPSO method

The proposed method is summarized in Fig. 7.

5 Experimental results

In order to evaluate the proposed method, several real-world problems are used, and the validity of the method is demonstrated through computer simulation.

  1. (A)

    Datasets In this experiment, seven benchmarks are used, which are described as follows:

    1. (1)

      ASIA network: This network was presented by Lauritzen and Spiegelhalter (Lauritzen and Spiegelhalter 1988) and is used as a basic model for analyzing the performance of structure learning algorithms. It is composed of 8 nodes and 8 edges and is considered a small network.

    2. (2)

      Car diagnosis problem: This problem was introduced by Norsys (NorsysSoftwareCorp) and is a simple example of a belief network. The reason why a car does not start is inferred from the spark plugs, headlights, main fuse, and so on. The BN has 15 nodes and 17 edges. All nodes of the network take discrete values; some can take three discrete states, and the others two. A database of two thousand cases, generated with the Netica tool, is used to train the BN.

    3. (3)

      Child network

      This network represents a diagnosis model of heart diseases in newborn babies and is composed of 20 nodes and 25 edges (Spiegelhalter and Cowell 1992).

    4. (4)

      Mildew network

      This network is designed for determining the amount of fungicides to use against mildew of wheat (Jensen and Jensen 1996) and consists of 35 nodes and 46 edges.

    5. (5)

      Insurance network

      Insurance is a medium-sized network with 27 nodes and 52 edges which was proposed for estimating the risks of car insurance (Binder 1997).

    6. (6)

      ALARM network: A Logical Alarm Reduction Mechanism (ALARM) is a medical diagnostic system for patient monitoring. It is a complex belief network with 37 nodes and 46 edges (Beinlich 1989). As in the previous example, the database is generated with the Netica tool.

    7. (7)

      Hailfinder network

      The network was presented for forecasting stormy weather in northeastern Colorado (Abramson 1996). Hailfinder is composed of 56 nodes and 66 edges.

      It should be noted that all of the introduced benchmark networks are available at http://www.bnlearn.com/bnrepository/. The goal of structure learning for a BN is to obtain structures that are close to the desired networks.

  2. (B)

    Evaluation criteria

    Quantitative and qualitative criteria are the two main standards used to analyze the performance of the output results. The former refers to the Bayesian Dirichlet metric used in the K2 algorithm (Cheng et al. 1997; Cowell 1998), whose score lies between 0 and \(-\infty \). The latter compares the output with the desired structure and evaluates the output structure based on their differences. The structure learning factor (SLF) and topology learning factor (TLF) (Colace et al. 2010) are two popular qualitative criteria, defined as follows:

    $$\begin{aligned} \left\{ {\begin{array}{l} \text {SLF}=\frac{{\text {TC}}}{{\text {TE}}} \\ \text {TLF}=\frac{{\text {TC}}+{\text {IE}}}{{\text {TE}}}\\ \end{array}} \right. \end{aligned}$$
    (3)

    In the above formula, TC is the number of edges that are correctly determined, TE is the number of edges in the target graph and IE is the number of edges that are determined correctly but with the wrong direction.
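Eq. (3) translates directly into code; the function names are illustrative, and TC, IE and TE are the counts defined above.

```python
# Direct transcription of Eq. (3) from the counts TC, IE and TE.
def slf(tc, te):
    """Structure learning factor: fraction of target edges fully recovered."""
    return tc / te

def tlf(tc, ie, te):
    """Topology learning factor: also credits edges recovered with reversed direction."""
    return (tc + ie) / te
```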

  3. (C)

    Simulation results

    In order to set the parameters of the proposed method optimally, several experiments are done based on the seven available databases, with the default parameters described in Table 1.

Table 1 The default parameters in experiments
Fig. 8
figure 8

The diagram of breeding rate for ASIA Bayesian network

  • Breeding rate:

    In this experiment, the goal is to find the optimal breeding rate for each problem. For this purpose, all other parameters are fixed. The Bayesian Dirichlet metric mentioned previously is used to evaluate the performance of the algorithm. The reported values are averages over 10 runs for each problem.

    As shown in Fig. 8, the scores obtained when the breeding rate lies in the range [0.1, 0.5] are better than in the second part, i.e., [0.5, 0.9]. Therefore, it can be said that when GA dominates PSO, the performance of the method decreases. On the other hand, when the breeding rate is less than 0.3, the performance also decreases, which shows that the best range for the breeding rate is [0.3, 0.5]. The experimental results in Table 2 demonstrate that the algorithm behaves differently on the other databases, where performance is better when the breeding rate is greater than 0.5. The experiment reveals that the number of variables in a database has a direct impact: when the BN is small, the diversity of the population relative to its size is adequate, but as the size of the BN increases, more diversity is needed in the population so that the search space is explored more accurately. The experimental results show that the range [0.6, 0.8] is suitable for the Alarm and Car problems.

  • Size of the population

    The other parameter that should be set optimally is the size of the population. For this purpose, the breeding rate is set based on the results of the previous experiment. In this test, the size of the population is varied from 5 to 40 in steps of 5 particles. The algorithm is run 10 times and the average of the obtained scores is computed.

    The results are listed in Table 3. As shown, in the Car and Asia problems, a population size of 20 performs well, whereas in the Alarm network, the scores improve as the population grows. However, the growth of the population leads to more computational complexity. Therefore, it is appropriate to determine the size based on the available resources.

  • Performance evaluation

    For further assessment, the proposed method is run 30 times with the optimal parameters.

    The results are depicted in Figs. 9, 10 and 11 for Asia, Car and Alarm networks, respectively.

Table 2 Experimental results of breeding rate setting in three test Bayesian networks
Table 3 Experimental results of population size setting in three test Bayesian networks
Fig. 9
figure 9

Analysis of the efficiency of the proposed method for the ASIA Bayesian network

Fig. 10
figure 10

Analysis of the efficiency of the proposed method for the Car Bayesian network

Formulas (4)–(7) are used to compare the results with the target networks.

$$\begin{aligned}&\frac{\text {Total number of TC edges }}{\text {Total number of edges}} \end{aligned}$$
(4)
$$\begin{aligned}&\frac{\text {Total number of IE edges }}{\text {Total number of edges}} \end{aligned}$$
(5)
$$\begin{aligned}&\frac{\text {Total number of ME edges }}{\text {Total number of edges}} \end{aligned}$$
(6)
$$\begin{aligned}&\frac{\text {Total number of EE edges }}{\text {Total number of edges}} \end{aligned}$$
(7)

In the above relations, TC is the number of edges correctly added between the same nodes as in the target network, ME is the number of edges missed relative to the target network, IE is the number of edges that connect the correct nodes but are reversed, and EE is the number of edges that are wrongly added.
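Given the learned and target edge sets, the four counts behind formulas (4)–(7) can be sketched as follows; edges are (i, j) pairs and the function name is illustrative.

```python
# Compute TC, IE, ME and EE by comparing two sets of directed edges.
def edge_counts(learned, target):
    """Return (TC, IE, ME, EE) for the learned graph versus the target graph."""
    learned, target = set(learned), set(target)
    rev_learned = {(j, i) for (i, j) in learned}
    rev_target = {(j, i) for (i, j) in target}
    tc = len(learned & target)                 # right nodes, right direction
    ie = len(rev_learned & target)             # right nodes, reversed direction
    me = len(target - learned - rev_learned)   # target edges missed entirely
    ee = len(learned - target - rev_target)    # spurious extra edges
    return tc, ie, me, ee
```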

Fig. 11
figure 11

Analysis of the efficiency of the proposed method for the Alarm Bayesian network

Table 4 The average of correct, missed, reverse and wrong edges over 30 times executions

For a more accurate assessment, the proposed method is run 30 times for each network, and the averages of the above relations are listed in Table 4.

Fig. 12
figure 12

Comparison of the efficiency of the proposed method with the REST and CONAR methods

Fig. 13
figure 13

Comparison of the normalized graph errors of our method with the CGA and SGA methods

These tests highlight that the resulting networks are close to the target networks, so it can be concluded that the proposed method performs well.

Table 5 Comparison of SLF measure among the proposed method and other nine algorithms on seven datasets
  • Comparison with other methods

    The proposed method has been compared with thirteen well-known methods, selected to be applicable and to cover the different classes of data-based Bayesian structure learning. These methods are REST (Cowie et al. 2007), CONAR (Cowie et al. 2007), CGA (Larrañaga 1996), SGA (You 2001), FGS (Chickering 2002), FCI (Colombo and Maathuis 2014), PC (Spirtes et al. 2000), MMHC (Tsamardinos et al. 2006), TPDA (Cheng et al. 1997), SCA (Friedman et al. 1999), minimum spanning tree (MST) (BayesiaLab 6.0.2), EQ (Munteanu and Bendou 2001) and Taboo (BayesiaLab 6.0.2).

    Three evaluation metrics are used: the structure learning factor (SLF), the topology learning factor (TLF) and the graph error criterion; the first two were defined in Eq. (3).

    Furthermore, the graph error is the total number of errors, i.e., reversed, missing and extra edges.

    As mentioned above, among the compared methods, REST and CONAR are based on particle swarm optimization, while CGA and SGA use the genetic algorithm. Since the proposed method is a hybrid strategy combining GA and PSO, the comparison with these four methods is presented separately for further assessment.

    The diagram in Fig. 12 shows the results for the Asia, Car diagnosis and Alarm networks. Because of the stochastic behavior of GA and PSO, the evaluation metrics are averaged over 30 runs.

    The SLF and TLF results show that the performance of the proposed method is higher than that of the other two methods. Also, in the Alarm Bayesian network, which is a challenging problem, the results of the presented method are acceptable. This result further strengthens our hypothesis that combining GA with PSO increases the chance of reaching the optimal solution and prevents early convergence to local optima.

To compare the proposed method with the GA-based approaches (CGA and SGA), the graph error criterion is used.

Figure 13 presents the normalized graph errors of the GA-based methods and the proposed method.

The first three datasets, which relate to the ASIA network, include 2000, 3000 and 5000 records, respectively. The second group of datasets, i.e., the fourth to sixth, relate to the Alarm network and include 3000, 5000 and 10,000 records, respectively. As expected, this experiment demonstrates that the graph errors of the proposed method are lower than those of the GA-based methods.

Table 6 Comparison of TLF measure among the proposed method and other nine algorithms on seven datasets
Table 7 Comparison of graph error among the proposed method and other nine algorithms on seven datasets

Next, the other nine methods have been compared according to the previous assessment metrics, and the results are presented separately in Tables 5, 6 and 7. To give a complete report, seven datasets are generated from the introduced benchmark networks. PC, SCA, MMHC and TPDA have been run using the Causal Explorer system (Statnikov 2010). For the FCI and FGS algorithms, the TETRAD project (Scheines 1998) has been used. Finally, BayesiaLab software (BayesiaLab 6.0.2) has been used to run the Taboo, minimum spanning tree and EQ methods. It should be noted that the parameters of all the algorithms have been set to their default values. In each row of the tables, the best value for each metric is shown in bold. Finally, we report the best results of the proposed method over 30 runs.

The SLF and TLF measures in Tables 5 and 6 demonstrate that the proposed method outperforms the other approaches in most cases: it achieves the highest SLF on five datasets (asia, child, alarm, mildew and insurance) and the highest TLF on five datasets (asia, car, child, mildew and insurance).

However, there are no major differences between the BSPSO method and the other approaches in the remaining cases.

Analyzing the graph error measures in Table 7 shows that EQ and Taboo have smaller graph errors than the other approaches in most cases. The main reasons for their efficiency may be that the former switches from a greedy strategy to Taboo search to avoid local minima, and the latter reduces the search space by searching over equivalence classes of Bayesian networks. BSPSO achieves the best graph error score on four datasets (Asia, Car, Alarm and Hailfinder), and its results are competitive in the other cases. Some methods, such as PC, MMHC and SCA, have poor outcomes; setting their parameters more accurately for the more complex networks and increasing the number of samples in the datasets might improve their performance.

6 Conclusion

In this paper, the Breeding Swarm PSO (BSPSO) algorithm, an efficient swarm intelligence-based approach, has been used to solve the BN structure learning problem. BSPSO combines PSO and GA, which provides a synergy between their capabilities, helps to search the solution space more effectively, and avoids premature convergence.

In order to set parameters such as the breeding rate, the method was run several times under different conditions. To assess the performance of the proposed method, seven datasets were generated from seven real-world networks, and several experiments were carried out. Moreover, the method was compared with thirteen representative algorithms, four of which are GA- or PSO-based. The experimental results demonstrate that BSPSO clearly outperforms the four population-based approaches and also has promising performance against the other nine methods in finding the desired networks.

Future work will concentrate on more complex problems in learning BNs, e.g., hidden variables and multi-relational data.