
1 Introduction

As graphical models, Bayesian networks (BNs) are effective tools for modeling and analyzing problems under uncertainty and are widely applied in many domains. Learning BNs is an important part of BN research and comprises structure learning and parameter learning. Structure learning is needed to reveal the qualitative relationships between variables, yet BN structure learning has been proved to be NP-hard. Studying the BN structure learning problem is therefore both challenging and meaningful.

In research on BN structure learning, two classes of methods are used to address the problem [1]: independence-analysis methods and score-and-search methods. The former determine whether an edge exists between the corresponding nodes by testing conditional independence and dependence between variables, thereby establishing the skeleton of the BN structure, and then orient the edges to obtain the BN structure. The latter search over candidate structures with a scoring function. Because the space of network structures is generally very large, many structure-learning algorithms adopt heuristic greedy search, which tends to converge to local optima.

To avoid this shortcoming of heuristic search, some researchers adopt intelligent evolutionary methods [2, 3]. Simulated annealing (SA) was among the first such algorithms applied to this problem and has proved successful [4]; in contrast with the typical greedy methods, it keeps exploring the search space. In fact, testing the dependence and independence between variables can reveal relational information hidden in the data, and this information can be used to guide the search of the intelligent algorithm and thereby achieve rapid convergence. This chapter proposes the ASOSA algorithm for BN structure learning.

2 BN Structure Learning

As graphical models representing the joint probability distribution over a set of variables, BNs consist of two parts: a directed acyclic graph (DAG) and the BN parameters. Through the independence relations encoded in the BN structure, the joint probability distribution can be decomposed as

$$ p\left(X_1,\ldots,X_n\right)=\prod_{i=1}^{n} p\left(X_i \mid \mathbf{Pa}_i, G\right) $$
(60.1)

where Pa_i denotes the parents of variable X_i in the BN structure G. BN structure learning means finding, from a set of samples, the network structure that best fits the data; this is the focus of this chapter.
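As a concrete illustration of Eq. (60.1), the joint probability of a full variable assignment is obtained by multiplying the conditional probabilities p(X_i | Pa_i) read from the conditional probability tables. The following sketch uses a hypothetical three-node network and made-up probabilities purely for illustration:

```python
# Minimal sketch of Eq. (60.1): p(X1, ..., Xn) = prod_i p(Xi | Pa_i, G).
# The three-node network and all CPT values below are illustrative assumptions.

parents = {"A": [], "B": ["A"], "C": ["A", "B"]}   # DAG: A -> B, A -> C, B -> C

# CPTs: (value of node, tuple of parent values) -> probability
cpt = {
    "A": {(1, ()): 0.3, (0, ()): 0.7},
    "B": {(1, (1,)): 0.8, (0, (1,)): 0.2, (1, (0,)): 0.1, (0, (0,)): 0.9},
    "C": {(1, (1, 1)): 0.9, (0, (1, 1)): 0.1,
          (1, (1, 0)): 0.5, (0, (1, 0)): 0.5,
          (1, (0, 1)): 0.6, (0, (0, 1)): 0.4,
          (1, (0, 0)): 0.05, (0, (0, 0)): 0.95},
}

def joint_probability(assignment):
    """Evaluate Eq. (60.1) for one complete assignment of all variables."""
    p = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[q] for q in pa)
        p *= cpt[node][(assignment[node], pa_values)]
    return p

print(joint_probability({"A": 1, "B": 1, "C": 0}))   # 0.3 * 0.8 * 0.1 = 0.024
```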

Methods based on conditional independence tests are efficient, but in certain cases the order of the tests (the order is the number of variables in the conditioning set) grows with the number of variables, and the number of high-order tests required increases exponentially.

This chapter uses mutual information and conditional mutual information for the conditional independence tests. I(X_i; X_j) denotes the mutual information between variables X_i and X_j; according to information theory,

$$ I\left(X_i;X_j\right)=I\left(X_j;X_i\right)=\sum_{X_i,X_j} p\left(X_i,X_j\right) \log \frac{p\left(X_i,X_j\right)}{p\left(X_i\right)p\left(X_j\right)} $$
(60.2)

Mutual information is nonnegative, and the closer variables X_i and X_j are to independence, the closer I(X_i; X_j) is to zero.
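In practice, the mutual information of Eq. (60.2) is estimated from the sample data by replacing the probabilities with observed frequencies. A minimal sketch, assuming the dataset is a list of rows of discrete values and the column indices i and j identify the two variables:

```python
import math
from collections import Counter

def mutual_information(data, i, j):
    """Empirical I(X_i; X_j) of Eq. (60.2); data is a list of discrete-valued rows."""
    n = len(data)
    joint = Counter((row[i], row[j]) for row in data)
    px = Counter(row[i] for row in data)
    py = Counter(row[j] for row in data)
    mi = 0.0
    for (xi, xj), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((px[xi] / n) * (py[xj] / n)))
    return mi

# Toy data: columns 0 and 1 are strongly dependent, columns 0 and 2 are nearly independent.
data = [(0, 0, 1), (0, 0, 0), (1, 1, 1), (1, 1, 0), (0, 0, 1), (1, 1, 0)]
print(mutual_information(data, 0, 1))   # about 0.69 (= log 2), strong dependence
print(mutual_information(data, 0, 2))   # close to 0, near independence
```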

The score-and-search approach is the other class of BN structure learning methods; it defines a scoring function S for each candidate network. Generally, S is the posterior probability of the network. Assuming all BN structures have equal prior probability, comparing the posterior scores S of different structures reduces to comparing the structure likelihoods p(D|G). Cooper and Herskovits gave the following expression for the structure likelihood:

$$ p\left(D \mid G\right)=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\frac{\Gamma\left(\alpha_{ij}\right)}{\Gamma\left(\alpha_{ij}+N_{ij}\right)}\prod_{k=1}^{r_i}\frac{\Gamma\left(\alpha_{ijk}+N_{ijk}\right)}{\Gamma\left(\alpha_{ijk}\right)} $$
(60.3)

where Γ is the gamma function, N_ijk is the number of cases in the dataset in which the parents of X_i are in configuration j and X_i itself takes its kth state, N_ij = Σ_k N_ijk, q_i is the number of configurations of the parents of X_i, and r_i is the number of states of X_i. The α_ij and α_ijk are the hyperparameters of the Dirichlet prior, with α_ij = Σ_k α_ijk. Equation (60.3) is the well-known Bayesian Dirichlet (BD) score function; the greater the score, the better the network structure.
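For illustration, the logarithm of Eq. (60.3) can be computed directly from the counts N_ijk once a candidate structure is fixed. The sketch below assumes uniform Dirichlet hyperparameters α_ijk = 1 (so α_ij = r_i), the K2-style choice of Cooper and Herskovits; the data layout and variable indices are assumptions made for this example.

```python
import math
from collections import defaultdict

def log_bd_score(data, cardinalities, parents):
    """Log of Eq. (60.3) with alpha_ijk = 1 (so alpha_ij = r_i).
    data: list of tuples of discrete states; cardinalities[i] = r_i;
    parents[i] = tuple of the parent column indices of X_i."""
    score = 0.0
    for i, r_i in enumerate(cardinalities):
        # N_ijk: count of X_i = k under each observed parent configuration j
        counts = defaultdict(lambda: [0] * r_i)
        for row in data:
            j = tuple(row[q] for q in parents[i])
            counts[j][row[i]] += 1
        for n_ij_counts in counts.values():      # unseen configurations contribute 0
            n_ij = sum(n_ij_counts)
            score += math.lgamma(r_i) - math.lgamma(r_i + n_ij)
            for n_ijk in n_ij_counts:
                score += math.lgamma(1 + n_ijk)  # minus lgamma(1), which is 0
    return score

# Toy comparison: the data make X1 a copy of X0, so the structure X0 -> X1 scores higher.
data = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 0), (1, 1)]
print(log_bd_score(data, [2, 2], {0: (), 1: ()}))     # both variables parentless
print(log_bd_score(data, [2, 2], {0: (), 1: (0,)}))   # structure X0 -> X1
```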

3 Adaptive Selection Operator

The traditional SA algorithm for BN structure learning generates a neighbor by randomly choosing to add, delete, or reverse a directed edge. Such operations are undirected and often ineffective. The zero-order independence information of the BN nodes (their pairwise mutual information) largely characterizes the relationships between nodes, so this information can be used to guide the generation of neighbors. The main idea is to build the initial selection matrix from the zero-order independence-test information of the nodes. As annealing proceeds, the influence of the selection matrix is gradually weakened, and the matrix stops changing once certain conditions are met.

The selection matrix is divided into a dependence selection matrix and an independence selection matrix, which are used for the addition and the deletion of BN edges, respectively.

The selection matrices are obtained in two steps:

  1. Step 1 (Initialization). Set the renovation coefficient k (k > 1). Compute the mutual information between every pair of distinct variables, C^0_ij = I(X_i; X_j), i = 1, …, m − 1, j = i + 1, …, m, where m is the number of BN nodes. The matrix C^0 is the initial dependence selection matrix. The independence selection matrix plays the opposite role; to ensure that the resulting probabilities are nonnegative, set D^0_ij = −C^0_ij + max(C^0_ij), i = 1, …, m − 1, j = i + 1, …, m. D^0 is the initial independence selection matrix.

  2. Step 2 (Updating). Set p = p + 1. If the update condition is met, C^p_ij = (C^(p−1)_ij)^(1/k) and D^p_ij = (D^(p−1)_ij)^(1/k), p ≥ 1; otherwise C^p_ij = C^(p−1)_ij and D^p_ij = D^(p−1)_ij.

To preserve the selectivity of the selection matrices, a maximum number of updates can be set.
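A minimal sketch of the two steps above, assuming a pairwise mutual-information routine like the one in Sect. 2 and NumPy arrays whose upper triangle holds the pairwise values; the update condition and the renovation coefficient k are supplied by the caller:

```python
import numpy as np

def init_selection_matrices(data, m, mutual_information):
    """Step 1: build the dependence matrix C0 and the independence matrix D0."""
    C0 = np.zeros((m, m))
    D0 = np.zeros((m, m))
    for i in range(m - 1):
        for j in range(i + 1, m):
            C0[i, j] = mutual_information(data, i, j)
    cmax = C0.max()
    for i in range(m - 1):
        for j in range(i + 1, m):
            D0[i, j] = cmax - C0[i, j]        # D0_ij = -C0_ij + max(C0), nonnegative
    return C0, D0

def update_selection_matrices(C, D, k):
    """Step 2: raise every entry to the power 1/k (k > 1)."""
    return np.power(C, 1.0 / k), np.power(D, 1.0 / k)
```

Taking the (1/k)-th power repeatedly pushes every nonzero entry toward 1, so the differences between pairs shrink and the guidance of the matrices weakens as annealing proceeds; freezing the matrices after the maximum number of updates preserves some selectivity.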

4 ASOSA Algorithm

In the SA algorithm for BN structure learning, the initial network structure is empty. The algorithm searches locally for better networks through operations such as randomly adding an edge, deleting an edge, or reversing an edge's direction, gradually lowering the temperature until the stopping condition is reached. The adaptive selection operator with simulated annealing (ASOSA) algorithm has the same overall framework as SA, but the edge operations are chosen according to the selection probabilities given by the selection matrices instead of uniformly at random. The algorithm proposed in this chapter is presented in Algorithm 1.

The scoring criterion of ASOSA is the BD score; because SA searches for a minimum, the negative of the score is used. The condition referred to in steps 13 and 14 is that ΔS ≤ 0 and that the number of updates has not yet reached the maximum λ.

To evaluate the effectiveness of ASOSA, an algorithm based on a non-adaptive selection operator with simulated annealing (SOSA) is also designed, whose selection matrices do not change during the annealing process.

5 Experimentation

The standard way of assessing the effectiveness of a learning algorithm is to draw samples from a known BN, apply the algorithm to the artificial data, and compare the learned structure with the original one [3]. This chapter chooses three typical networks from different domains for the experiments: the Asia network [5] (8 variables, 8 edges), the insurance network [6] (27 variables, 52 edges), and the alarm network [7] (37 variables, 49 edges). For each of the three networks, datasets with 100, 200, 500, 1,000, and 5,000 samples were generated by Gibbs sampling.

Algorithm 1: ASOSA algorithm

Input: Set of learning data

Output: Bayesian network

  1. Set the initial temperature T_0.
  2. Set the minimum temperature T_end.
  3. ν = 0.99;//The cooling rate of the temperature.
  4. β = 20;//The maximal number of outer-loop iterations.
  5. σ = 20;//The maximal number of inner-loop iterations.
  6. Calculate the mutual information between every two variables to obtain C^0.
  7. D^0 = −C^0 + max(C^0).
  8. k = 1.1;//Coefficient of renovation.
  9. λ = 15;//The maximal number of updates of the selection matrices.
  10. G = empty graph;//The candidate graph is initialized to the empty graph.
  11. T = T_0.
  12. Repeat
  13. If the conditions are met, update the selection matrices C and D; end.
  14. If the conditions are not met, do not update the selection matrices; end.
  15. For σ times do
  16. Add one edge to G and delete one edge from G according to the selection matrices C and D, and reverse the direction of one randomly chosen directed edge in G.
  17. Calculate the score difference ΔS resulting from the three operations above.
  18. If ΔS > 0 or e^(ΔS/T) > rand(0,1), then apply these actions to G; end.
  19. End
  20. T = T × ν.
  21. Until the maximal number of loop iterations is reached or T < T_end.
  22. Return G.
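The essential difference between ASOSA and plain SA is step 16: the endpoints of the edge to add and of the edge to delete are drawn with probabilities proportional to the entries of C and D rather than uniformly. The following sketch of steps 16–18 is a simplified illustration under that assumption; graph representation details, acyclicity checking, and score bookkeeping are omitted.

```python
import math
import random
import numpy as np

def draw_pair(weights):
    """Draw a node pair (i, j), i < j, with probability proportional to weights[i, j]."""
    m = weights.shape[0]
    pairs = [(i, j) for i in range(m - 1) for j in range(i + 1, m)]
    w = np.array([weights[i, j] for i, j in pairs], dtype=float)
    if w.sum() == 0:                           # degenerate case: fall back to uniform
        return random.choice(pairs)
    return pairs[np.random.choice(len(pairs), p=w / w.sum())]

def propose_and_accept(G, C, D, score, T):
    """One inner-loop step (steps 16-18). G is a set of directed edges (tuples);
    score(G) returns the structure score to be maximized."""
    candidate = set(G)
    i, j = draw_pair(C)                        # add an edge between a strongly dependent pair
    candidate.add((i, j))                      # direction choice is simplified here
    u, v = draw_pair(D)                        # delete an edge between a weakly dependent pair
    candidate.discard((u, v))
    candidate.discard((v, u))                  # remove whichever orientation exists
    if candidate:                              # reverse one existing edge at random
        a, b = random.choice(sorted(candidate))
        candidate.discard((a, b))
        candidate.add((b, a))
    delta = score(candidate) - score(G)        # Delta S from the combined operations
    if delta > 0 or math.exp(delta / T) > random.random():
        return candidate                       # accept the neighbor
    return G                                   # reject and keep the current graph
```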

5.1 Qualitative Analysis of Algorithms

Table 60.1 shows the BD score statistics for the learning results on the 15 experimental datasets drawn from the three networks for the three algorithms (SA, SOSA, and ASOSA).

Table 60.1 BD score for the learned structure (normalized for correct network score)

Hamming distance (HD) is a convenient measure of the difference between the learned network G and the original network G_0: HD is the sum of the numbers of excessive, deleted, and reversed edges. Table 60.2 shows the HD statistics of the three algorithms on the three networks. They show that ASOSA performs best, followed by SOSA.

Table 60.2 HD for the learned structure
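Since HD is defined above as the sum of excessive, deleted, and reversed edges, it can be computed directly from the two edge sets. A small sketch, assuming both graphs are represented as sets of directed edges:

```python
def hamming_distance(learned, true):
    """HD = number of excessive edges + deleted edges + reversed edges."""
    reversed_edges = {(a, b) for (a, b) in learned if (b, a) in true and (a, b) not in true}
    excessive = {e for e in learned if e not in true and (e[1], e[0]) not in true}
    deleted = {e for e in true if e not in learned and (e[1], e[0]) not in learned}
    return len(excessive) + len(deleted) + len(reversed_edges)

# Example: true graph A->B, B->C; learned graph A->B, C->B, A->C
print(hamming_distance({("A", "B"), ("C", "B"), ("A", "C")},
                       {("A", "B"), ("B", "C")}))   # 1 excessive + 1 reversed = 2
```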

5.2 Convergence Analysis of Algorithms

Runtime is the leading indicator of algorithm efficiency, but the final results of the different algorithms differ. For a fair comparison, we record the time SOSA and ASOSA take to reach the final BD score achieved by SA, and normalize it by the computation time of SA, as shown in Table 60.3.

Table 60.3 Runtime (normalized for SA)

The runtime comparison shows that ASOSA requires less time than the other two algorithms, although the difference between SOSA and ASOSA is small.

To further compare the convergence of the algorithms, the learning progress of ASOSA and SA on the Asia network with 500 samples is recorded. The horizontal axis is time, and the vertical axis is the BD score normalized by the score of the correct network, as shown in Fig. 60.1.

Fig. 60.1 (a, b) The performance record of ASOSA and SA

It can be seen from the figure that ASOSA converges rapidly: the directedness of its search is conspicuous in the early stage, and its final result settles at a good level. The convergence of traditional SA is slow, and its final result is worse than that of ASOSA.

6 Conclusion

This chapter introduces the ASOSA algorithm for BN structure learning, which fuses the independence-analysis and score-and-search approaches by using the simulated annealing intelligent algorithm to search for the optimal network. ASOSA, SOSA, and SA are compared from multiple aspects on datasets drawn from three different networks. The results show that ASOSA is superior to SOSA and SA in both learning accuracy and time consumption.