1 Introduction

The Simple Propagation (SP) algorithm was introduced in [1] as a new algorithm for belief update in Bayesian networks. SP is similar to Lazy Propagation (LP) in that it proceeds by message passing in a join tree or a junction tree, taking advantage of a decomposition of clique and separator potentials, but it takes a simpler approach to the computation of potentials in the message passing phase of belief update. In SP, message construction is based on a one-in, one-out principle, meaning a potential must have at least one non-evidence variable in the separator and at least one non-evidence variable not in the separator.

In this paper, we empirically analyze the role that the choice of the tree structure plays when SP conducts inference. We consider three kinds of tree structures, namely, optimal (or believed to be close to optimal) junction trees, junction trees produced by the fill-in-weight heuristic, and maximal prime subgraph decomposition (MPD) trees [4]. When using optimal junction trees or those produced by the fill-in-weight heuristic, the experimental results show that in fewer than half the cases the space cost of SP is almost invariant to the choice between the two. Moreover, the time cost of SP is approximately the same for these two categories in 14 out of 28 cases. On the other hand, SP often runs out of memory on MPD trees.

2 Preliminaries and Notation

A Bayesian network (BN) \(\mathcal{{N}}=(\mathcal{{X}}, G, \mathcal{{P}})\) consists of a set of discrete random variables \(\mathcal{{X}}=\{X_1,\ldots ,X_n\}\), where \(\mathrm {dom}(X)\) is the state space of X and \(||X||=|\mathrm {dom}(X)|\), a directed acyclic graph (DAG) \(G=(V,E)\), where \(V\sim \mathcal{{X}}\) is the set of vertices and E is the set of edges, and a set \(\mathcal{{P}}\) of conditional probability distributions (CPDs). A BN represents a decomposition of the joint probability distribution \(P(\mathcal{{X}})= \prod _{X\in \mathcal{{X}}}P(X\!\mid \!\mathrm {pa}(X))\), where \(\mathrm {pa}(X)\) denotes the parents of X in G and \(\mathrm {fa}(X)=\mathrm {pa}(X)\cup \{X\}\). For example, a BN \(\mathcal{{N}}=(\mathcal{{X}}, G, \mathcal{{P}})\) over six variables is shown by G in Fig. 1, where we assume \(||X_1||=||X_4||=3\), \(||X_2||=||X_5||=||X_6||=4\), and \(||X_3||=2\).
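For instance, with the CPDs used in Sect. 4 below (and assuming \(X_6\) has parents \(X_4\) and \(X_5\), which is consistent with the clique \(X_4X_5X_6\) in Fig. 2), the decomposition reads

$$\begin{aligned} P(\mathcal{{X}}) = P(X_2)\,P(X_1\!\mid \!X_2)\,P(X_3\!\mid \!X_2)\,P(X_4\!\mid \!X_1)\,P(X_5\!\mid \!X_3)\,P(X_6\!\mid \!X_4,X_5). \end{aligned}$$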

Belief update is the task of computing the posterior marginal probability distribution \(P(X\!\mid \!\epsilon )\) for each non-observed variable \(X\in \mathcal{{X}}\setminus \mathcal{{X}}_\epsilon \) given evidence \(\epsilon \), assumed to be instantiations of the variables \(\mathcal{{X}}_\epsilon \). A variable X is barren w.r.t. a set \(T\subseteq \mathcal{{X}}\), evidence \(\epsilon \), and DAG G, if \(X\not \in T\), \(X\not \in \mathcal{{X}}_\epsilon \), and X only has barren descendants in G, if any [6]. The notion of barren variables can be extended to graphs with both directed and undirected edges [3].
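As a minimal illustration of the definition (not part of the SP implementation), a recursive membership test might look as follows, assuming the DAG is given as a child-adjacency dictionary:

```python
def is_barren(x, targets, evidence, children):
    """Check whether x is barren w.r.t. targets, evidence and a DAG.

    children maps each variable to the list of its children in G;
    a variable is barren iff it is neither a target nor observed
    and all of its descendants (if any) are barren as well.
    """
    if x in targets or x in evidence:
        return False
    return all(is_barren(c, targets, evidence, children)
               for c in children.get(x, []))

# Child lists inferred from the CPDs listed in Sect. 4 (an assumption).
children = {"X1": ["X4"], "X2": ["X1", "X3"], "X3": ["X5"],
            "X4": ["X6"], "X5": ["X6"], "X6": []}
print(is_barren("X4", targets={"X1"}, evidence=set(), children=children))  # True
```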

A junction tree \(T=(\mathcal{{C}},\mathcal{{S}})\) of \(\mathcal{{N}}\) is created by moralization and triangulation of G (see, e.g., [2]), where \(\mathcal{{C}}\) denotes the cliques and \(\mathcal{{S}}\) denotes the separators of T. The state space size of clique or separator A is defined as \(s(A)=\prod _{X\in A}||X||\).
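For example, with the cardinalities assumed above, the clique \(X_4X_5X_6\) appearing in Fig. 2 has

$$\begin{aligned} s(X_4X_5X_6) = ||X_4||\cdot ||X_5||\cdot ||X_6|| = 3\cdot 4\cdot 4 = 48. \end{aligned}$$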

3 Simple Propagation

SP [1] can be considered a simplification of LP [5] with respect to how messages are computed. The basic idea is to maintain a decomposition of clique and separator potentials and to exploit independence relations induced by evidence and barren variables during belief update. Each CPD \(P\in \mathcal{{P}}\) is associated with a clique C s.t. \(\mathrm {dom}(P)\subseteq C\), where P is reduced to reflect \(\epsilon \). Next, messages are passed over the computational tree structure relative to a selected root of T.

Algorithm 1. The Simple Message Computation (SMC) algorithm.

In SP, the one-in, one-out principle is applied when a clique sends a message to a neighbouring clique over a separator S. The principle states that if a probability potential \(\phi \) has at least one non-evidence variable in S and at least one non-evidence variable X not in S, then SP can eliminate X. The computation of messages in SP is performed using the Simple Message Computation (SMC) algorithm shown as Algorithm 1. The \(\textsc {{SumOut}}\) algorithm corresponds to Variable Elimination: it sums out X from the product of \(\varPhi _X=\{\phi \in \mathcal{{F}}\mid X\in \mathrm {dom}(\phi )\}\) and replaces \(\varPhi _X\) in \(\mathcal{{F}}\) with the result.
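As an illustration of what \(\textsc {{SumOut}}\) computes, a minimal Python sketch might look as follows, assuming a hypothetical Factor class with multiply and marginalize operations (an illustration, not the implementation used in the experiments):

```python
def sum_out(x, factors):
    """Variable Elimination step: sum variable x out of the factor set.

    Multiplies all factors whose domain contains x (the set Phi_X) and
    marginalizes x from the product; the remaining factors are untouched.
    """
    phi_x = [f for f in factors if x in f.domain]
    rest = [f for f in factors if x not in f.domain]
    product = phi_x[0]
    for f in phi_x[1:]:
        product = product.multiply(f)          # pointwise product
    return rest + [product.marginalize(x)]     # sum x out of the product
```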

In addition to passing the potentials created by Algorithm 1, SP also passes any potential \(\phi \) for which \(\mathrm {dom}(\phi )\subseteq S\), i.e., the potentials where the domain is a subset of the separator. After a full round of message passing, the posterior marginal \(P(X\!\mid \!\epsilon )\) can be computed from any clique or separator containing X.
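Combining \(\textsc {{SumOut}}\) with the one-in, one-out principle, message construction over a separator S can be sketched as follows (again using the hypothetical Factor class, with set-valued domains; evidence handling is omitted for brevity):

```python
def compute_message(factors, separator):
    """Sketch of Simple Message Computation (Algorithm 1).

    Repeatedly applies the one-in, one-out principle: while some factor
    has a variable in the separator and a variable X outside it,
    eliminate X with sum_out. Finally, collect every factor whose
    domain is contained in the separator; this set is the message.
    """
    factors = list(factors)
    while True:
        candidate = next(
            (f for f in factors
             if f.domain & separator and f.domain - separator), None)
        if candidate is None:
            break
        x = next(iter(candidate.domain - separator))   # pick some X not in S
        factors = sum_out(x, factors)
    return [f for f in factors if f.domain <= separator]
```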

4 Computational Tree Structure

The computational tree structure induces a partial order on the set of possible (implicit) elimination orders produced by the SMC algorithm during the message passing step of belief update. As SP maintains a decomposition of clique and separator potentials, it is sensitive to the order in which variables are eliminated during message passing. The number of variables to be eliminated when sending a message from A to B is determined by \(A\setminus B\), whose size is influenced by |A| and |B|. Thus, the impact of the (implicit) elimination order tends to increase with \(|A \setminus B|\). As SP uses the same potentials for message computation as LP, we can rely on the correctness considerations of [4].

Fig. 1. A DAG G with \(G^m\) and triangulations using fill-in-size and fill-in-weight.

To illustrate the impact of using different computational tree structures, consider the moral graph \(G^m\) and the two triangulations of \(G^m\) shown in Fig. 1. The fill-in-size heuristic may produce the elimination order \(\sigma _{fis}=(X_6,X_1,X_5,X_2,X_3,X_4)\), while the fill-in-weight heuristic produces the order \(\sigma _{fiw}=(X_6,X_5,X_2,X_1,X_3,X_4)\), which is optimal w.r.t. s(T).
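Greedy triangulation heuristics of this kind can be sketched generically; in the sketch below the cost function determines the variant, and the fill_in_weight score shown is one plausible reading of the fill-in-weight heuristic, not necessarily the exact scoring used in the experiments:

```python
import itertools

def greedy_order(adj, card, cost):
    """Greedy elimination ordering over an undirected (moral) graph.

    adj maps a variable to the set of its neighbours, card maps a
    variable to its state space size ||X||, and cost scores eliminating
    a variable next; the heuristic variant (fill-in-size, fill-in-weight,
    ...) is determined entirely by the cost function passed in.
    """
    adj = {v: set(ns) for v, ns in adj.items()}
    order = []
    while adj:
        v = min(adj, key=lambda u: cost(u, adj, card))
        # Connect v's neighbours pairwise (the fill-in edges), then remove v.
        for a, b in itertools.combinations(adj[v], 2):
            adj[a].add(b)
            adj[b].add(a)
        for n in adj[v]:
            adj[n].discard(v)
        del adj[v]
        order.append(v)
    return order

def fill_in_weight(v, adj, card):
    """One plausible fill-in-weight score: total weight of missing edges."""
    return sum(card[a] * card[b]
               for a, b in itertools.combinations(adj[v], 2)
               if b not in adj[a])
```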

Fig. 2. Three different computational tree structures.

The junction trees \(T_{fis}\) and \(T_{fiw}\) produced by these elimination orders are shown in Fig. 2. The figure also shows the MPD tree \(T_{MPD}\) for \(\mathcal{{N}}\), constructed from \(T_{fiw}\) by merging cliques connected by separators that are incomplete in \(G^m\). For simplicity, we assume that clique \(X_4X_5X_6\) is selected as the root of each tree.
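A sketch of this merging step, assuming the junction tree is given as a list of (i, j, separator) links between indexed cliques and that moral-graph adjacency is available (an illustration only), might look as follows:

```python
import itertools

def mpd_cliques(cliques, links, moral_adj):
    """Merge junction tree cliques into MPD clusters (sketch).

    links is a list of (i, j, separator) triples indexing into cliques;
    two cliques are merged whenever their separator is incomplete in the
    moral graph, i.e. some pair of separator variables is non-adjacent.
    """
    def complete(sep):
        return all(b in moral_adj[a]
                   for a, b in itertools.combinations(sep, 2))

    # Start with every clique in its own cluster; merge across incomplete
    # separators by updating the membership of every affected clique.
    cluster = {i: {i} for i in range(len(cliques))}
    for i, j, sep in links:
        if not complete(sep):
            merged = cluster[i] | cluster[j]
            for k in merged:
                cluster[k] = merged
    clusters = {frozenset(c) for c in cluster.values()}
    return [set().union(*(cliques[k] for k in c)) for c in clusters]
```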

Due to space limitations, we only consider the first message passed in each of the computational trees shown in Fig. 2. For \(T_{fis}\), the first message is from clique \(X_1X_2X_4\) to \(X_2X_3X_4\). The clique potential \(\pi _{X_1X_2X_4} = (\{P(X_2),P(X_1\!\mid \!X_2),P(X_4\!\mid \!X_1)\})\) has two CPDs satisfying the one-in, one-out principle: \(P(X_1\!\mid \!X_2)\) and \(P(X_4\!\mid \!X_1)\). Notice that \(P(X_2)\) does not satisfy the principle as all of its domain variables are in S. Both \(P(X_1\!\mid \!X_2)\) and \(P(X_4\!\mid \!X_1)\) satisfy the condition in line 2 of the SMC algorithm, and variable \(X_1\) can be eliminated using the \(\textsc {{SumOut}}\) algorithm, producing \(\phi (X_4\!\mid \!X_2)\). The message is then

$$\begin{aligned} \pi _{X_1X_2X_4\rightarrow X_2X_3X_4} = (\{P(X_2), \phi (X_4\!\mid \!X_2)\}). \end{aligned}$$
(1)
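Here \(\phi (X_4\!\mid \!X_2)\) is obtained by the elimination

$$\begin{aligned} \phi (X_4\!\mid \!X_2) = \sum _{X_1} P(X_1\!\mid \!X_2)\,P(X_4\!\mid \!X_1). \end{aligned}$$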

For \(T_{fiw}\), the first message is from clique \(X_1X_2X_3\) to \(X_1X_3X_4\). The potential \(\pi _{X_1X_2X_3} = (\{P(X_2),P(X_1\!\mid \!X_2),P(X_3\!\mid \!X_2)\})\) has two CPDs satisfying the one-in, one-out principle: \(P(X_1\!\mid \!X_2)\) and \(P(X_3\!\mid \!X_2)\). Notice that \(P(X_2)\) does not satisfy the principle. Both \(P(X_1\!\mid \!X_2)\) and \(P(X_3\!\mid \!X_2)\) satisfy the condition in line 2 of the SMC algorithm, and variable \(X_2\) can be eliminated using the \(\textsc {{SumOut}}\) algorithm, producing \(\phi (X_1,X_3)\). The message is then

$$\begin{aligned} \pi _{X_1X_2X_3\rightarrow X_1X_3X_4}=(\{\phi (X_1,X_3)\}). \end{aligned}$$
(2)
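In this case the elimination of \(X_2\) involves all three potentials:

$$\begin{aligned} \phi (X_1,X_3) = \sum _{X_2} P(X_2)\,P(X_1\!\mid \!X_2)\,P(X_3\!\mid \!X_2). \end{aligned}$$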

For \(T_{MPD}\), the first message is from clique \(X_1X_2X_3X_4X_5\) to \(X_4X_5X_6\). The potential \(\pi _{X_1X_2X_3X_4X_5}=(\{P(X_2),\) \(P(X_1\!\mid \!X_2),\) \(P(X_3\!\mid \!X_2),\) \(P(X_4\!\mid \!X_1),\) \(P(X_5\!\mid \!X_3)\})\) has two CPDs satisfying the one-in, one-out principle: \(P(X_4\!\mid \!X_1)\) and \(P(X_5\!\mid \!X_3)\). Assume variable \(X_1\) is selected for elimination first, producing the updated set \(\mathcal{{F}}=\{P(X_2),P(X_3\!\mid \!X_2),P(X_5\!\mid \!X_3), \phi (X_4\!\mid \!X_2)\}\). Next, \(P(X_5\!\mid \!X_3)\) and \(\phi (X_4\!\mid \!X_2)\) satisfy the condition in line 2. Assume variable \(X_2\) is selected for elimination, producing the updated set \(\mathcal{{F}}=\{P(X_5\!\mid \!X_3),\phi (X_3,X_4)\}\). Next, \(P(X_5\!\mid \!X_3)\) and \(\phi (X_3,X_4)\) satisfy the condition in line 2. Variable \(X_3\) is the last variable eliminated, producing the updated set \(\mathcal{{F}}=\{\phi (X_4,X_5)\}\). The message is then

$$\begin{aligned} \pi _{X_1X_2X_3X_4X_5\rightarrow X_4X_5X_6}=(\{\phi (X_4,X_5)\}). \end{aligned}$$
(3)
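The three eliminations correspond to

$$\begin{aligned} \phi (X_4\!\mid \!X_2)&= \sum _{X_1} P(X_1\!\mid \!X_2)\,P(X_4\!\mid \!X_1),\\ \phi (X_3,X_4)&= \sum _{X_2} P(X_2)\,P(X_3\!\mid \!X_2)\,\phi (X_4\!\mid \!X_2),\\ \phi (X_4,X_5)&= \sum _{X_3} P(X_5\!\mid \!X_3)\,\phi (X_3,X_4). \end{aligned}$$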

The important point to notice is that different tree structures lead to different messages being constructed by SP, as shown in Eqs. (1), (2), and (3). The next section reports on an empirical evaluation of the performance impact of different tree structures.

5 Experimental Analysis

Table 1 shows statistics on a sample of 10 out of the 28 BNs used in the experiments, along with information on the optimal (or believed to be close to optimal) junction tree \(\hat{T}\), the junction tree \(T_{fiw}\) produced by the fill-in-weight heuristic, and the MPD tree \(T_{MPD}\), where sizes are reported on a \(\log _{10}\) scale. Table 2 shows the average size of the largest factor created by SP during belief update over 100 sets of random evidence (this includes computing marginals using Variable Elimination). Empty entries denote cases where SP ran out of memory on some evidence sets.

Table 1. Information on 10 out of the 28 Bayesian networks and the computational tree structures used in the experiments.
Table 2. Average space cost of belief update using SP.

In fewer than half the cases, the space cost of SP is almost invariant to the use of \(\hat{T}\) or \(T_{fiw}\) as the computational structure. These are the cases where \(s(\hat{T})\approx s(T_{fiw})\). In the cases where \(s(\hat{T})\ll s(T_{fiw})\), there is a large difference in the average size of the largest factor created by SP. Using \(T_{MPD}\) as the computational structure, SP is not able to perform belief update in 20 out of the 28 networks. The eight networks for which SP can perform belief update have an average largest factor of fewer than 100,000 probabilities.

Table 3 shows the average computation cost (wall-clock time) of belief update using SP. In 14 out of 28 cases, the time cost of SP using \(T_{fiw}\) is approximately the same as the time cost of SP using \(\hat{T}\); the difference is less than five per cent. Since SP using \(T_{MPD}\) is only able to solve the simplest networks, its time cost in these cases is low and often comparable to the time cost using \(\hat{T}\) and \(T_{fiw}\) (at least in absolute terms).

Table 3. Average time cost of belief update using SP (mean and standard deviation).

6 Conclusion

This paper has investigated the impact of the secondary computational structure used for belief update on time and space performance of SP. The results of a preliminary empirical performance evaluation on a set of real-world Bayesian networks indicate that the tree structure used to control the message passing can have a significant impact on both time and space performance.