1 Introduction

Bayesian networks are probabilistic graphical models that provide a powerful way to embed knowledge and to update one’s beliefs about target variables given new information about other variables. They are widely used for problems with inherent uncertainty such as classification, diagnosis and decision-making [65]. In a Bayesian network, prior knowledge is represented by a probability distribution P on the set of variables which define the problem, whereas updated beliefs are represented by the posterior probability distribution P(· ∣ obs), where obs represents new information. Inference in Bayesian networks provides a means to explain given evidence, e.g., Maximum A Posteriori Assignment (MAP), Most Probable Explanation (MPE) [61], and Most Relevant Explanation (MRE) [73]. Evidence is the starting point of these methods and refers to new information in a Bayesian network. A piece of evidence is also called a finding or an observation, and evidence refers to a set of findings. Figure 1 illustrates the propagation of evidence in belief updating.

Fig. 1 Propagation of evidence in belief updating

A finding on a variable commonly refers to an instantiation of the variable. This can be represented by a vector with one element equal to 1, corresponding to the state the variable is in, and all other elements equal to zero. This type of evidence is usually referred to as hard evidence though other terms are sometimes used.

This paper focuses on another type of evidence that cannot be represented by such vectors: uncertain evidence. The objective of the paper is to clarify the term uncertain evidence and its underlying concepts. It is argued that three types of uncertain evidence need to be clearly distinguished, namely likelihood evidence, fixed probabilistic evidence and not-fixed probabilistic evidence.

Likelihood evidence is applicable when there is uncertainty about the veracity of an observation, for example the information given by an imperfect sensor. Ideally the Bayesian network should include two variables for each of two physical quantities: one for the unobserved real value, and one for the value observed by the sensor. Using likelihood evidence avoids the need to add both these variables in the model.

In contrast, probabilistic evidence can be regarded as a new probability distribution on a variable arising from a new observation after creation of the model. This differs from likelihood evidence, where the original probability distribution is not challenged, only one’s belief in it, which may be amended with new likelihood evidence.

Not-fixed probabilistic evidence is typical of the situation where a variable is given a new probability distribution as a result of the BN model’s application to a specific sub-population of the global population for which the model was built. Thus the conditional probabilities remain the same, and the variable with the new observed distribution can be updated in response to evidence at other nodes.

Fixed probabilistic evidence is conceptually similar to not-fixed but the new probability distribution is regarded as immutable, even after later evidence is applied to other nodes. A typical example is where probabilistic evidence is imparted to a subscriber from a publisher in a one-way communication in an agent encapsulated Bayesian network.

Below these concepts are further explicated and defined. The propagation of the various forms of evidence is discussed and examples of applications are given to further illustrate the differences between these types of evidence.

The rest of the paper is organized as follows. Section 2 presents definitions of a Bayesian network and hard evidence in a Bayesian network. Section 3 is a review of terminology and concepts about evidence in Bayesian networks, in the literature and in Bayesian network engines, followed by the proposed terminology for evidence in a Bayesian network. Section 4 proposes definitions, properties and examples of the three types of uncertain evidence. Section 5 describes updating algorithms for each type of uncertain evidence. Section 6 proposes and discusses three types of situation where the use of fixed probabilistic evidence is required. The first one is about the integration of Bayesian networks with Geographic Information Systems (GIS), the second one concerns the propagation of a continuous variable in a discrete Bayesian network, and the third one is about using fixed probabilistic evidence for a distributed Bayesian network. Section 7 offers our conclusions and presents our future research proposals.

2 Basics of a Bayesian network

This section concerns the definition of a Bayesian network and hard evidence.

2.1 Bayesian network definition

Bayesian networks [20, 35, 42, 61] are a class of probabilistic graphical models. After giving a definition of Bayesian networks, we provide a brief overview about learning and inference.

Definition 1 (Bayesian network)

A Bayesian network is a couple (G, P), where \(G = (\mathbf{X}, E)\) is a directed acyclic graph with nodes \(\mathbf{X} = \{X_{1}, \ldots, X_{n}\}\) and directed edges E which represent conditional dependencies between nodes. The joint probability distribution for \(\mathbf{X} = \{X_{1}, X_{2}, \ldots, X_{n}\}\) is given by the chain rule:

$$P(X_{1},X_{2},\ldots, X_{n}) = \prod\limits^{n}_{i = 1} P(X_{i} \mid pa(X_{i}))$$

where \(pa(X_{i})\) represents the parents of \(X_{i}\), as defined by the presence of a directed edge from a parent node to \(X_{i}\).
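The chain rule of Definition 1 can be illustrated with a minimal sketch. The two-node network below (Smoking → Dyspnea), its CPT values, and the names `parents`, `cpt` and `joint_probability` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from itertools import product

# Hypothetical network S -> D; all probabilities are illustrative assumptions.
# parents[node] lists the parents pa(node); cpt[node] maps a tuple of parent
# values to a distribution over the node's own values.
parents = {"S": (), "D": ("S",)}
cpt = {
    "S": {(): {True: 0.5, False: 0.5}},
    "D": {(True,): {True: 0.6, False: 0.4},
          (False,): {True: 0.3, False: 0.7}},
}

def joint_probability(assignment):
    """Chain rule: product over all nodes of P(X_i | pa(X_i))."""
    prob = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        prob *= cpt[node][pa_values][assignment[node]]
    return prob

def all_assignments():
    """Enumerate every joint instantiation of the (binary) nodes."""
    names = list(parents)
    for values in product([True, False], repeat=len(names)):
        yield dict(zip(names, values))

# The factorized joint sums to one over all instantiations.
total = sum(joint_probability(a) for a in all_assignments())
```

Enumerating every instantiation is only feasible for very small networks; it is used here to keep the sketch self-contained, not as a practical inference method.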

The graph G represents the qualitative component of the Bayesian network; a quantitative component is given by P, which is defined by a set of conditional probability distributions associated with each node in X. In this paper we consider only discrete random variables, and thus each node is associated with a conditional probability table (CPT). Both G and P can be obtained from human experts or learnt from available data.

Once the Bayesian network is defined, algorithms are used to propagate new evidence through it. Unfortunately, both exact and approximate inference have been proved to be NP-hard [16, 17], rendering the propagation of evidence intractable in some cases. This intractability depends on the size of the model (number of nodes, degree of nodes, size of the sets of possible values for each node) and on other graphical parameters such as the treewidth. However, some approximate inference methods [72] have been shown to work very well in practice for large scale networks.

In the following, capital letters are used to represent random variables, and lower-case letters represent their values. Bold capital letters correspond to sets of variables. Here are some more notations used in the rest of the paper:

  • X denotes a Bayesian network node having its states (or values) in \(\mathcal {D}_{X} = \{ x_{1}, \ldots, x_{m} \}\),

  • \(P(\mathbf{X})\) denotes \(P(X_{1}, \ldots, X_{n})\); it is the joint probability distribution that defines a Bayesian network on the set \(\mathbf{X}\),

  • P(x) denotes P(X = x),

  • P(X) is the probability distribution \((P(X = x_{1}), \ldots, P(X = x_{m}))\),

  • \(R(X_{i})\) and \(R(X_{j}, X_{k})\) are local probability distributions used to describe uncertain findings on \(X_{i}\) and \((X_{j}, X_{k})\).

2.2 Hard evidence in a Bayesian network

Bayesian networks are commonly used to propagate observations represented by hard findings.

Definition 2 (Hard finding)

A hard finding e on a variable X in a Bayesian network with values in \(\mathcal {D}_{X}\) is defined by an observation vector of size \(m = |\mathcal {D}_{X}|\) containing a single 1, at the position corresponding to a state \(x \in \mathcal {D}_{X}\), and 0 for all other positions. This finding represents the instantiation of X to the value x and it is characterized by P(X = x ∣ e) = 1. Hard evidence is a set of hard findings.

Let Q denote the probability distribution reflecting the belief state after taking into account hard evidence e. We have Q = P(· ∣ e).
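As a minimal sketch of Q = P(· ∣ e), the snippet below conditions a small joint table on a hard finding by keeping the compatible entries and renormalizing. The joint distribution and the helper names `condition_on_hard_finding` and `marginal` are assumptions made for illustration.

```python
# Hypothetical joint P(S, D) over two binary variables; keys are (s, d).
joint = {
    (True, True): 0.30, (True, False): 0.20,
    (False, True): 0.15, (False, False): 0.35,
}

def marginal(table, index):
    """Marginal distribution of the variable stored at position `index`."""
    out = {}
    for key, p in table.items():
        out[key[index]] = out.get(key[index], 0.0) + p
    return out

def condition_on_hard_finding(table, index, value):
    """Return Q = P(. | e), where e instantiates variable `index` to `value`."""
    p_e = sum(p for key, p in table.items() if key[index] == value)
    if p_e == 0:
        raise ValueError("the hard finding has zero prior probability")
    # Entries incompatible with the finding get probability zero.
    return {key: (p / p_e if key[index] == value else 0.0)
            for key, p in table.items()}

# Hard finding: D is observed to be true.
q = condition_on_hard_finding(joint, index=1, value=True)
```

After the update, the observed variable is instantiated with probability 1, and the belief in S moves from 0.5 to 2/3 under these assumed numbers.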

3 Review of uncertain evidence in the literature and Bayesian networks software

This section presents a seven point summary that establishes a picture of uncertain evidence in Bayesian networks. This leads us to the tabulation of both concepts and terminology presented at the end of this section. Detailed definitions of each concept are presented in the following section.

The seven points are:

  1. A quasi-consensus on the definition of a finding in a Bayesian network;

  2. Uncertain evidence: two main concepts;

  3. Likelihood finding: a first concept of uncertain evidence well identified;

  4. Soft evidence: a source of confusion due to terminology;

  5. Uncertain evidence specified by a local probability distribution: no terminology, no consensus, and not supported by Bayesian network engines;

  6. Confusion leads to debates about commutation of iterated belief revisions;

  7. Uncertain evidence propagation in Bayesian network software products.

3.1 A quasi-consensus on the definition of a finding in a Bayesian network

The terms evidence and findings are synonyms, but only the term finding is used in the singular form. When no adjective qualifies the word evidence (or finding or observation), it generally represents hard evidence, that is to say the instantiation of a set of variables of the Bayesian network. In the literature, a hard finding on a variable is also called an observation, or a deterministic, specific, regular or positive finding. Despite the variety of terminology in the literature, the definition of hard evidence is clear and presents no particular problems (see Definition 2).

However, a generalization of the definition of finding is proposed by Jensen and Nielsen [35] in which a finding on a variable X is an m-dimensional table of zeros and ones, allowing several ones. We do not follow this proposition. The second point of this review identifies two main concepts of uncertain evidence in the context of belief updating in the probabilistic framework.

3.2 Two main concepts of uncertain evidence

Many real world problems require reasoning with uncertain inputs. Belief updating founded upon uncertain inputs may differ from belief updating founded on certain inputs. We cite the two possible meanings of the term “uncertain input” in the probabilistic framework, following Dubois, Moral and Prade [25]:

  • “the uncertainty bears on the meaning of the input; the existence of the input itself is uncertain, due to, for instance, the unreliability of the source that supplies inputs.”

  • “the input is a partial description of a probability measure; the uncertainty is part of the input and is taken as a constraint on the final cognitive state. The input is then a correction to the prior cognitive state.”

In the following, we use the term uncertain evidence as a generic term to refer to either of these two meanings. In the context of Bayesian networks, only one of these meanings is widely used. This is the next point of this review about uncertain evidence. Complete definitions, properties and examples pertaining to these two main meanings are given in Section 4.

3.3 The first concept of uncertain evidence: likelihood finding

The concept of likelihood evidence (or virtual evidence) [61] models the case where the observation is uncertain due to, for example, an unreliable source of information.

This type of uncertain evidence is well documented in the literature and its propagation is possible in some Bayesian network software, even if it is sometimes wrongly named. Table 1 provides the list of terms used to refer to likelihood evidence in several Bayesian network software applications. Four of them employ the term soft evidence instead of likelihood evidence or virtual evidence. In the next section we argue this term should not be used.

Table 1 Terminology used in several Bayesian networks software to name likelihood evidence in April 2014

3.4 Soft evidence: a terminology with no consensus

The term soft evidence was introduced in the context of Agent Encapsulated Bayesian networks (AEBN) [11, 69]. It refers to evidence specified by local probability distributions that define constraints on the posterior probability distribution and that cannot be changed by further information, meaning that these probability distributions are fixed. Each observed local probability distribution on a subset of variables is different from the prior probability distribution encoded for these variables in the Bayesian network.

The term soft evidence is used with that meaning by Valtorta [40, 48, 49, 69] and other authors [43, 60, 63, 64, 68]. With that meaning, likelihood evidence can be interpreted as evidence with uncertainty, while soft evidence can be interpreted as evidence of uncertainty [64].

However, a review of the literature over the past ten years shows that the term soft evidence is sometimes used to refer to likelihood evidence [9, 13, 14, 18, 41, 44]. None of these articles refers to Valtorta’s use of the term; it is therefore clear that their authors have developed a different contemporary use, which has led to confusion. The same confusion occurs in Bayesian network software, where at least four products use soft evidence to refer to likelihood evidence (see Table 1). In another paper [12], the term was used to refer to a variable X which does not take a specific value x, that is to say X ≠ x, which is usually called a negative finding. Moreover, the concept of uncertain evidence that is specified by a local probability distribution and cannot be modified by other information is not always named soft evidence (Table 2).

Table 2 Available features in Bayesian network engines to propagate uncertain evidence specified by a local probability distribution (April 2014)

Thus the term soft evidence is confusing since it is used ambiguously both in the literature and by Bayesian network software interfaces.

3.5 Uncertain evidence specified by a local probability distribution: no terminology, no consensus, rarely available in Bayesian network engines

Several papers concern both the specification of uncertain evidence and its propagation [15, 20, 60, 64, 71]. The term uncertain evidence is often employed in a generic way to name different types of non-hard evidence. These papers clearly distinguish between two types of uncertain evidence in Bayesian networks. The first concept is likelihood evidence, as proposed by Pearl in [61], and is clearly identified by the authors who mention it. The second concept of uncertain evidence is specified by a local probability distribution but is not so clearly defined. Thus two gaps need to be filled: (1) an adequate terminology should be defined to name uncertain evidence specified by a local probability distribution; (2) this second type of uncertain evidence requires a more precise definition to allow a clear identification of, and distinction between, its two sub-types. None of the above-cited papers focuses on terminology, nor clearly identifies the two sub-types of uncertain evidence specified by a local probability distribution (Footnote 1).

Chan and Darwiche present and compare two methods of propagation of uncertain evidence [15]. They provide an interesting discussion about the specification of evidence in both cases. However, no terminology or definition is proposed in order to consider the local probability distribution as a particular type of uncertain evidence in a Bayesian network. Although the analysis of Chan and Darwiche [15] is referred to by Peng and Zhang [64], it does not lead to a clear distinction of both sub-types of uncertain evidence specified by a local probability distribution.

In Bayesian network software, very few products propose the propagation of uncertain evidence specified with a local probability distribution. Table 2 shows the available features of three Bayesian network software products.

3.6 Confusion leads to debates about commutation of belief revision

The confusion about uncertain evidence specified by a local probability distribution gave rise to debates between authors, particularly about the question of commutation. The question “Should, and do, iterated belief revisions commute?” [15] concerns the case of revision with several pieces of evidence, of which some are uncertain. Some authors claim that several pieces of evidence specified by a local probability distribution and carrying the “All things considered” interpretation must not be commutative [15]. Others argue that soft evidence is a true observation of the distributions of some events, and as such, they should all be preserved in the updated “posterior” distribution [64]. In the first case, the arriving information is susceptible to improvement by further evidence, whereas in the second case, the arriving information has to behave as hard evidence and cannot be influenced by any other information.

The conclusion is that a third concept of uncertain evidence has to be more clearly defined in the context of Bayesian networks. More exactly, uncertain evidence can be divided into two main concepts, namely likelihood evidence and uncertain evidence specified by a partial measure of probability. This second concept has to be divided into two sub-concepts, according to whether the local probability distribution which specifies a constraint on the posterior belief state can be (or cannot be) modified by later arrival of new information.

3.7 Proposed terminology for uncertain evidence in a Bayesian network

In order to address the lack of an unambiguous terminology in Bayesian network theory and practice, we propose the use of the term probabilistic evidence for uncertain evidence specified by a local probability distribution. In contrast to hard evidence, the inconsistent use of terms for some types of uncertain evidence is problematic, in particular the term soft evidence, the misuse of which may cause real confusion. In order to make a clear distinction between the two sub-types of probabilistic evidence, we propose the terms fixed probabilistic evidence to refer to soft evidence as defined by Valtorta [64, 69], and not-fixed probabilistic evidence to refer to the concept used in Jeffrey’s rule and discussed in [15, 64]. The terms “likelihood” and “probabilistic” capture the way the evidence is specified. The adjectives “fixed” and “not-fixed” capture the expected behavior of the posterior probability distribution after further evidence is obtained. Table 3 presents the proposed terminology and the main associated characteristics.

Table 3 Proposed terminology about uncertain evidence in Bayesian network

Up to now, the two concepts of uncertain evidence specified by a probability distribution, namely fixed and not-fixed probabilistic evidence, are poorly identified in the Bayesian network community. They are mostly absent from Bayesian network software products.

3.8 Uncertain evidence in Bayesian network software

Most available implementations of uncertain evidence propagation in Bayesian network engines concern Pearl’s method of virtual evidence. Table 4 shows the features available for the updating of uncertain evidence among some available Bayesian network engines (Footnote 2).

Table 4 Features of Bayesian network software about evidence updating

This review of literature leads to the conclusion that two main types of uncertain evidence have been defined, but neither terminology nor concepts have been clearly defined by the Bayesian network user community. The next section presents the definitions and characteristics of likelihood evidence and probabilistic evidence.

4 Uncertain evidence in a Bayesian network

This section presents definitions and properties of the two main types of uncertain evidence, namely likelihood evidence and probabilistic evidence. Their definitions and properties are illustrated with some examples, and the main elements of propagation algorithms are given in the following section.

4.1 Likelihood evidence: definition and characteristics

Likelihood evidence corresponds to the cases where the observation is uncertain (Fig. 2). The uncertainty on the observation may come from the unreliability or imprecision of the source of the information.

Fig. 2 A likelihood finding on X. The variable X is observed with uncertainty (e.g. via an imperfect sensor). The binary variable Obs represents the observation, whereas X represents the real value. The variable Obs is temporarily added to the graph in order to propagate the observation Obs = obs. The evidence is specified by the likelihood of the observed value obs with respect to each value of X

Definition 3 (Likelihood finding or virtual finding)

A likelihood finding (or virtual finding) on a variable X of a Bayesian network is an observation with uncertainty of the variable. It is specified by a likelihood ratio (Footnote 3)

$$L(X) = (L(X=x_{1}): \ldots: L(X=x_{m})) = (P(obs \mid x_{1}): \ldots:P(obs \mid x_{m})) $$

where the \(L(X = x_{i})\) are quantities relative to each other representing the probability of the observed event given that X is in the state \(x_{i}\). Likelihood evidence, also called virtual evidence, is a set of likelihood findings.

A particular case of likelihood finding occurs when the likelihood ratio is composed of only zeros and ones, in order to represent information meaning that only some values of the observed variables are possible (Definition 4). The zeros denote a negative finding, meaning that the corresponding states of X are impossible, whereas the ones denote a disjunctive finding, meaning that the variable is necessarily in one of the states corresponding to a one, but without specifying that some values are more probable than others.

Definition 4 (Negative finding, disjunctive finding)

A negative finding (or disjunctive finding) on a variable X with values in \(\mathcal {D}_{X}\) is defined by an observation vector of zeros and ones. It represents the information that X can be only in one of the states corresponding to the ones and that the other states are impossible.

A negative finding whose observation vector contains a single one is a hard finding.

The next two properties describe how likelihood evidence interacts with beliefs before and after its propagation.

Property 1

Likelihood evidence is specified “without a prior”; as a consequence, propagating likelihood evidence takes into account the beliefs in the variable before the evidence.

Property 2

Belief in a variable after propagating a likelihood finding on it is not fixed: it can be modified by further evidence on other variables. In other words, let \(Q_{1}\) represent the beliefs after the propagation of a likelihood finding on X and \(Q_{2}\) represent the beliefs after a second piece of evidence on another variable; then it may occur that \(Q_{1}(X) \neq Q_{2}(X)\), meaning that the belief in X has been modified by the second piece of evidence.
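Property 2 can be checked numerically on a small case. The sketch below assumes a hypothetical two-node network X → Y with made-up probabilities; it first propagates a likelihood finding on X, then a hard finding on Y, and compares the two beliefs in X.

```python
def normalize(weights):
    """Rescale non-negative weights so that they sum to one."""
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

# Hypothetical model X -> Y (all numbers are illustrative assumptions).
prior_x = {"a": 0.5, "b": 0.5}
p_y_given_x = {"a": {"y1": 0.9, "y2": 0.1},
               "b": {"y1": 0.2, "y2": 0.8}}

# Likelihood finding on X, given as a ratio L(X) = (2 : 1).
likelihood_x = {"a": 2.0, "b": 1.0}

# Q1(X): belief in X after the likelihood finding only.
q1_x = normalize({x: prior_x[x] * likelihood_x[x] for x in prior_x})

# Q2(X): belief in X after the additional hard finding Y = y2.
q2_x = normalize({x: q1_x[x] * p_y_given_x[x]["y2"] for x in q1_x})
```

With these assumed numbers the belief in X = a drops from 2/3 after the likelihood finding to 0.2 after the hard finding on Y, so the result of the first propagation is indeed not fixed.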

Example 1 (Likelihood finding: optical character recognition (OCR))

A Bayesian network includes a variable X representing a letter of the alphabet that the writer wanted to draw. The set of values of X is the set of letters of the alphabet. A piece of uncertain information on X is received from a system of OCR. The input of this system is an image of a character and the output is a vector of similarity between the image of the character and each letter of the alphabet. Let o represent the observed image. Consider a case where, due to lack of clarity, o can be recognized as either the letter ’v’, ’u’ or ’n’. The OCR technology provides the indices such that P(Obs = o ∣ X = ’v’) = 0.8, P(Obs = o ∣ X = ’u’) = 0.4, P(Obs = o ∣ X = ’n’) = 0.1 and P(Obs = o ∣ X = x) = 0 for any letter x other than ’u’, ’v’ or ’n’. This means that observing o is twice as likely if the writer had wanted to draw the letter ’v’ as if she had wanted to draw the letter ’u’. Such a finding on X is a likelihood finding on X, specified by L(X) = (0:…:0:0.1:0:…:0:0.4:0.8:0:0:0:0). Note that from the definition, the entries of L(X) need not add up to 1.

This example illustrates the two characteristics of a likelihood finding: in the Bayesian network, the prior probability distribution P(X ∣ pa(X)) includes knowledge about the distribution of letters in the language of the text from which the character comes, whereas the OCR technology does not integrate that knowledge. Thus it provides information about X without prior knowledge. In order to update the belief in the value of the character, the information provided by the OCR (the vector of similarity) has to be combined with the prior knowledge about the frequency of letters. Moreover, the result of propagation is not fixed since belief in X can be further modified by other information. For example, information about the neighboring characters could be taken into account.
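The combination of the OCR likelihood with the prior on letters can be sketched as follows. The likelihood values are those of Example 1; the prior letter frequencies are hypothetical, X is assumed to have no other evidence, and the function name `propagate_likelihood` is introduced for illustration only.

```python
import string

letters = string.ascii_lowercase

# Hypothetical prior P(X): assumed frequencies for 'v', 'u' and 'n',
# with the remaining mass spread evenly over the other 23 letters.
prior = {"v": 0.01, "u": 0.03, "n": 0.07}
others = [c for c in letters if c not in prior]
prior.update({c: (1.0 - 0.11) / len(others) for c in others})

# Likelihood finding L(X) from Example 1 (zero for all other letters).
likelihood = {c: 0.0 for c in letters}
likelihood.update({"v": 0.8, "u": 0.4, "n": 0.1})

def propagate_likelihood(prior, likelihood):
    """Posterior belief: P(x | obs) proportional to P(x) * L(x)."""
    weights = {x: prior[x] * likelihood[x] for x in prior}
    z = sum(weights.values())
    if z == 0:
        raise ValueError("likelihood finding is incompatible with the prior")
    return {x: w / z for x, w in weights.items()}

posterior = propagate_likelihood(prior, likelihood)

# Only the ratios in L(X) matter: rescaling every entry changes nothing.
scaled = propagate_likelihood(prior, {c: 10 * v for c, v in likelihood.items()})
```

Under the assumed prior, the letter ’u’ becomes the most probable value even though the OCR output favors ’v’, which shows how the prior is combined with the finding. The equality of `posterior` and `scaled` reflects the fact that the entries of L(X) need not add up to 1.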

Example 2 (Negative finding: Example 1 continued)

Consider now the negative (or disjunctive) finding given by the information that the observed image (o) can be only the letter ’u’ or ’v’, and that all other letters are excluded. We have P(o ∣ X = ’u’) = P(o ∣ X = ’v’) and P(o ∣ X = x) = 0 for all \(x \in \mathcal {D}_{X} \setminus \{\text {'u'},\text {'v'} \}\). Thus,

$$P(X=x \mid o) = \left\{ \begin{array}{ll} \frac{P(X=x)}{P(X=\text{'u'}) + P(X=\text{'v'})} & \text{for } x \in \{ \text{'u'}, \text{'v'} \} \\ 0 & \text{for other letters.} \end{array} \right. $$

The posterior beliefs on the events X = ’u’ and X = ’v’ depend on the prior beliefs on these events and may differ from them.
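As a worked instance of the formula above, assume the hypothetical prior values P(X = ’u’) = 0.03 and P(X = ’v’) = 0.01 (these numbers are not taken from the paper):

$$P(X=\text{'u'} \mid o) = \frac{0.03}{0.03 + 0.01} = 0.75, \qquad P(X=\text{'v'} \mid o) = \frac{0.01}{0.03 + 0.01} = 0.25.$$

The prior ratio of 3 to 1 between the two remaining letters is preserved, while their absolute beliefs change because the excluded letters no longer carry any probability mass.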

4.2 Probabilistic evidence: definition and characteristics

Probabilistic evidence corresponds to another meaning of uncertain input where the evidence is specified by local probability distributions. Fixed probabilistic evidence is often called soft evidence [47, 48, 60, 64, 69].

The definition below is given in its simplest form. A more general version is given later (Definition 6).

Definition 5 (Probabilistic finding, fixed or not-fixed)

A probabilistic finding on a variable X is specified by a local probability distribution R(X) that defines a constraint on the belief in X after this information has been propagated; it describes the state of beliefs in the variable X “all things considered”. A probabilistic finding is fixed (or not-fixed) when the distribution R(X) cannot be (or can be) modified by the propagation of other findings. Probabilistic evidence is a set of probabilistic findings.

The difference between fixed and not-fixed probabilistic evidence cannot be seen before the arrival of new information.

The next two properties describe how probabilistic evidence interacts with beliefs before and after its propagation.

Property 3

A probabilistic finding R(X) on a variable X of a Bayesian network replaces any prior belief or knowledge on X. As a consequence, the prior P(X) is not used in the propagation of R(X), and any previous finding or belief on X is lost.

Probabilistic evidence includes both the strength of the evidence and the state of beliefs before evidence.

Property 4

A probabilistic finding R(X) on a variable X is preserved when updating belief. The beliefs after considering the probabilistic finding on X are represented by a probability distribution Q such that Q(X) = R(X).
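Properties 3 and 4 can be checked on a worked equation for a single probabilistic finding. As a sketch, assume that the finding is propagated with Jeffrey’s rule (the rule that Section 3.7 associates with not-fixed probabilistic evidence) and let Y stand for any other variable of the network; Y is introduced here for illustration only:

$$Q(x, y) = P(y \mid x)\, R(x), \qquad Q(X = x) = \sum_{y} P(y \mid x)\, R(x) = R(x).$$

The prior P(x) does not appear in Q, in agreement with Property 3, and the marginal of Q on X equals R(X), in agreement with Property 4. The conditional probabilities P(y ∣ x) of the model are reused unchanged.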

Property 5

A fixed probabilistic finding on X is not modified by further evidence on any other variables of the model, and a further finding on X is not possible, unless it overwrites the current evidence. Any kind of evidence received on other variables after fixed probabilistic evidence makes it necessary to re-propagate previous fixed probabilistic evidence together with the new evidence, in order to keep the former probabilistic evidence fixed. As a consequence, the propagation of several fixed probabilistic findings commutes: the result of propagation is independent of the order in which fixed probabilistic findings are received.

Fixed probabilistic evidence behaves as hard evidence in that the specified evidence remains unchanged after its propagation, and still remains unchanged after the arrival of other information on the same case.

Property 6

A not-fixed probabilistic finding on X can be modified by further evidence on any variable in the model, including likelihood evidence on X. As a consequence, the propagation of several not-fixed probabilistic findings does not commute.
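The lack of commutation stated in Property 6 can be observed on a small case. The sketch below assumes that each not-fixed probabilistic finding is propagated with Jeffrey’s rule; the joint distribution, the two findings and the function names are hypothetical.

```python
def marginal(table, index):
    """Marginal distribution of the variable stored at position `index`."""
    out = {}
    for key, p in table.items():
        out[key[index]] = out.get(key[index], 0.0) + p
    return out

def jeffrey_update(table, index, r):
    """Jeffrey's rule: Q(w) = P(w) * R(x) / P(x), with x the value of `index` in w.

    Assumes every value of the variable has non-zero current probability.
    """
    current = marginal(table, index)
    return {key: p * r[key[index]] / current[key[index]]
            for key, p in table.items()}

# Hypothetical joint P(S, D); keys are (s, d).
prior = {(True, True): 0.30, (True, False): 0.20,
         (False, True): 0.15, (False, False): 0.35}
r_s = {True: 0.1, False: 0.9}   # not-fixed finding on S
r_d = {True: 0.5, False: 0.5}   # not-fixed finding on D

# Same two findings, propagated in the two possible orders.
q_s_then_d = jeffrey_update(jeffrey_update(prior, 0, r_s), 1, r_d)
q_d_then_s = jeffrey_update(jeffrey_update(prior, 1, r_d), 0, r_s)
```

In each order only the last finding is matched exactly by the resulting marginal; the earlier one has been modified by the later propagation, so the two final distributions differ.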

In order to illustrate these definitions and properties, we propose below three examples of probabilistic findings: the first one concerns a not-fixed probabilistic finding in the example of character recognition, the second one presents fixed probabilistic evidence coming from the observation of a sub-population in the ASIA Bayesian network, and the third one shows a case of probabilistic evidence that first has to be fixed and not fixed afterwards.

Example 3 (not-fixed probabilistic finding: Example 1 continued)

Consider for the variable X in Example 1 that the language of the word from which the character comes, and the frequency of letters in that language, are known. If the Bayesian network does not contain the variable “language of the text” (L), this information can be applied as not-fixed probabilistic evidence for the variable X representing the character: R(X) = (R(X = ’a’), R(X = ’b’), …, R(X = ’z’)), provided that R(X) satisfies the condition “all things considered”, meaning that no other prior belief has to be combined with it. This has to replace the prior belief in the event X = x. Since that information about X could be improved by further evidence, such as the likelihood finding on X described in Example 1, it is a not-fixed probabilistic finding on X.

Remark: in that example, let us suppose now that the first information is the likelihood finding provided by the OCR, and the second information concerns the language of the text (L = English), but L is not a variable of the Bayesian network. In that case, the probability distribution R(X) representing the frequency of each letter in English cannot be used to specify a not-fixed probabilistic finding on X since it does not consider “all things”; in particular, the information provided by the OCR technology is not taken into account in R(X).

Example 4 (Fixed probabilistic evidence in Asia Bayesian Network)

Consider the Bayesian network Asia [52] which contains eight binary nodes, among which there is a (root) node Smoking and a (leaf) node Dyspnea (Fig. 3). Instead of having findings about a single person, consider findings coming from the data of a particular sub-population, such as the workers in a given factory FunT. Observing that half of them have dyspnea and a tenth of them smoke constitutes fixed probabilistic findings on these variables such that R(Dyspnea) = (0.5, 0.5) and R(Smoking) = (0.1, 0.9). No other information about the factory FunT can modify these probability distributions. The first finding has to be preserved even after propagating the second probabilistic finding. Thus both findings have to behave as hard findings and must not be modified by propagation. They are fixed probabilistic findings. When no more details are available, these findings cannot be considered as a single piece of probabilistic evidence on the two variables, R(Dyspnea, Smoking), as defined in the extended definition of probabilistic finding (Definition 6 below).

Fig. 3 Two fixed probabilistic findings about Dyspnea (D) and Smoking (S) observed in a sub-population. The evidence is specified by the probability distributions R(D) and R(S) that represent the belief in D and S after the propagation
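The requirement that both findings of Example 4 hold simultaneously can be sketched with iterative proportional fitting, a standard procedure for matching several marginal constraints. Whether this coincides with the updating algorithms described later in the paper is an assumption. The joint P(S, D) below is hypothetical and stands in for the marginal of the Asia network on these two nodes; only R(S) and R(D) come from the example.

```python
def marginal(table, index):
    """Marginal distribution of the variable stored at position `index`."""
    out = {}
    for key, p in table.items():
        out[key[index]] = out.get(key[index], 0.0) + p
    return out

def rescale(table, index, target):
    """One fitting step: make the marginal on `index` equal to `target`."""
    current = marginal(table, index)
    return {key: p * target[key[index]] / current[key[index]]
            for key, p in table.items()}

def fit_fixed_findings(table, findings, tol=1e-12, max_sweeps=10_000):
    """Cycle over all findings until every constrained marginal is matched.

    Assumes a strictly positive joint table, so no division by zero occurs.
    """
    q = dict(table)
    for _ in range(max_sweeps):
        for index, target in findings:
            q = rescale(q, index, target)
        worst = max(abs(marginal(q, index)[v] - target[v])
                    for index, target in findings for v in target)
        if worst < tol:
            break
    return q

# Hypothetical joint P(S, D); keys are (smoking, dyspnea).
prior = {(True, True): 0.30, (True, False): 0.20,
         (False, True): 0.15, (False, False): 0.35}
r_s = {True: 0.1, False: 0.9}   # R(Smoking) from Example 4
r_d = {True: 0.5, False: 0.5}   # R(Dyspnea) from Example 4

q_sd = fit_fixed_findings(prior, [(0, r_s), (1, r_d)])
q_ds = fit_fixed_findings(prior, [(1, r_d), (0, r_s)])
```

Both orderings converge to the same distribution, in which the marginals on S and D equal R(S) and R(D) at once. This is consistent with Property 5: fixed probabilistic findings are re-propagated together, and the result does not depend on the order in which they are received.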

Example 5 (Probabilistic findings in Asia Bayesian Network: fixed then not-fixed)

Consider again the Bayesian network Asia and the case of Mr. Flipo who works in the factory FunT about which a recent survey has revealed that half of its workers suffer from dyspnea and only one in ten smoke (see Fig. 4). Without any more information on Mr. Flipo, the probability distributions R(D)=(0.5,0.5) and R(S)=(0.1,0.9) represent the posterior belief in the variables D and S for Mr. Flipo. The prior beliefs about D and S represented by P(D) and P(S) no longer have influence on our belief about Mr. Flipo thus R(D) and R(S) replace the prior beliefs, and they are given “all things considered”. Since the propagation of R(D) has to be preserved while R(S) is propagated (or vice versa regarding the order of propagation), the first probabilistic finding, say R(S)=(0.1,0.9), initially has to be fixed. If new information about Mr. Flipo arrives, such as a recent visit to Asia, it should modify our belief in both variables S and D. Thus, R(D)=(0.5,0.5) and R(S)=(0.1,0.9) become not-fixed probabilistic findings (the probabilistic finding on S is no longer kept fixed).

Fig. 4
figure 4

Two probabilistic findings about Mr. Flipo’s case. The first is kept fixed while the second is propagated; afterwards, neither probabilistic finding is fixed, since beliefs in variables S and D can be modified by further evidence about Mr. Flipo, such as his recent visit to Asia

Consider now the case where the first information about Mr. Flipo concerns his recent visit to Asia (A = true), and the second information is that Mr. Flipo works in the factory FunT, in which half its workers suffer from dyspnea and only one in ten smoke. The probability distributions R(D)=(0.5,0.5) and R(S)=(0.1,0.9) do not represent the posterior belief in the variables D and S for Mr. Flipo, since they do not include the initial information about the visit to Asia. Since these probability distributions are not given with “all things considered”, they cannot be used to specify probabilistic findings.

This example illustrates that the propagation of several probabilistic findings, such that all of them are preserved, requires that they be fixed until the last finding has been propagated. This has to be done even if the probabilistic findings can be later modified by other information. In that case, the initial probabilistic findings are no longer kept fixed.

In this example, a set of not-fixed probabilistic findings is deduced from the same initial information. Since each of these findings defines a constraint on the posterior probability distribution, they have to be kept fixed until all of them are propagated. Afterwards, these probabilistic findings can be modified by other information.

Example 4 illustrates that an observation from a sub-population constitutes a fixed probabilistic observation, whereas Example 5 illustrates that the information on a single instance furnished by knowledge of the sub-population to which it belongs, is not a fixed probabilistic observation since this information can be improved by further evidence. Two other kinds of probabilistic evidence are detailed below in Section 6.

There are two main differences between probabilistic evidence and likelihood evidence. The first is the specification: for probabilistic evidence the distribution is specified “all things considered”, whereas for likelihood evidence the likelihood ratio is specified without reference to prior knowledge or belief. The second is the propagation: probabilistic evidence remains unchanged when the observed variables are updated, whereas likelihood evidence has to be combined with previous beliefs in order to update the belief in the observed variable(s).

Fixed probabilistic evidence on a variable X can be supplied by an expert on X, and her judgment on X cannot be improved by other evidence on any other variables of the model. This type of evidence can be obtained by the precise observation of a variable on a sub-population. The difference between fixed and not-fixed probabilistic evidence is only visible when several pieces of evidence are received and propagated.

Definition 5 has been extended by Valtorta [69] in order to consider information about one or more variables of the model, specified by different forms of probabilistic evidence.

Definition 6 (extended notion of probabilistic finding)

A probabilistic finding on a subset of variables Y ⊆ X is a partial description of a probability measure that can be one of the following:

(a) a joint probability distribution R(Y),

(b) a conditional probability distribution R(Y ∣ Z) where Z ⊆ X∖Y,

(c) probability assignments on arbitrary events on variables of Y,

(d) probability assignments on arbitrary logic formulae on variables of Y.

The extended notion of probabilistic evidence can be handled for evidence updating by the introduction of an observation node [69]. This technique allows the reformulation of extended probabilistic evidence into probabilistic evidence on a single new observation variable. In the following, we consider only probabilistic evidence involving a single variable.

4.3 D-separation and uncertain evidence in a Bayesian network

The property of d-separation is central in Bayesian networks. It allows the identification of those variables in the network whose posterior probability could be modified by new information, regarding both previous observations and their relative position in the graph.

The property of d-separation between two variables requires the examination of all the paths between them to check whether they are blocked or not.

Usually, a path between two nodes X and Y is said to be blocked if there exists an intermediate node Z on the path such that one of the following conditions is true:

  • there is a serial or a diverging connection on Z and Z is observed;

  • there is a converging connection on Z and Z is not observed and none of its descendants is observed.

However, it is necessary to be more precise to explain what is meant by Z is observed (or not) regarding the different kinds of observations. Table 5 is a first step in that direction. It shows how each kind of evidence can be classified in two classes, regarding whether the belief on the observed variable is fixed or not, as long as the observed case is considered.

Table 5 Classification of the status of the belief of a variable X regarding the kind of observation on that variable. This status holds as long as the observed case is considered

The classification given in Table 5 allows a more general characterization of a blocked path between two variables to be given (see Table 6 and Definition 7).

Table 6 Characterization of a blocked path between two variables X and Y, regarding the kind of evidence on an intermediate variable Z and the type of connection on Z

Definition 7 (blocked path)

A path between two nodes X and Y is blocked if there is an intermediate node Z on the path such that one of the following conditions is true:

  • there is a serial or a diverging connection on Z and Z received a hard finding or a fixed probabilistic finding;

  • there is a converging connection on Z and Z received neither hard finding nor fixed probabilistic finding and the same occurs for its descendants.

The notion of d-separation can now be extended to fixed probabilistic evidence by using the definition above. Further studies would be required about d-separation and different types of uncertain evidence, particularly in the context of the propagation of new evidence in a Bayesian network including previous evidence.
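The two conditions of Definition 7 can be written as a small predicate. The sketch below is a literal transcription of the definition, not an algorithm from the cited literature: the function names, the string labels for the kinds of finding and the tuple encoding of a path are all hypothetical. Any status other than a hard or a fixed probabilistic finding (no finding, a likelihood finding, a not-fixed probabilistic finding) is treated as "not fixed", exactly as Definition 7 reads.

```python
# Kinds of finding that fix the belief in the observed variable (Definition 7).
# The string labels are hypothetical names chosen for this sketch.
FIXING = {"hard", "fixed_probabilistic"}


def node_blocks(connection, finding=None, descendant_findings=()):
    """Return True if the intermediate node Z blocks the path.

    connection          -- "serial", "diverging" or "converging" (connection on Z)
    finding             -- kind of finding received by Z, or None
    descendant_findings -- kinds of finding received by the descendants of Z
    """
    if connection in ("serial", "diverging"):
        # First condition: Z received a hard or a fixed probabilistic finding.
        return finding in FIXING
    if connection == "converging":
        # Second condition: neither Z nor any of its descendants received one.
        return finding not in FIXING and all(
            f not in FIXING for f in descendant_findings
        )
    raise ValueError(f"unknown connection type: {connection}")


def path_is_blocked(intermediate_nodes):
    """A path is blocked as soon as one intermediate node blocks it."""
    return any(node_blocks(*node) for node in intermediate_nodes)


# Serial connection on a node that received a fixed probabilistic finding.
chain = [("serial", "fixed_probabilistic")]
# Converging connection, nothing fixed on the node or its descendants.
collider = [("converging", None, ("likelihood",))]
# Converging connection with a hard finding on a descendant.
opened_collider = [("converging", None, ("hard",))]
```

In this encoding, `chain` and `collider` are blocked paths, whereas `opened_collider` is not.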

4.4 Synthesis of properties of all types of evidence in a Bayesian network

Table 7 summarizes the properties of different types of evidence in a Bayesian network. It is interesting to note that fixed probabilistic evidence has the same properties as hard evidence.

Table 7 Synthesis of properties of different types of evidence in a Bayesian network

5 Uncertain evidence: propagation algorithms in Bayesian networks

This section briefly presents the main algorithms for the propagation of uncertain evidence, also called updating algorithms. First we present Pearl’s method of virtual evidence for propagating likelihood evidence. Then we present Jeffrey’s rule for probabilistic evidence and we explain why this method is restricted to the case where probabilistic evidence is not fixed. This leads us to discuss the commutativity of the propagation of several probabilistic observations. Next, recent algorithms to propagate probabilistic evidence respecting commutation are listed. Finally, we summarize propagation algorithms for uncertain evidence that are available in the main Bayesian network software.

5.1 Pearl’s method of virtual evidence

Virtual evidence refers to Pearl’s idea of interpreting a likelihood finding on an event as a hard finding on some virtual event that only depends on this event [61]. The virtual evidence method provides a convenient way of incorporating evidence with uncertainty in a Bayesian network.

Pearl’s method to propagate a likelihood finding on X extends the given Bayesian network by adding a binary virtual node which is a child of X. The uncertain evidence on X is replaced by hard evidence on the added node. The hard evidence on the added node is propagated using a classical inference algorithm in the Bayesian network. The uncertainty of the evidence is specified in the conditional probability table of the added virtual node.

The probability distribution Q representing beliefs after the propagation of a likelihood finding L(X) by the virtual evidence method is defined as follows. Consider a Bayesian network (G, P) with \(G = (\textbf{X}, E)\), and a likelihood finding on a variable \(X \in \textbf{X}\), specified by a likelihood ratio L(X). Let O be the node added to the Bayesian network, with the states \(\{o,\bar {o}\}\) where o is the observation. Let \(G' = (\textbf{X} \cup \{O\}, E \cup \{(X, O)\})\) be the augmented graph and \((G', P')\) the augmented Bayesian network, where the probability distribution \(P'\) is defined by \(P'(O = o \mid X) = L(X)\) and

$$P'(\textbf{X} \cup \{O\}) = \prod\limits_{X_{i} \in \textbf{X}} P(X_{i} \mid pa(X_{i})) \times P'(O \mid X).$$

With this notation, the posterior probability distribution Q can be defined by:

$$ Q(.) = P'(. \mid O=o) $$
(1)

Equation 1 is directly linked with Property 1 stating that likelihood evidence on X has to be combined with prior belief in X to be propagated.
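The construction above can be illustrated on a two-node network X → Y by plain enumeration. This is a minimal sketch: the network, its numbers and the function name are hypothetical, and the likelihood values are assumed to lie in [0, 1] so that they can be read as P′(O = o ∣ X).

```python
# Hypothetical two-node network X -> Y with a likelihood finding on X.
P_X = {"x1": 0.2, "x2": 0.8}                      # prior P(X)
P_Y_given_X = {"x1": {"y1": 0.9, "y2": 0.1},      # P(Y | X)
               "x2": {"y1": 0.3, "y2": 0.7}}
L = {"x1": 0.8, "x2": 0.2}                        # L(X), read as P'(O = o | X)


def virtual_evidence_posterior(p_x, p_y_given_x, likelihood):
    """Return Q(X) and Q(Y), where Q(.) = P'(. | O = o) as in equation (1)."""
    # Unnormalized joint P'(X, Y, O = o) = P(X) * P(Y | X) * P'(O = o | X).
    joint = {(x, y): p_x[x] * p_y_given_x[x][y] * likelihood[x]
             for x in p_x for y in p_y_given_x[x]}
    norm = sum(joint.values())                    # P'(O = o)
    q_x, q_y = {}, {}
    for (x, y), value in joint.items():
        q_x[x] = q_x.get(x, 0.0) + value / norm
        q_y[y] = q_y.get(y, 0.0) + value / norm
    return q_x, q_y


Q_X, Q_Y = virtual_evidence_posterior(P_X, P_Y_given_X, L)
```

With these numbers the posterior on X is (0.5, 0.5) rather than the normalized likelihood (0.8, 0.2), which shows the likelihood finding being combined with the prior belief in X.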

5.2 Jeffrey’s rule and conversion in likelihood evidence: propagating not-fixed probabilistic evidence

Jeffrey’s rule [34] specifies evidence using posterior probabilities. Propagating a probabilistic finding on \(X \in \textbf{X}\) requires a revision of the probability distribution P on X by a local probability distribution R(X). The difficulty arises since Bayes’ rule cannot be applied because R(X) is not an event [64]. A probabilistic finding R(X) requires a reconsideration of the joint probability distribution P because it replaces the existing prior on the variable X.

The propagation of probabilistic evidence requires the replacement of the initial probability distribution P by another probability distribution Q that reflects the beliefs in the variables of the model after accepting the probabilistic evidence. This replacement is not definitive: it lasts as long as the specific observed case holds, whereas the Bayesian network applies to a larger population.

Jeffrey’s approach for this problem is known as “probability kinematics”, and it is based on the requirements that:

  1. the posterior probability distribution Q(X) on the observed variable X is the one specified by the finding: Q(X)=R(X),

  2. the conditional probability distribution of the other variables given X remains invariant under the observation: \(Q(\textbf{X} \setminus \{X\} \mid X) = P(\textbf{X} \setminus \{X\} \mid X)\).

Jeffrey’s rule is given in (2): for a given local probability distribution R(X) and for \(Z \in \textbf{X} \setminus \{X\}\),

$$ Q(Z=z) = \sum\limits_{x}P(Z=z \mid X=x)R(X=x) $$
(2)

In other words, even if P and Q disagree on X, they agree on the consequences of X on other variables.
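Equation (2) amounts to a weighted sum of the conditional distributions P(Z ∣ X), with the weights given by R(X). A minimal sketch, in which the function name and the numbers are hypothetical:

```python
def jeffrey_update(p_z_given_x, r_x):
    """Jeffrey's rule: Q(Z = z) = sum_x P(Z = z | X = x) * R(X = x)."""
    q_z = {}
    for x, r in r_x.items():
        for z, p in p_z_given_x[x].items():
            q_z[z] = q_z.get(z, 0.0) + p * r
    return q_z


# Hypothetical conditional table P(Z | X) and probabilistic finding R(X).
P_Z_given_X = {"x1": {"z1": 0.9, "z2": 0.1},
               "x2": {"z1": 0.3, "z2": 0.7}}
R_X = {"x1": 0.5, "x2": 0.5}

Q_Z = jeffrey_update(P_Z_given_X, R_X)
```

Note that the prior P(X) does not appear in the computation: only the conditional P(Z ∣ X) is inherited from P.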

However, Jeffrey’s rule cannot be directly applied to Bayesian networks, because their operations are defined on full joint probability distributions. This can be overcome by converting a probabilistic finding to a likelihood finding: R(X) can be converted to a likelihood ratio

$$ L(X) = \frac{R(x_{1})}{P(x_{1})}: \ldots: \frac{R(x_{n})}{P(x_{n})}. $$
(3)

Propagating the likelihood finding L(X) with Pearl’s method provides the same results as propagating R(X) by Jeffrey’s rule [15, 64]. Thus, the posterior probability of X after propagating L(X) by Pearl’s method, is equal to R(X).
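This equality can be checked directly. As a short sketch, assuming \(P(x_{i}) > 0\) for every state \(x_{i}\) so that the ratios in (3) are defined, substituting (3) into the virtual evidence update (1) gives, for each state \(x_{i}\):

$$ Q(x_{i}) = \frac{P(x_{i})\, L(x_{i})}{\sum\limits_{j} P(x_{j})\, L(x_{j})} = \frac{P(x_{i})\, \frac{R(x_{i})}{P(x_{i})}}{\sum\limits_{j} P(x_{j})\, \frac{R(x_{j})}{P(x_{j})}} = \frac{R(x_{i})}{\sum\limits_{j} R(x_{j})} = R(x_{i}). $$

The last step uses the fact that R(X) sums to one. Multiplying L(X) by any positive constant leaves the result unchanged, since the constant cancels between numerator and denominator.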

The propagation of not-fixed probabilistic evidence with Jeffrey’s rule is available in the Netica software [58] under the name of “calibration”. It requires the user to input P(X = \(x_{i}\) ∣ “all observations”) for each value \(x_{i} \in \mathcal {D}_{X}\). The term “all observations” means that the probabilistic finding integrates any information about the variable.

In the case of several probabilistic findings, the method of converting probabilistic findings into likelihood findings does not preserve probabilistic findings. A simple example can be found in [15, 64]: let \(R_{1}(X_{1})\) and \(R_{2}(X_{2})\) be two pieces of probabilistic evidence, and Q be the probability distribution reflecting the state of beliefs after considering both findings by using either Jeffrey’s rule or by the conversion into likelihood findings. Then \(Q(X_{1}) \neq R_{1}(X_{1})\) or \(Q(X_{2}) \neq R_{2}(X_{2})\), depending on the order of propagation. Results are not better when the second probabilistic finding is converted into a likelihood finding using its probability revised after propagating the first finding. It therefore holds that the inclusion of several pieces of probabilistic evidence with Jeffrey’s rule does not commute. In other words, final beliefs depend on the order of arrival of the probabilistic findings. The next section deals with the propagation of several fixed probabilistic findings, such that their order does not modify the final belief.
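The order dependence can be reproduced numerically on a joint distribution over two dependent binary variables. The sketch below is not the example of [15, 64]: the joint table, the two findings and the function names are hypothetical, and Jeffrey's rule is applied directly to the joint table by enumeration.

```python
def marginal(joint, axis):
    """Marginal distribution of the variable at position `axis` of each key."""
    out = {}
    for states, p in joint.items():
        out[states[axis]] = out.get(states[axis], 0.0) + p
    return out


def jeffrey_on_joint(joint, axis, r):
    """Revise the joint so that the marginal on `axis` equals r, keeping
    the conditional distribution of the other variable unchanged."""
    marg = marginal(joint, axis)
    return {states: p * r[states[axis]] / marg[states[axis]]
            for states, p in joint.items()}


# Hypothetical joint P(X1, X2) with dependent variables, keys are (x1, x2).
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
R1 = {0: 0.8, 1: 0.2}   # probabilistic finding on X1
R2 = {0: 0.3, 1: 0.7}   # probabilistic finding on X2

Q12 = jeffrey_on_joint(jeffrey_on_joint(P, 0, R1), 1, R2)  # R1 first, then R2
Q21 = jeffrey_on_joint(jeffrey_on_joint(P, 1, R2), 0, R1)  # R2 first, then R1
```

In both orders only the finding propagated last is preserved: `marginal(Q12, 0)` differs from `R1`, and `marginal(Q21, 1)` differs from `R2`.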

5.3 Propagating fixed probabilistic evidence

Propagating a single probabilistic finding can be done by its transformation into a likelihood finding as in (3). This section concerns the propagation of several fixed probabilistic findings, that is to say that each of the specified probability distributions has to remain unchanged and the order of propagation should have no influence on the final result.

Several algorithms were recently proposed to propagate fixed probabilistic evidence in a Bayesian network. Most of them are based on the Iterative Proportional Fitting Procedure (IPFP) algorithm, which is an iterative method of revising a probability distribution to respect a set of given probability constraints in the form of posterior marginal probability distributions over a subset of variables. This algorithm first appeared in the literature in [45], and shortly after was used as a procedure to estimate cell frequencies in contingency tables under some marginal constraints [23]. More recently, a space-saving implementation of IPFP has been proposed [2, 36, 37]. However, the IPFP works on full joint distributions, and thus is not directly applicable to belief update in Bayesian networks. The algorithm could be applied for very small Bayesian networks, but would be infeasible for larger ones since it needs to literally modify each entry of the joint probability distribution table at each iteration.
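For a joint distribution small enough to be enumerated, the IPFP loop can be sketched as follows. This is a minimal illustration restricted to constraints on single-variable marginals; the joint table, the constraints, the stopping rule and the function names are hypothetical, and the constraints are assumed to be consistent and dominated by the initial distribution.

```python
def marginal(joint, axis):
    """Marginal distribution of the variable at position `axis` of each key."""
    out = {}
    for states, p in joint.items():
        out[states[axis]] = out.get(states[axis], 0.0) + p
    return out


def ipfp(joint, constraints, tol=1e-12, max_sweeps=1000):
    """Cycle through the constraints, rescaling every entry of the joint
    table so that the current constraint holds, until a whole sweep
    finds all marginals already within `tol` of their targets."""
    q = dict(joint)
    for _ in range(max_sweeps):
        worst_gap = 0.0
        for axis, target in constraints:
            marg = marginal(q, axis)
            worst_gap = max(worst_gap,
                            max(abs(marg[s] - target[s]) for s in target))
            q = {states: p * target[states[axis]] / marg[states[axis]]
                 for states, p in q.items()}
        if worst_gap < tol:
            break
    return q


# Hypothetical joint P(X1, X2) and two marginal constraints, keys are (x1, x2).
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
R1 = {0: 0.8, 1: 0.2}   # constraint on X1
R2 = {0: 0.3, 1: 0.7}   # constraint on X2

Q = ipfp(P, [(0, R1), (1, R2)])
```

Each step rewrites every entry of the joint table, which is the cost that makes the plain procedure impractical beyond very small networks.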

The big clique algorithm is a variation of the junction tree algorithm, based on the IPFP. When constructing the junction tree, all variables involved in a soft finding are fully connected with each other by additional undirected edges. After triangulation, these nodes appear in a single clique (the big clique). The belief update is done by first updating the big clique by running IPFP to convergence and then propagating the resulting distribution of this clique to the rest of the junction tree.

The algorithms BN-IPFP-1 and BN-IPFP-2 [64] do not modify the junction tree and can work with any Bayesian network inference engine. Both algorithms utilize the IPFP, although in quite different ways. The iterations of the BN-IPFP-1, BN-IPFP-2 and big clique algorithms all converge to the same distribution [64]. The BN-IPFP-1 algorithm first converts all pieces of probabilistic evidence to likelihood evidence and then iterates using the IPFP to update the Bayesian network until it settles down to a distribution that satisfies all given probabilistic evidence. The BN-IPFP-2 algorithm is more similar to the big clique algorithm, but without modifying the junction tree. BN-IPFP-2 can provide efficient computation when the number of variables involved in the probabilistic evidence is small.

The algorithm SMOOTH was developed by modifying the standard IPFP to support belief update with inconsistent evidence.

Table 8 provides a list of algorithms to propagate fixed probabilistic evidence [63, 64].

Table 8 List of algorithms to propagate fixed probabilistic evidence

5.3.1 The algorithm BN-IPFP-1

We present here the details of one of the above algorithms. We choose BN-IPFP-1 because of the initial results of the comparison of three algorithms (BN-IPFP-1, BN-IPFP-2 and big clique) [48].

The BN-IPFP-1 algorithm [64] manages a set of consistent fixed probabilistic findings such that each fixed probabilistic finding \(R(Y_{f})\) is dominated by the initial probability distribution (\(R(Y_{f}) \ll P(Y_{f})\)), meaning that there is no value y such that \(P(Y_{f} = y) = 0\) and \(R(Y_{f} = y) > 0\).

The BN-IPFP-1 algorithm is independent of the inference algorithm and combines the IPFP and the conversion of probabilistic findings to likelihood findings. The BN-IPFP-1 algorithm converts separately each probabilistic finding to a likelihood finding and then iterates using IPFP to update the Bayesian network until it settles down to a distribution that satisfies all given probabilistic evidence. At each iteration, a new likelihood ratio is obtained by dividing a probabilistic finding (one at each iteration) by the marginal probability on that variable obtained in the previous step. This new likelihood ratio is then combined with all previous likelihood ratios on the same variable obtained in previous iterations (one for m iterations).

The proof of the convergence of the algorithm BN-IPFP-1 [64] is based on the convergence of the IPFP.
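The iteration described above can be sketched as follows. This is an illustration written from the textual description, not the published pseudocode of [64]: the inference engine is replaced by brute-force enumeration of a small joint table, the number of sweeps is fixed instead of testing convergence, and the joint table, the findings and all function names are hypothetical.

```python
def marginal(joint, axis):
    """Marginal distribution of the variable at position `axis` of each key."""
    out = {}
    for states, p in joint.items():
        out[states[axis]] = out.get(states[axis], 0.0) + p
    return out


def posterior(joint, likelihoods):
    """Stand-in for a Bayesian network inference engine: posterior of the
    initial distribution given one likelihood finding per observed variable."""
    weights = {}
    for states, p in joint.items():
        for axis, lik in likelihoods.items():
            p *= lik[states[axis]]
        weights[states] = p
    norm = sum(weights.values())
    return {states: w / norm for states, w in weights.items()}


def bn_ipfp_1(joint, findings, sweeps=200):
    """findings maps a variable position to its fixed probabilistic finding."""
    # One accumulated likelihood ratio per observed variable, initially neutral.
    likelihoods = {axis: {s: 1.0 for s in r} for axis, r in findings.items()}
    q = posterior(joint, likelihoods)
    for _ in range(sweeps):
        for axis, r in findings.items():
            marg = marginal(q, axis)  # marginal obtained in the previous step
            for s in r:
                # New ratio, combined with the ratios of previous iterations.
                likelihoods[axis][s] *= r[s] / marg[s]
            q = posterior(joint, likelihoods)
    return q, likelihoods


# Hypothetical joint P(X1, X2) and two fixed findings, keys are (x1, x2).
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
findings = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}

Q, final_likelihoods = bn_ipfp_1(P, findings)
```

The initial distribution P is never modified: only the likelihood findings change between iterations, which is why any inference engine accepting likelihood evidence could take the place of `posterior`.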

5.3.2 Dealing with the extended notion of fixed probabilistic evidence

The following procedure allows the handling of the extended notion of fixed probabilistic finding (see Definition 6). This is done by adding an observation variable that is created as follows:

  • First, an observation variable Obs is created for each piece of fixed probabilistic evidence received. Every state of the observation variable corresponds to one of the possible outcomes of the probabilistic finding.

  • Second, directed edges to Obs are added from all variables in the Bayesian network that have a direct influence on the observation, that is to say the variables involved in the probabilistic finding (Fig. 5).

  • Third, the dependence of the added node on its parents is modeled by specifying the conditional probability table P(Obs ∣ pa(Obs)).

Fig. 5
figure 5

A Bayesian network receives a fixed probabilistic finding on \((X_{1},\ldots,X_{p})\) (Definition 6). An observation variable Obs is created in order to consider the new piece of information as a fixed probabilistic finding on a single node

Probabilistic evidence on the added observation node is propagated in the augmented Bayesian network thanks to one of the probabilistic evidence updating methods presented above.

In the case of evidence on a set of observation variables \(E_{1},\ldots,E_{p}\) that are independent in the Bayesian network, the propagation can be done by considering a single piece of evidence \(R(E_{1},\ldots,E_{p}) = R(E_{1}) \times \ldots \times R(E_{p})\).

6 Applications of fixed probabilistic evidence

Although the concept of probabilistic evidence in Bayesian networks was introduced in 1998 [11, 25], it remains little used by the Bayesian network user community. In this section we propose several examples of the use of fixed probabilistic evidence. Firstly we introduce the integration of Bayesian networks with Geographic Information Systems (GIS), secondly applications concerning the propagation of an observation on a continuous variable in a discrete Bayesian network, and thirdly using fixed probabilistic evidence for a distributed Bayesian network.

6.1 GIS applications

A range of applications of probabilistic evidence concerns the integration of a Geographic Information System (GIS) with a Bayesian network. The geographic area in focus furnishes the Bayesian network with the fixed probabilistic evidence derived from the GIS database held about a sub-population of the population used to derive the CPTs employed in the Bayesian network. For example, a Bayesian network is used to support conflict analysis for groundwater protection and observations about rainfall, groundwater salinity and land use are obtained from the GIS [31]. In a review of Bayesian network applications in ecosystem service modeling [46], the authors point out that in most of the studies which integrate a Bayesian network and a GIS, GIS is used as an input for the Bayesian network, providing probabilistic evidence for each geographical area.

Fixed probabilistic evidence can also be used in a Bayesian network to evaluate the social, economic and environmental impacts of community deployed renewable energy; each geographic area of interest furnishes the Bayesian network with a new fixed probability distribution to describe renewable energy resources, socio-economic parameters and the carbon intensity of displaced fossil fuels [53, 54]. The new probability distributions cannot replace prior probability distributions in the Bayesian network since they have to be kept fixed; they have to be propagated as fixed probabilistic evidence.

Figure 6 illustrates the use of fixed probabilistic evidence from a GIS. New information about the geographic area of interest is represented by a set of variables \(X_{1},\ldots,X_{p}\). An area is composed of several units, each of them having a specific value for each variable \(X_{i}\). The unit can be a pixel, a house, etc. Each observation of a variable \(X_{i}\) in an area \(A_{k}\) is obtained from a source \(S_{i}\) from which the distribution of \(X_{i}\) in the area \(A_{k}\) can be computed. The source \(S_{i}\) can be a database, a GIS or any other source. Since information about an area may come from different sources, it is specified by a list of fixed probabilistic findings rather than a single joint probability distribution on the variables \(X_{1},\ldots,X_{p}\). Each local probability distribution has to be kept fixed since it represents the variability of the observed variable in the considered area.
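Computing such a local distribution from the units of an area is a simple frequency count. A minimal sketch, in which the variable, its states, the unit values and the function name are all hypothetical:

```python
from collections import Counter


def area_distribution(unit_values, states):
    """Distribution of one variable over the units (pixels, houses, ...) of an
    area, to be propagated as a fixed probabilistic finding on that variable."""
    if not unit_values:
        raise ValueError("the area contains no unit")
    counts = Counter(unit_values)
    total = len(unit_values)
    # States absent from the area get probability 0.
    return {state: counts[state] / total for state in states}


# Hypothetical land-use values read from a source for the eight units of an area.
land_use_units = ["forest", "crop", "crop", "urban",
                  "crop", "forest", "crop", "crop"]
R_land_use = area_distribution(land_use_units, ["forest", "crop", "urban"])
```

Repeating this for each variable and its own source yields the list of local distributions mentioned above, without ever building a joint distribution over all the observed variables.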

Fig. 6
figure 6

Fixed probabilistic evidence on a geographic area of interest

The question of representing and propagating uncertainty in geospatial information has been discussed by Laskey and Wright [50]. In particular, the authors propose to represent statistical regularities and uncertain evidence with probabilistic ontologies. This consists of a generic model based on the Multi-Entity Bayesian Network (MEBN), which allows a new Bayesian network to be generated for each pixel in the database.

However, propagating uncertain evidence in a Bayesian network remains a topic to explore in order to manage uncertainty in geospatial information.

6.2 Observations on continuous variables

Here we present the principle of using fixed probabilistic evidence for propagating observations on continuous variables in a discrete Bayesian network, as proposed by Di Tomaso and Baldwin [68].

A common way to deal with a continuous variable in a discrete Bayesian network is to discretize it. Despite the loss of information due to the discretization, this technique is broadly used within the framework of Bayesian networks. The impact of the choice of the discretization method in different Bayesian network classifiers has been studied, and it appears that it does not really have an effect when classifiers are being compared [29]. In contrast, the choice of the number of intervals influences the quality of the results. Each interval is considered as a specific discrete value and all the points within the interval are considered as if they were the same discrete value. Thus, they are treated in the same way whatever their position in the interval. For this reason different ways of defining the partition, i.e. with a differing number of intervals and thresholds, can give very different results.

Discretization of continuous variables in a Bayesian network is a compromise between three criteria that do not always have the same importance:

(1) Information quality: discretization has to avoid or minimize any loss of information regarding the objectives of the model. To this end, intervals have to be defined such that, if different values for an evidence node deliver different outcomes for target nodes, they must be contained within different intervals. This criterion leads to the choice of smaller intervals, and therefore a higher number of them.

(2) Statistical quality: with due regard for the available data, discretization has to ensure there are enough samples falling within each interval. This aspect is all the more important with insufficient data. This criterion leads to the choice of larger, and therefore a smaller number of intervals.

(3) Computational feasibility: discretization has to preserve the usability and the effectiveness of the model (spatial and temporal complexity of inference). This aspect is all the more important if the discretized variable has several parent nodes and/or child nodes, and if the overall size of the Bayesian Network is large. This criterion leads to the choice of a smaller number of (and therefore larger) intervals in order to create a model with a reasonable size. This is because the number of intervals into which a particular node is discretized, together with both the number of its parents, and their respective number of states, determine the size of its conditional probability table, and thus commensurately the size of the overall model. The size of the model is a limiting factor both for the learning of the Bayesian network and inference making.

In situations where a very small number of intervals are thought necessary for any of the reasons outlined above, an observation on a continuous variable can be treated by using a small number of fuzzy partitions. Figure 7 shows how hard evidence is substituted by probabilistic evidence [68]. The probabilistic evidence is obtained by fuzzy discretization. This implies the computation of a probability distribution on the discretized variable’s set of intervals such that each element of the probability distribution represents the degree of membership to the corresponding interval.

Fig. 7
figure 7

Fixed probabilistic evidence to manage observation on a continuous variable in a discrete Bayesian network

Definition 8

A fuzzy partition on the universe Ω is a set of fuzzy sets \(f_{1},\ldots,f_{p}\) such that

$$\forall x \in {\Omega}, \ \sum\limits^{p}_{i = 1} X_{f_{i}}(x) = 1 $$

where \(X_{f_{i}}\) is the membership function of \(f_{i}\), i.e. a function \(X_{f_{i}}: {\Omega } \longmapsto [0,1]\).

Choosing a discretization with a small number of intervals reduces the size of the model, which facilitates inference making, and makes the learning step less data demanding. The loss of information may be partially compensated by fixed probabilistic evidence propagation.

The appropriateness of using probabilistic evidence rather than likelihood evidence is that likelihood evidence represents a subjective statement that can be improved by something observed later, while probabilistic evidence represents an observation that cannot be improved by anything observed later i.e. it is fixed. This ensures that the probability distribution of the observed variable will not be later influenced by further observations on any other variable. This is required since the initial hard evidence on the continuous variable means that we are certain of the observed value.

Example 6 (Age)

Let us consider a variable age that represents the age of a person and a discretization in three intervals: young (age < 20), adult (20 ≤ age < 70) and old (age ≥ 70). On the one hand, the proposed discretization does not distinguish between 1 year and 19 years whilst on the other, the values 19 years and 21 years are treated differently. An example of fuzzy discretization of the variable age is given in Fig. 8. Table 9 shows the evidence vector obtained after each kind of discretization of age for two observed values (age = 19 and age = 10). The probability obtained with fuzzy discretization is computed using the membership function shown in Fig. 8.

Fig. 8
figure 8

Membership functions of the fuzzy discretization of the variable age

Table 9 Evidence vector obtained after discretization of two observed values of the age
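The two kinds of evidence vector can be computed as in the following sketch. The crisp thresholds (20 and 70) are those of Example 6, but the membership functions are hypothetical: they are piecewise-linear ramps with breakpoints chosen for illustration only (15 to 25 and 65 to 75), not the functions of Fig. 8, so the resulting values are not meant to reproduce Table 9.

```python
def crisp_age(age):
    """Hard evidence vector (young, adult, old) with the thresholds of Example 6."""
    if age < 20:
        return (1.0, 0.0, 0.0)
    if age < 70:
        return (0.0, 1.0, 0.0)
    return (0.0, 0.0, 1.0)


def ramp(x, low, high):
    """0 below `low`, 1 above `high`, linear in between."""
    return min(1.0, max(0.0, (x - low) / (high - low)))


def fuzzy_age(age):
    """Probabilistic evidence vector (young, adult, old) obtained from a fuzzy
    partition. The breakpoints 15, 25, 65 and 75 are assumptions of this sketch."""
    to_adult = ramp(age, 15, 25)   # degree to which age has left 'young'
    to_old = ramp(age, 65, 75)     # degree to which age has entered 'old'
    return (1.0 - to_adult, to_adult - to_old, to_old)


crisp_19, fuzzy_19 = crisp_age(19), fuzzy_age(19)
crisp_10, fuzzy_10 = crisp_age(10), fuzzy_age(10)
```

By construction the three memberships sum to one for every age, as required by Definition 8, so the fuzzy vector can be used directly as a probability distribution over the three intervals. With these assumed breakpoints, an age of 19 is no longer treated as fully young, whereas an age of 10 still is.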

6.3 Probabilistic evidence for a distributed Bayesian network: AEBN framework

Large real world intelligent systems are often too complex or expensive to be built as centralized systems. Several solutions have been proposed to address this issue, including Agent Encapsulated Bayesian Network (AEBN) [11]. The principle behind this framework has been the prime motivation for the definition and use of fixed probabilistic evidence in a Bayesian network. Indeed, in an agent based model using AEBNs, the belief of a receiver agent is updated following the transmission of probabilistic evidence sent from a publisher agent.

Agent Encapsulated Bayesian Networks (AEBN) were first proposed by Bloemeke [11] and extended by Langevin and Valtorta [47, 49].

Definition 9

An Agent Encapsulated Bayesian Network (AEBN) is an agent equipped with a local Bayesian network (V, E, P) that represents its internal knowledge base. The set of variables V = I ∪ L ∪ O is partitioned into three sets of variables:

  • Input variables (I): variables for which other agents have better knowledge

  • Output variables (O): variables for which this agent has the best knowledge (oracular knowledge)

  • Local or hidden variables (L): variables which are private to this agent and not visible by the other agents.

The set E contains the edges in the model that define the causal relationships amongst the variables of V. A joint probability distribution P is defined over V. Variables in I ∪ O are shared variables, while variables in L are private.

Agents are organized into a publisher/subscriber hierarchy, where agents are publishers of their output variables and subscribers to their input variables. They communicate by sending messages consisting of joint probability distributions over the subset of shared variables. Therefore, each agent that receives messages from other agents obtains fixed probabilistic evidence for one or more observation variables (see Fig. 9). Fixed probabilistic evidence on the added observation node is propagated in the augmented Bayesian network of the receiver agent, using one of the probabilistic evidence update methods presented in Table 8.

Fig. 9
figure 9

Message passing between AEBNs: the AEBN1 is publisher of the variable X and the AEBN2 is subscriber; it receives a fixed probabilistic finding on X from the AEBN1

Fixed probabilistic evidence updating assures a kind of global consistency, since the belief in each shared variable, represented by the marginal posterior probability, is the same in every agent [69].

7 Discussion and conclusion

In this last section, we first discuss the difference between probabilistic evidence propagation and model revision. Then we examine a recently proposed Fuzzy/Bayesian network and explain why this is actually a case of probabilistic evidence. Finally, we conclude and present the perspectives of this work.

7.1 Probabilistic evidence propagation versus model revision

Since probabilistic evidence is a constraint on the posterior probability distribution, it replaces the marginal probability distribution over the variables concerned and thus casts doubt on the joint probability distribution. Should we consider probabilistic evidence to be a knowledge contribution? In this case, why not change the model rather than propagate probabilistic evidence? One paper argues that probabilistic evidence propagation is suitable for model revision and not for updating [71], but this viewpoint was later abandoned by the author [40]. We present two arguments that clarify the choice between probabilistic evidence updating and model revision.

First, consider evidence that has a temporary validity. Such evidence may result from the partial observation of a particular state of the modeled system, and it is valid as long as this state holds. Similarly, the probabilistic evidence may come from the observation of a sub-population. A further example of temporary validity arises when an observation of a continuous variable is made and transformed via a fuzzy discretization. In those cases, probabilistic evidence can be considered as temporary hard evidence, and thus does not justify model revision.

In contrast, the probability distribution P of a Bayesian network (G, P) to which the temporary probabilistic evidence is applied represents permanent knowledge, which does not need to be revised in light of short-term evidence.

The second argument deals with the fact that a model is often a compromise between efficiency and accuracy, meaning that the model never includes all parameters of the modeled system. This limit of the model can be partially overcome by using a probabilistic finding to take into account information about a variable which is not included in the model. When the observer is able to translate the observed characteristic onto a variable of the model, she may apply a probabilistic finding to that variable. While the knowledge embedded in the probability distribution is permanent, this information is relevant only while the specific instance with that characteristic is considered. It can be concluded that information about a specific state of the modeled system has to be taken into account by propagating evidence (hard, likelihood or probabilistic), even if it concerns variables that are not in the model.

Revision of the model occurs when hypotheses associated with it are changed. Several methods for knowledge integration and Bayesian network revision have been proposed [63].

Some of the algorithms to propagate fixed probabilistic evidence (Table 8) have been adapted in order to revise the Bayesian network by the direct replacement of the initial probability distribution by the revised probability distribution. These adaptations are the E-IPFP, E-IPFP-SMOOTH and D-IPFP algorithms [63]. The E-IPFP algorithm integrates the constraints by only changing the conditional probability tables of the given Bayesian network while preserving the network structure. The E-IPFP-SMOOTH and D-IPFP algorithms are two variations of the E-IPFP algorithm. The first deals with the situation where the probabilistic constraints are inconsistent with each other or with the network structure of the given Bayesian network. The second reduces the computational cost by decomposing a global E-IPFP into a set of smaller local E-IPFP problems.

7.2 Probabilistic evidence and an example of fuzzy/Bayesian network

Despite a number of papers dealing with fuzzy Bayesian networks, for example [3, 28, 59, 67], there is still no commonly accepted definition of this concept. However, one objective of these approaches is to manage uncertainty in the inputs of the Bayesian network. In this section, we examine a recently proposed fuzzy/Bayesian network designed to solve a fault detection problem [19]. We explain why this is actually a case of probabilistic evidence.

The real system described in [19] aims to detect short circuits in stator windings. The problem includes a set of continuous variables representing the difference in magnitude between currents. These variables do not belong to the proposed Bayesian network, but they are observed. Another set of Boolean variables is included in the Bayesian network, each of them associated with an observed continuous variable. A set of rules allows the values of the Boolean variables to be determined from the continuous variables. If these rules are deterministic, the information on the continuous variables can be transformed into hard evidence on the Boolean variables. Otherwise, it provides uncertain evidence on the Boolean variables of the Bayesian network. In the presented application, these rules consist of a set of membership functions associated with each continuous variable. They provide a vector of membership degrees for the states True and False of the associated Boolean variable. We suggest that each vector specifies a probabilistic finding on the Boolean variable. The equation proposed in [19] to define the posterior probability of a variable after such uncertain evidence would require both explanation and justification. The proposed equation is applied in a very simple example, including a single probabilistic finding for each considered case. We compared the results of this application with those obtained with Pearl’s method of virtual evidence after converting probabilistic evidence into likelihood ratios (see footnote 4), and we obtained the same results (see footnote 5). Having a single probabilistic finding does not allow us to discuss whether each probabilistic finding (given by the membership degrees) has to be kept fixed or not.
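The conversion mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical two-node network F → B (a fault variable F and a Boolean indicator B), illustrative probabilities, and a conversion that takes likelihood ratios proportional to R(b)/P(b); none of these numbers come from [19]. The finding R(B) plays the role of the membership-degree vector.

```python
# Hypothetical two-node network F -> B; all numbers are illustrative assumptions.
prior_f = [0.1, 0.9]                    # P(F=True), P(F=False)
p_b_given_f = [[0.8, 0.2], [0.3, 0.7]]  # rows: F=True/False, columns: B=True/False

# Probabilistic finding R(B), e.g. membership degrees for B=True and B=False.
finding_b = [0.7, 0.3]

# Prior marginal P(B), needed to convert R(B) into a likelihood finding.
prior_b = [sum(prior_f[f] * p_b_given_f[f][b] for f in range(2)) for b in range(2)]

# Likelihood ratios L(b) proportional to R(b) / P(b).
likelihood_b = [finding_b[b] / prior_b[b] for b in range(2)]

# Virtual evidence: weight each joint entry P(f, b) by L(b), then normalize.
weighted = [[prior_f[f] * p_b_given_f[f][b] * likelihood_b[b] for b in range(2)]
            for f in range(2)]
total = sum(sum(row) for row in weighted)
posterior = [[w / total for w in row] for row in weighted]

# Posterior marginals on B and on F.
posterior_b = [posterior[0][b] + posterior[1][b] for b in range(2)]
posterior_f = [sum(posterior[f]) for f in range(2)]

# Jeffrey's rule computed directly, for comparison: sum over b of P(f | b) * R(b).
jeffrey_f = [sum(finding_b[b] * prior_f[f] * p_b_given_f[f][b] / prior_b[b]
                 for b in range(2)) for f in range(2)]

print(posterior_b, posterior_f, jeffrey_f)
```

In this sketch the posterior on B coincides with R(B), and the posterior on F coincides with the value given by Jeffrey's rule, which is what the conversion is meant to achieve for a single finding.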

The comparison between a fuzzy Bayesian network and probabilistic evidence suggests that further study of their differing properties is required to achieve a better characterization of these methods.

7.3 Conclusion and future work

In this paper, we have set out definitions and properties of hard and uncertain findings in a Bayesian network. Three kinds of uncertain evidence are distinguished: likelihood evidence, fixed probabilistic evidence and not-fixed probabilistic evidence. Evidence is a set of findings on the variables of the Bayesian network. (1) Likelihood evidence is unreliable, imprecise, or vague evidence; it is specified by a likelihood ratio and propagated by Pearl’s method of virtual evidence. This method translates a likelihood finding on a variable X into hard evidence on a new virtual node, added as a child of the node X. (2) Fixed or not-fixed probabilistic evidence expresses a constraint on the state of some variables after this information has been propagated in the Bayesian network. A probabilistic finding on X is specified by a probability distribution R(X) that is given “all things considered”, meaning that it replaces any former belief and knowledge on X. (2a) A not-fixed probabilistic finding can be propagated by converting it into likelihood evidence (see Eq. 3). It can later be modified by further evidence on any node of the Bayesian network, including X itself. (2b) Fixed probabilistic evidence is also known as soft evidence. A fixed probabilistic finding on a variable X cannot be altered by any further information on variables in the model. Thus the propagation of several pieces of fixed probabilistic evidence is commutative. Fixed probabilistic evidence has to be propagated by specific algorithms such as big clique or BN-IPFP (Table 8).
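The difference between fixed and not-fixed probabilistic findings can be sketched on a small explicit joint table. The example below is a plain iterative proportional fitting procedure (IPFP) on a hypothetical joint distribution over two binary variables X and Y; it is not the big clique or BN-IPFP algorithm, and the table and the target distributions are illustrative assumptions.

```python
def fit_marginal(joint, target, axis):
    """Rescale a 2-D joint table so that its marginal on `axis` equals `target`.

    axis 0 is the row variable X, axis 1 is the column variable Y.
    """
    rows, cols = len(joint), len(joint[0])
    if axis == 0:
        current = [sum(joint[i]) for i in range(rows)]
        return [[joint[i][j] * target[i] / current[i] for j in range(cols)]
                for i in range(rows)]
    current = [sum(joint[i][j] for i in range(rows)) for j in range(cols)]
    return [[joint[i][j] * target[j] / current[j] for j in range(cols)]
            for i in range(rows)]


def marginal(joint, axis):
    """Marginal distribution of the row variable (axis 0) or column variable (axis 1)."""
    if axis == 0:
        return [sum(row) for row in joint]
    return [sum(row[j] for row in joint) for j in range(len(joint[0]))]


def ipfp(joint, constraints, sweeps=200):
    """Cycle through the (axis, target) constraints for a fixed number of sweeps."""
    for _ in range(sweeps):
        for axis, target in constraints:
            joint = fit_marginal(joint, target, axis)
    return joint


# Hypothetical joint P(X, Y) and two probabilistic findings R(X) and R(Y).
joint_xy = [[0.3, 0.2], [0.1, 0.4]]
r_x, r_y = [0.6, 0.4], [0.5, 0.5]

# Not-fixed findings applied one after the other: the second alters the first.
sequential = fit_marginal(fit_marginal(joint_xy, r_x, 0), r_y, 1)

# Fixed findings: iterate until both constraints hold, in either order.
fixed_xy = ipfp(joint_xy, [(0, r_x), (1, r_y)])
fixed_yx = ipfp(joint_xy, [(1, r_y), (0, r_x)])

print(marginal(sequential, 0), marginal(fixed_xy, 0), marginal(fixed_xy, 1))
```

After a single pass, the marginal on X has drifted away from R(X), which is the behaviour of not-fixed findings. After iterating, both marginals match their targets, and the two orderings agree up to numerical tolerance, which illustrates the commutativity of fixed probabilistic evidence.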

We provide several examples where the application and propagation of fixed probabilistic evidence in a Bayesian network is of interest. In an agent organization based on an AEBN, a probabilistic finding on a shared variable X is used by a receiver agent to take into account the information sent by a publisher agent which is considered to be an expert on X. In a Bayesian network with discretized continuous variables, a fixed probabilistic finding on such a variable X is obtained when using fuzzy discretization, which for a coarse discretization maintains a greater fidelity to a more granular discretization. Finally, a fixed probabilistic finding can also summarize observations about a set of instances of a particular subgroup such as in Example 4.

This paper aims to contribute to the standardization and clarification of the definitions and properties of different types of evidence in a Bayesian network.

A number of terms for hard evidence are in use (specific finding, positive finding, regular finding, deterministic finding, observation), but there is no major semantic difference between them. The general understanding is that hard evidence corresponds to an observation vector containing a single one.

Negative evidence is characterized by observation vectors of zeros and ones which may include several ones. A negative finding is not a hard finding, except when the observation vector contains a single one. It is a specific case of a likelihood finding in which the observation does not allow a distinction to be made between the included values.
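As a small illustration of this point, the sketch below treats a negative finding as a likelihood finding whose vector contains only zeros and ones. The three-state variable and its prior are hypothetical.

```python
def apply_observation_vector(prior, vector):
    """Multiply the prior by the observation vector and renormalize."""
    weighted = [p * v for p, v in zip(prior, vector)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical prior over a three-state variable X.
prior_x = [0.2, 0.5, 0.3]

hard_finding = [0, 1, 0]      # a single one: X is in its second state
negative_finding = [1, 1, 0]  # several ones: only the third state is excluded

posterior_hard = apply_observation_vector(prior_x, hard_finding)
posterior_negative = apply_observation_vector(prior_x, negative_finding)

print(posterior_hard, posterior_negative)
```

The negative finding removes the excluded state and leaves the relative weights of the included states unchanged, since the observation gives no reason to prefer one over another.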

Terms to describe uncertain evidence have been made clearer: likelihood evidence, also called virtual evidence, has to be clearly distinguished from probabilistic evidence, both in terms of specification and of propagation. The latter notion includes two sub-groups, fixed and not-fixed probabilistic evidence, which differ in the impact of future information on the constraint defined by the probabilistic evidence. Both classes are specified by a local probability distribution. The distinction between fixed and not-fixed probabilistic evidence, which revolves around the question of commutativity, had already been raised and discussed in previous papers [15, 64], but it had not led to distinct definitions. Fixed probabilistic evidence was initially called soft evidence.

The problem addressed in this paper is one of belief updating and not the problem of model revision, which leads to a change of the probability distribution (or even the graph) of the Bayesian network. In the case of probabilistic evidence, information is probabilistic in nature and leads to the temporary replacement of the prior distribution in the defined model.

Currently, many Bayesian network engines allow the propagation of likelihood evidence, even though the terminology is not yet standardized. However, very few of them are able to propagate fixed or not-fixed probabilistic evidence. This ability would be of great interest and utility to the Bayesian network user community. The introduction of features in Bayesian network engines to allow the propagation of uncertain evidence would require clear terminology, so that the user is confident about the type of uncertain evidence propagation being invoked. At least three new features are required in most Bayesian network engines. The first is the ability to enter an observation on a subgroup of instances instead of a single instance. This would be useful for applications such as the classification of sub-populations, as in [30] concerning the detection of atmospheric pollution, since each case represents both a sequence of images during a short period of time and a set of areas of interest in the image. Another useful application concerns the analysis of a body of text, such as automatic text summarization [27]. The use of probabilistic findings would allow the exploitation of features of the text both at the level of the sentence and at the level of a section, considered as a set of sentences. Secondly, software should facilitate the implementation of an agent organization based on AEBNs. This requires a subscriber agent to integrate the information from a corresponding publisher agent by propagating fixed probabilistic evidence in its local Bayesian network. This would be very useful in large applications based on probabilistic graphical models, such as a multi-criteria decision-aiding framework for recurrent problems [22]. Another set of applications that could benefit from AEBNs concerns forensic science. For example, in order to guide criminal investigations [39], a knowledge-based system builds a Bayesian network that groups together a set of plausible scenarios in order to determine which investigation strategies are likely to produce the most conclusive evidence. Such approaches could be augmented by considering some types of uncertain evidence, such as the confidence in a person’s testimony, or the importance of a person’s financial situation. The uncertainty of this evidence is not reducible and could be taken into account as probabilistic evidence. Such uncertain evidence could be obtained from experts, each of them using her own Bayesian network. The resulting probabilistic evaluation could be exploited by the main agent of the system, in an AEBN framework. The third feature of interest would allow the propagation of observations on continuous variables using a fuzzy discretization with degrees of membership of two or more coarse intervals.
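The third feature can be sketched as follows. The example assumes triangular membership functions defined by hypothetical interval centers, so that an observed value falling between two centers receives degrees of membership of the two neighbouring intervals that sum to one; other membership functions are possible, and the function name and numbers are illustrative.

```python
def fuzzy_memberships(value, centers):
    """Degrees of membership of `value` in the intervals represented by `centers`.

    Assumes `centers` is sorted and uses triangular membership functions
    (linear interpolation between neighbouring centers), so degrees sum to 1.
    """
    degrees = [0.0] * len(centers)
    if value <= centers[0]:
        degrees[0] = 1.0
    elif value >= centers[-1]:
        degrees[-1] = 1.0
    else:
        for k in range(len(centers) - 1):
            low, high = centers[k], centers[k + 1]
            if low <= value <= high:
                weight = (value - low) / (high - low)
                degrees[k] = 1.0 - weight
                degrees[k + 1] = weight
                break
    return degrees


# Hypothetical coarse discretization with three intervals centered at 10, 20 and 30.
centers = [10.0, 20.0, 30.0]

# The resulting vector can be entered as a probabilistic finding R(X)
# on the discretized variable.
finding_x = fuzzy_memberships(17.5, centers)
print(finding_x)
```

A value that coincides with a center yields a vector with a single one, which is ordinary hard evidence on the discretized variable.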

Another feature of interest would allow the combination of several uncertain findings on the same variable when appropriate, for example when combining a not-fixed probabilistic finding with a likelihood finding on the same variable, or when combining several likelihood findings. At present, very few Bayesian network software applications allow such combinations. A further area of future work is a detailed comparison between the different types of uncertain evidence presented in this paper and the interesting proposition of Bessière [8] of using coherence variables to fuse prior belief with evidence specified by a probability distribution.

This research has been supported by the International Campus on Safety and Intermodality in Transportation; the Nord/Pas-de-Calais Region, the European Community, the Regional Delegation for Research and Technology, the Ministry of Higher Education and Research, and the National Center for Scientific Research. The authors gratefully acknowledge the support of these institutions.