1 Introduction

Hybrid approaches to fault detection and prediction, based on the fusion of heterogeneous data obtained from different sensor systems, are currently at the forefront of technical diagnostics [1,2,3,4,5,6,7]. Key to the development of hybrid diagnostic technologies are methods for combining (fusing) heterogeneous data and, in particular, approaches that combine probabilistic and intelligent methods [8,9,10,11,12].

The Dempster-Shafer (D-Sh) methodology [13], used in conjunction with adaptive network models, is an effective and promising means of modeling and processing heterogeneous data. Neural network models can make the D-Sh methodology adaptive and improve its accuracy through training on experimental data, while network models of the D-Sh methodology make it possible to process quantitative information, fuzzy judgments, and confidence ratings on the basis of rigorous mathematical methods.

The main objects and parameters of the D-Sh theory are belief functions, basic probabilities, and probability estimates of hypotheses. In practice, however, serious problems arise with the quantitative evaluation of these parameters because of the combinatorial explosion of the hypothesis space over which an expert must evaluate them. In reality, estimates can usually be obtained only for those hypotheses for which expert or statistical information is available. To extend such partial estimates to the entire hypothesis space, methods are needed that calculate these parameters by drawing on additional expert information.

The approach discussed below is based on identifying the most important parameters of the D-Sh methodology, the basic probabilities (masses) of hypotheses, using statistical and empirical information obtained from experts. The proposed approach also makes it possible to implement procedures for combining information as new evidence arrives.

2 The Generalized Scheme of Fusion of Diverse Data

The proposed approach to fusing the diverse data obtained from a set of different sensors is based on the generalized scheme presented in Fig. 1.

Fig. 1. The generalized scheme of fusion of diverse data.

From Fig. 1 it can be seen that data fusion can be classified into three levels [14]:

  1. Low-level fusion. This level is often called the raw data level or the signal level. Raw data serve as the input and are combined directly; the combination is expected to yield new data that are more accurate and informative than the raw inputs. For example, [15] gives an example of low-level fusion using a moving-average filter (a sketch is given after this list).

  2. Medium-level fusion. This level is called the feature level (attributes, characteristics). Features (shape, texture, edges, corners, lines, position) are fused, producing new objects or object maps that can be used for other tasks, such as segmentation and recognition. At this level, data are also processed: filtering (combating noisy data), normalization (conversion to a single data type), correlation, and classification, using soft computing and data mining methods. Examples of medium-level fusion are given in [16, 17].

  3. High-level fusion. This level is called the decision level or the symbol level. Fusion takes place at the level of decisions and results in a global decision. The most traditional and well-known data fusion methods are probabilistic methods (Bayesian networks, evidence theory) and computational intelligence methods (Dempster-Shafer theory, fuzzy set theory, and neural networks). These methods present a coordinated, unified view of the diagnostic process to the decision maker. High-level fusion is considered, e.g., in [18].
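To make the low-level case concrete, here is a minimal sketch (our own Python illustration, not taken from [15]; the sensor signals and window size are hypothetical) that fuses two noisy readings of the same quantity using a moving-average filter:

```python
import numpy as np

def moving_average(signal: np.ndarray, window: int = 5) -> np.ndarray:
    """Smooth a raw sensor signal with a simple moving-average filter."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

# Two noisy raw readings of the same physical quantity (hypothetical data).
t = np.linspace(0.0, 1.0, 200)
sensor_a = np.sin(2 * np.pi * t) + 0.1 * np.random.randn(t.size)
sensor_b = np.sin(2 * np.pi * t) + 0.2 * np.random.randn(t.size)

# Low-level fusion: filter each raw signal, then average the results;
# the fused signal is expected to be more accurate than either raw input.
fused = 0.5 * (moving_average(sensor_a) + moving_average(sensor_b))
```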

3 Classification of Levels and Methods of Diverse Data Fusion

A rather large number of data fusion methods have been developed by now. However, when choosing a particular method, several questions must be considered: Which fusion methods are best suited to the available data? What preprocessing is necessary? How should one select, from a data set, the data whose fusion will give the best effect? And so on.

Based on a systematization of the literature, Fig. 2 presents a classification of the levels [19,20,21,22] and modern methods [23,24,25,26,27,28] of diverse data fusion under conditions of incomplete, fuzzy source data.

Fig. 2. Classification of levels and methods of data fusion.

Note that the given division of data fusion methods for diagnostic decision making is conditional, since in practice the methods overlap and interact with one another.

4 Elements of the Dempster-Shafer Methodology

The main objects of the D-Sh methodology are measures of belief and plausibility, which are calculated from the basic probabilities of hypotheses. Each hypothesis describing the decision-making situation is assigned a confidence interval within which the degree of confidence in the hypothesis must lie. The belief measure, denoted Bel(P), ranges from zero, indicating no support for the hypothesis, to one, indicating full confidence in it. The plausibility of the hypothesis, Pl(P), is determined from the belief measure:

$$ Pl(P) = 1 - Bel(\neg P). $$

Let X be a universal set of hypotheses and 2^X the set of all subsets of X, called the power set. Key to the D-Sh methodology is the notion of the mass m(A) of an element of the power set. It expresses the proportion of all relevant and available evidence supporting the claim that a certain element of X belongs to the set A but to no particular proper subset of A. The value m(A) pertains only to the set A and carries no additional information about the subsets of A, each of which has its own mass.

Based on the assigned masses, the upper and lower bounds of a probability interval can be determined. This interval contains the exact probability of the subset of hypotheses under consideration and is bounded by two measures called belief and plausibility. The belief Bel(A) in a set A is defined as the sum of the masses of all subsets of the set in question

$$ Bel(A)\, = \,\sum\limits_{B \subseteq A} {m(B)} , $$

and the plausibility Pl(A) is the sum of the masses of all sets B that intersect A in at least one element:

$$ Pl(A) = \sum\limits_{B \cap A \ne \, \varnothing } {m(B)} . $$

In practice, the basic probability function is often defined as a frequency function based on statistics: m(Ai) = ci/N, where N is the total number of observations and ci is the number of observations of the event Ai.
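A minimal sketch of these definitions (our own illustration: hypotheses are modeled as Python frozensets, and the observation counts are hypothetical):

```python
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]

def bel(m: Mass, a: FrozenSet[str]) -> float:
    """Belief: the sum of the masses of all subsets B of A."""
    return sum(v for b, v in m.items() if b <= a)

def pl(m: Mass, a: FrozenSet[str]) -> float:
    """Plausibility: the sum of the masses of all B intersecting A."""
    return sum(v for b, v in m.items() if b & a)

# Frequency-based masses m(A_i) = c_i / N from hypothetical counts c_i.
counts = {frozenset({"ok"}): 60,
          frozenset({"pre-failure"}): 25,
          frozenset({"ok", "pre-failure"}): 15}
N = sum(counts.values())
m: Mass = {a: c / N for a, c in counts.items()}

a = frozenset({"ok"})
print(bel(m, a), pl(m, a))  # Bel(A) <= exact probability of A <= Pl(A)
```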

The most important element of the D-Sh theory is the rule of evidence combination. The original combination rule, known as Dempster's rule of combination [13], is a generalization of Bayes' rule. The combined (joint) mass is calculated from two sets of masses m1 and m2 as follows:

$$ \begin{aligned} & m_{1,2} (\varnothing ) = 0,\;\;m_{1,2} (A) = \frac{1}{1 - K}\sum\limits_{B \cap C = A \ne \varnothing } {m_{1} (B)m_{2} (C)} , \\ & K = \sum\limits_{B \cap C = \varnothing } {m_{1} (B)m_{2} (C)} . \\ \end{aligned} $$

The factor K is a normalizing factor and, at the same time, characterizes the measure of conflict between the two sets of masses.
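Dempster's rule transcribes directly into code; the sketch below (our own illustration, reusing the frozenset-keyed Mass mapping from the previous fragment) computes the joint masses together with the conflict factor K:

```python
def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: normalized conjunctive combination of two mass sets."""
    joint: Mass = {}
    conflict = 0.0  # K: total mass falling on contradictory intersections
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                joint[inter] = joint.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("total conflict: the mass sets cannot be combined")
    return {a: v / (1.0 - conflict) for a, v in joint.items()}
```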

5 The Network Dempster-Shafer Model

In practical applications of the D-Sh methodology, an important problem is determining the basic probabilities m(X) of hypotheses, from which the belief and plausibility estimates are calculated. As a rule, the user does not have this information in full a priori.

To solve this problem, we use an approach based on identifying the parameters of the Dempster-Shafer model (DS-model) with the help of additional expert statistics. Parameter identification rests on the union of two network models: an adaptive network DS-model, which calculates and adjusts the basic and belief probabilities of hypotheses, and a neuro-fuzzy model, which obtains probabilistic assessments of hypotheses from the empirical views of experts.

Consider the organization of the DS-model using the example of one of the diagnostic subsystems of railway automation [1]. Suppose that the state of a controlled technical object (TO) is characterized by a set of parameters (numeric attributes), one of which is X. Based on an analysis of the values of the parameter X, hypotheses are put forward about the technical state of the object being diagnosed.

For this, the interval of values of the attribute is divided into several fuzzy intervals α1,…, αn characterizing different degrees of serviceability of the TO: valid values of the parameter X, under which the TO is considered fully functional; invalid values, at which the TO is considered faulty; and intermediate values characterizing pre-failure states of the TO.

Each TO state is associated with an elementary hypothesis αi, which for each specific value x ∈ X has a certain basic probability (mass) \( m(\upalpha_{i} )\, = \,\upmu_{{\upalpha_{i} }} (x) \), where \( \upmu_{{\upalpha_{i} }} (x) \) is the membership function (MF) of the fuzzy interval αi. All possible combinations of elementary hypotheses form the power set 2^L (L = {αi}) of compound hypotheses. The mass of a compound hypothesis \( \left\{ {\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{k} }} } \right\} \) is calculated as the conjunction of the MFs of its constituent elementary hypotheses:

$$ m(\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{k} }} ) = \mathop\Lambda \limits_{j = 1}^{k}\upmu_{{\upalpha_{{i_{j} }} }} (x), $$

where ˄ is a fuzzy conjunction calculated on the basis of a T-norm operator.

The decision-making situation is characterized by a specific value of the diagnostic feature x ∈ X, on the basis of which the probabilistic estimates of all hypotheses \( \left\{ {\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{k} }} } \right\}\;\left( {k = 1,\,2, \ldots , \,n} \right) \) about the technical condition of the controlled object can be calculated.
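A sketch of how these masses could be computed for a specific reading x (our own illustration: the triangular membership functions, their parameters, and the term names a1-a3 are hypothetical; min serves as the T-norm, and the result is normalized as in the formulas below):

```python
from itertools import combinations

def tri_mf(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical fuzzy intervals alpha_1..alpha_3 over the diagnostic feature X.
mfs = {"a1": lambda x: tri_mf(x, 0.0, 0.2, 0.5),  # fully functional
       "a2": lambda x: tri_mf(x, 0.2, 0.5, 0.8),  # pre-failure state
       "a3": lambda x: tri_mf(x, 0.5, 0.8, 1.0)}  # faulty

def masses_for(x: float) -> dict:
    """Masses of all non-empty hypotheses of 2^L for the reading x."""
    mu = {h: f(x) for h, f in mfs.items()}
    m = {}
    for k in range(1, len(mfs) + 1):
        for combo in combinations(mfs, k):
            m[frozenset(combo)] = min(mu[h] for h in combo)  # min as T-norm
    total = sum(m.values())  # 1/K, the normalizing coefficient
    return {a: v / total for a, v in m.items()} if total else m
```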

The network DS-model is aimed at calculating and adapting the basic and belief probabilities of the hypotheses \( \{\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{k} }} \} \) and contains n layers of neurons \( s_{k}^{i} = s_{k}^{i} (\{\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{k} }} \} ) \) corresponding to the hypotheses of 2^L, such that the k-th layer contains neurons corresponding to all possible combinations of k elementary hypotheses; the index k indicates the layer number within the network, and the index j the number of the neuron within the layer. Interlayer connections are organized so that the output of the j-th neuron \( s_{k}^{j} (\{\upalpha_{{j_{1} }} , \ldots ,\,\upalpha_{{j_{k} }} \} ) \) in the k-th layer is connected to the input of the l-th neuron \( s_{k + 1}^{l} (\{\upalpha_{{l_{1} }} , \ldots ,\,\upalpha_{{l_{k + 1} }} \} ) \) in the subsequent (k + 1)-th layer if and only if \( \{\upalpha_{{j_{1} }} , \ldots ,\,\upalpha_{{j_{k} }} \} \subset \{\upalpha_{{l_{1} }} , \ldots ,\,\upalpha_{{l_{k + 1} }} \} \).

The first (input) layer contains n neurons \( s_{1}^{i} \;\left( {i = 1, \ldots , \,n} \right) \) corresponding to the elementary hypotheses α1,…, αn, whose inputs receive the value of the parameter x. Their outputs represent the basic probabilities of the elementary hypotheses m(αi) and, at the same time, the belief probabilities Bel(αi) calculated on the basis of the MFs:

$$ m(s_{1}^{i} ) = Bel(\upalpha_{i} ) =\upmu_{{\upalpha_{i} }} (x,\;\overrightarrow {{p_{i} }} ) \cdot K, $$

where \( K = \left( {\sum\limits_{{(\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{k} }} ) \in 2^{L} }} {m(\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{k} }} )} } \right)^{ - 1} \) is a normalizing coefficient and \( \overrightarrow {{p_{i} }} \) is the vector of MF parameters.

The neurons of the subsequent (hidden) layers of the DS-model calculate the masses and belief probabilities of compound hypotheses by the formulas:

$$ m(s_{k}^{i} ) = \mathop\Lambda \limits_{{s_{k - 1}^{i} \subset s_{k}^{i} }}^{{}} m(s_{k - 1}^{i} ) \cdot K,\;Bel(s_{k}^{i} ) = \sum\limits_{{s_{k - l}^{i} \subseteq s_{k}^{i} }} {m(s_{k - l}^{i} )} ,\;(l = 1, \ldots ,\,k - 1), $$

where ˄ is a fuzzy conjunction defined on the basis of a T-norm operator. Thus, the DS-model has n inputs and 2^n outputs, at which the belief probabilities of the hypotheses of the power set 2^L are formed. Figure 3 shows the structure of the DS-model for four hypotheses α1,…, α4 and the distribution of the error signal when the network adapts to a received probability estimate P({α1, α2, α3}) for the hypothesis {α1, α2, α3}.

Fig. 3. The structure of the DS-model for the four-element set of hypotheses {α1, α2, α3, α4} and the propagation of the error signal from the neuron \( s_{3}^{1} \) to the input neurons \( s_{1}^{1} \) and \( s_{1}^{2} \).
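The subset lattice underlying this structure is straightforward to enumerate. The sketch below (our own illustration) builds the layers and interlayer connections for the four-hypothesis example of Fig. 3:

```python
from itertools import combinations

def build_ds_layers(elementary: list):
    """Layer k holds one neuron per k-element hypothesis set; a neuron feeds
    exactly those neurons of layer k+1 whose sets are supersets of its own."""
    n = len(elementary)
    layers = [[frozenset(c) for c in combinations(elementary, k)]
              for k in range(1, n + 1)]
    edges = [(s, t)
             for k in range(n - 1)
             for s in layers[k]
             for t in layers[k + 1]
             if s < t]  # connect iff s is a proper subset of t
    return layers, edges

layers, edges = build_ds_layers(["a1", "a2", "a3", "a4"])
print(len(layers[0]), len(layers[-1]), len(edges))  # 4 inputs, 1 top neuron, 28 edges
```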

Training and adaptation of the network model are based on the error backpropagation algorithm, a generalized description of which is given below.

Suppose that, for some particular decision-making situation characterized by a value of the controlled parameter x ∈ X and a hypothesis \( \{\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{k} }} \} \), statistical or expert information is available about the probability of this hypothesis, \( P(\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{k} }} ) \), that is, about the likelihood of finding the object in the corresponding state. The output of the corresponding neuron \( s_{k}^{j} = s(\{\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{k} }} \} ) \) of the network model produces the belief probability of the hypothesis, \( Bel(\{\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{k} }} \} ) \).

The difference \( e = P(\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{k} }} ) - Bel(\{\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{k} }} \} ) \) between the expert estimate and the belief value of the hypothesis is treated as the error of the network model for the given input value x ∈ X. The adaptation algorithm minimizes e by adjusting the basic probabilities of the hypotheses in the network model and by tuning the MFs that define the masses of the elementary hypotheses m(αi).

Adaptation of the parameters of the DS-model reduces to distributing the error at the output of the j-th neuron of the current k-th layer over all neurons of the preceding (k − 1)-th layer according to the following law:

$$ e_{k - 1}^{i \to j} = \frac{e_{k}^{j} }{\sum\limits_{{s_{k - 1} \subset s_{k}^{j} }} {Bel(s_{k - 1} )} }\, \cdot \,Bel(s_{k - 1}^{i} ), $$
(1)

where \( e_{k}^{j} \) is the total error at the output of the j-th neuron of the k-th layer, and \( e_{k - 1}^{i \to j} \) is the partial error between the i-th neuron of the (k − 1)-th layer and the j-th neuron of the k-th layer.

The total error \( e_{k}^{j} \) at the output of a neuron is calculated as the algebraic sum of the partial errors between the given neuron and all connected neurons of the subsequent layer:

$$ e_{k}^{j} = \sum\limits_{{s_{k + 1}^{l} \supset s_{k}^{j} }} {e_{k}^{j \to l} } . $$
(2)

The total error at the output of the neuron \( e_{k}^{i} \) is equal to the total error at its input, that is:

$$ \sum\limits_{{s_{k + 1}^{j} \supset s_{k}^{i} }} {e_{k}^{i \to j} } = \sum\limits_{{s_{k - 1}^{j} \subset s_{k}^{i} }} {e_{k - 1}^{j \to i} } . $$
(3)

The total error at the output of neurons of the input layer is calculated as:

$$ e_{1}^{i} =\upmu_{{\upalpha_{i} }} (x_{i} ,\;\overrightarrow {{p_{i} }} ) - \sum\limits_{{s_{2}^{j} \supset s_{1}^{i} }} {e_{1}^{i \to j} } . $$
(4)

Minimization of the error e reduces to adjusting the masses of the neurons \( m(\{\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{k} }} \} ) \) via the MF parameters \( p\, \in \,\overrightarrow {{p_{i} }} \) in accordance with the law:

$$ \Delta p_{j} = \frac{{\partial e_{1}^{i} }}{{\partial p_{j} }} \cdot e_{1}^{i} \cdot\upeta, $$
(5)

where η is the learning-rate coefficient.

Correction of the neuron masses is carried out by distributing the total error at the output of the neurons of the k-th layer over all connected neurons of the preceding (k − 1)-th layer in proportion to the belief probabilities of those neurons.

If, at the start of training, the basic probability values for some hypotheses are a priori unknown, those hypotheses are assigned equal starting masses, normalized so that they sum to one.
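A sketch of this error-distribution scheme (our own illustration, reusing bel, Mass, and the frozenset conventions from the earlier fragments), implementing Eqs. (1) and (2):

```python
from itertools import combinations

def distribute_error(e: float, s: frozenset, m: Mass) -> dict:
    """Eq. (1): split the error at neuron s over its (k-1)-layer
    predecessors in proportion to their belief values."""
    preds = [frozenset(c) for c in combinations(s, len(s) - 1)]
    total_bel = sum(bel(m, p) for p in preds)
    if total_bel == 0.0:
        return {p: e / len(preds) for p in preds}  # equal split fallback
    return {p: e * bel(m, p) / total_bel for p in preds}

def backpropagate(e: float, s: frozenset, m: Mass) -> dict:
    """Propagate an output error at neuron s down to the input layer,
    summing the partial errors arriving at each neuron (Eq. (2))."""
    errors = {s: e}
    while any(len(node) > 1 for node in errors):
        nxt = {}
        for node, err in errors.items():
            for p, pe in distribute_error(err, node, m).items():
                nxt[p] = nxt.get(p, 0.0) + pe  # Eq. (2): sum partial errors
        errors = nxt
    return errors  # errors e_1^i at the elementary-hypothesis neurons
```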

6 Hybrid Neural-Fuzzy Dempster-Shafer Model

In real-world decision-making situations, the user often lacks complete statistical or expert information about the probability of encountering a particular state for a given combination of features. In such situations, the classical probabilistic inference procedures developed within the framework of the D-Sh theory are not applicable.

To solve this problem, the paper proposes a hybrid approach based on combining the network DS-model with an adaptive fuzzy model that simulates the empirical process by which experts form probability estimates from subjective preferences about the impact of particular sensor readings on diagnostic decisions. As the fuzzy model, it is proposed to use a first-order Takagi-Sugeno neural network model (TS-model) [29].

The TS-model is a neural network that describes the relationship between the input values of the diagnostic feature x ∈ X and the probabilistic estimates of hypotheses \( P(\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{k} }} ) \). The TS-model has one input, which receives the value of the controlled parameter x, and 2^n outputs, which form the probability estimates of the hypotheses. The TS-model knowledge base includes m ≤ 2^n fuzzy rules of the form:

$$ R_{i} :(x^{*} =\upalpha_{{i_{1} }} ) \vee \ldots \vee (x^{*} =\upalpha_{{i_{r} }} ) \Rightarrow P(\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{r} }} ) = c_{r} , $$
(6)

where x* is the value of the parameter X; \( \upalpha_{{i_{j} }} \) are fuzzy terms corresponding to hypotheses of the power set 2^L; and cr are adaptable output parameters of the fuzzy rules.

What distinguishes the fuzzy system (6) from the traditional TS-model is the presence of not one but a set of outputs, corresponding to those hypotheses of the power set about whose validity expert information is available. Another feature is the aggregation of the fuzzy terms in the antecedents of the fuzzy rules with a disjunction operator implemented in the class of S-norms. This reflects the fact that the probability of a compound hypothesis about the technical condition is obviously a non-decreasing function of the probabilities of its elementary hypotheses.

The output values of the TS-model for an input parameter value x* ∈ X are computed in the standard way:

$$ P(\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{r} }} ) = S(\upmu_{{\upalpha_{{i_{1} }} }} (x^{*} ), \ldots ,\,\upmu_{{\upalpha_{{i_{r} }} }} (x^{*} )) \cdot c_{r} , $$

where S(·) is an S-norm operator.
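A sketch of the evaluation of rule (6) (our own illustration: the probabilistic sum is taken as the S-norm, and the membership dictionary mfs and the consequent parameter c_r are the hypothetical ones from the earlier fragment):

```python
from functools import reduce

def s_norm(a: float, b: float) -> float:
    """Probabilistic sum, one common choice of S-norm."""
    return a + b - a * b

def ts_output(x: float, rule_terms: list, c_r: float) -> float:
    """P(a_i1,...,a_ir) = S(mu_i1(x),...,mu_ir(x)) * c_r, per rule (6)."""
    mu = [mfs[t](x) for t in rule_terms]  # fire each antecedent term
    return reduce(s_norm, mu) * c_r       # disjunctive aggregation

# Hypothetical rule: (x* = a1) or (x* = a2)  =>  P(a1, a2) = c_r
print(ts_output(0.3, ["a1", "a2"], c_r=0.9))
```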

The TS-model is an adaptive system in which, in addition to the calculation of the probabilistic estimates of hypotheses \( P(\upalpha_{{i_{1} }} , \ldots ,\,\upalpha_{{i_{r} }} ) \), the MF parameters \( \upmu_{{\upalpha_{ij} }} (x_{{i_{j} }} ,\,\overrightarrow {{p_{{i_{j} }} }} ) \) and the output parameters cr are adapted. The adaptive properties of the TS-model are used in the hybrid D-Sh system to supply missing probability estimates of hypotheses when identifying the parameters of the DS-model, as well as to adjust the parameters of the TS-model when additional expert information about the probabilities of hypotheses is received. The overall structure of the hybrid system is shown in Fig. 4 below.

Fig. 4. Structure of a hybrid adaptive network based on the TS-model and the Dempster-Shafer network model.

The principal feature of the developed hybrid DS-system is the possibility of jointly adapting, in both models, the parameters \( \overrightarrow {{p_{i} }} \) of the MFs \( \upmu_{{\upalpha_{i} }} (x,\,\overrightarrow {{p_{i} }} ) \) that represent the basic probabilities of the elementary hypotheses {α1},…, {αn}. This, first, increases the flexibility of the system by adapting it to both objective statistical data and subjective expert data and, second, increases the speed of learning.

7 Conclusion

This article develops a new class of neural network models aimed at implementing the probabilistic inference methodology of the Dempster-Shafer theory, together with a new hybrid approach to the diagnosis of technical objects using multisensor data, based on the combination of a Dempster-Shafer network model and a neuro-fuzzy network. Its principal advantages are:

  1. The possibility of performing probabilistic inference in the absence of full information about the masses of hypotheses, by drawing on expert information about probabilistic estimates of hypotheses obtained from a neural network through the analysis of an expert's subjective preferences.

  2. The possibility of implementing a fundamentally new approach to combining multiple pieces of evidence within the Dempster-Shafer methodology: the parameters of the neural network models are adapted as new evidence arrives from multiple sources, and the results of the calculations are adjusted simultaneously for all pieces of evidence.

  3. Joint adaptation of the parameters of both models during learning, which increases the reliability of the computed results due to the diversity of the expert and statistical information used in computing the belief values.