
1 Introduction

A Knowledge Based System (KBS) carries out a set of knowledge intensive tasks in order to put into practice problem-solving capabilities, comparable to those of a domain expert, on an input data flow produced by a process.

In particular, a knowledge intensive task requires, by construction, a Knowledge Model in order to interpret the input data flow according to the task to be achieved, to identify a possible problem to be solved and to produce a solution to it.

The Knowledge Engineering (KE) discipline provides methods, techniques and tools which facilitate and improve the modelling of expert knowledge. In this field of study, most approaches model the expert’s reasoning mechanisms separately from the expert knowledge specific to the domain of interest. Thus, a model of the expert’s knowledge, called an Expert Model (or Knowledge Model), obtained through this discipline is generally made up of a model describing how the expert reasons about the process (a conceptual model of the expert’s reasoning tasks) and of a representation of the knowledge used in that reasoning (a conceptual model of the domain knowledge). The latter is derived from the Process Model the expert uses to formulate his own knowledge. Knowledge Engineering thus establishes a back-and-forth path between the expert’s knowledge and the built Expert Model, along which the validity of the latter can be evaluated. However, two of the main drawbacks of KE approaches are (1) the cost of the knowledge acquisition and modelling process, which takes too long for economic domains that use technologies with short life cycles, and (2) the validation of the Expert Model, which is mainly case-based.

An interesting alternative for dealing with these problems is to resort to the process of Knowledge Discovery in Database (KDD), which uses Data Mining techniques to obtain knowledge from data. In this approach, the process data flow is recorded by a program in a database, and the data contained in that database are analysed by means of Data Mining techniques in a KDD process with the purpose of discovering “patterns” in the data. An n-ary relation among data can be considered a pattern when this relation has a power of representativeness with respect to the data contained in the database. This representativeness is related to a form of recurrence within the data; that is to say, an n-ary relation among data of a given set is a pattern when this relation is “often” observed in the database. A set of patterns is then considered as the observable manifestation of the existence of an underlying model of the process data contained in the database. Nevertheless, establishing the meaning, in terms of the expert’s semantics, of such a Data Model is a difficult task. One of the reasons for this difficulty is the deep difference between the universe of the process instrumentation, from which the data come, and the conceptual universe of the expert’s reasoning, where scientific theories and their underlying hypotheses exist. As a consequence, the validation of a Data Model is an intrinsically difficult task, and a lot of work has to be done to constitute a knowledge corpus from a validated Data Model.

Thus, over the last decade the idea of combining Knowledge Engineering with Knowledge Discovery in Database has emerged, with the purpose of taking advantage of both disciplines in order to reduce the construction cost of suitable Knowledge Models for Knowledge Based Systems. The main idea is to make possible the cross-validation of an Expert Model and a Data Model. The aim is to define a general perspective, combining Knowledge Engineering with Knowledge Discovery in Database in a global approach of knowledge creation carried out from experts and from knowledge discovered in data. The key point is then to find a KE methodology and a KDD process which produce Expert Models and Data Models that are comparable with each other by knowledge engineers and easily interpretable by experts.

As far as we know, only the KE methodology and the KDD process based on the Theory of Timed Observations [1] allow their models to be compared with each other. This theory was established to provide a general mathematical framework for modelling dynamic processes from timed data by combining the Markov Chain Theory, the Poisson Process Theory, Shannon’s Communication Theory [2] and the Logical Theory of Diagnosis [3]. This theoretical framework provides the principles that define a KE methodology, called TOM4D (Timed Observation Modelling For Diagnosis) [4–7], and a KDD process called TOM4L (Timed Observation Mining For Learning) [8–13]. Because both TOM4D and TOM4L are based on the same theory, the models constructed through them can be easily related and compared to each other.

The purpose of this chapter is to describe the way the Theory of Timed Observations builds a bridge between Knowledge Engineering and Knowledge Discovery in Database. In line with this aim, a global knowledge creation perspective which combines experts’ knowledge with knowledge discovered in a database is presented. In order to show how models built through this perspective can be collated and complement each other, the proposed approach is applied to a very simple didactic example, the diagnosis of a vehicle, taken from the book by Schreiber et al. [14].

The next section completes this introduction by arguing the need for a global approach which fuses Knowledge Engineering and Knowledge Discovery in Database. The main concepts of the Theory of Timed Observations are then introduced in order to present the TOM4D KE methodology and the basic principles of the TOM4L KDD process. Next, both TOM4D and TOM4L are applied to the didactic example mentioned above in order to show how the corresponding Expert Models and Data Models can be compared to each other. Finally, the conclusion section synthesizes this chapter and refers to some applications of our knowledge creation approach to real world problems.

2 Two Knowledge Sources, Two Different Approaches

Creating or capturing knowledge can originate from psychological and social processes or, alternatively, from data analysis and interpretation. That is to say, the two significant ways to capture knowledge are: the synthesis of new knowledge through socialization with experts (a primarily people-driven approach) and discovery by finding interesting patterns through the observation and intertwining of data (a primarily data-driven or technology-driven approach) [15].

2.1 Knowledge Engineering: A Primarily People-Driven Approach

Considering knowledge as intellectual capital held by individuals or groups, the creation of new intellectual capital is carried out through combining and exchanging existing knowledge. From this perspective, Nonaka’s knowledge spiral [16, 17], illustrated in Fig. 6.1, is considered in the literature as a foundation stone of knowledge creation. Nonaka characterizes knowledge creation as a spiralling process of interactions between explicit and tacit knowledge. The former can be articulated, codified, and communicated in symbolic form and/or natural language [18]; the latter is highly personal and hard to formalize, making it difficult to communicate or share with others [19]. Each interaction between these two kinds of existing knowledge results in new knowledge. This process is conceptualized in four phases: Socialization (the sharing of tacit knowledge between individuals), Externalization (the conversion of tacit into explicit knowledge: the articulation of tacit knowledge and its translation into comprehensible forms that can be understood by others), Combination (the conversion of explicit knowledge into new and more complex explicit knowledge) and Internalization (the conversion of explicit knowledge into tacit knowledge: individuals broaden, extend and reframe their own tacit knowledge).

Fig. 6.1 Spiral evolution of knowledge conversion and self-transcending process [20, p. 43]

Tacit knowledge is, among other things, the knowledge of experts who intuitively know what to do in performing their duties but find it difficult to express, because it refers to sub-symbolic skills. Such knowledge is frequently based on intuitive evaluations of sensory inputs of smell, taste, feel, sound or appearance. Eliciting such knowledge can be a major obstacle in attempts to build Knowledge Based Systems (KBSs). Knowledge Engineering (KE) arose from the need to transform the art of building KBSs into an engineering discipline [21, 22], thus providing techniques and tools that help to deal with the expert’s tacit knowledge and to build KBSs. This discipline motivated the development of a number of methodologies and frameworks such as Role-Limiting Methods and Generic Tasks [23], and later, CommonKADS [14, 24], Protégé [25], MIKE [26, 27], KAMET II [28, 29] and VITAL [30]. In particular, CommonKADS is a KE methodology of great significance which proposes a structured approach to the construction of KBSs. Essentially, it consists in the creation of a collection of models that capture different aspects of the system to be developed, among which is the Knowledge Model (or Expert Model) that describes the knowledge and reasoning requirements of a system, that is, expert knowledge. Two other important modelling frameworks are MIKE and Protégé, where the former focuses on executable specifications while the latter exploits the notion of ontology. All these frameworks and methodologies aim, in one way or another, to build a model of the expert’s knowledge.

2.2 Knowledge Discovery in Database: A Primarily Data-Driven Approach

The traditional method of turning data into knowledge is based on manual data analysis and interpretation. For example, in the health-care industry, specialists periodically analyse trends and changes in health data. They then detail the analysis in a report which becomes the basis for future decision making in the health domain. However, when data volumes grow exponentially and their manipulation is beyond human capacity, resorting to automatic analysis becomes absolutely necessary. Computational techniques thus help to discover meaningful structures and patterns in data.

The field of Knowledge Discovery in Database (KDD) is concerned with the development of methods and techniques for making sense of data. The phrase knowledge discovery in databases was coined at the first KDD workshop in 1989 [31] to emphasize that knowledge is the end product of a data-driven discovery. Although the terms KDD and Data Mining are often used interchangeably, KDD refers to the overall process of discovering useful knowledge from data, while Data Mining refers to a particular step of that process [32]. More precisely, this step consists of the application of specific algorithms in order to extract patterns from data.

The typical KDD process is depicted in Fig. 6.2 and summarized as follows [33]. The starting point is to learn the application domain and its goals. The next step is to select a dataset, or a subset of variables, on which discovery is to be performed. Preprocessing then takes place, which involves removing noise, collecting the necessary information to account for noise, deciding on strategies for handling missing data fields, etc. The following step is data transformation, which includes finding useful features to represent the data, depending on the goal of the task, and reducing the effective number of variables under consideration or finding invariant representations for the data. After that, data mining is carried out. In general terms, this involves selecting data mining methods and choosing algorithms, and, through these, searching for patterns of interest. Finally, the mined patterns are interpreted, removing those that are redundant or irrelevant and translating the useful ones into terms understandable by users. This discovered knowledge can be incorporated into systems or simply documented and reported to interested parties.
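To make the sequence of steps concrete, the following Python sketch outlines the process as a bare pipeline skeleton. It is purely illustrative: the function names and the naive strategies are our own placeholders, not part of any tool discussed in this chapter.

```python
# Hypothetical skeleton of the KDD process steps summarized above.
from typing import Any, Callable, Iterable, List, Tuple

Record = dict                 # one raw data record
Pattern = Tuple[Any, ...]     # a mined pattern (kept opaque here)

def select(db: Iterable[Record], variables: List[str]) -> List[Record]:
    """Selection: keep only the variables on which discovery is performed."""
    return [{v: r[v] for v in variables if v in r} for r in db]

def preprocess(data: List[Record]) -> List[Record]:
    """Preprocessing: drop records with missing fields (one naive strategy)."""
    return [r for r in data if all(v is not None for v in r.values())]

def transform(data: List[Record]) -> List[Record]:
    """Transformation: project to useful features; identity for simplicity."""
    return data

def mine(data: List[Record],
         algorithm: Callable[[List[Record]], List[Pattern]]) -> List[Pattern]:
    """Data mining: apply a specific algorithm to extract patterns."""
    return algorithm(data)

def interpret(patterns: List[Pattern],
              is_relevant: Callable[[Pattern], bool]) -> List[Pattern]:
    """Interpretation: discard redundant or irrelevant patterns."""
    return [p for p in patterns if is_relevant(p)]
```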

Fig. 6.2 Overview of the steps constituting the KDD process [33, p. 29]

In a KDD process, finding patterns in data can be carried out through different techniques such as Decision Trees [34], Hidden Markov Models [35], Neural Networks [36], Bayesian Networks [37], K Nearest-Neighbour [38], SVM [39], etc. All these techniques make it possible to obtain a model representative of the studied data, where this model has to be interpreted and validated against expert knowledge.

2.3 The Need of One Integral Approach

The model of an observed process can be built through KE or KDD. As Fig. 6.3 depicts, given a process about which an expert has knowledge, a model \( M_{e} \) of this process can be constructed from expert knowledge by applying KE techniques. In turn, the process can be observed through sensors by a program which records data describing its evolution. These data can be analysed by applying data mining techniques in a KDD process in order to obtain a model \( M_{d} \) of the process. In an ideal world, \( M_{e} \) and \( M_{d} \) would complement each other so as to yield a more complete and suitable process model \( M_{PR} \). That is, \( M_{e} \) should be validated against the process data perceived through sensors, and \( M_{d} \) should be validated against expert knowledge. Nevertheless, some drawbacks arise. Knowledge Engineering approaches do not address the treatment of knowledge discovered in databases; that is to say, the interpretation of discovered patterns is sometimes not trivial for an expert. Besides, relating the models \( M_{e} \) and \( M_{d} \) obtained through KE and KDD, respectively, proves to be difficult owing to the different theories and the different natures of the representation formalisms used in the two disciplines.

Fig. 6.3 Building a process model from two knowledge sources

As [15] establishes, although capturing knowledge is the central focus of both fields of study, knowledge creation has tended to be approached from one or the other perspective, rather than from a combined perspective. Thus, a holistic view of knowledge creation that combines a people-dominated perspective with a data-driven approach is considered vital. In line with this need, this chapter proposes to integrate a KE methodology with data mining techniques in a KDD process in order to define a human–machine learning process.

3 Two Knowledge Sources, One Integral Approach

Models obtained through Knowledge Engineering (KE) and Knowledge Discovery in Database (KDD) can be related and collated with each other if a bridge between the two areas is established. We believe that fusing KE and KDD into a global approach of learning or knowledge acquisition, nourished with knowledge discovered in data and with experts’ knowledge, requires a theory on which to base both disciplines. The integral approach presented in this chapter and illustrated in Fig. 6.4 combines a KE methodology called Timed Observation Modelling For Diagnosis (TOM4D) [4–7] with a data mining technique named Timed Observation Mining For Learning (TOM4L) [8–13]. Both TOM4D and TOM4L are based on the Theory of Timed Observations [1], a stochastic framework for discovering temporal knowledge from timed data.

Fig. 6.4 Human–machine learning integral approach

The TOM4D methodology is a primarily syntax-driven approach for modelling dynamic processes, where semantic content is introduced in a gradual and controlled way through the CommonKADS conceptual approach [14], Formal Logic and the Tetrahedron of States [40]. TOM4L is a probabilistic and temporal approach to discovering temporal relations in timed data registered in a database. The time stamps are used to provide a partial order within the data in the database (i.e. two data items can have the same time stamp) and to discover the temporal dimension of knowledge when needed. Because the underlying theory is the same, TOM4D models and TOM4L models can be compared to each other in order to build a suitable model of the observed process. In particular, TOM4D makes it possible to build a process model which, by construction, can be directly related to the knowledge model provided by the expert, i.e. a CommonKADS Knowledge Model, and which, in addition, can be collated with models obtained from data.

Figure 6.4 depicts the proposed overall view, where a process model can be built through TOM4D from a knowledge source and the constructed model can then be validated by experts. In turn, an observation program \( \Uptheta \left( {X,\Updelta } \right) \) requires a model of the observed process in order to record data about its evolution. These data are then analysed by means of TOM4L to produce a process model. This model can be directly related to the TOM4D model built from the expert’s knowledge; consequently, it can either be validated by the expert or be utilized as pieces of new knowledge when the learning approach is applied to an unknown process. In this way, the built model can be defined through a back-and-forth path between experts’ knowledge and knowledge discovered in data, thus establishing an integral human–machine learning approach.

4 Introduction to the Theory of Timed Observations

The Theory of Timed Observations (TTO) [1] provides a general framework for modelling dynamic processes from timed data by combining the Markov Chain Theory, the Poisson Process Theory, Shannon’s Communication Theory [2] and the Logical Theory of Diagnosis [3]. The main concepts of the TTO required to introduce the TOM4D KE methodology and the TOM4L KDD process are described in this section: the notions of timed observation and observation class.

The Theory of Timed Observations defines a dynamic process as an arbitrarily constituted set \( X(t) = \left\{ {x_{1} (t), \ldots ,x_{n} (t)} \right\} \) of \( n \) functions \( x_{i} (t) \) of continuous time \( t \in \Re \). The set \( X(t) \) of functions implicitly defines a set \( X = \left\{ {x_{1} , \ldots ,x_{n} } \right\} \) of \( n \) variable names \( x_{i} \). The dynamic process \( X(t) \) is monitored by a program \( \Uptheta (X,\Updelta ) \) which observes the functions \( x_{i} (t) \) of \( X(t) \) and then establishes, records and reports their evolution over time using a finite set \( \Updelta = \left\{ {\delta_{j} } \right\}_{j = 1, \ldots ,m} \) of constants \( \delta_{j} \) (i.e. numbers or strings). The program \( \Uptheta (X,\Updelta ) \) usually accounts for the functions’ progression through messages recorded in a database. These messages can be alarms, warnings or reporting events.

This theory considers a message at time \( t_{k} \) as a timed observation \( (\delta ,t_{k} ) \), where \( \delta \) is a constant value of \( \Updelta \) and \( t_{k} \) is the moment at which the observation occurs. For example, let us suppose that the timed data recorded in a database are of the form “yymmdd-hhmmss/message_value”, where yymmdd-hhmmss is a time stamp and message_value is a value determined by a monitoring program. The message “080313-132225/TEMPERATURE/very_high” can be represented by a timed observation \( (\delta ,t_{k} ) \) where \( t_{k} \) = 080313-132225 and \( \delta \) = TEMPERATURE/very_high. That is, \( (\delta ,t_{k} ) \) = (TEMPERATURE/very_high, 080313-132225).

In general terms, a timed observation \( (\delta ,t_{k} ) \) is written by an observer program \( \Uptheta (\{ x\} ,\{ \delta \} ) \) when a function \( x(t) \) of continuous time enters a specific interval of values. The specification of such an observer program refers to a threshold value \( \Uppsi_{j} \in \Re \) and two immediately successive values \( x(t_{k - 1} ) \in \Re \) and \( x(t_{k} ) \in \Re \) such that,

$$ x(t_{k - 1} ) < \Uppsi_{j} \wedge x(t_{k} ) \ge \Uppsi_{j} \Rightarrow write((\delta ,t_{k} )) . $$
(6.1)

In this program, write(msg) is a predicate which denotes that the element msg is recorded in a memory. For example, Fig. 6.5 illustrates a temperature function \( x_{i} (t) \), where values above \( \Uppsi_{j} \) are interpreted by an observer program \( \Uptheta (\{ x_{i} \} ,\{ \delta \} ) \), with \( \delta \) = TEMPERATURE/very_high, as very high temperature; that is, when \( x_{i} (t) \in [\Uppsi_{j} , + \infty ) \). Thus, given a sequence of values \( w = (x_{i} (t_{1} ), \ldots ,x_{i} (t_{k - 1} ),x_{i} (t_{k} ),x_{i} (t_{k + 1} )) \), the program will write a timed observation (TEMPERATURE/very_high, \( t_{k} \)), which indicates that the function \( x_{i} (t) \) entered the interval \( [\Uppsi_{j} , + \infty ) \) at time \( t_{k} \).
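As a minimal illustration of Eq. (6.1), the following Python sketch implements such a threshold observer. The class name ThresholdObserver and its interface are hypothetical; real observer programs may match far more complex behavioural models.

```python
from typing import List, Optional, Tuple

TimedObservation = Tuple[str, float]  # (delta, t_k)

class ThresholdObserver:
    """Hypothetical observer program Theta({x}, {delta}): writes (delta, t_k)
    when x(t) crosses the threshold psi_j from below, as in Eq. (6.1)."""

    def __init__(self, delta: str, psi_j: float) -> None:
        self.delta = delta
        self.psi_j = psi_j
        self.database: List[TimedObservation] = []   # plays the role of Omega
        self._previous: Optional[float] = None       # x(t_{k-1})

    def observe(self, t_k: float, x_tk: float) -> None:
        # x(t_{k-1}) < psi_j  and  x(t_k) >= psi_j  =>  write((delta, t_k))
        if self._previous is not None and self._previous < self.psi_j <= x_tk:
            self.write((self.delta, t_k))
        self._previous = x_tk

    def write(self, obs: TimedObservation) -> None:
        self.database.append(obs)

# Usage: a temperature trace crossing psi_j = 80.0 at t = 4
observer = ThresholdObserver("TEMPERATURE/very_high", psi_j=80.0)
for t, x in enumerate([70.0, 75.0, 79.0, 78.0, 83.0, 85.0]):
    observer.observe(float(t), x)
print(observer.database)  # [('TEMPERATURE/very_high', 4.0)]
```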

Fig. 6.5 Function of temperature

The Theory of Timed Observations establishes that the existence of a timed observation \( (\delta ,t_{k} ) \) recorded in a database allows one to infer that this observation has been recorded by an unknown program \( \Uptheta (\{ x\} ,\{ \delta \} ) \) which implements the abstract logical equation described in (6.2).

$$ \forall t_{k} \in \Upgamma ,\theta (x,\delta ,t_{k} ) \in \Uptheta \Rightarrow (\delta ,t_{k} ) \in \Upomega $$
(6.2)

This sentence associates the set \( \Uptheta \) of all the assignments to a ternary predicate \( \theta (x_{\theta } ,\;\delta_{\theta } ,\;t_{\theta } ) \) with the set \( \Upomega \) of all the timed observations carried out by the program \( \Uptheta (\{ x\} ,\{ \delta \} ) \) (i.e., the database). A timed observation \( (\delta ,t_{k} ) \) is then interpreted as the logical consequence of the assignment of the values \( x \), \( \delta \) and \( t_{k} \) to the ternary predicate \( \theta (x_{\theta } ,\;\delta_{\theta } ,\;t_{\theta } ) \). In other words, the timed observation \( (\delta ,t_{k} ) \) was recorded when the program \( \Uptheta (\{ x\} ,\{ \delta \} ) \) assigned the values \( x \), \( \delta \) and \( t_{k} \) to the predicate \( \theta (x_{\theta } ,\;\delta_{\theta } ,\;t_{\theta } ) \).

Given the sentences (6.1) and (6.2), the general meaning “is” can always be given to the predicate \( \theta \), so that the timed observation \( (\delta ,t_{k} ) \) is interpreted as “at time \( t_{k} \), \( x \) is \( \delta \)”. Considering that \( x \) is associated with a function \( x(t) \), the meaning “equal” can also be attributed to the predicate \( \theta \), which leads to the following abuse of language: \( \theta (x,\;\delta ,\;t_{k} ) \) means “\( Equal(x,\;\delta ,\;t_{k} ) \)” (i.e. “\( x(t_{k} ) = \;\delta \)”). Consequently, the Theory of Timed Observations considers that a message contained in a database is a timed observation \( (\delta ,t_{k} ) \) written by a program \( \Uptheta (X,\Updelta ) \) which observes a time function \( x(t) \) and implements the abstract Eq. (6.2). In our example, the timed observation (TEMPERATURE/very_high, \( t_{k} \)) indicates that a program \( \Uptheta (\{ x_{i} \} ,\{ \delta \} ) \), observing a time function \( x_{i} (t) \) and implicitly defining a predicate \( \theta (x_{\theta } ,\;\delta_{\theta } ,\;t_{\theta } ) \), has considered the predicate \( \theta (x_{i} ,\;\delta ,\;t_{k} ) \), with \( \delta \) = TEMPERATURE/very_high, to be true and has then written the timed observation (TEMPERATURE/very_high, \( t_{k} \)) in the database \( \Upomega \). This example illustrates the abuse of language frequently made, which associates the meaning “\( x_{i} (t_{k} ) \) = very_high” with the interpretation of the function \( x_{i} (t) \) as a temperature.

According to Definition 6.1 below, the interpretation of a timed observation \( (\delta ,t_{k} ) \) is precisely the assigned predicate \( \theta (x,\;\delta ,\;t_{k} ) \). It is noteworthy that the program \( \Uptheta (\{ x\} ,\{ \delta \} ) \) could have errors; that is to say, a timed observation \( (\delta ,t_{k} ) \) could have been written in a database although the assertion \( \theta (x_{i} ,\;\delta ,\;t_{k} ) \) is not really true.

Definition 6.1

Let \( X(t) = \{ x_{i} (t)\}_{i = 1, \ldots ,n} \) be a finite set of time functions; let \( X = \{ x_{i} \}_{i = 1, \ldots ,n} \) be the corresponding finite set of variable names; let \( \Updelta = \{ \delta_{j} \}_{j = 1, \ldots ,m} \) be a finite set of constant values; let \( \Uptheta (X,\Updelta ) \) be a program observing the evolution of the functions of \( X(t) \); let \( \Upgamma = \{ t_{k} \}_{k \in \aleph } \) be a set of arbitrary time instants \( t_{k} \in \Re \); and let \( \theta (x_{\theta } ,\delta_{\theta } ,t_{\theta } ) \) be a predicate implicitly determined by \( \Uptheta (X,\Updelta ) \). Then,

  • a timed observation \( (\delta ,t_{k} )\; \in \;\Updelta \times \Upgamma \) on \( x_{i} (t) \) is the assignment of the values \( x_{i} \), \( \delta \) and \( t_{k} \) to the predicate \( \theta (x_{\theta } ,\;\delta_{\theta } ,\;t_{\theta } ) \), yielding \( \theta (x_{i} ,\;\delta ,\;t_{k} ) \);

  • by definition, \( o(t_{k} ) \) denotes a timed observation, i.e., \( o(t_{k} ) \triangleq (\delta ,t_{k} ) \); and,

  • a finite set \( O \subset \Updelta \times \Upgamma \) of timed observations is disjointly partitioned and ordered in a scenario \( \Upomega \), defined as a set of temporally ordered sequences of timed observations; that is, \( \Upomega = \{ w:\{ 1, \ldots ,n\} \to O\;|\;n \in \aleph \wedge \forall i,j \in \{ 1, \ldots ,n\} ,i < j,(w(i) = o(t_{k} ) \wedge w(j) = o(t_{r} ) \Rightarrow t_{k} \le t_{r} )\} \) with \( \bigcap\limits_{w \in \Upomega } {\Im (w)} = \emptyset \wedge \bigcup\limits_{w \in \Upomega } {\Im (w)} = O \), where \( \Im (w) \) denotes the image or range of \( w \), i.e. the observations of the sequence \( w \in \Upomega \).

Moreover, as follows from the above, timed observations on a particular function implicitly determine a variable which assumes discrete values and describes the function’s evolution according to an interpretation of the observer program. That is to say, when \( \Uptheta \) considers \( \theta (x_{i} ,\;\delta ,\;t_{k} ) \), with \( \delta \) = TEMPERATURE/very_high, to be true and then writes (TEMPERATURE/very_high, \( t_{k} \)), it is implicitly defining a discrete variable which assumes the value TEMPERATURE/very_high. Consequently, a timed observation and the implicit existence of an associated discrete variable make it possible to define the notion of observation class, another important concept in this theory. An observation class associated with a variable \( x \) that assumes values \( \delta \; \in \;\Updelta \) is a set \( C_{x} = \{ (x,\delta )\;|\;\delta \in \Updelta \} \). For simplicity, \( C_{x} \) is often defined as a singleton \( C_{x} = \{ (x,\delta )\} ,\delta \in \Updelta \). This concept thus establishes the link between a constant \( \delta \in \Updelta \) and a variable \( x \in X \), and a timed observation \( (\delta ,t_{k} ) \) is then an occurrence of an observation class \( C_{x} = \{ (x,\;\delta )\} \). Definition 6.2 specifies this concept.

Definition 6.2

Let \( X(t) = \{ x_{i} (t)\}_{i = 1, \ldots ,n} \) be a set of time functions whose evolutions are observed by a program \( \Uptheta \); let \( X = \{ x_{i} \}_{i = 1, \ldots ,n} \) be a set of discrete variables where each \( x_{i} \) is associated with a time function \( x_{i} (t) \) and its value is determined by an interpretation of \( \Uptheta \) about the evolution of \( x_{i} (t) \); and, let \( \Updelta = \bigcup\limits_{{x_{i} \in X}} {\Updelta_{{x_{i} }} } \) be such that \( \Updelta_{{x_{i} }} \) is a set of values which can be assumed by \( x_{i} \in X \). Then we say that an observation class associated with a variable \( x_{i} \in X \) is a set \( C_{i} = \{ (x_{i} ,\delta )\;|\;\delta \in \Updelta_{{x_{i} }} \} \).

In summary, from a message (TEMPERATURE/very_high, \( t_{k} \)) written in a database, the Theory of Timed Observations allows one to consider that there exists a program \( \Uptheta (\{ x_{i} \} ,\{ \delta \} ) \) which wrote the message by observing a time function, possibly unknown to us, noted \( x_{i} (t) \). This message is then a timed observation (TEMPERATURE/very_high, \( t_{k} \)) indicating that a certain predicate \( \theta (x_{i} ,\;\delta ,\;t_{k} ) \), with \( \delta \) = TEMPERATURE/very_high, was assumed true by the program \( \Uptheta (\{ x_{i} \} ,\{ \delta \} ) \). There is then tacitly a discrete variable \( x_{i} \) which takes at least the value TEMPERATURE/very_high. Therefore, we can define an observation class \( C_{i} = \{ (x_{i} ,\delta )\} \), \( \delta \) = TEMPERATURE/very_high, so that the timed observation (TEMPERATURE/very_high, \( t_{k} \)) is an occurrence of \( C_{i} \). When it is known that the time function \( x_{i} (t) \) represents the evolution of a temperature, it can be inferred that (1) \( x_{i} \) denotes a temperature variable, (2) the observation class \( C_{i} \) can then be written as {(very_high)}, denoting that the temperature is very high, and (3) the timed observation (TEMPERATURE/very_high, \( t_{k} \)) is an occurrence of this class, which means “at time \( t_{k} \), the temperature is very high”.
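Under the same illustrative assumptions as the previous sketch, the notions of timed observation and observation class translate into simple data structures, as the following sketch shows; the type names are ours.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class ObservationClass:
    """C_x = {(x, delta)}: links a variable name x to a constant delta."""
    pairs: FrozenSet[Tuple[str, str]]

    def has_occurrence(self, variable: str, obs: Tuple[str, float]) -> bool:
        """A timed observation (delta, t_k) on `variable` is an occurrence
        of this class if (variable, delta) belongs to the class."""
        delta, _t_k = obs
        return (variable, delta) in self.pairs

# The singleton class C_i = {(x_i, TEMPERATURE/very_high)} of the example
c_i = ObservationClass(frozenset({("x_i", "TEMPERATURE/very_high")}))
print(c_i.has_occurrence("x_i", ("TEMPERATURE/very_high", 132225.0)))  # True
```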

For the sake of generality, it is important to note that a predicate \( \theta (x_{\theta } ,\delta_{\theta } ,t_{\theta } ) \) is satisfied when the corresponding time function \( x_{i} (t) \) matches a behavioural model [41]. Such a model can be as simple as the switching of an interrupter, or it may require complex techniques, such as signal processing techniques for artificial vision.

The TOM4D KE methodology and the TOM4L KDD process are based on these notions of timed observation and observation class, as the following sections describe.

5 TOM4D KE Methodology

TOM4D is a modelling approach for dynamic systems focused on timed observations. Its objective is to produce suitable models for dynamic process diagnosis from timed observations and experts’ a priori knowledge. This methodology therefore combines the modelling of the experts’ cognitive process, using CommonKADS [14, 24], with a multi-modelling approach for dynamic systems [40, 42]. In addition, TOM4D is a primarily syntax-driven approach [5–7] which resorts to CommonKADS, Formal Logic and the Tetrahedron of States (ToS) [40] as interpretation frameworks and paradigms in order to introduce semantic content into the modelling process in a gradual and controlled way.

5.1 Multi-Modelling

In this methodology, a system is represented by means of four models: the three models described in the conceptual multi-modelling framework introduced in [43] and a complementary model called the Perception Model [6].

The models of the multi-modelling framework are the Structural Model (SM), the Behavioural Model (BM) and the Functional Model (FM), which describe different types of knowledge. The SM contains knowledge about the system components and their structural organization, that is to say, the relations between them. The BM specifies knowledge about the phenomena which act inside the system to transform an input flow into an output flow. Such transformations are measured through the evolution of the values of a set of variables. These changes in the values thus define the possible sequences of observation classes that can occur and, therefore, the states discernible between them. Finally, the FM describes knowledge about the relations among the values that the variables can assume.

For its part, the Perception Model (PM) contains knowledge about the following elements and aspects of the process: the variables and their thresholds, the operating goals, and the normal and abnormal operating modes.

The relations between the first three models are determined by the notion of variable, as Fig. 6.6 illustrates. A variable used in a function of the Functional Model is associated with a component of the Structural Model, and a discrete event of the Behavioural Model is the assignment of a value to the variable. Moreover, any specification in these models must be consistent with the one made in the Perception Model.

Fig. 6.6 Relations between TOM4D models

5.2 Interpretation Frameworks

CommonKADS [14, 24] is a methodology which offers a structured approach to the development of KBSs by proposing three groups of models: the first group concerns the organizational context and environment, the second the conceptual description of the knowledge applied in a task, and the last the technical aspects of the software artefact.

In particular, our approach utilizes the CommonKADS Knowledge Model, which belongs to the second group. This model describes the types and structures of the knowledge required to accomplish a particular task and thus acts as a tool that helps to clarify the structure of a knowledge-intensive information-processing task. The model is developed, in a way that is understandable by humans, as part of the analysis process; it therefore does not contain any implementation-specific terms. It is thus an important vehicle for communication with experts and users about the problem-solving aspects. Consequently, TOM4D uses this model as a means of interpreting and structuring the available knowledge.

Formal Logic is also used by the proposed methodology as a resource which provides reasoning mechanisms and the possibility of utilizing Reiter’s Theory of Diagnosis [44]. In turn, in order to give a physical interpretation to the variables, the Tetrahedron of States (ToS) [40, 45, 46] can be incorporated into the analysis process. The ToS is a framework that describes a set of generalized equations (Fig. 6.7) which are common to a wide variety of physical domains (electromagnetism, fluid dynamics, thermodynamics, etc.). It maps physical variables of a specific domain onto four classes of generalized variables (effort, flow, impulse and displacement) and identifies the set of relationships among them. For example, in the electric domain (Electric ToS), current is mapped to generalized flow, electric charge to generalized displacement, voltage to generalized effort and magnetic flux to generalized impulse; thus, the relations among the electric domain variables can be established according to the ToS. Our modelling approach thus resorts to Formal Logic and the ToS as paradigms for the interpretation and analysis of knowledge.
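As an illustration, the electric-domain mapping just described can be written as a small lookup table. The sketch is ours and is not part of the TOM4D tooling.

```python
from enum import Enum

class Generalized(Enum):
    EFFORT = "effort"
    FLOW = "flow"
    IMPULSE = "impulse"
    DISPLACEMENT = "displacement"

# Electric ToS: physical variable -> generalized variable, per the text above
ELECTRIC_TOS = {
    "current": Generalized.FLOW,
    "electric_charge": Generalized.DISPLACEMENT,
    "voltage": Generalized.EFFORT,
    "magnetic_flux": Generalized.IMPULSE,
}

print(ELECTRIC_TOS["voltage"])  # Generalized.EFFORT
```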

Fig. 6.7 Tetrahedron of states (ToS) (based on [40, p. 1728])

5.3 TOM4D Modelling Process

The modelling approach of this methodology is based on three principles [7]. The first is that each symbol of an entity used in one of the three models introduced in Sect. 6.5.1 (the structural, functional and behavioural models) denotes a concept that is defined at the domain knowledge level of a CommonKADS model [14]. This means that the introduction of a symbol that is not associated with an element of the domain knowledge model is prohibited. The second principle is that a variable is always associated with a component or an aggregate of components defined in the structural model. The third principle is that a transition between two states is conditioned by the assignment of a new value to a variable. The notion of variable, as mentioned in Sect. 6.5.1, thus constitutes the common point of the three models.

The modelling process aims to produce a generic model of a system from available knowledge and data, where the three fundamental modelling phases are knowledge interpretation, process definition and generic modelling. Figure 6.8 illustrates the structure of logical dependencies that describes the TOM4D reasoning process for obtaining a model of an observed system. The control flow of the modelling process is therefore not part of this structure. The illustrated process, introduced below, gives a general guide for understanding the principal objectives of this approach. Naturally, the modelling is generally cyclical, and each stage may require returning to previous phases in order to revise the expert’s knowledge, results, ideas, modelling decisions, etc.

Fig. 6.8 General structure of the TOM4D modelling process

1. Knowledge Interpretation

The objective of this phase is to define a scenario model. In general terms, a scenario \( \Upomega \) of a system is a set of observations, or measures over time, on the variables of the system, where these measures describe a certain evolution of the process that drives the system dynamics. Definition 6.1 in Sect. 6.4 introduces the meaning of scenario along with other concepts such as timed observation and observation class. In short, a scenario is a set of sequences of timed observations partially describing the behaviour of a process.

The construction of a scenario model \( M(\Upomega ) = < SM(\Upomega ),FM(\Upomega ),BM(\Upomega ) > \) consists of the definition of a structural model \( SM(\Upomega ) \), a functional model \( FM(\Upomega ) \) and a behavioural model \( BM(\Upomega ) \) of \( \Upomega \).

For the purpose of defining a model \( M(\Upomega ) \), a CommonKADS template is utilized to interpret and organize the available knowledge. This knowledge is provided by a scenario \( \Upomega \) and a knowledge source, where the latter can be an expert, a set of documents, etc. The outcome of this phase is thus an organized description of the knowledge and available information.

2. Process Definition

The process definition step aims to define the process \( X(t) \) that governs a system; that is, the boundary of the process, its operating goals and its normal and abnormal operating modes. In this phase, the available knowledge, the scenario model \( M(\Upomega ) \) and the concepts of Formal Logic or the Tetrahedron of States (ToS) can be used to achieve this objective. As described in Sect. 6.5.2, the last two are interpretation frameworks which, along with CommonKADS, allow semantic content to be introduced in a controlled way, providing contexts for the logical and physical interpretation of variables. The result of this phase must then be a perception model of the process, that is, \( PM(X(t)) \).

3. Generic Modelling

This stage aims to define a generic model of a process \( X(t) \). This model consists of the perception model defined in the previous steps and of structural, functional and behavioural models associated with the process \( X(t) \); that is, \( M(X(t)) = < PM(X(t)),SM(X(t)),FM(X(t)),BM(X(t)) > \). The objective is then to define a model that is no longer relative to a particular scenario \( \Upomega \), but to a type of process. This model should be more general and more abstract than the scenario model and thus more useful for diagnosis. This stage can be accomplished using the available knowledge, the Perception Model and analyses based on Formal Logic and the ToS.

The results of applying TOM4D to a didactic example are presented later in order to show how the TOM4D model thus built can be related to a TOM4L model automatically obtained from data.

6 TOM4L KDD Process

TOM4L [12], based on the Theory of Timed Observations [1], is a probabilistic and temporal approach to discovering temporal relations, for description, diagnosis and prediction, from initial timed data \( \Upomega \) registered in a database (i.e. a set of timed observation sequences). The aim is to discover n-ary temporal relations which are representative of the process behaviour which gave rise to \( \Upomega \).

In particular, the TOM4L approach is implemented by the ElpLab Java software, so that the n-ary temporal relations can be discovered automatically.

6.1 Temporal Relations

As described in Sect. 6.4, sequences of timed observations \( (\delta ,t_{k} ) \in \Updelta \times \Upgamma \) recorded by a program observing a process make it possible to establish a set of discrete variables \( x \in X \) and, consequently, a set \( C \) of corresponding observation classes \( C_{i} \in C \). For example, if \( C_{1a} = \{ (x_{i} ,\delta_{a} )\} \) is defined as an observation class associated with \( x_{i} \), then a timed observation \( (\delta_{a} ,t_{k} ) \) is an occurrence at time \( t_{k} \) of the class \( C_{1a} \). In order to specify that an observation is of a certain class, the symbol ‘::’ is used; e.g., \( (\delta_{a} ,t_{k} )::C_{1a} \).

TOM4L aims to discover temporal characteristics present in the data that describe the evolution of a process; therefore, detailed descriptions of the variables and of the particular values that these variables can assume are not necessary in this context. In particular, we shall refer to timed observations and observation classes. We recall that the timed observation \( (\delta_{a} ,t_{k} ) \) can be rewritten as \( o(t_{k} ) \) (Definition 6.1); thus, we refer to this observation as \( o(t_{k} ) \) and specify its class with the symbol ‘::’, as in \( o(t_{k} )::C_{1a} \).

A temporal relation between two observation classes describes a temporal constraint between observations of the involved classes. By considering \( I = \{ [\tau^{ - } ,\tau^{ + } ]\;\;|\;\;[\tau^{ - } ,\tau^{ + } ] \subset \Re \} \) a set of time intervals and \( C \) a set of observation classes, a temporal relation between two observation classes is a pair \( (q,\bar{i}) \) where \( q \in C \times C \) and \( \bar{i} \in I \). Thus, a temporal relation \( (q,\bar{i}) = ((C_{i} ,C_{j} ),\;\;[\tau^{ - } ,\tau^{ + } ]) \) specifies a temporal constraint between timed observations of the observation classes \( C_{i} ,C_{j} \in C \). Figure 6.9 illustrates this relation according to the ElpLab representation.

Fig. 6.9 Binary temporal relation \( ((C_{i} ,C_{j} ),\;\;[\tau^{ - } ,\tau^{ + } ]) \) between two observation classes

In particular, two observations verify the aforesaid relation if the elapsed time between an occurrence of \( C_{i} \) and an occurrence of \( C_{j} \) is greater than or equal to \( \tau^{ - } \) and less than or equal to \( \tau^{ + } \). That is to say, two observations \( o(t_{k} ),o(t_{r} ) \in \Updelta \times \Upgamma \) verify the relation \( ((C_{i} ,C_{j} ),\;\;[\tau^{ - } ,\tau^{ + } ]) \) if \( o(t_{k} )::C_{i} \wedge o(t_{r} )::C_{j} \wedge (t_{r} - t_{k} ) \in [\tau^{ - } ,\tau^{ + } ] \).

For its part, an n-ary temporal relation is a sequence \( m \) of temporal relations. Thus, a sequence of timed observations verifies an n-ary temporal relation \( m \) if the sequence verifies each temporal relation in \( m \), even if occurrences of classes that are not present in \( m \) appear in the middle of the observation sequence.

As an example, consider the observation classes \( C_{1a} \), \( C_{2b} \), \( C_{3c} \) and the n-ary temporal relation \( m = (((C_{1a} ,C_{2b} ),[2,5]),\;((C_{2b} ,C_{3c} ),[0,4]))\; \), as illustrated in Fig. 6.10. Suppose in addition the sequence of timed observations \( w = (o(19),o(20),o(22),o(24)) \) such that \( o(19)::C_{1a} \), \( o(20)::C_{3c} \), \( o(22)::C_{2b} \) and \( o(24)::C_{3c} \), also illustrated in the figure. In this case, \( w \) verifies \( m \) for the following reasons. Firstly, the class of the first observation coincides with the first class in the n-ary relation (i.e., \( o(19)::C_{1a} \)) and the class of the last observation in \( w \) coincides with the last class in \( m \) (i.e., \( o(24)::C_{3c} \)). In addition, the sequence of relations \( m = (((C_{1a} ,C_{2b} ),[2,5]),\;((C_{2b} ,C_{3c} ),[0,4]))\; \) is verified in \( w \). That is to say, \( ((C_{1a} ,C_{2b} ),[2,5]) \) specifies that the elapsed time between an occurrence of the observation class \( C_{1a} \) and an observation of the class \( C_{2b} \) is greater than or equal to 2 and less than or equal to 5. Thus, in \( w \), \( o(19) \) and \( o(22) \) verify this temporal constraint, since \( o(19)::C_{1a} \), \( o(22)::C_{2b} \), 22 − 19 = 3 and \( 2 \le 3 \le 5 \). In a similar way, \( o(22) \) and \( o(24) \) verify \( \;((C_{2b} ,C_{3c} ),[0,4])\; \). It is noteworthy that the observation \( o(20) \) takes place between \( o(19) \) and \( o(22) \). However, this does not invalidate the relation \( ((C_{1a} ,C_{2b} ),[2,5])\; \), nor the complete n-ary relation, in the sequence of observations \( w \).
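A minimal sketch of this verification follows, assuming observations are represented as (class, time) pairs. The greedy matching below (taking, for each binary relation, the first later occurrence of the target class that satisfies the constraint) reproduces the worked example but is our simplification, not the ElpLab algorithm.

```python
from typing import List, Tuple

Obs = Tuple[str, int]                  # (class name, time)
Relation = Tuple[str, str, int, int]   # (C_from, C_to, tau_minus, tau_plus)

def verifies(w: List[Obs], m: List[Relation]) -> bool:
    """Check whether the observation sequence w verifies the n-ary temporal
    relation m, ignoring interleaved occurrences of other classes."""
    pos = 0
    # Anchor on the first occurrence of the first class in m
    while pos < len(w) and w[pos][0] != m[0][0]:
        pos += 1
    if pos == len(w):
        return False
    for _c_from, c_to, tau_minus, tau_plus in m:
        t_from = w[pos][1]
        # First later occurrence of c_to within the temporal constraint
        nxt = next((i for i in range(pos + 1, len(w))
                    if w[i][0] == c_to
                    and tau_minus <= w[i][1] - t_from <= tau_plus), None)
        if nxt is None:
            return False
        pos = nxt
    return True

# Worked example from the text: w verifies m
w = [("C1a", 19), ("C3c", 20), ("C2b", 22), ("C3c", 24)]
m = [("C1a", "C2b", 2, 5), ("C2b", "C3c", 0, 4)]
print(verifies(w, m))  # True
```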

Fig. 6.10 Sequence \( w = (o(19),o(20),o(22),o(24)) \) of timed observations that satisfies the n-ary temporal relation \( m = (((C_{1a} ,C_{2b} ),[2,5]),\;\;((C_{2b} ,C_{3c} ),[0,4])) \)

In this way, given a set of data describing the behaviour of a process, discovering the n-ary temporal relations that are representative of these data is the central focus of the TOM4L KDD process.

6.2 Stochastic Approach

In TOM4L, the analysis of a sequence \( w \) of timed observations consists of finding the most representative sequential relations between observation classes and establishing the temporal constraints of each relation. The study of the mentioned relations is addressed by resorting to the Markov chain theory, and the estimation of the temporal constraints is dealt with using the Poisson process theory. Consequently, in this framework, a sequence \( w \) of timed observations has a stochastic representation that consists of associating with \( w \) a superposition of a Poisson process and a Markov chain.

Given a finite set \( O \subset \Updelta \times \Upgamma \) of timed observations, \( w \) is the sequence of all observations in \( O \) (i.e., the image of \( w \) is equal to \( O \), or \( w:\aleph \to O \) and \( \Im (w) = O \)) and \( C \) is the set of the \( n \) classes of the observations in \( w \). A stochastic representation of \( w \) then consists of a set of matrices reflecting different properties, where the rows and columns refer to the observation classes in \( C \); that is to say, \( n \times n \) matrices where the element of row \( i \), column \( j \) refers to the sequential relation between the class \( C_{i} \) and the class \( C_{j} \). We denote by \( P(C_{j} \;|\;C_{i} ) \) the conditional probability \( P(w(k)::C_{j} \;|\;w(k - 1)::C_{i} ) \) of observing an occurrence of \( C_{j} \) having immediately before observed an occurrence of \( C_{i} \), and we denote by \( P((C_{i} \;,\;C_{j} )) \) the probability \( P((w(k - 1)::C_{i} \;,\;w(k)::C_{j} )) \) of observing an occurrence of \( C_{i} \) followed immediately by an occurrence of \( C_{j} \). Thus, the stochastic representation of \( w \) is given by the following matrices. \( {\text{N}} = (N_{ij} )_{n \times n} \) is a matrix where each \( N_{ij} \) is the number of observations of \( C_{i} \) followed immediately by an observation of \( C_{j} \) in \( w \). The matrix \( {\text{P}} = (p_{ij} )_{n \times n} \) establishes the transition probabilities between two observation classes, where the value \( p_{ij} \) corresponds to \( P(C_{j} \;|\;C_{i} ) \) and is calculated, based on \( {\text{N}} \), as the ratio between the number of occurrences of \( C_{i} \) followed immediately by an occurrence of \( C_{j} \) and the number of occurrences of \( C_{i} \) followed immediately by an occurrence of any class.
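A minimal sketch of these two matrices follows, computed with plain dictionaries from the sequence of class labels of \( w \); it follows the definitions above rather than the BJT algorithm itself.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def stochastic_representation(classes: List[str]):
    """Compute N (successor counts) and P (transition probabilities)
    from the sequence of observation classes of w."""
    n: Dict[Tuple[str, str], int] = defaultdict(int)
    for c_i, c_j in zip(classes, classes[1:]):
        n[(c_i, c_j)] += 1                      # N_ij
    row_totals: Dict[str, int] = defaultdict(int)
    for (c_i, _), count in n.items():
        row_totals[c_i] += count
    p = {(c_i, c_j): count / row_totals[c_i]    # p_ij = P(C_j | C_i)
         for (c_i, c_j), count in n.items()}
    return dict(n), p

# Classes of the example sequence w = (o(19), o(20), o(22), o(24))
n, p = stochastic_representation(["C1a", "C3c", "C2b", "C3c"])
print(n)  # {('C1a', 'C3c'): 1, ('C3c', 'C2b'): 1, ('C2b', 'C3c'): 1}
print(p)  # each observed transition has probability 1.0 here
```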

The temporal constraints between two observation classes are calculated by analysing only the two subsequences of \( w \) whose observations are of the classes in question. In other words, \( w \) is partitioned into a set \( \Upomega \) of sequences \( w_{r} \), where the observations in each \( w_{r} \) are of the same class \( C_{r} \). Considering \( w_{i} ,w_{j} \in \Upomega \) to be the subsequences of \( w \) whose observations are of the classes \( C_{i} \) and \( C_{j} \) respectively, the temporal constraint \( [\tau^{ - } ,\tau^{ + } ] \) of a relation \( ((C_{i} ,C_{j} ),\;\;[\tau^{ - } ,\tau^{ + } ]) \) is computed from the average of the elapsed times between an observation of class \( C_{i} \) and the first following observation of class \( C_{j} \), when superimposing \( w_{i} \) and \( w_{j} \).
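The chapter does not reproduce the closed form by which \( [\tau^{ - } ,\tau^{ + } ] \) is derived from this average (the Poisson machinery is developed in [1]); the sketch below only computes the elapsed times and their mean, i.e. the raw material of that estimation.

```python
from typing import List, Optional

def elapsed_times(w_i: List[float], w_j: List[float]) -> List[Optional[float]]:
    """For each time in w_i (occurrences of C_i), the delay until the first
    following occurrence of C_j in w_j (None if there is no later one)."""
    return [min((t_j - t_i for t_j in w_j if t_j > t_i), default=None)
            for t_i in w_i]

# Hypothetical occurrence times of C_i and C_j extracted from w
delays = [d for d in elapsed_times([19.0, 30.0], [22.0, 33.0]) if d is not None]
mean_delay = sum(delays) / len(delays)
print(delays, mean_delay)  # [3.0, 3.0] 3.0
```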

Based on these calculations, an algorithm called BJT computes the stochastic representation of the sequence \( w \) under study, and an algorithm called BJT4T, based on the mentioned representation and on abductive reasoning, builds a tree of n-ary temporal relations associated with a given observation class \( C_{i} \), i.e., paths ending in \( C_{i} \) that are representative of \( w \). Both algorithms belong to the TOM4L framework and are implemented in ElpLab.

6.3 BJ-Measure and the Bayesian Networks Building

The BJ-measure [12] is a measure based on information entropy. Considering a superposition of occurrences of two timed observation classes, this measure evaluates the strength of intertwining of that superposition; that is to say, the strength of an oriented binary relation between two observation classes taken from an arbitrarily built set.

Given an ordered binary relation \( (C_{i},\;C_{j} ) \) between two observation classes, if these classes are independent, the probability of observing an occurrence of \( C_{j} \) at a time \( t_{k} \) is equal to the probability of observing an occurrence of \( C_{j} \) at that time having observed an occurrence of class \( C_{i} \) at the previous time \( t_{k - 1} \); that is, \( P(C_{j} \;|\;C_{i} ) = P(C_{j} ) \). However, according to [12], if the classes are not independent, an occurrence of \( C_{i} \) at a time \( t_{k} \) provides information about the occurrence (or not) of \( C_{j} \) at the subsequent time \( t_{k + 1} \). In particular, the interest is in a measure indicating that an occurrence of class \( C_{i} \) at time \( t_{k} \) increases the probability of observing an occurrence of class \( C_{j} \) at time \( t_{k + 1} \); that is, \( P(C_{j} \;|\;C_{i} ) \ge P(C_{j} ) \). Thus, the BJ-measure is based on the Kullback–Leibler distance [47] between two probability distributions, which can be interpreted as the amount of information lost when one probability distribution is approximated by another. The general idea is then to analyse, on the basis of this distance measure, the distance between \( P(C_{j} \;|\;C_{i} ) \) and \( P(C_{j} ) \) in order to establish whether the relation \( (C_{i},\;C_{j} ) \) is strong or weak.

Consequently, the BJ-measure \( BJM(C_{i},\;C_{j} ) \) is defined by associating a sequential relation \( (C_{i},\;C_{j} ) \) with a discrete memoryless communication channel [2] and by using the Kullback–Leibler distance. In particular, the BJ-measure verifies the properties of monotony, dissymmetry, positivity, independence and triangular inequality. Thus, if \( BJM(C_{i} ,\;C_{j} ) \) is negative, the relation \( (C_{i},\;C_{j} ) \) is weak; otherwise it is considered a possibly strong relation, or a relation of interest.

The maximum and minimum values of the BJ-measure depend on the ratio \( \tilde{\theta }_{ij} \) between the number of observations of class \( C_{i} \) and the number of observations of class \( C_{j} \) in the studied sequence \( w \). In particular, [12] shows that a sequential relation \( (C_{i} ,\;C_{j} ) \) is credible in the sense of the BJ-measure if and only if \( \frac{1}{4} < \tilde{\theta }_{ij} < 4 \). This condition then allows relations of interest to be selected, providing a representative model of the sequence \( w \).
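The exact closed form of the BJ-measure is given in [12] and is not reproduced here. As a loudly labelled stand-in, the sketch below scores a relation by the Kullback–Leibler divergence between the Bernoulli distributions defined by \( P(C_{j} \;|\;C_{i} ) \) and \( P(C_{j} ) \), with a negative sign when \( P(C_{j} \;|\;C_{i} ) < P(C_{j} ) \); this captures the strong/weak reading described above but is not the actual BJ-measure.

```python
import math

def kl_bernoulli(p: float, q: float) -> float:
    """Kullback-Leibler divergence D((p, 1-p) || (q, 1-q)), in bits."""
    def term(a: float, b: float) -> float:
        return 0.0 if a == 0.0 else a * math.log2(a / b)
    return term(p, q) + term(1 - p, 1 - q)

def relation_strength(p_j_given_i: float, p_j: float) -> float:
    """Signed KL-based score: positive when C_i raises the probability of C_j
    (candidate strong relation), negative when it lowers it (weak relation).
    NOTE: a stand-in for the BJ-measure of [12], not its actual definition."""
    sign = 1.0 if p_j_given_i >= p_j else -1.0
    return sign * kl_bernoulli(p_j_given_i, p_j)

print(relation_strength(0.8, 0.3) > 0)  # True: relation of interest
print(relation_strength(0.1, 0.3) < 0)  # True: weak relation
```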

TOM4L proposes an algorithm called Tom4BN [10, 11] to build Naive Bayesian Networks from timed data. Inspired by Cheng et al.’s algorithm [48], Tom4BN uses the properties of monotony, dissymmetry, positivity, independence and triangular inequality of the BJ-Measure to build a Naive Bayesian Network.

The general idea of the Tom4BN algorithm is to remove the sequential relations \( (C_{i} ,\;C_{j} ) \) that are not of interest when building, from a set of timed data, the structure of a Naive Bayesian Network associated with a given observation class. For example, if \( R_{BN} \subseteq C\; \times \;C \) is the set of all binary sequential relations \( (C_{i},\;C_{j} ) \) with which paths in a Bayesian Network can be built, initially \( R_{BN} = C\; \times \;C \); then, relations \( (C_{i} ,\;C_{j} ) \) where \( BJM(C_{i} ,\;C_{j} ) \le 0 \) or \( BJM(C_{i} ,\;C_{j} ) < BJM(C_{j} ,\;C_{i} ) \) are removed from \( R_{BN} \). These and other criteria based on the aforesaid properties allow the binary sequential relations suitable for building a Bayesian Network from a data set to be selected. Consequently, given a goal class, the structure of the Bayesian Network associated with it is constructed by the algorithm from the mentioned criteria.
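The two pruning criteria quoted above translate into a simple filter. In this sketch the BJ-measure values are supplied as a precomputed table, since the measure itself is defined in [12].

```python
from typing import Dict, Set, Tuple

Rel = Tuple[str, str]

def prune(r_bn: Set[Rel], bjm: Dict[Rel, float]) -> Set[Rel]:
    """Keep (C_i, C_j) only if BJM(C_i, C_j) > 0 and
    BJM(C_i, C_j) >= BJM(C_j, C_i), per the two criteria quoted above."""
    return {(ci, cj) for (ci, cj) in r_bn
            if bjm.get((ci, cj), 0.0) > 0.0
            and bjm.get((ci, cj), 0.0) >= bjm.get((cj, ci), 0.0)}

# Hypothetical BJM table over two classes
bjm = {("C1", "C2"): 0.42, ("C2", "C1"): 0.10}
print(prune({("C1", "C2"), ("C2", "C1")}, bjm))  # {('C1', 'C2')}
```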

Based on properties which follow from the discrete memoryless communication channel, the conditional probability tables of the Bayesian Network are defined from the matrix \( {\text{N}} = (N_{ij} )_{n \times n} \) established by the stochastic representation. That is to say, the probability \( P(C_{i} ) \) of a root node, the probability \( P(C_{i},\;C_{j} ) \) of a simple sequential relation and the probabilities for two sequential relations \( (C_{i},\;C_{j} ) \) and \( (C_{z} ,\;C_{j} ) \) associated with the same class \( C_{j} \) (i.e., \( P(C_{j} \;|\;C_{i} ,\;C_{z} ) \), \( P(C_{j} \;|\;\neg C_{i} ,\;C_{z} ) \), \( P(C_{j} \;|\;C_{i} ,\;\neg C_{z} ) \), \( P(C_{j} \;|\;\neg C_{i} ,\;\neg C_{z} ) \)) are defined. From these definitions, probabilities such as \( P(\neg C_{j} \;|\;C_{i} ) \) can be calculated as \( P(\neg C_{j} \;|\;C_{i} ) = 1 - P(C_{j} \;|\;C_{i} ) \), and \( P(\neg C_{j} \;|\;C_{i} ,\;\neg C_{z} ) \) as \( P(\neg C_{j} \;|\;C_{i} ,\;\neg C_{z} ) = 1 - P(C_{j} \;|\;C_{i} ,\;\neg C_{z} ) \).

Thereby, given a goal class, the Bayesian Network associated with it can be automatically built through the Tom4BN algorithm from a data set.

6.4 Signatures

An n-ary temporal relation \( m \) is considered representative of a sequence \( w \) of timed observations on the basis of two rates, the anticipation rate and the coverage rate [13].

Considering a sequence \( w \) of observations, let \( m \) be a sequence of temporal relations and let \( m_{s} \) be the sequence resulting from eliminating the last binary relation from \( m \). The anticipation rate \( T_{A} \) of \( m \) in \( w \) is the ratio between the number of subsequences \( w_{j} \) of \( w \) that satisfy \( m \) (i.e., \( w_{j} \sqsubseteq w \wedge satisfies(w_{j} ,m) \)) and the number of subsequences of \( w \) that satisfy \( m_{s} \), as illustrated in Fig. 6.11; that is to say, the percentage of cases in which, after observing an instance of \( m_{s} \), an occurrence of the last relation in \( m \) takes place. Clearly, \( T_{A} \) is of great interest in the diagnosis task, since it allows the occurrence of an observation class to be anticipated, in particular the last class of the model \( m \), i.e., \( C_{i} \) in Fig. 6.11.

Fig. 6.11 Anticipation rate of \( m \) with regard to \( w \) (based on [13, p. 47])

In the TOM4L framework, a signature [13] is a model \( m \) that has a certain representativeness in the data, that is, in the sequence \( w \) under study. In particular, this representativeness is given when the anticipation rate \( T_{A} \) is above a certain value \( T_{Amin} \) (typically, 50 %). In other words, a sequence of temporal relations \( m \) is a signature in the sequence \( w \) of timed observations if and only if the anticipation rate \( T_{A} \) of \( m \) in \( w \) is greater than or equal to \( T_{Amin} \). Thus, in order to anticipate the occurrence of an observation class \( C_{i} \), a signature ending in \( C_{i} \) can be used as a predictive model.

In some cases, although the anticipation rate \( T_{A} \) of a model \( m \) is above the established value \( T_{Amin} \) (i.e. \( m \) is a signature), the number of occurrences of \( m \) in \( w \) is low; or, put another way, the number of occurrences of the class \( C_{i} \) to be predicted is low. Therefore, in order to discard signatures whose last class does not occur significantly in \( w \), the coverage rate, illustrated in Fig. 6.12, is introduced. The coverage rate \( T_{C} \) of \( m \) in \( w \) is the ratio between the number of subsequences of \( w \) that satisfy \( m \) and the number of occurrences of the last observation class in \( m \).
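Both rates reduce to simple count ratios. The sketch below computes them from precomputed counts; the counting of subsequences satisfying \( m \) (see the verification sketch in Sect. 6.6.1) is left abstract.

```python
def anticipation_rate(n_m: int, n_ms: int) -> float:
    """T_A: instances of m over instances of m_s (m without its last relation)."""
    return n_m / n_ms if n_ms else 0.0

def coverage_rate(n_m: int, n_last_class: int) -> float:
    """T_C: instances of m over occurrences of the last class of m in w."""
    return n_m / n_last_class if n_last_class else 0.0

# Hypothetical counts: m seen 6 times, m_s 10 times, last class occurs 8 times
t_a = anticipation_rate(6, 10)   # 0.6  -> signature if T_Amin = 0.5
t_c = coverage_rate(6, 8)        # 0.75 -> last class well covered
print(t_a, t_c)
```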

Fig. 6.12 Coverage rate of \( m \) with regard to \( w \) (based on [13, p. 47])

TOM4L aims, among other things, to discover from a given sequence \( w \) a minimal set of signatures able to predict the maximal number of the defined observation classes; that is, to discover a minimal set of temporal relations \( m \) whose anticipation rate \( T_{A} \) and coverage rate \( T_{C} \) in \( w \) are above the established thresholds.

6.5 TOM4L Process

The general structure of the TOM4L KDD process is illustrated in Fig. 6.13. It is implemented by the ElpLab Java software, which applies this data mining approach automatically.

Fig. 6.13
figure 13

TOM4L KDD process [12, p. 40]

As depicted in the figure, stochastic and temporal properties of binary relations are obtained from a stochastic representation which associates a superposition of a Poisson process and a Markov chain with a set \( \Upomega \) of timed observations. A minimal set \( R = \{ r_{j} \}_{j = 1, \ldots ,r} \) of binary temporal relations satisfying a criterion of interest is then induced from this stochastic representation, where the criterion of interest is based on the BJ-measure described in Sect. 6.6.3.

From this minimal set, the TOM4L KDD process computes a Naive Bayesian Network by means of the Tom4BN algorithm, together with a set of representative n-ary temporal relations. The latter is built through an abductive reasoning carried out on \( R \) in order to obtain a minimal set \( M = \{ m_{i} \}_{i = 1, \ldots ,s} \) of n-ary temporal relations \( m_{i} \) which would represent properties of the process. In particular, the depth of the abduction is controlled by heuristics based on the BJ-measure.

In the next stage, an exhaustive search extracts from \( M \) the minimal set \( S \subseteq M \) of n-ary temporal relations which satisfy a criterion of representativeness adapted to temporal relations. These n-ary relations are called signatures, and their predictive ability allows the anticipation of observable events. Searching for signatures consists in identifying all n-ary temporal relations \( m_{i} \) which finish in a particular observation class \( C_{j} \) (all paths and sub-paths to \( C_{j} \)) and whose representativeness in \( \Upomega \) is sufficient. This representativeness is calculated from the coverage and anticipation rates: the coverage rate of an n-ary relation \( m_{i} \) is the ratio between the number of instances of \( m_{i} \) and the number of occurrences of observations of class \( C_{j} \), and the anticipation rate of \( m_{i} \) is the ratio between the number of instances of \( m_{i} \) and the number of instances of the relation \( m^{\prime}_{i} \), where \( m^{\prime}_{i} \) is the result of removing the last observation class \( C_{j} \) from the path \( m_{i} \).

Models obtained through the TOM4L process, both signatures and Bayesian Networks, can be related to TOM4D models as described in the next section.

7 Application to a Didactic Example

In this section, the proposed modelling approach combining TOM4D with TOM4L is described by means of a case study about the diagnosis of problems with a car. This case study has been taken from the book by Schreiber et al. [14], where the authors present it in order to describe the concepts and components of a CommonKADS Knowledge Model. Figure 6.14 depicts the domain knowledge of this case study, where nine rules constitute the knowledge provided by an expert. These rules can be read as: \( (R_{1} ) \) if the fuse is blown then the result of the fuse inspection is broken; \( (R_{2} ) \) if the fuse is blown then the power is off; \( (R_{7} ) \) if the power is off then the engine does not start; and so on.

Fig. 6.14
figure 14

Classification and organization of knowledge pieces

Starting from this didactic problem, a summary of the application of TOM4D and TOM4L to the example, along with the relation between the obtained models, is presented below. For the interested reader, the complete description of the TOM4D modelling process applied to the example can be found in [4], and the detailed application of the TOM4L algorithms to the same example in [8].

7.1 TOM4D Models

Interpreting the available knowledge first requires organizing it. Thus, organizing and structuring knowledge is the first step of the modelling activity in the TOM4D KE methodology.

7.1.1 Organizing Available Knowledge

CommonKADS is an important methodology for modelling experts’ knowledge and is therefore used by TOM4D as a framework for interpreting and organizing knowledge. CommonKADS provides a collection of predefined sets of model elements, such as task templates and inference catalogues, which detail the tasks and inferences typical of a particular type of problem. These templates also propose a characteristic structure for specifying the domain knowledge from the point of view of the selected task type. In this case, we shall consider the diagnosis template.

The diagnosis template presents a typical domain schema in which each system being diagnosed can be characterized in terms of two types of features: those that can be observed and those that represent an internal state of the system. Consequently, as Fig. 6.14 illustrates, the concepts fuse inspection, battery dial and gas dial are considered observable features, while fuse, battery, fuel tank, power, gas in engine and engine behaviour are considered concepts that represent the states of the car. In particular, engine behaviour refers to a state which can be perceived in some way; therefore, the concepts associated with car states can in turn be classified as visible or invisible.

Considering the previous classification, the arrows in Fig. 6.14 show dependences between the knowledge pieces. These dependences are rules which indicate relations between domain concepts. For example, “if there is no gas in the engine, the engine stops” establishes a causal relation between the concepts “gas in engine” and “engine behaviour”: gas-in-engine.status=false ⇒ engine-behaviour.status=stops. In this case study, two types of dependences can be observed: rules indicating that a value assumed by an entity causes a certain value in another entity, and rules establishing that a value assumed by an entity has a particular manifestation in another entity.

Thus, the previous reasoning about dependence types and concept types in the specific domain, illustrated in Fig. 6.14, determines the domain rules specified in (6.3) in the language CML (Conceptual Modelling Language, [14]).

(6.3)

Considering the aforementioned analysis and the three TOM4D principles introduced in Sect. 6.5.3, the next objective is to define a scenario model \( M(\Upomega ) = < SM(\Upomega ),FM(\Upomega ),BM(\Upomega ) > \) from the given knowledge and a set \( \Upomega \) of sequences of timed measures, or observations, which describe certain modes of functioning of the car. In a real case, it would be desirable to have a set of timed observations describing the evolution over time of the process under study. Here, \( \Upomega \) has not been provided; nevertheless, a scenario \( \Upomega \) to be assumed can be deduced from the existing domain knowledge.

7.1.2 Knowledge Interpretation

The rules in (6.3) represent causal relations which implicitly define the notion of a timed sequence of events; thus, from these rules, a set of sequences of timed observations, that is, a scenario \( \Upomega \), can be assumed. Taking into consideration \( (R_{1} ) \) and \( (R_{2} ) \) in (6.3), if the fuse blows at the instant \( t_{0} \), the fuse inspection will yield broken at a subsequent moment \( t_{0} + \Updelta t_{i} \) and the electric supply will be off at another moment \( t_{0} + \Updelta t_{j} \). The order between \( t_{0} + \Updelta t_{i} \) and \( t_{0} + \Updelta t_{j} \) cannot be affirmed from the available information; nevertheless, assuming that all sensors work properly and react quickly, the order \( t_{0} + \Updelta t_{i} < t_{0} + \Updelta t_{j} \) will be considered. In other words, first the fuse blows, then the fuse inspection result equals broken and, after that, the electric supply is switched off. Analogously, two other assumptions are made: first the battery level falls below the minimum, then the battery-dial equals zero and later the electric supply is turned off; and first the fuel-tank is empty, then the gas-dial equals zero and after that the gas supply is empty.

Thus, under the previous assumptions, a scenario \( \Upomega \) of timed observations is assumed such that \( \Upomega = \{ w_{1} ,w_{2} ,w_{3} ,w_{4} \} \) where

$$ \begin{aligned} w_{1} = & ((blown,t_{10} ),\;(broken,t_{10} + \Updelta t_{11} ),\;(off,t_{10} + \Updelta t_{11} + \Updelta t_{12} ), \\ & (does\_not\_start,t_{10} + \Updelta t_{11} + \Updelta t_{12} + \Updelta t_{13} )) \\ w_{2} = & ((low,t_{20} ),\;(battery\_zero,t_{20} + \Updelta t_{21} ),\;(off,t_{20} + \Updelta t_{21} + \Updelta t_{22} ), \\ & (does\_not\_start,t_{20} + \Updelta t_{21} + \Updelta t_{22} + \Updelta t_{23} )) \\ w_{3} = & ((empty,t_{30} ),\;(gas\_zero,t_{30} + \Updelta t_{31} ),\;(false,t_{30} + \Updelta t_{31} + \Updelta t_{32} ), \\ & (does\_not\_start,t_{30} + \Updelta t_{31} + \Updelta t_{32} + \Updelta t_{33} )) \\ w_{4} = & ((empty,t_{40} ),\;(gas\_zero,t_{40} + \Updelta t_{41} ),\;(false,t_{40} + \Updelta t_{41} + \Updelta t_{42} ), \\ & (stop,t_{40} + \Updelta t_{41} + \Updelta t_{42} + \Updelta t_{43} )) \\ \end{aligned} $$
(6.4)
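For concreteness, the scenario (6.4) can be encoded as plain lists of timed observations, as in the following Python sketch; the numeric time offsets are arbitrary, since only the relative order of the observations is assumed above.

dt = 1.0  # hypothetical uniform delay between consecutive observations

def sequence(t0, values, step=dt):
    """Build a timed sequence of (value, time) pairs starting at t0."""
    return [(v, t0 + i * step) for i, v in enumerate(values)]

w1 = sequence(10.0, ["blown", "broken", "off", "does_not_start"])
w2 = sequence(20.0, ["low", "battery_zero", "off", "does_not_start"])
w3 = sequence(30.0, ["empty", "gas_zero", "false", "does_not_start"])
w4 = sequence(40.0, ["empty", "gas_zero", "false", "stop"])

omega = [w1, w2, w3, w4]  # the assumed scenario used as input to TOM4L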

From the interpretation of the available knowledge, the concepts fuse, battery, fuel-tank, battery-dial and gas-dial are considered components of the system. In contrast, the concepts fuse-inspection, power, gas-in-engine and engine-behaviour denote physical entities which are unknown or about which the information is insufficient. Consequently, abstract components (or component aggregates) such as fuse inspection tools, electric supply, gas supply and engine are defined to represent these concepts. In addition, the knowledge interpretation from CommonKADS makes it possible to identify the variables of the system, such as fuse.status, gas-dial.value, engine-behaviour.status, etc. These variables and components are defined in (6.5), where the values that a variable \( x_{i} \,(i = 1, \ldots ,9) \) can in principle assume are described in the corresponding set \( \Updelta_{{x_{i} }} \) presented in (6.6), with \( \phi_{i} \) denoting an unknown value.

$$ \begin{array}{*{20}l} {{\text{Variables}}\;X = \{ x_{1} , \ldots ,x_{9} \} } & {{\text{Components}}\;COMPS = \{ c_{1} , \ldots ,c_{9} \} } \\ {x_{1} \triangleq {\tt{fuse.status}}} & {c_{1} \triangleq {\tt{fuse}}} \\ {x_{2} \triangleq {\tt{battery.status}}} & {c_{2} \triangleq {\tt{battery}}} \\ {x_{3} \triangleq {\tt{fuel\text{-}tank.status}}} & {c_{3} \triangleq {\tt{fuel\text{-}tank}}} \\ {x_{4} \triangleq {\tt{fuse\text{-}inspection.value}}} & {c_{4} \triangleq {\tt{fuse\;inspection\;tools}}} \\ {x_{5} \triangleq {\tt{battery\text{-}dial.value}}} & {c_{5} \triangleq {\tt{battery\text{-}dial}}} \\ {x_{6} \triangleq {\tt{gas\text{-}dial.value}}} & {c_{6} \triangleq {\tt{gas\text{-}dial}}} \\ {x_{7} \triangleq {\tt{power.status}}} & {c_{7} \triangleq {\tt{electric\;supply}}} \\ {x_{8} \triangleq {\tt{gas\text{-}in\text{-}engine.status}}} & {c_{8} \triangleq {\tt{gas\;supply}}} \\ {x_{9} \triangleq {\tt{engine\text{-}behaviour.status}}} & {c_{9} \triangleq {\tt{engine}}} \\ \end{array} $$
(6.5)
$$ \begin{array}{*{20}c} {\Updelta_{{x_{1} }} = \{ blown,\phi_{1} \} } \hfill & {\Updelta_{{x_{4} }} = \{ broken,\phi_{4} \} } \hfill & {\Updelta_{{x_{7} }} = \{ off,\phi_{7} \} } \hfill \\ {\Updelta_{{x_{2} }} = \{ low,\phi_{2} \} } \hfill & {\Updelta_{{x_{5} }} = \{ battery\_zero,\phi_{5} \} } \hfill & {\Updelta_{{x_{8} }} = \{ false,\phi_{8} \} } \hfill \\ {\Updelta_{{x_{3} }} = \{ empty,\phi_{3} \} } \hfill & {\Updelta_{{x_{6} }} = \{ gas\_zero,\phi_{6} \} } \hfill & {\Updelta_{{x_{9} }} = \{ stops,\;does\_not\_start\} } \hfill \\ \end{array} $$
(6.6)

In the first phase, the scenario model \( M(\Upomega ) = < SM(\Upomega ),FM(\Upomega ),BM(\Upomega ) > \) is defined. Although its detailed specification will not be presented, its principal points are the following. This model organizes and describes the available information and knowledge. \( SM(\Upomega ) \) describes the 9 components in (6.5) and the interconnections between them, and \( FM(\Upomega ) \) specifies the relations among the values that the variables can assume through the definition of a set of functions. For example, rule \( R_{5} \) allows the establishment of an interconnection between the components \( c_{3} \) (fuel-tank) and \( c_{6} \) (gas-dial), and also of the relation between the values of \( x_{3} \) and \( x_{6} \), through a function \( f_{1} :\Updelta_{{x_{3} }} \to \Updelta_{{x_{6} }} \) such that \( f_{1} (empty) = gas\_zero \), \( f_{1} (\phi_{3} ) = \phi_{6} \) and \( x_{6} = f_{1} (x_{3} ) \). Besides, \( BM(\Upomega ) \) specifies an initial behavioural model that, because of the 9 binary variables, consists of 18 observation classes (e.g., \( C_{1,1} = \{ (x_{1} ,blown)\} \) and \( C_{1,2} = \{ (x_{1} ,\phi_{1} )\} \) are the observation classes related to \( x_{1} \)) and \( 2^{9} = 512 \) characterized states (e.g., a state in which \( x_{1} = blown \), \( x_{2} = low \), \( x_{3} = empty \), \( x_{4} = \phi_{4} \), \( x_{5} = battery\_zero \), \( x_{6} = gas\_zero \), \( x_{7} = off \), \( x_{8} = false \), \( x_{9} = stops \)). However, this model, which merely describes the available knowledge, is inadequate for analysing or diagnosing behaviour problems: 9 binary variables alone determine 512 discernible states, a significant number with respect to the small number of components. Presumably, certain states in \( BM(\Upomega ) \) are irrelevant for the pursued objectives, or meaningless because they are physically impossible. The two following stages of the modelling process, illustrated in Fig. 6.8, Sect. 6.5.3, deal with these aspects.

7.1.3 Process Definition

In the process definition phase, the perception model \( PM(X(t)) \) is defined; it establishes the boundaries and operating constraints, such as the set of variables of interest, the operating goals, and the normal and abnormal operating modes. After that, in the generic modelling stage, the objective is to define a model no longer of a particular scenario, but a more general model of the car functioning. These two stages resort to Formal Logic and to the Tetrahedron of States in order to carry out a logical and a physical interpretation of the variables, as Table 6.1 describes.

Table 6.1 Logical and physical interpretations

From the standpoint of Formal Logic, the components \( c_{i} \) \( (i = 1, \ldots ,9) \) in (6.5) can be considered logical components \( c_{Bi} \) described in first order predicate logic, to which Reiter’s diagnosis theory [44] can be applied. The variables \( x_{i} \) \( (i = 1, \ldots ,9) \) can thus be interpreted as logical variables \( \bar{x}_{i} \) which assume the values 1 (true) or 0 (false). For example, in Table 6.1, \( x_{1} = blown \) is logically interpreted as \( \bar{x}_{1} = 0 \) (false).

Since the components \( c_{4} \) (fuse inspection tools), \( c_{5} \) (battery-dial) and \( c_{6} \) (gas-dial) are sensors, they simply replicate the behaviour of the components \( c_{1} \) (fuse), \( c_{2} \) (battery) and \( c_{3} \) (fuel-tank). Consequently, and in order to reduce complexity, it is assumed that the former work correctly (i.e. sensors are supposed to never fail), so they are not needed in the resulting model. The logical model of the process is depicted in Fig. 6.15, and the logical relations among the variables are presented in Table 6.2. In the figure, the boxes \( c_{B7} \), \( c_{B8} \) and \( c_{B9} \) represent logical “AND” components, and the components \( c_{B1} \), \( c_{B2} \) and \( c_{B3} \) represent boolean value generators. This interpretation makes it possible to specify clearly the conditions of normal and abnormal behaviour on the variables and, as mentioned, to resort to Reiter’s theory.

Fig. 6.15
figure 15

Logical model of the process

Table 6.2 Logical and physical functional relations

Nevertheless, Reiter’s theory tacitly assumes that logically consistent states correspond to normal and desired behaviour, while inconsistent states, denoting a problem with at least one component, coincide with abnormal and undesired behaviour. The problem is that this correspondence is sometimes incompatible with a physical interpretation of the variables; a logical model is thus a strong reasoning tool, but it is not sufficient. For example, observing Fig. 6.15, a state in which \( \bar{x}_{3} = 0 \) and \( \bar{x}_{8} = 1 \) is an inconsistent state, which in the mentioned theory would indicate that the component \( c_{B8} \) does not work. However, in this inconsistent state the fuel tank is empty \( (\bar{x}_{3} = 0) \) while there is still gas in the engine \( (\bar{x}_{8} = 1) \); consequently, the situation cannot be associated with a component failure. On the contrary, this state is transient and corresponds to normal behaviour; it is simply not a state of interest for the diagnosis task and should not be considered. In the logical model, however, it is identified as a state of abnormal behaviour.

This example shows that the logical interpretation of variables required by Reiter’s theory must be complemented with a physical interpretation. For this purpose, [40] proposes the use of the Tetrahedron of States (ToS), introduced in Sect. 6.5.2, in which the given variables can be mapped to physical variables of the ToS, thereby establishing the relations among them. In this way, the introduction of semantic content in the physical interpretation of variables is controlled through the ToS framework. In particular, the ToS of the hydraulic domain and that of the electric domain, shown respectively in Fig. 6.16a, b, are used in this example.

Fig. 6.16
figure 16

Physical interpretation of variables

Each given variable \( x_{i} \in X \) is mapped to a physical variable of the corresponding ToS. For example, using the hydraulic ToS in Fig. 6.16a, the variable \( x_{3} \) (fuel tank status) is associated with the gas volume \( V(t) \) in the tank, as specified in Table 6.1, where \( V(t) \) is also noted \( x_{3}^{p} \). Thus, \( x_{3} = empty \), logically interpreted as \( \bar{x}_{3} = 0 \) (false), is physically interpreted through the ToS as \( V(t) = 0 \); and \( x_{3} = \neg empty \) (or \( x_{3} = \phi_{3} \)), related to \( \bar{x}_{3} = 1 \) (true), is physically interpreted as \( V(t) \ne 0 \).

Thereby, the variables are mapped to physical variables as illustrated in Fig. 6.16 and specified in Table 6.1, and the relations among them are established as presented in Table 6.2. The resulting physical model of the process is illustrated in Fig. 6.17.

Fig. 6.17
figure 17

Physical model of the process

This interpretation makes it possible to determine conditions on the variables that identify transient states, which can then be discarded from the model to be built. For example, the states in which \( V(t) = 0 \) and \( Qv(t) \ne 0 \) (see Fig. 6.17) can be eliminated from the model; or, equivalently, the states in which \( \bar{x}_{3} = 0 \wedge \bar{x}_{8} = 1 \).

From this interpretation and a suitable analysis, the transient or physically impossible states can be removed from the model to be built, which leaves 21 states of interest.
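The pruning step can be sketched in Python as follows; only the single constraint discussed above is encoded, so the filter is illustrative, and the remaining conditions of the full analysis, which lead to the 21 states, are not reproduced here.

from itertools import product

VARS = ("x1", "x2", "x3", "x7", "x8", "x9")

def physically_possible(s):
    # V(t) = 0 with Qv(t) != 0 is transient/impossible: discard it.
    # Further constraints of this kind would be added by the analysis.
    return not (s["x3"] == 0 and s["x8"] == 1)

# Enumerate the 2^6 = 64 assignments of the retained logical variables.
states = [dict(zip(VARS, bits)) for bits in product((0, 1), repeat=6)]
kept = [s for s in states if physically_possible(s)]
print(len(states), len(kept))  # 64 candidates, 48 after this single rule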

7.1.4 Generic Modelling

Taking the two previous interpretations into account, a generic model of the process \( M(X(t)) = < PM(X(t)),SM(X(t)),FM(X(t)),BM(X(t)) > \) is defined. The details of the analysis carried out to establish this model will not be described; we shall limit ourselves to presenting the resulting model of the process, formally specified in the TOM4D formalism [4].

In order to facilitate the analysis, the logical variables are used to describe the model, keeping in mind that they can always be reinterpreted as the variables and components described in (6.5) through Table 6.1. Thereby, observing Fig. 6.15, the models are the following.

The perception model \( PM(X(t)) \) of the process consists of the set \( X \) of variables, the set \( \Uppsi \) of threshold values described in Sect. 6.4 (unknown in this case) and a set \( R_{q} \) of sentences describing objectives and operating modes. This model is specified as follows:

$$ \begin{aligned} PM(X(t)) & = < X,\Uppsi ,R_{q} > {\text{ where}} \\ & X = \{ \bar{x}_{1} ,\bar{x}_{2} ,\bar{x}_{3} ,\bar{x}_{7} ,\bar{x}_{8} ,\bar{x}_{9} \} ,\quad \Updelta_{{\bar{x}_{i} }} = \{ 0,1\} ,\quad i = 1,2,3,7,8,9 \\ & \Uppsi = \{ \Uppsi_{i} \}_{i = 1,2,3,7,8,9} \quad ({\text{threshold values of the time functions}} \\ & \quad \quad \quad \quad \quad \quad \quad \quad {\text{which we do not know}}) \\ & R_{q} = R_{goal} \cup R_{n} \cup R_{ab} {\text{ such that}} \\ & R_{goal} \;{\text{describes the process operating goals }}\bar{x}_{9} = 1 \\ & R_{n} {\text{ describes the conditions of the normal operating mode}}: \\ & (\bar{x}_{1} = 1 \wedge \bar{x}_{2} = 1 \wedge \bar{x}_{3} = 1 \wedge \bar{x}_{7} = 1 \wedge \bar{x}_{8} = 1 \wedge \bar{x}_{9} = 1) \vee \\ & ((((\bar{x}_{1} = 0 \vee \bar{x}_{2} = 0) \wedge \bar{x}_{7} = 0) \vee (\bar{x}_{3} = 0 \wedge \bar{x}_{8} = 0)) \wedge \bar{x}_{9} = 0) \\ & R_{ab} {\text{ describes the conditions of the abnormal operating mode}}: \\ & (\bar{x}_{1} = 1 \wedge \bar{x}_{2} = 1 \wedge \bar{x}_{7} = 0) \vee (\bar{x}_{3} = 1 \wedge \bar{x}_{8} = 0) \vee (\bar{x}_{7} = 1 \wedge \bar{x}_{8} = 1 \wedge \bar{x}_{9} = 0) \\ \end{aligned} $$
(6.7)
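The operating-mode conditions of (6.7) translate directly into boolean predicates, as the following Python sketch shows; representing a state as a dictionary from variable names to 0 or 1 is, of course, an implementation choice.

def goal(s):
    """R_goal: the engine runs."""
    return s["x9"] == 1

def normal(s):
    """R_n: either everything is nominal, or a fault has propagated."""
    all_ok = all(s[v] == 1 for v in ("x1", "x2", "x3", "x7", "x8", "x9"))
    fault_propagated = (
        (((s["x1"] == 0 or s["x2"] == 0) and s["x7"] == 0)
         or (s["x3"] == 0 and s["x8"] == 0))
        and s["x9"] == 0
    )
    return all_ok or fault_propagated

def abnormal(s):
    """R_ab: an effect is observed without its expected cause."""
    return ((s["x1"] == 1 and s["x2"] == 1 and s["x7"] == 0)
            or (s["x3"] == 1 and s["x8"] == 0)
            or (s["x7"] == 1 and s["x8"] == 1 and s["x9"] == 0))

s64 = dict(x1=1, x2=1, x3=1, x7=1, x8=1, x9=1)  # nominal state
print(goal(s64), normal(s64), abnormal(s64))     # True True False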

The structural model \( SM(X(t)) \), defined in (6.8), describes the set \( COMPS \) of components, the set \( R_{port} \) specifying the interconnections between output ports with input ports of components (e.g., \( out(c_{B1} ) = in_{1} (c_{B7} ) \)) and the set \( R_{xport} \) associating each variable with an output port (e.g., \( out(c_{B1} ) = \bar{x}_{1} \)).

$$ \begin{aligned} SM(X(t)) & = < COMPS,R_{port} ,R_{xport} > {\text{ where}} \\ & COMPS = \{ c_{B1} ,c_{B2} ,c_{B3} ,c_{B7} ,c_{B8} ,c_{B9} \} \\ & R_{port} = \{ out(c_{B1} ) = in_{1} (c_{B7} ),out(c_{B2} ) = in_{2} (c_{B7} ),out(c_{B7} ) = in_{1} (c_{B9} ), \\ & \,\quad \quad \quad out(c_{B3} ) = in_{1} (c_{B8} ),out(c_{B3} ) = in_{2} (c_{B8} ),out(c_{B8} ) = in_{2} (c_{B9} )\} \\ & R_{xport} = \{ out(c_{B1} ) = \bar{x}_{1} ,out(c_{B2} ) = \bar{x}_{2} ,out(c_{B3} ) = \bar{x}_{3} , \\ & \quad \quad \quad out(c_{B7} ) = \bar{x}_{7} ,out(c_{B8} ) = \bar{x}_{8} ,out(c_{B9} ) = \bar{x}_{9} \} \\ \end{aligned} $$
(6.8)

The functional model \( FM(X(t)) \) describes the relations among the values that the variables can assume, as defined in (6.9). This model consists of the set \( \Updelta \) of values belonging to the domains and images of the functions in \( F \), the set \( F \) itself, and the set \( R_{f} \) establishing the relations among the variables (e.g., \( \bar{x}_{7} = f_{B4} (\bar{x}_{1} ,\bar{x}_{2} ) \)).

$$ \begin{aligned} FM(X(t)) & = < \Updelta ,F,R_{f} > {\text{ where}} \\ & \Updelta = \Updelta_{{\bar{x}_{1} }} \cup \Updelta_{{\bar{x}_{2} }} \cup \Updelta_{{\bar{x}_{3} }} \cup \Updelta_{{\bar{x}_{7} }} \cup \Updelta_{{\bar{x}_{8} }} \cup \Updelta_{{\bar{x}_{9} }} {\text{ with }}\Updelta_{{\bar{x}_{i} }} = \{ 0,1\} ,\quad i = 1,2,3,7,8,9 \\ & F = \{ f_{B4} ,f_{B5} ,f_{B6} \} {\text{ with}} \\ & \quad f_{B4} :\Updelta_{{\bar{x}_{1} }} \times \Updelta_{{\bar{x}_{2} }} \to \Updelta_{{\bar{x}_{7} }} \\ & \quad f_{B5} :\Updelta_{{\bar{x}_{3} }} \to \Updelta_{{\bar{x}_{8} }} \\ & \quad f_{B6} :\Updelta_{{\bar{x}_{7} }} \times \Updelta_{{\bar{x}_{8} }} \to \Updelta_{{\bar{x}_{9} }} \quad {\text{and such that}} \\ & \quad f_{B4} (y_{1} ,y_{2} ) = and(y_{1} ,y_{2} ) \wedge f_{B5} (y) = and(y,y) \wedge f_{B6} (y_{1} ,y_{2} ) = and(y_{1} ,y_{2} ) \\ & R_{f} = \{ \bar{x}_{7} = f_{B4} (\bar{x}_{1} ,\bar{x}_{2} ),\;\bar{x}_{8} = f_{B5} (\bar{x}_{3} ),\;\bar{x}_{9} = f_{B6} (\bar{x}_{7} ,\bar{x}_{8} )\} \\ \end{aligned} $$
(6.9)
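Reduced to code, the functional model (6.9) amounts to three boolean ANDs (with \( f_{B5} \) duplicating its single input, matching \( f_{B5} (y) = and(y,y) \)); the following sketch propagates the generator variables to the derived ones.

def f_B4(y1, y2):           # power: fuse AND battery
    return y1 & y2

def f_B5(y):                # gas in engine: fuel tank (AND with itself)
    return y & y

def f_B6(y1, y2):           # engine runs: power AND gas in engine
    return y1 & y2

def propagate(x1, x2, x3):
    """Compute the derived variables from the generator components."""
    x7 = f_B4(x1, x2)
    x8 = f_B5(x3)
    x9 = f_B6(x7, x8)
    return x7, x8, x9

print(propagate(1, 1, 1))   # (1, 1, 1): nominal functioning
print(propagate(0, 1, 1))   # (0, 1, 0): a blown fuse stops the engine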

For readability and clarity, the logical variables \( \bar{x}_{i} \) (\( i = 1,2,3,7,8,9 \)) are reinterpreted from Table 6.1 as their corresponding \( x_{i} \). This reinterpretation allows the functional model to be seen as depicted in Fig. 6.18.

Fig. 6.18
figure 18

Functional model

The behavioural model requires the set of observation classes, which is defined as \( C = \{ C_{1,1} ,C_{1,2} ,C_{2,1} ,C_{2,2} ,C_{3,1} ,C_{3,2} ,C_{7,1} ,C_{7,2} ,C_{8,1} ,C_{8,2} ,C_{9,1} ,C_{9,2} \} \) where

$$ \begin{array}{*{20}c} {C_{1,1} = \{ (\bar{x}_{1} ,0)\} ,} \hfill & {C_{2,2} = \{ (\bar{x}_{2} ,1)\} ,} \hfill & {C_{7,1} = \{ (\bar{x}_{7} ,0)\} ,} \hfill & {C_{8,2} = \{ (\bar{x}_{8} ,1)\} ,} \hfill \\ {C_{1,2} = \{ (\bar{x}_{1} ,1)\} ,} \hfill & {C_{3,1} = \{ (\bar{x}_{3} ,0)\} ,} \hfill & {C_{7,2} = \{ (\bar{x}_{7} ,1)\} ,} \hfill & {C_{9,1} = \{ (\bar{x}_{9} ,0)\} ,} \hfill \\ {C_{2,1} = \{ (\bar{x}_{2} ,0)\} ,} \hfill & {C_{3,2} = \{ (\bar{x}_{3} ,1)\} ,} \hfill & {C_{8,1} = \{ (\bar{x}_{8} ,0)\} ,} \hfill & {C_{9,2} = \{ (\bar{x}_{9} ,1)\} } \hfill \\ \end{array} $$
(6.10)

From this set and the a priori knowledge, the possible sequences of observation classes are defined, as Fig. 6.19 depicts; for instance, it is considered possible that after an occurrence of the class \( C_{1,1} \) (the fuse is blown) an occurrence of the class \( C_{7,1} \) (the power is off) is observed, hence the sequential relation \( (C_{1,1} ,C_{7,1} ) \) is present in the figure.

Fig. 6.19
figure 19

Graphical representation of the possible sequences of observation classes

The occurrence of an observation class entails the assignment of a value to a variable; that is to say, an occurrence of \( C_{1,1} \) entails that the value 0 is assumed by \( \bar{x}_{1} \), and consequently the previous value of \( \bar{x}_{1} \) was not 0. Thus, the possible states between two observation classes can be characterized and established. Recall that only 21 characterized states were considered of interest from the logical and physical interpretations.

Thereby, the behavioural model \( BM(X(t)) \), defined in (6.11) and illustrated in Fig. 6.20, consists of the set \( S \) of characterized states, the set \( C \) of observation classes and the transition function \( \gamma \).

$$ BM(X(t)) = < S,C,\gamma > \quad {\text{where}} $$
$$ \begin{aligned} S = & \{ s_{8} ,s_{11} ,s_{17} ,s_{18} ,s_{20} ,s_{21} ,s_{23} ,s_{24} ,s_{27} ,s_{28} ,s_{29} ,s_{30} ,s_{31} ,s_{32} , \\ & s_{50} ,s_{53} ,s_{56} ,s_{61} ,s_{62} ,s_{63} ,s_{64} \} \quad {\text{such that}} \\ \end{aligned} $$
(6.11)
Fig. 6.20
figure 20

Behavioural model of the process P(t)

Each state \( s \in S \) is an assignment of values to variables, i.e. \( s:VAR \to VALUE \) with \( s(x) = \delta \), \( x \in X \subseteq VAR \), \( \delta \in \Updelta \subseteq VALUE \). The 21 characterized states are the following:

$$ \begin{array}{c|cccccc} S & \bar{x}_{1} & \bar{x}_{2} & \bar{x}_{3} & \bar{x}_{7} & \bar{x}_{8} & \bar{x}_{9} \\ \hline s_{8} & 1 & 0 & 1 & 0 & 0 & 0 \\ s_{11} & 0 & 1 & 1 & 0 & 0 & 0 \\ s_{17} & 1 & 1 & 1 & 0 & 0 & 0 \\ s_{18} & 1 & 1 & 0 & 1 & 0 & 0 \\ s_{20} & 1 & 0 & 1 & 1 & 0 & 0 \\ s_{21} & 1 & 0 & 1 & 0 & 1 & 0 \\ s_{23} & 0 & 1 & 1 & 1 & 0 & 0 \\ s_{24} & 0 & 1 & 1 & 0 & 1 & 0 \\ s_{27} & 1 & 1 & 1 & 1 & 0 & 0 \\ s_{28} & 1 & 1 & 1 & 0 & 1 & 0 \\ s_{29} & 1 & 1 & 0 & 1 & 1 & 0 \\ s_{30} & 1 & 0 & 1 & 1 & 1 & 0 \\ s_{31} & 0 & 1 & 1 & 1 & 1 & 0 \\ s_{32} & 1 & 1 & 1 & 1 & 1 & 0 \\ s_{50} & 1 & 1 & 0 & 1 & 0 & 1 \\ s_{53} & 1 & 0 & 1 & 0 & 1 & 1 \\ s_{56} & 0 & 1 & 1 & 0 & 1 & 1 \\ s_{61} & 1 & 1 & 0 & 1 & 1 & 1 \\ s_{62} & 1 & 0 & 1 & 1 & 1 & 1 \\ s_{63} & 0 & 1 & 1 & 1 & 1 & 1 \\ s_{64} & 1 & 1 & 1 & 1 & 1 & 1 \\ \end{array} $$

\( C = \{ C_{1,1} , \ldots ,C_{9,2} \} \) is the set of observation classes defined in (6.10), and
$$ \gamma :S \times C \to S\quad {\text{such that}} $$
$$ \begin{array}{*{20}c} {\gamma (s_{8} ,C_{2,2} ) = s_{17} ,} & {\gamma (s_{31} ,C_{7,1} ) = s_{24} ,} & {\gamma (s_{11} ,C_{1,2} ) = s_{17} ,} & {\gamma (s_{32} ,C_{1,1} ) = s_{31} ,} \\ {\gamma (s_{17} ,C_{7,2} ) = s_{27} ,} & {\gamma (s_{32} ,C_{9,2} ) = s_{64} ,} & {\gamma (s_{18} ,C_{3,2} ) = s_{27} ,} & {\gamma (s_{32} ,C_{3,1} ) = s_{29} ,} \\ {\gamma (s_{20} ,C_{7,1} ) = s_{8} ,} & {\gamma (s_{32} ,C_{2,1} ) = s_{30} ,} & {\gamma (s_{21} ,C_{2,2} ) = s_{28} ,} & {\gamma (s_{50} ,C_{9,1} ) = s_{18} ,} \\ {\gamma (s_{23} ,C_{7,1} ) = s_{11} ,} & {\gamma (s_{53} ,C_{9,1} ) = s_{21} ,} & {\gamma (s_{24} ,C_{1,2} ) = s_{28} ,} & {\gamma (s_{56} ,C_{9,1} ) = s_{24} ,} \\ {\gamma (s_{27} ,C_{1,1} ) = s_{23} ,} & {\gamma (s_{61} ,C_{8,1} ) = s_{50} ,} & {\gamma (s_{27} ,C_{2,1} ) = s_{20} ,} & {\gamma (s_{62} ,C_{7,1} ) = s_{53} ,} \\ {\gamma (s_{27} ,C_{8,2} ) = s_{32} ,} & {\gamma (s_{63} ,C_{7,1} ) = s_{56} ,} & {\gamma (s_{28} ,C_{7,2} ) = s_{32} ,} & {\gamma (s_{64} ,C_{1,1} ) = s_{63} ,} \\ {\gamma (s_{29} ,C_{8,1} ) = s_{18} ,} & {\gamma (s_{64} ,C_{2,1} ) = s_{62} ,} & {\gamma (s_{30} ,C_{7,1} ) = s_{21} ,} & {\gamma (s_{64} ,C_{3,1} ) = s_{61} } \\ \end{array} $$
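The transition function \( \gamma \) can be read as a lookup table; the following Python sketch reproduces a few of its entries and replays the blown-fuse trajectory from the nominal state \( s_{64} \).

# A few entries of gamma from (6.11); the full table is given above.
gamma = {
    ("s64", "C1,1"): "s63", ("s64", "C2,1"): "s62", ("s64", "C3,1"): "s61",
    ("s63", "C7,1"): "s56", ("s56", "C9,1"): "s24",
    ("s62", "C7,1"): "s53", ("s53", "C9,1"): "s21",
    ("s61", "C8,1"): "s50", ("s50", "C9,1"): "s18",
}

def replay(state, classes):
    """Follow gamma along a sequence of observation-class occurrences."""
    for c in classes:
        state = gamma[(state, c)]
    return state

# A blown fuse from the nominal state s64: power off, then engine off.
print(replay("s64", ["C1,1", "C7,1", "C9,1"]))  # -> s24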

As a result of this analysis, we consider that the construction of a generic model of the process requires interpreting the expert’s knowledge both in logical and in physical terms. These interpretations, along with modelling decisions, allowed a reduction from 512 states to only 21 that are physically possible and of interest for diagnosing behaviour problems. The logical model of Fig. 6.15 describes the structure of the expert’s diagnosis reasoning, and the physical model of Fig. 6.17 provides the diagnosis knowledge required for this reasoning. Both models are therefore necessary and complement each other. We believe that these models are, ultimately, the ones “constructed” by experts, whose combination simplifies the diagnosis task in practice.

Moreover, the resulting model \( M(X(t)) \) admits the application of model-based diagnosis techniques and, simultaneously, introduces the dimension of time, allowing the dynamics of the process to be modelled in a behavioural model. This model is a crucial element in process supervision, since it is generally compared against the real evolution of the process. The quadripartite structure of the model discriminates the different types of knowledge about the process, thereby providing a greater understanding of the problem and a better communication with experts.

7.2 TOM4L Models

The models described in this section have been automatically produced by the ElpLab Java software, which implements the complete TOM4L KDD process illustrated in Fig. 6.13, Sect. 6.6.5. For this purpose, based on the scenario \( \Upomega \) defined in (6.4), Sect. 6.7.1.2, and following the method described in [49], a set of 100 occurrences of the observation classes \( C_{1,1} \), \( C_{2,1} \), \( C_{3,1} \), \( C_{7,1} \), \( C_{8,1} \) and \( C_{9,1} \), with a stochastic distribution of time according to Table 6.3, was built.

Table 6.3 Prior probabilities of the car example [10, p. 76]

As described in Sect. 6.6, the TOM4L learning approach groups data mining algorithms and techniques which make it possible to find n-ary temporal relations among observation classes in timed data. Thus, from the sequence \( w \) made up of the 100 timed observations, a Functional Model and a Behavioural Model of the car functioning can be obtained by applying TOM4L.
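The following sketch merely suggests how such a test sequence can be generated; the actual method is that of [49] with the time distribution of Table 6.3, so the prior values and the exponential inter-arrival times below are assumptions made only to keep the example self-contained.

import random

random.seed(0)
priors = {"C1,1": 0.10, "C2,1": 0.10, "C3,1": 0.20,
          "C7,1": 0.20, "C8,1": 0.20, "C9,1": 0.20}  # hypothetical priors

def generate(n=100, mean_dt=1.0):
    """Draw n timed class occurrences according to the prior weights."""
    classes, weights = zip(*priors.items())
    t, w = 0.0, []
    for _ in range(n):
        t += random.expovariate(1.0 / mean_dt)  # assumed timing model
        w.append((random.choices(classes, weights)[0], round(t, 2)))
    return w

w = generate()
print(len(w), w[:3])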

7.2.1 Functional Model

The Tom4BN algorithm [8, 10], which discovers naive Bayesian Networks (Sect. 6.6.3) from timed data, is applied to the 100 observations of the car example, giving as a result the Bayesian Network shown in Fig. 6.21a.

Fig. 6.21
figure 21

Functional model obtained through TOM4L [10, pp. 81, 83]

In this example, the classes \( C_{i,j} \) are singletons of the form \( C_{i,j} = \{ (x_{i} ,\delta_{j} )\} \), and \( P(C_{i,j} ) \), equivalent to \( P(x_{i} = \delta_{j} ) \), is the prior probability of observing an occurrence of the class \( C_{i,j} \) in \( w \). It should also be noted that “\( \neg x_{i} \)” refers to any equality except “\( x_{i} = \delta_{j} \)”; put another way, “\( \neg x_{i} \)” denotes “\( x_{i} = \delta_{k} \wedge \delta_{k} \ne \delta_{j} \)”.

Thus, this Bayesian Network enables the definition of the Functional Model of Fig. 6.21b, whose functions correspond to those of the TOM4D Functional Model (Fig. 6.18, Sect. 6.7.1.4); but, unlike the latter, these functions have associated probabilities which provide a level of confidence in the established relations among values. For example, the probability of observing that the power is off, having observed that the battery is low and the fuse is blown, is 0.684; that is, \( P(x_{7} |x_{1} ,x_{2} ) = 0.684 \) in Fig. 6.21a. Thus, the level of confidence in \( off = f_{4} (blown,low) \) is approximately 68 %, as Fig. 6.21b depicts. Another example is the probability of \( \phi_{7} = f_{4} (\phi_{1} ,\phi_{2} ) \), obtained from \( P(x_{7} |\neg x_{1} ,\neg x_{2} ) = 0.087 \) by calculating \( P(\neg x_{7} |\neg x_{1} ,\neg x_{2} ) = 1 - P(x_{7} |\neg x_{1} ,\neg x_{2} ) = 0.913 \).

Hence, the Functional Model with probabilities automatically obtained from data can be compared with the Functional Model defined from experts’ knowledge; and thus, both models can be analysed together complementing each other.

7.2.2 Behavioural Model

A behavioural model can also be obtained from timed data through the TOM4L process: the BJT4S algorithm [9] is applied to the set of observation sequences, and the model in Fig. 6.22 is automatically obtained.

Fig. 6.22
figure 22

Behavioural model obtained through TOM4L. Signature tree of the observation class \( C_{9,1} \) [10]

The figure presents the sequences of observation classes discovered from the data, where the values between brackets denote the average minimum and maximum time periods between two occurrences of observation classes, that is, the temporal constraints described in Sect. 6.6.1. This model is a tree whose branches (called signatures [13, 49] and described in Sect. 6.6.4) define n-ary temporal relations among observation classes and verify certain anticipation and coverage rates. For example, as shown in Fig. 6.22, \( m = ((C_{1,1} ,C_{7,1} ,[0,4s]),(C_{7,1} ,C_{9,1} ,[0,6s])) \) is a signature denoting sequences of the type \( C_{1,1} ,C_{7,1} ,C_{9,1} \) with their temporal constraints. In the figure, the anticipation rate of the signature \( m \) indicates that in 40 % of the cases, when an occurrence of \( C_{1,1} \) is followed by an occurrence of \( C_{7,1} \) within at most \( 4s \), an occurrence of \( C_{9,1} \) takes place within at most \( 6s \). For its part, the coverage rate means that in 20 % of the cases in which an occurrence of \( C_{9,1} \) is observed, the signature \( m \) was verified.
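As a worked illustration of these two rates (with hypothetical counts, since the exact figures behind Fig. 6.22 are not reproduced here), if \( m \) occurred \( n(m) = 10 \) times in \( w \), its prefix \( m_{s} = ((C_{1,1} ,C_{7,1} ,[0,4s])) \) occurred \( n(m_{s} ) = 25 \) times and the class \( C_{9,1} \) occurred 50 times, then

$$ T_{A} (m) = \frac{n(m)}{n(m_{s} )} = \frac{10}{25} = 0.40,\qquad T_{C} (m) = \frac{n(m)}{n(C_{9,1} )} = \frac{10}{50} = 0.20 $$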

Clearly, this model is a sub-model of the one in Fig. 6.19, Sect. 6.7.1.4, describing the sequences of observation classes built through TOM4D. Therefore, the model of Fig. 6.22 implicitly determines a behavioural model which is included in the TOM4D Behavioural Model defined from experts’ knowledge (Fig. 6.20, Sect. 6.7.1.4). In addition, the model obtained from data provides knowledge about the temporal constraints between event occurrences. Thus, once again, models belonging to disciplines as different as KE and KDD can be easily related and compared to each other.

Because TOM4L models can be related to TOM4D models, and the latter are directly related to a CommonKADS conceptual model, communication with experts about the former is easier. For instance, the meaning of the signature \( m = ((C_{1,1} ,C_{7,1} ,[0,4s]),(C_{7,1} ,C_{9,1} ,[0,6s])) \) can be explained simply by saying that in 40 % of the cases, when the fuse is observed blown, the power is observed off within at most \( 4s \) and subsequently, within at most \( 6s \), the engine is observed not to start. Thus, TOM4D establishes a bridge between experts’ knowledge and data, and TOM4L allows automatic learning from the latter.

8 Conclusion

Knowledge acquisition, as a topic of interest in sciences, has been generally addressed from two different perspectives. One approach has been to consider knowledge acquisition as a psychological and social process that consists in the synthesis of new knowledge through socialization with experts. The other approach has been to consider knowledge acquisition as an interpretation and analysis process of data, based on discovering patterns of interest through observation, analysis and intertwining of the data. These two perspectives are, respectively, central issues in Knowledge Engineering (KE) and in Knowledge Discovery in Database (KDD).

Nevertheless, as highlighted by N. Wickramasinghe [15], although knowledge acquisition is the central question in both disciplines, the issue has traditionally been approached from one perspective or the other, rather than from an integrative view. We consider that a holistic approach is necessary in order to accelerate the global learning process and even, in extremely complex cases, to make it viable.

Results about probabilistic information and temporal constraints, as well as discovered event sequences which could be unexpected, extend the knowledge about a real process and provide resources to build a more suitable model of it. However, relating this knowledge to the expert’s knowledge is not a trivial task because, generally, the formalisms used to represent knowledge models by Knowledge Engineering methodologies and by Knowledge Discovery in Database processes are different. As a consequence, the comparison between both kinds of models cannot, in principle, be carried out. We believe that the main difficulty in relating the two disciplines stems from the lack of a global approach based on a common theory and, consequently, from the lack of representation formalisms usable in both domains.

Thereby, the central focus of this chapter was the definition of a global human–machine learning process which combines a Knowledge Engineering methodology called TOM4D (Timed Observation Modelling for Diagnosis) with a Knowledge Discovery in Database process called TOM4L (Timed Observation Mining for Learning). With the aim of defining this integral view, the Theory of Timed Observations [1] has been established as the basis for the development of the proposed approach. This theory defines, among other things, the notions of timed observation and observation class, concepts which make precise the traditional notion of discrete event and the Artificial Intelligence notion of alarm (or warning).

This chapter then presented the TOM4D Knowledge Engineering methodology, which builds models from experts’ knowledge on the basis of the Theory of Timed Observations. The models built through this methodology are not experts’ Knowledge Models but models of the process about which the experts have knowledge. By construction, TOM4D models are consistent with, and easily relatable to, CommonKADS Knowledge Models built from experts’ knowledge, CommonKADS being one of the principal KE methodologies. Therefore, models of a process built through TOM4D facilitate communication with the expert and, thus, the validation of the Knowledge Models. Besides, the chapter introduced the basic elements of the TOM4L Knowledge Discovery in Database process, which obtains knowledge from data. The TOM4L process finds n-ary temporal relations among observation classes that are representative of the process giving rise to the data, using an entropy-based measure called the BJ-measure [9, 12]. In addition, through this measure, TOM4L enables the building of Bayesian Networks from timed data [8, 10, 11]. Thus, TOM4L models are directly relatable to TOM4D models.

In summary, we presented a human–machine learning process nourished by experts’ knowledge and by knowledge discovered in data which, in our opinion, ultimately forms a virtuous circle establishing positive and corrective feedback at each step. Therefore, a process model which meets the expectations of the knowledge intensive tasks performed by a Knowledge Based System can be built in a more suitable way.

Real world problems have been addressed through this approach. In particular, the security of the dam of Cublize (France), where the resulting models have been validated by the hydraulic dam experts of Irstea, the French governmental organization which controls the security of hydraulic civil engineering structures [6, 50]. Moreover, we are currently using the presented approach to model human behaviour from gerontologists’ knowledge and smart-environment data, in the context of the GerHome Project of the Centre Scientifique et Technique du Bâtiment (CSTB) of Sophia Antipolis, France [4, 51].

We believe that binding the KE and KDD universes enriches and facilitates the modelling task. Nevertheless, a difficulty remains with regard to the discursive and conceptual levels at which each universe is developed. That is to say, even when the two disciplines can be linked, relating models obtained from knowledge discovered in data to models obtained from experts’ knowledge is sometimes very difficult, because the experts’ conceptual abstraction level is very high or far from the concepts at the data level. Although this topic is beyond the scope of the present chapter, it is worth mentioning that it has been addressed by means of a theoretical framework of abstraction levels that we have defined [4, 52, 53], where at each level a KE methodology, like TOM4D, can be combined with a KDD process, like TOM4L, in order to build a set of models linking the data abstraction level (e.g. the sensor level) to the expert’s conceptual level.