Introduction

The connected factory, also known as Industry 4.0 (Lasi et al. 2014; Oztemel and Gursev 2018), is an initiative launched by German industrialists and supported by the government to make machines connected and intelligent. The term “Industry 4.0” refers to a fourth industrial revolution. The first revolution began with the use of steam power and the first machines. The second introduced electrical energy and mass production, while the third started with the use of electronics and computers to automate production. Industry 4.0 builds on the concept of the Internet of Things (IoT) (Xia et al. 2012), which offers ubiquitous computing through advanced connectivity of products, systems and services. Due to the ubiquitous nature of objects, a remarkable number of systems will be connected to the Internet. A key aspect of IoT is the use of these connected objects in manufacturing processes, meaning that smart sensors, and smart tools in general, will be available in the factory. These technologies will improve the performance of manufacturing processes in real time by acting proactively.

The analysis of collected data makes it possible to model the behaviour of a machine, be it normal or abnormal. When these models are confronted with real-time data, monitoring systems can detect anomalies at the appropriate time and alert humans for a timely intervention (Mobley 1990). This is the principle of predictive maintenance (Hashemian and Bean 2011).

Intelligent systems that enable this kind of maintenance rely on analyzing collected signals, which are generally sets of timestamped events. To this end, data mining techniques are used, in particular sequential data mining (Agrawal and Srikant 1995) and frequent sequential pattern mining (Borgelt 2012).

However, sequential patterns have a major shortcoming (Cram et al. 2011): they capture the order of events but say nothing about the time gap between them.

Consider the data set shown in Table 1, where each tuple pairs events with their instants of occurrence.

Table 1 Data set of sequences (pairs of event/instant)

We can note that the sequence \(\langle A, B, C \rangle \) is frequent: events A, B and C frequently occur one after the other, but the pattern provides no additional information about the gap between them. A richer pattern in which time constraints are considered is the chronicle, initially introduced in Dousson and Duong (1999) and developed in Huang et al. (2012) and Cram et al. (2011). In our example data set, we can deduce that A, B and C occur sequentially, that B occurs at least one instant and at most five instants after A, and that C occurs after B within the interval [2, 4] of instants. We represent our chronicle as A[1, 5]B and B[2, 4]C. It is a directed graph whose nodes are events and whose edges carry the instant intervals, denoted by time constraints, as shown in Fig. 1.

Fig. 1 A chronicle extracted from Table 1

To optimise machines’ performance, critical events should be anticipated. Predictive maintenance is based on this principle. It is of paramount importance and offers considerable potential for innovation to overcome the limitations of traditional maintenance policies (Cho et al. 2018). It consists of collecting and analysing the data of industrial equipment; an alert system then learns from previous event sequences and warns of imminent failures. In contrast to preventive maintenance (Rivera Torres et al. 2018), predictive maintenance anticipates a failure before it occurs.

To the best of our knowledge, no research work has applied frequent chronicle mining to predictive maintenance. This paper introduces a complete data mining approach based on the extraction of frequent chronicles to predict machine failures. In contrast to classic prediction techniques, our approach aims to predict not only the failure event but also the time interval of its occurrence, as the time dimension is crucial in predictive maintenance. Since chronicle mining algorithms are resource-intensive, existing work must be improved to scale to manufacturing requirements (i.e. large numbers of sequences, real-time prediction, etc.). Thus, we introduce a new algorithm for the extraction of frequent chronicles called Clasp-CPM. This algorithm generates only closed failure chronicles in an effective manner. In addition, to handle the huge volume of sequences, several optimization methods and a multi-threaded implementation are brought in. The extracted failure chronicles then serve as input to a classification algorithm that predicts machine failures. Our general approach is validated on a real industrial data set, SECOM (McCann et al. 2008), as well as on synthetic data sets. The performance of our algorithms and the quality of failure predictions were investigated on both kinds of data. To summarize, this paper introduces three contributions: (i) a new approach for mining failure chronicles from a set of sequences that report a machine’s activity; (ii) a new efficient algorithm called Clasp-CPM introduced to mine failure chronicles; and finally (iii) a new algorithm called FADE that uses the mined chronicles to predict whether a new sequence of parameter values will lead to a machine failure, and if so, in which time interval the crash will occur.

Related works

The literature abounds with works dealing with temporal data mining for the discovery of interesting patterns (Zhao and Bhowmick 2003; Masseglia et al. 2005; Laxman and Sastry 2006). Sequential Pattern Mining (Srikant and Agrawal 1995) (commonly called SPM) is the discovery of sequences that are frequent in a set of sequences. The process is similar to frequent itemset mining (Goethals 2003), except that the data set events are ordered and time-stamped. As in the frequent pattern mining problem, SPM algorithms extract frequent sub-sequences according to a user-defined threshold commonly called the minimum support. The pioneering SPM algorithm, called AprioriAll, was introduced in Agrawal and Srikant (1995). It generates the set of frequent sequential patterns in a level-wise manner, as does Apriori (Agrawal et al. 1993).

Several algorithms have been introduced to discover sequential patterns from sequences, such as GSP (Srikant and Agrawal 1996), SPADE (Zaki 2001) and PrefixSpan (Pei et al. 2001) for frequent sequential patterns based on Apriori, and CloSpan (Yan et al. 2003) and ClaSP (Antonio et al. 2013) for frequent closed sequential patterns. SPM techniques are used in several application fields such as DNA sequence analysis (D’Addona et al. 2017; Zerin and Jeong 2011) and Web access models (Fournier-Viger et al. 2012).

It has been shown that sequential patterns are not sufficiently informative in several application fields such as network alarm analysis (Mannila et al. 1997) or the analysis of human activity (Mannila et al. 1997). Therefore, the chronicle pattern model, an extension of sequential patterns, was introduced in Dousson et al. (1993). Chronicles are rich in information because they add the exact time interval in which a sequence’s events occur. However, this richness raises several drawbacks such as memory consumption and execution time. Dousson and Duong (1999) laid the foundation of what has later become known as chronicle mining. They proposed the first algorithm for chronicle extraction from journal logs of telecommunication alarms. Unfortunately, it has been pointed out that this algorithm, known as Frequency Analyzer for Chronicle Extraction (FACE), is not complete (i.e., it does not generate all patterns).

Cram et al. (2011) introduced the HCDA algorithm to mine the complete set of chronicles. Frequent chronicles of size 2 are mined, and those with the same items are grouped in a tree: chronicles with the largest intervals are located at the root and the tightest are placed in the leaves. The CCP-Miner algorithm (Huang et al. 2012) is an extension of HCDA that searches for chronicles in a set of sequences. It has been applied to extract sequences corresponding to patient pathways in a hospital center. Dauxais et al. (2017) proposed a new approach to extract discriminant chronicles in the context of pharmaco-epidemiology. Their DCM algorithm mines chronicles in a labelled sequential data set that are more representative of a single phenomenon. Finally, Sellami et al. (2018) introduced a new approach, known as CPM, to mine chronicles from a set of sequential patterns extracted using the CloSpan algorithm (Yan et al. 2003).

Chronicle mining: the basic notions

Extracting frequent chronicles requires discovering sequential patterns taking into consideration temporal information, which is the time of occurrence of the events in the sequences. In this section, we introduce all definitions of the concepts necessary for the task of chronicle mining.

Definition 1

(Event) According to Cram et al. (2011), an event is a couple \((e, t)\) where \(e \in {\mathbb {E}}\) is the type of the event and \(t \in {\mathbb {T}}\) is its time.

Such timestamped events, listed in their order of occurrence, form a sequence.

Definition 2

(Sequence) Let \({\mathbb {E}}\) be a set of event types, and \({\mathbb {T}}\) a time domain such that \({\mathbb {T}} \subseteq {\mathbb {R}}\). \({\mathbb {E}}\) is assumed totally ordered and is denoted \(\le _{{\mathbb {E}}}\). A sequence is a couple \(\langle SID, \langle (e_{1}, t_{1}), (e_{2}, t_{2}),\ldots , (e_{n}, t_{n}) \rangle \rangle \) such that SID is the index of the sequence and \(\langle (e_{1}, t_{1}),(e_{2}, t_{2}),\ldots , (e_{n}, t_{n}) \rangle \) is a sequence of events. For all \(i,j \in [1,n], i< j \Rightarrow t_{i} \le t_{j}\). If \(t_{i}=t_{j}\) then \(e_{i} <_{{\mathbb {E}}} e_{j}\).

The appearance of timestamped events in a sequence allows us to define temporal constraints between them.

Definition 3

(Temporal constraint) A time constraint is a quadruplet \((e_{1},e_{2},t^{-},t^{+})\), denoted \(e_{1}[t^{-},t^{+}]e_{2}\), where \(e_{1},e_{2}\in {\mathbb {E}}\), \(e_{1}\le _{{\mathbb {E}}}e_{2}\) and \(t^{-},t^{+}\in {\mathbb {T}}\).

A time constraint \(e_{1}[t^{-},t^{+}]e_{2}\) is said to be satisfied by a couple of events \(((e,t),(e',t'))\), \(e\le _{{\mathbb {E}}}e'\), iff \(e = e_{1}\), \(e' = e_{2}\) and \(t'-t \in [t^{-},t^{+}]\).

We say that \(e_{1}[a,b]e_{2} \subseteq e'_{1}[a',b']e'_{2}\) iff \([a,b] \subseteq [a',b']\).
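To make Definitions 1 and 3 concrete, here is a minimal Java sketch of an event and of the satisfaction test of a time constraint (an illustration with our own naming, not the authors’ implementation):

    // Minimal sketch of Definitions 1 and 3 (illustrative, not the authors' code)
    record Event(String type, long time) {}

    record TimeConstraint(String e1, String e2, long tMinus, long tPlus) {
        // ((e,t),(e',t')) satisfies e1[t-,t+]e2 iff the types match
        // and the gap t'-t lies in [t-,t+]
        boolean isSatisfiedBy(Event a, Event b) {
            long gap = b.time() - a.time();
            return a.type().equals(e1) && b.type().equals(e2)
                    && gap >= tMinus && gap <= tPlus;
        }
    }

For instance, the constraint A[1, 5]B of Fig. 1 is satisfied by the couple ((A, 2), (B, 5)) since \(5-2=3 \in [1,5]\).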

The extraction of temporal constraints between the events of a sequence leads us to define the concept of chronicles (Dousson and Duong 1999).

Definition 4

(Chronicle) A chronicle is a pair \({\mathcal {C}}=({\mathcal {E}},{\mathcal {T}})\) such that:

  1.

    \({\mathcal {E}} =\{e_{1}\ldots e_{n}\}\), where \(\forall i, e_{i}\in {\mathbb {E}}\) and \(e_{i}\le _{{\mathbb {E}}}e_{i+1}\),

  2.

    \({\mathcal {T}}=\{t_{ij}\}_{1 \le i < j \le |{\mathcal {E}}|}\) is a set of temporal constraints on \({\mathcal {E}}\) such that for all pairs (i, j) satisfying \(i < j\), \(t_{ij}\) is denoted by \(e_{i}[t_{ij}^{-},t_{ij}^{+}] e_{j}\).

\({\mathcal {E}}\) is called the episode of \({\mathcal {C}}\), following the definition of episode discovery in sequences (Mannila et al. 1997).

The relevance of a chronicle is based essentially on the value of its support. The support of a chronicle refers to the number of its occurrences in a sequence. It can therefore be formalized by the definition below.

Definition 5

(Chronicle support) An occurrence of a chronicle \({\mathcal {C}}\) in a sequence S is a set \((e_{1},t_{1}),\ldots ,(e_{n},t_{n})\) of events of the sequence S that satisfies all temporal constraints defined in \({\mathcal {C}}\). The support of a chronicle \({\mathcal {C}}\) in the sequence S is the number of its occurrences in S.
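A naive way to count these occurrences is a backtracking enumeration over the episode. The sketch below assumes, as holds for the chronicles mined in this paper, that occurrences follow the episode order in the sequence; it is an illustration, not the paper’s algorithm:

    // Count the occurrences of a chronicle (episode + constraint bounds) in a
    // timestamped sequence, per Definition 5. lo[i][j] and hi[i][j] hold the
    // bounds of the constraint e_i[lo,hi]e_j for i < j.
    final class ChronicleSupport {
        static int support(String[] episode, long[][] lo, long[][] hi,
                           String[] ev, long[] ts) {
            return count(0, new int[episode.length], episode, lo, hi, ev, ts);
        }

        static int count(int k, int[] pick, String[] episode,
                         long[][] lo, long[][] hi, String[] ev, long[] ts) {
            if (k == episode.length) return 1;        // all episode events matched
            int total = 0;
            for (int p = (k == 0 ? 0 : pick[k - 1] + 1); p < ev.length; p++) {
                if (!ev[p].equals(episode[k])) continue;
                boolean ok = true;
                for (int i = 0; i < k && ok; i++) {   // check constraints with already picked events
                    long gap = ts[p] - ts[pick[i]];
                    ok = gap >= lo[i][k] && gap <= hi[i][k];
                }
                if (!ok) continue;
                pick[k] = p;
                total += count(k + 1, pick, episode, lo, hi, ev, ts);
            }
            return total;
        }
    }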

Example 1

Let us illustrate all these basic definitions. Consider a sequence S of three events \(\langle A, B, C \rangle \).

The time constraints that describe the pattern {A, B, C} are A[2,5]B, B[1,4]C and A[6,7]C.

After the generation of temporal constraints, these events can be represented as a graph, as shown in Fig. 2.

Fig. 2 Example of a chronicle


Chronicle mining for predictive maintenance

In this work, we seek to develop an approach to detect machine anomalies in advance. Previous works such as Carrault et al. (2003), Fradkin and Mörchen (2015), Vautier et al. (2005) and Huang et al. (2012) treated the extraction of frequent chronicles, but none placed it in the context of predictive maintenance. Our contribution is developed to solve this problem and aims to answer the following question: how can we use temporal constraints between events, and therefore chronicles, to predict anomalies before they occur?

Approach overview

The aim of our approach is to predict anomalies of machines in an industrial context. Our goal is not only to predict the failure, but also the time interval of the occurrence of this failure. For this purpose, we rely on mining the most frequent chronicles describing the events that lead to a machine failure. Our interest in this kind of temporal pattern lies not only in predicting an event, but especially in the time interval in which that event will occur (in our case a machine failure).

Like any knowledge discovery process, our approach starts with a preprocessing step, a mining step and a third step for the interpretation of extracted knowledge. The overall approach is described in Fig. 3.

Fig. 3 The different steps describing our approach

The data preprocessing step

Chronicle mining algorithms handle data that are discrete and sequential, as stated in Definition 2. However, the data generated by industrial machines are not necessarily in that format: raw industrial data are often continuous and not sequential in the sense of Definition 2. Consequently, we discretize continuous values to obtain nominal ones, i.e., the events. Then, we use sequentialization to transform the data into a set of sequences of (event, instant) pairs, where each sequence ends with the failure event.
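As a hedged illustration of these two operations, the sketch below uses equal-width binning (the paper does not prescribe a specific discretization method; the attribute names and bin scheme are assumptions) and produces one sequence of (event, instant) pairs ending with the failure event:

    import java.util.*;

    record Event(String type, long time) {}

    final class Preprocess {
        // Equal-width discretization of one continuous reading into a nominal
        // event type; the binning scheme is an assumption for illustration.
        static String discretize(String attr, double v, double min, double max, int bins) {
            int b = Math.min(bins - 1, (int) ((v - min) / (max - min) * bins));
            return attr + "_bin" + b;   // e.g. "pressure_bin3" becomes an event type
        }

        // Sequentialization: one (event, instant) pair per reading, with the
        // sequence closed by the failure event as required by our preprocessing.
        static List<Event> toSequence(String attr, double[] readings, long[] instants,
                                      double min, double max, int bins, long failureInstant) {
            List<Event> seq = new ArrayList<>();
            for (int i = 0; i < readings.length; i++)
                seq.add(new Event(discretize(attr, readings[i], min, max, bins), instants[i]));
            seq.add(new Event("FAILURE", failureInstant));
            return seq;
        }
    }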

Furthermore, the number of measures in an industrial data set can be tremendous. Analysing such a huge number of data dimensions is costly on the one hand and useless on the other, since not all feature attributes are necessarily relevant to predicting the failure event. Therefore, before the discretization and sequentialization steps, we apply a feature selection method that computes the attributes most relevant to predicting the breakdown event.

At this point, the resulting data set is ready to mine.

The failure chronicle mining step

In this step, we aim at discovering chronicles that represent breakdowns. This type of chronicle, called a failure chronicle, is introduced in Definition 6.

Definition 6

(Failure chronicle) Assume a chronicle \({\mathcal {C}}_F=({\mathcal {E}},{\mathcal {T}})\). We say that \({\mathcal {C}}_F\) is a failure chronicle if and only if the events that describe it are ordered according to their occurrence in the sequence and the chronicle ends with the event that represents the failure, i.e. for \({\mathcal {E}} =\{e_{1} \cdots e_{n}\vert e_i \le _{{\mathbb {E}}} e_{i+1}, i \in [1,n-1]\}\), \(e_{n}\) is the failure event.

To discover the frequent failure chronicles, we proceed in two steps, namely:

  1.

    Extraction of the frequent closed sequential patterns from our set of sequences. We recall that in our preprocessed data set, all sequences end with a breakdown event. We chose in this step to extract the closed frequent sequential patterns in order to efficiently produce a minimal set of patterns that describe our sequences. Of the two methods presented in the “Related works” section, namely CloSpan and ClaSP, we retained the second because of its confirmed efficiency (Antonio et al. 2013).

  2.

    Extraction of the frequent chronicles. In this step, we scan the data set again to extract the time constraints related to the sequential patterns mined in the previous step. This operation is performed by a new chronicle mining algorithm we introduce in the “Clasp-CPM” section.

The failure prediction step

The generated failure chronicles are a set of temporal patterns that describe the most frequent sequences of events leading to a machine failure. Chronicles provide not only the order of occurrence of those events, but also the time intervals in which they occur. Therefore, when a new sequence of timestamped events arrives, we can predict whether it leads to a breakdown, as well as the time interval in which the breakdown will probably happen, by comparing it to the set of mined frequent failure chronicles. This step is detailed in the “Failure detection with chronicles” section.

Clasp-CPM

The first implementation of chronicle mining for predictive maintenance (Sellami et al. 2018) handled only a few cases of time constraint extraction. It depended on the length of the patterns and treated them on a case-by-case basis. Moreover, chronicle extraction needed two time constraint extraction steps: one for the events in the patterns, and another to extract constraints between regular events and the breakdown event. To overcome these two major setbacks, two notions are used: subsequence graphs and suffix data sets.

First, we introduce a sample data set which will be used for examples in this section.

Table 2 Sample data set

Definition 7

(Closed Frequent sequential patterns) Let D be a sequence data set, i.e. \(D = \left\{ s_i \right\} _{i=1}^N\) where \(\forall i \in \llbracket 1, N\rrbracket , s_i\) is a sequence.

Let \(\mathcal {FS}\) be the set of frequent sequential patterns for a given minimum support minsupp, i.e.

$$\begin{aligned} \mathcal {FS}:= \left\{ s \vert s \subseteq s_i \in D \wedge {{\,\mathrm{supp}\,}}(s) \ge minsupp \right\} \end{aligned}$$
Table 3 Sample suffix database: P is the failure event, s-concatenated at the end of each sequence

Then, we define \(\mathcal {CS}\), the set of frequent closed sequences as:

$$\begin{aligned} \mathcal {CS}:= \left\{ s \vert s \in \mathcal {FS}\wedge (\not \exists \beta \in \mathcal {FS}\vert s \subseteq \beta \wedge {{\,\mathrm{supp}\,}}(s) = {{\,\mathrm{supp}\,}}(\beta )) \right\} \end{aligned}$$

One can easily notice that \(\mathcal {CS}\) is a subset of \(\mathcal {FS}\).

Example 2

(Closed frequent sequences examples) Referring to Table 2, with a relative minsupp of 0.8, the sequences \(\langle A \rangle \), \(\langle A, A, B \rangle \) and \(\langle A, A, A, B \rangle \) are frequent (they all appear in at least 4 of the 5 sequences). However, \(\langle A \rangle \) is not closed, as it has a relative support of 1 and is included in \(\langle A, A, B \rangle \), which also has a relative support of 1. On the contrary, \(\langle A, A, B \rangle \) is a closed frequent sequence: even though it is included in \(\langle A, A, A, B \rangle \), the support of the latter is only 0.8.

Thus, by keeping only closed patterns, we filter out simpler, less informative patterns while retaining those that are strictly more frequent than the more complex patterns containing them.
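The following naive sketch applies Definition 7 directly, with events simplified to single items; real miners such as ClaSP avoid this quadratic post-processing, so this is only an illustration:

    import java.util.*;

    record FreqSeq(List<String> events, int support) {}

    final class ClosedFilter {
        // True iff s is a (not necessarily contiguous) subsequence of t.
        static boolean subsequence(List<String> s, List<String> t) {
            int i = 0;
            for (String x : t)
                if (i < s.size() && s.get(i).equals(x)) i++;
            return i == s.size();
        }

        // Keep a frequent sequence only if no strict super-sequence in fs has
        // the same support (Definition 7).
        static List<FreqSeq> closedOnly(List<FreqSeq> fs) {
            List<FreqSeq> cs = new ArrayList<>();
            for (FreqSeq s : fs) {
                boolean closed = true;
                for (FreqSeq b : fs)
                    if (s != b && subsequence(s.events(), b.events())
                            && b.support() == s.support())
                        closed = false;
                if (closed) cs.add(s);
            }
            return cs;
        }
    }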

Definition 8

(Concatenation operators) Let \(s = \langle \alpha _1, \ldots , \alpha _p \rangle \) and \(s' = \langle \beta _1, \ldots , \beta _l \rangle \), where \(\alpha _i\) and \(\beta _i\) are sets. Let \(\mathbin {\Diamond }_{\bullet }\) be the concatenation operator. We define two kinds of concatenation (Yin et al. 2012):

  • \(\mathbin {\Diamond }_i\): \(s \mathbin {\Diamond }_i s' = \langle \alpha _1, \ldots , \alpha _{p-1}, (\alpha _p \bigcup \beta _1), \beta _2, \ldots , \beta _l \rangle \). Here, the first element of \(s'\) merges with the last element of s, and the rest of the second sequence is then appended.

  • \(\mathbin {\Diamond }_s\): \(s \mathbin {\Diamond }_s s' = \langle \alpha _1, \ldots , \alpha _{p-1}, \alpha _p, \beta _1, \beta _2, \ldots , \beta _l \rangle \). This is the usual concatenation.
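Both operators can be sketched in a few lines of Java, representing a sequence as a list of itemsets (an illustration of Definition 8, not library code):

    import java.util.*;

    final class Concat {
        // s-concatenation: plain append.
        static List<Set<String>> sConcat(List<Set<String>> s, List<Set<String>> t) {
            List<Set<String>> out = new ArrayList<>(s);
            out.addAll(t);
            return out;
        }

        // i-concatenation: merge the last itemset of s with the first of t,
        // then append the rest of t.
        static List<Set<String>> iConcat(List<Set<String>> s, List<Set<String>> t) {
            List<Set<String>> out = new ArrayList<>(s.subList(0, s.size() - 1));
            Set<String> merged = new TreeSet<>(s.get(s.size() - 1));
            merged.addAll(t.get(0));
            out.add(merged);
            out.addAll(t.subList(1, t.size()));
            return out;
        }
    }

For instance, \(\langle \{A\},\{B\} \rangle \mathbin {\Diamond }_i \langle \{C\} \rangle \) gives \(\langle \{A\},\{B,C\} \rangle \), while \(\mathbin {\Diamond }_s\) gives \(\langle \{A\},\{B\},\{C\} \rangle \).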

Definition 9

(Suffix database) Let \(\omega \) be a sequence. \(D_\omega \) is said to be the suffix database associated to D if:

$$\begin{aligned} \forall s \in D_\omega , \exists s' \in D, s = s' \mathbin {\Diamond }_s \omega \end{aligned}$$

that is if \(\omega \) is a suffix for all sequences in \(D_\omega \). We will note \(\mathcal {FS}_\omega \) and \(\mathcal {CS}_\omega \) the set of frequent sequences and the set of closed frequent sequences associated to the suffix database \(D_\omega \), respectively.

Remark 1

  • We have defined the suffix database with the s-concatenation operator as i-concatenation does not preserve the closeness property we need. See B for further details.

  • In our application, we will use a sequence of length 1 for \(\omega \).

  • \(\# D = \# D_\omega \), i.e. D and \(D_\omega \) have both the same number of sequences.

  • \(\mathcal {FS}\subseteq \mathcal {FS}_\omega \), more precisely \(\mathcal {FS}_\omega = \mathcal {FS}\cup \{ s \mathbin {\Diamond }_s \omega \vert s \in \mathcal {FS}\}\). This tells us that if a sequence is frequent in D, then it is also frequent in \(D_\omega \), but the converse does not hold.

  • \(s_\omega \in \mathcal {FS}_\omega \wedge s_\omega = s \mathbin {\Diamond }_s \omega \Rightarrow s \in \mathcal {FS}\).

Example 3

(Suffix database) In Table 3, we have built a suffix database by s-appending the failure itemset {P} at the end of each sequence. One can choose the timestamp to be different from that of the last element, but the added time should be constant to keep the later analysis coherent.
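The construction of Table 3 can be sketched as follows (an illustration of Definition 9; the constant offset delta is an assumption, chosen as discussed above):

    import java.util.*;

    record Event(String type, long time) {}

    final class SuffixDb {
        // Build D_omega by s-appending the failure event at a constant delta
        // after each sequence's last event.
        static List<List<Event>> suffixDatabase(List<List<Event>> db,
                                                String failureType, long delta) {
            List<List<Event>> dbOmega = new ArrayList<>();
            for (List<Event> s : db) {
                List<Event> sw = new ArrayList<>(s);
                sw.add(new Event(failureType, s.get(s.size() - 1).time() + delta));
                dbOmega.add(sw);
            }
            return dbOmega;
        }
    }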

Proposition 1

$$\begin{aligned} \mathcal {CS}_\omega = \left\{ s \mathbin {\Diamond }_s \omega \vert s \in \mathcal {CS}\right\} \end{aligned}$$

Lemma 1

Let \(\omega \) be the sequence associated to the suffix database \(D_\omega \) built from the database D. Then

$$\begin{aligned} \forall s \in \mathcal {FS}, \exists s' \in \mathcal {FS}_\omega , s' = s \mathbin {\Diamond }_s \omega : {{\,\mathrm{supp}\,}}(s) = {{\,\mathrm{supp}\,}}(s') \end{aligned}$$

Proof

The reader may refer to 8 for the detailed proof. \(\square \)

This proposition is useful to prove that every closed frequent sequence of a suffix database ends with the sequence \(\omega \). We use this fact to improve the previously given algorithm, as there is now only one ExtractTimeConstraints procedure and no union of chronicles at the end.

Proof

Reader may refer to A for the detailed proof of Proposition 1. \(\square \)

Subsequence graph

A subsequence graph is a data structure built to represent all occurrences of a sequence within another sequence. It is designed in the spirit of the Knuth–Morris–Pratt word-matching algorithm.

Definition 10

(Subsequence graph) Let \(s = \langle s_1, \ldots , s_n \rangle \) be a sequence and \(p = \langle p_1, \ldots , p_m \rangle \) a pattern. A subsequence graph is a couple (X, U) where X is the set of vertices and U the set of (directed) edges. Vertices \(v_{i, j}\) correspond to elements of the pattern \(p_i\), indexed by their position j in s. There is an edge from \(v_{i, j}\) to \(v_{k, l}\) iff:

  • \(k = i\) (both pattern elements are the same) and \(v_{k,l}\), \(l > j\), is the first next occurrence (of such pattern element) in s.

  • \(k = i + 1\) and there is no edge from \(v_{i,j}\) to \(v_{k, m}\) with \(m < l\) (\(v_{k,l}\) is the first occurrence of \(p_{i+1}\) after \(v_{i, j}\)).


Example 4

(Graph construction) Let us take the pattern \(\langle A, B, B, C \rangle \) and the sequence \(\langle B, A, C, B, B, A, C, A, B, C, B, A, B, C \rangle \). Table 4 includes the enumerated sequence:

Table 4 Sequence enumeration
  1.

    First, we initialize an empty list of lists, such that the first list corresponds to the occurrences of A, the second to the occurrences of B, the third to the occurrences of B following a B, and the fourth to the occurrences of C.

  2.

    Next, in the first step, B is placed in the second list. Then, A is placed in the first list. In step three, C is placed in the fourth list (Fig. 4a). Notice that there are no links yet: although these elements appear individually in the pattern, they have not yet occurred in the order of the pattern.

  3.

    During the fourth iteration, we find a second B. We proceed in four steps: B is placed in the third list; we add links to this B from the second-list elements that have no link to a third-list element yet; then we add B to the second list and add links to it from the first-list elements that have no link to a second-list element (Fig. 4b).

  4.

    We continue the process. During step seven, the graph allows us to extract the first occurrence of the pattern in the sequence: \(\langle (A, 2), (B, 4), (B, 5), (C, 7)\rangle \) (Fig. 5a).

  5.

    Last, when the whole sequence has been treated, one can perform pruning steps to remove the nodes that have no links or that do not allow a whole pattern to be extracted (Fig. 5b).

Fig. 4 Subsequence graph: third and fourth steps

Fig. 5 Subsequence graph: final steps

We devised this structure to extract all the information needed, namely timestamps and precedence relations, for each element of a pattern in a sequence. It is economical in space, as each element is only referenced.
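The construction can be sketched as follows. This simplified Java sketch builds only the successor edges (the \(k = i + 1\) case of Definition 10) needed to walk occurrences, links each node to its first successor only, and treats events as single items; the full definition also adds the same-element edges (\(k = i\)) required to enumerate all occurrences:

    import java.util.*;

    final class SubsequenceGraphSketch {
        static final class Node {
            final int patternPos, seqPos;
            final List<Node> next = new ArrayList<>(); // edges to the next pattern position
            Node(int patternPos, int seqPos) {
                this.patternPos = patternPos;
                this.seqPos = seqPos;
            }
        }

        // One list of nodes per pattern position, as in Example 4.
        static List<List<Node>> build(String[] pattern, String[] sequence) {
            List<List<Node>> lists = new ArrayList<>();
            for (int i = 0; i < pattern.length; i++) lists.add(new ArrayList<>());
            for (int j = 0; j < sequence.length; j++) {
                // Scan pattern positions from last to first so that a repeated
                // event (the two B's of <A,B,B,C>) extends the deepest list
                // first, as in the fourth iteration of Example 4.
                for (int i = pattern.length - 1; i >= 0; i--) {
                    if (!pattern[i].equals(sequence[j])) continue;
                    if (i > 0 && lists.get(i - 1).isEmpty()) continue; // eager pruning: no predecessor yet
                    Node node = new Node(i, j);
                    if (i > 0)
                        for (Node prev : lists.get(i - 1))
                            if (prev.next.isEmpty()) // "without links to a next-list element"
                                prev.next.add(node);
                    lists.get(i).add(node);
                }
            }
            return lists;
        }
    }

On the pattern and sequence of Example 4, following the next links from the first list recovers the occurrence \(\langle (A, 2), (B, 4), (B, 5), (C, 7)\rangle \) extracted at step seven.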

Multi-threading

The first step of CPM is frequent closed sequence mining, which we found hard (if not impossible) to parallelize. Thus, we enhanced the following steps, time constraint extraction and chronicle building, with multi-threading.

Using a Producer–Consumer approach, we defined the following schema:

  1.

    Initialize a certain number of workers, a pool of extracted closed patterns, a pool of time constraint sets and a pool of chronicles.

  2.

    (Time constraint extraction) For half the workers:

    (a)

      Take a pattern from the pool.

    (b)

      Extract the time constraints associated with it, i.e. build a subsequence graph for each sequence in which the pattern is found and use it to extract the constraints.

    (c)

      Put the time constraints in the corresponding pool.

    (d)

      Repeat until the pattern pool is empty.

  3.

    (Chronicle building) For half the workers:

    • Take the constraints from the pool.

    • Build a chronicle using the time constraints.

    • Put the chronicle in the corresponding pool.

    • Repeat until the constraint pool is empty.

An improvement would be to switch workers’ tasks when one of the pools is empty, e.g. if the constraint pool is empty, all workers extract time constraints, and vice versa.

There are no race conditions when accessing the data set, as the sequences are only read, never modified. Our implementation uses Java blocking queues (LinkedBlockingQueue) for the pools. Here again, each element is only a reference, so we do not significantly increase memory usage beyond the extraction step.
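A minimal sketch of this schema, using Java’s blocking queues as in our implementation (Strings stand in for the real pattern, constraint and chronicle objects, and the two placeholder operations stand for the subsequence-graph extraction and the chronicle assembly):

    import java.util.concurrent.*;

    class ChroniclePipelineSketch {
        final BlockingQueue<String> patternPool = new LinkedBlockingQueue<>();
        final BlockingQueue<String> constraintPool = new LinkedBlockingQueue<>();
        final BlockingQueue<String> chroniclePool = new LinkedBlockingQueue<>();

        void run(int workers) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(workers);
            for (int w = 0; w < workers / 2; w++)            // time-constraint extraction
                pool.submit(() -> {
                    String p;
                    while ((p = patternPool.poll()) != null) // repeat until the pool is empty
                        constraintPool.add("constraints-of-" + p); // placeholder for graph-based extraction
                });
            for (int w = 0; w < workers - workers / 2; w++)  // chronicle building
                pool.submit((Callable<Void>) () -> {
                    String tc;
                    while ((tc = constraintPool.poll(1, TimeUnit.SECONDS)) != null)
                        chroniclePool.add("chronicle-from-" + tc); // placeholder for chronicle assembly
                    return null;
                });
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }

The timed poll on the constraint pool is a simple way for the building workers to terminate once the extracting workers stop producing.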

Failure detection with chronicles

Clasp-CPM mines failure chronicles; semantically, it mines events that indicate failures. This kind of information is useful for monitoring machines and predicting failures. Therefore, we use these mined chronicles for classification and prediction. We introduce the algorithm FADE, shown in Algorithm 2. It takes the set of extracted failure chronicles as input and tries to match failure chronicles to a single sequence. To define the chronicle-sequence match, we introduce the following definitions.

Definition 11

(Chronicle cover) Assume a sequence \(S= \langle (e_{1}, t_{1}), (e_{2}, t_{2}),\ldots , (e_{n}, t_{n}) \rangle \) and a failure chronicle c. We say that c covers the sequence S if and only if the events represented by the chronicle belong to the sequence and the time intervals between these events in the sequence satisfy the temporal constraints of the chronicle, i.e.,

$$\begin{aligned}&{\mathcal {C}} < \cdot S \Leftrightarrow \forall e_{i}[t^{-},t^{+}]e_{j} \in {\mathcal {C}},\\&\exists ((e,t),(e',t')) \in S \wedge e=e_{i}, e' = e_{j} \wedge t'-t \in [t^{-},t^{+}]. \end{aligned}$$

Definition 12

(Supported failure chronicle) Assume a sequence S and the set of failure chronicles covering it, i.e., \({\mathbb {C}}=\{c\in {\mathcal {C}}, c <\cdot S\}\). We say that \({\mathcal {C}}_{f}\) is the supported failure chronicle if and only if it has the maximal support among all chronicles of the set \({\mathbb {C}}\), i.e., \({\mathcal {C}}_{f}= c \in {\mathbb {C}} \wedge \not \exists c'\in {\mathbb {C}}, supp(c')>supp(c)\).

Let us explain this principle with the following example.

Example 5

Assume the following chronicle: A[0,3]B, B[0,7]Failure, and the three sequences depicted in Table 5.

Table 5 Sequences’ data set

For the first sequence, the duration between events A and B is 2 instants, which belongs to [0,3], and the duration between B and the failure is 2 instants, which belongs to [0,7]. The occurrence time of the failure thus lies in the interval described by the chronicle, so this sequence is classified correctly; likewise for the second sequence. For the third sequence, however, the interval between A and B is 4, which does not belong to [0,3], and the interval between B and the failure is 11, which does not belong to [0,7]. The failure appears in this sequence but is not validated by the chronicle, so it will not be predicted and the sequence will be misclassified. Suppose now that we have two chronicles: \({\mathcal {C}}_{1}=\hbox {A[0,3]B}\), B[0,7]Failure and \({\mathcal {C}}_{2}=\hbox {C[0,9]Failure}\). The second sequence in Table 5, {(A,5) (C,6) (B,7) (Failure, 9)}, is covered by both chronicles, so to obtain a relevant prediction we must choose one of the two. The chronicle with the highest support in the data set is kept.

In this example, the first chronicle covers the first two sequences, so \(supp({\mathcal {C}}_{1}) = 2\), while chronicle \({\mathcal {C}}_{2}\) covers only the second sequence, so \(supp({\mathcal {C}}_{2}) = 1\); consequently, the supported failure chronicle for this sequence is \({\mathcal {C}}_{1}\).

Algorithm 2 FADE

Definitions 11 and 12 are implemented in Algorithm 2. First, the algorithm uses the coverage procedure to check whether the processed sequence matches at least one chronicle from the set. This procedure takes as parameters a sequence and a set of chronicles. The process ensures that, for a single sequence, only the covering chronicle with the highest support is retained.
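The cover test of Definition 11 and the selection of Definition 12 can be sketched as follows (a hedged illustration of what Algorithm 2 computes, not its actual pseudocode):

    import java.util.*;

    record Event(String type, long time) {}
    record TimeConstraint(String e1, String e2, long lo, long hi) {}
    record Chronicle(List<TimeConstraint> constraints, int support) {}

    final class Fade {
        // Definition 11: every constraint of c must be satisfied by some couple
        // of events of the sequence s.
        static boolean covers(Chronicle c, List<Event> s) {
            for (TimeConstraint tc : c.constraints()) {
                boolean found = false;
                for (Event a : s)
                    for (Event b : s)
                        if (a.type().equals(tc.e1()) && b.type().equals(tc.e2())
                                && b.time() - a.time() >= tc.lo()
                                && b.time() - a.time() <= tc.hi())
                            found = true;
                if (!found) return false;
            }
            return true;
        }

        // Definition 12: among the covering chronicles, keep the one with maximal support.
        static Optional<Chronicle> supportedChronicle(List<Chronicle> all, List<Event> s) {
            return all.stream().filter(c -> covers(c, s))
                      .max(Comparator.comparingInt(Chronicle::support));
        }
    }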

Experimentation and results

To validate our approach, we performed a large set of experiments. Two aspects were evaluated: the performance in terms of resource cost, and the prediction quality for machine failures. As the costly part of our approach is mining the frequent chronicles, we evaluated Clasp-CPM on both synthetic data and a real industrial data set. We then evaluated the quality of failure prediction, i.e., the FADE algorithm, on the real data set only. To this end, we used cross validation (Stone 1974), as it is the most widely used protocol in the literature. We compared FADE with state-of-the-art chronicle mining based approaches. Our approach predicts not only the failure, but also its time interval. Classical classification approaches such as K-Nearest Neighbours or Long Short Term Memory (LSTM) networks (Malhotra et al. 2015), to cite a few, can exploit neither the time dimension nor mined patterns for training, which is why they are left out of this experimental study. All experiments were performed on a personal computer equipped with a 2.5 GHz processor and 16 GB main memory under the Microsoft Windows OS.

Experimental data sets

As mentioned above, synthetic data sets were generated to evaluate the scalability of Clasp-CPM according to several parameters we identified: the number of sequences in the data set (denoted DB), the size of a single sequence (denoted \(seq\_size\)), the maximum number of items (also called the dictionary size, denoted \(dic\_size\)) and the maximum number of items per event (denoted \(max\_items/event\)). Our generator randomly produces a set of time-stamped sequences according to the parameters above. The experiments presented in the next subsection consist in comparing the performance of Clasp-CPM with existing algorithms over several generated data sets in which these parameters were varied.

In addition, our experiments were performed on a real industrial data set, the SECOM data set (McCann et al. 2008). It consists of measurement data captured by sensors installed on the machines of a manufacturing production line. Each row contains a set of measures and signals produced by the machine, its status (1 for normal running and \(-1\) for a failure) and a time-stamp (the instant at which the machine measures and status were observed). The SECOM data set includes 590 attributes and 1567 records. These data are mined to monitor the semi-conductor manufacturing process. However, to benefit from the concept of chronicles, the data have to be pre-processed, hence the discretization and sequentialization performed on the raw SECOM data. Feature selection is also applied to reduce the dimensionality of the data and to keep only the relevant measures, those that most affect the machine status.

Evaluation of the chronicle mining phase

In this first part, we compare Clasp-CPM with other state-of-the-art algorithms using synthetic data sets: a previous brute-force algorithm called CPM (Sellami et al. 2018), as well as DCM (Dauxais et al. 2015) and FACE (Dousson and Duong 1999). In our experiments, we used different synthetic data sets to test the effect of several parameters on the results. Table 6 shows the number of chronicles generated by the aforementioned algorithms for support thresholds ranging from 0.4 to 1 on three different synthetic data sets in which \(seq\_size\), \(dic\_size\) and \(max\_items/event\) vary.

Table 6 Number of chronicles of Clasp-CPM, CPM, DCM and FACE w.r.t data set size, sequence’s size, dictionary size, maximum items/event and support threshold

As ClaSP and CloSpan extract the same number of closed patterns (Antonio et al. 2013), the experiments show that Clasp-CPM and CPM generate the same number of frequent chronicles. This number depends on the parameters used when generating the data set and, unsurprisingly, increases considerably as the parameters’ values increase. The same experiments run on the FACE algorithm (Dousson and Duong 1999) show that it generates the highest number of chronicles, since it is based on the Apriori algorithm: Apriori extracts all the frequent patterns, not only the closed ones. In our approach, we bypassed the Apriori-like methods to avoid redundant chronicles and to optimise the performance of our chronicle mining step. On the other hand, DCM (Dauxais et al. 2015) was designed to consider discriminancy in data, so the comparison with our algorithm is not straightforward.

With DCM, extracting chronicles was only possible for threshold values greater than 0.7, as shown in Table 6. One hypothesis for these results is, as already mentioned, the use of the discriminance constraint. This parameter is used in the epidemiology algorithm to distinguish two populations (positive and negative) and to extract patterns that are frequently present in the positive base, which is not really the case in our approach since we only process a single data set (population) at a time.

Table 6 shows that the maximum size of a sequence is the parameter that most affects the number of obtained chronicles. On the other hand, increasing the dictionary size and the maximum number of items per event leads to only a small increase in the number of chronicles. This behaviour is explained by the fact that small sequences generate fewer closed patterns: a larger sequence size induces more frequent closed sequential patterns, which increases the number of generated frequent chronicles. This experimental result supports our choice to use an attribute selection method in the pre-processing step of our approach. Indeed, the more attributes (measures) we consider, the longer our sequences are. That is why we consider only relevant attributes, avoiding a huge number of “irrelevant” chronicles that would make our approach more costly without a significant impact on the prediction step.

The results shown in Tables 7 and 8 confirm the effect of sequence size on the performance of the tested algorithms. Indeed, this parameter increases the number of mined patterns, which increases the execution time and memory consumption needed to compute the temporal constraints between the different pairs of events in each chronicle. Varying the parameters DB, \(dic\_size\) and \(max\_items/event\) does not change the performance of the algorithms considerably, in contrast to varying \(seq\_size\). Another main observation is that Clasp-CPM outperforms all its competitors in terms of execution time, especially CPM and FACE (where the comparison is fair in terms of the resulting chronicles). Regarding memory consumption, Clasp-CPM clearly uses less memory, especially when the support threshold exceeds 0.7. We note that FACE consumes more memory than the other algorithms since it generates more chronicles. We also note that, for all approaches, the memory used increases slightly each time the frequency threshold decreases. This is explained by the fact that a lower frequency threshold generates more frequent sequences, so all algorithms need more memory space to store them.

Table 7 Time consumption of Clasp-CPM, CPM, DCM and FACE w.r.t data set size, sequence’s size, dictionary size and threshold
Table 8 Memory consumption of Clasp-CPM, CPM, DCM and FACE w.r.t data set size, sequence’s size, dictionary size and threshold

The performance evaluation was also conducted on the SECOM data set. The interpretations made on the synthetic data sets are confirmed on our real data set, as shown in Figs. 6 and 7.

Tables 9 and 10 show the number of generated chronicles with respect to the variation of the support threshold. As observed in the synthetic data experiments, the number of chronicles extracted by FACE is huge. This is explained by the kind of patterns mined by FACE, which considers all frequent patterns, while Clasp-CPM and CPM consider only the closed frequent ones.

From Table 10, we notice that CPM, Clasp-CPM and FACE generate chronicles with the same maximum size. Clasp-CPM outperforms FACE in the sense that it decreases the number of extracted chronicles as well as the execution time, while still generating the same largest patterns as those extracted by the FACE algorithm.

Choosing a high frequency threshold may be interesting in the sense that it extracts the patterns that are most relevant in the data set, but on the other hand it yields a smaller number of chronicles of small size, which does not help to evaluate the performance of our approach. In addition, in predictive maintenance, we need a sufficient number of chronicles of an interesting size to be able to efficiently predict the failures caused by the occurrence of a long sequence of events.

Evaluation of the prediction phase

To evaluate the quality of prediction, we used four measures. The first is the True Positive Rate of the extracted chronicles. It measures the number of positive sequences that are correctly classified, i.e., the sequences for which there is at least one chronicle allowing the appearance of the failure to be predicted, without taking the temporal constraints into consideration. Indeed, for each sequence, if its events are described by a chronicle, we have correctly classified the sequence’s failure, and the chronicle could have predicted which events caused this breakdown. Otherwise, if no chronicle describes the failure for a given sequence, there is no failure prediction and the sequence is misclassified. The idea is to cast the approach as a classification problem in order to apply the cross-validation method (Stone 1974).

For each value of \(fq_{min}\), the chronicles are extracted from the training sequences. Then, for the test set, we check, for each sequence, whether it is covered by at least one of the extracted chronicles. The number of sequences validated by the chronicles is computed to estimate its percentage with respect to the sequence set. This procedure is repeated 10 times to validate all the sequences of the data set. This is the same principle used to compute the recall rate (Davis and Goadrich 2006), defined as the number of relevant instances found relative to the number of relevant instances in the data set.
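As a hedged sketch of this protocol, the fold loop below mirrors the procedure; the miner argument stands in for Clasp-CPM and the covers argument for the test of Definition 11 (both are assumptions of this illustration):

    import java.util.*;
    import java.util.function.BiFunction;
    import java.util.function.BiPredicate;

    record Event(String type, long time) {}
    record Chronicle(int support) {} // constraint details elided for brevity

    final class Eval {
        static double truePositiveRate(List<List<Event>> sequences,
                BiFunction<List<List<Event>>, Double, List<Chronicle>> miner,
                BiPredicate<Chronicle, List<Event>> covers, double fqMin) {
            int folds = 10, tp = 0, fn = 0;
            for (int f = 0; f < folds; f++) {
                List<List<Event>> train = new ArrayList<>(), test = new ArrayList<>();
                for (int i = 0; i < sequences.size(); i++)
                    (i % folds == f ? test : train).add(sequences.get(i));
                List<Chronicle> cs = miner.apply(train, fqMin); // mining on the training folds
                for (List<Event> s : test)                      // a sequence counts as TP if at
                    if (cs.stream().anyMatch(c -> covers.test(c, s))) tp++; // least one chronicle covers it
                    else fn++;
            }
            return (double) tp / (tp + fn); // Eq. (1)
        }
    }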

Fig. 6 Memory consumption of Clasp-CPM, CPM, DCM and FACE

Fig. 7 Execution time of Clasp-CPM, CPM, DCM and FACE

The True Positive Rate is computed according to this formula:

$$\begin{aligned} \frac{TP}{TP + FN} \end{aligned}$$
(1)
Table 9 Comparison of the three algorithms according to the number of generated chronicles
Table 10 Maximum size of chronicles according to \(fq_{min}\)

where TP (the true positive results) is the number of validated sequences, i.e., those for which we found at least one chronicle that could have predicted the failure, and FN (the false negative results) is the number of sequences for which no chronicle could predict the failure.

Indeed, if there is no chronicle that describes a sequence S, then we cannot classify the sequence as a “failure” and we report the “normal” case. However, all the sequences of our data set lead to a machine failure, which is why we count such a sequence as a false negative.

Example 6

Let’s take the chronicle shown in Fig. 8 and the four sequences depicted in Table 11.

Fig. 8 Example of a chronicle

In this example, the first three sequences are described by the given chronicle since, according to it, an event A followed by a B causes a failure. This is not the case for the fourth sequence, since the event D is not described by the chronicle. So the true positive rate for this example is: \(\dfrac{3}{3 + 1} = 75\%\).

The second measure evaluates the precision of the results while taking the failure time into account, i.e., it estimates the percentage of sequences for which the time constraints are extracted correctly. Indeed, for each sequence, if the moment predicted by the extracted chronicles lies outside the interval in which the failure appears in the sequence, then the chronicle could not capture the temporal constraints of this failure, and the prediction is classified as a false positive. These interpretations allow us to apply the following precision formula:

$$\begin{aligned} \frac{TP}{TP + FP} \end{aligned}$$
(2)

Example 7

With the same data as in Example 6, the first two sequences are classified as TP since the temporal constraints between events belong to those extracted by the chronicle, whereas the third sequence is classified as FP since the events are described by the chronicle but the temporal constraints are not satisfied. So the precision rate is: \(\dfrac{2}{2 + 1} = 66.66\%\).

Table 11 Sequences’ data set

These two previous measures allow us to compute the F-measure as follows:

$$\begin{aligned} \frac{2TP}{2TP + FP + FN} \end{aligned}$$
(3)

Using the same example again, the F-measure equals: \(\dfrac{2\times 2}{2 \times 2 + 1 + 1} = 66.66\%\).

The literature offers many other measures that use the number of true negatives, such as the specificity measure (also called the true negative rate), which measures the proportion of negatives that are correctly identified. In our application case, we cannot apply them, because a sequence is classified as a true negative if the chronicles are able to predict the normal case for a sequence in which there is no failure. This class cannot occur in our case since all the available sequences lead to failures. Furthermore, a chronicle is made to predict a failure, not the normal operation of a machine.

Table 12 True positive rate of FADE and FACE approaches
Table 13 Precision of FADE and FACE approaches
Table 14 F-measure of FADE and FACE approaches
Fig. 9 Precision of results of Clasp-CPM according to the number of selected attributes for \(fq_{min} = 0.8\)

Tables 12, 13 and 14 show the results obtained for the three aforementioned measures. The computed values vary between 80 and 90%, which is encouraging for our approach. We also note that the values decrease as the minimum frequency threshold increases. Indeed, increasing the frequency threshold produces a set of the most relevant patterns, but on the other hand it affects our measures: the number of chronicles decreases, so the test sequences have less chance of being validated by the extracted chronicles. Therefore, we can use small values of the frequency threshold to obtain a better quality of prediction. However, as shown in the previous section, small values of the frequency threshold degrade the performance of the system in terms of running time and memory consumption. That is why we should look for a trade-off between performance and quality, decreasing the frequency threshold until our mining algorithm runs into scaling difficulties.

The same experiments were performed with the FACE algorithm. Note that its TPR values are slightly larger than those found by the FADE algorithm. This is due to the fact that FACE is based on Apriori, so it extracts more frequent chronicles. On the other hand, its precision decreases, since the time constraints computed by FACE are as close as possible to 0, so some chronicles will not be matched. In contrast, Clasp-CPM generates the largest time constraints, so the test sequences are more likely to be validated, which increases the precision of the results.

We performed another experiment to evaluate the impact of feature selection on the precision of the results. As shown in Fig. 9, the number of attributes that leads to an optimal precision is 10 (86.22%). This supports our choice to perform such a pre-processing step before applying Clasp-CPM. With such a small number of attributes, we can produce a “precise” classifier while ensuring a very acceptable performance of our approach, which is affected by the number of attributes considered.

Conclusion

This article extends an existing work (Sellami et al. 2018) in which we dealt with the problem of frequent chronicle mining for predictive maintenance. We have discussed frequent chronicle mining techniques whose purpose is to extract, from frequent sequences, the events that trigger machine failures. This process considers the time constraints between the different events, which allows the prediction of the failure event. In this article we improved our approach: we applied the ClaSP principle (Antonio et al. 2013) to extract closed patterns from which we mine the frequent chronicles using a new algorithm we called Clasp-CPM, implemented in a multi-threaded environment. These improvements have considerably improved the performance of our method, as shown in the experiments. We also improved the evaluation of our approach, especially by using configurable synthetic data sets. The main finding is that performance is greatly affected by the length of the sequences in the mined data set. When dealing with the real data set, whose number of attributes is huge, we applied an attribute selection method to reduce the number of measures to consider. This pre-processing task reduced the length of the sequences, which benefits the performance of the mining phase without affecting the quality of prediction, as shown in the experiments.

In future work, we will focus on the probability of occurrence of a failure within given temporal constraints using specific data mining techniques for uncertain data, which allow the trustworthiness of the data to be evaluated and, subsequently, uncertain data to be used together with chronicle mining techniques. We are also investigating the extension of deep learning algorithms such as LSTM to predict the failure and the time to failure.