
1 Introduction

Predictive maintenance aims to anticipate equipment failures so that corrective maintenance can be scheduled in advance. It is usually performed based on an assessment of the health status of equipment [1, 2]. Thanks to the rapid development of the IoT, massive numbers of sensors are deployed on industrial equipment to monitor its health status. As a key phase of predictive maintenance, anomaly detection technologies give us the ability to detect anomalies from multivariate time-series sensor data or events [3].

In recent years, much research has addressed the problem of anomaly detection/prediction from multivariate time-series sensor data. Anomaly detection within individual variables, referred to as “univariate anomaly” detection, has already been extensively studied [4, 5]. However, it is much more challenging, yet common in real applications, to mine and analyze temporal dependencies among sensor data or “univariate anomaly” events. Doing so opens the possibility of finding new anomaly types or inferring the root cause of an anomaly [6,7,8].

In Sect. 2, a simple example shows the temporal dependencies found among “univariate anomaly” events. In a complex industrial system, an anomaly/failure is rarely isolated. Owing to obscure physical interactions, trivial anomalies propagate among different sensors and devices, and gradually deteriorate into a severe anomaly in some device [9]. Mining such temporal dependencies is valuable, as it can help forecast future anomalies/failures in advance and identify the possible root causes of an observable device anomaly/failure [10].

In this paper, we propose an effective and explainable approach to predict anomalies by mining the temporal dependencies from multi-sensor event sequences. To reach this goal, we first detect “univariate anomaly” events from sensor data and output multi-sensor event sequences. Then, we transform the temporal dependency mining problem into a frequent co-occurrence pattern mining problem. Next, a graph-based anomaly prediction model is built by choosing and connecting the mined temporal dependencies for event prediction. Finally, extensive experiments on a real dataset from a coal power plant show the effectiveness of our approach.

2 Motivation

An anomaly event carries much information about an anomaly, such as its occurrence time, source and type. Here, we use a 4-tuple to depict an anomaly event: e = (timestamp, eventid, sourceid, type), where timestamp is the occurrence time of e; eventid is the unique identifier of e; sourceid is the unique identifier of the source sensor; and type is the type of the anomaly event.

A time-ordered list of events from the same sensor constructs an event sequence \( E_{i} = \left\{ e_{1}, e_{2}, \ldots, e_{m} \right\} \). All the event sequences construct the event space \( \Theta = \left\{ E_{1}, E_{2}, \ldots, E_{n} \right\} \). These events are not isolated from each other; they imply complex temporal dependencies.
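For concreteness, the 4-tuple and the event sequence above can be modeled directly as a small record type. This is an illustrative sketch (the field values and sensor names are hypothetical), not code from the paper:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnomalyEvent:
    """An anomaly event e = (timestamp, eventid, sourceid, type)."""
    timestamp: int  # occurrence time of e (e.g., seconds since epoch)
    eventid: str    # unique identifier of e
    sourceid: str   # unique identifier of the source sensor
    type: str       # type of the anomaly event

# A time-ordered event sequence E_i from one sensor;
# the event space Theta is simply a collection of such sequences.
E_i = [
    AnomalyEvent(1000, "e1", "s42", "H-CF"),
    AnomalyEvent(2080, "e2", "s42", "H-CF"),
]
theta = [E_i]

# the sequence must be time-ordered
assert all(a.timestamp <= b.timestamp for a, b in zip(E_i, E_i[1:]))
```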

Definition 1. Temporal Dependency:

Let \( A = \left\{ E_{1}, E_{2}, \ldots, E_{k} \right\}, 1 \le k < m \), be an event set containing k event sequences, \( B = \left\{ E_{1}^{'}, E_{2}^{'}, \ldots, E_{h}^{'} \right\}, 1 \le h < m \), be another event set containing h event sequences, with \( A \cap B = \varnothing \). A temporal dependency [10, 11] is typically denoted as \( \gamma \left( A, B \right) \) with time interval \( \left[ t_{1}, t_{2} \right] \). It means that B will happen within time interval \( \left[ t_{1}, t_{2} \right] \) after A occurs.

Figure 1 shows a sample of temporal dependencies among several anomaly events. Several types of anomaly events have occurred on different sensors; these events construct five event sequences, which are shown in Fig. 1. Among them, the event sequence \( E^{1} = \left\{ e_{i}^{1}, i = 1, 2, \ldots, 7 \right\} \) is constructed by H-CF events, and the event sequence \( E^{3} = \left\{ e_{j}^{3}, j = 1, 2, \ldots, 5 \right\} \) is constructed by H-IAP events. According to the maintenance log, whenever an event \( e_{i}^{1} \) in \( E^{1} \) happens, an event \( e_{j}^{3} \) in \( E^{3} \) happens within an average time lag \( \Delta t = 18\,{\hbox{min}} \). Thus, there is a temporal dependency between the H-CF event and the H-IAP event, i.e. \( \gamma \left( \text{H-CF}, \text{H-IAP} \right) \). Therefore, if an H-CF event occurs, we can predict that an H-IAP event will happen within the following 18 min.

Fig. 1. A real case: Temporal dependencies among event sequences.

Besides temporal dependencies between two event sequences, there are temporal dependencies among multiple sequences. For example, in Fig. 1, the shaded part contains three event sequences: \( E^{1} \), \( E^{2} = \left\{ e_{k}^{2}, k = 1, 2, \ldots, 6 \right\} \), which is constructed by the H-DPGB events, and \( E^{3} \). When events in \( E^{1} \) and \( E^{3} \) occur, an event in \( E^{2} \) also occurs within an average time lag \( \Delta t = 61\,{\hbox{min}} \). Thus, there is a temporal dependency between the H-CF and H-IAP events and the H-DPGB event, denoted as \( \gamma \left( \left\{ \text{H-CF}, \text{H-IAP} \right\}, \left\{ \text{H-DPGB} \right\} \right) \).

This case illustrates that if we can discover such temporal dependencies over multiple events, we have a chance of predicting anomaly events or inferring the root cause of an observable anomaly. Thus, the goal of this paper is mining temporal dependencies from multiple event sequences.

3 Temporal Dependency Mining

3.1 Overview

By now, many excellent techniques have been developed to detect univariate events. The common ones include range-based approaches and outlier detection approaches [4, 5]. A range-based approach customizes value bounds for individual sensors based on inspectors' experience, sensor/device instructions and so on. Outliers are widely known as values that sufficiently deviate from most others; the original outlier detection methods were ad hoc, but in recent years statistical techniques have been adopted [12].
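As a minimal illustration of these two families of techniques, the following sketch implements a range check and a z-score outlier test. The bounds and the deviation threshold `k` are hypothetical tuning parameters, not values from the paper:

```python
def range_based(values, lower, upper):
    """Range-based detection: flag indices whose value leaves [lower, upper]."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]

def zscore_outliers(values, k=2.0):
    """Statistical outlier detection: flag values more than k
    standard deviations away from the mean."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / std > k]

readings = [20.1, 20.3, 19.9, 20.2, 55.0, 20.0]
print(range_based(readings, 15.0, 30.0))  # the reading 55.0 is out of range
print(zscore_outliers(readings))          # the same reading is a z-score outlier
```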

The main idea of mining temporal dependencies is to transform a temporal dependency into a frequent co-occurrence pattern across multiple event sequences. Essentially, a temporal dependency means that an event set B frequently occurs within time interval \( \Delta t \) after an event set A occurs. In other words, a temporal dependency is a relationship among the objects of a frequent co-occurrence pattern within a time interval. This inspires us to mine frequent co-occurrence patterns so as to discover event temporal dependencies. The process is described in detail in Sects. 3.2 and 3.3.

3.2 The Frequent Co-occurrence Pattern Mining

In this section, we explain what a frequent co-occurrence pattern across multiple event sequences is, how this novel pattern mining differs from traditional frequent co-occurrence pattern mining, and how to mine the novel patterns. We first list some related concepts.

Co-occurrence Pattern:

For a set of objects \( \mathcal{O} = \left\{ o_{1}, o_{2}, \ldots, o_{k} \right\} \) that appear in the same event sequence \( E_{i} \) (an object refers to an event type), let \( T\left( \mathcal{O} \right)^{E_{i}} = \left\{ t_{o_{1}}, t_{o_{2}}, \ldots, t_{o_{k}} \right\} \), where \( t_{o_{j}} \) is the occurrence time of \( o_{j} \) \( \left( j = 1, 2, \ldots, k \right) \) in \( E_{i} \). If \( \mathcal{O} \) satisfies \( max\left( T\left( \mathcal{O} \right)^{E_{i}} \right) - min\left( T\left( \mathcal{O} \right)^{E_{i}} \right) \le \xi \), then we say that \( \mathcal{O} \) is a co-occurrence pattern (CP), where ξ is a user-specified threshold.
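This definition can be checked directly in code; the following sketch (variable names are ours) tests whether a set of occurrence times spans no more than ξ:

```python
def is_co_occurrence_pattern(occurrence_times, xi):
    """O is a co-occurrence pattern (CP) iff the occurrence times of
    its objects within one sequence span no more than the threshold xi."""
    return max(occurrence_times) - min(occurrence_times) <= xi

# occurrence times of three event types within one sequence E_i
times = [100, 112, 104]
assert is_co_occurrence_pattern(times, xi=15)      # span is 12 <= 15
assert not is_co_occurrence_pattern(times, xi=10)  # span is 12 > 10
```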

However, the challenge is how to identify the time lag between two event sets that have a temporal dependency. It reflects how long a set of events is affected by its related events. Unfortunately, most traditional frequent co-occurrence pattern mining algorithms cannot directly solve this problem: they focus only on the occurrence frequency of a group of unordered objects [13]. Hence, we design an algorithm to discover a constrained frequent co-occurrence pattern. Such a pattern consists of two object groups, where intra-group objects are unordered, inter-group objects are time-ordered, and all objects span no more than Δt. We call such a pattern a frequent co-occurrence pattern across multiple event sequences.

Frequent Co-occurrence Pattern Across Multi Event Sequences:

Consider a co-occurrence pattern \( \mathcal{O} = \mathcal{O}_{pre} \cup \mathcal{O}_{post} \) that occurs in a set of l event sequences \( \left\{ E_{1}, E_{2}, \ldots, E_{l} \right\} \). \( \mathcal{O}_{pre} \) and \( \mathcal{O}_{post} \) form a multi-dimensional co-occurrence pattern, denoted as \( MCP\left( O_{pre}, O_{post} \right) \), if \( \mathcal{O} \) satisfies the following conditions: (1) every object \( o_{i} \in O_{pre} \cup O_{post} \) comes from a different event sequence; (2) every object in \( O_{post} \) occurs after the occurrence of every object in \( O_{pre} \); (3) \( \hbox{max}\left\{ T\left( O_{post} \right) \right\} - \hbox{min}\left\{ T\left( O_{pre} \right) \right\} \le \Delta t \), where \( \Delta t \) is the time lag. If \( O_{pre} \) contains m events and \( O_{post} \) contains n events, i.e. \( \left| O_{pre} \right| = m \) and \( \left| O_{post} \right| = n \), then \( MCP\left( O_{pre}, O_{post} \right) \) can also be denoted as \( MCP_{m,n}\left( O_{pre}, O_{post} \right) \). If \( MCP_{m,n}\left( O_{pre}, O_{post} \right) \) has occurred more than k times in the l event sequences \( \left\{ E_{1}, E_{2}, \ldots, E_{l} \right\} \), then it is regarded as a multi-dimensional frequent co-occurrence pattern, denoted as \( FMCP\left( O_{pre}, O_{post} \right) \) or \( FMCP_{m,n}\left( O_{pre}, O_{post} \right) \), in which \( O_{pre} \) is the antecedent, \( O_{post} \) is the consequent, and Δt is the time lag between them.
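The three MCP conditions can also be checked mechanically. In the sketch below (names are ours), each object is represented as a (sequence_id, occurrence_time) pair:

```python
def is_mcp(o_pre, o_post, delta_t):
    """Check conditions (1)-(3) of MCP(O_pre, O_post).
    Each object is a (sequence_id, occurrence_time) pair."""
    objs = o_pre + o_post
    # (1) every object comes from a different event sequence
    if len({seq for seq, _ in objs}) != len(objs):
        return False
    # (2) every object in O_post occurs after every object in O_pre
    if min(t for _, t in o_post) <= max(t for _, t in o_pre):
        return False
    # (3) all objects span no more than the time lag delta_t
    return max(t for _, t in o_post) - min(t for _, t in o_pre) <= delta_t

pre = [("E1", 100), ("E3", 105)]   # antecedent objects
post = [("E2", 150)]               # consequent object
assert is_mcp(pre, post, delta_t=61)      # span 50 <= 61
assert not is_mcp(pre, post, delta_t=40)  # span 50 > 40
```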

Based on the above definition, it is obvious that our FMCP mining task differs significantly from the traditional one. The difference is that an MCP must be divided into two groups, where intra-group objects are unordered and inter-group objects are time-ordered. This time constraint raises the complexity of our task. Assume that the frequency of an FMCP \( \mathcal{O} = \left\{ o_{1}, o_{2}, \ldots, o_{m} \right\} \) is \( l \). To find all valid divisions by traditional means, we would have to count the frequency of every possible division of \( \mathcal{O} \). The number of possible divisions is \( 2*\left( C_{m}^{2} + \ldots + C_{m}^{\left\lceil m/2 \right\rceil} \right) \), where \( \left\lceil m/2 \right\rceil \) is the smallest integer greater than or equal to \( m/2 \), not to mention the number of object groups. Owing to this difference, our task cannot simply be solved by the well-known generation-and-counting strategy.
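To illustrate how quickly this division count grows, the formula above can be evaluated for small m (a quick sketch of ours):

```python
from math import comb, ceil

def division_count(m):
    """2 * (C(m,2) + ... + C(m, ceil(m/2))), as in the text."""
    return 2 * sum(comb(m, k) for k in range(2, ceil(m / 2) + 1))

for m in (4, 6, 8):
    print(m, division_count(m))
```

Already at m = 8 there are hundreds of candidate divisions to count, before even considering the number of distinct object groups.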

3.3 FMCP Mining Algorithm

In this paper, we use \( \gamma \left( A, B \right) \) to denote the temporal dependency between event sequences, and \( \gamma \left( A, B \right).sup \) to denote the occurrence probability of B given that A has occurred. It is used to filter the mined temporal dependencies.

$$ \gamma \left( {A ,B} \right).sup = sup\left( {{\mathcal{O}}_{post} |{\mathcal{O}}_{pre} } \right) = \frac{{freq\left( {{\mathcal{O}}_{post} |{\mathcal{O}}_{pre} } \right)}}{{freq\left( {\mathcal{O}} \right)}} $$

Here, \( freq\left( {\mathcal{O}}_{post} |{\mathcal{O}}_{pre} \right) \) is the frequency of occurrences of \( {\mathcal{O}}_{post} \) after \( {\mathcal{O}}_{pre} \), and \( freq\left( {\mathcal{O}} \right) \) is the frequency of \( {\mathcal{O}} \).

Assume that the occurrence threshold \( freq_{min} = sup_{min} \), where \( sup_{min} \) is the support threshold, and let FP be the set of all FMCPs in \( \Theta \). All temporal dependencies satisfying \( sup > sup_{min} \) constitute the set R. Then, for every \( \gamma \left( E_{i}, E_{j} \right) \in R \), there is one and only one \( FMCP\left( O_{pre}, O_{post} \right) \in FP \) such that \( \gamma \left( E_{i}, E_{j} \right) \) is the temporal dependency of \( O_{pre} \) and \( O_{post} \), and vice versa.

Because \( \gamma \left( E_{i}, E_{j} \right) \in R \), we have \( \gamma \left( E_{i}, E_{j} \right).sup \ge sup_{min} \); thus, the number of occurrences of \( E_{j} \) after the occurrence of \( E_{i} \) within the time range \( \gamma \left( E_{i}, E_{j} \right).\Delta t \) exceeds \( sup_{min} \). Consequently, \( E_{i} \) and \( E_{j} \) form an FMCP, denoted as \( FMCP\left( E_{i}, E_{j} \right) \), i.e. \( FMCP\left( E_{i}, E_{j} \right) \in FP \). Every FMCP is constructed from an antecedent and a consequent, and each item in the set is unique. Therefore, suppose there were another \( FMCP\left( O_{pre}^{'}, O_{post}^{'} \right) \ne FMCP\left( O_{pre}, O_{post} \right) \) with \( FMCP\left( O_{pre}^{'}, O_{post}^{'} \right) \in FP \) such that \( \gamma \left( E_{i}, E_{j} \right) \) is the temporal dependency of \( O_{pre}^{'} \) and \( O_{post}^{'} \). Then \( O_{pre}^{'} = E_{i}, O_{post}^{'} = E_{j} \), the number of occurrences is \( freq\left( FMCP\left( O_{pre}^{'}, O_{post}^{'} \right) \right) = \gamma \left( E_{i}, E_{j} \right).sup \), and the time lag is \( FMCP\left( O_{pre}^{'}, O_{post}^{'} \right).\Delta t = \gamma \left( E_{i}, E_{j} \right).\Delta t \). Hence \( FMCP\left( O_{pre}^{'}, O_{post}^{'} \right) = FMCP\left( O_{pre}, O_{post} \right) \), a contradiction. This proves that for every \( \gamma \left( E_{i}, E_{j} \right) \in R \), there is one and only one \( FMCP\left( O_{pre}, O_{post} \right) \in FP \) such that \( \gamma \left( E_{i}, E_{j} \right) \) is the temporal dependency of \( O_{pre} \) and \( O_{post} \).

Therefore, if we can obtain all the FMCPs in \( \Theta \), we can calculate the support of temporal dependencies and filter the candidate temporal dependency sets that satisfy the conditions. Thus, building on the traditional method of frequent co-occurrence pattern mining, we propose an approach called ETD-mining with a three-stage process, “Generation-Filter-Extension”, to mine the FMCPs.

Given a frequent co-occurrence pattern \( FMCP\left( O_{pre}, O_{post} \right) \), its frequency is denoted as \( freq\left( FMCP\left( O_{pre}, O_{post} \right) \right) \), and \( freq\left( MCP\left( O_{pre}, O_{post} \right) \right) = \gamma \left( O_{pre}, O_{post} \right).sup \). If \( freq_{min} = sup_{min} \), then for any temporal dependency \( \gamma \left( E_{i}, E_{j} \right) \) satisfying \( \gamma \left( E_{i}, E_{j} \right).{ \sup } > sup_{min} \), we have \( freq\left( MCP\left( E_{i}, E_{j} \right) \right) \ge freq_{min} \), which means that \( MCP\left( E_{i}, E_{j} \right) \) is an FMCP. Thus, for any event \( e \in E_{i} \cup E_{j} \), its occurrence count in some sequence satisfies \( freq\left( e \right) \ge freq_{min} \). Therefore, the first step of mining FMCPs is to find all events whose occurrence count exceeds \( freq_{min} \); these events are denoted as \( F^{1} \). This step is consistent with the traditional method of frequent co-occurrence pattern mining.
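This first step, collecting \( F^{1} \), is a simple frequency count over the event space. A sketch (the event-type names are hypothetical):

```python
from collections import Counter

def frequent_events(event_space, freq_min):
    """F^1: all event types whose occurrence count in some
    sequence reaches freq_min."""
    f1 = set()
    for sequence in event_space:  # each sequence is a list of event types
        counts = Counter(sequence)
        f1.update(t for t, c in counts.items() if c >= freq_min)
    return f1

theta = [
    ["H-CF", "H-CF", "H-CF"],
    ["H-IAP", "H-IAP", "H-DPGB"],
]
assert frequent_events(theta, freq_min=2) == {"H-CF", "H-IAP"}
```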

According to the above analysis, any temporal dependency whose support exceeds the threshold involves antecedent and consequent event sets that can only consist of events in \( F^{1} \). Therefore, we can generate candidate antecedent and consequent sets by combining events in \( F^{1} \), and then filter out the event temporal dependencies whose support exceeds the threshold.

The more event sequences there are, the higher the cost of filtering. If there are \( n_{seq} \) event sequences in \( \Theta \), and every event sequence contains \( m_{i} \) types of events, then \( \Theta \) contains \( \dot{m} = \mathop \sum \nolimits_{i = 1}^{{n_{seq} }} m_{i} \) types of events. The candidate temporal dependency sets constructed from these events can be denoted as \( canP = \left\{ {\left\langle {E_{i} ,E_{j} } \right\rangle } \right\} \), whose size is \( \left| {canP} \right| = C_{{\dot{m}}}^{2} *C_{2}^{1} + \ldots + C_{{\dot{m}}}^{k} *\left( {C_{k}^{1} + \ldots + C_{k}^{k - 1} } \right) + \ldots + C_{{\dot{m}}}^{{\dot{m}}} *\left( {C_{{\dot{m}}}^{1} + \ldots + C_{{\dot{m}}}^{{\dot{m} - 1}} } \right) \). For every \( \left\langle {E_{i} ,E_{j} } \right\rangle \in canP \), we would need to verify whether \( \gamma \left( {E_{i} ,E_{j} } \right).sup \ge sup_{min} \). Obviously, the number of temporal dependencies to be filtered is very large.

Thus, we design an extension strategy to avoid this problem, based on the following theorem.

Theorem 1:

For any \( FMCP\left( E_{i}, E_{j} \right) \) and the temporal dependency \( \gamma \left( E_{i}, E_{j} \right) \) of the events \( E_{i}, E_{j} \), assume that the frequency threshold equals the support threshold, i.e. \( freq_{min} = sup_{min} \). Then for any subsets \( E_{i}^{'} \subseteq E_{i} \) and \( E_{j}^{'} \subseteq E_{j} \), \( FMCP\left( E_{i}^{'}, E_{j}^{'} \right) \) is also an FMCP. Moreover, if the time lags satisfy \( \gamma \left( E_{i}^{'}, E_{j}^{'} \right).\Delta t \ge \gamma \left( E_{i}, E_{j} \right).\Delta t \), then \( \gamma \left( E_{i}^{'}, E_{j}^{'} \right).sup \ge \gamma \left( E_{i}, E_{j} \right).sup \).

Proof:

It is obvious that the occurrence count of \( E_{j} \) within the time range \( \gamma \left( E_{i}, E_{j} \right).\Delta t \) after the occurrence of \( E_{i} \) is \( \gamma \left( E_{i}, E_{j} \right).sup \). Since \( E_{i}^{'} \subseteq E_{i} \) and \( E_{j}^{'} \subseteq E_{j} \), \( E_{j}^{'} \) occurs at least \( \gamma \left( E_{i}, E_{j} \right).sup \) times within the time range \( \gamma \left( E_{i}, E_{j} \right).\Delta t \) after the occurrence of \( E_{i}^{'} \).

Based on Theorem 1, we can infer that for any \( \gamma \left( E_{i}, E_{j} \right) \) whose support exceeds the threshold, assuming \( \left| E_{i} \right| > 1 \) and \( \left| E_{j} \right| > 1 \), we can obtain the \( FMCP\left( E_{i}, E_{j} \right) \) of \( \gamma \left( E_{i}, E_{j} \right) \) by extending some \( FMCP\left( \left\{ \varepsilon_{\alpha} \right\}, \left\{ \varepsilon_{\beta} \right\} \right) \), in which \( \varepsilon_{\alpha} \in E_{i} \) and \( \varepsilon_{\beta} \in E_{j} \). We can combine the events in \( F^{1} \) to obtain all patterns of the form \( \left( \left\{ \varepsilon_{\alpha} \right\}, \left\{ \varepsilon_{\beta} \right\} \right) \), verify whether each pattern satisfies \( \gamma \left( \left\{ \varepsilon_{\alpha} \right\}, \left\{ \varepsilon_{\beta} \right\} \right).sup \ge sup_{min} \), and filter the temporal dependencies whose support exceeds the threshold. Thus, we obtain \( FMCP\left( \left\{ \varepsilon_{\alpha} \right\}, \left\{ \varepsilon_{\beta} \right\} \right) \). Then, we choose the remaining events in \( F^{1} \) to extend the antecedent and consequent of \( FMCP\left( \left\{ \varepsilon_{\alpha} \right\}, \left\{ \varepsilon_{\beta} \right\} \right) \), verifying at each step whether the supports of the temporal dependencies of the extended antecedent and consequent exceed the threshold.

Algorithm 1 gives the pseudocode for mining temporal dependencies. It first finds the events that have occurred more than \( sup_{min} \) times in the event space \( \Theta \) and puts them into \( F^{1} \) (lines 1-2). Then, it models every pair of events as a pattern \( \left( \varepsilon_{\alpha}, \varepsilon_{\beta} \right) \) and verifies whether the formed pattern is an \( FMCP_{1,1} \) (lines 3-6). Next, it uses the extend() function (lines 18-26) to expand the antecedents of each \( FMCP_{1,1} \) recursively (lines 7-8). The extension process stops when the extended pattern is no longer an FMCP or no object in \( F^{1} \) remains to be extended. Based on the result of antecedent extension, the extend() function is then used to expand the consequents of each \( FMCP_{1,1} \) recursively (lines 9-11). Finally, all FMCPs are put into the set P, and the event relationships between the antecedent and consequent components in P constitute the set R.

Algorithm 1. The ETD-mining algorithm (pseudocode).
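To make the “Generation-Filter-Extension” process concrete, the following is a simplified sketch of ours, not the paper's implementation: `sup` stands in for the support counting over \( \Theta \), the toy support function is hypothetical, and the antecedent/consequent extension phases are merged into one recursive step rather than separated as in lines 7-11 of Algorithm 1.

```python
from itertools import product

def etd_mining(f1, sup, sup_min):
    """Sketch of the Generation-Filter-Extension process.
    f1      -- the frequent events F^1
    sup     -- sup(pre, post): support of temporal dependency gamma(pre, post)
    sup_min -- the support threshold
    Returns a set of FMCPs as (antecedent, consequent) frozenset pairs."""
    mined = set()

    def extend(pre, post):
        key = (frozenset(pre), frozenset(post))
        if key in mined:
            return
        mined.add(key)
        for e in f1 - pre - post:
            # extend the antecedent, then the consequent, while support holds
            if sup(pre | {e}, post) >= sup_min:
                extend(pre | {e}, post)
            if sup(pre, post | {e}) >= sup_min:
                extend(pre, post | {e})

    # Generation + Filter: seed with all valid FMCP_{1,1}
    for a, b in product(sorted(f1), repeat=2):
        if a != b and sup({a}, {b}) >= sup_min:
            extend({a}, {b})
    return mined

# toy support function: the only real dependency is ({A, B} -> {C})
def toy_sup(pre, post):
    return 1.0 if pre <= {"A", "B"} and post <= {"C"} and post else 0.0

fmcp = etd_mining({"A", "B", "C"}, toy_sup, sup_min=0.8)
assert (frozenset({"A", "B"}), frozenset({"C"})) in fmcp
```

By Theorem 1, every support-satisfying pattern is reachable from one of its 1,1-subpatterns, which is why seeding with \( FMCP_{1,1} \) and extending suffices.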

4 Experiments

4.1 Experiment Dataset and Environment

Environments:

The experiments were run on a PC with a quad-core Intel Core i5-6300HQ CPU @ 2.30 GHz and 16.00 GB RAM. The operating system is CentOS 6.4. All algorithms are implemented in Java with JDK 1.8.5.

Dataset:

In total, 361 sensors are deployed on 8 important devices. Each sensor generates one record per second, and the dataset size is about 3.61 GB. The dataset is divided into a training set and a test set. The training set spans 2014-10-01 00:00:00 to 2015-03-31 23:59:59; the test set spans 2015-04-01 00:00:00 to 2015-04-30 23:59:59.

We use the real faults contained in the power plant's maintenance records from 2014-07-01 00:00:00 to 2015-06-30 23:59:59 to verify our warnings.

4.2 Experiment Setup

First, we run the temporal dependency mining algorithm on the training dataset to find the temporal dependencies, obtaining the temporal dependency quantity (TDQ) of the training data. We observe how the TDQ varies under different algorithm parameters and different time ranges of the dataset.

Then, we build an anomaly prediction model as a directed graph over the anomaly events by choosing and connecting their mined temporal dependencies. The graph is defined as \( G = \left\langle V, E \right\rangle \), where V is the set of anomaly events and \( E \subseteq V \times V \) is a non-empty set of edges. Each directed edge \( v_{i} \to v_{j} \) carries a weight, the time lag Δt: if the anomaly event \( v_{i} \) occurs, we predict that the anomaly event \( v_{j} \) will occur after a time lag Δt.
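A minimal adjacency-list version of such a prediction graph could look as follows; the single edge and its 18-minute lag mirror the Sect. 2 example, while everything else is illustrative:

```python
# weighted directed graph: edge v_i -> v_j with weight = time lag (minutes)
graph = {
    "H-CF": [("H-IAP", 18)],
}

def predict(graph, event, now):
    """Given an observed anomaly event at time `now` (minutes),
    return the predicted follow-up anomalies and their expected times."""
    return [(nxt, now + lag) for nxt, lag in graph.get(event, [])]

# observing H-CF at t=0 predicts H-IAP around t=18
assert predict(graph, "H-CF", now=0) == [("H-IAP", 18)]
assert predict(graph, "H-IAP", now=0) == []  # no outgoing edges
```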

Based on the anomaly prediction graph, we conduct experiments on the test set. We compare our anomaly prediction model with two other typical approaches: the range-based approach [4] and the outlier detection approach [5]. Once these approaches detect the associated anomalies, they issue a maintenance warning for the corresponding fault.

For evaluations, we consider the following performance metrics.

The Temporal Dependency Quantity (TDQ):

The number of temporal dependencies mined in the input dataset;

Warning Time:

Warning time is the difference between the timestamp at which an approach issues a maintenance warning for a fault and the starting time of that fault;

Precision:

Precision represents how many of the reported abnormalities are accurate according to the failure records;

Recall:

Recall represents the fraction of actual faults that are correctly identified according to the failure records.
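Concretely, both metrics are ratios over matched warnings and failure records. The counts in this sketch are hypothetical, chosen only so that the outputs match the 88.33% precision and 85.48% recall reported in Sect. 4.3:

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: fraction of warnings that match a real failure.
    Recall: fraction of real failures that received a warning."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

p, r = precision_recall(true_positives=53, false_positives=7, false_negatives=9)
assert round(p, 4) == 0.8833  # 53/60
assert round(r, 4) == 0.8548  # 53/62
```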

4.3 Experiment Result

First, we conduct an experiment to verify how the TDQ changes under different data sizes with different time ranges, with parameters \( sup_{min} = 0.8 \) and \( \Delta t = 10\,{\hbox{min}} \). The data size increases from 1 month of data to 6 months (the whole training set), in 1-month increments. This experiment mines the temporal dependencies with probability no less than 0.8 (i.e., \( sup_{min} = 0.8 \)). Table 1 shows the resulting TDQ.

Table 1. Experiment results of TDQ under different data size \( \left( {sup_{min} = 0.8,\;\Delta t = 10\,{ \hbox{min} }} \right) \).

Table 1 shows that as the size of the dataset increases, the total TDQ also shows an upward trend. However, there is clearly no linear correlation between the TDQ and the size of the dataset. Overall, as the dataset grows from one month to six months, the TDQ stays relatively close, always between 4,000 and 5,000. This indicates that, as time goes on, the TDQ does not rise rapidly but increases relatively slowly, and may even be bounded within a certain range. The reason is that, over time, more temporal dependencies are gradually accumulated, but the growth rate is stable. This result indicates that our algorithm has a certain robustness.

Then, we compare the precision and recall of the different methods. Notably, in this paper, we only consider predicted events whose corresponding failures occur in both the training set and the test set. Figure 2 shows the final average results.

Fig. 2. Experiment results of precision and recall under different approaches.

Figure 2 indicates that our method performs best among the three methods, achieving the highest precision and the highest recall.

As shown in Fig. 2, the precision of our approach is 88.33%, and the recall is 85.48%. Analysis of the intermediate results revealed that some of the repetitive anomaly propagation paths mined in training cannot be detected in the test set: for the same type of fault, the anomaly propagation path mined in the test set has changed compared to the training set. This phenomenon is essentially caused by the inherent uncertainty of the stream data.

The precision of the range-based approach and the outlier detection approach is 65.24% and 85.92%, and their recall is 72.62% and 78.26%, respectively. Analysis of the intermediate results revealed that both approaches are based on single-sensor data. However, one fault is usually caused by multiple anomalies and can correspond to multiple isolated anomalous points; a single-sensor detection method cannot detect such a fault. Our approach finds the correlations between multiple sensors and then forms anomaly propagation paths, which helps us find more hidden anomalies.

The above results show that temporal dependencies are effective in constructing fault prediction logic and in fault detection.

5 Related Works

Recently, researchers have designed several approaches to the problem of anomaly prediction for predictive maintenance. Quantitative models ranging from simple linear discriminant analysis to more complex logistic regression analysis and neural networks have been proposed for prediction [14]. Zhang et al. [1] presented a novel system that automatically parses streamed console logs and detects early warning signals for IT system failure prediction based on an LSTM neural network. Susto et al. [15] developed a multiple-classifier machine learning methodology to deal with the unbalanced datasets that arise in maintenance classification problems. Baban et al. [16] used a fuzzy logic approach to develop a decision-making system that determines the lifetime of the needle of a sewing machine and plans its predictive maintenance. However, it requires expert knowledge and depends on small datasets.

The above works perform well in detecting or predicting univariate anomalies for IoT applications. However, they cannot explain why and how they work. Moreover, more and more IoT applications need to analyze and identify the root causes of anomalies, which these approaches cannot answer.

Mining temporal dependencies provides essential clues for identifying causal relationships among anomalies [17]. Several types of dependent pattern mining tasks have been derived from practical problems and carefully studied. Song et al. mined activity dependencies (i.e., control dependency and data dependency) to discover process instances when event logs cannot meet the completeness criteria [7]; a dependency graph is utilized to mine process instances, but the authors do not consider the dependencies among events. Plantevit et al. presented a new approach to mine temporal dependencies between streams of interval-based events [8]. Friedberg et al. proposed a novel anomaly detection approach that keeps track of system events, their dependencies and occurrences, and thus learns normal system behavior over time and reports all actions that differ from the created system model [18].

In this paper, we introduce temporal dependencies into predictive maintenance to improve the explainability of prediction approaches and to discover the root causes of anomalies.

6 Conclusion

In this paper, we proposed an effective and explainable approach to predict anomalies by mining temporal dependencies from multi-sensor event sequences. To reach this goal, we detect “univariate anomaly” events from sensor data and output multi-sensor event sequences. Then, we transform the temporal dependency mining problem into a frequent co-occurrence pattern mining problem. Furthermore, extensive experiments on a real dataset from a coal power plant show the effectiveness of our approach. However, the speed of our method can still be improved; for future work, we are interested in parallel optimization algorithms to speed up our method.