
1 Introduction

In industrial production, large numbers of sensors collect many types of data. Before the data are used, their quality must be evaluated, and timeliness is an important indicator of data quality [1]. This paper therefore proposes a time-dependency-based method for identifying timeliness, which remains effective when timestamps are incomplete. Assessing the timeliness of data presupposes that the data are correct and meet the needs of the task. Unlike correctness, timeliness does not necessarily need to be tested against the real world [2]. A measure of timeliness should therefore be an estimate of the probability that the data are still valid, not a statement validated under certainty [3]. For large volumes of data whose validity is unclear, quantifying timeliness through such an estimate is reasonable [4].

Reference [5] proposes a probability-based metric for calculating the timeliness of Wiki articles related to timeliness events. Reference [6] defines recurrent timeliness rules (RTR) to evaluate the timeliness of periodic data generated during the manufacturing process. Reference [7] defines a measure as a function of the age of the attribute value at the time the currency is evaluated and of sensitivity parameters that adapt the measure to the application environment. Reference [8] assumes that timestamps are available and uses a current-time variable to represent the latest time. Reference [9] develops an uncertain constraint database scheme based on the constraint database scheme and examines the complexity of query identification in uncertain time constraint databases. Reference [10] investigates the time constraint satisfaction problem (TCSP) by finding values for a set of time variables that satisfy the given time constraints. Reference [11] was the first to study the use of rules \(\forall {t}_{1},\cdots ,{t}_{j}:R({t}_{1}[EID]={t}_{j}[EID]\wedge \varphi \to {t}_{u}{\prec }_{A}{t}_{v}),j\in [1,k]\) to help identify the timeliness of data when the database contains no clear timestamps. Reference [12] proposes a dynamic functional dependency in which a copy function can require attributes that cannot change independently to be copied together. These methods judge the temporal order of different attribute values of the same entity from a small number of temporal-order rules obtained from domain knowledge, and thereby identify which values are outdated. Their drawback is that they cannot determine whether a value is outdated or invalid at a given time point.

In summary, existing research on data timeliness cannot be applied directly to the identification of industrial data timeliness, for two main reasons. First, it cannot effectively determine the timeliness of current data without clear timestamps. Second, it cannot quantify the timeliness of data at a given time point. This article therefore proposes a method for evaluating the timeliness of manufacturing data based on weighted timeliness graphs, which identifies the timeliness of industrial data through time dependency relationships when no clear timestamp is available.

2 Data Timeliness Identification Method

This article represents the dependency relationships between conclusions by constructing a conclusion dependency sequence graph. The conclusion dependency sequence graph is then merged into a process dependency sequence graph to determine the order in which the timeliness of each process's data is calculated. Finally, the timeliness of the data in a given process is determined through a weighted timeliness graph.

2.1 Limited Timeliness Dependency Rules

Time-dependency rules identify the timeliness of production data through the dependency relationships between processes.

Definition 1. (Limited timeliness dependency rule): In a rule \(r\), there are dependency relationships between processes, and the data generated in process \(A\) are limited by its related processes. The degree of limitation is quantified by the dependency strength. Rules that express such limited dependencies between process data are defined as limited timeliness dependency rules, and they can be represented as:

$$r:\forall {t}_{i}, {t}_{j}({\psi }_{B}\to {t}_{i}{\prec }_{A}{t}_{j}, \beta )$$
(1)

Here, \({t}_{i}{\prec }_{A}{t}_{j}\) indicates that data item \({t}_{i}\) is generated before data item \({t}_{j}\) in process \(A\), \(\beta \) is the dependency strength of the rule, and \({\psi }_{B}\) represents the temporal relationship or constraint condition in process \(B\). The dependency strength represents the likelihood that rule \(r\) holds. Because each process or employee in actual industrial production is relatively independent, probability can be used to represent the relationship between the data of different processes.

Let \(lhs(r)\) denote the left part of the rule and \(rhs(r)\) the right part; in formula (1), \(lhs\left(r\right)=\psi \) and \(rhs\left(r\right)={t}_{i}{\prec }_{A}{t}_{j}\). The dependency strength of the right part is \(tml(rhs(r))=tml(lhs(r))\times tml(r)\), where \(tml(r)\) is the dependency strength \(\beta \) of the rule and the dependency strength of the left part \(lhs\left(r\right)=\psi \) is calculated as follows: (a) If \(\psi \) is a deterministic condition of the form \({t}_{i}[A] op {t}_{j}[A]\) or \({t}_{k}[A] op a\), where \(op\in \{=, \ne ,<,>, \le , \ge \}\) and \(a\) is a constant, then \(tml(\psi )=1\) if \(\psi \) is true and \(tml(\psi )=0\) otherwise; (b) If \(\psi \) is \({t}_{i}{\prec }_{A}{t}_{j}\) or \({t}_{k}{\prec }_{A}\tau \) and \(\psi \) is not the right part of any rule \(r\), then the value of \(tml(\psi )\) is obtained statistically; (c) If \(\psi \) is the right part of another rule \(r{\prime}\), then \(tml(\psi )\) is obtained by calculating the dependency strength of the right part of \(r{\prime}\); (d) If \(\psi ={\psi }_{1}\wedge {\psi }_{2}\), that is, \(\psi \) is the conjunction of \({\psi }_{1}\) and \({\psi }_{2}\), then the dependency strength of the conjunction is \(tml(\psi )=min\{tml({\psi }_{1}), tml({\psi }_{2})\}\).
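The four cases above can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' implementation: the class names `Det`, `Order`, `And`, and `Rule`, the string keys, and the `stats` lookup table for statistically obtained strengths are all hypothetical. When several rules derive the same conclusion, the sketch takes the strongest one, and it assumes the rule set contains no cyclic dependencies.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Det:
    """Deterministic condition such as t_i[A] op t_j[A]; `holds` is its truth value."""
    holds: bool


@dataclass(frozen=True)
class Order:
    """Order conclusion such as t_i <_A t_j, identified by a hypothetical string key."""
    key: str


@dataclass(frozen=True)
class And:
    """Conjunction psi_1 AND psi_2."""
    left: "Cond"
    right: "Cond"


Cond = Union[Det, Order, And]


@dataclass(frozen=True)
class Rule:
    lhs: Cond
    rhs: Order
    beta: float  # dependency strength tml(r) of the rule


def tml(cond: Cond, rules: List[Rule], stats: Dict[str, float]) -> float:
    """Dependency strength of a condition; assumes the rules are acyclic."""
    if isinstance(cond, Det):  # case (a): true -> 1, false -> 0
        return 1.0 if cond.holds else 0.0
    if isinstance(cond, And):  # case (d): conjunction takes the minimum
        return min(tml(cond.left, rules, stats), tml(cond.right, rules, stats))
    deriving = [r for r in rules if r.rhs == cond]
    if not deriving:  # case (b): value obtained statistically (assumed lookup)
        return stats[cond.key]
    # case (c): strength of the right part of the deriving rule(s);
    # taking the maximum over several rules is an assumption of this sketch
    return max(r.beta * tml(r.lhs, rules, stats) for r in deriving)


# Hypothetical rule: (true condition AND t1 <_B t2) -> t1 <_A t2 with strength 0.8
rules = [Rule(lhs=And(Det(True), Order("t1<t2 in B")), rhs=Order("t1<t2 in A"), beta=0.8)]
stats = {"t1<t2 in B": 0.9}
strength = tml(Order("t1<t2 in A"), rules, stats)  # 0.8 * min(1, 0.9)
```

With these illustrative inputs, the strength of the derived conclusion is the rule strength multiplied by the weaker of the two left-part conditions.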

2.2 Process Dependency Sequence Graph

To determine the order in which the timeliness of each process's data is calculated, a process dependency graph must be constructed. This is done by first constructing a conclusion dependency graph from the dependency rules and then merging its nodes. The specific steps for constructing the dependency sequence graph are as follows. First, based on the type of a conclusion, determine whether it needs to be included in the dependency sequence graph. A conclusion of the form \({t}_{i}\left[A\right] op {t}_{j}[A]\) or \({t}_{k}[A] op a\) is deterministic: its strength is 1 if the condition is satisfied and 0 otherwise, so it does not need to be added to the conclusion dependency graph. Conclusions of the form \({t}_{i}{\prec }_{A}{t}_{j}\) and \({t}_{i}{\prec }_{A}\tau \) are non-deterministic and need to be added to the conclusion dependency sequence graph. Then, for the non-deterministic conclusions, two types of nodes, \((\prec , A,*,*)\) and \((\prec , A,\tau )\), are constructed for each process \(A\) to represent the conclusions of type \({t}_{i}{\prec }_{A}{t}_{j}\) and type \({t}_{i}{\prec }_{A}\tau \), respectively. Finally, scan the rule set \(\Sigma (r)\): for every rule \({r}_{k}\) with \({\psi }_{1}=lhs\left({r}_{k}\right)\) and \({\psi }_{2}=rhs({r}_{k})\), add a directed edge from \({\psi }_{1}\) to \({\psi }_{2}\) to the dependency sequence graph.

The overall principle for merging a conclusion dependency graph into a process dependency graph is: (a) conclusion nodes in the graph that refer to the same process are merged into a single process node; (b) a directed edge no longer represents a dependency between conclusions but a dependency between processes.
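The merging step and the resulting calculation order can be sketched as follows. This is a minimal illustration under assumptions: conclusion nodes are encoded as hypothetical `(process, kind)` pairs, the example edges are invented for demonstration, and the order is obtained with a topological sort from Python's standard `graphlib` module (Python 3.9 or later), which the source does not prescribe.

```python
from graphlib import TopologicalSorter


def merge_to_process_graph(conclusion_edges):
    """Collapse conclusion nodes (process, kind) into one node per process.

    Returns a mapping process -> set of processes it depends on.
    Edges between two conclusions of the same process disappear in the merge.
    """
    preds = {}
    for (src_proc, _), (dst_proc, _) in conclusion_edges:
        preds.setdefault(src_proc, set())
        preds.setdefault(dst_proc, set())
        if src_proc != dst_proc:
            preds[dst_proc].add(src_proc)
    return preds


def calculation_order(conclusion_edges):
    """Processes ordered so that each one follows the processes it depends on."""
    graph = merge_to_process_graph(conclusion_edges)
    return list(TopologicalSorter(graph).static_order())


# Hypothetical conclusion-level edges (lhs conclusion -> rhs conclusion)
edges = [
    (("Vibration test", "*"), ("Load test", "*")),
    (("Load test", "*"), ("Load test", "tau")),
    (("Load test", "tau"), ("High voltage test", "*")),
]
order = calculation_order(edges)
```

In this sketch a loop between two conclusions of the same process vanishes once they are merged. If a cycle between different processes remained, `TopologicalSorter` would raise `graphlib.CycleError`; how such a case should be resolved is not specified here.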

2.3 Weighted Timeliness Graph

To evaluate the timeliness of all data items in process \(A\), the values of \(tml({t}_{i}{\prec }_{\mathrm{A}}{t}_{j})\) and \(tml({t}_{i}{\prec }_{\mathrm{A}}\tau )\) must each be determined. For this purpose, the weighted timeliness graph (WTG) is defined.

Definition 2. (Weighted timeliness graph, WTG): Let \(D\) be a set of data, \(\Sigma \) a set of time-dependency rules, \(T\) the set of all time points that appear in the rules of \(\Sigma \), \(A\) a process, and \(\theta \) the validity time threshold of \(D\). The weighted timeliness graph of process \(A\), denoted \({WTG}_{A}\), is defined as follows:

  1. \({WTG}_{A}\) contains the data items \(t\) and time points \(\tau \) that appear in the two types of conclusions \({t}_{i}{\prec }_{\mathrm{A}}{t}_{j}\) and \({t}_{i}{\prec }_{\mathrm{A}}\tau \), and takes a time point \(\tau \) as the initial node of the graph, where \(\tau \) is either an original time in the dataset or the threshold \(\theta \) that we set;

  2. For each rule \(r\) in \(\Sigma \), if there is a \(\tau \in T\cup \{\theta \}\) such that \(\tau \subset rhs\left(r\right)\), that is, \(\tau \) appears in the right part of \(r\), then \(\tau \) is defined as an initial node of \({WTG}_{A}\);

  3. For each rule \(r\) in \(\Sigma \), if there exist \({t}_{1},{t}_{2}\in D\) such that \(rhs\left(r\right)={t}_{1}{\prec }_{A}{t}_{2}\) and \(tml(lhs(r))>0\) or \(tml(r)>0\), update the weight of the directed edge from \({t}_{2}\) to \({t}_{1}\);

  4. The weight of the directed edge between \({t}_{1}\) and \({t}_{2}\) in \({WTG}_{A}\) is denoted as \(weight({t}_{1},{t}_{2})=tml\left(r\right)\times tml\left(lhs\left(r\right)\right)\).

The implementation details of constructing a weighted timeliness graph are shown in Algorithm 1.

Algorithm 1. Construction of the weighted timeliness graph.
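As a rough illustration of the construction in Definition 2 (a sketch, not a transcription of Algorithm 1), the graph can be stored as a nested dictionary of edge weights. The tuple layout `(earlier, later, tml_rule, tml_lhs)` and the function name `build_wtg` are hypothetical. Two assumptions are made: an edge is added only when the product of the two strengths is positive, and "updating" an existing edge means keeping the larger weight.

```python
def build_wtg(instances):
    """Build a weighted timeliness graph as {later: {earlier: weight}}.

    Each instance (earlier, later, tml_rule, tml_lhs) stands for a rule whose
    right part is `earlier <_A later`; `later` may be a data item or a time point.
    The directed edge runs from `later` to `earlier`.
    """
    graph = {}
    for earlier, later, tml_rule, tml_lhs in instances:
        weight = tml_rule * tml_lhs  # weight = tml(r) * tml(lhs(r))
        if weight <= 0:
            continue  # assumed: conclusions with zero strength add no edge
        out_edges = graph.setdefault(later, {})
        graph.setdefault(earlier, {})
        # assumed update policy: keep the strongest weight seen so far
        out_edges[earlier] = max(out_edges.get(earlier, 0.0), weight)
    return graph


# Hypothetical rule instances for one process at time point 2023
instances = [
    ("t1", "t2", 0.8, 0.9),   # t1 <_A t2 with weight 0.72
    ("t1", "t2", 0.5, 1.0),   # weaker derivation of the same edge, ignored
    ("t2", 2023, 1.0, 1.0),   # t2 <_A 2023, edge leaves the initial node 2023
    ("t3", "t1", 0.9, 0.0),   # left part has strength 0, so no edge is added
]
wtg = build_wtg(instances)
```

Time points such as `2023` act as initial nodes simply because they only ever appear on the later side of an order conclusion.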

2.4 Path Single Flux of Weighted Timeliness Graph

Evaluating the timeliness of data item \({t}_{i}[A]\) requires determining the likelihood that the conclusion \({t}_{i}{\prec }_{A}\tau \) holds, that is, the value of \(tml({t}_{i}{\prec }_{A}\tau )\). This likelihood can be obtained either by direct inference from a rule \({r}_{k}\) or from a directed path \(Path(\tau , {t}_{i})\) from \(\tau \) to \({t}_{i}\) in the weighted timeliness graph.

Definition 3. (Path single flux of weighted timeliness graph): The single flux of path \(Path(\tau , {t}_{i})\) in the \({WTG}_{A}\) graph is denoted as \(sFlux(Path(\tau , {t}_{i}))\) and is defined as the largest weight that can flow along a single path from \(\tau \) to \({t}_{i}\).

The path single flux of the weighted timeliness graph covers two cases:

  1. In the weighted timeliness graph of process \(A\), when there is only one path from \(\tau \) to \({t}_{i}\), the lowest edge weight on that path is taken as the flux of the path, because the weakest dependency limits the whole path. Namely, \(Flux\left(Path(\tau , {t}_{i})\right)=min\{weight\left({t}_{i}{\prec }_{A}{v}_{1}\right), weight\left({v}_{1}{\prec }_{A}{v}_{2}\right), \cdots , weight({v}_{k}{\prec }_{A}\tau )\}\).

  2. In the weighted timeliness graph of process \(A\), when there are multiple paths from \(\tau \) to \({t}_{i}\), the maximum flux among these paths is taken as the single flux of \(Path(\tau , {t}_{i})\) in the graph \({WTG}_{A}\). Namely, \(sFlux\left(Path\right)=Max\{Flux\left({Path}_{1}\right), Flux\left({Path}_{2}\right), \cdots , Flux\left({Path}_{k}\right)\}\).
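The two cases can be sketched together: the flux of one path is the minimum of its edge weights, and the single flux is the maximum over all paths. The code below is a minimal illustration that enumerates simple paths by depth-first search, which is adequate for small graphs. The nested-dictionary graph encoding, the function names, the edge weights, and the choice to return 0.0 when no path exists are assumptions of this sketch.

```python
def path_fluxes(graph, source, target, bottleneck=float("inf"), seen=frozenset()):
    """Yield Flux(Path), the minimum edge weight, for every simple path source -> target.

    `graph` maps a node to {successor: weight}.
    """
    seen = seen | {source}
    if source == target:
        yield bottleneck
        return
    for nxt, weight in graph.get(source, {}).items():
        if nxt not in seen:  # skip nodes already on the current path
            yield from path_fluxes(graph, nxt, target, min(bottleneck, weight), seen)


def s_flux(graph, source, target):
    """Single flux: the maximum flux over all paths (0.0 if none exists, by assumption)."""
    return max(path_fluxes(graph, source, target), default=0.0)


# Illustrative graph with two paths from the time point 2023 to t3:
# 2023 -> t1 -> t2 -> t3 with weights 1, 0.9, 0.9, and 2023 -> t3 with weight 0.8
wtg = {
    2023: {"t1": 1.0, "t3": 0.8},
    "t1": {"t2": 0.9},
    "t2": {"t3": 0.9},
}
fluxes = sorted(path_fluxes(wtg, 2023, "t3"))
result = s_flux(wtg, 2023, "t3")
```

For larger graphs, a bottleneck variant of Dijkstra's algorithm would avoid enumerating every path; the exhaustive search is used here only because it mirrors the definition directly.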

3 Example Discussion and Experimental Verification

To further explain the proposed method for assessing timeliness based on the weighted timeliness graph, this article works through an example. The effectiveness of the method is then verified by testing its recall and precision on real datasets.

Table 1. Process data example.

Table 1 gives an example \(D\) of a process flow with three tuples. The example includes data from four processes, namely the load test, high voltage test, high temperature test, and vibration test, as well as the identification code PID and the Warranty of each product.

In addition, the corresponding set of limited timeliness dependency rules \(\Sigma (r)\) is defined. \(\Sigma (r)\) contains 5 dependency rules, denoted \({r}_{1}\) to \({r}_{5}\), as shown in Table 2.

Table 2. Formulation rule representation of dependency intensity uncertainty.

The 5 rules in Table 2 represent the dependency relationships between the data of different processes in Table 1. In practice, multiple dependency rules may derive the same conclusion. Because a rule with a higher dependency strength is more persuasive, the highest derived value is chosen as the final dependency strength of the conclusion. The formula is:

$$rhs\left(r\right)\Rightarrow tml\left(Q\right)=max\left\{tml\left(rhs\left({r}_{1}\right)\right), tml\left(rhs\left({r}_{2}\right)\right), \cdots , tml\left(rhs\left({r}_{k}\right)\right)\right\}, i\in [1,k]$$
(2)

Here, \(tml(rhs({r}_{i}))=tml({r}_{i})\times tml(lhs({r}_{i}))\) for each rule \({r}_{i}\) that derives the conclusion \(Q\).
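As an illustration of formula (2) with hypothetical numbers (not taken from Table 2), suppose two rules \({r}_{a}\) and \({r}_{b}\) derive the same conclusion \(Q\), with \(tml({r}_{a})=0.8\), \(tml(lhs({r}_{a}))=0.9\), \(tml({r}_{b})=0.7\), and \(tml(lhs({r}_{b}))=1\). Then:

$$tml\left(Q\right)=max\left\{0.8\times 0.9, 0.7\times 1\right\}=max\left\{0.72, 0.7\right\}=0.72$$

so the conclusion takes the strength derived through \({r}_{a}\), even though the left part of \({r}_{b}\) is certain.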

In addition to the original rules \({r}_{2}\) and \({r}_{4}\), three rules are added: \({r}_{6}:\forall {t}_{i},{t}_{j} ({t}_{i}{\prec }_{\text{Vibration test}}{t}_{j}\to {t}_{i}{\prec }_{\text{Load test}}{t}_{j}, 0.8)\), \({r}_{7}:\forall {t}_{i},{t}_{j} ({t}_{i}{\prec }_{\text{Vibration test}}{t}_{j}\to {t}_{i}{\prec }_{\text{High voltage test}}{t}_{j}, 0.8)\), and \({r}_{8}:\forall {t}_{i}, {t}_{j}({t}_{i}{\prec }_{\text{Load test}}2023\to {t}_{i}{\prec }_{\text{Vibration test}}{t}_{j})\). Since the left parts of rules \({r}_{1}\) and \({r}_{3}\) are deterministic conclusions, these rules do not need to appear in the dependency sequence graph. Based on the five rules \({r}_{2}\), \({r}_{4}\), \({r}_{6}\), \({r}_{7}\), and \({r}_{8}\), a conclusion dependency sequence graph is constructed, as shown in Fig. 1.

Fig. 1. Dependency sequence diagram.

In the conclusion dependency sequence graph shown in Fig. 1, \({t}_{i}{\prec }_{\text{Load test}}{t}_{j}\) and \({t}_{j}{\prec }_{\text{Load test}}2023\) can be combined to obtain a new conclusion \({t}_{i}{\prec }_{\text{Load test}}2023\). As a result, the three conclusions \((*,\prec ,\text{Load test},\tau )\), \(\left(*,\prec ,\text{Load test},*\right)\), and \((*,\prec ,\text{Vibration test},*)\) depend on one another (there is a hidden loop), so a unique order for identifying the timeliness of the conclusions cannot be determined. The conclusion dependency sequence graph therefore needs to be merged, as shown in Fig. 2.

Fig. 2. Merging conclusion dependency sequence graphs.

As shown in Fig. 2, the process dependency sequence graph obtained by merging the conclusion dependency sequence graph gives the following calculation order for the timeliness of the process data: \(\text{Vibration test}\), \(\text{Load test}\), \(\text{High voltage test}\).

To make the case more convincing, a new rule \({r}_{9}=\forall t(t\left[\text{Vibration test}\right]=800\,\text{Hz}\to t{\prec }_{\text{Load test}}2023, 0.8)\) is added to \(\Sigma \) alongside rules \({r}_{1}\) and \({r}_{3}\). Using these three rules, the weighted timeliness graph \({WTG}_{\text{Load test}}\) of process \(\text{Load test}\) at time 2023 can be obtained, as shown in Fig. 3(a), which gives the timeliness relationships between the data items \({t}_{1}\), \({t}_{2}\), and \({t}_{3}\) at time \(\tau =2023\). In Fig. 3(b), \(weight({t}_{1}, {t}_{2})=tml({t}_{1}{\prec }_{\text{High voltage test}}{t}_{2})\times tml({r}_{2})=0.9\times 0.8=0.72\).

Fig. 3. Weighted timeliness graph of process \(\text{Load test}\) and process \(\text{High voltage test}\) at 2023.

Take \({t}_{3}{\prec }_{\text{Load test}}2023\) in Fig. 3 as an example of how the single flux of a path is calculated. There are two paths from \(\tau =2023\) to \({t}_{3}\). Their fluxes are \(Flux\left({Path}_{1}\left(2023, {t}_{3}\right)\right)=\mathrm{min}\left\{1, 0.9, 0.9\right\}=0.9\) and \(Flux\left({Path}_{2}\left(2023, {t}_{3}\right)\right)=\mathrm{min}\left\{0.8\right\}=0.8\), so \(tml\left({t}_{3}\left[\text{Load test}\right]\right)=sFlux\left(Path\left(2023, {t}_{3}\right)\right)=Max\left\{0.9, 0.8\right\}=0.9\).

To verify the effectiveness of the timeliness identification method based on the weighted timeliness graph, this paper conducts simulation experiments on real datasets from the industrial big data innovation platform (https://www.industrial-bigdata.com/Data). The experimental results are shown in Fig. 4.

Fig. 4. The trend of recall and precision with the number of rules.

As shown in Fig. 4, recall and precision both increase as the number of rules increases. Once the number of rules reaches 4, however, both recall and precision stabilize and no longer change.

4 Conclusion

The proposed quality appraisal method for manufacturing big data, based on timeliness dependency rules, evaluates the quality of manufacturing big data from the perspective of data timeliness. The method identifies the timeliness of manufacturing big data without clear timestamps through the dependency relationships between production process data. Limited timeliness dependency rules quantify the dependency strength of the rules. The process dependency sequence graph determines the order in which the timeliness of each process's data is calculated. The timeliness of the data is then evaluated through the weighted timeliness graph (WTG) and its path single flux. Finally, the identification method was discussed using three process data instances and nine dependency rules, and its effectiveness was verified on real datasets (recall reaches about 0.97 and precision about 0.82).

In future research on timeliness identification, the edge weights in the weighted timeliness graph could be made to change dynamically so that the method adapts to different industrial application scenarios.