A Method for Identifying the Timeliness of Manufacturing Data Based on Weighted Timeliness Graph

Liu, Zehua; Ding, Xuefeng; Jiang, Yuming; Hu, Dasha

doi:10.1007/978-3-031-46661-8_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14176)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Timeliness is an important indicator of data quality. Industrial production processes generate large amounts of interdependent data, and the timestamps of these data are often unclear. This paper therefore merges the conclusion dependency graph into a process dependency graph to determine the order in which the timeliness of each process's data is identified. By constructing a weighted timeliness graph (WTG) and computing its path single flux, it proposes a data timeliness identification method that does not rely entirely on timestamps. Finally, the WTG-based timeliness identification method is discussed through an example with 9 dependency rules, and its effectiveness is verified by a set of experiments.

1 Introduction

In industrial production processes, a large number of sensors collect various types of data. Before the data are used, their quality must be evaluated, and timeliness is an important indicator of data quality [1]. This paper therefore proposes a method that identifies timeliness from time dependencies and remains effective when timestamps are incomplete. The prerequisite for assessing timeliness is that the data are correct and meet the needs of the task. Unlike correctness, timeliness does not necessarily have to be checked against the real world [2]. The measurement of timeliness should therefore be an estimate of the probability that the data are still valid, not a verified statement made under certainty [3]. For large volumes of data whose validity is unclear, quantifying timeliness through such an estimate is reasonable [4].

Reference [5] proposes a probability-based metric for the event-driven assessment of the timeliness of Wiki articles. Reference [6] defines recurrent timeliness rules (RTR) to evaluate the timeliness of periodic data generated during manufacturing. Reference [7] defines a measure as a function of the age of the attribute value at the time currency is evaluated and of sensitivity parameters that adapt the measure to the application environment. Reference [8] assumes that timestamps are available and uses a current-time variable to represent the latest time. Reference [9] develops an indefinite constraint database scheme on top of the constraint database scheme and studies the complexity of query evaluation in indefinite temporal constraint databases. Reference [10] investigates the temporal constraint satisfaction problem (TCSP), which asks for an assignment to a set of time variables that satisfies the given temporal constraints. Reference [11] is the first to study rules of the form $\forall {t}_{1},\cdots ,{t}_{j}:R({t}_{1}[EID]={t}_{j}[EID]\wedge \varphi \to {t}_{u}{\prec }_{A}{t}_{v}),j\in [1,k]$ for identifying the timeliness of data when the database has no clear timestamps. Reference [12] proposes dynamic functional dependencies, under which a copy function can require attributes that cannot change independently to be copied together. These approaches judge the temporal order of different attribute values of the same entity from a small number of temporal rules obtained from domain knowledge, and thereby identify which values are outdated. Their disadvantage is that they cannot determine whether a value is outdated or invalid at a given time point.

In summary, existing research on data timeliness cannot be applied directly to industrial data, for two main reasons. First, it cannot determine the timeliness of current data without clear timestamps. Second, it cannot quantify the timeliness of data at a given time point. This paper therefore proposes a method for evaluating the timeliness of manufacturing data based on the weighted timeliness graph, which identifies the timeliness of industrial data through time dependency relationships when no clear timestamp is available.

2 Data Timeliness Identification Method

This paper first represents the dependency relationships between conclusions by constructing a conclusion dependency sequence graph. The conclusion dependency sequence graph is then merged into a process dependency sequence graph to determine the order in which the timeliness of each process's data is calculated. Finally, the timeliness of the data in a given process is determined through a weighted timeliness graph.

2.1 Limited Timeliness Dependency Rules

Timeliness dependency rules identify the timeliness of production data through the dependency relationships between processes.

Definition 1. (Limited timeliness dependency rule): In a rule $r$, the processes depend on one another, so the data generated in process $A$ are limited by its related processes. The degree of limitation is quantified by the dependency strength. Rules that express such limited dependencies between process data are called limited timeliness dependency rules, and they are written as:

$$r:\forall {t}_{i}, {t}_{j}({\psi }_{B}\to {t}_{i}{\prec }_{A}{t}_{j}, \beta )$$

(1)

Here, ${t}_{i}{\prec }_{A}{t}_{j}$ states that data item ${t}_{i}$ is generated before data item ${t}_{j}$ in process $A$, $\beta $ is the dependency strength of the rule, and ${\psi }_{B}$ (abbreviated $\psi $ below) denotes the temporal relationship or constraint condition in process $B$. The dependency strength expresses how likely it is that the rule $r$ holds. Because each process or employee in actual industrial production is relatively independent, probabilities can be used to represent the relationships between the data of different processes.

Let $lhs(r)$ denote the left part of the rule and $rhs(r)$ the right part, that is, in formula (1), $lhs\left(r\right)=\psi $ and $rhs\left(r\right)={t}_{i}{\prec }_{A}{t}_{j}$. The dependency strength of the right part is $tml(rhs(r))=tml(lhs(r))\times tml(r)$, where $tml(r)$ is the dependency strength $\beta $ of the rule and the dependency strength of the left part $lhs\left(r\right)=\psi $ is calculated as follows:

(a). If $\psi $ is a deterministic condition of the form ${t}_{i}[A] op {t}_{j}[A]$ or ${t}_{k}[A] op a$, then $tml(\psi )=1$ if $\psi $ is true and $tml(\psi )=0$ otherwise, where $op\in \{=, \ne ,<,>, \le , \ge \}$ and $a$ is a constant;
(b). If $\psi $ is ${t}_{i}{\prec }_{A}{t}_{j}$ or ${t}_{k}{\prec }_{A}\tau $ and $\psi $ is not the right part of any rule $r$, then the value of $tml(\psi )$ is obtained statistically;
(c). If $\psi $ is the right part of another rule $r{\prime}$, then $tml(\psi )$ is the dependency strength of the right part of $r{\prime}$, calculated as above;
(d). If $\psi ={\psi }_{1}\wedge {\psi }_{2}$, that is, $\psi $ is the conjunction of ${\psi }_{1}$ and ${\psi }_{2}$, then $tml(\psi )=min\{tml({\psi }_{1}), tml({\psi }_{2})\}$.
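The four cases above can be read as a recursive evaluation. The following Python sketch is only an illustration under stated assumptions, not the authors' implementation: the class names `Det`, `Order`, `And`, and `Rule`, the function name `tml`, and the `stats` dictionary of statistically obtained strengths are all hypothetical. It further assumes that the rule set contains no cyclic derivations and, when several rules derive the same conclusion, takes the largest derived strength.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Det:
    """Case (a): a deterministic comparison, already evaluated on the data."""
    holds: bool


@dataclass(frozen=True)
class Order:
    """Cases (b)/(c): an order conclusion such as t_i <_A t_j, named by a label."""
    label: str


@dataclass(frozen=True)
class And:
    """Case (d): the conjunction of two conditions."""
    left: "Cond"
    right: "Cond"


Cond = Union[Det, Order, And]


@dataclass(frozen=True)
class Rule:
    lhs: Cond
    rhs: Order
    beta: float  # dependency strength tml(r) of the rule


def tml(cond: Cond, rules: List[Rule], stats: Dict[str, float]) -> float:
    """Dependency strength of a condition (assumes acyclic rule derivations)."""
    if isinstance(cond, Det):                       # case (a)
        return 1.0 if cond.holds else 0.0
    if isinstance(cond, And):                       # case (d)
        return min(tml(cond.left, rules, stats), tml(cond.right, rules, stats))
    deriving = [r for r in rules if r.rhs == cond]
    if not deriving:                                # case (b): statistical value
        return stats[cond.label]
    # case (c): tml(rhs(r')) = tml(lhs(r')) * tml(r'); assumption: keep the largest
    return max(tml(r.lhs, rules, stats) * r.beta for r in deriving)


# Hypothetical rules: a true deterministic condition derives x with strength 0.9,
# and x derives y with strength 0.8, so tml(y) = 1 * 0.9 * 0.8.
rules = [Rule(Det(True), Order("x"), 0.9), Rule(Order("x"), Order("y"), 0.8)]
stats = {"z": 0.5}
print(round(tml(Order("y"), rules, stats), 2))
```

Representing conditions as small immutable objects keeps the case analysis explicit; a production version would need to detect cyclic rule sets before recursing.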

2.2 Process Dependency Sequence Graph

To determine the order in which the timeliness of each process's data is calculated, a process dependency sequence graph must be constructed. This requires first building a conclusion dependency sequence graph from the dependency rules and then merging its nodes. The conclusion dependency sequence graph is built in three steps. First, the type of each conclusion determines whether it is included in the graph. A conclusion of the form ${t}_{i}\left[A\right] op {t}_{j}[A]$ or ${t}_{k}[A] op a$ is deterministic: its strength is 1 if the condition is satisfied and 0 otherwise, so it does not need to be added. Conclusions of the form ${t}_{i}{\prec }_{A}{t}_{j}$ and ${t}_{i}{\prec }_{A}\tau $ are non-deterministic and must be added. Second, for the non-deterministic conclusions, two types of nodes, $(\prec , A,*,*)$ and $(\prec , A,\tau )$, are constructed for each process $A$ to represent conclusions of type ${t}_{i}{\prec }_{A}{t}_{j}$ and type ${t}_{i}{\prec }_{A}\tau $. Finally, the rule set $\Sigma (r)$ is scanned: for every rule ${r}_{k}$ with ${\psi }_{1}=lhs\left({r}_{k}\right)$ and ${\psi }_{2}=rhs({r}_{k})$, a directed edge from ${\psi }_{1}$ to ${\psi }_{2}$ is added to the graph.

The conclusion dependency sequence graph is merged into the process dependency sequence graph according to two principles: (a). Conclusion nodes that refer to the same process are merged into one process node; (b). A directed edge no longer represents a dependency between conclusions but a dependency between processes.
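The merging step and the resulting calculation order can be sketched in Python as follows. This is a hedged illustration, not the authors' procedure: the function names, the encoding of a conclusion node as a `(process, kind)` pair, and the example processes `A`, `B`, `C` are hypothetical. It assumes that the merged process graph is acyclic once edges inside a single process are dropped; `graphlib.TopologicalSorter` raises `CycleError` otherwise.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def merge_to_process_graph(conclusion_edges):
    """Collapse conclusion nodes (process, kind) into process nodes.

    Returns a dict mapping each process to the set of processes it depends on.
    """
    preds = {}
    for (src_proc, _), (dst_proc, _) in conclusion_edges:
        preds.setdefault(src_proc, set())
        preds.setdefault(dst_proc, set())
        if src_proc != dst_proc:  # edges inside one process vanish after merging
            preds[dst_proc].add(src_proc)
    return preds


def identification_order(conclusion_edges):
    """Order in which the timeliness of each process's data can be calculated."""
    graph = merge_to_process_graph(conclusion_edges)
    return list(TopologicalSorter(graph).static_order())


# Hypothetical conclusion edges: "pair" stands for t_i < t_j, "tau" for t_i < tau.
edges = [
    (("A", "pair"), ("A", "tau")),
    (("A", "tau"), ("B", "pair")),
    (("B", "pair"), ("C", "pair")),
    (("A", "pair"), ("C", "pair")),
]
print(identification_order(edges))
```

A topological sort is used here because a process can only be evaluated after every process it depends on has been evaluated.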

2.3 Weighted Timeliness Graph

To evaluate the timeliness of all data items in process $A$, the values of $tml({t}_{i}{\prec }_{\mathrm{A}}{t}_{j})$ and $tml({t}_{i}{\prec }_{\mathrm{A}}\tau )$ must be determined separately. For this purpose, the weighted timeliness graph (WTG) is defined.

Definition 2. (weighted timeliness graph, WTG): Let $D$ be a data set, $\Sigma $ a set of timeliness dependency rules, ${\rm T}$ the set of all times that appear in the rules of $\Sigma $, $A$ a process, and $\theta $ the validity time threshold of $D$. The weighted timeliness graph of process $A$, denoted ${WTG}_{A}$, is defined as follows:

1. ${WTG}_{A}$ contains the data items $t$ and the times $\tau $ that occur in the two types of conclusions ${t}_{i}{\prec }_{\mathrm{A}}{t}_{j}$ and ${t}_{i}{\prec }_{\mathrm{A}}\tau $, and it takes a time $\tau $ as its initial node, where $\tau $ is either an original time in the data set or the chosen threshold $\theta $;
2. For each rule $r$ in $\Sigma $, if some $\tau \in {\rm T}\cup \{\theta \}$ occurs in $rhs\left(r\right)$, then $\tau $ is an initial node of ${WTG}_{A}$;
3. For each rule $r$ in $\Sigma $, if there are ${t}_{1},{t}_{2}\in D$ such that $rhs\left(r\right)={t}_{1}{\prec }_{A}{t}_{2}$ and $tml(lhs(r))>0$ or $tml(r)>0$, the weight of the directed edge from ${t}_{2}$ to ${t}_{1}$ is updated;
4. The weight of the directed edge between ${t}_{1}$ and ${t}_{2}$ in ${WTG}_{A}$ is $weight({t}_{1},{t}_{2})=tml\left(r\right)\times tml\left(lhs\left(r\right)\right)$.

The implementation details of constructing a weighted timeliness graph are shown in Algorithm 1.
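As an illustration of Definition 2 (and not a reproduction of Algorithm 1), the edge-building step could be sketched in Python as below. The function name `build_wtg` and the input format, a list of derived conclusions `(earlier, later, tml_lhs, beta)`, are hypothetical. Two further assumptions are made: edges whose weight is not positive are skipped, and when several rules yield the same edge the larger weight is kept.

```python
def build_wtg(conclusions):
    """Build a weighted timeliness graph as a nested dict {later: {earlier: weight}}.

    Each conclusion (earlier, later, tml_lhs, beta) states "earlier precedes later",
    derived by a rule with strength beta whose left part has strength tml_lhs.
    `later` may be a data item or a time such as tau or the threshold theta.
    """
    graph = {}
    for earlier, later, tml_lhs, beta in conclusions:
        weight = tml_lhs * beta          # weight = tml(r) * tml(lhs(r))
        if weight <= 0:                  # assumption: zero-strength edges are omitted
            continue
        graph.setdefault(earlier, {})    # make sure the target node exists
        out_edges = graph.setdefault(later, {})
        # the edge points from the later node to the earlier node
        out_edges[earlier] = max(out_edges.get(earlier, 0.0), weight)
    return graph


# Hypothetical conclusions for one process, with tau = 2023 as the initial node.
wtg = build_wtg([
    ("t2", 2023, 1.0, 1.0),
    ("t1", "t2", 0.9, 0.8),
    ("t1", "t2", 0.5, 0.5),   # weaker derivation of the same edge, ignored
    ("t3", "t1", 0.0, 0.9),   # left part does not hold, no edge
])
print(wtg)
```

Storing edges from the later node to the earlier node matches item 3 of the definition, so every path starts at a time node and ends at the data item whose timeliness is being evaluated.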

2.4 Path Single Flux of Weighted Timeliness Graph

Evaluating the timeliness of data item ${t}_{i}[A]$ requires determining how likely the conclusion ${t}_{i}{\prec }_{A}\tau $ is to hold, that is, the value of $tml({t}_{i}{\prec }_{A}\tau )$. This value can be obtained either by direct inference from a rule ${r}_{k}$ or from a directed path $Path(\tau , {t}_{i})$ from $\tau $ to ${t}_{i}$ in the weighted timeliness graph.

Definition 3. (Path single flux of weighted timeliness graph): The single flux of path $Path(\tau , {t}_{i})$ in ${WTG}_{A}$, denoted $sFlux(Path(\tau , {t}_{i}))$, is the largest weight that can flow along a single directed path from $\tau $ to ${t}_{i}$, where the flow along a path is limited by the smallest edge weight on that path.

The path single flux of the weighted timeliness graph covers two cases:

(a). When the weighted timeliness graph of process $A$ contains only one path from $\tau $ to ${t}_{i}$, the smallest edge weight on that path is taken as the flux of the path, namely $Flux\left(Path(\tau , {t}_{i})\right)=min\{weight\left({t}_{i}{\prec }_{A}{v}_{1}\right), weight\left({v}_{1}{\prec }_{A}{v}_{2}\right), \cdots , weight({v}_{k}{\prec }_{A}\tau )\}$.
(b). When the weighted timeliness graph of process $A$ contains multiple paths from $\tau $ to ${t}_{i}$, the maximum flux over these paths is taken as the single flux of $Path(\tau , {t}_{i})$ in ${WTG}_{A}$, namely $sFlux\left(Path\right)=Max\{Flux\left({Path}_{1}\right), Flux\left({Path}_{2}\right), \cdots , Flux\left({Path}_{k}\right)\}$.
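Taking the minimum along each path and the maximum across paths is the classical widest-path (bottleneck) problem, so the single flux can be computed without enumerating every path. The Python sketch below uses a Dijkstra-style search for this; it is one possible realization and not necessarily how the authors compute it. The function name `s_flux`, the nested-dict graph format `{node: {successor: weight}}`, and the intermediate node names `u` and `v` are hypothetical; the edge weights mirror the worked example in Sect. 3.

```python
import heapq
from itertools import count


def s_flux(graph, source, target):
    """Largest bottleneck weight over all directed paths from source to target.

    Returns 0.0 when target cannot be reached from source.
    """
    best = {source: float("inf")}
    tie = count()  # tie-breaker so that node objects are never compared
    heap = [(-best[source], next(tie), source)]
    while heap:
        neg_width, _, node = heapq.heappop(heap)
        width = -neg_width
        if node == target:
            return width
        if width < best.get(node, 0.0):
            continue  # stale heap entry
        for succ, weight in graph.get(node, {}).items():
            cand = min(width, weight)        # flux of the path extended by this edge
            if cand > best.get(succ, 0.0):   # keep only the widest path found so far
                best[succ] = cand
                heapq.heappush(heap, (-cand, next(tie), succ))
    return 0.0


# Two paths from tau to t3: tau -> u -> v -> t3 (weights 1, 0.9, 0.9) and tau -> t3 (0.8).
graph = {"tau": {"u": 1.0, "t3": 0.8}, "u": {"v": 0.9}, "v": {"t3": 0.9}}
print(s_flux(graph, "tau", "t3"))
```

For small graphs, enumerating all simple paths and applying the two formulas directly gives the same value; the priority-queue version simply avoids the exponential number of paths in larger graphs.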

3 Example Discussion and Experimental Verification

To further explain the timeliness assessment method based on the weighted timeliness graph, this section discusses an example. The effectiveness of the method is then verified by measuring its recall and precision on real datasets.

Table 1. Process data example.

Table 1 gives an example $D$ of a process flow with three tuples. The example $D$ contains the data of four processes, namely load test, high voltage test, high temperature test, and vibration test, as well as the identification code PID and the Warranty of each product.

In addition, the corresponding set of limited timeliness dependency rules $\Sigma (r)$ is defined. $\Sigma (r)$ contains 5 dependency rules, denoted ${r}_{1}$ to ${r}_{5}$, as shown in Table 2.

Table 2. Formulation rule representation of dependency intensity uncertainty.

The 5 rules in Table 2 represent the dependency relationships between the data of the different processes in Table 1. In practice, several dependency rules may derive the same conclusion. Because a rule with higher dependency strength is more persuasive, the highest value is chosen as the final dependency strength of the conclusion. For a conclusion $Q$ derived by rules ${r}_{1},\cdots ,{r}_{k}$, the formula is:

$$rhs\left(r\right)\Rightarrow tml\left(Q\right)=max\left\{tml\left(rhs\left({r}_{1}\right)\right), tml\left(rhs\left({r}_{2}\right)\right), \cdots , tml\left(rhs\left({r}_{k}\right)\right)\right\}, i\in [1,k]$$

(2)

Here, $tml(rhs({r}_{i}))=tml({r}_{i})\times tml(lhs({r}_{i}))$.
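As a worked instance of formula (2), with strengths chosen purely for illustration (they are not taken from Table 2), suppose two hypothetical rules ${r}_{a}$ and ${r}_{b}$ derive the same conclusion $Q$, with $tml({r}_{a})=0.8$, $tml(lhs({r}_{a}))=0.9$, $tml({r}_{b})=0.9$, and $tml(lhs({r}_{b}))=0.7$. Then

$$tml\left(Q\right)=max\left\{0.8\times 0.9, 0.9\times 0.7\right\}=max\left\{0.72, 0.63\right\}=0.72$$

so the conclusion takes the strength derived through ${r}_{a}$, even though ${r}_{b}$ itself has the higher rule strength.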

In addition to the original ${r}_{2}$ and ${r}_{4}$, three rules are added: ${r}_{6}:\forall {t}_{i},{t}_{j} ({t}_{i}{\prec }_{Vibration test}{t}_{j}\to {t}_{i}{\prec }_{Load test}{t}_{j}, 0.8)$, ${r}_{7}:\forall {t}_{i},{t}_{j} ({t}_{i}{\prec }_{Vibration test}{t}_{j}\to {t}_{i}{\prec }_{High voltage test}{t}_{j}, 0.8)$, and ${r}_{8}:\forall {t}_{i}, {t}_{j}({t}_{i}{\prec }_{Load test}2023\to {t}_{i}{\prec }_{Vibration test}{t}_{j})$. Since the left parts of rules ${r}_{1}$ and ${r}_{3}$ are deterministic conditions, these rules do not need to appear in the dependency sequence graph. Based on these five rules, the conclusion dependency sequence graph shown in Fig. 1 is constructed.

In the conclusion dependency sequence graph in Fig. 1, ${t}_{i}{\prec }_{Load test}{t}_{j}$ and ${t}_{j}{\prec }_{Load test}2023$ can be combined to obtain the new conclusion ${t}_{i}{\prec }_{Load test}2023$. As a result, the three conclusions $(*,\prec ,Load test,\tau )$, $\left(*,\prec ,Load test,*\right)$, and $(*,\prec ,Vibration test,*)$ influence one another (there is a hidden loop), so a unique identification order for the timeliness of the conclusions cannot be determined. The nodes of the conclusion dependency sequence graph therefore have to be merged, as shown in Fig. 2.

As Fig. 2 shows, the merged process dependency sequence graph yields the following calculation order for the timeliness of the process data: $Vibration test$, $Load test$, $High voltage test$.

To make the case more convincing, a new rule ${r}_{9}:\forall t(t\left[Vibration test\right]=800\,\text{Hz}\to t{\prec }_{Load test}2023, 0.8)$ is added to $\Sigma $ alongside rules ${r}_{1}$ and ${r}_{3}$. With these three rules, the weighted timeliness graph ${WTG}_{Load test}$ of process $Load test$ at time 2023 is obtained. Fig. 3(a) shows the timeliness relationship between the data items ${t}_{1}$, ${t}_{2}$, and ${t}_{3}$ in this graph at time $\tau =2023$. In Fig. 3(b), $weight({t}_{1}, {t}_{2})=tml({t}_{1}{\prec }_{High voltage test}{t}_{2})\times tml({r}_{2})=0.9\times 0.8=0.72$.

Take ${t}_{3}{\prec }_{Load test}2023$ in Fig. 3 as an example of how the single flux of a path is calculated. There are two paths from $\tau =2023$ to ${t}_{3}$, with $Flux\left({Path}_{1}\left(2023, {t}_{3}\right)\right)=\mathrm{min}\left\{1, 0.9, 0.9\right\}=0.9$ and $Flux\left({Path}_{2}\left(2023, {t}_{3}\right)\right)=\mathrm{min}\left\{0.8\right\}=0.8$. Therefore, $tml\left({t}_{3}\left[Load test\right]\right)=sFlux\left(Path\left(2023, {t}_{3}\right)\right)=Max\left\{0.9, 0.8\right\}=0.9$.

To verify the effectiveness of the timeliness identification method based on the weighted timeliness graph, this paper conducts simulation experiments on real datasets from the industrial big data innovation platform (https://www.industrial-bigdata.com/Data). The experimental results are shown in Fig. 4.

As Fig. 4 shows, both recall and precision increase with the number of rules. Once the number of rules reaches 4, however, recall and precision stabilize and no longer change.

4 Conclusion

The quality appraisal method based on timeliness dependency rules evaluates the quality of manufacturing big data from the perspective of data timeliness. It identifies the timeliness of manufacturing big data through the dependency relationships between production process data, even without clear timestamps. Limited timeliness dependency rules quantify the dependency strength of each rule. The process dependency sequence graph determines the order in which the timeliness of each process's data is calculated. The weighted timeliness graph (WTG) and its path single flux evaluate the timeliness of the data. Finally, the identification method was discussed through three process data instances and nine dependency rules, and its effectiveness was verified on real datasets (recall reaches about 0.97 and precision about 0.82).

In future research on timeliness identification, the edge weights of the weighted timeliness graph could be made to change dynamically to adapt to different industrial application scenarios.

References

1. Li, M., Li, J., Cheng, S., Sun, Y.: Uncertain rule based method for determining data currency. IEICE Trans. Inf. Syst. E101.D(10), 2447–2457 (2018)
2. Batini, C., Scannapieco, M.: Data and Information Quality: Dimensions, Principles and Techniques. Springer Publishing Company, Incorporated (2016)
3. Even, A., Shankaranarayanan, G., Berger, P.D.: Evaluating a model for cost-effective data quality management in a real-world CRM setting. Decis. Support. Syst. 50(1), 152–163 (2010)
4. Firmani, D., Mecella, M., Scannapieco, M., Batini, C.: On the meaningfulness of “big data quality” (invited paper). Data Science Eng. 1(1), 6–20 (2015)
5. Klier, M., Moestue, L., Obermeier, A.A., Widmann, T.: Event-driven assessment of currency of wiki articles: a novel probability-based metric. In: International Conference on Interaction Sciences (2021)
6. Liu, Z., Ding, X., Tang, J., Jiang, Y., Hu, D.: Anomaly monitoring of process based on recurrent timeliness rules (AMP-RTR). Applied Sciences 12(24), 12917 (2022)
7. Ballou, D., Wang, R., Pazer, H., Tayi, G.K.: Modeling information manufacturing systems to determine information product quality. Manage. Sci. 44(4), 462–484 (1998)
8. Dyreson, C.E., Jensen, C.S., Snodgrass, R.T.: Now in temporal databases. In: Encyclopedia of Database Systems. Springer, New York (2018)
9. Koubarakis, M.: The complexity of query evaluation in indefinite temporal constraint databases. Theoret. Comput. Sci. 171(1), 25–60 (1997)
10. Bodirsky, M., Kára, J.: The complexity of temporal constraint satisfaction problems. J. ACM 57(9), 1–41 (2010)
11. Fan, W., Geerts, F., Wijsen, J.: Determining the currency of data. ACM Trans. Database Systems (TODS) 37(4), 25–29 (2012)
12. Vianu, V.: Dynamic functional dependencies and database aging. J. ACM 34(1), 28–59 (1987)

Acknowledgement

This work was supported in part by the National Key R&D Program of China under Grant No. 2020YFB1707900 and 2020YFB1711800; the National Natural Science Foundation of China under Grant No. 62262074, U2268204 and 62172061; the Science and Technology Project of Sichuan Province under Grant No. 2022YFG0159, 2022YFG0155, 2022YFG0157.

Author information

Authors and Affiliations

College of Computer Science, Sichuan University, Chengdu, 610065, China
Zehua Liu, Xuefeng Ding, Yuming Jiang & Dasha Hu
Big Data Analysis and Fusion Application Technology Engineering Laboratory of Sichuan Province, Chengdu, 610065, China
Zehua Liu, Xuefeng Ding, Yuming Jiang & Dasha Hu

Corresponding author

Correspondence to Dasha Hu.

Editor information

Editors and Affiliations

Northeastern University, Shenyang, China
Xiaochun Yang
The University of Indonesia, Depok, Indonesia
Heru Suhartanto
Beijing Institute of Technology, Beijing, China
Guoren Wang
Northeastern University, Shenyang, China
Bin Wang
University of Technology Sydney, Sydney, NSW, Australia
Jing Jiang
Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
Bing Li
Sun Yat-sen University, Guangzhou, China
Huaijie Zhu
Anhui University, Hefei, China
Ningning Cui

About this paper

Cite this paper

Liu, Z., Ding, X., Jiang, Y., Hu, D. (2023). A Method for Identifying the Timeliness of Manufacturing Data Based on Weighted Timeliness Graph. In: Yang, X., et al. Advanced Data Mining and Applications. ADMA 2023. Lecture Notes in Computer Science, vol 14176. Springer, Cham. https://doi.org/10.1007/978-3-031-46661-8_5

DOI: https://doi.org/10.1007/978-3-031-46661-8_5
Published: 05 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46660-1
Online ISBN: 978-3-031-46661-8
eBook Packages: Computer Science, Computer Science (R0)
