Comparative Analysis of Process Mining Algorithms in Industrial Applications

Shanmuga Sundari, M.; Nayak, Rudra Kalyan

doi:10.1007/978-981-19-2358-6_43

M. Shanmuga Sundari⁶ &
Rudra Kalyan Nayak⁶

Part of the book series: Cognitive Science and Technology ((CSAT))

633 Accesses
1 Citations

Abstract

In businesses, people have to manage the processes efficiently to build products and services to meet customer requirements. The process generally consists of a large number of activities that users need to manage efficiently. Process mining plays a vital role in all business domains. It helps organizations to organize activities and improve productivity. These processes in business generate activity data and produce overwhelming process data. Process mining helps connect these activities, applies business intelligence to the process, and finds hidden patterns in the business process. Process mining provides different process models and algorithms. The models use event log data of activities and find patterns in the process data. Process mining algorithms work on different scenarios in business and usage of domains in the application. This paper explains the various algorithms and determines the use of process mining applications in business domains. This research gives the solution to find the most suitable model for business applications.

Access provided by Autonomous University of Puebla. Download conference paper PDF

A Practical Approach for Process Mining in Production Processes

Comparative Analysis of Process Mining Algorithms in Python

A Process Mining Based Approach to Support Decision Making

Keywords

1 Introduction

Process mining [1] is the intermediate bridge between process-based analysis and data-oriented analysis techniques. The process-based analysis provides the knowledge of activities in the process. On the other hand, data-oriented analysis uses machine learning techniques to analyze data, evaluate the accuracy of the model, and find insights from data. Process mining can bridge the gap between process-based analysis and data-centric analysis [2]. Process mining is a tool to seek the confrontation between the event logs and models used in mining. For example, business administrators define the processes in the organization and their sequence. In supply chain management, the logistics route from source to destination is considered a process. In the online delivery business, the product delivery to the end consumer without delay is considered a process. The number of activities will be different for each trace. The information which is the backbone of the process is mined with a proper model. Process mining is a relatively good discipline to achieve the analysis of the activities. The process mining tool ingests real-time information or event logs [3] based on the extracted information. Process mining uses the real-time event log data and applies the models. It can picture the real flow of the process in the organization that shows the efficacy [4] of the resources.

2 Process Mining Algorithm

There are various algorithms widely used to evaluate the business process. Each algorithm will execute various scenarios and fit into the particular domain.

2.1 Alpha Miner

Process mining has been implemented in three sequence steps. The steps will help to prepare the dataset [5] and make it ready for the process analysis: Pre-processing, Processing, and Post-processing. The α-algorithm [6] is the basic algorithm that works in the sequence of activities and builds the Petri net diagram.

Let T be a set of activities. W be the event log in T. first (σ) is the first trace in the event log. I define the first trace in the event. O is the last trace in an event which is calculated using the last (σ). X is the pair between the events in activities. X, Y are the two different event flows. Y is the set of minimal pairs of activities. P represents the place of traces. F determines the connectivity matrix in between the activities. Let W consider as a workflow log over U. The (W) is defined as follows.

$$\begin{aligned} & 1.\;U_{W} = \{ u \in U|\exists_{\sigma \in W} u \in \sigma \} \\ & 2.\;U_{I} = \{ u \in U|\exists_{\sigma \in W} u = {\text{first}}(\sigma )\} \\ & 3.\;U_{W} = \{ u \in U|\exists_{\sigma \in W} t = {\text{last}}(\sigma )\} \\ & 4.\;V_{W} = \{ (X,Y)|X \subseteq U_{W} \wedge Y \subseteq U_{W} \\ & \quad \quad \wedge \forall_{\sigma \in X} \forall_{y \in Y} x \to Wy \wedge \forall_{x1,x2 \in X} x1\#_{W} x2 \wedge \forall_{y1} \#_{W} y2\} \\ & 5.\;W_{w} = \{ (X,Y) \in X_{W} |\forall_{(X,Y) \in XW} X \subseteq X^{^{\prime}} \wedge Y \subseteq Y^{^{\prime}} \Rightarrow (X,Y) = (X^{^{\prime}} ,Y^{^{\prime}} )\} \\ & 6.\;P_{W} = \{ P(X,Y)|(X,Y) \in Y_{W} \} \cup \{ i_{w} ,o_{w} \} \\ & 7.\;F_{W} = \{ (a,P(X,Y))|(X,Y) \in Y_{w} \wedge x \in X\} \cup \{ (P(X,Y),y)|(X,Y) \\ & \quad \quad \in Y_{W} \wedge y \in Y \cup \{ i_{w} ,u\} |u \in UI\} \cup \{ (u,o_{w} )|u \in U_{o} \} \\ & 8.\;\alpha (W) = (P_{W} ,U_{W} ,F_{W} ) \\ \end{aligned}$$

2.2 Heuristic Miner

Heuristics miner [6] works with frequencies of events and activities and builds the frequency matrix [7]. It can deal with noisy events and find the behavior of the process. It can identify the exceptions of the event log which is not identified using the alpha algorithm. Consider there is N number of activities. A is the sequence of activity. E is the event log. In the event log, the activities will have happened multiple times. The following steps of activities show the relationship used in the algorithm. Let E consider as event log over N, Let x,y are the activities in x,y ϵ N:

1.
x E y if any traces are noted in N = t₁…t_n, t is the time stamp for the activity
2.
x → E y whenever the activity meets x > E y and y > E x
3.
x ≠ E y and x ║ E y whenever the activity meets x > E y and y > E x.
4.
x > > E y and x > > > E y whenever the activity meets a trace of N = ts₁, ts2, ts3….. t_n and i Ɛ {1;:::; n - 2} that based on N ϵ E and ts_i = x and ts_i + 1 = y and ts_i + 2 = x.

The dependency graph shows the value of the dependency relation (1) between x and y:

$$|x = >_{L} y| = \left\{ {\begin{array}{*{20}l} {\frac{{x = >_{L} y| - |y + >_{L} x|}}{{|x = >_{L} y| + |y + >_{L} x| + 1}}} \hfill & {{\text{if}}\;x \ne y} \hfill \\ {\frac{{|x = >_{L} y|}}{{|x = >_{L} y| + 1}}} \hfill & {{\text{if}}\;x = y} \hfill \\ \end{array} } \right\}$$

(1)

This will produce the dependency graph from the traces [8] of activities. We can identify the dependency in between the activities based on the out product of the formula. The join and split process will find the proper flow of the activities. Based on the flow it can find the deviation [9] in the process in real-time application.

2.3 Genetic Miner

The genetic algorithm is started with an initial population [10] and crossover [11] which are created using dependency measures [12]. The initial population is derived from the dependency measures between the activities. For example x1, x2 are the activities. The dependency measure D(x1, x2) is high than the causal high probability then we have it true.

$$\begin{aligned} {\text{Fitness}}_{S} \left( {{\text{PM}},L} \right) & = 0.20 \times \frac{{{\text{parsed}}\_{\text{activities}}_{S} \left( {{\text{PM}},L} \right)}}{{{\text{num}}\_{\text{activities}}\_\log \left( L \right)}} + .0.30 \\ & \quad \times \frac{{{\text{completed}}\_{\text{log}}\_{\text{traces}}_{S} \left( {{\text{PM}},L} \right)}}{{{\text{num}}\_{\text{traces}}\_\log \left( L \right)}} + 0.50 \\ & \quad \times \frac{{{\text{completed}}\_\log \_{\text{traces}}_{S} \left( {{\text{Pm}},L} \right)}}{{{\text{num}}\_{\text{traces}}\_\log \left( L \right)}} \\ \end{aligned}$$

(2)

After the initial population [13] is derived, then the fitness of the events is calculated using the below formula. The stop semantics is calculated using the activity log traces. Let L be an event log and PM be a process model. The fitness (2) and (3) is calculated based on the completed traces with a successful time frame. The continuous semantics [14] are calculated using the traces of the event log and shown below. The fitness of the continuous semantics is:

$$\begin{aligned} {\text{Fitness}}_{C} \left( {{\text{PM}},L} \right) & = 0.40 \times \frac{{{\text{Parsed}}\_{\text{activities}}_{C} \left( {{\text{PM}},L} \right)}}{{{\text{all}}\_{\text{actiivies}}\_\log_{C} \left( L \right)}} + 0.60 \\ & \quad \times \frac{{{\text{completed}}\_{\text{traces}}_{C} \left( {{\text{PM}},L} \right)}}{{{\text{num}}\_{\text{traces}}\_\log \left( L \right)}} \\ \end{aligned}$$

(3)

Then crossover and mutation are evaluated with the fitness values [15]. Crossover generates the new individuals from the fittest [16] in the current population.

2.4 Fuzzy Miner

The Fuzzy Miner [17] is applicable for semi-structured processes in a large database. The complex and uncertain activities [18] will happen in a real-time process. The unstructured process [19] is very difficult to find the flow. Fuzzy miner has been obtained using the flow of sub-logs in the particular event. It will execute up to the low-level abstraction in the sub-log also. A fuzzy net is generated as a Simple Precedence Diagram (SPD) [20]. An SPD consists of nodes and edges called a directed graph. The nodes are considered as activities, and edges are considered as transactions. A Fuzzy Performance Diagram (FPD) is the visual output diagram of an SPD. It also shows the performance analysis information. It can predict the control flow of a process in a visual manner that can help the process mining analysis. This will be captured in all nodes in the SPD in terms of clusters.

3 Comparative Analysis of Algorithms

The comparative analysis of process mining algorithms is displayed in Table 1. The resultant table depicts the various aspects of criteria and explained the differences of the algorithms.

Table 1 Analysis of various process mining algorithms in industry aspect

Full size table

4 Conclusion

The importance and usage of process mining increase in real-world dynamic applications. Many of the domains find this process mining useful. In this paper, we have carried a comparative analysis of various process mining algorithms and domain usage. At the end of this research based on many papers, we can conclude that:

Alpha miner: This can be used for concrete process log and using replay it can find the deviation in the process. This is the basic model in process mining, which can show the flow of traces in the process.
Heuristic miner: This can be used for fewer noise data and having a simple sequence of activities in the process. This can produce the frequency dependency between the activities.
Genetic miner: This can be used with the noisy processes, multiple repetitions of activities in process with any kind of choices inflow. This can populate the evolution and analysis with the native events.
Fuzzy miner: This can be used with complex and irrelevant processes. This can depict the traits and eliminate irrelevant activities.

References

M.S. Sundari, R.K. Nayak, Process mining in healthcare systems: a critical review and its future. Int. J. Emerg. Trends Eng. Res. 8(9), 5197–5208 (2020). https://doi.org/10.30534/ijeter/2020/50892020
Article Google Scholar
W.M.P. Van Der Aalst et al., Business process mining: an industrial application. Inf. Syst. 32(5), 713–732 (2007)
Article Google Scholar
J. Geyer-klingeberg, J. Nakladal, F. Baldauf, F. Veit, Process mining and robotic process automation : a perfect match process mining as enabler for RPA ımplementation, in 16th Int. Conf. Bus. Process Manag., vol. i, no. July, 2018
Google Scholar
R.R. Reddy, Y. Ramadevi, K.V.N. Sunitha, Robust Data Model for Enhanced Anomaly Detection,” in Proceedings of the International Congress on Information and Communication Technology, 2016, pp. 439–446
Google Scholar
Q. Cai, H. Wang, Z. Li, X. Liu, A survey on multimodal data-driven smart healthcare systems: approaches and applications. IEEE Access (2019). https://doi.org/10.1109/ACCESS.2019.2941419
Article Google Scholar
R. Tripathy, R. K. Nayak, P. Das, and D. Mishra, Cellular cholesterol prediction of mammalian ATP-binding cassette (ABC) proteins based on fuzzy c-means with support vector machine algorithms, J. Intell. Fuzzy Syst., vol. 39, no. 2, 2020, doi: https://doi.org/10.3233/JIFS-179934.
S. Chen, S. Yang, M. Zhou, R. Burd, I. Marsic, Process-Oriented Iterative Multiple Alignment for Medical Process Mining, 2017, doi: https://doi.org/10.1109/ICDMW.2017.63
M. Prodel, V. Augusto, B. Jouaneton, L. Lamarsalle, X. Xie, Optimal process mining for large and complex event logs. IEEE Trans. Autom. Sci. Eng. (2018). https://doi.org/10.1109/TASE.2017.2784436
Article Google Scholar
W. M. P. Van Der Aalst, A. K. A. De Medeiros, and A. J. M. M. Weijters, Genetic process mining, 2005, doi: https://doi.org/10.1007/11494744_5
S.A. Shinde, P.R. Rajeswari, Intelligent health risk prediction systems using machine learning: a review. Int. J. Eng. Technol. 7(3), 1019–1023 (2018). https://doi.org/10.14419/ijet.v7i3.12654
Article Google Scholar
M. D. N. Salman, P. T. Rao, U. R. Rahman, M. D. Zıa, Adaptıve noıse cancellers for cardıac sıgnal enhancement for ıot based health care systems. J. Theor. Appl. Inf. Technol., vol. 95, no. 10, 2017
Google Scholar
A. Mubarakali, M. Ashwin, D. Mavaluru, A.D. Kumar, Design an attribute based health record protection algorithm for healthcare services in cloud environment. Multimed. Tools Appl. 79(5), 3943–3956 (2020)
Article Google Scholar
P. Weber, B. Bordbar, P. Tino, A framework for the analysis of process mining algorithms, IEEE Trans. Syst. Man, Cybern. Syst., vol. 43, no. 2, pp. 303–317, 2012
Google Scholar
M. Weidlich, A. Polyvyanyy, N. Desai, J. Mendling, M. Weske, Process compliance analysis based on behavioural profiles. Inf. Syst. 36(7), 1009–1025 (2011)
Article MATH Google Scholar
R. Gatta et al., A framework for event log generation and knowledge representation for process mining in healthcare, 2018, doi: https://doi.org/10.1109/ICTAI.2018.00103
D. Van, Process mining discovery, conformance and enhancement of business processes, Springer Heidelb. Dordr. London New York. ISBN, 2011
Google Scholar
A. K. A. de Medeiros, B. F. van Dongen, W. M. P. van der Aalst, and A. J. M. M. Weijters, Process mining: Extending the a algorithm to mine short loops, BETA Work. Pap. Ser., 2004
Google Scholar
S. Chandra Sekaran, V. Saravanan, R. K. Nayak, R. Tripathy, and S. Siva Shankar, Human health and velocity aware network selection scheme for WLAN/WiMAX integrated networks with QoS, Int. J. Innov. Technol. Explor. Eng., vol. 8, no. 12, pp. 198–205, 2019, doi: https://doi.org/10.35940/ijitee.L3571.1081219
S. S. Mirza and M. Z. Ur Rahman, Efficient adaptive filtering techniques for thoracic electrical bio-impedance analysis in health care systems, J. Med. Imaging Heal. Informatics, vol. 7, no. 6, pp. 1126–1138, 2017
Google Scholar
B. Tibeme, H. Shahriar, and C. Zhang, Process Mining Algorithms for Clinical Workflow Analysis, 2018, doi: https://doi.org/10.1109/SECON.2018.8479176

Download references

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India
M. Shanmuga Sundari & Rudra Kalyan Nayak

Authors

M. Shanmuga Sundari
View author publications
You can also search for this author in PubMed Google Scholar
Rudra Kalyan Nayak
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to M. Shanmuga Sundari .

Editor information

Editors and Affiliations

BioAxis DNA Research Centre Limited, Hyderabad, India
Amit Kumar
Department of Computer Science, Brunel University, Uxbridge, UK
Gheorghita Ghinea
CMR College of Engineering and Technology, Hyderabad, India
Suresh Merugu
Ichikawa, Chiba, Japan
Takako Hashimoto

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Shanmuga Sundari, M., Nayak, R.K. (2023). Comparative Analysis of Process Mining Algorithms in Industrial Applications. In: Kumar, A., Ghinea, G., Merugu, S., Hashimoto, T. (eds) Proceedings of the International Conference on Cognitive and Intelligent Computing. Cognitive Science and Technology. Springer, Singapore. https://doi.org/10.1007/978-981-19-2358-6_43

Download citation

DOI: https://doi.org/10.1007/978-981-19-2358-6_43
Published: 01 January 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2357-9
Online ISBN: 978-981-19-2358-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Comparative Analysis of Process Mining Algorithms in Industrial Applications

Abstract

Similar content being viewed by others

A Practical Approach for Process Mining in Production Processes

Comparative Analysis of Process Mining Algorithms in Python

A Process Mining Based Approach to Support Decision Making

Keywords

1 Introduction

2 Process Mining Algorithm

2.1 Alpha Miner

2.2 Heuristic Miner

2.3 Genetic Miner

2.4 Fuzzy Miner

3 Comparative Analysis of Algorithms

4 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Comparative Analysis of Process Mining Algorithms in Industrial Applications

Abstract

Similar content being viewed by others

A Practical Approach for Process Mining in Production Processes

Comparative Analysis of Process Mining Algorithms in Python

A Process Mining Based Approach to Support Decision Making

Keywords

1 Introduction

2 Process Mining Algorithm

2.1 Alpha Miner

2.2 Heuristic Miner

2.3 Genetic Miner

2.4 Fuzzy Miner

3 Comparative Analysis of Algorithms

4 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation