Keywords

1 Introduction

Process mining [1] is the intermediate bridge between process-based analysis and data-oriented analysis techniques. The process-based analysis provides the knowledge of activities in the process. On the other hand, data-oriented analysis uses machine learning techniques to analyze data, evaluate the accuracy of the model, and find insights from data. Process mining can bridge the gap between process-based analysis and data-centric analysis [2]. Process mining is a tool to seek the confrontation between the event logs and models used in mining. For example, business administrators define the processes in the organization and their sequence. In supply chain management, the logistics route from source to destination is considered a process. In the online delivery business, the product delivery to the end consumer without delay is considered a process. The number of activities will be different for each trace. The information which is the backbone of the process is mined with a proper model. Process mining is a relatively good discipline to achieve the analysis of the activities. The process mining tool ingests real-time information or event logs [3] based on the extracted information. Process mining uses the real-time event log data and applies the models. It can picture the real flow of the process in the organization that shows the efficacy [4] of the resources.

2 Process Mining Algorithm

There are various algorithms widely used to evaluate the business process. Each algorithm will execute various scenarios and fit into the particular domain.

2.1 Alpha Miner

Process mining has been implemented in three sequence steps. The steps will help to prepare the dataset [5] and make it ready for the process analysis: Pre-processing, Processing, and Post-processing. The α-algorithm [6] is the basic algorithm that works in the sequence of activities and builds the Petri net diagram.

Let T be a set of activities. W be the event log in T. first (σ) is the first trace in the event log. I define the first trace in the event. O is the last trace in an event which is calculated using the last (σ). X is the pair between the events in activities. X, Y are the two different event flows. Y is the set of minimal pairs of activities. P represents the place of traces. F determines the connectivity matrix in between the activities. Let W consider as a workflow log over U. The (W) is defined as follows.

$$\begin{aligned} & 1.\;U_{W} = \{ u \in U|\exists_{\sigma \in W} u \in \sigma \} \\ & 2.\;U_{I} = \{ u \in U|\exists_{\sigma \in W} u = {\text{first}}(\sigma )\} \\ & 3.\;U_{W} = \{ u \in U|\exists_{\sigma \in W} t = {\text{last}}(\sigma )\} \\ & 4.\;V_{W} = \{ (X,Y)|X \subseteq U_{W} \wedge Y \subseteq U_{W} \\ & \quad \quad \wedge \forall_{\sigma \in X} \forall_{y \in Y} x \to Wy \wedge \forall_{x1,x2 \in X} x1\#_{W} x2 \wedge \forall_{y1} \#_{W} y2\} \\ & 5.\;W_{w} = \{ (X,Y) \in X_{W} |\forall_{(X,Y) \in XW} X \subseteq X^{^{\prime}} \wedge Y \subseteq Y^{^{\prime}} \Rightarrow (X,Y) = (X^{^{\prime}} ,Y^{^{\prime}} )\} \\ & 6.\;P_{W} = \{ P(X,Y)|(X,Y) \in Y_{W} \} \cup \{ i_{w} ,o_{w} \} \\ & 7.\;F_{W} = \{ (a,P(X,Y))|(X,Y) \in Y_{w} \wedge x \in X\} \cup \{ (P(X,Y),y)|(X,Y) \\ & \quad \quad \in Y_{W} \wedge y \in Y \cup \{ i_{w} ,u\} |u \in UI\} \cup \{ (u,o_{w} )|u \in U_{o} \} \\ & 8.\;\alpha (W) = (P_{W} ,U_{W} ,F_{W} ) \\ \end{aligned}$$

2.2 Heuristic Miner

Heuristics miner [6] works with frequencies of events and activities and builds the frequency matrix [7]. It can deal with noisy events and find the behavior of the process. It can identify the exceptions of the event log which is not identified using the alpha algorithm. Consider there is N number of activities. A is the sequence of activity. E is the event log. In the event log, the activities will have happened multiple times. The following steps of activities show the relationship used in the algorithm. Let E consider as event log over N, Let x,y are the activities in x,y ϵ N:

  1. 1.

    x E y if any traces are noted in N = t1tn, t is the time stamp for the activity

  2. 2.

    x → E y whenever the activity meets x > E y and y > E x

  3. 3.

    x ≠ E y and x ║ E y whenever the activity meets x > E y and y > E x.

  4. 4.

    x >  > E y and x >  >  > E y whenever the activity meets a trace of N = ts1, ts2, ts3….. tn and i Ɛ {1;:::; n - 2} that based on N ϵ E and tsi = x and tsi + 1 = y and tsi + 2 = x.

The dependency graph shows the value of the dependency relation (1) between x and y:

$$|x = >_{L} y| = \left\{ {\begin{array}{*{20}l} {\frac{{x = >_{L} y| - |y + >_{L} x|}}{{|x = >_{L} y| + |y + >_{L} x| + 1}}} \hfill & {{\text{if}}\;x \ne y} \hfill \\ {\frac{{|x = >_{L} y|}}{{|x = >_{L} y| + 1}}} \hfill & {{\text{if}}\;x = y} \hfill \\ \end{array} } \right\}$$
(1)

This will produce the dependency graph from the traces [8] of activities. We can identify the dependency in between the activities based on the out product of the formula. The join and split process will find the proper flow of the activities. Based on the flow it can find the deviation [9] in the process in real-time application.

2.3 Genetic Miner

The genetic algorithm is started with an initial population [10] and crossover [11] which are created using dependency measures [12]. The initial population is derived from the dependency measures between the activities. For example x1, x2 are the activities. The dependency measure D(x1, x2) is high than the causal high probability then we have it true.

$$\begin{aligned} {\text{Fitness}}_{S} \left( {{\text{PM}},L} \right) & = 0.20 \times \frac{{{\text{parsed}}\_{\text{activities}}_{S} \left( {{\text{PM}},L} \right)}}{{{\text{num}}\_{\text{activities}}\_\log \left( L \right)}} + .0.30 \\ & \quad \times \frac{{{\text{completed}}\_{\text{log}}\_{\text{traces}}_{S} \left( {{\text{PM}},L} \right)}}{{{\text{num}}\_{\text{traces}}\_\log \left( L \right)}} + 0.50 \\ & \quad \times \frac{{{\text{completed}}\_\log \_{\text{traces}}_{S} \left( {{\text{Pm}},L} \right)}}{{{\text{num}}\_{\text{traces}}\_\log \left( L \right)}} \\ \end{aligned}$$
(2)

After the initial population [13] is derived, then the fitness of the events is calculated using the below formula. The stop semantics is calculated using the activity log traces. Let L be an event log and PM be a process model. The fitness (2) and (3) is calculated based on the completed traces with a successful time frame. The continuous semantics [14] are calculated using the traces of the event log and shown below. The fitness of the continuous semantics is:

$$\begin{aligned} {\text{Fitness}}_{C} \left( {{\text{PM}},L} \right) & = 0.40 \times \frac{{{\text{Parsed}}\_{\text{activities}}_{C} \left( {{\text{PM}},L} \right)}}{{{\text{all}}\_{\text{actiivies}}\_\log_{C} \left( L \right)}} + 0.60 \\ & \quad \times \frac{{{\text{completed}}\_{\text{traces}}_{C} \left( {{\text{PM}},L} \right)}}{{{\text{num}}\_{\text{traces}}\_\log \left( L \right)}} \\ \end{aligned}$$
(3)

Then crossover and mutation are evaluated with the fitness values [15]. Crossover generates the new individuals from the fittest [16] in the current population.

2.4 Fuzzy Miner

The Fuzzy Miner [17] is applicable for semi-structured processes in a large database. The complex and uncertain activities [18] will happen in a real-time process. The unstructured process [19] is very difficult to find the flow. Fuzzy miner has been obtained using the flow of sub-logs in the particular event. It will execute up to the low-level abstraction in the sub-log also. A fuzzy net is generated as a Simple Precedence Diagram (SPD) [20]. An SPD consists of nodes and edges called a directed graph. The nodes are considered as activities, and edges are considered as transactions. A Fuzzy Performance Diagram (FPD) is the visual output diagram of an SPD. It also shows the performance analysis information. It can predict the control flow of a process in a visual manner that can help the process mining analysis. This will be captured in all nodes in the SPD in terms of clusters.

3 Comparative Analysis of Algorithms

The comparative analysis of process mining algorithms is displayed in Table 1. The resultant table depicts the various aspects of criteria and explained the differences of the algorithms.

Table 1 Analysis of various process mining algorithms in industry aspect

4 Conclusion

The importance and usage of process mining increase in real-world dynamic applications. Many of the domains find this process mining useful. In this paper, we have carried a comparative analysis of various process mining algorithms and domain usage. At the end of this research based on many papers, we can conclude that:

  • Alpha miner: This can be used for concrete process log and using replay it can find the deviation in the process. This is the basic model in process mining, which can show the flow of traces in the process.

  • Heuristic miner: This can be used for fewer noise data and having a simple sequence of activities in the process. This can produce the frequency dependency between the activities.

  • Genetic miner: This can be used with the noisy processes, multiple repetitions of activities in process with any kind of choices inflow. This can populate the evolution and analysis with the native events.

  • Fuzzy miner: This can be used with complex and irrelevant processes. This can depict the traits and eliminate irrelevant activities.