
1 Introduction

An enterprise usually has a large and complex information system (IS) that supports a huge number of business processes (BPs). Some are well modeled and others are not. Enterprises need to continuously adapt their BPs to cope with changing business requirements, which demands more agility and flexibility from their BPs [1]. This agility is difficult to achieve because: (i) BP modeling assumes that all instances of the same BP run in the same way, and (ii) the tightly coupled components of the legacy IS architecture supporting the BP do not provide the loose coupling and interoperability required for flexibility and agility [2].

Service-oriented architecture (SOA) has been shown to improve the agility of the enterprise by making its BPs more flexible.

However, moving to SOA requires re-engineering the BPs in order to transform their activities into reusable, loosely coupled services [3].

One of the most widely used techniques for reengineering running BPs is BP mining, which extracts knowledge about BPs from the event logs of an existing IS [4]. It is concerned with the discovery, monitoring, and improvement of BPs using the knowledge obtained from the event logs that existing ISs generate.

However, how to extract the activities from running BPs that are not modeled and present them as services is still an open issue.

In this research, we propose a novel approach that assists enterprises in mining their BPs in order to reengineer them with new technologies such as SOA. It is based on service identification methods. The proposed service identification approach mines BPs, clusters their activities into tasks, organizes the tasks into an initial set of services, lists the services, and verifies the service candidates against the service orientation principles.

The remainder of this chapter is organized as follows. Section 2 presents the concepts related to this research. Section 3 presents related work. Section 4 details the research methodology and the proposed approach. Section 5 applies the approach to a case study. Finally, Section 6 concludes and outlines future work.

2 Background

In this section, key concepts related to the research are reviewed, including BP, BP mining, BP reengineering, SOA, and service identification techniques.

Business Process Management

Business Process Management (BPM) comprises seven phases, as proposed by Mathiesen et al. [5]: identification, modeling (as-is), analysis, improvement (to-be), implementation, execution (to-do), and monitoring/controlling.

Business Process Modeling

BP modeling is defined as the description of activities in a diagrammatic manner. It shows how events and variables flow from one place to another [6]. It is the most important step in BPM.

Many techniques are used for BP modeling, such as UML diagrams, Business Process Modeling Notation (BPMN), data flow diagrams, flowcharts, role activity diagrams, role interaction diagrams, Gantt charts, integrated definition for function modeling, colored Petri nets, object-oriented methods, workflow techniques, and simulation models.

Business Process Mining 

Business process mining is the extraction of knowledge about the running BPs from the event logs of the running IS. The goal of BP mining is to discover, monitor, and improve real BPs by extracting them from the event logs of the running IS [4]. Indeed, more and more events are being recorded, which provides detailed information about the history of the running BPs, and there is a need to improve and support BPs in a competitive and rapidly changing environment.

The main components of BP mining are the event log and the process model. An event log is defined as “a collection of events used as input for process mining. Events do not need to be stored in a separate log file” [7]. Event logs comprise information about the start and completion of process steps together with related context data [8]. Every log entry holds an identifier of the case, an activity name, a timestamp, a user identifier, a booking code, and a notification [9]. The event log is used as the starting point of mining [10]. The process model is generated from the event logs [11].
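To make the structure of an event log concrete, the following minimal Python sketch groups log entries by case identifier and orders each case by timestamp, which yields one trace (ordered list of activity names) per case. The sample entries, the three-field layout, and the function name are assumptions made for illustration; real logs usually carry more attributes.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log entries: (case ID, activity name, ISO timestamp).
EVENTS = [
    ("C1", "create fine", "2024-01-01T09:00:00"),
    ("C2", "create fine", "2024-01-01T09:05:00"),
    ("C1", "send fine", "2024-01-02T10:00:00"),
    ("C1", "payment", "2024-01-05T12:00:00"),
    ("C2", "payment", "2024-01-03T08:30:00"),
]


def group_into_traces(events):
    """Group events by case ID and order the events of each case by timestamp."""
    cases = defaultdict(list)
    for case_id, activity, timestamp in events:
        cases[case_id].append((datetime.fromisoformat(timestamp), activity))
    # Sorting the (timestamp, activity) pairs orders each case chronologically.
    return {
        case_id: [activity for _, activity in sorted(entries)]
        for case_id, entries in cases.items()
    }


traces = group_into_traces(EVENTS)
```

Each resulting trace corresponds to one process instance, which is the unit that discovery techniques work on.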

Tools for BP Mining

BP mining is performed with various software tools; about fifteen commercial process mining tools are available. This research uses Disco because it is free for academic use (with academic credentials). In addition, it discovers a process model automatically as soon as the event log is imported, so that attributes can be filtered and examined side by side with the resulting fuzzy model. Furthermore, its filtering mechanism is more transparent and more clearly presented than that of the other tools. Moreover, it can import logs of up to five million events. It runs as a standalone desktop application and is easy to use. It can animate processes, which makes bottlenecks in the process flow easy to discover, and the animation can be saved as a video. The proposed approach uses Disco to mine BPs because it can handle a large number of cases [12]. Disco takes as input an event log in CSV, MXML, XLS, XES, or FXL format and produces an as-is BP model as output. In addition, it reports the frequency of each activity and of each edge in the BP model. To be imported, an event log must meet the minimum requirements for analysis: a case ID, an activity, and a timestamp.

Business Process Reengineering

Business Process Reengineering (BPR) was defined in 1997 by Blyth as “the fundamental re-thinking and radical redesign of BP to achieve dramatic improvement in critical, contemporary measures of performance, such as cost, quality, service, and speed” [13]. The goal of BPR is to improve BPs radically. It requires organizational restructuring supported by simplification, standardization, and IT. Hammer [14] defined seven principles for BPR.

Service Oriented Architecture

SOA is defined as “an architectural paradigm for developing and integrating heterogeneous ISs with strict message-driven communication paradigm” [15]. It provides a set of guidelines, principles, and techniques in which BPs, information, and enterprise assets can be effectively re-organized and re-deployed to support and enable strategic plans and productivity levels that are required by competitive business environments [16].

The major aim of SOA is to improve the flexibility, interoperability, and abstraction level of software components. Loose coupling and discoverability are two key principles of SOA [17].

3 Related Work

Several service identification methods (SIMs) have been proposed. These methods use different inputs, strategies, and techniques, and produce different types of output. The artifacts used as input include BPs, application domains, legacy systems, data, features, use cases, or a mix of these. The methods generally produce one of six types of services: business process service, data service, composite service, IT service, web service, and partner service.

In this research, we are interested in the SIMs that use a BP as the input artifact. Wang et al. [18] proposed a method to identify normalized services from the BP model. This method has three stages: (1) identify normalized services from BPs and design the containment relationship between services and business activities, (2) design service ports in the business aspect, and (3) design, in the technical aspect, the component set that constitutes a service and the message mapping between service ports and component interfaces. It is based on a BP decomposition strategy and an algorithm technique. However, it has not been validated by a case study. Inaganti and Behara [19] concentrated on identifying enterprise-level services and establishing the choreography of these services through the BP. They use a BP decomposition strategy and guideline techniques to identify services. Their method is restricted to the service identification phase. Although the method is clear, it does not introduce measures for identifying business activities. Besides, it is not validated by a practical case study and does not propose standard modeling notations for process modeling and SOA.

Amsden [20] focused on describing the service elements that should be captured in the service specification phase. The method uses a BP decomposition strategy and an analysis technique to determine services. It has been validated by a real example. Mani et al. [21] presented a method that focuses on the design specification of the user interface. It captures a unified user interface design specification from the business process and the information model. It uses BP decomposition and user interface strategies to identify services. An Amazon shopping cart case study was used to validate this method. Jamshidi et al. [22] proposed a method that consists of four steps: (1) modeling the BP, (2) identifying service model elements, (3) categorizing services, and (4) documenting services. It uses BP decomposition and business entity strategies and an algorithm technique to determine services. This method is evaluated based on its usage, user analysis, and its strengths relative to existing methods. Dwivedi and Kulkarni [23] introduced a method that uses heuristics for identifying service candidates along with model-driven development. It takes a BP model (in UML) as input and uses a BP decomposition strategy and an algorithm technique to identify a list of services. It has been validated by a real example.

Bianchini et al. [24] presented a method that focuses on identifying services from a collaborative BP. It is divided into four phases: (1) semantic process annotation, (2) identification of candidate services, (3) evaluation of service cohesion and coupling, and (4) refinement of the process decomposition. It uses a BP decomposition strategy and an ontology technique to determine candidate services. It has been validated by a case study. Yousef et al. [25] proposed a method called BPAOntoSOA that uses an ontology technique and a BP decomposition strategy to identify services. It is based on BP understanding and analysis, taking into account functional and non-functional requirements. It uses a clustering algorithm to improve the correctness of mapping business functions to services in the resulting service-oriented model. It is validated by a case study in the healthcare sector. Azevedo et al. [26] introduced a method that applies heuristics to identify services from a semantic analysis of process elements and a syntactic analysis of process models. It consists of three stages: (1) selection of activities, (2) identification and classification of candidate services, and (3) consolidation of candidate services. Kim and Doh [27] proposed a method that uses the concept of graph clustering. It treats activities as the basis for service identification and clusters activities with high interaction to maximize the cohesion of local tasks. It uses a BP decomposition strategy and an algorithm technique to determine services. It is validated by a case study. Ren and Wang [28] proposed a method that uses a BP as input and produces a service model as output. It uses a clustering algorithm and is validated by a case study.

Nikravesh et al. [29] presented a method called 2PSIM. It uses a BP as input to generate a service model and identifies services by applying graph partitioning algorithms. Kazemi et al. [30] presented an automated technique that generates business services by using a BP decomposition strategy. Jamshidi et al. [31] introduced an automated method that follows the best practices and core principles of model-driven software development. It takes an enterprise business model as input and produces a service model. It determines services by implementing a meta-heuristic algorithm. It was evaluated on an enterprise-scale case study. Soltani and Benslimane [32] developed a method called Automatic Model-Driven Service Identification (AMSI) that uses a high-level BP model as input to specify service model artifacts. It identifies services by applying a multi-objective evolutionary algorithm. Birkmeier et al. [33] proposed a method to determine services from BP models. The main contributions of this approach are to show how the business architecture can drive the organization of the IS architecture and to confirm the alignment with business needs.

El Amine and Benslimane [34] presented an automated approach that generates business services by applying several design metrics based on a process decomposition strategy. It takes a BP as input and produces a list of business services by applying a multi-objective combinatorial particle swarm optimization algorithm. Bianchini et al. [35] introduced the Process-to-Service (P2S) approach, a computer-aided methodology for identifying the services that compose a collaborative BP. Mohamed et al. [36] proposed an automated method that identifies services from BPs. For clustering, it applies a hybrid particle swarm optimization algorithm and several design metrics to deliver reusable services with appropriate granularity and a satisfactory level of coupling and cohesion. Leopold et al. [37] proposed an automated service identification technique that derives a ranked list of service candidates from BP models. Specifically, it creates a list of atomic services, composite services, and inheritance hierarchy services.

All of the above-mentioned SIMs use a BP model as input, since it plays an important part in linking the BP design phase and the system implementation phase. However, in most cases, the BP model cannot capture all essential tasks and interrelated control flows at design time. Moreover, the BP model is not sufficiently adaptive to respond at runtime to dynamic environments [38].

4 The Proposed Approach

This section presents the proposed approach along with the research methodology used. The research methodology mainly consists of two steps: (1) studying the BP mining approaches, and (2) selecting a suitable SIM. The proposed approach is based on the service identification phase of service-oriented development.

4.1 The Research Methodology

This section details the steps of the research methodology used to design the proposed approach. As shown in Fig. 1, the methodology consists mainly of surveying the BP mining techniques and the existing SIMs to select a suitable SIM, then combining the BP mining techniques with the selected SIM to identify services from the running BPs.

Fig. 1 Steps and output of the research methodology

Step 1: Study the BP mining

BP mining focuses on the analysis of the BP by using an event log. It aims at discovering, monitoring, and improving the BPs by extracting knowledge from the event logs that are generated automatically in the IS [4]. It consists of three types of techniques: discovery, conformance checking, and enhancement.

However, the proposed approach is limited to the discovery technique. A discovery technique uses an event log as input and produces a BP model without using any prior information. Figure 2 depicts the BP mining discovery process. It starts from the event log, then extracts each case (process instance), and finally combines the cases to produce the BP model.

Fig. 2 The process of BP mining discovery technique
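The discovery step can be illustrated with a minimal Python sketch. It assumes that the event log has already been grouped into traces (one ordered list of activity names per case) and only counts how often each activity occurs and how often one activity directly follows another. This is a simplified directly-follows view chosen for illustration, not the algorithm implemented by any particular mining tool; the function name and the sample traces are hypothetical.

```python
from collections import Counter


def discover_directly_follows(traces):
    """Count activity frequencies and directly-follows edge frequencies.

    `traces` is an iterable of activity-name lists, one list per case.
    """
    activity_freq = Counter()
    edge_freq = Counter()
    for trace in traces:
        activity_freq.update(trace)
        # Pair each activity with its immediate successor in the same case.
        edge_freq.update(zip(trace, trace[1:]))
    return activity_freq, edge_freq


# Hypothetical traces extracted from an event log.
SAMPLE_TRACES = [
    ["create fine", "send fine", "payment"],
    ["create fine", "payment"],
    ["create fine", "send fine", "payment"],
]

activity_freq, edge_freq = discover_directly_follows(SAMPLE_TRACES)
```

The two counters together describe a frequency-annotated graph: activities are the nodes and directly-follows pairs are the edges.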

Step 2: Selecting a suitable SIM

This step consists mainly of searching for existing SIMs that use a BP as input and comparing them in order to select a suitable one, as shown in Fig. 3.

Fig. 3 Methodology of selected SIMs

SIMs are divided into three scenarios: top-down, bottom-up, and hybrid. This research focuses on the top-down scenario and starts from the input-output matrix introduced by Gu and Lago [39] to select SIMs. According to that matrix, there are about thirty SIMs based on the top-down scenario. They use various input artifacts such as BP, application domain, legacy system, data, feature model, use case, or a mix of them. These SIMs use strategies such as BP decomposition, business functions, business entity, ownership and responsibility, goal driven, component based, existing supply, front office application usage analysis, infrastructure, non-functional requirement, and user interface. They are based on techniques such as algorithm, guideline, analysis, ontology, pattern, and information manipulation. Finally, these methods produce output in various formats such as informal service specification, service model, formal service specification, service implementation, or a list of services.

This research focuses only on the SIMs that use a BP as input. Table 1 compares the SIMs that use a BP as input in terms of authors, publication year, type of input, strategy used to understand the input, type of output, technique, and validation.

Table 1 SIMs that use a BP as input to get a service as output

This research compares two SIMs that use a BP as input and produce services as output. It uses the criteria introduced by Klose et al. [40] and Kim and Doh [27], as shown in Table 2. These criteria are: scenario, strategy, input, input format, service classification scheme, coverage of SOA design phases, characteristics, and application of process models for service identification.

Table 2 Comparison between Mani et al. method and Kim and Doh method

Both methods follow the top-down scenario, which starts from high-level artifacts as input to identify services. While both start with a BP as input, the input format differs: the Mani et al. method starts with the BP model and a data model, whereas the Kim and Doh method starts with the BP model only. In addition, both methods produce as output a service implementation written in Web Services Description Language (WSDL). The other comparison criteria are approximately similar. Thus, the proposed approach is based on the Kim and Doh method, because a BP mining software tool produces only a BP model.

The SIM introduced by Kim and Doh in 2009 has three activities: clustering of activities into tasks, organizing tasks into initial set of services, and describing the services.

4.2 The Proposed Approach Described

The proposed approach uses as input an event log and produces a list of services as output.

First, a BP mining discovery technique extracts the BP model from the event log of the running IS. Next, the obtained BP model is used as input to the clustering activity, in which each cluster is mapped to a task. These tasks are then organized into an initial set of services by allocating the activities to the corresponding task and defining operations for each task. Then, each task becomes a service and is inserted into its corresponding list. Finally, the resulting services are verified for compliance with SOA principles.

The approach consists of seven activities: collect an event log of the BP, convert the event log file into a format that a BP mining software tool understands, use a BP mining discovery tool to produce a BP model, cluster the activities of the BP model into tasks, organize the tasks into an initial set of services, list the services, and check the compliance of the resulting services with SOA principles. These activities are performed sequentially, as shown in Fig. 4.

Fig. 4 The proposed approach illustrated

Step 1. Collecting an event log of the BP

An event log contains information about events, each of which refers to an activity and a case of the running BP. Each event carries an activity name and is related to a particular case (case identifier). In addition, an event log may hold further information about events, such as the resource executing or initiating the activity, the timestamp of the event, and data elements recorded with the event [41].

Event logs can be automatically generated from different business systems such as ERP systems, CRM or Workflow Management systems. The two common formats of event logs are Mining eXtensible Markup Language (MXML) and eXtensible Event Stream (XES) [42].

Since most legacy systems of large enterprises are not documented or explicitly modeled, these enterprises have very limited information about what is actually happening in their organization. In practice, there is often a significant gap between what is prescribed or supposed to happen and what actually happens. The event log contains valuable embedded knowledge about the actual behavior. It is therefore used as the starting point of this approach, i.e., as the input artifact for BP mining.

Step 2. Converting an event log file into a suitable format that is understood by a BP mining software tool

Each software tool uses different formats of event logs. For instance, CSV, XLS, MXML, XES, and FXL formats are understood by Disco.
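As an illustration of this conversion step, the following Python sketch writes event records to CSV text and keeps only the three fields that are minimally required for analysis (case ID, activity, and timestamp). The column names, the record layout, and the function name are assumptions made for this sketch; an actual export would use whatever field names the source system provides.

```python
import csv
import io

# Assumed column names for the three minimally required fields.
REQUIRED_COLUMNS = ["case_id", "activity", "timestamp"]


def to_minimal_csv(records):
    """Return CSV text holding only the required columns of each record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=REQUIRED_COLUMNS,
        extrasaction="ignore",  # silently drop any additional attributes
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        missing = [name for name in REQUIRED_COLUMNS if name not in record]
        if missing:
            raise ValueError(f"record is missing columns: {missing}")
        writer.writerow(record)
    return buffer.getvalue()


# Hypothetical record with one extra attribute that is not needed.
csv_text = to_minimal_csv([
    {"case_id": "C1", "activity": "create fine",
     "timestamp": "2024-01-01T09:00:00", "resource": "R7"},
])
```

Rejecting records that lack a required field early avoids importing a log that the mining tool cannot analyze.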

Step 3. Using the BP mining discovery technique tool to produce a BP model

In this approach, we use Disco as the BP mining discovery technique software tool. The following steps depict how to use it:

Step 4. Clustering the activities into tasks

The BP model generated by Disco, together with the frequency of each activity and of each edge in the model, is the starting input of this step. Each activity in the BP model is first mapped to a task. Tasks connected by a high-frequency edge are then clustered into a new task by a hierarchical clustering algorithm, in order to minimize the coupling between tasks and to maximize the cohesion within tasks. The following rules describe how to cluster tasks and how to calculate the frequency of new tasks:

  • Rule 1: When two adjacent tasks are merged, one of their incident edges having the highest frequency is selected first.

  • Rule 2: Once an edge is selected, the edge cannot be selected again in each repetition.

  • Rule 3: When tasks are merged, a new task is generated and its frequency is calculated.

  • Rule 4: When two adjacent tasks are combined into a new task, the frequency of the new task is the maximum of the frequencies of the two tasks.

Rule 1 and Rule 2 determine which tasks are merged. The model is then reconstructed by adjusting edges based on Rule 3, and the frequencies of new tasks are assigned based on Rule 4.

The edges linked to the two merged tasks are reconnected to the new task. Tasks are then repeatedly combined in the same manner.

The clustering terminates based on the preferred size of services and the reasonable number of services desired for the business domain, since the number of services directly affects performance and network overhead.
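The four rules and the termination criterion can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the exact algorithm of the underlying method: tasks are represented as sets of activities, the target number of tasks stands in for the termination criterion, and when a merge makes two edges parallel their frequencies are summed (the rules above do not say how to combine them). The function name and the sample frequencies are hypothetical.

```python
def cluster_tasks(activity_freq, edge_freq, target_count):
    """Merge tasks along the highest-frequency edge until target_count remain."""
    # Each activity starts as its own task (a frozenset of activity names).
    task_freq = {frozenset([a]): f for a, f in activity_freq.items()}
    edges = {
        (frozenset([u]), frozenset([v])): f
        for (u, v), f in edge_freq.items()
        if u != v  # self-loops never connect two different tasks
    }
    while len(task_freq) > target_count and edges:
        # Rule 1: select the edge with the highest frequency.
        (u, v), _ = max(edges.items(), key=lambda item: item[1])
        merged = u | v
        # Rules 3 and 4: create the new task with the maximum frequency.
        task_freq[merged] = max(task_freq.pop(u), task_freq.pop(v))
        rewired = {}
        for (src, dst), freq in edges.items():
            src = merged if src in (u, v) else src
            dst = merged if dst in (u, v) else dst
            if src == dst:
                continue  # Rule 2: edges inside the merged task are consumed
            # Assumption: parallel edges created by the merge are summed.
            rewired[(src, dst)] = rewired.get((src, dst), 0) + freq
        edges = rewired
    return task_freq, edges


# Hypothetical activity and edge frequencies.
tasks, remaining_edges = cluster_tasks(
    {"a": 10, "b": 8, "c": 5, "d": 3},
    {("a", "b"): 7, ("b", "c"): 4, ("c", "d"): 2, ("a", "c"): 1},
    target_count=2,
)
```

With these sample values, the edge between a and b is merged first, and the combined task is then merged with c, which leaves two tasks connected by a single edge.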

Step 5. Organizing tasks into an initial set of services

The objective of this step is to organize tasks into services, where the cohesion within the service is maximized and the coupling between services is minimized. Service specification is derived by organizing identified tasks into services. Graph clustering provides a good way for grouping the vertices in the graph according to their connections. A structure of tasks is generated and activities in the task are suggested to be included in a service. This method provides a potential design with different levels of inter-service coupling, where activities are organized into services and smaller services are successively integrated into bigger ones.
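One simple way to compare candidate groupings is to sum the edge frequencies that stay inside services (a proxy for cohesion) and those that cross service boundaries (a proxy for coupling). The Python sketch below does this; the two measures, the function name, and the sample data are assumptions made for illustration and are not metrics defined by the method described above.

```python
def cohesion_and_coupling(services, edge_freq):
    """Return (internal, external) edge-frequency totals for a grouping.

    `services` maps a service name to the set of activities it contains;
    `edge_freq` maps (source activity, target activity) to a frequency.
    """
    owner = {
        activity: name
        for name, activities in services.items()
        for activity in activities
    }
    internal = 0  # interactions kept inside one service
    external = 0  # interactions that cross a service boundary
    for (src, dst), freq in edge_freq.items():
        if owner[src] == owner[dst]:
            internal += freq
        else:
            external += freq
    return internal, external


# Hypothetical grouping of four activities into two services.
internal, external = cohesion_and_coupling(
    {"S1": {"a", "b", "c"}, "S2": {"d"}},
    {("a", "b"): 7, ("b", "c"): 4, ("c", "d"): 2, ("a", "c"): 1},
)
```

A grouping with a high internal total and a low external total keeps frequently interacting activities together, which is the stated objective of this step.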

Step 6. Listing of services

For each identified service, a set of attributes is defined to describe and document its capabilities. This rich description can be passed to development teams, who can use it to help select appropriate implementation technologies, hosts, and deployment topologies. The list of services is easily derived from the tasks and activities provided by the BP model.
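A service list of this kind could be recorded as in the following Python sketch. The attribute set (name, operations, frequency), the naming scheme, and the ordering by frequency are assumptions made for illustration; the attributes actually documented for each service would be chosen by the analyst.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceCandidate:
    """Hypothetical description record for one identified service."""
    name: str
    operations: list = field(default_factory=list)  # activities of the task
    frequency: int = 0  # frequency of the task the service is derived from


def list_services(task_freq):
    """Turn each task (a set of activities) into a service description."""
    ordered = sorted(task_freq.items(), key=lambda item: -item[1])
    return [
        ServiceCandidate(f"Service{index}", sorted(activities), freq)
        for index, (activities, freq) in enumerate(ordered, start=1)
    ]


# Hypothetical tasks with their frequencies.
services = list_services({frozenset({"a", "b"}): 10, frozenset({"c"}): 5})
```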

Step 7. Verifying service candidates against the service orientation principles

The identified services need to be evaluated against service orientation principles during the service identification phase.

5 Case Study

5.1 Presentation

Road Traffic Fine Management arose from the need for a professional interface between the motoring public and the municipalities that issue traffic fines. The IS of an Italian local police force handles the Road Traffic Fine Management process.

The Road Traffic Fine Management process has eleven activities: create fine, send fine, insert fine notification, insert date appeal to the prefecture, appeal to judge, add a penalty, send for credit collection, send the appeal to the prefecture, receive appeal result from the prefecture, notify result appeal to an offender, and payment.

This research uses as a case study the event log recorded by the IS that manages the Road Traffic Fine Management process. The event log contains 150,370 cases and 561,470 events.

The Road Traffic Fine Management process requires implementation in SOA to be more efficient, effective, and flexible.

5.2 Application

The event log was collected from the internet as a real event log of the road traffic fine management process. It consists of 150,370 cases. It contains all the information about the process, for instance case ID, activity, complete timestamp, resource, total payment, notification type, expense, and vehicle class. However, only the case ID, activity, and timestamp fields are needed (Fig. 5).

Fig. 5 A road traffic fine management process model

The event log has been converted to CSV (Comma-Separated Values) format so that the mining tool can accept it.

In this step, Disco takes as input an event log in one of several formats, such as CSV, XLS, MXML, XES, or FXL, and produces a BP model as output. Moreover, it presents the frequency of each activity and edge in the model. The Road Traffic Fine Management process event log is in CSV format.

First, all activities are converted into tasks. Second, the edge with the highest frequency between two tasks is selected. A new task is generated by merging those two tasks, and its frequency is the maximum of the frequencies of the two merged tasks. The connections to the new task are then updated.

The generated tasks are mapped into an initial set of services, as shown in Table 3.

Table 3 Set of services, tasks and activities

6 Conclusion

6.1 Summary

The research aimed at mining BPs in order to reengineer them in SOA. It concentrated on service identification based on running BPs in order to improve their flexibility with SOA.

The literature review has shown that the existing SIMs do not use event logs of the running BPs as input, though they contain valuable information of the running BPs.

This research has proposed a new top-down approach to identify service candidates by using the running BPs. The proposed approach uses as input an event log of a running BP to produce a list of services. It is a step-wise process that consists of: (1) collecting the event log of a running BP, (2) converting the event log file into a suitable format that is understood by a BP mining software tool, (3) using the BP mining discovery technique tool to produce a BP model, (4) clustering the activities into tasks, (5) organizing tasks into an initial set of services, (6) listing the services, and (7) verifying the resulting services against service orientation principles.

The proposed approach has been applied to a real BP, where the event log for Road Traffic Fine Management process was used as input to generate a set of services that would support the process in an SOA.

This research has theoretical and practical impacts. From a theoretical perspective, the research contributes to service technology by introducing a new service identification approach. From a practical perspective, the proposed approach can assist enterprises in mining their BPs in order to model them as services in line with SOA and make them more flexible.

This approach still needs additional improvement, as it does not categorize the resulting services into various types, such as business services, IT services, data services, or composite services, based on the nature of the logic these services encapsulate and the manner the services are usually used within SOA.

6.2 Future Work

The proposed approach can be further:

  • Refined in order to identify high quality services.

  • Extended to cover all phases of SOA software development.

  • Automated by using a tool in order to perform the service identification process faster and easier.