1 Introduction

One of the greatest challenges in using health data is the asymmetry of information concerning data generation from several actors in health care—such as payers, patients, physicians and other service providers (Maissenhaelter et al. 2018), which greatly hinders the integration of clinical data and, consequently, the assessment of health care quality.

Also associated with these difficulties is the large amount of data generated by health information systems. These data are collected from external environments, often fed in real time, and made available in different formats (texts, videos, images, messages, gene expressions, etc.), and are commonly referred to as “Big Data”. Understanding and handling the four elements of such information (volume, variety, velocity and veracity; Bellazzi 2014) becomes crucial in the Health field.

One way to minimize these challenges is knowledge discovery supported by information technology tools: data structures arranged for greater availability and agility in the search for information, such as the proposed Data Warehouse (DW) environment, methodologies like Knowledge Discovery in Databases (KDD), and data mining itself (Dallagassa et al. 2019).

However, the specificity of health data justifies adopting a computational environment with analytical systems that address the ordering and timing of health events, in order to discover patterns and to allow the management, storage and exploitation of such knowledge (Carvalho et al. 2016).

This demand reinforces the use of Process Mining (PM), a concept proposed by Van der Aalst. PM applies data mining and machine learning methods to pattern recognition, building models that represent the process flow identified by the sequence of events, their timing, and the resources used (Mans et al. 2015).

The PM approach to health care allows for the discovery of process models and, hence, the identification of sequential, temporal and resource patterns. It also supports performance evaluation through the duration of activities and bottlenecks, so that processes can be improved, and the analysis of decision points in the processes to generate predictions (Van der Aalst et al. 2002).
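As a minimal sketch of the performance perspective described above, the following Python fragment computes the mean elapsed time between consecutive activities and reports the slowest transition as a candidate bottleneck. The event log, its field layout and all values are hypothetical and serve only as an illustration; real PM tools use richer log formats and metrics.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (case_id, activity, timestamp, resource).
# All values are invented for illustration only.
EVENT_LOG = [
    ("p1", "Triage", "2019-01-01 08:00", "nurse"),
    ("p1", "Consultation", "2019-01-01 09:00", "physician"),
    ("p1", "Lab test", "2019-01-01 12:00", "lab"),
    ("p2", "Triage", "2019-01-02 08:00", "nurse"),
    ("p2", "Consultation", "2019-01-02 08:30", "physician"),
    ("p2", "Lab test", "2019-01-02 13:30", "lab"),
]


def mean_transition_hours(log):
    """Mean elapsed hours between consecutive activities of the same case."""
    by_case = defaultdict(list)
    for case_id, activity, ts, _resource in log:
        by_case[case_id].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), activity))
    elapsed = defaultdict(list)
    for events in by_case.values():
        events.sort()  # chronological order within each case
        for (t1, a1), (t2, a2) in zip(events, events[1:]):
            elapsed[(a1, a2)].append((t2 - t1).total_seconds() / 3600)
    return {pair: sum(hours) / len(hours) for pair, hours in elapsed.items()}


MEANS = mean_transition_hours(EVENT_LOG)
# The transition with the largest mean waiting time is a candidate bottleneck.
BOTTLENECK = max(MEANS, key=MEANS.get)
```

In this toy log the transition from consultation to lab test has the longest mean waiting time, so it would be the first place to look for a delay.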

Beyond process model discovery, PM concepts allow comparative analyses between the discovered models and reference ones, such as medical protocols or clinical guidelines. Fig. 1 gives an example: the left side shows the process model discovered from the care usage data of a patient, a hypertensive patient in this case, and the right side shows the reference model that represents the desirable journey for patients with this disease, i.e., a clinical guideline or care protocol for its management.

Fig. 1 Model of process mining application in health. Source: own author

This systematic review, carried out with quantitative and qualitative approaches, mainly aims to identify PM applications in health care based on articles in the scientific literature. The evolution and use of tools and algorithms, together with their main contributions and limitations, are presented.

2 Method

2.1 Research strategy

This study aims to observe, in the recent literature, possible PM applications geared towards health care, their characteristics and limitations, to conceptualize and propose applications using the PM technique.

To achieve this goal, the review method followed the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Moher et al. 2009), which consists of a checklist and a flow diagram covering the items considered essential for the development of a systematic review.

2.2 Selection of studies and data extraction

The following databases were used to select the studies: ACM Digital Library, IEEEXplore, PUBMED, ScienceDirect, and SpringerLink, using an extensive search term, as follows: “process mining” OR “processes mining” OR “workflow mining” OR “mining workflow” OR “workflows mining” OR “mining workflows”.
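The search expression above can be assembled programmatically, which helps keep it identical across databases. The sketch below only builds the string; the function name is hypothetical, and each database may require its own field qualifiers or syntax, which are not covered here.

```python
# Search terms exactly as listed in the text.
TERMS = [
    "process mining",
    "processes mining",
    "workflow mining",
    "mining workflow",
    "workflows mining",
    "mining workflows",
]


def build_query(terms):
    """Join quoted phrases with OR into a single boolean search expression."""
    return " OR ".join(f'"{term}"' for term in terms)


QUERY = build_query(TERMS)
```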

The time frame comprises articles published from 2002 to December 2019, the final period of this study. The year 2002 is an important reference for PM because of the creation of the Alpha (α-miner) algorithm (Van der Aalst et al. 2002), which, besides being the first to arise in the field, also made the subject of PM more relevant.

After the reading of titles and abstracts, the following areas were identified: Health Care, Information and Communication Technology (ICT), Industry, Education, Finances, Logistics, Public/Government, Security, Call Center, Usability, Entertainment, Robotics, Utility, Garment, Audit, Commerce, Biology, Hospitality and Agriculture.

2.3 Criteria for study selection and scope

Two inclusion criteria (IC) and two exclusion criteria (EC) were used in the search, namely:

  • IC1: Articles in electronic format, found using the search terms and periods mentioned above;

  • IC2: Articles published from 2002 to December 2019;

  • EC3: Articles with title, abstract and keywords without relevance to the subject studied;

  • EC4: Elimination of duplicate articles.
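The criteria above can be read as a simple filtering pipeline. The following sketch assumes that every input record already satisfies IC1 (it was retrieved electronically with the search terms) and that a reviewer has set a relevance flag after reading title, abstract and keywords. The record type, field names and the title-based duplicate key are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    year: int
    relevant: bool  # hypothetical flag set by a reviewer (EC3)


def screen(records, first_year=2002, last_year=2019):
    """Apply IC2 (publication window), EC3 (relevance) and EC4 (duplicates)."""
    seen = set()
    kept = []
    for record in records:
        if not (first_year <= record.year <= last_year):  # IC2
            continue
        if not record.relevant:  # EC3
            continue
        key = record.title.strip().lower()  # EC4: naive duplicate key
        if key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept


# Invented records for illustration only.
SAMPLE = [
    Record("Process mining in hospitals", 2012, True),
    Record("process mining in hospitals ", 2012, True),   # duplicate
    Record("Workflow mining for factories", 2001, True),  # outside the window
    Record("Unrelated topic", 2015, False),               # not relevant
]
KEPT = screen(SAMPLE)
```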

Finally, only health care studies were chosen for the study selection. To explore the selected studies, the following research questions (RQ) were used for mapping:

  • RQ1: In what health environment has Process Mining been applied?

  • RQ2: What was the main application of process mining in the work: Analysis and Resource Assessment (AR), Discovery of Process Models (DM), Predictive Analysis and Outliers (PA), or Analysis and Conformity Check (AC)?

  • RQ3: What was the strategy/algorithm adopted?

  • RQ4: Among the studies, what was the main contribution of the application in the Health field?

2.4 Study analysis and evaluation

Guided by the research questions, the studies were ordered chronologically to capture the historical evolution of the subject and to characterize its use in health care, pointing out important milestones in the timeline of process mining applied to health care.

Given the proposal and focus of this study, the following analysis is limited to articles related to Process Mining in health care that contribute to the timeline of scientific contributions in this segment. The analysis focuses mainly on algorithms, positive contributions, and limitations in the development of solutions for health care.

3 Results

3.1 Selection of studies and data extraction

The search in the target databases resulted in 5476 articles. Table 1 lists the number of articles retrieved from each database.

Table 1 Database, number of articles retrieved.

After applying the inclusion and exclusion criteria mentioned in the methodology, 1949 articles were left for the review. As this study’s object is connected only to health care, only articles published within this field of study were selected, totaling 270 articles.
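The proportions implied by these counts can be checked with a few lines of arithmetic. The counts are taken from the text; the percentages are derived here and rounded to one decimal place.

```python
retrieved = 5476       # articles returned by the database search
after_criteria = 1949  # left after the inclusion and exclusion criteria
health_care = 270      # restricted to the health care field


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


kept_after_criteria = share(after_criteria, retrieved)  # about 35.6 %
health_of_screened = share(health_care, after_criteria)  # about 13.9 %
health_of_retrieved = share(health_care, retrieved)      # about 4.9 %
```

In other words, roughly one in seven screened articles, and about one in twenty retrieved articles, concerned health care.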

Figure 2 shows the article selection process, detailing the application of the criteria step by step and the number of articles resulting from each step.

Fig. 2 Article selection process. Source: author

3.2 Historical evolution of PM utilization in health care

This study showed that process models in health care differ from those in other fields in their characteristics: high variability, complexity, security, privacy and the multidisciplinary nature of their activities (Van der Aalst et al. 2002; Rebuge and Ferreira 2012; Munoz-Gama and Echizen 2012).

Chronologically, the first article found on the application of Process Mining in the health care field, published in 2003 (Ciccarese et al. 2003), shows the importance of resource analysis through process model discovery and of comparing such models with clinical guidelines for compliance analysis. The authors used the alpha algorithm (Van der Aalst et al. 2002).

However, the authors emphasize the difficulties related to the representation of the discovered models. Such difficulties were inherited, to a certain extent, from algorithms that were designed for the industrial segment and did not fit the models of the health care field.

From 2003 to 2008, only nine articles were found in the health care field, using the alpha algorithm (Van der Aalst et al. 2002) and Heuristic Miner (Weijters et al. 2006). Regarding PM usage, five of these studies were focused on the discovery of process models, aiming to verify the compliance with clinical guidelines to improve processes. Some difficulties were noted in these articles regarding the discovery of complex and high variability models, also known as spaghetti models.
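To give an idea of what the alpha algorithm works with, the sketch below computes its first step only: the directly-follows pairs of an event log and the resulting ordering relations (causality, parallelism, no relation). The traces are hypothetical, and the later steps of the algorithm, which build a Petri net from these relations, are not shown.

```python
from itertools import product

# Hypothetical traces: each is the ordered list of activities of one case.
TRACES = [
    ["Admission", "Exam", "Medication", "Discharge"],
    ["Admission", "Medication", "Exam", "Discharge"],
]


def directly_follows(traces):
    """Set of pairs (a, b) where b immediately follows a in some trace."""
    return {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}


def footprint(traces):
    """Ordering relation for every pair of activities: '->', '<-', '||' or '#'."""
    df = directly_follows(traces)
    activities = sorted({a for trace in traces for a in trace})
    relations = {}
    for a, b in product(activities, repeat=2):
        ab, ba = (a, b) in df, (b, a) in df
        if ab and ba:
            relations[(a, b)] = "||"  # observed in both orders: parallel
        elif ab:
            relations[(a, b)] = "->"  # causality
        elif ba:
            relations[(a, b)] = "<-"  # reverse causality
        else:
            relations[(a, b)] = "#"   # never directly follow each other
    return relations


FP = footprint(TRACES)
```

With highly variable logs, many activity pairs end up related in both directions, which is one intuition for why the discovered models become spaghetti-like.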

The Fuzzy Miner algorithm, proposed in 2007 (Günther and Van der Aalst 2007), is an important alternative for dealing with the characteristics inherent to the discovery of process models in the health care field. Its strategy allows for a more objective visualization by abstracting from reality and representing the most frequent activities together with the temporal measures between them (Günther and Van der Aalst 2007).
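The idea of abstraction by frequency can be illustrated with a much simpler stand-in than Fuzzy Miner itself, which relies on its own significance and correlation metrics. The sketch below counts directly-follows transitions and keeps only those above a relative frequency cutoff; the cases, function names and threshold are hypothetical.

```python
from collections import Counter

# Hypothetical cases: three follow the main path, one skips the consultation.
CASES = [
    ["Triage", "Consultation", "Discharge"],
    ["Triage", "Consultation", "Discharge"],
    ["Triage", "Consultation", "Discharge"],
    ["Triage", "Discharge"],
]


def transition_counts(cases):
    """Frequency of each directly-follows transition in the log."""
    return Counter((a, b) for case in cases for a, b in zip(case, case[1:]))


def filter_transitions(counts, min_share):
    """Keep transitions at least `min_share` as frequent as the most frequent one."""
    if not counts:
        return {}
    top = max(counts.values())
    return {edge: n for edge, n in counts.items() if n / top >= min_share}


COUNTS = transition_counts(CASES)
SIMPLIFIED = filter_transitions(COUNTS, min_share=0.5)
```

Raising the cutoff yields a simpler picture at the cost of hiding rare paths, which is the trade-off any such abstraction has to manage.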

From 2008 to 2014, pre-processing techniques emerged that establish ways of controlling path variability and grouping activities to represent and discover complex models, such as the MinAdept algorithm (Li et al. 2010) and other techniques (Song et al. 2013; Fei and Meskens 2013; Caron et al. 2013; Montani et al. 2014, 2017; Metsker et al. 2017; Xu et al. 2016a, b; Lu et al. 2016; Rojas et al. 2017).

The high variability of health events is a characteristic evidenced in several studies of the field and is directly linked to how the patient is cared for. It is difficult to predict a patient’s care path, since it depends on numerous factors, such as biological interactions, the pathology and the kind of treatment performed, among others (Li et al. 2010; Prodel et al. 2018). In addition to this lack of predictability, patients’ trajectories are stochastic, i.e., random in relation to their occurrence and difficult to plan (Li et al. 2010).
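One simple way to make this variability measurable is to count trace variants, i.e., distinct activity sequences. The sketch below uses hypothetical patient journeys; the ratio it computes is an illustrative indicator, not a metric taken from the reviewed studies.

```python
from collections import Counter

# Hypothetical patient journeys, one list of activities per patient.
JOURNEYS = [
    ["Consultation", "Exam", "Medication"],
    ["Consultation", "Exam", "Medication"],
    ["Consultation", "Medication"],
    ["Emergency", "Exam", "Hospitalization"],
]


def variants(journeys):
    """Count how many cases follow each distinct activity sequence."""
    return Counter(tuple(journey) for journey in journeys)


def variant_ratio(journeys):
    """Distinct variants divided by cases; 1.0 means every case is unique."""
    return len(variants(journeys)) / len(journeys)


VARIANTS = variants(JOURNEYS)
RATIO = variant_ratio(JOURNEYS)
```

A ratio close to 1.0 signals that almost every patient follows a unique path, the situation in which discovered models tend to become hard to read.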

In 2014, the possibilities of application were disseminated through the book Process Mining in Healthcare (Mans et al. 2015), and the Inductive visual Miner (IvM) algorithm (Leemans et al. 2014) was proposed and made available from ProM version 6 onwards (Van der Aalst 2011). It is one of the most recent and adequate techniques for treating the heterogeneity of event records in complex model abstractions, incorporating several criteria in the same solution: frequency analysis, grouping, detection of deviations and frauds, analysis of times and bottlenecks, general visualizations, understanding of the process, and evaluation of values per outcome (Leemans et al. 2014).

To facilitate the understanding of this evolution from 2001 to 2019, Fig. 3 shows the timeline, indicating major milestones and their influence on studies focused on health care.

Fig. 3 Evolution of process mining in healthcare. Source: author

3.3 Health care environment used

The approaches presented often deal with analyses geared towards a specific context, such as a pathology or a surgical procedure, which require time for the definition of parameters and expert knowledge of the Health field. Such approaches are also often tied to a single health institution: 77 (36%) of the works found were designed for hospital environments. As a result, one cannot fully visualize the processes along the trajectory of a patient’s care, which makes it difficult to analyze them from the perspective of health technology management and of attention to patient health and safety.

3.4 Main areas of application

We identified, in each article, the previously defined area of application: discovery of process models, analysis and evaluation of resources, conformity analysis, predictive analysis and outliers, and other uses aimed at data processing, systematic reviews and algorithms. Table 2 and Fig. 4 present these items with the respective number of articles found in this review.

Table 2 Database, number of articles retrieved.

Fig. 4 Applying process mining in healthcare. Source: author

One can see that the largest share of studies (34.1%) uses process model discovery for the identification of protocols and clinical guidelines. This was the most evident line of interest and, seemingly, the central object, since it provides prior recognition of process patterns through the representation of the model, with its activities and the directions between them.

The second line of application, in descending order (24.8%), is the analysis of resources, which is linked to the optimization of resources and to the evaluation of health technologies, using real data for the representation and analysis of teams.

Compliance analysis is also worth mentioning (17.4%). It focuses on evaluating discovered process models by comparing them with clinical guidelines and protocols, which allows for the validation and certification of process standardization. Conformity analysis is also applied to the patient care trajectory, or the patient’s journey, evaluating whether the patient is following the reference set by a particular clinical guideline for an illness.
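A deliberately simplified sketch of such a comparison follows. It represents the reference as a set of allowed transitions between activities and reports which steps of a patient’s journey deviate from it. The guideline, the journey and the score are hypothetical; established conformance techniques such as token replay or alignments are considerably more elaborate.

```python
# Hypothetical reference model: transitions allowed by an imagined guideline.
ALLOWED = {
    ("Diagnosis", "Medication"),
    ("Medication", "Follow-up"),
    ("Follow-up", "Medication"),
}


def deviations(trace, allowed):
    """Observed transitions that the reference model does not allow."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in allowed]


def conformity(trace, allowed):
    """Share of observed transitions allowed by the reference (1.0 = conformant)."""
    steps = len(trace) - 1
    if steps <= 0:
        return 1.0
    return 1 - len(deviations(trace, allowed)) / steps


# Invented journey with an unplanned emergency visit.
PATIENT = ["Diagnosis", "Medication", "Emergency visit", "Medication", "Follow-up"]
DEVIATIONS = deviations(PATIENT, ALLOWED)
SCORE = conformity(PATIENT, ALLOWED)
```

Here two of the four observed steps involve the emergency visit, which the imagined guideline does not foresee, so half of the journey conforms to the reference.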

There is potential in predictive analysis and in the detection and identification of outliers, which were present in only 6.3% of the analyzed studies and could be more widely explored, as they comprise techniques for pattern recognition related to references stored in the knowledge database.

3.5 Strategy/algorithm adopted

To group the algorithms, we identified the techniques used and the algorithms that had more than two occurrences. For hybrid modeling, all algorithms were counted.

Regarding the most commonly used algorithms (cf. Table 3 and Fig. 5), the Fuzzy Miner algorithm (Günther and Van der Aalst 2007) was used by 31.4% of the studies. This preference is justified by the ease with which it discovers health models, providing a better understanding and representation and reducing the complexity and difficulty of representing such models.

Table 3 Process mining algorithms used in healthcare.

Fig. 5 Process mining algorithms used in healthcare. Source: author

The algorithms Alpha (Van der Aalst et al. 2002), with 7.7%, and Heuristic Miner (Weijters et al. 2006), with 25.0% of usage, also show high adoption, reflecting their pioneering character and availability in the ProM framework (Van der Aalst 2011). However, health care models are difficult to represent, and the discovered process models often end up as a complex (spaghetti-type) view.

The Inductive Visual Miner algorithm (Leemans et al. 2014) had a low utilization rate (7.7%) because it was introduced more recently, in 2014. It has significant potential for use due to its robustness in the discovery of health care models, being capable of delivering perfect fitness and log precision. However, the significance filter and extension capabilities available in the Inductive Miner algorithm are more limited than those available in Fuzzy Miner, mainly when compared to commercial tools such as Disco (Fluxicon) and Celonis. On the other hand, a still little-explored differential is the Inductive Miner-infrequent (IMi) extension, which allows the detection of problematic behaviors with low occurrence.

3.6 Main contributions of PM application to the health care field

Table 4 and Fig. 6 present the studies and their classification according to their main contribution to the field of health care.

Table 4 Main contributions to the Health care field.

Fig. 6 Applications Process mining in health. Source: author

Among the suggested contributions, the main application of the PM approach is process model discovery (35.5%), aimed at identifying the most frequent activities for establishing or updating medical protocols and clinical guidelines that can be used to standardize health care trajectories. Second comes process compliance (17.4%), whose contribution lies in confronting discovered models with standards identified or previously established by specialists or specialized companies. Another important point is the near absence of efforts to integrate the systems that record patients’ events with PM tools. This greatly reduces the potential to exploit existing data on a large scale. This integration gap between such systems and tools represents significant research and development potential.

4 Discussion

4.1 Main findings and contributions

Based on the previous analyses, one may say the technological bases for discovering process models and verifying conformity, particularly regarding protocols and clinical guidelines, have already been established.

Nonetheless, there is still great potential to be explored concerning the systematized application of process mining in health care and a broader understanding by health care specialists. Such understanding starts from the fact that every activity generates events. These events are relevant to the disclosure of knowledge about processes related to clinical guidelines and protocols, making it easier to link local or international trends or good practices to the application of evidence-based medicine in health care management.

The health care segment often offers resistance to innovations, but society today demands a disruptive model of care for the sake of its sustainability. This, to a certain extent, should encourage the abandonment of conventional management and control solutions and the integration of technology to meet patients’ wishes for more quality of life and care, increasing the value offered to the chain of health service providers through the quality delivered to its customers.

4.2 Needs of the health care field

Due to changes in the demographic profile, the demand for health services is expected to increase (Kilsztajn et al. 2016), and care for the elderly and the chronically ill is becoming an essential strategy for all health stakeholders. This scenario can be explored with Process Mining techniques, among them the discovery of process models and their comparative analysis with protocols and clinical guidelines, pointing to care deviations and alert signs. Here, tools that allow long-term and systematic follow-up are highly relevant, given the frailties involved in the care of elderly people and the high costs of providing health services.

Regarding the evaluation of health technology, the PM proposal is appropriate. It integrates with new approaches that operate with real and online data, also called Real-World Data, used in analyses titled Real-World Evidence (Sherman et al. 2016; Dang and Vallish 2016; Guerra-Júnior et al. 2017). Faster and more usable analysis makes the health care process more cost-effective (Guerra-Júnior et al. 2017) and allows managers to obtain a more global vision of the health care process, creating space for the most cost-effective use of health resources and, especially, for reducing adverse events and life-threatening situations.

4.3 Opportunities and improvements for PM use

Contributions resulting from the application of PM in health care not only focus on the opportunity to improve health management, but also contribute to the improvement in the use and transformation of data from patient records and to guidance on possible best practices in the quality of health care services, thus reducing the difficulty in standardizing and systematizing the best practices of health services.

We also note the concern about the treatment of the large volume of data generated in the health care area and the importance of algorithms being able to treat situations that require the reduction of variability for the recognition of patterns in a dynamic way, acquiring rapid and systematic results with the least involvement of information technology specialists and professionals in matters related to pre- and post-processing tasks.

In the studies analyzed, the view of the patient’s trajectory is relatively short-sighted: it is restricted to the institution where the patient was treated. In other words, it is still difficult to find a process model that represents the patient’s entire journey through the various institutions involved in his or her care. A single electronic health record is believed to be the way to analyze the entire care trajectory of a patient. In addition to a single record, the assessment of outcomes should also be considered, by adding questionnaires that function as the patient’s self-assessment of the outcome.
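The following sketch illustrates what a cross-institution journey would look like once the data are available in one place. It assumes that all institutions share the same patient identifier and record ISO-formatted timestamps, which is precisely the integration that is usually missing in practice; the logs and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical logs from two institutions: (patient_id, ISO timestamp, activity).
HOSPITAL_LOG = [("p1", "2019-03-02T10:00", "Surgery")]
CLINIC_LOG = [
    ("p1", "2019-03-01T09:00", "Consultation"),
    ("p1", "2019-03-10T09:00", "Follow-up"),
]


def patient_journeys(*logs):
    """Merge several logs into one chronological activity list per patient."""
    events = defaultdict(list)
    for log in logs:
        for patient_id, timestamp, activity in log:
            events[patient_id].append((timestamp, activity))
    # ISO timestamps sort chronologically when compared as strings.
    return {
        patient_id: [activity for _ts, activity in sorted(items)]
        for patient_id, items in events.items()
    }


MERGED = patient_journeys(HOSPITAL_LOG, CLINIC_LOG)
```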

The possibility of having an integrated tool with records and patient use information for the process-based assessment will allow for the indication of alert situations regarding care quality for the patient’s health and safety, in addition to potentiating the discovery of new standards, regardless of what the manager already knows. This shall generate new knowledge that is likely useful to specialists in a systematic and real-time manner.

4.4 Research conclusions

According to the chronology of process mining application to health care, there was a great impulse after 2007, with the emergence of algorithms that can handle the complexity related to the treatment of process variability in the health care field. The representation of comprehensive process models, made possible by the algorithms after this period, led to several studies being carried out and published with proposals for various solutions in health care.

Process Mining presents evidence to support process improvements, also supporting the shift to new models that favor the evaluation of health technologies and the quality of care associated with good management of patients with chronic diseases, dealing with patient eligibility, health prevention and promotion, care monitoring and assessments, and best outcomes in Health.

In addition to acting directly on the improvement of such care, an evolution and interest in the development of PM techniques for the treatment of large data volumes can be noted, as well as for issues related to the fragmentation of data and information, thus enabling the visualization of the patients’ entire trajectory, improving the quality of health information.

However, there are still some points to improve before the Process Mining technique can be fully used in the Health field. Among them are the lack of information integration across the various health systems, the most appropriate treatment for the high variability of a patient’s event trajectory, the absence or difficulty of collecting clinical information, and processes that can automatically learn to recognize patterns. Such patterns can be stored in knowledge databases (repositories that store and maintain them) and then used for automatic compliance analysis and verification of adherence to medical protocols and clinical guidelines.