Keywords

1 Introduction

The adoption of blockchain-related technologies is spreading over many different contexts. When adopted in a new context [26] they generally have disruptive effects on traditional business, and they introduce novel ways of interactions (e.g., payment [19], agriculture [23] and others). Such transformations involve not only companies but also public authorities, that are currently recognising the potentialities of such technologies. The success of the “blockchain” is also confirmed by the significant interest of the community towards a technology able to guarantee trust natively using a faultless and robust validation system (e.g., in the last two years there have been 3.7 million Google searches for blockchain). The other key factor in the success of such a technology is tied to smart contracts. A smart contract is very much similar to a real physical contract which however takes the form of a digital artefact, and it can be used to establish business relations. These relations are enforced automatically via transactions as soon as the terms of the contract are fulfilled, and then the transactions are stored in the blockchain. The execution of a smart contract results in a set of activities that are carried out in a particular order. The order of execution of the activities describes the business logic of the contracts, and provide evidence to the interested partners about their completion.

Even though in a contract it is generally useful to define an order for the permitted actions, the smart contract specification does not provide mechanisms to enforce an order. It is then generally useful to define new methodologies for auditing the control flow of blockchain-based applications [13], and then to check if the actual execution of the contract functions conforms to the expected ones. Process mining is certainly a possible strategy to support auditors in such checks. Indeed, previous experiences show the possible benefits of process mining [2] in relation to auditing activities [4, 20]. Such experiences underline the possibility to perform a better analysis of the process flow based on historical data as well as the possibility of auditing processes on-the-fly. Up to now, blockchain has the potential to impact the audit sector making particularly significant the application of process mining as a supporting technique.

In this paper, we illustrate the methodology, and we report the results we obtained in applying process mining for auditing smart contracts. In particular, we consider the list of transactions resulting from the execution of RotoHive, that is an online fantasy sport running weekly tournaments. The application has been implemented as a smart contract on the Ethreum blockchain, that provides a set of functions that a player can invoke to play in a tournament. In running process mining we apply three different algorithms: the Heuristics Miner [24], the Inductive Miner [21, 22], and the Split Miner [10]. Fitness, precision and generalisation are measured to check the quality of the mining activity. The major benefits of our methodology are as follows.

  • Reduce time and cost for auditing contracts usually done manually on a set of transactions randomly selected.

  • Improve the effectiveness of auditing, since by looking at all the transactions, auditors will inevitably find more exceptions requiring follow-up.

  • Make it easier to investigate deviations highlighting anomalies at run-time.

The rest of the paper is organised as follows. Section 2 provides an overview of blockchain technology and process mining. Sections 3 introduces the methodology we follow in our study, while Sect. 4 presents the case study we consider as well as recommendations resulting from the conducted analysis. Section 5 presents related works available in the literature. Finally, Sect. 6 closes the paper with some remarks and opportunities for future works.

2 Background

This section presents the relevant notions related to blockchain, with a particular focus on Ethereum, and process mining.

Blockchain and Ethereum. A blockchain is a distributed ledger composed by a linked list (cf. chain) of records called blocks [26]. Each block contains a limited number of transactions in its body, while the header includes, among other things, the hash of the current block and the hash of the previous block. New blocks are added to the chain at regular intervals of time by the so-called “miners”. These are computational nodes related to the blockchain infrastructure that is needed to derive the hash of a block. The mining process and the use of consensus protocols permit us to verify the genuineness of the transactions included in each block. Finally, the replication of the chain in any node of the network guarantees decentralization and trustworthiness, without the need of a third party independent authority. The blockchain ideas have been initially proposed to support payment systems based on cryptocurrencies. In the last years, its adoption spread off in many different contexts, also about the inclusion of additional mechanisms, such as that of smart contracts. These can be considered as special programs which are executed over the blockchain infrastructure, whose nodes are now equipped, in some specific technologies such as Ethereum, with computational power. The execution of smart contracts produces transactions to be stored in the blockchain, thus ensuring trust among the parties.

Ethereum is a concrete implementation of the blockchain that includes support for the execution of smart contracts [33]. This is the technology we used in our approach. In Ethereum every node connected to the Ethereum network embeds an instance of the Ethereum Virtual Machine (EVM). The operations executed in the EVM, like storage of information or contract instructions have an associated economic cost defined in terms of GAS, which is the unit measuring the amount of computational effort needed for the execution of the operation. The execution cost has two main advantages: (i) it reduces the risk of malicious computational tasks, and (ii) it encourages mining activities by network participants and, hence, it permits to keep the overall system working. Indeed, miners are rewarded for each block they mine with a default amount of Ethers plus the sum of the transaction fees included in the block. Currently, the most prominent language to write smart contracts for Ethereum is Solidity (https://solidity.readthedocs.io/).

Process Mining. Process Mining is a discipline in between data mining and computational intelligence on the one hand, and process modeling and analysis on the other [2]. Process mining aims to extract non-trivial and useful information from event logs available in today’s information systems for discovering, monitoring and improving real processes [3]. It is an evidence-based approach, and this ensures a closer correspondence between modeled and observed behavior because the evaluation and definition of the model are based on real process execution traces.

In process mining, we can distinguish different activities such as discovery and conformance. The first technique, discovery, produces a model from an event log without using any a priori information, and usually, the discovered model is a process model expressed in a formal notation. The second class is conformance; it allows users to compare a process model with an event log of the same process. This is a useful technique to check whether a process as inferred from the log corresponds to the expected model and vice versa.

The discovery activity is generally based on an algorithm able to produce a model from a log. Over the years several mining algorithms have been developed, each with its proper characteristics [9]. In this paper, we apply three of them, such as Heuristics Miner, Inductive Miner and Split Miner, and we shortly discuss the results we get.

  • The Data-aware Heuristic Miner (DHM) is an algorithm for discovering process models where the behaviour is obscured in the event logs by noise, infrequent outliers or recording errors [24]. Data-aware Heuristic Miner uses the data attributes and dependency condition to distinguish infrequent paths from random noise by using classification techniques directly embedded in the discovery algorithm built upon the Heuristic Miner. The discovered models are, then, visualized as Causal Nets (C-Nets), a concise graphical notation with clear semantics, which includes information on split and join gateways.

  • The Inductive Miner is an algorithm based on a divide-and-conquer approach [21, 22]. Such an approach is applied to the log splitting it into sub-logs and then recursively applied to these sub logs until they contain only a single activity. In this way, the problem of discovering a process model for a log is broken down in discovering several sub-processes, one for each sub-log. The algorithm ensures to return a sound, fitting and block-structured process model in finite time.

  • The Split Miner is an algorithm similar to the heuristic miner, however experiments showed that the algorithm is 2–6 times faster than other state-of-the-art methods [10]. The first step of the algorithm constructs the Directly-Follows Graph; then it detects self-loops and short-loops to discover concurrency relations between pairs of tasks. Whenever a likely concurrency relation between two tasks is discovered, the arcs between these two tasks are pruned from the Directly-Follows Graph resulting in a pruned Directly-Follows Graph. In the third step filtering is applied to the pruned Directly-Follows Graph to strike balanced fitness and precision, still maintaining low control-flow complexity. In the fourth step, split gateways are discovered for each task in the filtered pruned Directly-Follows Graph with more than one outgoing arc. This is followed by the discovery of join gateways that is the last step of the algorithm.

It is worth noticing that processes resulting from the mining are different in term of representation language. All of them can be traced back, up to some transformations, to BPMN [27] that is the target language we use in this paper being well-know and understandable to auditors.

To measure the quality of a discovered model in comparison to the event log that generated it several quality parameters have been defined [30]. Among the other we refer to:

  • Fitness: permits to measure the extent to which the discovered model can accurately reproduce the cases recorded in the log;

  • Precision: permits to measure how much additional behaviour is included in the model i.e. a poor precision means that a model admits much additional behaviour with respect to that reported in the log;

  • Generalization: permits to measure how much the model just reproduce the behaviour reported in the log i.e. a low level of generalization means that the model cannot handle much more behavior with respect to the one reported in the log, maybe because not yet observed.

Overall the purpose of mining is to discover a model representative of the behaviour expressed by the event log and “to guess” additional behaviour. To generate a process model in line with reality, the algorithms should maintain a proper balance between overfitting and underfitting. The former property means that the generated model is too specific and only admits behaviour similar to that observed, while the latter property, however, presents a model too general which also accepts behaviours that are probably unrelated to the observed one.

3 Enabling the Auditing of Blockchain Contract

In this paper, we envisage a scenario based on process mining techniques to support auditing of processes “enacted” using smart contracts. In Fig. 1 we illustrate how the methodology we propose fits in the life-cycle of business transactions established through a smart contract. In particular, given a set of requirements on the transactions, a developer will define a smart contract (expressed in Solidity in our case), that will be successively deployed and executed over a blockchain infrastructure (EVM in our case). The execution of the contract will lead to a set of related transactions stored in the blockchain. At that point, it is important to check that, among other checks, the sequence of actions and interactions put in place by the contract participants are in line with what was expressed in the requirements. To enable such auditing activity we conceived and implemented the ABC (Auditing Blockchain Contracts) methodology that we will detail in the following. The methodology consists of four phases executed one after the other iteratively as represented in Fig. 2.

Fig. 1.
figure 1

ABC methodology context.

Fig. 2.
figure 2

ABC methodology.

Smart Contract Transactions Retrieval. The first phase of the methodology consists in the selection of a smart contract to be audited from the blockchain. The proposed approach has some interest in case the contract embeds a complex behaviour in terms of ordering of the contract foreseen operations. In general, not all contracts implement complex behaviours, since they contain single functionality usually not correlated each others. In this work, we are interested in challenging contracts with a complex logic since we believe that auditing can give greater benefit in case of complex behaviour. In this work, we select two requirements to consider the contract auditable: (i) the number of recorded transactions should be higher of a given threshold calculated on the dimension of the contract (i.e., in our case we set such threshold to 100), and (ii) it should contain at least one user any links to multiple interactions on different methods of the contract to observe meaningful emergent behaviours. Indeed, if this were not the case, we would not have a concrete order on the operations of the contract. From a technical point of view, the described operations are pretty straightforward. We developed a simple application in C# integrating the Etherscan API, that permits to scan the blockchain looking for contracts according to pre-selected requirements, and it allows users to get the list of transactions in JSON format.

Transactions Clustering. The second phase of the methodology performs clustering activities on the retrieved blockchain transactions. The main challenge at this point is connected to the selection of the clustering criteria. Smart contracts do not integrate the notion of traces; each transaction represents something performed without any links with other transactions. To implement a significant correlation and to generate a set of traces we need to cluster transactions according to some logic.

In our approach, we solve the problem of creating traces grouping together transactions coming from the same sender. This means that a new trace is generated for each user. This trace contains the list of transactions exchanged between the user, and the contract ordered according to their timestamps. The main drawback of this clustering methodology refers to the possibility of correlating sequential operations just because they are executed one after the other, even if in reality they do not have any causal dependency.

From a technical perspective, in the clustering step we take the JSON file produced in the previous phase, and we generate an event log, which is then stored in a file in XES format [28].

Discovery and Evaluation. The third phase of the methodology performs a process mining discovery activity. In this work we have used three different discovery algorithms: the Heuristics Miner [24], the Inductive Miner [21, 22], and the Split Miner [10]. We consider the Inductive Miner and the Split Miner thanks to its performance characteristics [9], and we also include the Heuristic Miner because it generally performs better with respect to quality criteria [12]. The used algorithms generate three different models that are compared using quality measures like fitness, precision and generalization [1]. Running three algorithms the auditor has the possibility to consider a wider spectrum of possible behaviours. Indeed the three resulting models collectively represent different and possible working scenarios. From a technical perspective, we take in input the log in the XES format and using the Apromore process mining toolFootnote 1 we discover the behavioural model emerging from the recorded transactions using Split Miner, while we use ProMFootnote 2 in the case of Heuristic and Inducting. Finally, ProM was used to compute quality measure.

Conformance. The last phase of the methodology analyses the models generated by the discovery phase, to find discrepancies concerning what is expected by the specified requirements. This is the most important phase of the auditing activity; furthermore, this analysis phase will lead to a successive contract re-engineering in case of unsatisfactory results. In the presented approach this activity does not include automatic support, yet. Nevertheless, it is clear that model checking techniques [15, 16], to check interesting temporal properties, seem to be a perfect fit for such an activity. Clearly, in such a case it will be necessary to equip the auditor with user-friendly tools to define relevant properties out of the requirements list.

4 Process Mining in Blockchain: The RotoHive Case

In this section, we show the methodology in practice considering a real case study such as RotoHiveFootnote 3. More details on data used in the experiment as well as resulting model are available on-lineFootnote 4.

4.1 RotoHive Overview

RotoHive is a fantasy sport running weekly tournaments. Every Tuesday a new tournament starts, and users are asked to rank National Football League (NFL) players by role based on projected performance for the week. RotoHive user submissions are then rated against real player performances. At the end of Monday night football matches, top performing RotoHive users are paid according to the rank of the selected players. This process repeats on Tuesday morning when the next weekly tournament begins. Roto can then be staked to user submissions to win a portion of a separate weekly Ethereum prize pool.

4.2 ABC Methodology in Practice

Considering the smart contract transactions retrieve activity, the RotoHive application was selected since it contains more than 3000 transactionsFootnote 5 distributed over 4 months (from August to December 2018), and it includes several users. This characteristics make it a quite challenging scenario for experimenting with the proposed approach.

The transactions were clustered and formatted in a XES file considering the users interacting with the contract. Each trace is identified by a tag containing the address of the user, and a list of events performed by the user on the contract. Each event contains the name of the method called if it is completed and the corresponding timestamp. Listing 1.1 shows an excerpt of the XES file representing a trace performed by a user for the RotoHive stake method resulting in a transaction.

figure a

The process discovery resulted in three models generated applying the Split Miner, Inductive Miner and the Heuristic Miner. The processes are depicted in Figs. 3, 4 and 5 respectively.

The three discovered models contain the same number of tasks, with two principal dominant behaviours, one representing the users playing the game and the other covering the behaviour of the administrator. The path representing the users contains just one task stake closed in a loop, indicating that a player can perform multiple stakes in each tournament. The path representing the administrator is composed of two initial tasks constructor (i.e., 0x60806040) and settokencontract indicating the first initialization of the game followed by the tasks representing the tournaments. In the tournament we have createtournament for the creation of a new tournament followed by the operation performed once the tournament is completed releaseroto, rewardroto, destroyroto, and closetournament.

Analysing the models we can state that the behaviour of each player is rather simple, and all the models reproduce a similar structure. Different is the case for the administrator part where the three models differ significantly for the tasks executed at the end of each tournament: destroyroto, releaseroto, rewardroto, and closetournament. The Split Miner, in Fig. 3, admits a first occurrence of the rewardroto task, and then the other tasks. In particular, destroyroto when executed occurs always after releaseroto. The Inductive Miner, in Fig. 4, admits to create the tournament and then two paths are possible. It can complete or execute rewardroto followed by two paths in parallel. The first includes the possibility to eventually execute several times releaseroto, while the other can execute destroyroto follow by closetournament tasks enclosed in a loop structure. This two paths are successively synchronised, and then the process ends. The Heuristic Miner, in Fig. 5, instead admits createturnament that is always followed by rewardroto. Than the three tasks releaseroto, destroyroto and closeturnament can be execute in sequence. Eventually releaseroto and destroyroto can be skipped.

Fig. 3.
figure 3

RotoHive Split Miner.

Fig. 4.
figure 4

RotoHive Inductive Miner.

Fig. 5.
figure 5

RotoHive Heuristic Miner.

To evaluate the quality of the mining algorithms applied to the RotoHive case study we considered fitness, precision, and generalisation in Table 1. Generally, we can observe that both fitness and generalization values are quite good for all 3 algorithms, while precision is more variable, and in general observed values are lower. Notably, having models with a value of fitness equal to one guarantees that all the traces in the log can be reproduced by it.

All three models represent quite well the application domain, so it is challenging to choose the best process mining algorithm to be used for audit applied to the blockchain domain. At this point, the evaluation is up to the auditor, who must consider all the models and their quality. If the auditors are interested in a model reflecting better the whole log our best solution is the Split Miner or Inductive Miner with the highest value of fitness, but with the drawback of low precision. If the auditors are more interested in highest value of precision they should use the Heuristic Miner loosing a bit the quality of the other parameters that are slightly below the others but not significantly.

4.3 Discussion

The used approach has lead to good results with sound models discovered, and pretty good quality parameters measured. The usage of more than one mining algorithm seems somehow desirable for auditing purpose. Indeed, the objective of the auditors is to identify any potential risk, and to make a careful assessment on what happened, but also on what could potentially happen. In this sense, we are working to make the presentation of potential risks easier.

Potentially the methodology used could also be useful to understand how a user interacts with a system, and to compare different behaviours with the expected one defined by requirements. Going even further in this direction we could understand the characteristics of certain users by analysing how they interacted on different systems. The designer after an accurate evaluation of the divergences can also decide to review the contract to force or avoid specific behaviour.

Table 1. RotoHive quality measures.

5 Related Work

In this section, we refer to the research available in the literature that inspired our work. We first discuss other papers proposing process mining techniques for audit, then we discuss solutions to enable secure and trustworthy auditing of logs.

Much effort has been devoted to the application of process mining techniques to auditing scenarios. Here in the following, we refer to those contributions supporting, as in the case of our approach, a semi-automatic strategy.

Dogana and Curbera [17] present a semi-automatic auditing approach in cases where there is no process execution engine. Ghose and Koliadis [18] present a broad auditing framework suitable to check the compliance status of a business process against given regulations. Zerbino et al. [34] propose a novel methodology for auditing information systems; they also discuss an application on the information exchange among port stakeholders. The authors provide operational guidance bridging the gaps of the current approaches for off-line information system auditing. Similarly to our approach the proposed methodology promotes the process reengineering, and for revising the boundaries in the process flow of the port community system. Accorsi and Stocker [5] use conformance checking for security auditing. They also discuss a case study employing a bank scenario and a real-life loan application process. Conformance checking is also introduced by Ramezani et al. [29]. In this paper the check considers the control flow and the normative requirements. Mayers et al. [25] use process mining and conformance checking analysis techniques to identify anomalous behaviour and cyber-attacks using industrial control systems data logs. Moreover, Arya et al. [7] use event logs collected in real time to run conformance on the operational process. The obtained results are also compared with simulated event logs to perform more accurate conformance checking. Different from our work none of the considered papers take into account blockchain transactions as a log for auditing those applications based on blockchain.

Finally, we considered solutions to enable secure and trustworthy auditing of logs. Among the others, we refer to Ahmad et al. [8], and Sutton and Samavi [31, 32] discussing the possibility of the blockchain to enable privacy auditing. In particular, Ahmad et al. [8] present a scalable and tamper-proof system. Sutton and Samavi [32] provide a mechanism for log integrity and authenticity verification, by means of compliance checking queries. These papers underline the importance of performing the auditing in blockchain-related scenarios even though they do not propose any possible solutions for such an activity.

6 Conclusions and Future Work

The increasing adoption of blockchain technology disrupts traditional businesses, and it introduces a novel way to sign and run contracts. The combined use of blockchain technologies and process mining presents novel challenges and opportunities for auditing activities that can rely on trustworthy logs.

In this paper, we present the results we obtained in applying process mining for auditing Ethereum applications. In particular, we consider RotoHive’s generated transactions. This is an on-line fantasy sport that runs weekly tournaments. The auditing activity has been performed using the discovered models and considering fitness, precision and generalization.

In the future, we plan to continue our programme to support auditors of blockchain-based applications effectively. Therefore we aim at enlarging the study running a more extensive validation, and considering a broad set of different blockchain-based applications that can be optimized via cost/reward method [6]. We also intend to deepen our research on the possible selection of one or more mining algorithm, and their suitability and checking its performance and effectiveness. Moreover, we would evolve the methodology with a prototype suitable to run auditing activity in a user-friendly manner. Finally, we would like to explore other analysis techniques for auditing, i.e. monitoring [11] and conformance [14].