
1 Introduction

Visualization is an important area of research that investigates how to increase people's capability to capture information from data. This area has been studied for a long time, resulting in many visual artifacts and methods that have been adapted for use in many research areas, which makes data visualization multidisciplinary. The development of information systems changed the scale of recordable data, and the emergence of big data in recent years has rapidly increased the volume of data. This large volume of data is considered a major challenge in data visualization because it results in an abundance of visual elements in a visual representation.

A large number of elements in a visual representation can hinder people's ability to understand it, so representing information at different levels of detail is considered an effective approach to making visual artefacts more useful. Different techniques support data visualization at different levels of abstraction. Some techniques aggregate data and present information at a higher level of abstraction, which enables people to discover general trends in the data. People can then identify and select a slice of the data that is relevant to their analysis. This capability enables people to focus on a particular subset of data and employ relevant techniques to discover more information, which is known as slicing and dicing in the data analysis area.

A chord diagram is a type of visual representation that has recently been introduced to increase the level of abstraction in visualizing relations among nodes in networks. It is widely used and adapted in many disciplines to investigate and analyse patterns in different sorts of networks, including social networks, biological networks, etc. [7, 10, 12, 18, 26]. Business Process Management (BPM) supports managing business processes using different artefacts, and the interaction among people while enacting business processes plays an important role in managing processes. Thus, discovering social networks from enacted process data can help make the management of business processes more efficient and effective. Our previous study also shows potential benefits of adopting this technique in the BPM area, as it can increase the level of abstraction in visualizing social networks [9].

Although different works investigate how this technique should be applied, configured and adapted in different research areas, it is not clear how it can be employed in BPM. Thus, this paper extends social network visualization approaches in the BPM area using the chord diagram. It formally defines the elements of the diagram and elaborates on how the visual representation can be compiled from them. The visualization is supported by a plug-in implemented in ProM, an open-source framework for process mining. The plug-in is used to show social networks discovered from real log files, in comparison with those shown by current visualization techniques. The results show that this technique can complement previous artefacts in discovering more social network patterns in the BPM area.

The remainder of this paper is organized as follows. Section 2 introduces social network analysis in the BPM area. Section 3 elaborates on how the chord diagram can be used to visualize social networks in BPM. Section 4 introduces the implemented artifact that supports the visualization of social networks in BPM using the chord diagram. Section 5 demonstrates and discusses cases in which both this artifact and traditional techniques are used to visualize social networks. Section 6 discusses related work. Finally, Sect. 7 concludes the paper and outlines future work.

2 Background

In this section, we briefly explain the terms and concepts of social network analysis in the BPM area.

Process models play an important role, as people use them to design, understand, discuss, analyze, configure, enact, run and adjust business processes. Thus, different business process modelling techniques have been developed to support designing process models, e.g. Business Process Model and Notation (BPMN) [14], Yet Another Workflow Language (YAWL) [1], Petri nets [16], Unified Modeling Language (UML) [20], etc.

A business process can be defined from different perspectives, such as control-flow, data and resource, so that it functions effectively and efficiently. The control-flow perspective is the dominant one in the BPM area; it defines the order in which activities should be performed when enacting a business process. The resource perspective defines the people and resources involved in a business process. The data perspective defines the information aspects of a business process. These perspectives are not entirely separate, and their combination specifies how a business process should be enacted. The model that reflects the combination of these perspectives and defines how a business process should function is called a business process model.

Fig. 1. The handover of works in the selling process

The left side of Fig. 1 shows a fictitious process model in BPMN notation. This process supports the selling of customized products. In BPMN, the resource perspective is represented by segmenting a process model based on resources, using artifacts called pools and swimlanes. For the sake of simplicity, we represent this perspective only by annotating each activity with the name of the role responsible for executing it.

A process model can be configured, implemented and enacted in different ways. The business process can be supported by a Business Process Management System, or by various software systems whose coordination supports the enactment of the process model. An instance of a business process model is called a case. Regardless of how a process is implemented, its enactment can produce a log file recording how the process participants execute activities in different cases. The log file can be used to investigate different aspects of a business process.

Process Mining [21] is the area of research that aims to derive insights from the enactment results of business processes. The techniques defined in this area to support such investigation are known as process discovery, process conformance and process enhancement [21]. Process discovery techniques aim to extract insights about different aspects of business processes from log files. The social interactions among process participants are also an important aspect that can be discovered from the process log when resource information is present.

Different social network discovery metrics have been defined in the process mining area, namely Handover of work, Subcontracting, Working together, Similar tasks and Reassignment [22]. These metrics are based on relations among activities that can be identified from the order of events in the log file. For example, activity A has a causal relation with activity B (written \(A\rightarrow B\)) iff an event of activity A is directly followed by an event of activity B in at least one case, but an event of activity B is never directly followed by an event of activity A. In the process of Fig. 1, for instance, Manufacture has a causal relation with Check the final product, but Deliver and Send invoice have no such relation (since they are parallel). The causal relation can be considered a direct succession relation. There is also an indirect succession relation: for example, if \(A\rightarrow B\rightarrow C\), A can be considered to have an indirect succession relation with C. Different metrics are defined to discover different social networks in business processes [22]:

  • The handover of work metric identifies resources who pass work on to other resources. There are different variations of the definition of this metric [22]; for example, it can be defined with or without considering the causal relations in a process model. The right side of Fig. 1 shows the social network graph discovered with a variant of this metric based on causal relations and direct succession. The nodes in this graph represent the roles of people, and the edges represent handovers of work. This is an unweighted graph that does not take into account the amount of interaction between people. A weighted graph can represent the number of handovers of work through the thickness of its edges.

  • A subcontractor is a person who performs work under a contract for another party. In process mining, a potential subcontracting pattern is identified when a resource hands work over to another resource and receives it back directly. This metric also has different variants.

  • The working together metric identifies resources that tend to work together. It ignores causal dependencies and focuses on resources that work on the same case.

  • The similar tasks metric identifies people who tend to work on the same tasks, assuming that such people have stronger relations than others. This metric also ignores causal relationships among events, but it focuses on activities instead of cases. For example, it can support the discovery of roles in a process model, since activities in a business process can be executed by people with similar roles.

  • The reassignment metric investigates whether work on an activity has been reassigned among different resources within a process instance, so that the process can be improved by avoiding such extra reassignments. Given a log file that records the status of activity instances, these interactions can be discovered in the social network through the reassignment metric. This metric can also be used to identify potential power relations among resources, e.g. a boss may reassign work to his or her employees.
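The causal relation and some of the metrics above can be sketched on a toy event log. This is a minimal Python sketch under simplifying assumptions, not the exact definitions of [22] or the ProM plug-ins: each case is assumed to be an ordered list of (activity, resource) pairs, the resource names are hypothetical, handover of work counts direct successions (optionally only causal ones), working together counts shared cases, and subcontracting counts the pattern r1, r2, r1 in consecutive events.

```python
from collections import Counter

# Hypothetical toy log: each case is an ordered list of (activity, resource) events.
# Activity names follow the selling process in the text; resources are made up.
LOG = {
    "case1": [("Manufacture", "Production"), ("Check the final product", "Quality"),
              ("Deliver", "Logistics"), ("Send invoice", "Finance")],
    "case2": [("Manufacture", "Production"), ("Check the final product", "Quality"),
              ("Send invoice", "Finance"), ("Deliver", "Logistics")],
}


def directly_follows(log):
    """Activity pairs (A, B) where A is directly followed by B in at least one case."""
    return {(a, b) for events in log.values()
            for (a, _), (b, _) in zip(events, events[1:])}


def causal_relations(log):
    """A -> B iff A is directly followed by B in some case, but B never directly by A."""
    df = directly_follows(log)
    return {(a, b) for (a, b) in df if (b, a) not in df}


def handover_of_work(log, causal_only=True):
    """Count direct successions between the resources of consecutive events.

    With causal_only=True, a succession counts only if its two activities are
    in a causal relation (one simplified variant of the metric).
    """
    causal = causal_relations(log)
    weights = Counter()
    for events in log.values():
        for (a, ra), (b, rb) in zip(events, events[1:]):
            if not causal_only or (a, b) in causal:
                weights[(ra, rb)] += 1
    return weights


def working_together(log):
    """Number of cases in which each pair of distinct resources both participate."""
    weights = Counter()
    for events in log.values():
        resources = sorted({r for _, r in events})
        for i, r1 in enumerate(resources):
            for r2 in resources[i + 1:]:
                weights[(r1, r2)] += 1
    return weights


def subcontracting(log):
    """Count r1 -> r2 -> r1 in consecutive events: r1 hands work to r2 and gets it back."""
    weights = Counter()
    for events in log.values():
        res = [r for _, r in events]
        for r1, r2, r3 in zip(res, res[1:], res[2:]):
            if r1 == r3 and r1 != r2:
                weights[(r1, r2)] += 1
    return weights
```

On this log, Manufacture → Check the final product is causal, while Deliver and Send invoice are not causally related because they occur in both orders. With the causal filter, `handover_of_work(LOG)` therefore gives weight 2 to (Production, Quality) and weight 1 to each of (Quality, Logistics) and (Quality, Finance).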

Other work, such as the discovery of handover of roles, aims to gain more insight into the organizational perspective from event logs [3].

Fig. 2. Social network models mined from a real log file

The introduced metrics result in models that can be represented by the usual graph visualization techniques using nodes and arcs. In process mining, these models are visualized by the social network plug-ins in ProM [25]. The size of the graph grows with the number of nodes and interactions, so such visualization techniques can be very difficult and inefficient to use in real applications.

Figure 2 shows two graphs that represent the handover of works among participants in a business process mined from a real log file. As can be seen, it is very difficult to discover the interactions among participants from these visualizations. The next section introduces our technique, which can facilitate identifying some aspects of social networks in such cases.

3 Approach

This section introduces our approach to visualizing dense social networks using chord diagrams. We first introduce the basic definitions that are used to explain the approach.

3.1 Definitions

Definition 1

(Social Network Graphs). A social network graph is a tuple \(G=(N,E,w)\), where:

  • N is the set of nodes,

  • \(E \subseteq N \times N\) is the set of edges connecting nodes together, and

  • \(w : E \rightarrow \mathbb {R}^+\) is a function that assigns a positive (i.e. non-negative and non-zero) real number to each edge.

The weight of an edge \(e\in E\) can be retrieved by w(e). In addition, we define two other operations retrieving the incoming and outgoing edges of a node:

  • The set of incoming edges of a node \(n\in N\) can be retrieved by \(\bullet n=\{(x,y)\in E \mid y=n\}\).

  • The set of outgoing edges of a node \(n\in N\) can be retrieved by \(n\bullet =\{(x,y)\in E \mid x=n\}\).

Fig. 3.
figure 3

An example graph

For example, the graph represented in Fig. 3 can be defined as:

\((N=\{a,b,c\},E=\{(a,a), (a,b), (b,a), (c,c), (c,a)\}, w=\{((a,a),2), ((a,b),3), ((b,a),2), ((c,c),2), ((c,a),1)\})\).

To clarify the definitions, we illustrate the following operations on this graph:

  • \(w((a,a))=2\) that retrieves the weight of the edge that connects node a to itself, i.e. (a, a),

  • \(\bullet a = \{(a,a), (b,a), (c,a)\}\) that retrieves the incoming edges to node a, and

  • \(a\bullet =\{(a,a), (a,b)\}\) that retrieves the outgoing edges from node a.
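Definition 1 and the operations above can be mirrored in a small Python class. This is an illustrative sketch, not part of the paper's implementation: the class and method names (`SocialNetworkGraph`, `incoming`, `outgoing`) are hypothetical, and E is represented implicitly as the key set of the weight map.

```python
from dataclasses import dataclass


@dataclass
class SocialNetworkGraph:
    """G = (N, E, w) from Definition 1; E is taken to be the key set of `weights`."""

    nodes: set
    weights: dict  # maps an edge (x, y) to its positive weight w((x, y))

    def __post_init__(self):
        # Enforce E ⊆ N × N and w : E → R+ (strictly positive weights).
        for (x, y), value in self.weights.items():
            if x not in self.nodes or y not in self.nodes:
                raise ValueError(f"edge {(x, y)} uses a node outside N")
            if value <= 0:
                raise ValueError(f"w({(x, y)}) must be positive")

    @property
    def edges(self):
        return set(self.weights)

    def w(self, edge):
        return self.weights[edge]

    def incoming(self, n):
        """•n = {(x, y) ∈ E | y = n}"""
        return {(x, y) for (x, y) in self.weights if y == n}

    def outgoing(self, n):
        """n• = {(x, y) ∈ E | x = n}"""
        return {(x, y) for (x, y) in self.weights if x == n}


# The example graph of Fig. 3.
G = SocialNetworkGraph(
    nodes={"a", "b", "c"},
    weights={("a", "a"): 2, ("a", "b"): 3, ("b", "a"): 2, ("c", "c"): 2, ("c", "a"): 1},
)
```

Here `G.w(("a", "a"))`, `G.incoming("a")` and `G.outgoing("a")` reproduce the three operations listed above.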

A chord diagram consists of arcs and chords.

  • An arc is a segment of the circumference of the circle that is mapped to a node in a social network graph. Figure 4(a) shows an example of arcs in a chord diagram that represent the nodes in the given example.

  • A chord is an area of the circle that connects two arcs together. It is possible that a chord connects an arc to itself. Figure 4(b) shows an example of a chord that connects two arcs (a and b) together. The details for computation are explained later.

Figure 4(c) shows the complete chord diagram that represents the given graph. We define how the elements of this diagram are computed as follows.

Definition 2

(Chord Diagrams). A chord diagram is a tuple \((G=(N,E,w),r,\phi ,\chi )\), where

  • G is a social network graph,

  • r is the radius of the circle,

  • \(\phi : N \rightarrow N \times \mathbb {R}_{\geq 0}\) is a function that maps each node of graph \(G=(N,E,w)\) to its arc in the chord diagram, i.e. the node paired with the length of its arc, where:

    $$\phi (n) = (n,\dfrac{2\pi r\sum _{e\in n\bullet }w(e)}{\sum _{e\in E}w(e)})$$
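  • \(\chi : N \times N \rightarrow N \times N \times \mathbb {R}_{\geq 0}\) is a function that returns, for each ordered pair of nodes (n, m) of graph \(G=(N,E,w)\), the length of the side of the chord between n and m that lies on the arc of n, with \(w(n,m)\) taken as 0 if \((n,m)\notin E\), where: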

    $$\chi (n, m) = (n,m,\dfrac{2\pi r w(n,m)}{\sum _{e \in E}w(e)})$$
Fig. 4. A chord diagram and its elements

It should be highlighted that for every two nodes a and b, both \(\chi (a,b)\) and \(\chi (b,a)\) must be computed. We explain this definition using the example graph. The chord diagram for our graph is a tuple consisting of the graph, a variable r that defines the radius of the circle, and two functions, \(\phi \) and \(\chi \), that compute the lengths of arcs and chords respectively.

$$\begin{aligned} \begin{array}{c} \phi (a) = (a,\dfrac{2\pi r\sum _{e\in a\bullet }w(e)}{\sum _{e\in E}w(e)}) = (a,\dfrac{2\pi r\sum _{e\in \{(a,a), (a,b)\}}w(e)}{\sum _{e\in \{(a,a), (a,b), (b,a), (c,c), (c,a)\}}w(e)}) = \\ \\ (a,\dfrac{2\pi r(2+3)}{2+3+2+2+1}) = (a,\pi r) \end{array} \end{aligned}$$

This means that the arc corresponding to node a is \((a,\pi r)\), i.e. half of the circumference of the circle. The remaining arcs in Fig. 4(c) are computed in the same way.

As mentioned earlier, a chord connects two arcs. The \(\chi \) function computes the length of each side of a chord. For example, \(\chi (a,b)\) and \(\chi (b,a)\) compute the lengths of the two sides of the chord connecting nodes a and b, at the arcs of a and b respectively.

$$\begin{aligned} \begin{array}{c} \chi (a,b) = (a,b,\dfrac{2\pi r \times w(a,b)}{\sum _{e\in E}w(e)}) = (a,b,\dfrac{2\pi r \times 3}{\sum _{e\in \{(a,a), (a,b), (b,a), (c,c), (c,a)\}}w(e)}) = \\ \\ (a,b,\dfrac{2\pi r \times 3}{2+3+2+2+1}) = (a,b,\dfrac{2\pi r \times 3}{10}) = (a,b,0.6 \pi r) \end{array} \end{aligned}$$

The length of the other side of the chord, \(\chi (b,a)\), and the remaining chords are computed in the same way; for instance, the side \(\chi (b,a)\) has length \(\dfrac{2\pi r \times 2}{10} = 0.4\pi r\). Note that \(\chi (a,b)\) and \(\chi (b,a)\) can differ, because the first is based on the weight of the edge from a to b, whereas the second is based on the weight of the edge from b to a.

Computing \(\chi \) for all pairs of nodes enables the visualization of the chord diagram shown in Fig. 4(c). As can be seen, the chords overlap each other. Therefore, different configurations can be applied to help people understand this diagram. Some possible configurations are explained in the next section.

3.2 Visualization Properties

The effective visualization of dense networks depends not only on quantitative aspects but also on qualitative ones [27]. Different qualitative aspects, used in different approaches, can enhance the usefulness of a visualization artifact. Wills G.J. enumerates some of these aspects, like the “ability to show or hide parts of a graph”, “color[ing] nodes and edges”, “selective labeling of nodes under user control” and supporting user interaction, e.g. through the mouse [27]. In this section, we explain how some of these aspects can be considered when illustrating a social network model using a chord diagram.

Interactivity. The nodes in social networks are represented as arcs in chord diagrams. The total weight of the relations of a node to other nodes is represented by the length of its arc. Thus, arcs and their lengths support comparing the weight of relations among different nodes in a social network. However, it is difficult for people to investigate these relations when there are many chords in a social network. The relations among nodes are therefore a good candidate to be shown to or hidden from users based on user interaction, e.g. when the mouse is moved over an arc. Accordingly, our artifact shows only the relations of a node when a user moves the mouse over the corresponding arc.
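The show/hide behaviour can be sketched as a filter over the chords. This is a minimal Python sketch, not the plug-in's code: it assumes that each chord is stored as a tuple (n, m, χ(n, m), χ(m, n)) and that the visible chords are recomputed whenever the pointer enters an arc; the helper names `build_chords` and `visible_chords` are hypothetical.

```python
import math

# Edge weights of the example graph in Fig. 3.
W = {("a", "a"): 2, ("a", "b"): 3, ("b", "a"): 2, ("c", "c"): 2, ("c", "a"): 1}


def build_chords(w, r=1.0):
    """One chord per unordered node pair linked in either direction.

    Each chord is (n, m, chi(n, m), chi(m, n)): the lengths of its two sides
    on the arcs of n and m (a missing edge contributes 0).
    """
    scale = 2 * math.pi * r / sum(w.values())
    seen, chords = set(), []
    for n, m in w:
        pair = frozenset((n, m))
        if pair in seen:
            continue
        seen.add(pair)
        chords.append((n, m, scale * w.get((n, m), 0), scale * w.get((m, n), 0)))
    return chords


def visible_chords(chords, hovered=None):
    """Show every chord when no arc is hovered; otherwise only the hovered node's chords."""
    if hovered is None:
        return chords
    return [c for c in chords if hovered in (c[0], c[1])]
```

For the example graph, hovering over arc b leaves only the chord between a and b, while hovering over arc a keeps its self-chord and its chords to b and c.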

Selective Hints. The names of arcs in the diagram can be shown explicitly. However, annotating the diagram with detailed information is not a good idea. Thus, our artifact shows the labels of chords as mouse hints (tooltips) when a user moves the mouse over a chord. In this way, users receive information only about the particular chord that interests them.

Colors. Colors can play a significant role in visualization. In this artifact, we made two design choices regarding the coloring of the diagram. The first is to color arcs differently to help users distinguish them. The second is to color each chord with the color of the arc at which the chord is wider; if both ends have equal length, the chord is colored white. In this way, more aspects of the relations among nodes can be visualized without making the diagram unreadable.
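The two coloring choices can be expressed as two small functions. This Python sketch assumes chords are tuples (n, m, length at n, length at m) as computed by χ; the palette and the function names are hypothetical, and floating-point ties are detected with `math.isclose`.

```python
import math

# Hypothetical palette; any set of clearly distinguishable colours would do.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def arc_colors(nodes, palette=PALETTE):
    """First design choice: give each arc its own colour (cycling if colours run out)."""
    return {n: palette[i % len(palette)] for i, n in enumerate(sorted(nodes))}


def chord_color(chord, colors, tie_color="white"):
    """Second design choice: colour a chord like the arc at which it is wider.

    `chord` is (n, m, length_at_n, length_at_m); equal lengths give `tie_color`.
    """
    n, m, at_n, at_m = chord
    if math.isclose(at_n, at_m):
        return tie_color
    return colors[n] if at_n > at_m else colors[m]
```

For the example graph, the chord between a and b has sides 0.6πr on a and 0.4πr on b, so it takes a's colour. In a network where every relation has the same weight in both directions, every chord has equal ends and is drawn white. Under a literal reading of the rule a self-chord also has equal ends and would be white; the text does not state how self-chords are coloured.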

4 Implementation

This section specifies the architecture and the functionalities of the plug-in that we implemented to support the visualization of social networks of business processes using chord diagrams.

4.1 Architecture

The ProM framework [25] has been selected to implement the artifact that supports visualization of the chord diagram. This framework was chosen because it is open-source and social network analysis plug-ins are already implemented in it. Figure 5 shows an adapted version of the ProM architecture that explains how our plug-in supports the visualization of social networks using chord diagrams.

Fig. 5. The architecture, adapted from [25]

The ProM framework has a log filter component that supports importing log files in a specific format into the framework. Different social network plug-ins are implemented in this framework that produce various social network models from an imported log file. Despite the different semantics behind these plug-ins, the models have the same structure, i.e. a weighted network graph. The social network model can be visualized through the visualization engine.

Although the ProM engine supports different sorts of interaction, it does not provide the interactivity features we require to fulfil the qualitative criteria mentioned in the previous section. Therefore, we define our plug-in as an export plug-in that produces a chord diagram based on HTML and the D3 library [2]. D3 is a JavaScript library that supports the development of graphical representations on the web using HTML and JavaScript. In this way, we support the visualization of the results of all social network analysis algorithms with the chord diagram, because all of them produce the same kind of social network model from the log file.
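Since every social network plug-in yields the same kind of weighted graph, an exporter mainly has to convert that graph into the input expected by the drawing library. As far as we know, the `d3.chord()` layout in D3's d3-chord module takes a square matrix in which `matrix[i][j]` is the flow from node i to node j and sizes each group by its total outgoing flow, which corresponds to \(\phi \) in Definition 2. The Python sketch below shows only this conversion step under that assumption; the function names and JSON field names are hypothetical, and the actual plug-in runs inside ProM (a Java framework), not in Python.

```python
import json


def to_chord_matrix(nodes, weights):
    """Square matrix with matrix[i][j] = w((n_i, n_j)), or 0 if there is no such edge."""
    names = sorted(nodes)
    index = {n: i for i, n in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for (x, y), value in weights.items():
        matrix[index[x]][index[y]] = value
    return names, matrix


def export_payload(nodes, weights):
    """Serialize names and matrix as JSON for an HTML page (hypothetical format)."""
    names, matrix = to_chord_matrix(nodes, weights)
    return json.dumps({"names": names, "matrix": matrix})


# Example graph of Fig. 3.
W = {("a", "a"): 2, ("a", "b"): 3, ("b", "a"): 2, ("c", "c"): 2, ("c", "a"): 1}
```

For the example graph, the rows for a, b and c are [2, 3, 0], [2, 0, 0] and [1, 0, 2]; row sums 5, 2 and 3 are proportional to the arc lengths πr, 0.4πr and 0.6πr.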

5 Demonstration and Discussion

We conducted a preliminary evaluation of our artifact's applicability using real log files, comparing the visualization results of our approach with traditional ones. We selected the logs from the Fifth International Business Process Intelligence Challenge (BPIC15). The logs record all building permit applications over around four years in five Dutch municipalities. The processes in these municipalities are very similar, yet each has its own differences due to variations that must be applied in each municipality.

In this paper, we do not aim to evaluate the usefulness of the visualization results; we only focus on showing the potential of our artifact to produce visualizations that reveal more aspects of social networks. Thus, we present the visualization results of our approach and of traditional ones, and we compare them based on the information that can be inferred from each visualization.

We consider four cases to discover social networks for the working together, handover of works, subcontracting and similar task metrics. The reassignment metric is not considered since the log files do not contain enough information to discover it.

5.1 Case 1: Working Together

We used the first municipality log file [23] to investigate and compare the visualization results of our approach with previous ones. We used the “Mine for a Working-Together Social Network” plug-in to discover the social network of resources who work together. Figure 2(a) shows the discovered social network model using the default layout.

Fig. 6. Chord diagram for the working together metric

Other layouts sometimes organize nodes and arcs better, but we could not find a more meaningful layout for this model. Figure 2(b) shows another layout of the same model, i.e. the circle layout, which places nodes on a circle and draws the interactions among them. In these two layouts, we can identify two nodes that have one incoming and one outgoing arc. These networks are dense, and it is very difficult to gain more useful insights from them.

Figure 6 shows the chord diagram generated by our plug-in. Figure 6(a) shows the complete diagram, and Fig. 6(b) shows only the part of the diagram that remains after filtering based on user interaction, i.e. on the node over which the mouse is moved. We list our findings as follows.

  • Resource involvement: It is possible to compare how much each resource contributes to the social network based on the length of its arc, e.g. it can easily be recognized that node “1898401” has a higher involvement.

  • Association direction: It is possible to identify the dominant direction in a relation between two nodes. This feature is supported by colouring the chords, e.g. the relations from the selected node (“1898401”) to other nodes have the same colour as the selected node, which means that the relations from the selected node to others are stronger than the relations from others to this node (see Fig. 6(b)).

  • Association contributions: It is possible to identify and compare the degree of contribution of each node in a relation, e.g. the contribution of the selected node (“1898401”) to its relation with node “2670601” is higher than that of the other nodes. The contributions can be compared using the lengths of the chord at its end points, i.e. at the involved arcs.

  • Special nodes: The association direction and contributions support the identification of special nodes, e.g. nodes that only initiate work (such as nodes “1898401” and “6”) or that are related to only one node (“3175153”).
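The special nodes above were identified visually from the diagram. The same reading can be sketched as simple checks on the weighted graph; the helper names and the exact criteria (a node "only initiates" work if it has edges to other nodes but none from them, ignoring self-loops) are our hypothetical interpretation, shown here on the example graph of Fig. 3 rather than on the BPIC15 data.

```python
# Edge weights of the example graph in Fig. 3.
W = {("a", "a"): 2, ("a", "b"): 3, ("b", "a"): 2, ("c", "c"): 2, ("c", "a"): 1}


def partners(n, weights):
    """Other nodes linked to n in either direction (self-loops ignored)."""
    return {y if x == n else x for (x, y) in weights if n in (x, y) and x != y}


def initiators(nodes, weights):
    """Nodes that send to other nodes but never receive from them."""
    sends = {x for (x, y) in weights if x != y}
    receives = {y for (x, y) in weights if x != y}
    return {n for n in nodes if n in sends and n not in receives}


def single_partner_nodes(nodes, weights):
    """Nodes related to exactly one other node."""
    return {n for n in nodes if len(partners(n, weights)) == 1}
```

In the example graph, c only sends to a (apart from its self-loop), so it is the only initiator, while b and c are each related to a single other node.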

5.2 Case 2: Handover of Works

We also used the first municipality log file [23] to investigate handovers of work using the “Mine for a Handover-of-Work Social Network” plug-in. Figure 7 shows the visualization of the discovered social network model using the chord diagram and the traditional approach.

Fig. 7. Chord diagram and corresponding graph for the handover of works metric

The patterns introduced in the previous case can also be identified here, i.e. resource involvement, association direction, association contributions and special nodes. In this case, however, some relations can be identified more easily with the traditional visualization technique. For example, there is only one incoming arc to node “3175153”, from node “560925”. The traditional visualization technique shows this relation more clearly, while it is harder to identify through the chord diagram. The reason is that the chord diagram shows relationships at a more abstract level, and relations of resources with very small contributions are difficult to discover.

  • Resource abstraction: The chord diagram facilitates identifying resources with higher contribution degrees compared to those with very small contributions to others.

5.3 Case 3: Subcontracting

We used the fourth municipality log file [24] to investigate and compare the visualization results of our approach with previous ones. We used the “Mine for a Subcontracting Social Network” plug-in to discover the social network of resources who may subcontract work to others. Figure 8 shows the visualization of the discovered social network model using the chord diagram and the traditional approach.

Fig. 8. Chord diagram and corresponding graph for the subcontracting metric

As can be seen in Fig. 8(b), some nodes in the traditional visualization are completely isolated from others, i.e. they have no relations to or from other nodes. These nodes are not shown in the chord diagram, as can be seen in Fig. 8(a). However, the chord diagram still shows the other aspects mentioned previously.

  • Isolation limit: The chord diagram does not show nodes that do not have any relation to others.
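This limitation follows directly from Definition 2: the arc length of a node is proportional to the total weight of its outgoing edges, so a node without edges gets an arc of length zero and occupies no space on the circle. The Python sketch below checks this on the example graph of Fig. 3 extended with a hypothetical isolated node d; the function name `arc_lengths` is ours.

```python
import math


def arc_lengths(nodes, weights, r=1.0):
    """phi from Definition 2 for every node in N, including nodes without edges."""
    total = sum(weights.values())
    return {
        n: 2 * math.pi * r * sum(v for (x, _), v in weights.items() if x == n) / total
        for n in nodes
    }


# Example graph of Fig. 3 plus a hypothetical isolated node "d".
W = {("a", "a"): 2, ("a", "b"): 3, ("b", "a"): 2, ("c", "c"): 2, ("c", "a"): 1}
lengths = arc_lengths({"a", "b", "c", "d"}, W)
```

Node d receives an arc of length 0, while the arcs of a, b and c still add up to the full circumference. By the same formula, any node with no outgoing edges also receives a zero-length arc.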

5.4 Case 4: Similar Task

We used the fourth municipality log file [24] to investigate and compare the visualization results of our approach with previous ones. We used the “Mine for a Similar-Task Social Network” plug-in to discover the social network of resources who perform similar tasks. For this case, we filtered the log file based on the “date decision for inspection” event. Figure 9 shows the visualization of the discovered social network model using the chord diagram and the traditional approach.

In this metric, the relation between every two nodes has the same weight in both directions. Thus, all chords are white (see Fig. 9(a)). As can be seen in Fig. 9(b), all nodes are related to each other. The traditional approach does not reveal more information, while the chord diagram enables comparing each chord to the others. For example, the relation between nodes “560821” and “560752” is stronger than that between nodes “560821” and “1550894”.

  • Association strength: The chord diagram enables comparison of the strength of relations between nodes.

Fig. 9. Chord diagram and corresponding graph for the similar task metric

6 Related Work

The identification and analysis of similarities and differences in large amounts of data have been addressed by a tool called Circos [11]. This tool introduced a diagram that was later called the chord diagram. The diagram was used to display large volumes of genomic data, and it then inspired a data visualization paradigm that increases the level of abstraction in visualizing large amounts of data. The diagram has been applied in other areas, like finance, to analyze trade data with monetary values [13]. D3 is a framework that supports the development of such diagrams [2]. This diagram has been widely used in different areas to support the identification and analysis of large amounts of data.

Henneman S. investigates several approaches to the information-rich visualization of dense geographical networks. He describes this diagram as “very impressive due to its high information density” for visualizing dense networks [8]. As another example of the application of this diagram in another area, we can refer to the visualization of mapped security standards for analysis and use optimisation [19]. The authors mention that “A big number of links between nodes are grouped to increase the abstraction level. However more detailed information can be extracted including interactive explanations, highlights, etc”. They also mention that the lack of a standard structure for displaying more information can be considered a disadvantage of this diagram, which we address through the qualitative aspects discussed in this paper. This diagram is also used to analyze multidimensional astronomical datasets to represent the correlations among galaxy properties [5]. There are many other applications of this diagram in different areas, e.g. [4, 17].

In the BPM area, we have found only one application of this diagram so far. Paszkiewicz Z. et al. use this diagram to investigate a hypothesis based on the insights they discovered from log files [15]. This is a very interesting work that shows the usefulness of this diagram, as also noted by reviewers [6]. However, the diagram is not used there to investigate relations based on the semantics of business process models and the metrics introduced to investigate social networks in the BPM area.

It would also be beneficial to study systematically how dense social network data can be visualized using different alternative visualization approaches.

7 Conclusion

This paper introduced and adapted a new technique to visualize social networks in process mining. The new approach is based on a visualization technique called the chord diagram, in which nodes are represented by arcs and chords represent interactions among nodes. The new technique is defined formally, and a plug-in that supports the visualization of social networks in the BPM area has been developed in ProM. The plug-in is used to visualize social networks in real scenarios using both the chord diagram and the traditional graph visualization technique.

The results show that the new artifact can support gaining new insights from social networks, such as resource involvement, association direction, association contributions, special nodes, resource abstraction and association strength. They also reveal that this approach cannot show nodes that are isolated from others.

The approach can be further evaluated in the future, e.g. by investigating the usability of the visualization technique. It can also be evaluated in an organization where a researcher has access both to stakeholders to interview and to process log files that contain resource information.