
1 Introduction

The widespread problem of information silos in organizations has resulted in knowledge workers having to navigate multiple information artefacts to complete their tasks. Such artefacts are often presented in different modalities and may present overlapping, redundant or even conflicting information. Two commonly used artefacts are business process models and business rule repositories. A knowledge worker’s understanding of a task will be based on both the business process model and any related business rules, which may or may not be part of the model [31]. The understanding derived from graphical process models focuses on the temporal or logical relationships between business activities, whereas business rules may be embedded in constraints and policies that control the behavior of the process and its activities [36]. When these two artefacts are not integrated, which is often the case, the risk of incomplete understanding increases, resulting in compromised efficiency and potential compliance breaches.

To overcome such problems, prior studies have advocated the integration of business rules into business process models [e.g. 15, 31, 32], and various forms of integration have been proposed, namely diagrammatic integration, integration through text annotation, and linked rules [3, 31]. Further studies have also outlined when it may not be desirable to represent related business rules within process models [e.g. 9, 32]. Despite these previous works, there is limited knowledge of how knowledge workers make sense of the various representations and what effect these approaches have on the efficacy of accomplishing a task, including the quality of task performance as well as time and effort efficiency.

In this paper, we present the outcomes of an exploratory study undertaken to investigate the behavior of workers in tasks that require dual artefacts, namely business process models and business rules. We designed the study through a sensemaking lens. Sensemaking is defined as “the process of searching for a representation and encoding data in that representation to answer task-specific questions” [25]. Although earlier literature on sensemaking [33] focused primarily on the collective construction of meaning, later studies [14] expanded the role of sensemaking to individual cognitive processes, typically separated into two distinct phases, viz. information foraging and task-specific information processing. In the context of business process and business rule integration, the current body of knowledge does not adequately explain sensemaking behavior as knowledge workers navigate the two artefacts under different forms of integrated representation, namely text, diagrammatic and linked integration. To explore this, we consider the foundational sensemaking constructs of attention (search and encoding) and memory (performance on task-specific questions), and operationalize these constructs through a number of behavioral and performance measures collected with eye-tracking devices in a controlled experiment. Our objective is to unpack the mechanisms by which sensemaking behavior occurs when the form of integrated representation and task complexity change. The results of our analysis reveal distinct behaviors associated with the three representations for integrated business process and rule modeling, and provide insights to inform modeling practice with respect to representation approach and task complexity.

2 Related Work

2.1 Sensemaking

Sensemaking has been an active area of study from different perspectives, including Human-Computer Interaction (HCI) [e.g. 25], Cognitive Systems [e.g. 14], Organizational Communication [e.g. 33] and Library and Information Science [e.g. 4, 17]. These studies have contributed to the understanding of sensemaking behavior in the context of information search, learning of new domains, problem solving, situation awareness, and participation in social exchange [14, 23, 26]. A number of models capture sensemaking as multiple loops. For example, the Representation Construction Model [22] has two major loops. The first is the information foraging loop, which includes the processes of seeking, filtering, reading and extracting information; the second is the sensemaking loop, which involves the iterative development of representational schemas that provide a basis for understanding and performance.

Sensemaking is also classified into collective and individual perspectives. In collective settings, the focus is on the collective construction of meaning, and various studies have analyzed it from organizational [33], strategic [16], entrepreneurial [7] and team-structure [28] perspectives. In individual settings, which are more relevant to our work, the focus is on the cognitive mechanisms [14, 35] that underpin individual sensemaking. The cognitive constructs of attention and memory have a natural and strong affinity with the two phases in sensemaking models. A large body of knowledge on cognitive load theory [2, 19, 27] provides proven mechanisms through which these constructs can be operationalized. For example, attention and search behavior have been measured with eye tracking devices, which capture data on visual attention, scans and fixations [6]; these data can in turn be used for various behavioral measurements such as cognitive load, visual association, visual cognition efficiency and intensity [24]. Similarly, memory is often measured through performance-based measures, such as task completion time, answer correctness, and task complexity [34].

2.2 Business Process and Rule Integration

Our study considers the specific context of business process and business rule modeling, two complementary approaches for modeling business activities for which multiple integration methods [15] have been proposed to improve their individual representational capacity. In summary, the integration methods can be categorized into three approaches with distinct format and construction, namely text annotation, diagrammatic integration and link integration [3], as shown in Fig. 1. Text annotation and link integration both use a textual expression to describe the business rules and connect them with the corresponding section of the process model. With link integration, visual links explicitly connect the corresponding rules with the relevant process section. Diagrammatic integration relies on graphical process model constructs, such as sequence flows and gateways, to represent business rules in the process model.

Fig. 1. Business rules integration approaches [31]

Each of these methods has strengths and weaknesses, as summarized in [3, 31], and thus a potential impact on a knowledge worker’s understanding of a process.

2.3 Process Model Understanding

Prior research has shown that a variety of factors can affect the understanding of a process, including both process model factors and human factors [29]. Cognitive load [27] and visual cognition [5] have been used as measures of process model understanding. Eye activity is one of the physiological variables that can be used to reflect changes in cognition [5, 19]. With eye tracking technology, one can directly collect eye movement data and capture objective metrics, such as pupillary response and fixation durations, that correlate with cognitive function [2], and use indicators such as fixations in each area of interest (AOI) to identify the exact area that draws the participant’s attention. Although there is a long history of eye tracking in medical and psychological studies [13], its use in the business process modeling context is quite recent. To give a few examples, [21] defined the notions of Relevant Region and Scan-path and showed that the Relevant Region is correlated with the answer during question comprehension. In [11], researchers used eye tracking technology to measure and assess user satisfaction in process model understanding. In [20], eye tracking enabled researchers to identify the visual cues of coloring and layout that can improve performance in process model understanding. Recent work has also explored reading patterns in hybrid DCR-HR processes [1], as well as in domain and code understanding tasks [12]. Our work builds on these studies by using eye tracking data to study sensemaking behavior in dual-artefact tasks.

3 Study Design

In this study, we use an experimental research design. In line with sensemaking foundations, we segment the experiment into two phases, namely a searching and encoding phase (which we term the understanding phase) and a task-specific information processing phase (termed the answering phase). The understanding phase commences when the participant first fixates on the experiment screen, and the answering phase commences when the participant first starts to type the answer in the question area (see Fig. 2).
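To make the phase boundary concrete, the following Python sketch splits one question’s eye-tracking log into the two phases. It is a minimal illustration, not the authors’ pipeline: the `Event` record, its field names and the `"fixation"`/`"keystroke"` event types are hypothetical stand-ins for whatever the Tobii export actually contains, and keystrokes are assumed to be typed in the question area.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    t_ms: int                  # timestamp in ms (hypothetical field name)
    kind: str                  # "fixation" or "keystroke" (hypothetical event types)
    aoi: Optional[str] = None  # AOI label for fixations
    duration_ms: int = 0       # fixation duration


def split_phases(events: List[Event]) -> Tuple[List[Event], List[Event]]:
    """Split one question's fixations into (understanding, answering).

    Only fixations are returned, so the understanding phase implicitly starts
    at the first fixation. The answering phase starts at the first keystroke,
    which is assumed to be typed in the question area.
    """
    events = sorted(events, key=lambda e: e.t_ms)
    fixations = [e for e in events if e.kind == "fixation"]
    keystrokes = [e.t_ms for e in events if e.kind == "keystroke"]
    # If the participant never typed, every fixation belongs to understanding.
    boundary = keystrokes[0] if keystrokes else float("inf")
    understanding = [f for f in fixations if f.t_ms < boundary]
    answering = [f for f in fixations if f.t_ms >= boundary]
    return understanding, answering


log = [
    Event(0, "fixation", "question", 300),
    Event(400, "fixation", "6", 600),
    Event(1200, "keystroke"),
    Event(1300, "fixation", "6", 550),
]
understanding, answering = split_phases(log)
print(len(understanding), len(answering))  # 2 1
```

Because the boundary is a single timestamp, any fixation after the first keystroke counts as answering, even if the participant later returns to re-read the model; this matches the phase definition above rather than trying to detect renewed searching.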

Fig. 2. Visual experiment design

The participants in our study are students at an Australian university. To be eligible to participate voluntarily, they were required to have foundational knowledge of conceptual modeling (such as flowcharts, UML or ER), but were not required to have any substantial knowledge of business process or rule modeling. Eligible participants were offered a $30 voucher for taking part in this research. In total, 75 students participated in this experiment, divided into three treatment groups (25 participants per group), with each experiment conducted one participant at a time. This is consistent with similar experiments [10, 18], in which a sample size of 20 to 30 participants per group is considered adequate.

The experiment data consists of a pre-experiment questionnaire, eye tracking log data and task performance data. The eye tracking data was collected with a Tobii Pro TX300 eye tracker (Footnote 1), which captures timestamped data on fixations, gaze, saccades, etc. To capture sensemaking behavior, we used measurements of fixation durations and frequencies to study the searching and encoding behavior in the understanding phase. To study task-specific information processing behavior in the answering phase, we included task performance data in the analysis and used measurements of AOI-specific fixations, as well as transitions between AOIs.

3.1 Instruments

The experiment instruments included a tutorial, the treatments and a questionnaire. Each group of participants was first given a BPMN tutorial and was then shown a model using one of the three rule integration approaches. Our business process modeling language of choice was BPMN 2.0, due to its wide adoption and its standing as an international standard. We encouraged each participant to ask questions during the tutorial session to ensure their readiness for the experiment.

To ensure group balance, we used a pre-experiment questionnaire to capture participants’ prior knowledge and basic demographics, which we used to distribute participants across groups and avoid accidental homogeneity. The data of three participants whose eye movements were not properly recorded by the Tobii eye tracker were discarded. The initial participant data we collected comprised BPMN familiarity (1–3, from least to most familiar), study major (coded 1 for Engineering and Science related majors, 0 for Business and Humanities related majors), language (coded 1 if English is the first language, 0 otherwise) and gender (coded 1 for female, 0 for male). Our results based on the Kruskal-Wallis test (Footnote 2) indicate that there were no significant differences between the three groups in any of the tested aspects, namely identified gaze (p = 0.694), tutorial time (p = 0.375), BPMN familiarity (p = 0.929) and study major (p = 0.933).
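For readers unfamiliar with the Kruskal-Wallis test used for these balance checks (and throughout Sect. 4), the following standard-library Python sketch shows how the tie-corrected H statistic is computed from pooled ranks. It is an illustration only, not the authors’ analysis code. It exploits the fact that with exactly three groups the reference chi-square distribution has 2 degrees of freedom, whose survival function is exactly exp(−H/2); with a different number of groups a statistics library (e.g. `scipy.stats.kruskal`) should be used instead.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_3(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and p-value for exactly three groups."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value below assumes 3 groups (df = 2)")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 * total / (n * (n + 1)) - 3 * (n + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:  # all observations identical: no evidence of difference
        return 0.0, 1.0
    # Clamp tiny negative values caused by floating-point rounding.
    h = max(h / correction, 0.0)
    # With df = 2 the chi-square survival function is exactly exp(-H / 2).
    return h, math.exp(-h / 2)


h, p = kruskal_wallis_3([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3), round(p, 4))  # 7.2 0.0273
```

The group values in the usage line are toy numbers chosen to make the ranks easy to follow; they are not data from the study.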

In the treatment, we used the three integration approaches (one per treatment group). The scenarios of the model and rules originated from a travel booking diagram included in OMG’s BPMN 2.0 documentation (Footnote 3). Through multiple revisions, we ensured that the models created for all three integration approaches were informationally equivalent. Due to space limitations, the models cannot be included in the paper, but the complete experiment instruments are available for download (Footnote 4). We held potential confounding factors constant, including the eye-tracking lab equipment and the tutorial content. We set no limit on the experiment duration and no word limit on participants’ answers. The model was adjusted to ensure a consistent format across the integration approaches, while providing some diversity in terms of constructs and coverage, as summarized in Table 1. The table indicates the types of constructs a participant has to review to answer each question and the span of the question: a participant may have to navigate only a specific section of the process model (local) or the whole process (global). This diversity allowed us to gain further insights into the relationship between integration approaches and task complexity (reflected by the coverage of the model required to answer a particular question).

Table 1. Comparison of questions

3.2 Setting

The experiment was conducted in full screen mode, and complete models were displayed without zoom or scroll functions. The visibility of the experiment text and diagrams was examined carefully, with all text and diagrams being clear from a distance of 1.2 m. All experiments were conducted in the same lab with the same eye tracker.

We used multiple Areas of Interest (AOIs) to capture eye movements (these were used for analysis and were invisible to participants). As shown in Fig. 2, for models featuring text annotation and diagrammatic integration, the screen was divided into 8 areas: seven process model areas and a question area (which showed one question at a time). For models featuring link integration, there was an additional ninth area for rules, which displayed the corresponding business rules when participants clicked on each “R” icon in the model. The answer to each question relates to different process areas. For the local questions Q1 and Q2, the answer relates to area 6 and area 2, respectively. For Q3 (the global question), the answer relates to areas 1, 5 and 7.

4 Results

4.1 Scanning and Attention

We note that, overall, the differences in the fixation and visit durations of participants between the three groups are not significant (p = 0.946 and p = 0.884 respectively, based on Kruskal-Wallis tests). However, using mean fixation duration as a measure, a question-wise analysis indicates some fluctuation in attention between the three groups, as shown in Fig. 3.

Fig. 3. Mean fixation duration of each question for all participants

The mean fixation durations for Q1 are the highest among the three questions for all groups. Mean fixation durations then drop for Q2 and rise again for Q3, with a smaller increase for the link group than for the text and diagrammatic groups. While mean fixation durations offer limited insight, heat maps can effectively reveal the focus of visual attention for multiple participants, especially for specific AOIs. Such maps show how participants’ gaze is distributed over the stimulus, although they cannot show the sequence of gaze. To provide a snapshot in limited space, Fig. 4 shows the respective heat maps for the understanding phase (phase 1) for participants who answered all questions correctly, i.e., the best performers. The heat maps are generated from absolute fixation durations, with a radius of 50 px and a scale capped at 0.5 s (corresponding to deep red), in line with the threshold for deep processing [8]. Fig. 4 also shows the mean fixation duration and the percentage of fixation count for the area relevant to the question, the other areas (i.e., AOIs not relevant to the question), and the question area. For the link representation, the measurements for the rule area are also provided.
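The heat map parameters above (50 px radius, scale capped at 0.5 s) can be illustrated with a small accumulation sketch. This is an assumption-laden illustration, not how the eye tracking software renders its maps: the linear falloff kernel, the `(x, y, duration_ms)` tuple format and the function name `heat_map` are all hypothetical choices made for clarity. What it does show is why a fixed ceiling matters: values are divided by 500 ms rather than by the observed maximum, so maps from different groups share one color scale.

```python
import math
from typing import List, Sequence, Tuple


def heat_map(
    fixations: Sequence[Tuple[int, int, float]],
    width: int,
    height: int,
    radius: int = 50,
    max_ms: float = 500.0,
) -> List[List[float]]:
    """Accumulate absolute fixation durations into a grid of values in [0, 1].

    fixations: (x, y, duration_ms) per fixation (hypothetical format).
    Each fixation spreads its duration over a disc of the given radius with a
    linear falloff (an assumed kernel); 1.0 corresponds to deep red.
    """
    grid = [[0.0] * width for _ in range(height)]
    for fx, fy, dur in fixations:
        for y in range(max(0, fy - radius), min(height, fy + radius + 1)):
            for x in range(max(0, fx - radius), min(width, fx + radius + 1)):
                d = math.hypot(x - fx, y - fy)
                if d <= radius:
                    grid[y][x] += dur * (1 - d / radius)
    # Normalize against the fixed 0.5 s ceiling rather than the observed max.
    return [[min(v / max_ms, 1.0) for v in row] for row in grid]


grid = heat_map([(10, 10, 250)], width=30, height=30, radius=5)
print(grid[10][10])  # 0.5
```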

Fig. 4. Heat maps and AOI measures in phase 1 for best performers. A larger, clearer version can be downloaded from https://www.dropbox.com/sh/zfw5uq0jyja8tt6/AADx2fm8Y9SSqAkGwTDKD7ITa?dl=0

To uncover significant differences in scanning and attention behavior between the three representations, we conducted a series of statistical tests, specifically contrasting the differences as task complexity changes, where Q1 and Q2 are local questions and Q3 is a more complex global question. We conducted the tests for the best performers (also shown in the heat maps in Fig. 4) as well as for all performers. For numerical data (mean fixation duration and proportion of transition frequency), we used the Shapiro-Wilk test (Footnote 5) to check whether the data were normally distributed. For data that were not normally distributed, we used the Kruskal-Wallis test; if the result was significant, we used Dunn’s test (Footnote 6) for post hoc pairwise comparisons to rank the groups. For normally distributed data, we used Levene’s test (Footnote 7) to check the assumption of equal variance. If this condition was met, we used one-way analysis of variance (ANOVA) to test the difference of means, and if the ANOVA result was significant, we used Tukey’s HSD test (Footnote 8) to compare each pair of groups. For ordinal data (number of correct answers per participant), we used the Kruskal-Wallis test, followed by the same post hoc test if the result was significant. We used 0.05 as the significance threshold for all tests.
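The test-selection procedure just described is essentially a small decision tree, sketched below in Python. The sketch only encodes the branching: it assumes that the normality and equal-variance outcomes (from Shapiro-Wilk and Levene’s test) have already been computed elsewhere, for example with a statistics package. The names `choose_tests` and `needs_post_hoc` are hypothetical. The text does not state which test was used when Levene’s test fails, so that branch is deliberately left unspecified rather than guessed.

```python
from typing import List

ALPHA = 0.05  # significance threshold used for all tests


def choose_tests(kind: str, normal: bool = False, equal_var: bool = False) -> List[str]:
    """Return [omnibus test, post hoc test] for one comparison across groups.

    kind: "numerical" or "ordinal".
    normal: outcome of the Shapiro-Wilk check (numerical data only).
    equal_var: outcome of Levene's test (normally distributed data only).
    """
    if kind == "ordinal":
        return ["Kruskal-Wallis", "Dunn"]
    if kind != "numerical":
        raise ValueError(f"unknown data kind: {kind}")
    if not normal:
        return ["Kruskal-Wallis", "Dunn"]
    if equal_var:
        return ["one-way ANOVA", "Tukey HSD"]
    # The procedure in the text does not cover unequal variances.
    return ["unspecified", "unspecified"]


def needs_post_hoc(omnibus_p: float) -> bool:
    """Pairwise comparisons are only run after a significant omnibus test."""
    return omnibus_p < ALPHA


print(choose_tests("numerical", normal=False))  # ['Kruskal-Wallis', 'Dunn']
print(needs_post_hoc(0.014))                    # True
```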

For best performers, there is no significant difference in the mean fixation duration in the model area across groups for the local questions (p = 0.195 and p = 0.109 for Q1 and Q2 respectively; Kruskal-Wallis test). The model area includes all AOIs except the question area (for the link representation it also includes the rule area). For the global question Q3, the results indicate that best performers in the link group have a lower mean fixation duration on the model area than those in the text and diagrammatic groups (p = 0.014; Kruskal-Wallis test). Further, post hoc pairwise comparisons using Dunn’s test show that the link group has a significantly lower mean fixation duration than the text group (p = 0.036) as well as the diagrammatic group (p = 0.009), whereas the text and diagrammatic groups do not differ significantly (p = 0.434). In other words, participants in the link group require less effort to interpret the model, even when question complexity increases.
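The model-area measure used above can be stated precisely in a few lines of Python. In this sketch, a participant’s mean fixation duration on the model area is taken to be the average over all fixations outside the question AOI, with the rule AOI of the link treatment counted as part of the model area, as defined in the text. Averaging over individual fixations (rather than over AOIs) and the AOI label `"question"` are assumptions made for illustration.

```python
from statistics import mean
from typing import Iterable, Tuple

QUESTION_AOI = "question"  # hypothetical AOI label for the question area


def model_area_mean_fixation(fixations: Iterable[Tuple[str, float]]) -> float:
    """Mean duration (ms) of fixations outside the question AOI.

    fixations: (aoi_label, duration_ms) pairs for one participant and question.
    The rule AOI of the link treatment counts as part of the model area.
    """
    durations = [d for aoi, d in fixations if aoi != QUESTION_AOI]
    return mean(durations) if durations else float("nan")


print(model_area_mean_fixation([("question", 900), ("6", 300), ("rule", 500)]))  # 400
```

One value per participant is then fed into the group comparison (e.g. the Kruskal-Wallis test), so long fixations on the question text do not inflate the model-area figure.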

For all participants, not just the best performers, there is a significant difference in the mean fixation durations in the model area across groups for Q1 and Q3 in phase 1. On local question Q1, the link group has the lowest mean fixation duration in the model area (p < 0.001; Kruskal-Wallis test). Post hoc pairwise comparisons using Dunn’s test show that the link group has a significantly lower mean fixation duration than text annotation (p < 0.001) and diagrammatic integration (p < 0.001), whereas text annotation and diagrammatic integration do not differ significantly (p = 0.436). However, no significant difference was found for local question Q2 (p = 0.890). On global question Q3, there is a significant difference in mean fixation duration in the model area across groups (p = 0.010). Post hoc pairwise comparisons using Dunn’s test show that the link group has a significantly lower mean fixation duration than the diagrammatic group (p = 0.003), but no significant difference was found between the link and text groups (p = 0.051), or between the text and diagrammatic groups (p = 0.306).
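The pairwise comparisons reported above come from Dunn’s test, which compares mean ranks from the pooled Kruskal-Wallis ranking. The standard-library Python sketch below shows the standard form of the statistic, with the tie-corrected rank variance and two-sided normal p-values computed via `math.erfc`. It returns unadjusted p-values because the text does not say whether a multiplicity correction was applied; if one was (e.g. Bonferroni), the p-values would be scaled accordingly. Group names and values in the usage line are toy data, not study results.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def dunn_test(groups: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Return {(a, b): (z, unadjusted two-sided p)} for every pair of groups."""
    names = list(groups)
    pooled = [x for name in names for x in groups[name]]
    n = len(pooled)
    ranks = average_ranks(pooled)
    mean_rank, start = {}, 0
    for name in names:
        k = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + k]) / k
        start += k
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    variance = n * (n + 1) / 12 - ties / (12 * (n - 1))  # tie-corrected
    result = {}
    for a, b in combinations(names, 2):
        se = math.sqrt(variance * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        result[(a, b)] = (z, math.erfc(abs(z) / math.sqrt(2)))
    return result


res = dunn_test({"link": [1, 2, 3], "text": [4, 5, 6], "diagram": [7, 8, 9]})
for pair, (z, p) in res.items():
    print(pair, round(z, 3), round(p, 4))
```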

From the above results, we note that the link representation requires less attention, as measured through mean fixation duration, indicating favorable performance from a scanning and attention perspective. For all participants, this is observed in the initial question (Q1) and again as task complexity increases in the global question (Q3). For best performers, the lower level of attention required is again observed as task complexity increases, as reflected in the global question (Q3).

In addition to fixation behavior, the gaze paths of participants also provide insights into scanning and attention behavior, in particular into how movement across AOIs occurs in the different groups. However, aggregated gaze plots are hard to compare across groups. We therefore use process diagrams created with a process mining tool (Footnote 9) to expose sequences of fixations and saccades. Although the phase 1 diagrams are not included in the paper due to space limitations (phase 2 diagrams are shown in the next section), we summarize some observations here. First, the transitions in Q1 form large loops across the other, relevant and question areas, indicating that even the best performers need to reinspect areas they have already scanned as they develop an understanding of the model. In comparison, in Q2 the proportion of transition frequency was largest between the question and relevant areas for all groups, possibly indicating an improvement in attention and hence a reduction in mental effort, although this difference was not statistically significant. In Q3, we observe an increase in transition loops overall. In particular, the transition loops are more diverse in the text and link groups than in the diagrammatic group, which has the highest transition frequency between the relevant and question areas. This might imply that the separation in the text and link approaches (through annotations and the rule area, respectively) affords some reduction in mental effort compared with the diagrammatic integration approach. Despite these observations, no statistically significant difference was observed in transition frequencies between the groups.

4.2 Task Specific Information Processing

The question answering phase in our study commences when the participant starts to type in the question area. This phase represents task-specific information processing behavior, i.e. the sensemaking loop. To distinguish behavior between various levels of task performance (i.e. the correctness of the answers provided), we categorize answers based on the completeness of activities and rules (no missing content) and minimality (no redundancy). Figure 5 (a) shows the number of correct answers. Overall, our results indicate no significant difference between the three groups in terms of understanding accuracy (p = 0.579; Kruskal-Wallis test).
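The two answer criteria, completeness and minimality, can be read as set comparisons between the activities and rules an answer mentions and those the question requires. The Python sketch below makes that reading explicit. It assumes the mentioned items have already been coded from the free-text answer (a manual step in practice), and the category labels other than "correct" are hypothetical, since the text only distinguishes correct answers.

```python
from typing import Set


def classify_answer(mentioned: Set[str], required: Set[str]) -> str:
    """Classify a coded answer by completeness and minimality.

    mentioned: activities and rules identified in the participant's answer.
    required: activities and rules needed for a complete answer.
    """
    missing = required - mentioned    # completeness: nothing left out
    redundant = mentioned - required  # minimality: nothing extra
    if not missing and not redundant:
        return "correct"
    if missing and redundant:
        return "incomplete and redundant"
    return "incomplete" if missing else "redundant"


print(classify_answer({"Book flight", "R1"}, {"Book flight", "R1"}))  # correct
```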

Fig. 5. Task performance

However, as shown in Fig. 5 (b), we observe an increase in the percentage of questions answered correctly for the text and link treatment groups, while understanding accuracy in the diagrammatic treatment group remains relatively stable. While task performance results provide an important perspective, we further investigated the answering phase (with respect to both fixations and transitions) to reveal the sensemaking behavior that resulted in the respective task performance. We illustrate our results with process diagrams (Fig. 6), in which all the other areas are aggregated for the purpose of illustration. The transition values indicate the proportion of transition frequency and the activity values indicate the proportion of visit frequency. For the global question Q3, the relevant areas are areas 1, 5 and 7, hence we aggregated the proportion of transition frequency over all relevant areas for Q3.
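The aggregation behind the process diagrams can be sketched as follows. Each fixation’s AOI is mapped to a category (relevant, other, question or rule), using the relevant areas per question from Sect. 3.2, so that areas 1, 5 and 7 collapse into a single relevant node for Q3. Consecutive fixations in the same category are merged into one visit, and each directed transition is expressed as a share of all transitions. Dropping self-transitions within a category, counting transitions as directed, and the AOI labels `"question"` and `"rule"` are assumptions of this sketch; the process mining tool may count differently.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, List, Tuple

# Relevant model AOIs per question, as listed in Sect. 3.2.
RELEVANT = {"Q1": {"6"}, "Q2": {"2"}, "Q3": {"1", "5", "7"}}


def category(aoi: str, question: str) -> str:
    if aoi in ("question", "rule"):  # hypothetical labels for non-model AOIs
        return aoi
    return "relevant" if aoi in RELEVANT[question] else "other"


def transition_proportions(aoi_sequence: List[str], question: str) -> Dict[Tuple[str, str], float]:
    """Directed transition shares between AOI categories for one fixation sequence."""
    # Merge consecutive fixations in the same category into a single visit.
    visits = [c for c, _ in groupby(category(a, question) for a in aoi_sequence)]
    pairs = Counter(zip(visits, visits[1:]))
    total = sum(pairs.values())
    return {edge: count / total for edge, count in pairs.items()} if total else {}


seq = ["question", "1", "5", "question", "3", "7"]
print(transition_proportions(seq, "Q3"))
```

For Q3 the move from area 1 to area 5 disappears because both areas fall into the aggregated relevant category, which is exactly the effect the aggregation is meant to have.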

Fig. 6. Sequence of fixations in answering phase for best performers. A larger, clearer version can be downloaded from https://www.dropbox.com/sh/zfw5uq0jyja8tt6/AADx2fm8Y9SSqAkGwTDKD7ITa?dl=0

Our results indicate that on global question Q3 (phase 2), the link group has a lower mean fixation duration on the model area than the text and diagrammatic groups (p = 0.016; Kruskal-Wallis test). The link group has a lower mean fixation duration than text annotation (p = 0.005; post hoc pairwise comparisons using Dunn’s test), but neither link versus diagrammatic integration (p = 0.118) nor text annotation versus diagrammatic integration (p = 0.440) differs significantly. Hence, the link representation requires the least attention, indicating favorable performance from a task-specific information processing perspective as task complexity increases.

For all groups, we observe fewer transitions (proportion of transition frequency count) between the relevant and other areas in the answering phase than in the understanding phase (Footnote 10). Similarly, transitions between the question and other areas are reduced for all groups (Footnote 11). It is important to note the presence of an additional rule area in the link group. All questions in the link group show reduced transitions between the rule and other areas (Footnote 12) and between the rule and relevant areas (Footnote 13). We further note that the link group showed the best accuracy in Q3 (Fig. 5 (b)). We would expect such reductions in transition frequency to occur if efficiency gains were being made. Despite these trends, we did not find statistically significant differences across the groups.

Additionally, we note that the best performers engage in deep processing (long fixations above 500 ms) in both phases: for each question, the mean fixation durations on the relevant, other and question areas (and, for the link group, the rule area) are above 500 ms in both phases. Prior research differentiates between mere scanning of information (<500 ms), which indicates a superficial level of processing, and deeper processing (>500 ms), which is connected to purposeful consideration of information [8]. Our results show that even after the understanding phase is complete, participants still engage in deep processing of information in the answering phase.
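The 500 ms criterion can be applied per area as in the following Python sketch, which counts long fixations and checks whether an area’s mean fixation duration exceeds the deep-processing threshold from [8]. The `(area, duration_ms)` input format and the function name are illustrative assumptions; area labels follow the relevant/other/question/rule grouping used in this section.

```python
from statistics import mean
from typing import Dict, Iterable, List, Tuple

DEEP_MS = 500  # scanning (<500 ms) vs. deep processing (>500 ms), after [8]


def deep_processing_summary(fixations: Iterable[Tuple[str, float]]) -> Dict[str, dict]:
    """Per-area count of long fixations, mean duration and deep-processing flag.

    fixations: (area, duration_ms) pairs for one phase of one question.
    """
    by_area: Dict[str, List[float]] = {}
    for area, duration in fixations:
        by_area.setdefault(area, []).append(duration)
    return {
        area: {
            "long_fixations": sum(1 for d in ds if d > DEEP_MS),
            "mean_ms": mean(ds),
            "deep": mean(ds) > DEEP_MS,
        }
        for area, ds in by_area.items()
    }


summary = deep_processing_summary(
    [("relevant", 600), ("relevant", 700), ("question", 300), ("question", 800), ("other", 200)]
)
print(summary["question"])  # {'long_fixations': 1, 'mean_ms': 550, 'deep': True}
```

Note that an area can pass the mean-based check with only some long fixations, as the question area does here; the count of long fixations is therefore reported alongside the mean.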

5 Conclusions and Outlook

In this paper, we investigated how user behavior changes in dual-artefact tasks when the form of integrated representation of the artefacts (namely business process models and business rules) and task complexity change. Using a sensemaking lens, we were able to delineate behavior between developing model understanding and accomplishing the task. Our results show that the link representation yields better task performance in terms of both accuracy and efficiency, especially as task complexity increases. Additionally, our results provide some evidence that diagrammatic integration has better task performance on local questions in terms of accuracy, but also requires the most effort in the initial information foraging (understanding) phase. As task complexity increases, the diagrammatic representation arguably requires the most effort, as indicated by the highest transition frequency between the question and relevant areas. These results have implications for integrated business process and rule modeling frameworks, and may also provide guidance for user training and work allocation decisions. In addition, our study makes a methodological contribution by offering an approach to visualize the different behaviors inherent in the two phases of sensemaking.

Our study is not without limitations. We only considered basic constructs in business process models, whereas advanced loop and nesting structures may introduce further complexities in sensemaking. Limitations of the eye tracking software constrain the granularity of the AOIs, which causes some imprecision in AOI-level metrics. Complementary approaches such as cued retrospective ‘thinking-out-loud’ [30] could also help to further explain sensemaking behavior. In this paper, we have mostly analyzed and presented the results of performers who answered the questions correctly. Analyzing the behavior of other participants, as well as changes in behavior over longer tasks with greater variability in task complexity, will help reveal further insights into sensemaking, and may be especially valuable for training and work allocation purposes.