
1 Introduction

In order to stay competitive, organizations need to document and manage their business processes. Approaches related to process descriptions and Business Process Model and Notation (BPMN) can help them to achieve these goals. The combination of distinct approaches can improve understanding of the process as they provide different perspectives on it [21, 23]. Several proposals have been presented with the goal of contributing to the relationship between texts and process models in different scenarios, such as: automatic generation of process descriptions from process models [17, 18, 20], automatic generation of process models from process descriptions [8, 14, 15, 25], process mining from natural language text [5, 6, 10, 19], identification of business process elements in natural language texts [12], integration between texts and process models [16] and verification of conformity between texts and process models [1, 3, 24].

Fig. 1. Process-oriented text generation.

A number of the proposed approaches in this context use sentence templates to generate or transform process descriptions. However, the corresponding works do not explain how the sentence templates that compose the process description were selected. This information is important, as it directly affects the quality of the text. Sentence templates that are not carefully selected may produce ambiguous sentences that may not be understood by the stakeholders of the business processes, such as process analysts and domain experts.

Furthermore, this work aims to support the development of an approach that generates process-oriented texts from natural language texts. In the context of this approach, a process-oriented text is defined as a text that is both structured and capable of preserving as much information about the business process as possible. In addition, the approach is expected to verify that the business process described in the text conforms to the BPMN specification. The approach consists of five stages: input data, text reader, BPMN verification, text writer and output text. First, a natural language text is given as input (i.e., input data stage). Then, the approach reads the natural language text and produces an intermediate structure (i.e., text reader stage). Next, the intermediate structure is used to verify the process described by the text (i.e., BPMN verification stage) and to generate the structured text (i.e., text writer stage). Finally, the verification and the structured text are combined to generate the process-oriented text (i.e., output text stage). Figure 1 shows the approach with its respective stages. This paper seeks to contribute the sentence templates repository used in stage “4. Text writer”.

In this context, this work proposes an analysis to identify sentence templates that are common in process descriptions and that cause fewer ambiguity problems, so that the findings can be useful in writing new process descriptions. To this end, an empirical analysis of 64 texts describing business processes was performed. These texts were taken from the book Fundamentals of Business Process Management [9] and from Friedrich [13]. Moreover, an investigation of ambiguity issues in process descriptions was carried out based on the literature and on the sentence templates identified. This paper presents the results of these analyses.

A total of 101 sentence templates were found, divided into 29 categories based on three criteria, namely: source, target and relationship. Six types of ambiguity were identified and, when compared with the sentence templates found, enabled us to identify 13 templates with ambiguity issues.

The remainder of this paper is organized as follows: Sect. 2 presents BPMN as the background of this research. Section 3 presents the method used to identify and classify sentence templates as well as the identified ambiguity issues. Section 4 reports the results of the analysis. Section 5 discusses the applications of this work and the results obtained. Section 6 presents the related work. Finally, Sect. 7 presents the conclusions.

2 Background

BPMN is a standard for process modeling maintained by the Object Management Group (OMG) [22]. BPMN includes five element categories: flow objects (activities, events and gateways), data (data objects, data inputs, data outputs and data stores), connecting objects (sequence flows, message flows, associations and data associations), swimlanes (pools and lanes) and artifacts (groups and text annotations). In this work, the focus is on flow objects, swimlanes and connecting objects because they are used for the identification and classification of sentence templates.

Regarding flow objects, an activity can be defined as a task that a company performs in a process. An activity can be atomic or non-atomic and is represented as a rounded box. Events are represented as circles and indicate where a particular process starts (start event) or ends (end event). Moreover, there are events that can occur between a start event and an end event (intermediate events), which can affect the flow of the process but cannot start or end it. Finally, gateways are represented as diamond shapes and are responsible for controlling the divergence (split) and convergence (join) of sequence flows in a process. There are six different types of gateways, which differ in both the logic that they execute and the marker placed within the gateway diamond. Among them, the following can be highlighted: the exclusive gateway (XOR, represented with or without an “X” marker), where the decision leads to the execution of exactly one path; the parallel gateway (AND, represented with a “+” marker), where all possible paths must be executed; and the inclusive gateway (OR, represented with an “O” marker), where the decision leads to the execution of at least one path.
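The split logic of the three highlighted gateways can be illustrated with a minimal sketch. The names `Gateway` and `is_valid_split` are hypothetical and not part of BPMN or of this work; the function only checks how many outgoing paths may be taken.

```python
from enum import Enum


class Gateway(Enum):
    XOR = "exclusive"  # exactly one outgoing path is taken
    AND = "parallel"   # every outgoing path is taken
    OR = "inclusive"   # at least one outgoing path is taken


def is_valid_split(gateway: Gateway, outgoing: int, taken: int) -> bool:
    """Check whether taking `taken` of `outgoing` paths respects the gateway logic."""
    if outgoing < 1 or not 0 <= taken <= outgoing:
        return False
    if gateway is Gateway.XOR:
        return taken == 1
    if gateway is Gateway.AND:
        return taken == outgoing
    return taken >= 1  # Gateway.OR


# A split with three outgoing paths.
print(is_valid_split(Gateway.XOR, outgoing=3, taken=1))  # True
print(is_valid_split(Gateway.XOR, outgoing=3, taken=2))  # False
print(is_valid_split(Gateway.OR, outgoing=3, taken=2))   # True
```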

In relation to swimlanes, a pool represents a participant in a business process. A pool is graphically represented as a container that partitions a process from other participants. If a pool does not contain a process, it is considered a black box. Lanes, on the other hand, are the partitions used to organize and categorize activities within a pool. Lanes are often used for representing internal roles (e.g., Manager, Associate), systems (e.g., an enterprise application), or internal departments (e.g., shipping, finance).

Regarding connecting objects, sequence flows are used to show the order of flow objects in a process. A sequence flow is represented as a solid single line with a solid arrowhead. A message flow represents the flow of messages between two different participants and is represented as a dashed single line with an open circle line start and an open arrowhead line end. In addition, an association is used to link information and artifacts with flow objects. Data associations, on the other hand, are used to relate data objects and activities. Both association and data association are represented as a dotted single line.

Fig. 2. Example of BPMN model: computer repair.

Figure 2 presents an example of a BPMN process model composed of one start event, five activities, one exclusive decision gateway (XOR-split), one exclusive merge gateway (XOR-join) and one end event. After the process starts, an activity is executed (called “Make evaluation”). Then, a decision is made in which only one of three possible paths can be followed. After one path is followed, the process returns to the main path, another activity is performed and the process ends. A possible description of the process shown in Fig. 2 can be seen in Fig. 3. The relationships between the text and the model are indicated by \(s_x\), where x refers to the sentence number in the text.

Fig. 3. Example of process description: computer repair.

3 Sentence Templates and Ambiguity Issues

This section presents the method used to identify the sentence templates.

The process descriptions used in the analysis came from two different sources. Only process descriptions in English were considered, as templates are very sensitive to the language. First, 47 process descriptions from Friedrich [13] were identified. From this first source, 17 process descriptions were disregarded for the following reasons: they were translated from another language through machine translation services (14 texts), were duplicated (2 texts) or had a description format based on enumeration (1 text), leaving 30 process descriptions. Second, 34 process descriptions from the book Fundamentals of Business Process Management [9] were identified. The final set of 64 process descriptions, as well as their respective sources and types, is presented in Table 1.

Table 1. Data sources.

The following subsections present the procedures followed to: (Sect. 3.1) prepare the sentences, (Sect. 3.2) identify and classify the sentence templates and (Sect. 3.3) address the ambiguity issues.

3.1 Preparation of Sentences

In this first stage, the sentences are prepared for the identification and classification of sentence templates. For this, the sentences of a process description are modified to become more generic.

A business process description may contain snippets of text that are directly related to the process context. As an example, the sentences “The manager must sell the product” and “The salesman must sell the product” are identical, except for who carries out the activity of selling the product. This difference can hinder the identification and classification of sentence templates, since these sentences could be considered different sentence templates. A term capable of representing both “manager” and “salesman” can therefore be used to make these two sentences equal and, consequently, to define both as the same sentence template. To this end, the process descriptions were analyzed beforehand and four placeholders were created with the goal of replacing the context-related snippets in the sentences with more generic information. The placeholders created are: role, condition, number and object.

The placeholder role is associated with the role responsible for performing a particular action. In relation to the business process model, a role could be considered a participant. As an example, in the sentence “The process finishes after the technician completes the repair form.”, since the technician is the one performing the action, the word “technician” can be replaced by the placeholder role. Therefore, the sentence after the modification would be written as “The process finishes after the role completes the repair form”.

The placeholder condition defines a condition that needs to be satisfied for a given flow to occur. Normally, the condition appears in a business process model as a label on an outgoing sequence flow of an exclusive or inclusive gateway. For instance, in the sentence “In case it is a software problem, the technician must format the computer.”, the activity of formatting the computer is performed only if the condition “it is a software problem” holds. Therefore, this condition is replaced in the text by the placeholder condition. Moreover, the technician can also be replaced in this sentence by the placeholder role.

The placeholder number is used to represent a certain number of process elements or paths in a process model. As an example, in the sentence “After all five activities are completed, the process ends.”, the amount “five” can be replaced by the placeholder number.

Finally, the placeholder object can represent the business object to which the sentence refers. For instance, in the sentence “The car can be sold by the manager or the seller”, the business object “car” can be replaced by the placeholder object. In addition, the placeholders role 1 and role 2 could be created to represent the manager and seller respectively.

After the preparation stage, the modified sentences containing placeholders will be used for the identification and classification of sentence templates.
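The preparation step can be sketched as a simple substitution. This is only an illustration: in this work the replacement was done during a manual analysis, and the word lists `ROLES` and `NUMBERS` as well as the function `generalize` are hypothetical. The placeholders condition and object are left out because recognizing them requires understanding the sentence rather than matching fixed words.

```python
import re

# Hypothetical vocabularies used only for this illustration.
ROLES = ["manager", "salesman", "technician", "seller"]
NUMBERS = ["two", "three", "four", "five"]


def generalize(sentence: str) -> str:
    """Replace context-specific words by the placeholders 'role' and 'number'."""
    for word in ROLES:
        sentence = re.sub(rf"\b{re.escape(word)}\b", "role", sentence, flags=re.IGNORECASE)
    for word in NUMBERS:
        sentence = re.sub(rf"\b{re.escape(word)}\b", "number", sentence, flags=re.IGNORECASE)
    return sentence


a = generalize("The manager must sell the product")
b = generalize("The salesman must sell the product")
print(a)       # The role must sell the product
print(a == b)  # True
```

After generalization, both example sentences collapse to the same string, so they can be counted as occurrences of a single sentence template.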

3.2 Identification and Classification of Sentence Templates

In the context of this work, a sentence template is each pattern present in a sentence that is able to describe one or more process elements. These process elements appear in the template as placeholders to be replaced. For the scope of this paper, a reduced set of elements is taken into account to find sentence templates, namely: activity (\(A_{c}\)), AND-split (\(G_{+s}\)), AND-join (\(G_{+j}\)), XOR-split (\(G_{Xs}\)), XOR-join (\(G_{Xj}\)), OR-split (\(G_{Os}\)), OR-join (\(G_{Oj}\)), start event (\(E_{s}\)), intermediate event (\(E_{i}\)) and end event (\(E_{e}\)). In addition, “empty” is used to define paths without elements (e.g., Fig. 2, \(s_6\)).

In order to identify a sentence template, it is necessary to identify beforehand the process elements in the text. Although there are works that contribute to the automated identification of process elements in texts, to the best of our knowledge, there is no approach capable of extracting the process elements in a textual description with complete precision [10, 12, 14]. In addition, automated identification approaches can draw incorrect conclusions about a process by making assumptions about texts that allow for multiple interpretations [2]. Therefore, an automated analysis of sentence templates could be compromised by the selected approach of extracting process elements, so the identification of the sentence templates was carried out manually.

The identification and classification of sentence templates were carried out in parallel. In the context of this paper, each sentence template is considered to be composed of the following elements:

  • Target: the set of process elements described by the sentence template. A target must appear in the sentence, even if implicitly (i.e., without a placeholder to fill with the process element).

  • Relationship: how the process elements in the sentence are associated with each other: none (\(R_N\)), composed of zero or one process element; sequential (\(R_S\)), one element follows the other; exclusive (\(R_{X}\)); inclusive (\(R_{O}\)); and parallel (\(R_{+}\)).

  • Source: the process element that occurs immediately before the analyzed sentence. As in the BPMN specification [22], the source can be understood as the element prior to the currently described element connected by a sequence flow. A source may or may not be evidenced in the sentence.

As an example, for the process description presented in Fig. 3 (corresponding to the BPMN model depicted in Fig. 2), five sentence templates were identified, two of which have a target with the sequential relationship (\(s_4\), \(s_7\)) and three of which have a target with the none relationship (\(s_2\), \(s_3\), \(s_6\)). In the sentence \(s_2\), it is possible to define the sentence template “Once \(E_s\), the role must \(A_c\)”, where \(E_s\) is the source evidenced in the sentence template, \(A_c\) is a placeholder for an activity described in the target and role refers to some participant in the process that performs the activity \(A_c\). The sentences \(s_3\), \(s_4\) and \(s_6\) have as source an XOR-split gateway not evidenced in the sentence template. In addition, the sentence \(s_7\) has as its source an XOR-join gateway and as target an activity (\(A_c\)) and an end event (represented implicitly by “The process finishes after”). Not all sentences in a text are necessarily mapped to a sentence template, since process descriptions can also contain other information, such as statements that contextualize the process (\(s_1\)) and statements that detail an activity or business rule (\(s_5\)).
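The three elements of a sentence template can be captured in a small record type. The following is a minimal sketch, not the structure used in this work: the class `SentenceTemplate` and its field names are hypothetical, and the wording given for \(s_3\) is an assumption, since only the template of \(s_2\) is quoted above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SentenceTemplate:
    text: str                # template wording with placeholders
    source: str              # process element immediately before the sentence
    target: Tuple[str, ...]  # process elements described by the sentence
    relationship: str        # relationship among the target elements
    source_in_text: bool     # whether the source is evidenced in the wording

    def key(self) -> tuple:
        """Classification of the template: source, relationship and target."""
        return (self.source, self.relationship, self.target)


# s_2: start event evidenced as source, one activity as target, none relationship.
s2 = SentenceTemplate("Once E_s, the role must A_c", "E_s", ("A_c",), "R_N", True)

# s_3: XOR-split as source, not evidenced in the wording (the wording is assumed).
s3 = SentenceTemplate("In case condition, the role must A_c", "G_Xs", ("A_c",), "R_N", False)

print(s2.key())  # ('E_s', 'R_N', ('A_c',))
print(s3.key())  # ('G_Xs', 'R_N', ('A_c',))
```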

In order to identify the sentence templates, each process description was inserted into a spreadsheet, as illustrated in Table 2. In the spreadsheet, each row represents one sentence and the columns represent the following attributes: sentence, sentence template ID (i.e., the ID of the sentence template, which can be a number or “none”) and sentence template.

Table 2. Example of identification of sentence templates.

After all the sentence templates were identified, they were grouped into categories based on source, target and relationship. As a result, each category is composed of sentence templates that can replace one another in a process description and represent the same information. As an example, the sentence \(s_3\) is defined as a sequential relationship between an XOR-split (source) and an activity (target). This sentence can be rewritten using another sentence template that has the same properties, and therefore the same category, such as: “Once condition, the role needs to \(A_c\)”.
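The grouping into categories amounts to collecting templates under a shared key. The sketch below uses three wordings quoted in this paper; the tuple layout and the variable names are illustrative assumptions.

```python
from collections import defaultdict

# (wording, source, relationship of the target, target elements)
templates = [
    ("If condition, A_c", "G_Xs", "R_N", ("A_c",)),
    ("Once condition, the role needs to A_c", "G_Xs", "R_N", ("A_c",)),
    ("Once E_s, the role must A_c", "E_s", "R_N", ("A_c",)),
]

# Templates with the same source, relationship and target share a category.
categories = defaultdict(list)
for wording, source, relationship, target in templates:
    categories[(source, relationship, target)].append(wording)

for key, wordings in categories.items():
    print(key, wordings)
```

The two templates that start from an XOR-split fall into one category and are therefore interchangeable, whereas the template that starts from a start event forms a category of its own.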

The analysis of sentence templates was done in two different manners, namely atomic level analysis and group level analysis. In the atomic level analysis, if two sentences have the same text but represent different process elements in the source, they are defined as two distinct sentence templates. For example, the sentences “When a computer with problems arrives, the technician must perform an evaluation.” and “When performing a repair, the technician must perform an evaluation.” are defined as different sentence templates because they have different process elements in the source, namely “When \(E_s\), \(A_c\).” and “When \(A_c\), \(A_c\).”, respectively. On the other hand, at the group level, different possible process elements can be expressed by the same sentence template. In this case, the two sentence templates described above can be viewed as a single sentence template (i.e., “When (\(A_c\) or \(E_s\)), \(A_c\).”), capable of having either an activity or a start event as source.
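The step from atomic to grouped templates can be sketched as merging templates that share wording and target but differ in their source. The function `group_templates` and the `{src}` marker are hypothetical conventions introduced only for this example.

```python
from collections import defaultdict

# Atomic templates: (wording with '{src}' marking the source, source, target).
atomic = [
    ("When {src}, A_c.", "E_s", "A_c"),
    ("When {src}, A_c.", "A_c", "A_c"),
]


def group_templates(atomic_templates):
    """Merge atomic templates that differ only in their source element."""
    sources_by_key = defaultdict(list)
    for pattern, source, target in atomic_templates:
        sources_by_key[(pattern, target)].append(source)
    grouped = []
    for (pattern, _target), sources in sources_by_key.items():
        if len(sources) == 1:
            src = sources[0]
        else:
            src = "(" + " or ".join(sorted(sources)) + ")"
        grouped.append(pattern.format(src=src))
    return grouped


print(group_templates(atomic))  # ['When (A_c or E_s), A_c.']
```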

To facilitate the categorization of sentence templates, a notation was created based on the previously defined criteria. \(St_i = R_S(source, target)\) can be interpreted as: there is a sentence template \(St_i\) that starts from a source, can describe a target and is associated through a sequential relationship \(R_S\). In the case of atomic level analysis, a source is a process element. On the other hand, in the case of group level analysis a source is a set of possible process elements (e.g., \(A_c | G_{Xs} | G_{+s}\)). A target can be described as \(target = R_x (component_1, ..., component_n)\), i.e., a target is a set of components that relate to each other through a relationship \(R_x\). Finally, a component can be a process element, empty, or another target. Thus, two different sentence templates belong to the same category if they share the same notation, which means that they start from the same source and reach the same target.
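Because a component can itself be a target, the notation is naturally recursive. The following sketch renders it as a plain string; the nested-tuple representation and the function names are assumptions made for this example, and the concrete categories shown are illustrative rather than taken from Tables 4, 5 and 6.

```python
def render_target(target) -> str:
    """Render a target: a process element, 'empty', or (relationship, components)."""
    if isinstance(target, str):
        return target
    relationship, components = target
    return f"{relationship}({', '.join(render_target(c) for c in components)})"


def render_template(sources, target) -> str:
    """Render R_S(source, target); several possible sources are joined with '|'."""
    return f"R_S({' | '.join(sources)}, {render_target(target)})"


# Atomic level: a single source element.
print(render_template(["G_Xs"], ("R_N", ["A_c"])))
# R_S(G_Xs, R_N(A_c))

# Group level: a set of possible sources and a sequential target.
print(render_template(["A_c", "G_Xs", "G_+s"], ("R_S", ["A_c", "E_e"])))
# R_S(A_c | G_Xs | G_+s, R_S(A_c, E_e))
```

Two templates belong to the same category exactly when these rendered strings are equal.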

3.3 Ambiguity in Sentence Templates

After the identification and classification of the sentence templates, they were analyzed in relation to ambiguity issues. A sentence template was considered ambiguous when it allows multiple interpretations of the process. To identify common ambiguity issues in process descriptions, two approaches were carried out: analysis of the literature and analysis of the sentence templates.

As for the analysis of the literature, works related to ambiguity in process descriptions were investigated. Although some works related to this subject were found, only a few of them [1,2,3,4] presented cases of ambiguity. This analysis of the literature made it possible to find eight ambiguity problems that were categorized into five different types of ambiguity.

In terms of the analysis of the sentence templates, two independent tasks were conducted. In the first task, each sentence template was analyzed individually in order to identify ambiguity issues. In the second task, combinations of sentence templates were analyzed. For the latter case, sentence templates that share the same description but do not have the same classification (i.e., source, target or relationship) were considered candidates for ambiguity issues.
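The second task can be sketched as a search for wordings that occur with more than one classification. The function `ambiguity_candidates` is hypothetical, and the first two entries of the sample data are invented to illustrate an “and” that could be read as parallel or as sequential.

```python
from collections import defaultdict


def ambiguity_candidates(templates):
    """Return wordings that occur with more than one (source, relationship, target)."""
    classifications = defaultdict(set)
    for wording, source, relationship, target in templates:
        classifications[wording].add((source, relationship, target))
    return sorted(w for w, found in classifications.items() if len(found) > 1)


# Hypothetical data: the same wording classified once as parallel, once as sequential.
sample = [
    ("The role must A_c and A_c", "A_c", "R_+", ("A_c", "A_c")),
    ("The role must A_c and A_c", "A_c", "R_S", ("A_c", "A_c")),
    ("If condition, A_c", "G_Xs", "R_N", ("A_c",)),
]

print(ambiguity_candidates(sample))  # ['The role must A_c and A_c']
```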

Table 3 presents the six different types of ambiguity that were identified in this work, with their respective identifiers (\(Ambi_{ID}\)), descriptions, examples and sources.

Table 3. Identified ambiguity issues.
Table 4. Atomic sentence templates by category – 1.
Table 5. Atomic sentence templates by category – 2.
Table 6. Atomic sentence templates by category – 3.

4 Results and Analysis

After analyzing the process descriptions at the atomic level, it was possible to obtain a set of 101 sentence templates in 29 categories. Of these, 13 sentence templates were classified as having one of the six ambiguity issues. Tables 4, 5 and 6 show the sentence templates for each one of the 29 categories (\(C_{ID}\)), with their respective category notation. Each sentence template has a specific identifier presented in the “\(St_{ID}\)” column. In addition, the number of times each sentence template appeared in the process descriptions analyzed is presented in the “N” column. Moreover, the ambiguity issue identified for each sentence template, if any, is presented in the “\(Ambi_{ID}\)” column, based on the elements in Table 3.

Of the identified sentence templates, the most recurrent is “If condition, \(A_{c}\).” (\(St_{71}\)), from category \(C_{18}\), which appeared 81 times. This sentence template stands out clearly in the process descriptions, because the two sentence templates that appear most often after it (i.e., \(St_{1}\) and \(St_{36}\)) were identified only 15 times. Moreover, the category that presented the largest diversity of sentence templates is \(C_1\), with 19 distinct sentence templates, followed by \(C_{18}\) (with 12) and \(C_{3}\) (with 10).

In terms of ambiguity, the type that appeared most in the sentence templates is related to the term “and” (\(Ambi_1\)), having occurred five times. Moreover, \(Ambi_2\) appeared three times, followed by both \(Ambi_4\) and \(Ambi_6\) (2 times), and \(Ambi_5\) (1 time). In addition, in the identified sentence templates no case was found related to the ambiguity \(Ambi_3\).
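As a consistency check, and assuming that each ambiguous sentence template is assigned exactly one ambiguity type, the occurrences reported above add up to the 13 ambiguous sentence templates:

```latex
\[
\underbrace{5}_{Ambi_1} + \underbrace{3}_{Ambi_2} + \underbrace{0}_{Ambi_3}
+ \underbrace{2}_{Ambi_4} + \underbrace{1}_{Ambi_5} + \underbrace{2}_{Ambi_6} = 13
\]
```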

Furthermore, it is possible to notice that not all relationships between process elements are explored in Tables 4, 5 and 6. This occurs because some relationships that occur in the model are not explicitly transformed into sentences. Also, there are some relationships that appear less frequently than others in the texts considered.

In the group level analysis, the atomic level sentence templates that share the same target but present different process elements as sources were grouped. From the data collected in the atomic level analysis, it was possible to identify eight sentence templates that were transformed into four grouped sentence templates. Table 7 presents the grouped sentence templates, together with the new notations able to represent them. In addition, the identifiers of the atomic sentence templates used in each grouped sentence template are presented in the “\(St_{ID}\)” column. Finally, as in the atomic level analysis, the number of times each grouped sentence template appeared in the process descriptions analyzed is presented in the “N” column.

Table 7. Grouped sentence templates.

5 Discussion

The identified sentence templates can help approaches for identification of process elements in natural language texts and for automated creation of business process descriptions. For identification of process elements in natural language texts, the approaches can use the identified sentence templates as patterns to be sought in texts. In this case, sentence templates could be searched in the sentences of a process description. By finding a sentence corresponding to a sentence template, the process elements present in the sentence could be identified.
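Using a sentence template as a search pattern can be sketched with a regular expression. The wording of \(St_{71}\) is taken from this paper, but compiling it into a regular expression is only one possible realization, and the functions `template_to_regex` and `identify` are hypothetical.

```python
import re

# Template St_71; 'condition' and 'A_c' are its placeholders.
TEMPLATE = "If condition, A_c."


def template_to_regex(template: str) -> re.Pattern:
    """Escape the fixed text and turn the placeholders into named groups."""
    pattern = re.escape(template)
    pattern = pattern.replace("condition", r"(?P<condition>.+?)")
    pattern = pattern.replace("A_c", r"(?P<activity>.+)")
    return re.compile(rf"^{pattern}$", re.IGNORECASE)


def identify(sentence: str, template: str = TEMPLATE):
    """Return the text filling each placeholder, or None if the template does not match."""
    match = template_to_regex(template).match(sentence.strip())
    return match.groupdict() if match else None


found = identify("If it is a software problem, the technician must format the computer.")
print(found["condition"])  # it is a software problem
print(found["activity"])   # the technician must format the computer
```

A purely lexical match like this one cannot tell which process element a matched fragment really denotes, so it would have to be combined with the ambiguity findings reported in this paper.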

For automated creation of process descriptions, the approaches can choose to use the most recurrent sentence templates, or take advantage of the variety of sentence templates in each category to make the text more diversified. As a demonstration, the text in Fig. 3 could be rewritten with the identified sentence templates as presented in Fig. 4. In this new process description, the sentence \(s_1\) remained the same because it only presents context information. In addition, the sentence \(s_2\) (belonging to category \(C_{14}\)) was rewritten using the sentence template “The process starts with \(A_c\)” (\(St_{57}\)) because it is recurrent in the process descriptions analyzed. Moreover, the sentences \(s_3\) and \(s_6\) (both belonging to category \(C_{18}\)) were rewritten using the sentence template “If condition, \(A_c\)” (\(St_{71}\)) because this sentence template is recurrent in describing activities starting from an XOR-split. The sentence \(s_5\) was also left unchanged because it only details an activity. Furthermore, the sentence \(s_6\) has an ambiguity problem related to “and” (\(Ambi_1\)). Although the sentence templates collected do not include any candidate capable of removing the problem, the results help indicate that there is a problem in the process description. Among possible solutions, the sentence could be split into two new sentences or replaced by a different sentence capable of avoiding the ambiguity, such as: “If it is a hardware problem, the technician must replace the part and then fill out the part replacement form.”. Finally, the sentence \(s_7\) (belonging to category \(C_{7}\)) was rewritten using the sentence template “Finally, \(A_{c}\)” (\(St_{41}\)) because it appears to be the most recurrent among the identified sentence templates.
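Selecting the most recurrent template of a category and filling its placeholders can be sketched as follows. Only the count of 81 for \(St_{71}\) comes from this paper; the count for \(St_{72}\) is a made-up placeholder, and the functions `most_recurrent` and `fill` are hypothetical.

```python
# (template id, wording, occurrences N). Only N = 81 for St_71 is reported in
# the paper; the count for St_72 is a made-up placeholder for this example.
CATEGORY_C18 = [
    ("St_71", "If condition, A_c.", 81),
    ("St_72", "Otherwise, A_c.", 1),
]


def most_recurrent(category):
    """Pick the template with the highest number of occurrences."""
    return max(category, key=lambda template: template[2])


def fill(wording: str, **values: str) -> str:
    """Replace each placeholder name in the wording by the given value."""
    for placeholder, value in values.items():
        wording = wording.replace(placeholder, value)
    return wording


template_id, wording, _ = most_recurrent(CATEGORY_C18)
sentence = fill(
    wording,
    condition="it is a software problem",
    A_c="the technician must format the computer",
)
print(template_id)  # St_71
print(sentence)     # If it is a software problem, the technician must format the computer.
```

Picking a random template from the category instead of the most recurrent one would correspond to the second option mentioned above, namely a more diversified text.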

Fig. 4. Example of rewritten process description: computer repair.

Regarding the categorization of sentence templates, although the classification based on source, target and relationship helps in grouping sentence templates that have the same characteristics, in some cases an analysis of the context can help to select a sentence template that better fits the text and, consequently, to produce a text more suitable for process analysts and domain experts. For instance, the sentence templates “If condition, \(A_c\)” (\(St_{71}\)) and “Otherwise, \(A_c\)” (\(St_{72}\)) belong to the same category (\(C_{18}\)) and are both able to represent an activity being performed after an XOR-split. However, the second sentence template could produce a disconnected text if it were used to describe the first possible path of a decision in a process description.

6 Related Work

The work presented here relates to two different streams of research: generation of process descriptions and ambiguities present in process descriptions.

Regarding the generation of process descriptions, Leopold et al. [18] proposed a technique to generate natural language texts from business process models. The authors used sentence templates to transform business process models into the sentences that compose the text. Furthermore, Aysolmaz et al. [7] defined a semi-automated approach to generate natural language requirements documents based on business process models. The authors adopted a template filling technique, in which sentence templates are defined containing gaps that must be filled with information from a requirements model. In addition, Caporale [8] suggested a method for generating process models from process descriptions. To achieve this, the author proposed that the process descriptions should be specified in a controlled natural language, based on sentence templates, in order to facilitate the extraction of the information necessary to generate the models. Moreover, Ghose et al. [15] proposed a framework and prototype tool that can query information resources (e.g., corporate documentation, web content, code) to construct models to be incrementally adjusted to correctness by an analyst. One of the techniques used by the authors to extract information from text is based on template extraction. In this technique, the authors created templates from textual structures that are commonly used in describing processes and used these templates to extract knowledge from text documents.

Regarding the ambiguity present in process descriptions, Ferrari et al. [11] conducted a literature review and a set of interviews with different public institutions aiming at improving the process descriptions used in public administrations. The authors concluded that ambiguity is one of the macro-areas of research in which computer scientists can contribute to higher quality in business process descriptions. Moreover, van der Aa et al. [1] presented an approach to automatically detect inconsistencies between a process model and the corresponding textual description. The authors identified that a technique to detect inconsistencies must deal with the ambiguity issues present in natural language. In another work, van der Aa et al. [2] proposed to deal with ambiguity in textual process descriptions by introducing the concept of behavioral space. The behavioral space captures all possible behavioral interpretations of a textual process description. Furthermore, van der Aa et al. [3] presented an approach to verify the compliance between a process and a process description, considering the ambiguity present in texts. To handle the ambiguity issues, they used the concept of behavioral space previously proposed in [2].

As can be observed, the first stream of research makes use of sentence templates for the generation of business process descriptions from process models and for the extraction of text information for the generation of process models. Moreover, the second stream discusses the problem of ambiguity in process descriptions and presents some cases of ambiguity. This work is distinguished by the fact that it presents an empirical analysis of business process descriptions, in order to find recurrent sentence templates and to highlight ambiguity issues in these sentence templates.

7 Conclusion

In this work, an empirical analysis of business process descriptions was carried out in order to discover the most recurrent sentences used to describe BPMN process models. In addition, an analysis was performed in order to find sentences with ambiguous meaning. The analysis consisted of three different steps. First, a set of 64 process descriptions was selected and their sentences were prepared for the identification of sentence templates. Then, the sentence templates in the prepared sentences were identified and classified. Finally, each sentence template was marked as having ambiguity issues or not. In total, 101 sentence templates were found and classified into 29 different categories. Of these, 13 sentence templates were considered as having ambiguity issues.

This work aims to contribute to the description of business processes in a way that is closer to a pattern and has fewer ambiguity issues. It can be useful for creating process descriptions more suitable for process analysts and domain experts. In addition, this analysis can be used by tools that automate the creation of process descriptions. As future work, it is intended to extend this analysis to other BPMN elements, expand the sample of textual descriptions and construct a technique that uses these sentence templates to produce process descriptions.