1 Introduction

Businesses are increasingly interested in improving the quality and efficiency of their processes and in aligning their information systems in a process-centered way [32]. In such context, process-aware information systems (PAISs) [8] have emerged to provide a more dynamic and flexible support for business processes (i.e., BPs). BPs can be defined as sets of activitiesFootnote 1 which are performed in coordination in an organizational and technical environment [43] and which jointly achieve a business goal.

The support provided through a PAIS is greatly enhanced when it is able to provide accurate time predictions (i.e., predictions related to the completion time of a running process instance) since these predictions constitute a valuable tool when managing processes [29]. In fact, there exist many process scenarios for which temporal aspects are of utmost importance [44], and hence, reliable time predictions are crucial for any PAIS [41]. Specifically, such predictions allow process managers to (1) anticipate time problems, (2) proactively avoid time constraint violations, and (3) make decisions about the relative process priorities and timing constraints when significant or unexpected delays occur [10].

In such a context, time predictions must be provided while taking into account a set of basic requirements [33]: (1) the forecast must be highly accurate, (2) the prediction must take place nearly instantaneously, (3) the prediction functionality must be easy to use, and (4) the prediction may not interfere with the efficient operation of the PAISs. Therefore, time prediction represents a very challenging task and, even more, if the following desirable characteristics are considered:

  • The predictions should not be based on a single process instance Typically, activities in a BP compete for limited resources which are shared between all the process instances which are executed in the PAIS. Therefore, predictions which are evaluated over an isolate process instance may lack accuracy since some of the resources might be assigned to a different instance. These resources are then not available for such instance. For this, in order to provide accurate predictions they must consider multiple process instances and resources [33, 37].

  • The predictions should be multi-dimensional In modern cooperative business, time is of utmost importance. Therefore, process models should be able to take this dimension into consideration [44]. Moreover, the resource perspective of BPs—which refers to the link between the activities defined in the processes and the entities that carry out the work related to them [45]—is significant to the efficiency and effectiveness of a process [40], and hence, should be also considered when designing a BP model [38]. Likewise, it becomes essential to take into account the data perspective since data constraints influence the possible executions of activities and, in turn, the execution of activities results in certain data constraints that should be met. Thus, a multi-perspective time prediction methodology, including all these aspects, would be desirable, i.e., besides predicting the remaining time of a specific instance, other relevant issues that can be also predicted for improving the management of running instances (e.g., start and end times of process activities, use of resources, and critical activities).

  • The prediction system should be able to adapt to changing circumstances Many real scenarios might be subject to input uncertainty, e.g., the arrival time of clients is not well known or a resource became unavailable during the process enactment [39]. Therefore, in general, BPs are designed considering different alternatives to cope with such situations when they are enacted. Accordingly, a related prediction system should also consider such alternatives during the enactment to increase the supported flexibility.

  • The prediction system should be able to deal with declarative models Flexible PAISs [32] are required to allow companies to rapidly adjust their BPs to changes. In such a context, declarative BP models (e.g., constraint-based models) are increasingly used allowing their users to specify what has to be done instead of how [27], and hence, offering a high flexibility to end users. Many enactment plans related to the same constraint-based process model typically exist, and each of these plans presents different values for relevant objective functions (e.g., overall completion time).

Although there exist some approaches related to time prediction (e.g., [9, 23, 29, 34, 37, 41, 42]), they neglect some of the aforementioned characteristics. Especially in the last one, whereas there exist solid time prediction techniques for imperative models (e.g., [9, 23, 24]), little work has been conducted for declarative models [41].

In this work, an approach for generating time predictions of running process instances related to a multi-perspective constraint-based process model is proposed, i.e., which considers time, resource, data and control flow perspectives. The generated predictions are based on information extracted from both a constraint-based process model and the current state of partially executed process instances.

Note that while constraint-based process models offer a high flexibility to end users, they also increase the challenge of performing accurate predictions in such uncertain scenarios. In this work, we consider that all the decisions that the process stakeholders make about the way to execute it are generally aligned to the optimization of relevant objective functions (e.g., among all the available activities, which one should be executed next in order to minimize the overall completion time of the current process instances?). For this, instead of using any possible enactment plan which is compliant with the constraint-based process model, we propose to base the predictions on an optimized enactment plan that is generated from such a model at the current execution state.

Fig. 1
figure 1

Overview of the proposed approach for generating time predictions

Figure 1 depicts the main contribution of this work. Starting from (1) a multi-perspective constraint-based process model, (2) an objective function, and (3) the execution state of multiple running process instances, an optimized enactment plan is automatically generated. The generation of such plan is carried out by solving a planning and scheduling (P&S) problem in which, on the one hand, the activities to be executed are selected and ordered considering all the constraints, resource requirements, and resource availabilities—what constitutes a planning problem [15]. On the other hand, attribute values like start time are assigned to activities—what constitutes a scheduling problem [6]. For solving this P&S problem, we base on a previous work [21] where we propose a constraint-based approach in which the process model and the objective function are represented as a constraint optimization problem (i.e., COP). The generated enactment plan is obtained as an optimized solution to this problem and is, in turn, used for performing the predictions. Thereafter, the predictions are performed through the evaluation of desired functions over the optimized enactment plan.

Note that the decisions which are taken by the process stakeholders might not be aligned with the enactment plan which has been used to generate the predictions due to unexpected events (e.g., a resource became unavailable). Nonetheless, at run-time this plan is updated—if necessary—considering the current state of the running process instances, and therefore, the predictions are also updated as the execution of the process proceeds.

The proposed approach has several advantages that are worth nothing. First, when generating the optimized enactment plan, multiple process instances as well as the allocation of resources are considered. Secondly, since the optimized enactment plans are updated according to the running process instances, it is possible to deal with unexpected factors and hence, to adapt to changing circumstances. Third, besides predicting the remaining time of a specific process instance, the proposed approach allows the prediction of other relevant issues. Finally, declarative models are considered as starting point.

In previous work, we presented an approach for generating optimized enactment plans from constraint-based specifications [21]. In addition, we applied such technique to provide recommendations [2] and generate imperative BP models [1]. However, this paper significantly extends these previous works by:

  1. 1.

    Introducing a novel method to generate predictions from declarative specifications. Such a method is built upon the constraint-based tool developed in [21].

  2. 2.

    Performing a case study considering a benchmark and a real scenario in order to validate the effectiveness and suitability of the proposal.

Moreover, “Appendix A” of this paper provides further implementation details of the constraint-based tool which are not included in previous works.

The rest of the paper is organized as follows: Sect. 2 introduces backgrounds, Sect. 3 details the proposal for providing predictions, Sect. 4 explains a real example where the current proposal is applied, Sect. 5 deals with the evaluation, Sect. 6 presents a critical discussion, and Sect. 7 includes some conclusions and future work.

2 Background

This section introduces the related concepts which are used in the remainder of this paper. Section 2.1 provides backgrounds regarding constraint-based BP models. Section 2.2 gives an overview of planning, scheduling, and constraint programming. Section 2.3 summarizes previous related work on time prediction on business processes.

2.1 Constraint-based BP models

Different paradigms for process modeling exist, e.g., imperative and declarative. Irrespective of the chosen paradigm, desired behavior must be supported by the process model, while forbidden behavior must be prohibited [25, 27]. While imperative process models specify exactly how things have to be done, declarative models only focus on what should be done. In our proposal, we use the constraint-based language DeclareFootnote 2 [27, 28] for the BP control flow specification. Declare is based on constraint-based process models.

Definition 2.1

A constraint-based process model \(CM = (A, C_{BP})\) consists of a set of activities A, and a set of constraints \(C_{BP}\) prohibiting undesired execution behavior. Each activity a \(\in \) A can be executed arbitrarily often if not restricted by any constraints.

Constraints can be added to a Declare model to specify forbidden behavior, restricting the desired behavior. For this, Declare proposes an open set of templates which can be divided into 4 groups:

  1. 1.

    Existence templates: unary template (i.e., it involves only one activity) concerning the number of times one activity is executed, e.g., Exactly(N, A) specifies that A must be executed exactly N times.

  2. 2.

    Relation templates: positive binary templates used to impose the presence of a certain activity when some other activity is performed, e.g., Precedence(A, B) specifies that to execute activity B, activity A needs to be executed before.

  3. 3.

    Negation templates: negative templates used to forbid the execution of activities in specific situations, e.g., NotCoexistence(A,B) specifies that if B is executed, then A cannot be executed, and vice versa.

  4. 4.

    Choice templates: n-ary templates expressing the need of executing activities belonging to a set of possible choices, e.g., ExactlyChoice(N, S) specifies that exactly N activities of the set of activities S must be executed.

Fig. 2
figure 2

Increased flexibility of declarative models versus imperative models

Example 2.1

Figure 2a shows a constraint-based BP model where tracesFootnote 3 <AAB>, <AB>, <ABAB>, <ABB>, <A> are some of the valid ways of executing such model, while traces <BA>, <BB>, <BAAB> are invalid since A must precede B. In contrast, Fig. 2b shows an imperative model where there is only one valid execution trace, <AB>.

There are different ways to execute a constraint-based process model while fulfilling the constraints, i.e., there are several related enactment plans.Footnote 4 The different valid execution alternatives, however, can greatly vary in respect to their quality, i.e., how well different performance objective functions can be achieved. Such objective functions of the BPs are the functions to be optimized during the BP enactment, e.g., minimization of the overall completion time.

In order to allow dealing with more realistic problems compared to Declare, and motivated by requirements described in the literature [22, 25, 27, 44], in previous works we extended Declare to ConDec-R [20] by adding resource reasoning as well as temporal and data constraints,Footnote 5 resulting in ConDec-R process models. For this, ConDec-R supports activities with an open set of attributes and alternative resources.

Definition 2.2

A BP activity \(BPAct = (a, Res, Atts)\) represents a BP activity called a, which can be performed by any resource included in Res, and which has a set Atts of attributes associated (e.g., duration and profit) which is composed of tuples <att, val>.

Definition 2.3

A ConDec-R process model \(CR = (BPActs\), Data, \(C_{BP}\), AvRes, OF) related to a constraint-based process model \(CM = (Acts\), \(C'_{BP})\) is composed of (1) a set of BP activities BPActs associated with Acts, (2) problem data information Data, (3) a set of ConDec-R constraints \(C_{BP}\) including the constraints of \(C'_{BP}\) and the constraints which relates activities included in BPActs and the data included in Data, (4) a set of available resources AvRes which is composed of tuples (role,#role) which includes for each role (i.e., role) the number #role of available resources, and (5) an objective function OF to be optimized.Footnote 6

Fig. 3
figure 3

A simple ConDec-R process model

Figure 3 shows a simple ConDec-R process model where: \(BPActs=\) \(\{(A,<R1>,{<<}\,att_1,2>,<att_2,6{>>}),\, (B,<R2>,<<att_1,3>,<att_2,2>>),(C,<R1,R2>,<<att_1,2>,<att_2,2{>>}), (D,<R1,R2>,{<<}\,att_1,2>,<att_2,3{>>})\}\);Footnote 7 \(Data=\{{<}N,1{>}\}\); \(C_{BP}=\) \(\{exactly(1,A)\), succession(AB), response(AB), negate-\(response(B,C)\), \(pre\) \(cedence(C,D)\), \(exactly(2,B)\}\); \(Av\) \(Res=\) \(\{(R1,2)\), (R2,  \(2)\}\); and \(OF=\) \(maximize(OF_1)\).

2.2 Planning, scheduling, and constraint programming

The area of scheduling includes problems in which it is necessary to determine an enactment plan for a set of activities related to temporal and resource constraints (in our context the control flow constraints). In scheduling problems, several objective functions are usually considered to be optimized, in most cases related to temporal measures, or considering the optimal use of resources. In a wider perspective, in artificial intelligence (AI) planning [15], the activities to be executed are not established a priori; hence, it is necessary to select them from a set of alternatives and to establish an ordering. Thus, the objective of P&S is to find an enactment plan which fulfills the temporal and resource constraints while considering the optimization of some objective function. Such enactment plans are commonly represented as Gantt Charts [13] (cf. Fig. 4).

Definition 2.4

An enactment plan \(EP= (pId, Acts)\) is composed by an identifier (i.e., pId) and a set of activities (i.e., Acts) which are executed without preemption. Each activity \(act \in Acts\) consists of a tuple \({<}actId, st, et, res{>}\) where: actId is an unique identifier of the activity, st and et state the start time and end time of the activity in the enactment plan, respectively, and res identifies the resource where the activity is allocated.Footnote 8

Fig. 4
figure 4

Enactment plan example as a Gantt chart

In such a context, constraint programming (CP) [31] supplies a suitable framework for modeling and solving problems involving P&S aspects [36]. In order to solve a problem through CP, it needs to be modeled as a constraint satisfaction problem (CSP).

Definition 2.5

A CSP \(P = (V, D, C_{CSP})\) is composed of a set of variables V, a set of domains D which is composed of the domain of values for each variable \(var_i \in V\), and a set of constraints \(C_{CSP}\) between variables, so that each constraint represents a relation between a subset of variables and specifies the allowed combinations of values for these variables.

A solution to a CSP consists of an assignment of values to the CSP variables.

Definition 2.6

A solution \(S ={<}(var_1, val_1), (var_2, val_2), ...(var_n, val_n){>}\) for a CSP \(P = (V,D,C_{CSP})\) is an assignment of a value \(val_i \in dom_i\) to each variable \(var_i \in V\).

A solution is feasible when the assignments variable-value satisfy all the constraints. In a similar way, a CSP is feasible if at least one feasible solution for this CSP exists. From now on, \(S^{var}\) refers to the value assigned to variable var in a solution S.

Similar to CSPs, constraint optimization problems (COPs) require solutions that optimize an objective function.

Definition 2.7

A COP \(P_o = (V,D,C_{CSP},o)\) related to a CSP \(P=(V, D, C_{CSP})\) is a CSP which also includes an objective function o to be optimized.

A feasible solution S for a COP is optimal when no other feasible solution exists with a better value for the objective function o.

Constraint programming allows to separate the models from the algorithms, so that once a problem is modeled in a declarative way as a CSP, a generic or specialized constraint-based solver can be used to obtain the required solution. Furthermore, constraint-based models can be extended in a natural way, maintaining the solving methods. Several mechanisms are available for solving CSPs and COPs [31], which can be classified as search algorithms (i.e., for exploring the solution space to find a solution or to prove that none exists) or consistency algorithms (i.e., filtering rules for removing inconsistent values from the domain of the variables). In turn, search algorithms can be classified as complete search algorithms (i.e., performing a complete exploration of a search space which is based on all possible combinations of assignments of values to the CSP variables) and incomplete search algorithms (i.e., performing an incomplete exploration of the search space so that, in general, to get a feasible or an optimal solution is not guaranteed). In this work, we apply P&S to generate the best possible enactment plan from the a constraint-based process model through a complete search algorithm.

Since many COPs present NP complexity [14], optimized solutions are considered.

Definition 2.8

Let Sols be the set of all the solutions of a COP \(P_o\) and let \(Sols_t \subseteq Sols\) be the subset of the solutions already explored at certain time t. Then, a solution \(sol_1 \in Sols_t\) is optimized if it can be ensured that it is optimal regarding only the subset \(Sols_t\).

2.3 Time prediction on business processes

Time predictionFootnote 9 represents a valuable tool for any PAIS [41] since there exist many process scenarios for which time is of utmost importance [44].

Many proposals related to time prediction can be found in the literature. Such proposals base their predictions either on: (1) applying data mining techniques for analyzing logs of past process execution [33, 41, 42], (2) applying certain techniques over the process model [9, 24, 37] (e.g., simulation, critical path method, or queuing network analysis), and (3) both sources of information [23, 34] (i.e., event logs and process design).

Regardless of the source information which is analyzed for performing the predictions, such predictions should not be based on a single process instance in isolation, but consider multiple process instances and resources [33, 37]. However, while only few proposals deal with this issue (e.g., [24, 34]), most proposals (e.g., [9, 23, 41, 42]) do not pay attention to the influence that the execution of multiple instances competing for shared resource has on the related predictions.

Moreover, since flexible PAISs [32] are required more and more, the prediction system should be able to adapt to changing circumstances (e.g., a resource became unavailable during the process enactment) [23, 34, 37].

Although most proposals on time predictions are only focused on predicting the remaining time that is needed to complete the handling of a specific instance, there are other relevant issues that can be also predicted to support and improve the management of running instances: (1) start and end times of process activities, (2) use of resources, or (3) critical activities (i.e., activities whose delay implies a greater overall completion time). In such respect, existing proposals could be used for dealing with some of these issues, but, to the best of our knowledge, there is not any previous proposal able to deal with all the aforementioned issues.

3 Method for generating time predictions

This section details the method which is proposed for generating time predictions. Figure 5 shows an overview of the proposed approach, and its main components are explained as follows:

Fig. 5
figure 5

Overview of our proposal for generating time predictions based on optimized enactment plans

  • ConDec-R Specification is the multi-perspective constraint-based model (cf. Definition 2.1) created by an analyst through the ConDec-R language (cf. Definition 2.3).

  • Declarative PAIS is the component which takes a declarative specification and allows the user for instantiating such specification and interacting with the running process instances.

  • P&S Module is a constraint-based tool (i.e., it relies on a COP solver) which is in charge of generating optimized BP enactment plans from a declarative specification. In addition, the P&S Module allows considering that the current execution state (i.e., partial traces) of the running process instances is reproducible in the generated plan (i.e., the plan includes the current execution state).

  • Store of Plans is just the shared place where both the P&S Module and the Prediction Service manage the plans.

  • Prediction Service is the system which allows the user to ask for predictions. Each time the user asks for a prediction related to any magnitude (e.g., remaining time until completion of all running instances), this service considers the most recent optimized enactment plan for producing the prediction response.

Therefore, the Prediction Service allows to predict values not only over a single process instance but over the whole set of instances which is planned to be executed within a certain timeframe.

To initialize the components, in a first step, a multi-perspective constraint-based model is created by an analyst through the ConDec-R language (cf. step a in Fig. 5). Therefore, the control flow, resource requirements, estimates of the activities (e.g., the duration), and resource availabilities are specified. Estimates can be obtained by interviewing business experts or by analyzing past process executions (e.g., by calculating the average values of the parameters to be estimated from event logs). Moreover, both approaches can be combined to get more reliable estimates. Second, the Declarative PAIS takes such specification as input to allow users to execute related running process instances (cf. step b in Fig. 5). And third, the same specification is transformed and passed to the P&S Module to initialize it (cf. step c in Fig. 5). For this, the elements of the ConDec-R model (i.e., BP activities, constraints, resources, data and objective function) are transformed into the elements of the COP (i.e., variables, domains, constraints and objective function). Thereafter, the resulting COP can be solved using a search algorithm to obtain the solution of the COP which is directly considered as the optimized enactment plan (cf. step d in Fig. 5).Footnote 10

After these initial steps, two different processes run in a parallel way:

  • The user requests a prediction: At run-time, the user may request a prediction for the running process instances (cf. step e in Fig. 5). Thereafter, as stated in Algorithm 1, the prediction service registers that a user requested a prediction through the method registerPendingPrediction. Then the algorithm gets the existing optimized enactment plan from the store or waits for it in case that no plan is there yet (cf. step f in Fig. 5). As soon as a new plan is calculated by the P&S module, such a plan is returned by the method getPlanOrWait. After that, the prediction is generated (cf. step g in Fig. 5) using: (1) the optimized enactment plan and (2) a measurement functionFootnote 11 Mf which states the magnitudes to extract from the plan. After the prediction is performed, the method deregisterPendingPrediction informs the system that the prediction has been served.

figure f

Definition 3.1

Let EP be an enactment plan and T a time, then a Measurement function Mf(EP,T) is a function that produces some related measurements, e.g., accumulated profit until time T, remaining profit from time T, time until completion. Formally, \(Mf \in (C\times N)\rightarrow M\), where C is the set of possible enactment plans, N is the set of possible time stamps, and M is the set of possible measurement values (e.g., some time duration).

Note that, similarly to [41], this definition of measurement can be used for generating both predictive and non-predictive values. That is, a predictive value is obtained when the measurement function looks beyond T, e.g., expected time until completion. In contrast, non-predictive values are those which only require information of the enactment plan regarding the elapsed time (i.e., before T), e.g., accumulated profit.

Example 3.1

On the one hand, a predictive measurement function related to the remaining completion time would be described as follows:

$$\begin{aligned} Max_{act\in EP.Acts}(act.et) - T \end{aligned}$$

That is, it is measured by calculating the end time of the last activity in the plan—which is the total completion time—and then, subtracting the time T when the prediction is asked.

In addition, a measurement function to predict the expected profit would be described as follows:

$$\begin{aligned} \sum _{act\in EP.Acts}{\textit{profit}}{(act)} \end{aligned}$$

That is, it is calculated by summing the attribute profit of all the activities which exist in the plan.

On the other hand, a non-predictive measurement function related to the accumulated profit would be described as follows:

$$\begin{aligned} \sum _{{act\in EP.Acts | act.et<T}{\textit{profit}}(act)} \end{aligned}$$

That is, the sum of the attribute profit of all the activities whose end time is before the time T when the prediction is asked.

In the above formulas, EP.Acts refers to the activities which are enacted in the enactment plan EP, act.et represents the value of the et variable of the activity act, and profit(act) is the value of the attribute profit of the BP activity associated with act in the model.

  • Process instances are executed by authorized users (cf. step h in Fig. 5): As the execution of the process proceeds, events are generated (i.e., starting or finishing an activity by a resource) and constitute the current execution state which is sent to the P&S Module (cf. step i in Fig. 5). This module is in charge of updating the optimized enactment plan which is in the store by: (1) removing it if does not support the current state, i.e., the current partial trace is not a subtrace of such a plan, (2) continuously generating new optimized enactment plans in order to improve and replace the existing one, i.e., allowing continuous replanning. In this step, the solver considers the information of the current execution state obtained from the declarative PAIS. Note that this allows to adapt the predictions to changing circumstances.

    The behavior of the P&S Module is described in Algorithm 2. In the first line, a solver is created to look for solutions (i.e., optimized enactment plans) compliant with the ConDec-R process model. Each time the solver finds a solution which is better than the previous one, the solver updates the plan in the store. It is important to notice that, since predictions are requested on demand (cf. Algorithm 1), the time between the user interacts with the PAIS and the Prediction Service takes the optimized enactment plan from the store for generating a prediction is unknown. For this, the solver is executed without establishing a time limit, i.e., the startGeneratingPlans method launches the solver which goes on optimizing the plan, always compliant with the currentState, until stopGeneratingPlans is invoked. The algorithm is executed until all running instances have been completed (line 9 in Algorithm 2). While the solver is running, the algorithm waits until no predictions are pending (cf. line 5 in Algorithm 2), i.e., the Prediction Service is not waiting for a plan in the store. After that, it waits and listens to the changes in the current execution state that the P&S module may receive (line 6 in Algorithm 2). Whenever a change is received, the solver is stopped (line 7 in Algorithm 2) since the change may invalidate the solutions which are being generated and the optimized enactment plan in the store is updated (line 8 in Algorithm 2). Specifically, the removePlanIfNotCompliant method removes the plan from the store if the current execution trace does not match, i.e., it is not reproducible in the plan.

figure g

Despite the NP complexity of the considered problems, in general, replanning is less time-consuming than initial planning, since most of the information about previous generated plans can usually be reused. In the context of the current approach for the P&S Module, CSP variable values become known as execution proceeds.

From the point of view of the proposed approach, the complexity of performing a prediction over a given declarative model depends on the number and diversity of its execution alternatives, i.e., on the size of the search space. To be more precise, such complexity is related to the number of BP activities, the number and type of constraints among these activities, and the percentage of the plan that is already executed.

4 A real example: a beauty salon of Seville

This section introduces a real example from a beauty salon that is used to validate the current proposal in the considered case study.

The considered business has expanded quickly in the last years involving more staff, services, and complex constraints which resulted in problems related to the management of the salon. In particular, long waiting time for clients and a lack of information for the manager are causing problems, affecting customer satisfaction and profit of the business.

Since our approach generates predictions based on the state of the beauty salon, the aforementioned problems can be detected in advance, and therefore, the manager of the salon can react to overcome them.

The considered beauty salon offers various servicesFootnote 12 like dye, clean&cut, manicure, and facial services. Clients are required to make appointment calls so the number of clients and its booked services are known at the beginning of the day. There are several full-time employees identified by A, R, L, and M, and each activity can be performed by certain employees only. In addition, each activity has an average estimated duration and a profit which is obtained after their execution. The manager of the salon wants to plan and schedule a working day with several clients considering that the waiting time (WT) of the clients has to be minimized and distributed uniformly among all the clients (objective function):

$$\begin{aligned} WT= \frac{\sqrt{\sum _{c \in C}{((S^{et(c)}-c.appT)-(\sum _{b \in c.served}{\textit{b.estimate}}))^2}}}{C.size} \end{aligned}$$

where C is the set of clients, S is the considered solution, \(S^{et(c)}\) is the time when the client c has finished, c.appT is the appointment time of c, \({\textit{c.served}}\) is the set of services which are applied to c (i.e., included in the enactment plan), and \({\textit{b.estimate}}\) is the estimated duration for service b.

Fig. 6
figure 6

ConDec-R Model for the beauty salon problem (top level process)

Fig. 7
figure 7

ConDec-R Model for some of the services which are offered

Typically, as illustrated in Fig. 6, a client visit starts with the reception in the beauty salon. After that, the staff applies some services to the client and, finally, the client is charged. Complex activity \({\textit{Services}}\) is composed of other activitiesFootnote 13 (e.g., dye, clean&cut, facial, and manicure, cf. Fig. 7), while \({\textit{Reception}}\) and \({\textit{Charge}}\) are BP activities (cf. Definition 2.2). For each BP activity two attributes are considered: (1) estimated activity duration and (2) profit.Footnote 14 Moreover, the set of alternative resources which can perform the BP activity is also included, e.g., the activity \({\textit{Reception}}\) of Fig. 6 has an estimated duration of 1 min and a profit of 0 and can be performed by A, R, M, or L.

The current problem deals with N clients (cf. Existence constraint in Fig. 6)—each one representing an instance of the model—which come to the salon at different times and with different bookings during a working day.

Such information is included in the data perspective (cf. Client-Data element in Fig. 6). Through the data perspective, it is also modeled that activity Reception cannot start before the client appointment time. Moreover, a data constraint is used (in conjunction with the choice constraint) to ensure that all the services the client has booked are selected in the generated enactment plans.Footnote 15

5 Empirical evaluation

This section provides an empirical study for the proposed approach. Specifically, the purpose of this study is the evaluation of the whole approach in terms of its suitability to provide predictions. In this section, the case study protocol for the software engineering field proposed by [5] is followed to improve the rigor and validity of the proposed study. Such protocol suggests the following sections: background, design, case selection, case study procedure and data collection, analysis and interpretation and validity evaluation.

Background Taking the purpose of the proposed study into account, a main research question (MQ1) is defined (cf. Table 1). Specifically, MQ1 assesses whether the current approach can be useful to provide predictions by satisfying the aforementioned requirements [33]. For this, MQ1 is divided into two additional questions: (1) AQ1 checks whether the predictions which are obtained by our approach are accurate, and (2) AQ2 evaluates the immediacy of our approach for generating accurate predictions.

Table 1 Case study research questions

Design The object of study is the method which is proposed for generating predictions from ConDec-R specifications. For this, a holistic design which considers the overall proposal as a whole is carried out. In this design, the architecture described in Sect. 3 is established for addressing MQ1 (i.e., AQ1 and AQ2). In such respect, the interaction between the user and the declarative PAIS (cf. Fig. 5) is simulated as detailed later in the case study procedure.

This case study is run on a Intel(R) Core (TM) CPU i7-3517U, 1.90 GHz, 10 GB memory, running Windows 7. In this work, we consider the constraint-based system IBM ILOG CPLEX Optimization Studio (CPLEX) [18] for implementing the constraint-based approach detailed in “Appendix A”.Footnote 16 CPLEX provides for efficient search algorithms as well as efficient high-level objects and constraints to deal with temporal constraints, resource allocation, and optimization. This leads to an efficient management of the problems to be solved. After the application of the aforementioned holistic design, the generated information (i.e., the predictions) is analyzed to answer the research questions (cf. Table 1).

The data described in Table 2 are quantified for each ConDec-R model which is considered following the case study procedure.

Table 2 Quantified variables for the holistic design which are obtained by applying the proposed approach with a time limit equal to lt seconds and after the sp% of a reference process enactment is executed with \(lt \in \{1,5,10\}\) and \(sp \in \{0,25,50,75\}\)
Table 3 Generic synthetic models with 10 and 20 activities and a varying number of constraints

Case selection For this case study, the beauty salon problem is studied. We consider this is a good and suitable case since it fulfills the following selection criteria: (1) it has been created for an actual business, (2) the business has grown up and now it has scheduling problems (i.e., involves resource allocation, complex constraints and the optimization some objective function), and (3) it manages several performance measures which can be used to generate predictions (e.g., resource usage, waiting times, profit, completion time, etc.).

In addition, in order to extend the external validity, a set of synthetic models has been created taking some important characteristics into account. First, correctness, i.e., the ConDec-R models must represent feasible problems without any conflict (i.e., there are some traces that satisfy the model). Second, representativeness, i.e., the ConDec-R models must represent problems which are similar to actual BPs with different constraints and sizes. Consequently, we considered models of medium-size (i.e., including 10–20 activities) which comprise all basic types of ConDec-R templates, i.e., existence, relation, and negation. For this, 6 generic test models are considered with 10, 15 and 20 activities, respectively, and a varying number of constraints (cf. Table 3).

Case study procedure and data collection The execution of the study is planned as follows.

First, the business is selected—according to the selection criteria—and it is modeled as a ConDec-R model.

After that, on the one hand, different configurations are generated related to the beauty salon problem. Each configuration specifies the number of clients (NC) which are considered and the average number of booked services (NS). According to the information which is provided by the manager of the salon (i.e., there are normally between 10 and 20 clients per day and a client typically books one or two services), we consider values {1, 1.5, 2} for NS and the values {10, 15, 20} for NC. Based on this information, to average the results, a collection of 30 ConDec-R models is randomly generated for each pair \({<}NC, NS{>}\) by varying the booked services of each client (S) and their appointment times (T). In summary, \(30*3*3=270\) different ConDec-R related to the beauty salon is considered. Figure 8 shows an example of a problem file of the beauty salon generated for 10 clients (i.e., \(NC=10\)) with 2 booked services in average (i.e., \(NS=2\)).Footnote 17 To clarify, Table 4 summarizes the different acronyms which appear above.

On the other hand, for the synthetic problems, Fig. 10 shows the ConDec-R representation of the generic models A10, B10, A15, B15, A20 and B20. There are some activities that are involved in an existence constraint, which means that such activities must be repeated several times. We have considered 15, 30 and 60 repetitions, i.e., \(N\in \{15,30,60\}\). Regarding the number of available resources, in turn, for all the generated test models, two available resources of two kinds of roles (i.e., R1 and R2) are considered. Moreover, random durations and resource requirements are considered for each activity since these aspects have a great influence on the complexity of the search of optimal solutions. This is due to the considered problems are extensions of typical scheduling problems. Specifically, in order to average the results over a collection of randomly generated ConDec-R models, 30 instances are randomly generated for each specific ConDec-R model by varying activity durations between 1 and 10 and role of required resources between R1 and R2. In summary, \(6*3*30=540\) different synthetic ConDec-R models are considered for evaluating predictions. In this case, the objective function and the prediction function are both related to the overall completion time, i.e., the time spent to complete all the instances.

Table 4 Description of the acronyms
Fig. 8
figure 8

Example of a problem file of the empirical evaluation generated for 10 clients (i.e., \(NC=10\)) with 2 booked services in average (i.e., \(NS=2\)). For each client (i.e., C), its appointment time (i.e., T) and its booked services (i.e., S) are stated

Fig. 9
figure 9

Case study procedure

Thereafter, for each beauty salon problem (i.e., same activities, relations and resources but different booked services and appointment times) and for each synthetic problem, we proceed as depicted in Fig. 9:

  1. 1.

    An optimized BP enactment plan is generated by the proposed approach for minimizing the waiting time when establishing the time limit of the solver equal to 5 min. This plan is then selected for being the reference process enactment, i.e., to simulate the behavior of a potential user.Footnote 18 In this step, the \(Ref^{WT}\) is obtained which is the same as \(Min^{WT}\).

  2. 2.

    In order to calculate a tentative maximum value of the waiting time—which is necessary for calculating \(range^{WT}\)—another search is performed for 5 min considering the maximization of the waiting time as objective function and then, \(Max^{WT}\) is obtained.

  3. 3.

    The next steps evaluate different predictions using the proposed approach (cf. Sect. 3). For this, the initial parts of the reference plan are considered as the current state of the instances (cf. Algorithm 2). The reasons for taking initial parts of this best plan are to (1) easily get feasible traces and (2) be able to compare the quality of the enactment plans which are generated in few seconds versus the ones which are generated in 5 min. Specifically, 0, 25, 50 and 75% of the reference plan are considered as already executed. For each percentage, the solver is given different time limits to generate the plans (i.e., the time which exists between the startGeneratingPlans method is invoked in Algorithm 2 and the related prediction is requested by Algorithm 1) which are used by the Prediction Service to provide the predictions. Specifically, 5, 10 and 15 s. are used. Therefore, the values of \(Pred_{tl,sp}\) are obtained for \(tl\in \{5,10,15\}\) and \(sp\in \{0, 25, 50, 75\%\}\).

  4. 4.

    In addition, in order to illustrate the effectiveness of the method in the beauty salon scenario, the values of \(NPred_{sp}\) are obtained by considering the first solution which is obtained by the solver, i.e., a non-optimized solution.

  5. 5.

    After that, the \(Err_{tl,sp}\) and the \(NErr_{sp}\) values are calculated using the above data.

Finally, the analysis and interpretation of the collected data is conducted and the validity of the case study procedure is studied.

Fig. 10
figure 10

Generic synthetic ConDec-R models

The values for the response variables are included in Table 5.

Table 5 Quantified variables for the experiment

Analysis and interpretation The data which are collected are analyzed to answer the research question and to draw conclusions (cf. Tables 5 and 6). In order to address MQ1, sub-questions AQ1 and AQ2 need to be answered (cf. Table 1).

As expected, the ranges [MinMax] are narrower as the complexity of the problem increases since fewer options to allocate the activities exist. Moreover, when the prediction (cf. column Pred) is closer to the reference value (cf. column Min) the average error is lower (cf. column Err) (Fig. 10).

Regarding the accuracy of the solutions, the average of the error increases as the complexity of the problems increases in both the beauty salon problems and the synthetic problems. Specifically, the problems which entail the highest complexity are those related to the configuration \({<}NC=20, NS=2\), \(tl =5s{>}\)—in the beauty salon—and \({<}Model=A20\), \(N=60{>}\)—in the synthetic problems—in which the value for Err is nearly \(50\%\) when 0% of the reference plan is known (i.e., \(sp=0\%\)). Although it is not a good value, this can be explained by the fact that the process enactment has not been started yet, and the time limit for solving a rather complex problem is very low (i.e., only 5 s). Moreover, it is the only case where the obtained prediction is really close to the one obtained by the trial predictor. Nonetheless, as can be observed, for all the configurations where \(sp\ge 25\%\), Err is lower than \(20\%\) when 10 s or more are provided as time limit. Moreover, the average error is lower than \(10\%\) in most cases, which can be considered good solutions overall if it is compared with the trial predictor. A similar behavior is observed with the synthetic problems in which the error is lower and stays below 7% for configuration with \(N=15\) regardless of the model. As can be seen in Fig. 11—which shows the average error which is committed by both predictors grouped by the number of clients—the error of the proposed prediction is considerably lower than the trial one. For this, AQ1 can be answered as true when some information is known about the process enactment, but might be questioned for early predictions.

Fig. 11
figure 11

Average error committed by the predictors grouped by NC

Regarding the immediacy of the predictions (i.e., AQ2), as expected, the accuracy of the predictions increases as the time limit increases. In addition, for most of the problems, the error in the prediction considering \(time = 10\) is lower than 12%, that can be considered a good result. Thereafter, AQ2 can be answered as true since 10 s can be considered a reasonable time limit for the considered scenario.

Table 6 Error values for the synthetic experiment for \(tl=5s\)

Validity evaluation This section evaluates if the results of the proposed case study are valid and not biased. To be more precise, three types of validity are addressed in this section: construct, internal and external.

Firstly, regarding the construct validity, it has to be addressed in how far the measures which have been used are appropriate to address the research questions which have been planned. Three different threats are identified related to the acquisition of the data. The first threat is related to how the problems have been randomly generated in both designs. In these designs, unsolvable problems were not considered in order to evaluate the algorithm better. This is checked considering a simple rule: the generated appointment time of a client plus the time which her booked services consume cannot overpass the closing time of the beauty salon. Due to the parallelism which may exist because of the temporal constraints (i.e., a client can be served by different employees at the same time), this rule leaves out some problems which might be solvable. To mitigate this threat, a more elaborated algorithm can be performed to avoid eliminating problems which may be solvable. Secondly, the complexity of the problems which are generated is controlled only by varying the number of clients and her booked services. Although we consider that the beauty salon is a suitable business due to its complexity, different ways of controlling this complexity can be applied to mitigate this threat, e.g., by changing the type of constraints. The third threat concerns the data collected (cf. Table 2). To the best of our knowledge, the metrics are good enough for addressing AQ1 and AQ2. To mitigate this threat, a consolidated definition of accuracy of a prediction and a way of measuring it could be defined.

Regarding the internal validity, the main threat is that the obtained results related to the immediacy of the proposal could be biased. This is because its interpretation can be subjective since it depends on the business which is analyzed. To mitigate it, other business experts can be consulted in order to state what is a appropriate period for considering a prediction to be instantaneous.

Finally, the external validity considers in how far the obtained results could be generalized to any business. This generalization is threatened by the fact that the beauty salon was the unique real scenario which was studied. Although a set of synthetic models of a range of complexities has been included in the experiment, other real scenarios can be considered to replicate this study in order to mitigate this threat.

6 Discussion and limitations

The Declare language [27] has been extended in several works [19, 25, 26, 44]. In fact, ConDec-R is based on the time extensions defined in [25, 44] where it is possible to define time lags over the different Declare constraints. Furthermore, the data-aware extension which has been proposed in [26] is considered in the current approach. This way, with the proposed declarative language, the considered problems can be modeled in an easy way, since it is based on high-level constraints. Moreover, realistic problems can be managed, e.g., the beauty salon detailed in Sect. 4.

Regarding proposals on time prediction, a probabilistic time-aware workflow system for time prediction is presented in [11]. However, the focus of [11] is more on scheduling, and, unlike the current approach, assumes that the workflow is known beforehand and stable [41]. Moreover, both [11, 12] provide design-time support (i.e., before the enactment time), whereas the proposed approach provides run-time support as well (i.e., support during the enactment of instances). Similarly, [42] proposes a service that predicts the completion time of process instances by using nonparametric regression. In addition, [41] proposes the application of process mining to an event log in order to obtain a transition system. In a related way, [4] presents an approach for predicting process remaining time based on query catalogs. Such catalogs are groups of partial traces (annotated with additional information about each partial trace) that have occurred in an event log, and are then used to estimate the remaining time of new executions of the process. Although the interaction between process instances and the availability of resources constitute important factors for predicting the remaining time until completion, the proposals presented in [4, 37, 41, 42] do not consider the number of instances being executed and the resources available at a specific point of time during process enactment when making this prediction. Furthermore, unlike in our approach, the predictions cannot be adapted to changing circumstances (e.g., a resource became unavailable during the process enactment). Similar to our approach, [34, 37] consider the enactment of multiple instances and resource availabilities when making the prediction. Moreover, in such approaches the predictions can be adapted to changing circumstances. Nevertheless, as opposed to the current approach, [34, 37] do not perform optimization over objective functions and do not start from a declarative model.

However, our approach also presents a few limitations. The predictions are generated by considering estimated values for the number of instances to execute, and hence, our proposal is only appropriate for processes in which this number is known a priori. In a related way, activity attributes and resource availability need to be estimated. As a real example, the beauty salon problem is detailed and an extensive empirical evaluation is carried out with the goal of supporting the contributions of our proposal. Moreover, some of our previous works also dealt with this kind of scenarios (e.g., [1] describes a travel agency problem and [3] considers computer support for clinical guidelines as an application example). Nevertheless, if the actual values of the estimates deviate from the estimated values during the execution of the model, P&S techniques can be applied to replan the activities and to update the predictions at run-time by considering the actual values of the estimates.

In addition, motivated by the requirements of the considered scenarios, the data perspective which is considered in the current approach mainly includes data constraints which can be applied to input data and activity relations. However, more advanced features like dynamic data or data-flow perspective have been left out since they are not part of the design requirements of the considered scenarios and will be addressed in future work when applying our proposal to BPs with different characteristics.

7 Conclusions and future work

In this work, an approach for generating time predictions of running process instances related to a multi-perspective constraint-based process model is proposed. For performing such predictions, we propose generating optimized enactment plans from a multi-perspective constraint-based process model and from the current state of partially executed process instances by considering a given objective function. This approach has several advantages regarding previous related work: (1) multiple process instances as well as the allocation of resources are considered, (2) is able to adapt to changing circumstances, and (3) besides predicting the remaining time of a specific process instance, it allows the prediction of other relevant issues. To evaluate the applicability of our approach in practical settings, we applied it to a real process scenario. Despite the high complexity of the considered problems, results indicate that our approach produces a satisfactory number of good solutions in a reasonable time.

As for future work, we will consider to deal with both paradigms imperative and declarative for the specification of the BPs. Furthermore, it is planned to consider the information from past process executions as additional input data for providing more accurate predictions. Additionally, we intend to apply it to real scenarios from other domains. Finally, further aspects of data perspective (e.g., the data-flow) are planned to be considered in future versions of ConDec-R.