1 Introduction

Reducing the time and cost of projects is important for project managers, and one way to reduce time is to increase the resources (costs) allocated. This approach is known as the Time–Cost Trade-off (TCT) problem, which has been investigated extensively in the literature (e.g., Cheng et al., 2015; Eshtehardian et al., 2009; Hazir et al., 2011; Tao & Dong, 2018). The time–cost trade-off problem aims to determine the appropriate execution mode for each activity so that the project completes within the pre-determined time and budget.

The TCT problem typically assumes a project with a deterministic network structure, in which each activity starts once all preceding activities have been completed and there is no reworking. However, there are actual cases, such as research and development projects (Kosugi et al., 2004), marketing (Bellas, 1971), software development (Yu & Zuo, 2008), and many others, that have a stochastic structure and feedback loops. In these cases, the two most common project planning methods, the Critical Path Method (CPM) and the Program Evaluation and Review Technique (PERT), are not applicable, because they are restricted by numerous assumptions and reflect few real-world complexities.

The Graphical Evaluation and Review Technique (GERT), first introduced by Pritsker (1966), is a method for the graphical representation of a stochastic project network. GERT networks have various types of nodes to express stochastic precedence relations. A node consists of an input side and an output side, each of which can take one of several possible modes, as described in Table 1 (Pritsker, 1966). In a GERT network, it is possible to have more than one sink node.

Table 1 Input and output side of a node

Balancing time and cost in GERT networks is a challenging problem because of uncertainty in two dimensions, the realization of activities and their durations, which is not the case for CPM and PERT. Arisawa and Elmaghraby (1972) were the first to study the time–cost trade-off model in a GERT network. They considered the problem in monitoring and controlling processes, where there is a gap between planned and actual time and cost, and attempted to decrease and control the project completion time by adding investment and using more resources. Years later, Tao et al. (2022) added two constraints to the previous model and minimized the project completion time using a genetic-based solution methodology. However, these approaches are not suitable for estimating the time and resource (cost) requirements in the contract and planning phase of project management. Insufficient and unclear negotiation regarding the desired time and cost may result in serious breaches, often leading to costly contract disputes. Hence, there is a need to determine the best time–resource set for each activity such that the project completes within the pre-determined time and the resource constraint is satisfied. The current study is initiated to resolve this issue.

In this study, we deal with the complexity of planning a stochastic project, where there is no advance information about the occurrence of activities or about the time and resources they consume. To this aim, we propose a Time–Cost Trade-off Resource-Constrained Project Scheduling Problem (TCT-RCPSP) in a GERT network. Due to the stochastic structure of the network, the minimum time required to complete the project cannot be evaluated precisely. Hence, the objective of the problem is to maximize the probability that the project completes within a pre-determined time, considering the resource limitation. Since the occurrence probability and the duration of each activity are independent of the history up to the point at which the activity is realized, two analytical methods based on the Markov Decision Process (MDP) and the Semi-Markov Decision Process (SMDP) are implemented to solve the problem for discrete and continuous time distributions, respectively.

The RCPSP is NP-hard (Rostami et al., 2018), hence the considered TCT-RCPSP is NP-hard as well, and solving it analytically is an advance. We adopted a set of computational experiments from a well-known data set, the Project Scheduling Problem LIBrary (PSPLIB) (Machado-Domínguez et al., 2021), to demonstrate the implementation of our approach on the TCT-RCPSP in a GERT network. To evaluate the effectiveness of the model, the obtained results are compared with those of the Genetic Algorithm (GA) and Monte-Carlo (MC) simulation, two extensively studied methods in the related literature (see Tables 2 and 3). Based on the results, some managerial insights are suggested for balancing the available resource and the desired horizon.

Table 2 A summary of the studies on the time–cost trade-off problem
Table 3 A summary of studies on GERT networks

The remainder of the study is organized as follows: Sect. 2 briefly reviews the related literature, and Sect. 3 provides background about GERT networks and describes the problem in detail. Section 4 explains the solution methodology, and then Sect. 5 gives several numerical examples from PSPLIB to show the applicability of the proposed methods. Finally, Sect. 6 concludes the study and recommends some areas for future studies.

2 Literature review

2.1 Time–cost trade-off problem

Numerous studies summarize the literature on time–cost trade-off problems, including those by Feng et al. (1997), Vanhoucke and Debels (2007), Sonmez and Bettemir (2012), Tavana et al. (2014), Aouam and Vanhoucke (2019), and Liu et al. (2020). All of these studies consider the activity durations to be deterministic. As this assumption is too restrictive for many real-world situations, some researchers (Azaron and Tavakkoli-Moghaddam (2007), Klerides and Hadjiconstantinou (2010), Said and Haouari (2015)) extended the deterministic problem to a stochastic one by considering probability distributions for the duration of each activity. From a methodological standpoint, hybrids of simulation techniques and the Genetic Algorithm (GA) are widely used (Feng et al., 2000; Ke et al., 2009, 2012). Regardless of the extension considered, the common feature of all previous studies is that every activity of the project is assumed to occur and the precedence relations are deterministic. However, real-world projects usually have alternative structures, meaning that the project process varies with the outcomes of executed activities. To overcome this shortcoming, S. Tao and Dong (2018) considered an AND-OR network to represent the alternatives in the project structure and formulated a mathematical program to select the optimal set of activities to be realized. This approach controls the project process, whereas in many real cases the execution process is out of the planner's control; that is, the realization of an activity depends on the outcome of its predecessor. To resolve this issue, we propose a more complex extension of the problem in which each activity's occurrence is stochastic and depends on the outcome of the predecessor activity, and we model the project using GERT. The main aim is to find the optimal time–resource set for each activity, considering the uncertainty in the realization of activities, such that the probability of project completion within a specific time is maximized. To exploit the characteristics of the GERT network, an analytical method based on Markov and Semi-Markov formulations is used. According to the literature, GA and MC simulation are two extensively studied techniques for TCT problems; accordingly, they are applied here to evaluate the efficiency of the proposed methods against those provided previously. Table 2 briefly compares the assumptions of this study with those of previous ones.

2.2 GERT networks

Pritsker (1966) first discussed the GERT network, which goes beyond the capability of classical project representation networks. Subsequently, several researchers (Fix & Neumann, 1979; Neumann, 1985; Raju, 1971; Whitehouse & Pritsker, 1969) extended variants of the problem and analyzed the characteristics of the network analytically. Advances in GERT network theory have encouraged researchers to use this network to solve several stochastic problems using analytical techniques (Aytulun & Guneri, 2008; Kerem Aytulun & Ermis, 2009; Nelson et al., 2016; Tao et al., 2017) and simulation methods (Ren & Yuan, 2012; Taylor et al., 1982). Along with these studies, a few researchers investigated the Resource-Constrained Project Scheduling Problem (RCPSP) in a GERT network. Hogg et al. (1975a) provided a simulator called "GERTS QR" and examined its performance on several examples of a multi-resource constrained queuing system. Then, Hogg et al. (1975b) applied GERTS QR to a job-shop system. Later, Phillips and Hogg (1976) developed several GERT simulation algorithms called "GERTS III," "GERTS III-R," "GERTS III-Q," and "GERTS III-QR," applicable to stochastic networks with resource constraints and queuing characteristics. More recently, Golenko-Ginzburg et al. (2003) presented a heuristic simulation algorithm for a resource-constrained project scheduling problem with renewable resources. The objective of most previous studies in the GERT literature was to minimize the project completion time, while neglecting the correlation between time and the resources used. In most real projects, one way to reduce time is to invest additional resources, so there is a gap in the consideration of time–cost settings for activities in GERT projects. To fill this gap, the authors are motivated to consider time–cost trade-off settings for all activities of a GERT network. In contrast to the simulation and heuristic solution methodologies developed in the literature, the current study proposes an analytical approach based on the MDP and SMDP. Although simulation methods have computational advantages, they are still time-consuming and, in some cases, investigate many unnecessary scenarios. Moreover, a baseline is needed to validate simulation and heuristic algorithms. Hence, an analytical solution methodology is required to determine the optimal time–cost (resource) set for each probabilistically realized activity. Table 3 briefly describes the reviewed studies and highlights the novelty of this study in contrast to them.

2.3 The time–cost trade-off in a resource-constrained GERT project scheduling problem

The time–cost trade-off problem in a GERT network has not been well studied due to its high complexity. Arisawa and Elmaghraby (1972) were the first to address the time–cost trade-off problem in a GERT network. They supposed that there is a gap between planned and actual time and proposed an LP formulation to find the minimum project duration given that the available resource is increased; a set of activities is selected and additional resources are added to shrink their durations, decreasing the gap between planned and actual time. Years later, Tao et al. (2022) resolved some shortcomings of the study by Arisawa and Elmaghraby (1972) and added two constraints on the probability of on-time delivery and staying under budget to ensure successful completion of the project. As an extension to the latter, this study considers time–cost (resource) trade-off settings for all activities and attempts to maximize the probability of project completion within the desired time, considering the realization probability of each activity. Consequently, an analytical method based on the MDP and SMDP is proposed to determine the optimal execution mode for each realized activity. This approach can serve as a baseline to evaluate the efficiency of the simulation techniques proposed in the literature.

3 Problem description

In practice, there is a class of projects in which activities occur with a probability of less than one and the structure of the project network is stochastic. In other words, the occurrence of activities, and consequently the realized path, cannot be determined in advance. This poses a challenge in planning and scheduling such projects, and constraints on the available resource and the project completion time add further complexity. The GERT network is an approach to model such stochastic projects: it is a graphical representation of stochastic projects that combines flow-graph theory and PERT (Pritsker, 1966).

The aim of this study is to plan a stochastic project using a GERT network. To this end, it considers an Activity-On-Node (AON) STEOR (Stochastic Exclusive-OR) network to model a stochastic project. In this network, a node is activated if and only if one of the nodes leading to it is realized, and after completion of that activity, only one of the successor nodes is realized. Let \(V = [{{\mathbb{N}}}, E]\) represent the GERT network, where \({{\mathbb{N}}} = \left\{ {1,2, \ldots ,{{\mathcal{N}}}} \right\}\) refers to the set of all activities, and \(E \subseteq {{\mathbb{N}}}^2\) represents the precedence relations in the network. Nodes 1 and \({{\mathcal{N}}}\) are the dummy start and sink nodes, respectively; dummy nodes consume no time and no resource. Two parameters are associated with each activity: the probability of activity occurrence \((P_i )\) and the duration of the activity \((t_i )\). To preserve generality, \(t_i\) can follow discrete or continuous statistical distributions. All activities need a non-renewable resource (e.g., budget) to be activated. There are hard constraints on the project time horizon (\(T\)) and the total available resource (\(TR\)), and these limits may not be exceeded. It is supposed that each activity can be executed in various time–cost settings. Each setting is called an execution mode and is characterized by two quantities, namely the activity duration \((t_i )\) and the resource requirement \(\left( {\pi_i } \right)\), where the duration is negatively correlated with the allocated resource level.

Since the problem is modeled based on the GERT network, two essential parameters must be ascertained: (1) the probability of each destination node's realization (i.e., the equivalent probability), and (2) the expected time of project completion given that a destination node has occurred (i.e., the equivalent expected time). These parameters are called the GERT network characteristics. In this study, the equivalent probability of each sink node is computed using the formulations provided by Pritsker (1966), while the w-function for each arc is conditioned on the allocated resource level.

The objective of the proposed TCT-RCPSP is to maximize the probability of project completion within the desired time. To do this, we develop analytical methods, including a finite-horizon MDP model for discrete random variables and an SMDP model for continuous random variables, in order to determine the optimal resource level for each potentially realized activity. In fact, using these models, we can identify the conditional probability distribution of the makespan, which is a more granular output, from which the equivalent expected time may easily be computed.
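To make the activity representation concrete, the following minimal Python sketch (with illustrative names and numbers that are not taken from the paper) shows how an activity with execution modes and a STEOR successor list might be encoded; each mode maps a resource level \(\pi_i\) to a duration distribution for \(t_i\).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Activity:
    """One GERT node: its occurrence probability and its execution modes."""
    prob: float                           # P_i: probability that the node is realized
    modes: Dict[int, Dict[int, float]]    # resource level pi_i -> {duration t_i: Pr(t_i | pi_i)}

# A hypothetical two-mode activity: allocating more resource shifts the duration
# distribution toward shorter times (the negative time-resource correlation).
activity_5 = Activity(
    prob=0.5,
    modes={
        2: {4: 0.3, 5: 0.5, 6: 0.2},      # cheap mode, longer durations
        4: {2: 0.4, 3: 0.4, 4: 0.2},      # expensive mode, shorter durations
    },
)

# The STEOR structure as successor lists with branch probabilities:
# exactly one successor of a completed node is realized.
successors: Dict[int, List[Tuple[int, float]]] = {1: [(2, 0.5), (3, 0.5)]}
```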

4 Solution methodology

In a TCT-RCPSP, the decision about the resource level allocated to each activity affects the project makespan. Given the assumptions of the discussed problem, the duration of each activity varies with the allocated resource. The classic GERT network formulations are unable to consider the correlation between time and cost/resource, so they cannot be applied to a TCT problem. This shortcoming motivated the authors to provide an analytical approach to find the equivalent expected time and the equivalent probability in a TCT-RCPSP with a GERT network structure. This study applies an MDP to formulate discrete-time problems and an SMDP to model continuous-time problems, and specifies the optimal resource allocation policy in the desired GERT network. These models are appropriate because, for any activity \(i\), \((P_i ,t_i )\) is independent of the project execution history up to the point at which \(i\) occurs, so the Markov property holds (Fix & Neumann, 1979). The models aim to maximize the project completion probability given the pre-determined time (\(T\)) and the limited resource level (\(TR\)). Generally, any MDP or SMDP consists of the following components: (1) a set of decision epochs (periods), (2) a set of system states, (3) a set of available actions, (4) a set of state- and action-dependent rewards or costs, and (5) a set of state- and action-dependent transition probabilities (for more details, see Puterman (1994)). These components and the related recursive functions for the MDP and SMDP of a GERT network are described in the following subsections. Each MDP and SMDP model is solved using both value iteration and Linear Programming (LP) methods. To verify the efficiency of the implemented MDP and SMDP models, GA and MC simulation techniques are applied to the problem. Since these techniques are extensively studied, we do not describe them in detail; instead, the reader is referred to comprehensive studies such as Mirjalili (2019), Eiben et al. (1994), and Rubinstein and Kroese (2007).

4.1 Markov decision process (MDP)

Notations

\({{\mathbb{N}}}\): Set of all nodes/activities

\({{\mathcal{N}}}\): Number of nodes/activities of the project, \({{\mathcal{N}}} = \left| {{\mathbb{N}}} \right|\)

\(i\): Index of activities/nodes of the network, \(i \in {{\mathbb{N}}} = \left\{ {1,2, \ldots ,{{\mathcal{N}}}} \right\}\)

\(J_i\): The set of successor nodes of node \(i\)

\(\pi_i\): The set of resource requirements for activity \(i\)

\(t_i\): Execution time/duration of activity \(i\)

\(T\): The pre-determined time limit on the project makespan

\({{\mathbb{T}}}\): Set of time points in the time horizon, \({{\mathbb{T}}} = \left\{ {1,2, \ldots ,T} \right\}\)

\(\tau\): The time moment at which an activity is activated, \(\tau \in {{\mathbb{T}}} = \left\{ {1,2, \ldots ,T} \right\}\)

\(TR\): The total amount of the considered non-renewable resource

\({{\mathbb{R}}}\): Set of remaining resource levels, \({{\mathbb{R}}} = \left\{ {0,1, \ldots ,TR} \right\}\)

\(r\): The amount of remaining resource, \(r \in {{\mathbb{R}}} = \left\{ {0,1, \ldots ,TR} \right\}\)

\(\Pr \left( . \right)\): Probability function of a random variable

\(m\): Decision epochs or periods of the MDP, \(m = 1,2, \ldots ,M\)

\(s_m\): The state of the system at decision epoch \(m\)

\({{\mathbb{S}}}\): The set of all states of the system

\({{\mathbb{A}}}\): The set of all actions

\(a_m\): The action taken at decision epoch \(m\)

\(A_{s_m }\): The set of all actions that can be chosen in state \(s_m\) at decision epoch \(m\), \(A_{s_m } \subseteq {{\mathbb{A}}}\)

\(R\left( {s_m ,a_m } \right)\): The reward/cost accrued at decision epoch \(m\) by taking action \(a_m\) in state \(s_m\)

\(\delta\): A Markovian policy, \(\delta = \left\{ {\left( {s_1 ,a_1 } \right), \left( {s_2 ,a_2 } \right), \ldots ,\left( {s_M ,a_M } \right)} \right\}\)

\(V_m \left( {s_m ,\delta (s_m )} \right)\): Value function of state \(s_m\) at decision epoch \(m\) under policy \(\delta\)

\(P\left( {s_m ,s_{m + 1} } \right)\): Probability of transition from state \(s_m\) at decision epoch \(m\) to state \(s_{m + 1}\) at decision epoch \(m + 1\)

\(P_i\): The occurrence probability of node/activity \(i\)

\(E\left( . \right)\): Expected value of a random variable

\(V_m^* \left( {s_m } \right)\): Value function of state \(s_m\) at decision epoch \(m\) under the optimal policy

\(a_m^*\): The optimal action taken at decision epoch \(m\)

\(t_F\): Project completion time

Note: in this manuscript, \(j\) denotes a node index in the same way as \(i\).

An MDP is adopted to model the evolution of a controllable Markovian system. The value iteration procedure to solve the MDP model is described as follows. In an MDP, the decision-maker decides on the system's actions at specified points in time, called "decision epochs" or "periods." At each decision epoch (\(m\)), the system occupies a state (\(s_m \in {{\mathbb{S}}}\)), which contains all the information necessary for decision making. According to the state information, an action (\(a_m \in A_{s_m }\)) is chosen, and the decision-maker accrues a reward/cost that depends on the current state and the selected action (\(R(s_m , a_m )\)). The system then moves to a new state in the next epoch, and the process repeats. Since the actions are chosen based on the states and the transitions are stochastic, the expected total reward/cost must be computed. The performance of the system is affected by the actions chosen at each decision epoch; hence, to optimize the problem, it is necessary to find an appropriate sequence of actions, which is called a Markovian policy.

A Markovian policy (δ) of a Markov decision process is a function that prescribes the desired action for each state at each decision epoch, \(\delta :{{\mathbb{S}}} \to {{\mathbb{A}}}\). Under a certain policy δ, the value function is defined for each state at decision epoch \(m\) (\(s_m\)) as the total expected reward/cost accrued from the decision epoch \(m\) onward. This value function may be computed using the recursive equation

$$ V_m \left( {s_m ,\delta (s_m )} \right) = R\left( {s_m ,\delta \left( {s_m } \right)} \right) + \sum \limits_{s_{m + 1} \in {{\mathbb{S}}}} P\left( {s_m ,s_{m + 1} } \right) \times V_{m + 1} (s_{m + 1} ,\delta (s_{m + 1} )) , $$
(1)

where the first term is the immediate reward/cost associated with \(s_m\) and its designated action according to δ and the second term is the expected value function of the following states (\(s_{m + 1} , \ldots ,s_M )\). Then, the decision-maker attempts to maximize (minimize) the expected total reward (cost) collected at the end of the problem horizon. In the MDP literature, there is a policy called an optimal policy that can achieve this goal. With the desired reward function and the transition probability, the value function of the optimal policy denoted by \(V_m^* (s)\) may be computed using the recursive equation

$$ V_m^* \left( {s_m } \right) = \max_{a_m \in A_{s_m } } \left\{ {R\left( {s_m ,a_m } \right) + \sum \limits_{s_{m + 1} \in {{\mathbb{S}}}} P\left( {s_m ,s_{m + 1} } \right) \times V_{m + 1}^* \left( {s_{m + 1} } \right)} \right\}. $$
(2)

The optimal action for each state at the decision epoch \(m\) is defined as

$$ a_m^* \in \text{argmax} \left\{ {R\left( {s_m ,a_m } \right) + \sum \limits_{s_{m + 1} \in {{\mathbb{S}}}} P\left( {s_m ,s_{m + 1} } \right) \times V_{m + 1}^* \left( {s_{m + 1} } \right)} \right\}. $$
(3)

The computational complexity of each iteration of the value iteration method for an MDP is O(\(\left| {{\mathbb{A}}} \right| \times \left| {{\mathbb{S}}} \right|^2\)) (Littman et al., 2013). The method is polynomial if and only if the number of required iterations is polynomial.

For our problem, the MDP model is adjusted as follows: the decision-maker seeks to determine the amount of resource allocated to each activity \(i\), on the condition that node \(i\) occurs, in a way that the probability of project completion within \(T\) is maximized. To address this, the components of the MDP are described in the following. First, we define the decision epochs as the points at which decisions are made; in our network they are represented graphically by sets of nodes rather than by time points (\(m = 1,2, \ldots , M\)). The nodes in a decision epoch must have joint successors and must not be linked to each other, as illustrated in Figs. 1 and 2 for our problem. In case the nodes of a decision epoch are interrelated, Algorithm 1, presented below, modifies the network so that this property holds.

Fig. 1 Project network

Fig. 2 The modified network of the generated example

Algorithm 1

Step 1. Define the set \(F = \{ \textit{the sink nodes}\}\).

Step 2. Define the set \(\rho\) as the union of the predecessors of the nodes \(j \in F\).

Step 3. Define the set \(\varphi\) as the union of the predecessors of the nodes \(k \in \rho\).

Step 4. If \(\rho \cap \varphi = \emptyset\), go to Step 6. Else define \(I = \{ \textit{set of common nodes}\}\).

Step 5. Split each node \(i \in I\) into two parts, \(i\) and \(i^*\). Hence, the arc (\(i, j\)) is divided such that the probability of arc (\(i, i^*\)) equals the probability of (\(i, j\)), \(\forall j \in F\), the elapsed time of \(i^*\) is zero, and the probability of arc (\(i^* , j\)), \(\forall j \in F\), equals one.

Step 6. Let \(F = \rho\).

Step 7. If \(F = \{ 1\}\), stop the algorithm; else go to Step 2.
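The following Python sketch is one possible reading of Algorithm 1 (the container names, the artificial-node naming, and the handling of multiple arcs into \(F\) are our assumptions, not the authors' implementation); it routes each arc from a common node through a zero-duration artificial node so that the nodes of a decision epoch are not interlinked.

```python
def split_layered(preds, probs, sinks):
    """One reading of Algorithm 1: route arcs from "common" nodes through
    zero-duration artificial nodes so that the nodes of a decision epoch are
    not linked to a later epoch.

    preds: dict node -> set of predecessor nodes
    probs: dict (i, j) -> probability of arc (i, j)
    sinks: iterable of sink nodes
    Artificial nodes are named (i, '*'); termination assumes the layered,
    acyclic structure of the networks used in the paper.
    """
    F = set(sinks)
    while F != {1}:
        rho = set().union(*(preds.get(j, set()) for j in F))    # Step 2: predecessors of F
        phi = set().union(*(preds.get(k, set()) for k in rho))  # Step 3: predecessors of rho
        for i in rho & phi:                                     # Step 4: common nodes
            star = (i, "*")                                     # Step 5: artificial copy, zero duration
            preds[star] = {i}
            for j in list(F):
                if (i, j) in probs:
                    # (i, i*) inherits the probability of (i, j); (i*, j) occurs with certainty
                    probs[(i, star)] = probs.get((i, star), 0.0) + probs.pop((i, j))
                    probs[(star, j)] = 1.0                      # assumes one such arc per i, as in Fig. 1
                    preds[j] = (preds[j] - {i}) | {star}
        F = rho                                                 # Steps 6-7
    return preds, probs
```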

To make an appropriate decision about the resource level allocated to each realized activity, the state of the system carries three pieces of information: the realized node, the time moment at which the node is activated, and the remaining resource level. The state at decision epoch \(m\) is represented as \(s_m = \left( {i,\tau ,r} \right), s_m \in {{\mathbb{S}}} = ({{\mathbb{N}}},{{\mathbb{T}}},{{\mathbb{R}}})\), which means that at decision epoch \(m\), node \(i\) is realized at time point \(\tau\) while \(r\) units of the resource remain. Recall that, due to the STEOR network structure, only one node can be realized at each decision epoch. The decision-maker must choose an action \(a_m \in A_{s_m }\), which specifies the resource level allocated to node \(i\). Given the assumptions of the problem, the resource requirements of each activity (\(\pi_i = A_{s_m }\)) are known, and the allocated resource must not exceed the remaining level, \(a_m \le r\). An activity is carried out only if its prerequisite activity is done and its resource requirement is satisfied. After the completion of the activity, one of the successor activities is activated and the system proceeds to a new state at the next decision epoch, \(s_{m + 1} = \left( {j,\tau + t_i ,r - a_m } \right)\), with probability \(P_j\) (\(P\left( {s_m ,s_{m + 1} } \right) = P_j\)). When an action is chosen, the decision-maker accrues a reward,

$$ R\left( {s_m ,a_m } \right) = Pr\left( {t_i \le T - \tau {|}s_m ,a_m } \right), $$
(4)

which equals the probability that the activity completes within the remaining time, given that the resource constraint holds. As the objective of the problem is probability maximization, the rewards accrued over the process are independent probabilities. Although in the literature the value function of an MDP is usually additive, in the discussed problem the probability is a multiplicative quantity, so the expected total reward under policy \(\delta\) is written as:

$$ V_1 \left( {s_1 ,\delta (s_1 )} \right) = E_{(s_1 ,\delta )} \left\{ {\mathop \prod \limits_{m = 1}^{M - 1} R\left( {s_m ,\delta \left( {s_m } \right)} \right) \times R(s_M )} \right\}. $$
(5)

Note that, at the last decision epoch, the project completes and no action is taken. By replacing the reward in Eq. (5) with its probability counterpart as defined in Eq. (4), the value function for each decision epoch under optimal policy is defined as

$$ V_m^* \left( {s_m } \right) = max_{a_m \in A_{(s_m )} } \left\{ { \sum \limits_{j \in J_i } Pr\left( {t_i \le T - \tau |s_m ,a_m } \right) \times P\left( {s_m ,s_{m + 1} } \right) \times V_{m + 1} (s_{m + 1} )} \right\}, $$
(6)

where the set \(J_i\) refers to the successors of node \(i\). The transition probability matrix (\(P\)) is an uncontrollable parameter, so it is independent of the actions and of the execution times of activities. In order to meet the project time limit, all sink nodes must occur at or before time \(T\). Therefore, for each sink node, \(V_M (s_M )\) is set to 1 for \(\tau \in {{\mathbb{T}}} = \left\{ {1,2, \ldots ,T} \right\}\), and 0 otherwise.

The optimal resource level for each state at decision epoch \(m\) is obtained from the relation

$$ a_m^* \in \arg \max \left\{ { \sum \limits_{j \in J_i } \Pr \left( {t_i \le T - \tau |s_m ,a_m } \right) \times P\left( {s_m ,s_{m + 1} } \right) \times V_{m + 1} (s_{m + 1} )} \right\}. $$
(7)
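As an illustration of Eqs. (6) and (7), the backward recursion can be sketched as follows. This is a minimal Python sketch: the container names `epochs`, `succ`, `P_occ`, `dur_pmf`, and `res_req` are our own conventions, and the expectation over the discrete duration pmf spells out the term \(\Pr(t_i \le T - \tau \,|\, s_m ,a_m )\).

```python
def solve_gert_mdp(epochs, succ, P_occ, dur_pmf, res_req, T, TR):
    """Backward value iteration for Eqs. (6)-(7).  Container names are ours:
    epochs[m]     -- nodes that can realize at decision epoch m (epochs[-1] = sinks)
    succ[i]       -- successor nodes of i
    P_occ[j]      -- occurrence probability of node j
    dur_pmf[i][a] -- {t_i: Pr(t_i | resource level a)}; dummy nodes -> {0: 1.0}
    res_req[i]    -- feasible resource levels pi_i of node i
    Returns V[m][(i, tau, r)] and the optimal resource level for each state."""
    M = len(epochs)
    V = [dict() for _ in range(M)]
    policy = {}
    for i in epochs[-1]:                              # boundary: sinks realized by T get value 1
        for tau in range(T + 1):
            for r in range(TR + 1):
                V[-1][(i, tau, r)] = 1.0
    for m in range(M - 2, -1, -1):                    # backward over decision epochs
        for i in epochs[m]:
            for tau in range(T + 1):
                for r in range(TR + 1):
                    best_val, best_a = 0.0, None
                    for a in res_req[i]:
                        if a > r:                     # resource constraint a_m <= r
                            continue
                        val = 0.0
                        for t, p_t in dur_pmf[i][a].items():
                            if tau + t > T:           # the activity must finish by T
                                continue
                            for j in succ[i]:         # STEOR branching with probability P_j
                                val += p_t * P_occ[j] * V[m + 1].get((j, tau + t, r - a), 0.0)
                        if val > best_val:
                            best_val, best_a = val, a
                    V[m][(i, tau, r)] = best_val
                    policy[(m, i, tau, r)] = best_a
    return V, policy
```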

An LP formulation is another approach to solving the MDP. With the given transition probability matrix \(P\) and the execution time distribution matrix \(Pr\), the problem

$$ \begin{gathered} Min Z = \mathop \sum \limits_{m = 1}^M \sum \limits_{i \in {{\mathbb{N}}}} \sum \limits_{\tau \in {{\mathbb{T}}}} \sum \limits_{r \in {{\mathbb{R}}}} V_m (i,\tau ,r), \hfill \\ S.t: \hfill \\ V_m \left( {i,\tau ,r} \right) \ge \sum \limits_{j \in J_i } Pr\left( {t_i \le T - \tau |s_m ,a_m } \right) \times P\left( {s_m ,s_{m + 1} } \right) \times V_{m + 1} (s_{m + 1} ), \;\;\;\;\forall m,i,\tau ,r,a_m \in A_{s_m } \hfill \\ V_m \left( {i,\tau ,r} \right) \ge 0,\;\;\;\;\forall m, i, \tau , r \hfill \\ \end{gathered} $$
(8)

provides the optimal value function for each state at the desired decision epoch (\(V_m (i,\tau ,r)\)).
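For completeness, a sketch of how the LP in (8) can be assembled and passed to an off-the-shelf solver (here `scipy.optimize.linprog`, purely as an illustration; the index bookkeeping and container names are our assumptions) is given below. Sink values \(V_M\) are fixed to 1 for \(\tau \le T\) and moved to the right-hand side of the constraints.

```python
import numpy as np
from scipy.optimize import linprog

def solve_gert_mdp_lp(epochs, succ, P_occ, dur_pmf, res_req, T, TR):
    """LP counterpart of (8) assembled for scipy's linprog (illustration only;
    the same container conventions as in the value-iteration sketch are assumed).
    Sink values V_M = 1 for tau <= T are constants moved to the right-hand side."""
    idx, states = {}, []
    for m, nodes in enumerate(epochs[:-1]):           # variables V_m(i, tau, r), non-terminal epochs
        for i in nodes:
            for tau in range(T + 1):
                for r in range(TR + 1):
                    idx[(m, i, tau, r)] = len(states)
                    states.append((m, i, tau, r))
    n = len(states)
    A_ub, b_ub = [], []
    for (m, i, tau, r) in states:
        for a in res_req[i]:
            if a > r:
                continue
            row, rhs = np.zeros(n), 0.0
            row[idx[(m, i, tau, r)]] = -1.0           # -V_m(s) + E[V_{m+1}] <= -constants
            for t, p_t in dur_pmf[i][a].items():
                if tau + t > T:
                    continue
                for j in succ[i]:
                    key = (m + 1, j, tau + t, r - a)
                    if key in idx:                    # next state is itself a variable
                        row[idx[key]] += p_t * P_occ[j]
                    else:                             # next state is a sink with V = 1
                        rhs -= p_t * P_occ[j]
            A_ub.append(row)
            b_ub.append(rhs)
    res = linprog(c=np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, None)] * n, method="highs")
    return {s: res.x[k] for k, s in enumerate(states)}
```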

4.2 Semi-Markov decision process (SMDP)

Notations

 

\({{\mathbb{N}}}\): Set of all nodes/activities

\({{\mathcal{N}}}\): Number of nodes/activities of the project, \({{\mathcal{N}}} = \left| {{\mathbb{N}}} \right|\)

\(i\): Index of activities/nodes of the network, \(i \in {{\mathbb{N}}} = \left\{ {1,2, \ldots ,{{\mathcal{N}}}} \right\}\)

\(J_i\): The set of successor nodes of node \(i\)

\(\pi_i\): The set of resource requirements for activity \(i\)

\(t_i\): Execution time/duration of activity \(i\)

\(T\): The pre-determined time limit on the project makespan

\({{\mathbb{T}}}\): Set of time points in the time horizon, \({{\mathbb{T}}} = \left[ {0,T} \right]\)

\(\tau\): The time moment at which an activity is activated, \(\tau \in {{\mathbb{T}}} = \left[ {0,T} \right]\)

\(TR\): The total amount of the considered non-renewable resource

\({{\mathbb{R}}}\): Set of remaining resource levels, \({{\mathbb{R}}} = \left\{ {0,1, \ldots ,TR} \right\}\)

\(r\): The amount of remaining resource, \(r \in {{\mathbb{R}}} = \left\{ {0,1, \ldots ,TR} \right\}\)

\(m\): Decision epochs or periods of the SMDP, \(m = 1,2, \ldots ,M\)

\(s_m\): The state of the system at decision epoch \(m\)

\({{\mathbb{S}}}\): The set of all states of the system

\(a_m\): The action taken at decision epoch \(m\)

\(A_{s_m }\): The set of all actions that can be chosen in state \(s_m\) at decision epoch \(m\)

\({{\mathbb{A}}}\): The set of all actions

\(\tau_m^{\prime}\): The time point at which the system occupies a new state

\(t_m^{\prime}\): The duration of remaining in the state

\(Q\left( {t_m^{\prime} ,s_{m + 1} {|}s_m ,a_m } \right)\): Probability of transition from state \(s_m\) at decision epoch \(m\) to state \(s_{m + 1}\) at decision epoch \(m + 1\) within \(t_m^{\prime}\) time units when action \(a_m\) is taken

\(F\left( {t^{\prime}_m |s_m ,a_m } \right)\): Sojourn time distribution in state \(s_m\) when action \(a_m\) is taken

\(P\left( {s_{m + 1} {|}s_m ,a_m } \right)\): Probability of transition from state \(s_m\) at decision epoch \(m\) to state \(s_{m + 1}\) at decision epoch \(m + 1\) when action \(a_m\) is taken

\(V_m \left( {s_m } \right)\): Value function of state \(s_m\) at decision epoch \(m\)

\(f\left( {t_i {|}s_m ,a_m } \right)\): Probability density function of the activity duration in state \(s_m\), given that \(a_m\) is taken

\(V_m^* \left( {s_m } \right)\): Value function of state \(s_m\) at decision epoch \(m\) under the optimal policy

\(a_m^*\): The optimal action chosen at decision epoch \(m\)

\(t_F\): Project completion time

Note: in this manuscript, \(j\) denotes a node index in the same way as \(i\).

In some applications, a dynamic system moves from the current state to another state at continuous random times. In this case, the time interval between transitions is a continuous random variable, and the decision epochs are random points in time. The MDP cannot be used to analyze decision making in this environment, since in an MDP the time interval between decision epochs is discrete and deterministic, so the decision points are pre-determined. Instead, we may use the SMDP, a generalization of the MDP framework.

In an SMDP, the decision-maker controls the system states continuously over the time horizon. Let us suppose that the system occupies the state \(s_0 \in {{\mathbb{S}}}\) at the initial time \(\tau_0^{\prime} \in R_+\), where \(R_+ = [0,\infty )\). Then, if the decision-maker chooses an action \(a_0 \in A_{s_0 }\), the system transfers to a new state \(s_1 \in {{\mathbb{S}}}\) at or before \(t^{\prime}_0 \) time units with probability \(Q(t_0^{\prime} ,s_1 |s_0 ,a_0 )\). At time \(\tau_1^{\prime} = \tau_0^{\prime} + t_0^{\prime}\) another action \(a_1 \in A_{s_1 }\) is chosen, and the same procedure repeats for the following states. In some applications, \(Q(t_m^{\prime} ,s_{m + 1} |s_m ,a_m )\), which is a joint distribution, may be written as the product of the relevant marginal distributions as described in the following equation:

$$ Q\left( {t_m^{\prime} ,s_{m + 1} {|}s_m , a_m } \right) = F(t_m^{\prime} |s_m ,a_m ) \times P(s_{m + 1} |s_m ,a_m ), $$
(9)

where \(F(t_m^{\prime} |s_m ,a_m )\) denotes the sojourn time distribution in the state \(s_m\) when action \(a_m\) is chosen, and the \(P\left( {s_{m + 1} {|}s_m ,a_m } \right)\) represents the transition probability of moving from state \(s_m\) to state \(s_{m + 1}\) when action \(a_m\) is chosen.

Like the MDP, as a consequence of choosing an action, the decision-maker accrues a reward/cost dependent on the current state and the chosen action, and the expected total of such rewards from the current decision epoch onward is the value function of the current state and the selected action. Therefore, the goal is to optimize the expected total reward and identify the optimal policy. Due to the similarity of relations in MDP and SMDP and for brevity, we refer the readers to the explanations of optimizing the expected reward in MDP models described in Sect. 4.1.

For our GERT problem, the project progresses gradually over time: each activity completes at a random point in time, and the state of the system changes when an activity completes and a successor activity starts. More specifically, the execution time of each activity (\(t_i\)) follows a continuous distribution function and is conditioned on the resource level allocated to that activity. Hence, the SMDP is the proper approach for analyzing the project makespan. At decision epoch \(m\), the system occupies the state \(s_m = \left( {i,\tau ,r} \right), s_m \in {{\mathbb{S}}} = ({{\mathbb{N}}},{{\mathbb{T}}},{{\mathbb{R}}})\), where \(i\) represents the node realized at decision epoch \(m\), \(\tau\) refers to the time point of node realization, and \(r\) shows the available resource level. For each state \(s_m \in {{\mathbb{S}}}\), a set of possible actions is available (\(A_{s_m } = \pi_i\)), and it is assumed that the sets \({{\mathbb{S}}}\) and \(A_{s_m }\) are finite. As a consequence of choosing the action \(a_m \in A_{s_m }\), the state of the system transits from \(s_m = \left( {i,\tau ,r} \right)\) to the state \(s_{m + 1} = (j, \tau + t_i , r - a_m )\) within \(t_i\) time units with the joint probability:

$$ \begin{aligned} Q\left( {t_i ,s_{m + 1} {|}s_m , a_m } \right) & = P\left( {s_{m + 1} = \left( {j, \tau + t_i , r - a_m } \right),t_i |s_m = \left( {i,\tau ,r} \right),a_m } \right) \\ & = F\left( {t_i {|}s_m ,a_m } \right) \times P\left( {s_{m + 1} {|}s_m ,a_m } \right). \\ \end{aligned} $$
(10)

In this equation, \(F(t_i |s_m ,a_m )\) represents the cumulative distribution function of the realized activity's duration given that action \(a_m\) is chosen in state \(s_m\), and \(P(s_{m + 1} |s_m ,a_m )\) denotes the probability of transition from state \(s_m\) to state \(s_{m + 1}\) given action \(a_m\). As a consequence of choosing action \(a_m\), a reward equal to the probability of activity completion during the remaining horizon (\(T - \tau\)) is obtained.

The decision-maker aims to maximize the project completion probability in the desired time limit (\(T\)) subject to the resource limitation. Recall that this study uses a multiplicative form of expected value function rather than the classic additive form. The reason is that the obtained reward equals the probability of activity completion at a specified time, and probability is a multiplicative factor. The recursive function to maximize the expected total reward is defined as

$$ V_m \left( {s_m } \right) = \max_{a_m \in A_{s_m } } \left\{ { \sum \limits_{j \in J_i } \mathop \int \limits_0^{T - \tau } dQ(t_i ,s_{m + 1} |s_m ,a_m ) \times V_{m + 1} (s_{m + 1} )} \right\}, $$
(11)

where \(\mathop \int \limits_0^{T - \tau } dQ(t_i ,s_{m + 1} |s_m ,a_m )\) refers to the joint probability of the activity (\(i\)) completion within time \(t_i , \forall t_i \le T - \tau\), and the transition from the state \(s_m\) to the state \(s_{m + 1}\). \(V_{m + 1} (s_{m + 1} )\) denotes the total expected value obtained from the decision epoch \(m + 1\) onward. In this equation, \(J_i \) is the set of each node's successors. As the network arcs occur with a probability of less than one, all the successor arcs must be taken into account to evaluate the desired node's total reward. By placing the right-hand side of Eq. (10) in Eq. (11), the expected value function for each decision epoch is as follows:

$$ \begin{aligned} \mathop \int \limits_0^\infty dF\left( {t_i {|}s_m ,a_m } \right) & = \mathop \int \limits_0^\infty f\left( {t_i {|}s_m ,a_m } \right)dt, \\ V_m^* \left( {s_m } \right) & = {\text{max}}_{a_m \in A_{(s_m )} } \left\{ { \sum \limits_{j \in J_i } \mathop \int \limits_0^{T - \tau } f(t_i |s_m ,a_m ) \times P(s_{m + 1} |s_m ,a_m ) \times V_{m + 1} (s_{m + 1} )dt} \right\}. \\ \end{aligned} $$
(12)

As in the MDP, \(V_M \left( {s_M } \right) = 1, \forall \tau \le T,\) for each sink node and 0 otherwise. Obviously, \(V_1 (s_1 )\) gives the total probability of project completion over the horizon \(T\). The optimal resource allocation policy for each state can be derived from the following equation:

$$ a_{(m)}^* \in argmax\left\{ { \sum \limits_{j \in J_i } \mathop \int \limits_0^{T - \tau } f(t_i |s_m ,a_m ) \times P(s_{m + 1} |s_m ,a_m ) \times V_{m + 1} (s_{m + 1} )dt} \right\}. $$
(13)

The equivalent LP formulation for the developed SMDP model is given in (14) (Puterman, 1994). With the given transition probability matrix \(P\) and the execution time probability density (\(f(x)\)), the problem

$$ \begin{gathered} Min Z = \sum \limits_{m \in [0,T]} \sum \limits_{i \in {{\mathbb{N}}}} \sum \limits_{\tau \in {{\mathbb{T}}}} \sum \limits_{r \in {{\mathbb{R}}}} V_m (i,\tau ,r), \hfill \\ S.t: \hfill \\ V_m \left( {i,\tau ,r} \right) \ge \sum \limits_{j \in J_i } \mathop \int \limits_0^{T - \tau } f\left( {t_i {|}s_m ,a_m } \right) \times P\left( {s_{m + 1} {|}s_m ,a_m } \right) \times V_{m + 1} \left( {s_{m + 1} } \right)dt,\;\;\;\; \forall m,i,\tau ,r,a_m \in A_{s_m } \hfill \\ V_m \left( {i,\tau ,r} \right) \ge 0\;\;\;\;\forall m, i,\tau ,r \hfill \\ \end{gathered} $$
(14)

determines the optimal value function for each state (\(V_m \left( {i,\tau ,r} \right)\)). Since in the SMDP \(t_i\) is a continuous random variable and the interval \(\left[ {0,T} \right]\) contains infinitely many time points at which decisions may be made, this interval must be divided into short intervals so that time can be treated as a discrete variable.
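A discretized version of the recursion in Eq. (12) can be sketched as follows for exponentially distributed durations. This is a minimal Python sketch: `rate[i][a]` and the other container names are illustrative, the integral is approximated with a rectangle rule, and zero-duration dummy nodes would need separate handling.

```python
import math

def solve_gert_smdp(epochs, succ, P_occ, rate, res_req, T, TR, dt=0.25):
    """Discretized backward recursion for Eq. (12), assuming exponential durations:
    rate[i][a] is a hypothetical rate lambda of activity i under resource level a,
    so f(t | s_m, a_m) = lambda * exp(-lambda * t).  The integral over [0, T - tau]
    is approximated with a rectangle rule of width dt."""
    steps = int(T / dt)
    M = len(epochs)
    V = [dict() for _ in range(M)]
    for i in epochs[-1]:                               # sinks: V_M = 1 for tau <= T
        for k in range(steps + 1):
            for r in range(TR + 1):
                V[-1][(i, k, r)] = 1.0
    for m in range(M - 2, -1, -1):
        for i in epochs[m]:
            for k in range(steps + 1):                 # tau = k * dt
                for r in range(TR + 1):
                    best = 0.0
                    for a in res_req[i]:
                        if a > r:
                            continue
                        lam, val = rate[i][a], 0.0
                        for kk in range(1, steps - k + 1):          # t = kk * dt <= T - tau
                            f_dt = lam * math.exp(-lam * kk * dt) * dt   # f(t) dt
                            for j in succ[i]:
                                val += f_dt * P_occ[j] * V[m + 1].get((j, k + kk, r - a), 0.0)
                        best = max(best, val)
                    V[m][(i, k, r)] = best
    return V
```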

5 Numerical examples

This section presents numerical experiments that validate the MDP and SMDP models using the proposed value iteration method and LP formulation.

5.1 Computational setup

The LP formulations of the MDP and SMDP models are coded in GAMS ver. 24.1.2, and the other methods are coded in MATLAB R2019b. The numerical instances are run on a laptop with an Intel Core i5-3210M (2.50 GHz) CPU and 6 GB of memory, running Windows 10 x64.

5.2 Examples group setups

To the best of our knowledge, there is no benchmark for this problem in the related literature, so we generated two completely random GERT networks, which are modeled by the MDP and the SMDP. These models are solved using value iteration as well as the LP formulation. Besides, we used seven example groups of different problem scales, each consisting of 5 test instances, where the scale of a problem is its number of activities. All these example groups are adapted from the MRCPSP instances provided in the PSPLIB library, a widely known data set. Since the PSPLIB instances are deterministic, we only adopt the network structure and randomly generate the other parameters.

5.3 Discrete-time GERT network

First, we generated a random STEOR network with AON representation and supposed that the project must be completed within the time limit \(T\) and that only \(TR\) units of a non-renewable resource are available. Let us define \(V = \{ {{\mathbb{N}}}, E\}\), where \({{\mathbb{N}}} = \{ 1,2, \ldots ,12\}\) represents the activities of the project and \(E = \{ (i,j)|i,j \in {{\mathbb{N}}}\}\) depicts the stochastic relations among the activities. Nodes {1,12} are dummy nodes denoting the source and sink, respectively. Figure 1 illustrates the generated network schematically, in which the value on each node represents its occurrence probability (\(P_i\)); in real cases, these values can be estimated from historical data. In this example, we generated the \(P_i\) randomly from a uniform distribution on the interval \(\left[ {0,1} \right]\), with the probabilities of the arcs emanating from a node summing to one. The duration of each activity (\(t_i\)) follows a general discrete distribution and is conditioned on the allocated resource level (\(\Pr_i \left( {\pi_i , t_i } \right):\left( {R^+ \times R^+ } \right) \to [0,1]\)), where \(\pi_i\) represents the resource level of activity \(i\) and \(t_i\) is the duration of that activity.

It is assumed that each activity can be executed in two different modes, and solving the problem to find the optimal resource allocation policy determines the optimal execution mode. Table 4 shows the correlation between the resource level and the probability distribution of \(t_i\).

Table 4 The distribution function of \({t}_{i}\)

Note that the dummy nodes need no resource and no time. The different resource levels that can be used to activate each activity are shown in column \(\pi_i\). The column \(Pr_i (\pi_i , t_i )\) reports the probability that activity \(i\) lasts \(t_i\) time units while consuming \(\pi_i\) units of the resource. All these values are generated randomly so that the network matches the GERT structure.

As discussed in the previous sections, we attempt to find an optimal solution for a TCT-RCPSP with GERT structure using the MDP model. For a GERT network, two essential parameters must be computed. We implement the formulations provided by Pritsker (1966) to determine the equivalent probability of the sink node. Considering the correlation between time and resource, the w-function of each node is conditioned on the allocated resource level.

$$ \begin{aligned} w_j &= P_j \times M_{t_j } \left( u \right) = P_j \times \sum \limits_{t_j } e^{t_j u} Pr(\pi_j , t_j ) \\ w_{E_{{\mathcal{N}}} } \left( u \right) &= \frac{{\sum_k {\text{Path}} (k)\left[ {1 + \sum_m \left( { - 1} \right)^m ({\text{loops of order }}m{\text{ not touching Path }}k)} \right]}}{{\left[ {1 + \sum_m \left( { - 1} \right)^m ({\text{loops of order }}m)} \right]}} \\ & = \frac{{\begin{array}{*{20}c} {\left( {w_1 w_2 w_5 w_8 w_{12} } \right) + \left( {w_1 w_3 w_4 w_7 w_8 w_{12} } \right) + \left( {w_1 w_3 w_4 w_{11} w_{12} } \right)} \\ { + \left( {w_1 w_3 w_6 w_9 w_{12} } \right) + \left( {w_1 w_3 w_6 w_{10} w_{12} } \right)} \\ \end{array} }}{1}, \\ \end{aligned} $$
(15)

where Path(\(k\)) is the product of the w-functions along a sequence of arcs from the source node to the desired sink node, and a loop of order \(m\) refers to the product of \(m\) disjoint loops (\(m = 1,2, \ldots\)). As the probability of arc occurrence is independent of the execution time, setting the variable \(u\) to zero yields the realization probability of each sink node, so \(P_E = w_E \left( 0 \right)\). Due to the uncertainty in the structure of the GERT network, the project execution path is stochastic, but the sink node occurs with probability 1. This result is verified as

$$ \begin{aligned} w_{E_{12} } \left( u \right) & = \frac{{\begin{array}{*{20}c} {\left( {w_1 w_2 w_5 w_8 w_{12} } \right) + \left( {w_1 w_3 w_4 w_7 w_8 w_{12} } \right) + \left( {w_1 w_3 w_4 w_{11} w_{12} } \right)} \\ { + \left( {w_1 w_3 w_6 w_9 w_{12} } \right) + \left( {w_1 w_3 w_6 w_{10} w_{12} } \right)} \\ \end{array} }}{1}, \\ P_E \left( {12} \right) & = w_{E_{12} } \left( u \right)|_{u = 0} = \frac{1}{4} + \frac{3}{16} + \frac{3}{16} + \frac{1}{8} + \frac{1}{4} = 1. \\ \end{aligned} $$
(16)
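A quick numerical check of Eq. (16): at \(u = 0\) the w-function of a path reduces to the product of the occurrence probabilities along it. The probabilities in the Python snippet below are illustrative values chosen to be consistent with the path products \(1/4, 3/16, 3/16, 1/8, 1/4\) (they are not read off Fig. 1); nodes with a single emanating arc are realized with probability one.

```python
import math
from fractions import Fraction as F

# Node occurrence probabilities (illustrative values consistent with the path
# products 1/4, 3/16, 3/16, 1/8, 1/4; single-successor arcs get probability 1).
P = {1: F(1), 2: F(1, 4), 3: F(3, 4), 4: F(1, 2), 5: F(1), 6: F(1, 2),
     7: F(1, 2), 8: F(1), 9: F(1, 3), 10: F(2, 3), 11: F(1, 2), 12: F(1)}
paths = [[1, 2, 5, 8, 12],
         [1, 3, 4, 7, 8, 12],
         [1, 3, 4, 11, 12],
         [1, 3, 6, 9, 12],
         [1, 3, 6, 10, 12]]

path_probs = [math.prod(P[i] for i in path) for path in paths]
print([str(p) for p in path_probs])   # ['1/4', '3/16', '3/16', '1/8', '1/4']
print(sum(path_probs))                # 1 -> the sink node is reached with probability one
```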

Because the duration distributions of the execution modes have the same form, computing the moment generating function for one mode is sufficient. The contribution of this paper is to determine the probability distribution function of the makespan: the decision-maker determines the optimal resource allocation policy subject to the levels in column \(\pi_i\), and given the resource level allocated to each realized activity, the makespan distribution can be computed. To this aim, the MDP model is adapted as below.

In this discrete-time example, as stated in Sect. 4.1, a decision epoch is a set of nodes rather than a time point; in other words, the procedure is similar to applying dynamic programming to the longest path problem (see Puterman, 1994). The nodes in a decision epoch must have joint successors and must not be linked, but this property does not hold for the nodes of the generated network. So, the network must be modified using Algorithm 1. Table 5 illustrates the steps of Algorithm 1 applied to the network.

Table 5 Steps of Algorithm 1

Figure 2 demonstrates the modified form of the generated network. In the example of Fig. 1, nodes {1,3,4} need to be split to form the decision epochs depicted in Fig. 2. In this regard, three artificial nodes {1*, 3*, 4*} are added to the network. If any of the nodes {1*, 3*, 4*} occurs, its successor activity will happen with certainty. The duration of the artificial nodes {1*, 3*, 4*} is zero.

The set of actions that can be chosen in each state (\(A_{s_m }\)) is equivalent to the set of resource levels that the activity realized in that state can use (\(\pi_i\)). The probability of the stochastic transition of the system from state \(s_m = (i,\tau ,r)\) to state \(s_{m + 1} = (j,\tau + t_i ,r - a_m )\) equals the occurrence probability of node \(j\). These transition probabilities are generated randomly,

$$ P\left( {s_{m + 1} = \left( {j,\tau + t_i ,r - a_m } \right){|}s_m = \left( {i,\tau ,r} \right),a_m } \right) = P_j . $$

By considering the resource limitation, applying Eqs. (6) and (7) yields the optimal value function and the optimal policy. Tables 6, 7, 8 and 9 show the obtained optimal value function of each state \(s_m = (i,\tau ,r)\) at decision epoch \(m\), given that \(T = 16\) and \(TR = 10\).

Table 6 Value function for \(m=5\)
Table 7 Value function for \(m=4\)
Table 8 Value function for \(m=3\)
Table 9 Value function for \(m = 2\)

Note that the values of \(\tau\) and \(r\) represent the potential realization times of each node and the potential remaining resource levels, respectively. These values are calculated from all paths, considering the different execution modes and the duration range of each activity on the path. For example, node \(i = 11\) occurs on a single path \(\left( {\left\{ {1,3,4,4^* ,11,12} \right\}} \right)\). The durations of activities 1 and \(4^*\) are zero, activity 3 can take {2,3,4} time units, and activity 4 can take {6,7,8} time units. Hence, activity 11 can start at {8,9,10,11,12}.
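The enumeration of the feasible \(\tau\) values can be sketched as follows (Python; the helper name, the path list, and the duration sets for nodes 11 and 12 are illustrative placeholders).

```python
from itertools import product

def reachable_start_times(paths, durations):
    """Enumerate the possible activation times tau of every node on the given
    paths by summing one duration choice per predecessor (names are ours)."""
    starts = {}
    for path in paths:
        for pos, node in enumerate(path):
            for combo in product(*(durations[i] for i in path[:pos])):
                starts.setdefault(node, set()).add(sum(combo))
    return starts

# The example from the text: node 11 on the path {1, 3, 4, 4*, 11, 12}; the
# duration sets of nodes 11 and 12 below are placeholders.
durs = {1: {0}, 3: {2, 3, 4}, 4: {6, 7, 8}, "4*": {0}, 11: {1, 2}, 12: {0}}
print(sorted(reachable_start_times([[1, 3, 4, "4*", 11, 12]], durs)[11]))
# -> [8, 9, 10, 11, 12]
```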

According to the hard constraint on the project completion time, it is supposed that \(V_6 \left( {i,\tau ,r} \right) = 1, \forall i \in {{\mathbb{N}}},{\rm{ }}\tau \le T, r \le TR\) and \(V_6 \left( {i,\tau ,r} \right) = 0\) otherwise. The equivalent probability of project completion within \(T\) under the best policy is \(V_1 \left( {1,0,10} \right) = 0.99\). Resource management is a crucial aspect of project management that significantly impacts the success of project execution. The sensitivity analysis of the project makespan distribution with respect to the available resource (\(TR\)) provides useful insight for managers when estimating an appropriate delivery time. Figure 3 depicts the sensitivity of the project makespan distribution to a \(\pm 20\%\) change in \(TR\) for the generated sample project.

Fig. 3 The sensitivity analysis of the project makespan distribution with respect to changes in \(TR\)

As more resources become available, the confidence in project completion within \(T\) increases, while fewer resources lead to a smaller probability of completing the project by a specific time. This result helps the manager decide whether increasing resources is cost-effective given the amount of time saved.

5.4 Comparison of MDP model with GA and Monte-Carlo simulation

The efficiency of the provided MDP structure is validated by comparison with GA and Monte-Carlo simulation, the extensively studied approaches in the related literature. The parameters of the GA are tuned using the Taguchi experimental design technique; Table 10 lists the GA parameter settings for each example group. In this study a classic GA is implemented, in which the initial solution is generated randomly and improved through 1-point crossover and swap mutation operators.

Table 10 Parameter settings of GA
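For reference, the structure of the classic GA used as a comparison baseline can be sketched as follows. This is a minimal skeleton under our own assumptions: the default parameter values, the truncation selection, and the elitism step are placeholders rather than the Taguchi-tuned settings of Table 10, and `fitness` is assumed to estimate \(Pr(t_F \le T)\) for a chromosome of per-activity execution modes, e.g. from Monte-Carlo runs of the GERT network.

```python
import random

def run_ga(n_activities, n_modes, fitness, pop_size=50, p_cx=0.8, p_mut=0.1, iters=200):
    """A chromosome fixes one execution mode (resource level) per activity;
    `fitness` returns an estimate of Pr(t_F <= T) for a chromosome."""
    pop = [[random.randrange(n_modes) for _ in range(n_activities)]
           for _ in range(pop_size)]
    for _ in range(iters):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = ranked[:2]                                   # elitism: carry the two best over
        while len(nxt) < pop_size:
            p1, p2 = random.sample(ranked[:pop_size // 2], 2)   # truncation selection
            child = p1[:]
            if random.random() < p_cx:                     # 1-point crossover
                cut = random.randrange(1, n_activities)
                child = p1[:cut] + p2[cut:]
            if random.random() < p_mut:                    # swap mutation
                a, b = random.sample(range(n_activities), 2)
                child[a], child[b] = child[b], child[a]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```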

Table 11 compares the computational experiments of all methods for instances adapted from PSPLIB. The table reports the probability of project completion at or before the deadline (\(Pr(t_F \le T)\)) given the available resource limitation (\(TR\)); the horizon limitation for each instance is shown in column \(T\). In Table 11, \({{\mathcal{N}}}\) refers to the number of network nodes, while \(AN\) is the number of nodes in the modified network (Algorithm 1 adds some artificial nodes to the original network). To solve the MDP model, value iteration and the LP method are applied as analytical methods. The value of \(V_1 \left( {1,0,TR} \right)\) for each \(T\), shown in Table 11, is the probability that the project completes within the time interval \([0,T]\). The columns \(CT\) report the computational time of each method. In line with the time complexity of the MDP, the computational time increases as the number of states grows. Comparing the value iteration and LP results with those obtained by GA indicates that the analytical solutions are an upper bound on the GA results; GA, as a metaheuristic, finds near-optimal solutions. As the computation time of GA depends on the population size and the number of iterations of the algorithm, a direct time comparison with the other methods is not fair.

Table 11 The distribution function of equivalent time parameter (\(t_F\))

On the other hand, the same conclusion holds for the MC simulation results. According to the results in Table 11, the values obtained through value iteration and the LP formulation are an upper bound on the MC simulation values. The convergence of the MC simulation to the MDP for different numbers of simulations is illustrated in Fig. 4, where the J302 example class is considered. The mean square error (MSE) is used as an index of the deviation of the MC simulation results from those of the analytical approach (MDP). These values are computed based on Eq. (17) and reported in Fig. 4; the low MSE values confirm the efficiency of the proposed MDP methodology.

$$ {\text{MSE}} = \sum \limits_{{\text{NS}}} \frac{{\left( {{\text{objective value of value iteration}} - {\text{objective value of MC simulation}}} \right)^2 }}{{{\text{Number of simulations}}}}, $$
(17)

where \({\text{NS}}\) is the number of simulations.
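Read this way, Eq. (17) averages the squared deviation of the Monte-Carlo estimates from the value-iteration objective; a one-line Python sketch (argument names are illustrative):

```python
def mse(v_value_iteration, v_mc_estimates):
    """Mean squared deviation of the Monte-Carlo estimates from the
    value-iteration objective, as in Eq. (17) (argument names are ours)."""
    return sum((v_value_iteration - v) ** 2 for v in v_mc_estimates) / len(v_mc_estimates)

# e.g. mse(0.99, [0.97, 0.98, 1.00]) -> a small value signals convergence
```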

Fig. 4 Convergence of Monte-Carlo simulation results to those of the MDP

As the state of the problem is three-dimensional and depends on the number of nodes, the time horizon, and the total resource, increasing the instance scale increases the running time. Although the value iteration time increases for \({{\mathcal{N}}} \ge 16\), the computational time of the LP formulation is significantly less than that of the Monte-Carlo simulation. Moreover, the MDP can serve as a baseline against which future simulation methods in the literature can be validated.

To study a wider range of examples, the MDP was applied to samples with \({{\mathcal{N}}} \ge 50\). Since adding one node to the network introduces additional artificial nodes, the number of variables may increase dramatically. In this case, the GAMS solver is unable to solve the LP model efficiently, so the results for these examples are not reported.

The obtained results on the maximum probability of project completion within \(T\), together with the sensitivity analysis of the available resource, provide the contractor with a forecast of the most likely project completion time. They also give the associated managers insight for negotiating the time and required resources and for preventing breaches of contract.

5.5 Continuous-time GERT network

In this subsection, we convert the randomly generated network of Sect. 5.3 into a network in which the duration of each activity follows a continuous distribution and is conditioned on the allocated resource level. The precedence relations among activities are the same as in the network of Fig. 1, and the value on each node represents its occurrence probability (\(P_j\)). The duration of each activity (\(t_i\)) follows the distribution given in Table 12, and there are two different resource levels to execute each activity, represented by \(\pi_i\). Recall that dummy nodes require no resource and no time. In the generated example, we assume that the duration of each activity follows an exponential distribution, although, thanks to the generality of the proposed solution method, other distributions can be considered as well.

Table 12 The distribution function of \(t_i\)

Although the relations among activities are stochastic, the sink node is realized with probability one. This result can be verified as

$$ \begin{aligned} w_{E_{12} } \left( u \right) & = \frac{{\sum_k {\text{Path}} (k)\left[ {1 + \sum_m \left( { - 1} \right)^m ({\text{loops of order }}m{\text{ not touching Path }}k)} \right]}}{{\left[ {1 + \sum_m \left( { - 1} \right)^m ({\text{loops of order }}m)} \right]}} \\ & = \frac{{\begin{array}{*{20}c} {\left( {w_1 w_2 w_5 w_8 w_{12} } \right) + \left( {w_1 w_3 w_4 w_7 w_8 w_{12} } \right) + \left( {w_1 w_3 w_4 w_{11} w_{12} } \right)} \\ { + \left( {w_1 w_3 w_6 w_9 w_{12} } \right) + \left( {w_1 w_3 w_6 w_{10} w_{12} } \right)} \\ \end{array} }}{1}, \\ w_j & = P_j \times M_{t_j } \left( u \right) = P_j \times \mathop \int \limits_0^\infty e^{t_j u} f(t_j )dt_j \;\;\;\; \forall (i,j) \in Path(k) \\ P_E \left( {12} \right) & = w_{E_{12} } \left( u \right)|_{u = 0} = \frac{1}{4} + \frac{3}{16} + \frac{3}{16} + \frac{1}{8} + \frac{1}{4} = 1. \\ \end{aligned} $$
(18)

The challenge is to determine the project makespan probability distribution subject to the limitation on the available resource. To overcome this challenge, we adopt the SMDP model, because, due to the continuity of the time distribution, the interval between decisions on allocating resource to the next activity is uncertain. In this situation, as stated in Sect. 4.2, a decision epoch occurs when an activity ends and its successor activity starts. In the SMDP, the system transits from state \(s_m = (i,\tau ,r)\) to the state \(s_{m + 1} = (j, \tau + t_i ,r - a_m )\) within the time interval \(t_i\) with the joint probability \(Q\left( {t_i ,s_{m + 1} {|}s_m ,a_m } \right) = (1 - e^{ - {\uplambda }t_i } |s_m , a_m ) \times P_j\).

Due to the correlation of time and resource, the decision-maker seeks to maximize the probability of project completion within time limit \(T\) subject to available resource \(TR\).

Since \(\tau\) is a continuous variable, the start time of an activity varies over the interval [0, \(T\)], and the value function would have to be evaluated at an uncountably infinite number of points. Therefore, to solve the problem in a reasonable time using Eq. (12), we discretize \(\tau\) by dividing it into short time intervals and use a discrete-time approximation of each activity's duration, as illustrated by the sketch below.
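The following Python sketch shows one possible way to implement this discretization and the backward recursion on a small toy fork network; the node indices, exponential rates, resource levels, and the recursion itself are illustrative assumptions in the spirit of Eq. (12), not the authors' implementation.

```python
import math
from functools import lru_cache

DT = 1.0          # discretization step for tau
T = 16            # time horizon
TR = 10           # total non-renewable resource

# Toy network: successors[i] = list of (j, P_j); node 3 plays the role of the sink.
successors = {1: [(2, 0.6), (3, 0.4)], 2: [(3, 1.0)], 3: []}

# Two hypothetical resource levels per activity: (resource units a, exponential rate lambda).
actions = {1: [(1, 0.5), (2, 1.0)], 2: [(1, 0.4), (2, 0.8)]}

def discretized_exponential(lam, n_steps):
    """P(duration falls in the k-th interval of width DT), for k = 1..n_steps."""
    return [math.exp(-lam * (k - 1) * DT) - math.exp(-lam * k * DT)
            for k in range(1, n_steps + 1)]

@lru_cache(maxsize=None)
def V(i, tau, r):
    """Maximum probability of reaching the sink by time T from state (i, tau, r)."""
    if tau > T or r < 0:
        return 0.0
    if i == 3:                      # sink node: project completed within the horizon
        return 1.0
    best = 0.0
    for a, lam in actions[i]:
        if a > r:
            continue
        probs = discretized_exponential(lam, int((T - tau) / DT))
        value = sum(p_dur * P_j * V(j, tau + k * DT, r - a)
                    for k, p_dur in enumerate(probs, start=1)
                    for j, P_j in successors[i])
        best = max(best, value)
    return best

print(round(V(1, 0.0, TR), 4))   # Pr(project completes within T) from the initial state
```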

Figure 5 depicts the obtained results for \(T = 16\) and \(TR = 10\). Since the project must be completed at or before time \(T\), we set \(V_M \left( {i,\tau ,r} \right) = 1, \forall i \in {{\mathcal{N}}}, \tau \le T, r \le TR\) and \(V_M \left( {i,\tau ,r} \right) = 0\) otherwise, which is represented by the diagram for \(i = 12\). In Fig. 5, each diagram illustrates the probability of project completion from the corresponding node, given that its activity starts at time point \(\tau\) (the horizontal axis), subject to the remaining available resource.

Fig. 5 The value function for each state in the continuous-time GERT network

Figure 6 depicts the sensitivity analysis of our randomly generated instance with respect to resource variation. \(TR = 12\) yields a completion probability of one for every \(T \in [11, 20]\), whereas \(TR = 8\) leads to small probabilities for every \(T\). This implies that more budget must be spent and more resources prepared to execute the project with greater confidence by the desired time.

Fig. 6 The sensitivity analysis of the project makespan distribution with respect to the changes in \(TR\)

5.6 Comparison of SMDP with GA and Monte-Carlo simulation

Similar to the discrete-time GERT network, additional examples are adapted from PSPLIB and analyzed by generating random values for the required parameters. The efficiency of the SMDP model is demonstrated by comparing its results with those of the GA and MC simulation techniques. The parameters of the classic GA are tuned with the Taguchi experimental design approach; the resulting settings are listed in Table 13. This study implements 1-point crossover and swap mutation operators, sketched below.

Table 13 The parameter settings for GA
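For reference, a minimal Python sketch of the two GA operators is given below; the chromosome encoding (one resource-level index per activity) and the mutation probability are illustrative assumptions rather than the exact implementation used in this study.

```python
import random

def one_point_crossover(parent1, parent2):
    """1-point crossover: children exchange the tails after a random cut point."""
    cut = random.randint(1, len(parent1) - 1)
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2

def swap_mutation(chromosome, p_mut=0.1):
    """Swap mutation: with probability p_mut, exchange two randomly chosen genes."""
    mutant = chromosome[:]
    if random.random() < p_mut:
        i, j = random.sample(range(len(mutant)), 2)
        mutant[i], mutant[j] = mutant[j], mutant[i]
    return mutant

# Example usage: each chromosome encodes one resource-level index per activity.
p1 = [0, 1, 1, 0, 1, 0]
p2 = [1, 1, 0, 0, 0, 1]
c1, c2 = one_point_crossover(p1, p2)
print(c1, c2, swap_mutation(c1))
```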

The SMDP model is solved using both the value iteration and LP formulation methods. Table 14 reports the probability that the project completes within the time horizon \(T\) (\(Pr (t_F \le T)\)), which equals \(V_1 (1,0,TR)\), for both solution approaches. As in Table 11, the computation time of the GA is not reported in Table 14 because it depends on the parameter settings. According to the obtained results, the values of the GA and simulation methods converge to those of the SMDP solution methodology, which validates the efficiency of the approach in solving the problem. The convergence of the MC simulation results to those of the SMDP for the J302 example class is depicted in Fig. 7. At some points, the value function of the MC simulation is higher than that of the SMDP, mostly for short time horizons. The reason is that, in solving the SMDP, time is discretized and some time points are therefore not considered. This issue can be mitigated by shortening the discretization interval; however, in this study a fixed unit is used to discretize the time interval in order to have a common baseline for comparison. For the discussed example class, the MSEs are computed using Eq. (17) and reported in Fig. 7. The low MSE values demonstrate the accuracy of the SMDP relative to the MC simulation, one of the most widely studied solution methodologies in the literature.
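To illustrate how such an MC benchmark and MSE comparison can be set up, the sketch below simulates a small toy network (similar to the one sketched earlier, with a fixed, hypothetical resource choice), estimates \(Pr(t_F \le T)\) empirically, and computes the MSE against a vector of reference values; the network, rates, and reference numbers are placeholders for illustration and are not the data of Table 14 or Fig. 7.

```python
import random

successors = {1: [(2, 0.6), (3, 0.4)], 2: [(3, 1.0)]}   # node 3 is the sink
rates = {1: 1.0, 2: 0.8}        # hypothetical rates under a fixed resource choice

def simulate_makespan():
    """Sample one realization of the toy GERT network and return its makespan."""
    node, t = 1, 0.0
    while node != 3:
        t += random.expovariate(rates[node])             # activity duration
        nodes, probs = zip(*successors[node])
        node = random.choices(nodes, weights=probs)[0]   # stochastic branching
    return t

def mc_completion_probability(T, n_runs=20_000):
    return sum(simulate_makespan() <= T for _ in range(n_runs)) / n_runs

def mse(simulated, reference):
    return sum((s - r) ** 2 for s, r in zip(simulated, reference)) / len(reference)

horizons = range(2, 11)
mc_values = [mc_completion_probability(T) for T in horizons]
# Placeholder reference values standing in for the SMDP value function at each horizon.
smdp_values = [0.55, 0.70, 0.80, 0.87, 0.91, 0.94, 0.96, 0.975, 0.985]
print(mse(mc_values, smdp_values))
```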

Table 14 The distribution function of the equivalent time parameter (\(t_F\))
Fig. 7 The convergence of the MC simulation results to those of the SMDP

The time complexity of the SMDP is \(O\left( {\left| {{\mathbb{A}}} \right| \times \left| {{\mathbb{S}}} \right|^2 } \right)\) and depends on the number of states and actions. In contrast to the MDP, the computation time of the value iteration method for this problem increases smoothly, because the SMDP model contains no artificial nodes and the problem scale therefore grows more slowly. Accordingly, for \({{\mathcal{N}}} \le 32\) the computational time of value iteration is lower than that of the Monte-Carlo simulation.
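As a back-of-the-envelope illustration of this complexity, the snippet below estimates the number of value iteration operations for a few hypothetical problem sizes, assuming the state space is approximately \(|{\mathbb{S}}| \approx {{\mathcal{N}}} \times (T/\Delta\tau) \times TR\); both the sizing formula and the numbers are assumptions for illustration only.

```python
# Rough operation count for value iteration on the SMDP, assuming the state is
# (node, discretized start time, remaining resource): |S| ~ N * (T / dt) * TR.
def operation_count(n_nodes, T, dt, TR, n_actions):
    n_states = n_nodes * int(T / dt) * TR
    return n_actions * n_states ** 2

for n in (16, 32, 64):
    print(n, f"{operation_count(n, T=20, dt=1, TR=12, n_actions=2):.2e}")
```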

6 Conclusion

This study addresses a time–cost trade-off project scheduling problem in which an activity need not wait until all preceding activities have been completed. The GERT network is applied to represent the project’s activities and their precedence relations schematically. We seek an optimal policy for allocating a constrained non-renewable resource to activities over a finite time horizon. For generality, the duration of each activity may follow either a discrete or a continuous distribution function. Due to the correlation between time and resource (cost), the equivalent time of the project is conditioned on the realization of the sink node. Since a GERT network may have more than one sink node, we applied the flow-graph topological equation to specify the probability of realizing each sink node.

Because each activity’s duration is independent of the process history, the Markov property holds for the considered GERT network. We therefore adopted Markov and Semi-Markov decision processes to determine the conditional probability distribution function of the project makespan, from which the expected time can be specified. We implemented the value iteration and LP formulation approaches to solve the finite-horizon MDP and SMDP models. The value iteration method for both models is explained in detail on two randomly generated examples. In addition, seven example groups, each with five instances, are adapted from the well-known PSPLIB data set to numerically validate the efficiency of the proposed models against two extensively studied methods, the GA and Monte-Carlo simulation algorithms. The convergence of the GA and simulation results to those of the MDP and SMDP demonstrates the efficiency of the proposed models in solving the time–cost trade-off problem in a resource-constrained GERT project scheduling setting. Since the time complexity of the MDP and SMDP depends on the number of states, the computational time increases as the problem scale grows. However, this does not undermine the usefulness of the proposed methods, because they can serve as an analytical baseline for evaluating heuristic and simulation algorithms.

Conducting a sensitivity analysis of the project makespan distribution with respect to the available resource reveals the reliability of completing the project within the pre-determined time. This outcome assists the associated managers in analyzing the project requirements from an economic point of view. In contracts, estimates of the equivalent time and required resources are essential factors and require professional negotiation to prevent contract breach.

As a future direction, researchers may consider multiple non-renewable resources and the potential correlation among them.