
1 Introduction

We consider the problem of selecting services and transfers in a synchromodal network, to transport freights from their origin to their destination, while minimizing costs over a multi-period horizon. In a synchromodal setting, all freights are booked “mode-free”, meaning that there are no restrictions on selecting a transportation mode or on the number of transfers among the intermodal terminals. The network operator is able to decide over all services in the network even if they are not its own. However, the flexibility in selecting services and transfers is encumbered by various time restrictions, such as service schedules and freight time-windows, and by the variability in the arrival of freights over time. In this paper, we study how these challenges can be tackled, heuristically, in order to solve this stochastic and finite horizon optimization problem.

In synchromodal planning, the transportation plan that brings a freight from its origin to its destination can be changed at any point in time. Even though the planner might have a complete plan at a given moment, only the first part of such a plan is implemented. At the next decision moment, the planner has the flexibility to change the original plan if necessary. Consequently, there are three types of decisions each day: (i) transport a freight to its final destination, (ii) transport a freight to an intermediate terminal, and (iii) postpone the transport of a freight. All types of decisions incur some form of costs. The first and the second type incur direct costs, which are the costs of the services required to transport a freight. The third type has direct costs only in the case of holding costs. Since the problem is to minimize costs over a multi-period horizon, the second and third type also incur future costs, which are costs incurred at a later moment within the horizon. Future costs depend on the newly arriving freights and their characteristics (which are uncertain) and the known transportation mode characteristics (e.g., schedules, capacity, etc.). The optimal balance between direct and future costs guarantees the best performance over the horizon. However, anticipating future costs is challenging.

Fig. 1. Time evolution and planning example of service and transfer selection

Decisions are influenced by two types of time restrictions. The first type corresponds to the durations and schedules of services and transfers. As an example, consider Fig. 1, which shows a possible plan spanning 5 days using both train and barge. In this example, barges have a duration of 2 days, and the train between Terminals 3 and 4 departs on even days. The second type corresponds to the time-windows of freights, which limit the feasible transportation services and transfers, and thus the feasible decisions. In addition to the time restrictions, the variability in the number of freights that arrive each day, and in their characteristics (i.e., origin, destination, time-window), also influences the decisions. Although freights and their characteristics are unknown beforehand, there is probabilistic information about their arrival. Every day, the planner must consider all these characteristics and select which freights use the services available that day, balancing direct and future costs.

The objective of this paper is twofold: (i) to design a model and look-ahead solution method that capture all problem characteristics and their effect on the planning objective, and (ii) to explore the use of look-ahead decision methods under several settings. We model the decision problem and the evolution of the network using a Markov Decision Process (MDP) model. With this model, the optimal trade-off between the three types of decisions, over time and under uncertain demand, can be obtained. However, solving MDP models becomes unmanageable as problem instances grow larger. To overcome this, we use Approximate Dynamic Programming (ADP), a framework that uses parts of the MDP model and iteratively estimates future costs. ADP combines simulation, optimization, and statistical techniques to solve an MDP heuristically.

The remainder of this paper is organized as follows. In Sect. 2, we briefly mention the relevant literature and specify our contribution to it. In Sect. 3, we introduce the MDP model. In Sect. 4, we explain our ADP solution approach. In Sect. 5, we test various designs within the ADP algorithm, and provide a comparison with benchmark heuristics. Finally, we close in Sect. 6 with conclusions and insights for further research.

2 Literature Review

In this section, we focus our attention on the literature about planning problems in dynamic and flexible intermodal transportation networks. Our goal is to provide an overview of the advantages and limitations of related work, i.e., of possible solution methods. Extensive literature reviews of this area and thorough explanations of modeling and solution approaches can be found in [2, 14].

Synchromodal planning is the proactive organization and control of intermodal transportation services based on the latest information available [14]. In such a planning paradigm, decision methods must balance the demand with all available services and intermodal transfers each time new information becomes known [13]. Although research on synchromodal planning methods is in its infancy, several studies show how existing methods for intermodal transport planning can be extended to such problem settings [16] and how significant gains can be achieved in practice [9, 16].

In intermodal transport planning, Dynamic Service Network Design (DSND) problems are the closest to the synchromodal planning problems. DSND involves the selection of transportation services and modes for freights, where at least one feature of the network varies over time [14]. Due to the time-space nature of DSND problems, graph theory and mathematical programming approaches are commonly used in this area. However, these approaches have computational limitations for large and complex time-evolving problem instances [15], which are characteristics common to synchromodality [13]. To overcome these limitations, additional designs, such as decomposition algorithms [5], receding horizons [6], and model predictive control [10], are necessary. These additional designs are less suitable for including probabilistic information in the decisions, which may explain why most DSND studies assume deterministic demand [14] even though the need to incorporate stochastic demand has been recognized [7].

To incorporate stochasticity in DSND approaches, techniques such as scenario generation [3, 7], two-stage stochastic programming [1, 8], and Approximate Dynamic Programming (ADP) [4, 11] have been used. Although these approaches perform better than their deterministic counterparts, they have limitations when considering synchromodal planning. In the scenario generation technique, plans do not change as new information becomes available. In the two-stage stochastic programming approach, explicit probabilistic constraints and high computational requirements limit its applicability to large instances. In ADP, a proper design and validation of the approximation algorithm is crucial and challenging. Nevertheless, ADP allows generic modeling of complex, time-revealing, stochastic networks and a fast response time for updating plans.

To summarize, DSND research provides a useful base for synchromodal planning. Considering all challenges and opportunities mentioned before, we believe that our contribution to the literature about stochastic DSND problems and synchromodal planning has three key points. First, we design an MDP model and solution method based on ADP that capture all problem characteristics and their effect on the planning objective. Second, we explore the use of such a look-ahead approach, under different problem settings, and provide design and validation insights. Third, we compare the ADP approach against an advanced sampling procedure and specify further research directions based on the insights.

3 Optimization Model

In this section, we present our optimization model. Following the DSND convention, we begin by presenting the network parameters using a directed graph. Then, we present the MDP model for our stochastic planning problem.

3.1 Input Parameters

We define a directed graph \(G_t=(\mathcal {N}_t,\mathcal {A}_t)\), where \(t\in \mathcal {T}=\left\{ 0,1,2,\ldots ,T^\text {max}-1\right\} \) represents the finite planning horizon (i.e., \(T^\text {max}\) decision periods), \(\mathcal {N}_t\) represents the set of all nodes at time t, and \(\mathcal {A}_t\) represents the set of all directed arcs at time t. In the remainder of the paper, we refer to a time period t as a day, although it is important to note that time can be discretized in any arbitrary interval. Also in the remainder of the model description, all notation and formulations indexed by t correspond to that day. Nodes \(\mathcal {N}_t\) represent physical locations where freight can begin or end transportation, i.e., origins, intermodal terminals, and destinations. We denote the set of origin nodes as \(\mathcal {N}^\text {O}_t\), the set of destination nodes as \(\mathcal {N}^\text {D}_t\), and the set of intermodal terminal nodes as \(\mathcal {N}^\text {I}_t\). These three sets are mutually exclusive, make up the set of all nodes, and are indexed with i, j, and d. Note that this separation of the node sets applies to the model, but not necessarily to the problem instance. For example, an intermodal terminal which receives arriving freights will be modeled both as an origin and terminal node, with all services between these two nodes properly adjusted. Arcs \(\mathcal {A}_t\) represent all transportation services in the network. Similar to the node classification, we classify the arcs into three types. The set of arcs between an origin and an intermodal node is denoted as \(\mathcal {A}^\text {O}_t=\left\{ (i,j)|i\in \mathcal {N}^\text {O}_t\text { and } j\in \mathcal {N}^\text {I}_t\right\} \). The set of arcs between two intermodal terminal nodes is denoted as \(\mathcal {A}^\text {I}_t=\left\{ (i,j)|i,j\in \mathcal {N}^\text {I}_t\right\} \). 
The set of arcs between an origin or an intermodal node, and a destination, is denoted as \(\mathcal {A}^\text {D}_t=\left\{ (i,d)|i\in \mathcal {N}^\text {O}_t\cup \mathcal {N}^\text {I}_t\text { and } d\in \mathcal {N}^\text {D}_t\right\} \).
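To make this node and arc classification concrete, the following minimal Python sketch encodes the three node sets and classifies arcs into \(\mathcal {A}^\text {O}_t\), \(\mathcal {A}^\text {I}_t\), and \(\mathcal {A}^\text {D}_t\). The node labels and arcs are illustrative placeholders, not taken from the paper's instances.

```python
from dataclasses import dataclass

# A minimal sketch of the network G_t = (N_t, A_t); node labels and services
# below are illustrative, not from the paper.
@dataclass
class Network:
    origins: set        # N^O_t
    terminals: set      # N^I_t
    destinations: set   # N^D_t
    arcs: set           # A_t, as (i, j) pairs

    def arc_type(self, i, j):
        """Classify an arc into the three sets A^O_t, A^I_t, A^D_t."""
        if i in self.origins and j in self.terminals:
            return "A_O"   # pre-haulage by truck
        if i in self.terminals and j in self.terminals:
            return "A_I"   # high-capacity long haul
        if j in self.destinations:
            return "A_D"   # end-haulage by truck (from origin or terminal)
        raise ValueError(f"arc ({i}, {j}) does not fit the classification")

net = Network(origins={0}, terminals={1, 2, 3}, destinations={4, 5, 6},
              arcs={(0, 1), (1, 2), (2, 3), (1, 3), (0, 4), (3, 6)})
```

Note that an arc from an origin directly to a destination also falls in \(\mathcal {A}^\text {D}_t\), matching the definition above.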

We make three modeling assumptions with respect to the services between different types of locations. First, we assume that services beginning at an origin, i.e., \(\mathcal {A}^\text {O}_t\), as well as services ending in a destination, i.e., \(\mathcal {A}^\text {D}_t\), are available every day and are realized by truck. This assumption corresponds to the usual pre- and end-haulage operations in a synchromodal network. Second, we assume that services between two intermodal terminals, i.e., \(\mathcal {A}^\text {I}_t\), are done by high-capacity modes and never by truck. Although this is a simplification of the network, trucks between intermodal terminals are rarely used. If the problem instance requires it, a truck service between two intermodal terminals can be modeled using “dummy” nodes for the respective terminals, with other arcs properly adjusted. Third, we assume there is at most one service between two intermodal terminal nodes. Just as before, multiple services between two intermodal terminals can be modeled using more than one pair of nodes representing those terminals. Note that the services between two intermodal terminals are not necessarily the same every day, which allows us to represent the schedules of the high-capacity modes.

Services in the network have their starting and ending location modeled as nodes within \(G_t\). For the service between two intermodal terminals \((i,j)\in \mathcal {A}^\text {I}_t\), there is a maximum capacity \(Q_{i,j,t}\) measured in number of freights. For all services involving an origin or a destination, we assume that there is an unlimited number of trucks. All services \((i,j)\in \mathcal {A}_t\) have a service duration of \(L^A_{i,j,t}\) days, which is at least one day. All transfer and handling operations at each location \(i\in \mathcal {N}_t\) have a duration of \(L^N_{i,t}\) days. To measure the total time required for the service between two locations, we define the parameter \(M_{i,j,t}=L^N_{i,t}+L^A_{i,j,t}+L^N_{j,t}\). We assume that traveling directly to a destination by truck is always faster than going through an intermodal terminal, i.e., \(L^A_{i,d,t}< \min _{j\in \mathcal {N}^\text {I}_t}{\left\{ M_{i,j,t}+L^A_{j,d,t}\right\} },\forall (i,d)\in \mathcal {A}^\text {D}_t\). This assumption plays a role similar to that of the triangle inequality in routing problems. All relevant costs from a service \((i,j)\in \mathcal {A}_t\) are captured in the cost function \(C_{i,j,t}\). This means that, although pre- and end-haulage decisions, as well as freight handling decisions, are outside the scope of the planner, their costs can be captured with the function \(C_{i,j,t}\).
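The total duration parameter and the direct-trucking assumption can be checked numerically. The sketch below, with illustrative handling and travel times (all numeric values are assumptions, not the paper's instances), computes \(M_{i,j,t}\) and verifies the condition \(L^A_{i,d,t}<\min _{j}\{M_{i,j,t}+L^A_{j,d,t}\}\) for one origin-destination pair.

```python
# Sketch of the total service duration M_{i,j,t} = L^N_i + L^A_{i,j} + L^N_j
# and a check of the "direct trucking is fastest" assumption. Durations are
# illustrative values, not taken from the paper's instances.
L_N = {0: 0, 1: 1, 2: 1, 4: 0}  # handling days per node (0: origin, 4: dest.)
L_A = {(0, 1): 1, (0, 2): 2, (1, 2): 2,
       (0, 4): 1, (1, 4): 1, (2, 4): 1}  # travel days per service

def M(i, j):
    """Total time for service (i, j): handling at both ends plus travel."""
    return L_N[i] + L_A[(i, j)] + L_N[j]

def direct_is_fastest(i, d, terminals):
    """Check L^A_{i,d} < min_j { M_{i,j} + L^A_{j,d} } for one (i, d) pair."""
    return L_A[(i, d)] < min(M(i, j) + L_A[(j, d)] for j in terminals)
```

For these sample durations, trucking directly from node 0 to node 4 takes 1 day, while every route through terminal 1 or 2 takes at least 3 days, so the assumption holds.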

Each day t, freights with different attributes become known to the planner. These freights are characterized by an origin \(i\in \mathcal {N}^\text {O}_t\), a destination \(d\in \mathcal {N}^\text {D}_t\), a release day \(r\in \mathcal {R}_t=\left\{ 0,1,2,\ldots ,R^\text {max}_t\right\} \), and a time-window length \(k\in \mathcal {K}_t=\left\{ 0,1,2,\ldots ,K^\text {max}_t\right\} \), where \(R^\text {max}_t\) and \(K^\text {max}_t\) are the maximum release day and time-window length, respectively, that a freight can have. Note that the absolute due-day is k days after r. Even though new freights and their characteristics are not known until they arrive, there is probabilistic knowledge about their arrival. Between two consecutive days \(t-1\) and t, a total of \(f\in \mathbb {N}\) freights arrive into the system with probability \(p^\text {F}_{f,t}\). A freight that arrives has origin \(i\in \mathcal {N}^\text {O}_t\) with probability \(p^\text {O}_{i,t}\), destination \(d\in \mathcal {N}^\text {D}_t\) with probability \(p^\text {D}_{d,t}\), release-day \(r\in \mathcal {R}_t\) with probability \(p^\text {R}_{r,t}\), and time-window length \(k\in \mathcal {K}_t\) with probability \(p^\text {K}_{k,t}\).
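A Monte Carlo sample of this arrival process can be drawn directly from the stated distributions. In the sketch below, all probability values are illustrative placeholders; each arriving freight's attributes are drawn independently, as in the model.

```python
import random

# A sketch of sampling the exogenous freight arrivals between days t-1 and t
# from the independent distributions p^F, p^O, p^D, p^R, p^K. All probability
# values here are illustrative placeholders.
p_F = {0: 0.2, 1: 0.3, 2: 0.3, 3: 0.2}   # number of arriving freights
p_O = {0: 1.0}                            # origin
p_D = {4: 0.5, 5: 0.3, 6: 0.2}            # destination
p_R = {0: 1.0}                            # release day (always released)
p_K = {1: 0.4, 2: 0.3, 3: 0.3}            # time-window length

def draw(dist, rng):
    """Draw one value from a {value: probability} distribution."""
    return rng.choices(list(dist), weights=list(dist.values()))[0]

def sample_arrivals(rng):
    """Sample W_t: a list of (origin, destination, release, time-window)."""
    n = draw(p_F, rng)
    return [(draw(p_O, rng), draw(p_D, rng), draw(p_R, rng), draw(p_K, rng))
            for _ in range(n)]

rng = random.Random(42)
freights = sample_arrivals(rng)
```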

3.2 MDP Model

In this section, we transform the problem horizon, constraints, and objective into the building blocks of an MDP: stages, state, decision, transition, and optimality equations. The stages of the MDP are defined by \(t\in \mathcal {T}\). The state \(S_t\) consists of all freights in the network and their characteristics. To model these freights, we introduce the variable \(F_{i,d,r,k,t}\in \mathbb {Z}^{+}\) that represents the number of freights at location \(i\in \mathcal {N}^\text {O}_t\cup \mathcal {N}^\text {I}_t\), that have destination \(d\in \mathcal {N}^\text {D}_t\), release day \(r\in \mathcal {R}'_t\), and time-window length \(k\in \mathcal {K}_t\); and define the state \(S_t\) as seen in (1). The state space is denoted as \(\mathcal {S}\), i.e., \(S_t\in \mathcal {S}\).

$$\begin{aligned} S_t=\left[ F_{i,d,r,k,t}\right] _{\forall i\in \mathcal {N}^\text {O}_t\cup \mathcal {N}^\text {I}_t,d\in \mathcal {N}^\text {D}_t,r\in \mathcal {R}'_t,k\in \mathcal {K}_t} \end{aligned}$$
(1)

Note that we use a new set \(\mathcal {R}'_t\) for the release days. The release day definition at origin nodes remains the same. The release day at an intermodal terminal, however, is now used to represent the days “left” for a freight to arrive at that node. For example, if a released freight is sent to an intermodal terminal j on a barge whose total service duration is four days, this freight will appear the day after it was sent, as a freight with \(r=3\) at location j. This new set, which is defined as \(\mathcal {R}'_t=\left\{ 0,1,2,\ldots ,\max {\left\{ R^\text {max}_t,\max _{(i,j)\in \mathcal {A}^\text {I}_t}{M_{i,j,t}}\right\} }\right\} \), allows us to model multi-day service durations without needing to remember decisions from more than one day ago, i.e., it is more computationally efficient. Note that, in case no total service duration is larger than \(R^\text {max}_t\), then \(\mathcal {R}_t=\mathcal {R}'_t\). Time-window lengths k still model the number of days after the release-day r, within which the freight has to be at its final destination. We will elaborate more on the evolution of the network over time later on in this section.
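The release-day bookkeeping can be summarized in one line of arithmetic: a freight dispatched on a service with total duration \(M_{i,j,t}\) appears at the terminal the next day with \(r=M_{i,j,t}-1\), which then counts down daily until release. A minimal sketch:

```python
# Illustrates the release-day bookkeeping of the set R'_t: a freight sent on a
# service with total duration M_{i,j,t} days appears at terminal j the next
# day with release day r = M_{i,j,t} - 1, counting down by one each day.
def remaining_release_days(total_duration_M, days_since_departure):
    """Release day r of an in-transit freight, as stored in the state S_t."""
    return total_duration_M - days_since_departure

# The paper's example: a freight sent on a four-day barge service appears the
# next day with r = 3, then r = 2, r = 1, and is released (r = 0) on arrival.
trace = [remaining_release_days(4, s) for s in range(1, 5)]
```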

At each stage, the planner must decide how many released freights to transport and to postpone, for all locations. Recall that, in a synchromodal network, only the first part of the plan to transport a freight to its destination is implemented at each decision moment. Consequently, at every stage, the decision to transport a freight can be either to send it directly to its final destination, or to send it to an intermodal terminal. To model this decision, we introduce the variable \(x_{i,j,d,k,t}\in \mathbb {Z}^{+}\), which represents the number of freights having destination \(d\in \mathcal {N}^\text {D}_t\) and time-window length \(k\in \mathcal {K}_t\) that are transported from location i to location j using service \((i,j)\in \mathcal {A}_t\). Thus, the decision vector \(x_t\) consists of all transported freights in the network, as seen in (2a).

$$\begin{aligned} x_t=\left[ x_{i,j,d,k,t}\right] _{\forall (i,j)\in \mathcal {A}_t,d\in \mathcal {N}^\text {D}_t,k\in \mathcal {K}_t} \\ \text {s.t.}\nonumber \end{aligned}$$
(2a)
$$\begin{aligned} \sum _{j\in \mathcal {N}^\text {I}_t\cup \left\{ d\right\} }{x_{i,j,d,k,t}}\le F_{i,d,0,k,t}, \quad \forall i\in \mathcal {N}^\text {O}_t\cup \mathcal {N}^\text {I}_t,d\in \mathcal {N}^\text {D}_t,k\in \mathcal {K}_t \end{aligned}$$
(2b)
$$\begin{aligned} x_{i,d,d,L^A_{i,d,t},t}\ge F_{i,d,0,L^A_{i,d,t},t}, \quad \forall (i,d)\in \mathcal {A}^\text {D}_t \end{aligned}$$
(2c)
$$\begin{aligned} x_{i,j,d,k,t}=0, \quad \forall (i,j)\in \mathcal {A}_t,d\in \mathcal {N}^\text {D}_t,k\in \mathcal {K}_t|k<M_{i,j,t}+M_{j,d,t} \end{aligned}$$
(2d)
$$\begin{aligned} \sum _{d\in \mathcal {N}^\text {D}_t}{\sum _{k\in \mathcal {K}_t}{x_{i,j,d,k,t}}}\le Q_{i,j,t}, \quad \forall (i,j)\in \mathcal {A}^\text {I}_t \end{aligned}$$
(2e)

The decision \(x_t\) depends on the state \(S_t\) and the feasible decision space \(\mathcal {X}_t\), which has four constraints. First, the number of freights transported from one location to all other locations cannot exceed the number of released freights available at the start location, as seen in (2b). Second, released freights whose time-window length is as long as the duration of direct transport (i.e., trucking) must be transported using this service, as seen in (2c). Third, freights whose time-window length is smaller than the duration of the shortest path between an intermodal terminal and their destination cannot be transported via that terminal, as seen in (2d). Fourth, transport between two intermodal terminals cannot exceed the capacity of the long-haul vehicle, as seen in (2e).
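The four constraint families can be illustrated with a plain feasibility check on a candidate decision vector. The data structures (dicts keyed by freight attributes) and all instance values below are illustrative simplifications, not the paper's mathematical-programming formulation.

```python
# A simplified feasibility check of a candidate decision x against the four
# constraint families (2b)-(2e). x maps (i, j, d, k) to a freight count;
# F maps (i, d, k) to the released freights at i. All values illustrative.
def is_feasible(x, F, M, L_A, Q, terminals):
    # (2b): outflow per (location, destination, window) bounded by supply
    shipped = {}
    for (i, j, d, k), n in x.items():
        shipped[(i, d, k)] = shipped.get((i, d, k), 0) + n
    if any(n > F.get(key, 0) for key, n in shipped.items()):
        return False
    # (2c): freights whose window equals the direct trucking time must be trucked
    for (i, d, k), f in F.items():
        if k == L_A.get((i, d)) and x.get((i, d, d, k), 0) < f:
            return False
    # (2d): no transfer via terminal j when the window cannot cover M_ij + M_jd
    for (i, j, d, k), n in x.items():
        if j in terminals and n > 0 and k < M[(i, j)] + M[(j, d)]:
            return False
    # (2e): long-haul capacity between terminal pairs
    load = {}
    for (i, j, d, k), n in x.items():
        if i in terminals and j in terminals:
            load[(i, j)] = load.get((i, j), 0) + n
    return all(n <= Q[(i, j)] for (i, j), n in load.items())

terminals = {1, 2}
M = {(0, 1): 2, (0, 2): 2, (1, 2): 1, (1, 4): 1, (2, 4): 1}  # total durations
L_A = {(0, 4): 1}                                            # direct trucking
Q = {(1, 2): 1}                                              # long-haul capacity
F = {(0, 4, 2): 1, (1, 4, 5): 2}                             # released freights
```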

After making a decision \(x_{t-1}\), but before entering the state \(S_{t}\), new freights become known to the planner. We represent new freights with origin \(i\in \mathcal {N}^\text {O}_t\), destination \(d\in \mathcal {N}^\text {D}_t\), release day \(r\in \mathcal {R}_t\), and time-window length \(k\in \mathcal {K}_t\), by \(\widetilde{F}_{i,d,r,k,t}\). We denote the vector of all new freights that arrive between stages \(t-1\) and t by \(W_{t}\), as seen in (3). This vector represents the exogenous information (i.e., new random freights) that became known between stages \(t-1\) and t.

$$\begin{aligned} W_{t}=\left[ \widetilde{F}_{i,d,r,k,t}\right] _{\forall i\in \mathcal {N}^\text {O}_t,d\in \mathcal {N}^\text {D}_t,r\in \mathcal {R}_t,k\in \mathcal {K}_t} \end{aligned}$$
(3)

The evolution of the network over time is influenced by decisions, exogenous information, and various time relations. We represent this evolution by using a transition function \(S^M\), as seen in (4a). The general idea of \(S^M\) is to define the freights at \(S_t\) using only the previous-stage decision \(x_{t-1}\) and the exogenous information \(W_t\). Although decisions can span more than one day (i.e., when the duration of a service is longer than a day), we use freight release days (i.e., new set \(\mathcal {R}'_t\)) and time-window lengths to avoid remembering a decision for more than one stage. When freights are not transported, they remain at the same location and their release days and time-window lengths decrease. However, when freights are transported from a given location i to an intermodal terminal j, they are modeled as freights whose release day increases and their time-window length decreases in line with the total duration of transport \(M_{i,j,t}\). To model all these relations, \(S^M\) classifies freight variables \(F_{t,i,d,r,k}\) into seven categories, as shown in (4b)–(4h). To illustrate in detail how these categories work, consider (4c). These constraints apply to released freights at an intermodal terminal i with destination d and time-window length k. 
These freights are the result of three types of freights: (i) released freights in the same terminal, from the previous stage, that had the same destination, that had one additional day in the time-window, and that were not transported to any other node (i.e., \(F_{t-1,i,d,0,k+1}-\sum _{j\in \mathcal {A}_t}{x_{t-1,i,j,d,k+1}}\)); (ii) freights in the same node, from the previous stage, that had the same destination, that had a release-day of one, and that had the same time-window length (i.e., \(F_{t-1,i,d,1,k}\)); and (iii) freights that arrived from other locations to i, that have the same destination, whose total duration of transportation was one period, and whose time-window length was \(k+M_{j,i,t}\) at the moment of the decision \(x_{t-1}\) (i.e., \(\sum _{j\in \mathcal {A}_t|M_{j,i,t}=1}{x_{t-1,j,i,d,k+M_{j,i,t}}}\)). All other constraints work in a similar fashion.

$$\begin{aligned} S_t=S^M\left( S_{t-1},x_{t-1},W_t\right) \nonumber \\ \text {s.t.}\end{aligned}$$
(4a)
$$\begin{aligned} F_{t,i,d,0,k}=F_{t-1,i,d,0,k+1}-\sum _{j\in \mathcal {A}_t}{x_{t-1,i,j,d,k+1}}+F_{t-1,i,d,1,k}+\widetilde{F}_{t,i,d,0,k}, \nonumber \\ \forall i\in \mathcal {N}^\text {O}_t,d\in \mathcal {N}^\text {D}_t,k+1\in {\mathcal {K}_t}\end{aligned}$$
(4b)
$$\begin{aligned} F_{t,i,d,0,k}=F_{t-1,i,d,0,k+1}-\sum _{j\in \mathcal {A}_t}{x_{t-1,i,j,d,k+1}}+F_{t-1,i,d,1,k}\nonumber \\ +\sum _{j\in \mathcal {A}_t|M_{j,i,t}=1}{x_{t-1,j,i,d,k+M_{j,i,t}}}, \nonumber \\ \forall i\in \mathcal {N}^\text {I}_t,d\in \mathcal {N}^\text {D}_t,k+1\in \mathcal {K}_t\end{aligned}$$
(4c)
$$\begin{aligned} F_{t,i,d,0,K^\text {max}_t}=F_{t-1,i,d,1,K^\text {max}_t}+\widetilde{F}_{t,i,d,0,K^\text {max}_t}, \nonumber \\ \forall i\in \mathcal {N}^\text {O}_t,d\in \mathcal {N}^\text {D}_t \end{aligned}$$
(4d)
$$\begin{aligned} F_{t,i,d,r,k}=F_{t-1,i,d,r+1,k}+\widetilde{F}_{t,i,d,r,k}, \nonumber \\ \forall i\in \mathcal {N}^\text {O}_t,d\in \mathcal {N}^\text {D}_t,r+1\in \mathcal {R}_t|r\ge 1,k\in \mathcal {K}_t \end{aligned}$$
(4e)
$$\begin{aligned} F_{t,i,d,r,k}=F_{t-1,i,d,r+1,k}+\sum _{j\in \mathcal {A}_t|M_{j,i,t}=r+1}{x_{t-1,j,i,d,k+M_{j,i,t}}}, \nonumber \\ \forall i\in \mathcal {N}^\text {I}_t,d\in \mathcal {N}^\text {D}_t,r+1\in \mathcal {R}'_t|r\ge 1,k\in \mathcal {K}_t \end{aligned}$$
(4f)
$$\begin{aligned} F_{t,i,d,|\mathcal {R}'_t|,k}=\sum _{j\in \mathcal {A}_t|M_{j,i,t}=|\mathcal {R}'_t|+1}{x_{t-1,j,i,d,k+M_{j,i,t}}}\nonumber \\ \forall i\in \mathcal {N}^\text {I}_t,d\in \mathcal {N}^\text {D}_t,k\in \mathcal {K}_t , \end{aligned}$$
(4g)
$$\begin{aligned} F_{t,i,d,R^\text {max}_t,k}=\widetilde{F}_{t,i,d,R^\text {max}_t,k},\nonumber \\ \forall i\in \mathcal {N}^\text {O}_t,d\in \mathcal {N}^\text {D}_t,k\in \mathcal {K}_t \end{aligned}$$
(4h)

The goal is to minimize the total costs over a multi-period horizon, considering all possible states that can occur each day, and considering the stochastic arrival of freight. To do so, we define a policy \(\pi \) as a function that maps each possible state \(S_t\in \mathcal {S}\) to a decision \(x_{t}^{\pi }\in \mathcal {X}_t\). Consequently, the objective is to determine the policy \(\pi \) from the set of all policies \(\varPi \) that minimizes the expected costs over the planning horizon, given an initial state \(S_0\), as seen in (5):

$$\begin{aligned} \min _{\pi \in \varPi }{\left. \mathbb {E}\left[ \sum _{t\in \mathcal {T}}{C_t\left( x_{t}^{\pi }\right) }= \sum _{t\in \mathcal {T}}{\sum _{(i,j)\in \mathcal {A}_t}{\left( C_{i,j,t}\cdot \sum _{d\in \mathcal {N}^\text {D}_t}{\sum _{k\in \mathcal {K}_t}{x^{\pi }_{i,j,d,k,t}}}\right) }}\right| S_0\right] } \end{aligned}$$
(5)

To solve this stochastic and sequential optimization problem, we transform (5) into the Bellman equations (6). In these equations, the expected next-stage cost is computed using the value of the next-stage state \(S_{t+1}\) (obtained using \(S^M\)), the decision \(x_{t}^{\pi }\), a realization of the exogenous information \(\omega \in \varOmega _{t+1}\), and the associated probability \(p^{\varOmega _{t+1}}_\omega \). Solving all recursive equations in (6), e.g., through backward induction, provides the optimal policy for the MDP.

$$\begin{aligned} V_t\left( S_t\right) =\min _{x_{t}^{\pi }\in \mathcal {X}_t}{\left( C_t\left( x_{t}^{\pi }\right) +\sum _{\omega \in \varOmega _{t+1}}{p^{\varOmega _{t+1}}_\omega \cdot V_{t+1}\left( S^M\left( S_t,x_{t}^{\pi },\omega \right) \right) }\right) },\forall S_t\in \mathcal {S} \end{aligned}$$
(6)

However, solving the Bellman equations (6) for large problems is computationally challenging. The state space \(\mathcal {S}\), decision space \(\mathcal {X}_t\), and the realizations of the exogenous information in \(\varOmega _t\) grow larger with an increasing size of the problem instance. Due to these three “curses of dimensionality” [12], our MDP model is solvable only for tiny problem instances. Notwithstanding, the MDP model serves as a base for our ADP approach.

4 Solution Approach

Our solution approach is based on ADP, which is a heuristic solution method for MDP models that uses various constructs and algorithmic strategies. Its main idea is to replace the expected next-stage costs in (6) by a Value Function Approximation (VFA), and to update this function via a simulation of the exogenous information. This update procedure is done iteratively, as shown in Fig. 2, with the end result being the approximated values of the solution to the Bellman equations, and thus a policy \(\pi \). Certainly, the choice of (i) the VFA, (ii) the update procedure, and (iii) the number of iterations, has an influence on the performance, i.e., solution quality and computational time. In the following paragraphs we describe the choices we make and indicate their expected performance.

Fig. 2. Overview of the ADP algorithm

In our ADP approach, the expectation of future costs in (6) is replaced by a value function approximation \(\overline{V}_t^n(S_t^{x,n})\), where \(S_t^{x,n}\) is the so-called post-decision state, i.e., the state after a decision has been made but before the new exogenous information becomes known. As seen in (7), this construct avoids specifying all realizations of the exogenous information \(\varOmega _{t}\).

$$\begin{aligned} V_{t}^n\left( S_{t}^n\right) =\min _{x_{t}^{\pi }\in \mathcal {X}_t } \left( C_{t}\left( x_{t}^{\pi }\right) + \overline{V}_t^n(S_t^{x,n}) \right) \end{aligned}$$
(7)

To avoid the large state space, the optimality equation in (7) is solved for one state at each stage, starting from the initial state \(S_0\). The transition from one state to the next uses a sample from \(\varOmega _{t+1}\), obtained through a Monte Carlo simulation, and the transition function \(S^M\) defined in (4a). This process is performed for the entire planning horizon, and repeated for N iterations, hence the superscript n in the approximate value function and post-decision state.
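The forward-pass structure just described can be sketched as a generic skeleton. All model components (decision rule, transition, demand sampler, VFA update) are stubbed placeholders passed in as functions; this shows the control flow only, not the paper's implementation.

```python
import random

# Skeleton of the ADP forward pass (cf. Fig. 2): one state per stage, a
# decision against the current VFA, a Monte Carlo sample of W_{t+1}, and a
# transition to the next state, repeated for N iterations.
def adp_forward(S0, T_max, N, decide, transition, sample_W, update_vfa, rng):
    for _ in range(N):                       # ADP iterations n = 1, ..., N
        S = S0
        for t in range(T_max):               # forward pass over the horizon
            x, v_hat = decide(t, S)          # solve (7) for this single state
            update_vfa(t, S, v_hat)          # e.g., the weight update of (9)
            S = transition(S, x, sample_W(t + 1, rng))
    # The learned VFA lives in update_vfa's own state (e.g., a closure).

# Usage with trivial stubs, just to exercise the control flow:
updates = []
adp_forward(
    S0=0, T_max=3, N=2,
    decide=lambda t, S: (0, float(S)),          # stub decision and estimate
    transition=lambda S, x, W: S + W,           # stub transition
    sample_W=lambda t, rng: rng.randint(1, 2),  # stub demand sample
    update_vfa=lambda t, S, v: updates.append((t, v)),
    rng=random.Random(7),
)
```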

The general outline of an ADP algorithm can be found in Fig. 4.7, page 141, of the book by [12]. We now focus on two designs (i.e., variations) we propose for that algorithm. Our first design uses a commonly proposed ADP setup. We use basis functions for \(\overline{V}_t^n(S_t^{x,n})\) and the non-stationary least squares method for updating this function. A basis function \(\phi _a(S^{x,n}_{t})\) is a quantitative characteristic of a given feature a of a post-decision state \(S^{x,n}_{t}\) that describes, to some extent, the value of that post-decision state. Examples of features in our problem are the number of freights for a given destination and the number of freights at a given intermodal terminal. Given a set of features \(\mathcal {A}\), the approximated next-stage costs in (7) are the result of the product between the basis function \(\phi _a(S^{x,n}_{t})\) and the weight \(\theta ^n_{a,t}\) for each feature \(a\in \mathcal {A}\), as seen in (8).

$$\begin{aligned} \overline{V}^{x,n}_{t}\left( S_{t}^{x,n}\right) =\sum _{a\in \mathcal {A}}\theta ^{n}_{a,t}\phi _{a}\left( S^{x,n}_{t}\right) \end{aligned}$$
(8)
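A minimal sketch of the linear VFA in (8), using the two example features named above (freights per location and freights per destination). The state encoding as a list of (location, destination) pairs is an illustrative simplification.

```python
# The linear VFA of (8): an inner product of feature weights theta_a and
# basis functions phi_a evaluated on the post-decision state. The features
# (freights per location, freights per destination) follow the examples in
# the text; the state encoding is an illustrative simplification.
def basis(post_state):
    """post_state: list of (location, destination) pairs for waiting freights."""
    feats = {}
    for loc, dest in post_state:
        feats[("at", loc)] = feats.get(("at", loc), 0) + 1
        feats[("to", dest)] = feats.get(("to", dest), 0) + 1
    return feats

def vfa(theta, post_state):
    """Approximate downstream cost: sum_a theta_a * phi_a(S^x)."""
    return sum(theta.get(a, 0.0) * phi for a, phi in basis(post_state).items())

theta = {("at", 1): 10.0, ("to", 4): 5.0}  # illustrative learned weights
```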

The weight \(\theta ^{n}_{a,t}\) depends on the iteration n because it is updated after each iteration, using observed costs, to improve future cost estimates. We use a Non-stationary Least Squares (NLS) method for updating these weights since it gives more weight to recent observations than to older ones. This emphasis is necessary in early iterations, when initial conditions might bias the approximation and the result of the ADP approach. The weights \(\theta _{a,t}^{n}\), for all \(a \in \mathcal {A}\), are updated each iteration n using the observed error (i.e., the difference between the next-stage estimate from the previous iteration \(\overline{V}_{t-1}^{n-1}\left( S_{t-1}^{x,n}\right) \) and the current estimate \(\widehat{v}_t^n\)), the value of all basis functions \(\phi _{a}\left( S_{t}^{x,n}\right) \), the optimization matrix \(H^n\), and the previous weights \(\theta _{a,t}^{n-1}\), as seen in (9). For a comprehensive explanation of the NLS method, we refer to [12].

$$\begin{aligned} \theta _{a,t}^{n}=\theta _{a,t}^{n-1}-H_{n}\phi _{a}\left( S_{t}^{x,n}\right) \left( \overline{V}_{t-1}^{n-1}\left( S_{t-1}^{x,n}\right) -\widehat{v}_t^n\right) \end{aligned}$$
(9)
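The error-correction structure of (9) can be illustrated with a simplified update in which a scalar stepsize stands in for the matrix \(H^n\) of the full NLS recursion (see [12] for the exact method); this is a sketch of the structure, not the paper's update.

```python
# A simplified stand-in for the update in (9): the full non-stationary least
# squares recursion maintains the matrix H^n; here a scalar stepsize replaces
# H^n so the error-correction structure is easy to see.
def update_weights(theta, phi, v_bar_prev, v_hat, stepsize=0.05):
    """Move each weight against the observed error (v_bar_prev - v_hat)."""
    error = v_bar_prev - v_hat
    return {a: theta.get(a, 0.0) - stepsize * phi_a * error
            for a, phi_a in phi.items()}

theta = {"freights_at_terminal_1": 10.0}   # illustrative feature weight
phi = {"freights_at_terminal_1": 2.0}      # basis function value
# The observed cost 30 exceeds the previous estimate 20, so the weight grows.
theta = update_weights(theta, phi, v_bar_prev=20.0, v_hat=30.0)
```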

The first design considers downstream costs only through a one-step estimate. Since estimates can be off, especially in early iterations, it might be beneficial to look ahead more than one step. To do this, our second design builds on the first one and uses two additional constructs. First, we add a valid inequality to the decision space \(\mathcal {X}_t\) as follows. If a direct service for a freight between its origin and its destination is cheaper than going from its origin to a given intermodal terminal and subsequently to its destination, we prevent this freight from going to that intermodal terminal when its time-window length allows only a direct service after the intermodal terminal. Second, we add another estimate to \(\overline{V}_t^n(S_t^{x,n})\), as seen in (10). In this new approximate value function, \(\overline{C}^{n}_{t}\left( S_{t}^{x,n}\right) \) is an estimate of the downstream cost obtained by simulating a fixed rule for the remainder of the horizon, under different demand realizations, and \(\alpha \) is a weight that balances the use of basis functions and simulations in \(\overline{V}_t^n(S_t^{x,n})\).

$$\begin{aligned} \overline{V}^{x,n}_{t}\left( S_{t}^{x,n}\right) =\alpha \sum _{a\in \mathcal {A}}\theta ^{n}_{a,t}\phi _{a}\left( S^{x,n}_{t}\right) +\left( 1-\alpha \right) \overline{C}^{n}_{t}\left( S_{t}^{x,n}\right) \end{aligned}$$
(10)
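The blend in (10) is simply a convex combination of the basis-function estimate and the simulated estimate. A minimal sketch, with hypothetical argument names:

```python
def blended_value(alpha, theta, phi_values, sim_estimate):
    """Convex combination as in (10): alpha weighs the basis-function
    estimate, (1 - alpha) the simulation-based estimate."""
    basis_estimate = sum(t * p for t, p in zip(theta, phi_values))
    return alpha * basis_estimate + (1 - alpha) * sim_estimate
```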

Finally, the output of our two ADP designs is the set of weights \(\theta _{a,t}^{N}\). The resulting policy \(\pi \) maps each state \(S_t\in \mathcal {S}\) to a decision \(x_{t}^{\pi }\), as seen in (11).

$$\begin{aligned} x_{t}^{\pi }={\mathop {{\mathrm{arg\,min}}}\limits _{x_{t}\in \mathcal {X}_{t}}}\left( C_{t}\left( x_{t}\right) + \sum _{a\in \mathcal {A}\left( S_{t}^{x}\right) }\theta ^{N}_{a,t}\phi _{a}\left( S^{x}_{t}\right) \right) \end{aligned}$$
(11)
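Applying the policy in (11) amounts to enumerating the feasible decisions and picking the one with the lowest sum of direct and approximated downstream costs. The sketch below assumes hypothetical helpers `direct_cost`, `post_state`, and `basis`, standing in for \(C_t\), the post-decision state \(S_t^x\), and the basis functions \(\phi _a\); it is an illustration of the mapping, not the paper's code.

```python
def adp_policy(state, feasible_decisions, direct_cost, post_state, basis, theta):
    """Greedy policy as in (11): minimize direct cost plus the
    basis-function approximation at the post-decision state."""
    def total_cost(x):
        s_x = post_state(state, x)                       # post-decision state
        future = sum(th * ph for th, ph in zip(theta, basis(s_x)))
        return direct_cost(state, x) + future
    return min(feasible_decisions, key=total_cost)
```

For example, with a direct service costing 10 and a postponement whose approximated future cost is 15, the policy transports the freight now.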

5 Numerical Experiments

In this section, we explore the value of our ADP designs through a series of numerical experiments. Using three small instances, we compare the costs achieved by our ADP approach against a benchmark policy and an advance sampling procedure. The benchmark policy mimics a planning approach commonly used in practice. The sampling procedure extends the benchmark policy with a methodology commonly considered in the literature. The section is divided as follows. First, we introduce our experimental setup. Second, we show, analyze, and discuss the results of our experiments.

5.1 Experimental Setup

For the three instances, we use a network containing a single origin, three intermodal terminals, and three destinations over a planning horizon of 15 days. Each day, there are three services between the intermodal terminals, with capacities and durations as shown in Fig. 3. The fixed costs of these services are \(C^F_{1,2}=C^F_{2,3}=100\) and \(C^F_{1,3}=150\). The variable costs range between 36 and 44 and are equal to the Euclidean distance between the terminals in a plane of 100\(\,\times \,\)50 distance units, as shown to scale in Fig. 3. In addition, every day there is a direct service between the origin and the terminals, between the origin and the destinations, and between the terminals and the destinations; these all have a duration of one day. There are no fixed costs for the direct services, and the variable costs range between 241 and 927 and are equal to ten times the Euclidean distance between the two locations they connect. The number of freights arriving each day is \(f\in \{0,1,\ldots ,4\}\), with probability \(p^\text {F}_f\) as shown in Fig. 3. In the three instances, each freight has destination \(d\in \{4,5,6\}\) with probability \(p^\text {D}_d\) as shown in Fig. 3, and is always released (i.e., \(p^\text {R}_0=1\)). Each freight has a time-window length \(k\in \{1,2,\ldots ,5\}\) with probability \(p^\text {K}_k\) according to the instance considered. In instances where freights have short time-windows, there are few feasible options for transportation and almost none for postponement. In instances with long time-windows, the opposite occurs. To test the value of look-ahead decisions, we create instances with different time-window length distributions, as shown in Fig. 3.

Fig. 3. Network characteristics for the test instances

Using these instances, we test four planning methods: a benchmark heuristic, our two ADP designs (named ADP 1 and ADP 2), and an advance sampling procedure. The set of features \(\mathcal {A}\) consists of all state variables and a constant of 1. The weight \(\alpha \) for ADP 2 is defined as \(\alpha =\max \left\{ 25/\left( 25+n-1\right) ,0.05\right\} \), and the sampling method is the same as the advance sampling procedure introduced below. The number of iterations is set to 100 and the NLS parameters used are those recommended by [11]. Although these settings achieved a fast convergence of the ADP algorithm in our tests, the resulting approximate value functions (i.e., the policy) are heuristic and not necessarily optimal.
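For concreteness, the declining weight \(\alpha =\max \left\{ 25/\left( 25+n-1\right) ,0.05\right\} \) equals 1 at the first iteration (relying fully on the basis functions) and decreases toward a floor of 0.05 (relying mostly on the simulations). A one-line sketch:

```python
def alpha_weight(n, target=25.0, floor=0.05):
    """Harmonic weight from Sect. 5.1: alpha = max{25 / (25 + n - 1), 0.05}."""
    return max(target / (target + n - 1), floor)
```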

The benchmark heuristic strikes a balance between using the intermodal services efficiently (consolidating as many freights as possible) and the postponement of freights. It consists of four steps: (i) define the shortest and second-shortest path for each freight to its final destination, without considering fixed costs for services between terminals, (ii) calculate the savings between the shortest and second-shortest path and attribute these savings to the first intermodal service used in the shortest path, (iii) sort all freights in non-decreasing order of time-window length, i.e., closest due-day first, and (iv) for each freight in the sorted list, check whether the savings of the first intermodal service of its shortest path are larger than the fixed cost of this service; if so, use this service for the freight, if not, postpone the transport of the freight. Naturally, all capacities, durations, and time-windows must be checked while performing these steps.
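The four steps above can be sketched as follows. Each freight is represented by a hypothetical dict holding its time-window length and its two cheapest paths (variable costs only); pooling the savings of all freights per intermodal service, and omitting the capacity, duration, and time-window checks, are simplifying assumptions of this sketch.

```python
from collections import defaultdict

def benchmark_heuristic(freights, fixed_cost):
    """Sketch of the four-step benchmark heuristic.

    freights   : list of dicts with keys 'id', 'tw' (time-window length),
                 and 'paths' = [(variable_cost, first_service), ...]
    fixed_cost : fixed cost per intermodal service
    """
    # (i)-(ii): savings = second-shortest minus shortest path cost,
    # credited to the first intermodal service of the shortest path
    savings = defaultdict(float)
    for f in freights:
        (c1, svc), (c2, _) = sorted(f["paths"])[:2]
        f["first_service"] = svc
        savings[svc] += c2 - c1
    # (iii): sort by non-decreasing time-window length (closest due-day first)
    decisions = {}
    for f in sorted(freights, key=lambda f: f["tw"]):
        svc = f["first_service"]
        # (iv): use the intermodal service only if the pooled savings
        # exceed its fixed cost; otherwise postpone the freight
        decisions[f["id"]] = svc if savings[svc] > fixed_cost[svc] else "postpone"
    return decisions
```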

The sampling procedure consists of three steps: (i) enumerate all feasible decisions, (ii) for each feasible decision, estimate future costs by sampling, in a Monte Carlo fashion and using common random numbers across the decisions, realizations of the exogenous information for the remainder of the planning horizon, and simulating the use of the benchmark heuristic for making decisions with these samples, and (iii) choose the decision with the lowest sum of direct and estimated future costs. Although computationally intensive (i.e., not applicable to larger instances), this procedure exploits the benefits of looking ahead in decision making.
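Step (ii) with common random numbers can be sketched as follows: every decision is evaluated on the same demand samples by seeding the random stream identically, so that cost differences between decisions are not clouded by sampling noise. Here `simulate_rest` is a hypothetical rollout of the benchmark heuristic over the remaining horizon, returning the realized cost.

```python
import random
import statistics

def sample_future_costs(decisions, simulate_rest, n_samples=10):
    """Monte Carlo estimate of future costs per decision.

    Common random numbers: the same seeds (hence the same demand
    realizations) are reused for every decision."""
    estimates = {}
    for x in decisions:
        costs = [simulate_rest(x, random.Random(seed)) for seed in range(n_samples)]
        estimates[x] = statistics.mean(costs)
    return estimates
```

Step (iii) would then select `min(decisions, key=lambda x: direct_cost(x) + estimates[x])`.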

The tests are done using ten test states in each instance. To define these states for each instance, we do a simulation of the benchmark heuristic, beginning with an empty state, for a horizon of 15 days. We save the state at the end of the horizon. We replicate this procedure 10,000 times, and choose the ten states that were observed the most. For each of the test states, we simulate each planning method 100 times, using common random numbers across the methods. Note that these 100 simulation replications are different from the 100 iterations of the ADP algorithm. Thus, we test the ADP approach in two phases: (i) learning phase through 100 iterations and (ii) simulation phase of using the resulting policy in (11) for 100 replications.
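Selecting the ten most frequently observed end-of-horizon states can be sketched as follows, assuming a hypothetical `simulate_horizon` function that runs the benchmark heuristic from an empty state for one replication and returns a hashable end state.

```python
from collections import Counter

def most_common_states(simulate_horizon, n_reps=10000, k=10):
    """Run n_reps replications, record each end-of-horizon state,
    and keep the k most frequently observed states as test states."""
    counts = Counter(simulate_horizon(rep) for rep in range(n_reps))
    return [state for state, _ in counts.most_common(k)]
```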

5.2 Experimental Results and Discussion

First, we analyze Instance \(I_1\). This is the most flexible test instance since all freights have a time-window length of 5 days when they arrive. The results are shown in Table 1. We show the costs for the benchmark heuristic, and the relative savings, as a percentage, of the other planning methods when compared to the benchmark. In addition, we show the number of freights of each test state and the computational time. The computational time (in seconds) is given as the total simulation time for the 100 replications of the 15-day horizon.

Table 1. Results for Instance \(I_1\)

On average, ADP 1 achieves savings of 12.8 %, ADP 2 of 29.2 %, and the sampling procedure of 41.2 % when compared to the benchmark heuristic. All three methods that explicitly look ahead in their decisions perform better than the benchmark, which does so only implicitly. The sampling method performs best, at a higher computational expense (more than 10 times the computational time of ADP, and 1000 times that of the benchmark, on average). For large instances, or even small ones where time is discretized into smaller intervals, this method would not be applicable. ADP 2 performs second best, at a higher computational expense during the learning phase than ADP 1 (965 s instead of 116, on average). However, during the actual decision making for the entire planning horizon, both ADP designs have similar computational times (50 and 79 s, on average, respectively, for \(I_1\)). The lower savings of ADP 1 indicate that a one-step look-ahead is not sufficient for a good solution. Furthermore, the difference between the two ADP designs suggests that further research that explicitly considers a few stages in advance, such as rolling-horizon procedures within the ADP framework, can improve performance significantly.

The average results across the test states of \(I_2\) and \(I_3\) are shown in Table 2. Note that each instance has its own set of test states, which differs from those of the other instances. Furthermore, note that \(I_2\) and \(I_3\) have significantly less flexibility than \(I_1\) due to their time-window lengths: only 40 % and 0.05 % of arriving freights, respectively, can use any intermodal connection.

Table 2. Average results for Instance \(I_2\) and \(I_3\)

The larger savings from all look-ahead methods in \(I_1\) and \(I_2\), compared to \(I_3\), indicate that the more flexibility there is, the more valuable it is to look ahead when making decisions. In \(I_2\), results similar to those of \(I_1\) are achieved, but with significantly lower cost savings. In \(I_3\), the benchmark heuristic performs better than the ADP approach, and the sampling procedure achieves only small savings. In most states of \(I_3\), the only feasible option (time-wise) for freights is to use a direct service via truck. In such a setting, decision-making methods that focus on current costs, such as the benchmark heuristic, perform well since there are hardly any consolidation opportunities to anticipate. However, a robust ADP design should be able to learn such a policy, as the sampling method seems to do. In a sensitivity analysis (results not shown), we observed that the number of iterations and the NLS parameters had a small impact on solution quality, compared to the impact of different approximate value functions (e.g., more basis functions, the sampling method, etc.). Similarly, the different approximate value functions showed significant differences in computational time during the learning phase, but not during the decision-making phase. Further research on adaptations of the approximating function of future costs within the ADP algorithm, such as aggregates, hierarchical functions, and state representatives, is necessary.

6 Conclusions

We developed an MDP model and an ADP algorithm for selecting services and transfers for freights in a synchromodal network. With the MDP model, the optimal balance between transporting and postponing freights, in different locations of the network, over time, and under uncertain demand, can be achieved. With the ADP algorithm, the computational burden of the MDP model is reduced while preserving all of its modeling functionalities.

Through numerical experiments, we explored the value of using look-ahead decisions in our planning problem and reflected on the value and the limitations of our ADP designs. We observed that the more time-window flexibility and the more freights there are, the better the look-ahead methods perform. We also observed that the two methods that look ahead more than one stage performed better than the standard one-step look-ahead ADP approach. Further research on ADP designs that explicitly consider a few stages in advance (e.g., rolling horizon, sampling, approximate policy iteration) and on other, possibly non-linear, value function approximations is relevant for synchromodal planning.