1 Introduction

The aim of the traveling salesman problem with time windows (TSPTW) is to find a minimum cost route visiting a set of customers, where each customer must be visited only once under the hard constraint that the service starts within a given time window. This implies that the salesman has to wait if he arrives before the start of the time window. The problem is NP-hard because it encompasses the traveling salesman problem, and furthermore, Savelsbergh [30] has shown that the problem of finding a feasible solution for TSPTW is NP-complete.

The TSPTW has applications in the job shop scheduling where the setup time of each job depends on the previous job [5], in the single machine discrete lot-sizing and scheduling problem with sequence dependent setup costs [19], and in the approach cluster-first, route-second for the vehicle routing problem with time windows (VRPTW) [24]. In addition, the feasibility problem of the TSPTW becomes a separation subproblem when a branch-and-cut approach is used to solve the VRPTW [23]. The literature on exact algorithms for the TSPTW is extensive and for a thorough review on this topic, see [6].

When the hard time window constraints may be violated then the time windows become soft constraints, as suggested by Sexton and Choi [32] for the single vehicle pickup and delivery problem, and the deviations of starting the service before or after the time windows imply a penalty cost function that is added to the total vehicle traveling time. In real world situations, a soft time window corresponds to a limited increase of the hard time window which, with the magnitude of the penalty cost parameter, reflects the importance of a customer being served closer to the time window.

The relaxation of the hard time window constraints leads to a greater feasible space and, as a consequence, we can have an optimal cost lower than the optimal cost associated with the hard constrained time windows. Let \( p_{i} \left( {s_{i} } \right) \) denote the penalty cost function, where \( s_{i} \) is the service start time outside the time window \( \left[ {e_{i} ,l_{i} } \right] \) of customer i. Several variants of a linear penalty cost \( p_{i} \left( {s_{i} } \right) = \alpha_{i} \left( {e_{i} - s_{i} } \right),\,\;s_{i} < e_{i} \) and \( p_{i} \left( {s_{i} } \right) = \beta_{i} \left( {s_{i} - l_{i} } \right),\;s_{i} > l_{i} \), where \( \alpha_{i} \) and \( \beta_{i} \) are positive real parameters that have been proposed in the literature of the VRPTW, are mentioned in the sequel.

Koskosidis et al. [24] and Liberatore et al. [26] suggest an unlimited starting time service, whereas Balakrishnan [2] proposes a maximum waiting time \( W_{\text{max} } \) and a maximum penalty cost \( P_{\text{max} } \). Chiang and Russel [9] allow a maximum time window enlargement M, i.e., \( \left[ {e_{i} - M,l_{i} + M} \right] \) and a maximum waiting time \( W_{\text{max} } \). In [33], earlier starting \( s_{i} < e_{i} \) is not allowed, but unlimited lateness \( s_{i} > l_{i} \) is permitted. Qureshi et al. [27] suggest a semi soft penalty cost such that the upper limit of the time window is extended to \( l^{\prime}_{i} > l_{i} \), the penalty cost is linear in the interval \( \left[ {l_{i} ,l^{\prime}_{i} } \right] \), and service start must occur in the interval \( \left[ {e_{i} ,l^{\prime}_{i} } \right] \). The limit \( l^{\prime}_{i} \) is defined as the maximum late arrival penalty equivalent to the time and cost of an additional single vehicle route only serving customer i. Bhusiri et al. [8] include the maximum limit of late arrival time in the definition of \( l^{\prime}_{i} \) and propose a lower limit of the soft window \( e^{\prime}_{i} \), expressed as the maximum early arrival penalty equivalent to the cost of an additional single vehicle route only serving customer i. The penalty cost is linear in the intervals \( \left[ {e^{\prime}_{i} ,e_{i} } \right] \) and \( \left[ {l_{i} ,l^{\prime}_{i} } \right] \), and service start must occur in the interval \( \left[ {e_{i} ,l^{\prime}_{i} } \right] \). The authors claim that waiting time to start service incurs extra costs such as labour operating cost, maintenance cost and parking fee, in addition to the loss of opportunity to generate more profits. For this reason, the objective is to minimize the total waiting time as well as the routing cost that includes the penalty cost. This is accomplished by setting the service start time as the arrival time at each customer.

Ibaraki et al. [22] propose a comprehensive approach to the vehicle routing problem with hard and soft time windows for the case in which the problem is decoupled into two subproblems, as suggested by Sexton and Bodin [31], Sexton and Choi [32] and Dumas et al. [15]. The former consists of determining the routes and the latter determines the optimal service start time of each customer in order to minimize the penalized total cost for violating the hard time windows. The penalty function for each customer can be non-convex provided that it is piecewise linear and lower semicontinuous, i.e., \( p_{i} (t) \le { \lim }_{\varepsilon \to 0} \hbox{min} \left\{ {p_{i} (t + \varepsilon ),p_{i} (t - \varepsilon )} \right\} \).

The term flexible time window has been suggested by Taş et al. [34] and consists of an enlargement of the hard time window of customer i by associated fractions \( f_{i}^{e} \) and \( f_{i}^{l} \) such that the flexible time window \( \left[ {e^{\prime}_{i} ,l^{\prime}_{i} } \right] \) has limits expressed by \( e^{\prime}_{i} = e_{i} - f_{i}^{e} \left( {l_{i} - e_{i} } \right) \) and \( l^{\prime}_{i} = l_{i} + f_{i}^{l} \left( {l_{i} - e_{i} } \right) \). The service at customer i cannot start outside the flexible window, but can start in the intervals \( \left[ {e^{\prime}_{i} ,e_{i} } \right] \) and \( \left[ {l_{i} ,l^{\prime}_{i} } \right] \) with linear penalty cost \( p_{i} \left( {s_{i} } \right) = \delta_{\alpha } \left( {e_{i} - s_{i} } \right),\,\;e^{\prime}_{i} \le s_{i} < e_{i} \) and \( p_{i} \left( {s_{i} } \right) = \delta_{\beta } \left( {s_{i} - l_{i} } \right),\;l_{i} < s_{i} \le l^{\prime}_{i} \), where \( \delta_{\alpha } \) and \( \delta_{\beta } \) are positive real parameters (see Fig. 1). Feasible vehicle routes are constructed by a tabu search algorithm and the optimal service start of each customer is determined by a linear programming (LP) formulation in order to minimize the total penalty cost of each route for a given sequence of customers in that route. Vidal et al. [35] stress that most timing problems, for example, vehicle routing and machine scheduling can be formulated by means of a LP formulation, as suggested by the pioneer articles proposed by Sexton and Bodin [31], Sexton and Choi [32], Dumas et al. [15] for vehicle routing, and Fry and Leong [20] for machine scheduling.

Fig. 1
figure 1

Flexible time windows

In this article, we propose extensions of exact and heuristic dynamic programming algorithms for the traveling salesman problem with flexible time windows (TSPFlexTW). The objective is to determine a sequence of customers and their respective service start times so as to minimize the sum of traveling costs with earliness and lateness costs. The sets of states or labels generated by the proposed dynamic programming algorithms are explored according to label-correcting strategies. Regarding exact algorithms, we extend the bi-directional resource-bounded dynamic programming proposed by Li [25], and with reference to heuristic dynamic programming, we extend the algorithm with precedence constraints devised by Balas and Simonetti [4].

The main contributions of this article are threefold:

  1. 1.

    The proposition of label resources, resource extension functions and dominance criteria to be used in label-correcting algorithms for routing and scheduling problems with flexible time window constraints.

  2. 2.

    The design of exact and heuristic dynamic programming algorithms for the TSPFlexTW, which extend solution methods originally developed for the TSPTW.

  3. 3.

    The extensive computational experimentation on a variety of symmetric and asymmetric instances from the literature and the assessment of the benefits gained by flexible time windows compared to the hard time windows.

The remainder of the article is organized as follows. Section 2 introduces the problem description. Sections 3 and 4 describe the extension of exact and heuristic dynamic programming algorithms for the TSPFlexTW. Computational results are reported in Sect. 5, and conclusions are outlined in Sect. 6.

2 Problem description

The TSPFlexTW is defined on a complete graph \( G = \left( {\mathcal{\mathcal{N}},\mathcal{\mathcal{A}}} \right) \) where \( \mathcal{\mathcal{N}} = \{ 0,1, \ldots ,n,n + 1 \} \) is the set of nodes and \( \mathcal{\mathcal{A}} = \left\{ {(i,j):i \in \mathcal{\mathcal{N}}\backslash \{ n + 1\} ,\;j \in \mathcal{\mathcal{N}}\backslash \{ 0\} ,\;i \ne j} \right\} \) represents their connecting arcs. Let \( \mathcal{\mathcal{C}} = \left\{ {1, \ldots ,n} \right\} \) be the set of customers, while nodes \( 0 \) and \( n + 1 \) denote the depot as route starting and destination points, respectively. For each customer \( i \in \mathcal{\mathcal{C}} \), there is a service time \( W_{i} \), a hard time window \( \left[ {e_{i} ,l_{i} } \right] \) and its fractions \( f_{i}^{e} \) and \( f_{i}^{l} \) for the time window enlargement. The time windows \( \left[ {e_{0} ,l_{0} } \right] \) and \( \left[ {e_{n + 1} ,l_{n + 1} } \right] \) at the depot represent the planning horizon. Following Taş et al. [34], each flexible time window \( \left[ {e^{\prime}_{i} ,l^{\prime}_{i} } \right] \) has limits expressed by \( e^{\prime}_{i} = \hbox{max} \left\{ {e_{i} \text{ } - \text{ }f_{i}^{e} \left( {l_{i} - e_{i} } \right),0} \right\} \) and \( l^{\prime}_{i} = l_{i} + f_{i}^{l} \left( {l_{i} - e_{i} } \right) \). Servicing a customer within \( \left[ {e^{\prime}_{i} ,e_{i} } \right] \) is penalized by \( \delta_{\alpha } \) for one unit of earliness, while servicing a customer within \( \left[ {l_{i} ,l^{\prime}_{i} } \right] \) is penalized by \( \delta_{\beta } \) for one unit of lateness. Waiting up to \( e_{i} \) is allowed at no penalty cost, but customers cannot be served after \( l^{\prime}_{i} \). Moreover, each arc \( (i,j) \in \mathcal{\mathcal{A}} \) has an associated traveling cost \( c_{ij} \) and a traveling time \( t_{ij} \). If \( c_{ij} = c_{ji} \) and \( t_{ij} = t_{ji} \) the TSPFlexTW is symmetric, otherwise is asymmetric.

Consider the following variables:

  • \( w_{i} \): service start time at node \( i \in \mathcal{\mathcal{N}} \);

  • \( \alpha_{i} \): service earliness at node \( i \in \mathcal{\mathcal{N}} \);

  • \( \beta_{i} \): service lateness at node \( i \in \mathcal{\mathcal{N}} \);

    $$ x_{ij} = \left\{ {\begin{array}{ll} 1 & {{\text{if node}}\text{ }j \in \mathcal{\mathcal{N}}\text{ }{\text{is visited}}\text{ }{\text{immediately}}\text{ }{\text{after}}\text{ }{\text{visiting node}}\text{ }i \in \mathcal{\mathcal{N}}\text{;}} \\ 0 & {{\text{otherwise}} .} \\ \end{array} } \right.\text{ } $$

In the following, the TSPFlexTW is formulated as a mixed integer programming (MIP) model, a special case of the vehicle routing problem with flexible time windows (VRPFlexTW) proposed by Taş et al. [34].

$$ \hbox{min} \sum\limits_{{i \in \;\mathcal{\mathcal{N}\backslash }\left\{ {n + 1} \right\}}} {\sum\limits_{{j \in \;\mathcal{\mathcal{N}\backslash }\left\{ 0 \right\},}} {\sum\limits_{{(i,j) \in \mathcal{\mathcal{A}}}} {c_{ij} x_{ij} } } } + \sum\limits_{{i \in \;\mathcal{\mathcal{N}}}} {\left( {\delta_{\alpha } \alpha_{i} + \delta_{\beta } \beta_{i} } \right)} $$
(1)
$$ \sum\limits_{{i \in \;\mathcal{\mathcal{N}\backslash }\left\{ {n + 1} \right\}}} {x_{ij} } = 1\; \, j \in \mathcal{\mathcal{N}}\text{ }\mathcal{\backslash }\left\{ 0 \right\},\;j \ne i $$
(2)
$$ \sum\limits_{{j \in \;\mathcal{\mathcal{N}\backslash }\left\{ 0 \right\}}} {x_{ij} } = 1\;i \in \mathcal{\mathcal{N}\backslash }\left\{ {n + 1} \right\},\;i \ne j $$
(3)
$$ w_{i} + W_{i} + t_{ij} - [M_{ij} (1 - x_{ij} )] \le w_{j} \;i \in \mathcal{\mathcal{N}\backslash }\left\{ {n + 1} \right\},\;j \in \mathcal{\mathcal{N}\backslash }\left\{ 0 \right\},\;i \ne j $$
(4)
$$ e^{\prime}_{i} \le w_{i} \le l^{\prime}_{i} \text{ }\;\;i \in \mathcal{\mathcal{N}} $$
(5)
$$ \alpha_{i} \ge e_{i} - w_{i} \;\;i \in \mathcal{\mathcal{N}} $$
(6)
$$ \beta_{i} \ge w_{i} - \text{ }l_{i} \;\;i \in \mathcal{\mathcal{N}} $$
(7)
$$ \left( {w_{i} ,\alpha_{i} ,\beta_{i} } \right) \ge 0,\;x_{ij} \in \left\{ {0,1} \right\}\text{ }i,j \in \mathcal{\mathcal{N} },\;i \ne j $$
(8)

where \( M_{ij} = \hbox{max} \left\{ {l^{\prime}_{i} - e^{\prime}_{j} + t_{ij} + W_{i} ,0} \right\} \) (see [13]). The objective function (1) minimizes the overall cost, which includes traveling cost and service penalty. Constraints (2) and (3) state that every node must be visited exactly once. Constraints (4) guarantee that the service start time at a customer must be greater or equal to the sum of the service start time with the service time and the traveling time of its immediate predecessor. Constraints (5) ensure that the service start times are within the given flexible time windows, while constraints (6) and (7) connect the node service start times with the service earliness and service lateness, respectively. Constraints (8) indicate the domain of the variables.

3 Exact dynamic programming algorithms for the TSPFlexTW

The proposed dynamic programming algorithms are based on the algorithm conceived by Desrochers [12] for the resource constrained shortest path problem and extended to its elementary version by Feillet et al. [16]. Such algorithms include resources to the Ford-Bellman algorithm (see [1] for an introduction to labeling algorithms).

An essential feature of dynamic programming is to structure an optimization problem in stages corresponding to optimization subproblems that are solved sequentially. Associated with each stage are the states that convey the relevant information of the prior history in order to make future decisions [7, 10].

In constrained shortest path problems, stages are associated with the nodes of the graph and each state corresponds to a feasible path from the depot 0 to a node i with characteristics of a particular problem. The states of a node i are represented by labels (R, C, i), where each component of the vector R denotes the consumption of a different resource and C is the cost along the path. Another fundamental component of dynamic programming is a recursive equation or a resource extension function associated with each arc of the graph and each resource. In addition, dominance rules are applied in order to fathom dominated states.

In this section, we extend the bi-directional resource-bounded dynamic programming devised by Li [25] to two variants of dynamic programming algorithms for the TSPFlexTW.

3.1 Bi-directional resource-bounded dynamic programming for the TSPTW

Li [25] suggests the dynamic programming technique with a label-correcting algorithm for the TSPTW that takes advantage of the bi-directional bounded search framework. Let node \( o = 0 \) denote the depot as the route starting point and node \( d = n + 1 \) refer to the depot as the route destination point. In a bi-directional search, labels are extended both forward from node \( o \) to its successors and backward from node \( d \) to its predecessors. Such a label extension may result in a smaller number of non-dominated labels than in mono-directional dynamic programming [29]. The entire route is obtained by joining paths from the forward and backward extensions. Hereafter, we define the components of the aforementioned algorithm.

3.1.1 Label extension

Let \( L_{i}^{FP} = \left( {V,s,i} \right) \) denote a forward label, where \( i \) is the last reached node of the forward path \( FP = \left( {0, \ldots ,i} \right) \), \( s \) represents the earliest time when service can start at node \( i \), and \( V \) is a vector that indicates the node sequence of \( FP = \left( {0, \ldots ,i} \right). \) Let \( V_{i} \) be the position that node i is visited, then if node i is the kth visited node, \( V_{i} = k \). The accumulated traveling cost of \( L_{i}^{FP} = \left( {V,s,i} \right) \) is represented by \( c\left( {L_{i}^{FP} } \right) \). Let \( V^{IF} \) denote the vector associated with the initial forward (IF) label, which is defined as \( \left( {V^{IF} ,0,0} \right) \) with \( V_{0}^{IF} = 1 \), \( V_{h}^{IF} = 0,\text{ }h \in \mathcal{\mathcal{C}} \cup \{ n + 1\} \) and \( c\left( {V^{IF} ,0,0} \right) = 0 \). The forward extension of a label \( \left( {V,s,i} \right) \) to node \( j \) along arc \( (i,j) \) produces the new label \( \left( {V^{\prime},s^{\prime},j} \right) \) with cost \( c\left( {V^{\prime},s^{\prime},j} \right) \) by means of the following resource extension functions: \( V^{\prime}_{h} = V_{h} ,\text{ }{\text{if}}\text{ }h \ne j \), \( V^{\prime}_{h} = V_{i} + 1,\text{ }{\text{if}}\text{ }h = j \), as node j is the successor of node i; the earliest time when service can start, i.e., \( s^{\prime} = \hbox{max} \left\{ {s + W_{i} + t_{ij} ,\text{ }e_{j} } \right\} \), since there is a waiting time if node j is reached before the start of corresponding time window \( e_{j} \); the accumulated travel cost at node j is increased by \( c_{ij} \), i.e., \( c\left( {V^{\prime},s^{\prime},j} \right) = c\left( {V,s,i} \right) + c_{ij} \).

The backward label extension is a mirror of the forward label extension. A backward path is represented by \( BP = \left( {n + 1, \ldots ,i} \right) \) with respective label \( L_{i}^{BP} = \left( {V,s,i} \right) \) and cost \( c\left( {L_{i}^{BP} } \right) \). Let \( T \) be the maximum feasible arrival time at the destination node \( n + 1 \), i.e., \( T = \max_{{h \in \mathcal{\mathcal{C}}}} \left\{ {l_{h} + W_{h} + t_{{h,\text{ }n + 1}} } \right\} \) [29]. The initial label of the backward extension is \( \left( {V^{IB} ,T,n + 1} \right) \) with \( V_{n + 1}^{IB} = 1 \), \( V_{h}^{IB} = 0,\text{ }h \in \mathcal{\mathcal{C}} \cup \{ 0\} \) and \( c\left( {V^{IB} ,T,n + 1} \right) = 0 \). In this case, \( V^{IB} \) denotes the vector associated with the initial backward (IB) label. Consider that label \( \left( {V,s,i} \right) \) is backward extended to \( \left( {V^{\prime},s^{\prime},j} \right) \), where the computation of \( V^{\prime} \) is identical to the forward extension as shown above, while \( s^{\prime} = \hbox{min} \left\{ {s - W_{j} - t_{ji} ,\text{ }l_{j} } \right\} \) since \( s^{\prime} \) is the latest time when service can start at node \( j \), and \( c\left( {V^{\prime},s^{\prime},j} \right) = c\left( {V,s,i} \right) + c_{ji} \).

Righini and Salani [29] suggest that the forward and the backward extensions are allowed only if \( s^{\prime} \le T/2 \) and \( s^{\prime} \ge T/2 \), respectively. A half-way point that minimizes the half-way function\( \left| {s^{\prime}_{FP} - s^{\prime}_{BP} } \right| \) among neighbours of a pair of nodes \( i \) and \( j \) is then chosen to join their associated labels. In this way, each path from node \( o = 0 \) to node \( d = n + 1 \) is generated only once.

Li [25] addresses the case where a forward label at node \( i \) can be directly extended to reach a node \( j \) or otherwise that a backward label at node \( j \) can be extended to reach a node \( i \). The author argues that the conditions \( s^{\prime} \le T/2 \) and \( s^{\prime} \ge T/2 \) do not guarantee the previous extensions, and thus a feasible and possibly optimal path from node \( o = 0 \) to node \( d = n + 1 \) is lost.

Let \( \lambda = \max_{{h \in \text{ }\mathcal{\mathcal{N}}\backslash \left\{ {n + 1} \right\},\ell \in \text{ }\mathcal{\mathcal{N}}\backslash \left\{ 0 \right\},h \ne \ell }} \left\{ {W_{h} + t_{h\ell } } \right\} \) denote the maximum arc traversal time in the graph \( G \). Figure 2 illustrates that a feasible solution is not lost if the forward extension from \( \left( {V,s,i} \right) \) to \( \left( {V^{\prime},s^{\prime},j} \right) \) is allowed only if the earliest time \( s^{\prime} \le T/2 \), and the backward extension from \( \left( {V,s,j} \right) \) to \( \left( {V^{\prime},s^{\prime},i} \right) \), only if the latest time \( s^{\prime} \ge \left( {T/2} \right) - \lambda \) [25]. Note that if condition \( s^{\prime} \ge T/2 \) were applied to the latest time instead of \( s^{\prime} \ge \left( {T/2} \right) - \lambda \), this would disallow the backward extension depicted in Fig. 2b, therefore, the feasible \( o - d \) path would be lost.

Fig. 2
figure 2

Time resource bounding scheme. a Infeasible forward extension, b feasible backward extension

The forward label extension is feasible if (1) node \( j \) has not previously been visited, i.e., \( V_{j} = 0 \), (2) \( s^{\prime} \) is no later than the upper limit of time window \( \left[ {e_{j} ,l_{j} } \right] \), i.e., \( s^{\prime} \le l_{j} \), (3) a look-ahead step, in which the new label \( \left( {V^{\prime},s^{\prime},j} \right) \) can be further propagated to visit all non-visited nodes while satisfying their time windows, and (4) \( s^{\prime} \le T/2 \). The same feasibility conditions (1) and (3) on the forward extension apply to the backward procedure, whereas conditions (2) and (4) are replaced by \( s^{\prime} \ge e_{j} \) and \( s^{\prime} \ge \left( {T/2} \right) - \lambda \), respectively.

3.1.2 Label dominance

Dominance tests are performed during forward and backward label extensions. Define the set \( U = \left\{ {h|V_{h} \ne 0,h \in \mathcal{\mathcal{N}}} \right\} \) as the nodes visited by a label \( \left( {V,s,i} \right) \). Regarding forward dominance, let \( \left( {V^{1} ,s^{1} ,i} \right) \) and \( \left( {V^{2} ,s^{2} ,i} \right) \) be two labels with the same reached node \( i \) and identical sets \( U^{1} = U^{2} \). Label \( \left( {V^{1} ,s^{1} ,i} \right) \) dominates \( \left( {V^{2} ,s^{2} ,i} \right) \) if

$$ c\left( {V^{1} ,s^{1} ,i} \right) \le c\left( {V^{2} ,s^{2} ,i} \right) $$
(9)
$$ s^{1} \le s^{2} $$
(10)

and at least one of these inequalities is strict. With respect to backward dominance, the dominance rule (10) is replaced by

$$ s^{1} \ge s^{2} $$
(11)

If the triangular inequality holds for the arc traveling costs \( c_{ij} \) and times \( t_{ij} \), the condition \( U^{1} = U^{2} \) can be replaced by \( U^{2} \subseteq U^{1} \) for both forward and backward extensions [25].

3.1.3 Label joining

Forward and backward labels are joined to yield TSPTW feasible routes. Let \( L_{i}^{FP} = \left( {V^{F} ,s^{F} ,i} \right) \) be a forward label and \( L_{i}^{BP} = \left( {V^{B} ,s^{B} ,i} \right) \) be a backward label with the same reached node \( i \). Their concatenation \( L_{i}^{FP} \oplus L_{i}^{BP} \) consists of a feasible route if all nodes are visited only once, and if the starting time at the forward label is less than or equal to the starting time of the backward label, i.e., \( s^{F} \le s^{B} \). The cost of the resulting route is \( c\left( {L_{i}^{FP} } \right) + c\left( {L_{i}^{BP} } \right) \).

Figure 3 depicts a pseudo-code of the algorithm. Let \( ForL \), \( BackL \) and \( ActL \) be linked lists that store the forward labels, the backward labels and the active label pool, respectively. The algorithm is started by placing the first forward label and the first backward label into the active label pool \( ActL \) (lines 1–3). Nodes as well as forward and backward labels are then repeatedly treated until \( ActL \) becomes empty (lines 4–19). Herein, \( Extend^{F} \left( {\left( {V,s,i} \right),j} \right) \) (line 8) and \( Dominance^{F} \left( {ForL,\left( {V^{\prime},s^{\prime},j} \right)} \right) \) (line 9) represent the label extension and label dominance procedures in the forward extension. Analogously, \( Extend^{B} \left( {\left( {V,s,i} \right),j} \right) \) (line 13) and \( Dominance^{B} \left( {BackL,\left( {V^{\prime},s^{\prime},j} \right)} \right) \) (line 14) are applied to the backward extension. Finally, for each node \( i \in \mathcal{\mathcal{N}} \) the label joining is performed by the function \( Join\left( i \right) \) (lines 20–22) and the minimum cost route is obtained (line 23).

Fig. 3
figure 3

Bi-directional resource-bounded dynamic programming algorithm

3.2 Bi-directional resource-bounded dynamic programming algorithms for the TSPFlexTW

We suggest two alternative forms of label correcting that includes label representation and the respective procedures for label extension and dominance.

3.2.1 Label-correcting LP

Let the label \( L_{i}^{P} = \left( {V,s,i} \right) \) denote a path both for a forward label and a backward label for the TSPFlexTW. It differs from the TSPTW with respect to the flexible time windows \( \left[ {e^{\prime}_{i} ,l^{\prime}_{i} } \right],\text{ }i \in \mathcal{\mathcal{N}} \) that replace the hard time windows \( \left[ {e_{i} ,l_{i} } \right],\text{ }i \in \mathcal{\mathcal{N}} \) in all forward and backward calculations.

In addition to the accumulated traveling cost of the forward or backward label \( c\left( {L_{i}^{P} } \right) \), there is a penalty cost corresponding to the minimum earliness and lateness cost \( pen\left( {L_{i}^{P} } \right) \) associated with starting service within the flexible time windows but earlier or later than the limits of the hard time windows. Such a minimum cost is obtained by means of the LP model suggested by Taş et al. [34]:

$$ pen\left( {L_{i}^{P} } \right) = \hbox{min} \text{ }\sum\limits_{{j \in \mathcal{\bar{\mathcal{N}}}}} {\left( {\delta_{\alpha } \alpha_{j} + \delta_{\beta } \beta_{j} } \right)} $$
(12)
$$ w_{j} + W_{j} + t_{jh} \le w_{h} \text{ }j,h \in \mathcal{\bar{\mathcal{N}}},\text{ }\left( {j,h} \right) \in \mathcal{\bar{\mathcal{A}}} $$
(13)
$$ e^{\prime}_{j} \le w_{j} \le l^{\prime}_{j} \text{ }j \in \mathcal{\bar{\mathcal{N}}} $$
(14)
$$ \alpha_{j} \ge e_{j} - w_{j} \text{ }j \in \mathcal{\bar{\mathcal{N}}} $$
(15)
$$ \beta_{j} \ge w_{j} - \text{ }l_{j} \text{ }j \in \mathcal{\bar{\mathcal{N}}} $$
(16)
$$ \left( {w_{j} ,\alpha_{j} ,\beta_{j} } \right) \ge 0\text{ }j \in \mathcal{\bar{\mathcal{N}}} $$
(17)

where \( \mathcal{\bar{\mathcal{N}}} \subset \mathcal{\mathcal{N}} \) is the set of nodes visited in the path \( P \) and \( \mathcal{\bar{\mathcal{A}}} \subset \mathcal{\mathcal{A}} \) is the set of traversed arcs of this path. The objective function (12) expresses the minimization of penalty costs along the path \( P \). Constraints (13) describe the compatibility between the service start time of two subsequent nodes in path \( P \). Constraints (14) impose that the service start time at each node in path \( P \) is limited by a flexible time window. Constraints (15) connect the earliness with the service start time, while constraints (16) link the lateness and the service start time. Non-negativity constraints on variables are represented by (17). Before solving this linear program for a forward or a backward label \( L_{i}^{P} = \left( {V,s,i} \right) \), the service start variable \( w_{i} \) regarding the last reached node \( i \) is fixed at the current value of its time resource, i.e., \( w_{i} = s \).

Regarding forward dominance, let \( \left( {V^{1} ,s^{1} ,i} \right) \) and \( \left( {V^{2} ,s^{2} ,i} \right) \) be two labels with the same reached node \( i \). Then the dominance criterion (9) is replaced by

$$ c\left( {V^{1} ,s^{1} ,i} \right) + pen\left( {V^{1} ,s^{1} ,i} \right) \le c\left( {V^{2} ,s^{2} ,i} \right) + pen\left( {V^{2} ,s^{2} ,i} \right) $$
(18)

with the dominance criterion (10). The backward dominance criteria are given by (11) and (18).

3.2.2 Label-correcting EL

Additional resources of accumulated earliness (E) \( early\left( {L_{i}^{P} } \right) \) and accumulated lateness (L) \( late\left( {L_{i}^{P} } \right) \) are added to both forward and backward label \( L_{i}^{P} \). They were inspired by the accumulated waiting time \( wa\left( {L_{i}^{P} } \right) \) and the accumulated delay time \( de\left( {L_{i}^{P} } \right) \) proposed in [8] for the vehicle routing problem with soft time windows constraints. The resulting label is denoted \( L_{i}^{P} = \left( {V,s,i,\;early,\;late} \right) \) with accumulated traveling cost \( c\left( {L_{i}^{P} } \right) \) and penalty cost \( pen\left( {L_{i}^{P} } \right) \). The flexible time windows \( \left[ {e^{\prime}_{i} ,l^{\prime}_{i} } \right],\text{ }i \in \mathcal{\mathcal{N}} \) also replace the hard time windows \( \left[ {e_{i} ,l_{i} } \right],\text{ }i \in \mathcal{\mathcal{N}} \) in all label extensions.

The label extension for three first components of \( \left( {V,s,i,\;early,\;late} \right) \) follows that of TSPTW presented in Sect. 3.1.1. Let \( s\left( {L_{j}^{P} } \right) \) denote the service start time at node \( j \in \mathcal{\mathcal{N}} \) and assume the extension of \( L_{i}^{P} \) along arc \( (i,j) \in \mathcal{\mathcal{A}} \) towards node \( j \in \mathcal{\mathcal{N}} \). The accumulated earliness \( early\left( {L_{i}^{P} } \right) \) and lateness \( late\left( {L_{i}^{P} } \right) \) are extended along the forward and backward paths in the following way:

$$ early\left( {L_{j}^{P} } \right) = \left\{ {\begin{array}{ll} {early\left( {L_{i}^{P} } \right) + \left( {e_{j} - s\left( {L_{j}^{P} } \right)} \right),} & {{\text{if}}\text{ }s\left( {L_{j}^{P} } \right) < e_{j} } \\ {early\left( {L_{i}^{P} } \right),} & {{\text{if}}\text{ }s\left( {L_{j}^{P} } \right) \ge e_{j} } \\ \end{array} } \right. $$
(19)
$$ late\left( {L_{j}^{P} } \right) = \left\{ {\begin{array}{ll} {late\left( {L_{i}^{P} } \right) + \left( {s\left( {L_{j}^{P} } \right) - l_{j} } \right),} & {{\text{if}}\text{ }s\left( {L_{j}^{P} } \right) > l_{j} } \\ {late\left( {L_{i}^{P} } \right),} & {{\text{if}}\text{ }s\left( {L_{j}^{P} } \right) \le l_{j} } \\ \end{array} } \right. $$
(20)

The forward label dominance is determined by inequalities (9), (10) and the additional resource inequalities

$$ early\left( {L_{i}^{1} } \right) \le early\left( {L_{i}^{2} } \right) $$
(21)
$$ late\left( {L_{i}^{1} } \right) \le late\left( {L_{i}^{2} } \right) $$
(22)

The backward dominance criteria are given by (9), (11), (21) and (22). In contrast to the label-correcting LP, the label-correcting EL temporarily disregards the value of penalty costs \( pen\left( {L_{i}^{P} } \right) \) until the label joining procedure.

The two variants of the exact dynamic programming algorithm (EDPA) are denoted according to the label-correcting procedures LP and EL, i.e., EDPALP and EDPAEL.

3.2.3 Common label joining for label-correcting procedures

The last step of the label-correcting procedures LP and EL shares a common label joining procedure such that the overall costs of the concatenated paths \( L_{i}^{FP} \oplus L_{i}^{BP} \) are given by \( c\left( {L_{i}^{FP} } \right) + c\left( {L_{i}^{BP} } \right) + pen\left( {L_{i}^{FP} \oplus L_{i}^{BP} } \right) \), where \( pen\left( {L_{i}^{FP} \oplus L_{i}^{BP} } \right) \) is determined by the solution of LP model (12)–(17). In order to avoid the time-consuming computation of such a linear program for every pair of forward and backward paths to be concatenated we proceed in the following way. Store the current minimum overall cost found so far and call it best. Then the computation of \( pen\left( {L_{i}^{FP} \oplus L_{i}^{BP} } \right) \) for the next pair of paths is performed only if \( c\left( {L_{i}^{FP} } \right) + c\left( {L_{i}^{BP} } \right) < best \) holds for such a concatenated route \( L_{i}^{FP} \oplus L_{i}^{BP} \), otherwise it cannot improve \( best \) since \( pen\left( {L_{i}^{FP} \oplus L_{i}^{BP} } \right) \ge 0 \).

4 Heuristic dynamic programming algorithms for the TSPFlexTW

Desaulniers et al. [11] propose a heuristic dynamic programming for the subproblem encountered when solving the VRPTW by column generation, which corresponds to an elementary shortest path problem with resource constraints. Two different heuristic strategies are suggested, the one that eliminates non-promising arcs in the set \( \mathcal{\mathcal{A}} \), and the other a modified dominance rule which is applied to eliminate a large number of labels.

In a first attempt to obtain effective heuristics for the TSPFlexTW, we implemented such heuristic strategies in the dynamic programming based on the label-correcting LP and the label-correcting EL without success. The arc elimination failed to significantly speed up the algorithms. On the other hand, the modified dominance rule often was too aggressive and fathomed all labels that could yield TSPFlexTW feasible solutions.

We then adopted an alternative approach by considering the dynamic programming for the traveling salesman problem with precedence constraints (TSPPC) presented by Balas and Simonetti [4]. If such constraints are absent, then the dynamic programming can be used as a heuristic that finds in linear time a local optimum over an exponential size neighbourhood. In the following sections, we first describe the approach for the TSPTW of Balas and Simonetti [4], and then extend it by including the label-correcting LP and the label-correcting EL for the TSPFlexTW.

4.1 General heuristic dynamic programming for the TSPTW

The TSPTW can be addressed with a dynamic programming algorithm originally proposed for the TSPPC. Let \( \sigma \) denote an initial ordering of the nodes \( i \in \left\{ {0, \ldots ,n} \right\} \) with depot \( 0 \) in the first position, and assume that \( k \) is a positive integer. The TSPPC aims to find the minimum cost permutation \( \pi \) of \( \sigma \) satisfying \( \pi \left( 0 \right) = 1 \) and \( \pi \left( i \right) < \pi \left( j \right) \) for all pairs of nodes \( \left( {i,j} \right) \) such that \( \sigma \left( i \right) + k \le \sigma \left( j \right) \), where notations \( \sigma \left( \text{ } \right) \) and \( \pi \left( \text{ } \right) \) are used to describe the positions of nodes in \( \sigma \) and \( \pi \), respectively. This means that aside from depot \( 0 \), which is fixed in the first position, any customer \( i \) that precedes a customer \( j \) by \( k \) or more positions in the initial ordering \( \sigma \) must also precedes customer \( j \) in any feasible permutation \( \pi \). Figure 4 illustrates examples of feasible and infeasible permutations for \( \sigma = \left( {0,5,6,7,8,9,2,3,1,4} \right) \) and \( k = 4 \). A dynamic programming formulation has been proposed in [3] for the TSPPC, which finds the optimal solution in linear time with \( n + 1 \) (i.e., depot \( 0 \) plus \( n \) customers), but exponential time with the parameter \( k \).

Fig. 4
figure 4

Examples of TSPPC feasible and infeasible permutations

The above-mentioned property still holds when \( k \) is replaced by a parameter that depends on the node \( i \), \( k\left( i \right),i \in \left\{ {0, \ldots ,n} \right\} \). This fact enabled Balas and Simonetti [4] to translate the customer time windows into precedence requirements and suggest an efficient implementation of the Balas [3] dynamic programming for the TSPTW. Since the complexity of this algorithm is exponential with \( k\left( i \right),i \in \left\{ {0, \ldots ,n} \right\} \), then computer space and/or time may restrict the values of these parameters in order to obtain a viable heuristic dynamic programming in linear time with \( n + 1 \).

The heuristic algorithm comprises three main stages. In the first stage, the TSPTW is reformulated as a TSPPC. In the second stage, an auxiliary graph \( G^{*} \) that bounds the complexity of the algorithm is constructed. In the third stage, the problem is solved by determining a shortest path in \( G^{*} \). These stages are summarized as follows.

4.1.1 TSPTW–TSPPC reformulation

The heuristic dynamic programming starts by generating an initial ordering \( \sigma \), in which \( \sigma \left( 0 \right) = 1 \) and the customers are sorted in non-decreasing order of their time window midpoints \( {{\left( {e_{i} + l_{i} } \right)} \mathord{\left/ {\vphantom {{\left( {e_{i} + l_{i} } \right)} 2}} \right. \kern-0pt} 2},\text{ }i \in \mathcal{\mathcal{C}} \).

The ordering \( \sigma \) now can be used to derive precedence constraints. Node \( i \) placed in position \( \sigma \left( i \right) \) has to precede node \( j \) whose position is \( \sigma \left( j \right) \) in any feasible route such that \( e_{j} + W_{j} + t_{ji} > l_{i} \). Hence, if \( j_{0} \) is the smallest index such that \( e_{j} + W_{j} + t_{ji} > l_{i} \) for all \( j \) with \( \sigma \left( j \right) \ge j_{0} \), then \( k\left( i \right) \) is defined as \( k\left( i \right) = \hbox{min} \left\{ {j_{0} - \sigma \left( i \right)\text{, }K} \right\} \), where \( K \) is an upper bound on \( k\left( i \right) \) used to avoid the assignment of impractical values to this parameter. Such a procedure is applied to every node in order to generate a TSPPC from the underlying TSPTW.

4.1.2 The auxiliary graph G*

The core of the heuristic algorithm here described consists of the construction of the auxiliary graph \( G^{*} = \left( {\mathcal{\mathcal{N}}^{*} ,\mathcal{\mathcal{A}}^{*} } \right) \) associated with the TSPPC earlier obtained. This graph contains a set \( \varGamma^{*} \) of \( n + 2 \) node layers, one layer for each position in the route, with the depot appearing at the beginning and at the end of the route, both as starting node \( o = 0 \), the only node in layer \( \varGamma_{1}^{*} \), and the destination node \( d = n + 1 \), the only node in layer \( \varGamma_{{n + 2}}^{*} \). There is a one-to-one correspondence between TSPPC feasible routes and \( o - d \) paths in \( G^{*} \).

Let \( r \) be a counter assigning unique numbers to the nodes of a layer \( \varGamma_{i}^{*} \text{, }i \ne 1,i \ne n + 2 \). Each node \( \left( {i,j,r} \right) \in \varGamma_{i}^{*} \) represents a family of permutations \( \pi \) with depot \( 0 \) placed in position 1 and customer \( j \) placed in position \( i \) (i.e., \( \pi \left( 0 \right) = 1 \) and \( \pi \left( j \right) = i \)). Customer \( j \) must be one of those ranked between positions \( \hbox{max} \{ 2,i - K + 1\} \) and \( \hbox{min} \{ n + 1,\text{ }i + K - 1\} \) in the initial ordering \( \sigma \). Moreover, there is a pair of sets

$$ S_{{\left( {i,j,r} \right)}}^{ - } = \left\{ {h:h \in \mathcal{\mathcal{C}},\sigma \left( h \right) \ge i,\text{ }\pi \left( h \right) < i} \right\} $$

and

$$ S_{{\left( {i,j,r} \right)}}^{ + } = \left\{ {h:h \in \mathcal{\mathcal{C}},\sigma \left( h \right) < i,\text{ }\pi \left( h \right) \ge i} \right\} $$

associated with each node \( \left( {i,j,r} \right) \in \varGamma_{i}^{*} \). The former consists of customers ranked \( i \) or higher in \( \sigma \) but visited in a position lower than \( i \) in the permutations \( \pi \) considered by this node, and the latter comprises customers ranked below \( i \) in \( \sigma \) but visited in a position greater than or equal to \( i \) in such permutations. The pair \( \left( {S_{{\left( {i,j,r} \right)}}^{ - } ,S_{{\left( {i,j,r} \right)}}^{ + } } \right) \) enables the identification of all customers visited before \( j \) in a given node \( \left( {i,j,r} \right) \) and provides an efficient routine to ascertain the compatibility between nodes of consecutive layers. The arcs of \( G^{*} \) only join nodes of consecutive layers with verified compatibility, i.e., nodes that can be part of the same feasible solution. Figure 5 depicts the auxiliary graph \( G^{*} \) corresponding to a TSPPC in which \( \sigma \left( i \right) = \left( {0,1,2,3,4,5,6,7} \right) \) and \( k\left( i \right) = 3,\;i \in \left\{ {0, \ldots ,7} \right\} \). To elucidate the structure of the layers, nodes of two feasible \( o - d \) paths are highlighted as well as their respective components.

Fig. 5
figure 5

Example of graph G*

Because each layer \( \varGamma_{i}^{*} \text{, }i \ne 1,i \ne n + 2 \) shares a common structure of its node set and of the arc set \( \mathcal{\mathcal{A}}^{*} \cap \left( {\varGamma_{i}^{*} ,\varGamma_{i + 1}^{*} } \right) \), a copy is generated in advance and then retrieved for all \( 1 < \text{ }i < n + 2 \) during the construction of \( G^{*} \). The time complexity of finding the shortest path \( o - d \) in this auxiliary graph is strongly related to the number of arcs \( \left| {\mathcal{\mathcal{A}}^{*} } \right| \). Since \( \left| {\varGamma_{i}^{*} } \right| \le \left( {K + 1} \right)2^{K - 2} ,\;1 < \text{ }i < n + 2, \) and no node of \( G^{*} \) has an indegree greater than \( K \), the bound on the number of arcs of the auxiliary graph is given by \( \left| {\mathcal{\mathcal{A}}^{*} } \right| \le K\left( {K + 1} \right)2^{K - 2} \left( {n + 1} \right) \). Despite the inherent intricacy of building \( G^{*} \), detailed guidelines are presented in [3] and [4].

4.1.3 Shortest path computation

Once \( G^{*} \) has been built, the TSPTW solution is obtained by solving a shortest path problem with time windows (SPPTW) defined on this auxiliary graph. The referred SPPTW can be computed by a simplified label-correcting method whose components are detailed next.

The algorithm for the SPPTW consists of a mono-directional dynamic programming in which a label is denoted by \( L_{i,r}^{P} = \left( {s,i,r} \right) \) with accumulated traveling cost \( c\left( {s,i,r} \right) \). Indexes \( i \) and \( r \) assign such a label to the rth node of layer \( \varGamma_{i}^{*} \), i.e., \( \left( {i,j,r} \right) \in \varGamma_{i}^{*} \). On the other hand, resource \( s \) stands for the earliest time when service can start at customer \( j \) associated with node \( \left( {i,j,r} \right) \in \varGamma_{i}^{*} \).

The initial label \( \left( {0,1,1} \right) \) has cost \( c\left( {0,1,1} \right) = 0 \) and is associated with the node \( \left( {1,0,1} \right) \in \varGamma_{1}^{*} \). The forward extension of a label \( \left( {s,i,r} \right) \) along arc \( \mathcal{\mathcal{A}}^{*} \cap \left( {\left( {i,j,r} \right) \in \varGamma_{i}^{*} ,\left( {i + 1,h,\ell } \right) \in \varGamma_{i + 1}^{*} } \right) \) towards node \( \left( {i + 1,h,\ell } \right) \) produces the new label \( \left( {s^{\prime},i + 1,\ell } \right) \) with cost \( c\left( {s^{\prime},i + 1,\ell } \right) \) by means of the extensions

$$ s^{\prime} = \hbox{max} \left\{ {s + W_{j} + t_{jh} ,\text{ }e_{h} } \right\} $$
(23)
$$ c\left( {s^{\prime},i + 1,\ell } \right) = c\left( {s,i,r} \right) + c_{jh} $$
(24)

which are feasible only if

$$ s^{\prime} \le l_{h} $$
(25)

Regarding the dominance procedure, a label \( \left( {s^{1} ,i,r} \right) \) dominates another label \( \left( {s^{2} ,i,r} \right) \) if

$$ s^{1} \le s^{2} $$
(26)
$$ c\left( {s^{1} ,i,r} \right) \le c\left( {s^{2} ,i,r} \right) $$
(27)

and at least one of these inequalities is strict. Additionally, a maximum number of \( q \) non-dominated labels is imposed on any node of \( G^{*} \) in order to avoid an excessive computation time. For each node \( \left( {i,j,r} \right) \in G^{*} \), only the best \( q \) labels \( \left( {L_{i,r}^{1} , \ldots ,L_{i,r}^{q} } \right) \) with respect to the accumulated traveling cost are stored.

Figure 6 shows a pseudocode of the heuristic dynamic programming. To initialize the algorithm, we set the values of the parameters \( K \) and \( q \) (line 1). We then reformulate the TSPTW as a TSPPC (lines 2 and 3) and construct the auxiliary graph \( G^{*} \) (line 4). Afterwards, a simplified labeling procedure solves the TSPTW by computing a SPPTW on \( G^{*} \) (lines 5–18). This procedure is composed of a label extension procedure (lines 8–15) and a label dominance checking (line 12). Herein, \( NodeL \) denotes the set of labels associated with a node \( \left( {i,j,r} \right) \in G^{*} \), \( Extend\left( {\left( {s,i,r} \right),\left( {i + 1,h,\ell } \right)} \right) \) and \( Dominance\left( {NodL\left( {i + 1,h,\ell } \right),\left( {s^{\prime},i + 1,\ell } \right),q} \right) \) represent the label extension and dominance procedures, respectively. At the end of the algorithm, the TSPTW route corresponding to the minimum cost label stored in \( NodL\left( {n + 2,\text{ }n + 1,\text{ }1} \right) \) is outputted.

Fig. 6
figure 6

Heuristic dynamic programming algorithm

4.2 Adaptations to the TSPFlexTW context

This section presents two variants of a dynamic programming algorithm for the TSPFlexTW, which are adapted from the Balas and Simonetti [4] method. The first one, called heuristic LP, employs the label-correcting LP technique and the second one, called heuristic EL, makes use of the label-correcting EL technique.

Both approaches start by reformulating the TSPFlexTW as a TSPPC. For this purpose, the hard time windows are replaced by the flexible time windows in the required calculations. The heuristic variants then proceed to construct the auxiliary graph \( G^{*} \). Once these steps have been carried out, the label-correcting LP and EL are applied.

As far as the label-correcting LP is concerned, the additional cost-related resource \( pen\left( {s,i,r} \right) \) is assigned to the label \( \left( {s,i,r} \right) \). Its forward propagation employs the same resource extension functions (23) and (24) as well as the same feasibility condition (25) except that the flexible time windows replace the hard ones in the calculations. Regarding label dominance, if the inequality (26) is satisfied for a pair of labels \( \left( {s^{1} ,i,r} \right) \) and \( \left( {s^{2} ,i,r} \right) \), their penalty costs are obtained by means of the linear program (12)–(17). Thus, label \( \left( {s^{2} ,i,r} \right) \) is fathomed if the inequality

$$ c\left( {s^{1} ,i,r} \right) + pen\left( {s^{1} ,i,r} \right) \le c\left( {s^{2} ,i,r} \right) + pen\left( {s^{2} ,i,r} \right) $$
(28)

also holds and at least one of these inequalities is strict.

On the other hand, the label-correcting EL is based on the extended label definition \( \left( {s,i,r,early,late} \right) \) with accumulated traveling cost \( c\left( {s,i,r,early,late} \right) \) and penalty cost \( pen\left( {s,i,r,early,late} \right) \). In the course of label extension, the penalty costs are temporarily disregarded and the resources of accumulated earliness and lateness are recursively updated along the paths by means of the extension functions (19) and (20). The label dominance checking is determined by rules (21), (22), (26) and (27).

In both heuristic variants, only the best \( q \) labels \( \left( {L_{i,r}^{1} , \ldots ,L_{i,r}^{q} } \right) \) with respect to the accumulated traveling cost are stored for a given node \( \left( {i,j,r} \right) \in G^{*} \). Therefore, at the end of the extension procedure a set of at most \( q \) non-dominated labels \( L_{{n + 2,\text{ }1}}^{P} ,\text{ }P = 1, \ldots ,q \) associated with the node \( \left( {n + 2,\text{ }n + 1,\text{ }1} \right) \in G^{*} \) is obtained. Each non-dominated label stands for a TSPFlexTW feasible route and its overall cost is given by \( c\left( {L_{{n + 2,\text{ }1}}^{P} } \right) + pen\left( {L_{{n + 2,\text{ }1}}^{P} } \right) \), where \( pen\left( {L_{{n + 2,\text{ }1}}^{P} } \right) \) is determined by the solution of LP model (12)–(17). In both heuristic variants, the feasible route with the minimum overall cost \( \min_{P = 1, \ldots ,q} \left\{ {c\left( {L_{{n + 2,\text{ }1}}^{P} } \right) + pen\left( {L_{{n + 2,\text{ }1}}^{P} } \right)} \right\} \) corresponds to the best solution found.

The two variants of the heuristic dynamic programming algorithm (HDPA) are denoted according to the label-correcting procedures LP and EL, i.e., HDPALP and HDPAEL.

5 Computational experiments

This section describes the computational results of the proposed algorithms, which were coded in C# and run on an Intel® Core™ i7-4790 CPU at 3,6 GHz with 16 GB RAM. For all labeling methods, the Gurobi v.6.05 software was used to solve the LP model (12)–(17).

The algorithms were tested on two sets of instances. The first one consists of 135 symmetric Euclidean instances described in [14], with number of customers between 20 and 200 and time window widths ranging from 20 to 100 time units. They are grouped into 27 classes of five instances with an equal number of customers and time window widths. The name of each instance indicates the number of customers \( \left| \mathcal{\mathcal{C}} \right| \), the width of the time windows and the instance number in the class, e.g., n20w20.001 is the first instance of the class of five instances with 20 customers and time windows of 20 units. Since no travel time and travel cost matrices are available, these values are calculated by truncated integer Euclidean distances and then adjusted to maintain the validity of the triangular inequality by setting \( t_{ij} = t_{ih} + t_{hj} ,\text{ }{\text{if}}\text{ }t_{ih} + t_{hj} < t_{ij} \) and \( c_{ij} = c_{ih} + c_{hj} ,\text{ }{\text{if}}\text{ }c_{ih} + c_{hj} < c_{ij} \) for each triple \( \left( {i,j,h} \right) \in \mathcal{\mathcal{N}} \) [6].

The second set is adapted from the real-world asymmetric traveling salesman (ATSP) instances proposed in [17, 18], which are part of the TSPLIB [28]. These instances consist of 17 ATSP problems coming from pharmaceutical product delivery in Bologna. In the same manner as Dumas et al. [14], we generated time windows for the ATSP customers. For each problem, we first constructed a second nearest neighbour ATSP tour to determine the service start time at each customer. Then, four instances were obtained by generating time windows around such service start times. Each side of a time window consists of a random number drawn from a uniform distribution in the interval \( \left[ {0,width/2} \right] \), where \( width = 50,\text{ }100,\text{ }200\;{\text{or}}\;400 \). The service time is zero for all nodes. This procedure generated 68 instances grouped in 17 classes, with 33–170 customers per instance, and time windows between 50 and 400 units. The instance names specify the number of customers \( \left| \mathcal{\mathcal{C}} \right| \) and the width of its time windows, e.g., ftv33w50 contains 33 customers and presents time windows of 50 units.

The rest of this section is organized as follows. Section 5.1 assesses the performance of the variants EDPALP and EDPAEL of the exact dynamic programming algorithm, while the two variants HDPALP and HDPAEL of the heuristic dynamic programming algorithm are analyzed in Sect. 5.2. Finally, Sect. 5.3 compares the best solutions found for the TSPFlexTW symmetric instances with the corresponding TSPTW exact results to highlight the benefits gained by flexible time windows.

5.1 Results of the exact variants EDPALP and EDPAEL

The symmetric and asymmetric instances were adapted to the TSPFlexTW context by setting costs \( \left( {\delta_{\alpha } ,\delta_{\beta } } \right) \) to \( \left( {0.50,1.00} \right) \) and fractions \( \left( {f_{i}^{e} ,f_{i}^{l} } \right) \) to \( \left( {0.10,0.10} \right) \) for all \( i \in \mathcal{\mathcal{N}} \). Such parameter settings correspond to the medium flexibility considered by Taş et al. [34] in their main experiments. Then, EDPALP and EDPAEL were applied with a computing time limit of 3600 s.

For each class of symmetric instances, Table 1 presents the results of exact methods by using the following notation: Solved (number of instances for which the optimal solution was found within the time limit), CPU (mean time in seconds), \( \left| {ForL} \right| \) (mean number of non-dominated forward labels), \( \left| {BackL} \right| \) (mean number of non-dominated backward labels) and ‘−’ (no optimal solution was found within the time limit). EDPALP solved 92 (68%) instances with up to 200 customers, while EDPAEL solved 65 (48%) instances with up to 100 customers. Regarding the 65 instances for which both methods found the optimal solution, the mean computing time of EDPALP is significantly lower than that of EDPAEL. Furthermore, EDPALP consistently generated less non-dominated labels than EDPAEL. These results suggest that the label-correcting LP procedure coupled with EDPALP yields a better dominance procedure, which fathoms more dominated labels and speed up the exact bi-directional labeling algorithm.

Table 1 EDPALP and EDPAEL results for symmetric instances

We remark that the exact dynamic programming algorithms are sensitive to large numbers of customers as well as to wide time windows. For example, both EDPALP and EDPAEL solved most of the instances with time windows of 80 units and involving up to 40 customers. However, none of the algorithms coped with time windows with more than 40 units for instances with 100 customers.

Analogous information is shown in Table 2 for each class of asymmetric instances, which provides further evidence of the stronger dominance rules of EDPALP when compared to EDPAEL. EDPALP solved 59 (86%) instances whereas EDPAEL solved 54 (79%). In terms of the 54 instances solved by both variants, EDPALP presented lower CPU times and total of forward and backward labels. Taken together, the results obtained for symmetric and asymmetric instances underline the advantage of EDPALP with reference to exact dynamic programming. Detailed results, for each instance, are provided in Section A of the Supplementary Electronic Material.

Table 2 EDPALP and EDPAEL results for asymmetric instances

Mono-directional counterparts of the bi-directional algorithms EDPALP and EDPAEL were considered in a second experiment in order to investigate the benefits stemming from the bi-directional bounded search framework. In such counterparts, labels are mono-directionally extended by just considering the forward propagation direction. They were applied to solve the same 203 symmetric and asymmetric instances of the TSPFlexTW within an identical processing time limit. The obtained results were compared to those of the bi-directional algorithms as shown in Table 3, where columns ‘#Labels’ report the total number of non-dominated labels in memory at the end of execution of each algorithm. The bi-directional version of EDPALP was able to solve 8 more instances, reduced the mean computing time by around 33% and the mean number of labels by around 20% when compared to the mono-directional one. With reference to EDPAEL, the bi-directional version solved 8 more instances, reduced the mean computing time by around 64% and the mean number of labels by around 42% when compared to the mono-directional one. These results are consistent with those of Righini and Salani [29] and subsequent studies, which have shown that bi-directional labeling algorithms often outperform their mono-directional counterparts.

Table 3 Comparison between mono-directional and bi-directional versions of EDPALP and EDPAEL

Since the bi-directional version of EDPALP achieved the best overall performance among the exact variants, it was tested against an alternate optimal approach, i.e., a state-of-art MIP solver. For this purpose, we used Gurobi v.6.05 branch-and-cut to directly solve model (1)–(8) with default parameters except limiting the total processing time to 3600 s. Table 4 displays such a comparison for symmetric and asymmetric instance sets. The interested reader is referred to Section B of the Supplementary Electronic Material for the detailed results obtained by the MIP solver.

Table 4 Comparison between EDPALP and Gurobi v.6.05

Gurobi v.6.05 obtained optimal solutions for 92 (68%) symmetric instances with up to 150 customers. Note that EDPALP solved the same number of instances, and successfully coped with up to 200 customers. Regarding the 81 symmetric instances that could be solved by both methods, Gurobi v.6.05 required a mean CPU time around 30% longer than EDPALP. Turning now to asymmetric instances, Gurobi v.6.05 solved 67 (98%) out of the 68 instances to optimality and consumed lower computing times for those 59 that also could be solved by EDPALP. Overall, these results suggest an advantage of EDPALP for the symmetric set and a disadvantage for the asymmetric one. Such a disagreement may be explained by the absence of the triangular inequality property for the asymmetric instances. As stated by Li [25], the exact dynamic programming algorithm is significantly faster for situations in which the triangular inequality holds since the strengthened dominance condition mentioned in Sect. 3.1.2 can be applied.

5.2 Results of the heuristic variants HDPALP and HDPAEL

Based on the prior experimentation carried out in [4], we tested the sets of values \( \left\{ {8,10,12,14} \right\} \) and \( \left\{ {10,15,20} \right\} \) for the parameters \( K \) and \( q \), respectively. Within a processing time limit of 3600 s, the best values \( \left( {K,q} \right) \) found were \( \left( {8,20} \right) \) for HDPALP and \( \left( {12,20} \right) \) for HDPAEL. Note that, if the largest required \( k\left( i \right) \), i.e., \( \max_{{i \in \left\{ {0, \ldots ,n} \right\}}} \left\{ {j_{0} - \sigma \left( i \right)} \right\} \), does not exceed its imposed upper bound \( K \) and if no more than \( q \) non-dominated labels are generated for each node \( \left( {i,j,r} \right) \in G^{*} \), the heuristic dynamic programming algorithms become exact. However, the investigated sets entail instances with values of \( \max_{{i \in \left\{ {0, \ldots ,n} \right\}}} \left\{ {j_{0} - \sigma \left( i \right)} \right\} \) beyond 20, which are too large for practical use [5].

After the determination of these parameters, both heuristic variants were applied to the TSPFlexTW instances by keeping the same flexible time window and cost settings employed in the previous section. Table 5 summarizes the results obtained for symmetric and asymmetric instances. For each set, we report the number of instances for which a feasible solution was found (Solved), the mean of best objective costs found by the heuristic method in question (Obj.), its gap relative to the available optimal solutions (Opt.), i.e., \( 100\left( {{\text{Obj}} - {\text{Opt}}} \right)/{\text{Opt}} \), and CPU (mean time in seconds). Detailed results, for each instance, can be found in Section C of the Supplementary Electronic Material.

Table 5 HDPALP and HDPAEL results

This table is revealing in several ways. As far as symmetric problems are concerned, HDPALP solved 120 (88%) instances, presented a mean gap of 0.73% relative to the available optimal solutions and required a mean computational time of approximately 223 s. On the other hand, HDPAEL solved 130 (96%) instances, achieved a mean gap of 0.23% and expended a mean computational time of approximately 93 s. Moreover, for the 119 instances solved by both methods, the reported means show that HDPAEL provided better quality solutions and consumed lower CPU times. With reference to asymmetric problems, both algorithms solved all instances with gaps smaller than 1.00% in regard to the available optimal solutions. HDPAEL was a little faster, while HDPALP obtained a smaller mean gap.

From these results, the effectiveness of the heuristic dynamic programming variants can be analyzed. Within the given time limit, HDPALP solved 188 (92%) instances and HDPAEL solved 198 (97%) out of the 203 instances, whereas their exact bi-directional counterparts EDPALP and EDPAEL solved 151 (74%) and 119 (58%) instances, respectively. Such findings are not surprising since the number of labels may grow exponentially with the number of arcs in the forward and backward paths regarding exact dynamic programming. Conversely, the complexity of the heuristic variants is bounded by \( \left| {\mathcal{\mathcal{A}}^{*} } \right| \le K\left( {K + 1} \right)2^{K - 2} \left( {n + 1} \right) \) [3], with parameters \( K \) and \( q \) that control the combinatorial explosion of labels propagated along arcs in \( \mathcal{\mathcal{A}}^{*} \). Therefore, with an appropriated parameter setting \( \left( {K,q} \right) \), the heuristic algorithms can cope with instances which are intractable for their exact counterparts.

To get additional support for the evaluation of the heuristics, we applied non-parametric Wilcoxon signed rank test [21]. This test compares the expected values \( E_{HDPALP} \left[ {OC} \right] \) and \( E_{HDPAEL} \left[ {OC} \right] \), where \( OC \) is the random variable that represents the objective cost obtained by the variants HDPALP and HDPAEL for a set of instances. Ranks are assigned to each difference and we test the null hypothesis \( E_{HDPALP} \left[ {OC} \right] = E_{HDPAEL} \left[ {OC} \right] \) with the alternative hypothesis \( E_{HDPALP} \left[ {OC} \right] < E_{HDPAEL} \left[ {OC} \right] \) or \( E_{HDPALP} \left[ {OC} \right] > E_{HDPAEL} \left[ {OC} \right] \) at a significance level \( \alpha = 0.05 \). For the asymmetric problems, the null hypothesis \( E_{HDPALP} \left[ {OC} \right] = E_{HDPAEL} \left[ {OC} \right] \) was accepted. However, the Wilcoxon test showed that HDPAEL performance was superior to that of HDPALP for symmetric instances. Since the numbers of non-dominated labels stored by both heuristic variants become similar under the restrictions imposed by parameter \( q \), this advantage might be related to the HDPAEL faster dominance procedure, which does not require time-consuming LP calculations.

5.3 TSPFlexTW versus TSPTW

We also analyzed the traveling cost savings resulting from flexible time windows. Optimal solution costs reported in [14] for the TSPTW symmetric instances were compared to the corresponding solution costs obtained for the TSPFlexTW. Three tests were carried out by considering different flexible time window scenarios: (1) we kept the medium flexibility settings \( \left( {f_{i}^{e} ,f_{i}^{l} } \right) = \left( {0.10,0.10} \right) \) for all \( i \in \mathcal{\mathcal{N}} \) and \( \left( {\delta_{\alpha } ,\delta_{\beta } } \right) = \left( {0.50,1.00} \right) \); (2) we increased the values of flexibility fractions to \( \left( {f_{i}^{e} ,f_{i}^{l} } \right) = \left( {0.25,0.25} \right) \) for all \( i \in \mathcal{\mathcal{N}} \) but kept the values of cost coefficients \( \left( {\delta_{\alpha } ,\delta_{\beta } } \right) = \left( {0.50,1.00} \right) \); and (3) we kept the values of flexibility fractions \( \left( {f_{i}^{e} ,f_{i}^{l} } \right) = \left( {0.10,0.10} \right) \) for all \( i \in \mathcal{\mathcal{N}} \) but applied higher deviation costs \( \left( {\delta_{\alpha } ,\delta_{\beta } } \right) = \left( {2.00,4.00} \right) \). The higher values of parameters considered in scenarios (2) and (3) were also extracted from Taş et al. [34]. For all scenarios, we applied EDPALP to solve the TSPFlexTW instances with a computing time limit of 3600 s and, whenever it fails, HDPAEL is employed with \( K = 12 \) and \( q = 20 \).

Table 6 summarizes the above-mentioned tests which are fully described in Section D of the Supplementary Electronic Material. Column 1 shows the names of instance classes. The next five columns refer to results obtained when solving TSPFlexTW scenario (1). Column 2 reports the number of instances for which a feasible solution was found (Solved). Columns 3–6 present the mean of TSPFlexTW best objective costs obtained by EDPALP or HDPAEL which includes traveling and penalty costs (Obj.), its associated mean traveling cost (Trav.), the mean percentage difference \( \Delta_{1} \% \) between TSPFlexTW objective costs and TSPTW optimal costs (Opt.), i.e.,\( 100\left( {{\text{Opt}} - {\text{Obj}}} \right)/{\text{Obj}} \), and the mean percentage difference \( \Delta_{2} \% \) between TSPFlexTW traveling costs and TSPTW optimal costs, i.e., \( 100\left( {{\text{Opt}} - {\text{Trav}}} \right)/{\text{Trav}} \). Analogous information is shown in Columns 7–11 and in Columns 12–16 for scenarios (2) and (3), respectively. The bottom of this table provides the summary of the percentage differences obtained for the three scenarios investigated. We remark that the TSPFlexTW is a relaxation of the TSPTW, thus its optimal solution at most would lead to the same objective cost as the TSPTW optimal cost. As a consequence, the percentage differences defined above are non-negative whenever the TSPFlexTW is solved to optimality, i.e., \( \Delta_{1} \% ,\Delta_{2} \% \ge 0 \). Nevertheless, since EDPALP fails in some instances, which are heuristically solved by HDPAEL, negative percentage differences are possible, i.e., \( \Delta_{1} \% ,\Delta_{2} \% < 0 \).

Table 6 Comparison of the optimal TSPTW solutions with the best solutions found for the TSPFlexTW

From the examination of Table 6, it is possible to attest the advantages of flexible time windows. With reference to scenario (1), percentage differences ranging from − 8.89 to 12.00% and from − 6.59 to 13.42% were obtained for \( \Delta_{1} \% \) and \( \Delta_{2} \% \), respectively. The flexible time windows leaded to lower overall costs for 73 (54%) instances and lower traveling costs for 82 (61%) instances than the hard time windows. Only few heuristic solutions found by HDPAEL presented higher overall costs (15%) or higher travelling costs (6%) than the corresponding TSPTW. The TSPFlexTW algorithm variants failed in 4 (3%) instances. As far as scenario (2) is concerned, the higher flexibility fractions leaded to even better comparative figures resulting in overall cost savings for 77 (57%) instances and traveling cost savings for 104 (77%) instances when compared to TSPTW. The values of \( \Delta_{1} \% \) range between − 9.12 and 17.22%, whereas the values of \( \Delta_{2} \% \) range between − 2.00 and 23.89%. As a counterpart of such an increased flexibility, the TSPFlexTW algorithms failed in 8 (6%) instances because taking wider flexible time windows into account implies a higher computational effort. Finally, as expected, scenario (3) shows that higher penalty coefficients can possibly eliminate the benefits of flexible time windows in terms of overall cost reduction since higher penalties incur. However, traveling cost savings were preserved for a large number of instances (44%). The values of \( \Delta_{1} \% \) vary from − 17.53 to 9.38%, while the values of \( \Delta_{2} \% \) are between − 6.59 and 13.08%.

We note that the greatest cost savings were obtained for the instances with \( 20 \le \left| \mathcal{\mathcal{C}} \right| \le 60 \), since most of them could be solved to optimality by EDPALP. Furthermore, wide time windows increase the TSPFlexTW instance difficult for a fixed \( \left| \mathcal{\mathcal{C}} \right| \) and may lead to reduced and/or negative cost savings. For example, the solutions obtained on the class n80w40 presented mean percentage differences \( \Delta_{1} \% = 0.93/\Delta_{2} \% = 1.60 \) for scenario (1), \( \Delta_{1} \% = 1.41/\Delta_{2} \% = 4.67 \) for scenario (2), and \( \Delta_{1} \% = 0.12/\Delta_{2} \% = 0.18 \) for scenario (3). On the other hand, the mean outcomes found on the class n80w80, whose time windows are 40 units larger, were \( \Delta_{1} \% = - {\kern 1pt} 2.07/\Delta_{2} \% = - {\kern 1pt} 0.30 \), \( \Delta_{1} \% = - {\kern 1pt} 3.94/\Delta_{2} \% = 2.98 \) and \( \Delta_{1} \% = - {\kern 1pt} 6.87/\Delta_{2} \% = - {\kern 1pt} 0.30 \) for scenarios (1), (2) and (3), respectively.

6 Conclusions

We have considered flexible time windows in the traveling salesman problem (TSPFlexTW) which are an expansion of the hard time windows (TSPTW). The service of a customer can be started before or after the hard time window at a penalty cost added to the objective function. This leads to a problem that requires the determination of a sequence of customers and their respective service start times in order to minimize the sum of traveling costs with earliness and lateness costs. Flexible time windows result in a greater feasible space, which may lead to cost savings.

We have proposed two alternative extensions of exact and heuristic dynamic programming algorithms originally proposed for the TSPTW to solve the TSPFlexTW. Such algorithms make use of new label-correcting techniques, called label-correcting LP and label-correcting EL, and they were tested on a variety of symmetric and asymmetric instances from the literature.

The obtained results indicate that label-correcting LP leads to more effective exact methods when compared to label-correcting EL. Moreover, bi-directional versions of these approaches were superior to their mono-directional counterparts. The most efficient exact method outperformed a state-of-art MIP solver for instances where the triangular inequality holds and stronger dominance rules can be applied. We also observed that exact algorithms are sensitive to large numbers of customers as well as to wide time windows.

Regarding heuristic methods, label-correcting EL outperformed label-correcting LP for symmetric instances, while both presented a similar performance for asymmetric instances. The most efficient heuristic quickly solved 198 out of the 203 instances with up to 200 customers and time windows of up to 400 units by yielding a mean gap smaller than 0.20% in regard to the available optimal solutions.

The benefits gained by TSPFlexTW compared to the TSPTW have been also assessed. The flexible time windows leaded to overall cost savings of up to 17.22% and traveling cost savings of up to 23.89% depending on the flexibility settings employed. Wider flexible time windows provided greater cost savings, whereas higher penalty coefficients reduced such gains. Only few heuristic solutions found for harder instances presented higher overall costs or higher travelling costs than the corresponding TSPTW.

The algorithmic techniques suggested here may be applied to other types of routing and scheduling problems with flexible time window constraints. Furthermore, the proposed dynamic programming algorithms can be embedded in optimization methods under the cluster-first, route-second approach for the VRPFlexTW.