1 Introduction

The vehicle routing problem with time windows (VRPTW) is an extension of the classic vehicle routing problem (VRP) and is defined as follows: given a set of depots, a homogeneous fleet of vehicles and a set of known demand locations, find a set of closed routes, originating and ending at the depots, that service all demands and minimize the travel cost; in addition, the service at each demand must start within an associated time window. All problem parameters, such as demand locations and time windows, are assumed to be known with certainty. Time windows constraints are indeed common in many applications, including bank deliveries, postal deliveries, grocery distribution, dial-a-ride service, bus routing, and repairmen scheduling. The VRPTW has generated significant research interest over the years (see, for example, [15]), resulting in major contributions in the area of combinatorial optimization. However, paralleling the observations in [6] about the VRP, the VRPTW, as a model for routing problems, is static and deterministic, whereas many routing problems in practice are inherently dynamic and stochastic. In fact, requests for service often arrive sequentially in time, and these arrival epochs may be stochastic; moreover, locations of future demands may be unknown or known only probabilistically.

Optimization problems with these characteristics arise frequently in sensor network settings. As an example, imagine a sensor network composed of a large number of nodes is deployed over a vast field, for example to study the behaviors of elusive animals, or to detect suspicious activity in a protected region as, for example, home burglaries, or insurgents placing improvised explosive devices (IEDs). Typically, network nodes contain inexpensive sensors, such as motion detectors, which are susceptible to false alarms. In addition to the sensor network, suppose a team of unmanned aerial vehicles (UAVs) is also available, which are equipped with more sophisticated on-board sensors. Each time a sensor detects an event, a UAV is sent to the location to investigate the cause of the alarm, i.e., to verify the presence of the animal or the intruders. Then, the UAV mission control is a continuous process of collecting alarms, planning routes and sending UAVs; moreover, timeliness is paramount: should the UAV take too long to reach the location of the event, its cause may have already left the premises, and be hard to track. Another possible scenario is a sensor network in which sensors are triggered by external events, and then remain active to upload data to a UAV for a certain known amount of time. After this time expires, sensors return to a power-saving “sleep” mode.

1.1 The dynamic and stochastic VRPTW

In this paper, we introduce and study the following dynamic and stochastic version of the VRPTW. We assume that demands for service arrive according to a Poisson process with rate λ to a bounded Euclidean service region Q with area |Q|. Upon arrival, demands assume an independent and uniformly distributed location in Q. Each demand has an associated deterministic time window [r i , d i ], where d i  ≥ r i , r i , d i  ∈ ℝ; the release time r i is the demand’s arrival time, while the deadline is r i  + T, where T ∈ ℝ +  is a constant independent of the demand. Notice the assumption of equal width for the time windows (although release times and deadlines clearly differ from demand to demand). Demands are serviced by a fleet of vehicles that travel at constant velocity v. We assume that the on-site service times are equal to zero (i.e., we assume that the on-site service times are negligible compared to the travel times). Given ε ∈ (0, 1], the objective is to find routing policies that, with the minimum possible number of agents, ensure that each demand has a probability of being visited before expiring greater than 1 − ε.

This problem can be viewed as an extension of our complementary previous works [7] and [8].

1.2 Related work

The static and deterministic VRPTW has been the subject of intensive research efforts for both heuristic and exact optimization approaches (see, for example, [15]). Indeed, the VRPTW is NP-hard; even finding a feasible solution to the VRPTW when the number of vehicles is fixed is itself an NP-complete problem [9]. Chapter 7 in [3] provides a comprehensive survey on exact (exponential-time) solution techniques. Because of the difficulty of the VRPTW and its wide applicability to real-life situations, many heuristic solution techniques capable of producing high-quality solutions in limited time have been proposed; a recent thorough survey on heuristics for the VRPTW can be found in [4, 5].

To the best of our knowledge, this is the first time that a dynamic and stochastic version of the VRPTW is introduced. The dynamic and stochastic VRPTW is closely related to the dynamic traveling repairman problem (DTRP) [6, 7, 1012], in which m identical vehicles must service demands whose time of arrival, location and on-site service time are stochastic, and the objective is to find service policies that minimize the expected waiting time of the demands. In [8] the authors studied a similar vehicle routing problem where demands expire. They presented approximation algorithms for the case where vehicles’ motion is restricted to a planar curve.

The dynamic and stochastic VRPTW is also related to coverage problems of sensor networks. Considerable research effort has been invested in studying coverage properties of static sensor networks [1317]. More recently, there has been growing interest in understanding how the coverage properties of a sensor network may be improved by introducing mobility to the sensor devices. The problem of relocating sensors to improve coverage has been studied in [18]. In this formulation, the sensors can individually estimate the positions of the targets. However, the quality of coverage decreases with increasing distance. The average area covered by mobile sensors over a period of time has been characterized in [19]. It is shown that for a mobile sensor network with spatial density λ, with each sensor moving according to a mobility model similar to a random walk with expected velocity E[V s ], the expected area covered in time interval (0,t) is given by \(1 - \exp{(- \lambda(\pi r^{2} + 2rE[V_{s}]t))}\). Finally, another related problem is the orienteering problem (see, for example, [20]): the input is an edge-weighted graph G = (V , E) (directed or undirected), two nodes s, t ∈ V and a non-negative budget B. The goal is to find an s − t walk of total length at most B so as to maximize the number of distinct nodes visited by the walk.

1.3 Statement of contributions and paper organization

Since we use a variety of results from several areas, we present a brief overview in Section 2. In Section 3 we formally define the dynamic and stochastic VRPTW. In Section 4 we study the special case when ε → 0+ and λ is large; setting ε → 0+ implies that almost all demands are required to be serviced before expiring. First, we compute a lower bound on the minimum number of vehicles needed to ensure that almost all demands are serviced before expiring; second, we propose and analyze a service policy whose performance provably approximates that of optimal policies. Then, in Section 5, we study a provably correct service policy for the general case with ε and λ arbitrary. In Section 6 we extend our analysis to the case in which the time service requests remain active is itself a random variable, describing customer impatience; the customers’ impatience is only known to the mobile agents via prior statistics. In Section 7 we present simulation results. In Section 8 we present a distributed strategy for assigning demands to the vehicles, and to route them in an efficient way; finally, in Section 9, we draw some conclusions and discuss some directions for future work.

2 Preliminaries

In this section, we briefly describe some known concepts from probability, geometry and combinatorial optimization, on which we will rely extensively later in the paper.

2.1 Notation

Let ℕ0 be the set of nonnegative integers and ℝ +  be the set of positive real numbers. Let ||·|| denote the Euclidean norm. Let Q be a compact, convex subset of ℝd. We denote the Lebesgue measure of Q as |Q|. We define J m  ≐ {1,2, ⋯ ,m }. Let \(G = (g_1,\cdots, g_m ) \in Q^m \subset ({\mathbb{R}}^d)^m\) denote the location of m points in Q. A partition (or tessellation) of Q is a collection of m closed subsets \(\mathcal Q = \{Q_1,\cdots,Q_m \}\) with disjoint interiors whose union is Q. The partition of Q is convex, if each Q j , j ∈ J m , is convex.

2.2 Convergence of random variables

We refer the reader to [21, 22] for comprehensive treatment of convergence of random variables. A sequence of random variables X r , r ∈ ℕ0, converges almost surely to X (\(\lim_{r \to \infty} X_r \overset{a.s.}{=} X\)) if lim r → ∞  X r (ω) = X(ω) for all sample functions ω ∈ Ω where ℙ[Ω] = 1. (In other words, \(\mbox{$\mathbb{P}\left[{\lim_{r \to \infty} X_r = X}\right]$}=1\).) The sequence of random variables X r converges almost surely to X if and only if, for each ε > 0,

$$ \lim_{r\to \infty} \mathbb{P}\left[{\sup_{s \geq r}\left\{|X_s - X| \right \}>\varepsilon}\right]=0. $$
(1)

2.3 Inequalities for random variables

If X and Y are defined on the same probability space, then X is almost surely larger that Y, written \(X \overset{a.s.}{\geq}Y\), if and only if X(ω) ≥ Y(ω) for all ω ∈ Ω where ℙ[Ω] = 1.

Moreover, we define the following notions for asymptotic almost sure inequalities. We say that a sequence of random variables X r , r ∈ ℕ0, is asymptotically almost surely upper bounded by the random variable X (which we write as \(\lim_{r\to\infty} X_{r} \overset{a.s.} \le X\)) if, for any ε > 0,

$$ \lim_{r\to\infty} \mathbb{P}\left[{ \sup_{s\ge r} \left\{ X_{s}-X \right\} > \varepsilon}\right] =0. $$
(2)

Similarly, we say that X r is asymptotically almost surely lower bounded by X (that is, \(\lim_{r \to \infty} X_{r} \overset{a.s.}{\ge} X\)) if, for any ε > 0,

$$ \lim_{r\to\infty} \mathbb{P}\left[{\sup_{s\ge r} \left\{X- X_{s} \right\} > \varepsilon}\right]=0. $$
(3)

Notice that neither Eq. 2 nor 3 implies any form of convergence for the sequence X r . On the other hand, if both hold at the same time, it is clear that Eq. 1 is recovered.

2.4 Asymptotic and worst-case properties of the traveling salesperson problem in the Euclidean plane

The Euclidean traveling salesperson problem (TSP) is formulated as follows: given a set D of n points in ℝd, find the minimum-length tour (i.e., closed path that visits all points) of D. Let \(\ensuremath{\operatorname{TSP}}(D)\) denote the minimum length of a tour through all the points in D; by convention, \(\ensuremath{\operatorname{TSP}}(\emptyset)=0\). Suppose the set D is composed of n points whose locations are independently chosen at random from a distribution f, supported on a compact set Q. In [23] it is shown that there exists a constant β TSP,d such that, almost surely,

$$ \lim_{n\rightarrow+\infty} \frac{\ensuremath{\operatorname{TSP}}(D)}{n^{1-1/d}} \overset{a.s.}{=} \beta_{\mathrm{TSP},d} \int_Q\bar f(q)^{1-1/d}\;dq,\; $$
(4)

where \(\bar f\) is the density of the absolutely continuous part of the point distribution f. In other words, the optimal cost of stochastic TSP tours approaches a deterministic limit. In the planar case, the cost of stochastic TSP tours grows as the square root of the number of points in D. Moreover, the current best estimate of the constant in the case d = 2 is β TSP,2 ≃ 0.7120 (see, e.g., [24]). For short, we let β ≐ β TSP,2.

Notice that the bound (4) holds for all compact sets: the shape of the set only affects the convergence rate to the limit. According to [25], if Q is a “fairly compact and fairly convex” set in the plane, then Eq. 4 provides an adequate estimate of the optimal TSP tour length for values of n as low as 15.

Remarkably, the asymptotic cost of the stochastic TSP for uniform point distributions is an upper bound on the asymptotic cost for general point distributions, i.e.,

$$ \lim_{n\rightarrow+\infty} \frac{\ensuremath{\operatorname{TSP}}(D)}{n^{1-1/d}} \le \beta_{\mathrm{TSP},d} {|Q|}^{1/d}, $$

where |Q| is the area of Q. This follows directly from an application of Jensen’s inequality for concave functions to the right hand side of Eq. 4

$$ \int_Q \bar f(q)^{1-\frac{1}{d}} \; dq \le |Q|^{1/d}\left(\int_Q \bar f(q)\;dq\right)^{1-\frac{1}{d}} \le |Q|^{1/d}. $$

Finally, if the support of the point distribution f is a compact set \(Q\subseteq {\mathbb{R}}^{d}\), the following (deterministic) bound holds on the length of the TSP tour [26]:

$$ \mathrm{TSP}(D)\leq \beta_{Q}{n}^{1-1/d}|Q|^{1/d}. $$
(5)

The price of determinism is that the constant β Q depends on the set Q, and is generally much larger than β TSP,d . For example, if Q is a unit square, β Q  = 2.

2.5 Voronoi diagrams

We refer the reader to [27] for a comprehensive treatment of Voronoi diagrams. The Voronoi Diagram \(\mathcal V(G) = (V_1(G),\cdots,V_m(G))\) of Q generated by points G  = (g 1, ⋯ ,g m ) is defined by

$$V_i(G) = \left\{x \in Q \; | \; \|x -g_i \| \leq \|x - g_j \|, \, \forall j \neq i, \, j \in J_m \right\}$$
(6)

We refer to G as the set of generators of \(\mathcal V(G)\), and to V i (G) as the Voronoi cell or region of dominance of the i-th generator. Each Voronoi cell is a convex set. Indeed, a Voronoi Diagram is a convex partition of Q. For simplicity, we will refer to V i (G) as V i . When the two Voronoi cells V i and V j are adjacent (i.e., they share an edge), g i is called a Voronoi neighbor of g j (and vice-versa).

Finally, we define an equitable Voronoi diagram as a Voronoi diagram where all Voronoi cells have the same Lebesgue measure, with respect to the distribution f.

2.6 The VRPTW

The classic VRPTW is defined on a complete graph G = (V, A), where V = {v 0, v 1, ..., v n } is the vertex set and A = {(v i ,v j ): v i ,v j  ∈ V, i ≠ j } is the arc set. The first p vertices in V represent depot locations at which one or more of the m ≥ p available vehicles are based, while the remaining vertices of V represent demand locations to be serviced. With each vertex v i  ∈ V is associated a nonnegative on-site service time s i , and a time window [r i , d i ], where d i  ≥ r i , r i , d i  ∈ ℕ0. We refer to r i as a release time, to d i as a deadline, and to d i  − r i as the width of the time window. Each arc (v i , v j ) has an associated nonnegative cost c ij (usually the travel time between v i and v j ). The VRPTW consists of designing m vehicle routes on G such that: (1) every route starts and ends at the same depot; (2) every demand belongs to exactly one route; (3) the service at demand i begins in the interval [r i , d i ], and each vehicle j leaves its depot and returns to its depot in the interval [r j ,d j ], (4) the total travel cost of all vehicles is minimized. There are variants of this problem where other constraints are added, for example the total load and duration of route j are required not to exceed certain thresholds. Moreover, the fleet size m can be a variable and a usual additional objective is to minimize m.

All problem parameters are assumed to be known with certainty. Note that finding a feasible solution to the VRPTW is itself an NP-complete problem [9].

3 Problem definition

We focus on the case where the environment Q is a compact, convex subset of ℝ2.Footnote 1 Without loss of generality we will assume that the measure of Q (denoted as |Q|) is 1.

Demands are serviced by a team of m holonomic vehicles, modeled as point masses. The vehicles are free to move, with bounded velocity v, within the environment Q; without loss of generality, we will assume that the velocity magnitude is unitary. The vehicles are identical, and have unlimited fuel and demand servicing capacity. For simplicity, vehicles are not required to stop or to loiter in proximity of demands (i.e., we assume zero on-site service time).

Service operations start at time 0; the initial number of demands is a positive integer random variable. Demands arrive at Q according to a Poisson process with intensity λ, and their locations are independent and uniformly distributed over Q. In other words, the number of demands generated over time within a region \(S \subseteq Q\) can be described as a homogeneous Poisson process with rate

$$ \lambda_S = \lambda \cdot |S|. $$

When the number of vehicles is a function of λ, we denote such number as m(λ). It is assumed that the initial number of demands is independent of λ. The term heavy load is used to denote the condition λ→ ∞. We will label demands in increasing order with respect to time of arrival, by using the index i ∈ ℕ0. Each demand has an associated deterministic time window: the release time r i is the demand’s arrival time, while the deadline is r i  + T, where T ∈ ℝ +  is a constant independent of the demand. Notice the simplifying assumption of equal width for the time windows (although release times and deadlines clearly differ from demand to demand).

If one of the vehicles visits the location of the i-th demand at a time t i such that r i  ≤ t i  < d i , then the i-th demand is considered serviced; otherwise the i-th demand is considered expired. Let W i be a random variable expressing the sojourn time in the system for the i-th demand. If the i-th demand is serviced, then W i  = t i  − r i , otherwise W i  = T. Notice that, by definition, all random variables W i are surely bounded below and above; in particular, we have 0 ≤ W i  ≤ T.

Let D(t) be the set of locations of demands generated up to time t. Information on outstanding demands (i.e., arrived demands that have neither been serviced nor expired) at time t is summarized as a finite set of demand positions \(D_\mathrm{o}(t) \subseteq D(t)\). Demands are inserted in both D and D o as soon as they are generated; they are removed from D o either upon servicing—as a vehicle visits the demand’s location— or upon expiration. We assume that information contained in D(t) and D o(t) is available to all vehicles.

Given ε ∈ (0, 1], the objective is to find routing policies that ensure

$$ \lim_{i \to \infty} \mathbb{P} \left[W_i<T \right] \geq 1 - \varepsilon, $$
(7)

with the minimum possible number of agents. (Notice that W i  = T if and only if the i-th demand expired.) We will refer to ε as the “reliability” of the system (notice that ε small implies high reliability).

Define \(W_i^{\lambda}\) as the sojourn time of the i-th demand when the arrival rate is λ. Given ε ∈ (0, 1], the objective, in heavy load, is to find routing policies that ensure

$$ \lim_{i \to \infty} \mathbb{P} \left[ \lim_{\lambda \to \infty} W_i^{\lambda}<T \right] \geq 1 - \varepsilon. $$
(8)

The heavy load assumption will allow us to find some interesting asymptotic results.

4 The dynamic and stochastic VRPTW in heavy load

In this section we study the special case when ε → 0+ and λ→ ∞. Setting ε → 0+ implies that, in steady state, almost all demands are required to be serviced before expiring. Our strategy is the following: First, we establish a heavy-load lower bound on the minimum number of agents required to ensure Eq. 8 when ε → 0+. Second, we analyze a heavy-load policy with a provable performance guarantee.

4.1 Heavy-load lower bound

We have the following

Theorem 4.1

The minimum number m*(λ) of vehicles required to ensure Eq. 8 when ε → 0+ satisfies:

$$ \lim_{\lambda \to \infty} \frac{m^{*}(\lambda)}{\sqrt{\lambda}} \geq \frac{\gamma}{\sqrt{T}}, $$
(9)

where \(\gamma \geq 2/(3\sqrt{2\pi}) \approx 0.266\) .

Proof

Since ε → 0+, we require that, in the limit i → ∞, all demands (except possibly a negligible set) receive service before expiration. Thus, in the limit i → ∞, our problem resembles the dynamic vehicle routing problem [6]. In [6] it is shown that, for any policy, the limiting expected sojourn time satisfies (assuming zero on-site service time):

$$ \lim_{i \to \infty} \frac{\mathbb{E}\left[{W^{\lambda}_{i,\text{DTRP}}}\right]m^2(\lambda)}{\lambda} \geq \gamma^2, $$

where \(W^{\lambda}_{i,\text{DTRP}}\) is the sojourn time of the i-th demand in the DTRP when the arrival rate is λ, and \(\gamma \geq 2/(3\sqrt{2\pi}) \approx 0.266\). Therefore, in heavy load

$$ \lim_{\lambda \to \infty } \lim_{i \to \infty} \frac{\mathbb{E}\left[{W^{\lambda}_{i,\text{DTRP}}}\right]m^2(\lambda)}{\lambda} \geq \gamma^2. $$

Since all demands, except possibly a negligible set, receive service within a time T, we have eventually \(\mathbb{E}{\left[ {W^{\lambda }_{{i,{\text{DTRP}}}} } \right]} \leqslant T\). It follows that

$$ \lim_{\lambda \to \infty } \lim_{i \to \infty} \frac{T m^2(\lambda)}{\lambda} \geq\gamma^2. $$
(10)

Therefore, to satisfy Eq. 10, we need

$$ \lim_{\lambda \to \infty} \frac{m(\lambda)}{\sqrt{\lambda}} \geq \frac{\gamma}{\sqrt{T}}.\qquad\qquad\qquad\qquad \square $$

Thus, the required number of vehicles in heavy load and when ε → 0+ can not be less then \(m^*(\lambda) \doteq \Bigl \lceil \gamma \sqrt{\lambda / T} \Bigr \rceil\).

4.2 A provably good heavy load policy

In this section, we propose a policy (which we call TSP Policy) that satisfies Eq. 8 and requires a number of vehicles that is within a constant factor from the optimum. A pseudo-code description of the TSP Policy is provided in Algorithm 1. The requirement that the location of each demand is visited even if the demand expired is introduced to simplify the analysis.

We assume that each vehicle spends a fixed time c > 0, c ≪ T, to compute a new TSP tour; the computation time c does not depend on the number of outstanding demands.Footnote 2

Algorithm 1 TSP Policy

Require: m, the number of vehicles.

Ensure: Routing policy to service demands.

   At start-up, the environment Q is partitioned into m

   service regions Q j , j ∈ J m , of equal area 1/m (recall

   that |Q| = 1), and each agent is assigned to a distinct

   service region. Then, each agent executes in its own

   service region:

   while TRUE do

     if there are no unvisited demands then

        Move at unit velocity toward the median of the

        service region.

     else

       Compute the TSP tour through all demands.

        Service demands by following the TSP tour

        (start from the closest demand and randomly

        select one of the two possible orientations of the

        tour).

        Do not skip demands that expired.

      end if

   end while

The behavior of the TSP policy is summarized in the following theorem.

Theorem 4.2

Assume that the TSP policy is used with a number of vehicles m(λ) (i.e., m depends on λ) that satisfies

$$ \beta \sqrt{\frac{2}{T}} < \lim_{\lambda \to \infty} \frac{m(\lambda)}{\sqrt{\lambda}} < + \infty. $$
(11)

Then

$$ \lim_{i \to \infty} \mathbb{P} \left[ \lim_{\lambda \to \infty} W_i^{\lambda}<T \right]=1. $$

In other words, the TSP policy, with a number of vehicles that satisfies Eq. 11 , solves, in heavy load and when ε → 0+ , the dynamic and stochastic VRPTW. Moreover, if we define, with δ > 0 an arbitrarily small constant,

$$ m_{\text{TSP}}(\lambda) \doteq \left\lceil {\sqrt \lambda }{\left( {\beta {\sqrt {\frac{2} {T}} } + \delta } \right)} \right\rceil $$
(12)

(\(m_{\text{TSP}}(\lambda)\) clearly satisfies Eq. 11) we obtain

$$\label{eq:fact} \lim_{\lambda \to \infty} \frac{m_{\text{TSP}}(\lambda)}{m^*(\lambda)} \leq 3.78. $$
(13)

Equation 13 shows that the number of vehicles required by the TSP policy is within a constant factor from the optimal value.

Before proving Theorem (4.2), we need some notation and two lemmas.

To simplify the analysis, we write λ = kl where k ∈ ℕ0 and l > 0 is an arbitrary constant: thus, heavy load is obtained for k → ∞. With a slight abuse of notation, let \(W_{i,j}^{\lambda}\) be the sojourn time of the i-th demand arriving in service region j. By definition of the policy, what happens in a service region j ∈ J m is independent of what happens in any other service region h ≠ j. Thus, under the TSP policy, ensuring Eq. 8 with ε → 0+ calls for ensuring that

$$ \lim_{i \to \infty} \mathbb{P} \left[ \lim_{\lambda \to \infty} W_{i,j}^{\lambda}<T \right]=1, \, \forall j \in J_m. $$

Our analysis will thus concentrate on the (independent) sequences of random variables \(W_{i,j}^{\lambda}\). In the following, to avoid cumbersome notation, we will drop the index j, with the understanding that all the following analysis refers to a service region j (and not to the whole environment Q).

The arrival rate in a service region of area 1/m(λ), is \(\bar{\lambda} = \lambda/m(\lambda)\). We refer to the time instant t r , r ∈ ℕ0, at which the vehicle computes a new TSP tour (recall that each service region is serviced by exactly one vehicle) as the epoch r of the policy; we refer to the time interval between epoch r and epoch r + 1 as the r-th iteration. Finally, we define

  • \(n_r^k\), r ≥ 1: number of demands arrived in between epochs r − 1 and r, when the overall arrival process has intensity λ = kl;

  • \(C_r^k\), r ≥ 0: time interval between epochs r and r + 1, when the overall arrival process has intensity λ = kl. This time interval is the sum of (1) the computation time (equal, deterministically, to c time units), and of (2) the time required to service all demands arrived in between epochs r − 1 and r following a \(\ensuremath{\operatorname{TSP}}\) tour;

  • \(W_i^k\), i ≥ 0: sojourn time of the i-th demand, when the overall arrival process has intensity λ = kl;

  • m(k): number of vehicles used when λ = kl.

Notice that \(C_r^k - c\) is also the length of the TSP tour through the demands arrived in between epochs r − 1 and r; indeed, velocity is unitary, and in heavy load we can safely neglect the travel component between the agent’s current position and the closest demand in the TSP tour [see also part (1) in Lemma 4.3 below].

In the next Lemma, we provide some almost sure convergence results concerning random variables \(C_r^k\) and \(n_r^k\).

Lemma 4.3

Assume

$$ \beta \sqrt{\frac{2}{T}} < \lim_{k \to \infty} \frac{m(k)}{\sqrt{kl}} < + \infty. $$

Then, at each epoch r ≥ 1:

  1. (1)

    \(\displaystyle\lim_{k \to \infty} n_r^k \overset{a.s}{=} \infty;\)

  2. (2)

    \(\displaystyle\lim_{k \to \infty} \frac{C_r^k - c}{\sqrt{n_r^k}}\sqrt{m(k)} \overset{a.s}{=} \beta;\)

  3. (3)

    \(\displaystyle\lim_{k \to \infty} \frac{n_{r}^k}{C_{r-1}^k }\frac{1}{kl/m(k)} \overset{a.s}{=} 1.\)

Indeed, limit (1) is intuitive; limit (2) is an application of the TSP Theorem (Eq. 4), and limit (3) is a consequence of the strong law for renewal processes. A detailed proof is provided in the Appendix.

The next Lemma characterizes the length of an iteration.

Lemma 4.4

Assume

$$ \beta \sqrt{\frac{2}{T}} < \lim_{k \to \infty} \frac{m(k)}{\sqrt{kl}} < + \infty. $$

Then

$$ \limsup_{r \to \infty} \limsup_{k \to \infty} C_r^k \overset{a.s.}{<} T/2. $$

Proof

Recall that \(C_{r+1}^k - c\) is the length of the TSP tour through the demands arrived in between epochs r and r + 1. Then, using Lemma 4.3, we can write

$$\begin{array}{rcl} \limsup_{k \to \infty} C_{r+1}^k &=& c + \limsup_{k \to \infty} \left(C_{r+1}^k - c\right)\nonumber\\ &=&c +\limsup_{k \to \infty} \frac{C_{r+1}^k - c}{\sqrt{n_{r+1}^k}} \frac{\sqrt{m(k)}}{\sqrt{m(k)}}\sqrt{n_{r+1}^k}\nonumber\\ &\overset{a.s}{=}& c + \limsup_{k \to \infty} \frac{\beta}{ \sqrt{m(k)} }\sqrt{n_{r+1}^k} \nonumber\\ &=&c + \limsup_{k \to \infty} \frac{\beta}{ \sqrt{m(k)} }\sqrt{\frac{n_{r+1}^k}{C_r^k} \frac{kl/m(k)}{kl/m(k)}C_r^k}\nonumber\\ &\overset{a.s.}{=} &c + \limsup_{k \to \infty} \beta \frac{\sqrt{kl}}{m(k)}\sqrt{C_r^k}\nonumber\\ &<&c + \sqrt{\frac{T}{2}} \limsup_{k \to \infty} \sqrt{C_r^k}. \end{array}$$
(14)

Define the random variable \(X_r \doteq \limsup_{k \to \infty} C_r^k\); Eq. 14 allows to determine an almost sure upper bound on \(\limsup_{r \to \infty} X_r\). It is straightforward to verify

$$ \limsup_{r \to \infty} X_r \overset{a.s.}{<} \frac{1}{4}\left(\sqrt{\frac{T}{2}} + \sqrt{\frac{T}{2} + 4c} \right)^2; $$
(15)

since, by assumption, c ≪ T, we get the claim. □

Proof

(Theorem 4.2) Notice that, for demands arriving in between epochs r and r + 1, the sojourn time \(W_i^k\) is bounded above by \((C_r^k + C_{r+1}^k)\); therefore, applying Lemma 4.4, we can write

$$ \limsup_{i \to \infty} \limsup_{k \to \infty} W_i^k \leq \limsup_{r \to \infty} \limsup_{k \to \infty}\left (C_r^k + C_{r+1}^k \right) \overset{a.s.}{<} T. $$
(16)

For convenience, define \(\bar{T}\) as

$$ \bar{T} \doteq \limsup_{i \to \infty} \limsup_{k \to \infty} W_i^k. $$

From the definition, \(\bar{T}\) is a random variable (it might not be a constant) almost surely smaller than T. Recalling the definition of almost sure convergence, we have for any δ > 0

$$ \lim_{i \to \infty} \mathbb{P} \left[ \sup_{p \geq i} \left\{\left| \sup_{q\geq p}\left( \limsup_{k \to \infty} W_q^k \right) - \bar{T} \right| \right\} > \delta \right] = 0. $$

Thus, we have a fortiori

$$ \lim_{i \to \infty} \mathbb{P} \left[ \left( \limsup_{k \to \infty} W_i^k - \bar{T} \right) > \delta\right] = 0.$$

The claim follows from the fact that δ is arbitrary and \(\bar{T}\overset{a.s.}{<}T\). □

5 A service policy for the non-asymptotic case

The previous results hold only in the limit λ → ∞ and when ε → 0+. In this section, we study the TSP policy for arbitrary values of ε ∈ (0, 1] and in general load conditions. In particular, we study the number of vehicles that are sufficient, for a given ε, to ensure Eq. 7. Here, we assume that vehicles are allowed to skip the expired demands. We have the following

Theorem 5.1

For any given ε ∈ (0, 1] and any λ , a number of vehicles sufficient for the TSP policy to ensure Eq. 7 is

$$ m = \left\lceil \beta_Q \sqrt{\frac{2\lambda}{\varepsilon T}} \right \rceil, $$

where β Q is a constant that depends on the shape of the service regions.

Proof

As in Section 4.2, we analyze the sequence of sojourn times within a given cell j ∈ J m , and we drop the index j, with the understanding that all the following analysis refers to a service region j (and not to the whole environment Q).

To avoid any confusion, we restate some of the notation. The arrival rate in a service region, whose area is 1/m, is \(\bar{\lambda} = \lambda/m\). We refer to the time instant t r , r ∈ ℕ0, in which the vehicle computes a new TSP tour as the epoch r of the policy; we refer to the time interval between epoch r and epoch r + 1 as the r-th iteration. We define n r as the number of demands arrived in between epochs r − 1 and r. Finally, we define C r as the time interval between epochs r and r + 1. For simplicity we assume the computation time to be zero; extension to the case c > 0 is straightforward but cumbersome.

First, we study the sequence of expected values \(\mathbb{E}\left[{C_r}\right]\). By the deterministic inequality for the TSP tour through n points, we have (recall that the area of each service region is 1/m)

$$ C_{r+1} \leq \beta_Q \sqrt{n_{r+1}/m}, $$

where the contribution of the initial travel to the closest demand in the TSP tour is included in β Q . By applying Jensen’s inequality for concave functions, in the form \(\mathbb{E}\left[{\sqrt{X}}\right] \leq \sqrt{\mathbb{E}\left[{X}\right]}\), we get

$$ \mathbb{E}\left[{C_{r+1}}\right] \leq \beta_Q \sqrt{\mathbb{E}\left[{n_{r+1}}\right]/m}. $$

During iteration r of the policy, demands arrive according to a Poisson process. Call \(n_r^{\text{new}}\) the number of demands arrived during iteration r. Thus, we have \(\mathbb{E}\left[{n_{r+1}}\right] = \mathbb{E}\left[{n_r^{\text{new}}}\right] = \bar{\lambda} \mathbb{E}\left[{C_r}\right]\); therefore

$$ \mathbb{E}\left[{C_{r+1}}\right] \leq \beta_Q \sqrt{\bar{\lambda}\mathbb{E}\left[{C_r}\right]/m}. $$

The above inequality describes a recurrence relation that allows to bound the value to which \(\mathbb{E}\left[{C_{r}}\right]\) converges as r → ∞. It is straightforward to verify that

$$ \limsup_{r \to \infty} \mathbb{E}\left[{C_r}\right] \leq \beta_Q^2 \frac{\lambda}{m^2}. $$

Since, for demands arriving in between epochs r and r + 1, the sojourn time W i is bounded above by (C r  + C r + 1), we can write

$$ \limsup_{i \to \infty} \mathbb{E}\left[{W_i}\right] \leq 2\beta_Q^2 \frac{\lambda}{m^2}. $$

Furthermore, since W i is a non-negative random variable with finite mean, we can apply Markov inequality to obtain

$$ \mathbb{P}\left[{W_i = T}\right] = \mathbb{P}\left[{W_i \geq T}\right] \leq \frac{\mathbb{E}\left[{W_i}\right]}{T}; $$

therefore we have

$$ \limsup_{i \to \infty} \mathbb{P}\left[{W_i = T}\right] \leq \limsup_{i \to \infty} \frac{\mathbb{E}\left[{W_i}\right]}{T} \leq 2\beta_Q^2 \frac{\lambda}{m^2 T}. $$

If we choose m such that \(2\beta_Q^2 \lambda/(m^2 T) < \varepsilon\), we can guarantee that \(\limsup_{i \to \infty} \mathbb{P}\left[{W_i = T}\right] < \varepsilon \), and we obtain the claim. □

Remark 5.2

Some remarks are in order.

  1. 1)

    The constant β Q depends on the shape of the service regions; for example, if service regions are approximately square, then β Q  ≈ 2 (recall that we are including in β Q the contribution of the initial travel to the closest demand in the TSP tour).

  2. 2)

    Markov inequality is often a loose upper bound for the cumulative distribution function of a random variable; thus, we believe that the result of Theorem 5.1 is very conservative.

  3. 3)

    Notice, moreover, that the TSP policy does not rely on the knowledge of demands’ release times; in other words, it does not give priority to demands that are about to expire. It is clear that, to lower the number of needed vehicles, the knowledge of the releases times should be exploited.

6 The dynamic and stochastic vehicle routing problem with customer impatience

In this section, we extend our analysis to the case in which the time service requests remain active is itself a random variable, describing customer impatience. In this case, it is desired to minimize the fraction of service requests missed because of impatience. A preliminary version of these results appeared in [28].

6.1 Problem definition

The problem definition is similar to that of Section 3. The new fact is that, now, each demand i has an associated random impatience time L i ; should the i-th demand not be visited within time L i from its arrival, it will expire. The impatience times L i are independent and identically distributed according to a density f L : ℝ +  →ℝ. We assume that demands’ impatience is known to the vehicles only via prior statistics. This is in sharp contrast to the standard vehicle routing problem with time windows, where the time windows are known by the vehicles.

Similarly as before, we define W i as the sojourn time in the system for the i-th demand. The random variable W i is the elapsed time between the arrival of demand i and the time when either one of the vehicles visits its location or such demand departs from the system due to impatience - whichever is smaller. A demand is considered serviced if W i  < L i .

Given φ ∈ (0, 1], the objective is to ensure that no more than a fraction φ (where φ ∈ (0, 1] is a control parameter) out of all the arrived demands departs impatiently before service. Notice that we rule out the case φ = 0. We want to answer the questions: what is a sufficient number of mobile agents needed to ensure that each service request is fulfilled before expiring, with probability at least 1 − φ? What strategy should they use to ensure this objective is attained?

We will restrict our analysis to the heavy load case.

6.2 From customer impatience to deterministic time windows

Our approach is to transform the problem with customer impatience into a problem with deterministic time windows, and then to apply the results of the previous sections. Recalling that the impatience times are identically distributed, define the critical time \(T_{\text{crit}}\) as

$$ T_{\text{crit}} \doteq \max \left\{ T \in \mathbb{R}_+ : \int_T^\infty f_{L}(t)dt = \mathbb{P}\left[{L>T}\right] \geqslant 1 - \varphi \right\}. $$

Clearly, if a routing policy is able to ensure that each demand location (regardless of its impatience) is visited within time \(T_{\text{crit}} \) from its arrival, then this policy ensures that no more than a fraction φ out of all the arrived demands departs impatiently before service. Through the concept of critical time we can, therefore, address the problem of servicing demands with impatience as the problem of visiting all demands’ locations, regardless of their impatience (i.e., even if they depart impatiently), within a constant time. In other words, through the concept of critical time, we cast the problem of servicing demands with impatience into the previous dynamic and stochastic VRPTW, in which the time windows have length \(T_{\text{crit}}\) and we require a system reliability ε → 0+.

This approach on the one hand introduces some degree of conservatism, but on the other hand it simplifies considerably the mathematical analysis.

6.3 A provably correct heavy load policy

From the previous discussion, we argue that, in heavy load, the TSP policy (see Algorithm 1) solves the dynamic and stochastic vehicle routing problem with customer impatience for any φ ∈ (0, 1]. An upper bound on the number of vehicles needed by the TSP policy is

$$ m = \left\lceil \beta\sqrt{\frac{2\lambda}{T_{\text{crit}}}} \right\rceil. $$
(17)

7 Simulation results for the TSP policy

In this section we verify by simulations the correctness of the previous results. All simulations are performed using linkern Footnote 3 as a solver to generate approximations to the optimal TSP tour. This powerful solver yields approximations in the order of 10% of the optimal tour cost very quickly for many instances. For example, in our numerical experiments on a 2.4 GHz Pentium machine, approximations of random TSPs with 1,000 points typically required about two seconds of CPU time.

7.1 TSP policy in heavy load

The analysis of the TSP policy in heavy load culminates in Eq. 12, which dictates how many vehicles are needed to ensure that almost no demand is missed.

We consider the scenario λ = 1000 and T = 5. Then, Eq. 12 prescribes \(m_{\text{TSP}}=15\) vehicles. We run, starting from random initial conditions, 100 simulations; each run consists of 200 iterations of the TSP policy. For each run, we record the fraction ρ of demands that expire before service. Table 1 summarizes simulation results; we report the worst case value of ρ, the mean value of ρ and its standard deviation (mean value and standard deviation are over 100 realizations).

Table 1 Fraction ρ of expired demands

From Table 1, the worst case fraction of missed demands is ρ = 4.1 ·10 − 3, that is 4 demands every 1000 demands are missed in the worst case; recalling that our analysis holds in the limit λ→ ∞, and that we are using an approximate TSP solver, we conclude that simulation results match our previous analysis.

The central idea in the analysis of the TSP policy in heavy load is that each iteration should last a time interval less than T/2. Figure 1 shows the worst case iteration length versus the number of vehicles employed. We consider the case T = 5 and λ = 400. Equation 12 prescribes 10 vehicles, and m = 10 is exactly the minimum number of vehicles to obtain a worst case iteration length smaller than T/2. This shows that our analysis of the TSP policy is rather tight.

Figure 1
figure 1

Iteration length as a function of m; λ = 400, T = 5, \(m_{\text{TSP}} = 10\)

7.2 TSP policy in normal load

We now test the TSP policy in moderate load, with the number of vehicles prescribed by Theorem 5.1. We compute the maximum fraction ρ of missed demands as a function of ε and λ. The maximum is over 100 sample paths, with 200 iterations each. We consider T = 5, ε ∈ { 0.1,0.4,0.6,0.9} and λ = {1,5,10,20,30,40,50,60,70,80 }. The number of vehicles is dictated by Theorem 5.1.

For all pairs (ε, λ), we obtained max ρ = 0. This result is somewhat expected, since, as discussed before, Theorem 5.1 provides a very loose upper bound on the number of vehicles needed to guarantee ε-reliability.

It is interesting to verify if the results obtained for the heavy load case, in particular Eq. 12, remain valid for moderate values of λ. Figure 2 shows the worst case fraction of missed demands for various values of λ (the worst case is with respect to 100 sample paths). The number of vehicles is prescribed by Eq. 12. It can be seen that for λ > 350 ρ ≈ 0, and that for λ ≤ 350 ρ is never larger than 0.1.

Figure 2
figure 2

Fraction ρ of missed demands as a function of λ. The number of vehicles is prescribed by Eq. 12

7.3 TSP policy with customer impatience

First, we consider the scenario where demands have an impatience uniformly distributed in [0, 90], i.e.

$$ f_L(t) = \left\{ \begin{array}{ll} 1/90& \mbox { if } t \in [0, 90];\\ 0 & \mbox{ otherwise.}\end{array}\right. $$

We set φ = 0.05; therefore T crit = 4.5 seconds. We consider λ = {10,20,40,50,75,100}. For each value of λ, we run, starting from random initial conditions, 100 simulations. For each λ, we record the fraction ρ of demands that are missed in the worst case run. For every value of λ, the number of vehicles is dictated by Eq. 17.

Figure 3 shows the results. The TSP policy always satisfies the requirement that no more than φ = 5% of demands depart impatiently. The requirement is satisfied with a considerable safety margin, and this is a consequence of the conservatism that characterizes our approach. Recall that Eq. 17 was derived under the heavy load assumption. These simulations show that Eq. 17 seems to be applicable for every value of λ.

Figure 3
figure 3

Fraction of demands that depart impatiently as a function of λ; the number of agents is respectively m = {1,1,2,2,3,4,4,5,5,6,7,9,10}. The impatience time follows a uniform distribution

Then, we consider an impatience that follows an exponential distribution with the same mean of the previous uniform distribution, i.e.:

$$ f_L(t) = \left\{ \begin{array}{ll} \delta e^{-\delta t} & \mbox { if } t\ge 0;\\ 0 & \mbox{ otherwise.}\end{array}\right. $$

where δ = 1/45 seconds. We repeat the previous simulations with this new customer impatience model. The TSP policy, also in this case, always satisfies the requirement, with a considerable safety margin (see Fig. 4).

Figure 4
figure 4

Fraction of demands that depart impatiently as a function of λ; the number of agents is respectively m = {1,1,2,3,3,5,5,6,7,9,10,12,14}. The impatience time follows an exponential distribution

8 Decentralized equitable partitioning

The TSP policy is centralized, since it requires a centralized assignment of the service regions. As introduced in [7, 29], assuming that Q is convex, m agents can achieve a configuration with equitable (i.e. with the same area) service regions in a decentralized way. The first step is to associate to each vehicle i a virtual generator g i . We define the region of dominance for vehicle i as the Voronoi cell V i  = V i (G), where G ≐ (g 1, ⋯ ,g m ). We refer to the partition into regions of dominance induced by the set of virtual generators G as V(G). A virtual generator g i is simply an artificial variable locally controlled by the i-th agent; in particular, g i is a virtual point (see Fig. 5).

Figure 5
figure 5

Agents, virtual generators and regions of dominance

We shall assume that each vehicle has sufficient information available to determine: (1) its Voronoi cell, and (2) the locations of all outstanding events in its Voronoi cell. A control policy that relies on information (1) and (2) is Voronoi-distributed in the sense that the behavior of each vehicle depends only on the location of the other agents with contiguous Voronoi cells (the number of Voronoi neighbors of each generator is on average less than or equal to 6 [27]). Accordingly, Voronoi-distributed policies are spatially distributed and scalable in the number of agents. A spatially distributed algorithm for the local computation and maintenance of Voronoi cells is provided in [30].

The key idea, then, is to enable virtual generators to follow a (Voronoi-distributed) gradient descent law such that an equitable partition is reached (see [7] for the details). Figure 6 shows a typical partition that arises with such algorithm.

Figure 6
figure 6

Each vehicle services outstanding demands inside its own region of dominance by following a TSP tour

9 Conclusion

In this paper, we introduced and studied a dynamic and stochastic version of the well-known vehicle routing problem with time windows. In our model, each service request is generated by a spatio-temporal stochastic process; once a service request has been generated, it remains active for a certain deterministic amount of time, and then expires. Given ε ∈ (0, 1], the objective is to find routing policies that, with the minimum possible number of agents, ensure that each demand has a probability of being visited before expiring greater than 1 − ε.

We first presented a lower-bound on the minimum number of vehicles for the case where ε → 0+ and the arrival rate of events is large (heavy load). Next, we presented a heavy-load routing strategy for servicing almost all demands before they expire and showed that the number of vehicles is within a small constant from the optimal value. We then studied the case where the arrival rate is arbitrary and an arbitrary fraction of the events must be serviced; also for this case we provided a provably correct algorithm. Finally, we extended these results to the case where the expiration time of an event is itself a random variable. In addition, we showed how the routing strategies presented in the paper can be executed in a distributed fashion. Our theoretical results are confirmed by simulations.

Several important research directions will be the objective of future work. First of all, the results in this paper assume that the time windows are either deterministic and uniform for all demands, or stochastic and unknown to the agents; it is of interest to extend our work to a more general setup. Second, we will investigate the extension of the work presented in this paper to the case in which the demands are generated according to non-uniform spatial distributions. Finally, some of the bounds in this paper are conservative, thus we plan to search for tighter bounds.