1 Introduction

A central issue in managing a public transport system is how to handle unpredicted delays. In this paper we consider the question whether (and when) a train should wait for delayed passengers, taking the position of the train operator. The basic setting is that the number of passengers wishing to enter and leave the train at the stations where the train stops is known in advance. However, how many of these passengers are delayed at a station becomes known only when that station is reached. We focus on the case of a single train line. This case occurs, for instance, on the subway line in Athens connecting the central station Syntagma to the airport (http://www.ametro.gr/). Passengers arrive at the various stations by other means of public transportation (e.g. buses), and there is no obvious way for the train operator to influence these feeder lines.

The objective we consider is to minimize the total delay of the passengers, which is caused by (a) delayed passengers missing a train that did not wait for them and (b) on-time passengers reaching their destinations late because the train waited for delayed passengers. We stress that we assume that once the train has waited, it cannot catch up. This assumption is realistic if there is only a comparatively short time between stops, for instance, in urban subways.

An instance of the Online Delay Management Problem (ODMP) on a single train line introduced by Gatto et al. (2007a) is given by a number of stations 1,…,n that have to be served in ascending order by a single train. Furthermore, passenger paths P ij carrying a fixed number p ij ≥0 of passengers from station i to station j are given for 1≤i<j≤n, where passengers can be either on time or have a (source) delay δ>0. If passengers are delayed, the question is whether the train should wait for them or not. We assume that the train cannot speed up in order to gain time. Hence, waiting once yields a delay of δ at all upcoming stations. If the train does not wait for delayed passengers, these passengers are dropped and have to wait for the next train, which is scheduled T≥δ time units later.

Gatto et al. (2007a) used the concepts of online optimization and competitive analysis to compare strategies for the ODMP. An online algorithm knows already at the first station, for every station i, the numbers p ij (j>i) of passengers starting their trip at station i. However, the numbers of delayed and on-time passengers are not known in advance. When reaching station i, the number o ij of on-time passengers starting at station i is revealed, and thereby also the number of delayed passengers d ij , since d ij =p ij −o ij . If the train waits at station k, the total delay incurred is given by

$$ D(k)=(T-\delta)\sum_{\substack{i,j: 1\leq i<k \\ \phantom{i,j:} i< j\leq n}}d_{ij}+\delta\sum _{\substack{i,j: 1\leq i\leq n \\ \phantom {i,j:} i<j\leq n}}d_{ij} + \delta\sum _{\substack{i,j: 1\leq i< j \\ \phantom{i,j:} k<j\leq n}}o_{ij}. $$

Here, the first term accounts for the additional delay caused by dropping delayed passengers before waiting at k, the second term represents the (fixed) δ-delay of all source-delayed passengers, and the last term is the delay caused for all initially on-time passengers who leave the train after k. If k=n, this corresponds to the train not waiting at all, since no passenger path starts at station n. The aim of the ODMP is to find a station k to wait at that minimizes D(k).
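
For illustration, the following minimal sketch (our own; the list-of-lists representation of the d ij and o ij and the function names are not part of the model) evaluates D(k) for every station and returns the offline optimum, i.e., the waiting station an omniscient algorithm would choose:

```python
def total_delay(k, d, o, T, delta):
    """Total delay D(k) if the train waits at station k (1-based); k = n means not waiting at all.

    d[i][j] and o[i][j] hold the delayed and on-time passengers travelling from i to j
    (entries with j <= i are zero and unused).
    """
    n = len(d) - 1
    dropped = sum(d[i][j] for i in range(1, k) for j in range(i + 1, n + 1))
    all_delayed = sum(d[i][j] for i in range(1, n + 1) for j in range(i + 1, n + 1))
    ontime_after_k = sum(o[i][j] for i in range(1, n + 1) for j in range(max(i, k) + 1, n + 1))
    return (T - delta) * dropped + delta * all_delayed + delta * ontime_after_k


def offline_optimum(d, o, T, delta):
    """Waiting station k minimizing D(k), assuming full knowledge of all delays."""
    n = len(d) - 1
    return min(range(1, n + 1), key=lambda k: total_delay(k, d, o, T, delta))
```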

A deterministic online algorithm alg is called c-competitive if for every instance σ, i.e. every setting of delayed passengers on the given passenger paths P ij , its cost alg(σ) satisfies

$$ \mathrm{alg}(\sigma) \leq c \cdot\mathrm{opt}(\sigma), $$

where opt(σ) denotes the optimum offline cost for instance σ. A detailed introduction to online optimization and competitive analysis can be found in the book by Borodin and El-Yaniv (2005). In this paper we consider only deterministic online algorithms.

Competitive analysis has often been criticized for being overly pessimistic. In our situation it seems unrealistic that an algorithm (a train line operator) knows absolutely nothing about the number of delayed passengers at a station before that station is reached. Additional information such as lookahead (i.e. information about the delays at the next station or stations) or knowledge about the distribution of the delays should help.

The main contribution of the paper is to explore various ways to make this intuition quantifiable. Although we use different approaches which may seem only weakly connected at first glance, all of these approaches attempt to answer the same question: How much can “additional information” (such as lookahead, information about the distribution of delays, etc.) help, and which kind of information is most helpful? Part of our paper focuses on a theoretical analysis of the problem. This is complemented by a small illustrative case-study for a subway in Athens, Greece. While some of the theoretical results are obtained only for special “delay patterns”, they suggest decision strategies which we use for the case-study.

The paper is organized as follows: Our first step to go beyond standard competitive analysis is to introduce a lookahead concept for the ODMP in Sect. 2. While lookahead turns out to be essentially useless from the competitive analysis point of view, we use the comparative ratio introduced by Koutsoupias and Papadimitriou (2000) and theoretically validate the intuition that lookahead is helpful for the online player. The case-study corroborates this theoretical finding experimentally.

If one has knowledge about the delay distribution, one may ask how this can be incorporated into online algorithms. In Sect. 3 we analyze online algorithms from an average case point of view. We show how the decision made by an online algorithm changes if it does not hedge against the worst-case but against an “average worst-case” according to some distribution.

Stochastic optimization is a classical approach to deal with uncertain information (Kall and Wallace 1994; Prékopa 1995). In Sect. 4 we present a stochastic programming framework which allows us to incorporate knowledge about delay distributions as well as the online feature of delay management. We show how to obtain an optimal policy.

In Sect. 5 we evaluate various strategies experimentally. Both on random data and on the data from the real-world case-study in Athens, the algorithms using additional information outperform the known competitive online algorithm.

1.1 Previous work

There exist various models and solution approaches for offline delay management, see e.g. Schöbel (2001, 2006), Heilporn et al. (2008), where mixed-integer programming formulations for different objective functions are presented and also combined in a bi-criteria programming framework. An investigation of the computational complexity of delay management problems can be found in the papers by Gatto et al. (2004, 2005).

Gatto et al. (2007a) consider the problem of managing delays for the first time as an online optimization problem. They introduce the problem stated above, present a family of 2-competitive online algorithms, and prove a lower bound on the competitive ratio. Krumke et al. (2011) consider extensions to the problem. They define a 3-competitive algorithm for the case of two different source delays, and deal with a different objective function. They show that when maximizing the profit no deterministic algorithm can have bounded competitiveness, and present a randomized 2-competitive algorithm. A restricted version of the ODMP in which all passenger paths end at the last station was investigated by Gatto et al. (2007b). In this setting, the online player knows that passengers are delayed at no more than k stations, but not at which stations. The authors connect this variant to a more general version of the Ski Rental Problem, the so-called Generalized k-Day Ski Rental Problem, and derive lower bounds on the competitive ratio.

Kliewer and Suhl (2011) and Bauer (2010) present various online strategies (among others priority based and ILP-based strategies) and provide simulation frameworks. The complexity of finding an optimal online policy was addressed by Berger et al. (2007) who proved this task to be PSPACE-hard. More aspects of delay management can be found in Dollevoet et al. (2011, 2012), Schachtebeck and Schöbel (2010), Suhl et al. (2001).

Standard competitive analysis has often been criticized for being overly pessimistic since it assumes a fairly weak online player. Several concepts have been introduced in the literature as a remedy. Koutsoupias and Papadimitriou suggested comparative analysis, in which basically two classes of online algorithms are compared relative to each other (Koutsoupias and Papadimitriou 2000). We use this approach in Sect. 2. Another alternative is to couple average-case analysis with competitive analysis, which was done by Fujiwara and Iwama (2002) for the Ski Rental Problem. More approaches can be found in Borodin et al. (1991, 1995, 1996), Irani et al. (1992). We refer to Fiat and Woeginger (1998, Chap. 17) for a comprehensive survey.

2 Lookahead and comparative analysis

2.1 Lookahead for delay management

In the ODMP the number of delayed passengers at a station is not known in advance. We consider a lookahead where the information about delays at the next station(s) is known. We first show that from a competitive analysis point of view, lookahead is essentially useless for the online player. This is similar to the classical paging problem (Fiat and Woeginger 1998; Borodin and El-Yaniv 2005).

Definition 2.1

An algorithm for the ODMP has a weak lookahead of k stations if it knows at each station the number of delayed and on-time passengers for the following k stations. We denote this class of algorithms by \(\mathcal{L}_{w}(k)\).

Intuitively, lookahead gives more information to an online algorithm. However, this does not pay off in competitive analysis as in a worst case there can be sufficiently many dummy nodes having no outgoing passenger paths.

Observation 2.2

Let alg′ \(\in\mathcal{L}_{w}(k)\) be a c-competitive algorithm. Then there exists an algorithm alg without lookahead that is also c-competitive.

The existence of dummy nodes is an obvious reason for the bad performance of weak lookahead in competitive analysis. Hence, we consider a modified version of the lookahead that ensures that we always obtain additional information.

Definition 2.3

An algorithm for the ODMP has a strong lookahead of k stations if it knows at each station the number of delayed and on-time passengers for the following k stations at which passengers depart. We denote this class of algorithms by \(\mathcal{L}_{s}(k)\).

Observation 2.4

The classes of online algorithms with different lookaheads are related to each other. By definition it holds for all k≥1 that

$$\mathcal{L}(0) \subset\mathcal{L}_w(1) \subset\mathcal{L}_w(k) \subset\mathcal{L}_s(k) \subset\mathcal{L}_s(k+1) \subset\mathcal {L}(\infty), $$

where \(\mathcal{L}(0)\) and \(\mathcal{L}(\infty)\) denote the class of online algorithms without lookahead, and the class of offline algorithms, respectively.

The question arising is whether it is possible to obtain better competitiveness results for algorithms with strong lookahead. This is not the case as the following result shows.

Theorem 2.5

Let alg′ \(\in\mathcal{L}_{s}(k)\) be a c-competitive algorithm. Then for every ϵ>0 there exists an algorithm alg without lookahead that is (c+ϵ)-competitive.

Proof

Let I be an instance of the ODMP with n stations. We assume that I has at least one delayed passenger. Otherwise, the problem would be trivial as the online algorithm that never waits is 1-competitive.

We give a construction of an instance I′ by modifying I as shown in Fig. 1: Multiply the number of passengers on all paths by some \(\alpha\in\mathbb{N}\). Furthermore, add k nodes \(v_{1}^{l},\ldots,v_{k}^{l}\) in between stations l and l+1 for all l∈{1,…,n−1}. Finally, add passenger paths \(P_{l,v_{1}^{l}}\) for all l∈{1,2,…,n−1}, \(P_{v_{i}^{l},v_{i+1}^{l}}\) for all i∈{1,…,k−1} and l∈{1,…,n−1}, and \(P_{v_{k}^{l},l+1}\) for all l∈{1,…,n−1}. All of them carry one passenger and are declared to be on time.

Fig. 1: Illustration of the construction of instance I′ for n=4 and k=3 in Theorem 2.5

We define alg to work on instance I as follows:

  • For ϵ>0 construct I′ as explained above for some α that satisfies

    $$ \epsilon\alpha\delta- (\epsilon+ c) (n-1) (k+1) \delta\geq0. $$
    (1)
  • Apply alg′ to instance I′. If alg′ waits at station 1, wait at station 1. If alg′ waits at any station in \(\{v_{1}^{l},\ldots,v_{k}^{l},l\}\), wait at station l.

Observe that alg is defined such that when it has to decide whether to wait or not, it does not use knowledge about delayed passengers at the succeeding stations in the original network I. Hence, we ensure that alg does not possess a lookahead.

By construction we have

(2)

and

(3)

By the assumption that I has at least one delayed passenger, there are at least α source delayed passengers when serving I′, i.e.,

(4)

Since alg′ is c-competitive, we have

(5)

All in all we then get

 □

2.2 Comparative analysis for lookahead

As we have seen, competitive analysis is too restrictive to account for the additional feature of lookahead. For a more subtle analysis of algorithms with different kinds of lookahead we make use of the concept of comparative analysis introduced by Koutsoupias and Papadimitriou (2000).

Definition 2.6

For a given problem with set of instances Σ consider two classes of algorithms \(\mathcal{A}\) and \(\mathcal{B}\). We call

$$ \mathcal{R}(\mathcal{A},\mathcal{B})= \max_{B \in\mathcal{B}} \min_{A \in\mathcal{A}} \max _{\sigma\in\varSigma} \frac{A(\sigma)}{B(\sigma)} $$
(6)

the comparative ratio of \(\mathcal{A}\) with respect to \(\mathcal{B}\). Here, A(σ) and B(σ) denote the costs of algorithms A and B on instance σ.

If we consider the case that \(\mathcal{A}\) is the set of all online algorithms (without lookahead) and \(\mathcal{B}\) is the set of all offline algorithms, (6) turns out to be the competitive ratio. Similar to the competitive ratio, the comparative ratio can be interpreted in a game-theoretic way: \(\mathcal{B}\) tries to demonstrate its strength by choosing an algorithm B. \(\mathcal{A}\) has to respond to this by choosing A. Finally, \(\mathcal{B}\) selects an instance σ such that the ratio is maximized.

When comparing the class of algorithms without and with lookahead, we have the following result.

Theorem 2.7

Let L be a lower bound on the competitive ratio for the ODMP with n stations. Then the comparative ratio of \(\mathcal{L}(0)\) with respect to \(\mathcal{L}_{w}(n-2)\) is at least L.

Proof

Consider an instance of the ODMP with n stations. An algorithm possessing a lookahead of k=n−2 stations already knows at the first station the delay information for all stations at which passengers depart. It therefore has the same amount of information as the optimal offline algorithm and can serve the instance optimally, so every lower-bound construction for the competitive ratio carries over. □

Theorem 2.7 shows that lower bounds on the competitive ratio that have been derived for three, four, and arbitrarily many stations transfer to the comparative ratio in this case. Table 1 states these lower bounds using previous results cited in the right column. The l in the last row is independent of the size of the instance; it refers to the growing lookahead advantage, i.e., \(\mathcal{L}_{w}(l)\) can construct an instance with l+1 stations such that the comparative ratio on this instance goes to 2 as l→∞.

Table 1: Lower bounds on the comparative ratio

Note that by Observation 2.4, Theorem 2.7 also holds for the strong lookahead.

For the comparative ratio of a lookahead with respect to a lookahead whose range covers one more station, we have the following result.

Theorem 2.8

For all k≥1 the comparative ratio of \(\mathcal{L}_{w}(k)\) with respect to \(\mathcal{L}_{w}(k+1)\) is at least \(\varPhi= \frac{1 + \sqrt{5}}{2}\).

Proof

Consider the network as shown in Fig. 2 consisting of stations 1,2,…,k+3 and passenger paths P i connecting i and k+3 for all i∈{1,…,k+2}.

Fig. 2: Illustration of the network used in the proof of Theorem 2.8

Initially, the train is at the first station. Then P 1 is declared to be delayed and P 2,…,P k+1 to be on time. Besides T and δ (and the fixed numbers of passengers), this is the information available to both classes of algorithms. The stronger class \(\mathcal{L}_{w}(k+1)\) additionally possesses the information about the delay of the passengers on P k+2 and can hence choose optimally where to wait.

In order to obtain a lower bound on the ratio between the best algorithm from each class, we do not need to consider all situations but it suffices to construct a setup along the following lines: If an algorithm from \(\mathcal{L}_{w}(k)\) waits at station 1, P k+2 is declared to be on time and an algorithm from \(\mathcal{L}_{w}(k+1)\) would have chosen not to wait at all. Else, P k+2 is declared to be delayed and the optimal strategy would have been to wait at station 1.

This situation can be summarized in a nonlinear program:

$$\begin{array}{l@{\quad}ll} \max& c & \\ \text{s.t.} & \delta\Biggl(p_1 + \displaystyle\sum\limits_{i=2}^{k+1}p_{i}+ p_{k+2}\Biggr) &\geq c T p_1 \\ & T p_1 + \delta\Biggl(\displaystyle\sum\limits_{i=2}^{k+1}p_{i} + p_{k+2} \Biggr) &\geq c \delta\Biggl(p_1 + \displaystyle\sum\limits_{i=2}^{k+1}p_{i} + p_{k+2} \Biggr) \\ & T (p_1+ p_{k+2}) &\geq c \delta\Biggl(p_1 + \displaystyle\sum\limits _{i=2}^{k+1}p_{i}+ p_{k+2} \Biggr)\\ & c & \geq1 \\ & T, \delta, p_1,\ldots,p_{k+2} & \geq0. \end{array} $$

The aim is to find an instance (specified by the variables T, δ, p 1,…,p k+2) such that the minimum ratio c between the delay of any algorithm from \(\mathcal{L}_{w}(k)\) and the optimal delay, which is achieved by the best algorithm in \(\mathcal{L}_{w}(k+1)\), is as large as possible.

The first constraint in the above program covers the case that the algorithm from \(\mathcal{L}_{w}(k)\) waits at station 1. The l.h.s. states the algorithm’s delay in this case, whereas the r.h.s. corresponds to the optimal delay for this case. The second constraint similarly corresponds to the delay in the case of waiting at one of the stations 2,3,…,k+1, or k+2. The third constraint is for not waiting at all.

Analogously to the analysis of Gatto et al. (2007a), one can show that the program is feasible for c arbitrarily close to Φ. Hence, we have shown that there exists an instance such that for any possible \(\mathcal{L}_{w}(k)\)-algorithm the ratio is at least Φ. □
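
The feasibility claim can also be verified directly. The following sketch (our own check, with δ normalized to 1 and an explicit choice of T that is not part of the original proof) constructs, for any c below Φ, values of the variables satisfying the three delay constraints; c can thus be chosen arbitrarily close to Φ.

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2

def feasible_instance(c):
    """Explicit feasible point of the nonlinear program above for a given ratio 1 <= c < PHI.

    delta is normalized to 1; S denotes the total number of passengers on the
    on-time paths P_2, ..., P_{k+1}. The particular construction is our own.
    """
    assert 1 <= c < PHI
    delta = 1.0
    T = 2.0 / (1.0 + c - c * c)   # large enough so that T - 1 >= (c - 1) * c * T
    p1 = 1.0
    A = c * T * p1                # total passenger number, makes the first constraint tight
    p_last = c * c - 1.0          # makes the third constraint hold (with equality)
    S = A - p1 - p_last           # nonnegative since T >= c on [1, PHI)
    tol = 1e-9 * max(1.0, A)      # tolerance for floating-point round-off
    assert delta * (p1 + S + p_last) >= c * T * p1 - tol
    assert T * p1 + delta * (S + p_last) >= c * delta * (p1 + S + p_last) - tol
    assert T * (p1 + p_last) >= c * delta * (p1 + S + p_last) - tol
    return T, delta, p1, S, p_last

print(feasible_instance(PHI - 1e-6))
```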

Observe that the above result even holds when comparing arbitrary lookahead variants since the considered instance does not have any dummy nodes and therefore strong and weak lookahead do not differ.

Last, we compare the weak lookahead with arbitrary range and the strong lookahead having a range of only one station. The following result shows that the type of lookahead is more important than the range.

Theorem 2.9

For all k≥1 the comparative ratio of \(\mathcal{L}_{w}(k)\) with respect to \(\mathcal{L}_{s}(1)\) is at least \(\varPhi= \frac{1 + \sqrt{5}}{2}\).

Proof

Consider the following network consisting of n=k+3 stations and passenger paths P 1 and P 2 connecting 1 and n, and P 3 connecting n−1 and n as shown in Fig. 3. Observe that stations 2,…,n−2 are dummy nodes.

Fig. 3: Illustration of the network used in the proof of Theorem 2.9

\(\mathcal{L}_{s}(1)\) possesses all information at the first station and can hence choose optimally where to wait.

P 1 is declared to be delayed and P 2 to be on time. The algorithm with weak lookahead has to decide whether to wait or not. If it decides to wait at station 1, P 3 will be declared on time and the optimal strategy would have been not to wait at all. Else, P 3 will be declared delayed and waiting at 1 would have been optimal.

This situation can once again be written as a nonlinear program:

$$\begin{array}{l@{\quad}ll} \max& c & \\ \text{s.t.} & \delta(p_1 + p_2 + p_3) &\geq c T p_1 \\ & T p_1 + \delta(p_2 + p_3) &\geq c \delta(p_1 + p_2 + p_3) \\ & T (p_1+p_3) &\geq c \delta(p_1 + p_2 + p_3)\\ & c & \geq1\\ & T, \delta, p_1, p_2, p_3 & \geq0. \end{array} $$

Again, by a similar argument as above and the previous work by Gatto et al. (2007a) this program yields the desired bound. □

Gatto et al. (2007a) present a family of 2-competitive algorithms for the ODMP. Observation 2.4 and Theorem 2.5 therefore immediately yield an upper bound for the comparative ratios.

Corollary 2.10

The comparative ratios considered in Theorems 2.7, 2.8, and 2.9 are at most 2.

3 Average-case (competitive) analysis

If there is some knowledge about the input distribution, a classical approach is to use average-case analysis and estimate the expected cost of an algorithm. A related measure to overcome the pessimism of competitive analysis was introduced by Fujiwara and Iwama (2002), connecting the concepts of average-case and competitive analysis: the average-case competitive analysis. The average-competitive ratio of an online algorithm alg is defined as

$$ \mathbb{E}_{\sigma} \biggl[ \frac{\mathrm{alg}(\sigma)}{\mathrm{opt}(\sigma)} \biggr], $$

where the expectation is taken with respect to the given input distribution.

From a game-theoretic perspective the adversary can still make use of the optimal offline algorithm but has to follow a distribution that is known to the online player when generating requests.

3.1 Analysis for a special case of the ODMP

We analyze a special case of the ODMP in which we assume that there exists a breakpoint t (whose exact position is not known to the online algorithm in advance) such that all passengers up to this point are delayed and all passengers after it are on time. The motivation for studying this setting comes from competitive analysis, where this transition from delayed to on-time passengers is exactly what the adversary produces as soon as an online algorithm waits.

We consider the case that the position of the breakpoint is drawn uniformly from the interval [0,n], i.e., its p.d.f. is \(f(t)= \frac{1}{n} 1_{[0,n]}(t)\). Furthermore, we assume that at each station one passenger boards the train and travels to the terminal station, i.e., all paths are of the form P in with p in =1. Note that, for the sake of better readability, we assume in this section that the first station is 0.

These assumptions help to keep the analysis comparatively clean, and to be able to draw qualitative conclusions.

3.1.1 Average-case analysis

The delay for the algorithm waiting at station k if the breakpoint is at t is given by

$$ \mathrm{alg}_k(t)= \begin{cases} T\lceil t \rceil & \text{if } t \leq k,\\ Tk+\delta(n-k) & \text{if } t > k, \end{cases} $$

where in the case t≤k the breakpoint has already passed when station k is reached, so that no passengers are delayed there and the algorithm does not wait. The average cost of waiting at station k is then

$$ m(k)=\int_0^n \mathrm{alg}_k(t) f(t)\,dt = \frac{1}{n} \biggl( T \frac{k(k+1)}{2} + (n-k) \bigl(Tk+\delta(n-k)\bigr) \biggr). $$

Note that m is a quadratic function in k, and we can compute the first and second order derivatives:

$$\begin{aligned} \frac{d}{dk}\bigl(m(k)\bigr) &= (T-2 \delta) \biggl(1-\frac{k}{n} \biggr) + \frac{T}{2n}, \\ \frac{d^2}{(dk)^2}\bigl(m(k)\bigr) &= - \frac{T-2 \delta}{n}. \end{aligned}$$

If T≥2δ, it holds that \(\frac{d}{dk}(m(k)) > 0\) for all k∈[0,n], and therefore m is strictly increasing on [0,n]. Hence, m attains the minimum on [0,n] for k=0.

If T<2δ, we have that \(\frac{d^{2}}{(dk)^{2}}(m(k)) > 0\) for all k∈[0,n]. In particular, the stationary point \(k^{*} = \frac {T}{2(T-2\delta)}+n\) is the global minimum of m on the real line. Furthermore, an easy computation shows that k ∗ ∈[0,n], and hence k ∗ is the minimum on the interval [0,n]. Figure 4 shows an illustration of the behavior.

Fig. 4: The average costs m for n=10 stations for different choices of T and δ

The results can be interpreted in the following way:

If T≥2δ, the analysis shows that waiting at the first station leads to the minimum expected delay. This seems reasonable as dropping delayed passengers incurs a delay of T per delayed passenger. Having a comparatively large T therefore indicates that one should avoid this.

If T<2δ, the analysis suggests waiting at station k ∗ (which we possibly need to round up or down if k ∗ is not an integer). For δ close to T, k ∗ is close to n. Overall, the analysis suggests waiting rather at one of the later stations. Similarly to the case before, this makes sense since a comparatively large δ corresponds to a large delay if one delays on-time passengers.
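
Since this rule is reused as one of the strategies in the case study of Sect. 5, we state it compactly as a small function (our own formulation; rounding to the nearest station is our choice, the analysis only requires rounding up or down):

```python
def average_case_station(n, T, delta):
    """Waiting station suggested by the average-case analysis of this section.

    Stations are numbered starting from 0 as above. Returns 0 if T >= 2 * delta, and
    otherwise the stationary point k* = T / (2 * (T - 2 * delta)) + n rounded to the
    nearest station.
    """
    if T >= 2 * delta:
        return 0
    k_star = T / (2 * (T - 2 * delta)) + n
    return min(n, max(0, round(k_star)))
```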

3.1.2 Average-case competitive analysis

We continue to analyze the special case by means of average-case competitive analysis. To this end, we need to consider the optimal offline algorithm that knows the breakpoint t in advance and can decide whether to wait at the first station or not to wait at all, yielding a total delay of

$$ \mathrm{opt}(t)=\min\bigl(T\lceil t \rceil, \delta n\bigr). $$

The average-competitive ratio of alg k is therefore given by

$$ c(k)=\int_0^n \frac{\mathrm{alg}_k(t)}{\mathrm{opt}(t)} f(t)\,dt. $$

To be able to actually state the integrals properly, we have to distinguish two cases, and consider the following piecewise definition of c(k):

$$c(k)= \begin{cases} c_1(k) & \text{if } k < \lfloor\frac{\delta n}{T} \rfloor\\ c_2(k) & \text{if } k \geq\lfloor\frac{\delta n}{T} \rfloor, \end{cases} $$

where

$$\begin{aligned} c_1(k) =& \int_0^k{ \frac{T \lceil t \rceil}{T \lceil t \rceil}f(t)dt} + \int_k^{\lfloor\frac{\delta n}{T} \rfloor}{ \frac{Tk+\delta(n-k)}{T \lceil t \rceil}f(t)dt} \\ &{}+ \int_{\lfloor\frac{\delta n}{T} \rfloor}^n{\frac {Tk+\delta (n-k)}{\delta n}f(t)dt}, \\ c_2(k) =& \int_0^{\lfloor\frac{\delta n}{T} \rfloor}{ \frac{T \lceil t \rceil}{T \lceil t \rceil}f(t)dt} + \int_{\lfloor\frac{\delta n}{T} \rfloor}^k{ \frac{T \lceil t \rceil}{\delta n}f(t)dt} + \int_k^n{ \frac {Tk+\delta(n-k)}{\delta n}f(t)dt}. \end{aligned}$$

Our aim is to find k such that c(k) is minimized. However, the analysis of c is not as straightforward as it was for m before. The complicated structure of the functions, due to, e.g., the second integral of c 1, prevents solving the minimization problem analytically or obtaining general statements on the monotonic behaviour. We therefore resort to a numerical analysis of c to locate minima by means of constrained nonlinear programming. Thereby, we can solve the minimization problem numerically for any given set of input parameters T, δ, and n, and obtain qualitative results. We observe that the location of the minimum of c depends, as expected, on the relation between T and δ:

If T<2δ, average-case competitive analysis suggests waiting at the last station n. This is similar to the average-case analysis before, which waited at k ∗ (which was close to n). Again, we can argue that waiting at one of the earlier stations would incur a comparatively large delay of δ for each of the on-time passengers.

If T≥2δ, the station chosen by average-case competitive analysis is one of the earlier stations, which is again similar to the average-case analysis. The intuition behind this is that dropping delayed passengers incurs for each of them a delay of T, which is large compared to δ. However, the minimum is not attained at k=0. This can be seen as a tendency of the measure to incorporate worst-case behaviour: if the algorithm waits at the first station, the offline algorithm will have significantly less delay whenever the breakpoint is located slightly after this station, and the ratio of algorithm and optimum would therefore be large in this case. Figure 5 illustrates the situation.

Fig. 5: The average-competitive ratio c for n=10 stations and different choices of T and δ

This shows how average-case competitive analysis tries to balance between these two opposing strategies.

This two-faced nature of average-case competitive analysis, which uses knowledge about distributions while still hedging against worst cases, can also be seen in Fig. 6.

Fig. 6: The optimal waiting strategy k opt for average-case analysis (dots) and average-case competitive analysis (squares) as a function of δ for n=50 stations and T=25

3.1.3 Comparison

We now present a comparison of the three strategies for our special passenger path and delay structure: the strategy from Sect. 3.1.1 suggested by average-case analysis, the one from Sect. 3.1.2 obtained from average-case competitive analysis, and the 2-competitive balancing strategy of Gatto et al. (2007a).

Figure 7 shows the analytical average delay for each of the strategies assuming that the breakpoint is uniformly distributed. Here, only the optimal average-case competitive strategy had to be computed approximately in each step by means of a numerical analysis. By definition, average-case analysis yields the best results and it is not surprising that the 2-competitive balancing strategy is the worst. The values of the average-case competitive strategy lie in between the two others.

Fig. 7: The average delay for waiting strategies suggested by average-case analysis (dots), average-case competitive analysis (squares), and the balancing algorithm (crosses) as a function of δ for n=50 stations and T=25

To analyze how the strategies perform in a worst case, we compared for different choices of T, δ, and n the maximum ratio between each of the strategies and the optimal offline algorithm. The results are shown in Fig. 8, and represent the theoretically derived worst-cases. By construction, the 2-competitive balancing strategy is never worse than twice the optimum, but often even better. The performance of the average-case strategy is poor in many cases. The worst-case ratios of the average-case competitive strategy lie again in between.

Fig. 8: The maximum ratio of strategies suggested by average-case analysis (dots) and average-case competitive analysis (squares) to the optimal offline algorithm as a function of δ for n=50 stations and T=25 for waiting strategies. The dashed line indicates the competitive ratio of 2 that is obtained by the balancing algorithm (Gatto et al. 2007a)

4 A stochastic programming framework

Stochastic programming is a widely used framework for analyzing optimization problems that contain uncertainties. Its aim is to find a feasible policy that is optimal w.r.t. the current stage costs plus the (discounted) expected future costs. Although there is in general a large number of possible scenarios, in our problem we do not have to solve a mathematical program for each of the subproblems, as is necessary in most cases in the literature as well as in applications. We exploit the special structure of our problem and present a tailor-made solution for obtaining an optimal policy.

4.1 Setting of the stochastic program

A stage in our stochastic program corresponds to a station in the train network. The random variable ξ k in stage k corresponds to the (n−k)-dimensional vector of delayed passengers departing from station k, i.e., ξ k =(ξ k,k+1,…,ξ kn ). Here, ξ kj is the random variable of delayed passengers that want to board the train at station k and travel to station j. We define the number of delayed passengers d kl (ξ kl ) travelling from k to l to be a discrete random variable with values in {0,1,…,p kl }. As stated before, p kl is assumed to be known. Hence, the number of on-time passengers is given by o kl (ξ kl )=p kl −d kl (ξ kl ).

We introduce for each station k∈{1,…,n−1} a decision variable \(x_{k} \in\mathcal{X}_{k}\) which represents the waiting decision. The allowed values depend only on the decision at the previous stage, and we define \(\mathcal{X}_{1}=\{\lambda,0,1\}\) and for k≥2

$$\mathcal{X}_k(x_{k-1},\xi_k)= \begin{cases} \{0,1\} & \text{if }x_{k-1}=0\\ \{\lambda\} & \text{if }x_{k-1} \in\{1,\lambda\}. \end{cases} $$

If the decision variable takes the value 0, this means that we do not wait at the corresponding station, 1 represents waiting, and λ∉{0,1} is for the case that we already waited at some station before and therefore no decision has to be taken. Depending on the waiting decision and the outcome of the random variable, costs in stage k occur:

$$ f_k(x_k,\xi_k)= \begin{cases} T \sum_{j: k<j\leq n}d_{kj}(\xi_{kj}) & \text{if }x_k=0\\ \delta (\sum_{i,j: k\leq i<j\leq n}p_{ij}+\sum_{i,j: 1\leq i<k<j\leq n}o_{ij}(\xi_{ij}) ) & \text{if }x_k=1\\ 0 & \text{if }x_k=\lambda. \end{cases} $$

If we decide not to wait, the delayed passengers at the current station are dropped and have to wait for the next train scheduled in T time units. If we wait, all on-time passengers currently on the train are delayed by δ, and all passengers boarding at the current or a later station reach their destinations δ late. In case we waited at some station before, no further costs occur.
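
For concreteness, the stage cost can be transcribed directly into a small function (our own transcription of the formula above, with o ij written as p ij −d ij ; the data layout is an assumption for illustration):

```python
def stage_cost(k, x_k, p, d, T, delta):
    """Stage cost f_k(x_k, xi_k); p and d are (n+1) x (n+1) nested lists with the
    total and the delayed numbers of passengers per origin-destination pair."""
    n = len(p) - 1
    if x_k == 0:   # do not wait: the delayed passengers departing at station k are dropped
        return T * sum(d[k][j] for j in range(k + 1, n + 1))
    if x_k == 1:   # wait: current/future passengers and on-time passengers on board arrive delta late
        future = sum(p[i][j] for i in range(k, n + 1) for j in range(i + 1, n + 1))
        onboard_ontime = sum(p[i][j] - d[i][j] for i in range(1, k) for j in range(k + 1, n + 1))
        return delta * (future + onboard_ontime)
    return 0.0     # x_k = lambda: the train already waited earlier, no further cost
```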

4.2 Obtaining an optimal policy

Our aim is then to find an optimal waiting policy, i.e., solve the (n−1)-stage stochastic program

$$\begin{aligned} &\min_{x_1 \in\mathcal{X}_1(\xi_1)} f_1(x_1)+ \mathbb{E}_{\xi_2} \Bigl[ \min_{x_2 \in\mathcal{X}_2(x_1, \xi_2)} f_2(x_2, \xi_2) +\mathbb {E}_{\xi_3} \Bigl[ \cdots + \mathbb{E}_{\xi_{n-1}} \\ &\quad\times\Bigl[ \min _{x_{n-1} \in\mathcal{X}_{n-1}(x_{n-2},\xi_{n-1})} f_{n-1}(x_{n-1},\xi _{n-1}) \Bigr] \cdots \Bigr] \Bigr]. \end{aligned}$$

First, we consider the last-stage problem

$$Q_{n-1}(x_{n-2},\xi_{n-1})=\min_{x_{n-1} \in\mathcal {X}_{n-1}(x_{n-2},\xi_{n-1})} f_{n-1}(x_{n-1},\xi_{n-1}) $$

which has optimal solution

$$x^*_{n-1}(x_{n-2},\xi_{n-1})= \begin{cases} 0 & \text{if }x_{n-2}=0 \text{ and } f_{n-1}(0,\xi_{n-1}) < f_{n-1}(1,\xi_{n-1}) \\ 1 & \text{if }x_{n-2}=0 \text{ and } f_{n-1}(0,\xi_{n-1}) > f_{n-1}(1,\xi_{n-1}) \\ \lambda& \text{if }x_{n-2} \in\{1,\lambda\}. \end{cases} $$

This solution can be interpreted as the optimal strategy when reaching station n−1, and is then passed on to the second-to-last stage problem to obtain an optimal waiting policy for the last three stations. This process can then be iterated as shown in Fig. 9.

Fig. 9: Illustration of iterative solution process for the stochastic program

In stage k we already have computed \(x^{*}_{k+1},\ldots,x^{*}_{n-1}\) and need to solve the problem

$$Q_{k}(x_{k-1},\xi_{k})=\min_{x_{k} \in\mathcal{X}_{k}(x_{k-1},\xi _k) } f_{k}(x_{k},\xi_{k}) + \mathbb{E}_{\xi_{k+1}} \bigl[Q_{k+1}(x_{k},\xi _{k+1}) \bigr] $$

of minimizing the costs in stage k plus the expected cost-to-go of the following stages.

If x k ∈{1,λ}, no costs occur at the following stations and the expectation in the minimization problem above is zero. The more interesting case is x k =0, where we have

$$ \mathbb{E}_{\xi_{k+1}} \bigl[Q_{k+1}(0,\xi_{k+1}) \bigr]= \mathbb {E}_{\xi _{k+1}} \bigl[f_{k+1}\bigl(x^*_{k+1}, \xi_{k+1}\bigr)+ \mathbb{E}_{\xi _{k+2}} \bigl[Q_{k+2} \bigl(x^*_{k+1},\xi_{k+2}\bigr) \bigr] \bigr]. $$
(7)

If after the realization of ξ k+1, \(x^{*}_{k+1}(\xi_{k+1})=1\) is chosen, we have expected costs of

$$ \underbrace{\delta \biggl(\sum_{i,j: k+1\leq i<j\leq n}p_{ij} + \sum_{i,j: 1\leq i<k+1<j\leq n}o_{ij}(\xi_{ij}) \biggr)}_{=f_{k+1}(1,\xi _{k+1})} + \underbrace{\mathbb{E}_{\xi_{k+2}} \bigl[ Q_{k+2}(1,\xi _{k+2}) \bigr]}_{=0}. $$
(8)

If \(x^{*}_{k+1}(\xi_{k+1})=0\) is chosen, the expected costs are

$$ \underbrace{T \sum_{j: k+1<j\leq n}d_{k+1,j}(\xi _{k+1,j})}_{=f_{k+1}(0,\xi_{k+1})} + \underbrace{\mathbb{E}_{\xi _{k+2}} \bigl[ Q_{k+2}(0,\xi_{k+2}) \bigr]}_{=:\mathcal{A}_{k+2}}, $$
(9)

where \(\mathcal{A}_{k+2}\) is known from the iteration before. Hence, the expression in (7) becomes:

$$\begin{aligned} &\pi_1 \delta \biggl(\sum_{i,j: k+1\leq i<j\leq n}p_{ij} + \sum_{i,j: 1\leq i<k+1<j\leq n}o_{ij}(\xi_{ij}) \biggr) \\ &\quad + \pi_2 \biggl( T \mathbb{E}_{\xi_{k+1}} \biggl[\sum _{j: k+1<j\leq n}d_{k+1,j}(\xi_{k+1,j}) | (8) > (9) \biggr] + \mathcal {A}_{k+2} \biggr), \end{aligned}$$
(10)

where π 1=Pr((8)≤(9)) and π 2=Pr((8)>(9)).

Observe that we do not need a conditional expectation for the first summand in (10) as the numbers of the corresponding on-time passengers are already known at this stage. Combining what we just obtained, the strategy

$$ x^*_{k}(x_{k-1},\xi_{k})= \begin{cases} 0 & \text{if }x_{k-1}=0 \text{ and } f_k(0,\xi_k) + (10) < f_k(1,\xi_k) \\ 1 & \text{if }x_{k-1}=0 \text{ and } f_k(0,\xi_k) + (10) > f_k(1,\xi_k) \\ \lambda& \text{if }x_{k-1} \in\{1,\lambda\} \end{cases} $$

is optimal for the problem in stage k. This yields, together with the previously obtained strategies, a policy \((x^{*}_{k},x^{*}_{k+1},\ldots ,x^{*}_{n-1})\) that minimizes the expected costs after reaching station k.

This procedure is continued until we reach the first station, i.e., k=1, and we obtain an optimal waiting policy \((x^{*}_{1},\ldots ,x^{*}_{n-1})\), saying at each station whether it is better to wait or to continue on time.

For the computation of the conditional expectation in (10), we can use convolution to obtain the distribution of a sum of independent random variables. Otherwise, if no closed form is at hand, one has to resort to sampling and use the empirical distribution function.
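
As a small illustration of the convolution step (the numbers are made up for this example and not taken from the model), the distribution of the total number of delayed passengers departing from a station is obtained by convolving the probability mass functions of the independent path-wise delays:

```python
import numpy as np

# pmfs of the delayed passengers on two independent paths leaving the same station,
# e.g. d_{k,k+1} ~ Uniform{0,...,2} and d_{k,n} ~ Uniform{0,...,3}
pmf_a = np.full(3, 1 / 3)
pmf_b = np.full(4, 1 / 4)

pmf_sum = np.convolve(pmf_a, pmf_b)                 # pmf of d_{k,k+1} + d_{k,n}
expected_total = np.dot(np.arange(len(pmf_sum)), pmf_sum)
print(pmf_sum, expected_total)                      # expectation 1.0 + 1.5 = 2.5
```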

4.3 Numerical evaluation

In the following we show how the stochastic programming strategy performs on randomly generated data. For small networks of three stations, we compare stochastic programming, average-case analysis, and the balancing algorithm (Gatto et al. 2007a). We generated 10,000 instances where the parameters for each instance were chosen with a uniform distribution: p ij from {0,…,20}, d ij from {0,…,p ij }, and T from {1,…,5}.
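
For the three-station networks considered here, the optimal policy of Sect. 4.2 reduces to a two-stage backward induction that can be written down explicitly. The following sketch is our own minimal implementation for a single realization, assuming the independent uniform delays described above and treating δ as a free parameter (defaulting to 1); all names are ours.

```python
import random

def optimal_policy_delay(p12, p13, p23, d12, d13, d23, T, delta=1.0):
    """Delay incurred by the optimal waiting policy on one three-station realization;
    the stage costs follow f_1 and f_2 of Sect. 4.1."""
    o13 = p13 - d13                                   # on-time passengers passing station 2

    def stage2_cost(d23_real):                        # optimal stage-2 decision if we did not wait at 1
        return min(T * d23_real,                      # x_2 = 0: drop the delayed passengers at station 2
                   delta * (p23 + o13))               # x_2 = 1: delay on-board and future passengers

    # expected cost-to-go at station 1 when not waiting there (d_23 ~ Uniform{0,...,p23})
    expected_stage2 = sum(stage2_cost(d) for d in range(p23 + 1)) / (p23 + 1)

    wait_at_1 = delta * (p12 + p13 + p23)             # f_1(1)
    continue_at_1 = T * (d12 + d13) + expected_stage2 # f_1(0) plus expected cost-to-go

    if wait_at_1 <= continue_at_1:
        return wait_at_1
    return T * (d12 + d13) + stage2_cost(d23)         # follow the stage-2 rule on the realization


random.seed(0)
p12, p13, p23 = (random.randint(0, 20) for _ in range(3))
d12, d13, d23 = (random.randint(0, p) for p in (p12, p13, p23))
T = random.randint(1, 5)
print(optimal_policy_delay(p12, p13, p23, d12, d13, d23, T))
```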

We obtained the boxplots shown in Fig. 10 for the average delays. Although this yields a first insight into the performance of the strategies, we cannot draw any statistical conclusions from it. Therefore, we analyzed the number of cases in which the strategies differed. In fact, the stochastic programming strategy and the balancing strategy suggested the same decisions in 69.21 % of the cases. Average-case analysis and stochastic programming even coincided on 93.36 % of the instances.

Fig. 10: Boxplots for the average delay of the different strategies

Hence, we focused only on those instances where the delays of the strategies differ. We computed the difference between the delay of stochastic programming and that of each of the other strategies; negative values therefore correspond to instances where stochastic programming is superior.

Figure 11 shows that stochastic programming outperforms the 2-competitive balancing strategy by far. Figure 12 shows the difference between stochastic programming and average-case analysis. Although the difference is not as significant as in the previous case, the observations indicate that stochastic programming is slightly superior.

Fig. 11: Difference of the delays of stochastic programming and balancing algorithm (the instances where the delays coincide have been left out)

Fig. 12: Difference of the delays of stochastic programming and average-case analysis (the instances where the delays coincide have been left out)

5 Experimental case study

In this section, we compare the concepts introduced before on a real-world data set taken from the Athens Metro as shown in Fig. 13. We consider the single train line connecting the station Syntagma in the center of Athens with the airport, comprising a total of 13 stations.

Fig. 13: Illustration of the Athens Metro http://lintim.math.uni-goettingen.de/. The station Syntagma (SYN) is located in the center of Athens, the airport (AER) in the south-east

The number of passengers for all origin-destination pairs was taken from the LinTim software package http://lintim.math.uni-goettingen.de/. In the following, we apply and compare different concepts for the ODMP introduced before:

  • the 2-competitive balancing algorithm from Gatto et al. (2007a): This algorithm fixes a parameter t∈[Tδ,T] and decides to wait at the first station k, where

    $$ t\sum_{\substack{i,j: 1\leq i<k \\ \phantom{i,j:} i< j\leq n}}d_{ij}\geq \delta \biggl(\sum_{\substack{i,j: 1\leq i< k<j\leq n}}o_{ij} +\sum _{\substack{i,j: k\leq i< j\leq n}}p_{ij} \biggr). $$
    (11)

    (If there is no such k, then the algorithm does not wait at all.) Equation (11) states that the total delay caused to delayed passengers missing the train before k is balanced with the potential delay for all future and on-board passengers which are or might be on time; a sketch of this decision rule is given after this list.

  • two variants of algorithms using a lookahead of l stations as introduced in Sect. 2:

    The first variant, the dynamic lookahead version, uses its knowledge about the delays at the next l stations and takes these actual numbers into consideration in a balancing argument. It waits at station k≤n−l−1 if

    $$ t\sum_{\substack{i,j: 1\leq i<k \\ \phantom{i,j:} i< j\leq n}}d_{ij}\geq \delta \biggl(\sum_{\substack{i,j: 1\leq i< k+l<j\leq n}}o_{ij} +\sum _{\substack{i,j: k+l\leq i< j\leq n}}p_{ij} \biggr). $$
    (12)

    Once it reaches station n−l−1 and has not waited so far, it possesses all information about future delays and chooses optimally for the remaining stations. Hence, it can be seen as a generalization of the balancing algorithm.

    The static lookahead algorithm is a naïve version of the dynamic algorithm. It actually ignores its additional information until it has reached station n−l−1. Until this station, it uses the algorithm from Gatto et al. (2007a). At station n−l−1, it behaves like the dynamic lookahead algorithm. Note that the naïve static lookahead algorithm is a natural generalization of the balancing algorithm to a lookahead algorithm, as it is by construction never worse than the balancing algorithm and optimal on networks with up to l+2 stations.

    The train line in our case study contained only a total of 13 stations, so we limited the lookahead in both the dynamic and the static version to l=1.

  • the algorithm from Sect. 3.1.1 which chooses a station k ∗ according to the results from the average-case analysis. Note that the analysis in Sect. 3.1.1 was carried out only for a special pattern of the delays and from a theoretical point of view. However, from a practical point of view, k ∗ can still be used on arbitrary data.

  • the optimum offline algorithm opt which, given all input data, chooses a station k to wait at such that D(k) is minimized.
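
The balancing rule (11) and its dynamic-lookahead generalization (12) can be implemented as a single pass over the stations. The sketch below (our own; the data layout and the default choice t=T are assumptions for illustration) shows rule (11); for (12) only the index in the right-hand sums shifts from k to k+l.

```python
def balancing_station(p, o, d, T, delta, t=None):
    """First station k at which the balancing condition (11) holds, or None (never wait).

    p, o, d are (n+1) x (n+1) nested lists of total, on-time and delayed passengers per
    origin-destination pair; o and d are only read for stations the train has already reached.
    """
    n = len(p) - 1
    if t is None:
        t = T                      # any fixed t in [T - delta, T] yields the 2-competitive algorithm
    dropped = 0                    # delayed passengers with origin before the current station
    for k in range(1, n):
        onboard_ontime = sum(o[i][j] for i in range(1, k) for j in range(k + 1, n + 1))
        future = sum(p[i][j] for i in range(k, n + 1) for j in range(i + 1, n + 1))
        if t * dropped >= delta * (onboard_ontime + future):
            return k
        dropped += sum(d[k][j] for j in range(k + 1, n + 1))
    return None
```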

We did not evaluate the stochastic programming approach of Sect. 4 for the case study. The computation of the conditional expectations is complex and too time consuming for a larger number of stations. The average-case competitive algorithm from Sect. 3.1.2 behaved slightly worse than the algorithm using the average-case solution (see also Sect. 3.1.3) and thus is also omitted in the following comparison.

Due to the lack of actual delay distributions, in our case study we assume that the number of delayed passengers d ij travelling from station i to j is a uniformly distributed random variable with values in {0,1,…,p ij } and independent of the other variables. For this setting we generated 100,000 instances for each choice of the parameters δ and T. Figure 14 shows boxplots for the average delays of each of the strategies for parameters δ=1 and T=2. Figure 15 shows the associated worst-case ratios, i.e., the maximum that was observed over all instances for the ratio of each of the strategies and the optimal offline algorithm.

Fig. 14: Boxplots for the average delays for δ=1, T=2, and 100,000 runs

Fig. 15: Maximum ratio that was observed for each of the strategies compared to the optimal solution

We observe that the static lookahead algorithm, which uses its additional information only when reaching station n−2, does not improve over the balancing algorithm from Gatto et al. (2007a). This can be explained by the special structure of the passenger paths. Many passengers board the train at one of the first stations, and therefore the main delay is caused by passengers at these stations. The algorithm waits in most cases “too early”, since the break-even point of the balancing condition is reached before it can make use of the lookahead. In contrast, the dynamic lookahead algorithm takes its additional lookahead knowledge into consideration at every station. Although this does not guarantee that it always outperforms the balancing strategy (as can be seen in Fig. 14, there are instances where the dynamic lookahead performed worse than all other strategies), it is more suitable for taking the structure of the passenger paths into account and obtains better results overall.

The best strategy (except for the optimum offline strategy, of course) in the real-world example is to follow the decisions motivated by the average-case analysis. In our setting, it chooses to wait at the first station, which coincides with the optimal offline strategy in almost all cases.

The worst-case ratios of the different strategies have the same order of magnitude. The average-case analysis algorithm provides a decent worst-case behaviour on the real data. Recall that we saw in Sect. 3 that it might be significantly worse than the other strategies. This behavior cannot be observed in the case study. The ratio of the balancing algorithm is fairly close to its theoretical competitive ratio of 2.

We note that we could observe qualitatively similar results for other choices of the parameters.

6 Conclusion

We have studied extensions and alternatives to competitive analysis for the online delay management problem on a single train line (ODMP). The results indicate that competitive analysis is, in fact, overly pessimistic. The balancing algorithm from Gatto et al. (2007a) is tailored to the worst case, and its worst-case behaviour also shows on real data. Additional information such as lookahead or knowledge about the distribution of the input can help an algorithm both theoretically and in practice. In order to obtain a theoretical quantification of the improvement, one has to deviate from standard competitive analysis: we used comparative analysis and two versions of average-case analysis in order to prove the actual benefit. This improvement also shows in experimental results, both on artificially generated instances and on instances based on real-world data.

Our work raises a couple of challenging questions: Can analogous results be derived for more complex delay management problems? Can the theoretical results (in particular average-case analysis) be extended to more general distributions of the input? Finally, although we presented a stochastic programming framework for computing an optimum policy, at the current stage, the effort to implement this policy for a larger number of stations is excessive. Thus, it would be interesting to see if one could develop efficient means to compute those policies, maybe using the special structure of a given real-world application.