1 Introduction

An approximate distance oracle is a data structure that is required to produce distance estimations in constant query time. Thorup and Zwick [19] showed that given an undirected weighted graph \(G=(V,E)\) with m edges and n vertices and an integer \(k\ge 1\), there is a data structure of size \(O(kn^{1+1/k})\) that for every pair of vertices \(u,v\in V\) returns in O(k) time an estimation \(\hat{d}(u,v)\) which is a \((2k-1)\) multiplicative approximation (stretch) of \(d(u,v)\), that is, \(d(u,v)\le \hat{d}(u,v) \le (2k-1)d(u,v)\), where \(d(u,v)\) is the length of the shortest path between u and v in G.

Thorup and Zwick [19] also presented a lower bound based on the girth conjecture of Erdős. More specifically, they proved that, for every \(k\ge 1\), if there is a graph with \(\varOmega (n^{1+1/k})\) edges whose girth is \(2k+2\), then any distance oracle with stretch \(t\le 2k\) requires \(\varOmega (n^{1+1/k})\) bits on some input. A careful examination of their proof reveals that it relies on the stretch of the estimation for vertex pairs \(u,v\in V\) for which \((u,v)\in E\), that is, \(d(u,v)=1\). Therefore, it still might be possible to obtain a data structure with constant query time and a stretch better than \(2k-1\) using \(O(kn^{1+1/k})\) space, for vertex pairs \(u,v\in V\) that satisfy \(d(u,v)\ge 2\), or for graphs with \(m=o(n^{1+1/k})\), that is, sparse graphs.

We present a new distance oracle for unweighted undirected graphs that uses \(O(knm^{1/k}\log n)\) space and provides in O(k) query time an estimation \(d^*(u,v)\) that satisfies \(d(u,v) \le d^*(u,v) \le (2k-1)d(u,v)-4\), for every \(k>2\), and \(d(u,v) \le d^*(u,v) \le 3d(u,v)-2\), for \(k=2\). This implies that for sparse graphs with \(m=\tilde{O}(n)\) our new distance oracle uses the same space as Thorup and Zwick’s distance oracle (up to poly-logarithmic factors) and produces in O(k) time an estimation of strictly better stretch than the stretch of Thorup and Zwick’s distance oracle. Sparse graphs with \(m=\tilde{O}(n)\) edges are very interesting both from the practical perspective and the theoretical perspective.

From the practical perspective, it is important to note that many real world graphs are sparse, with \(m=\tilde{O}(n)\). This is usually the case in social networks and in many other types of networks.

From the theoretical perspective, Pǎtraşcu, Roditty and Thorup [11] proved a conditional lower bound for the case of sparse graphs with \(m=\tilde{O}(n)\), based on a set intersection hardness conjecture. They showed that for any \(\ell > 1\), a distance oracle that for every pair of vertices at distance \(\ell + 1\), provides in constant query time an estimation strictly smaller than \(3(\ell +1) - 2\) requires space. Notice that for \(k=2\) our distance oracle has an estimation that is at most \(3d(u,v)-2\), for every \(u,v \in V\) and uses \(\tilde{O}(n^{1.5})\) space for sparse graphs with \(m=\tilde{O}(n)\). It follows from [11] that bounding the estimation by a value strictly smaller than \(3d(u,v)-2\) requires space, where \(\varepsilon >0\).

Pǎtraşcu et al. [11] showed also that there are infinitely many distance oracles for sparse graphs with fractional stretch factors. Their distance oracles converge exactly to the integral stretch factors and the corresponding space bound of Thorup-Zwick distance oracles. Our new construction implies that for space \(\tilde{O}(km^{1+{1/k}})\) a stretch that is strictly better than the corresponding integral stretch of \(2k-1\) is possible.

The implications of our new distance oracles are not restricted to sparse graphs with \(m=\tilde{O}(n)\). Consider graphs with \(m\in [n,o(n^{1+1/k})]\) edges. A natural question is whether a distance oracle for such graphs requires \(\varOmega (n^{1+1/k})\) space for stretch \(2k-1\). The girth based approach, as in the lower bound of Thorup and Zwick [19], is not possible here since we can store the entire graph. This implies that for vertex pairs \(u,v\in V\) with \(d(u,v)=1\), we can store the exact distance. Our new distance oracle also rules out the option of using pairs of vertices \(u,v\in V\) for which \(d(u,v)=2\) as a source of hardness for a possible lower bound. If we construct our new distance oracle with parameter \(k+1\) then the space required is in the range \([n,o(n^{1+1/k})]\) and for every pair of vertices \(u,v\in V\), for which \(d(u,v)=2\), the estimation is at most \((2(k+1)-1)2-4=(2k-1)2\), and therefore, when \(d(u,v)=2\) the stretch is at most \(2k-1\).
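To make the arithmetic behind the last claim explicit, the bound \((2k'-1)d(u,v)-4\) with parameter \(k'=k+1\) and \(d(u,v)=2\) expands as follows. This is only a worked restatement of the computation in the text and assumes nothing beyond it:

$$\begin{aligned} (2(k+1)-1)\cdot 2-4&= (2k+1)\cdot 2-4 \\&= 4k-2 \\&= 2(2k-1). \end{aligned}$$

Dividing by \(d(u,v)=2\) gives a stretch of at most \(2k-1\) for such pairs.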

The distance oracles of Thorup and Zwick, besides being an important data structure on their own, are also extremely useful as a tool in many applications. They were a crucial building block in several important dynamic graph algorithms over the last decade (e.g., [2, 7, 8, 16]). They also play a pivotal role in designing distance labeling and compact routing schemes as was already shown by Thorup and Zwick [18] and in subsequent works (e.g., [1, 3, 13, 14]). Distance oracles were also implemented and tested (e.g., [6, 12]) and found useful on real world graphs. Therefore, any further understanding that we gain on the basic properties of distance oracles is of great interest.

We obtain our new distance oracle by a careful combination of a variant of Thorup and Zwick distance oracles with a new idea that interplays between a hitting set of vertices and a hitting set of edges to overcome a certain hard case that is relatively common in analysis of algorithms of shortest paths. Therefore, our new approach is of independent interest, as it might be found useful in other closely related problems.

Motivated by our theoretical finding, another contribution that we make in this paper is a refined analysis of the stretch of Thorup and Zwick distance oracles. At the base of the distance oracles there is a hierarchy of vertex sets \(A_0,A_1,\ldots , A_k\), where \(A_0=V\), \(A_k=\emptyset \) and \(A_i\) is formed by picking each vertex of \(A_{i-1}\), independently, with some probability p. For every \(u\in V\) the distance \(d(u,A_i)\) between u and \(A_i\) is computed and saved. We introduce a simple parameter, called the average distance, which is roughly defined for every \(i\in [1,k-1]\) as the distance between u and \(A_i\) divided by i, that is \(d(u,A_i)/i\). Our refined analysis characterizes several cases in which the stretch is strictly better than \(2k-1\) using only the average distance, which can be easily computed using the current information saved with the distance oracle. Roughly speaking, if there exist \(i,j\in [1,k-1]\) such that \(i\ne j\) and \(d(u,A_i)/i\ne d(u,A_j)/j\), then the stretch is strictly better than \(2k-1\) for every distance query that includes the vertex u.

Based on similar ideas we also show that if \(D(u)=\{\varDelta _1,\ldots , \varDelta _\ell \}\) is the set of all possible distances of \(u\in V\) with other vertices in the graph then there is at most one value \(\varDelta \in D(u)\) for which the stretch of the distance estimation is exactly \(2k-1\), that is, only for vertices v that satisfy \(d(u,v)=\varDelta \) it might be that \(\hat{d}(u,v)=(2k-1)d(u,v)\).

We complement the refined stretch analysis by conducting a small experiment on real world graphs. In the experiment we check how frequent are the cases that allow for a better stretch in these real world graphs. Interestingly, these cases are quite frequent and thus in many cases the actual stretch is much better than the worst case stretch bound.

1.1 Related Work

Since their introduction by Thorup and Zwick [19], distance oracles have been studied by many researchers. Chechik [4, 5] presented a \((2k-1)\)-stretch distance oracle with O(1) query time and \(O(n^{1+1/k})\) space. (See also [9, 20].)

Pǎtraşcu and Roditty [10] showed a distance oracle for weighted undirected graphs with stretch 2 and size \(O(n^{4/3}m^{1/3})\). For \(m=o(n^2)\), this distance oracle has \(o(n^2)\) size and stretch 2. Pǎtraşcu, Roditty and Thorup [11] showed for every integer \(k\ge 0\) and \(\ell >0\) distance oracles that use \(\tilde{O}(m^{1+1/(k\pm 1/\ell )})\) space and answer a distance query in \(O(k + \ell )\) time with stretch \(2k+1\pm 2/\ell \). Sommer, Verbin, and Yu [17] provided a lower bound in the cell probe model. They showed that there are sparse graphs for which constant stretch and query time requires \(m^{1+\varOmega (1)}\) space.

Due to lack of space, we refer the reader to the full version of this paper [15] for the rest of the related work section.

1.2 Paper Organization

In the next section we present some necessary preliminaries, the distance oracles of Thorup-Zwick and a standard variant of it, that is required in order to obtain our new distance oracle. In Sect. 3 we present our new distance oracles. In Sect. 4 we present our refined stretch analysis for Thorup-Zwick distance oracles. In Sect. 5 we present some concluding remarks and open problems. Due to lack of space, we omit here some of the proofs of Sect. 2 and the technical part of Sect. 4. We refer the reader to [15] for the full version of this paper. Also, in [15] we present the experiment that we have conducted on real world graphs. In the experiment we examine how frequent are the cases that are characterized in our refined stretch analysis from Sect. 4.

2 Preliminaries and Previous Work

Let \(G=(V,E)\) be an n-vertices m-edges undirected unweighted graph. For every \(u,v\in V\), let \(d(u,v)\) be the length of the shortest path between u and v. Let N(u) be the vertices that are neighbours of u and let \(deg(u)=|N(u)|\) be the degree of u.

For every set \(A\subseteq V\), let \(p_A(u)\) be the closest vertex to u from A, that is \(p_A(u) := {\text {arg min}}_{v \in A} (d(u,v))\), where ties are broken in favor of the vertex with a smaller identifier, and let \(d(u,A)=d(u,p_A(u))\). Notice that it follows from this definition that if v is on a shortest path between u and \(p_A(u)\), then \(p_A(u)=p_A(v)\). For a set \(E'\subseteq E\) let \(V(E')=\{u \mid (u,v) \in E' \}\). Let \(N(u,s,A)\) be the s closest vertices to u from the set A.

Let \(B(u,r)=\{ v\in V\mid d(u,v) < r \}\) and let \(B(u,r,X)= \{ v\in X\mid d(u,v) <r \}\), where \(X\subseteq V\). Let \(L(u,r)=\{v \in V \mid d(u,v)=r\}\).
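The definitions of \(p_A(u)\), \(d(u,A)\), \(B(u,r,X)\) and \(L(u,r)\) can be illustrated with a few lines of Python. This is a minimal sketch under our own assumptions: the graph is an adjacency dictionary, vertex identifiers are comparable integers, and A contains at least one vertex reachable from u. The function names are hypothetical and do not come from the paper.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted distances d(source, .) computed by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def closest_in_set(adj, u, A):
    """Return (p_A(u), d(u, A)); ties go to the smaller identifier."""
    dist = bfs_distances(adj, u)
    p = min((v for v in A if v in dist), key=lambda v: (dist[v], v))
    return p, dist[p]

def ball(adj, u, r, X=None):
    """B(u, r, X): vertices of X at distance strictly less than r (X = V if None)."""
    dist = bfs_distances(adj, u)
    return {v for v, d in dist.items() if d < r and (X is None or v in X)}

def layer(adj, u, r):
    """L(u, r): vertices at distance exactly r from u."""
    return {v for v, d in bfs_distances(adj, u).items() if d == r}

# Illustrative input (an assumption): the path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

On the path above, vertex 2 is equidistant from 0 and 4, so `closest_in_set(path, 2, {0, 4})` resolves the tie towards the smaller identifier.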

The following Lemma is a standard tool in the area of approximate shortest paths and we provide it here for completeness.

Lemma 1

(e.g. Lemma 3.6 in [19]). Let U be a set of size u. Let \(Q_1,\ldots ,Q_n\subseteq U\). If \(|Q_i| \ge s\), for every \(1\le i \le n\) then a hitting set A of size \(\tilde{O}(u/s)\) such that \(Q_i\cap A \ne \emptyset \) can be found with a deterministic algorithm in \(O(u+\sum _{i=1}^{n}|Q_i|)\) time.
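For intuition only, a hitting set can be obtained with the classical greedy rule that repeatedly picks the element contained in the largest number of sets that are not yet hit. The sketch below is a simple stand-in and not the linear-time deterministic algorithm referred to in Lemma 1; the function name is hypothetical, and we assume every input set is non-empty and its elements are comparable.

```python
def greedy_hitting_set(sets):
    """Pick elements greedily until every set in `sets` contains a picked element.

    Assumes each set is non-empty (in Lemma 1, |Q_i| >= s >= 1).
    """
    remaining = [set(q) for q in sets]
    chosen = set()
    while remaining:
        # Count, for each element, how many not-yet-hit sets contain it.
        counts = {}
        for q in remaining:
            for x in q:
                counts[x] = counts.get(x, 0) + 1
        # Most frequent element first; ties go to the smaller element.
        best = min(counts, key=lambda x: (-counts[x], x))
        chosen.add(best)
        remaining = [q for q in remaining if best not in q]
    return chosen

# Illustrative input (an assumption): three overlapping sets.
example_sets = [{1, 2}, {2, 3}, {3, 4}]
example_hits = greedy_hitting_set(example_sets)
```

This version recounts frequencies in every round, so it trades the running time stated in the lemma for brevity.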

2.1 The Distance Oracle of Thorup and Zwick

In their seminal paper Thorup and Zwick [19] showed that there is a data structure of size \(O(kn^{1+1/k})\) that returns a \((2k-1)\) multiplicative approximation (stretch) of the distances of an undirected weighted graph in O(k) time. Let \(k\ge 1\) and let \(A_0,A_1,\ldots , A_k\) be sets of vertices, such that \(A_0=V\), \(A_k=\emptyset \) and \(A_i\) is a subset of \(A_{i-1}\) of size at most \(\tilde{O}(|A_{i-1}|/s)\) that hits for every \(v\in V\) the set \(N(v,s,A_{i-1})\), where s is a parameter. The set \(A_i\) is computed using Lemma 1. For every \(u\in V\), let \(p_i(u)=p_{A_i}(u)\) and \(\ell _i(u)=d(u,A_i)=d(u,p_i(u))\). We set \(p_0(u)\) to u, \(p_k(u)\) to be null and \(\ell _k(u)\) to \(\infty \).

For every \(0 \le i \le k-1\), let \(B_i(u)=B(u,\ell _{i+1}(u),A_i)\). The bunch of \(u\in V\) is \(B(u)=\cup _{i=0}^{k-1} B_i(u)\).

The information saved in the distance oracle for every \(u\in V\) is \(B(u)=\cup _{i=0}^{k-1} B_i(u)\), the value of \(d(u,v)\), for every \(v\in B(u)\), in a 2-level hash table and the vertex \(p_i(u)\), where \(0 \le i \le k\).

Thorup and Zwick proved the following:

Lemma 2

[Theorem 3.7 [19]]. For every \(u \in V\) and \(i\in [0,k-2]\), the size of \(B_i(u)\) is at most s and the size of \(B_{k-1}(u)\) is \(\tilde{O}(n/s^{k-1})\).

Setting \(s=n^{1/k} c \log n\) yields the desired size bound \(O(kn^{1+1/k})\). The query algorithm \(dist(u,v)\) of the distance oracle is presented in [15]. We look for the smallest even i such that \(p_i(u) \in B_i(v)\) or \(p_{i+1}(v) \in B_{i+1}(u)\). Since both \(p_{k-1}(u) \in B_{k-1}(v)\) and \(p_{k-1}(v) \in B_{k-1}(u)\) the algorithm always stops. Let \(f(u,v)\) be the largest value that i reached during the run of \(dist(u,v)\). In other words, \(f(u,v)\) is the largest value such that for every even \(j<f(u,v)\), it holds that \(p_j(u)\notin B_j(v)\) and for every odd \(j< f(u,v)\) it holds that \(p_j(v)\notin B_j(u)\). Since \(dist(u,v)\) always stops it follows that \(f(u,v)\le k-1\).
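The alternating search described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pseudocode of [15]: the hierarchy \(A_0,\ldots ,A_{k-1}\) is passed in explicitly instead of being computed with Lemma 1, plain dictionaries replace the 2-level hash tables, the graph is connected, and \(A_{k-1}\) is non-empty. The names `build_oracle` and `bfs_distances` are hypothetical.

```python
from collections import deque
import math

def bfs_distances(adj, source):
    """Unweighted distances d(source, .) computed by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def build_oracle(adj, levels):
    """levels = [A_0, ..., A_{k-1}] with A_0 = V; A_k is the empty set."""
    pivot, ell, bunch = {}, {}, {}
    for u in adj:
        d = bfs_distances(adj, u)
        # p_i(u): closest vertex of A_i, ties to the smaller identifier.
        pivot[u] = [min(A, key=lambda w: (d[w], w)) for A in levels]
        # ell_i(u) = d(u, A_i), with ell_k(u) = infinity.
        ell[u] = [d[p] for p in pivot[u]] + [math.inf]
        # B(u): union of B_i(u) = {w in A_i : d(u, w) < ell_{i+1}(u)}, with distances.
        bunch[u] = {w: d[w]
                    for i, A in enumerate(levels)
                    for w in A if d[w] < ell[u][i + 1]}
    return pivot, ell, bunch

def dist(oracle, u, v):
    """Alternate between u and v until the current pivot lies in the other bunch."""
    pivot, ell, bunch = oracle
    i, w = 0, u
    while w not in bunch[v]:
        i += 1
        u, v = v, u
        w = pivot[u][i]
    return ell[u][i] + bunch[v][w]

# Illustrative input (an assumption): a path on 8 vertices with k = 3.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
oracle = build_oracle(path, [set(range(8)), {0, 3, 6}, {3}])
```

The loop stops at level \(k-1\) at the latest, because \(\ell _k=\infty \) places every vertex of \(A_{k-1}\) in every bunch.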

To bound the stretch we first state the following Lemma, which is implicit in [19]. We prove it explicitly in [15] since we use it in our proofs.

Lemma 3

For every even \(i\le f(u,v)\) it holds that \(\ell _i(u) \le i \cdot d(u,v)\) and for every odd \(i\le f(u,v)\) it holds that \(\ell _i(v) \le i \cdot d(u,v)\).

We proceed with the following useful observation on Thorup-Zwick distance oracle that we will use later on. Consider the set \(A_{i-j}\), where i and j are even and \(0\le j<i\le f(u,v)\). From Lemma 3 it follows that \(\ell _{i-j}(u) \le (i-j)\cdot d(u,v)\) and \(\ell _{i}(u) \le i\cdot d(u,v)\). But what if we have a bound for \(\ell _{i-j}(u)\) that is better than \((i-j)\cdot d(u,v)\), can we use it to obtain a better bound for \(\ell _{i}(u)\)? In the next Lemma we present a generalization of Lemma 3 and show that this is indeed possible. The proof is given in [15].

Lemma 4

For every even \(i\le f(u,v)\): (i) \(\ell _i(u) \le \ell _{i-j}(u) + j\cdot d(u,v)\), for every even \(j\le i\), and (ii) \(\ell _i(u) \le \ell _{i-j}(v) + j\cdot d(u,v)\), for every odd \(j\le i\).

For every odd \(i\le f(u,v)\): (i) \(\ell _i(v) \le \ell _{i-j}(u) + j\cdot d(u,v)\), for every even \(j\le i\), and (ii) \(\ell _i(v) \le \ell _{i-j}(v) + j\cdot d(u,v)\), for every odd \(j\le i\).

We finish the description of Thorup-Zwick distance oracle with a bound on \(dist(u,v)\).

Lemma 5

\(dist(u,v)\) outputs an estimation that is bounded by \(2\ell _{f(u,v)}(u)+d(u,v)\le (2f(u,v)+1)d(u,v)\le (2k-1)d(u,v)\), for even \(f(u,v)\), and by \(2\ell _{f(u,v)}(v)+ d(u,v)\le (2f(u,v)+1)d(u,v)\le (2k-1)d(u,v)\), for odd \(f(u,v)\).

Proof

Let \(i=f(u,v)\) be even. The algorithm returns \(d(u,p_i(u))+d(v,p_i(u))\). Using the triangle inequality we get \(d(u,p_i(u))+d(v,p_i(u))\le 2\ell _i(u)+d(u,v)\). From Lemma 3 we have \(\ell _i(u) \le i \cdot d(u,v)\) and since \(i\le k-1\) we get \(d(u,p_i(u))+d(v,p_i(u))\le (2i+1)d(u,v) \le (2k-1)d(u,v)\). For the case that \(f(u,v)\) is odd the proof is the same with u and v switching their roles.
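For completeness, the triangle inequality step in the proof can be written out as follows. It uses only \(\ell _i(u)=d(u,p_i(u))\), as defined in Sect. 2.1:

$$\begin{aligned} d(u,p_i(u))+d(v,p_i(u))&\le d(u,p_i(u))+\big (d(v,u)+d(u,p_i(u))\big ) \\&= 2\ell _i(u)+d(u,v). \end{aligned}$$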

2.2 A Standard Variant of the Distance Oracle of Thorup and Zwick

In order to obtain the new distance oracle we are using a slightly different but relatively standard variant of the distance oracle of Thorup and Zwick (e.g. [5]), which we present below.

In this variant we also save in the distance oracle the exact distance for every pair \(\langle u,v \rangle \in A_{k/2} \times A_{k/2-1}\), when k is even, and every pair \(\langle u,v \rangle \in A_{(k-1)/2} \times A_{(k-1)/2}\) when k is odd. In both cases the space remains \(O(kn^{1+1/k}\log n)\), since \(|A_{k/2}|\cdot |A_{k/2-1}|=O(kn^{1+1/k}\log n)\), when k is even and \(|A_{(k-1)/2}| \cdot |A_{(k-1)/2}|=O(kn^{1+1/k}\log n)\), when k is odd.

The query will work as follows. Let \(u,v\in V\). Let \(f=\min (f(u,v),f(v,u))\). If \(f\le \lfloor k/2 \rfloor \) then we output \(\min (dist(u,v),dist(v,u))\). If \(f> \lfloor k/2 \rfloor \) then we output \(\min \big ( \ell _{k/2}(u) + d(p_{k/2}(u), p_{k/2-1}(v)) + \ell _{k/2-1}(v), \ell _{k/2}(v) + d(p_{k/2}(v), p_{k/2-1}(u)) + \ell _{k/2-1}(u)\big )\), for an even k, and \(\ell _{(k-1)/2}(u) + d(p_{(k-1)/2}(u), p_{(k-1)/2}(v)) + \ell _{(k-1)/2}(v)\), for an odd k.

In the next Lemma we establish an upper bound on the query output when \(f > \lfloor k/2 \rfloor \).

Lemma 6

When \(f > \lfloor k/2 \rfloor \) the query algorithm described above returns an estimation that is at most \(\min (2\ell _{k/2}(u)+2\ell _{k/2-1}(v)+d(u,v), 2\ell _{k/2}(v)+2\ell _{k/2-1}(u)+d(u,v))\), when k is even and at most \(2\ell _{(k-1)/2}(u)+2\ell _{(k-1)/2}(v)+d(u,v)\), when k is odd.

Proof

Let \(a=\ell _{k/2}(u) + d(p_{k/2}(u), p_{k/2-1}(v)) + \ell _{k/2-1}(v)\). Let \(b=\ell _{k/2}(v) + d(p_{k/2}(v), p_{k/2-1}(u)) + \ell _{k/2-1}(u)\). Let \(A=2\ell _{k/2}(u)+2\ell _{k/2-1}(v)+d(u,v)\) and let \(B=2\ell _{k/2}(v)+2\ell _{k/2-1}(u)+d(u,v)\). For even k, the query returns \(\min \big ( a,b \big )\). We show that this value is at most \(\min (A, B)\).

Using the triangle inequality we get that \(d(p_{k/2}(u), p_{k/2-1}(v))\le \ell _{k/2}(u) + d(u,v) + \ell _{k/2-1}(v)\). Therefore, \(a\le A\). Similarly, we get that \(d(p_{k/2}(v), p_{k/2-1}(u))\le \ell _{k/2}(v) + d(u,v) + \ell _{k/2-1}(u)\). Therefore, \(b\le B\). Combining the two bounds we get that \(\min (a,b)\le \min (A,B)\), as required.

When k is odd, the query returns \(\ell _{(k-1)/2}(u) + d(p_{(k-1)/2}(u), p_{(k-1)/2}(v)) + \ell _{(k-1)/2}(v) \le \ell _{(k-1)/2}(u) + (\ell _{(k-1)/2}(u) + d(u,v) + \ell _{(k-1)/2}(v)) + \ell _{(k-1)/2}(v)= 2\ell _{(k-1)/2}(u)+2\ell _{(k-1)/2}(v)+d(u,v)\).

It is relatively straightforward to prove that the estimation produced by the updated query algorithm has \(2k-1\) stretch by combining Lemma 6 with Lemma 3.

Throughout the paper we will refer to this variant of Thorup-Zwick distance oracle as the standard variant of Thorup-Zwick distance oracle.

3 Distance Oracles with Improved Stretch

In this section we present our new distance oracle construction. We combine two ideas. The first idea is to interplay between a hitting set of vertices and a hitting set of edges. This allows us to obtain, in some cases, a better bound on \(\ell _1(u)\), for every \(u\in V\). Consider a pair of vertices \(u,v\in V\) such that \(d(u,v)=\varDelta \). In Thorup and Zwick distance oracles if \(v \notin B_0(u)\) then it follows that \(\ell _1(u)\le \varDelta \) and this bound is used, among other bounds, to bound the estimation. In our distance oracles we will have to use \(\ell _1(u)\) to bound the estimation only in the case that \(\ell _1(u)\le \varDelta -1\). Our second idea is that in order to amplify the effect of this better bound we can use the standard variant of Thorup and Zwick distance oracles, presented in Sect. 2.2, since it allows us to combine in the bound of the estimation both \(\ell _1(u)\) and \(\ell _1(v)\) in the case that both \(\ell _1(u)\le \varDelta -1\) and \(\ell _1(v)\le \varDelta -1\).

We now prove the following Theorem:

Theorem 1

Let \(G = (V,E)\) be an n-vertices m-edges undirected unweighted graph. For every \(k>2\) there is a distance oracle that uses \(O(knm^{1/k}\log n)\) space and for every pair of vertices \(u,v\in V\) returns in O(k) time an estimation \(d^*(u,v)\) such that:

$$\begin{aligned} d(u,v) \le d^*(u,v) \le (2k-1)d(u,v)-4. \end{aligned}$$

For \(k=2\), the estimation \(d^*(u,v)\) satisfies: \(d(u,v) \le d^*(u,v) \le 3d(u,v)-2.\)

Proof

Our new distance oracle is constructed as follows. Let \(s=m^{1/k}c\log n\). We start with the set \(A_1\) that will be the union of two sets, \(A^{\mathrm {v}}_1\) and \(A^{\mathrm {e}}_1\). The set \(A^{\mathrm {v}}_1\subseteq V\) is a hitting set of size \(\tilde{O}(m/s)\) of the sets \(N(v,s,V)\), for every \(v\in V\), computed using Lemma 1.

The set \(A^{\mathrm {e}}_1\) is computed as follows. We first compute for every \(u\in V\) the set \(L(u,d(u,A^{\mathrm {v}}_1))\). Let \(V^H=\{u \mid |L(u,d(u,A^{\mathrm {v}}_1))|\ge s\}\). For every \(u\in V^H\) let \(E^H(u) = \{ (x,y)\in E \mid x\in L( u,d(u,A^{\mathrm {v}}_1)-1) \wedge y\in L( u,d(u,A^{\mathrm {v}}_1)) \}\), that is, all the edges with one endpoint at distance \(d(u,A^{\mathrm {v}}_1)-1\) from u and another endpoint at distance \(d(u,A^{\mathrm {v}}_1)\) from u. Consider now the sets \(E^H(u)\), for every \(u\in V^H\). Each such set contains at least s edges and there are at most n such sets. Thus, we can apply Lemma 1 to compute a hitting set \(E^H \subseteq E\) of size \(\tilde{O}(m/s)\). Let \(A^{\mathrm {e}}_1=V(E^H)\). We set \(A_1\) to \(A^{\mathrm {v}}_1 \cup A^{\mathrm {e}}_1\).
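The edge sets \(E^H(u)\) are easy to compute from a single breadth-first search. The sketch below is illustrative only: the function names are hypothetical, the graph is an adjacency dictionary, and the radius r stands for \(d(u,A^{\mathrm {v}}_1)\), which is assumed to be known already. Each edge is returned as an ordered pair (x, y) with x in the closer layer, which is the endpoint that matters for improving \(\ell _1(u)\).

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted distances d(source, .) computed by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def layer_crossing_edges(adj, u, r):
    """Edges (x, y) with d(u, x) = r - 1 and d(u, y) = r, i.e. E^H(u) for r = d(u, A^v_1)."""
    dist = bfs_distances(adj, u)
    return {(x, y)
            for x in adj if dist.get(x) == r - 1
            for y in adj[x] if dist.get(y) == r}

# Illustrative input (an assumption): layers {0}, {1, 2}, {3, 4} around vertex 0.
graph = {0: [1, 2], 1: [0, 3, 4], 2: [0, 4], 3: [1], 4: [1, 2]}
crossing = layer_crossing_edges(graph, 0, 2)
```

Every vertex of \(L(u,r)\) has at least one neighbour in \(L(u,r-1)\), so \(|E^H(u)|\ge |L(u,r)|\), which is why each such set contains at least s edges when \(u\in V^H\).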

We now proceed with the sets \(A_2,\ldots ,A_{k-1}\) as in the distance oracle of Thorup and Zwick, that is, \(A_i\) is a subset of \(A_{i-1}\) of size at most \(\tilde{O}(|A_{i-1}|/s)\) that hits for every \(v\in V\) the set \(N(v,s,A_{i-1})\). The set \(A_k\) is empty.

We use the sets \(V=A_0,A_1,\ldots , A_k\) to construct the standard variant of the distance oracle. The special way we used to compute the set \(A_1\) allows us to prove the following crucial Lemma:

Lemma 7

\(\sum _{u\in V} |L(u,\ell _1(u))| =\tilde{O}(nm^{1/k}).\)

Proof

Assume, towards a contradiction, that there exists \(u\in V\) such that \(|L(u,\ell _1(u))| > s\). Since \(A_1 = A^{\mathrm {v}}_1 \cup A^{\mathrm {e}}_1\) we have \(\ell _1(u) = \min ( d(u,A^{\mathrm {v}}_1), d(u,A^{\mathrm {e}}_1))\). It cannot be that \(\ell _1(u)=d(u,A^{\mathrm {v}}_1)\) because this implies that \(|L(u,d(u,A^{\mathrm {v}}_1))|>s\) and \(u\in V^H\). In such a case, an edge \((x,y)\) from \(E^H(u)\) is in \(E^H\) and \(x\in A^{\mathrm {e}}_1\) is added to \(A_1\). Since \(d(u,A^{\mathrm {e}}_1)\le d(u,x) = d(u,A^{\mathrm {v}}_1)-1\) and \(\ell _1(u) = \min ( d(u,A^{\mathrm {v}}_1), d(u,A^{\mathrm {e}}_1))\) we get that it must be that \(\ell _1(u)<d(u,A^{\mathrm {v}}_1)\).

So we have \(|L(u,\ell _1(u))| > s\) and \(\ell _1(u)=d(u,A^{\mathrm {e}}_1)<d(u,A^{\mathrm {v}}_1)\). The set \(A^{\mathrm {v}}_1\) is a hitting set for the sets \(N(v,s,V)\), for every \(v\in V\). From Lemma 2 it follows that \(|B(u,d(u,A^{\mathrm {v}}_1))|\le s\). Since \(\ell _1(u)=d(u,A^{\mathrm {e}}_1)<d(u,A^{\mathrm {v}}_1)\) we get that \(L(u,\ell _1(u))\subseteq B(u,d(u,A^{\mathrm {v}}_1))\), a contradiction to the fact that \(|L(u,\ell _1(u))| > s\). Thus, we get that \(\sum _{u\in V} |L(u,\ell _1(u))| \le s\cdot n=\tilde{O}(nm^{1/k})\), as required.

It follows from the above Lemma that we can save also the set \(L(u,\ell _1(u))\), for every \(u\in V\), in a 2-level hash table, without increasing the total size of the distance oracle.

Given a pair \(u,v \in V\) the query works as follows. First, we check if \((u,v)\in E\) and if so return 1 and stop. Otherwise, we check if either \(v\in L(u,\ell _1(u))\) or \(u\in L(v,\ell _1(v))\) and if so return the exact distance and stop. If this is not the case we use the query of the standard variant of Thorup-Zwick distance oracle on \((u,v)\) and on \((v,u)\) and report the minimum of these two estimations.

Next, we analyze the stretch of the distance oracle. Let \(u,v \in V\) and let \(\varDelta =d(u,v)\). If \((u,v)\in E\) or \(u\in B_0(v)\) or \(v\in B_0(u)\) then the exact distance is returned. Therefore, we can assume that \((u,v)\notin E\), \(u\notin B_0(v)\) and \(v\notin B_0(u)\). Let \(d(u',v)=d(u,v')=\varDelta -1\), where \(u'\in N(u)\) and \(v'\in N(v)\). If \(u'\in B_0(v)\) (respectively, \(v'\in B_0(u)\)) then \(u\in L(v,\ell _1(v))\) (respectively, \(v\in L(u,\ell _1(u))\)) and the exact distance is returned. Therefore, we can assume also that \(u'\notin B_0(v)\) and \(v'\notin B_0(u)\). This implies that \(\ell _1(v)\le \varDelta -1\) and \(\ell _1(u)\le \varDelta -1\).

For \(k=2\) the standard variant of Thorup-Zwick distance oracle degenerates to the regular one since the additional distances stored are for pairs from \(A_1 \times A_0\). The query returns \(\ell _1(u)+d(v,p_1(u))\) which is bounded by \(2\ell _1(u)+\varDelta \). Using the bound \(\ell _1(u)\le \varDelta -1\) we get that the estimation is bounded by \(3\varDelta -2\), as required.

Consider now the case that \(k \ge 3\). As we have checked whether \((u,v)\in E\), we can assume that \(\varDelta \ge 2\). Let \(f=\min \big ( f(u,v), f(v,u)\big )\). In the case that \(f\le \lfloor k/2 \rfloor \) the query returns \(\min (dist(u,v),dist(v,u))\). From Lemma 5 it follows that this estimation is bounded by \((2(k/2) + 1)d(u,v)=(k+1)\varDelta \le (2k-1)\varDelta - 4\) for even \(k\ge 4\) and \(\varDelta \ge 2\), and bounded by \((2((k-1)/2) + 1)d(u,v)=k\varDelta \le (2k-1)\varDelta - 4\) for odd \(k\ge 3\) and \(\varDelta \ge 2\).
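The two inequalities used in this case can be verified directly. The following short check relies only on the stated ranges of k and \(\varDelta \):

$$\begin{aligned} (2k-1)\varDelta -4-(k+1)\varDelta&= (k-2)\varDelta -4 \ge 2\cdot 2-4 = 0&\text {for even } k\ge 4,\ \varDelta \ge 2, \\ (2k-1)\varDelta -4-k\varDelta&= (k-1)\varDelta -4 \ge 2\cdot 2-4 = 0&\text {for odd } k\ge 3,\ \varDelta \ge 2. \end{aligned}$$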

For \(f> \lfloor k/2 \rfloor \) the query returns \(\min \big ( \ell _{k/2}(u) + d(p_{k/2}(u), p_{k/2-1}(v)) + \ell _{k/2-1}(v), \ell _{k/2}(v) + d(p_{k/2}(v),p_{k/2-1}(u)) + \ell _{k/2-1}(u)\big )\), for an even k, and \(\ell _{(k-1)/2}(u) + d(p_{(k-1)/2}(u),p_{(k-1)/2}(v)) + \ell _{(k-1)/2}(v)\), for an odd k.

Consider the case of an even k. Let \(i=k/2\) and assume that i is even. It follows from Lemma 6 that \(2\ell _{i}(u)+2\ell _{i-1}(v)+d(u,v)\) is an upper bound for the estimation. From Lemma 4 we have \(\ell _{i}(u)\le \ell _1(v) + (i - 1)\varDelta \) and \(\ell _{i-1}(v)\le \ell _1(u) +(i-2)\varDelta \). Thus, we get:

$$\begin{aligned} 2\ell _{i}(u) +2\ell _{i-1}(v) + d(u,v)&\le 2(\ell _1(v) + (i - 1)\varDelta ) +2((\ell _1(u) + (i - 2)\varDelta )) + \varDelta \\&\le 2\ell _1(u)+2\ell _1(v) + 4i\varDelta -5\varDelta \\&\le 4(\varDelta - 1) + 4(k/2)\varDelta -5\varDelta \\&\le (2k-1)\varDelta - 4 \end{aligned}$$

Assume now that i is odd. It follows from Lemma 6 that \(2\ell _{i}(v)+2\ell _{i-1}(u)+d(u,v)\) is an upper bound for the estimation. From Lemma 4 we have \(\ell _{i}(v)\le \ell _1(v) + (i - 1)\varDelta \) and \(\ell _{i-1}(u)\le \ell _1(v) +(i-2)\varDelta \). Thus, we get:

$$\begin{aligned} 2\ell _{i}(v) +2\ell _{i-1}(u) + d(u,v)&\le 4\ell _{1}(v) + 4i\varDelta -5\varDelta \\&\le 4(\varDelta - 1) + 4(k/2)\varDelta -5\varDelta \\&\le (2k-1)\varDelta - 4 \end{aligned}$$

Consider now the case that k is odd. Let \(i=(k-1)/2\). It follows from Lemma 6 that \(2\ell _{i}(u)+2\ell _{i}(v)+d(u,v)\) is an upper bound for the estimation. From Lemma 4 we have \(\ell _{i}(v)\le \ell _1(u) + (i - 1)\varDelta \) and \(\ell _{i}(u)\le \ell _1(v) +(i-1)\varDelta \) if i is even or odd. Thus, we get:

$$\begin{aligned} 2\ell _{i}(u)+2\ell _{i}(v)+d(u,v)&\le 2(\ell _1(v) + (i - 1)\varDelta )+2( \ell _1(u) + (i - 1)\varDelta ) + \varDelta \\&\le 4(\varDelta -1 + (i - 1)\varDelta ) + \varDelta \\&\le 4(i\varDelta - 1) +\varDelta \\&\le (2k-1)\varDelta - 4 \end{aligned}$$

Remark. The hierarchical nature of the query algorithm, which is based on the bunches induced by the sets \(V=A_0,A_1,\ldots , A_k\), makes it tempting to try to apply the interplay between a hitting set of vertices and a hitting set of edges not only to \(A_1\) but also to the sets \(A_2,\ldots , A_k\). This, however, is not possible for the following reason. To obtain the improved bound on \(\ell _1(u)\) we need that \(p_{A_1}(u)\in A^{\mathrm {e}}_1\). Thus, in the next step of the query we need to check if \(p_{A_1}(u)\in A^{\mathrm {e}}_1\) is in \(B_1(v)\). To get a better bound now for \(\ell _2(v)\) we need to be able either to save the vertices of \(A_1\) that are at distance \(\ell _2(v)\) from v, in case there are at most s such vertices, or to improve the bound on \(\ell _2(v)\) by a tighter hitting set of size \(\tilde{O}(m/s^2)\), if there are strictly more than s such vertices. However, in the latter case, the fact that there are more than s vertices of \(A_1\), which all might be vertices of \(A^{\mathrm {e}}_1\), at distance \(\ell _2(v)\) does not imply that the number of edges with one endpoint at distance \(\ell _2(v)-1\) from v and another endpoint at distance \(\ell _2(v)\) from v is more than \(s^2\). It might be that there are many edges (strictly more than \(s^2\)) with both endpoints at distance \(\ell _2(v)\) from v. These edges can cause strictly more than s vertices of \(A^{\mathrm {e}}_1\) to be at distance \(\ell _2(v)\) from v. On the other hand, hitting this set of edges might result in an edge both of whose endpoints are at distance \(\ell _2(v)\), which will not improve \(\ell _2(v)\).

4 A Refined Stretch Analysis of Thorup-Zwick Distance Oracle

In this section we present several different conditions that can be easily checked and once fulfilled by the distance oracle of Thorup-Zwick guarantee that the estimation has a stretch which is strictly better than \(2k-1\).

The main parameter that we use is the average distance between a vertex and the sets \(A_1,\ldots ,A_{k-1}\). We define the average distance between \(u\in V\) and \(A_i\) to be \(\bar{\ell }_i(u) = \lceil \ell _i(u) / i \rceil \), where \(i\in [1,k-1]\).

Let \(\hat{d}(u,v)=\min (dist(u,v),dist(v,u))\). We prove the following properties:

Property 1

Let \(u\in V\). If \(\bar{\ell }_{i}(u)\ne \bar{\ell }_{j}(u)\) for some \(i,j\in [1,k-1]\) then for every \(v\in V\) the stretch of \(\hat{d}(u,v)\) is strictly better than \((2k-1)\).

Property 2

Let \(u,v\in V\). If \(\bar{\ell }_{i}(u)\ne \bar{\ell }_{i}(v)\) for some \(i\in [1,k-1]\) then the stretch of \(\hat{d}(u,v)\) is strictly better than \((2k-1)\).

Property 3

Let \(u,v\in V\). If \(\bar{\ell }_{i}(u)=\bar{\ell }_{i}(v)=q\), for every \(i\in [1,k-1]\) and \(d(u,v)\ne q\) then the stretch of \(\hat{d}(u,v)\) is strictly better than \((2k-1)\).
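The three conditions can be checked mechanically from the stored values \(\ell _i(\cdot )\). The following sketch is illustrative: the function names are hypothetical, each input list is assumed to hold \(\ell _0,\ldots ,\ell _{k-1}\) for one vertex, and Property 1 is tested for both endpoints, which is valid because \(\hat{d}(u,v)\) is symmetric in u and v. Property 3 needs the true distance \(d(u,v)\), so it is passed in as an argument.

```python
def average_distances(ell):
    """ell[i] = ell_i(u) for i = 0..k-1; returns [ceil(ell_i / i) for i = 1..k-1]."""
    return [(ell[i] + i - 1) // i for i in range(1, len(ell))]

def which_property(ell_u, ell_v, d_uv):
    """Return 1, 2 or 3 for the first property whose condition holds, else None."""
    avg_u, avg_v = average_distances(ell_u), average_distances(ell_v)
    # Property 1: the average distances of one endpoint are not all equal.
    if len(set(avg_u)) > 1 or len(set(avg_v)) > 1:
        return 1
    # Property 2: the two endpoints disagree at some level i.
    if avg_u != avg_v:
        return 2
    # Property 3: a common value q at every level, but d(u, v) != q.
    if avg_u and d_uv != avg_u[0]:
        return 3
    return None
```

A return value of `None` corresponds to the only situation left open by the three properties, namely a common average distance q with \(d(u,v)=q\).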

Before we turn to the technical part of this section we discuss these properties. First, notice the nice relation between these properties. If the conditions of Property 1 do not hold then the conditions of Property 2 can still hold, and if the conditions of both Properties 1 and 2 do not hold then the conditions of Property 3 can still hold.

From the implementation perspective we can verify whether Property 1 and Property 2 hold using a simple computation that does not require the actual computation of the distance oracle itself. Moreover, if Property 1 does not hold then we have \(\bar{\ell }_{i}(u)= \ell _{1}(u)\), for every \(i\in [1,k-1]\), since \(\bar{\ell }_{1}(u)=\ell _{1}(u)\). Thus, \(\ell _{1}(u)-1\le \ell _{i}(u)/i\le \ell _{1}(u)\) and we get that \(\ell _{i}(u)\in [i\ell _{1}(u)-i,i\ell _{1}(u)]\). In such a scenario the shortest paths tree of u has a relatively well defined structure in which \(|B(u,\ell _{1}(u))|\le n^{1/k}\) and for every \(i\in [2,k-1]\) it holds that \(|B(u,i\ell _{1}(u)-i)|\le n^{i/k}\) and \(n^{i/k}\le |B(u,i\ell _{1}(u))|\). It is a plausible conjecture that such a well defined structure is not common. For the sake of completeness we conduct a small experiment on several different datasets of real world graphs to test how frequent these properties are. We elaborate more on this experiment in [15].
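The range for \(\ell _i(u)\) follows from the definition of the ceiling alone. Written out, with \(\bar{\ell }_i(u)=\lceil \ell _i(u)/i\rceil =\ell _1(u)\), the left inequality is in fact strict:

$$\begin{aligned} \ell _1(u)-1< \frac{\ell _i(u)}{i}\le \ell _1(u) \quad \Longrightarrow \quad i\ell _1(u)-i < \ell _i(u)\le i\ell _1(u). \end{aligned}$$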

Due to lack of space, we omit the technical part of this section, which can be found in [15].

5 Concluding Remarks

In this paper we proved that for every \(k\ge 2\) there is a distance oracle of size \(O(knm^{1/k}\log n)\) that produces in O(k) time an estimation \(d^*(u,v)\) that satisfies \(d(u,v) \le d^*(u,v) \le (2k-1)d(u,v)-4\), for \(k>2\), and \(d(u,v) \le d^*(u,v) \le 3d(u,v)-2\), for \(k=2\).

An interesting open problem is whether it is possible to obtain a distance oracle with the same size and query time whose estimation \(d^*(u,v)\) satisfies \(d(u,v) \le d^*(u,v) \le (2k-1)d(u,v)-\varOmega (k)\), for large enough k.