1 Introduction

The planar k-median problem is a variation of the well-known facility location problem. For a given set P of demand points, we want to find a set Q of k facilities such that the sum of all transportation costs from a demand point to its closest facility is minimized. Each p ∈ P is associated with its own (positive) cost per unit of distance to its assigned facility, denoted w(p). Formally, we want to minimize:

$$S(P) = \sum\limits_{p \in P} \min_{q \in Q} \, w(p) \cdot d(p, q) $$

Because the problem is NP-hard for many metrics [8], we further restrict it by introducing a line constraint on the set Q. We require that all facilities belong to a specified facility line χ defined by an equation ax + by = c, where \(a,b,c \in \mathbb {R}\) and ab ≠ 0. Such a constraint is natural when all facilities are by design placed along a path that can be locally treated as linear, e.g., a pipeline, railroad, highway, country border, river, or line of longitude or latitude.
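For concreteness, the objective can be evaluated directly by brute force. A minimal Python sketch (function and variable names are ours, not from the paper), using the Manhattan distance as d:

```python
# Naive evaluation of the k-median objective for a fixed facility set Q.
# Names (cost, demand, facilities) are illustrative, not from the paper.

def cost(demand, facilities, dist):
    """demand: list of (point, weight); facilities: list of points;
    dist: metric d(p, q).  Each demand point pays w(p) * d to its
    nearest facility."""
    return sum(w * min(dist(p, q) for q in facilities) for p, w in demand)

def l1(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

demand = [((0, 0), 1.0), ((4, 0), 2.0), ((0, 3), 1.0)]
facilities = [(0, 0), (4, 0)]  # two facilities on the line y = 0
print(cost(demand, facilities, l1))  # 3.0: only (0,3) pays, weight 1 * distance 3
```

Evaluating a single candidate set is cheap; the difficulty lies in choosing Q among infinitely many positions, which the rest of the paper reduces to a finite search.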

For k = 1 we obtain the line-constrained 1-median problem. Despite the additional restriction, the complexity of even this simplest variant strongly depends on the metric. For a point \(p \in \mathbb {R}^{2}\), let x(p) and y(p) denote its x- and y-coordinate. The most natural metric is the Euclidean distance, where \( {L_{2}}(p,q) = \sqrt {(x(p) - x(q))^{2} + (y(p) - y(q))^{2}}\). It is known that even for 5 points, the 1-median cannot be constructed with a ruler and compass. It can also be proven that the general, line-constrained, and 3-dimensional versions of the k-median problem are not solvable over the field of rationals [2]. Hence it is natural to also consider other distance functions, for example (Fig. 1):

Chebyshev distance:

L∞(p, q) = max{|x(p) − x(q)|, |y(p) − y(q)|},

Manhattan distance:

L1(p, q) = |x(p) − x(q)| + |y(p) − y(q)|,

squared Euclidean distance:

\({{L_{2}^{2}}}(p,q) = (x(p) - x(q))^{2} + (y(p) - y(q))^{2}\).
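Written out as plain functions (an illustrative sketch; names are ours):

```python
def chebyshev(p, q):          # L_inf
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def manhattan(p, q):          # L_1
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def squared_euclidean(p, q):  # L_2^2
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

p, q = (1, 2), (4, 6)
print(chebyshev(p, q), manhattan(p, q), squared_euclidean(p, q))  # 4 7 25
```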

Fig. 1: Example solution for k = 3 and facility line \(-\frac {1}{2}x + y= 0\). Black demand points are assigned to the closest white facility point in the L1 metric, as indicated by dashed lines

All these distance functions have recently been considered by Wang and Zhang [11] in the context of the line-constrained k-median problem. They designed efficient algorithms based on a reduction to the minimum weight k-link path problem. However, their L1 and L∞ solutions work only in the special case of a horizontal facility line.

We provide a different dynamic programming formulation of the problem that works for any facility line χ in L1 and L∞. The new formulation can also be seen as a minimum weight k-link path problem in a DAG whose weights are Monge. However, looking up the weight of an edge in this DAG is more expensive. We show how to implement edge lookups in \(\mathcal {O}(\log n)\) time after \(\mathcal {O}(n\log n)\) time and space preprocessing, which then allows us to apply the SMAWK algorithm [1] or, if k = Ω(log n), the algorithm of Schieber [10] to obtain the complexities listed in Table 1.

Table 1 Comparison of results with previous approaches

In L∞, our general solution is faster than the one given by Wang and Zhang for the special case of a horizontal facility line. We also provide a specialized procedure solving the problem for k = 1 in linear time.

2 Preliminaries

2.1 Monge property in dynamic programming

In many applications, a dynamic programming formulation can be visualized as finding the row minima of (possibly many) matrices [6]. Such an approach helps to reveal additional structure satisfied by the subproblem space. A basic tool for speeding up dynamic programming is the so-called Monge property. It can often be used to improve the time complexity by an order of magnitude, especially in geometric problems.

Definition 1

A weight function w is concave Monge if, for all a < b and c < d, w(a, c) + w(b, d) ≤ w(b, c) + w(a, d).

The naive solution for finding row minima of an n × n matrix takes \(\mathcal {O}(n^{2})\) time. However, Aggarwal et al. [1] showed how to decrease the time complexity to \(\mathcal {O}(n)\) if the matrix has the so-called total monotonicity property, which is often established through the Monge property. Their method is usually referred to as the SMAWK algorithm. There is a deep connection between SMAWK and other methods for speeding up dynamic programming, such as the Knuth-Yao inequality used for building optimal binary search trees, as observed by Bein et al. [3].
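SMAWK itself is intricate; the following divide-and-conquer routine (our simpler stand-in, not SMAWK) already exploits total monotonicity: the column of each row's leftmost minimum is nondecreasing, so the middle row's minimum splits the column range for the remaining rows, giving O((n + m) log n) time:

```python
def monotone_row_minima(n, m, w):
    """Row-minima column indices for an n x m totally monotone matrix given
    implicitly by w(i, j).  O((n + m) log n) divide and conquer -- a simpler
    stand-in for the linear-time SMAWK algorithm."""
    ans = [0] * n
    def solve(r0, r1, c0, c1):
        if r0 > r1:
            return
        mid = (r0 + r1) // 2
        best = min(range(c0, c1 + 1), key=lambda j: w(mid, j))  # leftmost min
        ans[mid] = best
        solve(r0, mid - 1, c0, best)   # minima above cannot lie right of best
        solve(mid + 1, r1, best, c1)   # minima below cannot lie left of best
    solve(0, n - 1, 0, m - 1)
    return ans

# (i - j)^2 satisfies the concave Monge condition of Definition 1, hence
# the matrix is totally monotone; row i has its minimum at column j = i.
print(monotone_row_minima(4, 4, lambda i, j: (i - j) ** 2))  # [0, 1, 2, 3]
```

Python's `min` over a range returns the leftmost minimizer, which is exactly the tie-breaking the recursion needs.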

2.2 Minimum diameter path and minimum weight k-link path

Let D be a DAG on n nodes 0,1,…, n − 1 with a weight function w(i, j) defined for 0 ≤ i < j < n that corresponds to the weight of the edge 〈i, j〉. The weights form an n × n upper triangular partial matrix, where the defined entries form a contiguous interval within each row and column. In such a case, the Monge and total monotonicity properties are required to hold only for quadruples of defined entries. Although not needed by the referenced algorithms, one can imagine that undefined entries are pre-filled with infinite-like values in order to satisfy the property for the full matrix, as in Bein et al. [3]. It follows that:

Definition 2

A weight function w of a DAG is concave Monge if, for all a < b < c < d, w(a, c) + w(b, d) ≤ w(b, c) + w(a, d) (Fig. 2).

Fig. 2: Visual test for the concave Monge property in a DAG. For any 4 nodes, the sum of the weights of the interlacing edges (top) is smaller than or equal to the sum of the weights of the fully overlapping edges (bottom)

A minimum diameter path in D is a path from 0 to n − 1 with minimum weight. Galil and Park showed how to find such a path in optimal \(\mathcal {O}(n)\) time using the SMAWK algorithm [5]. A minimum weight k-link path is a minimum weight path from 0 to n − 1 consisting of exactly k edges (links).

Lemma 1

A minimum weight k-link path in a concave Monge DAG can be found in \(\mathcal {O}(nk)\) time and, for k = Ω(log n), in \(n2^{\mathcal {O}(\sqrt {\log k \log \log n})}\) time.

Proof

To obtain \(\mathcal {O}(nk)\) time complexity, we iteratively compute minimum weight 1-link, 2-link, …, (k − 1)-link and finally k-link paths from 0 to every other node. This can be seen as k layers of dynamic programming, each requiring only \(\mathcal {O}(n)\) time thanks to the SMAWK algorithm. Alternatively, an \(n2^{\mathcal {O}(\sqrt {\log k \log \log n})}\) time algorithm for k = Ω(log n) was given by Schieber [10]. □
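The layered dynamic programming from the proof can be sketched naively, without the SMAWK speed-up, in O(n²k) time; each inner loop over j is exactly the row-minima computation that SMAWK would replace (names are ours):

```python
INF = float("inf")

def k_link_shortest(n, k, w):
    """Minimum weight of a path 0 -> n-1 with exactly k edges in a complete
    DAG with edge weights w(i, j), i < j.  Naive O(n^2 k) layered dynamic
    programming; each layer is where SMAWK would give O(n) instead."""
    dist = [INF] * n
    dist[0] = 0
    for _ in range(k):
        ndist = [INF] * n
        for j in range(1, n):
            ndist[j] = min((dist[i] + w(i, j) for i in range(j)), default=INF)
        dist = ndist
    return dist[n - 1]

# Example: w(i, j) = (j - i)^2 is concave Monge; with 3 links covering a
# total span of 6, the optimum splits it as evenly as possible.
print(k_link_shortest(7, 3, lambda i, j: (j - i) ** 2))  # 12 = 2^2 + 2^2 + 2^2
```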

2.3 Orthogonal queries

The weights of the edges in our DAG will be computed on the fly with orthogonal queries. We will use the following tool: preprocess a given set of n weighted points in the plane for computing the sum of the weights of all points in a given query range [x, +∞] × [y, +∞]. We call this problem orthogonal range sum.

Lemma 2

There exists a data structure for the orthogonal range sum problem that can be built in \(\mathcal {O}(n \log n)\) time and answers any query in \(\mathcal {O}(\log n)\) time.

Proof

We convert the points into a sequence by sorting them according to their x-coordinates (without loss of generality, these coordinates are all distinct) and writing down the corresponding y-coordinates. The y-coordinates are further normalized by replacing each of them with its rank on a sorted list of all y-coordinates (again, we assume that they are all distinct). Hence we obtain a sequence of length n over the alphabet [n], where each character has an associated weight. We build a wavelet tree [7] of this sequence in \(\mathcal {O}(n\log n)\) time. Each node of the wavelet tree is augmented with an array of partial sums over the prefixes of its subsequence. This extra information yields Θ(n log n) space complexity in total, dominating any effort to compress the node structure itself, as is otherwise common in text-indexing applications of the tree. Given an orthogonal query, we first normalize it by looking at the sorted lists of all x- and y-coordinates. Then we traverse the wavelet tree starting from the root and accumulate the appropriate partial sums. The details can be found in [9]. □
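In place of the wavelet tree, a merge-sort tree with per-node suffix weight sums answers the same quadrant queries in O(log² n) time after O(n log n) preprocessing; a simplified stand-in sketch (class and method names are ours):

```python
from bisect import bisect_left

class RangeSum2D:
    """Sum of point weights in the quadrant [qx, +inf) x [qy, +inf).
    Merge-sort-tree sketch: O(n log n) build, O(log^2 n) per query -- a
    simpler stand-in for the O(log n) wavelet-tree structure of Lemma 2."""

    def __init__(self, pts):                     # pts: list of (x, y, w)
        pts = sorted(pts)                        # sort by x-coordinate
        self.n = n = len(pts)
        self.xs = [p[0] for p in pts]
        pairs = [[] for _ in range(2 * n)]       # (y, w) lists, sorted by y
        for i, (_, y, w) in enumerate(pts):
            pairs[n + i] = [(y, w)]
        for i in range(n - 1, 0, -1):            # merge children bottom-up
            pairs[i] = sorted(pairs[2 * i] + pairs[2 * i + 1])
        self.ys, self.suf = [[]] * (2 * n), [[]] * (2 * n)
        self.ys, self.suf = list(self.ys), list(self.suf)
        for i in range(1, 2 * n):
            self.ys[i] = [y for y, _ in pairs[i]]
            s = [0.0] * (len(pairs[i]) + 1)      # suffix sums of weights
            for j in range(len(pairs[i]) - 1, -1, -1):
                s[j] = s[j + 1] + pairs[i][j][1]
            self.suf[i] = s

    def _node(self, i, qy):                      # weight with y >= qy in node i
        return self.suf[i][bisect_left(self.ys[i], qy)]

    def query(self, qx, qy):
        res, n = 0.0, self.n
        l = bisect_left(self.xs, qx) + n         # leaves with x >= qx
        r = 2 * n                                # exclusive right end
        while l < r:                             # iterative segment-tree walk
            if l & 1:
                res += self._node(l, qy); l += 1
            if r & 1:
                r -= 1; res += self._node(r, qy)
            l //= 2; r //= 2
        return res

ds = RangeSum2D([(1, 1, 2.0), (2, 3, 1.0), (3, 2, 4.0)])
print(ds.query(2, 2))  # 5.0: points (2,3) and (3,2) fall in the quadrant
```

The extra log factor would propagate into the final bounds, which is why the paper uses the wavelet tree instead.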

3 Normalizing problem instances

The L1 and L∞ metrics are equivalent, which can be seen by rotating the plane by 45°. Hence from now on we work in the L1 metric. This simplification was not possible in the previous approach [11], since it required the facility line to be horizontal, which is no longer true after rotation.

We further modify the problem instance so that the line χ is expressed in slope-intercept form y = a·x, where a ∈ [0,1], and all coordinates of points in P are non-negative. This is always possible by reflecting along the horizontal axis, then along the line y = x, and finally translating. Such transformations do not modify distances in L1, so computing the k-median solution Q for the transformed instance gives us the answer for the original instance. Because any solution Q can be transformed so that the x-coordinates of all facilities are distinct without increasing the cost, we consider only such solutions and identify each facility with its x-coordinate.
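The 45° rotation argument can be sanity-checked directly: the map (x, y) ↦ (x + y, y − x) realizes the rotation up to a scaling factor, turning L∞ distances into L1 distances multiplied by 2 (an illustrative sketch; names are ours):

```python
def rotate45(p):
    """Map (x, y) -> (x + y, y - x): a 45-degree rotation scaled by sqrt(2).
    L_inf distances in the original plane become L1 distances (times 2)."""
    x, y = p
    return (x + y, y - x)

def linf(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def l1(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

p, q = (0, 0), (3, 5)
print(l1(rotate45(p), rotate45(q)), 2 * linf(p, q))  # 10 10
```

Since every distance is scaled by the same constant, an optimal L1 solution of the rotated instance maps back to an optimal L∞ solution of the original one.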

4 Computing 1-median

Let D(p, x) be the weighted distance between p ∈ P and (x, a·x) ∈ χ:

$${D}(p, x) = w(p) \cdot d(p, (x, a\cdot x)) $$

Whenever we say that p ∈ P is closer to coordinate x_i than to x_j, we mean that D(p, x_i) < D(p, x_j). For a set of points A ⊆ P, D(A, x) is the sum of weighted distances:

$${D}(A, x) = \sum\limits_{p \in A} w(p) \cdot d(p, (x, a\cdot x)) $$

The optimal 1-median cost is simply \(\min \limits _{x \in \mathbb {R}} {D}(P, x) \).

A function \(f : \mathbb {R} \rightarrow \mathbb {R}\) is convex if the line segment between any two points on its graph lies above or on the graph. Such functions have the following properties:

  1. f(x) = |x − y| is convex for any y.

  2. if f(x) is convex, then g(x) = c·f(x) is convex for any positive c.

  3. if f(x) and g(x) are convex, then h(x) = f(x) + g(x) is also convex.

Lemma 3

For any point p, D(p, x) is convex. For any set of points P, D(P, x) is also convex.

Proof

Consider any point pP. From the definition:

$${D}(p, x) = w(p) \cdot L_{1}(p, (x, a\cdot x)) = w(p) \cdot \left( | x(p) - x | + |y(p) - a \cdot x | \right). $$

This is a sum of absolute value functions multiplied by the (positive) weight of p. Hence, by the properties of convex functions, D(p, x) is convex. Then D(P, x) is also convex, since it is a sum of convex functions over p ∈ P. □

Since D(p, x) is convex, any of its local minima is a global minimum. Similarly to the function f(x) = |x|, it is only semi-differentiable. Its derivative D′(p, x) is a nondecreasing staircase function, undefined for at most two values x = x_1 and x = x_2. We call x_1 and x_2 the median candidates and for convenience assume that D′(p, x) is equal to its right derivative there. When a = 0 or p ∈ χ, D′(p, x) has exactly one median candidate x_1 = x(p), which is the minimum. Otherwise, there are two median candidates x_1 = x(p) and \(x_{2}=\frac {y(p)}{a}\). For a ∈ (0,1), x_1 is the only minimum, whereas for a = 1 every value in the range [x_1, x_2] is a minimum.

Because the derivative of a sum of functions is the sum of their derivatives, D′(P, x) can only change at a median candidate of some p ∈ P. This means that a minimum of D(P, x) corresponds to one of the at most 2n median candidates of P (Fig. 3). In other words, there exists a solution (x, y) ∈ χ such that x = x(p) or y = y(p) for some p ∈ P. From now on, we use \(\mathcal {M}(P)\) to denote the set of median candidates of P. \(\mathcal {M}(P)\) can be computed in \(\mathcal {O}(n)\) time by simply iterating over p ∈ P and adding x = x(p) and \(x=\frac {y(p)}{a}\) to the result (note that this might give us a multiset, i.e., some median candidates might be included multiple times).
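A direct sketch of the candidate set and the resulting finite search (names are ours; P holds (x, y, w) triples):

```python
def median_candidates(P, a):
    """Multiset M(P) of median candidates on the facility line y = a*x:
    x(p) and, for a > 0, y(p)/a for every weighted point p = (x, y, w)."""
    M = []
    for x, y, _ in P:
        M.append(x)
        if a > 0:
            M.append(y / a)
    return M

def D(P, a, x):
    """Total weighted L1 distance from P to the facility (x, a*x)."""
    return sum(w * (abs(px - x) + abs(py - a * x)) for px, py, w in P)

P = [(0.0, 2.0, 1.0), (4.0, 0.0, 1.0), (1.0, 3.0, 2.0)]
a = 0.5
best = min(median_candidates(P, a), key=lambda x: D(P, a, x))
# By convexity of D(P, .), no real x beats the best candidate.
print(best, D(P, a, best))  # 1.0 11.0
```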

Fig. 3: Median candidates in the L1 metric for weighted points a, b, c, marked as circles on the facility line, together with their combined distance function. The grey circle represents the optimal solution of the 1-median problem

Theorem 1

We can solve the line-constrained 1-median problem in \(\mathcal {O}(n)\) time.

Proof

Because D′(P, x) is nondecreasing, we can binary search for the largest x such that D′(P, x) ≤ 0. Then we return x as the solution. In every step of the binary search we use the median selection algorithm [4] to narrow down the current search range X = (x_left, x_right). At the beginning of every step:

  1. M is a multiset of all median candidates of P that are in X.

  2. S contains all points from P with at least one median candidate in M.

  3. r = D′(P ∖ S, x) for some x ∈ X.

We select the median x_m of M and compute D′(P, x_m). If D′(P, x_m) > 0, we continue the search in (x_left, x_m), and otherwise in (x_m, x_right), updating S and M accordingly. Eventually x_left = x_right and we return x_left.

The key observation is that when a point p is removed from S, it no longer has a median candidate within X, so D′(p, x) remains constant in all further computations. This means that D′(P ∖ S, x) is constant for all x ∈ X, and r can be updated in \(\mathcal {O}(1)\) time after removing each point p from S. The median x_m can be found in \(\mathcal {O}(|M|)\) time. Calculating D′(P, x_m) = r + D′(S, x_m) then takes \(\mathcal {O}(1+|S|)\) time. For a point p to be in S, one of its median candidates must belong to M, so |S| ≤ |M|. Hence the complexity of a single iteration is \(\mathcal {O}(|M|)\). After each iteration the size of M decreases by a factor of two, so the running time is described by \(T(n)=\mathcal {O}(n)+T(n/2)\), which solves to \(\mathcal {O}(n)\). □
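A shorter O(n log n) version of this search sorts the candidates and binary searches for the sign change of the right derivative, instead of the linear-time prune-and-search (an illustrative stand-in; names are ours):

```python
def one_median(P, a):
    """1-median on the line y = a*x in O(n log n): sort the median candidates
    and binary search for the sign change of the right derivative of D(P, x).
    A simpler stand-in for the linear-time prune-and-search of Theorem 1;
    P is a list of (x, y, w) with w > 0 and a in [0, 1]."""
    def dright(x):                      # right derivative of D(P, .) at x
        s = 0.0
        for px, py, w in P:
            s += w * (1.0 if x >= px else -1.0)
            if a > 0:
                s += w * a * (1.0 if a * x >= py else -1.0)
        return s
    cands = sorted(c for px, py, w in P
                   for c in ((px, py / a) if a > 0 else (px,)))
    lo, hi = 0, len(cands) - 1
    while lo < hi:                      # first candidate with slope >= 0
        mid = (lo + hi) // 2
        if dright(cands[mid]) >= 0:
            hi = mid
        else:
            lo = mid + 1
    return cands[lo]

print(one_median([(0.0, 0.0, 1.0), (10.0, 5.0, 1.0), (4.0, 2.0, 3.0)], 0.5))  # 4.0
```

Since D(P, x) is convex and piecewise linear with breakpoints only at candidates, the first candidate with nonnegative right derivative is a global minimum.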

Theorem 2

We can calculate D(P, x) for all \(x \in \mathcal {M}(P)\) in \(\mathcal {O}(n \log n)\) time.

Proof

The elements of \(\mathcal {M}(P)\) can be sorted in \(\mathcal {O}(n\log n)\) time, and we can assume that every point generates exactly two median candidates. Let \(\mathcal {M}(P)=\{x_{1},x_{2},\ldots ,x_{2n}\}\), where x_i ≤ x_{i+1} for all i = 1,2,…,2n − 1. Recall that D′(P, x) = D′(P, x_i) for any x ∈ (x_i, x_{i+1}). We compute D(P, x_1) together with D′(P, x_1) in \(\mathcal {O}(n)\) time. Then all other D(P, x_i) are computed sequentially for i = 2,3,…,2n in \(\mathcal {O}(1)\) time each using the formula:

$$\begin{array}{@{}rcl@{}} D(P,x_{i}) &=& D(P, x_{i-1}) + D^{\prime}(P, x_{i-1}) \cdot(x_{i} - x_{i-1}) \\ D^{\prime}(P, x_{i}) &=& D^{\prime}(P, x_{i-1}) + 2 \cdot w(p) \cdot \sigma \end{array} $$

where x_i is generated by the point p, σ = 1 if x_i = x(p), and σ = a otherwise. □
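The incremental formula translates directly into code; a sketch assuming a ∈ (0, 1] so that every point generates two candidates (names are ours):

```python
def all_candidate_costs(P, a):
    """Theorem 2 sketch: D(P, x) at every median candidate via the
    incremental formula, after O(n log n) sorting.  P is a list of (x, y, w);
    the facility line is y = a*x with a in (0, 1] (our assumption)."""
    cands = []                                  # (candidate, derivative jump)
    for px, py, w in P:
        cands.append((px, 2.0 * w))             # sigma = 1
        cands.append((py / a, 2.0 * w * a))     # sigma = a
    cands.sort()
    x0 = cands[0][0]
    D = sum(w * (abs(px - x0) + abs(py - a * x0)) for px, py, w in P)
    Dp = -sum(w * (1.0 + a) for _, _, w in P)   # slope left of all candidates
    out, prev = [], x0
    for x, jump in cands:
        D += Dp * (x - prev)                    # constant slope between candidates
        Dp += jump                              # derivative jump 2*w*sigma at x
        out.append((x, D))
        prev = x
    return out

# Each output pair matches a direct evaluation of D(P, x) at that candidate.
print(all_candidate_costs([(0.0, 2.0, 1.0), (3.0, 1.0, 2.0)], 0.5))
```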

5 Computing k-median

Consider now any optimal solution Q of the k-median problem for the given set of weighted points P. For any facility q ∈ Q, let P_q be the set of points of P assigned to q. By interchanging the order of summation, Q should minimize

$$\sum\limits_{q \in Q} \sum\limits_{p \in P_{q}} w(p) \cdot d(p, q). $$

Hence q must be an optimal solution of the 1-median problem for P_q. Since replacing q does not increase the sum of distances of points in P ∖ P_q, q can be chosen to be a median candidate of P_q. Because every median candidate of P_q is also a median candidate of P, we deduce that there exists an optimal solution Q such that \(Q \subseteq \mathcal {M}(P)\).

For \(k \geq \min (n, |\mathcal {M}(P)|)\), every p ∈ P can be assigned to its closest possible facility. Such an assignment can easily be computed in \(\mathcal {O}(n)\) time. If we are required to return exactly k medians, we simply add \(k - \min (n, |\mathcal {M}(P)|)\) unique facility points to the result; they will not be assigned to any of the demand points. From now on, we assume that \(k < \min (n, |\mathcal {M}(P)|)\). Thus there exists an optimal k-median solution where all facilities are 1-median candidates of P.

By arranging all median candidates in a sequence according to their x-coordinates, we can view choosing k facilities as selecting a (k + 1)-link path in a DAG between two artificial elements infinitely to the left and to the right of the sequence, called source and sink, respectively.

Imagine that we traverse the sequence from left to right while deciding whether we should open a new facility at the current median candidate, see Fig. 4. Initially, all points are assigned to the artificial facility source and the cost of the current solution S is set to +∞. If we decide to open a new facility at the current median candidate x_j, then for every p ∈ P we check if x_j is closer to p than the facility p is currently assigned to. If so, we reassign p to x_j, see Fig. 5.

Fig. 4: Path in the DAG ending at the candidate x_i. Dashed lines represent the current assignment of points from P to the closest chosen facility

Fig. 5: We follow the edge \({\left \langle i, j \right \rangle }\). All (black) points now assigned to x_j were previously assigned to x_i, see Fig. 4

We claim that p ∈ P can be closer to x_j than the facility p is currently assigned to only if the currently assigned facility is the most recently chosen facility x_i, that is, only if the current solution does not contain any facilities between x_i and x_j. Assuming that the claim holds, we define the weight of an edge 〈source, i〉 to be D(P, x_i), and the weight of an internal edge 〈i, j〉 to be the total decrease of the cost after giving each point p ∈ P the possibility to switch from x_i to x_j. Finally, the weight of an edge 〈j, sink〉 is 0. Then selecting k medians corresponds to selecting a (k + 1)-link path from source to sink in the DAG. However, we still need to prove the claim. To this end we consider the following properties of convex functions:

Proposition 1

For any convex function f and a < b < c:

  1. If f(c) < f(b) then f(b) < f(a).

  2. If f(c) < f(a) then f(b) < f(a).

Proof

Assume otherwise for either of the two implications. In both cases f(a) ≤ f(b) and f(c) < f(b), so the point B = (b, f(b)) lies strictly above the segment AC, where A = (a, f(a)) and C = (c, f(c)), contradicting the assumption that f is convex. □

Now we can prove the claim. Consider a point p ∈ P such that its currently assigned facility is x_i and, for some k > i, facility x_k was not selected as a better option. Then, for any j > k, facility x_j cannot be a better option either: because x_i < x_k < x_j, by Proposition 1.1 the inequality D(p, x_i) ≤ D(p, x_k) implies D(p, x_j) ≥ D(p, x_k). This means that if x_i was the most recently opened facility and x_j is the current median candidate, opening a new facility at x_j changes the total cost by

$$\sum\limits_{p \in P} \min({D}(p, x_{j}) - {D}(p, x_{i}), 0). $$

Definition 3

Let x_1, x_2,…, x_{2n−1}, x_{2n} be the sorted sequence of median candidates of P. We define its median DAG over the nodes 0,1,…,2n,2n + 1 with the edge weight function w(i, j) defined for i < j as follows:

$$\begin{array}{@{}rcl@{}} w(i,j) = \left\{\begin{array}{ll} \infty & \text{if } i=0 \text{ and } j=2n+1, \\ 0 & \text{if } i>0 \text{ and } j = 2n + 1, \\ {D}(P, x_{j}) & \text{if } i=0 \text{ and } j \in \{1,2,\ldots,2n\}, \\ \sum\limits_{p \in P} \min({D}(p, x_{j}) - {D}(p, x_{i}), 0) & \text{otherwise.} \end{array}\right. \end{array} $$

The total cost of any k-median solution is equal to the sum of weights on its corresponding path of length k + 1 between 0 and 2n + 1, so finding the k-median reduces to finding a minimum weight (k + 1)-link path in the median DAG.
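The whole reduction can be exercised naively: build the DAG weights of Definition 3 explicitly and run a layered dynamic program for the exact number of links (our illustrative sketch without the Monge machinery; names are ours):

```python
def k_median_via_dag(P, a, k):
    """Naive sketch of the reduction: build the median DAG of Definition 3
    explicitly (no Monge speed-up) and find a minimum weight (k+1)-link path
    from node 0 to node 2n+1 by layered dynamic programming.  P is a list of
    (x, y, w); the facility line is y = a*x with a in (0, 1]."""
    INF = float("inf")
    def D(p, x):
        px, py, w = p
        return w * (abs(px - x) + abs(py - a * x))
    xs = sorted(x for px, py, w in P for x in (px, py / a))
    m = len(xs)                                  # 2n candidates, nodes 0..m+1
    def w_edge(i, j):
        if i == 0 and j == m + 1:
            return INF
        if j == m + 1:                           # edge to the sink
            return 0.0
        if i == 0:                               # edge from the source
            return sum(D(p, xs[j - 1]) for p in P)
        return sum(min(D(p, xs[j - 1]) - D(p, xs[i - 1]), 0.0) for p in P)
    dist = [0.0] + [INF] * (m + 1)
    for _ in range(k + 1):                       # exactly k+1 links
        dist = [INF] + [min(dist[i] + w_edge(i, j) for i in range(j))
                        for j in range(1, m + 2)]
    return dist[m + 1]

print(k_median_via_dag([(0.0, 0.0, 1.0), (5.0, 1.0, 1.0), (9.0, 4.0, 2.0)], 0.5, 2))  # 7.0
```

For each point, the internal edge weights telescope to the minimum of D(p, ·) over the chosen facilities, so the path weight equals the assignment cost.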

5.1 Mongeness in k-median DAG

Because a sum of Monge functions is also Monge, to prove that w(i, j) is Monge we argue that each w_p(i, j) is Monge, where \(w(i,j) = \sum \limits _{p \in P} w_{p}(i,j)\) and for i < j:

$$\begin{array}{@{}rcl@{}} w_{p}(i,j) = \left\{\begin{array}{ll} \infty & \text{if } i=0 \text{ and } j=2n+1, \\ 0 & \text{if } i>0 \text{ and } j = 2n + 1, \\ {D}(p, x_{j}) & \text{if } i=0 \text{ and } j \in \{1,2,\ldots,2n\}, \\ \min({D}(p, x_{j}) - {D}(p, x_{i}), 0) & \text{otherwise.} \end{array}\right. \end{array} $$

Proposition 2

For any convex function f, if a < b < c then:

$$\min(f(c) - f(a), 0) \leq \min(f(c) - f(b), 0). $$

Proof

If f(c) ≥ f(b) then the right side of the inequality is equal to 0 and the left side is non-positive. If f(c) < f(b) then by Proposition 1.1 also f(b) < f(a), so

$$\min(f(c)-f(a),0) \leq f(c)-f(a) < f(c)-f(b) = \min(f(c)-f(b),0) $$

so the claim holds. □

Proposition 3

For any convex function f, if a < b < c then

$$f(b) + \min(f(c) - f(a), 0) \leq f(c) + \min(f(b) - f(a), 0). $$

Proof

If f(b) ≥ f(a) then by Proposition 1.1 also f(c) ≥ f(b). Hence also f(c) ≥ f(a) and

$$f(b) + \min(f(c) - f(a), 0) = f(b) \leq f(c) = f(c) + \min(f(b)-f(a), 0) $$

so the property holds. Otherwise, f(b) < f(a) and the property becomes

$$f(b)+\min(f(c) - f(a), 0) \leq f(c) + f(b) - f(a) $$

which is always true due to min(f(c) − f(a),0) ≤ f(c) − f(a). □

Proposition 4

For any convex function f, if a < b < c < d then

$$\min(f(c) - f(a), 0) + \min(f(d) - f(b), 0) \leq \min(f(d) - f(a), 0) + \min(f(c) - f(b), 0). $$

Proof

If f(d) ≥ f(a), then

$$\min(f(d) - f(b), 0) \leq 0 = \min(f(d) - f(a), 0). $$

Combined with Proposition 2 applied to a < b < c we obtain the claim. Otherwise, f(d) < f(a) and by Proposition 1.2 applied to a < c < d also f(c) < f(a), so the property becomes

$$f(c) + \min(f(d) - f(b), 0) \leq f(d) + \min(f(c) - f(b), 0) $$

which holds by Proposition 3 applied to b < c < d. □
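Propositions 2-4 are easy to spot-check numerically on a sample convex function (an illustrative check; the choice of f is ours):

```python
# Quick numeric sanity check of Propositions 2-4 on a convex function.
f = lambda x: (x - 2.0) ** 2              # any convex function will do
for a, b, c, d in [(0, 1, 3, 5), (1, 2, 3, 4), (-2, 0, 1, 6)]:
    # Proposition 2 (a < b < c)
    assert min(f(c) - f(a), 0) <= min(f(c) - f(b), 0)
    # Proposition 3 (a < b < c)
    assert f(b) + min(f(c) - f(a), 0) <= f(c) + min(f(b) - f(a), 0)
    # Proposition 4 (a < b < c < d)
    assert (min(f(c) - f(a), 0) + min(f(d) - f(b), 0)
            <= min(f(d) - f(a), 0) + min(f(c) - f(b), 0))
print("all inequalities hold")
```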

Based on these properties we can show that:

Theorem 3

For any point p, w_p(i, j) is concave Monge.

Proof

Following Definition 2, consider any s, t, u, v ∈ [0,2n + 1] such that s < t < u < v. We need to prove that for any pP:

$$w_{p}(s,u) + w_{p}(t,v) \leq w_{p}(s,v) + w_{p}(t, u). $$

Case 1

s = 0 and v = 2n + 1

Straightforward, since w_p(s, v) = ∞ and all other edges have finite weights.

Case 2

s > 0 and v = 2n + 1

$$\begin{array}{@{}rcl@{}} w_{p}(s,u) + w_{p}(t,v) &=& w_{p}(s,u) + 0 \\ & =& \min({D}(p, u) - {D}(p, s), 0) \\ & \overset{2}{\leq}& \min({D}(p, u) - {D}(p, t), 0) \\ & = & 0 + w_{p}(t,u) \\ &=& w_{p}(s,v) + w_{p}(t,u) \end{array} $$

Case 3

s = 0 and v < 2n + 1

$$\begin{array}{@{}rcl@{}} w_{p}(s,u) + w_{p}(t,v) &=& {D}(p, u) + \min({D}(p, v) - {D}(p, t), 0) \\ &\overset{3}{\leq}& {D}(p, v) + \min({D}(p, u) - {D}(p, t), 0) \\ &=& w_{p}(s,v) + w_{p}(t,u) \end{array} $$

Case 4

s > 0 and v < 2n + 1

$$\begin{array}{@{}rcl@{}} w_{p}(s,u) + w_{p}(t,v) &=& \min({D}(p, u) - {D}(p, s), 0) + \min({D}(p, v) - {D}(p, t), 0) \\ &\overset{4}{\leq}& \min({D}(p, v) - {D}(p, s), 0) + \min({D}(p, u) - {D}(p, t), 0) \\ &=& w_{p}(s,v) + w_{p}(t,u) \end{array} $$

So in all cases w p (s, u) + w p (t, v) ≤ w p (s, v) + w p (t, u) and hence w p (i, j) is concave Monge. □

5.2 Our approach

In order to apply the known algorithms for finding a minimum weight k-link path to the k-median problem, we need to answer queries for w(i, j).

Lemma 4

After \(\mathcal {O}(n\log n)\) time and space preprocessing, we can answer queries for w(i, j) in \(\mathcal {O}(\log n)\) time per query.

Proof

All edges from the source can be computed in \(\mathcal {O}(n \log n)\) time via Theorem 2. All edges to the sink have zero weight. It remains to show how to calculate the weight of an internal edge 〈i, j〉. Consider the set of points p ∈ P that are closer to x_j than to x_i (Fig. 6):

$$V(i,j) = {\left\{ (x,y) \in P : |x - x_{i}| + |y - a\cdot x_{i}| > |x - x_{j}| + |y - a\cdot x_{j}| \right\}} $$

By definition, w(i, j) = D(V(i, j), x_j) − D(V(i, j), x_i). We describe how to compute D(V(i, j), x_i); D(V(i, j), x_j) can be computed using the formula:

$$D(V(i,j), x_{j}) = D(P, x_{j}) - D(P \setminus V(i,j), x_{j}) $$

where D(P, x_j) is the already preprocessed weight of the edge 〈source, j〉, and D(P ∖ V(i, j), x_j) can be calculated by rotating the plane by 180° and using the same method as the one described below.

Fig. 6: Voronoi diagram in L1 for the two generating points (x_i, a·x_i) and (x_j, a·x_j). Black points to the right of the border represent the set V(i, j) of demand points from P closer to the median candidate x_j

First we argue that if (x, y) ∈ V(i, j) then x > x_i. Otherwise

$$|y-a\cdot x_{i}| - |y-a\cdot x_{j}| > x_{j} - x_{i} \geq a\cdot x_{j} - a\cdot x_{i} \geq 0 $$

and we obtain a contradiction in each of the three cases:

  1. y < a·x_i: the inequality becomes a·x_i − a·x_j > 0, but x_i < x_j.

  2. y ∈ [a·x_i, a·x_j): the inequality becomes 2y > 2a·x_j, but y < a·x_j.

  3. y > a·x_j: the inequality becomes a·x_j − a·x_i > a·x_j − a·x_i.

We partition V(i, j) into V_1(i, j) and V_2(i, j) with the horizontal line y = a·x_i:

$$\begin{array}{@{}rcl@{}} V_{1}(i,j) &=& V(i,j) \cap \{(x,y): y \geq a\cdot x_{i} \} \\ V_{2}(i,j) &=& V(i,j) \cap \{(x,y): y < a\cdot x_{i} \}. \end{array} $$

The median candidate (x_i, a·x_i) lies to the left of and below all points in V_1(i, j), and to the left of and above all points in V_2(i, j). Consider the minimum area rectangle enclosing P with sides parallel to the coordinate axes, and enumerate its corners clockwise starting from the top left as c_1, c_2, c_3, c_4. In the L1 metric, one of the shortest routes from any point in V_1(i, j) to the bottom left corner c_4 goes via x_i, see Fig. 7. Therefore our desired sum of distances to x_i can be expressed with respect to c_4 as:

$$\begin{array}{@{}rcl@{}} {D}(V_{1}(i,j), x_{i}) &=& \sum\limits_{p \in V_{1}(i,j)} w(p) \cdot d(p, (x_{i}, a\cdot x_{i})) \\ &=& \left( \sum\limits_{p \in V_{1}(i,j)} w(p) \cdot d(p, c_{4})\right) - \left( d(c_{4}, (x_{i}, a\cdot x_{i})) \cdot \sum\limits_{p \in V_{1}(i,j)} w(p)\right). \end{array} $$

Similarly, one of the shortest routes from any point in V_2(i, j) to c_1 goes via x_i:

$$\begin{array}{@{}rcl@{}} {D}(V_{2}(i,j), x_{i}) =\left( \sum\limits_{p \in V_{2}(i,j)} w(p) \cdot d(p, c_{1})\right) - \left( d(c_{1}, (x_{i}, a\cdot x_{i})) \cdot \sum\limits_{p \in V_{2}(i,j)} w(p)\right). \end{array} $$
Fig. 7: Shortest routes in L1 from p_2 to c_1 and from p_1 to c_4, passing through the median candidate x_i

The distances d(c_1, (x_i, a·x_i)) and d(c_4, (x_i, a·x_i)) can be computed in \(\mathcal {O}(1)\) time. The expressions \({\sum }_{p \in V_{2}(i,j)} w(p) \cdot d(p, c_{1})\) and \({\sum }_{p \in V_{2}(i,j)} w(p)\) can be evaluated in \(\mathcal {O}(\log n)\) time with orthogonal queries. To calculate \({\sum }_{p \in V_{1}(i,j)} w(p) \cdot d(p, c_{4})\) and \({\sum }_{p \in V_{1}(i,j)} w(p)\), we represent V_1(i, j) as the difference between sets V_3(i, j) and V_4(i, j), see Fig. 8, where δ_x = x_j − x_i, δ_y = a·(x_j − x_i) and

$$\begin{array}{@{}rcl@{}} V_{3}(i,j) &=& \left\{ (x,y) \in P : y \geq ax_{i} \wedge \left( x+ y > \frac{(\delta_{x} + \delta_{y})}{2} \right)\right\} \\ V_{4}(i,j) &=& \left\{ (x,y) \in P : x \leq x_{i} + \delta_{x} - \delta_{y} \wedge \left( x+ y >\frac{(\delta_{x} + \delta_{y})}{2} \right)\right\} \\ V_{1}(i,j) &=& V_{3}(i,j) \setminus V_{4}(i,j) . \end{array} $$

Now each of V_2(i, j), V_3(i, j) and V_4(i, j) is defined by an intersection of two half-planes. By transforming every point p ∈ P into (x(p) + y(p), y(p)) for V_3(i, j) and into (x(p) + y(p), x(p)) for V_4(i, j), we can assume that the lines defining the half-planes are parallel to the coordinate axes. Hence each sum can be calculated with orthogonal queries in \(\mathcal {O}(\log n)\) time after \(\mathcal {O}(n\log n)\) time and space preprocessing by Lemma 2. □

Fig. 8: V_1 (from Fig. 7) represented as the gray V_3 minus the striped V_4

We have reduced the line-constrained k-median problem in L1 to the minimum weight k-link path problem. The weight of any edge can be retrieved in \(\mathcal {O}(\log n)\) time by decomposing it into a constant number of orthogonal queries. By plugging in an appropriate algorithm for the minimum weight k-link path problem, we obtain the final theorem.

Theorem 4

We can solve the line-constrained k-median problem in L1 and L∞ in \(\mathcal {O}(kn\log n)\) time or, if k = Ω(log n), in \(n2^{\mathcal {O}(\sqrt {\log k \log \log n})} \log n\) time.