Keywords

1 Introduction

The uncapacitated facility location problem (UFL) has numerous applications in operations management and computer science. In this problem, we are given a set of facilities and a set of clients. Every facility has an associated opening cost, and every facility-client pair has an associated connection cost which is proportional to the distance between the facility and client. The aim is to open some facilities and connect every client to an opened facility so as to minimize the total opening and connection cost. Since the UFL is a well-known NP-hard problem, researchers pay attention on designing approximation algorithms for it and its generalizations [3, 8, 10, 12, 15, 16, 18, 19]. For a minimization problem, a \(\lambda \)-approximation algorithm is a polynomial time algorithm which can output a solution for any instance of the problem, such that the cost of the solution is within a factor of \(\lambda \) of the cost of an optimal solution. For the UFL, under the assumption that the connection costs are metric (i.e., the connection costs are non-negative, symmetric and satisfy the triangle inequality), Li [12] presents the current best 1.488-approximation algorithm and assume that P\( \not =\)NP Sviridenko [16] gives the 1.463-hardness of approximation.

However, in many real-life situations, the facility expects to be connected by a minimum number of clients for the profitable sake. In fact, the lower-bounded facility location problem (LBFL) characterizes these scenarios. Besides, the motivation of the lower bound constraints also comes from a data privacy perspective [1]. Compared with the UFL, in the LBFL, every facility is given an additional lower bound on the minimum number of clients that must be connected to the facility if it is opened. The goal is to find some facilities to open and connect every client to an opened facility without violating any lower bound constraints, such that the total opening as well as connection cost is minimized. The LBFL is introduced by Guha et al. [7] and Karger and Minkoff [11] simultaneously. Both give an O(1)-bi-criteria approximation algorithm that approximately subjects to the lower bound constraints. For the special case of the LBFL where the lower bound of every facility is the same, by reducing the LBFL to the capacitated facility location problem (CFL), Svitkina [17] proposes the first true 448-approximation algorithm. Later, Ahmadian and Swamy [2] improve the approximation ratio to 82.6. For the general case of the LBFL where every facility has its own lower bound, Li [13] also reduces the LBFL to the CFL and gives the breakthrough true approximation algorithm which has a ratio of 4000.

When every facility in the UFL does not have an opening cost and the aim becomes to find at most k facilities to open and connect every client to some opened facility so as to minimize the sum of connection costs, we get the classical k-median problem. The k-median is another well-studied NP-hard problem and has various generalizations. [3,4,5,6, 9, 10, 14, 18, 19]. For the k-median, under the assumption that the connection costs are metric, Byrka et al. [5] present the state-of-art \((2.675+\epsilon )\)-approximation algorithm and assume that NP\(\not \subseteq \)DTIME(\(n^{O(\log \log n)}\)) Jain et al. [9] offer the 1.736-hardness of approximation. Despite the fact that many meaningful and interesting general versions of the k-median have been considered in the literatures, to the best of our knowledge, very little work concentrates on studying the k-median with lower bounds. This situation stimulates us to pay attention on the lower-bounded k-median problem (LB k-median). Compared with the k-median, the LB k-median has extra lower bound constraints which need to be respected.

In this paper, we study the LB k-median and its generalizations. First, inspired by the previous works of Guha et al. [7] and Karger and Minkoff [11] on the LBFL, we propose our main O(1)-bi-criteria approximation algorithm for the LB k-median, which satisfies the lower bound constraints by a factor of some given constant \(\alpha \in [0, 1)\) and has an approximation ratio of \(\frac{1+\alpha }{1-\alpha } \rho \), where \(\rho \) is the current best approximation ratio for the k-facility location problem (k-FL). The key idea behind the main algorithm relies on an observation that constructing and solving a new instance of the k-FL instead of the original instance of the LB k-median can easily obtain a solution respects the cardinality constraint, and then trying to guarantee every facility is connected by a certain amount of clients can give us a bi-criteria solution. Second, we extend the main algorithm to several generalizations of the LB k-median, including the lower-bounded k-facility location problem (LB k-FL), the lower-bounded knapsack median problem (LB knapsack median) and the prize-collecting lower-bounded k-median problem (PLB k-median). The algorithms for these generalizations involve constructing and solving new instances of the k-FL, the knapsack facility location problem (knapsack FL) and the prize-collecting k-facility location problem (P k-FL), respectively. Last but not least, we give the relationships between the given constant \(\alpha \) and the approximation ratios of all the algorithms to demonstrate their performances. Particularly, we show that our algorithm for the LB k-median can give a nice approximation ratio while violating the lower bound constraints within an acceptable range.

The remainder of our paper is structured as follows. Section 2 presents our main O(1)-bi-criteria approximation algorithm for the LB k-median. Section 3 extends the main algorithm to several general versions of the LB k-median. Section 4 demonstrates the performances of our algorithms. Due to space constraint, all proofs are removed but will further appear in a full version of this paper.

2 The Lower-Bounded k-median Problem

In this section, we present an O(1)-bi-criteria approximation algorithm for the LB k-median. Subsection 2.1 describes the LB k-median and the relevant k-FL along with their integer programs. Subsection 2.2 presents our main algorithm for the LB k-median and its analysis.

2.1 Preliminaries for the LB k-median

In the LB k-median, we are given a set of facilities \(\mathcal {F}\), a set of clients \(\mathcal {D}\) and an integer k. Every facility \(i \in \mathcal {F}\) has an associated lower bound \(L_i\) on the minimum number of clients in \(\mathcal {D}\) that must be connected to the facility if it is opened. Every facility-client pair (ij), where \(i \in \mathcal {F}\) and \(j \in \mathcal {D}\), has an associated connection cost \(c_{ij}\) which is proportional to the distance between facility i and client j. Under the assumption that the connection costs are metric, the goal is to find at most k facilities to open and connect every client to some opened facility, such that the total connection cost is minimized.

The LB k-median can be formulated as the following integer program:

$$\begin{aligned} \mathrm{\min }&\sum \limits _{i \in \mathcal {F}} \sum \limits _{j \in \mathcal {D}} c_{ij} x_{ij} \end{aligned}$$
(1)
$$\begin{aligned} \mathrm{s.\ t.}&\sum \limits _{i \in \mathcal {F}} x_{ij} \ge 1, \qquad \qquad \qquad ~~~~~~ \forall j \in \mathcal {D}, \end{aligned}$$
(2)
$$\begin{aligned}&x_{ij} \le y_i, \qquad \qquad \qquad \qquad ~~~\forall i \in \mathcal {F}, j \in \mathcal {D}, \end{aligned}$$
(3)
$$\begin{aligned}&\sum \limits _{j \in \mathcal {D}} x_{ij} \ge L_i y_i, \qquad \qquad \qquad \forall i \in \mathcal {F}, \end{aligned}$$
(4)
$$\begin{aligned}&\sum \limits _{i \in \mathcal {F}} y_{i} \le k, \end{aligned}$$
(5)
$$\begin{aligned}&x_{ij} \in \{0, 1\}, \qquad \qquad ~~ \qquad ~~~~ \forall i \in \mathcal {F}, j \in \mathcal {D}, \end{aligned}$$
(6)
$$\begin{aligned}&y_i \in \{0, 1\}, \qquad \qquad \qquad \qquad \forall i \in \mathcal {F}. \end{aligned}$$
(7)

In program (17), there are two types of variables (\(\{x_{ij}\}_{i \in \mathcal {F}, j \in \mathcal {D}}\), \(\{y_{i}\}_{i \in \mathcal {F}}\)). The variable \(x_{ij}\) indicates whether client j is connected to facility i for any facility-client pair (ij) where \(i \in \mathcal {F}\) and \(j \in \mathcal {D}\). The variable \(y_{i}\) indicates whether facility i is opened for any facility \(i \in \mathcal {F}\). The objective function describes the total connection cost. The constraints (2) say that every client \(j \in \mathcal {D}\) must be connected to some facility. The constraints (3) state that if a client j is connected to some facility \(i \in \mathcal {F}\), then the facility must be opened. The constraints (4) guarantee that the lower bound of any opened facility cannot be violated. The constraint (5) shows that the number of opened facilities can not exceed k.

When every facility \(i \in \mathcal {F}\) in the LB k-median has an associated opening cost \(f_i\) instead of the lower bound \(L_i\) and the aim becomes to find at most k facilities to open and connect every client to some opened facility so as to minimize the sum of opening costs as well as connection costs, we get the k-FL. By introducing the same variables (\(\{x_{ij}\}_{i \in \mathcal {F}, j \in \mathcal {D}}\), \(\{y_{i}\}_{i \in \mathcal {F}}\)), as in the integer program (17), the k-FL can be formulated as the following integer program:

$$\begin{aligned} \mathrm{\min }&\sum \limits _{i \in \mathcal {F}} f_i y_i + \sum \limits _{i \in \mathcal {F}} \sum \limits _{j \in \mathcal {D}} c_{ij} x_{ij} \end{aligned}$$
(8)
$$\begin{aligned} \mathrm{s.\ t.}&\sum \limits _{i \in \mathcal {F}} x_{ij} \ge 1, \qquad \qquad \qquad ~~~~~~ \forall j \in \mathcal {D}, \end{aligned}$$
(9)
$$\begin{aligned}&x_{ij} \le y_i, \qquad \qquad \qquad \qquad ~~~\forall i \in \mathcal {F}, j \in \mathcal {D}, \end{aligned}$$
(10)
$$\begin{aligned}&\sum \limits _{i \in \mathcal {F}} y_{i} \le k, \end{aligned}$$
(11)
$$\begin{aligned}&x_{ij} \in \{0, 1\}, \qquad \qquad ~~ \qquad ~~~~ \forall i \in \mathcal {F}, j \in \mathcal {D}, \end{aligned}$$
(12)
$$\begin{aligned}&y_i \in \{0, 1\}, \qquad \qquad \qquad \qquad \forall i \in \mathcal {F}. \end{aligned}$$
(13)

In program (813), the objective function consists of opening costs and connection costs.

Note that we have the following observation.

Lemma 1

From program (17) and program (813), it is clear that with the same inputs of \(\mathcal {F}\), \(\mathcal {D}\) and k, any feasible solution for the LB k-median is also a feasible solution for the k-FL.

2.2 Algorithm for the LB k-median

For the LB k-median, we propose a bi-criteria approximation algorithm, that, for any given constant \(\alpha \in [0, 1)\), outputs a solution in which every opened facility i is connected by at least \(\alpha L_i\) clients and has a constant approximation ratio of \( \frac{1+\alpha }{1-\alpha } \rho \), where \(\rho \) is the current best approximation ratio for the k-FL. Our main algorithm for the LB k-median consists of three steps. First of all, from the instance \(\mathcal {IN}\) of the LB k-median, we construct a new instance \(\mathcal {IN}^\prime \) of the k-FL. Secondly, we apply existing approximation algorithm for the k-FL to solve the instance \(\mathcal {IN}^\prime \) and obtain a solution \((S^\prime , \sigma ^\prime )\) where \(S^\prime \) is the set of opened facilities and \(\sigma ^\prime : \mathcal {D} \rightarrow S^\prime \) is the corresponding connections of clients in \(\mathcal {D}\) to facilities in \(S^\prime \). Finally, we continually close some facility in \(S^\prime \) and reconnect its clients to obtain a new solution \((S, \sigma )\) which connects at least \(\alpha L_i\) clients to every opened facility \(i \in S\).

For any facility \(i \in \mathcal {F}\), denote \(\mathcal {D}_i\) as the set of closest \(L_i\) clients to it in \(\mathcal {D}\). Now we are ready to present our main algorithm.

Algorithm 1

 

  • Step 1 Construct a new instance of the k -FL.

    • For the instance \(\mathcal {IN} = (\mathcal {F}, \mathcal {D}, k, \{L_i\}_{i \in \mathcal {F}}, \{c_{ij}\}_{i\in \mathcal {F}, j \in \mathcal {D}})\) of the LB k-median, pick a constant \(\alpha \in [0, 1)\), get rid of the lower bounds \(\{L_i\}_{i \in \mathcal {F}}\) from \(\mathcal {IN}\) and

      $$set~f_i:= \frac{2\alpha }{1-\alpha } \sum \limits _{j \in \mathcal {D}_i}c_{ij}~for~every~i \in \mathcal {F},$$

      in order to obtain a new instance \(\mathcal {IN}^\prime = (\mathcal {F}, \mathcal {D}, k, \{f_i\}_{i \in \mathcal {F}}, \{c_{ij}\}_{i\in \mathcal {F}, j \in \mathcal {D}})\) of the k-FL.

  • Step 2 Solve the instance of the k -FL.

    • Solve new instance \(\mathcal {IN}^\prime \) with the current best \(\rho \)-approximation algorithm for the k-FL (see [19]), where \(\rho = 2+ \sqrt{3} + \epsilon \), and obtain a feasible solution \((S^\prime , \sigma ^\prime )\), where \(S^\prime \) is the set of opened facilities and \(\sigma ^\prime : \mathcal {D} \rightarrow S^\prime \) is a function that maps every client \(j \in \mathcal {D}\) to the closest facility in \(S^\prime \). For any client \(j \in \mathcal {D}\), let \(\sigma ^\prime (j)\) denote its closest facility in \(S^\prime \).

  • Step 3 Construct a solution for the LB k -median.

    • Step 3.1 Initialization.

      At the very begining, set \(S:= S^\prime \) and \(\sigma (j):=\sigma ^\prime (j)\) for any \(j \in \mathcal {D}\), define \(l_i := |j \in \mathcal {D}: \sigma (j) = i |\) for any \(i \in \mathcal {F}\) and \(S_d := \{i \in S : l_i < \alpha L_i\}\).

    • Step 3.2 Close facilities and reconnect clients.

      While \(S_d \not = \emptyset \) do

      • Arbitrarily choose some facility \(i \in S_d\) and close it. For every client j with \(\sigma (j) = i\), reconnect it to its closest facility \(i^\prime \) in \(S \setminus \{i\}\) and update \(\sigma (j):= i^\prime \). Update \(S:=S \setminus \{i\}\). Update \(l_i\) for any facility \(i \in \mathcal {F}\) and \(S_d\).

    • Output solution \((S, \sigma )\).

Algorithm 1 provides a solution \((S, \sigma )\), where S is the set of opened facilities and \(\sigma : \mathcal {D} \rightarrow S\) denotes the corresponding connections between clients in \(\mathcal {D}\) and facilities in S, for the LB k-median. For any client \(j \in \mathcal {D}\), let \(\sigma (j)\) denote the facility which is connected by j in solution \((S, \sigma )\).

The following theorem presents our main result for the LB k-median.

Theorem 1

Algorithm 1 is a bi-criteria approximation algorithm for the LB k-median that produces a solution \((S, \sigma )\), which connects at least \(\alpha L_i\) clients to every opened facility \(i \in S\), and has an approximation ratio of \(\frac{1+\alpha }{1-\alpha } \rho \) where \(\alpha \) is a given constant in interval [0, 1) and \(\rho \) is the current best approximation ratio of \(2+ \sqrt{3} + \epsilon \) for the k-FL.

Because of Step 3.2 in Algorithm 1, it is not hard to see that

$$|j \in \mathcal {D} : \sigma (j) = i| = l_i \ge \alpha L_i~\mathrm{for~any~}i \in S,$$

which means the solution \((S, \sigma )\) connects at least \(\alpha L_i\) clients to every opened facility \(i \in S\). The remainder of this section will put focus on analyzing the approximation ratio of our algorithm for the LB k-median.

Suppose that \((S^*, \sigma ^*)\) is the optimal solution for the instance \(\mathcal {IN}\) of the LB k-median, where \(S^*\) is the optimal set of opened facilities and \(\sigma ^*: \mathcal {D} \rightarrow S^*\) denotes the optimal corresponding connections. Let \(OPT_{lk}\) be the total cost of the solution \((S^*, \sigma ^*)\) for \(\mathcal {IN}\), i.e., \(OPT_{lk} = \sum _{j \in \mathcal {D}} c_{\sigma ^*(j)j}\). For every client \(j \in \mathcal {D}\), denote \(\sigma ^*(j)\) as the facility which is connected by j in solution \((S^*, \sigma ^*)\). In order to provide the approximation ratio of Algorithm 1, the following lemmas are essential.

Lemma 2

The total cost of the solution \((S^\prime , \sigma ^\prime )\) for the instance \(\mathcal {IN}^\prime \) of the k-FL is within a factor of \(\frac{1+\alpha }{1-\alpha } \rho \) of the total cost of the optimal solution \((S^*, \sigma ^*)\) for the instance \(\mathcal {IN}\) of the LB k-median, i.e.,

$$\sum \limits _{i \in S^\prime } f_i + \sum \limits _{j \in \mathcal {D}} c_{\sigma ^\prime (j)j} \le \frac{1+\alpha }{1-\alpha } \rho \cdot OPT_{lk},$$

where \(\alpha \in [0,1)\) and \(\rho =2+ \sqrt{3} + \epsilon \).

Lemma 3

The total cost of the solution \((S, \sigma )\) for the instance \(\mathcal {IN}\) of the LB k-median is no more than the total cost of the solution \((S^\prime , \sigma ^\prime )\) for the instance \(\mathcal {IN}^\prime \) of the k-FL, i.e.,

$$\sum \limits _{j \in \mathcal {D}} c_{\sigma (j)j} \le \sum \limits _{i \in S^\prime } f_i + \sum \limits _{j \in \mathcal {D}} c_{\sigma ^\prime (j)j}.$$

Integrating Lemma 2 with Lemma 3 implies the approximation ratio of Algorithm 1.

3 Generalizations of the Lower-Bounded k-median Problem

In this section, by extending our main algorithm to several more general versions of the LB k-median, we show the versatility of Algorithm 1. Subsection 3.1, 3.2 and 3.3 present algorithms for the LB k-FL, LB knapsack median and PLB k-median through altering only the first step, the first two steps and all the steps in Algorithm 1, respectively.

3.1 The Lower-Bounded k-facility Location Problem

Compared with the LB k-median, in the LB k-FL, every facility \(i \in \mathcal {F}\) is given an additional opening cost \(f_i\). The aim is to open at most k facilities and connect every client to some opened facility, such that the total opening and connection cost is minimized.

The algorithm for the LB k-FL is obtained by only modifying the first step in Algorithm 1 slightly.

Algorithm 2

 

  • Step 1 Construct a new instance of the k -FL.

    • For the instance \(\mathcal {IN} = (\mathcal {F}, \mathcal {D}, k, \{L_i\}_{i \in \mathcal {F}},\{f_i\}_{i \in \mathcal {F}}, \{c_{ij}\}_{i\in \mathcal {F}, j \in \mathcal {D}})\) of the LB k-FL, pick a constant \(\alpha \in [0, 1)\), get rid of the lower bounds \(\{L_i\}_{i \in \mathcal {F}}\) from \(\mathcal {IN}\) and

      $$set~f_i^\prime := f_i + \frac{2\alpha }{1-\alpha } \sum \limits _{j \in \mathcal {D}_i}c_{ij}~for~every~i \in \mathcal {F},$$

      in order to obtain a new instance \(\mathcal {IN}^\prime = (\mathcal {F}, \mathcal {D}, k, \{f_i^\prime \}_{i \in \mathcal {F}}, \{c_{ij}\}_{i\in \mathcal {F}, j \in \mathcal {D}})\) of the k-FL.

  • Step 2 Solve the instance of the k -FL.

    • Same as Step 2 in Algorithm 1. Solve new instance \(\mathcal {IN}^\prime \) with the current best \(\rho \)-approximation algorithm for the k-FL (see [19]), where \(\rho = 2+ \sqrt{3} + \epsilon \), and obtain a feasible solution \((S^\prime , \sigma ^\prime )\).

  • Step 3 Construct a solution for the LB k -FL.

    • Same as Step 3 in Algorithm 1. At the end of this step, output solution \((S, \sigma )\).

The following theorem offers the result for the LB k-median.

Theorem 2

Algorithm 2 is a bi-criteria approximation algorithm for the LB k-FL that produces a solution \((S, \sigma )\), which connects at least \(\alpha L_i\) clients to every opened facility \(i \in S\), and has an approximation ratio of \(\frac{1+\alpha }{1-\alpha } \rho \) where \(\alpha \in [0,1)\) and \(\rho =2+ \sqrt{3} + \epsilon \).

We skip the proof of this theorem since it is similar to the one for the LB k-median.

3.2 The Lower-Bounded Knapsack Median Problem

Compared with the LB k-median, in the LB knapsack median, the cardinality constraint is replaced with a knapsack constraint. More specifically, we are given some budget B instead of the integer k. Every facility \(i \in \mathcal {F}\) has a weight \(w_i\). We want to open a subset \(S \subseteq \mathcal {F}\) of facilities which subjects to \(\sum _{i \in S} w_i \le B\), and connect every client to some opened facility, so as to minimize the total connection cost.

The algorithm for the LB knapsack median is offered by changing the first two steps in Algorithm 1.

Algorithm 3

 

  • Step 1 Construct a new instance of the knapsack FL.

    • For the instance \(\mathcal {IN} = (\mathcal {F}, \mathcal {D}, B, \{L_i\}_{i \in \mathcal {F}},\{w_i\}_{i \in \mathcal {F}},\{c_{ij}\}_{i\in \mathcal {F}, j \in \mathcal {D}})\) of the LB knapsack median, pick a constant \(\alpha \in [0, 1)\), get rid of the lower bounds \(\{L_i\}_{i \in \mathcal {F}}\) from \(\mathcal {IN}\) and

      $$set~f_i:= \frac{2\alpha }{1-\alpha } \sum \limits _{j \in \mathcal {D}_i}c_{ij}~for~every~i \in \mathcal {F},$$

      in order to obtain instance \(\mathcal {IN}^\prime = (\mathcal {F}, \mathcal {D}, B, \{f_i\}_{i \in \mathcal {F}}, \{w_i\}_{i \in \mathcal {F}}, \{c_{ij}\}_{i\in \mathcal {F}, j \in \mathcal {D}})\) of the knapsack FL.

  • Step 2 Solve the instance of the knapsack FL.

    • Solve new instance \(\mathcal {IN}^\prime \) with the current best \(\eta \)-approximation algorithm for the knapsack FL (see [4]), where \(\eta = 17.46 + \epsilon \), and obtain a feasible solution \((S^\prime , \sigma ^\prime )\).

  • Step 3 Construct a solution for the LB knapsack median.

    • Same as Step 3 in Algorithm 1. At the end of this step, output solution \((S, \sigma )\).

The following theorem gives the result for the LB knapsack median.

Theorem 3

Algorithm 3 is a bi-criteria approximation algorithm for the LB knapsack median that produces a solution \((S, \sigma )\), which connects at least \(\alpha L_i\) clients to every opened facility \(i \in S\), and has an approximation ratio of \(\frac{1+\alpha }{1-\alpha } \eta \) where \(\alpha \in [0,1)\) and \(\eta =17.46 + \epsilon \).

We skip the proof of this theorem since it is analogous to the one for the LB k-median.

3.3 The Prize-Collecting Lower-Bounded k-median Problem

Compared with the LB k-median, in the PLB k-median, every client \(j \in \mathcal {D}\) is given an additional penalty cost \(p_j\). Our goal is to select at most k facilities to open, connect a portion of the clients and penalize the rest of them, so as to minimize the sum of opening, connection and penalty costs.

The algorithm for the PLB k-median is given by transforming all the steps in Algorithm 1.

Algorithm 4

 

  • Step 1 Construct a new instance of the P k -FL.

    • For the instance \(\mathcal {IN} = (\mathcal {F}, \mathcal {D}, k, \{L_i\}_{i \in \mathcal {F}},\{p_j\}_{j \in \mathcal {D}},\{c_{ij}\}_{i\in \mathcal {F}, j \in \mathcal {D}})\) of the PLB k-median, pick a constant \(\alpha \in [0, 1)\), get rid of the lower bounds \(\{L_i\}_{i \in \mathcal {F}}\) from \(\mathcal {IN}\) and

      $$set~f_i:= \frac{1+\alpha }{1-\alpha } \sum \limits _{j \in \mathcal {D}_i}c_{ij}~for~every~i \in \mathcal {F},$$

      in order to obtain instance \(\mathcal {IN}^\prime = (\mathcal {F}, \mathcal {D}, k, \{f_i\}_{i \in \mathcal {F}}, \{p_j\}_{j \in \mathcal {D}}, \{c_{ij}\}_{i\in \mathcal {F}, j \in \mathcal {D}})\) of the P k-FL.

  • Step 2 Solve the instance of the P k -FL.

    • Solve new instance \(\mathcal {IN}^\prime \) with the current best \(\theta \)-approximation algorithm for the P k-FL (see [18]), where \(\theta = 2+ \sqrt{3} + \epsilon \), and obtain a feasible solution \((S^\prime , P^\prime , \sigma ^\prime )\), where \(S^\prime \) is the set of opened facilities, \(P^\prime \) is the set of penalized clients and \(\sigma ^\prime : \mathcal {D} \setminus P^\prime \rightarrow S^\prime \) is a function that maps every client \(j \in \mathcal {D} \setminus P^\prime \) to the closest facility in \(S^\prime \). For any client \(j \in \mathcal {D} \setminus P^\prime \), let \(\sigma ^\prime (j)\) denote its closest facility in \(S^\prime \). For any client \(j \in P^\prime \), define its \(\sigma ^\prime (j):= i_p\) where \(i_p\) is a dummy facility for penalizing.

  • Step 3 Construct a solution for the PLB k -median.

    • Step 3.1 Initialization.

      At the very begining, set \(S:= S^\prime \), \(P:=P^\prime \) and \(\sigma (j):=\sigma ^\prime (j)\) for any \(j \in \mathcal {D}\), define \(T_i := \{j \in \mathcal {D}: \sigma (j) = i \}\), \(l_i := |T_i|\) and \(P_i := \{j \in \mathcal {D}: j \in \mathcal {D}_i, \sigma (j) = i_p\}\) for any \(i \in \mathcal {F}\). Define \(S_d := \{i \in S : l_i < \alpha L_i\}\).

    • Step 3.2 Close facilities and reconnect clients.

      While \(S_d \not = \emptyset \) do

      • Arbitrarily choose some facility \(i \in S_d\). There are two possible cases.

        • Case 1. \(|T_i|+|P_i| < \alpha L_i\).

          In this case, close facility i. For every client \(j \in T_i\), reconnect it to its closest facility \(i^\prime \in S \setminus \{i\}\) and update \(\sigma (j):=i^\prime \). Update \(S:= S \setminus \{i\}\). Update \(T_i\), \(l_i\) for any facility \(i \in \mathcal {F}\) and \(S_d\).

        • Case 2. \(|T_i|+|P_i| \ge \alpha L_i\).

          In this case, for every client \(j \in P_i\), connect it to its closest facility \(i^\prime \in S\) and update \(\sigma (j):=i^\prime \). Update \(P:=P \setminus P_i\). Then, update \(P_i\), \(T_i\) as well as \(l_i\) for any facility \(i \in \mathcal {F}\), also update \(S_d\).

    • Output solution \((S, P, \sigma )\).

The following theorem provides the result for the PLB k-median.

Theorem 4

Algorithm 4 is a bi-criteria approximation algorithm for the PLB k-median that produces a solution \((S, P, \sigma )\), which connects at least \(\alpha L_i\) clients to every opened facility \(i \in S\), and has an approximation ratio of \(\frac{2 \theta }{1-\alpha }\) where \(\alpha \in [0,1)\) and \(\theta =2 + \sqrt{3} + \epsilon \).

The proof of this theorem is more intricate than the one for the LB k-median, since every client has an alternative decision in the PLB k-median, which is to be penalized. Suppose that \((S^*, P^*, \sigma ^*)\) is the optimal solution for the instance \(\mathcal {IN}\) of the PLB k-median. Let \(OPT_{plk}\) be the total cost of the solution \((S^*, P^*, \sigma ^*)\) for the instance \(\mathcal {IN}\). From Algorithm 4, it is clear that the solution \((S, P, \sigma )\) connects at least \(\alpha L_i\) clients to every opened facility \(i \in S\). We need the following lemmas to achieve the approximation ratio of our algorithm for the PLB k-median.

Lemma 4

The total cost of the solution \((S^\prime , P^\prime , \sigma ^\prime )\) for the instance \(\mathcal {IN}^\prime \) of the P k-FL is within a factor of \(\frac{2 \theta }{1-\alpha }\) of the total cost of the optimal solution \((S^*, P^*, \sigma ^*)\) for the instance \(\mathcal {IN}\) of the PLB k-median, i.e.,

$$\sum \limits _{i \in S^\prime } f_i + \sum \limits _{j \in \mathcal {D} \setminus P^\prime } c_{\sigma ^\prime (j)j} + \sum \limits _{j \in P^\prime } p_j \le \frac{2 \theta }{1-\alpha } \cdot OPT_{plk},$$

where \(\alpha \in [0,1)\) and \(\theta =2+ \sqrt{3} + \epsilon \).

Lemma 5

The total cost of the solution \((S, P, \sigma )\) for the instance \(\mathcal {IN}\) of the PLB-k-median is no more than the total cost of the solution \((S^\prime , P^\prime , \sigma ^\prime )\) for the instance \(\mathcal {IN}^\prime \) of the P k-FL, i.e.,

$$\sum \limits _{j \in \mathcal {D} \setminus P } c_{\sigma (j)j} + \sum \limits _{j \in P} p_j \le \sum \limits _{i \in S^\prime } f_i + \sum \limits _{j \in \mathcal {D} \setminus P^\prime } c_{\sigma ^\prime (j)j} + \sum \limits _{j \in P^\prime } p_j.$$
Fig. 1.
figure 1

Relationships between the constant \(\alpha \) and approximation ratios.

Combining Lemma 4 and Lemma 5 implies the approximation ratio of Algorithm 4.

Fig. 2.
figure 2

Relationships between the constant \(\alpha \) and approximation ratios. Zoom in on approximation ratios smaller than 100.

4 Performances Evaluation of the Algorithms

In this section, through providing the relationships between the given constant \(\alpha \in [0, 1)\) and the approximation ratios, we demonstrate the performances of Algorithm 14.

It is worth mentioning that, combining the idea behind Algorithm 1 and the significant first true 4000-approximation algorithm of Li [13] for the LBFL, we strongly believe that there exists O(1)-approximation algorithm for the LB k-median which does not violate any lower bound. Unfortunately, the true approximation algorithm for the LB k-median cannot be practical, since the approximation ratio of it is likely no less than the one for the LBFL. Li [13] also states that even with more in-depth consideration, it is hard to reduce the approximation ratio for the LBFL to below 100 by using the same method.

Now, we want to demonstrate that our Algorithm 1 for the LB k-median can offer an obviously better approximation ratio while slightly violating the lower bound constraints. In Fig. 1, we show the relationships between constant \(\alpha \in [0, 1)\) and the approximation ratios of Algorithm 14. Note that the approximation ratios grow slowly and steadily at first, but after the constant \(\alpha \) exceeds 0.9 the approximation ratios begin to increase in a steep way. From Fig. 1, it is clear that when \(\alpha = 0.9\) (i.e., when the algorithm outputs a solution that \(90 \%\) of the lower bound requirement of the opened facility is satisfied), the approximation ratio of Algorithm 1 for the LB k-median is significantly better than 4000. Figure 2 zooms in on approximation ratios smaller than 100. As we can see, when Algorithm 1 outputs a solution for the LB k-median with an approximation ratio of 100, the solution can satisfy the majority (i.e., more than 90%) of the lower bound requirement of any opened facility. In some real-world applications, it would be advisable to choose our algorithm, which has a preferable approximation ratio and violates the lower bound constraints within an acceptable range (i.e., violates no more than 10 % of the lower bound requirements).

Additionally, Fig.1 and Fig. 2 show that Algorithm 2 performs as same as Algorithm 1 and Algorithm 4 is almost as well as Algorithm 1. Among our algorithms, Algorithm 3 has the worst performance. From Fig. 1, when \(\alpha = 0.9\), the approximation ratio of Algorithm 3 for the LB knapsack median is visibly greater than the ratios of other algorithms. From Fig. 2, when Algorithm 3 outputs a solution with a ratio of 100, the solution can only guarantee to satisfy about \(70\%\) of the lower bound requirement of any opened facility.