1 Introduction

1.1 Motivation

In wireless network routing, a common approach is to select a set of nodes as the virtual backbone. The virtual backbone is responsible for relaying packets. Specifically, when a node s generates a packet destined for t, the packet is routed through a path \((s, v_1, v_2, \cdots , v_k, t)\), where every internal node \(v_i, 1 \le i \le k,\) belongs to the virtual backbone. To realize this idea, we can model the wireless network as a graph \(G=(V,E)\), where V is the set of nodes in the wireless network, and \((u,v) \in E\) if and only if u and v can communicate with each other directly. Thus, a connected dominating set of G is a virtual backbone for the wireless network. One of the concerns in constructing the virtual backbone is the routing cost. Specifically, the routing cost of sending a packet from the source s to the destination t is the number of internal nodes (relays) on the routing path from s to t. For example, the routing cost is k if the routing path is \((s, v_1, v_2, \cdots , v_k, t)\). The routing cost should not be too high even if packets are only allowed to be routed through the virtual backbone. Next, we give the formal definition of the problem.

1.2 Problem Definition

Let G[S] be the subgraph of \(G=(V,E)\) induced by \(S \subseteq V\). Let \(m_G(u,v)\) be the number of internal vertices on a shortest path between u and v in G. For example, if u and v are adjacent, then \(m_G(u,v) = 0\). If u and v are not adjacent and have a common neighbor, then \(m_G(u,v) = 1\). Furthermore, given a vertex subset D of G, \(m^D_G(u,v)\) is defined as \(m_{G[D \cup \{u,v\}]}(u,v)\), i.e., the number of internal vertices on a shortest path between u and v through D. We use n(G) to denote the number of vertices in graph G. When the graph we are referring to is clear from the context, we simply write n, \(m(u,v)\), and \(m^D(u,v)\) instead of n(G), \(m_G(u,v)\), and \(m^D_G(u,v)\), respectively.
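To make the notation concrete, the following is a minimal sketch (assuming the networkx library and unweighted graphs; the function names are ours) of how \(m_G(u,v)\) and \(m^D_G(u,v)\) could be computed:

```python
import networkx as nx

def m(G, u, v):
    """Number of internal vertices on a shortest u-v path in G
    (0 if u and v are adjacent); None if u and v are disconnected."""
    try:
        # shortest_path_length counts edges; internal vertices = edges - 1
        return nx.shortest_path_length(G, u, v) - 1
    except nx.NetworkXNoPath:
        return None

def m_D(G, D, u, v):
    """m computed in the subgraph induced by D together with u and v,
    i.e., m_{G[D ∪ {u, v}]}(u, v)."""
    return m(G.subgraph(set(D) | {u, v}), u, v)

# Small check on a 4-cycle a-b-c-d-a: m(a, c) = 1 and m^{D}(a, c) = 1 for D = {b}.
G = nx.cycle_graph(["a", "b", "c", "d"])
assert m(G, "a", "c") == 1
assert m_D(G, ["b"], "a", "c") == 1
```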

Definition 1

Given a connected graph G and a positive integer \(\alpha \), the Connected Dominating set problem with Routing cost constraint (CDR-\(\alpha \)) asks for the smallest connected dominating set D of G, such that, for every two vertices u and v, if u and v are not adjacent in G, then \(m^D(u,v) \le \alpha \cdot m(u,v)\).

1.3 Preliminaries

An Equivalent Problem: In the CDR-\(\alpha \) problem, we need to consider all the pairs of non-adjacent nodes. Ding et al. discovered that, to solve the CDR-\(\alpha \) problem, it suffices to consider only vertex pairs (u, v) such that \(m(u,v)= 1\), i.e., u and v are not adjacent but have a common neighbor [5]. We call the corresponding problem the 1-DR-\(\alpha \) problem.

Definition 2

Given a connected graph \(G = (V,E)\) and a positive integer \(\alpha \), the 1-DR-\(\alpha \) problem asks for the smallest dominating set D of G, such that, for every two vertices u and v, if \(m(u,v) = 1\), then \(m^D(u,v) \le \alpha \).

We say that u and v form a target couple, denoted by [u, v], if \(m(u,v) = 1\). We say that a set S covers a target couple [u, v] if \(m^S(u,v) \le \alpha \). Hence, the 1-DR-\(\alpha \) problem asks for the smallest dominating set that covers all the target couples. Notice that any feasible solution of the 1-DR-\(\alpha \) problem must induce a connected subgraph of G. The equivalence between the CDR-\(\alpha \) problem and the 1-DR-\(\alpha \) problem is stated in the following theorem.

Theorem 1

(Ding et al. [5]). D is a feasible solution of the CDR-\(\alpha \) problem with input graph G if and only if D is a feasible solution of the 1-DR-\(\alpha \) problem with input graph G.

Corollary 1

Any r-approximation algorithm for the 1-DR-\(\alpha \) problem is an r-approximation algorithm for the CDR-\(\alpha \) problem.

In this paper, we thus focus on the 1-DR-\(\alpha \) problem.

Feasibility of the 1-DR-\(\varvec{\alpha }\) Problem for \(\varvec{\alpha \ge 5}\): Next, we give the basic idea, used in previous research, e.g., [11], for finding a feasible solution of the 1-DR-\(\alpha \) problem for \(\alpha \ge 5\). One of our algorithms still uses this idea. First, find a dominating set D. Thus, for any target couple [u, v], there exist \(u^d\) and \(v^d\) in D such that \(u^d\) and \(v^d\) dominate u and v, respectively. Let \(D' = D\). We then add more vertices to \(D'\) so that \(D'\) becomes a feasible solution. For any two vertices \(u'\) and \(v'\) in D, if \(m(u',v') \le 3\), then we add the \(m(u',v')\) internal vertices of a shortest path between \(u'\) and \(v'\) in G to \(D'\). Observe that \(m(u^d,v^d) \le 3\), since \((u^d, u, w, v, v^d)\) is a walk in G for any common neighbor w of u and v. Hence, \(m^{D'}(u,v) \le 5\), and \(D'\) is a feasible solution of the 1-DR-\(\alpha \) problem for \(\alpha \ge 5\).

Lemma 1

Let D be a dominating set of G. Let \(D' \supseteq D\) be a vertex subset of G such that, for any two vertices \(u'\) and \(v'\) in D, if \(m(u',v') \le 3\), then \(m^{D'}(u',v') \le 3\). Then, \(D'\) is a feasible solution of the 1-DR-\(\alpha \) problem with input G and \(\alpha \ge 5\).

1.4 Previous Results

Previous Results on General Graphs: When \(\alpha = 1\), the 1-DR-\(\alpha \) problem can be transformed into the set cover problem, i.e., cover all the vertices (to form a dominating set) and cover all the target couples. Observe that, when \(\alpha = 1\), each target couple can be covered by a single vertex. The resulting approximation ratio is \(O(\log n)\) [5]. When \(\alpha \) is sufficiently large, e.g., \(\alpha \ge n\), any connected dominating set is feasible for the CDR-\(\alpha \) problem. Note that, for any \(\alpha \), the size of the minimum connected dominating set is a lower bound on the optimum of the CDR-\(\alpha \) problem. Since the minimum connected dominating set can be approximated within a factor of \(O(\log n)\) [12, 21], the CDR-n problem can be approximated within a factor of \(O(\log n)\). When \(\alpha \) falls between these two extremes, e.g., \(\alpha = 2\), the only known previous result is the trivial O(n)-approximation algorithm. On the hardness side, it has been proved that, unless \(NP \subseteq DTIME(n^{O(\log \log n)})\), there is no polynomial-time algorithm that can approximate the CDR-\(\alpha \) problem within a factor of \(\rho \ln \delta \) for any \(\rho <1\), where \(\delta \) is the maximum degree of G; this holds for \(\alpha = 1\) [5] and for \(\alpha \ge 2\) [8, 9].

Open Question 1

(Du and Wan [8]). Is there a polynomial-time \(O(\log n)\)-approximation algorithm for the CDR-\(\alpha \) problem for \(\alpha \ge 2\)?

Previous Results on Unit Disk Graphs (UDG): Most of the studies on the CDR-\(\alpha \) problem have focused on UDG [5, 8, 9, 11, 19]. UDG exhibits many nice properties that enable constant-factor approximation algorithms (or PTAS) for many problems where only \(O(\log n)\)-approximation algorithms (or worse) are known on general graphs, e.g., the minimum (connected) dominating set problem and the maximum independent set problem [3, 4, 20]. All the previous research on the CDR-\(\alpha \) problem on UDG leveraged constant bounds involving the maximum independent set or the minimum dominating set. However, all the previous research only solved the case where \(\alpha \ge 5\) (by Lemma 1), and the best result so far is a PTAS by Du et al. [11]. When \(1< \alpha < 5\), the only known previous result is the trivial O(n)-approximation algorithm.

1.5 Our Result and Basic Ideas

In this paper, we first give an approximation algorithm for the 1-DR-\(\alpha \) problem on general graphs for any constant \(\alpha > 1\). A critical observation is that the 1-DR-2 problem is a special case of the Set Cover with Pairs (SCP) problem [13]. Hassin and Segev proposed an \(O(\sqrt{t\log t})\)-approximation algorithm for the SCP problem, where t is the number of targets to be covered. However, since there are \(O(n^2)\) target couples to be covered, directly applying the \(O(\sqrt{t\log t})\) bound yields only a trivial upper bound for the 1-DR-2 problem. We re-examine the analysis in [13] and find that, when the algorithm is applied to the 1-DR-2 problem, the approximation ratio can also be expressed as \(O(\sqrt{n\log n})\). Nevertheless, in this paper, we give a slightly simplified algorithm with an easier analysis for the SCP problem. The algorithm and analysis also extend easily to the generalized SCP problem. We obtain the following result, which is the first non-trivial result for the CDR-\(\alpha \) problem for \(\alpha > 1\) on general graphs and for \(1< \alpha < 5\) on UDG.

Theorem 2

For any constant \(\alpha > 1\), there is an \(O(n^{1-\frac{1}{\alpha }}(\log n)^{\frac{1}{\alpha }})\)-approximation algorithm for the 1-DR-\(\alpha \) problem.

Clearly, the above performance guarantee deteriorates quickly as \(\alpha \) increases. In our second algorithm, we apply the aforementioned idea for finding a feasible solution when \(\alpha \ge 5\), i.e., Lemma 1. We have the following result.

Theorem 3

When \(\alpha \ge 5\), there is an \(O(\sqrt{n}\log n)\)-approximation algorithm for the 1-DR-\(\alpha \) problem.

Finally, we answer Open Question 1 negatively. We improve upon the \(\varOmega (\log \delta )\) hardness result for the 1-DR-2 problem (albeit under a stronger hardness assumption) [8, 9]. To this end, we give a reduction from the MIN-REP problem [15].

Theorem 4

Unless \(NP \subseteq DTIME(n^{poly\log n})\), for any constant \(\epsilon > 0\), the 1-DR-2 problem admits no polynomial-time \(2^{\log ^{1-\epsilon }n}\)-approximation algorithm, even if the graph is triangle-free or the constraint that the feasible solution must be a dominating set is ignored.

1.6 Relation with the Basic k-Spanner Problem

When we ignore the constraint that any feasible solution must be a connected dominating set, the CDR-\(\alpha \) problem is similar to the basic k-spanner problem. For completeness, we give the formal definition of the basic k-spanner problem. Given a graph \(G = (V,E)\), a k-spanner of G is a subgraph H of G such that \(d_H(u,v) \le kd_G(u,v)\) for all u and v in V, where \(d_G(u,v)\) is the number of edges on a shortest path between u and v in G. The basic k-spanner problem asks for the k-spanner that has the fewest edges. The CDR-\(\alpha \) problem differs from the basic k-spanner problem in the following three aspects: First, in the CDR-\(\alpha \) problem, we find a set of vertices D, and all the edges in the subgraph induced by D can be used for routing, while in the basic k-spanner problem, only edges in H can be used. Second, in the CDR-\(\alpha \) problem, the objective is to minimize the number of chosen vertices, while in the basic k-spanner problem, the objective is to minimize the number of chosen edges. Finally, in the basic k-spanner problem, the distance is measured by the number of edges, while in the CDR-\(\alpha \) problem, the distance is measured by the number of internal nodes. Despite the above differences, these two problems share similar approximability and hardness results. Althöfer et al. proved that every graph has a k-spanner of at most \(n^{1+\frac{1}{\lfloor (k+1)/2\rfloor }}\) edges, and such a k-spanner can be constructed in polynomial time [1, 7]. Since the number of edges in any k-spanner is at least \(n-1\), this yields an \(O(n^{\frac{1}{\lfloor (k+1)/2\rfloor }})\)-approximation algorithm for the basic k-spanner problem. For \(k = 2\), there is an \(O(\log n)\)-approximation algorithm due to Kortsarz and Peleg [16], and this is the best possible [15]. For \(k = 3\), Berman et al. proposed an \(\tilde{O}(n^{1/3})\)-approximation algorithm [2]. For \(k = 4\), Dinitz and Zhang proposed an \(\tilde{O}(n^{1/3})\)-approximation algorithm [7]. On the hardness side, it has been proved that for any constant \(\epsilon > 0\) and for \(3 \le k \le \log ^{1-2\epsilon }n\), unless \(NP \subseteq BPTIME(2^{poly\log n})\), there is no polynomial-time algorithm that approximates the basic k-spanner problem to a factor better than \(2^{(\log ^{1-\epsilon }n)/k}\) [6].

2 Two Algorithms for the 1-DR-\(\alpha \) Problem

2.1 The First Algorithm

We first give the formal definition of the Set Cover with Pairs (SCP) problem.

Definition 3

Let T be a set of t targets. Let V be a set of n elements. For every pair of elements \(P=\{v_1, v_2\} \subseteq V\), C(P) denotes the set of targets covered by P. The Set Cover with Pairs (SCP) problem asks for the smallest subset S of V such that \(\bigcup \limits _{\{v_1, v_2\} \subseteq S}{C(\{v_1, v_2\})}=T\).

Let OPT be the number of elements in the optimal solution. We only need to consider the case where \(t > 1\) and \(OPT > 1\).

Approximating the SCP Problem: Our algorithm is a simple greedy algorithm: in each round, we choose a set of at most two elements that maximizes the number of newly covered targets. Specifically, S is initially empty. In each round, we select a set \(P \subseteq V\setminus S\) such that \(|P| \le 2\) and P increases the number of covered targets the most, i.e., \(P=\mathop {\text {argmax}}\limits _{P': |P'| \le 2, P' \subseteq V \setminus S} {g(P')}\), where

$$g(P') = |\bigcup _{\{v_1,v_2\} \subseteq S \cup P'}{C(\{v_1,v_2\})}| -|\bigcup _{\{v_1,v_2\} \subseteq S}{C(\{v_1,v_2\})}|.$$

We then add P to S and repeat the above process until all the targets are covered.
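The following is a minimal Python sketch of this greedy procedure (the function and variable names are ours, not from [13]); the tuple size is taken as a parameter so that the same routine also serves the SCT-\(\alpha \) generalization introduced later in this section.

```python
from itertools import combinations

def coverage(S, covers, alpha):
    """Targets covered by the element set S: the union of covers(T) over
    all subsets T of S with exactly alpha elements (pairs when alpha = 2)."""
    out = set()
    for T in combinations(S, alpha):
        out |= covers(frozenset(T))
    return out

def greedy_sct(elements, targets, covers, alpha=2):
    """Greedy sketch for SCP (alpha = 2) and SCT-alpha: in each round, add
    a set P of at most alpha new elements that increases coverage the most."""
    S, targets = set(), set(targets)
    while coverage(S, covers, alpha) != targets:
        already = coverage(S, covers, alpha)
        best_gain, best_P = 0, None
        for k in range(1, alpha + 1):
            for P in combinations(set(elements) - S, k):
                gain = len(coverage(S | set(P), covers, alpha) - already)
                if gain > best_gain:
                    best_gain, best_P = gain, set(P)
        if best_P is None:  # no candidate makes progress: the instance is infeasible
            raise ValueError("the targets cannot all be covered")
        S |= best_P
    return S
```

Each round examines \(O(n^{\alpha })\) candidate sets P, so the running time is polynomial when \(\alpha \) is a constant.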

Theorem 5

The above algorithm is an \(O(\sqrt{n\log t})\)-approximation algorithm for the SCP problem.

Proof

Let \(R_i\) be the number of uncovered targets after round i. In the first round, some pair of elements in the optimal solution can cover at least \(t/{{OPT}\atopwithdelims (){2}}\) targets. Since we choose a pair of elements greedily in each round, \(R_1 \le t(1-1/{{OPT}\atopwithdelims (){2}})\). In the second round, there exists a pair of elements in the optimal solution that can cover at least \(R_1/{{OPT}\atopwithdelims (){2}}\) targets among the \(R_1\) uncovered targets. Again, we choose the pair of elements that increases the number of covered targets the most. Hence, \(R_2 \le R_1 - R_1/{{OPT}\atopwithdelims (){2}}\le t(1-1/{{OPT}\atopwithdelims (){2}})^2\). In general, \(R_i \le t(1-1/{{OPT}\atopwithdelims (){2}})^i\). After \(r = {{OPT}\atopwithdelims (){2}}\ln t\) rounds, the number of uncovered targets is at most \(t(1-1/{{OPT}\atopwithdelims (){2}})^r \le t(e^{-1/{{OPT}\atopwithdelims (){2}}})^r \le te^{-\ln t} = 1\). Hence, after \(O(OPT^2\ln t)\) rounds, all targets are covered. Let ALG be the number of elements chosen by the algorithm. Since we choose at most two elements in each round, \(ALG = O(OPT^2\ln t)\). Finally, since \(ALG \le n\), \(ALG = O(\sqrt{n \cdot OPT^2\ln t}) = O(\sqrt{n\ln t})OPT\).    \(\square \)

Note that, in Theorem 5, we can replace n with any upper bound on the size of the solution produced by a polynomial-time algorithm \(\mathcal {A}\) for the SCP problem. This is achieved by executing both \(\mathcal {A}\) and our algorithm and choosing the better of the two outputs, which yields the desired approximation ratio. For example, if we replace n with 2t, we obtain the result in [13].

Approximating the 1-DR-2 Problem: To transform the 1-DR-2 problem to the SCP problem, we treat each target couple as a target. Moreover, we treat each vertex as a target so that the output is a dominating set. The set of elements V in the SCP problem is the vertex set of G. C(P) consists of all the vertices that are dominated by P in G and all the target couples that are covered by P in G. In this SCP instance, \(n=n(G)\) and \(t=O(n(G)^2)\). It is easy to verify the following result.
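A minimal sketch of this transformation follows (assuming networkx and the greedy_sct routine above; all names are ours). It is written with \(\alpha \) as a parameter so that the same construction also serves the SCT-\(\alpha \) transformation described below; for the 1-DR-2 problem, \(\alpha = 2\).

```python
import networkx as nx

def one_dr_instance(G, alpha=2):
    """Build the SCP / SCT-alpha instance for the 1-DR-alpha problem on G.
    Targets: every vertex of G (to force a dominating set) and every target
    couple [u, v] with m(u, v) = 1.  covers(P) returns the targets covered
    by the vertex set P (|P| <= alpha)."""
    couples = {frozenset((u, v)) for u in G for v in G
               if u != v and not G.has_edge(u, v) and set(G[u]) & set(G[v])}

    def covers(P):
        covered = set()
        for w in P:                     # vertices dominated by P
            covered.add(w)
            covered.update(G[w])
        for c in couples:               # couples covered by P: m^P(u, v) <= alpha
            u, v = tuple(c)
            H = G.subgraph(set(P) | {u, v})
            if nx.has_path(H, u, v) and nx.shortest_path_length(H, u, v) - 1 <= alpha:
                covered.add(c)
        return covered

    elements = set(G.nodes())
    targets = elements | couples
    return elements, targets, covers
```

With G the input graph, `elements, targets, covers = one_dr_instance(G)` followed by `greedy_sct(elements, targets, covers)` should then return a set that dominates G and covers every target couple in the sense of Definition 2.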

Theorem 6

There is an \(O(\sqrt{n\log n})\)-approximation algorithm for the 1-DR-2 problem.

The Set Cover with \(\varvec{\alpha }\)-Tuples (SCT-\(\varvec{\alpha }\)) Problem: In the 1-DR-2 problem, every target couple can be covered by no more than two vertices. In the 1-DR-\(\alpha \) problem, every target couple can be covered by no more than \(\alpha \) vertices. Hence, we consider the following generalization of the SCP problem.

Definition 4

Let T be a set of t targets. Let V be a set of n elements. Let \(\alpha \) be a positive integer constant greater than one. For every \(\alpha \)-tuple \(P=\{v_1, v_2, \cdots , v_{\alpha }\}\subseteq V\), C(P) denotes the set of targets covered by P. The Set Cover with \(\alpha \)-Tuples (SCT-\(\alpha \)) problem asks for the smallest subset S of V such that \(\bigcup \limits _{\{v_1, v_2, \cdots , v_{\alpha }\} \subseteq S} {C(\{v_1, v_2, \cdots , v_{\alpha }\})}=T\).

We only need to consider the case where \(t > 1\) and \(OPT \ge \alpha \) (\(\alpha \) is a constant).

Approximating the SCT-\(\varvec{\alpha }\) Problem and the 1-DR-\(\varvec{\alpha }\) Problem: The algorithm for the SCT-\(\alpha \) problem is a straightforward generalization of the algorithm for the SCP problem. The difference is that, in each round, we choose a set P of at most \(\alpha \) elements that increases the number of covered targets the most. The transformation from the 1-DR-\(\alpha \) problem into the SCT-\(\alpha \) problem is also similar to the previous transformation. The value of \(\alpha \) in the constructed SCT-\(\alpha \) instance is equal to that in the 1-DR-\(\alpha \) instance. Again, \(n = n(G)\) and \(t = O(n(G)^2)\) in the constructed SCT-\(\alpha \) instance. Theorem 2 is a direct result of the following theorem.

Theorem 7

There is an \(O(n^{1-\frac{1}{\alpha }} \cdot (\ln t)^{\frac{1}{\alpha }})\)-approximation algorithm for the SCT-\(\alpha \) problem.

Claim 1

When \(c = \frac{1}{\alpha }-\frac{\ln \ln (t^{\alpha })}{\alpha \ln n}\), \(n^{1-c}=\sqrt{n \cdot \alpha (n^c)^{\alpha -2}\ln t} = n^{1-\frac{1}{\alpha }} \cdot (\alpha \ln t)^{\frac{1}{\alpha }}\).

Proof of Claim 1:

$$\begin{aligned} n^{1-c}&= \sqrt{n \cdot \alpha (n^c)^{\alpha -2}\ln t} \\ \Leftrightarrow n^{2-2c}&= n \cdot \alpha (n^c)^{\alpha -2}\ln t \\ \Leftrightarrow n^{2-2c-(1+c(\alpha -2))}&= \alpha \ln t \\ \Leftrightarrow n^{1-c\alpha }&= \alpha \ln t. \end{aligned}$$

When \(c = \frac{1}{\alpha }-\frac{\ln \ln (t^{\alpha })}{\alpha \ln n}\),

$$\begin{aligned} n^{1-c\alpha }&= n^{1-(1-\frac{\ln \ln (t^{\alpha })}{\ln n})} = n^{\frac{\ln \ln (t^{\alpha })}{\ln n}} = (n^{\ln (\ln (t^{\alpha }))})^{\frac{1}{\ln n}} \\&= ((\ln (t^{\alpha }))^{\ln n})^{\frac{1}{\ln n}} = (({\alpha }\ln t)^{\ln n})^{\frac{1}{\ln n}} = {\alpha }\ln t. \end{aligned}$$
(1)

Hence, when \(c = \frac{1}{\alpha }-\frac{\ln \ln (t^{\alpha })}{\alpha \ln n}\), \(n^{1-c}=\sqrt{n \cdot \alpha (n^c)^{\alpha -2}\ln t}\).

Finally, when \(c = \frac{1}{\alpha }-\frac{\ln \ln (t^{\alpha })}{\alpha \ln n}\),

$$\begin{aligned} n^{1-c}&= n^{1-\frac{1}{\alpha }+\frac{\ln \ln (t^{\alpha })}{\alpha \ln n}} \\&= n^{1-\frac{1}{\alpha }} \cdot n^{\frac{\ln \ln (t^{\alpha })}{\alpha \ln n}} \\&= n^{1-\frac{1}{\alpha }} \cdot (n^{\frac{\ln \ln (t^{\alpha })}{\ln n}})^{\frac{1}{\alpha }} \\&= n^{1-\frac{1}{\alpha }} \cdot (\alpha \ln t)^{\frac{1}{\alpha }}. \end{aligned}$$

In the last equality, we reuse the chain of equalities in (1).     \(\square \)

Proof of Theorem 7: Let \(R_i\) be the number of uncovered targets after round i. By an argument similar to that in the proof of Theorem 5, we get \(R_i \le t(1-1/{{OPT}\atopwithdelims (){\alpha }})^i\). After \(r = {{OPT}\atopwithdelims (){\alpha }}\ln t\) rounds, the number of uncovered targets is at most one. Hence, after \(O(OPT^{\alpha }\ln t)\) rounds, all targets are covered. Let ALG be the number of elements chosen by the algorithm. Since we choose at most \(\alpha \) elements in each round, \(ALG = O(\alpha OPT^{\alpha }\ln t)\). Since \(ALG \le n\), \(ALG = O(\sqrt{n \cdot \alpha OPT^{\alpha }\ln t})\).

Let \(c = \frac{1}{\alpha }-\frac{\ln \ln (t^{\alpha })}{\alpha \ln n}\). When \(OPT \ge n^c\), the approximation ratio is \(n^{1-c}\). When \(OPT \le n^c\), \(ALG = O(\sqrt{n \cdot \alpha OPT^{\alpha -2}\ln t})OPT = O(\sqrt{n \cdot \alpha (n^c)^{\alpha -2}\ln t})OPT\). The proof then follows from Claim 1 and \(\alpha ^{\frac{1}{\alpha }}=O(1)\).     \(\square \)
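For completeness, Theorem 2 follows by plugging \(t = O(n(G)^2)\) (with \(n = n(G)\)) into the ratio of Theorem 7:

$$\begin{aligned} O(n^{1-\frac{1}{\alpha }} (\ln t)^{\frac{1}{\alpha }}) = O(n^{1-\frac{1}{\alpha }} (\ln (O(n^{2})))^{\frac{1}{\alpha }}) = O(n^{1-\frac{1}{\alpha }} (2\ln n + O(1))^{\frac{1}{\alpha }}) = O(n^{1-\frac{1}{\alpha }} (\log n)^{\frac{1}{\alpha }}). \end{aligned}$$

In particular, for \(\alpha = 2\) this matches the \(O(\sqrt{n\log n})\) bound of Theorem 6.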

2.2 The Second Algorithm

The second algorithm is designed for the 1-DR-\(\alpha \) problem when \(\alpha \ge 5\). It has a better approximation ratio than that of the previous algorithm when \(\alpha \ge 5\). The algorithm is suggested in Lemma 1: We first find a dominating set D by any \(O(\log n)\)-approximation algorithm. Let \(D' = D\). For any two vertices u and v in D, if \(m(u,v) \le 3\), we then add at most three vertices to \(D'\) so that \(m^{D'}(u,v) \le 3\).
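A minimal sketch of this second algorithm follows (assuming networkx; `nx.dominating_set` is used only as a stand-in for an \(O(\log n)\)-approximate dominating set routine, and all other names are ours):

```python
import networkx as nx
from itertools import combinations

def second_algorithm(G):
    """Sketch: start from a dominating set D and, for every pair u, v in D
    with m(u, v) <= 3, add the internal vertices of one shortest u-v path,
    so that m^{D'}(u, v) <= 3 and Lemma 1 applies."""
    # Stand-in for any O(log n)-approximation of the minimum dominating set.
    D = set(nx.dominating_set(G))
    D_prime = set(D)
    for u, v in combinations(D, 2):
        path = nx.shortest_path(G, u, v)   # G is assumed connected
        internal = path[1:-1]
        if len(internal) <= 3:             # m(u, v) <= 3
            D_prime.update(internal)       # adds at most three vertices per pair
    return D_prime
```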

Proof of Theorem 3: By Lemma 1, \(D'\) is a feasible solution for the 1-DR-\(\alpha \) problem when \(\alpha \ge 5\). Let \(OPT_{DS}\) be the size of the minimum dominating set in G. Let OPT be the size of the optimum of the 1-DR-\(\alpha \) problem. Since any feasible solution of the 1-DR-\(\alpha \) problem must be a dominating set, \(OPT_{DS} \le OPT\). Moreover, \(|D'| \le |D|+3{{|D|}\atopwithdelims (){2}} = O((\log n \cdot OPT_{DS})^2) = O((\log n \cdot OPT)^2)\). Since \(|D'| \le n\), we have \(|D'| = O(\sqrt{n \cdot (\log n \cdot OPT)^2}) = O(\sqrt{n}\log n)OPT\).     \(\square \)

3 Inapproximability Result

3.1 The MIN-REP Problem

We prove Theorem 4 by a reduction from the MIN-REP problem [15]. The input of the MIN-REP problem consists of a bipartite graph \(G=(X, Y, E)\), a partition of X, \(\mathcal {P}_X=\{X_1, X_2, \cdots , X_{k_X}\}\), and a partition of Y, \(\mathcal {P}_Y=\{Y_1, Y_2, \cdots , Y_{k_Y}\}\), such that \(\bigcup _{i=1}^{k_X}{X_i} = X\) and \(\bigcup _{i=1}^{k_Y}{Y_i} = Y\). Every \(X_i \in \mathcal {P}_X\) (respectively, \(Y_i \in \mathcal {P}_Y\)) has size \(|X|/k_X\) (respectively, \(|Y|/k_Y\)). \(X_1, X_2, \cdots , X_{k_X}\) and \(Y_1, Y_2, \cdots , Y_{k_Y}\) are called super nodes, and two super nodes \(X_i\) and \(Y_j\) are adjacent if some vertex in \(X_i\) and some vertex in \(Y_j\) are adjacent in G. If \(X_i\) and \(Y_j\) are adjacent, then \(X_i\) and \(Y_j\) form a super edge. In the MIN-REP problem, our task is to choose representatives for super nodes so that if \(X_i\) and \(Y_j\) form a super edge, then some representative for \(X_i\) and some representative for \(Y_j\) are adjacent in G. Note that a super node may have multiple representatives. Specifically, the goal of the MIN-REP problem is to find the smallest subset \(R \subseteq X \cup Y\) such that if \(X_i\) and \(Y_j\) form a super edge, then R must contain two vertices x and y such that \(x \in X_i\), \(y \in Y_j\) and \((x, y) \in E\). In this case, we say that \(\{x,y\}\) covers the super edge \((X_i, Y_j)\). The inapproximability result of the MIN-REP problem is stated as the following theorem.

Theorem 8

(Kortsarz et al. [17]). For any constant \(\epsilon > 0\), unless \(NP \subseteq DTIME(n^{poly\log n})\), there is no polynomial-time algorithm that can distinguish between instances of the MIN-REP problem with a solution of size \(k_X+k_Y\) and instances where every solution is of size at least \((k_X+k_Y) \cdot 2^{\log ^{1-\epsilon }{n(G)}}\), where n(G) is the number of vertices in the input graph of the MIN-REP problem.
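To make the feasibility condition of the MIN-REP problem concrete, a check of a candidate solution R could be sketched as follows (assuming a networkx-style graph; the function name is ours):

```python
def is_min_rep_solution(G, PX_parts, PY_parts, R):
    """Check whether R covers every super edge: for each adjacent pair of
    super nodes (X_i, Y_j), R must contain x in X_i and y in Y_j with (x, y) in E."""
    R = set(R)
    for Xi in PX_parts:
        for Yj in PY_parts:
            super_edge = any(G.has_edge(x, y) for x in Xi for y in Yj)
            if super_edge and not any(G.has_edge(x, y)
                                      for x in set(Xi) & R for y in set(Yj) & R):
                return False
    return True
```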

3.2 The Reduction

Given inputs \(G=(X,Y,E)\), \(\mathcal {P}_X\), and \(\mathcal {P}_Y\) of the MIN-REP problem, we construct a corresponding graph \(G'(G,\mathcal {P}_X, \mathcal {P}_Y)\) of the 1-DR-2 problem. When G, \(\mathcal {P}_X\), and \(\mathcal {P}_Y\) are clear from the context, we simply write \(G'\) instead of \(G'(G,\mathcal {P}_X, \mathcal {P}_Y)\). Initially, \(G'=G\). Hence, \(G'\) contains X, Y, and E. For each super node \(X_i\) (respectively, \(Y_i\)), we create two corresponding vertices \(px^1_i\) and \(px^2_i\) (respectively, \(py^1_i\) and \(py^2_i\)) in \(G'\). If x is in super node \(X_i\) (respectively, y is in super node \(Y_i\)), then we add two edges \((x, px^1_i)\) and \((x, px^2_i)\) (respectively, \((y, py^1_i)\) and \((y, py^2_i)\)) in \(G'\). If \(X_i\) and \(Y_j\) form a super edge, then we add two vertices \(r^1_{i,j}\) and \(r^2_{i,j}\) to \(G'\), and we add four edges \((px^1_i, r^1_{i,j})\), \((r^1_{i,j}, py^1_j)\), \((px^2_i, r^2_{i,j})\), \((r^2_{i,j}, py^2_j)\) to \(G'\). \(r^1_{i,j}\) (respectively, \(r^2_{i,j}\)) is called the relay of \(px^1_i\) and \(py^1_j\) (respectively, \(px^2_i\) and \(py^2_j\)).

Before we complete the construction of \(G'\), we briefly explain the idea behind the construction so far. If two super nodes \(X_i\) and \(Y_j\) form a super edge, then \(px^I_i\) and \(py^I_j\) (\(I \in \{1,2\}\)) have a common neighbor in \(G'\), i.e., the relay \(r^I_{i,j}\). Because \(px^I_i\) and \(py^I_j\) are not adjacent, \(px^I_i\) and \(py^I_j\) form a target couple. To transform a solution D of the 1-DR-2 problem to a solution of the MIN-REP problem, we need to transform D to another feasible solution \(D'\) for the 1-DR-2 problem so that none of the relays is chosen, and only vertices in \(X \cup Y\) are used to connect \(px^I_i\) and \(py^I_j\). This is the reason that we have two corresponding vertices for each super node (and thus two relays for each super edge). Under this setting, to connect \(px^1_i\) to \(py^1_j\) and \(px^2_i\) to \(py^2_j\), choosing two vertices in \(X \cup Y\) is no worse than choosing the relays.

Fig. 1. An example of the reduction.

Let \(PX = \{px^1_1, px^1_2, \cdots , px^1_{k_X}\} \cup \{px^2_1, px^2_2, \cdots , px^2_{k_X}\}\) be the set of vertices in \(G'\) corresponding to the super nodes in \(\mathcal {P}_X\). Similarly, let \(PY = \{py^1_1, py^1_2, \cdots , py^1_{k_Y}\} \cup \{py^2_1, py^2_2, \cdots , py^2_{k_Y}\}\). Let R be the set of all relays. To complete the construction, we add four vertices (hubs) \(h_{X,R}\), \(h_{Y,R}\), \(h_{PX}\), and \(h_{PY}\) to \(G'\). In \(G'\), all the vertices in X, Y, PX, and PY are adjacent to \(h_{X,R}\), \(h_{Y,R}\), \(h_{PX}\), and \(h_{PY}\), respectively. Moreover, every relay is adjacent to \(h_{X,R}\) and \(h_{Y,R}\). These four hubs induce a 4-cycle \((h_{PX}, h_{Y,R}, h_{PY}, h_{X,R}, h_{PX})\) in \(G'\). Finally, for each hub h, we create two dummy nodes \(d_1\) and \(d_2\), and add two edges \((h, d_1)\) and \((h, d_2)\) to \(G'\). This completes the construction of \(G'\). Figure 1 shows an example of the reduction. Let H and M be the set of hubs and the set of dummy nodes, respectively. Hence, the vertex set of \(G'\) is \(X \cup Y \cup PX \cup PY \cup R \cup H \cup M\). Let N(u) be the set of neighbors of u in \(G'\). We then have

$$\begin{aligned}&N(px) \subseteq X \cup R \cup \{h_{PX}\} \text { if } px \in PX.&N(py) \subseteq Y \cup R \cup \{h_{PY}\} \text { if } py \in PY.\\&N(x) \subseteq PX \cup Y \cup \{h_{X,R}\} \text { if } x \in X.&N(y) \subseteq PY \cup X \cup \{h_{Y,R}\} \text { if } y \in Y.\\&N(h_{X,R}) \setminus M = X \cup R \cup \{h_{PX}, h_{PY}\}.&N(h_{Y,R}) \setminus M = Y \cup R \cup \{h_{PX}, h_{PY}\}.\\&N(h_{PX}) \setminus M = PX \cup \{h_{X,R}, h_{Y,R}\}.&N(h_{PY}) \setminus M = PY \cup \{h_{X,R}, h_{Y,R}\}.\\&N(m) \subseteq H \text { if } m \in M.&N(r) \subseteq PX \cup PY \cup \{h_{X,R}, h_{Y,R}\} \text { if } r \in R. \end{aligned}$$
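The construction of \(G'\) can be summarized in the following sketch (G is assumed to be an undirected networkx-style graph; vertex labels such as ("px", I, i) and the hub names are our own conventions and are assumed not to collide with the original vertex names):

```python
def build_G_prime(G, PX_parts, PY_parts):
    """Sketch of the reduction from MIN-REP to 1-DR-2: build G' from the
    bipartite graph G and the partitions of X and Y into super nodes."""
    Gp = G.copy()
    X = set().union(*PX_parts)
    Y = set().union(*PY_parts)

    # Two vertices px^1_i, px^2_i per super node X_i (likewise py^1_j, py^2_j),
    # each adjacent to every vertex of its super node.
    PX, PY = [], []
    for side, parts, acc in (("px", PX_parts, PX), ("py", PY_parts, PY)):
        for i, part in enumerate(parts):
            for I in (1, 2):
                p = (side, I, i)
                acc.append(p)
                Gp.add_edges_from((p, v) for v in part)

    # For every super edge (X_i, Y_j): two relays r^1_{i,j} and r^2_{i,j}.
    R = []
    for i, Xi in enumerate(PX_parts):
        for j, Yj in enumerate(PY_parts):
            if any(G.has_edge(x, y) for x in Xi for y in Yj):
                for I in (1, 2):
                    r = ("r", I, i, j)
                    R.append(r)
                    Gp.add_edge(("px", I, i), r)
                    Gp.add_edge(r, ("py", I, j))

    # Four hubs, each with two dummy neighbors, wired exactly as described above.
    hub_neighbors = {"h_XR": X | set(R), "h_YR": Y | set(R),
                     "h_PX": set(PX), "h_PY": set(PY)}
    for h, neigh in hub_neighbors.items():
        Gp.add_edges_from((h, v) for v in neigh)
        Gp.add_edges_from([(h, ("d", h, 1)), (h, ("d", h, 2))])  # dummy nodes
    Gp.add_edges_from([("h_PX", "h_XR"), ("h_XR", "h_PY"),
                       ("h_PY", "h_YR"), ("h_YR", "h_PX")])      # the hub 4-cycle
    return Gp
```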

Observe that \(|R|=O(n(G)^2)\). We have the following lemma.

Lemma 2

\(n(G')=O(n(G)^2)\).

It is easy to check that, for any two adjacent vertices u and v in \(G'\), u and v have no common neighbor. Hence, we have the following lemma.

Lemma 3

\(G'\) is triangle-free.

We say that a target couple [a, b] is in [A, B] if \(a \in A\) and \(b \in B\). It is easy to verify the following two lemmas.

Lemma 4

Only H can cover the target couples in [M, M].

Lemma 5

H is a dominating set of \(G'\).

The proof of the following lemma can be found in the appendix.

Lemma 6

H covers all the target couples except those in [PX, PY].

Let px and py be vertices in PX and PY, respectively. Observe that, if (px, x, y, py) is a path in \(G'\), then \(x \in X\) and \(y \in Y\). We then have the following lemma.

Lemma 7

D covers target couples \([px^1_i,py^1_j]\) and \([px^2_i,py^2_j]\) if and only if at least one of the following conditions is satisfied.

  1. There exist \(x \in X\) and \(y \in Y\) such that \((px^1_i,x,y,py^1_j)\) and \((px^2_i,x,y,py^2_j)\) are paths in \(G'\) and \(\{x,y\} \subseteq D\).

  2. \(\{r^1_{i,j}, r^2_{i,j}\} \subseteq D\).

3.3 The Analysis

Let \(I_{MR}\) be an instance of the MIN-REP problem with inputs G, \(\mathcal {P}_X\), and \(\mathcal {P}_Y\). Let \(I_D\) be the instance of the 1-DR-2 problem with input \(G'(G,\mathcal {P}_X,\mathcal {P}_Y)\). To prove the inapproximability result, we use the following two lemmas.

Lemma 8

If \(I_{MR}\) has a solution of size s, then \(I_D\) has a solution of size \(s+4\).

Lemma 9

If every solution of \(I_{MR}\) has size at least \(s \cdot 2^{\log ^{1-\epsilon }{n(G)}}\), then every solution of \(I_D\) has size at least \(s \cdot 2^{\log ^{1-\epsilon }{n(G)}}+4\).

Proof of Theorem 4: By Theorem 8, for any constant \(\epsilon > 0\), unless \(NP \subseteq DTIME(n^{poly\log n})\), there is no polynomial-time algorithm that can distinguish between instances of the MIN-REP problem with a solution of size \(k_X+k_Y\) and instances where every solution is of size at least \((k_X+k_Y) \cdot 2^{\log ^{1-\epsilon }{n(G)}}\). By the above two lemmas, it is hard to distinguish between instances of the 1-DR-2 problem with a solution of size \(k_X+k_Y+4\) and instances in which every solution is of size at least \((k_X+k_Y) \cdot 2^{\log ^{1-\epsilon }{n(G)}}+4\). Therefore, for any constant \(\epsilon > 0\), unless \(NP \subseteq DTIME(n^{poly\log n})\), there is no polynomial-time algorithm that can approximate the 1-DR-2 problem by a factor better than \(\frac{(k_X+k_Y)\cdot 2^{\log ^{1-\epsilon }{n(G)}}+4}{k_X+k_Y+4}\). Lemma 2 implies that, for any constant \(\epsilon ' > 0\), unless \(NP \subseteq DTIME(n^{poly\log n})\), there is no \(O(2^{\log ^{1-\epsilon '}{n(G')^{0.5}}})\)-approximation algorithm for the 1-DR-2 problem. By considering sufficiently large instances and a small enough \(\epsilon '\), we have the hardness result claimed in Theorem 4. On the other hand, let 1-DR-\(2'\) be the problem obtained by removing the constraint that any feasible solution must be a dominating set from the 1-DR-2 problem. Thus, in the 1-DR-\(2'\) problem, we only focus on covering target couples. By Lemmas 4 and 5, a solution D is feasible for the 1-DR-\(2'\) problem with input \(G'\) if and only if D is a feasible solution of \(I_D\). Thus, the inapproximability result also applies to the 1-DR-\(2'\) problem. Finally, the proof follows from Lemma 3.     \(\square \)

Lemma 8 is a direct result of the following claim.

Claim 2

If S is a feasible solution of \(I_{MR}\), then \(S \cup H\) is a feasible solution of \(I_D\).

Proof

Since H is a dominating set, by Lemma 6, it suffices to prove that every target couple \([u,v] = [px^{I_1}_i, py^{I_2}_j]\) in [PX, PY] is covered by S. Note that \([px^{I_1}_i, py^{I_2}_j]\) cannot be a target couple if \(I_1 \ne I_2\), because \(px^{I_1}_i\) and \(py^{I_2}_j\) do not have a common neighbor in that case. If \(I_1 = I_2 = I\), then the common neighbor must be \(r^I_{i,j}\). By the construction of \(G'\), this implies that \(X_i\) and \(Y_j\) form a super edge. Since S is a feasible solution of \(I_{MR}\), there exist \(x \in X_i\) and \(y \in Y_j\) such that x and y are adjacent in G and \(\{x,y\} \subseteq S\). Again, by the construction of \(G'\), (u, x, y, v) is a path in \(G'\). Hence, \(S \supseteq \{x,y\}\) covers [u, v].     \(\square \)

To prove Lemma 9, we use the following claim.

Claim 3

\(I_D\) has an optimal solution \(D^*\), such that \(D^* \setminus H\) is a feasible solution of \(I_{MR}\).

Proof of Claim 3: Let \(D_{OPT}\) be any optimal solution of \(I_D\). By Lemmas 4, 6, and 7, \(D_{OPT} \subseteq H \cup X \cup Y \cup R\). If \(D_{OPT} \cap R = \emptyset \), by Lemma 7, each target couple \([px^I_i,py^I_j]\) is covered by some \(x \in X\) and some \(y \in Y\). By the construction of \(G'\), such x and y also cover the super edge \((X_i, Y_j)\) in \(I_{MR}\). Because each super edge in \(I_{MR}\) has a corresponding target couple in \(I_D\), \(D_{OPT} \setminus H\) is a feasible solution of \(I_{MR}\).

If \(D_{OPT} \cap R \ne \emptyset \), then some \(r^I_{i,j} \in D_{OPT}\). We can further assume that both \(r^1_{i,j}\) and \(r^2_{i,j}\) are in \(D_{OPT}\); otherwise, by Lemma 7, we could remove \(r^I_{i,j}\) from \(D_{OPT}\), and the resulting solution would be smaller and still feasible, contradicting the optimality of \(D_{OPT}\). We then replace \(r^1_{i,j}\) and \(r^2_{i,j}\) with some \(x \in X\) and some \(y \in Y\) satisfying the first condition in Lemma 7. By Lemma 7, the resulting solution is still feasible, and its size does not increase. We repeat this replacement process until the resulting solution contains no relay. The proof then follows from the argument for the case where \(D_{OPT} \cap R = \emptyset \).    \(\square \)

Proof of Lemma 9: Let \(S^*\) be the optimal solution of \(I_{MR}\). By the assumption, we have \(|S^*| \ge s \cdot 2^{\log ^{1-\epsilon }{n(G)}}\). It suffices to prove that \(S^* \cup H\) is an optimal solution for \(I_D\), which implies that every feasible solution of \(I_D\) has size at least \(|S^* \cup H| = |S^*| + 4 \ge s \cdot 2^{\log ^{1-\epsilon }{n(G)}} + 4\). The feasibility of \(S^* \cup H\) follows from Claim 2. For the sake of contradiction, assume that the optimal solution of \(I_D\) has size smaller than \(|S^* \cup H|=|S^*|+4\). Claim 3 and Lemma 4 then imply that \(S^*\) is not an optimal solution of \(I_{MR}\), which is a contradiction.     \(\square \)