Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

In this paper, we investigate the \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) problem, which is obtained from the study of the homophyly law [8, Chap. 4] of large scale networks. Being one of the basic laws governing the structures of large scale networks, the homophyly law states that edges in a network tend to connect nodes with the same or similar attributes, just as an old proverb says, “birds of a feather flock together”. For example, in a paper citation network, papers are more likely to cite papers with which they have the same or similar keywords.

While it is common to list keywords in a paper by its authors, in a paper citation network there are still many papers whose keywords are not explicitly given. Consequently, it is natural to predict keywords for these papers using the homophyly law. Inspired by this observation, Zhang (the first author of this paper) and Li [21] proposed the Maximum Happy Edges (MHE) problem. In the MHE problem, we are given an undirected graph \(G = (V, E)\) and a color set \(C = \{1, 2, \cdots , k\}\). Only part of vertices are given colors in C. An edge is happy if its two endpoints share the same color. The goal of MHE is to color all the uncolored vertices such that the number of happy edges is maximized. Here, vertices correspond to papers, edges correspond to citations (neglecting directions), and colors correspond to keywords.

A natural variant of MHE is that in the input graph all vertices are uncolored and the problem just asks to color them in k colors such that the number (or total weight) of happy edges is maximized. This suggests the \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) problem we investigate in this paper.

Definition 1

The \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) Problem

(Instance). We are given an undirected graph \(G = (V, E)\) with nonnegative edge weights \(\{w_e \mid e \in E\}\), and a positive integer k.

(Goal). The problem asks to find a partition \(\{V_1, V_2, \cdots , V_k\}\) of V (i.e., to find a k-coloring of vertices) such that the total weight of happy edges is maximized.

In the definition of \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\), by k-coloring we mean a coloring scheme that uses exactly k colors, which results in a k-partition \(\{V_1, V_2, \cdots , V_k\}\), where \(V_i\) is the set of vertices whose color is i. In the paper we will interchangeably use k-coloring and k-partition. Note that the requirement of exactly k colors is necessary, otherwise we can color all vertices in one color and all edges are happy.

In the \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) problem, if \(k=1\) or \(k=n\), the problem becomes trivial. The optimum would be respectively the number of all edges and 0 in these two cases. So, throughout the paper we always assume \(2 \le k \le n-1\) for the \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) problem.

Note that \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) is not a special case of MHE. In MHE, if all vertices are un-colored, then the problem becomes trivial: Just color all vertices in one color, then all edges will become happy. In contrast, if all vertices in \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) are uncolored, we cannot color them in one color. In \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\), we must figure out a k-coloring.

Two problems that are closely related to \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) have already appeared in literature. Choudhurya et al. [7] proposed the capacitated \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) problem. Given an undirected graph G and k integers \(s_1, s_2, \cdots , s_k\), this problem is to partition V(G) into k subsets of sizes \(s_1, s_2, \cdots , s_k\) respectively, such that the total weight of happy edges is maximized. Very recently, Wu et al. [18] studied the balanced Max 3-Uncut problem, in which an input graph is partitioned into 3 equal-sized parts so that the total weight of happy edges is maximized.

Notations and Terms. Some common notations and terms are listed here. Given a graph G, let n be the number of its vertices. Given an optimization problem, let \({\text {OPT}}\) denote the value of its optimal solution. By r-clique for some integer r, we mean a clique (i.e., a complete subgraph) that contains exactly r vertices.

1.1 Related Work

To the best of our knowledge, the general \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) problem is new and has not been studied in literature. Though it is new, \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) has rich connection to the classic and existing problems.

\(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) is just the complement of the classic \(\mathsf{Min}~k\text {-}\mathsf{Cut}\) problem. The \(\mathsf{Min}~k\text {-}\mathsf{Cut}\) problem asks for a k-partition such that the total weight of cut edges is minimized. The \(\mathsf{Min}~k\text {-}\mathsf{Cut}\) problem is strongly NP-hard [12], so is the \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) problem. The best approximation ratio for \(\mathsf{Min}~k\text {-}\mathsf{Cut}\) is 2 [17]. When k is a constant, the \(\mathsf{Min}~k\text {-}\mathsf{Cut}\) problem can be optimally solved in polynomial time [12]. Obviously, \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) with constant k is also polynomial time solvable. In a word, \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) is strongly NP-hard (when k is given in the input), and is polynomial time solvable when k is a constant.

Previously we have pointed out two closely related variants of \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\), i.e., the capacitated \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) problem and the balanced Max 3-Uncut problem. Using the heuristic of local search, Choudhurya et al. [7] gave a \(\frac{1}{d(k-1)+1}\)-approximation algorithm for capacitated \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\), where d is the ratio of the largest size and the smallest size in the partition. This ratio is somewhat poor and cannot extend to the \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) problem studied in this paper. Using the semidefinite programming technique, Wu et al. [18] gave a 0.3456-approximation algorithm for the balanced Max 3-Uncut problem.

The cut problems are classic and rich. They play an important role in the study of approximation algorithms and operations research. In literature, the “uncut” problems are also been studied. Besides \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\), three examples are Min Uncut [1], Multiway Uncut [15, 20], and the complement of Min Bisection [19]. Min Uncut is the complement of the classic Max Cut problem. Agarwal et al. [1] gave an \(O(\sqrt{\log n})\)-approximation algorithm for Min Uncut, where n is the number of vertices in the input graph. Multiway Uncut is the complement of the classic Multiway Cut problem [5, 6]. Langberg et al. [15] proposed the Multiway Uncut problem. The current best approximation ratio for Multiway Uncut is \(\frac{1}{2} + \frac{\sqrt{2}}{4}f(k) \ge 0.8535\) [20], where \(f(k) \ge 1\) is a function of k. Ye and Zhang [19] gave a 0.602-approximation algorithm for the complement of the Min Bisection problem.

Due to the close relation of \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) to Multiway Uncut, we have to say more about Multiway Uncut. Given a graph \(G = (V, E)\) with edge weights and a terminal set \(\{s_1, s_2, \cdots , s_k\} \subseteq V\), the Multiway Uncut problem asks a partition of V that separates the k terminals from each other and maximizes the total weight of happy edges. (Multiway Uncut is a special case of MHE.) Both \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) and Multiway Uncut ask for a k-partition. The only difference is that in \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\), there is no terminal, and in Multiway Uncut, there are terminals.

Another closely related problem is \(\mathsf{Max}~k\text {-}\mathsf{Cut}\), which is the problem to find a k-partition such that the total weight of cut edges is maximized. When \(k = 2\), \(\mathsf{Max}~k\text {-}\mathsf{Cut}\) (namely, Max Cut) is already NP-hard. The current best approximation ratio for Max Cut is 0.87856, given by Goemans and Williamson [11] using the semidefinite programming technique. Frieze and Jerrum [10] extended Goemans-Williamson’s technique to the \(\mathsf{Max}~k\text {-}\mathsf{Cut}\) problem, obtained the approximation ratio \(\alpha _k = 1-\frac{1}{k}+(1+\epsilon (k))\frac{2\ln k}{k^2}\) for \(\mathsf{Max}~k\text {-}\mathsf{Cut}\), where \(\epsilon (k)\) is a function of k which tends to zero as \(k \rightarrow \infty \). When \(k = 3, 4, 5\), \(\alpha _k\) is no less than 0.800217, 0.850304, and 0.874243, respectively.

1.2 Our Results

In this paper, we give three approximation algorithms for the \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) problem and prove a (weak) approximation hardness result of \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\). These three algorithms share the same idea, which is simple but powerful: To find a k-partition with many happy edges, one may just find a dense subgraph as large as possible. The subgraph is used as one part of the k-partition. The larger and denser the subgraph is, the more happy edges we will get. Along this line, we finally find that \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) is in fact equivalent to the \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) problem in approximability (up to a factor of 2). Note that \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) is one of the current hot topics in approximation algorithms. This may be our most important find in this paper.

The first algorithm is a randomized algorithm (Algorithm 2.1) whose approximation ratio is \((1-\frac{k}{n})^2\). This algorithm can be derandomized in polynomial time. The second algorithm is a greedy algorithm (Algorithm 2.2) whose approximation ratio is \(1-\frac{2(k-1)}{n}\). While the ratios of these two algorithms are very close, they are still incomparable. Specifically, when \(k < \sqrt{2n}\), the ratio \(1-\frac{2(k-1)}{n}\) is better than the ratio \((1-\frac{k}{n})^2\). Otherwise (when \(k > \sqrt{2n}\)), the latter is better than the former.

The ratio \(\rho = \max \{(1-\frac{k}{n})^2, 1-\frac{2(k-1)}{n}\}\) for \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) we obtain so far is already good when k is not too large. For example, if \(k \le n/2\), then \(\rho \ge 1/4\). However, when k approaches \(n-1\), \(\rho \) becomes worse and worse, and equals to \(\frac{1}{n^2}\) finally. This observation suggests that the most difficult case of approximating \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) should be the case when k is close to n, say, \(k = n - O(\log n)\). And in this case (i.e., when k is large), we may make use of the connection to \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\).

Therefore, in the third algorithm (Algorithm 2.3), we reduce \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) to \(\mathsf{Densest}~\bar{k}\text {-}\mathsf{Subgraph}\) (for some suitable \(\bar{k}\)) by exploring the structure of optimal solutions to \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\). It is convenient to define the \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) problem here.

Definition 2

The \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) Problem

(Instance). We are given an undirected graph \(G = (V, E)\) with nonnegative edge weights \(\{w_e \mid e \in E\}\), and a positive integer k.

(Goal). The problem asks to find a k-vertex subgraph \(G'\) such that the total weight of edges in \(E(G')\) is maximized.

The reduction used in Algorithm 2.3 is nontrivial. Let \(\alpha \) be the approximation ratio for \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\). Then Algorithm 2.3 approximates \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) within \(\frac{1}{2} \alpha \) in polynomial time. The current best value of \(\alpha \) is \(\varOmega (1/n^{\frac{1}{4}+\epsilon })\) [4]. Consequently, Algorithm 2.3 repairs the deficiencies of Algorithms 2.1 and 2.2. Now, the approximation ratio we obtain for \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) is \(\max \{\rho , \frac{1}{2} \alpha \}\).

Surprisingly and interestingly, our technique in the analysis of Algorithm 2.3 also implies that if \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) can be approximated within a factor of \(\beta \), then \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) can be approximated within \(\frac{1}{2} \beta \). Therefore, \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) and \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) are equivalent in approximability up to a factor of 2. This reveals the strong connection between \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) and \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\), and may open a new viewpoint in tackling the \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) problem, since this problem is known as a notorious hard problem in approximation algorithms. (There is a wide gap between its best approximation factor and its best hardness factor).

Next, we prove an approximation hardness result for \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\): For any small constant \(\epsilon > 0\), \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) cannot be approximated within \(1-\frac{1}{2n^\epsilon }\) in polynomial time, where n is the number of vertices in the input graph. This is proved via a gap-preserving reduction from the hardness result of the Max Clique problem [3, 13]. As a result, the hardness \(1-\frac{1}{2n^\epsilon }\) for any small constant \(\epsilon > 0\) implies that \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) does not admit FPTAS.

Honestly speaking, this hardness result is weak since \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) is indeed strongly NP-hard, and the strong NP-hardness already rules out FPTAS. However, we make twofold contribution in proving the approximation hardness of \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\). First, we give an explicit expression of the approximation hardness factor of \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\), instead of just speaking that it is strongly NP-hard. Second, we prove a technical lemma (Lemma 2), which gives an upper bound of the number of happy edges that can be produced by any k-partition on a graph with no \((r+1)\)-clique. The technical lemma is of independent interest and may find more applications in related problems. In fact, the upper bound is obtained by a special k-partition which consists of \(k-1\) singletons and one subset of size \(n-(k-1)\). This again hints the connection of \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) to \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) and Max Clique.

2 Approximation Algorithms

2.1 A Randomized Algorithm

A straightforward idea for \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) is to color vertices randomly. However, if we color every vertex randomly, we may not get an approximation algorithm with good ratio. (We can prove that an algorithm of this type has approximation ratio \(\frac{1}{k} - \frac{k-1}{n(n-1)}\). The details are omitted here.)

In graphs with only unit weight on edges, to maximize the total weight of happy edges is equivalent to leave as many as possible edges uncut. So, a clever randomized strategy is to randomly color \(k-1\) vertices only, making the remaining vertices as many as possible. Intuitively, these many vertices would induce many happy edges. Algorithm \(\mathcal R\) below is a randomized algorithm for \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) of this idea.

figure a

Let \(W_{tot}\) be the total weight of edges in graph G.

Theorem 1

Algorithm \(\mathcal R\) is a randomized \(\left( 1-\frac{k}{n}\right) ^2\)-approximation algorithm for the \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) problem.

Proof

First note that Algorithm \(\mathcal R\) runs in polynomial time. Let \(V_i\) be the set of vertices of color i. Take any edge \(e = (u, v)\). Then, e is happy (uncut) if and only if both u and v are not chosen in the first \(k-1\) random choices (step 1). This means that

$$\begin{aligned} \Pr [\text {edge}~e~\text {is happy}] = \frac{{{n-2} \atopwithdelims (){k-1}}}{{n \atopwithdelims (){k-1}}} = \frac{(n-k+1)(n-k)}{n(n-1)} > \frac{(n-k)^2}{n^2} = \left( 1-\frac{k}{n}\right) ^2 . \end{aligned}$$

Let SOL be the solution value obtained by Algorithm \(\mathcal R\). Therefore, we have

$$\begin{aligned} {\text {E}}[SOL] = \sum _{e \in E} w_e \cdot \Pr [\text {edge}~e~\text {is happy}] \ge \left( 1-\frac{k}{n}\right) ^2 W_{tot}. \end{aligned}$$

On the other hand, the optimum \({\text {OPT}}\) is obviously at most \(W_{tot}\). So, the approximation ratio of Algorithm \(\mathcal R\) is at least \(\left( 1-\frac{k}{n}\right) ^2\).    \(\square \)

Algorithm \(\mathcal R\) can be derandomized by the conditional expectation method in polynomial time. This is sketched as follows in rounds. In the first round we determine the first vertex to be removed. We remove each vertex \(v_i\) (\(1 \le i \le n\)) from G to obtain \(G_i\). That is, \(\forall 1 \le i \le n\), \(G_i = G \setminus v_i\). For each \(G_i\), we compute the expected solution value \(a_i\) of Algorithm \(\mathcal R\) for the \(\mathsf{Max}(k-1)\text {-}\mathsf{Uncut}\) problem. We find the largest expected value in this round, say \(a_j\). Then, \(v_j\) is the first vertex we pick and is colored in color 1. The next round begins from G with \(v_j\) removed. Repeating the above procedure for \(k-1\) rounds, we obtain a solution whose value is at least as better as the expected value of Algorithm \(\mathcal R\).

2.2 A Greedy Algorithm

The idea in Sect. 2.1 can be restated as finding a subgraph of size \(n-k+1\) as dense as possible, where by dense subgraph we mean a subgraph whose total weight of edges is as much as possible. This leads to a greedy algorithm for \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\), shown as Algorithm \(\mathcal G\) below. For the sake of description, we define the weighted degree \(d_w(v)\) of a vertex v as the sum of weights of edges incident to v. By definition, the weight of vertex v is equal to the capacity of the cut \((\{v\}, V \setminus \{v\})\). Obviously, when each edge in the graph has unit weight, the weighted degree of a vertex is simply its degree.

figure b

Theorem 2

Algorithm \(\mathcal G\) is a \(\left( 1-\frac{2(k-1)}{n}\right) \)-approximation algorithm for the \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) problem.

Proof

Algorithm \(\mathcal G\) obviously runs in polynomial time. Let \(v_1, \cdots , v_{k-1}\) be the vertices picked in the first step of Algorithm \(\mathcal G\). By the algorithm, only edges incident to vertices in \(\{v_1, \cdots , v_{k-1}\}\) would be unhappy. So, the total weight of unhappy edges is at most

$$\begin{aligned} \sum _{i=1}^{k-1} d_w(v_i) \le \frac{k-1}{n}\sum _{v} d_w(v) = \frac{2(k-1)}{n} W_{tot}. \end{aligned}$$

Therefore, the total weight of happy edges is at least

$$\begin{aligned} W_{tot} - \frac{2(k-1)}{n} W_{tot}. \end{aligned}$$

Since \({\text {OPT}}\le W_{tot}\), this means the approximation ratio of Algorithm \(\mathcal G\) is at least \(1 - \frac{2(k-1)}{n}\).    \(\square \)

The approximation ratios \((1-\frac{k}{n})^2\) and \(1-\frac{2(k-1)}{n}\) behave well when k is not too large. For example, \((1-\frac{k}{n})^2 \ge \frac{1}{4}\) when \(k \le \frac{n}{2}\). However, when k is large enough, say, \(k = n - O(\log n)\), the approximation ratio \(\max \{(1-\frac{k}{n})^2, 1-\frac{2(k-1)}{n}\}\) we obtained so far becomes bad. To remedy this deficiency, we design another approximation algorithm for \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\), that is, Algorithm \(\mathcal T\) in Sect. 2.3. Actually, our subsequent study on \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) in this paper makes us realize that the hard core of \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) just lies in the case when k is large.

2.3 Reduces to Densest k-Subgraph

In this section, we reduce \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) to \(\mathsf{Densest}~\bar{k}\text {-}\mathsf{Subgraph}\) for some suitable \(\bar{k}\). For clarity, when the instance of \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) is given as, e.g., \((G, w, {\bar{k}})\), we call it the instance of the \(\mathsf{Densest}~\bar{k}\text {-}\mathsf{Subgraph}\) problem. The reader should be aware of that \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) and \(\mathsf{Densest}~\bar{k}\text {-}\mathsf{Subgraph}\) are the same problem. This usage also happens to the \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) problem.

Given a vertex subset S of an edge-weighted graph G, let w(S) denote the total weight of happy edges induced by S. Given a k-partition \(\mathcal{P} = \{V_1, V_2, \cdots , V_k\}\) of V(G), let \(w(\mathcal{P})\) denote the total weight of happy edges induced by \(\mathcal P\), i.e., \(w(\mathcal{P}) = \sum _i w(V_i)\).

First we prove a technical lemma.

Lemma 1

Let \(\mathcal{P} = \{V_1, V_2, \cdots , V_k\}\) be a k-partition of graph G with weights defined on edges. Then in polynomial time (in terms of |V(G)|) we can construct a k-partition \(\mathcal{P}' = \{V'_1, V'_2, \cdots , V'_k\}\) which satisfies

  1. (i)

    \(|V'_1| = \cdots = |V'_{k-1}| = 1\), \(|V'_k| = n-k+1\), and

  2. (ii)

    \(w(\mathcal{P}') \ge \frac{1}{2} w(\mathcal{P})\).

Proof

We renumber the vertex subsets in \(\mathcal P\) according to the non-decreasing order of their \(w(\cdot )\) values, and rewrite \(\mathcal P\) as \(\{R_1, R_2,\) \(\cdots ,\) \(R_a, S_1, S_2,\) \(\cdots ,\) \(S_b\}\), where we assume that in \(\mathcal P\) there are a singletons \(R_1, \cdots , R_a\), and b non-singletons \(S_1, \cdots , S_b\) (that is, each \(S_i\) has size at least two). So, we have \(a + b = k\),

$$\begin{aligned} w(R_1) \le \cdots \le w(R_a) \le w(S_1) \le \cdots \le w(S_b), \end{aligned}$$

and

$$\begin{aligned} w(\mathcal{P}) = w(S_1) + \cdots + w(S_b). \end{aligned}$$

Note that a may be zero.

If \(b=1\), then the theorem is proved by just letting \(\mathcal{P}' = \mathcal{P}\). So, in the following we assume that \(b \ge 2\). We shall convert \(S_1, \cdots , S_{b-1}, S_b\) to \(S'_1, \cdots , S'_{b-1}, S'_b\) such that the first \(b-1\) \(S'_i\)’s are singletons. This is done as follows.

We pick the unique \(\ell \in \left[ 1, \lceil \frac{b-1}{2}\rceil \right] \) such that

$$\begin{aligned} |S_1| + |S_2| + \cdots + |S_\ell | \ge b-1 \end{aligned}$$

and

$$\begin{aligned} |S_1| + |S_2| + \cdots + |S_{\ell -1}| < b-1. \end{aligned}$$
(1)

Note that it may be the case that \(\ell = 1\), and in this case we do not need the condition (1). Also note that since \(b \ge 2\) and \(\forall 1\le i \le b\), \(|S_i| \ge 2\), \(\ell \) must be at most \(\lceil \frac{b-1}{2}\rceil \).

Initially \(S'_b\) is empty. We merge all vertices in \(S_{\ell +1}, \cdots , S_b\) into \(S'_b\). Then we pick arbitrarily \(b-1\) vertices from \(S_1\), \(\cdots \), \(S_\ell \) to make \(b-1\) singletons \(S'_1\), \(S'_2\), \(\cdots \), \(S'_{b-1}\). If there are still remaining vertices in \(S_1\), \(\cdots \), \(S_\ell \) (in case that \(|S_1| + |S_2| + \cdots + |S_\ell | > b-1\)), we then move all of them to \(S'_b\). This finishes the construction of \(S'_1\), \(\cdots \), \(S'_{b-1}\), \(S'_b\).

Since \(\ell \le \lceil \frac{b-1}{2}\rceil \le \frac{1}{2} b\), the number of subsets \(S_{\ell +1}\), \(\cdots \), \(S_b\) is at least half of b. In the above construction, all the happy edges in these subsets are kept in \(S'_b\). Since \(S_1\), \(\cdots \), \(S_b\) are in the non-decreasing order of the total weights of happy edges they contain, we know that

$$\begin{aligned} w(S'_b) \ge \frac{1}{2} (w(S_1) + \cdots + w(S_b)) \end{aligned}$$

The desired k-partition \(\mathcal{P}'\) is just \(\{R_1, \cdots , R_a, S'_1, \cdots , S'_b\}\).    \(\square \)

figure c

Algorithm \(\mathcal T\) is the algorithm reducing \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) to \(\mathsf{Densest}~\bar{k}\text {-}\mathsf{Subgraph}\). Since the subgraph \(G'\) found in step 2 contains \(\bar{k}\) vertices, there are exactly \(n - \bar{k} = k-1\) vertices in \(V(G) \setminus V(G')\). So, in step 4 we can color them in colors \(1, \cdots , k-1\), respectively.

Theorem 3

Let \(\alpha \) be the approximation ratio of \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\). Then Algorithm \(\mathcal T\) is a \(\frac{\alpha }{2}\)-approximation algorithm for the \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) problem.

By [4], \(\alpha \) can be \(\varOmega (1/n^{1/4+\epsilon })\) for every small constant \(\epsilon > 0\). (The ratio in [4] is for unweighted \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\). Using the technique in [9], this ratio can be extended to weighted \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\).) This means that \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) can be approximated within \(\frac{1}{2} \alpha = \varOmega (1/n^{1/4+\epsilon })\) in polynomial time.

Proof

(of Theorem 3 ). Let \({\text {OPT}}_{\text {M}k\text {U}}\) be the optimal value of \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) on instance (Gwk). Let \(\mathcal{P}^* = \{V_1, V_2, \cdots , V_k\}\) be the corresponding optimal solution. By Lemma 1, we can build a k-partition \(\mathcal{P}' = \{V'_1, V'_2, \cdots , V'_k\}\) from \(\mathcal{P}^*\) such that \(V'_1\), \(\cdots \), \(V'_{k-1}\) are singletons. This means that \(|V'_k| = n-(k-1)\). So, \(V'_k\) is a feasible solution to \(\mathsf{Densest}~\bar{k}\text {-}\mathsf{Subgraph}\) on instance \((G, w, \bar{k})\) satisfying

$$\begin{aligned} w(V'_k) = w(\mathcal{P}') \ge \frac{1}{2} w(\mathcal{P}^*) = \frac{1}{2} {\text {OPT}}_{ M{ k}U }, \end{aligned}$$
(2)

where the inequality is by Lemma 1.

Note that \(G'\) is the subgraph found in step 2 by the approximation algorithm for \(\mathsf{Densest}~\bar{k}\text {-}\mathsf{Subgraph}\). There are \(k-1\) vertices in \(V(G) \setminus V(G')\). Then, step 4 builds \(k-1\) singletons using the vertices in \(V(G) \setminus V(G')\). These singletons, together with \(V(G')\), constitute a k-partition, denoted by \(\mathcal P\), which is a feasible solution to \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\). We have

$$\begin{aligned} w(\mathcal{P}) = w(V(G')) \ge \alpha \cdot {\text {OPT}}_{\text {D}\bar{k}\text {S}} \ge \alpha \cdot w(V'_k) \mathop {\ge }_{(2)} \frac{\alpha }{2} \cdot {\text {OPT}}_{\text {M}k\text {U}}, \end{aligned}$$

where the first inequality holds since \(G'\) is an \(\alpha \)-approximate solution to the \(\mathsf{Densest}~\bar{k}\text {-}\mathsf{Subgraph}\) instance \((G, w, \bar{k})\), and the second inequality holds since \(V'_k\) is a feasible solution to \((G, w, \bar{k})\). The theorem is proved.    \(\square \)

Note that the running time of Algorithm \(\mathcal T\) does not depend on the construction time of the k-partition in Lemma 1. Lemma 1 is only used in the analysis of Algorithm \(\mathcal T\). (This construction time is useful in the following Algorithm \(\mathcal C\).)

Interestingly and somewhat surprisingly, Lemma 1 also implies the converse of Theorem 3: \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) reduces to \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\). This is shown in Algorithm \(\mathcal C\) and Theorem 4.

figure d

Theorem 4

If \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) can be approximated within a factor of \(\alpha \), then \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) can be approximated within a factor of \(\alpha /2\).

Proof

We design Algorithm \(\mathcal C\) as the approximation algorithm for \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\). Step 2 calls the supposed \(\alpha \)-approximation algorithm for \(\mathsf{Max}~\bar{k}\text {-}\mathsf{Uncut}\). By Lemma 1, step 3 can be finished in polynomial time. Therefore, the overall running time of Algorithm \(\mathcal C\) is polynomial.

Let \(V^*\) be an optimal solution to the \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) instance (Gwk), whose value is denoted by \({\text {OPT}}_{\text {D}k\text {S}}\). Note that in Algorithm \(\mathcal C\) we have \({\bar{k}} = n-k+1\). By viewing each vertex in \(V(G) \setminus V^*\) as a singleton, we can build a \(\bar{k}\)-partition \(\mathcal{P}^\circ = \{V^\circ _1, V^\circ _2, \cdots , V^\circ _{\bar{k}}\}\), where \(V^\circ _{\bar{k}} = V^*\). Obviously we have \(w(\mathcal{P}^\circ ) = {\text {OPT}}_{\text {D}k\text {S}}\). A crucial observation is that \(\mathcal{P}^\circ \) is a feasible solution to the \(\mathsf{Max}~\bar{k}\text {-}\mathsf{Uncut}\) instance \((G, w, {\bar{k}})\). This helps us get the connection

$$\begin{aligned} {\text {OPT}}_{\text {M}\bar{k}\text {U}} \ge {\text {OPT}}_{\text {D}k\text {S}}, \end{aligned}$$
(3)

where \({\text {OPT}}_{\text {M}\bar{k}\text {U}}\) is the optimal value of the instance \((G, w, {\bar{k}})\) of \(\mathsf{Max}~\bar{k}\text {-}\mathsf{Uncut}\).

For the two \(\bar{k}\)-partitions \(\mathcal P\) and \(\mathcal{P}'\), we have \(w(\mathcal{P}') \ge \frac{1}{2} w(\mathcal{P})\) by Lemma 1. Since \(\mathsf{Max}~\bar{k}\text {-}\mathsf{Uncut}\) can be approximated within \(\alpha \), we have \(w(\mathcal{P}) \ge \alpha \cdot {\text {OPT}}_{\text {M}\bar{k}\text {U}}\). These facts, together with (3), conclude the theorem.    \(\square \)

Theorems 3 and 4 show that \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) and \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) are in fact equivalent in approximability up to a factor of two.

3 Approximation Hardness

3.1 Ruling Out Constant Factor Approximation

The approximability equivalence (up to a factor of two) of \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) and \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) naturally suggests that the approximation hardness results of \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) may extend to \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\). In particular, the following conditional hardness result holds.

Corollary 1

If \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) cannot be approximated within any constant factor, then so do \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\).

Under some appropriate complexity assumptions, people indeed proved that \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) cannot be approximated within any constant factor. Raghavendra and Steurer [16] proved that assuming that the Unique Games with Small Set Expansion conjecture is true, it is NP-hard to approximate the \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) problem within any constant factor. Alon et al. [2] also ruled out constant approximation factor for \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\), under an average case hardness assumption. For the exact meaning of these complexity assumptions, we refer the reader to [2, 16].

Khot [14] proved that assuming \(\text {NP} \not \subseteq \cap _{\epsilon > 0} \text {BPTIME}(2^{n^\epsilon })\), \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) has no PTAS. However, this result cannot be extended to \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) directly, since the approximability equivalence of \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) and \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) proved above omits a constant factor 2.

3.2 An Explicit Hardness Factor

The approximation hardness results for \(\mathsf{Densest}~k\text {-}\mathsf{Subgraph}\) mentioned above all use stronger complexity assumptions than the general assumption \(\text {P} \ne \text {NP}\). In the following, we shall prove an approximation hardness result for \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\), assuming that \(\text {P} \ne \text {NP}\). The proved hardness factor is \(1-\frac{1}{2n^{\epsilon }}\), where \(\epsilon > 0\) is an arbitrarily small constant, and n is the vertex number of the input graph. This result implies that, if \(\text {P} \ne \text {NP}\), \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) does not admit FPTAS.

The hardness result \(1-\frac{1}{2n^{\epsilon }}\) for \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) is rather weak since \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) is strongly NP-hard, and this (the strong NP-hardness) already rules out FPTAS. However, we make twofold contribution in proving such a result. First, we give an explicit expression of the approximation hardness factor of \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\), instead of just speaking that it is strongly NP-hard. Second, we prove a technical lemma (Lemma 2), which gives an upper bound of the number of happy edges that can be produced by any k-partition on a graph with no \((r+1)\)-clique. The technical lemma is of independent interest and may find more applications in related problems.

Lemma 2

Any k-partition on an n-vertex undirected graph with no \((r+1)\)-clique can produce at most \(\frac{1}{2} (1-\frac{1}{r})u^2\) happy edges, where \(u = n-(k-1)\).

Håstad [13] proved the following remarkable approximation hardness result for the Max Clique problem: For any \(\epsilon > 0\), unless \(\text {P} = \text {NP}\), there is no polynomial time algorithm that approximates Max Clique within a factor of \(n^{1/2-\epsilon }\), where n is the vertex number of the input graph. By this result and Lemma 2, we can prove that

Theorem 5

For any \(\epsilon > 0\), unless \(\text {P} = \text {NP}\), there is no polynomial time algorithm that approximates \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) within a factor of \(1-\frac{n^{1/2-\epsilon }-1}{n^{1/2}-1}\), where n is the vertex number of the input graph. The hardness factor is \(\le 1-\frac{1}{2n^{\epsilon }}\) for sufficiently large n.

The proof of Lemma 2 is rather complicated. Due to space limitation, the proofs of Lemma 2 and Theorem 5 are omitted here and will be given in the journal version of the paper.

Håstad [13] also proved that assuming \(\text {ZPP} \ne \text {NP}\), Max Clique cannot be approximated within \(n^{1-\epsilon }\) for any small constant \(\epsilon > 0\). However, for technical reasons, this stronger hardness factor cannot improve the result of Theorem 5 accordingly.

A corollary of Theorem 5 is that \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) has no FPTAS, if \(\text {P} \ne \text {NP}\).

Corollary 2

\(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) does not admit FPTAS, if \(\text {P} \ne \text {NP}\).

Proof

Suppose for contradiction that there is an FPTAS for \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) which for any small \(\epsilon ' > 0\), gets a \((1-\epsilon ')\)-approximation to \(\mathsf{Max}~k\text {-}\mathsf{Uncut}\) instance I, with running time \({\text {poly}}(\frac{1}{\epsilon '}, |I|)\), where |I| denotes the length of instance I, and \({\text {poly}}()\) denotes some polynomial. Given any small constant \(\epsilon > 0\), if we set \(\epsilon ' = \frac{1}{2n^{\epsilon }}\) and run the FPTAS, where n is the number of vertices in the input graph, then we can get a \((1 - \frac{1}{2n^{\epsilon }})\)-approximation to instance I in time \({\text {poly}}(2n^{\epsilon }, |I|) = {\text {poly}}(|I|)\), contradicting Theorem 5.    \(\square \)