1 Introduction

Finding dense components in a graph is an active research topic in graph mining. Techniques for identifying dense subgraphs have been used in various applications. For example, in Web graph analysis, they are used for detecting communities (i.e., sets of web pages dealing with the same or similar topics) [9] and spam link farms [13]. As another example, in bioinformatics, they are used for finding molecular complexes in protein–protein interaction networks [4] and identifying regulatory motifs in DNA [11]. Furthermore, they are also used for expert team formation [6, 21] and real-time story identification in micro-blogging streams [2].

To date, various optimization problems have been considered to find dense components in a graph. The densest subgraph problem is one of the most well-studied optimization problems. Let \(G=(V,E,w)\) be an edge-weighted undirected graph consisting of \(n=|V|\) vertices, \(m=|E|\) edges, and a weight function \(w:E\rightarrow \mathbb {Q}_{>0}\), where \(\mathbb {Q}_{>0}\) is the set of positive rational numbers. For a subset of vertices \(S\subseteq V\), let \(G[S]\) be the subgraph induced by S, i.e., \(G[S]=(S,E(S))\), where \(E(S)=\{\{i,j\}\in E\mid i,j\in S\}\). The density of \(S\subseteq V\) is defined as \(w(S)/|S|\), where \(w(S)=\sum _{e\in E(S)}w(e)\). In the (weighted) densest subgraph problem, given an (edge-weighted) undirected graph \(G=(V,E,w)\), we are asked to find \(S\subseteq V\) that maximizes the density \(w(S)/|S|\).

The densest subgraph problem has received significant attention recently because it can be solved exactly in polynomial time and approximately in nearly linear time. Indeed, there exist a flow-based exact algorithm [14] and a linear-programming-based (LP-based) exact algorithm [7]. Charikar [7] demonstrated that the greedy algorithm designed by Asahiro et al. [3], called the greedy peeling, obtains a 2-approximate solution for any instance. This algorithm runs in \(O(m+n\log n)\) time for weighted graphs and \(O(m+n)\) time for unweighted graphs.

However, the densest subgraph problem has a drawback: the obtained subset may be too large or too small in comparison with the size desired in the application at hand. To overcome this issue, some variants of the problem have often been employed. The densest k-subgraph problem (DkS) is a straightforward size-restricted variant of the densest subgraph problem [10]. In this problem, given a positive integer k as an additional input, we are asked to find \(S\subseteq V\) of size k that maximizes the density \(w(S)/|S|\). Note that in this problem, the objective function can be replaced by \(w(S)\) since |S| is fixed to k. Unfortunately, it is known that this size restriction makes the problem much harder to solve. Indeed, Khot [16] proved that DkS has no PTAS under a reasonable computational complexity assumption. The current best approximation algorithm has an approximation ratio of \(O(n^{1/4+\epsilon })\) for any \(\epsilon >0\) [5].

Furthermore, Andersen and Chellapilla [1] introduced two relaxed versions of DkS. The first problem, the densest at-least-k-subgraph problem (DalkS), asks for \(S\subseteq V\) that maximizes the density \(w(S)/|S|\) under the size constraint \(|S|\ge k\). For this problem, Andersen and Chellapilla [1] adopted the greedy peeling, and demonstrated that the algorithm yields a 3-approximate solution for any instance. Later, Khuller and Saha [17] investigated the problem more deeply. They proved that DalkS is NP-hard, and designed a flow-based algorithm and an LP-based algorithm, both of which have an approximation ratio of 2, improving on the ratio of 3 provided by Andersen and Chellapilla [1]. The second problem is called the densest at-most-k-subgraph problem (DamkS), which asks for \(S\subseteq V\) that maximizes the density \(w(S)/|S|\) under the size constraint \(|S|\le k\). The NP-hardness of DamkS is immediate since the maximum clique problem can be reduced to it. Khuller and Saha [17] proved that approximating DamkS is as hard as approximating DkS within a constant factor.

1.1 Our Contribution

In this study, we address the size issue of the densest subgraph problem by generalizing the density of \(S\subseteq V\). Specifically, we introduce the f-density of \(S\subseteq V\), which is defined as \(w(S)/f(|S|)\), where \(f: \mathbb {Z}_{\ge 0}\rightarrow \mathbb {R}_{\ge 0}\) is a monotonically non-decreasing function with \(f(0)=0\) and \(f(1)>0\). Note that \(\mathbb {Z}_{\ge 0}\) and \(\mathbb {R}_{\ge 0}\) are the sets of nonnegative integers and nonnegative real numbers, respectively. In the f-densest subgraph problem (f-DS), we aim to find a nonempty \(S\subseteq V\) that maximizes the f-density \(w(S)/f(|S|)\). Without loss of generality, we assume that \(E\ne \emptyset \); hence, any optimal solution \(S^*\subseteq V\) satisfies \(|S^*|\ge 2\). Although f-DS does not explicitly specify the size of the output subset of vertices, we can handle the size issue mentioned above by choosing a convex or a concave size function f appropriately. Indeed, we can show that any optimal solution to f-DS with convex (resp. concave) function f has a size smaller (resp. larger) than or equal to that of any densest subgraph (i.e., any optimal solution to the densest subgraph problem). For details, see Sects. 2 and 3.
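To make the objective concrete, the following minimal Python sketch evaluates the f-density of a subset; the encoding of a graph as a dictionary mapping edge pairs to positive weights is our own convention, not part of the problem definition.

```python
from typing import Callable, Dict, Set, Tuple

def f_density(edges: Dict[Tuple[int, int], float], S: Set[int],
              f: Callable[[int], float]) -> float:
    """Return the f-density w(S) / f(|S|) of a nonempty subset S."""
    w_S = sum(wt for (u, v), wt in edges.items() if u in S and v in S)
    return w_S / f(len(S))

# A triangle {1, 2, 3} plus a pendant vertex 4; f(x) = x gives the density.
edges = {(1, 2): 1.0, (2, 3): 1.0, (1, 3): 1.0, (3, 4): 1.0}
print(f_density(edges, {1, 2, 3}, lambda x: x))         # 3/3 = 1.0
print(f_density(edges, {1, 2, 3}, lambda x: x ** 1.5))  # 3/3**1.5 ≈ 0.577
```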

Here we mention the relationship between our problem and DkS. Any optimal solution \(S^*\subseteq V\) to f-DS is a maximum weight subset of size \(|S^*|\), i.e., \(S^*\in \hbox {argmax}\{w(S)\mid S\subseteq V,~|S|=|S^*|\}\), which implies that \(S^*\) is also optimal to DkS with \(k=|S^*|\). Furthermore, the iterative use of a \(\gamma \)-approximation algorithm for DkS (i.e., running it for every \(k=1,\dots ,n\) and returning the best subset found in terms of the f-density) leads to a \(\gamma \)-approximation algorithm for f-DS. Using the \(O(n^{1/4+\epsilon })\)-approximation algorithm for DkS [5], we can obtain an \(O(n^{1/4+\epsilon })\)-approximation algorithm for f-DS.
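This reduction can be sketched as follows, where dks_approx is a hypothetical black box that returns a \(\gamma \)-approximate solution to DkS for a given k (no concrete routine is fixed here). For \(k=|S^*|\) the black box returns a subset of weight at least \(w(S^*)/\gamma \), so the best candidate has f-density at least \(\frac{1}{\gamma }\cdot w(S^*)/f(|S^*|)\).

```python
def fds_via_dks(V, edges, f, dks_approx):
    """Sketch: run a DkS approximation for every k and keep the subset
    with the best f-density; the ratio gamma carries over to f-DS."""
    def w(S):
        return sum(wt for (u, v), wt in edges.items() if u in S and v in S)
    candidates = [dks_approx(V, edges, k) for k in range(2, len(V) + 1)]
    return max(candidates, key=lambda S: w(S) / f(len(S)))
```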

In what follows, we summarize our results for both the cases where f is convex and where f is concave.

The case of convex f We first describe our results for the case where f is convex. A function \(f:\mathbb {Z}_{\ge 0}\rightarrow \mathbb {R}_{\ge 0}\) is said to be convex if \(f(x)-2f(x+1)+f(x+2)\ge 0\) holds for any \(x\in \mathbb {Z}_{\ge 0}\). We prove the NP-hardness of f-DS with a certain convex function f by constructing a reduction from DamkS. Thus, for f-DS with convex function f, there is no hope of a polynomial-time exact algorithm unless P = NP.

Alternatively, we propose a \(\min \left\{ \frac{f(2)/2}{f(|S^*|)/|S^*|^2},~\frac{2f(n)/n}{f(|S^*|)-f(|S^*|-1)}\right\} \)-approximation algorithm, where \(S^*\subseteq V\) is an optimal solution to f-DS with convex function f. Our algorithm consists of the following two procedures, and outputs the better solution found by them. The first one is based on brute-force search, and obtains an \(\frac{f(2)/2}{f(|S^*|)/|S^*|^2}\)-approximate solution in \(O(m+n)\) time. The second one adopts the greedy peeling, and obtains a \(\frac{2f(n)/n}{f(|S^*|)-f(|S^*|-1)}\)-approximate solution in \(O(m+n\log n)\) time. Thus, the total running time of our algorithm is \(O(m+n\log n)\). Our analysis of the approximation ratio of the second procedure extends the analysis by Charikar [7] for the densest subgraph problem.

At the end of our analysis, we observe the behavior of the approximation ratio of our algorithm for three concrete size functions. We consider size functions between linear and quadratic because, as we will see later, f-DS with any super-quadratic size function is a trivial problem; indeed, any optimal solution then has constant size. The first example is \(f(x)=x^\alpha ~(\alpha \in [1, 2])\). We show that the approximation ratio of our algorithm is \(2\cdot n^{(\alpha -1)(2-\alpha )}\), where the worst-case value of \(2\cdot n^{1/4}\) is attained at \(\alpha =1.5\). The second example is \(f(x)=\lambda x+(1-\lambda )x^2~(\lambda \in [0,1))\). For this case, the approximation ratio of our algorithm is \((2-\lambda )/(1-\lambda )\), which is a constant for a fixed \(\lambda \). The third example is \(f(x)=x^2/(\lambda x +(1-\lambda ))~(\lambda \in [0,1])\). Note that this size function is derived from the density function \(\lambda \frac{w(S)}{|S|}+(1-\lambda )\frac{w(S)}{|S|^2}\). The approximation ratio of our algorithm is \(4/(1+\lambda )\), which is at most 4.

The case of concave f We next describe our results for the case where f is concave. A function \(f:\mathbb {Z}_{\ge 0}\rightarrow \mathbb {R}_{\ge 0}\) is said to be concave if \(f(x)-2f(x+1)+f(x+2)\le 0\) holds for any \(x\in \mathbb {Z}_{\ge 0}\). Unlike the above convex case, f-DS in this case can be solved exactly in polynomial time.

Indeed, we present an LP-based exact algorithm, which extends Charikar’s exact algorithm for the densest subgraph problem [7] and Khuller and Saha’s 2-approximation algorithm for DalkS [17]. It should be emphasized that our LP-based algorithm obtains not only an optimal solution to f-DS but also some useful subsets of vertices. We illustrate this advantage of our algorithm by using an example in Fig. 1. The graph consists of 8 vertices and 11 unweighted edges (i.e., \(w(e)=1\) for every \(e\in E\)). For this graph, we plot all the points contained in \(\mathcal {P}=\{(|S|,w(S))\mid S\subseteq V\}\). We refer to the extreme points of the upper convex hull of \(\mathcal {P}\) as the dense frontier points. The (largest) densest subgraph is a typical subset of vertices corresponding to a dense frontier point. Our LP-based algorithm obtains a subset of vertices corresponding to every dense frontier point. It should be noted that the algorithm SSM designed by Nagano et al. [20] can also be used to obtain a corresponding subset of vertices for every dense frontier point. The difference between their algorithm and ours is that their algorithm is based on the computation of a minimum norm base of a submodular polyhedron, whereas ours solves linear programming problems.

Fig. 1 An example graph and the corresponding points in \(\mathcal {P}=\{(|S|,w(S))\mid S\subseteq V\}\). The diamond-shaped points, i.e., (0, 0), (4, 6), (7, 10), and (8, 11), are dense frontier points

Moreover, in this concave case, we design a combinatorial exact algorithm for unweighted graphs. Our algorithm is based on a standard technique for fractional programming, by which we can reduce f-DS to a sequence of submodular function minimizations. However, a direct application of a submodular function minimization algorithm leads to a computationally expensive algorithm that runs in \(O(n^3(m+n)\log ^3 n + n^4\,\mathsf{polylog}(n))\) time. To reduce the computation time, we replace the submodular function minimization algorithm with a much faster flow-based algorithm that substantially extends a technique used in Goldberg's flow-based algorithm for the densest subgraph problem [14]. The total running time of our algorithm is \(O(n^3)\). Modifying this algorithm, we also present an \(O\left( \frac{n^3}{\log n}\cdot \log \left( \frac{\log n}{\epsilon }\right) \right) \)-time \((1+\epsilon )\)-approximation algorithm for weighted graphs.

Although our flow-based algorithm is much faster than the reduction-based algorithm, the running time is still long for large-sized graphs. To design an algorithm with much higher scalability, we adopt the greedy peeling. As mentioned above, this algorithm runs in \(O(m+n\log n)\) time for weighted graphs and \(O(m+n)\) time for unweighted graphs. We prove that the algorithm yields a 3-approximate solution for any instance.

1.2 Related Work

Tsourakakis et al. [21] introduced a general optimization problem to find dense subgraphs, which is referred to as the optimal \((g,h,\alpha )\)-edge-surplus problem. In this problem, given an unweighted undirected graph \(G=(V,E)\), we are asked to find \(S\subseteq V\) that maximizes \(\textsf {edge-surplus}_\alpha (S)=g(|E(S)|) -\alpha h(|S|)\), where g and h are strictly monotonically increasing functions, and \(\alpha >0\) is a constant. The intuition behind this optimization problem is the same as that of f-DS. Indeed, the first term g(|E(S)|) prefers \(S\subseteq V\) that has a large number of edges, whereas the second term \(-\alpha h(|S|)\) penalizes \(S\subseteq V\) with a large size. Tsourakakis et al. [21] were motivated by finding near-cliques (i.e., relatively small dense subgraphs), and they derived the function \(\textsf {OQC}_\alpha (S) =|E(S)|-\alpha \left( {\begin{array}{c}|S|\\ 2\end{array}}\right) \), which is called the OQC function, by setting \(g(x)=x\) and \(h(x)=x(x-1)/2\). For OQC function maximization, they adopted the greedy peeling and a simple local search heuristic.

Recently, Yanagisawa and Hara [22] introduced the density function \(|E(S)|/|S|^\alpha \) for \(\alpha \in (1,2]\), which they called the discounted average degree. For discounted average degree maximization, they designed an integer-programming-based exact algorithm, which is applicable only to graphs with at most a few thousand edges. They also designed a local search heuristic, which is applicable to web-scale graphs but has no provable approximation guarantee. As mentioned above, our algorithm for f-DS with convex function f runs in \(O(m+n\log n)\) time, and has an approximation ratio of \(2\cdot n^{(\alpha -1)(2-\alpha )}\) for \(f(x)=x^\alpha \) (\(\alpha \in [1,2]\)).

2 Convex Case

In this section, we investigate f-DS with convex function f. A function \(f:\mathbb {Z}_{\ge 0}\rightarrow \mathbb {R}_{\ge 0}\) is said to be convex if \(f(x)-2f(x+1)+f(x+2)\ge 0\) holds for any \(x\in \mathbb {Z}_{\ge 0}\). We remark that \(f(x)/x\) is monotonically non-decreasing in x because f is convex and \(f(0)=0\). It should be emphasized that any optimal solution to f-DS with convex function f has a size smaller than or equal to that of any densest subgraph. To see this, let \(S^*\subseteq V\) be any optimal solution to f-DS and \(S^*_\text {DS}\subseteq V\) be any densest subgraph. Then we have

$$\begin{aligned} \frac{f(|S^*|)}{|S^*|} =\frac{w(S^*)/|S^*|}{w(S^*)/f(|S^*|)} \le \frac{w(S^*_\text {DS})/|S^*_\text {DS}|}{w(S^*_\text {DS})/f(|S^*_\text {DS}|)} =\frac{f(|S^*_\text {DS}|)}{|S^*_\text {DS}|}. \end{aligned}$$
(1)

This implies that \(|S^*|\le |S^*_\text {DS}|\) holds because \(f(x)/x\) is monotonically non-decreasing.

2.1 Hardness

We first prove that f-DS with convex function f contains DamkS as a special case.

Theorem 1

For any integer \(k\in [2,n]\), a subset of vertices \(S\subseteq V\) is optimal to DamkS if and only if S is optimal to f-DS with (convex) function \(f(x)=\max \left\{ x,~\frac{w(V)}{w(e)/2}(x-k)+k\right\} \), where e is an arbitrary edge.

Proof

Since the maximum of linear functions is convex, the function f is convex. We remark that

$$\begin{aligned} f(x)= {\left\{ \begin{array}{ll} x &{}\quad \text {if } x\le k,\\ \frac{w(V)}{w(e)/2}(x-k)+k &{}\quad \text {otherwise}. \end{array}\right. } \end{aligned}$$

For any \(S\subseteq V\) with \(|S|\le k\), we have \(w(S)/f(|S|)=w(S)/|S|\). On the other hand, for any \(S\subseteq V\) with \(|S|>k\), we have

$$\begin{aligned} \frac{w(S)}{f(|S|)} = \frac{w(S)}{\frac{w(V)}{w(e)/2}(|S|-k)+k} <\frac{w(S)}{\frac{w(V)}{w(e)/2}} \le \frac{w(e)}{2}, \end{aligned}$$

which implies that S is not optimal to f-DS, because the two endpoints of e already attain f-density \(w(e)/f(2)=w(e)/2\). Hence any optimal solution has size at most k, and for subsets of size at most k the f-density coincides with the density; thus, we have the theorem. \(\square \)

2.2 Our Algorithm

In this subsection, we provide an algorithm for f-DS with convex function f. Our algorithm consists of the following two procedures, and outputs the better solution found by them. Let \(S^*\subseteq V\) be an optimal solution to the problem. The first procedure is based on the brute-force search, which obtains an \(\frac{f(2)/2}{f(|S^*|)/|S^*|^2}\)-approximate solution in \(O(m+n)\) time. The second one adopts the greedy peeling [3], which obtains a \(\frac{2f(n)/n}{f(|S^*|)-f(|S^*|-1)}\)-approximate solution in \(O(m+n\log n)\) time. Combining these results, both of which will be proved later, we have the following theorem.

Theorem 2

Let \(S^*\subseteq V\) be an optimal solution to f-DS with convex function f. For the problem, our algorithm runs in \(O(m+n\log n)\) time, and has an approximation ratio of

$$\begin{aligned} \min \left\{ \frac{f(2)/2}{f(|S^*|)/|S^*|^2},~\frac{2f(n)/n}{f(|S^*|)-f(|S^*|-1)}\right\} . \end{aligned}$$

2.2.1 Brute-Force Search

As will be shown below, to obtain an \(\frac{f(2)/2}{f(|S^*|)/|S^*|^2}\)-approximate solution, it suffices to find the heaviest edge (i.e., \(\hbox {argmax}\{w(e)\mid e\in E\}\)), which can be done in \(O(m+n)\) time. Here we present a more general algorithm, which is useful in some cases. Our algorithm examines all the subsets of vertices of size at most k, and then returns an optimal subset among them, where k is a constant that satisfies \(k\ge 2\). For reference, we describe the procedure in Algorithm 1. This algorithm can be implemented to run in \(O(n^k(m+n))\) time because the number of subsets with at most k vertices is \(\sum _{i=0}^k\left( {\begin{array}{c}n\\ i\end{array}}\right) =O(n^k)\) and the value of \(w(S)/f(|S|)\) for each \(S\subseteq V\) can be computed in \(O(m+n)\) time.

Algorithm 1: brute-force search over all subsets of size at most k
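The following is a minimal sketch of Algorithm 1 under the same graph encoding as above; with \(k=2\) it reduces to picking a heaviest edge.

```python
from itertools import combinations

def brute_force(V, edges, f, k=2):
    """Algorithm 1 sketch: return a subset of size 2..k with the maximum
    f-density w(S)/f(|S|) among all subsets of size at most k."""
    def w(S):
        return sum(wt for (u, v), wt in edges.items() if u in S and v in S)
    best, best_val = None, float("-inf")
    for i in range(2, k + 1):
        for comb in combinations(V, i):
            S = set(comb)
            val = w(S) / f(i)
            if val > best_val:
                best, best_val = S, val
    return best
```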

We analyze the approximation ratio of the algorithm. Let \(S_i^*\subseteq V\) denote a maximum weight subset of size \(i\ge 2\), i.e., \(S_i^*\in \hbox {argmax}\{w(S)\mid S\subseteq V,~|S|=i\}\). We refer to \(w(S_i^*)/\left( {\begin{array}{c}i\\ 2\end{array}}\right) \) as the edge density of i vertices. For \(S\subseteq V\) and \(v\in S\), let \(d_S(v)\) denote the weighted degree of v in the induced subgraph G[S], i.e., \(d_S(v)=\sum _{u\in V:\,\{u,v\}\in E(S)}w(\{u,v\})\). The following lemma gives a fundamental property of the edge density.

Lemma 1

The edge density is monotonically non-increasing in the number of vertices, i.e., \(w(S_i^*)/\left( {\begin{array}{c}i\\ 2\end{array}}\right) \ge w(S_j^*)/\left( {\begin{array}{c}j\\ 2\end{array}}\right) \) holds for any \(2\le i\le j\le n\).

Proof

It suffices to show that \(w(S^*_i)/\left( {\begin{array}{c}i\\ 2\end{array}}\right) \ge w(S^*_{i+1})/\left( {\begin{array}{c}i+1\\ 2\end{array}}\right) \) holds for any positive integer \(i\in [2,n-1]\). Take a vertex \(u\in \hbox {argmin}\{d_{S^*_{i+1}}(v)\mid v\in S^*_{i+1}\}\). Then we obtain \(d_{S^*_{i+1}}(u)\le \frac{1}{i+1}\sum _{v\in S^*_{i+1}}d_{S^*_{i+1}}(v)= \frac{2}{i+1}\cdot w(S^*_{i+1})\). Hence, we have

$$\begin{aligned} \frac{w(S^*_i)}{\left( {\begin{array}{c}i\\ 2\end{array}}\right) }&\ge \frac{w(S^*_{i+1}{\setminus } \{u\})}{\left( {\begin{array}{c}i\\ 2\end{array}}\right) }\\&=\frac{w(S^*_{i+1})-d_{S^*_{i+1}}(u)}{\left( {\begin{array}{c}i\\ 2\end{array}}\right) } \ge \frac{\left( 1-\frac{2}{i+1}\right) \cdot w\left( S^*_{i+1}\right) }{\left( {\begin{array}{c}i\\ 2\end{array}}\right) } =\frac{w(S^*_{i+1})}{\left( {\begin{array}{c}i+1\\ 2\end{array}}\right) }, \end{aligned}$$

as desired. \(\square \)

Using the above lemma, we can provide the approximation ratio.

Lemma 2

Let \(S^*\subseteq V\) be an optimal solution to f-DS with convex function f. If \(|S^*|\le k\), then Algorithm 1 obtains an optimal solution. If \(|S^*|\ge k\), then it holds that

$$\begin{aligned} \frac{w(S^*)}{f(|S^*|)}\le \frac{2\cdot f(k)/k^2}{f(|S^*|)/|S^*|^2}\cdot \frac{w(S^*_k)}{f(k)}. \end{aligned}$$

Proof

If \(|S^*|\le k\), then Algorithm 1 obtains an optimal solution because \(S^*\in \{S^*_2,\dots ,S^*_k\}\). If \(|S^*|\ge k\), then we have

$$\begin{aligned} \frac{w(S^*)}{f(|S^*|)}&\le \frac{w(S^*)}{f(|S^*|)}\cdot \frac{w(S_k^*)/\left( {\begin{array}{c}k\\ 2\end{array}}\right) }{w(S^*)/\left( {\begin{array}{c}|S^*|\\ 2\end{array}}\right) } = \frac{f(k)/\left( {\begin{array}{c}k\\ 2\end{array}}\right) }{f(|S^*|)/\left( {\begin{array}{c}|S^*|\\ 2\end{array}}\right) }\cdot \frac{w(S^*_k)}{f(k)}\\&= \frac{1-1/|S^*|}{1-1/k}\cdot \frac{f(k)/k^2}{f(|S^*|)/|S^*|^2}\cdot \frac{w(S^*_k)}{f(k)} \le \frac{2\cdot f(k)/k^2}{f(|S^*|)/|S^*|^2}\cdot \frac{w(S^*_k)}{f(k)}, \end{aligned}$$

where the first inequality follows from Lemma 1, and the last inequality follows from \(k\ge 2\). \(\square \)

From this lemma, we see that Algorithm 1 with \(k=2\), which returns a heaviest edge in \(O(m+n)\) time, has an approximation ratio of \(\frac{f(2)/2}{f(|S^*|)/|S^*|^2}\).

2.2.2 Greedy Peeling

Here we adopt the greedy peeling. Our algorithm iteratively removes the vertex with the smallest weighted degree in the currently remaining graph, and then returns \(S\subseteq V\) with maximum \(w(S)/f(|S|)\) over the iterations. For reference, we describe the procedure in Algorithm 2. This algorithm can be implemented to run in \(O(m+n\log n)\) time for weighted graphs and \(O(m+n)\) time for unweighted graphs.

Algorithm 2: greedy peeling
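A sketch of the greedy peeling with a lazy-deletion binary heap, matching the \(O(m+n\log n)\) bound for weighted graphs; the graph encoding and helper names are ours, and vertices are assumed to be mutually comparable (e.g., integers) so heap ties resolve.

```python
import heapq

def greedy_peeling(V, edges, f):
    """Algorithm 2 sketch: repeatedly remove a vertex of minimum weighted
    degree and return the intermediate subset maximizing w(S)/f(|S|)."""
    adj = {v: [] for v in V}
    for (u, v), wt in edges.items():
        adj[u].append((v, wt))
        adj[v].append((u, wt))
    deg = {v: sum(wt for _, wt in adj[v]) for v in V}
    alive, w_S = set(V), sum(edges.values())
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    best, best_val = set(alive), w_S / f(len(alive))
    while len(alive) > 1:
        d, v = heapq.heappop(heap)
        if v not in alive or d > deg[v]:   # skip stale heap entries
            continue
        alive.remove(v)
        w_S -= deg[v]                      # edges incident to v disappear
        for u, wt in adj[v]:
            if u in alive:
                deg[u] -= wt
                heapq.heappush(heap, (deg[u], u))
        val = w_S / f(len(alive))
        if val > best_val:
            best, best_val = set(alive), val   # copy; fine for a sketch
    return best
```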

The following lemma provides the approximation ratio.

Lemma 3

Let \(S^*\subseteq V\) be an optimal solution to f-DS with convex function f. Algorithm 2 returns \(S\subseteq V\) that satisfies

$$\begin{aligned} \frac{w(S^*)}{f(|S^*|)}\le \frac{2f(n)/n}{f(|S^*|)-f(|S^*|-1)}\cdot \frac{w(S)}{f(|S|)}. \end{aligned}$$

Proof

Choose an arbitrary vertex \(v\in S^*\). By the optimality of \(S^*\), we have

$$\begin{aligned} \frac{w(S^*)}{f(|S^*|)}\ge \frac{w(S^*{\setminus }\{v\})}{f(|S^*|-1)}. \end{aligned}$$

By using the fact that \(w(S^*{\setminus } \{v\}) = w(S^*)-d_{S^*}(v)\), the above inequality can be transformed to

$$\begin{aligned} d_{S^*}(v)\ge (f(|S^*|)-f(|S^*|-1))\cdot \frac{w(S^*)}{f(|S^*|)}. \end{aligned}$$
(2)

Let l be the smallest index that satisfies \(S_l\supseteq S^*\), where \(S_l\) is the subset of vertices of size l that appears in Algorithm 2. Note that \(v_l\ (\in \hbox {argmin}_{v\in S_l} d_{S_l}(v))\) is contained in \(S^*\) by the minimality of l. Then we have

$$\begin{aligned} \frac{w(S_l)}{f(l)}&= \frac{\sum _{u\in S_l}d_{S_l}(u)}{2f(l)} \ge \frac{l\cdot d_{S_l}(v_l)}{2f(l)} \ge \frac{d_{S^*}(v_l)}{2f(l)/l}\\&\ge \frac{f(|S^*|)-f(|S^*|-1)}{2f(l)/l}\cdot \frac{w(S^*)}{f(|S^*|)} \ge \frac{f(|S^*|)-f(|S^*|-1)}{2f(n)/n}\cdot \frac{w(S^*)}{f(|S^*|)}, \end{aligned}$$

where the first inequality follows from the greedy choice of \(v_l\), the second inequality follows from \(S_l\supseteq S^*\), the third inequality follows from inequality (2), and the last inequality follows from the monotonicity of f(x) / x. Since Algorithm 2 considers \(S_l\) as a candidate subset of the output, we have the lemma. \(\square \)

2.3 Examples

Here we observe the behavior of the approximation ratio of our algorithm for three concrete convex size functions. We consider size functions between linear and quadratic because f-DS with any super-quadratic size function is a trivial problem. Indeed, for any super-quadratic size function f, there exists some positive integer \(k^*\) such that \(f(k)/k^2>f(2)/2\) holds for any \(k>k^*\). Applying Lemma 2 with \(k=2\) and using the optimality of \(S^*\) (i.e., \(w(S^*)/f(|S^*|)\ge w(S^*_2)/f(2)\)), we have \(\frac{f(2)/2}{f(|S^*|)/|S^*|^2}\ge 1\) (i.e., \(f(|S^*|)/|S^*|^2\le f(2)/2\)), and hence \(|S^*|\le k^*\), which means that any instance of f-DS with the function f has a constant-size optimal solution.

(i) \(\varvec{f(x)=x^{\alpha }\ (\alpha \in [1,2])}\) The following corollary provides an approximation ratio of our algorithm.

Corollary 1

For f-DS with \(f(x)=x^{\alpha }~(\alpha \in [1,2])\), our algorithm has an approximation ratio of \(2\cdot n^{(\alpha -1)(2-\alpha )}\).

Proof

Let \(s=|S^*|\). By Theorem 2, the approximation ratio is

$$\begin{aligned} \min \left\{ \frac{f(2)/2}{f(s)/s^2},~\frac{2f(n)/n}{f(s)-f(s-1)}\right\}&=\min \left\{ 2^{\alpha -1}\cdot s^{2-\alpha },~\frac{2n^{\alpha -1}}{s^{\alpha }-(s-1)^{\alpha }}\right\} \\&\le \min \left\{ 2\cdot s^{2-\alpha },~\frac{2n^{\alpha -1}}{s^{\alpha -1}}\right\} \\&\le 2\cdot n^{(\alpha -1)(2-\alpha )}. \end{aligned}$$

The first inequality follows from the fact that \(s^\alpha -(s-1)^\alpha = s^\alpha -(s-1)^{\alpha -1}(s-1)\ge s^\alpha -s^{\alpha -1}(s-1)=s^{\alpha -1}\). The last inequality follows from the fact that the first term and the second term of the minimum function are, respectively, monotonically non-decreasing and non-increasing in s, and they have the same value at \(s=n^{\alpha -1}\). \(\square \)

Note that an upper bound on \(2\cdot n^{(\alpha -1)(2-\alpha )}\) is \(2\cdot n^{1/4}\), which is attained at \(\alpha =1.5\).

(ii) \(\varvec{f(x)=\lambda x+(1-\lambda )x^2\ (\lambda \in [0,1))}\) The following corollary provides an approximation ratio of Algorithm 1, which is a constant for a fixed \(\lambda \).

Corollary 2

For f-DS with \(f(x)=\lambda x+(1-\lambda )x^2~(\lambda \in [0,1))\), Algorithm 1 with \(k=2\) has an approximation ratio of \((2-\lambda )/(1-\lambda )\). Furthermore, for any \(\epsilon >0\), Algorithm 1 with \(k\ge \frac{2}{\epsilon }\cdot \frac{\lambda }{1-\lambda }\) has an approximation ratio of \(2+\epsilon \).

Proof

Let \(s=|S^*|\). By Lemma 2, the approximation ratio is

$$\begin{aligned} \frac{2\cdot f(k)/k^2}{f(s)/s^2} =2\cdot \frac{\lambda /k+(1-\lambda )}{\lambda /s+(1-\lambda )} \le 2\cdot \frac{\lambda /k+(1-\lambda )}{1-\lambda } = 2+\frac{2\lambda }{(1-\lambda )k}. \end{aligned}$$

Thus, by choosing \(k=2\), the approximation ratio is at most \((2-\lambda )/(1-\lambda )\). For any \(\epsilon >0\), by choosing \(k\ge \frac{2}{\epsilon }\cdot \frac{\lambda }{1-\lambda }\), the approximation ratio is at most \(2+\epsilon \). \(\square \)

(iii) \(\varvec{f(x)=x^2/(\lambda x+(1-\lambda ))\ (\lambda \in [0,1])}\) This size function is derived from the density function \(\lambda \frac{w(S)}{|S|} + (1-\lambda )\frac{w(S)}{|S|^2}\). The following corollary provides an approximation ratio of our algorithm, which is at most 4.

Corollary 3

For f-DS with \(f(x)=x^2/(\lambda x+(1-\lambda ))~(\lambda \in [0,1])\), our algorithm has an approximation ratio of \(4/(1+\lambda )\).

Proof

Let \(s=|S^*|\). By Theorem 2, the approximation ratio is

$$\begin{aligned}&\min \left\{ \frac{f(2)/2}{f(s)/s^2},~\frac{2f(n)/n}{f(s)-f(s-1)}\right\} \\&\quad =\min \left\{ \frac{2(\lambda s+(1-\lambda ))}{1+\lambda },~\frac{\frac{2n}{\lambda n+(1-\lambda )}}{\frac{s^2}{\lambda s+(1-\lambda )}-\frac{(s-1)^2}{\lambda (s-1)+(1-\lambda )}}\right\} \\&\quad \le \min \left\{ \frac{2(\lambda s+(1-\lambda ))}{1+\lambda },~\frac{\frac{2n}{\lambda n+(1-\lambda )}}{\frac{s}{\lambda s+(1-\lambda )}}\right\} \\&\quad \le \frac{2}{1+\lambda }\left( \lambda \cdot \frac{(1+\lambda )n}{\lambda n+(1-\lambda )}+(1-\lambda )\right) \\&\quad \le \frac{2}{1+\lambda }\left( \lambda \cdot \frac{1+\lambda }{\lambda }+(1-\lambda )\right) =\frac{4}{1+\lambda }, \end{aligned}$$

where the second inequality follows from the fact that the first term and the second term of the minimum function are, respectively, monotonically non-decreasing and non-increasing in s, and they have the same value at \(s=\frac{(1+\lambda )n}{\lambda n+(1-\lambda )}\). \(\square \)

3 Concave Case

In this section, we investigate f-DS with concave function f. A function \(f:\mathbb {Z}_{\ge 0}\rightarrow \mathbb {R}_{\ge 0}\) is said to be concave if \(f(x)-2f(x+1)+f(x+2)\le 0\) holds for any \(x\in \mathbb {Z}_{\ge 0}\). We remark that \(f(x)/x\) is monotonically non-increasing in x because f is concave and \(f(0)=0\). It should be emphasized that any optimal solution to f-DS with concave function f has a size larger than or equal to that of any densest subgraph. This follows from inequality (1) and the monotonicity of \(f(x)/x\).

3.1 Dense Frontier Points

Here we define the dense frontier points and prove some basic properties. We denote by \(\mathcal {P}\) the set \(\{(|S|,w(S))\mid S\subseteq V\}\). A point in \(\mathcal {P}\) is called a dense frontier point if it is a unique maximizer of \(y-\lambda x\) over \((x,y)\in \mathcal {P}\) for some \(\lambda >0\). In other words, the extreme points of the upper convex hull of \(\mathcal {P}\) are dense frontier points. The (largest) densest subgraph is a typical subset of vertices corresponding to a dense frontier point. We prove that (i) for any dense frontier point except for (0, 0), there exists some concave function f such that any optimal solution to f-DS with the function f corresponds to the dense frontier point, and conversely, (ii) for any strictly concave function f (i.e., f that satisfies \(f(x)-2f(x+1)+f(x+2)<0\) for any \(x\in \mathbb {Z}_{\ge 0}\)), any optimal solution to f-DS with the function f corresponds to a dense frontier point.
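For intuition, on a small graph the dense frontier points can be computed by brute force: record the maximum weight \(w(S_i^*)\) for each size i and extract the extreme points of the upper convex hull of those points. The sketch below is for illustration only (it takes exponential time) and uses the same edge-dictionary encoding as before.

```python
from itertools import combinations

def dense_frontier_points(V, edges):
    """Return the extreme points of the upper convex hull of
    P = {(|S|, w(S)) : S subset of V}. Exponential time; small graphs only."""
    V = list(V)
    best = {}                      # best[i] = max{w(S) : |S| = i}
    for i in range(len(V) + 1):
        best[i] = max(sum(wt for (u, v), wt in edges.items()
                          if u in S and v in S)
                      for S in map(set, combinations(V, i)))
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    hull = []                      # upper hull via a monotone chain scan
    for p in sorted(best.items()):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()             # drop non-extreme (left-turn/collinear)
        hull.append(p)
    return hull
```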

Fig. 2 A relationship between a dense frontier point and concave functions

We first prove (i). Note that each dense frontier point can be written as \((i,w(S_i^*))\) for some \(i\in \{0,1,\dots ,n\}\), where \(S_i^*\subseteq V\) is a maximum weight subset of size i. Let \((k,w(S_k^*))\) be a dense frontier point with \(k\ge 1\) and assume that it is a unique maximizer of \(y-\hat{\lambda } x\) over \((x,y)\in \mathcal {P}\) for some \(\hat{\lambda }>0\). Consider the concave function f such that \(f(x)=\hat{\lambda }(x-k)+w(S_k^*)\) for \(x>0\) and \(f(0)=0\) (see Fig. 2). The concavity of f follows from \(w(S_k^*)-\hat{\lambda }k\ge w(S_0^*)-\hat{\lambda }\cdot 0=0=f(0)\). Then, any optimal solution \(S^*\subseteq V\) to f-DS with the function f corresponds to the dense frontier point (i.e., \((|S^*|,w(S^*))=(k,w(S_k^*))\) holds) because \(w(S)/f(|S|)\) is greater than or equal to \(w(S_k^*)/f(k)\ (=1)\) if and only if \(w(S)-\hat{\lambda }|S|\ge w(S_k^*)-\hat{\lambda }k\) holds.

We next prove (ii). Let f be any strictly concave function. Let \(S_k^*\subseteq V\) be any optimal solution to f-DS with the function f, where \(k=|S_k^*|\), and take \(\hat{\lambda }\) that satisfies \((f(k)-f(k-1))\cdot \frac{w(S_k^*)}{f(k)}>\hat{\lambda }> (f(k+1)-f(k))\cdot \frac{w(S_k^*)}{f(k)}\) (see Fig. 2). Note that the strict concavity of f guarantees the existence of such \(\hat{\lambda }\). Since f is strictly concave, we have that for any \(S\subseteq V\),

$$\begin{aligned} \hat{\lambda }(|S|-k)+w(S_k^*) \ge \frac{w(S_k^*)}{f(k)}\cdot f(|S|)\ge \frac{w(S)}{f(|S|)}\cdot f(|S|)=w(S), \end{aligned}$$

and the inequalities hold as equalities only when \((|S|,w(S))=(k,w(S_k^*))\). Thus, \((k,w(S_k^*))\) is a unique maximizer of \(y-\hat{\lambda } x\) over \((x,y)\in \mathcal {P}\), and hence is a dense frontier point.

3.2 LP-Based Algorithm

We provide an LP-based polynomial-time exact algorithm. We introduce a variable \(x_e\) for each \(e\in E\) and a variable \(y_v\) for each \(v\in V\). For \(k=1,\dots ,n\), we construct the following linear programming problem:

$$\begin{aligned}&\mathrm {LP}_k:&\ \&\text {maximize}&\&\sum _{e\in E}w(e)\cdot x_e&\quad&\\&&\text {subject to}&\sum _{v\in V}y_v=k,&\\&&&x_e\le y_u,~x_e\le y_v&\text {for all }\ e=\{u,v\}\in E,\\&&&x_e,\,y_v\in [0,1]&\text {for all }\ e\in E,\ v\in V. \end{aligned}$$

For an optimal solution \((\varvec{x}^k, \varvec{y}^k)\) to \(\mathrm {LP}_k\) and a real parameter r, we define the family of subsets \(S^k(r)=\{v\in V\mid y^k_v\ge r\}\). For \(k=1,\dots ,n\), our algorithm first solves \(\mathrm {LP}_k\) to obtain an optimal solution \((\varvec{x}^k,\varvec{y}^k)\), and then computes \(r^*_k\) that maximizes \(w(S^k(r))/f(|S^k(r)|)\). Note here that to find such \(r^*_k\), it suffices to check all the distinct sets \(S^k(r)\) by simply setting \(r=y^k_v\) for every \(v\in V\). The algorithm returns \(S\in \{S^1(r^*_1),\dots ,S^n(r^*_n)\}\) that maximizes \(w(S)/f(|S|)\). For reference, we describe the procedure in Algorithm 3. Clearly, the algorithm runs in polynomial time.

Algorithm 3: LP-based exact algorithm
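A compact sketch of Algorithm 3 using SciPy's linprog; the matrix encoding of \(\mathrm {LP}_k\) below is our own, and any LP solver could be substituted.

```python
import numpy as np
from scipy.optimize import linprog

def lp_based(V, edges, f):
    """Algorithm 3 sketch: for k = 1..n solve LP_k, sweep the threshold
    sets S^k(r) = {v : y_v >= r}, and return the best subset found."""
    V, E = list(V), list(edges)
    n, m = len(V), len(E)
    vid = {v: i for i, v in enumerate(V)}
    # Variables: x_e (first m coordinates) and y_v (last n coordinates).
    c = np.concatenate([-np.array([edges[e] for e in E]), np.zeros(n)])
    A_ub = np.zeros((2 * m, m + n))        # x_e <= y_u and x_e <= y_v
    for j, (u, v) in enumerate(E):
        A_ub[2 * j, j] = A_ub[2 * j + 1, j] = 1.0
        A_ub[2 * j, m + vid[u]] = A_ub[2 * j + 1, m + vid[v]] = -1.0
    A_eq = np.zeros((1, m + n))
    A_eq[0, m:] = 1.0                      # sum_v y_v = k
    best, best_val = None, float("-inf")
    for k in range(1, n + 1):
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * m),
                      A_eq=A_eq, b_eq=[k], bounds=[(0, 1)] * (m + n))
        y = res.x[m:]
        for r in sorted({t for t in y if t > 1e-9}):
            S = {v for v in V if y[vid[v]] >= r - 1e-9}
            w_S = sum(wt for (a, b), wt in edges.items()
                      if a in S and b in S)
            if w_S / f(len(S)) > best_val:
                best, best_val = S, w_S / f(len(S))
    return best
```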

In what follows, we demonstrate that Algorithm 3 obtains an optimal solution to f-DS with concave function f. The following lemma provides a lower bound on the optimal value of \(\mathrm {LP}_k\).

Lemma 4

For any \(S\subseteq V\), the optimal value of \(\mathrm {LP}_{|S|}\) is at least w(S).

Proof

For \(S\subseteq V\), we construct a solution \((\varvec{x},\varvec{y})\) of \(\mathrm {LP}_{|S|}\) as follows:

$$\begin{aligned} x_e={\left\{ \begin{array}{ll} 1&{}\text {if } e\in E(S),\\ 0&{}\text {otherwise}, \end{array}\right. } \quad \text {and}\quad y_v={\left\{ \begin{array}{ll} 1&{}\text {if } v\in S,\\ 0&{}\text {otherwise}. \end{array}\right. } \end{aligned}$$

Then we can easily check that \((\varvec{x},\varvec{y})\) is feasible for \(\mathrm {LP}_{|S|}\) and its objective value is w(S). Thus, we have the lemma. \(\square \)

We prove the following key lemma.

Lemma 5

Let \(S^*\subseteq V\) be an optimal solution to f-DS with concave function f, and let \(k^*=|S^*|\). Furthermore, let \((\varvec{x}^*,\varvec{y}^*)\) be an optimal solution to \(\mathrm {LP}_{k^*}\). Then, there exists a real number r such that \(S^{k^*}(r)\) is optimal to f-DS with concave function f.

Proof

For each \(e=\{u,v\}\in E\), we have \(x^*_e=\min \{y^*_u,y^*_v\}\) from the optimality of \((\varvec{x}^*,\varvec{y}^*)\). Without loss of generality, we relabel the indices of \((\varvec{x}^*, \varvec{y}^*)\) so that \(y^*_1\ge \dots \ge y^*_n\). Then we have

$$\begin{aligned} \int _0^{y_1^*} w(S^{k^*}(r)) dr&= \int _0^{y_1^*} \left( \sum _{e=\{u,v\}\in E} w(e)\cdot [y^*_u\ge r~\text {and}~y^*_v\ge r]\right) dr \nonumber \\&= \sum _{e=\{u,v\}\in E} \int _0^{y_1^*} \left( w(e)\cdot [y^*_u\ge r~\text {and}~y^*_v\ge r]\right) dr \nonumber \\&= \sum _{e=\{u,v\}\in E} w(e)\cdot \min \{y^*_u,y^*_v\}= \sum _{e\in E} w(e)\cdot x_e^*\ge w(S^*), \end{aligned}$$
(3)

where \([y^*_u\ge r~\text {and}~y^*_v\ge r]\) denotes the function of r that takes 1 if the condition in the square bracket is satisfied and 0 otherwise, and the inequality follows from Lemma 4. Moreover, we have

$$\begin{aligned} \int _0^{y_1^*} f(|S^{k^*}(r)|) dr&=\sum _{h=1}^n f(h)\cdot (y_h^*-y_{h+1}^*)=\sum _{h=1}^n (f(h)-f(h-1))\cdot y_h^*\nonumber \\&\le \sum _{h=1}^{k^*} (f(h)-f(h-1))=f(k^*)-f(0)=f(k^*), \end{aligned}$$
(4)

where \(y_{n+1}^*\) is defined to be 0 for convenience, and the inequality holds by the concavity of f (i.e., \(f(h+2)-f(h+1)\le f(h+1)-f(h)\)), \(\sum _{h=1}^n y^*_h=k^*\), and \(y^*_h\le 1\).

Let \(r^*\) be a real number \(r\in [0,y_1^*]\) that maximizes \(w(S^{k^*}(r))/f(|S^{k^*}(r)|)\). Using inequalities (3) and (4), we have

$$\begin{aligned} \frac{w(S^*)}{f(k^*)}&\le \frac{\int _0^{y_1^*} w(S^{k^*}(r)) dr}{\int _0^{y_1^*} f(|S^{k^*}(r)|) dr} = \frac{\int _0^{y_1^*} \left( \frac{w(S^{k^*}(r))}{f(|S^{k^*}(r)|)}\cdot f(|S^{k^*}(r)|)\right) dr}{\int _0^{y_1^*} f(|S^{k^*}(r)|) dr}\\&\le \frac{\int _0^{y_1^*} \left( \frac{w(S^{k^*}(r^*))}{f(|S^{k^*}(r^*)|)}\cdot f(|S^{k^*}(r)|)\right) dr}{\int _0^{y_1^*} f(|S^{k^*}(r)|) dr} = \frac{w(S^{k^*}(r^*))}{f(|S^{k^*}(r^*)|)}. \end{aligned}$$

This completes the proof. \(\square \)

Algorithm 3 considers \(S^{k^*}(r^*)\) as a candidate subset of the output. Therefore, we have the desired result.

Theorem 3

Algorithm 3 is a polynomial-time exact algorithm for f-DS with concave function f.

By Lemma 5, for any concave function f, an optimal solution to f-DS with the function f is contained in \(\{S^k(r)\mid k=1,\dots ,n,~r\in [0,1]\}\), whose cardinality is at most \(n^2\). As shown in Sect. 3.1, for any dense frontier point except for (0, 0), there exists some concave function f such that any optimal solution to f-DS with the function f corresponds to the dense frontier point (and (0, 0) trivially corresponds to the empty set). Thus, we have the following result.

Theorem 4

We can find a subset of vertices corresponding to every dense frontier point in polynomial time.

3.3 Flow-Based Algorithm

We provide a combinatorial exact algorithm for unweighted graphs (i.e., \(w(e)=1\) for every \(e\in E\)). We first show that using a standard technique for fractional programming, we can reduce f-DS with concave function f to a sequence of submodular function minimizations. The critical fact is that \(\max _{S\subseteq V} w(S)/f(|S|)\) is at least \(\beta \) if and only if \(\min _{S\subseteq V} (\beta \cdot f(|S|)-w(S))\) is at most 0. Note that for \(\beta \ge 0\), the function \(\beta \cdot f(|S|)-w(S)\) is submodular because \(\beta \cdot f(|S|)\) and \(-w(S)\) are submodular [12]. Thus, we can calculate \(\min _{S\subseteq V}(\beta \cdot f(|S|)-w(S))\) in \(O(n^3(m+n)\log ^2 n + n^4\,\mathsf{polylog}(n))\) time using the submodular function minimization algorithm by Lee et al. [18], which implies that we can determine whether \(\max _{S\subseteq V}w(S)/f(|S|)\ge \beta \) holds in the same time complexity. Hence, we can obtain the value of \(\max _{S\subseteq V}w(S)/f(|S|)\) by binary search. Note that the objective function of f-DS on unweighted graphs takes at most O(mn) distinct values, since w(S) is a nonnegative integer at most m and f(|S|) takes at most n values. Thus, the procedure yields an optimal solution in \(O(\log (mn))=O(\log n)\) iterations. The total running time is \(O(n^3(m+n)\log ^3 n + n^4\,\mathsf{polylog}(n))\).

To reduce the computation time, we replace the submodular function minimization algorithm [18] with a much faster flow-based algorithm that substantially extends a technique used in Goldberg's flow-based algorithm for the densest subgraph problem [14]. The key idea is to represent the value of \(\min _{S\subseteq V}(\beta \cdot f(|S|)-w(S))\) by the cost of a minimum cut in a certain directed network constructed from G and \(\beta \ge 0\).

For a given unweighted undirected graph \(G=(V,E,w)\) (i.e., \(w(e)=1\) for every \(e\in E\)) and a real number \(\beta \ge 0\), we construct a directed network \((U,A,w_{\beta })\) as follows. Note that for later convenience, we discuss the procedure on weighted graphs. The vertex set U is defined by \(U=V\cup P\cup \{s,t\}\), where \(P=\{p_1,\dots ,p_n\}\). The edge set A is given by \(A=A_s\cup A_t\cup A_1\cup A_2\), where

$$\begin{aligned} A_s&=\{(s,v)\mid v\in V\},\ A_t=\{(p,t)\mid p\in P\},\\ A_1&=\{(u,v), (v,u)\mid \{u,v\}\in E\},\ \text {and } A_2=\{(v,p)\mid v\in V,~p\in P\}. \end{aligned}$$

The edge weight \(w_{\beta }:A\rightarrow \mathbb {R}_{\ge 0}\) is defined by

$$\begin{aligned} w_{\beta }(e)= {\left\{ \begin{array}{ll} d(v)/2 &{}\quad (e=(s,v)\in A_s),\\ \beta \cdot k\cdot a_k &{}\quad (e=(p_k,t)\in A_t),\\ 1/2\ (= w(\{u,v\})/2)&{}\quad (e=(u,v)\in A_1),\\ \beta \cdot a_k &{}\quad (e=(v,p_k)\in A_2),\\ \end{array}\right. } \end{aligned}$$

where d(v) is the (weighted) degree of vertex v, and

$$\begin{aligned} a_k={\left\{ \begin{array}{ll} 2f(k)-f(k+1)-f(k-1) &{}\quad (k=1,\dots ,n-1),\\ f(n)-f(n-1) &{}\quad (k=n). \end{array}\right. } \end{aligned}$$

Note that \(a_k\ge 0\) holds since f is a monotonically non-decreasing concave function. For reference, Fig. 3 depicts the network \((U,A,w_\beta )\).

Fig. 3 The network \((U,A,w_\beta )\) constructed from G and \(\beta \ge 0\)
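The construction can be sketched with NetworkX's minimum_cut as follows; the node labels "s", "t", and ("p", k) are our own encoding and are assumed not to collide with vertex names. By Lemma 6 below, there exists \(S\subseteq V\) with \(w(S)/f(|S|)>\beta \) exactly when the returned cost is strictly smaller than w(V).

```python
import networkx as nx

def min_cut_side(V, edges, f, beta):
    """Build the network (U, A, w_beta) and return the cost of a minimum
    s-t cut (X, Y) together with S = X ∩ V (cf. Lemma 6)."""
    V = list(V)
    n = len(V)
    a = {k: 2 * f(k) - f(k + 1) - f(k - 1) for k in range(1, n)}
    a[n] = f(n) - f(n - 1)                 # coefficients a_k >= 0
    deg = {v: 0.0 for v in V}
    for (u, v), wt in edges.items():
        deg[u] += wt
        deg[v] += wt
    G = nx.DiGraph()
    for v in V:
        G.add_edge("s", v, capacity=deg[v] / 2)              # A_s
        for k in range(1, n + 1):
            G.add_edge(v, ("p", k), capacity=beta * a[k])    # A_2
    for k in range(1, n + 1):
        G.add_edge(("p", k), "t", capacity=beta * k * a[k])  # A_t
    for (u, v), wt in edges.items():                         # A_1, both ways
        G.add_edge(u, v, capacity=wt / 2)
        G.add_edge(v, u, capacity=wt / 2)
    cost, (X, _) = nx.minimum_cut(G, "s", "t")
    return cost, set(X) & set(V)
```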

The following lemma reveals the relationship between a minimum s-t cut in \((U,A,w_\beta )\) and the value of \(\min _{S\subseteq V}(\beta \cdot f(|S|) - w(S))\). Note that an s-t cut in \((U,A,w_\beta )\) is a partition (X, Y) of U (i.e., \(X\cup Y=U\) and \(X\cap Y=\emptyset \)) such that \(s\in X\) and \(t\in Y\), and the cost of (X, Y) is defined to be \(\sum _{(u, v)\in A:\,u\in X, v\in Y}w_\beta (u,v)\).

Lemma 6

Let (X, Y) be any minimum s-t cut in the network \((U,A,w_\beta )\), and let \(S=X\cap V\). Then, the cost of (X, Y) is equal to \(w(V)+\beta \cdot f(|S|)-w(S)\).

Proof

We first show that for any positive integer \(s\ (\le n)\), it holds that

$$\begin{aligned} \sum _{i=1}^n \min \{i,s\}\cdot a_i=f(s). \end{aligned}$$
(5)

By the definition of \(a_k\), we get

$$\begin{aligned} \sum _{j=i}^n a_j&=(f(n)-f(n-1))+\sum _{j=i}^{n-1} (2f(j)-f(j+1)-f(j-1))\\&=(f(n)-f(n-1))-\sum _{j=i}^{n-1} ((f(j+1)-f(j))-(f(j)-f(j-1)))\\&=f(i)-f(i-1). \end{aligned}$$

Thus, we have

$$\begin{aligned} \sum _{i=1}^n \min \{i,s\}\cdot a_i =\sum _{i=1}^s\sum _{j=i}^n a_j =\sum _{i=1}^s(f(i)-f(i-1)) =f(s)-f(0)=f(s). \end{aligned}$$

We are now ready to prove the lemma. Note that in a minimum cut we may assume that \(p_k\in X\) if \(|S|>k\) and \(p_k\in Y\) if \(|S|<k\); when \(|S|=k\), both placements contribute the same cost \(k\cdot \beta \cdot a_k\). Therefore, the cost of the minimum cut (X, Y) is

$$\begin{aligned}&\sum _{v\in V{\setminus } S}\frac{d(v)}{2} +\sum _{\{u,v\}\in E:\,u\in S,\,v\in V{\setminus } S}\frac{w(\{u,v\})}{2} +\beta \cdot \sum _{i=1}^n \min \{i,|S|\}\cdot a_i\\&\quad =\sum _{v\in V{\setminus } S}\frac{d(v)}{2} +\sum _{\{u,v\}\in E:\,u\in S,\,v\in V{\setminus } S}\frac{w(\{u,v\})}{2} +\beta \cdot f(|S|)\\&\quad =\sum _{\{u,v\}\in E} w(\{u,v\}) -\sum _{\{u,v\}\in E(S)}w(\{u,v\}) +\beta \cdot f(|S|)\\&\quad =w(V)+\beta \cdot f(|S|)-w(S), \end{aligned}$$

where the first equality follows from equality (5). \(\square \)

From this lemma, we see that the cost of a minimum s-t cut is \(w(V)+\min _{S\subseteq V}(\beta \cdot f(|S|)-w(S))\). Therefore, for a given value \(\beta \ge 0\), we can determine whether there exists \(S\subseteq V\) that satisfies \(w(S)/f(|S|)\ge \beta \) by checking whether the cost of a minimum s-t cut is at most w(V). Our algorithm applies binary search for \(\beta \) within the possible objective values of f-DS (i.e., \(\{p/f(q)\mid p=0,1,\dots ,m,~q=2,3,\dots ,n\}\)). For reference, we describe the procedure in Algorithm 4. The minimum s-t cut problem can be solved in \(O(N^3/\log N)\) time for a network with N vertices [8]. Thus, the running time of our algorithm is \(O(\frac{n^3}{\log n}\cdot \log (mn))=O(n^3)\) since \(|U|=2n+2\). We summarize the result in the following theorem.

Algorithm 4: flow-based exact algorithm for unweighted graphs
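The following sketch of Algorithm 4 reuses min_cut_side from the previous sketch. Each tested \(\beta \) is taken strictly between two consecutive candidate values, so the cut-cost comparison is unambiguous; since \(E\ne \emptyset \), the optimal value is positive and the final index satisfies lo \(\ge 1\). The floating-point tolerance is a concession of the sketch; exact rational arithmetic would avoid it.

```python
def flow_based_exact(V, edges, f):
    """Algorithm 4 sketch (unweighted): O(log(mn)) min-cut computations
    over the candidate objective values p / f(q)."""
    n, m = len(V), len(edges)
    w_V = float(m)                  # unweighted: w(V) = m
    cand = sorted({p / f(q) for p in range(m + 1) for q in range(2, n + 1)})
    lo, hi = 0, len(cand) - 1       # invariant: cand[lo] <= opt <= cand[hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        beta = (cand[mid - 1] + cand[mid]) / 2   # strictly between values
        cost, _ = min_cut_side(V, edges, f, beta)
        if cost < w_V - 1e-9:       # some S has w(S)/f(|S|) > beta
            lo = mid
        else:
            hi = mid - 1
    # Extract an optimal subset with one more cut just below cand[lo].
    beta = (cand[lo - 1] + cand[lo]) / 2
    _, S = min_cut_side(V, edges, f, beta)
    return S
```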

Theorem 5

Algorithm 4 is an \(O(n^3)\)-time exact algorithm for f-DS with concave function f on unweighted graphs.

For f-DS with concave function f on weighted graphs, the binary search used in Algorithm 4 is not applicable because there may be exponentially many possible objective values in the weighted setting, and we cannot enumerate them explicitly in polynomial time. To overcome this issue, we can use Megiddo's parametric search technique [19]. Assume that we have an algorithm that computes a minimum s-t cut in \((U,A,w_\beta )\) within O(p(|A|, |U|)) comparisons and O(q(|A|, |U|)) additions/subtractions. Then, using the parametric search technique, we can solve f-DS on weighted graphs exactly in \(O(p(|A|,|U|)(p(|A|,|U|)+q(|A|,|U|)))\) time. However, this algorithm is computationally expensive.

Alternatively, we present an algorithm that employs another binary search strategy (Algorithm 5). We have the following theorem.

Algorithm 5: flow-based \((1+\epsilon )\)-approximation algorithm for weighted graphs
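A sketch of Algorithm 5, again reusing min_cut_side; \(\beta \) is updated to the geometric midpoint \(\sqrt{\beta _{\mathrm {lb}}\cdot \beta _{\mathrm {ub}}}\), mirroring the proof of Theorem 6 below.

```python
import math

def flow_based_approx(V, edges, f, eps):
    """Algorithm 5 sketch (weighted): geometric binary search on beta;
    returns a (1 + eps)-approximate solution."""
    w_V = float(sum(edges.values()))
    lb = max(edges.values()) / f(2)   # f-density of a heaviest edge
    ub = w_V / f(2)                   # w(S) <= w(V) and f(|S|) >= f(2)
    e = max(edges, key=edges.get)
    best = {e[0], e[1]}               # a heaviest edge attains lb
    while ub > (1 + eps) * lb:
        beta = math.sqrt(lb * ub)     # geometric midpoint
        cost, S = min_cut_side(V, edges, f, beta)
        if cost < w_V - 1e-9:         # S is nonempty with density > beta
            lb, best = beta, S
        else:
            ub = beta
    return best
```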

Theorem 6

Algorithm 5 is an \(O\left( \frac{n^3}{\log n}\cdot \log \left( \frac{\log n}{\epsilon }\right) \right) \)-time \((1+\epsilon )\)-approximation algorithm for f-DS with concave function f.

Proof

Let \(i^*\) be the number of iterations executed by Algorithm 5, and let \(\hat{S}=X^{(i^*)}\cap V\). Then we have \(\max _{S\subseteq V} w(S)/f(|S|)\le \beta _{\mathrm {ub}}^{(i^*)}\), \(\beta _{\mathrm {ub}}^{(i^*)}\le (1+\epsilon )\cdot \beta _{\mathrm {lb}}^{(i^*)}\), and \(\beta _{\mathrm {lb}}^{(i^*)}\le w(\hat{S})/f(|\hat{S}|)\). Combining these inequalities, we have \(\max _{S\subseteq V} w(S)/f(|S|)\le (1+\epsilon )\cdot w(\hat{S})/f(|\hat{S}|)\), which means that \(\hat{S}\) is a \((1+\epsilon )\)-approximate solution.

In what follows, we analyze the time complexity of the algorithm. For each \(i\in \{0,1,\dots ,i^*-1\}\), it holds that

$$\begin{aligned} \frac{\beta _{\mathrm {ub}}^{(i+1)}}{\beta _{\mathrm {lb}}^{(i+1)}} \le \max \left\{ \frac{\beta _{\mathrm {ub}}^{(i)}}{\beta ^{(i)}},~\frac{\beta ^{(i)}}{\beta _{\mathrm {lb}}^{(i)}}\right\} =\max \left\{ \frac{\beta _{\mathrm {ub}}^{(i)}}{\sqrt{\beta _{\mathrm {lb}}^{(i)}\cdot \beta _{\mathrm {ub}}^{(i)}}},~\frac{\sqrt{\beta _{\mathrm {lb}}^{(i)}\cdot \beta _{\mathrm {ub}}^{(i)}}}{\beta _{\mathrm {lb}}^{(i)}}\right\} = \sqrt{\frac{\beta _{\mathrm {ub}}^{(i)}}{\beta _{\mathrm {lb}}^{(i)}}}. \end{aligned}$$

Since \(\beta _{\mathrm {ub}}^{(0)}/\beta _{\mathrm {lb}}^{(0)}=w(V)/w(S_2^*)\le m\), we have \(\beta _{\mathrm {ub}}^{(i)}/\beta _{\mathrm {lb}}^{(i)}\le m^{1/2^i}\) for \(i=1,\dots ,i^*\). Note that \(i^*\) is the minimum index i that satisfies \(\beta _{\mathrm {ub}}^{(i)}/\beta _{\mathrm {lb}}^{(i)}\le 1+\epsilon \). Thus, we see that \(i^*\) is upper bounded by \(O\left( \log \left( \frac{\log n}{\log (1+\epsilon )}\right) \right) \). Therefore, the total running time is \(O\left( \frac{n^3}{\log n}\cdot \log \left( \frac{\log n}{\log (1+\epsilon )}\right) \right) =O\left( \frac{n^3}{\log n}\cdot \log \left( \frac{\log n}{\epsilon }\right) \right) \), where the equality follows from the fact that \(\lim _{\epsilon \rightarrow +0}\frac{\epsilon }{\log (1+\epsilon )}=1\) holds. \(\square \)

3.4 Greedy Peeling

Finally, we provide an approximation algorithm with much higher scalability. Specifically, we prove that the greedy peeling (Algorithm 2) has an approximation ratio of 3 for f-DS with concave function f. As mentioned in Sect. 2.2.2, the algorithm runs in \(O(m+n\log n)\) time for weighted graphs and \(O(m+n)\) time for unweighted graphs.

We prove the approximation ratio. Recall that \(S_n,\dots , S_1\) are the subsets of vertices produced by the greedy peeling. We use the following fact, which implies that a 3-approximate solution for DalkS can be found among \(S_n,\dots , S_{k}\).

Fact 1

(Theorem 1 in Andersen and Chellapilla [1]) For any integer k \((\le n)\), it holds that

$$\begin{aligned} \max _{S\subseteq V:\,|S|\ge k}\frac{w(S)}{|S|}\le 3\cdot \max _{k\le i\le n}\frac{w(S_i)}{i}. \end{aligned}$$

Theorem 7

The greedy peeling (Algorithm 2) has an approximation ratio of 3 for f-DS with concave function f.

Proof

Let \(S^*\subseteq V\) be an optimal solution to f-DS with concave function f, and let \(s^*=|S^*|\). Let \(S\subseteq V\) be the output of the greedy peeling for the problem. Then we have

$$\begin{aligned} \frac{w(S^*)}{f(s^*)}&=\frac{w(S^*)/s^*}{f(s^*)/s^*} \le \max _{S'\subseteq V:\,|S'|\ge s^*}\frac{w(S')/|S'|}{f(s^*)/s^*}\\&\le 3\cdot \max _{s^*\le i\le n} \frac{w(S_i)/i}{f(s^*)/s^*} \le 3\cdot \max _{s^*\le i\le n}\frac{w(S_i)/i}{f(i)/i} \le 3\cdot \frac{w(S)}{f(|S|)}, \end{aligned}$$

where the second inequality follows from Fact 1, and the third inequality follows from the monotonicity of \(f(x)/x\). \(\square \)