1 Introduction

The densest k-subgraph problem is defined as follows: given a graph with \( n \) vertices and an integer \( k \le n \), find a subgraph with \( k \) vertices that maximizes the number of induced edges. The NP-hardness is a direct consequence of the well-known fact that finding a maximum clique is NP-hard. This contrasts to the problem of finding an arbitrary-sized subgraph of maximum density, i.e., a subgraph with maximum average degree, which is polynomially solvable [4]. The first approximation algorithm for the densest \( k \)-subgraph problem with an approximation guarantee of \( \mathcal {O}(n^{0.3885}) \) was given by Kortsarz and Peleg [5]. Feige et al. [6] improved this factor to \( \mathcal {O}( n^{\delta } ) \) for some \( \delta < 1/3 \). Until recently, when Bhaskara et al. [7] presented an \( \mathcal {O}( n^{1/4+\epsilon } ) \)-approximation algorithm for any \( \epsilon > 0 \), it has been a long-standing open problem whether the factor \( \mathcal {O}( n^{1/3} ) \) can be significantly beaten. Ashahiro et al. [8] showed that a simple greedy strategy yields an approximation ratio of \( \mathcal {O}( n/k ) \), and Feige and Langberg [9] showed that \( n/k \) is achievable using semidefinite programming. On the other hand, using a random sampling technique, Arora et al. [10] presented a polynomial-time approximation scheme (PTAS) for dense graphs with \( k = \Omega (n) \), that is, if the number of edges is \( \Omega (n^2) \) and \( k = \Omega (n) \), then there is an \( (1+\epsilon ) \)-approximation algorithm for any \( \epsilon > 0 \). However, there is probably no PTAS for general graphs [2].

These weak approximation guarantees for general graphs motivated the study of special graph classes, like perfect graphs, chordal graphs, and interval graphs. Unfortunately, although a maximum clique of a perfect graph can be found in polynomial time [11], finding a densest \( k \)-subgraph remains NP-hard even for chordal graphs and bipartite graphs [1], two important subclasses of perfect graphs. However, the complexity status of finding a densest \( k \)-subgraph in interval graphs, a subclass of chordal graphs, has been a prominent open problem over the last three decades [1]. In an interval graph, each vertex corresponds to an interval, and two vertices are connected via an edge if their corresponding intervals overlap. Such a geometric structure occurs frequently in scheduling, VLSI-design, and biology, see for instance [12]. This unknown complexity status gave rise to the search for approximation results. Liazi et al. [3] presented a 3-approximation algorithm for interval graphs and chordal graphs, and there is moreover a PTAS if the clique graph is a star [13], i.e., if the intersection graph of the maximal cliques is a star. Recently, Chen et al. [14] showed that a large family of intersection graphs, including interval graphs, admit constant factor approximation algorithms. Finally, Backer and Keil [15] gave a 3/2-approximation algorithm for proper interval graphs [15], a subclass of interval graphs where no interval is allowed to contain another one. Using the shifting technique, it is actually possible to even derive a PTAS for this case [14].

Contributions We significantly improve upon the known approximation results by presenting a PTAS for finding a densest \( k \)-subgraph in interval graphs. Note that this is the first PTAS for an import graph class without any further restrictions, since so far only PTASs for dense graphs with \( k = \Omega (n) \) [10], a quite restricted subclass of chordal graphs [13], and proper interval graphs [14] are known. We conjecture that finding a densest \( k \)-subgraph in interval graphs is NP-hard, but proving this (or giving a polynomial time algorithm instead) remains a challenging open problem [1].

Technique and outline If \( k \) is constant, then we can easily find a densest \( k \)-subgraph in polynomial time by simply enumerating all \( \left( {\begin{array}{c}n\\ k\end{array}}\right) = \mathcal {O}( n^k ) \) subgraphs of size \( k \). Hence, the difficulty of the problem stems from the fact that \( k \) is part of the input. Now, let \( V \) be the vertices of the input interval graph, and let \( V^* \subseteq V \) with \(| V^* | = k \) be a subset of vertices that induce a densest \( k \)-subgraph. Consider some clique \( C \subseteq V \), and let \( C^* = C \cap V^* \) be the subclique of \( C \) contained in \( V^* \). Assume then that we already know the vertices \( V^* \backslash C \), and hence we only need to pick some \( k - | {V^* \backslash C} | \) vertices from \( C \) that maximize the number of induced edges in combination with \( V^* \backslash C \). Clearly, \( C^* \) will solve this simplified problem. However, we might need to enumerate \(\left( {\begin{array}{c}| {C} |\\ | {C^*} |\end{array}}\right) \) subcliques of \( C \) in order to find \( C^* \) (or another subclique of similar quality), which is not possible in polynomial time if \( | C^* | \) is not constant. Therefore, we show in Sect. 4 that, by losing an \( (1-\epsilon ) \)-factor in the number of induced edges for an arbitrary small \( \epsilon > 0 \), it suffices to only consider a polynomial number of subcliques (Lemma 4). To extend this search space restriction to the whole interval graph, we first need to introduce the notion of a sequence representation in Sect. 3 which allows us to decompose any densest \( k \)-subgraph into a sequence of cliques. Finally, we present a backward dynamic program in Sect. 5 that finds a densest \( k \)-subgraph in this restricted search space in polynomial time. The principle of reducing the search space in order to make it treatable by dynamic programming in polynomial time has been succesfully applied during the last decades [16, 17] to obtain approximation schemes. However, finding the right reduction and dynamic program is a highly problem-specific challenge.

Sparsest subgraphs Also the inverse problem, where the goal is to find a subgraph of size \( k \) that minimizes the number of induced edges, called sparsest k-subgraph problem, has attracted a considerable amount of attention, see [18] and the references therein. As for the densest \( k \)-subgraph problem, it is also not known whether this problem is NP-hard or not for interval graphs. Just recently, Watrigant, Bougeret, and Giroudeau [18] showed NP-hardness for the superclass of chordal graphs, for which they also gave a constant factor approximation algorithm. Moreover, they used the same kind of arguments as in this paper to obtain a PTAS for proper interval graphs. Hence, the presented techniques seem to be quite general to deal with problems regarding the overlap structure in interval graphs.

2 Preliminaries

Let \( G = (V,E) \) be an interval graph, i.e., each vertex in \( V \) corresponds to an interval, and two vertices are connected via an edge in \( E \) if their corresponding intervals overlap. Hence, we can also think of the vertices in \( V \) as intervals. For an interval subset \( V' \subseteq V \), let \( E(V') \) denote the number of edges of the subgraph of \( G \) induced by \( V' \), i.e., the number of overlaps between intervals in \( V' \). Hence, our goal is to find an interval set \( V^* \subseteq V \) of size \( k \) that maximizes \( E(V^*) \). We also refer to such a subset as a densest k-interval subset. Let \( \mathrm{OPT}:= E(V^*) \) denote this maximal number of overlaps. Finally, for two disjoint interval sets \( V',V'' \subseteq V \), let \( E(V',V'') \) denote the number of edges of the bipartite subgraph of \( G \) induced by these two interval sets, i.e., the number of overlaps between intervals in \( V' \) and intervals in \( V'' \). Observe that a clique \( C \subseteq V \) is a set of intervals with \( \cap _{I \in C} I \ne \emptyset \).

Let \( l_I \in \mathbb {R}\) and \( r_I \in \mathbb {R}\) denote the left and right endpoint of an interval \( I \in V \), respectively, and assume that all endpoints of intervals are distinct, which can be easily ensured without changing the overlap structure. This also ensures that all intervals are distinct. We write \( I_1 < I_2 \) for two intervals \( I_1,I_2 \in V \) if \( r_{I_1} < l_{I_2} \). In this case, we also say that the pair \( I_1,I_2 \) is consecutive. Analogously, we call a sequence of intervals \( I_1,I_2,\ldots ,I_s \in V \) with \( I_1 < I_2 < \cdots < I_s \) a consecutive interval sequence. Finally, for an interval \( I \in V \), let \( V_I \subseteq V \) be the set of all intervals \( I' \in V \) with \( I \subseteq I' \).

Lemma 1

For any densest \( k \)-interval subset \( V^* \subseteq V \), we may assume for each interval \( I \in V^* \) that \( V_I \subseteq V^* \).

Proof

Assume for contradiction that there is an interval \( I \in V^* \) and another interval \( I' \in V_I \backslash V^* \). In this case, since \( I \subset I' \), we could replace interval \( I \) by interval \( I' \) without modifying the size of \( V^* \) and without decreasing \( E(V^*) \). Therefore, iterating this scheme terminates and gives us a densest \( k \)-interval subset \( V^* \) that satisfies the property from the claim. \(\square \)

3 Sequence Representation

In this section, we show how to decompose any densest \( k \)-interval subset into a sequence of cliques, called a sequence representation. The purpose of this representation is to have a systematic way to enumerate enough k-interval subsets in order to find a densest one via a dynamic program as explained in Sect. 5. However, since there are superpolynomial many possible sets in this enumeration, it does not straight-forward result in a polynomial time algorithm. We deal with this problem in Sect. 4.

For a consecutive interval pair \( I_1, I_2 \in V \), let \( C_{I_1 I_2} \) denote the clique of all intervals \( I \in V \) with \( l_{I_1} < l_I < r_{I_1} \) and \( r_{I_1} < r_I < r_{I_2} \). Moreover, let \( C'_{I_1 I_2} \) denote the clique of all intervals \( I \in V \) with \( r_{I_1} < l_I < r_{I_2} \) and \( r_{I_2} \le r_I \). Note that \( I_2 \in C'_{I_1 I_2} \), but clique \( C_{I_1 I_2} \) might be empty. We illustrate the cliques \( C_{I_1 I_2} \) and \( C'_{I_1 I_2} \) in Fig. 1.

Fig. 1
figure 1

Example intervals in the cliques \( C_{I_1 I_2} \) and \( C'_{I_1 I_2} \)

To avoid case distinctions, we add a dummy interval \( I_{\infty } \) to \( V \) with \( I < I_{\infty } \) for any other interval \( I \in V \backslash \{ I_{\infty } \} \). Our goal is then to find a densest \( (k+1) \)-interval subset \( V^* \subseteq V \) with \( I_{\infty } \in V^* \). Since \( I_{\infty } \) does not overlap with any other interval in \( V \), this slightly modified problem is equivalent to our original problem of simply finding a densest \( k \)-interval subset. Specifically, removing \( I_{\infty } \) from a solution \( V' \subseteq V \) for this modified problem yields a solution for our original problem with the same number of interval overlaps \( E(V') \). Hence, \(\mathrm{OPT}\) denotes the maximal number of interval overlaps in both cases.

We say that an interval subset \( V' \subseteq V \) admits a sequence representation if there is a consecutive interval sequence \( I_1,I_2,\ldots ,I_s \) and a sequence of (possibly empty) cliques \( Q_1,Q_2,\ldots ,Q_{s-1} \) with \( Q_i \subseteq C_{I_i I_{i+1}} \) for each \( 1 \le i < s \) such that

$$\begin{aligned} V' = \bigcup _{i=1}^s V_{I_i} \cup \bigcup _{i=1}^{s-1} Q_i. \end{aligned}$$

In other words, for each interval \( I \in V' \), we either have that there is an index \( 1 \le i \le s \) with \( I_i \subseteq I \), or there is an index \( 1 \le i < s \) with \( I \in Q_i \). We schematically depict the overlap structure of the cliques \( Q_1,Q_2,\ldots ,Q_{s-1} \) in Fig. 2. Note here that any clique \( Q_i \) might only overlap with interval \( I_i \) but not with interval \( I_{i+1} \). If \( I_{\infty } \in V' \), note that it must then hold for any such sequence representation that \( I_s = I_{\infty } \).

Fig. 2
figure 2

Overlap structure of the cliques \( Q_1,Q_2,\ldots ,Q_{s-1} \)

The following lemma says that we only have to consider interval subsets that admit a sequence representation. Therefore, it suffices to enumerate all these subsets to find a densest one.

Lemma 2

We may assume that any densest \( (k+1) \)-interval subset \( V^* \subseteq V \) with \( I_{\infty } \in V^* \) admits a sequence representation.

Proof

Given a densest \( (k+1) \)-interval subset \( V^* \subseteq V \) with \( I_{\infty } \in V^* \), we inductively construct the claimed sequence representation. During each iteration, we want to conserve the invariant that the currently constructed interval sequence \( I_1,I_2,\ldots ,I_i \) and cliques \( Q_1,Q_2,\ldots ,Q_{i-1} \) are a sequence representation of \( \{ I \in V^* \mid l_I \le l_{I_i} \} \). Since we extend this interval sequence by one interval in each iteration, this already gives that we end up with the claimed sequence representation \( I_1,I_2,\ldots ,I_s \) and \( Q_1,Q_2,\ldots ,Q_{s-1} \) of \( V^* \). Note that we obtain that finally \( I_s = I_{\infty } \).

To start this inductive construction, let \( I_1 \in V^* \) be the interval with leftmost left endpoint \( l_{I_1} \) subject to the constraint that there is no interval \( I \in V^* \) with \( I \subset I_1 \). Recall that we know from Lemma 1 that we may assume that \( V_{I_1} \subseteq V^* \). Hence, to see that the invariant described above holds, we only have to show that also \( \{ I \in V^* \mid l_I < l_{I_1} \} \subseteq V_{I_1} \). Recall here the assumption that all left endpoints of intervals are distinct, and thus \( I = I_1 \) if \( l_I = l_{I_1} \). Now, assume for contradiction that there is an interval \( I \in V^* \) with \( l_I < l_{I_1} \) and \( I \not \in V_{I_1} \). Hence, we obtain that \( r_I < r_{I_1} \). Let then \( I' \in V^* \) be an interval with the properties that \( I' \subseteq I \) and there is no interval \( I'' \in V^* \) with \( I'' \subset I' \). If \( l_{I'} > l_{I_1} \), then \( I' \subset I_1 \), which contradicts the selection of \( I_1 \). On the other hand, if \( l_{I'} < l_{I_1} \), then \( I_1 \) is not an interval with leftmost left endpoint, which contradicts the selection of \( I_1 \) as well. This shows that the invariant holds after the first iteration. Now assume that, for some \( i \ge 1 \), we have a sequence representation \( I_1,I_2,\ldots ,I_i \) and \( Q_1,Q_2,\ldots ,Q_{i-1} \) of \( \{ I \in V^* \mid l_I \le l_{I_i} \} \). If \( I_i = I_{\infty } \), then we are done. Otherwise, let \( I_{i+1} \in V^* \) be the interval with leftmost left endpoint \( l_{I_{i+1}} \) subject to the constraints that \( I_i < I_{i+1} \) and there is no interval \( I \in V^* \) with \( I \subset I_{i+1} \). Moreover, define \( Q_i := V^* \cap C_{I_i I_{i+1}} \). Using nearly the same arguments as for the first iteration shows that the invariant is conserved during each iteration. \(\square \)

4 Simple Sequence Representations

In this section, we explain how to remove sufficiently many \( k \)-interval subsets from the enumeration via sequence representations introduced in Sect. 3 to reach polynomial size. The main ingredient is to exploit the overlap structure of a clique in a sequence representation in order to replace it by a sufficiently good one which is easier to describe (Lemma 4), and hence reduces the size of the enumeration.

Consider some fixed but arbitrary small \( \epsilon > 0 \), and assume that \( 1/\epsilon \) is integral and \( \epsilon \le 1 \). Moreover, let \( C \subseteq V \) be some clique, and let \( I'_1, I'_2, \ldots , I'_u \) with \( r_{I'_1} > r_{I'_2} > \ldots > r_{I'_u} \) be an ordering of the intervals in \( C \) according to their right endpoints. Recall here the assumption that all interval endpoints are distinct. We will show how the following inputs define a subclique \( Q \subseteq C \):

  1. (1)

    an integer \( v \) with \( 2 \le v \le 2/\epsilon + 2 \),

  2. (2)

    an integer sequence \( j_1,j_2,\ldots ,j_v \) with \( j_1 = 1 < j_2 < \ldots < j_v = u+1 \),

  3. (3)

    an integer sequence \( h_1,h_2,\ldots ,h_{v-1} \) with \( 0 \le h_t \le j_{t+1}-j_t \) for each \( 1 \le t < v \).

We define \( Q \) as follows: for each \( 1 \le t < v , Q \) contains exactly the \( h_t \) intervals in \( C_t := \{ I'_{j_t},I'_{j_t+1},\ldots ,I'_{j_{t+1}-1} \} \) with leftmost left endpoints. Thus, any such combination of inputs defines a subclique \( Q \subseteq C \). Let then \( P_{\epsilon }(C) \) be the set of all subcliques constructed in this way for all possible such inputs. Since \( | C | = u \le n \), we immediately obtain the polynomial bound \( | P_{\epsilon }(C) | \le (2/\epsilon +1) \cdot n^{2/\epsilon } \cdot n^{2/\epsilon +1} \), and thus \( | P_{\epsilon }(C) | = n^{\mathcal {O}(1/\epsilon ^2)} \).

Example Consider the clique \( C \) of intervals \( I'_1,I'_2,\ldots ,I'_7 \) depicted in Fig. 3 which are ordered according to their right endpoints. Then, for the inputs \( v = 4 , j_1 = 1 < j_2 = 4 < j_3 = 7 < j_4 = 8 , h_1 = h_2 = 2 \), and \( h_3 = 0 \), we add the set \( Q = \{I'_1,I'_3,I'_5,I'_6\} \) to \( P_{\epsilon }(C) \), which are the intervals with solid lines in Fig. 3. To see this, note that \( I'_1 \) and \( I'_3 \) are the \( h_1 \) intervals in \( C_1 = \{I'_1,I'_2,I'_3\} \) with leftmost left endpoints, and \( I'_5 \) and \( I'_6 \) are the \( h_2 \) intervals in \( C_2 = \{I'_4,I'_5,I'_6\} \) with leftmost left endpoints. Since \( h_3 = 0 , Q \) does not contain the single interval in \( C_3 = \{I'_7\} \).

Fig. 3
figure 3

Example construction of a clique \( Q \in P_{\epsilon }(C) \)

Let \( V' \subseteq V \) be an interval subset with a sequence representation \( I_1,I_2,\ldots ,I_s \) and \( Q_1,Q_2,\ldots ,Q_{s-1} \). We then say that \( V' \) admits a simple sequence representation if additionally \( Q_i \in P_{\epsilon }(C_{I_i I_{i+1}}) \) for each \( 1 \le i < s \). The following lemma is critical for the correctness of the PTAS, since it allows us to trade the size of the search space for accuracy.

Lemma 3

There is an interval subset \( V' \subseteq V \) with \( | V' | = k+1 , I_{\infty } \in V' \), and \( E(V') \ge ( 1- 8 \epsilon ) \mathrm{OPT}\) that admits a simple sequence representation.

To prove Lemma 3, we need one preliminary lemma.

Lemma 4

Consider an interval subset \( V' \subseteq V \) with a sequence representation \( I_1,I_2,\ldots ,I_s \) and \( Q_1,Q_2,\ldots ,Q_{s-1} \). Then, for any \( 1 \le i < s \), there exists a clique \( Q \in P_{\epsilon }(C_{I_i I_{i+1}}) \) with \( | Q | = | Q_i | \) such that replacing the clique \( Q_i \) by \( Q \) decreases \( E(V') \) by at most \( \epsilon | Q_i | | Q'_i | \), where \( Q'_i := V' \cap C'_{I_i I_{i+1}} \). Moreover, if \( | Q_i | = 1 \), then \( E(V') \) does not decrease at all.

Proof

We abbreviate \( C = C_{I_i I_{i+1}} \) and \( C' = C'_{I_i I_{i+1}} \) throughout this proof, and let \( I'_1, I'_2, \ldots , I'_u \) with \( r_{I'_1} > r_{I'_2} > \ldots > r_{I'_u} \) be an ordering of the intervals in \( C \) according to their right endpoints. First, consider the case that \( | Q_i | = 1 \). In this case, there is one input that defines a set \( Q \in P_{\epsilon }(C) \) with exactly \( Q = Q_i \). Specifically, if \( Q_i = \{ I'_t \} \) for \( t < u \), then we use the input \( j_1 = 1 < j_2 = t < j_3 = t+1 < j_4 = u + 1 , h_1 = h_3 = 0 \), and \( h_2 = 1 \). Recall here the assumption that \( \epsilon \le 1 \), and hence \( 2/\epsilon + 2 \ge 4 \), which ensures that we can set \( v=4 \). On the other hand, if \( Q_i = \{ I'_u \} \), then we use the input \( j_1 = 1 < j_2 = u < j_3 = u+1 , h_1 = 0 \), and \( h_2 = 1 \). This proves the second part of the claim. Hence, we may assume that \( | Q_i | > 1 \) in what follows.

To define the set \( Q \in P_{\epsilon }(C) \) for the first part of the claim, we have to define the respective inputs of \( Q \) used during the construction of \( P_{\epsilon }(C) \). To this end, for each \( 1 \le j \le u \), let \( b_j := E(\{I'_j\},Q'_i) \) denote the number of intervals in \( Q'_i \) which overlap with \( I'_j \). Consequently, since the intervals \( I'_1,I'_2,\ldots I'_u \in C \) are ordered according to their right endpoints and all intervals in \( Q'_i \subseteq C' \) are right of these intervals as schematically depicted in Fig. 1, we obtain that \( | Q'_i | \ge b_1 \ge b_2 \ge \ldots \ge b_u \). Using this, we now inductively construct a sequence \( j_1,j_2,\ldots ,j_v \) with \( j_1 = 1 < j_2 < \ldots < j_v = u+1 \) as follows:

Since \( j_1 = 1 \), the start of this inductive construction is well-defined. Now assume that we have already defined some integers \( j_1,j_2,\ldots ,j_t \). Then, if \( b_{j_t} - b_{j_t + 1} > \epsilon | Q'_i | \), set \( j_{t+1} := j_t + 1 \). Otherwise, let \( j_{t+1} > j_t \) be the maximal index such that still \( b_{j_t} - b_{j_{t+1}} \le \epsilon | Q'_i | \). If this index is the last index \( u \), then set \( j_{t+1} := u+1 \) instead, and terminate this inductive construction. This gives the integer sequence \( j_1,j_2,\ldots ,j_v \).

Observe that it holds for any \( 1 \le t \le v-2 \) that \( b_{j_t} - b_{j_{t+2}} > \epsilon | Q'_i | \). Consequently, since \( b_1 \le | Q'_i | \), we obtain that \( v \le 2/\epsilon + 2 \). Moreover, for each \( 1 \le t < v \), we either have that \( j_{t+1} = j_t + 1 \) or \( b_{j_t} - b_{j_{t+1}} \le \epsilon | Q'_i | \).

Example Assume that \( C \) is the clique from Fig. 3, and moreover assume that \( \epsilon | Q'_i | = 5 , b_1 = 10, b_2 = 8, b_3 = 6, b_4 = 6, b_5 = 0, b_6 = 0 \), and \( b_7 = 0 \). In any case, we have \( j_1 = 1 \). Consequently, it holds that \( j_2 = 4 \) is the maximal index such that still \( b_{j_1} - b_{j_2} = 4 \le \epsilon | Q'_i | \). Next, since \( b_{j_2} - b_{j_2+1} = 6 > \epsilon | Q'_i | \), we set \( j_3 := j_2 + 1 \). Finally, \( j_4 = 7 \) is the maximal index such that still \( b_{j_3} - b_{j_4} = 0 < \epsilon | Q'_i | \). However, since this is the last index, we set \( j_4 := 8 \) instead.

Finally, for each \( 1 \le t < v \), define \( h_t := | Q_i \cap C_t | \), and hence \( \sum _{t=1}^{v-1} h_t = | Q_i | \). The sequences \( j_1,j_2,\ldots ,j_v \) and \( h_1,h_2,\ldots ,h_{v-1} \) are the inputs required to define the claimed set \( Q \in P_{\epsilon }(C) \). Because of the definition of the sequence \( h_1,h_2,\ldots ,h_{v-1} \), we immediately obtain \( | Q | = | Q_i | \). For each \( 1 \le t < v \), observe that the construction of \( Q \) implies that \( Q \cap C_t \) contains the \( h_t \) intervals in \( C_t \) with leftmost left endpoints.

We still have to show that the interval set \( Q \) constructed above has the claimed property. First, since \( | Q | = | Q_i | \), we have \( E(Q) = E(Q_i) = E(V' \cap C) \). Therefore, to bound the decrease of \( E(V') \) due to replacing \( Q_i \) by \( Q \), we only need to show that \( E( Q, V' \backslash C ) \ge E(Q_i, V' \backslash C) - \epsilon | Q_i | | Q'_i | \). To this end, we partition \( V' \backslash C \) into four pairwise disjoint parts:

$$\begin{aligned} V^{<}&:= \{ I \in V' \backslash C \mid r_I < r_{I_i} \}, \\ V^{=}&:= \{ I \in V' \backslash C \mid l_I < r_{I_i} \le r_I \}, \\ V^{>}&:= \{ I \in V' \backslash C \mid r_{I_i} < l_I < r_{I_{i+1}} \}, \\ V^{\gg }&:= \{ I \in V' \backslash C \mid l_I > r_{I_{i+1}} \}. \end{aligned}$$

Since these sets are pairwise disjoint, we may consider them separately:

Case \( V^{<} \): Consider some index \( 1 \le t < v \), and recall that \( Q \cap C_t \) contains the \( h_t \) intervals in \( C_t \) with leftmost left endpoints. On the other hand, \( Q_i \cap C_t \) also contains \( h_t \) intervals from \( C_t \), but these are not necessarily the ones with leftmost left endpoints. Consequently, since \( V^{<} \) are the intervals left of \( r_{I_i} \) and it holds for any interval \( I \in C_t \) that \( r_{I_i} \in I \), this shows that \( E(Q \cap C_t,V^{<} ) \ge E( Q_i \cap C_t, V^{<} ) \). Combining this for all indices \( 1 \le t < v \) finally gives that \( E(Q,V^{<}) \ge E(Q_i, V^{<}) \).

Case \( V^{=} \): Since \( | Q | = | Q_i | \) and \( r_{I_i} \in I \) for each interval \( I \in C \), we have that \( E(Q,V^{=}) = E(Q_i,V^{=}) \).

Case \( V^{\gg } \): Since no interval \( V^{\gg } \) overlaps with an interval in \( C \), we trivially obtain \( E( Q, V^{\gg }) = 0 = E(Q_i, V^{\gg }) \). Therefore, combining this case with the last two cases, we see that the next case is the only case where \( E(V') \) might decrease.

Case \( V^{>} \): Since the consecutive interval pair \( I_i,I_{i+1} \) is part of a sequence representation of \( V' \), there is no interval \( I \in V' \) with \( r_{I_i} < l_I < r_{I_{i+1}} \) and \( r_I < r_{I_{i+1}} \). Therefore, we obtain that \( V^{>} = Q'_i := V' \cap C' \). Hence, we only need to upper bound \( E(Q_i, Q'_i) - E(Q,Q'_i) = \sum _{t=1}^{v-1} a_t \), where, for each \( 1 \le t < v \), we define \( a_t := E(Q_i \cap C_t, Q'_i ) - E( Q \cap C_t, Q'_i ) \). Now note that, for each \( 1 \le t < v \), each interval in \( C_t \) overlaps with at least \( b_{j_{t+1}} \) and at most \( b_{j_t} \) intervals in \( Q'_i \). On the other hand, the sets \( Q_i \cap C_t \) and \( Q \cap C_t \) both contain exactly \( h_t \) intervals. Combining this, we find that \( a_t \le h_t ( b_{j_t} - b_{j_{t+1}} ) \), where we define \( b_{j_v} := 0 \). Therefore, we obtain that

$$\begin{aligned} \sum _{t=1}^{v-1} a_t&= \sum _{t: j_{t+1} > j_t+1} a_t \le \sum _{t: j_{t+1} > j_t+1 } h_t ( b_{j_t} - b_{j_{t+1}} ) \\&\le \epsilon | Q'_i | \sum _{t: j_{t+1} > j_t+1 } h_t \le \epsilon | Q'_i | | Q_i |. \end{aligned}$$

The equality in the first line is due to the fact that if \( j_{t+1} = j_t+1 \), then trivially \( Q_i \cap C_t = Q \cap C_t \), since \( | C_t | = 1 \), and consequently \( a_t = 0 \). Moreover, the first inequality in the second line is due to the earlier observation that either \( j_{t+1} = j_t + 1 \) or \( b_{j_t} - b_{j_{t+1}} \le \epsilon | Q'_i | \). Hence, by combining all these arguments, we conclude that \( E(Q, V^{>}) \ge E(Q_i, V^{>}) - \epsilon | Q_i | | Q'_i | \).

Combining all four cases proves the claim. \(\square \)

Proof of Lemma 3 Consider a densest \( (k+1) \)-interval subset \( V^* \subseteq V \) with \( I_{\infty } \in V^* \) and \( E(V^*) = \mathrm{OPT}\). We know from Lemma 2 that we may assume that \( V^* \) admits a sequence representation \( I_1,I_2,\ldots ,I_s \) and \( Q_1,Q_2,\ldots ,Q_{s-1} \). Moreover, Lemma 4 implies that, for each \( 1 \le i < s \), we can replace \( Q_i \) by a clique \( Q \in P_{\epsilon }(C_{I_i I_{i+1} }) \) with \( | Q | = | Q_i | \) such that \( E(V^*) \) decreases by at most \( \epsilon | Q_i | | Q'_i | \), and if \( | Q_i | = 1 \), then \( E(V^*) \) does not decrease at all. Iterating these replacements yields the claimed interval subset \( V' \). We will argue in the following paragraph that these replacements can indeed be done iteratively without amplifying these decreases.

Assume that, for some \( 1 \le i < s \), we have already replaced \( Q_t \) for each \( t > i \) as described above. However, we obtain that the only such replacement that might affect the next replacement of \( Q_i \) by some clique \( Q \in P_{\epsilon }(C_{I_i I_{i+1}}) \) is the one with \( t = i+1 \). Therefore, consider the clique \( Q' \in P_{\epsilon }(C_{I_{i+1} I_{i+2}}) \) that replaced \( Q_{i+1} \). Because we always select intervals with leftmost left endpoints during the construction of \( Q' \) and \( | Q_{i+1} | = | Q' | \), we can identify each interval \( I \in Q_{i+1} \) with an interval \( I' \in Q' \) with \( l_{I'} \le l_I \). Hence, since \( Q \subseteq C_{I_i I_{i+1}} \) is left of \( C_{I_{i+1} I_{i+2}} \), we even have that \( E(Q,Q') \ge E(Q, Q_{i+1}) \). Consequently, replacing \( Q_i \) by \( Q \) will still decrease \( E(V^*) \) by at most \( \epsilon | Q_i | | Q'_i | \), even if we replace \( Q_{i+1} \) by \( Q' \) first.

To bound the total decrease of \( E(V^*) \), observe that

$$\begin{aligned} E(V^*) - E(V')&\le \sum _{i:| Q_i | > 1} \epsilon | Q_i || Q'_i | \\&\le \epsilon \left( \sum _{i=1}^{s-1} 2 | Q_i | ( | Q_i | - 1 ) + \sum _{i=1}^{s-1} 2 | Q'_i | ( | Q'_i | - 1 ) \right) \\&= \epsilon \left( \sum _{i=1}^{s-1} 4 E(Q_i) + \sum _{i=1}^{s-1} 4 E(Q'_i) \right) \le \epsilon 8 E(V^*) \end{aligned}$$

which completes the proof of the lemma. The first line is due to Lemma 4 and the arguments above. The second line is due to the simple observation that \( ab \le 2 ( a(a-1)\,{+}\,b(b-1) ) \) for any pair of integers \( a > 1 \) and \( b \ge 1 \). Moreover, the third line uses the fact that \( E(C) = \left( {\begin{array}{c}| C |\\ 2\end{array}}\right) = | C |( | C | - 1)/2 \) for any clique \( C \subseteq V \). Finally, the fourth line is due to the fact that the cliques \( Q_1,Q_2,\ldots ,Q_{s-1} \subseteq V^* \) are pairwise disjoint, and hence \( \sum _{i=1}^{s-1} E(Q_i) \le E(V^*) \). The same holds for the cliques \( Q'_1,Q'_2,\ldots ,Q'_{s-1} \subseteq V^* \). \(\square \)

5 Dynamic Programming

In this section, we finally show how to enumerate all simple sequence representations from Sect. 4 via a dynamic program. To this end, we have a dynamic programming array \( \Pi \) with integral entries of the form

$$\begin{aligned} \Pi (h,s,I_{s-1},I_s,Q_{s-1}), \end{aligned}$$

where \( h \) is an integer with \( 0 \le h \le k+1 \), \( s \) is an integer with \( 2 \le s \le n \), \( I_{s-1},I_s \in V \) are two consecutive intervals, and \( Q_{s-1} \in P_{\epsilon }(C_{I_{s-1} I_s}) \).

The indexing of \( I_{s-1},I_s \), and \( Q_{s-1} \) with \( s \) is just for convenience. Our goal is to fill this array such that \( \Pi (h,s,I_{s-1},I_s,Q_{s-1}) = E(V') \) for an interval subset \( V' \subseteq V \) with \( | V' | = h \) that maximizes \( E(V') \) subject to the constraint that \( V' \) admits a simple sequence representation \( I_1,I_2,\ldots ,I_s \) and \( Q_1,Q_2,\ldots ,Q_{s-1} \). Hence, only the last parts in this sequence representation are defined by the entry. We can bound the size of \( \Pi \) by \( n^4 \cdot \max _{I_{s-1},I_s} | P_{\epsilon }(C_{I_{s-1} I_s}) | \). The second part of this product is simply the maximal size of \( P_{\epsilon }(C_{I_{s-1} I_s}) \) for any consecutive interval pair \( I_{s-1},I_s \). Consequently, since we already know that \( P_{\epsilon }(C_{I_{s-1} I_s}) \) has polynomial size for any such interval pair \( I_{s-1}, I_s \in V \), we immediately obtain that the array \( \Pi \) has polynomial size as well.

We initialize \( \Pi \) by filling all entries of the form \( \Pi (h,2,I_1,I_2,Q_1) \) with \( | Q_1 \cup V_{I_1} \cup V_{I_2} | = h \). Specifically, for any integer \( h \) with \( 0 \le h \le k+1 \), any consecutive interval pair \( I_1,I_2 \), and any interval set \( Q_1 \in P_{\epsilon }(I_1,I_2) \) with \( | Q_1 \cup V_{I_1} \cup V_{I_2} | = h \), we set \( \Pi (h,2,I_1,I_2,Q_1) := E( Q_1 \cup V_{I_1} \cup V_{I_2} ) \). All other entries of \( \Pi \) are initialized as \( -\infty \).

To define a recurrence relation, assume that we have already filled all entries of the form \( \Pi (h,s-1,I_{s-2},I_{s-1},Q_{s-2}) \), and now we want to use them to fill all entries of the form \( \Pi (h,s,I_{s-1},I_s,Q_{s-1}) \), i.e., we want to increase the sequence length \( s \) by one. To this end, we can use the following recurrence relation:

$$\begin{aligned}&\Pi \left( h,s,I_{s-1},I_s,Q_{s-1}\right) \\&\quad =\max _{I_{s-2} < I_{s-1}, Q_{s-2} \in P_{\epsilon }(C_{I_{s-2} I_{s-1}})} \!\left\{ \! \Pi \left( h-| Q_{s-1} |-| V_{I_s}\backslash V_{I_{s-1}} |,s-1,I_{s-2},I_{s-1},Q_{s-2}\right) \right. \\&\quad \left. + E\left( Q_{s-2} \cup V_{I_{s-1}}, Q_{s-1} \cup V_{I_s} \backslash V_{I_{s-1}}\right) + E\left( Q_{s-1} \cup V_{I_s} \backslash V_{I_{s-1}}\right) \right\} \end{aligned}$$

In words, we take the maximum over all intervals \( I_{s-2} \in V \) with \( I_{s-2} < I_{s-1} \) and all cliques \( Q_{s-2} \in P_{\epsilon }(C_{I_{s-2} I_{s-1}}) \). Since \( P_{\epsilon }(C_{I_{s-1} I_s}) \) has polynomial size as already used above, this recurrence relation can be clearly implemented in polynomial time. Hence, in combination with the size of \( \Pi \) listed above, we find that it takes polynomial time to fill \( \Pi \).

To see the correctness of this recurrence relation, consider an interval subset \( V' \subseteq V \) that realizes an entry of the form \( \Pi (h,s,I_{s-1},I_s,Q_{s-1}) \), and let \( I_1,I_2,\ldots ,I_s \) and \( Q_1,Q_2,\ldots ,Q_{s-1} \) be a simple sequence representation of \( V' \). Let then \( V'' \subseteq V' \) be the subset with the shorter simple sequence representation \( I_1,I_2,\ldots ,I_{s-1} \) and \( Q_1,Q_2,\ldots ,Q_{s-2} \). Hence, we have that \( V' \backslash V'' = Q_{s-1} \cup V_{I_s} \backslash V_{I_{s-1}} \). Now observe that, as schematically illustrated in Fig. 4, the only intervals in \( V'' \) which might overlap with an interval in \( V' \backslash V'' \) are the ones in \( Q_{s-2} \cup V_{I_{s-1}} \). Consequently, we obtain the decomposition

$$\begin{aligned} E(V') = E(V'') + E(Q_{s-2} \cup V_{I_{s-1}}, Q_{s-1} \cup V_{I_s} \backslash V_{I_{s-1}} ) + E(Q_{s-1} \cup V_{I_s} \backslash V_{I_{s-1}} ) \end{aligned}$$

as used in the recurrence relation. Thus, since \( V' \) minimizes \( E(V') \), we have that also \( V'' \) minimizes \( E(V'') \), since we could otherwise improve \( V' \) by replacing the intervals \( V'' \) by another interval subset. This shows that \( E(V'') \) realizes the entry

$$\begin{aligned} \Pi (h-| Q_{s-1} |- | V_{I_s}\backslash V_{I_{s-1}} |,s-1,I_{s-2},I_{s-1},Q_{s-2}). \end{aligned}$$

Combining these facts shows the correctness of the recurrence relation.

Fig. 4
figure 4

Overlap structure during a recurrence relation

Theorem 5

There is a PTAS for finding a densest \( k \)-subgraph in interval graphs. The running time is \( n^{\mathcal {O}(1/\epsilon ^4)} \).

Proof

Use the dynamic programming approach explained above to compute all entries \( \Pi (h,s,I_{s-1},I_s,Q_{s-1}) \) with \( h = k+1 \) and \( I_s = I_{\infty } \), and let \( V' \) be the interval set that realizes an optimal such entry, i.e., one that maximizes \( E(V') = \Pi (h,s,I_{s-1},I_s,Q_{s-1}) \). By the definition of \( \Pi \), we obtain that \( V' \) is a densest \( (k+1) \)-interval subset with \( I_{\infty } \in V' \) subject to the constraint that \( V' \) admits a simple sequence representation. Consequently, by Lemma 3, we find that \( E(V' \backslash \{ I_{\infty } \}) = E(V') \ge (1-8 \epsilon )\,\mathrm{OPT}\). This proves the claim. \(\square \)

6 Conclusion

In this paper, we described a PTAS for the densest \( k \)-subgraph problem in interval graphs, which is the first PTAS for this problem in an important graph class without any further restrictions. We conjecture that the densest \( k \)-subgraph problem is NP-hard in interval graphs, but a proof is still outstanding. A similar technique as in this paper can be used to obtain a PTAS for the sparsest \( k \)-subgraph problem in proper interval graphs [18]. This basically requires taking the \( h_t \) intervals with rightmost left endpoint in Sect. 4 and moreover replacing \( \max \) by \( \min \) in the recurrence relation in Sect. 5. However, lifting this technique to non-proper interval graphs remains challenging. Although a similar lemma as Lemma 1 holds as well, the bounding of the cost change due to clique replacements as done in Sect. 4 fails in this case. Another interesting open question is whether the techniques introduced in this paper can be applied to the superclass of chordal graphs as well. We think that this transfer is not possible because chordal graphs lack the nice geometric structure of interval graphs.