Keywords

1 Introduction

The Feedback Vertex Set (FVS) problem asks for a smallest subset of vertices in an undirected graph to be removed such that the graph, , becomes acyclic. This problem was one of the first problems shown to be \(\mathsf {NP}\)-complete [6], and has applications to problems that arise in several areas. These applications include, but are not limited to, operating systems, database systems and VLSI chip design. Consequently, the FVS problem has been widely studied in the context of exact, parameterized and approximation algorithms.

Several variations of the FVS theme have also emerged over the years. For instance, one line of work considers the task of “deleting to specialized forests”, such as forests of pathwidth one [3, 8] or forests whose connected components are stars of bounded degree [5]. In this case, the forests of pathwidth one turn out to be graphs whose connected components are caterpillars.

Meanwhile, another line of work is the Tree Deletion Set (TDS) problem that considers the issue of the connectivity of the structure after the solution has been deleted. In particular, the TDS problem asks for a smallest subset of vertices such that is a tree [7, 10]. We remark that the NP-completeness of this TDS problem follows from a general result of Yannakakis [12]. To state this result, recall that a property \(\pi \) is a class of graphs, and we will say that \(\pi \) is satisfied by, or is true for, a graph if . A property is said to be non-trivial if it is satisfied for at least one graph and false for at least one graph; it is interesting if the property is true for arbitrarily large graphs and is hereditary on induced subgraphs if the deletion of any node from a graph in \(\pi \) always results in a graph that is in \(\pi \). The result in question states that the problem of finding a maximum connected subgraph satisfying a property \(\pi \) is \(\mathsf {NP}\)-hard for any non-trivial and interesting property that is hereditary on induced subgraphs.

In this work, we pose a variation of FVS that is in the spirit of a combination of the variations that we have alluded to; here, however, we are looking for a connected object with additional structure. Specifically, we consider the problem of deleting to a full binary tree. We recall that a full binary tree is a tree that has exactly one vertex of degree two and no vertex of degree more than three. Consider the problem of optimally deleting to a full binary tree, posed in the language of the theorem of Yannakakis [12] stated above, which is to find a maximum connected subgraph that satisfies a certain property. Observe that the property in question could be defined as the property of not having cycles, having exactly one vertex of degree two and no vertex of degree more than three. Note that this property is not hereditary on induced subgraphs: in particular, the deletion of a leaf from a graph that has the property will lead to a violation of the property. In our first result, we explicitly establish the NP-hardness of this problem by reducing from a variant of the Independent Set problem.

In addition, we also consider the edge deletion version of the question above. Recall that for a given connected graph on vertices and edges, deleting a smallest subset of edges to obtain a tree is straightforward: it is clear that we have to remove every edge that does not belong to a spanning tree, so the size of the solution is always . In fact, this problem can be solved in polynomial time even when the edges have weights and we seek a subset of edges of smallest total weight, whose removal results in a tree. It is straightforward to see that any such solution is the complement of a maximum spanning tree and thus, can be found in polynomial time.

In a somewhat surprising twist, we show that the problem of deleting a subset of edges of minimum total weight to obtain a full binary tree is, in fact, \(\mathsf {NP}\)-complete. To establish some intuition for why this is true, we briefly sketch a simple reduction from the problem of Exact Cover by 3-Sets to the closely related problem of deleting edges to obtain a full ternary tree.

A ternary tree is a tree where every non-leaf vertex, except the root, is exactly of degree four, while the root has degree three. Let be a family of sets of size three over the universe . The goal is to find a subfamily of disjoint sets whose union is . We create a full ternary tree with leaves labeled , and set the weight of the edges of to , a quantity that we will specify later. Then, we introduce for every element in the universe a vertex that is adjacent to , if and only if . The edges between the leaves of and the vertices corresponding to the elements of have unit weights. We also set . It is easy to verify that this graph has a solution of cost if and only if the system has an exact cover, as desired.

It turns out that establishing the hardness of the problem of deleting to full binary trees is non-trivial, and this is one of our main contributions. We reduce from a fairly restrained version of the Satisfiability problem, the hardness of which is inspired by a reduction in [1] and is of independent interest. We note that we deal with the weighted versions of the problems considered, and we also fix a choice of root vertex as part of the input. Finally, we also note that both the problems we propose above are fixed-parameter tractable, when parameterized by the solution size. To this end, we describe a natural branching algorithm and remark that most preprocessing rules that work in a straightforward manner for Feedback Vertex Set fail when applied as-is to our problem. In particular, it is not trivial to delete degree-one vertices or short-circuit vertices of degree two.

We believe that the problem we propose and the study we undertake has considerable practical motivation. One of the applications of FVS and related problems is to understand noisy datasets. For example, let us say that we expect the data to have a certain structure, but errors in the measurement cause the data at hand not to have the properties expected by said structures. In this context, one approach will be to identify and eliminate the noise - for acyclic structures, that could translate identifying a FVS of small cost. Therefore, for scenarios where the data corresponds to full binary trees, for instances in the case of phylogenetic trees, the problem we present here will be a more relevant model.

2 Preliminaries

We follow standard notation and terminology from parameterized complexity [2] and graph theory [4]; we use to denote the set . We now turn to the definitions of the problems that we consider.

figure a

The problems of Full Binary Tree Deletion by Edges (FBT-DE), Complete Binary Tree Deletion (by edges or vertices) and Binary Tree Deletion (by edges or vertices) can be defined analogously. Our focus in this contribution will be mainly on FBT-DV and FBT-DE.

The Multi-Colored Independent Set problem is the following.

figure b

3 NP-hardness

In this section, we establish that the problems of deleting to full binary trees via vertices or edges are \(\mathsf {NP}\)-complete. We first describe the hardness for the vertex-deletion variant.

Theorem 1

FBT-DV is \(\mathsf {NP}\)-complete.

Proof

We reduce from Multi-Colored Independent Set [2, Corollary 13.8]. Let be an instance of MCIS where and further, let denote the partition of the vertex set . We assume, without loss of generality, that for all . Specifically, we denote the vertices of by . We are now ready to describe the reduced instance of FBT-DV, which we denote by .

To begin with, let be a complete binary tree with leaves, where a complete binary tree is a full binary tree with vertices at distance from the root for all , where the distance between the root and the leaf furthest away from the root. We denote these leaf vertices as:

where, for all and , and are siblings, and their parent is denoted by . We refer to this as the backbone, to which we will now add more vertices and edges.

For each and , we now introduce a third child of , which we denote by . We refer to the ’s as the essential vertices, while its siblings (the ’s and the ’s) are called partners. For all , we also introduce two guards, denoted by and , which are adjacent to all the essential vertices of type , that is, all for . Finally, we ensure that the graph induced on the essential vertices is a copy of , more precisely, we have:

We set . This completes the construction. We now turn to a proof of equivalence.

The Forward Direction. If is a multi-colored independent set, then consider the subset given by all the essential vertices corresponding to , along with the partner vertices for each , for which belongs to . It is easy to verify that the proposed set consists of vertices. Observe that the deletion of leaves us with a full binary tree where each now has two children - either two partner vertices (for vertices not in ) or one essential vertex along with one partner vertex (for vertices in ). Further, each pair of guards of type now has an unique parent, which is the essential vertex corresponding to the vertex given by . The essential vertices have degree exactly three because their only other neighbors in were essential vertices corresponding to neighbors in , but the presence of any such vertex in will contradict the fact that induces an independent set in . This concludes the argument in the forward direction.

The Reverse Direction. Let be a subset of such that is a full binary tree. We claim that , since the deletion of any parent of a partner vertex will result in the corresponding partner vertex becoming isolated in — which leads to a contradiction when we account for the budget constraint on . Since all the parents of partner vertices survive and have degree four in , it follows that at least one of its neighbors must belong to . In particular, we claim that for every and , . Indeed, if this is not the case, then contains the parent of , and it is easy to verify that this leads to a situation where either is disconnected or one of the guard vertices has degree two and is not the root, contradicting the assumption that is a full binary tree.

From the discussion above, it is clear that picks at least vertices of type for each , and combined with the fact that , we note that does not contain any of the guard vertices. Our next claim is that for all , contains at least one essential vertex of type . If not, then contains all the neighbors of the guards of type , which makes them isolated in –a contradiction.

For each , consider the vertex in corresponding to the essential vertex that is not chosen by (in the event that there are multiple such vertices, we pick one arbitrarily). We denote this collection of vertices by . We claim that induces an independent set in : indeed, if not, then any edge in is also present in and creates a cycle when combined with the unique path connecting its endpoints via the backbone, which is again a contradiction. This concludes the proof.   \(\square \)

We now turn our attention to the edge-deletion variant. Here, we will find it convenient to reduce from a structured version of exact satisfiability, where the occurrences of the variables are bounded in frequency and also controlled in terms of how they appear. We will turn to a formal description in a moment, noting that here our reduction is similar to the one used to show that Linear-SAT is \(\mathsf {NP}\)-complete [1].

Theorem 2

FBT-DE is \(\mathsf {NP}\)-complete.

We first describe the version of Satisfiability that we will reduce from. Our instance consists of clauses which we will typically denote as follows:

We refer to the first clauses as the core clauses, and the remaining clauses as the auxiliary clauses. The core clauses have two literals each, and also enjoy the following structure:

We refer to the ’s as the main variables and the remaining variables that appear among the core clauses as shadow variables. The shadow variables occur exactly once, and have negative polarity among the core clauses. Therefore, using to denote the set of literals occurring amongst a subset of clauses, we have:

The auxiliary clauses have the property that they only contain the shadow variables, which occur exactly once amongst them with positive polarity. Also, every auxiliary clause contains exactly four literals. Note that this also implies, by a double-counting argument, that . We say that a collection of clauses is a chain if it has all the properties described above. An instance of Linear Near-Exact Satisfiability (LNES) is the following: given a set of clauses that constitute a chain, is there an assignment \(\tau \) of truth values to the variables such that exactly one literal in every core clause and two literals in every auxiliary clause evaluate to true under \(\tau \)?

For ease of discussion, given an assignment of truth values \(\tau \) we often use the phrase “\(\tau \) satisfies a literal” to mean that the literal in question evaluates to true under \(\tau \). For instance, the question from the previous paragraph seeks an assignment \(\tau \) that satisfies exactly one literal in every core clause and two literals in every auxiliary clause. We also refer to such an assignment as a near-exact satisfying assignment. The following observation is a direct consequence of the definitions above.

Proposition 1

Let be a collection of clauses that form a chain. For any assignment of truth values \(\tau \), the main variables satisfy exactly two core clauses and the shadow variables satisfy either one core clause or one auxiliary clause.

We first establish that LNES is \(\mathsf {NP}\)-complete:

Lemma 1

Linear Near-Exact Satisfiability is \(\mathsf {NP}\)-complete.

Proof

We reduce from (2/2/4)-SAT, which is the variant of Satisfiability where every clause has four literals and every literal occurs exactly twice — in other words, every variable occurs in exactly two clauses with positive polarity and in exactly two clauses with negative polarity. The question is, if there exists an assignment \(\tau \) of truth values to the variables under which exactly two literals in every clause evaluate to true. This problem is known to be \(\mathsf {NP}\)-complete [11].

Let be a (2/2/4)-SAT instance over the variables and clauses . For every variable , we introduce four new variables: and . We replace the two positive occurrences of with and , and the two negated occurrences of with and . We abuse notation and continue to use to denote the modified clauses. Also, introduce the clauses:  for all . Note that these collection of clauses form a chain, as required. We use to refer to this formula. We now turn to the argument for equivalence.

In the forward direction, let \(\tau \) be an assignment that sets exactly two literals of every clause in to true. Consider the assignment \(\zeta \) given by:

for all . It is straightforward to verify that \(\zeta \) satisfies exactly one literal in every core clause and exactly two literals in every auxiliary clause.

In the reverse direction, let \(\zeta \) be an assignment for the variables of that satisfies exactly one literal in every core clause and exactly two literals in every auxiliary clause. Define \(\tau \) as the restriction of \(\zeta \) on the main variables. Let be a clause in . To see that \(\tau \) satisfies exactly two literals of , note that the following:

is forced by the requirement that \(\zeta \) must satisfy exactly one literal in each core clause. Therefore, if \(\tau \) satisfies more or less than two literals of any clause , then that behavior will be reflected exactly in the auxiliary clause corresponding to , which would contradict the assumed behavior of \(\zeta \). We make this explicit with an example for the sake of exposition. Let from be the clause , and let the clause constructed in be . Suppose and . Then we have and , while . This demonstrates that \(\zeta \) satisfies three literals in the auxiliary clause corresponding to , in one-to-one correspondence with the literals that are satisfied by \(\tau \). This completes our argument. \(\square \)

We now turn to a proof of Theorem 2. The overall approach is the following. We will introduce a complete binary tree whose leaves will be used to represent variables using variable gadgets which will have obstructions that can be removed in a fixed number of ways, each of which corresponds to a “signal” for whether the variable is to be set to true or false. We will then introduce vertices corresponding to clauses that will be attached to the variable gadgets in such a way that they can only be “absorbed” into the rest of the structure precisely when exactly two of its literals are receiving a signal indicating that they are being satisfied.

Fig. 1.
figure 1

The gadget corresponding to the shadow variables.

The Shadow Variables. An instance of the gadget that we construct for the shadow variables is depicted in Fig. 1. We remark here that the notation used for the vertices here is to enable our discussion of how the gadget works and is not to be confused with the notation already used to denote the variables and clauses of the LNES instance.

The vertices and are called the anchors of the gadget, while the vertices and are called the drivers of the gadget. This is because, as we will see, the behavior of the edges incident to these vertices determines the fate of the variable—in terms of whether it “accepts” the vertex corresponding to the core clause or the auxiliary clause to which it belongs. We refer to the vertex in the gadget as the negative point of entry, while the vertex is called the positive point of entry.

We refer to the edges incident on the vertices and as active edges and the remaining edges (i.e, and ) as passive edges. We say that a solution is nice if it does not contain any passive edges. We also say that an instance of FBT-DE contains a clean copy of the gadget if appears in as an induced subgraph, and further, while , and none of the vertices of are chosen to be the target root vertex. We make the following observation about the behavior of this gadget.

Claim

Let be a vertex gadget for a shadow variable as defined above. Let be an instance of FBT-DE that contains a clean copy of . Then, any nice solution contains exactly four edges among the edges of .

Proof

Let denote the set of active edges in . Since , we claim that any solution must delete exactly four edges from : in particular, contains exactly one of the edges incident to each of the vertices. Indeed, if deletes fewer edges than suggested then contains a degree two vertex different from the root, which is a contradiction. On the other hand, if contains more than four edges from , then at least one of these four vertices is isolated in , which contradicts our assumption that is connected. This clearly implies the claim, since all edges not considered are passive and a nice solution does not contain these edges by definition. \(\square \)

We now analyze the possible behaviors of a solution localized to the gadget in greater detail. We refer the reader to the full version of this paper for the figures associated with this explanation.

The possibilities and do not arise because employing these deletions causes the entry point vertices to have degree four or more in . Further, since the solution does not involve any of the passive edges, then we also rule out the following possibilities, since they all lead to a situation where the degree of is four or more in :

figure c

Recalling that when makes a clean appearance in , we also safely rule out the possibilities: , , , . Note that they result in a situation where the degree of is exactly two in — since is not the target root vertex, this is a contradiction as well.

Observe that, given a nice solution , in all the valid scenarios possible, either and , or and . We say that a shadow variable gadget has a negative signal in solutions where . Similarly, we say that the gadget has a positive signal in the situations where . We refer to the edges as the negative witness and the edges as the positive witness for the shadow variable gadgets. This concludes the description of the gadget meant for shadow variables.

The Main Variables. We now turn our attention to the gadget corresponding to the main variables. Here, we find it convenient to incorporate vertices representing the core clauses that the main variables belong to also as a part of the gadget. The construction of the gadget is depicted in Fig. 2. As before, the notation used for the vertices here is to enable our discussion of how the gadget works. With the exception of , which indeed are meaningfully associated with the analogously named core clauses, the notation is not to be confused with the notation already used to denote the variables and clauses of the LNES instance.

The edges and are the passive edges of this gadget, while the remaining edges are active. The vertex is called the anchor of this gadget. As before, a solution is nice if it does not contain any of the passive edges. We say that an instance of FBT-DE has a clean copy of if appears in as an induced subgraph and, further, , , , , and none of the vertices of are chosen to be the target root vertex.

We reflect briefly on the nature of a nice solution in instances that have a clean copy of a main variable gadget . First, since and is not the target root vertex, we note that exactly one of or must belong to . Suppose . The removal of makes a vertex of degree two, and since is a passive edge and is nice, is forced. Along similar lines, we have that . Now, we argue that . Indeed, if , then has degree two from the deletions so far, and is a degree-one vertex with as its sole neighbor. Recalling that is not the target root vertex, we are now forced to delete exactly one of the endpoints incident on , but both possibilities lead us to a disconnected graph. Therefore, . It is easy to see that this forces and further, . A symmetric line of reasoning shows that if , then and are all in as well. We refer the reader to the full version of this paper for the figures associated with this explanation. These two scenarios motivate the definitions of positive and negative signals that we now make explicit.

With respect to a nice solution , we say that a main variable gadget has a negative signal if . Likewise, we say that the gadget has a positive signal if . We will also refer to the set of edges as the positive witness of this gadget, and the negative witness is defined analogously.

Fig. 2.
figure 2

The gadget corresponding to the main variables.

We are now ready to discuss the overall construction. Let be an instance of LNES with clauses given by:

where the main variable common to and is denoted by and the auxiliary variables in these two clauses are denoted by and , while the auxiliary variables in the clauses and are denoted and . We denote by the LNES instance that we will now construct based on .

First, we construct the smallest complete binary tree with at least leaves and let be the root of this tree. We refer to this tree as the backbone of . Let the first leaves of this tree be denoted by . For each main variable , let be the corresponding gadget. We identify the anchor of with . For each shadow variable, we introduce a gadget corresponding to it, and identify the first anchor vertex in the gadget with and the second anchor vertex with . For every core clause , we add an edge between the vertex in the gadget corresponding to and the negative entry point in the gadget for the shadow variable contained in the clause . We also do this in an analogous fashion for the core clauses , and .

Finally, for each auxiliary clause , we introduce two vertices and . We connect these vertices with the positive entry point into all gadgets corresponding to the shadow variables that belong to the clause . Note that each of these vertices have degree four. This completes the description of the construction of the graph . We note that all the gadgets present in are clean by construction. We now define the weight of every edge in the backbone and every passive edge in the gadgets as , while the weights of the remaining edges are set to be one. Finally, we set and let —this concludes the description of the instance . We defer the argument of the equivalence of the instances to the full version of this paper.

4 FPT Algorithms

We observe that the problems considered here, namely FBT-DV and FBT-DE are fixed-parameter tractable by the standard parameter. We briefly describe a natural branching algorithm for FBT-DV while noting that an analogous argument works for FBT-DE.

Let be an instance of FBT-DV. First, consider a vertex , different from the designated root, that has four or more neighbors. Choose any four neighbors of , say , and branch on the set and we adjust the remaining budget by subtracting the respective weights of these vertices. The exhaustiveness of this branching rule follows immediately from the definition of a full binary tree. Along similar lines, we can also branch on the designated root along with three of its neighbors at a time, if the root has degree three or more, and also the closed neighborhoods of vertices of degree exactly two. We abort any branches where we have exhausted the budget.

We say that a graph with a designated root vertex is nice if it is connected, its maximum degree is three and the root is only vertex of degree two. Note that the depth of the branching thus far is bounded by , and we branch appropriately on disconnected instances, noting that only one of the components can “survive”. Also, note that all the remaining instances are nice. If any of the remaining graphs are also acyclic, then we are already done.

If not, then we branch on these graphs further as follows. We pre-process vertices of degree three with a degree one neighbor by employing an appropriate short-circuiting rule. We can then start a breadth-first search (BFS) from the root vertex, noting that the depth of the BFS tree is at most , since every internal vertex in this tree has at least two children. Therefore, we may infer that there exists a cycle of length , and we branch on all the vertices of this cycle other than the root vertex. If the deletion of a vertex on the cycle leads to a disconnected graph, then we abort the corresponding branch. Similarly, if the deletion of a vertex on the cycle creates vertices of degree two in the resulting graph, then we branch on the closed neighborhood of these degree two vertices and discard any disconnected graphs until we arrive at a nice instance, at which point we recurse in the fashion described here. The correctness of the overall algorithm follows from the exhaustiveness of the branching rules. The fact that the running time is FPT follows by a well-known argument [9].

Theorem 3

The problems FBT-DV and FBT-DE are \(\mathsf {FPT}\) with respect to solution size.