1 Introduction

In clustering problems, there is a set of elements and a value of similarity between each pair of elements in it. The goal is to partition these elements into distinct groups based on their similarities. This problem has wide-ranging applications in various fields, such as machine learning [1] and computational biology [2]. One effective approach to model this problem is by constructing a similarity graph, where the elements form the vertices and edges exist only if the similarity between two vertices exceeds a specified threshold. Therefore, the clustering problem transforms into partitioning the similarity graph into parts where each part induces a complete graph. Occasionally, the given similarity data may deviate slightly from the actual clustering model due to noise or other factors, resulting in improper no-instance. For such cases, modifications are permitted within the clustering problem, leading to cluster graph modification problems. Depending on the type of modifications allowed, different problems can be defined. If we can only delete edges to create a cluster, i.e., a graph where each connected component forms a complete graph, the problem is referred to as Cluster Edge Deletion [7, 8, 10, 11, 15]. If both deleting and adding edges are allowed to form a cluster, the problem is known as Cluster Editing [3, 4, 7, 10, 11]. When solely the deletion of vertices is allowed, the problem becomes Cluster Vertex Deletion (CVD) [5, 6, 10, 12, 14, 16]. These problems have been extensively examined in the literature. In this paper, we focus on studying parameterized and exact algorithms for CVD.

Table 1. Parameterized algorithms for Cluster Vertex Deletion

In the Cluster Vertex Deletion (CVD) problem, we are given a graph \(G=(V,E)\) and an integer k. The goal is to determine whether there is a set of vertices S of size at most k such that after deleting S from G the remaining graph is a cluster. One important property is that a graph is a cluster if and only if there is no induced 3-path (i.e., a path with three vertices such that there is no edge between the first and the last vertices) in the graph. In other words, induced 3-path is the forbidden graph of a cluster. By using this property and applying the general branching algorithm of Cai [6], we can get an \(O^*(3^k)\)-time branching algorithm for this problem. Later, Gramm et al. [10] improved this result to \(O^*(2.26^k)\) by using automated generation techniques for search trees. Since hitting all induced 3-paths is a special case of the 3-hitting problem, the \(O^*(2.076^k)\)-time algorithm for the 3-hitting problem [16] directly implies the same complexity for CVD. In 2010, Hüffer et al. [12] improved the runtime to \(O^*(2^k)\) using the iterative compression method. However, the iterative compression method naturally provides a lower bound of \(2^k\) in complexity, limiting further improvement. The next breakthrough came in 2016 when Boral et al. [5] proposed a novel idea of selecting a vertex v and considering vertex sets that hit all induced 3-paths containing v. They constructed an auxiliary graph and used a Python script, similar to the one in [10], to automate the generation of search trees, resulting in an improved \(O^*(1.911^k)\)-time algorithm. Tsur [14] further refined the algorithm, achieving the current best runtime of \(O^*(1.811^k)\).

In this paper, we further improve the running time bound to \(O^*(1.7549^k)\) for CVD. To get this result, we first show that the problem can be solved in \(O^*(1.7485^k)\) when the input graph has maximum degree at most 4. After that, we adopt the technique of the auxiliary graph \(H^v\) and a method to automatically generate branching vectors used in [5, 14]. Since the degree of the vertex considered is at least 5, we have more vertices in the local structure and the previous bottlenecks can be avoided. The history of parameterized algorithms for CVD was listed in Table 1. We also remark that by using the general approach to obtain exact algorithms via fast parameterized algorithms in [9], we can also solve CVD in \(O(1.4302^n)\) time, improving all previous results.

One of the most important contributions in this paper is to obtain the improved result for the problem in degree-4 graphs. For vertices of degree \(\ge 5\), we just modify previous algorithms to get our desired result. Due to the space limitation, we mainly introduce our algorithm for degree-4 graphs in the main body and the part for high-degree vertices and general graphs can be found in the full version of this paper.

2 Notations

Let \(G=(V,E)\) denote an undirected graph. For a vertex \(v \in V\), we use \(N_G(v)\) to represent the set of neighbors of v in graph G, i.e., vertices adjacent to v in G. Define \(deg_G(v)=|N_G(v)|\) as the degree of the vertex v. For a subset \(V'\) of vertices, let \(N_G(V')=(\cup _{v \in V'} N_G(v)) \setminus V'\) and \(N_G[V']=N_G(V')\cup V'\). We also use \(N_G^2(v)=N_G(N_G(v))\setminus \{v\}\) to represent the vertices with distance exactly 2 to vertex v. In the above notations, we may omit the subscript G if the graph G is clear from the context. A singleton \(\{v\}\) may be simply written as v. For a graph \(G'\), we use \(V(G')\) and \(E(G')\) to denote the set of vertices and the set of edges in \(G'\), respectively. For \(V' \subseteq V\) and \(E'\subseteq E\), a graph \(G'=(V',E')\) is called an induced subgraph of G if any edge (xy) in E is in \(E'\) iff \(x\in V'\) and \(y \in V'\). Let \(G[V']\) denote the induced subgraph of G with vertices \(V'\) and \(G-V'\) denote \(G[V\setminus V']\).

A vertex set S of a graph G is called a vertex cover if for any edge in G there is at least one endpoint in S. A 3-path is a path of 3 vertices. A 3-path (uvw) is called an induced 3-path if there is no edge between u and w. A complete graph is also called a clique. A graph is called a cluster if each connected component is a clique. The Cluster Vertex Deletion problem (CVD) is to check whether we can delete at most k vertices from a graph to make the remaining graph a cluster. An instance of CVD is denoted by (Gk). A subset S of vertices is called a CVD-set if \(G-S\) is a cluster. A CVD-set is also called a solution set to CVD and a minimum CVD-set is called an optimal solution set to CVD. The following forbidden-graph property has been frequently used in algorithms for CVD [5, 6, 10, 12, 14, 16].

Lemma 1

A graph is a cluster iff the graph contains no induced 3-path.

3 Properties for Algorithm of Degree-4 Graph

3.1 The CVD-Dominating Family

Definition 1

Let \(V'\) be a subset of vertices of the graph G. Define \(\varDelta _G(V')= N_G(V\setminus V')\cap V'\), i.e., the set of vertices in \(V'\) adjacent to some vertices not in \(V'\). Let T be a CVD-set of \(G[V']\). Define \(\varDelta _{G}(V',T)\) as the set of vertices of cliques in \(G[V'-T]\) containing some vertices in \(\varDelta _G(V')\). See Fig. 1 for an illustration of these concepts.

Fig. 1.
figure 1

An example for Definition 1.

In Fig. 1, vertices in \(\varDelta _G(V')\) are bold. A CVD-set T of \(G[V^{\prime }]\) is \(\{c\}\). After removing T from \(V^{\prime }\), there are 3 cliques and the vertices in \(\varDelta _{G}(V',T)\) are all marked as gray vertices.

Definition 2

Let \(V'\) be a subset of vertices of the graph G. For two CVD-sets X and Y of induced graph \(G[V']\), we say that Y dominates X under \(G[V']\) if

$$|\varDelta _G(V',Y) \setminus \varDelta _G(V',X)| \le |X| - |Y|.$$

Lemma 2

Let \(V'\subseteq V\) be a vertex subset of G, and X and Y be two CVD-sets of \(G[V']\) such that Y dominates X under \(G[V']\). If there is a CVD-set S of G containing X then there is a CVD-set \(S'\) of G containing Y such that \(|S'|\le |S|\).

Proof

Let S be a CVD-set of G containing X. Let \(Y'=Y \cup (\varDelta _G(V',Y) \setminus \varDelta _G(V',X))\). We show that \(S' = (S \setminus X) \cup Y' \) is a CVD-set of G containing Y such that \(|S'|\le |S|\).

Since \(Y \subseteq Y' \subseteq S'\), we know that \(S'\) contains Y. Next, we prove \(G-S'\) is a cluster. Assume to the contrary that \(G-S'\) is not a cluster. By Lemma 1, we know there is an induced 3-path P in graph \(G-S'\).

Since Y is a CVD-set of \(G[V']\) and \(Y\subseteq S'\), we know that \(G[V'\setminus S']\) is a cluster. Then the induced 3-path P must contain at least one vertex in \(V\setminus V'\). Thus, all the three vertices of P must be in \((V\setminus (V'\cup S')) \cup \varDelta _G(V',Y')\), which is a superset of \((V\setminus (V'\cup S')) \cup \varDelta _G(V',X)\), by the definition of \(Y'\). This implies that the induced 3-path P also exists in \(G-S\), a contradiction. So \(G-S'\) is a cluster. Last, we consider the size of \(S'\). We can see that \(|S'| = |(S \setminus X) \cup Y'| = |(S \setminus X) \cup Y \cup (\varDelta _G(V',Y) \setminus \varDelta _G(V',X))| \le |S| - |X| + |Y| + |\varDelta _G(V',Y) \setminus \varDelta _G(V',X)| \le |S|\). So \(S'\) is a satisfied set.    \(\square \)

Definition 3

A CVD-dominating family \(\mathcal {F}_{G}(V')\) is a set of CVD-sets of \(G[V']\) such that for every CVD-set X of \(G[V']\), there is a CVD-set in \(\mathcal {F}_{G}(V')\) dominating X under \(G[V^{\prime }]\).

Based on Definition 3, to search for a solution, instead of listing all CVD-sets of \(V'\), we can only consider a CVD-dominating family.

Definition 4

A vertex subset \(V'\subseteq V\) is called an important set if there is a minimum CVD-set C of induced subgraph \(G[V']\) such that \(\varDelta _G(V')\subseteq C\). The set C is called an important cut.

Corollary 1 (of Lemma 2)

Let \(V'\subseteq V\) be an important set with an important cut C. There is a minimum CVD-set of G containing C.

3.2 Core Branching Processing

We introduce a technique to design branching rules. The idea is to relax the concept of important sets. For an instance (Gk), we consider a set of vertices C that induces a clique. Let S be a CVD-set of (Gk). Since \(G - S\) is a cluster, we know that \(C \cap (V - S)\) is either empty or forms a clique. When designing a branching rule, we can only consider all possible cliques in \(G-S\) intersecting with C. Furthermore, we can also reduce some dominated cases. This processing to generate branching rules is called core branching processing and the selected set C is called the core clique.

Definition 5

Given an reduced instance (Gk), the core branching processing will produce a branching rule by the following steps:

1. Choose an induced clique of G as the core clique C;

2. Let \(\mathcal {T}\) be the set of cliques intersected with C and \(\mathcal {C}=\{C\}\cup \{N(T)| T\in \mathcal {T}\}\);

3. If there exist \(T_1\) and \(T_2 \in \mathcal {T}\) such that \( |N(T_1)| \ge |N(T_2)|\) and \(N[T_1] \subseteq N[T_2]\), remove \(N(T_1)\) from \(\mathcal {C}\);

4. If there exist \(T_1 \in \mathcal {T}\) such that \( |N(T_1)| \ge |C|\) and \(N[T_1] \subseteq C\)(resp., \( |C| \ge |N(T_1)|\) and \(C \subseteq N[T_1]\)), remove \(N(T_1)\)(resp., C) from \(\mathcal {C}\).

The correctness of Step 3 of the core branching processing is based on the fact that if there is a CVD-set of G containing \(N(T_1)\), then there must be a CVD-set containing \(N(T_2)\). Here we use the concept of dominating. Let \(V'=N[T_2]\). We know that \(\varDelta _G(V') \subseteq N(T_2)\). Assume that S is a CVD-set of G containing \(N(T_1)\). Let \(X = S \cap V'\). Then S is also a CVD-set of \(V'\). Since \(N[T_1] \subseteq N[T_2]\), we know that \(|X| \ge |N(T_1)| \ge |N(T_2)|\). By Definition 2 and Lemma 2, we know that \(N(T_2)\) dominates X and there must be a CVD-set containing \(N(T_2)\).

4 Reduction Rules

We first introduce some reduction rules that will be applied in our algorithms to reduce the instance in polynomial time.

R-Rule 1

If \(k<0\), return ‘no’ to indicate that the instance is a no-instance.

R-Rule 2

If the graph is an empty graph, return ‘yes’ to indicate that the instance is a yes-instance.

R-Rule 3

If there is a connected component of G that is a clique, delete it from the graph.

R-Rule 4

If there is a connected component C of G and a vertex \(u \in C\) such that \(G[C] - u\) is a cluster, delete u from the graph and decrease k by 1.

R-Rule 5

If there is an important set \(V'\subseteq V\) with an important cut C such that \(|C|\le 10\), then return the instance \((G-C, k-|C|)\).

In R-Rule 5, we require that \(|C|\le 10\). So the important sets \(V'\) and important cuts C can be found in polynomial time if they exist. We mention a special case of the important sets that will be reduced by this rule. Let u be a degree-2 vertex with one degree-1 neighbor v and another neighbor w. Then \(\{u,v,w\}\) is an important set satisfying the condition in R-Rule 5. We may delete w directly from the graph.

5 The Algorithm for Degree-4 Graphs

First of all, we present our algorithm for the case that the input graph is a graph with maximum degree at most 4. In this section, we mainly use the concept of the CVD-dominating family and the method of core branching processing. The algorithm contains the following seven main steps. When introduce one step, we will assume all previous steps can not be applied.

  1. 1.

    Iteratively apply the above five reduction rules to get a reduced instance.

  2. 2.

    If there is a degree-1 vertex v, execute the branching operations in Sect. 5.1.

  3. 3.

    If there is a 4-clique, execute the branching operations in Sect. 5.2.

  4. 4.

    If there is a degree-4 vertex v, execute the branching operations in Sect. 5.3.

  5. 5.

    If there are two triangles sharing one edge, execute the branching operations in Sect. 5.4.

  6. 6.

    If there is a triangle, execute the branching operations in Sect. 5.5.

  7. 7.

    Reduce the problem to the 3-path vertex cover problem.

5.1 Degree-1 Vertices

Assume there is a degree-1 vertex u. We let v denote the unique neighbor of u. We know that v is a vertex of degree at least three otherwise if v is of degree 2 then R-Rule 5 would be applied on the important set N[v] and if v is of degree 1 then R-Rule 3 would be applied. We pick \(\{u\}\) as the core clique and apply the core branching processing. The set of intersecting cliques is \(\mathcal {T}=\{\emptyset , \{u\}, \{u,v\}\}\) and thus we will get three initial branches: (1) deleting \(\{u\}\) and decreasing k by 1; (2) deleting \(\{v\}\) and decreasing k by 1; (3) deleting N(uv) and decreasing k by \(|N(u,v)|\ge 2\). However, the first branch is dominated by the second branch and then the first branch will be removed according to the last step of the core branching processing. Thus, we get a branching vector at least (1, 2).

Fig. 2.
figure 2

The cases with 4-cliques

5.2 4-Cliques

Assume there is a 4-clique \(\{v,u,w,t\}\). We show all possible configurations of the clique in Fig. 2, where dark vertices mean all neighbors of them are already drawn on the figure and a dotted edge incident on a vertex means there may be some further edges incident on this vertex. Note that a vertex has degree at most 4. Each vertex in \(\{v,u,w,t\}\) can be adjacent to at most one vertex not in this set. Thus \(|N(v,u,w,t)|\le 4\). Let \(N_c=|N(v,u,w,t)|\). The cases in Fig. 2 are listed according to the value of \(N_c\). In Case 1, \(N_c=0\); In Cases 2–5, \(N_c=1\); In Cases 6–9, \(N_c=2\); In Cases 10–11, \(N_c=3\); In Case 12, \(N_c=4\). For Cases 1 and 5, the graph is a component of clique and it would be reduced by R-Rule 3. For Cases 2–4, \(\{v,u,w,t, a\}\) is an important set with the important cut \(\{a\}\) and it would be reduced by R-Rule 5. For Cases 6, 7 and 9, \(\{v,u,w,t, a, b\}\) is an important set with the important cut \(\{a,b\}\) and it would be reduced by R-Rule 5. For Case 8, we consider the subgraph \(G'\) induced by \(\{v,u,w,t, a, b\}\). All CVD-sets in \(G'\) of size at least 2 are dominated by \(\{a,b\}\) and there is only one CVD-set \(\{u\}\) of size less than 2 in \(G'\). Thus, we branch into two branches: deleting \(\{a,b\}\) and decreasing k by 2; deleting \(\{u\}\) and decreasing k by 1. We get a branching vector (2, 1). For Case 10, \(\{v,u,w,t, a, b,c\}\) is an important set with the important cut \(\{a,b,c\}\) and it would be reduced by R-Rule 5. For Case 11, we consider the subgraph \(G'\) induced by \(\{v,u,w,t, a, b,c\}\). All CVD-sets in \(G'\) of size at least 3 are dominated by \(\{a,b,c\}\) and there is only one CVD-set \(\{u,w\}\) of size less than 3 in \(G'\). Thus, we branch into two branches: deleting \(\{a,b,c\}\) and decreasing k by 3; deleting \(\{u,w\}\) and decreasing k by 2. We get a branching vector (3, 2). For Case 12, we first consider the subgraph \(G'\) induced by \(\{v,u,w,t, a, b,c,d\}\). All CVD-sets in \(G'\) of size at least 4 are dominated by \(\{a,b,c,d\}\). Next, we assume that S is a CVD-set such that \(S'=S\cap \{v,u,w,t, a, b,c,d\}\) contains at most 3 vertices. Then, there are only four possibilities: \(S'=\{u,v,w\}, \{u,v,t\}, \{v,t,w\}\) or \(\{t,w,u\}\). We consider \(S'= \{u,v,w\}\). For this case, both t and d are left in a component of a clique of size 2 and thus N(td) is in S. After Sect. 4.1, there is no degree-1 vertex, and then d is adjacent to a vertex \(d'\ne t\). Furthermore, \(d'\not \in \{u,v,w\}\) since the graph is a degree-4 graph. For this case, \(\{u,v,w,d'\}\subseteq N(t,d)\) and at least \(|N(t,d)|\ge 4\) vertices can be deleted. We can decrease k by at least 4. By the same reason, for the cases \(S'=\{u,v,t\}, \{v,t,w\}\) and \(\{t,w,u\}\), we can also decrease k by at least 4. Thus, we can branch into five branches and in each branch the parameter k decreases by at least 4. We get a branching vector (4, 4, 4, 4, 4).

Among all the cases, the worst branching vector is (2, 1), the branching factor of which is 1.6182.

Fig. 3.
figure 3

The cases with 4-degree vertices

5.3 Degree-4 Vertices

Assume there is a degree-4 vertex u with four neighbors \(\{a,b,c,d\}\). Let \(E_l\) denote the set of edges with both endpoints in \(\{a,b,c,d\}\) and l be the size of \(E_l\). Note that the edges in \(E_l\) will not form a triangle, since there is no cliques of size 4 after Sect. 4.2. We consider different cases according to the value of l. All the cases are shown in Fig. 3.

Case 1. \(l=4\). Since there is no triangle, the four edges in \(E_l\) will form a cycle of size 4. Without loss of generality, we assume that \(E_l = \{(a,b) , (b,c) , (c,d) , (d,a)\}\). Let X be the set with the least neighbors among \(\{u,a,b\},\{u,b,c\},\{u,c,d\}\) and \(\{u,d,a\}\). If \(|N(X)|\le 3\), then N[X] is an important set with the important cut being N(X), because any CVD-set of the graph induced by N[X] will contain at least |N(X)| vertices. For this case, it will be reduced by R-Rule 5. Next, we assume that \(|N(X)|\ge 4\). Let \(a', b', c'\) and \(d'\) be the four neighbor of abc and d, respectively, where \(a'\ne b'\), \(b'\ne c'\), \(c'\ne d'\) and \(d' \ne a'\).

We choose \(\{u\}\) as the core clique and apply the core branching processing. The intersecting cliques are \(\emptyset \), \(\{u\}\), \(\{u,a\}\), \(\{u,b\}\), \(\{u,c\}\), \(\{u,d\}\), \(\{u,a,b\}\), \(\{u,b,c\}\), \(\{u,c,d\}\), and \(\{u,d,a\}\). We get initial branches of deleting \(\{u\}\), N(u), N(ua), N(ub), N(uc), N(ud), N(uab), N(ubc), N(ucd), and N(uda). The branches of deleting N(u), N(ua) and N(ub) are dominated by the branch of deleting N(uab). The branches of deleting N(uc) and N(ud) are dominated by the branch of deleting N(ucd). We can branch by deleting \(\{u\}\), N(uab), N(ubc), N(ucd) or N(uda), where |N(uab)|, |N(ubc)|, |N(ucd)| and \(|N(u,d,a)|=4\) since \(|N(X)|\ge 4\) and the maximum degree of the graph is 4. The branching vector is (1,4,4,4,4) with a branching factor 1.7485.

Case 2. \(l=3\). There are three edges in \(E_l\) and they form a path of length 3. Without loss of generality, we assume that \(E_l = \{(a,b) , (b,c) , (c,d)\}\). If \(deg(b)=deg(c)=3\), then \(\{u,a,b,c,d\}\) is an important set with an important cut \(\{a,d\}\), which would be reduced by R-Rule 5. Next, we assume that \(|N(u,b,c)|\ge 3\).

Case 2.1. Only one of \(\{b,c\}\) is of degree 4. We assume without loss of generality that \(deg(b) = 4\), \(deg(c) = 3\) and e is the fourth neighbor of b. We further distinguish two cases. If a is not adjacent to e or both of a and d are adjacent to e, then \(\{u,a,b,c,d,e\}\) is an important set with an important cut \(\{a,d,e\}\), which would be reduced by R-Rule 5. Next, we assume that a and e are adjacent but d and e are not adjacent. If d is of degree 2, then \(\{u,a,b,c,d\}\) is an important set with an important cut \(\{a,b\}\), which would be reduced by R-Rule 5. Thus, we assume that d is adjacent to a vertex f different from \(\{e,u,c\}\).

Let \(V'=\{u,a,b,c,d,e,f\}\) and \(G'=G[V']\). In \(G'\), any CVD-set of size at least three is dominated by \(\{a,d,e\}\) or \(\{a,b,f\}\). There is only one CVD-set of size less than three \(\{u,c\}\). We can branch by either deleting \(\{a,d,e\}\) or \(\{a, b, f\}\) or \(\{u,c\}\), and we get a branching vector (3, 3, 2).

Case 2.2. Both of \(\{b,c\}\) are degree-4 vertices. Let e be the fourth neighbor of b and f be the fourth neighbor of d. We know that \(e,f \not \in \{u,a,b,c,d\}\). First we show that \(e \ne f\). If \(e = f\), then e is not adjacent to either a or d. Otherwise, this case will be solved in Case 1. Thus, there is an important set \(V'=\{u,a,b,c,d,e\}\) with an important cut \(\{a,e,d\}\), which should be reduced by R-Rule 5. Next, we assume that \(e\ne f\).

If a is not adjacent to a vertex different from ub and e, then there is an important set \(\{u,a,b,c,d,e\}\) with an important cut \(\{c,d,e\}\). If d is not adjacent to a vertex different from uc and f, then there is an important set \(\{u,a,b,c,d,f\}\) with an important cut \(\{a,b,f\}\). Thus, we can assume that \(N(u,a,b), N(u,c,d)\ge 4\).

We choose \(\{u\}\) as the core clique and apply the core branching processing. The intersecting cliques are \(\emptyset \), \(\{u\}\), \(\{u,a\}\), \(\{u,b\}\), \(\{u,c\}\), \(\{u,d\}\), \(\{u,a,b\}\), \(\{u,b,c\}\), \(\{u,c,d\}\), and \(\{u,d,a\}\). The branches of deleting N(u), N(ub) and N(uc) are dominated by the branch of deleting N(ubc). The branch of deleting N(ua) is dominated by the branch of deleting N(uab). The branch of deleting N(ud) is dominated by the branch of deleting N(ucd). We can branch by deleting \(\{u\}\), N(uab), N(ubc) or N(ucd), where \(|N(u,a,b)|,|N(u,b,c)|,|N(u,c,d)|\ge 4\). The branching vector is (1, 4, 4, 4) with a branching factor 1.6851.

Case 3. \(l = 3\) and the three edges in \(E_l\) have a common endpoint. Without loss of generality, we assume that \(E_l = \{(a,b) , (b,c) , (b,d)\}\). Let \(X= \{u,a,b\},\{u,b,c\}\), or \(\{u,b,d\}\). If \(|N(X)|\le 2\), then N[X] is an important set with the important cut being N(X). It will be reduced by R-Rule 5. Next, we assume that \(|N(X)|\ge 3\). Let \(a', c'\) and \(d'\) be the three neighbors of ac and d, respectively.

We choose \(\{u\}\) as the core clique and apply the core branching processing. The intersecting cliques are \(\emptyset \), \(\{u\}\), \(\{u,a\}\), \(\{u,b\}\), \(\{u,c\}\), \(\{u,d\}\), \(\{u,a,b\}\), \(\{u,b,c\}\), and \(\{u,b,d\}\). We get initial branches of deleting \(\{u\}\), N(u), N(ua), N(ub), N(uc), N(ud), N(uab), N(ubc) and N(ubd). The branches of deleting N(u) and N(ua) are dominated by the branch of deleting N(uab). The branches of deleting N(uc) and N(ud) are dominated by the branches of deleting N(ubc) and N(ubd), respectively. The branch of deleteing N(ub) will be dominated by the branching deleting X if \(|N(X)| \le 3\). Notice that u and b are twins of each other and by Lemma 9 in [15], we can delete b at the same time if we delete u. We can branch by deleting \(\{u,b\}\), N(ub), N(uab), N(ubc) or N(ubd), where |N(uab)|, |N(ubc)| and \(|N(u,b,d)|=4\) or deleting \(\{u,b\}\), N(uab), N(ubc) or N(ubd), where |N(uab)|, |N(ubc)| or \(|N(u,b,d)| = 3\). The branching vector is (2,3,4,4,4) or (2,3,3,3) with a branching factor 1.6472 or 1.6717.

Case 4. \(l = 2\) and the two edges in \(E_l\) have a common endpoint. Without loss of generality, we assume that \(E_l = \{(a,b) , (b,c)\}\). After Sect. 5.1, there is no degree-1 vertex. We assume that d is adjacent to a vertex \(e\ne u\).

Case 4.1. \(deg(b) = 3\). For this case, neither a or c can be a degree-2 vertex, otherwise there would be an important set \(\{u, a,b,c,d\}\) with an important cut \(\{c,d\}\) or \(\{a,d\}\). Thus, we have \(|N(a,b,u)|\ge 3\) and \(|N(c,d,u)|\ge 3\). We choose \(\{b\}\) as the core clique and apply the core branching processing. The intersecting cliques are \(\emptyset \), \(\{b\}\), \(\{b,a\}\), \(\{b,u\}\), \(\{b,c\}\), \(\{b,a,u\}\) and \(\{b,c,u\}\). The branch of deleting N(b) and deleting N(ba) is dominated by the branch of deleting N(bau). The branch of deleting N(bc) is dominated by the branch of deleting N(bcu). After ignoring the branching of deleting N(b), N(ba) and N(bc), we may be able to get a branching vector (1, 3, 3, 3). However, it is not good enough. We further analyze several cases. If \(deg(a) = 3\) (resp., \(deg(c) = 3\)), then the branching of deleting N(bu) is dominated by the branch of deleting N(bua) (resp., N(buc)). For this case, we can branch with (1, 3, 3) at least, the branching factor of which is 1.6957. Otherwise, \(deg(a)\ge 4\) and \(deg(c) \ge 4\). The branching vector is at least (1, 3, 4, 4) with a branching factor 1.7254.

Case 4.2. \(deg(b) = 4\). Let f be the fourth neighbor of b. We choose \(\{u\}\) as the core clique and apply the core branching processing. The intersecting cliques are \(\emptyset \), \(\{u\}\), \(\{u,a\}\), \(\{u,b\}\), \(\{u,c\}\), \(\{u,d\}\), \(\{u,a,b\}\), and \(\{u,b,c\}\). The branch of deleting N(u) is dominated by the branch of deleting N(ub). The branch of deleting N(ua) is dominated by the branch of deleting N(uab). The branch of deleting N(uc) is dominated by the branch of deleting N(ubc). We only need to consider five branches of deleting \(\{u\}\), N(ub), N(ud), N(uab), and N(ubc). We further consider several subcases. If \(deg(a)=deg(c)=4\), we have \(|N(a,u,b)|=|N(b,u,c)|=5\) because a and b (resp., b and c) have no common neighbor other than u after Case 1. Then, we will get a branching vector (1, 4, 4, 5, 5) with a branching factor 1.6770. Next, we assume that either a or c is a vertex of degree at most 3. Then the branch of deleting N(ub) is dominated by the branch of deleting N(uab) or N(ucd). There are four branches left: deleting \(\{u\}\), N(ud), N(uab), and N(ubc). If \(deg(a) =3 \) or \(deg(c) =3\), we can get a branching vector (1, 4, 4, 3) with a branching factor 1.7254. The only left case is that \(deg(a) = deg(c) =2\). For this case, the branch of deleting N(ubc) is dominated by the branch of deleting N(uab). we get a branching vector (1, 4, 3) with a branching factor 1.6181.

Please see the details of analysis for Cases 5,6 and 7 in the full version of this paper.

Fig. 4.
figure 4

two triangles sharing an edge

5.4 Two Triangles Sharing One Edge

In this step, there is no vertex of degree \(\ge 4\). Assume there are two triangles \(\{u,w,v\}\) and \(\{t,w,v\}\) sharing one edge (wv). All possible configurations are shown in Fig. 4. Since the maximum degree is at most 3 now, we know that only u and t are possibly adjacent to a vertex out of \(\{u,v,w,t\}\). Case 1 is a component of 4 vertices. We can simply put u to the solution. For Case 2, let \(V'=\{u,v,w,t,a\}\) and \(G'=G[V']\). We can see that \(\{u\}\) is a CVD-set dominates all other CVD-sets in \(G'\). Thus, we can also simply put u to the solution. For Case 3, let \(V'=\{u,v,w,t,a\}\) and \(G'=G[V']\). We can see that \(\{a, u\}\) is a CVD-set dominates all other CVD-sets in \(G'\). Thus, we can also simply put \(\{a,u\}\) to the solution. For Case 4, let \(V'=\{u,v,w,t,a,b\}\) and \(G'=G[V']\). Any CVD-set in \(G'\) is dominated by either \(\{a,t\}\) or \(\{u,b\}\). We can branch with a branching vector (2, 2).

5.5 Triangles

Assume that there is still a triangle \(\{u,v,w\}\) in the graph. If \(|N(u,v,w)| \le 2\), then N[uvw] is an important set with an important cut N(uvw), which should be reduced by R-Rule 5. Thus, it holds that \(|N(u,v,w)|=3\). Note that after Sect. 5.4, no two triangles share an edge and there are exact three edges between \(\{u,v,w\}\) and \(\{u',v',w'\}\). Let \(\{u',v',w'\}=N(u,v,w)\), where there are three edges \((u,u')\), \((v,v')\), \((w,w')\) between \(\{u,v,w\}\) and \(\{u',v',w'\}\). We choose \(\{u,v,w\}\) as the core clique and apply the core branching processing. The intersecting cliques will be \(\emptyset \), \(\{u\}\), \(\{v\}\), \(\{w\}\), \(\{u,v\}\), \(\{u,w\}\), \(\{v,w\}\) ,\(\{u,u'\}\), \(\{v,v'\}\), \(\{w,w'\}\) and \(\{u,v,w\}\). Any branch will be dominated by one of the branches of deleting \(N(u,u'), N(v,v'), N(w,w')\) and \(\{u',v',w'\}\). Since there is no degree-1 vertex, we can get a branching vector at least (3, 3, 3, 3) with the branching factor 1.5875.

5.6 The Remaining Case

Now the graph has the maximum degree at most 3 and there is no triangle in it. CVD is equal to delete at most k vertices from the graph to make the each connected component containing at most 2 vertices. The latter problem is the 3-path vertex cover problem. We directly use the \(O^*(1.713^k)\)-time algorithm for the 3-path vertex cover problem in [13] to solve our problem.

We have considered all the cases and the worst branching factor is 1.7485.

Theorem 1

Cluster Vertex Deletion in graphs with maximum degree at most 4 can be solved in \(O^*(1.7485^k)\) time and polynomial space.

With this result, we can improve the running time bound for Cluster Vertex Deletion in general graphs to \(O^*(1.7549^k)\). The details can be found in the full version of this paper.

6 Conclusion

In this paper, we proposed an improved parameterized algorithm for the classic Cluster Vertex Deletion. Except using previous techniques such as the auxiliary graph, twins, and automated generation of searching trees, we introduce two new concepts: CVD-dominating family and core branching processing. A significant contribution of our work is to tackle the problem in low-degree graphs as the first step. Subsequently, we focus solely on vertices with high degrees, which potentially contain more valuable information. This allows us to design specific branching rules to achieve better branching factors. It is worth noting that even for graphs with maximum degree 4, our algorithm still requires some new techniques and deep analyses to get the desired improvement.