1 Introduction

Given a simple, undirected graph G=(V,E), let G[S] denote the subgraph induced by a subset of vertices S ⊆ V. A graph property Π is said to be hereditary on induced subgraphs if, for any S ⊆ V such that G[S] satisfies property Π, the deletion of any subset of vertices from G[S] does not produce a graph violating Π. Property Π is said to be nontrivial if it is true for a single-vertex graph and is not satisfied by every graph. A property is said to be interesting if there are arbitrarily large graphs satisfying Π. Given a fixed graph property Π, the maximum Π problem seeks a largest-order subset of vertices inducing a subgraph that satisfies property Π. Clearly, the maximum Π problem is meaningful only if Π is nontrivial and interesting; therefore, we will assume that these properties are satisfied for all the considered problems without stating this explicitly. Yannakakis [50] obtained a general complexity result for such properties Π as stated in Theorem 1 (see also [28, 29, 51]).

Theorem 1

[50]

The maximum Π problem for nontrivial, interesting graph properties that are hereditary on induced subgraphs is NP-hard.

The broad scope of this result is evident from the following examples of Π that meet (what are henceforth referred to as) the Yannakakis conditions: clique, independent set, planar, outerplanar, perfect, bipartite, complete bipartite, acyclic, degree-constrained, interval, and chordal graphs, among others. In particular, the proof of the above theorem relies heavily on properties of cliques and independent sets, which are defined next. A clique C ⊆ V is a subset of vertices such that G[C] is complete, i.e., the vertices of C are pairwise adjacent. An independent set is a subset of pairwise nonadjacent vertices in a graph. Clearly, C is a clique in G if and only if C is an independent set in the complement graph \(\bar{G}=(V,\bar{E})\).

A maximum clique (independent set) is a clique (independent set) of the largest cardinality in the graph, whose size is the clique number (independence number) of G, denoted by ω(G) (α(G)). The maximum clique and the maximum independent set are well-known problems in combinatorial optimization which are NP-hard [22] and hard to approximate [24]. Weighted versions of these problems seek to find a maximum weight clique (independent set) in G=(V,E), where the weights w(v), v ∈ V, are assumed to be positive. The weight of a clique (independent set) is the sum of the weights of the vertices in the corresponding set. Note that a maximum weight clique (independent set) need not be a maximum size clique (independent set) in the graph, but it will be maximal by inclusion since the weights are positive. An exact algorithm for the weighted problem can solve the unweighted problem (with unit weights), while the converse is not true. Cliques and independent sets have found applications in several areas such as coding theory [15, 42–44], telecommunication [1, 26, 38], fault diagnosis [23], biochemistry [16], portfolio selection [11, 12], and social network analysis [39], among others. For a survey of maximum clique problem formulations and exact and heuristic algorithms, see [13].

Motivated by the wide applicability of Yannakakis' theorem and the effectiveness of algorithms that can be classified as Russian Doll Search (RDS) approaches to the maximum clique problem [17, 35], this paper introduces, in Sect. 2, a general algorithmic framework for solving the maximum weight Π problem to optimality for any graph property Π that meets the Yannakakis conditions. This general framework argues for the necessity and sufficiency of the Yannakakis conditions in admitting the extension of RDS to graph properties other than cliques. Section 3 discusses algorithm design and implementation issues that help tailor the general framework to specific graph properties, namely, s-plexes [40] and s-defective cliques [52]. These are two hereditary clique relaxations used in graph-based data mining applications, specifically in social and biological network analysis. The performance of the algorithms so derived for the maximum s-plex problem and the maximum s-defective clique problem is assessed through a computational study detailed in Sect. 4. The paper is concluded in Sect. 5, and the Appendix (Online Supplement) includes the complete set of numerical results.

2 The maximum weight Π problem and a general algorithm

Exact combinatorial algorithms for the maximum clique problem have been developed by Bron and Kerbosch [14], Balas and Yu [6], Applegate and Johnson [2], Carraghan and Pardalos [17], Babel [3], Balas and Xue [5], Wood [49], Sewell [41], Östergård [35] and Tomita and Kameda [45]. Östergård also presented an extension designed to solve the weighted version of the problem in [34]. The algorithmic framework presented in this section is a direct generalization of Östergård’s algorithm for the maximum weight clique problem [34], which is itself a variant of the RDS algorithm proposed in [47] for constraint satisfaction problems. RDS has recently been applied to other discrete optimization problems such as the Steiner triple covering problem and the maximum transitive subtournament problem [36, 46].

Our choice of the RDS approach as the basis for a general framework applicable to any hereditary property is motivated by its simplicity and effectiveness. It should be noted that a variant of the Carraghan-Pardalos algorithm was used as a benchmark exact algorithm [2] for the maximum clique problem in the Second DIMACS Implementation Challenge [25]. Östergård's algorithm enhances the framework proposed by Carraghan and Pardalos, yielding what is known to be one of the fastest exact combinatorial algorithms for solving the maximum clique problem [35]. According to the results reported by Tomita and Kameda [45] for randomly generated test instances and DIMACS benchmark instances [19], their algorithm appears to be the fastest available algorithm in the published literature at the time, with Östergård's algorithm following closely behind.

Consider a simple, undirected graph G=(V,E), positive weights w(v) for each vV, and a property Π that satisfies the Yannakakis conditions. Define the invariant μ(G) as follows:

$$\mu(G)=\max\bigl\{w(P):~P\subseteq V,~G[P]~\mbox{satisfies}~\varPi\bigr\}, $$

where w(P)=∑_{v∈P} w(v). Finding P* such that μ(G)=w(P*) constitutes the maximum weight Π problem. When w(v)=1 ∀v ∈ V, this is the maximum Π problem. Note that the assumption of positive weights is not restrictive when Π is hereditary, as vertices with nonpositive weights can be deleted without changing μ(⋅). Under this assumption, an optimal P* such that w(P*)=μ(G) will be maximal by inclusion. Based on Theorem 1 and the fact that the unweighted version is a special case, the maximum weight Π problem is also NP-hard.

Algorithm 1, presenting the overall framework, proceeds as follows. First, we impose an ordering of the vertices in V and use the notation V=(v_1, v_2, …, v_n) to refer to the set of ordered vertices. With the sets S_i ⊆ V defined as S_i = {v_i, v_{i+1}, …, v_n}, Algorithm 1 computes μ(G[S_i]), denoted simply by μ(i). Obviously, μ(n)=w(v_n) and μ(1)=μ(G). The algorithm computes the values of μ(i) starting from μ(n) and down to μ(1), and clearly μ(i) ≥ μ(i+1). Moreover, if μ(i) > μ(i+1), then any solution that yields the value μ(i) contains vertex v_i. Otherwise, there would exist a solution P_i such that μ(i)=w(P_i) and v_i ∉ P_i; then P_i ⊆ S_i∖{v_i} = S_{i+1}, implying μ(i+1)=μ(i), a contradiction. Therefore, for the unweighted case with w(v_i)=1, i=1,…,n, the above argument implies that

$$\mu(i)= \begin{cases} \mu(i+1)+1, & \mbox{if every corresponding solution contains }v_i \\ \mu(i+1), & \mbox{otherwise} \end{cases} $$

For the general weighted case, it is important to observe that a similar claim, i.e., that either μ(i)=μ(i+1) or μ(i)=μ(i+1)+w(v_i), would be incorrect. However, μ(i)>μ(i+1) still implies that v_i belongs to every optimal solution, and μ(i) ≤ μ(i+1)+w(v_i). For example, if Π is the independent set property and G[S_i] is a path v_i–v_{i+1}–v_{i+2} with weights 3, 4, 2, then μ(i+1)=4 while μ(i)=w(v_i)+w(v_{i+2})=5, which equals neither μ(i+1) nor μ(i+1)+w(v_i)=7.

Algorithm 1 (figure): RDS algorithm for the maximum Π problem

In order to understand the importance of requiring Π to be hereditary on induced subgraphs, consider the following definition of Π which is not hereditary on induced subgraphs. G is said to satisfy Π if the diameter of G is at most two (a subset of vertices inducing a subgraph satisfying such a Π is also called a 2-club [9]). Now consider the graph G=(V,E) where V={v_1,…,v_n} and E={(v_1,v_i): i=2,…,n}, i.e., G is a star graph with central vertex v_1 and leaves v_2,…,v_n. Assuming unit weights, μ(i)=1 ∀i=2,…,n, but μ(1)=n. This is because G satisfies Π while any induced subgraph with at least two vertices obtained by deleting v_1 violates Π.

Algorithm 1 computes μ(n),μ(n−1),…,μ(1) using backtracking, and employs two pruning strategies (discussed later). In particular, the second pruning strategy is based on the bounded increase going from i+1 to i, which only holds in general (for any graph and any ordering) when Π is hereditary. The second pruning strategy is a key factor contributing to the effectiveness of Östergård’s algorithm, which is dependent on the hereditary nature of Π.

The main procedure RDS-Algorithm accepts the graph G (and property Π) as input and calls the recursive procedure FindMaxΠ to compute the values of μ(i). The procedure FindMaxΠ is the core of the algorithm and finds a maximum weight subgraph with property Π using two sets as input: (i) the working set C (also known as the candidate list or candidate set), the set of vertices that may be used to build a subgraph with property Π, and (ii) the set of vertices P that represents the currently found subgraph with property Π. The procedure chooses vertices from C one by one, adds a chosen vertex to the current graph P, updates the candidate list, and calls itself with the new values of the input sets. Obviously, if the candidate set is empty, the procedure terminates returning the best weight value found. Other points of termination are the prune points at Line 20 and Line 24 indicated in Algorithm 1, referred to as type 1 and type 2 pruning, respectively. Pruning of the first type occurs when the weight of the current subgraph together with the weight of the whole candidate set is less than the incumbent max. The second type of pruning occurs when the weight of the current subgraph together with μ(i) (which is the best possible in G[S_i]) is less than the incumbent max.

This general framework to solve the maximum weight Π problem would be a complete enumeration in depth-first search order in the absence of both prune points 1 and 2; with only prune point 1, the algorithm is an implicit enumeration in depth-first search order. The latter results in a "Carraghan-Pardalos type" backtracking algorithm for the maximum clique problem [17]. The second type of pruning is facilitated by the vertex ordering and the ensuing arguments on the bounded increase in μ(i) developed by Östergård [34, 35] for the maximum clique problem. One of the key design elements that heavily influences the effectiveness of this algorithm is the choice of a vertex ordering scheme that "encourages" type 2 pruning. While the general framework itself conceptually subsumes other approaches, key implementation issues need to be addressed to make the resulting algorithm as effective for the particular problem under consideration as Östergård's algorithm is for the maximum clique problem. In the remainder of this paper, we study the development of such an implementation for the maximum s-plex problem and the maximum s-defective clique problem.
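To make the control flow concrete, the following C sketch mirrors the framework just described (the pseudocode itself appears in the Algorithm 1 figure). The adjacency matrix adj, the weight array w, the placeholder Π-test extends_pi (instantiated here with the clique test), and all other identifiers are illustrative assumptions rather than the paper's implementation; the vertices are assumed to be relabeled according to the chosen ordering beforehand.

```c
/* Minimal sketch of the RDS framework described above.  Vertices are assumed
 * to be relabeled so that indices 0..n-1 follow the chosen ordering; the
 * suffix {i, ..., n-1} plays the role of S_{i+1}.  Identifiers illustrative. */
#include <stdio.h>

#define MAXN 128
int n;                      /* number of vertices                            */
int adj[MAXN][MAXN];        /* 0/1 adjacency matrix of G                     */
int w[MAXN];                /* positive vertex weights                       */
int mu[MAXN];               /* mu[i] = best weight over the suffix {i..n-1}  */
int best;                   /* incumbent weight ("max" in the text)          */

/* Pi-verification placeholder: does P together with v still satisfy Pi?
 * Here Pi is the clique property; Sect. 3.1 develops the corresponding
 * s-plex and s-defective clique tests.                                      */
int extends_pi(const int *P, int psize, int v) {
    for (int k = 0; k < psize; k++)
        if (!adj[P[k]][v]) return 0;
    return 1;
}

/* FindMaxPi: C = candidate set (kept sorted by index), P = current solution */
void find_max_pi(int *C, int csize, const int *P, int psize, int pweight) {
    if (csize == 0) {                              /* leaf: record the weight */
        if (pweight > best) best = pweight;
        return;
    }
    while (csize > 0) {
        int wC = 0;
        for (int k = 0; k < csize; k++) wC += w[C[k]];
        if (pweight + wC <= best) return;          /* type 1 pruning          */
        int vi = C[0];                             /* smallest-index candidate */
        if (pweight + mu[vi] <= best) return;      /* type 2 pruning          */
        for (int k = 0; k + 1 < csize; k++) C[k] = C[k + 1];
        csize--;                                   /* branch on vi            */
        int newP[MAXN], newC[MAXN], newsize = 0;
        for (int k = 0; k < psize; k++) newP[k] = P[k];
        newP[psize] = vi;
        for (int k = 0; k < csize; k++)            /* rebuild the candidate set */
            if (extends_pi(newP, psize + 1, C[k])) newC[newsize++] = C[k];
        find_max_pi(newC, newsize, newP, psize + 1, pweight + w[vi]);
    }
}

int rds_algorithm(void) {
    best = 0;
    for (int i = n - 1; i >= 0; i--) {             /* mu(n) down to mu(1)     */
        int P[1] = { i }, C[MAXN], csize = 0;
        for (int j = i + 1; j < n; j++)            /* candidates after i      */
            if (extends_pi(P, 1, j)) C[csize++] = j;
        find_max_pi(C, csize, P, 1, w[i]);
        mu[i] = best;                              /* best found within the suffix */
    }
    return mu[0];                                  /* equals mu(G)            */
}

int main(void) {
    /* toy instance: a triangle {0,1,2} with a pendant vertex 3 attached to 2 */
    int edges[4][2] = { {0,1}, {0,2}, {1,2}, {2,3} };
    n = 4;
    for (int e = 0; e < 4; e++)
        adj[edges[e][0]][edges[e][1]] = adj[edges[e][1]][edges[e][0]] = 1;
    for (int v = 0; v < n; v++) w[v] = 1;
    printf("mu(G) = %d\n", rds_algorithm());       /* prints 3 (the triangle)  */
    return 0;
}
```

Swapping the s-plex or s-defective clique verification of Sect. 3.1 in for extends_pi, together with the vertex ordering choices of Sect. 3.3, is essentially the tailoring carried out in the remainder of the paper.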

3 Adaptation of the algorithm for two hereditary clique relaxations

Definition 1

Given a simple, undirected graph G=(V,E) and a fixed positive integer s, a subset of vertices S is an s-plex if it satisfies the following property:

$$\bigl|N(v) \cap S\bigr| \ge|S|-s\quad \forall v \in S, $$

where N(v) is the neighborhood of v (i.e., N(v) = {u ∈ V : (u,v) ∈ E}). In other words, the minimum vertex degree in G[S] is at least |S|−s.

The case s=1 corresponds to a clique, and for s>1, the s-plex model is a relaxation of clique which allows for at most s−1 non-neighbors for each vertex.

Definition 2

Given a simple, undirected graph G=(V,E) and a fixed nonnegative integer s, an s-defective clique S is a subset of vertices that satisfies the following property:

$$\bigl|E[S]\bigr| \ge{|S|\choose2}-s, $$

where E[S] is the edge set of the subgraph induced by S.

A 0-defective clique is a clique, and the s-defective clique model for s>0 relaxes the clique requirement of having all possible edges by allowing at most s edges to be absent from the induced subgraph.
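As an illustration of the two definitions, naive membership tests can be written along the following lines; the adjacency matrix adj and the function names are illustrative assumptions, and both checks run in time quadratic in |S|, as do the verification procedures discussed in Sect. 3.1.

```c
/* Naive membership tests corresponding to Definitions 1 and 2; S holds the
 * vertices of the candidate subset and adj is an assumed 0/1 adjacency
 * matrix.  Both checks take O(|S|^2) time.  Names are illustrative.        */
#define MAXN 128
extern int adj[MAXN][MAXN];

/* s-plex: every vertex of S has at least |S|-s neighbors inside S */
int is_splex(const int *S, int size, int s) {
    for (int a = 0; a < size; a++) {
        int deg = 0;
        for (int b = 0; b < size; b++)
            if (b != a && adj[S[a]][S[b]]) deg++;
        if (deg < size - s) return 0;
    }
    return 1;
}

/* s-defective clique: at most s of the |S|(|S|-1)/2 possible edges are missing */
int is_sdefective_clique(const int *S, int size, int s) {
    int missing = 0;
    for (int a = 0; a < size; a++)
        for (int b = a + 1; b < size; b++)
            if (!adj[S[a]][S[b]]) missing++;
    return missing <= s;
}
```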

The maximum weight s-plex problem and the maximum weight s-defective clique problem are the optimization problems of interest. The s-plex model was introduced by Seidman and Foster [40] to represent cohesive subgroups in social network analysis [48]. A typical social network is the acquaintance network, where vertices represent people and an edge indicates that the two people represented by its end-points know each other. Seidman and Foster were motivated by the observation that cliques were capable of modeling "ideal" cohesive subgroups in which every individual knew everyone else, but were not necessarily suitable for detecting real-life cohesive subgroups. Polyhedral approaches [8, 31] and exact combinatorial algorithms [32, 33] have been developed to solve the maximum s-plex problem. In particular, McClosky and Hicks [32] independently extended the Carraghan-Pardalos [17] and Östergård [35] algorithms to the maximum s-plex problem. This paper demonstrates, in Sect. 4, improvements over the results of McClosky and Hicks [32] and Balasundaram et al. [8] that are achieved via improved algorithm design and implementation.

The s-defective clique model was introduced by Yu et al. [52], who used it to represent clusters in protein interaction networks. In such networks, the proteins in an organism are represented by vertices, and an edge indicates that the proteins corresponding to its end-points are known to interact based on biological experiments. These biological experiments can be high-throughput and noisy [4, 52], resulting in large-scale network models with an erroneous edge set. This motivated Yu et al. [52] to use s-defective cliques as an alternative to cliques to facilitate the detection of protein complexes in the presence of missing edges. However, only a heuristic method is proposed therein, and no exact algorithms are currently available for this problem. This paper investigates the performance of the proposed general-purpose exact algorithm when tailored to this problem.

It should be noted that, in addition to s-plexes and s-defective cliques, several other clique relaxations have been introduced and studied in the context of graph-based data mining and clustering applications [10, 37, 39]; however, unlike s-plexes and s-defective cliques, they do not satisfy the Yannakakis conditions. Tailoring the general framework of Algorithm 1 to the maximum s-plex problem and the maximum s-defective clique problem requires us to focus our attention on two key points identified in the algorithm: (i) vertex ordering, which significantly affects type 2 pruning (as also noted by Östergård [35] and McClosky and Hicks [32]), and (ii) Π-verification, a step that is invoked very frequently and, hence, is a potential source of performance improvement. These issues are the focus of the investigation in the remainder of this paper.

3.1 Candidate set generation and Π-verification

Recall that Algorithm 1 makes recursive calls with a candidate set C and a current solution P. The candidate set has the property that {v}∪P satisfies Π for each v ∈ C. The first key challenge in algorithm design when tailoring Algorithm 1 to a specific graph property Π is the candidate set update procedure that verifies and, hence, maintains Π. For cliques, the candidate set is just the intersection of the neighborhoods of the vertices in the current clique. If v_i was added at the current iteration, then the candidate set C′ for the next iteration is the intersection of the current candidate set and the neighborhood of the newly added vertex: C′ = C ∩ N(v_i).

However, in the case of s-plex and s-defective clique, the definition permits a limited number of non-neighbors. The new candidate set must be generated by explicitly verifying and selecting each vertex from the current candidate set that forms an s-plex/s-defective clique when added to the current s-plex/s-defective clique. This forces the use of Π-verification procedures such as Algorithm 2 and Algorithm 3. The function is called for each member of the current candidate set each time the candidate set needs to be updated. The speed of the Π-verification procedure determines the speed of candidate set update process, and significantly influences the overall algorithm running time, given the number of such calls. We need to focus on improving the running time of this procedure, as well as reducing the number of calls to this procedure. This subsection focuses on the former while Sect. 3.3 focuses on the latter.

Algorithm 2 (figure): s-plex verification procedure

Algorithm 3 (figure): s-defective clique verification procedure

Algorithms 2 and 3 present simple verification procedures for s-plexes and s-defective cliques, respectively. The function IsPlex1 has quadratic running time with respect to the size of its argument K, with a linear-time procedure for determining the induced degree in each iteration of the for-loop. We can improve this procedure by using additional information available from the previous steps. Assume that the vertex u has just been added to the s-plex K in the current iteration, so that the new s-plex is K′ = K∪{u}. We know that K∪{u} is an s-plex and that K∪{v} is an s-plex for any vertex v from C. To determine whether v will be included in the new candidate set C′, we need to verify that K∪{u}∪{v} is an s-plex. Let us call a vertex i ∈ K saturated if |K ∩ N(i)| = |K|−s, which means it is not possible to add vertices not in N(i) to K without violating the degree requirement. Since K∪{v} is already known to be an s-plex, the only possible violations may be caused by the newly added vertex u. First, the addition of u could increase the number of non-neighbors of some vertex y ∈ K∖N(u) by one. If such a vertex y becomes saturated in K′, then C′ must be a subset of N(y). Second, the vertex u itself could be saturated in K′, in which case C′ ⊆ N(u). Unsaturated vertices in K′ do not create any restrictions for the candidate set. Note that if s=1, then every vertex in K is saturated and these observations unify to yield the intersection of neighborhoods. These considerations allow one to create a faster procedure for updating the candidate set. After adding a new vertex, we determine the list of saturated vertices. Then the new candidate set is obtained by intersecting the current candidate set with the neighborhood of each saturated vertex. In order to quickly identify vertices that become saturated, we keep and update the list and number of non-neighbors for vertices in K. Algorithm 4 formalizes the resulting incremental s-plex verification procedure.

Algorithm 4 (figure): Incremental s-plex verification procedure

Recall that the naive s-plex verification procedure runs in O(|K|²) time. The new procedure consists of two parts: generation of the list of saturated vertices, which can be done in O(|K|) time and is performed once; and verification of whether a vertex is in the neighborhood of the saturated vertices, which is performed for every vertex from the candidate list. The theoretical complexity of the second part is O(|K|), but each vertex can be a member of the saturated vertex list only once during the whole process of building K (once a vertex is included in this list, the candidate list is intersected with this vertex's neighborhood, and no further vertices from the candidate list may satisfy the condition in line 5 of Algorithm 4). As discussed later, the comparison of the running times of the maximum weight s-plex algorithms that use the original and the incremental s-plex verification procedures shows a significant improvement in performance from the latter approach, especially for larger s.
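A minimal sketch of one way to implement this incremental update is given below. In addition to the saturated-vertex test described above, it keeps a running non-neighbor count for each candidate so that the candidate's own degree condition is also maintained; this bookkeeping, the adjacency matrix adj, and all identifiers are illustrative assumptions and are not taken verbatim from the Algorithm 4 figure.

```c
/* Sketch of the incremental candidate set update described above.  K is the
 * current s-plex, u the vertex just added (K' = K together with u), C the
 * current candidate set; the filtered set is written to newC.  nonnbrs[x]
 * stores the number of non-neighbors of x within the current s-plex and is
 * maintained for members of K and for candidates alike.  The function assumes
 * that K plus {v} is an s-plex for every v in C and that C is already
 * contained in N(y) for every vertex y saturated before u was added.
 * adj is an assumed adjacency matrix; all identifiers are illustrative.     */
#define MAXN 128
extern int adj[MAXN][MAXN];

int update_candidates(const int *K, int ksize, int u, int s,
                      int nonnbrs[MAXN], const int *C, int csize, int *newC) {
    int sat[MAXN], nsat = 0;

    /* vertices of K that become saturated (s-1 non-neighbors) because of u */
    for (int a = 0; a < ksize; a++)
        if (!adj[K[a]][u] && ++nonnbrs[K[a]] == s - 1)
            sat[nsat++] = K[a];

    /* the newly added vertex u itself may be saturated in K'               */
    nonnbrs[u] = 0;
    for (int a = 0; a < ksize; a++)
        if (!adj[K[a]][u]) nonnbrs[u]++;
    if (nonnbrs[u] == s - 1) sat[nsat++] = u;

    /* keep a candidate v only if (i) its own non-neighbor count within K'
       stays at most s-1 and (ii) v is adjacent to every newly saturated vertex */
    int newsize = 0;
    for (int k = 0; k < csize; k++) {
        int v = C[k];
        if (!adj[v][u]) nonnbrs[v]++;
        if (nonnbrs[v] > s - 1) continue;           /* v itself would violate */
        int ok = 1;
        for (int j = 0; j < nsat; j++)
            if (!adj[v][sat[j]]) { ok = 0; break; }
        if (ok) newC[newsize++] = v;
    }
    return newsize;                                  /* size of the new set   */
}
```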

Since an s-defective clique is also an (s+1)-plex, Algorithm 4 is valid for the maximum s-defective clique problem as well. Algorithm 4 verifies whether the vertex subset K is an (s+1)-plex and, if so, calls Algorithm 3 to check whether K is an s-defective clique. This reduces the number of calls to Algorithm 3, which runs in O(|K|²) time.

3.2 Scale-reduction based on 2-neighborhood for the unweighted case

The simple fact that C ⊆ N[v] for any vertex v of a clique C allows us to reduce the problem of finding a maximum clique from the whole graph to several single-vertex neighborhoods. Seidman and Foster [40] showed that if G is an s-plex with more than 2s−2 vertices, then the diameter of G is at most two. The connectivity of large s-plexes has already been used indirectly in the previous subsection, where the candidate list was shrunk by intersection with the neighborhoods of saturated vertices, meaning that all remaining candidates for inclusion in the s-plex are adjacent to the saturated vertices. This observation will also be used heuristically in Sect. 3.3.3 to order vertices based on the size of their distance-two neighborhoods. Unlike with cliques, the containment of the solution inside the (double) neighborhood only holds for s-plexes of size at least 2s−1 and may not hold in general for smaller-order s-plexes. At first glance, the bounded diameter property cannot be used in the algorithm, since in a weighted graph there can be an s-plex of cardinality less than 2s−1 but of larger weight than any s-plex of size greater than 2s−2. However, in the unweighted case we can still modify the original maximum s-plex algorithm to improve its performance on instances containing s-plexes of a certain size, as follows. After adding a new vertex v to the s-plex being formed, the working set outside of N²[v] is simply cut off, thus reducing the problem of finding the required s-plex to N²[v]. Then, if the resulting solution has cardinality at least 2s−1, this solution must be a maximum s-plex in G. Otherwise, if the resulting s-plex has cardinality smaller than 2s−1, the original algorithm must be executed in order to find an optimal solution.
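For illustration, the restriction of the working set to the distance-two neighborhood might be implemented along the following lines; adj, n, and the function names are illustrative assumptions.

```c
/* Sketch of the N^2[v] restriction described above: after v is added in the
 * unweighted case, candidates farther than distance two from v are dropped.
 * If the s-plex found under this restriction has at least 2s-1 vertices it is
 * provably optimal; otherwise the unrestricted search must still be run.
 * adj and n are assumed globals; all identifiers are illustrative.          */
#define MAXN 128
extern int n;
extern int adj[MAXN][MAXN];

/* mark[x] = 1 iff x lies within distance two of v */
static void mark_two_neighborhood(int v, int mark[MAXN]) {
    for (int x = 0; x < n; x++) mark[x] = 0;
    mark[v] = 1;
    for (int x = 0; x < n; x++)
        if (adj[v][x]) {
            mark[x] = 1;
            for (int y = 0; y < n; y++)
                if (adj[x][y]) mark[y] = 1;
        }
}

/* drop candidates outside N^2[v]; returns the size of the reduced set */
int restrict_to_two_neighborhood(int v, int *C, int csize) {
    int mark[MAXN];
    mark_two_neighborhood(v, mark);
    int newsize = 0;
    for (int k = 0; k < csize; k++)
        if (mark[C[k]]) C[newsize++] = C[k];
    return newsize;
}
```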

3.3 Vertex ordering

While Sect. 3.1 dealt with improving the s-plex and s-defective clique verification procedures, this subsection investigates approaches that aim to decrease the number of calls to these procedures. Algorithm 1 is a branch-and-bound algorithm in which the Π-verification procedure is called multiple times at each branching node. Given the intractability of the problem, it is unlikely that the number of such calls can be provably reduced in general. The number of calls would be reduced if (1) we can shrink the working set faster (i.e., the branch is cut off because of feasibility), and (2) we can prune more often (i.e., the branch is cut off because of the bound). Both of these goals are influenced by the ordering of vertices in line 2, prior to commencing the main iterations in line 4 of Algorithm 1. The following subsections discuss different vertex ordering schemes for weighted and unweighted graphs.

3.3.1 Weight-based ordering

Since we seek an s-plex/s-defective clique of maximum weight, a weight-based ordering is a natural alternative. Recall that the algorithm processes the vertices starting from v_n down to v_1. A greedy approach would suggest a non-increasing order of vertex weights, i.e., an ordering such that w(v_n) ≥ w(v_{n−1}) ≥ ⋯ ≥ w(v_1). However, in this case the values μ(i) and the weight of the incumbent s-plex/s-defective clique, max, grow quickly in the beginning, but subsequently, once max is close (or perhaps equal) to the optimal value, the growth stalls. Thus, the values of μ(i) and max remain equal for a long time, and the algorithm proceeds without any improvement. This leads to the situation where the pruning in line 24 of Algorithm 1 does not occur.

The smallest-to-largest-weight ordering benefits from the opposite effect: the values of μ(i) are kept as small as possible in order to yield more type 2 prune points. Therefore, ordering the vertices so that they are processed in nondecreasing order of their weights provides a reasonable weight-based ordering. It should be noted that Östergård's algorithm for the maximum weight clique problem used a similar ordering with additional tie-breaking rules when several vertices have the same weight. In our study, we break ties by simply ordering vertices according to their original labels. In particular, for instances with all equal weights and for unweighted instances this results in the natural ordering given by the vertex labels, which we call the control ordering.
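A sketch of this ordering, with ties broken by the original labels, is given below; the array name order and the comparison function are illustrative assumptions.

```c
/* Sketch of the weight-based ordering described above.  order[i] holds v_{i+1},
 * and Algorithm 1 processes v_n first, so the lightest vertices are placed at
 * the v_n end of the array (ties broken by the original labels).  n and the
 * weight array w are assumed globals; all identifiers are illustrative.       */
#include <stdlib.h>

#define MAXN 128
extern int n;
extern int w[MAXN];

static int cmp_heavier_first(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    if (w[x] != w[y]) return w[y] - w[x];   /* heavier vertex toward the v_1 side */
    return x - y;                           /* tie: original label order          */
}

void weight_ordering(int order[MAXN]) {
    for (int i = 0; i < n; i++) order[i] = i;
    qsort(order, (size_t)n, sizeof(int), cmp_heavier_first);
}
```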

3.3.2 Degree-based ordering

When searching for a maximum clique in an unweighted graph, it is natural to order the vertices according to their degrees. The ordering procedure does start with a minimum degree vertex v_n; however, the subsequent order of the vertices depends on their degrees in the subgraph induced by the vertices that have not yet been ordered, and hence the resulting order is not the same as the one obtained by simply sorting the vertices by degree. Such a degree-based ordering was used in the Carraghan-Pardalos algorithm for the maximum clique problem, and is supported by the fact that a clique C containing vertex v is a subset of N[v], so starting with vertices of smaller degree yields small values of μ(i), which can result in a higher success rate at the type 2 prune points (Algorithm 1, line 24).
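The resulting procedure can be sketched as follows; adj, n, and the identifiers are illustrative assumptions. Replacing the degree with the size of the distance-two neighborhood in this scheme gives the ordering of Sect. 3.3.3.

```c
/* Sketch of the degree-based ordering described above: repeatedly take a
 * minimum-degree vertex of the subgraph induced by the not-yet-ordered
 * vertices, assign it the next position from the v_n end, and delete it.
 * adj and n are assumed globals; order[i] holds v_{i+1}, so order[n-1]
 * (i.e., v_n) is the vertex Algorithm 1 processes first.                   */
#define MAXN 128
extern int n;
extern int adj[MAXN][MAXN];

void degree_ordering(int order[MAXN]) {
    int removed[MAXN] = {0}, deg[MAXN] = {0};
    for (int x = 0; x < n; x++)
        for (int y = 0; y < n; y++) deg[x] += adj[x][y];
    for (int pos = n - 1; pos >= 0; pos--) {     /* fill v_n, v_{n-1}, ..., v_1 */
        int pick = -1;
        for (int x = 0; x < n; x++)
            if (!removed[x] && (pick < 0 || deg[x] < deg[pick])) pick = x;
        order[pos] = pick;
        removed[pick] = 1;
        for (int y = 0; y < n; y++)              /* degrees in the remaining subgraph */
            if (!removed[y] && adj[pick][y]) deg[y]--;
    }
}
```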

3.3.3 2-Neighborhood-based ordering

Since an s-plex containing v_i is not necessarily a subset of N[v_i], the ordering is not as clearly justified as for the maximum clique algorithm, even though the number of non-neighbors is limited to s−1. The same holds for s-defective cliques. Hence, the degree-based ordering for the maximum clique problem is modified to suit the properties of the problems at hand. The result of Seidman and Foster [40] is helpful in this regard. If the unknown optimal s-plex K* is sufficiently large, i.e., |K*| > 2s−2, then every maximum s-plex has diameter at most two. So any maximum s-plex containing an arbitrary vertex v is a subset of \(N^{2}_{G}[v]=\{u \in V(G) : d_{G}(v,u) \le2\}\). This simple modification provides a new ordering based on the size of the 2-neighborhood of a vertex instead of its degree. The same considerations apply to the maximum s-defective clique problem.

3.3.4 Coloring based ordering

Another popular vertex ordering for the maximum clique problem emphasizes reducing the working set as fast as possible and is based on the fact that no more than one vertex from an independent set can be included in a clique. The corresponding ordering partitions a graph into independent sets (performs graph coloring) and groups vertices from the same color class together. Since the problem of graph coloring is NP-hard, the coloring is performed greedily, and the corresponding ordering is based on the order in which colors were assigned. Such an ordering is expected to reduce the working set fast, since all vertices of some color will be removed after one vertex of this color is chosen for inclusion into the clique. Vertices can be ordered starting from the color class of the largest cardinality or in the opposite order.

Coloring guarantees that only one vertex of each color can be included in a clique. However, this is not the case for s-plexes, as any s vertices form an s-plex by definition, and the implementation of the coloring-based ordering must be modified accordingly. Thus, we replace partitioning into independent sets (classical coloring) by partitioning into independent set relaxations. One such structure is the co-s-plex, defined as a subset S ⊆ V of vertices satisfying the following property:

$$\bigl|N(v) \cap S\bigr| \le s-1\quad \forall v \in S. $$

Clearly, a co-s-plex is an independent set for s=1 and its relaxation for s>1. Further, SV is a co-s-plex in G if and only if S is an s-plex in the complement of G.

Balasundaram et al. [8] showed that at most 2s−2+(s mod 2) vertices from a co-s-plex may be included in an s-plex. It is not known how many vertices from a co-\(\bar{s}\)-plex may be included in an s-plex for arbitrary values of \(\bar{s}\) and s, but obviously, for a fixed s, this number increases as \(\bar{s}\) increases. On the one hand, increasing \(\bar{s}\) creates more branches at each step; on the other hand, the number of classes in the partition will be smaller. The problem of partitioning a graph into co-s-plexes is known in the literature as defective coloring [18, 21] and is NP-hard, so we use a simple greedy procedure (Algorithm 5) to obtain the needed partitions. If the graph is weighted, then the next vertex in line 6 of Algorithm 5 is chosen as the smallest-weight uncolored vertex. For vertices with the same weight, the one with the largest neighborhood weight is chosen, following rules similar to [34, 35].

Algorithm 5 (figure): Defective \(\bar{s}\)-coloring based vertex ordering
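A sketch of the greedy partitioning and the resulting ordering is given below; for simplicity the vertices are scanned in label order and the classes are listed in the order they were opened, whereas the text selects the next vertex by weight in the weighted case and orders the classes by their cardinality. adj, n, sbar, and the remaining identifiers are illustrative assumptions.

```c
/* Sketch of a greedy defective coloring and the resulting vertex ordering.
 * Each vertex is placed into the first class in which both the vertex and the
 * existing class members keep at most sbar-1 neighbors inside the class, so
 * every class is a co-sbar-plex; sbar = 1 gives an ordinary greedy coloring.
 * adj and n are assumed globals; all identifiers are illustrative.          */
#define MAXN 128
extern int n;
extern int adj[MAXN][MAXN];

/* color[x] receives the class of vertex x; returns the number of classes */
int greedy_defective_coloring(int sbar, int color[MAXN]) {
    int nclasses = 0;
    int incls[MAXN];          /* neighbors a colored vertex has inside its class */
    for (int x = 0; x < n; x++) color[x] = -1;
    for (int x = 0; x < n; x++) {
        int c;
        for (c = 0; c < nclasses; c++) {
            int cnt = 0, ok = 1;
            for (int y = 0; y < n; y++)
                if (color[y] == c && adj[x][y]) {
                    cnt++;
                    if (incls[y] > sbar - 2) ok = 0;   /* y would exceed sbar-1 */
                }
            if (ok && cnt <= sbar - 1) break;          /* class c can take x    */
        }
        if (c == nclasses) nclasses++;                 /* open a new class      */
        color[x] = c;
        incls[x] = 0;
        for (int y = 0; y < n; y++)
            if (y != x && color[y] == c && adj[x][y]) { incls[x]++; incls[y]++; }
    }
    return nclasses;
}

/* group vertices of the same class together to obtain the ordering */
void coloring_ordering(int sbar, int order[MAXN]) {
    int color[MAXN], pos = 0;
    int nclasses = greedy_defective_coloring(sbar, color);
    for (int c = 0; c < nclasses; c++)
        for (int x = 0; x < n; x++)
            if (color[x] == c) order[pos++] = x;
}
```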

Experiments were conducted for \({\bar{s}}=1,2,\ldots, 5\) using both ordering strategies (largest color class first and the opposite) with unweighted instances of the maximum s-plex problem and the maximum (s−1)-defective clique problem. The main observation from these experiments is that the ordering based on defective coloring typically yields better algorithm performance than the control ordering. Again, there is no conclusive evidence as to which coloring provides the best results. In many cases, the classical coloring (i.e., partitioning the graph into independent sets) outperforms the other considered orderings, but quite often the best result is obtained when \({\bar{s}}=s\). Less often, the best running time is given by values of \(\bar{s}\) that are in between 1 and s, and even more rarely, when \({\bar{s}}>s\).

The vertex ordering techniques considered in this section are heuristic in nature and cannot provide a predictable performance improvement on all problem instances, unlike the candidate set generation improvements of Sect. 3.1. But when they do provide an improvement, it is usually much more significant than that obtained using the techniques from Sect. 3.1 alone.

4 Results of numerical experiments

All numerical experiments presented in this paper were conducted on a Dell Optiplex GX620 computer with an Intel(R) Core(TM) 2 Quad 3 GHz processor and 4 GB of RAM. The algorithm was implemented in the C programming language, using the Microsoft Visual C++ .NET 2010 (v 10.0) development environment for the Win32 platform. Although the CPU has multiple cores, the algorithm is implemented in single-threaded mode and cannot take advantage of this hardware feature.

In the experiments reported in this paper, we considered only unweighted instances of the problems of interest. A set of preliminary experiments was conducted to facilitate the choice of algorithm settings used in the tests reported below. In particular, in the reported experiments the algorithm always used the incremental s-plex verification routine described in Sect. 3.1 and the diameter-based pruning from Sect. 3.2. The degree-based vertex ordering described in Sect. 3.3.2, which exhibited the best overall performance for both problems in the preliminary tests, is used in the reported experiments. It should be noted that this ordering is the same as the one used in the Carraghan-Pardalos algorithm for the maximum clique problem [17]. Therefore, we will refer to this ordering as the standard ordering.

4.1 Standard testbed

The standard testbed of instances used in our experiments consists of two groups. The first group is comprised of benchmark clique instances from the Second DIMACS Challenge [19, 25]. The second group is comprised of instances from the Tenth DIMACS Challenge [20] and the Stanford Large Network Dataset Collection (SNAP) [27]. The instances in this group are typically very large and sparse, and hence their size is reduced using the so-called peeling procedure originally proposed for the maximum clique problem [1], which, given a lower bound |S| on the clique number (found, e.g., heuristically), recursively removes vertices of degree less than |S|−s. Obviously, the peeling procedure cannot remove any vertices belonging to a maximum s-plex or s-defective clique. The total number of instances considered is 63, and we preserve their original names in Tables 1, 2, and 3–6.
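A sketch of this peeling step, written for the s-plex case, is given below; for brevity it uses an adjacency matrix, although an adjacency-list representation is the natural choice for the large sparse instances, and lb, keep, and the remaining identifiers are illustrative assumptions.

```c
/* Sketch of the peeling preprocessing described above: given a lower bound
 * lb (e.g., the size of a heuristically found clique), vertices of degree
 * less than lb - s are removed repeatedly, since such a vertex cannot belong
 * to an s-plex with at least lb vertices.  adj and n are assumed globals;
 * keep[x] marks the surviving vertices.  Identifiers are illustrative.      */
#define MAXN 1024
extern int n;
extern int adj[MAXN][MAXN];

int peel(int lb, int s, int keep[MAXN]) {
    int deg[MAXN], remaining = n, changed = 1;
    for (int x = 0; x < n; x++) {
        keep[x] = 1;
        deg[x] = 0;
        for (int y = 0; y < n; y++) deg[x] += adj[x][y];
    }
    while (changed) {
        changed = 0;
        for (int x = 0; x < n; x++)
            if (keep[x] && deg[x] < lb - s) {
                keep[x] = 0;
                remaining--;
                changed = 1;
                for (int y = 0; y < n; y++)      /* update neighbors' degrees */
                    if (keep[y] && adj[x][y]) deg[y]--;
            }
    }
    return remaining;                             /* number of vertices left  */
}
```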

Table 1 Running time comparison with algorithms of Balasundaram et al.
Table 2 Running time comparison with McClosky-Hicks algorithm

Tables 3 and 4 in the Appendix (Online Supplement) show the parameters of the instances, including the graph order, size, density, clique number, s-plex number for s=2,3,4,5 and s-defective clique number for s=1,2,3,4. The clique numbers for DIMACS graphs were obtained from [19] and other sources. The s-plex numbers and the s-defective clique numbers were obtained by our algorithm. If the algorithm could not find the optimal solution within a 3-hour time limit, the size of the incumbent s-plex/s-defective clique is provided, indicated with “≥” in the tables.

Tables 5 and 6 in the Appendix (Online Supplement) provide the running time of the maximum weight s-plex and s-defective clique algorithm in CPU seconds. In these tables, the column labeled by “Reduced” specifies the number of vertices remaining in the graph after applying the peeling procedure, with an asterisk used to mark the cases where no reduction occurred.

As mentioned before, the ordering scheme considered was the standard ordering, due to its superior performance on most instances. However, it should be noted that for several instances other ordering schemes described in Sect. 3.3 performed much better than the standard ordering. Most notably, for the instance c-fat200-5.clq the running times obtained using the smallest-degree-first ordering for s=2,3, the coloring-based ordering with the smallest color class first for s=4, and the smallest-2-neighborhood-first ordering for s=5 were below 1 sec, whereas the run using the standard ordering failed to terminate within 3 hours.

4.2 Comparison with existing approaches

Since the recent introduction of the maximum s-plex problem to the operations research literature by Balasundaram et al. [7, 8], at least four approaches (including the original one and the current one) for solving the problem have been developed. The first approach was proposed in [7, 8], where a branch-and-cut method for the maximum s-plex problem was developed and tested on a subset of DIMACS graphs. Two versions of the branch-and-cut algorithm for the maximum s-plex problem were implemented. Both versions employ CPLEX for solving the IP formulation of the problem, the difference being the cuts used. One version incorporates maximum independent set cuts (referred to as BC-MIS), while the other incorporates co-s-plex cuts (referred to as BC-C2PLX).

Table 1 shows the running times of the three approaches: BC-MIS and BC-C2PLX by Balasundaram et al. [8] and Algorithm 1. In most cases, our algorithm outperforms both branch-and-cut approaches, possibly due to the tuning of this algorithm to the particular problem. The branch-and-cut approaches were less tuned to this particular problem, as they only employed problem-specific cutting planes while inheriting the default CPLEX branching and search strategies.

McClosky and Hicks [30, 32] considered two different algorithms for finding a maximum s-plex in a graph. The first algorithm is based on the Carraghan-Pardalos ideas [17], while the second is based on the aforementioned Östergård algorithm for cliques. Both algorithms were tested on a subset of DIMACS benchmarks. Since their second algorithm outperforms the first on all test instances, we only compare our results with the second algorithm. Since McClosky and Hicks limited the execution time to one hour in their experiments, the time limit was set to 1 hour in this comparison as well, and we disregard any of our results obtained in a larger amount of time. Their numerical results were provided for s=2,3,4 with a precision of 1 sec; a running time of 0 in the McClosky-Hicks experiments was interpreted as <1 sec. Table 2 provides the running times, where the columns marked with "MH" refer to the McClosky-Hicks experiments and those marked by "Alg. 1" show our results. From the results, we conclude that our algorithm demonstrated better running time performance, which can be explained by the more refined candidate set generation and pruning procedures used.

5 Conclusion

This paper exploits the connections between the complexity result for the problem of finding a maximum subset of vertices satisfying a nontrivial, interesting property Π that is hereditary on induced subgraphs, established in 1978 by Yannakakis, and RDS-type algorithms for the maximum clique problem proposed by Carraghan and Pardalos in 1990 and enhanced by Östergård in 2002. We first extend the ideas used in these algorithms to the weighted version of node deletion problems satisfying the conditions of Yannakakis' theorem and then perform a problem-specific study of the approach for the maximum s-plex and the maximum s-defective clique problems, which are two hereditary relaxations of the maximum clique problem that are of interest in graph-based data mining applications. The comparison of the proposed algorithm with two other methods for the maximum s-plex problem published in the literature, carried out on standard test instances, demonstrates the improved performance of the proposed algorithm and its problem-specific enhancements. To the best of our knowledge, our adaptation of the proposed algorithm to the maximum s-defective clique problem is the first reported exact algorithm for this problem. The proposed general algorithmic framework, as well as some of the observations made in the process of experimenting with the maximum s-plex and the maximum s-defective clique instances, could be used to develop successful implementations for other hereditary structures in graphs.