Introduction

The Boolean Satisfiability problem (SAT) is a Constraint Satisfaction Problem (CSP): the problem of determining whether a Boolean formula can be satisfied by a logical (i.e. true/false) assignment of its variables. In recent years, remarkable progress on SAT has motivated further study of this NP-complete problem. The appeal of the SAT problem lies in its extensive real-world applications as well as its theoretical importance [1]. The SAT framework is used to solve many combinatorial problems. It is now routinely employed in bounded model checking (BMC) [2], circuit verification [3], test pattern generation, and, more specifically, in checking complex circuits [4]. Recently, a good number of solvers with better run-times have been developed, so many complex problems can now be solved. SAT solvers are successfully being applied to different types of problems: in artificial intelligence and optimization, more specifically in scheduling and planning problems [5], bioinformatics [6], probabilistic models, cryptanalysis [7], etc. Every year, new solvers with various encouraging techniques are introduced, and the worldwide SAT community organizes SAT competitions to share ideas, new solvers, and benchmarks.

There are many popular SAT solvers available. From our study, these solvers fall into two popular categories. The first comprises systematic solvers, e.g., Conflict Driven Clause Learning (CDCL) [8] and Davis-Putnam-Logemann-Loveland (DPLL) [9], which use a complete search strategy to solve SAT. In contrast to these procedural approaches, stochastic local search (SLS) solvers are also widely used and dominate complete solvers on certain SAT instances; they achieve better success rates on random SAT formulas and are mostly preferred for their speed and scalability. Both approaches have strengths and weaknesses, and substantial research effort continues to go into developing more efficient SAT solvers. Most modern solvers perform unit propagation as a preprocessing phase [10], and several other variable and clause elimination techniques are also available [11,12,13]. As a preprocessing step, our approach is comparable to those techniques. It can be seen as one way of improving the performance of modern SAT solvers.

In this paper, an improved version of the SLS-based SAT solver, CCAnrEQ, is presented. This solver is mainly derived from the state-of-the-art SLS solver CCAnr [14]. Our aim is to present a simplification technique that enhances SAT solver performance. The simplification is performed by restoring the gate structure from the CNF [15, 16]. Extracting gates gives us hints about how the CNF was encoded, and thus helps exploit its intrinsic properties before attempting to satisfy the CNF formula. Our observation is that encoding and restoring gates are reflections of one another. We simplify the CNF by restoring gates because doing so shrinks verbose formulas and thereby eases the SAT solver's work; this process is much easier than encoding as it is domain independent. The simplification, further improvements, and a detailed experimental evaluation of our new solver help explain why the new method works.

As described above, our simplification technique can be employed during preprocessing, and we employ this reasoning as a preprocessor only. Whenever the solver restarts during SAT solving, it always works on the simplified CNF. We include a rich literature review of SLS solvers, including CCAnr, presenting the local search strategy and the various shortcomings of SAT solvers. The existing CCAnr solver performed very well on various tracks of several SAT competitions, which is why we try to improve it; these observations guide the design of our new solver CCAnrEQ. We empirically demonstrate in the experimental result section that the new solver performs better than the existing one, from several perspectives, on a range of structured SAT instances. More specifically, our new solver can solve some instances that CCAnr cannot solve within a given time cut-off. Therefore, in both success rate and solving time, the presented solver outperforms CCAnr. We have also tried different heuristics that reduce greediness alongside this preprocessing; those heuristics also produce several good results, shown in the “Experimental Result” section.

Our paper is organized in the following way. The “Related Works” section surveys studies related to our research; there, we also describe different SLS solvers along with the background of the state-of-the-art solver. Building on that background, we present the approach and implementation technique of our new solver CCAnrEQ in the “Our Approach” section. In the “Experimental Result” section, we compare our solver with the state-of-the-art solver. A general conclusion and an outlook on future work are given in the “Conclusion” section.

Related Works

In this section, we first describe the basic notation related to the SAT problem, then a few definitions and terms related to this problem. After that, various existing techniques related to local search and SAT are described.

SAT Definitions

Given a set of propositional variables \(V=\{x_1,x_2,x_3,\ldots ,x_n\}\), a literal is either positive (the variable itself, \(x_i\)) or negative (the negation of a variable, \(\lnot x_i\)). For each variable, the domain is fixed to {true, false}. There are many formats for representing a SAT formula; among them, Conjunctive Normal Form (CNF) is the most familiar. A CNF is called a clausal formula, as it is a conjunction (\(\wedge\)) of a set of clauses \(\{C_1,C_2,\ldots ,C_m\}\). A clause is a disjunction (\(\vee\)) of literals. Writing l for a literal, a clause \(C_i\) can be written as \(C_i=l_1\vee l_2\vee \ldots \vee l_k\) with \(k\ge 1\). Therefore, a CNF formula is written as \(F=C_1\wedge C_2 \wedge \ldots \wedge C_m\). For a given CNF formula F, the SAT problem asks whether every clause can be satisfied by some assignment of truth values to the variables. To satisfy the formula F, every clause \(C_i\) must be true.

Let V(F) be the set of all variables of F. A structure \(S:V(F)\rightarrow \{0,1\}\) is an assignment. In local search, S maps every variable of F to a Boolean value, which is why it is called a complete assignment. In other words, S assigns a truth value from {true, false} (i.e. 1 or 0) to each variable in V(F). If there exists an assignment S such that \(S(F)=1\), then the clausal formula F is called satisfiable. A CNF is a tautology if \(S(F)=1\) for every assignment S. A clause containing only one literal is called a \(unit\;clause\). A CNF is unsatisfiable if two complementary \(unit\;clauses\) \(l_i\) and \(\lnot l_i\) appear in it.

A SAT instance thus consists of n variables and m clauses (i.e. \(|V|=n,|C|=m\)). The clause-to-variable ratio r is defined as \(r=m/n\) (e.g. \(r=2\) means there are twice as many clauses as variables). Variables can be neighbors of each other: a variable is a neighbor of another variable if both appear together in at least one clause. Let a and b be two variables. The neighborhood of variable a is defined as \(N(a)=\){b | b appears in at least one clause together with a}, so N(a) contains all neighbors of a. In this paper, we use these terms as required.
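To make the neighborhood definition concrete, the following minimal C++ sketch (our own illustrative code, using a DIMACS-style signed-literal clause representation that the paper does not prescribe) builds N(v) for every variable:

```cpp
#include <cstdlib>
#include <set>
#include <vector>

// Build the neighborhood N(v) of every variable from a CNF given as a
// list of clauses, each clause a list of signed literals (DIMACS style:
// +v encodes x_v, -v encodes the negation of x_v). Two variables are
// neighbors iff they occur together in at least one clause.
std::vector<std::set<int>> buildNeighborhoods(
    const std::vector<std::vector<int>>& clauses, int numVars) {
  std::vector<std::set<int>> N(numVars + 1);  // variables are 1..numVars
  for (const auto& clause : clauses)
    for (int li : clause)
      for (int lj : clause) {
        int vi = std::abs(li), vj = std::abs(lj);
        if (vi != vj) N[vi].insert(vj);
      }
  return N;
}
```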

Local Search Solvers

Since the early ’90s, SAT has become much more popular because of its various practical applications, and many state-of-the-art solvers date from that time. Every year, the SAT community holds an idea-sharing competition in which many solvers compete, which creates further opportunities to discover new solvers and to enhance solver performance. There are many complete solvers [8, 17,18,19,20], which are mainly tree-based methods for solving SAT. In this paper, we focus on local search solvers [14, 21,22,23,24,25,26,27,28] only, as they are much more scalable. Solvers that use a preprocessor [12, 29, 30] also perform well.

GSAT [22] was the first stochastic local search (SLS) solver. SLS solvers start with a random complete assignment in an initialization phase. After that, the basic local search loop repeatedly picks a variable to flip until a satisfying assignment is found or some other terminating condition (e.g. time-out, maximum iterations, etc.) is reached. Selecting the variable to flip is a crucial issue, and many heuristics have been proposed for it. GSAT picks a variable with the highest score, see Definition 1. As SLS solvers work on complete assignments, they are more scalable than complete solvers; however, being incomplete, they cannot prove that a SAT formula is unsatisfiable.

Definition 1

score: The score of a variable is the difference between its make and break if it is flipped. The break of a variable is the number of clauses that become unsatisfied if that variable is flipped; the make is the number of clauses that become satisfied if it is flipped.
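As an illustration of Definition 1, here is a brute-force C++ sketch (our own; real SLS solvers maintain these counts incrementally) that computes make, break, and score for one candidate flip:

```cpp
#include <cstdlib>
#include <vector>

// assignment[v] is the current truth value of variable v (1..numVars).
// A signed DIMACS literal l is true iff its sign matches the assignment.
static bool litTrue(int l, const std::vector<bool>& assignment) {
  return (l > 0) == assignment[std::abs(l)];
}

// score(v) = make(v) - break(v) for flipping variable v.
// make:  clauses that would go from unsatisfied to satisfied.
// break: clauses that would go from satisfied to unsatisfied.
int score(int v, const std::vector<std::vector<int>>& clauses,
          const std::vector<bool>& assignment) {
  int make = 0, brk = 0;
  for (const auto& c : clauses) {
    int trueLits = 0;
    bool containsV = false, vLitTrue = false;
    for (int l : c) {
      if (litTrue(l, assignment)) ++trueLits;
      if (std::abs(l) == v) { containsV = true; vLitTrue = litTrue(l, assignment); }
    }
    if (!containsV) continue;
    if (trueLits == 0) ++make;                  // flipping v satisfies it
    else if (trueLits == 1 && vLitTrue) ++brk;  // v's literal was the only support
  }
  return make - brk;
}
```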

Another family of solvers is based on WalkSAT [27, 28, 31]. Since GSAT is a completely greedy approach, WalkSAT introduced a noise parameter that reduces some of this greediness. Setting the noise parameter and breaking ties in the variable selection phase remain challenging.

The cycling problem is one of the biggest problems in local search [32] and hinders search performance. Several strategies (e.g. tabu search [33]) have been proposed to mitigate it, and it is also a major concern in the field of SAT. A novel strategy named \(configuration\; checking\) (CC), see Definition 2, has been applied successfully to different problems [34, 35], including SAT solvers such as CCASat [25], swcc [36], and CCAnr [14]. Similarity checking, one of our previous works, was also applied to reduce this problem [37]; that approach is not integrated into this paper, as it does not perform well after equivalence removal.

Definition 2

\(configuration\;checking\; (CC):\) a strategy that prevents a variable from being flipped until at least one of its neighboring variables has been flipped since the last flip of that variable.
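A minimal sketch of how the CC rule can be realized with a Boolean flag per variable; the struct and field names are our own assumptions, not CCAnr's actual implementation:

```cpp
#include <vector>

// Minimal configuration-checking (CC) bookkeeping.
// confChanged[v] == true means v is allowed to be flipped; the flag is
// cleared when v itself is flipped and set again as soon as one of v's
// neighbors is flipped.
struct ConfigurationChecking {
  std::vector<bool> confChanged;
  const std::vector<std::vector<int>>& neighbors;  // neighbors[v] = N(v), size numVars+1

  explicit ConfigurationChecking(const std::vector<std::vector<int>>& nbr)
      : confChanged(nbr.size(), true), neighbors(nbr) {}

  bool canFlip(int v) const { return confChanged[v]; }

  void onFlip(int v) {
    confChanged[v] = false;   // v is now locked by CC
    for (int u : neighbors[v])
      confChanged[u] = true;  // its neighbors become unlocked
  }
};
```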

The CCAnr Solver

In CCAnr [14], two types of variables are introduced: CCD variables, see Definition 3, and SD variables, see Definition 4. A pickVar function in CCAnr returns the variable to be flipped. According to Algorithm 1, it first checks whether a CCD variable exists and, if so, returns the one with the highest score. Otherwise, it checks whether an SD variable exists and does the same. Searching for an SD variable can be viewed as an aspiration step in local search. This combination is called the CCA heuristic in CCAnr.

If neither kind of variable is found, the solver performs a focused random walk as its diversification mode. Here, a clause weighting technique is applied to diversify the search; the core idea is to increase the weights of falsified clauses when the search stagnates. In CCAnr, the weight of every unsatisfied clause is first increased by one. After that, the average clause weight \({\overline{w}}\) is calculated. If \({\overline{w}}\) is greater than a threshold \(\gamma\), all clause weights are smoothed as \(w(c_i) := {\lfloor \rho \cdot w(c_i) \rfloor } + {\lfloor (1-\rho ){\overline{w}} \rfloor }\), where \(\gamma\) and \(\rho\) are the threshold and factor parameters, respectively. Finally, the solver picks a random unsatisfied clause and returns its most aged variable.
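The weighting scheme just described can be sketched in C++ as follows, with gamma and rho being the threshold and factor parameters from the text and the data layout being our own assumption:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One round of the clause-weighting scheme described above (sketch).
// 1) bump the weight of every currently unsatisfied clause by one;
// 2) if the average weight exceeds the threshold gamma, smooth every
//    weight as w := floor(rho*w) + floor((1-rho)*avg).
void updateClauseWeights(std::vector<long long>& weight,
                         const std::vector<bool>& satisfied,
                         double gamma, double rho) {
  long long total = 0;
  for (std::size_t c = 0; c < weight.size(); ++c) {
    if (!satisfied[c]) ++weight[c];
    total += weight[c];
  }
  const double avg = static_cast<double>(total) / weight.size();
  if (avg > gamma)
    for (auto& w : weight)
      w = static_cast<long long>(std::floor(rho * static_cast<double>(w))) +
          static_cast<long long>(std::floor((1.0 - rho) * avg));
}
```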

CCAnr is a two-mode SLS solver: a greedy mode, in which CCD or SD variables are searched for, and a diversification mode. Other two-mode SLS solvers can be found in [21, 38, 39]. In CCAnr, \(unit\; propagation\) is included only as preprocessing; it takes very little time, so the overall solver performance is not degraded. This suggests that performance can be improved further by more effective preprocessing, which is why we selected CCAnr for our improvements.

Definition 3

\(configuration\;changed\;decreasing \;(CCD):\) a variable whose configuration has changed (i.e. it is not locked by CC) and that has a positive score (i.e. its make is greater than its break).

Definition 4

\(significant\;decreasing\; (SD):\) a variable v whose configuration has not changed (i.e. it is locked by CC) but that has a significant score, namely \(score(v) > {\overline{w}}\), where \({\overline{w}}\) is the average clause weight.

Algorithm 1: The pickVar procedure of CCAnr

Further Study

Path-Relinking (PR) [40], an intensification strategy, is another efficient technique and performs well in MAX-SAT [41]. Although path-relinking is well known for various optimization problems, it cannot be applied directly to MAX-SAT. A greedy randomized adaptive technique has been combined with path-relinking to improve search in MAX-SAT [42, 43]. IPBMR (Iterated Path-Breaking with Mutation and Restart) is another well-known mechanism that outperforms several popular solvers [44]; that work also describes the difficulties of applying path-relinking directly to MAX-SAT. Recent work suggests that a parallel portfolio-based local search works very well when the intensification and diversification states are controlled by path-relinking [45]. To the best of our knowledge, PR has not been applied to structured SAT instances. Therefore, we decided to implement a new strategy similar to PR, though not identical, to test its efficiency. The basic idea of PR is given in Definition 5.

Definition 5

Path-Relinking (PR): a strategy that finds new high-quality solutions by connecting two existing high-quality solutions, one called the guiding solution and the other the initial solution.

Greediness is a very common phenomenon in local search: long sequences of operations are performed greedily. Such a search performs well at the beginning but deteriorates later, a typical scenario in max–min search. A greedy search reaches a local optimum quickly but then often gets stuck before reaching the global optimum. Many strategies are applied to counter this (e.g. simulated annealing [46], clause weighting schemes [14, 38], variable weighting [47], etc.); a study of greedy methods for SAT can be found in [48]. Adding some randomness is one way to reduce complete greediness. The Monte Carlo method [49] is a useful approach for various problem classes, such as optimization, that uses randomness to solve a problem. Its basic pattern is as follows: first, inputs are generated randomly from a set of possible inputs using a probability distribution; second, a deterministic computation is performed on those inputs; finally, the results of these computations are combined into an overall result.

Our Approach

In this section, we describe how to detect and remove equivalences from the CNF. Then, some other heuristics that can improve solver performance are described.

Equivalence Removal

Suppose that we have the following two clauses:

$$\begin{aligned}&C_1:a \vee \lnot b\\&C_2:\lnot a \vee b \end{aligned}$$

Now, to satisfy clause \(C_1\), we can set \(a=1\) or \(b=0\). But if we take \(a=1\), then b must be true to satisfy \(C_2\). Similarly, we can satisfy both \(C_1\) and \(C_2\) by setting \(a=0\) and \(b=0\). Therefore, the two variables always take the same value, and we call them equivalent variables. To find equivalent variables, we only need to consider clauses of size 2. Algorithm 2 describes how to find these variables; more about this algorithm can be found in [15].

Algorithm 2: Detecting equivalent variables from binary clauses

To find equivalences, only two-literal clauses are considered in our approach; an example of such clauses was given above. In lines 2 to 4, Algorithm 2 considers only the unmarked clauses among all clauses that have exactly two literals (a clause is marked once its variables have been found equal during equivalence detection, so that it is not used again). Whenever two variables are found to be equal, the two corresponding clauses are marked so that they cannot take part in future detection. If an unmarked clause is found, lines 6 and 7 store its two literals. Line 10 checks that the two literals \(l_1\) and \(l_2\) have different polarities, since only mixed-polarity clauses matter here. At line 12, \(S_1\) is the set of all two-literal clauses containing a literal of variable \(x_1\), and line 13 builds the same set for variable \(x_2\). Intersecting them therefore produces the set of clause numbers that contain literals of both \(x_1\) and \(x_2\); at line 14, \(S_3\) is this set of common clauses. Now, if clause \(C_i\) has the two literals \((\lnot l_1,l_2)\), then a second clause \(C_j\) with literals \((l_1, \lnot l_2)\) is required to treat the variables as equivalent; similarly, if \(C_i\) is \(( l_1,\lnot l_2)\), a second clause \(C_j\) with literals \((\lnot l_1,l_2)\) is needed. The search for such a second clause is done in lines 15 to 20: line 18 looks for it and, once found, \(C_j\) is marked. Once equality is confirmed, the two variables are pushed into a queue as a pair, to be grouped later into equivalent sets.
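The detection idea can be condensed into the following C++ sketch; it implements the same complementary-clause test as Algorithm 2 but uses a set-based index of normalized binary clauses, which is our simplification rather than the paper's exact data structures:

```cpp
#include <algorithm>
#include <cstdlib>
#include <set>
#include <utility>
#include <vector>

// Detect equivalent variable pairs from binary clauses: (!a v b)
// together with (a v !b) forces a == b. Clauses use signed DIMACS literals.
std::vector<std::pair<int, int>> detectEquivalences(
    const std::vector<std::vector<int>>& clauses) {
  std::set<std::pair<int, int>> binaries;  // normalized binary clauses
  for (const auto& c : clauses)
    if (c.size() == 2)
      binaries.insert({std::min(c[0], c[1]), std::max(c[0], c[1])});

  std::vector<std::pair<int, int>> equal;
  for (const auto& [l1, l2] : binaries) {
    if ((l1 > 0) == (l2 > 0)) continue;  // only mixed polarity can encode equality
    // the complementary clause must also be present
    std::pair<int, int> comp{std::min(-l1, -l2), std::max(-l1, -l2)};
    // the lexicographic test reports each {clause, complement} pair once
    if (std::make_pair(l1, l2) < comp && binaries.count(comp))
      equal.push_back({std::abs(l1), std::abs(l2)});
  }
  return equal;
}
```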

Many variables can be equivalent to each other. For example, if we have the two equalities \(x_1=x_2\) and \(x_2=x_3\), then we can deduce that \(x_1=x_3\). This is the transitive property of equality, written as follows:

If \(a = b\) and \(b = c\), then \(a = c\).

Finding transitive equivalences is the key to removing equivalence from the CNF. We use an additional data structure to find the equivalence chains, and Algorithm 3 describes how transitive equivalences are found. At first, the parent of each variable is set to the variable itself. At lines 4 and 5, the first equivalent pair is popped from the queue. Two equivalent variables should have the same parent, since they belong to the same set. Here, \(parent\_u\) and \(parent\_v\) at lines 7 and 8 store the parents of the two variables u and v, respectively. If the two parents differ, we scan the parents of all variables: whenever the parent of a variable equals \(parent\_v\), we replace it with \(parent\_u\). In this way, all variables belonging to the same set end up with the same parent. Algorithm 3 is time-consuming because the parents of all variables are checked in lines 10–12 every time a pair is dequeued. A more efficient algorithm can be used to speed up the computation of equivalent sets: the disjoint-set (union-find) data structure is a natural alternative to our algorithm.

Algorithm 3: Grouping equivalent variables into sets
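Since the disjoint-set structure is suggested above as a faster alternative to Algorithm 3, here is a textbook union-find sketch with path compression (standard code, not the paper's implementation):

```cpp
#include <numeric>
#include <vector>

// Standard disjoint-set (union-find) with path compression: a faster
// alternative to Algorithm 3 for grouping equivalent variables.
struct DisjointSet {
  std::vector<int> parent;

  explicit DisjointSet(int n) : parent(n + 1) {
    std::iota(parent.begin(), parent.end(), 0);  // each variable is its own parent
  }
  int find(int v) {            // find the representative, compressing the path
    return parent[v] == v ? v : parent[v] = find(parent[v]);
  }
  void unite(int u, int v) {   // merge the classes of u and v
    parent[find(u)] = find(v);
  }
};

// Usage: for each detected pair (u, v), call ds.unite(u, v); afterwards
// all variables with the same ds.find(...) belong to one equivalence class.
```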

In this research, removing equivalent variables before searching for a solution is the key point. There are several cases to handle when removing equivalences from the CNF, listed below:

Case 1: If two variables (e.g., \(a,\; b\)) are equal and a clause is \((\lnot a,\; b)\) or \((a,\; \lnot b)\), then we can delete the clause: substituting \(b=a\) turns the clause into \((\lnot a, \; a)\), which is always true. Similarly, any clause of size larger than 2 that contains \((a,\; \lnot a)\) after substitution is also true.

Case 2: Consider a clause of size 2 of the form \((a, \;b)\) or \(( \lnot a, \; \lnot b)\). Substituting \(b=a\) turns it into \((a,\; a)\) or \((\lnot a, \;\lnot a)\), which is effectively a unit clause. Since \((a,\; a)=a\), we can treat that clause as a unit clause for \(unit\_propagation\).

Case 3: When two variables are equivalent but neither Case 1 nor Case 2 applies, the second variable is replaced by the first one; if the clause then contains a duplicate literal, the duplicates are merged and the clause size is reduced by one.

Algorithm 4: Removing equivalences from the CNF

Algorithm 4 describes how equivalences are removed from the CNF. Lines 6–8 check whether a clause \(c_i\) containing the equivalent variable \(eq\_var\) has already been deleted. If \(c_i\) is not deleted, the position of \(eq\_var\) in it is found in lines 10–13. Line 16 checks whether the replacement variable also occurs in \(c_i\), i.e. whether both \(eq\_var\) and \(replace\_var\) are present in \(c_i\). Case 1 and Case 2 are handled at lines 18–19 and lines 20–23, respectively, and Case 3 is applied in lines 25–26.
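For illustration, the three cases can be applied to a single clause as in the following C++ sketch; the representation and the return convention are our own assumptions, not the paper's exact Algorithm 4:

```cpp
#include <cstdlib>
#include <vector>

// Rewrite one clause under the substitution b := a (both positive
// variable ids), applying the three cases from the text.
// Returns false if the clause became tautological and should be deleted
// (Case 1); otherwise the rewritten clause is left in 'clause'
// (a one-literal result is a unit clause, Case 2; otherwise Case 3).
bool substituteEquivalent(std::vector<int>& clause, int a, int b) {
  std::vector<int> out;
  for (int l : clause) {
    int repl = (std::abs(l) == b) ? (l > 0 ? a : -a) : l;  // map +-b to +-a
    bool dup = false;
    for (int o : out) {
      if (o == repl) dup = true;   // (a v a)  -> keep one copy
      if (o == -repl) return false;// (a v !a) -> tautology, delete clause
    }
    if (!dup) out.push_back(repl);
  }
  clause = out;
  return true;
}
```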

Step Re-linking

The path relinking (PR) strategy is a well-known concept in many problems, especially MAX-SAT. It is mainly used to intensify the search, obtaining new trajectories by connecting two local minima. The saved local minima, found earlier by different search strategies (e.g. tabu search, simulated annealing, etc.), are called elite solutions. Two elite solutions, named the initial and the guiding solution, are picked randomly, and path relinking is performed between them. If the distance between those solutions is large, a lot of the intervening search space has been ignored, and path relinking tries to relink the two solutions to produce a new, possibly better one. During the relinking process, only the elements on which the solutions differ are considered and changed. In this paper, we implement a new strategy named step relinking (SR), inspired by PR. Here, we save local minima as elite solutions. When the search falls into a local minimum or stagnates, we try to diversify it; but at that point most clauses are already satisfied, and finding the best assignment is a hard task. So, when the search steps into the diversification state, we save the solution, and if the search cannot escape that state after a certain number of steps, we perform a relinking to obtain a better solution. We now describe how this strategy is added to CCAnr and how it differs from the typical implementation.

Since PR is applied between two solutions, we first explain how elite solutions are collected during the search. Algorithm 5 describes the procedure for adding solutions to the elite array. We save the current solution, along with its step (iteration) number, whenever the search is stuck. The \(elite\_pointer\) indicates the index of the last saved solution in the elite array, and we always keep the few most recent local minima. At line 1, if \(elite\_pointer\) has reached \(max\_elite\_pointer\) (the maximum number of solutions we keep as elite solutions), then we know we have enough solutions; \(elite\_memory\_full\) is set to true once the maximum number of solutions has been stored. To keep the most recent local minima, \(elite\_pointer\) then restarts from 0, like a circular queue, so the oldest elite solutions are overwritten. At lines 6–7, the solution is saved in the elite array. An \(elite\_iteration\) array, at line 8, stores the iteration number of the saved solution; we need it because we do not want to save every stagnant solution.

Algorithm 5: Adding solutions to the elite array
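A minimal sketch of the elite-array bookkeeping of Algorithm 5, with our own field names; once the buffer is full, it wraps around like a circular queue and overwrites the oldest entry:

```cpp
#include <vector>

// Circular elite-solution store (sketch of Algorithm 5's bookkeeping).
struct EliteStore {
  std::vector<std::vector<bool>> elite;   // saved local-minimum assignments
  std::vector<long long> eliteIteration;  // step at which each was saved
  int elitePointer = 0;
  bool eliteMemoryFull = false;
  const int maxElite;

  explicit EliteStore(int maxElitePointer)
      : elite(maxElitePointer), eliteIteration(maxElitePointer, 0),
        maxElite(maxElitePointer) {}

  void save(const std::vector<bool>& solution, long long iteration) {
    if (elitePointer == maxElite) {  // buffer full: wrap and overwrite oldest
      eliteMemoryFull = true;
      elitePointer = 0;
    }
    elite[elitePointer] = solution;
    eliteIteration[elitePointer] = iteration;
    ++elitePointer;
  }
};
```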

Now we describe how the SR algorithm works; Algorithm 6 gives the main procedure. Two previous solutions, an initial and a guiding one, are needed for relinking. We use \(cur\_soln\) as the initial solution, and a guiding solution is picked randomly from the elite array. At line 1, if we have fewer solutions than the maximum, we pick a solution randomly within that range; otherwise, at line 4, we pick randomly among all saved solutions. Because the randomly picked solution could be identical to the current one, lines 5–6 are introduced to re-pick in that case. We then compare all variables of the initial and guiding solutions and, following the PR strategy, consider only the variables on which their values differ. For each such variable we check its score, and after checking all of them we return the variable with the highest score.

Algorithm 6: The step relinking (SR) procedure
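The core relinking step of Algorithm 6 can be sketched as follows; the callback-based scoring interface is our own assumption:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Sketch of the core relinking step: among all variables on which the
// current (initial) and guiding solutions disagree, pick the one with
// the highest score. Returns -1 if the two solutions are identical.
int pickRelinkVariable(const std::vector<bool>& current,
                       const std::vector<bool>& guiding,
                       const std::function<int(int)>& scoreOf) {
  int best = -1, bestScore = 0;
  for (std::size_t v = 1; v < current.size(); ++v) {  // variables 1..n
    if (current[v] == guiding[v]) continue;           // only differing values
    int s = scoreOf(static_cast<int>(v));
    if (best == -1 || s > bestScore) { best = static_cast<int>(v); bestScore = s; }
  }
  return best;
}
```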

The SR approach within CCAnr is described in Algorithm 7; the procedures for adding solutions to the \(elite\_array\) and for SR itself were described above. We perform the SR approach during the diversification state (i.e. the focused random-walk mode). At line 6, an interval check on the last saved solution avoids saving every stagnant solution. SR cannot be performed without enough elite solutions, which is why line 8 is introduced. As the SR approach is time-consuming, we do not perform it at every step of the diversification state, but only after at least \(\gamma\) iterations.

Algorithm 7: CCAnr with step relinking

Random Sampling (RS) Approach

As mentioned earlier, the pickVar function returns a variable to be flipped, and it first checks whether a CCD variable exists. If CCD variables exist, it returns the one with the highest score; thus a variable is selected greedily in pickVar. Here, we introduce some randomness instead of a fully greedy selection, since the greedy approach does not guarantee reaching an optimal solution. Because many CCD variables can exist during the search, we take a random sample, with a certain probability. This method is related to the Monte Carlo method [49], where a random sample is drawn from the set of all CCD variables. We now describe how the RS method corresponds to the Monte Carlo method [49].

Algorithm 8: Random sampling (RS) variable selection

Although the Monte Carlo method varies from problem to problem, it always follows a basic pattern. First, a set of possible inputs is defined; in Algorithm 8, the CCD variables are the inputs. Second, a random sample is drawn, with a certain probability, from this input set. After that, a deterministic computation produces the result; in our case, after taking \(\alpha\) random variables, we select the variable that has the highest score.
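A sketch of the RS step under our own naming assumptions: once sampling is triggered, draw \(\alpha\) CCD variables uniformly at random and return the best-scoring one of the sample:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Random Sampling (RS) sketch: instead of greedily scanning every CCD
// variable, draw alpha of them uniformly at random and return the
// best-scoring variable of the sample.
// Precondition: ccdVars is non-empty; score[v] is the current score of v.
int pickBySampling(const std::vector<int>& ccdVars, int alpha,
                   const std::vector<int>& score, std::mt19937& rng) {
  std::uniform_int_distribution<std::size_t> pick(0, ccdVars.size() - 1);
  int best = ccdVars[pick(rng)];
  for (int i = 1; i < alpha; ++i) {
    int v = ccdVars[pick(rng)];
    if (score[v] > score[best]) best = v;
  }
  return best;
}
```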

Experimental Result

We implemented all of the algorithms described above in \(C++\) and compiled them with \(g++\) using the \(-O3\) option. All of our experiments were performed on a GNU/Linux machine with a four-core Intel i7 @2.40GHz and 8 GB of RAM. We ran each instance ten times with a time cut-off of 1000 s. All solving times shown in the figures and tables are averages over the ten runs.

The Benchmarks

Here, we work with four different types of benchmarks to evaluate our algorithms. The first type, parity_games, is taken from the SAT 2010 competition. The other types of instances are collected from SATLIB. Table 1 shows the number of instances of each type. Instances that contain equivalence gates are preferred for the discussion. We present the instances that either the state-of-the-art solver CCAnr [14] or our new solvers can solve.

Table 1 The detailed description of benchmarks

Evaluation of Equivalence Removal

In this subsection, we first describe the effect of removing equivalences from a CNF, as explained in the “Equivalence Removal” section. After that, the effects of the other methods, described in the “Random Sampling (RS) Approach” and “Step Re-linking” sections, are shown as well.

When we remove equivalences from a CNF, many clauses and variables are removed or fixed. A fixed variable needs no further computation or assignment during the search, and a fixed clause no longer needs to be satisfied during the search. The number of equivalence gates is problem-specific: different CNFs may contain very different numbers of equivalence gates.

Table 2 Effects of Equivalence Removal

Table 2 shows the effect of equivalence removal on a CNF. Suppose a CNF has \(\alpha\) variables and that \(Unit\_propagation\) removes \(U_\alpha\) of them, so \(\alpha - U_{\alpha }\) variables remain. Suppose that performing \(equivalence\_removal\) then fixes \(E_\alpha\) further variables. The percentage of variables fixed by equivalence removal is given by Eq. 1.

$$\begin{aligned} FV= \frac{E_{\alpha }}{\alpha -U_{\alpha }}\times 100\% \end{aligned}$$
(1)

For example, consider the instance par16-1.cnf, which has 1015 variables in total. Performing \(Unit\_propagation\) fixes 408 variables, leaving 1015 − 408 = 607. Employing \(equivalence\_removal\) then fixes about 273 more of these 607 variables, so the percentage of variables fixed by equivalence removal is 273/607 = 44.98%. The percentage of fixed variables (FV%) is thus computed relative to the number of variables remaining after \(Unit\_propagation\). For the parity_games instances, however, no variable is fixed by \(Unit\_propagation\) at all. A similar calculation applies to clause reduction.

Table 3 Time required to remove equivalence from CNF

Table 3 shows the average time required to remove equivalences from CNFs of each type. A negligible amount of time is needed, so running the removal before the search starts consumes very little of the total time budget. This time could grow for very large formulas, but many techniques with efficient data structures could be applied there.

Comparing EQ with the State-of-the-Art Solver

In this subsection, we show the comparison of equivalence removal (EQ) with the State-of-the-Art solver CCAnr [14].

Fig. 1: Time comparison in parity games instances with equivalence removal

On Parity Games Instances. Figure 1 depicts the average time comparison on parity games instances when all equivalences are removed from the CNF. Among the 11 instances, our new solver solves all of them, whereas CCAnr solves only 5; after removing equivalences, our solver therefore solves an additional 55% of the instances compared to CCAnr. Figure 1a presents the average time, on a logarithmic scale, needed by both solvers. To solve instance number 5, CCAnr takes 733 s whereas our new solver takes only 100 s, so our solver performs 87% better there. Our solver is also faster on instances 1 and 3; on the remaining instances it is comparable to CCAnr. Figure 1b shows the average time taken to solve the instances that CCAnr cannot solve within the time cut-off. It is clear that our new solver CCAnrEQ outperforms CCAnr on this type of instance.

Fig. 2: Time comparison in parity16 instances with equivalence removal

On Parity16 Instances. There are five parity16 instances, and the average time comparison is shown in Fig. 2. Our solver is faster than CCAnr on every instance. On instance 5, CCAnr takes 155 s whereas our new solver takes only 40 s, i.e. 75% less time. Our solver takes at most 82 s, on instance 3, whereas CCAnr takes 110 s there, i.e. 25% less time in the worst case. We conclude that removing equivalences from the problem improves the results considerably on many instances.

Fig. 3: Time comparison in BMC instances with equivalence removal

On BMC Instances. Figure 3 shows the time comparison, on a logarithmic scale, for BMC instances. Instance 3 takes 160 s with CCAnr, whereas our new solver solves it within a second, i.e. in 99% less time. On the other hand, CCAnr alone cannot solve instance 1 at all within the time cut-off, but after removing equivalences it can be solved within 200 s. On the other instances our solver is comparable, as they all need less than a second.

Fig. 4: Time comparison in QG instances with equivalence removal

On QG Instances. For QG instances, the average time comparison on a logarithmic scale is shown in Fig. 4. Looking at instance number 4, our solver takes only a second whereas CCAnr alone takes 100 s. The new solver also performs better on instance 1; on the other instances, both take almost the same time.

These results show that removing equivalences improves the search considerably, since many clauses and variables are eliminated. In total, after removing equivalences, our solver solves 7 new instances that CCAnr alone cannot solve, and on the other instances our search is faster.

Table 4 Success rate and time comparison of different types of instances

Table 4 reports the success rates and time comparison of both solvers on the different types of instances. After removing equivalences, all these instances are solved with a 100% success rate. Most notably, on the parity games instances the improvement is large: our solver solves 100% of the instances, whereas CCAnr solves only 45%. In terms of average time (in seconds), our solver also solves the instances more quickly. Therefore, our solver outperforms CCAnr in both success rate and average time.

Fig. 5: Number of solutions found within time interval

Figure 5 depicts the total number of instances that can be solved under different time cut-offs. For every type of instance, our new solver is better at any cut-off. In Fig. 5a, with a cut-off of one second, our solver solves 75% of the instances whereas CCAnr solves 50%; with a cut-off of 300 s, the new solver solves 100% of the instances while CCAnr still cannot. In Fig. 5b, with a cut-off of 300 s, the new solver solves 55% of the instances whereas CCAnr solves only 36%, and CCAnr solves no additional instances even when given more than 800 s. In Fig. 5c, for parity16 instances, our solver needs only 100 s to solve 100% of the instances, whereas the other one solves 60% in that time; with a cut-off of 1 s, our solver solves 100% of the instances while CCAnr solves only 75%.

Further Methods

In this subsection, we explain the impact of some further methods added to CCAnr [14]. These methods are part of our future improvements; we added them to examine whether they can improve the performance of the solver. We carried out the experiments evaluating these methods on the same benchmarks.

The RS and SR Methods

Random Sampling (RS) and Step Relinking (SR) are described in the “Random Sampling (RS) Approach” and “Step Re-linking” sections, respectively. They are two alternatives to the greedy heuristics of CCAnr. The comparative results are shown in Table 5, which reports average times in seconds and success rates. In terms of success rate, both approaches equal CCAnr. In terms of time, however, SR does not perform well: it takes more time than the other two, so we decided not to pursue this approach further. On the other hand, RS sometimes yields good results: Table 5 shows that the RS approach takes less time on the BMC and QG instances.

Table 5 Comparison with other approaches

Since the RS approach performs better on several instances, we also tried it together with equivalence removal. Table 6 shows the effect of adding RS to equivalence removal in terms of solving time. We observe that the performance of RS with EQ is better on the BMC and QG instances.

Table 6 Time comparison of RS with equivalence removal

Figure 6 presents a time comparison on parity games instances, reporting the minimum time needed to solve each instance across the ten runs. We observe that on every instance, RS with EQ reaches the solution more quickly than EQ alone. Although EQ with RS is not better in terms of average time, in terms of minimum time it is far better. The comparison for the other parity games instances is not shown here, because the time required to solve them is very small.

Fig. 6: Time comparison in parity games instances with EQ and RS

Two parameters are used in SR: the size of the elite array (the number of elite solutions to be saved), \(max\_elite\_pointer\), and the number of iterations, \(\gamma\). We set \(max\_elite\_pointer=40\) and \(\gamma =100\). In RS, we sample \(\alpha =20\) random CCD variables.

In summary, the experimental results demonstrate that removing equivalences from the problem not only eliminates many variables and clauses but also makes the search perform better. The CCAnr solver already performs unit propagation before the local search; we propose adding an equivalence removal step there to improve CCAnr.

Conclusion

In conclusion, this paper proposes a binary equivalence removal technique for SAT that works on the input file to reduce the size of a CNF. To implement the idea, two algorithms, \(equivalence\;detection\) and \(equivalence\;removal\), are added: the detection part determines whether any equivalent variables exist, and the removal part eliminates those equivalences. This technique removes many variables from the CNF and produces a reduced input, so many variables never need to be assigned during the search.

We examined the performance on different structured SAT instances. Based on this technique, our new solver CCAnrEQ can solve many hard structured instances that remained unsolved by the existing state-of-the-art solver CCAnr. In addition, on other instances, our solver is significantly faster than the original version.

As part of future work, we would like to implement equivalence removal for all clauses rather than binary clauses only. Efficient preprocessing can boost solver performance many times over. Therefore, we believe that by extracting other gates (e.g., \(AND, \;OR,\; NAND\), etc.) and propagating their values, this solver can perform even better.