1 Paper overview

This paper is organized as follows. Section 2 is dedicated to the introduction, in which we give a general overview of the Max Sat problem and related literature. In particular, in this section, an elaborated description of MOCE and CCLS is provided.

In Sect. 3, MOCE and CCLS are combined into an improved algorithm. Instead of starting CCLS from a random initial assignment, it is started from excellent initial assignments, provided by MOCE. The experiments show that this provides a performance improvement, which becomes more and more significant as the instance grows. It has been noticed in other problems, such as TSP and QAP, that local search heuristics yield excellent results when started from initial solutions selected greedily with respect to expectation (Gutin and Yeo 2002; Gutin and Punnen 2006).

In Sect. 4, the correlation between the quality of the initial assignments provided to local search heuristics and the quality of the final assignments resulting from them is explored. We show that there is a strong long-lasting correlation between the quality of the initial assignment, from which the local search heuristic starts, and that of the final assignment provided by it.

The above correlation implies that, even in later stages of the local search, it is still in the shadow of the initial assignment. Thus, the quality of the initial assignment is crucial under practical time constraints. The observed correlation decays slower for denser instances, and faster for sparser ones. We show that the correlation is statistically significant, and estimate the impact of the improvement in the quality of the initial assignment on the quality of the final assignment.

A summary and conclusions are presented in Sect. 5.

2 Introduction

This section is dedicated to presenting the maximum satisfiability problem, related literature and the current state-of-the-art in the field. An elaboration on the method of conditional expectations, and the so-called configuration checking local search heuristic is provided as well.

In the maximum satisfiability (Max Sat) problem (Li and Manyà 2009), the input is a sequence of clauses over some boolean variables. Each clause is a disjunction of literals over different variables. A literal is either a variable or its negation. The goal is to find a truth (true/false) assignment for the variables, maximizing the number of satisfied (made true) clauses.

In the Max \(r\)-Sat problem, each clause is restricted to consist of at most r literals. The case in which each of the clauses consists of exactly r literals, is sometimes called Max E\(r\)-Sat. Denote by n the number of variables and by m the number of clauses. The density of the instance is \(\alpha = m/n\). As is customary in the literature, this paper focuses on the case where r and \(\alpha \) are constant.

As Max \(r\)-Sat (for \(r\ge 2\)) is NP-hard Ausiello et al. (2003, pp. 455–456), it cannot be exactly solved in polynomial time (assuming \(P\ne NP\)), and one must resort to approximation algorithms and heuristics. Numerous methods have been suggested for solving Max \(r\)-Sat, e.g. (de Boer et al. 2005; Selman et al. 1992, 1996; Luo et al. 2014; Chen and Santhanam 2015; Narodytska and Bacchus 2014; Ansótegui et al. 2013; Davies and Bacchus 2011; Heras et al. 2008; Niedermeier and Rossmanith 2000), and an annual competition of solvers has been held since 2006 (Argelich et al. 2020).

Various complete solvers for Max Sat have been developed during recent years, some of which were presented in several annual evaluations of Max Sat solvers (Argelich et al. 2020). Among these practical solvers, one can find Branch and Bound solvers [e.g., MaxSatz (Li et al. 2007) and Clone (Pipatsrisawat and Darwiche 2007)], Satisfiability based solvers [e.g., SAT4J (Le Berre and Parrain 2010) and QMaxSat (Koshimura et al. 2012)], and Unsatisfiablity based solvers [e.g., WPM1 (Ansótegui et al. 2012, 2009)], etc. Complete solvers which participated in the last evaluations include MaxHS (Davies 2013), Pacose (Paxian et al. 2018), EvalMaxSAT (Avellaneda 2020), and more. Other exact algorithms worth mentioning include the branching algorithm presented by Li et al. (2022) [which improves over the results of Chen and Kanj (2004)], and the algorithm presented by Xiao (2022).

Practical incomplete solvers for Max Sat are actively researched as well. Some of them competed in the incomplete track of the last evaluations—Loandra (Berg et al. 2017), TT-open-WBO-Inc (Nadel 2019), sls-mcs (Guerreiro et al. 2019), and more. More elaboration on randomized techniques can be found in Sect. 2.1. Elaboration on local search based solvers can be found in Sect. 2.2.

Overall, satisfiability related questions have attracted a lot of attention from the scientific community. Some of them were thoroughly studied, especially the satisfiability threshold density question (Chvátal and Reed 1992; Friedgut and Bourgain 1999; Achlioptas and Peres 2004; Mertens et al. 2006; Coja-Oghlan 2014; Ding et al. 2015). According to the phase transition conjecture (Achlioptas 2009), for each \(r\ge 2\), there exists a constant \(\alpha _r\), such that the probability of a random Max \(r\)-Sat instance over n variables and \(\alpha n\) clauses to be satisfiable converges to 1 if \(\alpha <\alpha _r\) and to 0 if \(\alpha >\alpha _r\), as n approaches infinity.

There has been a lot of work on the conjecture, including (Franco and Paull 1983; Chvátal and Reed 1992; Friedgut and Bourgain 1999; Mézard et al. 2002; Achlioptas and Peres 2004). The conjecture was proved by Ding et al. (2015) for sufficiently large r. Coja-Oghlan (2014) has shown that the threshold is \(\alpha _r = 2^r\ln 2-(1+\ln 2)/2+\varepsilon _r\), where \(\varepsilon _r \rightarrow 0\) as \(r\rightarrow \infty \). Some lower and upper bounds on the satisfiability threshold density, \(\alpha _r\), are provided in Table 1. Empirical results indicate that \(\alpha _3\approx 4.27\) and \(\alpha _4\approx 9.93\) (Mertens et al. 2006; Crawford and Auton 1996). For a comprehensive overview of the whole domain of satisfiability, we refer to (Biere et al. 2009).

Table 1 Theoretical lower and upper bounds on the satisfiability threshold density

For instances with density much above the satisfiability threshold, Coppersmith et al. (2004) studied the expected number of clauses satisfied by an optimal assignment, for a uniformly random instance of clause length r and density \(\alpha \). In particular, they provided theoretical lower and upper bounds on this number as the density goes to infinity.

An interesting study of Max \(3\)-Sat was provided in Prügel-Bennett and Tayarani-Najaran (2012). The authors claimed that many instances share similar statistical properties and provided empirical evidence for it. Simulation results on the autocorrelation of a random walk in the assignments space were provided for several instances, as well as extrapolation for the typical instance. Finally, a novel heuristic was introduced, ALGH, which exploits long-range correlations found in the problem’s landscape. This heuristic outperformed GSAT (Selman et al. 1992) and WSAT (Selman et al. 1996).

A slightly better version of this novel heuristic, based on clustering instead of averaging, is provided in another paper (Qasem and Prügel-Bennett 2010) of the same authors. This version turned out to outperform all the heuristics implemented by that time in the Sat solver framework UBCSAT (Tompkins and Hoos 2005).

Hoos et al. (2004) analyze how the method of generating random instances affects the autocorrelation and fitness-distance correlation. These quantities are considered fundamental to understanding the hardness of instances for local search algorithms. They raised the question of similarity of the landscape of different instances. Angel and Zissimopoulos (2000) calculated the autocorrelation coefficient of several problems and classified problem hardness accordingly.

Merz and Freisleben (1999) provide elaboration on correlations and on the way of harnessing them to design well-performing local search heuristics and memetic algorithms. The importance of selecting an appropriate neighborhood operator for producing the smoothest possible landscape was emphasized. For some landscapes, the autocorrelation length is shown to be associated with the average distance between local optima. This may be used to facilitate the design of mutations that lead memetic algorithms out of the basin of attraction of a local optimum they reached.

Using Walsh analysis (Goldberg 1988), an efficient way of calculating moments of the number of satisfied clauses of a given instance of Max \(r\)-Sat was suggested in Heckendorn et al. (1999). Simulation results for the variance and higher moments of the number of clauses satisfied by a random assignment over the ensemble of all instances were provided as well.

Sutton et al. (2009) show how to use the Walsh decomposition (Goldberg 1988) to efficiently calculate the exact autocorrelation function and autocorrelation length of any given instance of Max \(r\)-Sat. Furthermore, this decomposition is used to approximate the expectation of these quantities over the ensemble of all instances.

The autocorrelation length, which is closely related to the ruggedness of landscapes, is of interest in the area of landscape analysis (Malan and Engelbrecht 2013; Sutton et al. 2009; Angel and Zissimopoulos 2000; Hoos et al. 2004; Fontana et al. 1993; Angel and Zissimopoulos 2001; Chicano et al. 2012). It is fundamental to the theory and design of local search heuristics (Chicano et al. 2013; Merz and Freisleben 1999). According to the autocorrelation length conjecture (Stadler 2002), in many landscapes, the number of local optima can be estimated using an expression based on this quantity.

2.1 The method of conditional expectations

The simple randomized approximation algorithm, which assigns to each variable a uniformly random truth value, independently of all other variables, satisfies \(1-1/2^r\) of all clauses on the average. Furthermore, this simple algorithm can be easily derandomized using the well-known method of conditional expectations (MOCE, also known as Johnson’s Algorithm) (Erdös and Selfridge 1973; Yannakakis 1994; Coppersmith et al. 2004; Poloczek 2011; Poloczek and Williamson 2016; Poloczek et al. 2017; Costello et al. 2011), yielding an assignment that is guaranteed to satisfy at least this proportion of clauses.

In a sense, this method is optimal for Max \(3\)-Sat, as no polynomial-time algorithm for Max \(3\)-Sat can achieve a performance ratio exceeding 7/8 unless \(P=NP\) (Håstad 2001). Note that, typically, this method yields assignments that are much better than this worst-case bound.

MOCE iteratively constructs an assignment by going over the variables in some (arbitrary) order. At each iteration, it sets the seemingly better truth value to the currently considered variable. This is done by comparing the expected number of satisfied clauses under each of the two possible truth values it may set to the current variable.

For a given truth value, the expected number of satisfied clauses is the sum of three quantities. The first is the number of clauses already satisfied by the values assigned to the previously considered variables. The second is the additional number of clauses satisfied by the assignment of the given truth value to the current variable. The third is the expected number of clauses that will be satisfied by a random assignment to all currently unassigned variables. The truth value, for which the sum in question is larger, is the one selected for the current variable. Ties are broken arbitrarily or randomly. The whole process is repeated until all variables are assigned.

In an efficient implementation, each step of MOCE takes a constant time on the average. The main thing to do at each step is to find the better truth value for the currently assigned variable and residualize the instance accordingly. To find this truth value, we calculate the expected gain in case the variable is assigned true. If this gain is positive, the variable is assigned true. Otherwise, it is assigned false, as the gain in assigning the variable false is the additive inverse.

To find the expected gain from assigning the current variable true, it suffices to go over the clauses the variable appears in. Each unsatisfied clause, that is made satisfied by the assignment to the current variable, contributes \(2^{-l}\) to the overall expected gain, where l is the number of literals in the clause. In the residualization of the instance, these clauses are eliminated.

On the other hand, each clause, that remains unsatisfied by the assignment of true to the variable, contributes \(-2^{-l}\) to the overall expected gain. In the residualization of the instance, these clauses remain, but they are shortened by one literal—the one associated with the current variable.

The overall expected gain is defined as the sum of all the contributions obtained from all the clauses the current variable appears in. As each variable appears initially in \(r\alpha \) clauses on the average, the whole step of selecting and assigning a variable a truth value depends only on the clauses this variable appears in. Thus, assuming that r and \(\alpha \) are constants, a step takes a constant time on the average. Note, though, that in order to execute MOCE efficiently, its implementation requires two auxiliary data structures: a map from each variable to the list of clauses containing it and a map from each clause to the list of variables it contains.

The linear time complexity of MOCE makes it a good candidate for supplying an initial assignment to local search algorithms. It was chosen in this work due to its simplicity and efficiency.

Theoretical and empirical works related to MOCE, randomized algorithms and other algorithms of the same spirit, include (Coppersmith et al. 2004; Poloczek 2011; Poloczek and Williamson 2016; Poloczek et al. 2017; Costello et al. 2011).

2.2 Local search

Local search heuristics (Hoos and Stützle 2004) explore the assignment space. They usually start at a randomly generated assignment, and traverse the search space by flipping variables, usually one at a time. The leading solver configuration checking local search (CCLS) (Luo et al. 2014) follows this scheme and flips variables until some predefined number of flips is executed or the allotted time has been used up. Of course, if a satisfying assignment has been found, the execution is stopped as well. This work focuses on CCLS as a local search algorithm, due to its success in competitive evaluations.

CCLS performs two types of flips: random ones, with some predefined probability p, and greedy ones, with probability \(1-p\). Random flips just flip a randomly selected variable from a randomly selected unsatisfied clause. Greedy flips are ones that flip the seemingly best possible variable among all the variables whose configuration has been changed and who satisfy at least one currently unsatisfied clause. This variable is the one with the maximum score out of those variables, i.e., the one whose flipping will lead to the maximum number of satisfied clauses. Ties are broken uniformly at random.

Generally, the number of satisfied clauses after CCLS flips a variable is not necessarily larger than prior to the flip. In fact, it is even possible that flipping any of the candidate variables will result at a lower quality assignment. Also, the set of candidates may be empty in some of the greedy steps. In the latter case, CCLS performs a random flip instead.

In CCLS, a variable is considered as a “configuration changed" variable if, since its most recent flip, at least one of its neighboring variables has been flipped. Here, the neighbors of a variable are those variables sharing with it at least one clause.

Recent works, related to local search, configuration checking, CCLS, and algorithms of the same spirit, include (Pankratov and Borodin 2010; Cai and Su 2011; Cai et al. 2014; Luo et al. 2014, 2013; Abramé et al. 2017; Cai et al. 2015; Cai and Su 2013; Selman et al. 1994; Mills and Tsang 2000; Smyth et al. 2003; Cha et al. 1997).

Iterated Local Search (ILS) metaheuristics (Lourenço et al. 2019; Dong et al. 2013; Den Besten et al. 2001) execute multiple local searches, each time starting from a different initial assignment. A simple approach to iterated local search is to generate the initial assignments randomly. In this approach, there is no learning from the results of the previous local searches. A better approach is to analyze the assignments provided by previous searches in order to produce a better initial assignment for the current local search.

Cai et al. (2016) propose various ideas to apply local search to Partial Max Sat instances. Their ideas include weighting for hard clauses, separating the scores for hard and soft clauses, unit propagation with priority on hard unit clauses, and more. Hains et al. (2013) use Walsh polynomials and exact computation of hyperplane averages to construct initial solutions and to improve local search heuristics.

Martin and Otto (1996) combine Simulated Annealing with local search heuristics. To this end, they introduce a new metaheuristic. In this metaheuristic, they embed local search techniques into simulated annealing so that it explores only local optima. Bouhmala (2019) introduces an adapted version of the Kernighan–Lin algorithm for the maximum satisfiability problem. This adapted algorithm is embedded into the simulated annealing algorithm. The combined solver has been shown to provide a state-of-the-art performance.

3 Combining MOCE and CCLS

This section describes the improvement obtained by letting CCLS start its execution from good initial assignments, versus starting it from a random assignment (as done originally). Specifically, the good initial assignments used here are assignments provided by MOCE. We refer to the algorithm that starts from the assignment provided by MOCE as MOCE–CCLS. To emphasize the fact that the original CCLS algorithm starts from a random assignment, we will call it RAND–CCLS.

The first experiments are conducted on several families of random instances. Random Max \(r\)-Sat instances are constructed as follows. The clauses are selected independently of each other. Each clause is generated by selecting r distinct variables uniformly at random, then negating each of them independently with probability 1/2.

The families have been selected in a systematic way, so as to reveal trends in the performance, and connect it to the parameters of the family. Additional experiments are conducted on some public benchmarks. The experiments show that MOCE–CCLS scales much better than RAND–CCLS. In particular, as the instance size grows, so does the performance improvement provided by MOCE–CCLS over RAND–CCLS.

figure a

The points diagram above summarizes qualitatively the insights gained from the experiments. The higher the algorithm appears in the diagram, the better it is. Inspecting the diagram, one can see that MOCE–CCLS performs much better than MOCE, which in turn shows performance very far away from the baseline reference RAND. MOCE–CCLS performs better than RAND–CCLS as well. The last statement holds significantly for large instances, while for small and medium instances MOCE–CCLS maintains or slightly improves the performance of RAND–CCLS.

3.1 Comparative performance on structured benchmarks

This section focuses on random instances, for which the clauses are of length 3. The number of variables in each instance ranges from 10,000 to 1,000,000 in order to represent medium and large instances. The satisfiability threshold for Max \(r\)-Sat with \(r=3\) is about 4.27. In order to check cases both slightly below and above the satisfiability threshold, the density in the experiments ranges from 3 to 9. Such ranges allow us to systematically study the performance of the algorithms at hand on diverse families. For each family, 100 instances were selected uniformly at random, in the same way elaborated in Sect. 4.

Figure 1 describes the flow of the experiment. First a random instance is generated. Then two assignments are obtained: the first is generated randomly and the second is the result of a MOCE execution. For each such assignment CCLS is executed, given the instance and the assignment as input. This process is repeated 100 times (to reduce variability) and the performance of each algorithm is taken as the average number of unsatisfied clauses.

Fig. 1
figure 1

The flow of the experiment

For Max \(r\)-Sat, the reference baseline RAND unsatisfies \(m/2^r\) clauses on average, with an approximate standard deviation of \(\sqrt{m(1-1/2^r)/2^r}\) clauses (Berend and Twitto 2016). For convenience, the (theoretical) average number of clauses unsatisfied by RAND for the families we studied (namely, m/8), is provided in Table 2. The rows correspond to the various numbers of variables, n, and the columns to the various densities, \(\alpha \).

Table 2 also presents the average number of clauses unsatisfied by MOCE. It turns out that this number scales linearly with the number of clauses, and thus can be described as a proportion of the number of clauses, for any fixed density. The proportion of clauses unsatisfied by MOCE, out of all clauses, was 1.4%, 3%, 4.1%, and 4.9% for the densities 3, 5, 7, and 9, respectively. For each family, the percentage of clauses unsatisfied by each of the algorithms is provided in the last line of the table.

Table 2 The number of clauses unsatisfied by RAND and MOCE

At each of the n steps of MOCE, it selects the seemingly best truth value to an arbitrary unassigned variable. This is done by inspecting all the clauses it appears on, once. Thus, during the overall execution of MOCE, each of the \(m = n\alpha \) clauses in the instance is inspected at most r times—once for each of its literals. As the inspection time is constant, the overall time complexity of MOCE is proportional to rm. Thus, MOCE is a linear time algorithm.

MOCE is extremely fast in practice. In fact, its execution time is only a few seconds for the larger instances that were studied, and less than a second for the small and medium size instances. This time includes loading the instance, building it in the memory, and constructing the solution.

Although MOCE returns excellent solutions, it benefits a lot from supplementing it with a highly performing local search. In fact, executing the local search part of CCLS [which we simply call CCLS], starting from the solution returned by MOCE, a significant improvement is obtained. Namely, the number of unsatisfied clauses is significantly reduced at the local search stage.

This improvement is summarized in Table 3. In this table, the columns named “(M−MC)/M" present the relative improvement of MOCE–CCLS over MOCE. This relative improvement is the difference between the number of clauses unsatisfied by MOCE and the number of those unsatisfied by MOCE–CCLS, divided by the number of clauses unsatisfied by MOCE. The table also presents the improvement in the number of unsatisfied clauses of supplementing RAND with CCLS (which is simply the standard version of CCLS), under the columns named “(R−RC)/R".

Table 3 The improvement in the number of unsatisfied clauses by executing CCLS after RAND and MOCE

It is worth mentioning that this significant improvement comes with a caveat—a significant increase in the execution time. In fact, the results shown in Table 3 are based on 30 min executions of CCLS, after the initial solution (by either RAND or MOCE) has been obtained in just a few seconds.

This prolongation may, sometimes, be too much. This is the case especially when a large number of instances should be solved. Note that some real-world problems may be reduced to the solution of a large number of Max Sat instances. Such a situation arises, for example, in the recovery of encryption keys (Liao 2013; Liao et al. 2013; Kamal 2012; Kamal and Youssef 2010; Heninger 2011; Halderman et al. 2009).

One more caveat is due to the fact that, as the instances grow larger, this improvement decreases. As the instance grows larger, the number of flips CCLS can perform during the allotted time decreases, and with it decreases the obtained improvement as well.

In this context, note that, theoretically, if the execution time is unlimited, CCLS, due to its incorporated randomness, will eventually reach an optimal solution. Clearly, this has no practical implications, as the amount of time required is way out of reach.

This section is concluded by comparing MOCE–CCLS and RAND–CCLS head to head. The comparison is provided in Table 4 (which is wrapped for readability). For each density, we provide the number of clauses unsatisfied by RAND–CCLS, the number of clauses unsatisfied by MOCE–CCLS, and the relative improvement of MOCE–CCLS over RAND–CCLS. The latter number is the difference between the number of clauses unsatisfied by RAND–CCLS and the number of those unsatisfied by MOCE–CCLS, divided by the number of clauses unsatisfied by RAND–CCLS.

Table 4 MOCE–CCLS vs. RAND–CCLS

The results demonstrate the importance of the initial solution. Even after 30 min of local search, and using the excellent local search heuristics CCLS, the initialization with MOCE instead of RAND yields better solutions.

Moreover, MOCE–CCLS proved to be much more scalable than RAND–CCLS. As the instance grows larger, the improvement of MOCE–CCLS over RAND–CCLS becomes more significant. Whereas, for small instances, MOCE–CCLS improves RAND–CCLS by less than 1%, for large instances the improvement exceeds 50%.

In view of the above, when using a local search algorithm for Max Sat, one should strive to start the search from very good assignments. This holds as long as it is not too much time consuming to attain such assignments.

3.1.1 Experimentation information

The experiments described in this section were executed on a Sun Grid Engine (SGE) (2020) managed cluster of 31 identical IBM m4 servers with Intel Xeon E5-2620@2.0GHz processors. Each of the servers consists of 24 computation cores and 64GB of working memory. Thus, the platform had 744 computation cores and 1984GB of working memory at hand.

Each of the jobs submitted to the cluster was limited to use up to 3GB of working memory. Provided the load on the cluster, the parallelization was of about 100 times, thus reducing the experiment overall sequential time of approximately 4.11 months to around 1.25 days of parallel execution.

3.2 Comparative performance on public benchmarks

An international evaluation of solvers for the maximum satisfiability problem has been held annually since 2006 (Argelich et al. 2020). The evaluations are primarily divided to complete/incomplete and weighted/unweighted tracks. Here, we focus on the incomplete unweighted track, as the suggested heuristic is incomplete and it does not take weights into account. Until 2016, it was common to group the benchmarks into 3 categories—Random, Crafted, and Industrial. From 2017 onward, the instances are categorized only by the background problems from which they originated.

In this subsection, we compare the performance of RAND–CCLS and MOCE–CCLS. To this end, the Exclusive Won measure is used. According to this measure, for each instance, the winner is the competitor that satisfies more clauses. If both competitors satisfy the same number of clauses, the result is a draw for this instance. The overall winner of a given benchmark (collection of instances) is the competitor that wins more instances in this benchmark. In the Max Sat 2016 Evaluation, a measure called the Instance Won measure (equivalent to Exclusive Won for two competitors) has been used. In the Max Sat 2021 Evaluation a similar measure has been used. The Instance Won measure can be derived from the Exclusive Won measure by adding the number of draws to the number of wins for each of the competitors.

In Sect. 3.2.1 we compare the performance of RAND–CCLS and MOCE–CCLS on the instances in the Max Sat 2016 Evaluation. In Sect. 3.2.2 we compare the performance of these heuristics for blown up versions of the random instances of the 2016 Evaluation. Finally, in Sect. 3.2.3 we compare the performance of RAND–CCLS and MOCE–CCLS on the Max Sat 2021 Evaluation. Overall, the results indicate that MOCE–CCLS outperforms RAND–CCLS. Moreover, as the instances grow larger, so does the performance gap in favor of MOCE–CCLS.

3.2.1 Max Sat 2016 evaluation benchmarks

Table 5 presents the results of a comparative evaluation, conducted to assess the performance of RAND–CCLS and MOCE–CCLS on the incomplete unweighted track of Max Sat 2016 Evaluation. The table is divided into four parts— Random, Crafted, Industrial, and Max Sat 2016 (overall). Each of the three first parts provides the results for each of the benchmarks of the relevant category and the overall performance in the whole category (denoted by “subtotal”). The “Size” column gives the number of instances in the given collection/benchmark. The column “RC” (“MC”, respectively) gives the number of instances exclusively won by RAND–CCLS (MOCE–CCLS, respectively). The last column gives the number of instances ended with a draw. The last row gives the total over all benchmarks of Max Sat 2016 Evaluation.

The random and crafted instances of the Max Sat 2016 Evaluation are very small. They consist of several hundred variables and several thousand clauses per instance. Thus, they are less adequate for evaluation of local search heuristics. Indeed, most of the solvers participating in that evaluation found solutions with the same number of unsatisfied clauses in most of the instances. (Those solutions were probably optimal.) In our comparison of RAND–CCLS and MOCE–CCLS on the random and crafted instances, the situation was no different; both usually found solutions of the same quality. The industrial instances, on the other hand, are quite large. They consist of hundreds of thousands of variables and millions of clauses. On those instances, MOCE–CCLS gives RAND–CCLS a knockout, as one clearly see in the table.

Table 5 MOCE–CCLS versus RAND–CCLS, the Exclusive Won measure with a time limit of 5 min, on instances of Max Sat Evaluation 2016

3.2.2 Blown-up versions of instances from the 2016 evaluation

This subsection considers three additional benchmarks in the same spirit as the abrame-habet benchmark (consisting of random instances) of the 2016 Evaluation—but larger ones. In order to keep the exact same blend of instances, the new benchmarks were created by blowing up the number of variables and of clauses of the original instances from the abrame-habet benchmark. The densities (and clause lengths) are the same as in the original benchmark.

Three expanded benchmarks were created by enlarging the original one by factors of 10, 100, and 1000. For example, for the new benchmark obtained after blowing up by a factor of 10, for each instance, a new random instance was created whose numbers of variables and clauses are 10 times the corresponding numbers in the original instance. Thus, instances with 70 variables and 700 clauses gave rise in the tenfold blown up benchmark to instances with 700 variables and 7000 clauses.

We compared MOCE–CCLS and RAND–CCLS on the enlarged instances using the Exclusive Won measure. To this end, each of the two competitors ran on each of the instances for a few minutes (CPU time). The results are presented in Table 6. The first row in the table relates to the benchmark with instances blown up by a factor of 10. The time limit for instances in this benchmark is 1 min. Similarly, the second and third rows relate to the benchmarks with instances blown up by a factor of 100 (with a time limit of 2 min) and 1000 (with a time limit of 5 min), respectively.

There is an almost complete draw on the competition instances and on the tenfold blown up the instances. However, when scaling the instances by a factor of 100, MOCE–CCLS wins decisively, and when scaling by a factor of 1000, it beats RAND–CCLS by a knockout.

Table 6 MOCE–CCLS vs. RAND–CCLS, Exclusive Won measure, on random instances blown up from the abrame-habet benchmark

Note that the number of variables in instances of the original benchmark is between 70 and 200. Thus, even after blowing up by a factor of 1000, the obtained instances are of medium size only.

In the 1000-fold expanded benchmark, the improvement was manifested in an average of extra 423 clauses satisfied by MOCE–CCLS per instance. As RAND–CCLS is a state-of-the-art solver, with excellent performance, this number of extra clauses satisfied by MOCE–CCLS provides a significant improvement.

3.2.3 Max Sat 2021 evaluation benchmarks

In order to assess the performance of the heuristics on more recent benchmarks, we have also compared them on the incomplete unweighted track of the Max Sat 2021 Evaluation. This track consists of 30 benchmarks, with 155 instances altogether. Note that the amount of memory, provided for running each of the heuristics on each of the instances, was insufficient for 12 out of the 155 instances. Thus, our statistics is based on the 143 other instances (representing 28 benchmarks) only.

The results are given in Table 7. As one can see, MOCE–CCLS wins exclusively on all instances of 6 of the benchmarks and non-exclusively on 5 additional benchmarks. RAND–CCLS wins exclusively on all instances of 4 of the benchmarks (3 of which have but a single instance), and non-exclusively on 3 additional benchmarks. In 10 of the benchmarks there is a draw. Altogether, MOCE–CCLS clearly wins, with 59 exclusive wins, compared to RAND–CCLS with 31 only.

Table 7 MOCE–CCLS versus RAND–CCLS, the Exclusive Won measure with a time limit of 5 min, on instances of Max Sat Evaluation 2021

4 Correlation between the quality of initial and final assignments

In this section, we explore the correlation between the number of clauses unsatisfied by an initial assignment and the number of those unsatisfied by the corresponding final assignment, where the transition is by CCLS. The ongoing correlation during the execution is explored as well. CCLS was chosen for its excellent performance; a local search heuristic of lower quality may well be expected to yield an even stronger correlation.

To generate initial assignments of diverse quality, MOCE is revised by adding to it a parameter that allows us to invert its decision regarding the truth value for the current variable. This parameter, to which we refer as the inversion probability, is the probability to assign to a variable the truth value opposite to the one chosen by MOCE. Namely, for a given inversion probability \(0\le p \le 1\), at each step, the current variable is assigned the truth value chosen by MOCE with probability \(1-p\), and the opposite truth value with probability p. Thus, for \(p=0\) the algorithm is simply MOCE, while for \(p=1\) it is “anti-MOCE”. We refer to this tailored algorithm as PMOCE.

It is worth emphasizing that there is no reason to use this algorithm (with \(p\ne 0\)) to solve Max Sat instances. Its sole purpose is to construct initial assignments with a wide range of qualities. Indeed, our experiments showed a strong correlation between the value of p and the quality of the assignment provided by PMOCE.

We have generated a benchmark, consisting of 5 families of instances of Max \(3\)-Sat. Each of the families consists of 150 instances over 100,000 variables. The densities of the 5 families are 5, 7, 9, 12, 15. The instances in each family were generated uniformly at random as follows. The clauses of an instance were generated independently of each other. Each of the clauses was generated by selecting 3 distinct variables uniformly at random, and then negating each of them with probability 1/2, independently.

4.1 End-to-end correlation

In the following, we describe what was done in the experiment for each family. For each instance in the family, we executed PMOCE with 51 inversion probabilities, ranging from 0 to 1 in steps of 0.02. Thus, we obtained 51 initial assignments with presumed diverse quality. From each of these initial assignments, a local search was started using CCLS, and thus obtained 51 final assignments. The result of the 51 executions, was 51 pairs of numbers. Each pair consisted of the number of clauses unsatisfied by the initial assignment generated by PMOCE, and the number of unsatisfied clauses at the end of the search done by CCLS. The cutoff time of CCLS was set to 30 min, measured in CPU time.

For each instance, the correlation coefficient was calculated over the corresponding 51 pairs. After going over the whole family, we had 150 correlation coefficients—one for each instance. Then, the mean and standard deviation of these 150 values of correlation coefficients were calculated.

For each of the correlation coefficients, we also calculated the p value. The p value is the probability that this correlation, or a higher one, would have been found if the correlation coefficient was in fact zero (null hypothesis). If this probability is lower than the conventional 5% (i.e., the p value is less than 0.05), the correlation coefficient is considered statistically significant. For each family, the average p value was calculated over the 150 correlation coefficients as a measure of the statistical significance of the results.

To measure the impact of the improvement of the quality of an initial assignment on the quality of the corresponding final assignment, regression analysis was applied. Specifically, we calculated the regression line of each of the instances of a family, and took its slope as a measure of the strength of the impact. The average of these 150 slopes was taken as a measure of the strength of this impact in a given family.

The results are provided in Table 8. Each line summarizes the results of one family. For example, the first line summarizes the results of the family with density 5. In this family, instances are of 100,000 variables and 500,000 clauses. The mean correlation coefficient measured (over 150 random instances) was 0.52, with a standard deviation of 0.11. The mean p value was \(1.7\cdot 10^{-3}\), and the mean and standard deviation of the regression slope were \(0.5\cdot 10^{-3}\) and \(0.1\cdot 10^{-3}\), respectively.

Table 8 End-to-end correlation coefficients and regression slopes
Fig. 2
figure 2

Histograms of end-to-end correlation coefficients

Figure 2 depicts histograms of the 150 end-to-end correlation coefficients of the family of density 5 (Fig.  2a) and for the family of density 15 (Fig. 2b).

The results show a strong positive correlation between the quality of the initial and final assignment for all densities. The correlation is stronger for denser families. The p value is lower by far than the conventional 0.05, which indicates that the correlation coefficients obtained in the experiments are statistically very significant.

The high correlation between the quality of the initial assignment and that of the final one makes it clear that the search continues to be in the shadow of the initial assignment for a long period of time. Thus, starting a local search from an excellent initial assignment, rather than a random assignment, its performance may improve (as long as the selection of such an initial assignment does not consume too much time).

While the correlation is strong, the regression slope suggests that a large improvement in the initial assignment yields only a small improvement in the final assignment. As CCLS eventually converges to the optimal solution, there is little room for improvement by the end of its execution, so that this regression slope makes sense. Moreover, it is to be expected that the slope becomes even smaller as one runs CCLS longer.

Note that, after 30 min of execution, CCLS is way beyond its rapid improvement stage. In fact, it is deep in its convergence stage and shows relatively minor improvements as time goes by. This validates the correlation observed as meaningful.

Fig. 3
figure 3

Number of unsatisfied clauses as function of the number of flips

Figure 3 depicts the number of unsatisfied clauses as a function of the number of flips made, for an arbitrary (but representative) instance from the family of density 15. The graph shows this number for inversion probability of 0 (MOCE) and 1 (anti-MOCE). One can see that CCLS enters its convergence stage quite early in the execution.

We also emphasize the two phases seen in the graphs. The first phase is the rapid improvements phase. In this phase, the number of unsatisfied clauses is decreasing rapidly. This phase ends after about 100,000 flips. The second phase, which we call the convergence phase, continues from there onward. In this phase, the improvements are rarer and smaller.

4.2 Ongoing correlation

Besides the end-to-end correlation, we explored the ongoing correlation during the experiment. To this end, for each initial assignment, the minimum number of unsatisfied clauses found so far was recorded, not only at the end of the execution, but also after every 1000 flips made by CCLS. Then we calculated the correlation coefficient between the number of clauses unsatisfied by the initial assignment and the number of unsatisfied clauses recorded at each 1000 flips snapshot.

The number of flips made during the execution is very different for different families. In a denser instance, a flip takes longer, so that less flips are made. Even for instances of the same family, the number of flips varies. The statistics are provided only up to the minimal number of flips made, over all instances in the family.

Fig. 4
figure 4

Ongoing correlation decay as a function of the number of flips

Figure 4 depicts the decay in the correlation as a function of time, where time is measured in number of flips made from the beginning of the local search. It seems that the number of flips is the natural time scale to measure the correlation decay. While the graphs are noisy, the trend is clear—the correlation gradually decays as a function of the number of flips made, and it does so slower for denser families. Moreover, as the density grows larger, the differences in the decay seem to be smaller and the graphs are almost overlapping.

Figure 4a shows the full results. It provides the graphs of correlation decay of all the families. In the figure, one may observe that the number of flips made in denser families is much smaller than the number of flips made in sparser ones. For example, the minimal number of flips over all instances and inversion probabilities for the family of density 5 was about 20,600,000, while for the family of density 15 it was about 1,500,000. The reason is that, in denser families, each variable appears in a larger number of clauses, and a flip makes a larger number of variables available for selection subsequently. Thus, at each step, CCLS has to deal with a larger pool of candidates for flipping, which consumes more time per flip.

Figure 4b depicts the same graphs, but only up to about 1,500,000 flips, which is the place where the graph of the family of density 15 ends. In this figure, one can clearly see the faster decay of the correlation in sparser families, as well as the smaller differences between the decay in denser families.

Figure 4c zooms in on the first 150,000 flips. During this stage, one may observe a phenomenon of phase transition in the decay of the correlation. The empirical results suggest two phases of decay. The first phase starts at the beginning and ends after about 60,000–80,000 flips. In this phase, the correlation decays very slowly. This phase is characterized by a rapid decrease in the number of unsatisfied clauses, and is aligned with the rapid decrease shown in Fig. 3.

The second phase is from about 60,000–80,000 flips onward. This phase is characterized by a faster decay in the correlation. It is aligned with the convergence stage of CCLS, shown in Fig.  3, in which the number of unsatisfied clauses is decreasing slowly over time.

The position of the phase transition around 60,000-80,000 flips may be explained by the fact that the initial assignment provided by MOCE is expected to be at a distance of about 50,000 flips from an optimal solution. So the first 50,000 flips are significant. But, as about 30% of the flips of CCLS are random, and not all the flips are useful in general, this area stretches further to about 60,000–80,000 flips.

During the first phase, the initial assignment is very important in determining the correlation. In all the executions, the decrease in the number of satisfied clauses is rapid and considerable at this phase. Thus, the different executions maintain their relative positions, which leads to a very slow decrease in the correlation. Afterward, the correlation decays at about the same speed, as can be seen in Fig. 4c.

Note that Fig. 4c demonstrates an initially quite strange-looking phenomenon. Namely, between about 50,000 flips and 80,000 flips, depending on the family, the correlation increases. The reason is the large variability in the quality of the initial assignments. Recall that we managed to get initial assignments in a very wide range, starting at very inferior “anti-MOCE" assignments and ending at superior MOCE assignments. When starting from a high-quality assignment, the rapid improvement phase is shorter, and when starting from a low-quality assignment, it is longer. In the time interval, where for good initial assignments the rapid improvement phase has ended already, while for other assignments it has not, the correlation is understandably smaller. After the rapid increase phase was finished for all initial assignments, the correlation returns to a higher level.

4.3 Experimentation information

The experiments described in this section were carried out on the same platform as those in Sect. 3. Provided the load on the cluster, the parallelization was of about 300 times, thus reducing the experiments overall sequential time of approximately 2.18 years to around 2.66 days of parallel execution.

5 Summary and conclusions

This paper suggests combining the method of conditional expectations (MOCE) with the state-of-the-art heuristic configuration checking local search (CCLS). Instead of starting CCLS from random initial assignments, we suggest starting it from excellent initial assignments, provided by MOCE. The combined MOCE–CCLS solver provided a significant improvement over CCLS. Moreover, MOCE–CCLS proved to be much more scalable than CCLS. Namely, while for small and medium size instances the improvement is minor, for large instances CCLS breaks down while MOCE–CCLS continues with the same performance.

To provide empirical basis to the above result, we have explored the correlation between the quality of initial assignments provided to local search heuristics and that of the corresponding final assignments. The experiment results have shown that this correlation is significant and long-lasting. Thus, under practical time constraints, the quality of the initial assignment is crucial to the performance of local search heuristics.

Given the above, we recommend MOCE–CCLS over RAND–CCLS, and generally recommend starting local search heuristics from assignments even better than those provided by MOCE, as long as such may be obtained in linear time or slightly longer (say, by a logarithmic factor).