1 Introduction

The Boolean satisfiability problem (SAT), as well as many related questions like equivalence, counting, enumeration, and numerous versions of optimization, are of great importance in both theory and applications of computer science. In this article, we focus on the solution-space structure: We consider the solution graph, where the vertices are the solutions, and two solutions are connected iff they differ in exactly one variable. For this implicitly defined graph, we then study the connectivity and st-connectivity problems, and the diameter of connected components. Figures 1 and 2 give an impression of how solution graphs may look like.

Fig. 1
figure 1

Depictions of the subgraph of the 5-dimensional hypercube graph induced by a typical random Boolean relation with 12 elements. Left: highlighted on a orthographic hypercube projection. Center: highlighted on a “Spectral Embedding” of the hypercube graph by Mathematica. Right: the sole subgraph, arranged by Mathematica

Fig. 2
figure 2

Subgraphs of the 8-dimensional hypercube graph induced by typical random relations with 40, 60 and 80 elements, arranged by Mathematica

While the standard satisfiability problem is defined for propositional formulas, which can be seen as one special form of descriptions for Boolean relations, satisfiability and related problems have also been considered for many alternative descriptions, e.g. Boolean constraint satisfactions problems (CSPs), Boolean circuits, binary decision diagrams, and Boolean neural networks. For the usual formulas with the connectives ∧, ∨ and ¬, there are several common variants. A special form are formulas in conjunctive normal form (CNF-formulas). A generalization of CNF-formulas are CNF(\(\mathcal {S}\))-formulas, which are conjunctions of constraints on the variables taken from a finite template set \(\mathcal {S}\).

Here we consider another type of generalization: Arbitrarily nested formulas built with connectives from some finite set of Boolean functions B (where the arity may be greater than two), known as Bf o r m u l a s. Also we study B -circuits, where analogously the allowed gates implement the functions from B. As a further extension we consider partially quantified B-formulas.

A direct application of st-connectivity in solution graphs are reconfiguration problems, that arise when we wish to find a step-by-step transformation between two feasible solutions of a problem, such that all intermediate results are also feasible. Recently, the reconfiguration versions of many problems such as Independent-Set, Vertex-Cover, Set-Cover Graph-k-Coloring, Shortest-Path have been studied, and complexity results obtained (see e.g. [12, 13]). Also of relevance are the connectivity properties to the problem of structure identification, where one is given a relation explicitly and seeks a short representation of some kind (see e.g. [6]); this problem is important especially for learning in artificial intelligence.

A better understanding of the solution space structure also promises advancement of SAT algorithms: It has been discovered that the solution space connectivity is strongly correlated to the performance of standard satisfiability algorithms like WalkSAT and DPLL on random instances: As one approaches the satisfiability threshold (the ratio of constraints to variables at which random k-CNF-formulas become unsatisfiable for k ≥ 3) from below, the solution space (with the connectivity defined as above) fractures, and the performance of the algorithms deteriorates [16, 17]. These insights mainly came from statistical physics, and lead to the development of the survey propagation algorithm, which has much better performance on random instances [16].

While current SAT solvers normally accept only CNF-formulas as input, one of the most important applications of satisfiability testing is verification and optimization in Electronic Design Automation (EDA), where the instances derive mostly from digital circuit descriptions [27]. Though many such instances can easily be encoded in CNF, the original structural information, such as signal ordering, gate orientation and logic paths, is lost, or at least obscured. Since exactly this information can be very helpful for solving these instances, considerable effort has been made recently to develop satisfiability solvers that work with the circuit description directly [27], which have far superior performance in EDA applications, or to restore the circuit structure from CNF [9]. This is a major motivation for our study.

Our perspective is mainly from complexity theory: We classify B-formulas and B-circuits by the worst-case complexity of the connectivity problems, analogously to Schaefer’s dichotomy theorem for satisfiability of CSPs from 1978 [22], Lewis’ dichotomy for satisfiability of B-formulas from 1979 [14], and Gopalan et al.’s classification for the connectivity problems of CSPs from 2006 [10]. Along the way, we will examine structural properties of the solution graph like its maximal diameter, and devise efficient algorithms for solving the connectivity problems.

We begin with a formal definition of some central concepts.

Definition 1

An n-ary Boolean relation is a subset of {0,1}n (n ≥ 1). If ϕ is some description of an n-ary Boolean relation R, e.g. a propositional formula (where the variables are taken in lexicographic order), the solution graph G(ϕ) of ϕ is the subgraph of the n-dimensional hypercube graph induced by the vectors in R, i.e., the vertices of G(ϕ) are the vectors in R, and there is an edge between two vectors precisely if they differ in exactly one position.

We use a, b,… to denote vectors of Boolean values and x, y,… to denote vectors of variables, a = (a 1, a 2,…) and x = (x 1, x 2,…).

The Hamming weight |a| of a Boolean vector a is the number of 1’s in a. For two vectors a and b, the Hamming distance |ab| is is the number of positions in which they differ.

If a and b are solutions of ϕ and lie in the same connected component of G(ϕ), we write d ϕ (a, b) to denote the shortest-path distance between a and b.

The diameter of a connected component is the maximal shortest-path distance between any two vectors in that component. The diameter of G(ϕ)is the maximal diameter of any of its connected components.

2 Connectivity of CNF-Formulas

Research has focused on the structure of the solution space only quite recently: One of the earliest studies on solution-space connectivity was done for CNF(\(\mathcal {S}\))-formulas with constants (see the definition below), begun in 2006 by Gopalan et al. ([10, 11, 15, 24]).

In our proofs for B-formulas and B-circuits, we will use Gopalan et al.’s results for 3-CNF-formulas, so we have to introduce some related terminology.

Definition 2

A CNF-formula is a Boolean formula of the form C 1∧⋯∧C m (1 ≤ m<), where each C i is a clause, that is, a finite disjunction of literals (variables or negated variables). A k -CNF-formula (k ≥ 1) is a CNF-formula where each C i has at most k literals.

For a finite set of Boolean relations \(\mathcal {S}\), a CNF( \(\mathcal {S}\) )-formula (with constants) over a set of variables V is a finite conjunction C 1∧⋯∧C m , where each C i is a constraint application (constraint for short), i.e., an expression of the form R(ξ 1,…, ξ k ), with a k-ary relation \(R\in \mathcal {S}\), and each ξ j is a variable in V or one of the constants 0, 1.

A k-clause is a disjunction of k variables or negated variables. For 0 ≤ ik, let D i be the set of all satisfying truth assignments of the k-clause whose first i literals are negated, and let S k = {D 0,…, D k }. Thus, CNF(S k ) is the collection of k-CNF-formulas.

Gopalan et al. studied the following two decision problems for CNF(\(\mathcal {S}\))-formulas:

  • the connectivity problem Conn ( \(\mathcal {S}\) ): given a CNF(\(\mathcal {S}\))-formula ϕ, is G(ϕ) connected? (if ϕ is unsatisfiable, then G(ϕ) is considered connected)

  • the st -connectivity problem st-Conn ( \(\mathcal {S}\) ): given a CNF(\(\mathcal {S}\))-formula ϕ and two solutions s and t, is there a path from s to t in G(ϕ)?

Lemma 1

[10, Lemm 3.6] st-Conn(S 3) and Conn(S 3) are PSPACE-complete.

Showing that the problems are in PSPACE is straightforward: Given a CNF(S 3)-formula ϕ and two solutions s and t, we can guess a path of length at most 2n between them and verify that each vertex along the path is indeed a solution. Hence st-Conn(S 3) is in NPSPACE, which equals PSPACE by Savitch’ s theorem. For Conn(S 3), by reusing space we can check for all pairs of vectors whether they are satisfying, and, if they both are, whether they are connected in G(ϕ).

The hardness-proof is quite intricate: it consists of a direct reduction from the computation of a space-bounded Turing machine M. The input-string w of M is mapped to a CNF(S 3)-formula ϕ and two satisfying assignments s and t, corresponding to the initial and accepting configuration of a Turing machine M constructed from M and w, s.t. s and t are connected in G(ϕ) iff M accepts w. Further, all satisfying assignments of ϕ are connected to either s or t, so that G(ϕ) is connected iff M accepts w.

Lemma 2

[10, Lemm 3.7] For n ≥ 2 , there is an n-ary Boolean function f with f(1,…,1) = 1 and a diameter of at least \(2^{\left \lfloor \frac {n}{2}\right \rfloor } \).

The proof of this lemma is by direct construction of such a formula.

3 Circuits, Formulas, and Post’s Lattice

An n-ary Boolean function is a function f:{0,1}n→{0,1}. Let B be a finite set of Boolean functions.

A B -circuit \(\mathcal {C}\) with input variables x 1,…, x n is a directed acyclic graph, augmented as follows: Each node (here also called gate) with indegree 0 is labeled with an x i or a 0-ary function from B, each node with indegree k > 0 is labeled with a k-ary function from B. The edges (here also called wires) pointing into a gate are ordered. One node is designated the output gate. Given values a 1,…, a n ∈{0,1} to x 1,…, x n , \(\mathcal {C}\) computes an n-ary function \(f_{\mathcal {C}}\) as follows: A gate v labeled with a variable x i returns a i , a gate v labeled with a function f computes the value f(b 1,…, b k ), where b 1,…, b k are the values computed by the predecessor gates of v, ordered according to the order of the wires. For a more formal definition see [ 26].

A B -formula is defined inductively: A variable x is a B-formula. If ϕ 1,…, ϕ m are B-formulas, and f is an n-ary function from B, then f(ϕ 1,…, ϕ n ) is a B-formula. In turn, any B-formula defines a Boolean function in the obvious way, and we will identify B-formulas and the function they define.

It is easy to see that the functions computable by a B-circuit, as well as the functions definable by a B-formula, are exactly those that can be obtained from B by superposition, together with all projections [2]. By superposition, we mean substitution (that is, composition of functions), permutation and identification of variables, and introduction of fictive variables (variables on which the value of the function does not depend). This class of functions is denoted by [B]. B is closed (or said to be a clone) if [B] = B. A base of a clone F is any set B with [B] = F.

Already in the early 1920s, Emil Post extensively studied Boolean functions [20]. He identified all clones, found a finite base for each of them, and detected their inclusion structure: The clones form a lattice, called Post’s lattice, depicted in Fig. 3.

Fig. 3
figure 3

Graphical representation of Post’s lattice. The classes on the hard side of the dichotomy for the connectivity problems and the diameter are shaded gray; the light gray shaded ones are only on the hard side for formulas with quantifiers. For comparison, the classes for which SAT (without quantifiers) is NP-complete are circled bold

The following clones are defined by properties of the functions they contain, all other ones are intersections of these. Let f be an n-ary Boolean function.

  • B F is the class of all Boolean functions.

  • R 0 (R 1) is the class of all 0-reproducing (1-reproducing) functions, f is c-reproducing, if f(c,…, c) = c, where c∈{0,1}.

  • M is is the class of all monotone functions, f is monotone, if a 1b 1,…, a n b n implies f(a 1,…, a n ) ≤ f(b 1,…, b n ).

  • D is the class of all self-dual functions, f is self-dual, if \(f(x_{1},\ldots ,x_{n})=\overline {f(\overline {x_{1}},\ldots ,\overline {x_{n}})}\).

  • L is the class of all affine (on linear) functions, f is affine, if \(f(x_{1},\ldots ,x_{n})=x_{i_{1}}\oplus \cdots \oplus x_{i_{m}}\oplus c\) with i 1,…, i m ∈{1,…, n} and c∈{0,1}.

  • S 0 (S 1) is the class of all 0-separating (1-separating) functions, f is c-separating, if there exists an i∈{1,…, n} s.t. a i = c for all af −1(c), where c∈{0,1}.

  • \(\mathsf {S}_{0}^{m}\) \(\left (\mathsf {S}_{1}^{m}\right )\) is the class of all functions that are 0-separating (1-separating) of degree m, f is c-separating of degree m, if for all Uf −1(c) of size |U| = m there exists an i∈{1,…, n} s.t. a i = c for all aU (c∈{0,1}, m ≥ 2).

The definitions and bases of all classes are given in Table 1. For an introduction to Post’s lattice and further references see e.g. [2].

Table 1 List of all closed classes of Boolean functions with definitions and bases

The complexity of numerous problems for B-circuits and B-formulas has been classified by the types of functions allowed in B with help of Post’s lattice (see e.g. [21, 23]), starting with satisfiability: Analogously to Schaefer’s dichotomy for CNF(\(\mathcal {S}\))-formulasfrom 1978, Harry R. Lewis shortly thereafter found a dichotomy for B-formulas [14]: If [B] contains the function \(x\wedge \overline {y}\), Sat is NP-complete, else it is in P.

While for B-circuits the complexity of every decision problem solely depends on [B] (up to AC 0 isomorphisms), for B-formulas this need not be the case (though it usually is, as for satisfiability and our connectivity problems, as we will see): The transformation of a B-formula into a B -formula might require an exponential increase in the formula size even if [B]=[B ], as the B -representation of some function from B may need to use some input variable more than once [18]. For example, let \(h(x,y)=x\wedge \overline {y}\); then (x∧y)∈[{h}] since xy = h(x, h(x, y)), but it is easy to see that there is no shorter {h}-representation of xy.

4 Computational and Structural Dichotomies for Connectivity

Now we consider the connectivity problems for B-formulas and B-circuits:

  • BF-Conn(B): Given a B-formula ϕ, is G(ϕ) connected?

  • st-BF-Conn(B): Given a B-formula ϕ and two solutions s and t, is there a path from s to t in G(ϕ)?

The corresponding problems for circuits are denoted Circ-Conn(B) resp. st-Circ-Conn(B).

Theorem 1

Let B be a finite set of Boolean functions.

  1. 1.

    If BM, BL, or BS 0, then

    1. (a)

      st-Circ-Conn(B) and Circ-Conn(B) are in P,

    2. (b)

      st-BF-Conn(B) and BF-Conn(B) are in P,

    3. (c)

      the diameter of every function f ∈ [B] is linear in the number of variables of f.

  2. 2.

    Otherwise,

    1. (a)

      st-Circ-Conn(B) and Circ-Conn(B) are PSPACE-complete,

    2. (b)

      st-BF-Conn(B) and BF-Conn(B) are PSPACE-complete,

    3. (c)

      there are functions f ∈ [B] such that their diameter is exponential in the number of variables of f.

The proof follows from the Lemmas in the next subsections. By the following proposition, we can relate the complexity of B-formulas and B-circuits.

Proposition 1

Every B-formula ϕ can be transformed into an equivalent B-circuit \(\mathcal {C}\) in polynomial time.

Proof

Any B-formula is equivalent to a special B-circuit where all function-gates have outdegree at most one: For every variable x of ϕ and for every occurrence of a function f in ϕ there is a gate in \(\mathcal {C}\), labeled with x resp. f. It is clear how to connect the gates. □

4.1 The Easy Side of the Dichotomy

Lemma 3

If BM, the solution graph of any n-ary function f ∈ [B] is connected, and d f(a, b) = |ab| ≤ n for any two solutions a and b.

Proof

Table 1 shows that f is monotone in this case. Thus, either f = 0, or (1,…,1) must be a solution, and every other solution a is connected to (1,…,1) in G(ϕ) since (1,…,1) can be reached by flipping the variables assigned 0 in a one at a time to 1. Further, if a and b are solutions, b can be reached from a in |ab| steps by first flipping all variables that are assigned 0 in a and 1 in b, and then flipping all variables that are assigned 1 in a and 0 in b. □

Lemma 4

If BS 0, the solution graph of any function f ∈ [B] is connected, and d f(a, b) ≤ |ab|+2 for any two solutions a and b.

Proof

Since f is 0-separating, there is an i such that a i = 0 for every vector a with f(a) = 0, thus every b with b i = 1 is a solution. It follows that every solution t can be reached from any solution s in at most |st|+2 steps by first flipping the i-th variable from 0 to 1 if necessary, then flipping all other variables in which s and t differ, and finally flipping back the i-th variable if necessary. □

Lemma 5

If BL,

  1. 1.

    st-Circ-Conn(B) and Circ-Conn(B) are in P,

  2. 2.

    st-BF-Conn(B) and BF-Conn(B) are in P,

  3. 3.

    for any function f ∈ [B], d f (a, b) = |ab| for any two solutions a and b that lie in the same connected component of G(ϕ).

Proof

Since every function fL is linear, \(f(x_{1},\ldots ,x_{n})=x_{i_{1}}\oplus \ldots \oplus x_{i_{m}}\oplus c\), and any two solutions s and t are connected iff they differ only in fictive variables: If s and t differ in at least one non-fictive variable (i.e., an \(x_{i}\in \{x_{i_{1}},\ldots ,x_{i_{m}}\}\)), to reach t from s, x i must be flipped eventually, but for every solution a, any vector b that differs from a in exactly one non-fictive variable is no solution. If s and t differ only in fictive variables, t can be reached from s in |st| steps by flipping one by one the variables in which they differ.

Since {xy,1} is a base of L, every B-circuit \(\mathcal {C}\) can be transformed in polynomial time into an equivalent {xy,1}-circuit \(\mathcal {C}^{\prime }\) by replacing each gate of \(\mathcal {C}\) with an equivalent {xy,1}-circuit. Now one can decide in polynomial time whether a variable x i is fictive by checking for \(\mathcal {C}^{\prime }\) whether the number of “backward paths” from the output gate to gates labeled with x i is odd, so st-Circ-Conn(B) is in P.

\(G(\mathcal {C})\) is connected iff at most one variable is non-fictive, thus Circ-Conn(B) is in P.

By Proposition 1, st-BF-Conn(B) and BF-Conn(B) are in P also. □

This completes the proof of the easy side of the dichotomy.

4.2 The Hard Side of the Dichotomy

Proposition 2

st-Circ-Conn(B) and Circ-Conn(B), as well as st-BF-Conn(B) and BF-Conn(B), are in PSPACE for any finite set B of Boolean functions.

Proof

This follows as in Lemma 3.6 of [10] (see Lemma 1). □

An inspection of Post’s lattice shows that if \(B\nsubseteq \mathsf {M}\), \(B\nsubseteq \mathsf {L}\), and \(B\nsubseteq \mathsf {S}_{0}\), then [B]⊇S 12, [B]⊇D 1, or \([B]\supseteq \mathsf {S}_{02}^{k}\,\forall k\geq 2\), so we have to prove PSPACE-completeness and show the existence of B-formulas with an exponential diameter in these cases.

In the proofs, we will use the following notation: We write x = c or x = c 1c n for (x 1 = c 1)∧⋯∧(x n = c n ), where c = (c 1,…, c n ) is a vector of constants; e.g., x = 0 means \(\overline {x}_{1}\wedge \cdots \wedge \overline {x}_{n}\), and x = 101 means \(x_{1}\wedge \overline {x}_{2}\wedge x_{3}\). Further, we use x∈{a, b,…} for (x = a)∨(x = b)∨…. Also, we write \(\psi (\boldsymbol {\overline {x}})\) for \(\psi (\overline {x}_{1},\ldots ,\overline {x}_{n})\). If we have two vectors of Boolean values a and b of length n and m resp., we write ab for their concatenation (a 1,…, a n , b 1,…b m ).

All hardness proofs are by reductions from the problems for 1-reproducing 3-CNF-formulas, which are PSPACE-complete by the following proposition.

Proposition 3

For 1-reproducing 3-CNF-formulas, the problems st-Conn and Conn are PSPACE-complete.

Proof

In the PSPACE-hardness proof for CNF(S 3)-formulas (Lemma 3.6 of [10], see Lemma 1), two satisfying assignments s and t to the constructed formula ϕ are known, so we can construct a connectivity-equivalent 1-reproducing 3-CNF-formula ψ, e.g. as ψ(x) = ϕ(x 1s 1⊕1,…, x n s n ⊕1), and then check connectivity for ψ instead of ϕ. □

Lemma 6

If [B] ⊇ S 12,

  1. 1.

    st-BF-Conn(B) and BF-Conn(B) are PSPACE-complete,

  2. 2.

    st-Circ-Conn(B) and Circ-Conn(B) are PSPACE-complete,

  3. 3.

    for n ≥ 3, there is an n-ary function f ∈ [B] with diameter of at least \(2^{\left \lfloor \frac {n-1}{2}\right \rfloor } \).

Proof

1. We reduce the problems for 1-reproducing 3-CNF-formulas to the ones for B-formulas: We map a 1-reproducing 3-CNF-formula ϕ and two solutions s and t of ϕ to a B-formula ϕ and two solutions s and t of ϕ such that s and t are connected in G(ϕ ) iff s and t are connected in G(ϕ), and such that G(ϕ ) is connected iff G(ϕ) is connected.

While the construction of ϕ is quite easy for this lemma, the construction for the next two lemmas is analogous but more intricate, so we proceed carefully in two steps, which we will adapt in the next two proofs: In the first step, we give a transformation T that transforms any 1-reproducing formula ψ into a connectivity-equivalent formula T ψ S 12 built from the standard connectives. Since S 12⊆[B], we can express T ψ as a B-formula \(T_{\psi }^{*}\). Now if we would apply T to ϕ directly, we would know that T ϕ can be expressed as a B-formula. However, this could lead to an exponential increase in the formula size (see Section 3), so we have to show how to construct the B-formula in polynomial time. For this, in the second step, we construct a B-formula ϕ directly from ϕ (by applying T to the clauses and the ∧’s individually), and then show that ϕ is equivalent to T ϕ ; thus we know that ϕ is connectivity-equivalent to ϕ.

Step 1. From Table 1, we find that S 12 = S 1R 2 = S 1R 0R 1, so we have to make sure that T ψ is 1-seperating, 0-reproducing, and 1-reproducing. Let

$$T_{\psi}=\psi\wedge y, $$

where y is a new variable.

All solutions a of T ψ (x, y) have a n + 1 = 1, so T ψ is 1-seperating and 0-reproducing; also, T ψ is still 1-reproducing. Further, for any two solutions s and t of ψ(x), s = s⋅1 and t = t⋅1 are solutions of T ψ (x, y), and it is easy to see that they are connected in G(T ψ ) iff s and t are connected in G(ψ), and that G(T ψ ) is connected iff G(ψ) is connected.

Step 2. The idea is to parenthesize the conjunctions of ϕ such that we get a tree of ∧’s of depth logarithmic in the size of ϕ, and then to replace each clause and each ∧ with an equivalent B-formula. This can increase the formula size by only a polynomial in the original size even if the B-formula equivalent to ∧ uses some input variable more than once.

Let ϕ = C 1∧⋯∧C n be a 1-reproducing 3-CNF-formula. Since ϕ is 1-reproducing, every clause C i of ϕ is itself 1-reproducing, and we can express \(T_{C_{i}}\) through a B-formula \(T_{C_{i}}^{*}\). Also, we can express T uv through a B-formula \(T_{u\wedge v}^{*}\) since ∧ is 1-reproducing; we write T (ψ 1, ψ 2) for the formula obtained from T uv by substituting the formula ψ 1 for u and ψ 2 for v, and similarly write \(T_{\wedge }^{*}(\psi _{1},\psi _{2})\) for the formula obtained from \(T_{u\wedge v}^{*}\) in this way. We let ϕ = Tr (ϕ), where Tr is the following recursive algorithm that takes a CNF-formula as input:

Algorithm Tr (ψ 1∧⋯∧ψ m )

If m = 1, return \(T_{\psi _{1}}^{*}\).

Else return \(T_{\wedge }^{*}\left (\textsf {Tr}(\psi _{1}\wedge \cdots \wedge \psi _{\left \lfloor m/2\right \rfloor } ),\text {\textsf {Tr}}(\psi _{\left \lfloor m/2\right \rfloor +1}\wedge \cdots \wedge \psi _{m})\right )\).

Since the recursion terminates after a number of steps logarithmic in the number of clauses of ϕ, and every step increases the total formula size by only a constant factor, the algorithm runs in polynomial time. We show ϕ T ϕ by induction on m. For m = 1 this is clear. For the induction step, we have to show \(T_{\wedge }^{*}(T_{\psi _{1}},T_{\psi _{2}})\equiv T_{\psi _{1}\wedge \psi _{2}}\), but since \(T_{\wedge }(\psi _{1},\psi _{2})\equiv T_{\wedge }^{*}(\psi _{1},\psi _{2})\), it suffices to show that \(T_{\wedge }(T_{\psi _{1}},T_{\psi _{2}})\equiv T_{\psi _{1}\wedge \psi _{2}}\):

$$T_{\wedge}(T_{\psi_{1}},T_{\psi_{2}})=(\psi_{1}\wedge y)\wedge(\psi_{2}\wedge y)\wedge y\equiv\psi_{1}\wedge\psi_{2}\wedge y=T_{\psi_{1}\wedge\psi_{2}}. $$

2. This follows from 1. by Proposition 1.

3. By Lemma 2, there is an 1-reproducing (n − 1)-ary function f with diameter of at least \(2^{\left \lfloor \frac {n-1}{2}\right \rfloor } \). Let f be represented by a formula ϕ; then, T ϕ represents an n-ary function of the same diameter in S 12. □

Lemma 7

If [B] ⊇ D 1,

  1. 1.

    st-BF-Conn(B) and BF-Conn(B) are PSPACE-complete,

  2. 2.

    st-Circ-Conn(B) and Circ-Conn(B) are PSPACE-complete,

  3. 3.

    for n ≥ 5, there is an n-ary function f ∈ [B] with diameter of at least \(2^{\left \lfloor \frac {n-3}{2}\right \rfloor } \).

Proof

1. As noted, we adapt the two steps from the previous proof.

Step 1. Since D 1 = DR 0R 1, T ψ must be self-dual, 0-reproducing, and 1-reproducing. For clarity, we first construct an intermediate formula \(T_{\psi }^{\sim }\in \mathsf {D}_{1}\) whose solution graph has an additional component, then we eliminate that component.

For ψ(x), let

$$T_{\psi}^{\sim}=\left( \psi(\boldsymbol{x})\wedge(\boldsymbol{y}=\boldsymbol{1})\right)\vee\left( \overline{\psi(\boldsymbol{\overline{x}})}\wedge(\boldsymbol{y}=\boldsymbol{0})\right)\vee\left( \boldsymbol{y}\in\left\{ 100,010,001\right\} \right), $$

where y = (y 1, y 2, y 3) are three new variables.

\(T_{\psi }^{\sim }\) is self-dual: for any solution ending with 111 (satisfying the first disjunct), the inverse vector is no solution; similarly, for any solution ending with 000 (satisfying the second disjunct), the inverse vector is no solution; finally, all vectors ending with 100, 010, or 001 are solutions and their inverses are no solutions. Also, \(T_{\psi }^{\sim }\) is still 1-reproducing, and it is 0-reproducing (for the second disjunct note that \(\overline {\psi (\overline {0\cdots 0})}\equiv \overline {\psi (1\cdots 1)}\equiv 0\)).

Further, every solution a of ψ corresponds to a solution a⋅111 of \(T_{\psi }^{\sim }\), and for any two solutions s and t of ψ, s = s⋅111 and t = t⋅111 are connected in \(G\left (T_{\psi }^{\sim }\right )\) iff s and t are connected in G(ψ): The “if” is clear, for the “only if” note that since there are no solutions of \(T_{\psi }^{\sim }\) ending with 110, 101, or 011, every solution of \(T_{\psi }^{\sim }\) not ending with 111 differs in at least two variables from the solutions that do.

Observe that exactly one connected component is added in \(G\left (T_{\psi }^{\sim }\right )\) to the components corresponding to those of G(ψ): It consists of all solutions ending with 000, 100, 010, or 001 (any two vectors ending with 000 are connected e.g. via those ending with 100). It follows that \(G\left (T_{\psi }^{\sim }\right )\) is always unconnected. To fix this, we modify \(T_{\psi }^{\sim }\) to T ψ by adding 1⋯1⋅110 as a solution, thereby connecting 1⋯1⋅111 (which is always a solution since \(T_{\psi }^{\sim }\) is 1-reproducing) with 1⋯1⋅100, and thereby with the additional component of T ψ . To keep the function self-dual, we must in turn remove 0⋯0⋅001, which does not alter the connectivity. Formally,

$$\begin{array}{@{}rcl@{}} T_{\psi} & = & \left( T_{\psi}^{\sim}\vee\left( (\boldsymbol{x}=\boldsymbol{1})\wedge(\boldsymbol{y}=110)\right)\right)\wedge\neg\left( (\boldsymbol{x}=\boldsymbol{0})\wedge(\boldsymbol{y}=001)\right)\\ & = & \left( \psi(\boldsymbol{x})\wedge(\boldsymbol{y}=\boldsymbol{1})\right)\vee\left( \overline{\psi(\boldsymbol{\overline{x}})}\wedge(\boldsymbol{y}=\boldsymbol{0})\right) \\ & & \vee\left( \boldsymbol{y}\in\left\{ 100,010,001\right\} \wedge\neg((\boldsymbol{x}=\boldsymbol{0})\wedge(\boldsymbol{y}=001))\right) \\ & & \vee((\boldsymbol{x}=\boldsymbol{1})\wedge(\boldsymbol{y}=110)). \end{array} $$
(1)

Now G(T ψ ) is connected iff G(ψ) is connected.

Step 2. Again, we use the algorithm Tr from the previous proof to transform any 1-reproducing 3-CNF-formula ϕ into a B-formula ϕ equivalent to T ϕ , but with the definition (1) of T (Fig. 4). Again, we have to show \(T_{\wedge }(T_{\psi _{1}},T_{\psi _{2}})\equiv T_{\psi _{1}\wedge \psi _{2}}\). Here,

$$\begin{array}{@{}rcl@{}} T_{\wedge}(T_{\psi_{1}},T_{\psi_{2}}) & = & \left( T_{\psi_{1}}\wedge T_{\psi_{2}}\wedge(\boldsymbol{y}=\boldsymbol{1})\right)\vee\left( \overline{\overline{T_{\psi_{1}}}\wedge\overline{T_{\psi_{2}}}}\wedge(\boldsymbol{y}=\boldsymbol{0})\right)\\ & & \vee\left( \boldsymbol{y}\in\left\{ 100,010,001\right\} \wedge\neg\left( \overline{T_{\psi_{1}}}\wedge\overline{T_{\psi_{2}}}\wedge(\boldsymbol{y}=001)\right)\right)\\ & & \vee\left( T_{\psi_{1}}\wedge T_{\psi_{2}}\wedge(\boldsymbol{y }=110)\right). \end{array} $$
Fig. 4
figure 4

An example for the transformation. Left: \(\psi =\left (x_{1}\vee \overline {x_{2}}\right )\wedge \left (\overline {x_{1}}\vee x_{2}\right )\), center: \(T_{\psi }^{\sim }\), right: T ψ . The “axis vertices” are labeled in the first two graphs

We consider the parts of the formula in turn: For any formula ξ we have T ξ (x ξ )∧(y = 1)≡ξ(x ξ )∧(y = 1) and \(T_{\xi }(\boldsymbol {x}_{\xi })\wedge (\boldsymbol {y}=\boldsymbol {0})\equiv \overline {\psi (\overline {\boldsymbol {x}_{\xi }})}\wedge (\boldsymbol {y}=\boldsymbol {0})\), where x ξ denotes the variables of ξ. Using \(\overline {\overline {T_{\psi _{1}}(\boldsymbol {x}_{\psi _{1}})}\wedge \overline {T_{\psi _{2}}(\boldsymbol {x}_{\psi _{2}})}}\wedge (\boldsymbol {y}=\boldsymbol {0})=\left (T_{\psi _{1}}(\boldsymbol {x}_{\psi _{1}})\vee T_{\psi _{2}}(\boldsymbol {x}_{\psi _{2}})\right )\wedge (\boldsymbol {y}=\boldsymbol {0})\), the first line becomes

$$\left( \psi_{1}(\boldsymbol{x}_{\psi_{1}})\wedge\psi_{2}(\boldsymbol{x}_{\psi_{2}})\wedge(\boldsymbol{y}=\boldsymbol{1})\right)\vee\left( \left( \overline{\psi_{1}(\overline{\boldsymbol{x}_{\psi_{1}}})\wedge\psi_{2}(\overline{\boldsymbol{x}_{\psi_{2}}})}\right)\wedge(\boldsymbol{y}=\boldsymbol{0})\right). $$

For the second line, we observe

$$\begin{array}{@{}rcl@{}} \overline{T_{\psi}(\boldsymbol{x}_{\psi})} & \equiv & \left( \overline{\psi(\boldsymbol{x}_{\psi})}\vee\neg(\boldsymbol{y}=\boldsymbol{1})\right)\wedge\left( \psi(\boldsymbol{\overline{x}_{\psi}})\vee\neg(\boldsymbol{y}=\boldsymbol{0})\right)\\ & & \wedge\left( \boldsymbol{y}\notin\left\{ 100,010,001\right\} \vee\left( (\boldsymbol{x}_{\psi}=\boldsymbol{0})\wedge(\boldsymbol{y}=001)\right)\right)\\ & & \wedge(\neg(\boldsymbol{x}_{\psi}=\boldsymbol{1})\vee\overline{(\boldsymbol{y}=110)}), \end{array} $$

thus \(\overline {T_{\psi }(\boldsymbol {x}_{\psi })}\wedge (\boldsymbol {y}=001)\equiv (\boldsymbol {x}_{\psi }=\boldsymbol {0})\wedge (\boldsymbol {y}=001)\), and the second line becomes

$$\vee\left( \boldsymbol{y}\in\left\{ 100,010,001\right\} \wedge\neg\left( (\boldsymbol{x}_{\psi_{1}}=\boldsymbol{0})\wedge(\boldsymbol{x}_{\psi_{2}}=\boldsymbol{0})\wedge(\boldsymbol{y}=001)\right)\right). $$

Since T ψ (x ψ )∧(y = 110)≡(x ψ = 1)∧(y = 110) for any ψ, the third line becomes

$$\vee\left( (\boldsymbol{x}_{\psi_{1}}=\boldsymbol{1})\wedge(\boldsymbol{x}_{\psi_{2}}=\boldsymbol{1})\wedge(\boldsymbol{y}=110)\right). $$

Now \(T_{\wedge }\left (T_{\psi _{1}},T_{\psi _{2}}\right )\) equals

$$\begin{array}{@{}rcl@{}} T_{\psi_{1}\wedge\psi_{2}} & = & \left( \psi_{1}(\boldsymbol{x}_{\psi_{1}})\wedge\psi_{2}(\boldsymbol{x}_{\psi_{2}})\wedge(\boldsymbol{y}=\boldsymbol{1})\right)\vee\left( \overline{\psi_{1}(\overline{\boldsymbol{x}_{\psi_{1}}})\wedge\psi_{2}(\overline{\boldsymbol{x}_{\psi_{2}}})}\wedge(\boldsymbol{y}=\boldsymbol{0})\right)\\ & & \vee\left( \boldsymbol{y}\in\left\{ 100,010,001\right\} \wedge\neg\left( (\boldsymbol{x}_{\psi_{1}}=\boldsymbol{0})\wedge(\boldsymbol{x}_{\psi_{2}}=\boldsymbol{0})\wedge(\boldsymbol{y}=001)\right)\right)\\ & & \vee\left( (\boldsymbol{x}_{\psi_{1}}=\boldsymbol{1})\wedge(\boldsymbol{x}_{\psi_{2}}=\boldsymbol{1})\wedge(\boldsymbol{y}=110)\right). \end{array} $$

2. This follows from 1. by Proposition 1.

3. By Lemma 2 there is an 1-reproducing (n − 3)-ary function f with diameter of at least \(2^{\left \lfloor \frac {n-3}{2}\right \rfloor } \). Let f be represented by a formula ϕ; then, T ϕ represents an n-ary function of the same diameter in D 1. □

Lemma 8

If \([B]\supseteq \mathsf {S}_{02}^{k}\) for any k ≥ 2,

  1. 1.

    st-BF-Conn(B) and BF-Conn(B) are PSPACE-complete,

  2. 2.

    st-Circ-Conn(B) and Circ-Conn(B) are PSPACE-complete,

  3. 3.

    for nk+4, there is an n-ary function f ∈ [B] with diameter of at least \(2^{\left \lfloor \frac {n-k-2}{2}\right \rfloor } \).

Proof

1. Step 1. Since \(\mathsf {S}_{02}^{k}=\mathsf {S}_{0}^{k}\cap \mathsf {R}_{0}\cap \mathsf {R}_{1}\), T ψ must be 0-separating of degree k, 0-reproducing, and 1-reproducing. As in the previous proof, we construct an intermediate formula \(T_{\psi }^{\sim }\). For ψ(x), let

$$T_{\psi}^{\sim}=\left( \psi\wedge y\wedge(\boldsymbol{z}=\boldsymbol{0})\right)\vee(|\boldsymbol{z}|>1), $$

where y and z = (z 1,…, z k + 1) are new variables.

\(T_{\psi }^{\sim }(\boldsymbol {x},y,\boldsymbol {z})\) is 0-separating of degree k, since all vectors that are no solutions of \(T_{\psi }^{\sim }\) have |z|≤1, i.e. z∈{0⋯0,10⋯0,010⋯0,…,0⋯01}⊂{0,1}k + 1, and thus any k of them have at least one common variable assigned 0. Also, \(T_{\psi }^{\sim }\) is 0-reproducing and still 1-reproducing.

Further, for any two solutions s and t of ψ(x), s = s⋅1⋅0⋯0 and t = t⋅1⋅0⋯0 are solutions of \(T_{\psi }^{\sim }(\boldsymbol {x},y,\boldsymbol {z})\) and are connected in \(G\left (T_{\psi }^{\sim }\right )\) iff s and t are connected in G(ψ).

But again, we have produced an additional connected component (consisting of all solutions with |z|>1). To connect it to a component corresponding to one of ψ, we add 1⋯1⋅1⋅10⋯0 as a solution,

$$\begin{array}{@{}rcl@{}} T_{\psi} & = & \left( \psi\wedge y\wedge(\boldsymbol{z}=\boldsymbol{0})\right)\vee(|\boldsymbol{z}|>1)\vee\left( (\boldsymbol{x}=\boldsymbol{1})\wedge y\wedge(\boldsymbol{z}=10\cdots0)\right). \end{array} $$

Now G(T ψ ) is connected iff G(ψ) is connected.

Step 2. Again we show that the algorithm Tr works in this case. Here,

$$\begin{array}{@{}rcl@{}} T_{\wedge}(T_{\psi_{1}},T_{\psi_{2}}) & = & \left( T_{\psi_{1}}(\boldsymbol{x}_{\psi_{1}})\wedge T_{\psi_{2}}(\boldsymbol{x}_{\psi_{2}})\wedge y\wedge(\boldsymbol{z}=\boldsymbol{0})\right)\vee(|\boldsymbol{z}|>1)\\ & & \vee\left( T_{\psi_{1}}(\boldsymbol{x}_{\psi_{1}})\wedge T_{\psi_{2}}(\boldsymbol{x}_{\psi_{2}})\wedge y\wedge(\boldsymbol{z}=10\cdots0)\right). \end{array} $$

Since T ψ (x ψ )∧y∧(z = 0)≡ψ(x ψ )∧y∧(z = 0) and T ψ (x ψ )∧y∧(z = 10⋯0)≡(x ψ = 1)∧y∧(z = 10⋯0) for any ψ, this is equivalent to

$$\begin{array}{@{}rcl@{}} T_{\psi_{1}\wedge\psi_{2}} & = & \left( \psi_{1}(\boldsymbol{x}_{\psi_{1}})\wedge\psi_{2}(\boldsymbol{x}_{\psi_{2}})\wedge y\wedge(\boldsymbol{z}=\boldsymbol{0})\right)\vee(|\boldsymbol{z}|>1)\\ & & \vee\left( \boldsymbol{x}_{\psi_{1}}\wedge\boldsymbol{x}_{\psi_{2}}\wedge y\wedge(\boldsymbol{z}=10\cdots0)\right). \end{array} $$

2. This follows from 1. by Proposition 1.

3. By Lemma 2 there is an 1-reproducing (nk − 2)-ary function f with diameter of at least \(2^{\left \lfloor \frac {n-k-2}{2}\right \rfloor } \). Let f be represented by a formula ϕ; then, T ϕ represents an n-ary function of the same diameter in \(\mathsf {S}_{02}^{k}\). □

This completes the proof of Theorem 1.

5 The Connectivity of Quantified Formulas

Definition 3

A quantified B-formula ϕ (in prenex normal form) is an expression of the form

$$Q_{1}y_{1}{\cdots} Q_{m}y_{m}\varphi(y_{1},\ldots,y_{m},x_{1},\ldots,x_{n}), $$

where φ is a B-formula, and Q 1,…, Q m ∈{∃,∀} are quantifiers. The solution graph G(ϕ) only involves the free variables x 1,…, x n .

For quantified B-formulas, we define the connectivity problems

  • QBF-Conn(B): Given a quantified B-formula ϕ, is G(ϕ) connected?

  • st-QBF-Conn(B): Given a quantified B-formula ϕ and two solutions s and t, is there a path from s to t in G(ϕ)?

Theorem 2

Let B be a finite set of Boolean functions.

  1. 1.

    If BM or BL, then

    1. (a)

      st-QBF-Conn(B) and QBF-Conn(B) are in P,

    2. (b)

      the diameter of every quantified B-formula is linear in the number of free variables.

  2. 2.

    Otherwise,

    1. (a)

      st-QBF-Conn(B) and QBF-Conn(B) are PSPACE-complete,

    2. (b)

      there are quantified B-formulas with at most one quantifier such that their diameter is exponential in the number of free variables.

Proof

1. For BM, any quantified B-formula ϕ represents a monotone function: Using ∃y ψ(y, x) = ψ(0, x)∨ψ(1, x) and ∀y ψ(y, x) = ψ(0, x)∧ψ(1, x) recursively, we can transform ϕ into an equivalent M-formula since ∧ and ∨ are monotone. Thus as in Lemma 3, st-QBF-Conn(B) and QBF-Conn(B) are trivial, and d f (a, b) = |ab| for any two solutions a and b.

For a quantified B-formula ϕ = Q 1 y 1Q m y m φ with BL, we first remove the quantifications over all fictive variables of φ (and eliminate the fictive variables if necessary). If quantifiers remain, ϕ is either tautological (if the rightmost quantifier is ∃) or unsatisfiable (if the rightmost quantifier is ∀), so the problems are trivial, and d f (a, b) = |ab| for any two solutions a and b. Otherwise, we have a quantifier-free formula and the statements follow from Lemma 5.

2. Again as in Lemma 1, it follows that st-QBF-Conn(B) and QBF-Conn(B) are in PSPACE, since the evaluation problem for quantified B-formulas is in PSPACE [22].

An inspection of Post’s lattice shows that if \(B\nsubseteq \mathsf {M}\) and \(B\nsubseteq \mathsf {L}\), then [B]⊇S 12, [B]⊇D 1, or [B]⊇S 02, so we have to prove PSPACE-completeness and show the existence of B-formulas with an exponential diameter in these cases.

For [B]⊇S 12 and [B]⊇D 1, the statements for the PSPACE-hardness and the diameter obviously carry over from Theorem 1.

For BS 02, we give a reduction from the problems for (unquantified) 3-CNF-formulas; we proceeded again similar as in the proof of Lemma 6. We give a transformation T ψ s.t. T ψ S 02 for all formulas ψ. Since S 02 = S 0R 0R 1, T ψ must be self-dual, 0-reproducing, and 1-reproducing. For ψ(x) let

$$T_{\psi}=(\psi\wedge y)\vee z, $$

with the two new variables y and z.

T ψ is 0-separating since all vectors that are no solutions have z = 0. Also, T ψ is 0-reproducing and 1-reproducing. Again, we use the algorithm Tr from the proof of Lemma 6 to transform any 3-CNF-formula ϕ into a B-formula φ equivalent to T ϕ . Again, we show

$$\begin{array}{@{}rcl@{}} T_{\wedge}(T_{\psi_{1}},T_{\psi_{2}}) & = & \left( \left( (\psi_{1}\wedge y)\vee z\right)\wedge\left( (\psi_{2}\wedge y)\vee z\right)\wedge y\right)\vee z\\ & \equiv & \left( \left( \psi_{1}\wedge y\right)\wedge\left( \psi_{2}\wedge y\right)\wedge y\right)\vee z\\ & \equiv & \left( \psi_{1}\wedge\psi_{2}\wedge y\right)\vee z=T_{\psi_{1}\wedge\psi_{2}}. \end{array} $$

Now let

$$\phi^{\prime}=\forall z\varphi^{\prime}. $$

Then, for any two solutions s and t of ϕ(x), s = s⋅1 and t = t⋅1 are solutions of ϕ (x, y), and they are connected in G(ϕ ) iff s and t are connected in G(ϕ), and G(ϕ ) is connected iff G(ϕ) is connected.

The proof of Lemma 2 shows that there is an (n − 1)-ary function f with diameter of at least \(2^{\left \lfloor \frac {n-1}{2}\right \rfloor } \). Let f be represented by a formula ϕ; then ϕ as defined above is a quantified B-formula with n free variables and one quantifier with the same diameter. □

Remark 1

An analog to Theorem 2 also holds for quantified circuits as defined in [21, Section 7].

6 Future Directions

While for st-connectivity and connectivity of B-formulas and B-circuits we now have a quite complete picture, there is a multitude of interesting variations in different directions with open problems.

As mentioned in the abstract, for CNF(\(\mathcal {S}\))-formulas with constants, we have a complete classification for both connectivity problems and the diameter also [24]. However, for CNF(\(\mathcal {S}\))-formulas without constants, the complexity of the connectivity problem is still open in some cases [25].

Besides CNF(\(\mathcal {S}\))-formulas, B-formulas and B-circuits, there are further variants of Boolean satisfiability, and investigating connectivity in these settings might be worthwhile as well. For example, disjunctive normal forms with special connectivity properties were studied by Ekin et al. already in 1997 for their “important role in problems appearing in various areas including in particular discrete optimization, machine learning, automated reasoning, etc.” [7].

Other connectivity-related problems already mentioned by Gopalan et al. are counting the number of components and approximating the diameter. Recently, Mouawad et al. investigated the question of finding the shortest path between two solutions [19], which is of special interest to reconfiguration problems.

Furthermore, our definition of connectivity is not the only sensible one: One could regard two solutions connected whenever their Hamming distance is at most d, for any fixed d ≥ 1; this was already considered related to random satisfiability, see [1]. This generalization seems meaningful as well as challenging.

Finally, a most interesting subject are CSPs over larger domains; in 1993, Feder and Vardi conjectured a dichotomy for the satisfiability problem over arbitrary finite domains [8], and while the conjecture was proved for domains of size three in 2002 by Bulatov [4], it remains open to date for the general case. Close investigation of the solution space might lead to valuable insights here.

For k-colorability, which is a special case of the general CSP over a k-element set, the connectivity problems and the diameter were already studied by Bonsma and Cereceda [3], and Cereceda, van den Heuvel, and Johnson [5]. They showed that for k = 3 the diameter is at most quadratic in the number of vertices and the st-connectivity problem is in P, while for k ≥ 4, the diameter can be exponential and st-connectivity is PSPACE-complete in general.