1 Introduction

Decision trees are a computational model in which we compute a known Boolean function f : {0,1}n →{0,1} on an unknown input x ∈{0,1}n, and in one step we can query q(x) for a function q : {0,1}n →{0,1} from a fixed set of allowed queries. In the standard and most studied decision tree model we can query only individual variables of the input x [5, 12] (the complexity of f in this model is denoted by D(f)). Studies of this model are related, among other things, to the well-known sensitivity conjecture [10] that was recently resolved [11]. Other cases studied in the literature include parity decision trees, which can query any parity of input bits [28], linear decision trees, in which the queries are linear threshold functions [8, 12], and decision trees with AND and OR queries [1].

In this paper we mainly deal with threshold functions. The threshold function \({\text {{THR}}_{n}^{k}}\) on n bits outputs 1 iff there are at least k ones in the input. The majority function MAJn is simply \(\text {{THR}}_{n}^{\lceil n/2 \rceil }\).
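In 0/1 notation these definitions amount to counting ones; a minimal Python sketch (the helper names `thr` and `maj` are ours, not from the paper):

```python
from math import ceil

def thr(x, k):
    # THR_n^k: outputs 1 iff the 0/1 input x contains at least k ones
    return int(sum(x) >= k)

def maj(x):
    # MAJ_n = THR_n^{ceil(n/2)}
    return thr(x, ceil(len(x) / 2))
```

For instance, `thr((1, 0, 1), 2)` evaluates to 1, while `maj((1, 0, 0, 0))` evaluates to 0.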

The main goal of the first part of the paper is to study a natural generalization of the standard decision tree model: we address decision trees that are allowed to query any function depending on two input bits. We denote the complexity in this model by \(\mathsf {D}_{\mathrm {B}_2}(f)\). More generally, one can consider decision trees that can query arbitrary functions depending on at most r inputs, where r is a parameter. The standard decision tree model corresponds to the case r = 1.

This model can be viewed as a uniform version of multi-party Communication Complexity (see the book by Kushilevitz and Nisan [17] for details on Communication Complexity). In this model k players are trying to compute a function f : {0,1}nk →{0,1} whose input is shared by the players; each player is associated with a piece of the input of size n. In the Number in Hand (NIH) model players see only the input bits associated with them. In the Number on the Forehead (NOF) model players see all input bits except those associated with them (thus the inputs visible to the players have large overlaps). The decision tree model with r = n for the computation of f can be viewed as a version of the communication model in which for every n bits of the input there is a player seeing exactly these n bits. The decision tree model with r = (k − 1)n can be viewed as a generalization of the NOF communication model.

A special case of this model was considered by Posobin [21], where the computation of MAJn with MAJk-queries was studied for k < n. There are also results on a related model with non-Boolean counting queries [6, 13]. Related settings with a non-Boolean domain arise in the algebraic decision tree model as well (see, e.g., [12, Section 14.8]).

We initiate the study of strong lower bounds for decision trees with queries of bounded fan-in, considering the case of queries of fan-in 2. It is easy to see that the complexity \(\mathsf {D}_{\mathrm {B}_2}(f)\) in this model is lower bounded by D(f)/2 (each binary query can be simulated by two unary queries). However, in view of the generalization to larger r it is interesting to obtain lower bounds greater than n/2. We show that

$$ \mathsf{D}_{\mathrm{B}_2}(\text{{MAJ}}_{n}) \geq n - o(n). $$

We also show that if we additionally allow querying parities of three bits, the complexity of majority (as well as of any symmetric function) drops to at most 2n/3. Thus, to obtain strong lower bounds for r = 3, more complicated functions need to be considered.

In this part of the paper we also address the complexity of the majority function MAJ in the decision tree model with AND and OR queries (of arbitrary fan-in). We denote the complexity of a function f in this model by D∧,∨(f). The complexity of threshold functions in this model was studied by Ben-Asher and Newman [1] in relation to a certain PRAM model. It was shown there that the \({\text {{THR}}_{n}^{k}}\) functions have complexity \({\Theta }(k/\log (n/k))\). In this paper we are interested in the precise complexity of functions in this model. We show that D∧,∨(MAJn) = n − 1.

In the second part of the paper we turn to the parity decision tree model, with complexity denoted by \(\mathsf{D}_{\oplus}(f)\).

Apart from being natural and interesting on its own, the parity decision tree model was studied mainly in connection with Communication Complexity and, more specifically, with the Log-rank Conjecture. In the most standard model of Communication Complexity there are two players, Alice and Bob. Alice is given x ∈{0,1}n, Bob is given y ∈{0,1}n, and they are trying to compute some fixed function F : {0,1}n ×{0,1}n →{− 1,1} on the input (x,y). The question is how much communication is needed to compute F(x,y) in the worst case. It is known that the deterministic communication complexity Dcc(F) of the function F is lower bounded by \(\log \text {rank}(M_{F})\), where MF is the communication matrix of F [17]. A long-standing conjecture, and one of the key open problems in Communication Complexity, called the Log-rank Conjecture [18], states that Dcc(F) is upper bounded by a polynomial of \(\log \text {rank}(M_{F})\).

An important special case of the Log-rank Conjecture addresses XOR-functions F(x,y) = f(x ⊕ y) for some f, where x ⊕ y is the bit-wise XOR of the Boolean vectors x and y. On one hand, this class of functions is wide and captures many important functions (including equality and Hamming distance); on the other hand, the structure of XOR-functions allows the use of analytic tools. For such functions rank(MF) is equal to the Fourier sparsity spar(f), the number of non-zero Fourier coefficients of f. Thus, the Log-rank Conjecture for XOR-functions can be restated: is it true that Dcc(F) is bounded by a polynomial of \(\log \mathsf {spar}(f)\)?

Given an XOR-function F(x,y) = f(x ⊕ y), a natural way for Alice and Bob to compute the value of the function is to use a parity decision tree for f. They can simulate each query in the tree by computing the parity of the bits in their parts of the input separately and sending the results to each other. One query requires two bits of communication and thus \(\mathsf {D}^{\mathsf {cc}}(F) \leq 2\mathsf {D}_{\oplus }(f)\), where by \(\mathsf{D}_{\oplus}(f)\) we denote the parity decision tree complexity of f. This leads to an approach to establishing the Log-rank Conjecture for XOR-functions [28]: show that \(\mathsf{D}_{\oplus}(f)\) is bounded by a polynomial of \(\log \mathsf {spar}(f)\).
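The two-bit simulation of a single parity query can be sketched as follows (a toy illustration in our own notation; `S` is the set of positions queried by the tree):

```python
def parity(bits):
    # XOR of a list of 0/1 values
    p = 0
    for b in bits:
        p ^= b
    return p

def simulate_parity_query(x, y, S):
    # The tree queries the parity of z = x XOR y on positions S.
    # Alice sends the parity of her bits on S, Bob the parity of his
    # (2 bits of communication in total); the XOR of the two messages
    # is the answer, since parity(z on S) = parity(x on S) XOR parity(y on S).
    alice_msg = parity([x[i] for i in S])
    bob_msg = parity([y[i] for i in S])
    return alice_msg ^ bob_msg
```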

This approach received a lot of attention in recent years and drew attention to parity decision trees themselves [9, 23, 24, 25, 27, 28]. In a recent paper [9] it was shown that Dcc(F) and \(\mathsf{D}_{\oplus}(f)\) are actually polynomially related. This means that the simple protocol described above is not far from optimal, and that the parity decision tree version of the Log-rank Conjecture stated above is actually equivalent to the original Log-rank Conjecture for XOR-functions.

Known techniques for lower bounds for parity decision trees fall into two categories: those of analytical and those of combinatorial flavor. Analytical techniques include lower bounds on \(\mathsf{D}_{\oplus}(f)\) through the sparsity spar(f), the granularity gran(f) and the degree \(\deg _{2}(f)\) over \(\mathbb {F}_{2}\). The strongest lower bound among these is \(\mathsf{D}_{\oplus}(f) \geq \mathsf{gran}(f) + 1\) (see details in Section 2).

Regarding combinatorial techniques, for standard decision trees there are several combinatorial measures known that lower bound decision tree complexity. Among them the most common are certificate complexity and block sensitivity. Zhang and Shi [28] generalized these measures to the setting of parity decision tree complexity.

The parity decision tree versions of these combinatorial measures are actually known to be polynomially related to parity decision tree complexity [28]. For the analytical techniques it is known that the existence of a polynomial relation between \(\mathsf{D}_{\oplus}(f)\) and gran(f) (or spar(f)) is equivalent to the Log-rank Conjecture for XOR-functions [9].

In view of this it is interesting to further study lower bounds for parity decision trees.

In this paper we prove a new lower bound for parity decision tree complexity of threshold functions. We show that

$$\mathsf{D}_{\oplus}(\text{{THR}}_{n+2}^{k+1})\geq \mathsf{D}_{\oplus}({\text{{THR}}_{n}^{k}})+1$$

for any k,n.

The combination of this result with the granularity lower bound allows us to show that for n = 8k + 2, k > 0, we have \(\mathsf {D}_{\oplus }({\text {{THR}}_{n}^{3}})=n-1\), whereas all previous techniques give a lower bound of at most n − 2. Thus, we give an example of a function for which all known general techniques are not tight.

The rest of the paper is organized as follows. In Section 2 we provide necessary definitions, preliminary information and review lower bounds for parity decision trees. In Section 3 we study decision trees with binary queries as well as decision trees with AND and OR queries. In Section 4 we study parity decision tree complexity of threshold functions. In Section 5 we give concluding remarks.

2 Preliminaries

In many parts of the paper we assume that Boolean functions are of the form f : {0,1}n →{− 1,1} for \(n \in \mathbb {N}\). That is, input bits are treated as 0 and 1, and to them we usually apply operations over \(\mathbb {F}_{2}\); output bits are treated as − 1 and 1, and the arithmetic over them is over \(\mathbb {R}\). The value − 1 corresponds to ‘true’ and 1 corresponds to ‘false’. In other parts of the paper it is more convenient to consider Boolean functions of the form f : {− 1,1}n →{− 1,1} with the same semantics of − 1 and 1.

We denote the variables of functions by x = (x1,…,xn). We use the notation [n] = {1,…,n}.

2.1 Boolean Fourier Analysis

We briefly review the notation and the facts we need from Boolean Fourier analysis. For an extensive introduction see [19].

For functions \(f,g \colon \{0,1\}^{n} \to \mathbb {R}\) consider the inner product

$$ \langle f,g\rangle = \mathbf{E}_{x} f(x)g(x), $$

where the expectation is taken over the uniform distribution of x on {0,1}n.

For a subset \(S \subseteq [n]\) we denote by \(\chi _{S}(x) = {\prod }_{i\in S} (-1)^{x_{i}}\) the Fourier character corresponding to S. We denote by \(\widehat {f}(S)=\langle f,\chi _{S} \rangle \) the corresponding Fourier coefficient of f.

It is well-known that for any x ∈{0,1}n we have \(f(x) = {\sum }_{S\subseteq [n]} \widehat {f}(S)\chi _{S}(x)\).
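These quantities can be checked by brute force for small n; the sketch below (our own helper, not from the paper) computes all Fourier coefficients of a function given as a Python callable, and verifies the expansion f(x) = Σ f̂(S)χS(x) on AND2 in the − 1/1 convention:

```python
from itertools import product

def fourier(f, n):
    # All Fourier coefficients of f: {0,1}^n -> R by direct enumeration;
    # a set S is represented by its 0/1 indicator vector.
    cube = list(product([0, 1], repeat=n))
    coeffs = {}
    for S in product([0, 1], repeat=n):
        chi = lambda x: (-1) ** sum(x[i] for i in range(n) if S[i])
        coeffs[S] = sum(f(x) * chi(x) for x in cube) / 2 ** n
    return coeffs

# Example: AND_2 in the -1/1 convention (-1 = 'true')
f = lambda x: -1 if x == (1, 1) else 1
c = fourier(f, 2)

# Reconstruction: f(x) equals the sum of c[S] * chi_S(x) at every point
for x in product([0, 1], repeat=2):
    val = sum(c[S] * (-1) ** sum(x[i] for i in range(2) if S[i]) for S in c)
    assert val == f(x)
```

For AND2 this yields the coefficients 1/2, 1/2, 1/2, − 1/2, whose squares sum to 1 in accordance with Parseval's Identity below.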

If f : {0,1}n →{− 1,1} (that is, if f is Boolean) then the well-known Parseval’s Identity holds:

$$ {\sum}_{S\subseteq [n]} \widehat{f}(S)^{2} =1. $$

By the support of the Boolean function f we denote

$$ \mathsf{Supp}(f) = \{S\subseteq [n] \mid \widehat{f}(S)\neq 0\}. $$

The sparsity of f is spar(f) = |Supp(f)|. Basically, the sparsity of f is the l0-norm of the vector of its Fourier coefficients.

Consider a binary fraction α, that is, a rational number that can be written as a fraction whose denominator is a power of 2. By the granularity gran(α) of α we denote the minimal integer k ≥ 0 such that α ⋅ 2^k is an integer.

We will also frequently use the following closely related notation. For an integer L denote by P(L) the maximal power of 2 that divides L. It is convenient to set \(\mathsf {P}(0)=\infty \).
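Both quantities are straightforward to compute; a small sketch (our own helpers; we take P(L) to return the exponent p with 2^p | L, and gran assumes its argument is a binary fraction):

```python
from fractions import Fraction

def gran(alpha):
    # Minimal integer k >= 0 with alpha * 2^k an integer.
    # Assumes alpha is a binary fraction (otherwise this loop never ends).
    alpha = Fraction(alpha)
    k = 0
    while (alpha * 2 ** k).denominator != 1:
        k += 1
    return k

def P(L):
    # Exponent of the maximal power of 2 dividing the integer L; P(0) = infinity.
    if L == 0:
        return float('inf')
    p = 0
    while L % 2 == 0:
        L //= 2
        p += 1
    return p
```

For example, gran(3/8) = 3 and P(12) = 2.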

Note that for a Boolean function f the Fourier coefficients of f are binary fractions. By the granularity of f we mean the following value:

$$ \mathsf{gran}(f) = \max_{S\subseteq [n]} \mathsf{gran}(\widehat{f}(S)). $$

It is easy to see that for any f : {0,1}n →{− 1,1} we have 0 ≤gran(f) ≤ n − 1, and both of these bounds are attained (for example, by \(f(x)=\bigoplus _{i} x_{i}\) and \(f(x)=\bigwedge _{i} x_{i}\), respectively).

It is known [7, 25] that gran(f) is always not far from the logarithm of spar(f):

$$ \frac{\log \mathsf{spar}(f)}{2} \leq \mathsf{gran}(f) \leq \log \mathsf{spar}(f)-1. $$

The first inequality can be easily obtained from Parseval’s identity. The second is less trivial (see [25] or [19, Exercise 3.32]). Again, both inequalities are tight (the first one is tight for inner product \(\text {{IP}}(x,y)=\bigoplus _{i} (x_{i}\wedge y_{i})\) or any other bent function [19]; the second one is tight for example for the conjunction of two variables).
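These inequalities can be sanity-checked numerically. The sketch below (our own brute-force helper) computes spar and gran exactly via rational arithmetic and checks the chain on AND2 and on the bent function IP on four variables:

```python
from fractions import Fraction
from itertools import product
from math import log2

def spar_and_gran(f, n):
    # Exact Fourier coefficients of f: {0,1}^n -> {-1,1} by enumeration.
    # For Boolean f each nonzero coefficient is a reduced fraction with a
    # power-of-2 denominator, whose exponent is its granularity.
    cube = list(product([0, 1], repeat=n))
    nonzero = []
    for S in product([0, 1], repeat=n):
        s = sum(f(x) * (-1) ** sum(x[i] for i in range(n) if S[i]) for x in cube)
        if s != 0:
            nonzero.append(Fraction(s, 2 ** n))
    spar = len(nonzero)
    gran = max(c.denominator.bit_length() - 1 for c in nonzero)
    return spar, gran

and2 = lambda x: -1 if x == (1, 1) else 1
ip4 = lambda x: (-1) ** ((x[0] & x[1]) ^ (x[2] & x[3]))
for f, n in [(and2, 2), (ip4, 4)]:
    spar, gran = spar_and_gran(f, n)
    assert log2(spar) / 2 <= gran <= log2(spar) - 1
```

For AND2 this gives spar = 4 and gran = 1, so both inequalities hold with equality; for IP on four variables it gives spar = 16 and gran = 2.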

For a Boolean function f : {0,1}n →{− 1,1} denote by \(\deg _{2}(f)\) the degree of the multilinear polynomial \(p \in \mathbb {F}_{2}[x_{1},\ldots ,x_{n}]\) computing f as a Boolean function, that is, for all \(x\in \mathbb {F}_{2}^{n}\) we have p(x) = 1 if f(x) = − 1 and p(x) = 0 otherwise. It is well known that such a multilinear polynomial p is unique for any f and thus \(\deg _{2}(f)\) is well defined.

It is known that \(\deg _{2}(f)\leq \log \mathsf {spar}(f)\) for any f [2]. We observe that the granularity is also lower bounded by the degree of the function.

Lemma 1

For any f : {0,1}n →{− 1,1} we have \(\deg _{2}(f) \leq \mathsf {gran}(f)+1\).

The proof strategy is similar to that of [2]. We present the proof for the sake of completeness.

Proof

For a function f : {0,1}n →{− 1,1} consider the two subfunctions f0 and f1 on n − 1 variables obtained from f by setting the variable xn to 0 and to 1, respectively. Note that for any \(S\subseteq [n-1]\) we have

$$ \begin{array}{@{}rcl@{}} \widehat{f}(S) &=& \mathbf{E}_{x\in \{0,1\}^{n}} f(x) \chi_{S}(x) \\ &=& \frac 12 \mathbf{E}_{x \in \{0,1\}^{n-1}} f_{0}(x) \chi_{S}(x) + \frac 12 \mathbf{E}_{x \in \{0,1\}^{n-1}} f_{1}(x) \chi_{S}(x) = \frac 12 \widehat{f}_{0}(S) + \frac 12 \widehat{f}_{1}(S) \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} &&\widehat{f}(S\cup\{n\})\\ \!&=&\! \mathbf{E}_{x\in \{0,1\}^{n}} f(x) \chi_{S\cup\{n\}}(x)\\ \!&=&\! \frac 12 \mathbf{E}_{x \in \{0,1\}^{n-1}} f_{0}(x) \chi_{S}(x) - \frac 12 \mathbf{E}_{x \in \{0,1\}^{n-1}} f_{1}(x) \chi_{S}(x) = \frac 12 \widehat{f}_{0}(S) - \frac 12 \widehat{f}_{1}(S). \end{array} $$

Thus,

\( \widehat {f}_{0}(S) = \widehat {f}(S) + \widehat {f}(S\cup \{n\}) \)

and

\( \widehat {f}_{1}(S) = \widehat {f}(S) - \widehat {f}(S\cup \{n\}). \)

In particular, the granularity of both f0 and f1 is not larger than the granularity of f. Iterating this argument, we conclude that the granularity of any subfunction of f is at most the granularity of f.

Denote \(d=\deg _{2}(f)\) and consider a monomial of degree d in the polynomial p for f. For simplicity of notation assume that this is the monomial x1⋯xd. Fix all variables xi for i > d to 0. We get a subfunction g of f on d variables of degree d. As discussed above, gran(g) ≤gran(f), so it is enough to show that d ≤gran(g) + 1. For this, note that since g is of maximal degree, |g− 1(− 1)| is odd (see, e.g., [12, Section 2.1]). Thus,

\( \widehat {g}(\emptyset ) = \mathbf {E}_{x\in \{0,1\}^{d}} g(x) = \frac {1}{2^{d}} \left (|g^{-1}(1)| - |g^{-1}(-1)| \right ) = \frac {1}{2^{d}} \left (2^{d} - 2|g^{-1}(-1)| \right ) \)

and since |g− 1(− 1)| is odd, the granularity of \(\widehat {g}(\emptyset )\) is exactly d − 1. Hence d ≤gran(g) + 1. □
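Lemma 1 can also be verified by brute force on small functions. In the sketch below (our own helpers), deg2 is computed via the Möbius transform over F2 and gran_f from exact Fourier coefficients; the inequality deg2(f) ≤ gran(f) + 1 is then checked on random functions of three variables:

```python
from fractions import Fraction
from itertools import product
from random import randrange, seed

def deg2(g, n):
    # F2-degree of g: {0,1}^n -> {0,1}; the coefficient of the monomial
    # prod_{i in S} x_i equals the XOR of g over all points below S (Moebius transform).
    d = 0
    for S in product([0, 1], repeat=n):
        below = [x for x in product([0, 1], repeat=n)
                 if all(x[i] <= S[i] for i in range(n))]
        if sum(g(x) for x in below) % 2 == 1:
            d = max(d, sum(S))
    return d

def gran_f(f, n):
    # Granularity of f: {0,1}^n -> {-1,1} from its reduced Fourier coefficients.
    cube = list(product([0, 1], repeat=n))
    best = 0
    for S in product([0, 1], repeat=n):
        c = Fraction(sum(f(x) * (-1) ** sum(x[i] for i in range(n) if S[i])
                         for x in cube), 2 ** n)
        if c != 0:
            best = max(best, c.denominator.bit_length() - 1)
    return best

# Lemma 1 on random functions of 3 variables: deg2(f) <= gran(f) + 1
seed(0)
for _ in range(20):
    table = {x: randrange(2) for x in product([0, 1], repeat=3)}
    assert deg2(lambda x: table[x], 3) <= gran_f(lambda x: (-1) ** table[x], 3) + 1
```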

2.2 Decision Trees

A decision tree T is a rooted directed binary tree. Each of its leaves is labeled by − 1 or 1, and each internal vertex v is labeled by some function \(q_{v} \colon \{0,1\}^{n} \to \{-1,1\}\). Each internal vertex has two outgoing edges, one labeled by − 1 and the other by 1. The computation of T on an input x ∈{0,1}n is the path from the root to one of the leaves that in each internal vertex v follows the edge whose label is equal to the value of qv(x). The label of the leaf reached by the path is the output of the computation. The tree T computes the function f : {0,1}n →{− 1,1} iff on each input x ∈{0,1}n the output of T is equal to f(x).

Decision tree models differ by the types of functions qv allowed in the vertices of the tree. For any set \(\mathcal {Q}\) of functions, the decision tree complexity of the function f is the minimal depth of a tree (that is, the number of edges in the longest path from the root to a leaf) using queries from \(\mathcal {Q}\) and computing f. We denote this value by \(\mathsf {D}_{\mathcal {Q}}(f)\).

The standard decision tree model allows querying individual variables in the vertices of the tree. The complexity in this model is denoted simply by D(f). In the paper we also consider \(\mathsf{D}_{\oplus}(f)\), D∧,∨(f) and \(\mathsf {D}_{\mathrm {B}_2}(f)\), standing for \(\mathcal {Q}\) equal to the set of all parities, the set of all AND and OR functions, and the set B2 of all binary functions, respectively.

2.3 Parity Decision Trees

There are various techniques known for proving lower bounds for parity decision trees (that is, decision trees with parity queries). Since we are not aware of an exposition of some bounds that follow from known techniques, or of the connection of parity decision tree complexity with multiplicative complexity, we survey them here.

As discussed in the Introduction, one technique is via sparsity through communication complexity.

Lemma 2

For any function f : {0,1}n →{− 1,1} we have \( \mathsf {D}_{\oplus }(f) \geq \frac {\log \mathsf {spar}(f)}{2}. \)

Although this approach gives bounds that are optimal up to a polynomial if the Log-rank Conjecture for XOR-functions is true, in many cases it does not help to determine the precise parity decision tree complexity of Boolean functions. For example, this approach always gives bounds of at most n/2 for functions of n variables.

Another lower bound obtained through analytical approach is via granularity.

Lemma 3

For any non-constant function f : {0,1}n →{− 1,1} we have \(\mathsf{D}_{\oplus}(f) \geq \mathsf{gran}(f) + 1\).

It is not hard to deduce this lemma from [19, Exercise 3.26]. However, as we have not seen this statement in the literature, we provide a proof.

Proof

Along with the function f : {0,1}n →{− 1,1} consider the function \(f^{\prime }\colon \{0,1\}^{n} \to \{0,1\}\) such that \(f^{\prime }(x) = \frac {f(x)+1}{2}\). Clearly, the parity decision tree complexities of f and \(f^{\prime }\) are equal. Also, it is easy to see directly from the definition that for any non-constant f we have \(\mathsf {gran}(f^{\prime })=\mathsf {gran}(f)+1\).

From Exercise 3.26 in [19] it follows that \(\mathsf {D}_{\oplus }(f^{\prime }) \geq \mathsf {gran}(f^{\prime })\).

Combining all of these together we get

\( \mathsf {D}_{\oplus }(f) = \mathsf {D}_{\oplus }(f^{\prime }) \geq \mathsf {gran}(f^{\prime }) = \mathsf {gran}(f)+1. \) □

Another standard approach is through the degree of polynomials. It is well known that the complexity of a function in the standard decision tree model is lower bounded by the degree of the function over \(\mathbb {R}\) (see, e.g., [5]). Completely analogously, it can be shown that the parity decision tree complexity of a function is lower bounded by its degree over \(\mathbb {F}_{2}\).

Although this is very similar to the analogous connection for standard decision trees, we have not seen it stated in the literature.

Lemma 4

For any f : {0,1}n →{− 1,1} we have \(\mathsf {D}_{\oplus }(f) \geq \deg _{2}(f)\).

Proof

The proof of this lemma follows closely the proof connecting standard decision tree complexity of a function with its degree over \(\mathbb {R}\) (see, e.g. [5]).

Consider a parity decision tree T computing f with depth equal to \(\mathsf{D}_{\oplus}(f)\). Consider an arbitrary leaf l of this tree and the path in T leading from the root to l. For the computation on input x to follow this path, in each internal vertex v the input x must satisfy some linear restriction L(x) = 1 (here L(x) is the parity Lv(x) labeling v if the path follows the edge labeled by − 1 out of v, and L(x) = Lv(x) ⊕ 1 if the path follows the edge labeled by 1). Denote the linear forms in these restrictions along the path by L1(x),…,Lp(x), where p ≤ \(\mathsf{D}_{\oplus}(f)\). Thus, on input x the computation follows the path to l iff L1(x) ∧… ∧ Lp(x) is satisfied. Denote this expression by Tl(x).

Denote by S the set of all leaves of T that are labeled by − 1. For any input x we have that f(x) = − 1 iff the computation path in T reaches a leaf labeled with − 1 iff

\( \bigoplus _{l \in S} T_{l}(x)=1. \)

It is left to observe that the latter expression is a multilinear polynomial over \(\mathbb {F}_{2}\) of degree at most \(\mathsf{D}_{\oplus}(f)\). □

From the bounds discussed above it follows that the lower bound through granularity is stronger than the lower bounds through sparsity and degree. In fact, it allows us to determine the exact complexity of some Boolean functions, including majority and recursive majority. Since we have not seen these bounds presented in the literature, we present them here.

The majority function \(\text {{MAJ}}_{n} \colon \{0,1\}^{n} \to \{-1,1\}\) is defined as follows:

\( \text {{MAJ}}_{n}(x) = -1 \Leftrightarrow {\sum }_{i=1}^{n} x_{i} \geq \frac n2. \)

To state our result we need the following notation: let B(n) be the number of ones in the binary representation of n.

Theorem 1

\(\mathsf{D}_{\oplus}(\text{{MAJ}}_{n}) = n - B(n) + 1\).
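The value in Theorem 1 is easy to tabulate; a tiny sketch (our helper names):

```python
def B(n):
    # number of ones in the binary representation of n
    return bin(n).count('1')

def maj_parity_dt(n):
    # the value n - B(n) + 1 from Theorem 1
    return n - B(n) + 1
```

For instance, n = 3 gives 2: querying x1 ⊕ x2 and then, depending on the answer, x3 or x1 determines the majority of three bits.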

Recursive majority \(\text {{MAJ}}^{\otimes k}_3\) is a function on n = 3^k variables and it can be defined recursively. For k = 1 we just let \(\text {{MAJ}}^{\otimes 1}_3=\text {{MAJ}}_{3}\). For k > 1 we let \( \text {{MAJ}}^{\otimes k}_3 = \text {{MAJ}}_{3}\left (\text {{MAJ}}^{\otimes k-1}_3,\text {{MAJ}}^{\otimes k-1}_3,\text {{MAJ}}^{\otimes k-1}_3\right ), \) where each \(\text {{MAJ}}^{\otimes k-1}_3\) is applied to a separate block of variables.

Theorem 2

\(\mathsf {D}_{\oplus }(\text {{MAJ}}^{\otimes k}_3) = \frac {n+1}{2}\), where n = 3^k is the number of variables.

The upper bound in Theorem 1 is a simple adaptation of a folklore algorithm. The other parts of the proofs of Theorems 1 and 2 are purely technical. For these reasons we move these proofs to the Appendix.

Next we describe a connection to multiplicative complexity.

Multiplicative complexity c(f) of a Boolean function f is the minimal number of AND-gates in a circuit computing f and consisting of AND, ⊕ and NOT gates, each gate of fan-in at most 2 (for formal definitions from Circuit Complexity see, e.g., [12]). This measure was studied in Circuit Complexity [3, 4, 14] as well as in connection to Cryptography [15, 26], and providing an explicit function f on n variables with c(f) > n is an important open problem.

The following lemma was communicated to us by Alexander Kulikov [16] and with his permission we include it here.

Lemma 5

For any f on n variables we have \(\mathsf{D}_{\oplus}(f) \leq \mathsf{c}(f) + 1\).

Proof

The proof is by induction on s = c(f).

If s = 0, then f is computed by a circuit consisting of ⊕ and NOT gates and thus f is a linear form of its variables. We can compute it by one query in parity decision tree model.

For the step of induction, consider an arbitrary f and a circuit \(\mathcal {C}\) computing f with the number of AND-gates equal to c(f). Consider the first AND-gate g in \(\mathcal {C}\) (in a topological order of the circuit). Both of its inputs compute linear forms over \(\mathbb {F}_{2}\). Our decision tree algorithm queries one of the inputs of g. Depending on the answer to the query, g computes either the constant 0 or its second input. In both cases the gate g computes a linear form over \(\mathbb {F}_{2}\), so we can simplify the circuit and obtain a new circuit \(\mathcal {C}^{\prime }\) computing the same function on inputs consistent with the answer to the first query and containing at most s − 1 AND-gates. By the induction hypothesis, in both cases the function computed by \(\mathcal {C}^{\prime }\) is computable in the parity decision tree model with at most s queries. Overall, we make at most s + 1 queries. □

As a corollary from Theorem 1 and Lemma 5 we get the following lower bound on the multiplicative complexity of majority.

Corollary 1

c(MAJn) ≥ n − B(n).

This improves a lower bound of [4]. Previously this lower bound was known only for n = 2^k for some k [4].

Returning to lower bounds for parity decision trees, another known approach is of a more combinatorial flavor and goes through analogs of certificate complexity and block sensitivity [28], the combinatorial measures that are most commonly used to lower bound standard decision tree complexity. The parity versions of these measures are known to be polynomially related to parity decision tree complexity [28], and parity certificate complexity provides the better bound of the two [28]. The paper [20] gives an example of a function for which the combinatorial measures give a polynomially better lower bound than granularity. Since parity block sensitivity is always less than or equal to parity certificate complexity and we are interested in lower bounds, we introduce only certificate complexity here.

For a function f : {0,1}n →{− 1,1} and x ∈{0,1}n denote by C⊕(f,x) the minimal co-dimension of an affine subspace of {0,1}n that contains x and on which f is constant. The parity certificate complexity of f is \(C_{\oplus }(f) = \max \limits _{x} C_{\oplus } (f,x)\).

Lemma 6

[28] For any function f : {0,1}n →{− 1,1} we have \(\mathsf{D}_{\oplus}(f) \geq C_{\oplus}(f)\).
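For very small n the measure C⊕ can be computed by brute force, which is a convenient sanity check. The sketch below (our own code) enumerates all affine subspaces of {0,1}n as the nonempty subsets closed under a ⊕ b ⊕ c:

```python
from itertools import product

def parity_certificate_complexity(f, n):
    # C_xor(f) = max over x of the minimal codimension of an affine subspace
    # containing x on which f is constant. Feasible only for tiny n.
    cube = list(product([0, 1], repeat=n))
    xor3 = lambda a, b, c: tuple(i ^ j ^ k for i, j, k in zip(a, b, c))
    subspaces = []  # pairs (subspace, codimension)
    for mask in range(1, 2 ** len(cube)):
        A = [cube[i] for i in range(len(cube)) if mask >> i & 1]
        if len(A) & (len(A) - 1):
            continue  # the size of an affine subspace is a power of 2
        S = set(A)
        if all(xor3(a, b, c) in S for a in A for b in A for c in A):
            subspaces.append((A, n - (len(A).bit_length() - 1)))
    return max(min(codim for A, codim in subspaces
                   if x in A and len({f(y) for y in A}) == 1)
               for x in cube)
```

On AND3 this returns 3 = n, consistent with the remark below that certificate complexity gives the bound n for AND and OR; on the parity of three bits it returns 1.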

The described techniques allow us to establish tight lower bounds for most standard functions. The complexity of both AND and OR is equal to n through, for example, certificate complexity. The exact complexity of the MOD3 function can be determined through the degree lower bound. Lower bounds for majority and recursive majority were discussed above.

3 Decision Trees with B2-Queries

In this section we show an n − o(n) lower bound on the complexity of the MAJn function for B2-queries. As a warm-up we start with the analysis of the complexity of MAJn in the decision tree model with AND and OR queries of arbitrary fan-in. This model was studied in [1] in relation to a certain PRAM model.

In this section it will be convenient to switch to {− 1,1} variables, that is we will consider \(\text {{MAJ}}_{n} \colon \{-1,1\}^{n} \to \{-1,1\}\) that is equal to 1 iff \({\sum }_{i=1}^{n} x_{i} \geq 0\).

First we observe that the complexity of all monotone functions cannot be maximal.

Lemma 7

For any monotone function f : {− 1,1}n →{− 1,1} we have D∧,∨(f) ≤ n − 1.

Proof

We query all variables one by one until two variables are left. Now observe that a monotone function of the two remaining variables is either a constant, a variable, AND2, or OR2. We can compute this function with at most one query. □

This upper bound is tight for MAJn.

Theorem 3

D∧,∨(MAJn) = n − 1.

Proof

The upper bound follows from Lemma 7.

For the lower bound we will argue by adversary argument, that is we will describe the strategy of query answering forcing the decision tree to make at least n − 1 queries.

During the computation we will fix the values of some of the variables. We will maintain an undirected graph G on the variables that are not yet fixed. Each vertex in this graph will have degree either 0 or 1 (that is, our graph is a matching). Each edge in the graph is labeled by either 1 or − 1. The intuition behind the edges is the following. We connect xi and xj by an edge labeled by a iff we add the restriction that at least one of the variables xi and xj is equal to a. That is, we are not allowed to fix both variables to − a in the future.

In the beginning, the set of vertices of G consists of all variables and there are no edges. With each query the number of connected components will decrease by at most 1, with only one exception (when we remove one connected component without any query being made). We will answer the queries in such a way that to know the value of the function the decision tree has to reduce our graph to the empty graph. From this it follows that at least n − 1 queries are needed.

Along with the graph we maintain the parameter t that is equal to the sum of the values of already fixed variables. During most of the process we will have that t ∈{− 1,0}.

Next we describe how to answer the queries. If the query asks the value of one of the variables xi, there are two cases. If this variable is isolated in G, we fix the value of the variable in such a way that the new value of t still lies in {− 1,0}. If xi was connected by an edge to xj, we fix xi = 1 and xj = − 1. The value of t does not change. In both cases we remove one connected component from G.

Next suppose the query asks AND or OR of several variables. Without loss of generality consider a query \(\bigwedge _{i\in S} x_{i}\) for some \(S \subseteq [n]\) with |S|≥ 2. The case of OR-query is symmetric. We can assume that none of the variables in S are already fixed, since otherwise we can either answer the query without fixing new variables, or simplify the query. Suppose there is an edge {xi,xj} in G, such that iS. In this case we fix xi = 1 and xj = − 1. The answer to the query is 1 (that is, ‘false’). The number of connected components has reduced by 1 and t does not change. Next suppose that all vertices xi with iS are isolated. Since |S|≥ 2 we can consider two distinct variables xi and xj with i,jS. We connect these vertices by an edge with the label 1. Thus, we promise that at least one of the variables is 1 and the answer to the query is 1. Since we introduce one edge, the number of connected components reduces by 1. The value of t does not change since we do not fix any variables.

We maintain this query answering strategy until a certain condition is met. To describe this condition we need to introduce some notation. At an arbitrary point of computation denote by A the number of − 1-edges, by B the number of 1-edges and by C the number of isolated vertices. Note that A + B + C is the number of connected components in G.

At some point of the query answering we will have that A + C = 1 or B + C = 1. These cases are symmetric, without loss of generality suppose we have A + C = 1. If at this point of the computation we have t = 0 this might be a potential problem: note that none of 1-edges can change the balance to the negative side. If after fixing the last isolated vertex or the last − 1-edge the balance does not decrease, it must be non-negative for all assignments of variables consistent with the current restrictions. So, to keep the function non-constant we will fix the last isolated vertex or − 1-edge as soon as A + C = 1. If C = 1, we set the isolated vertex to − 1 and we have t = − 1 or t = − 2. If A = 1 and t = 0, we set both of the variables connected by the edge to − 1 and we have t = − 2. If A = 1 and t = − 1 we set one of the variables to 1 and the other to − 1 and we have t = − 1.

In the rest of the process all the remaining vertices are connected by 1-edges. Answering the queries as before, we keep t the same. Thus, there is an input consistent with our answers on which MAJn equals − 1. On the other hand, if at least one 1-edge is still present in the graph, we can set both of its endpoints to 1 and make the balance t non-negative. Thus, in this case there is also an assignment on which the value of the function is 1. Hence, to make the function constant the decision tree has to remove all connected components from G. □
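For n = 3 the bound of Theorem 3 can be cross-checked by exhaustive minimax search over all AND/OR query strategies (a brute-force sketch in 0/1 notation, our own code; it is infeasible beyond very small n):

```python
from functools import lru_cache
from itertools import combinations, product

n = 3
cube = list(product([0, 1], repeat=n))
maj = lambda x: int(2 * sum(x) >= n)

# all AND- and OR-queries over nonempty subsets of the variables
subsets = [s for r in range(1, n + 1) for s in combinations(range(n), r)]
queries = [lambda x, s=s: int(all(x[i] for i in s)) for s in subsets] + \
          [lambda x, s=s: int(any(x[i] for i in s)) for s in subsets]

@lru_cache(maxsize=None)
def depth(state):
    # minimal worst-case number of further queries needed to pin down maj
    # when the input is known to lie in `state`
    if len({maj(x) for x in state}) == 1:
        return 0
    best = len(cube)  # crude upper bound
    for q in queries:
        parts = {}
        for x in state:
            parts.setdefault(q(x), []).append(x)
        if len(parts) == 2:  # only informative queries make progress
            best = min(best, 1 + max(depth(frozenset(p)) for p in parts.values()))
    return best
```

Here `depth(frozenset(cube))` evaluates to 2 = n − 1, matching Theorem 3 for n = 3.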

We now proceed to the proof of the lower bound for binary queries. The main idea behind the proof is the same as in the previous theorem, but there are many technical problems we need to overcome.

Theorem 4

\(\mathsf {D}_{\mathrm {B}_2}(\text {{MAJ}}_{n}) \geq n-O(\sqrt {n})\).

Proof

Let us first classify the queries that can be made by functions in B2. First note that the queries q and − q are equivalent. There are functions in B2 depending on at most one variable; they correspond to querying a single variable. Next, there are OR-type functions in B2, that is, functions of the form \(({x_{i}^{a}} \vee {x_{j}^{b}})\) for a,b ∈{0,1}, where \({x_{i}^{1}}=x_{i}\) and \({x_{i}^{0}} = - x_{i}\). Finally, there are two XOR-type functions in B2. Due to the equivalence of a query and its negation, they correspond to the query xixj. Note that the last query simply asks whether the variables xi and xj are equal.
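This classification of B2 up to negation can be confirmed mechanically (a small 0/1 truth-table sketch, our own code): of the 8 classes {q, − q}, one is constant, two depend on a single variable, four are OR-type and one is XOR-type.

```python
from itertools import product

tables = list(product([0, 1], repeat=4))  # t[2*x1 + x2] is the value at (x1, x2)
neg = lambda t: tuple(1 - b for b in t)

def kind(t):
    # does t depend on x1 (i = 0) and/or on x2 (i = 1)?
    dep = [any(t[2 * a + b] != t[2 * (a ^ (i == 0)) + (b ^ (i == 1))]
               for a in (0, 1) for b in (0, 1)) for i in (0, 1)]
    if not any(dep):
        return 'constant'
    if dep.count(True) == 1:
        return 'unary'
    if t in ((0, 1, 1, 0), (1, 0, 0, 1)):
        return 'xor-type'
    return 'or-type'  # up to negation, an OR of possibly negated variables

# group the 16 binary functions into 8 classes {q, not q}
classes, seen = [], set()
for t in tables:
    if t not in seen:
        seen.update({t, neg(t)})
        classes.append(kind(t))
```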

The proof strategy is similar to the one in the previous proof. During the computation we will again maintain a graph G, but now the vertices of G are fresh variables that we denote by y1,…,yk (here k is the number of vertices in G). To each vertex yi we assign an integer weight ci. Some of the vertices are connected by edges labeled by 1 or − 1, and the edges form a matching. We will maintain that the weights of connected vertices are equal and that \(1 \leq c_{i} \leq \sqrt {n}\).

The intuition behind the graph is the following. At each point of the computation some of the original input variables xi are fixed to constants and the rest are split into equivalence classes. All variables in each of the classes are fixed to some variable yj, or to its negation − yj. We will maintain the following relation

$$ {\sum}_{i\colon x_{i} \text{ is unfixed }} x_{i} = {\sum}_{j=1}^{k} c_{j} y_{j}. $$

Basically, if at some point we fix all the variables yj, then the weighted sum of the yj’s with weights cj equals the sum of the unfixed xi variables. In the beginning of the computation none of the variables xi are fixed and each variable constitutes its own equivalence class. In other words, initially k = n, for all i we set xi = yi, ci = 1 and there are no edges in the graph G.

We will answer queries in such a way that the number of connected components of G reduces as slowly as possible. On almost all steps the number of connected components will reduce by 1, but sometimes we will have to reduce it by 2; we will show that such steps cannot happen too many times. We will also show how to answer the queries so that, in order to determine the value of the function, the decision tree must reduce the number of connected components to a small number.

We also maintain a parameter t that is equal to the sum of the values of already fixed variables xi.

The computation will proceed in two phases. In the first phase we will maintain that \(- \sqrt {n} \leq t \leq \sqrt {n}\).

We now explain how to answer the queries in the first phase. Note that each query to variables of x can be restated as a query to variables of y (since each xi is fixed either to a constant or to some variable yj). First we consider the case that the query addresses the variables of y that are isolated (as nodes of graph G).

Queries to Isolated Vertices Suppose the query asks the value of one of the variables yi. We then fix the value of this variable in such a way that \(c_{i} y_{i}\) and t have opposite signs. We remove the vertex yi from the graph. Since \(c_{i} \leq \sqrt {n}\), the balance t is still at most \(\sqrt {n}\) in absolute value.

Suppose the query asks whether yi = yj. Suppose first that \(c_{i} \neq c_{j}\) and assume without loss of generality that ci > cj. Then the adversary replies that \(y_{i} \neq y_{j}\), so we identify yj = −yi, remove the vertex yj from G and subtract cj from ci. It is easy to see that all properties are maintained. The number of connected components reduces by 1.

If on the other hand ci = cj, then if \(c_{i} > \sqrt {n}/2\), we fix yi = 1 and yj = − 1, and remove both vertices from G. The number of connected components in this case reduces by 2. If on the other hand \(c_{i} \leq \sqrt {n}/2\), we set yj = yi, remove yj from G and add cj to ci. The number of connected components reduces by 1.
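For illustration, the bookkeeping for an equality query between two isolated vertices can be sketched as follows (hypothetical helper code, not from the paper; `c` maps vertex names to weights and `sqrt_n` stands for \(\sqrt{n}\)):

```python
def answer_equality(c, i, j, sqrt_n):
    """Adversary's answer to the query "y_i = y_j?" for isolated vertices.

    Updates the weight dictionary c in place.  The invariant used is that
    identifying y_j = -y_i turns c_i*y_i + c_j*y_j into (c_i - c_j)*y_i.
    """
    if c[i] != c[j]:
        hi, lo = (i, j) if c[i] > c[j] else (j, i)
        c[hi] -= c.pop(lo)                 # identify y_lo = -y_hi
        return "not equal"                 # one component removed
    if c[i] > sqrt_n / 2:
        c.pop(i); c.pop(j)                 # fix y_i = 1, y_j = -1; t unchanged
        return "not equal"                 # two components removed
    c[i] += c.pop(j)                       # identify y_j = y_i
    return "equal"                         # one component removed
```

For example, with weights c = {1: 3, 2: 1} the answer is "not equal" and vertex 1 keeps weight 2, exactly as in the unequal-weights case above.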

Suppose next that the query asks the function yi ∨¬yj. In this case if t ≥ 0 we set yi = − 1, otherwise we fix yj = 1. In the first case we remove yi from G and in the second we remove yj. The answer to the query in both cases is − 1. The number of connected components reduces by 1 and since \(c_{i}, c_{j} \leq \sqrt {n}\) the balance t is still at most \(\sqrt {n}\) in absolute value.

Finally, suppose the query asks \(y_{i} \vee y_{j}\) or \(\neg y_{i} \vee \neg y_{j}\). Suppose first that \(c_{i} \neq c_{j}\) and assume without loss of generality that ci > cj. Then again we set yj = −yi, remove the vertex yj from G and subtract cj from ci. It is easy to see that all properties are maintained. The number of connected components reduces by 1.

If on the other hand ci = cj, we connect yi and yj by an edge. We label the edge by − 1 for the case of the \(y_{i} \vee y_{j}\) query and by 1 for the case of the \(\neg y_{i} \vee \neg y_{j}\) query. The number of connected components reduces by 1.

Next we proceed to queries to non-isolated vertices.

Queries to Non-Isolated Vertices First consider arbitrary queries of the form yi, yi ∨¬yj, \(y_{i} \vee y_{j}\) or \(\neg y_{i} \vee \neg y_{j}\) and suppose yi is connected by an edge to some other vertex yl (possibly l = j). For all these types of queries we can fix the answer to the query by fixing yi to some constant. We also fix yl to the opposite constant and remove both vertices from G. Since ci = cl the balance t does not change. The number of connected components reduces by 1.

The only remaining case is a query of the form yi = yj where yi is connected to some other vertex yl by an edge. If l = j we simply set yi = 1, yj = − 1 and remove both vertices from G. The balance t does not change and the number of connected components reduces by 1. If \(l \neq j\) we let yi = yj and yl = −yj. We remove the vertices yi and yl from the graph. Since ci and cl are equal, the weight of yj does not change. The number of connected components reduces by 1.

We have described how to answer queries in the first phase. Next we describe when this phase ends. Denote by A the sum of the weights of vertices connected by − 1-edges, by B the sum of the weights of vertices connected by 1-edges and by C the sum of the weights of isolated vertices. The first phase ends once either \(A+C \leq 3 \sqrt n\) or \(B+C \leq 3 \sqrt n\). Without loss of generality assume that \(A+C \leq 3 \sqrt {n}\) (the other case is symmetric). Note that we can also claim that \(A+C > \sqrt {n}\). Indeed, in one step of the first phase at most two vertices are removed and the weight of each vertex is at most \(\sqrt n\), so if \(A+C \leq \sqrt {n}\) now, then on the previous step we already had \(A+C \leq 3 \sqrt n\) and the phase would have ended there.

At this step of the computation we fix all isolated vertices and all vertices connected by − 1-edges to − 1. Before this step we had \(-\sqrt n \leq t \leq \sqrt n\). Thus, since \(\sqrt {n} < A+C \leq 3\sqrt {n}\), after this step we have \(-4 \sqrt n \leq t < 0\) (we could be more careful here, but this would only improve the multiplicative constant in the \(O(\sqrt {n})\) term of the theorem). After this the second phase of the computation starts. Only vertices connected by 1-edges remain. We answer the queries as in the first phase. Note that the balance t does not change anymore. Thus if the sum of the weights of the remaining variables is at least \(4\sqrt {n}\), then the function is non-constant: setting one vertex in each pair to 1 and the other to − 1 makes the function − 1, while setting all variables to 1 makes the function 1. Thus the function becomes constant only once the total weight of the remaining vertices is below \(4\sqrt {n}\), that is, when there are fewer than \(2\sqrt {n}\) connected components.

Let us now calculate how many queries the decision tree needs to make to fix the function to a constant. In the beginning G has n connected components and in the end it has at most \(2\sqrt n\) connected components. At each step the number of connected components reduces by 1, with some exceptions that we consider below.

In the first phase there is a case when the number of connected components reduces by 2. Note that in this case the total weight of all vertices reduces by at least \(\sqrt n\). Since originally the total weight is n and the total weight never increases, this step can occur at most \(\sqrt n\) times.

Between the two phases we fix many variables without answering any queries. Note that their total weight is at most \(3\sqrt {n}\), thus the number of connected components reduces by at most \(3\sqrt n\).

Thus, in total the decision tree needs to make at least \(n - 2\sqrt n - \sqrt n - 3 \sqrt n = n - O(\sqrt {n})\) queries to fix the function to a constant. □

We observe that the complexity of MAJn drops substantially if we allow querying parities of three variables.

Lemma 8

Suppose f : {− 1,1}n →{− 1,1} is a symmetric function. Then there is a decision tree of depth \(\lceil \frac {2n}{3}\rceil \) computing f and making only queries of the form AND2, OR2 and XOR3.

Proof

Split the variables into blocks of size 3. In each block query the parity of its variables. If the answer is − 1, query the AND2 of any two variables in the block. If the answer to the first query is 1, query the OR2 of any two variables in the block. It is easy to see that after these two queries we know the number of − 1 variables in the block. When n is not divisible by 3, a leftover block of size 1 requires one query to handle, and a leftover block of size 2 requires two queries. Knowing the number of − 1 variables in all blocks is enough to output the value of the symmetric function. □
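The two-query strategy for a block of three variables can be verified exhaustively. The sketch below (illustrative code; it uses 0/1 bits and counts ones, whereas the lemma is stated over {−1,1}) always queries the first two variables of the block in the second step:

```python
from itertools import product

def ones_in_block(block):
    """Number of ones in a 3-bit block, determined by two queries:
    XOR3 of all three bits, then AND2 or OR2 of the first two bits."""
    x1, x2, x3 = block
    parity = x1 ^ x2 ^ x3             # query 1: XOR3
    if parity == 1:                    # odd count of ones: 1 or 3
        return 3 if (x1 & x2) else 1   # query 2: AND2
    else:                              # even count of ones: 0 or 2
        return 2 if (x1 | x2) else 0   # query 2: OR2

# two queries suffice for every one of the 8 possible blocks
assert all(ones_in_block(b) == sum(b) for b in product((0, 1), repeat=3))
```

The case analysis works because with one 1 the AND of any two bits is 0, with three 1s it is 1, and with zero or two 1s the OR of any two bits distinguishes the counts.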

4 Parity Decision Tree Complexity of Threshold Functions

In this section we show a new lower bound for parity decision tree complexity of threshold functions.

To show that the previously known techniques are not tight for some threshold functions, we need a way to prove even better lower bounds. We will do this via the following theorem.

Theorem 5

For any k and n we have \(\mathsf {D}_{\oplus }(\text {{THR}}_{n+2}^{k+1})\geq \mathsf {D}_{\oplus }({\text {{THR}}_{n}^{k}})+1\).

Proof

Let \(s = \mathsf {D}_{\oplus }(\text {{THR}}_{n+2}^{k+1})\). We will construct a parity decision tree for \({\text {{THR}}_{n}^{k}}\) making no more than s − 1 queries.

Denote the input variables of \({\text {{THR}}_{n}^{k}}\) by \(x=(x_{1},\ldots , x_{n}) \in \{0,1\}^{n}\). We introduce one more variable y (which we will fix later) and consider \(x_{1},\ldots , x_{n}, y, \neg y\) as inputs to the algorithm for \(\text {{THR}}_{n+2}^{k+1}\). Note that \({\text {{THR}}_{n}^{k}}(x)=\text {{THR}}_{n+2}^{k+1}(x,y,\neg y)\). Our plan is to simulate the algorithm for \(\text {{THR}}_{n+2}^{k+1}\) on \((x,y,\neg y)\) (possibly reordered) and save one query on the way.
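The identity \({\text {{THR}}_{n}^{k}}(x)=\text {{THR}}_{n+2}^{k+1}(x,y,\neg y)\) holds because the pair \((y,\neg y)\) always contributes exactly one extra 1. A quick exhaustive check for small parameters (illustrative code; the sizes n = 4, k = 2 are chosen only for the demonstration):

```python
from itertools import product

def thr(x, k):
    """Threshold function: 1 iff x contains at least k ones."""
    return int(sum(x) >= k)

n, k = 4, 2  # small illustrative sizes
for x in product((0, 1), repeat=n):
    for y in (0, 1):
        # the pair (y, 1 - y) contributes exactly one 1,
        # so the threshold shifts from k to k + 1
        assert thr(x, k) == thr(x + (y, 1 - y), k + 1)
```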

Consider the first query that the algorithm for \(\text {{THR}}_{n+2}^{k+1}\) makes. There are two substantially different cases: either the first query asks the parity of a proper subset of the inputs, or it asks the parity of all inputs. We consider these two cases separately.

Suppose first that the query does not ask the parity of all input variables. Since the function \(\text {{THR}}_{n+2}^{k+1}\) is symmetric we can reorder the inputs in such a way that the query contains input y and does not contain ¬y, that is, the query asks the parity \((\bigoplus _{i\in S} x_{i}) \oplus y\) for some \(S\subseteq [n]\). Now it is time for us to fix the value of y. We let \(y = \bigoplus _{i\in S} x_{i}\). Then the answer to the first query is 0, so we can skip it and proceed to the second query. For each subsequent query of the algorithm for \(\text {{THR}}_{n+2}^{k+1}\), if it contains y or ¬y (or both) we substitute them by \(\bigoplus _{i\in S} x_{i}\) and \((\bigoplus _{i\in S} x_{i})\oplus 1\) respectively. The result is the parity of some variables among x1,…,xn and we make this query to our original input x. Clearly the answer to the query to x is the same as the answer to the original query to \((x,y,\neg y)\). Thus, making at most s − 1 queries we reach a leaf of the tree for \(\text {{THR}}_{n+2}^{k+1}\) and thus compute \(\text {{THR}}_{n+2}^{k+1}(x,y,\neg y)={\text {{THR}}_{n}^{k}}(x)\).

It remains to consider the case when the first query to \(\text {{THR}}_{n+2}^{k+1}\) is \((\bigoplus _{i=1}^{n} x_{i}) \oplus y \oplus \neg y\). Since \(y \oplus \neg y = 1\), this parity is determined by \(\bigoplus _{i=1}^{n} x_{i}\) and we make the latter query to x. Now we proceed to the second query in the computation of \(\text {{THR}}_{n+2}^{k+1}\); since repeating the first query gives no new information, we may assume that this query does not ask the parity of all input variables. We perform the same analysis as above for this query: rename the inputs, fix y to the parity of a subset of x to make the answer to the query equal to 0, and simulate all further queries on \((x,y,\neg y)\). Again we save one query in this case and compute \({\text {{THR}}_{n}^{k}}(x)\) in at most s − 1 queries. □

Next we analyze the parity decision tree complexity of the functions \({\text {{THR}}_{n}^{2}}\). For them the lower bound through granularity is tight. We will use this analysis in combination with Theorem 5 to prove a lower bound for \({\text {{THR}}_{n}^{3}}\).

Lemma 9

For even n we have \(\mathsf {D}_{\oplus }({\text {{THR}}_{n}^{2}})=n\) and for odd n we have \(\mathsf {D}_{\oplus }({\text {{THR}}_{n}^{2}})=n-1\).

Proof

We start with a lower bound.

Here we will need to consider two Fourier coefficients, \(\widehat {\text {{THR}}}^{2}_{n}(\emptyset )\) and \(\widehat {\text {{THR}}}^{2}_{n}([n])\). We start with the latter one.

We have

$$ \begin{array}{@{}rcl@{}} \widehat{\text{{THR}}}_{n}^{2}([n]) &=& \frac{1}{2^{n}}\left( {\sum}_{i=0}^{1} (-1)^{i} \binom{n}{i} - {\sum}_{i=2}^{n} (-1)^{i}\binom{n}{i}\right)\\ &=& \frac{1}{2^{n}}\left( 2{\sum}_{i=0}^{1} (-1)^{i}\binom{n}{i} - {\sum}_{i=0}^{n} (-1)^{i}\binom{n}{i}\right) = \frac{1}{2^{n}} \left( 2{\sum}_{i=0}^{1} (-1)^{i}\binom{n}{i} - 0\right). \end{array} $$

From this we can see that \( \mathsf {gran}(\widehat {\text {{THR}}}_{n}^{2}([n])) =n - \mathsf {P}\left ({\sum }_{i=0}^{1} (-1)^{i}\binom {n}{i}\right ) -1 \) and thus

$$ \mathsf{D}_{\oplus}({\text{{THR}}_{n}^{2}}) \geq n - \mathsf{P}\left( {\sum}_{i=0}^{1} (-1)^{i}\binom{n}{i}\right). $$

By the same analysis for \(\widehat {\text {{THR}}}^{2}_{n}(\emptyset )\) we can show that

$$ \mathsf{D}_{\oplus}({\text{{THR}}_{n}^{2}}) \geq n - \mathsf{P}\left( {\sum}_{i=0}^{1} \binom{n}{i}\right). $$

Note that \({\sum }_{i=0}^{1} (-1)^{i}\binom {n}{i} = 1-n\) and \({\sum }_{i=0}^{1} \binom {n}{i} = 1+n\). From this for even n we clearly obtain a lower bound of \(\mathsf {D}_{\oplus }({\text {{THR}}_{n}^{2}}) \geq n\). For odd n it is easy to see that one of the numbers 1 − n and 1 + n is not divisible by 4. Thus for odd n we obtain lower bound \(\mathsf {D}_{\oplus }({\text {{THR}}_{n}^{2}}) \geq n-1\).
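The derivation above gives the closed form \(\widehat {\text {{THR}}}_{n}^{2}([n]) = 2(1-n)/2^{n}\), which can be confirmed by brute force for small n. The sketch below (illustrative code, not from the paper) uses the sign convention of the derivation: the function is − 1 on inputs with at least two ones and 1 otherwise.

```python
from fractions import Fraction
from itertools import product

def coeff_full(n):
    """Fourier coefficient of THR_n^2 on [n], computed by brute force."""
    total = 0
    for x in product((0, 1), repeat=n):
        f = -1 if sum(x) >= 2 else 1    # THR_n^2 in the +-1 convention
        chi = (-1) ** sum(x)            # character chi_[n](x)
        total += f * chi
    return Fraction(total, 2 ** n)

# matches 2 * (1 - n) / 2^n for a range of small n
for n in range(2, 9):
    assert coeff_full(n) == Fraction(2 * (1 - n), 2 ** n)
```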

It remains to prove that the lower bound is tight for odd n. To provide an algorithm making at most n − 1 queries we will again split the variables into blocks, starting with all blocks of size 1. We split all variables but one into pairs and check whether the variables in each pair are equal. After this we have (n − 1)/2 blocks of size 2 and one block of size 1. If there is a balanced block of size 2, we can just query one variable from each of the remaining blocks, thus learning the number of ones in the input. This allows us to compute the function in at most n − 1 queries. If all blocks of size 2 contain equal variables, then note that the value of the function does not depend on the variable in the block of size 1. Indeed, \({\text {{THR}}_{n}^{2}}(x) = 1\) iff \({\sum }_{i} x_{i} \geq 2\) iff there is a block of size 2 containing variables equal to 1. Thus it remains to query one variable from each block of size 2, which again allows us to compute the function with at most n − 1 queries. □

Next we compute the granularity for threshold functions with threshold three.

Lemma 10

For n = 8m + 2 for an integer m we have \(\mathsf {gran}({\text {{THR}}_{n}^{3}}) = n-3\).

Proof

For the upper bound we need to consider an arbitrary Fourier coefficient \(\widehat {\text {{THR}}}^{3}_{n}(S)\). We have

$$ \begin{array}{@{}rcl@{}} \widehat{\text{{THR}}}_{n}^{3}(S) &=& \frac{1}{2^{n}}\left( {\sum}_{x, |x|\leq 2} \chi_{S}(x) - {\sum}_{x, |x|\geq 3} \chi_{S}(x)\right)\\ &=& \frac{1}{2^{n}}\left( 2{\sum}_{x, |x|\leq 2} \chi_{S}(x) - {\sum}_{x\in \{0,1\}^{n}} \chi_{S}(x)\right), \end{array} $$

where by |x| we denote \({\sum }_{i=1}^{n} x_{i}\). The second sum in the last expression is equal to either \(2^{n}\) or 0 depending on S. Thus we have

$$ \mathsf{gran}(\widehat{\text{{THR}}}_{n}^{3}(S)) =n - \mathsf{P}\left( {\sum}_{x, |x|\leq 2} \chi_{S}(x)\right) -1. $$
(1)

Denote the size of S by l. Then we have

$$ {\sum}_{x, |x|\leq 2} \chi_{S}(x) = 1 - l + (n - l) + \frac{l(l-1)}{2} - l(n-l) + \frac{(n-l)(n-l-1)}{2}, $$

where the first summand corresponds to x with |x| = 0, the next two summands correspond to |x| = 1 and the last three correspond to |x| = 2.

Rearranging this expression we obtain

$$ {\sum}_{x, |x|\leq 2} \chi_{S}(x) = \frac{4l^{2} + 2 + (n+1)(n-4l)}{2}. $$

We need to show that for n ≡ 2 (mod 8) this number is divisible by 4, that is its numerator is divisible by 8. Since divisibility by 8 depends only on the remainder of n when divided by 8, it is enough to check divisibility of the numerator by 8 for n = 2. We have

$$ 4l^{2} + 2 + (n+1)(n-4l) = 4l^{2} + 2 + 3(2-4l) = 4(l^{2} -3l + 2), $$

which is divisible by 8 for all l, since \(l^{2} -3l + 2 = (l-1)(l-2)\) is a product of two consecutive integers and hence even. Thus \(\mathsf {P}\left ({\sum }_{x, |x|\leq 2} \chi _{S}(x)\right )\geq 2\) for n = 8m + 2 and

$$ \mathsf{gran}({\text{{THR}}_{n}^{3}}) \leq n - 3. $$
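The divisibility claim can also be checked directly for a range of values (illustrative code, not part of the proof):

```python
# check that the numerator 4l^2 + 2 + (n+1)(n-4l) is divisible by 8
# for all sizes l of S whenever n = 8m + 2
for n in range(2, 203, 8):
    for l in range(n + 1):
        assert (4 * l * l + 2 + (n + 1) * (n - 4 * l)) % 8 == 0
```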

For the lower bound on the granularity it is enough to consider Fourier coefficients \(\widehat {\text {{THR}}}^{3}_{n}(\emptyset )\) and \(\widehat {\text {{THR}}}^{3}_{n}([n])\). For them we have

$$ {\sum}_{x, |x|\leq 2} \chi_{\emptyset}(x) = 1 + n + \frac{n(n-1)}{2} = \frac{2+n(n+1)}{2} $$

and

$$ {\sum}_{x, |x|\leq 2} \chi_{[n]}(x) = 1 - n + \frac{n(n-1)}{2} = \frac{2 + n(n-3)}{2}. $$

To show the lower bound it is enough to show that for any n = 8m + 2 at least one of these expressions is not divisible by 8, that is, its numerator is not divisible by 16. It is straightforward to check that for n ≡ 2 (mod 16) we have 2 + n(n + 1) ≡ 8 (mod 16) and for n ≡ 10 (mod 16) we have 2 + n(n − 3) ≡ 8 (mod 16). In both cases by (1) we find a Fourier coefficient with granularity at least n − 3. □
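The residue computations in the lower bound can likewise be verified numerically (illustrative code):

```python
# numerators of the two Fourier coefficients, reduced modulo 16:
# 2 + n(n+1) for S = the empty set and 2 + n(n-3) for S = [n]
for n in range(2, 500, 16):        # n = 2 (mod 16)
    assert (2 + n * (n + 1)) % 16 == 8
for n in range(10, 500, 16):       # n = 10 (mod 16)
    assert (2 + n * (n - 3)) % 16 == 8
```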

We now show that for the functions in Lemma 10 the parity decision tree complexity is greater than the granularity plus one. Note that since the granularity lower bound is at least as strong as the lower bounds through sparsity and degree, those bounds are not tight either. It is also easy to see that certificate complexity does not give an optimal lower bound (note that each input x lies in an affine subspace of dimension 2 on which the function is constant).

Theorem 6

For n = 8m + 2 for an integer m > 0 we have \(\mathsf {D}_{\oplus }({\text {{THR}}_{n}^{3}}) = n-1\).

Proof

For the lower bound we note that n − 2 is even and thus by Lemma 9 we have \(\mathsf {D}_{\oplus }(\text {{THR}}_{n-2}^{2})\geq n-2\). Then by Theorem 5 we have \(\mathsf {D}_{\oplus }({\text {{THR}}_{n}^{3}})\geq n-1\).

For the upper bound we again view the input variables as blocks of size 1 and, by checking equality of variables, start combining all variables but two into larger blocks. We first combine them into blocks of size 2 and then combine all unbalanced blocks, except possibly one, into blocks of size 4. If in this process we ever encounter a balanced block, we just query one variable from every other block, thus learning the number of ones in the input in at most n − 1 queries. If all blocks contain equal variables, then there is one block of size 2. As in the proof of Lemma 9 we observe that the two variables in this block do not affect the value of the function. Indeed, \({\text {{THR}}_{n}^{3}}(x) = 1\) iff \({\sum }_{i} x_{i} \geq 3\) iff there is a block of size 4 containing variables equal to 1. □

Thus, we have shown that previously known lower bounds are not tight for \(\text {{THR}}_{8m+2}^{3}\). However, the gap between the lower bound and the actual complexity is 1.

Remark 1

We note that from our analysis it is straightforward to determine the complexity of \({\text {{THR}}_{n}^{3}}\) for all n. If n = 4m or n = 4m + 3 for some m, then \(\mathsf {D}_{\oplus }({\text {{THR}}_{n}^{3}})=n\), and if n = 4m + 1 or n = 4m + 2, then \(\mathsf {D}_{\oplus }({\text {{THR}}_{n}^{3}})=n-1\). The lower bounds (apart from the case covered by Theorem 6) follow from the consideration of \(\widehat {\text {{THR}}}^{3}_{n}(\emptyset )\) and \(\widehat {\text {{THR}}}^{3}_{n}([n])\) as in the proof of Lemma 10. The upper bounds follow from the same analysis as in the proof of Theorem 6.

5 Conclusion

The next natural question would be to address the complexity of Boolean functions in the decision tree model that can query functions of k variables for k > 2. In this model lower bounds of the form n/k are trivial, but it is not clear how to prove truly linear lower bounds. On the other hand, it is easy to show by a counting argument that there are hard functions for this model. Recall that the majority function has complexity at most 2n/3 for k = 3, so more complicated functions are needed here.

Another important direction is the further study of lower bounds for parity decision trees. One of the key goals here is to show that parity decision tree complexity and sparsity are polynomially related. This would resolve the Log-rank Conjecture for XOR-functions.