1 Introduction

Today cryptography is spreading everywhere in a lot of devices and especially small, mobile, low energy and low cost pieces of equipment such as Bluetooth earpieces, RFID tags, sensors. To design and implement cipher algorithms on these devices, there is an eager need of small footprint Boolean or finite functions achieving a good trade-off in term of complexity and cryptographic properties [7]. Other discrete algorithms may also take advantage of these finite functions like hash tables computation for storing and sorting data.

Some works already deal with complexity issues of the partially symmetric Boolean functions [8] and the symmetric multivalued functions [6, 11, 12]. They show that decision diagrams [4, 15] are well suited to benefit from symmetries and we enhance this further on with excellent results for symmetric multivalued functions. Decision Diagrams (either binary or multivalued) are able both to provide a measure of the complexity of these functions and to achieve an efficient implementation. This is why our analysis makes huge use of this tool.

On general functions the size of the MDD is highly dependent on the order of the variables as can be shown with the direct storage access Boolean function of k + 2k variables, whose BDD size varies from 2k + 1 + 1 [5] to \(2^{2^{k+1}}\) [10]. Due to the symmetry, the MDD of symmetric functions have the same size whatever is the variables’ order. So, their study gives directly the best size of any MDD representation. Another asset of symmetric functions is that their MDD have bounds [6, 12] of small order, but no result expresses exactly the maximum size. No result provides functions achieving this maximum either. We investigate these points in order to establish the exact maximum value of these MDD, with different levels of reductions [4].

The balanced symmetric functions are already in use in some existing cryptographic algorithms like for instance θ function of SHA3 third round finalist Keccak [2]. Their use is also proposed in the tweaked version of SFINKS [3] and in the generic study on symmetric Boolean functions [7]. This shows that when properly combined with other functions they allow good results. The trade-off in term of complexity of the symmetric functions can help to withstand BDD based attacks [9, 14], especially with symmetric functions that maximise the MDD sizes.

In this paper, we study the structure of symmetric functions MDD and prove their maximum size. We exhibit the functions reaching this maximum value (the “hard” symmetric functions) and give an external characterization of such functions linked to De Bruijn sequences. We then derive some properties and counting results on these “hard” symmetric functions. Annexes provide tables of experimental results.

2 Definitions and notations

Let us consider \({E_{k}=\lbrace 0, 1, \ldots , k-1 \rbrace}\) a set of k elements, and n, q, m which are positive integers. Let \( C_{n}^{k} \) be the choose k among n binomial coefficient. Card(E) stands for the cardinal of the set E.

2.1 Multivalued functions

Definition 1

Given any positive integers n, q, m, we call multivalued function any function from \(E_{q}^{n}\) to E m . The set of multivalued functions is denoted by M n (q, m).

This set M n (q, m) contains \( m^{q^{n}}\) functions. A function f ∈ M n (q, m) is characterized by a vector \({f_{v} \in E_{m}^{q^{n}}}\) called its value vector, consisting in the evaluations of the function at every q n possible input:

$$ f_{v}=\left( f(0,\ldots,0),f(0,\ldots,0,1),f(0,\ldots,0,2),\ldots,f(q-1,\ldots,q-1) \right) \enspace. $$
(1)

2.2 Partitions

Definition 2

[1] A partition \({\pi=\left(\pi_{1},\ldots,\pi_{k}\right)}\) of an integer N bounded by an integer b is a sequence of numbers 0 ≤ π 1 ≤ ... ≤ π k  ≤ b such that \({N=\sum_{i=1}^{k} \pi_{i}}\).

A partition can also be represented by its “number of repetitions”:

$$ \pi= \langle r_{0},\ldots,r_{b} \rangle \,\mbox{ where }\, r_{i}:= {\rm{Card}} \left \lbrace \pi_{j }= i; j \in \lbrace 1, \ldots, k \rbrace \right \rbrace \enspace. $$
(2)

Thus, we have \({N=\sum_{i=0}^{b} i \times r_{i}}\) and \({k=\sum_{i=0}^{b} r_{i}}\). The two representations are equivalent.

We denote by Part(b, k, N) the set of all partitions of all integers lower or equal to N. For all \({x \in E_{q}^{n}}\), there is a single partition π(x) ∈ Part(q − 1, n,n(q − 1)) which represents x.

Example 1

n = 5, q = 3.

Let \({x=(2,1,1,0,1) \in E_{q}^{n}}\), then \({\pi(x)=(0,1,1,1,2)=\langle 1,3,1 \rangle}\).

Remark that two vectors which have the same partition, have the same components up to a permutation.

The Lemma of Andrew [1] gives the number of elements of the set Part(b, a, ab).

Lemma 1

\({\rm{Card}}({\rm{Part}}(b,a,ab))=C_{a+b}^{a}=C^{b}_{a+b}\) .

2.3 Symmetric multivalued functions

Definition 3

A multivalued function \(f:E_{q}^{n}\longrightarrow E_{m}\), is symmetric, if f is invariant under any permutation of its input’s variables:

$$ \forall\sigma\in S_{n}, \ f\left(x_{1},\ldots,x_{n}\right)=f\left(x_{\sigma(1)},\ldots,x_{\sigma(n)}\right). $$

The set of these symmetric multivalued functions is denoted by SM n (q, m).

Definition 4

We call symmetry class of \({x \in E_{q}^{n}}\), the set P n,q (x) of vectors obtained by permuting the coordinates of x defined by:

$$ P_{n,q}(x):=\left\lbrace y \in E_{q}^{n} \ ; \mbox{ there exists a partition $\pi$ such that } y=\pi(x) \right\rbrace \enspace. $$
(3)

According to the lemma of Andrew, we deduce that there are \(C_{n+q-1}^{q-1}\) symmetry classes in \(E_{q}^{n}\). We designate as representative of a class of symmetry, the smallest element in the lexicographical order, \(s_j = \left(0^{r0}, 1^{r1}, \ldots i^{ri}, \ldots (q-1)^{r(q-1)}\right)\) . We call jth symmetry class of \(E_{q}^{n}\) the class of symmetry whose representative s j is classified jth among all representatives according to the lexicographical order.

A symmetric multivalued function can be represented by a vector with values in E m , whose length equals \(C_{n+q-1}^{q-1}\). The components of the vector are the evaluations of the function for each representative of the symmetry classes. We call this vector a simplified value vector of the function:

$$\begin{array}{rll} f_{sv}&=&\left( f(0,\ldots,0),\ldots,f(0,\ldots,0,q-1),f(0,\ldots,0,1,1),\ldots,f(q-1,\ldots,q-1) \right) \\ &=&\left( f\left(s_0\right),f(s_1),\ldots,f(s_ {q-1}),f\left(s_q\right),\ldots,f\left(s_ {C_{n+q-1}^{q-1}-1}\right) \right) \end{array} $$
(4)

2.4 Multivalued decision diagrams

A Multivalued Decision Diagram (MDD) [15] is a generalization of a Binary Decision Diagram (BDD) [4]. In the same manner as the BDD represents and implements the Boolean functions, the MDD also represents and implements the multivalued functions.

Definition 5

A multivalued decision diagram (MDD) is a rooted directed acyclic graph G = (U, E) with two types of nodes:

  • the non-terminal nodes u which are labeled with a variable x i and have q outgoing edges e b labeled with the q possible values b in E q i.e. q children.

  • the terminal nodes u which are labeled with a value c in E m and have no outgoing edge.

A MDD is ordered if the variables labeling nodes in any path from the root to any terminal node are in the same order. Bryant [4] has defined a procedure reduce which reduces a BDD in a single and optimal way. This procedure applies two rules: the fusion rule and the suppression rule. Bryant’s procedures can easily be generalized to deal with the MDD.

Definition 6

The fusion rule says that two nodes are merged if their subgraphs are isomorphic (Fig. 1). The suppression rule says that a node is deleted if it has only one child node (Fig. 2).

Fig. 1
figure 1

The fusion rule applied on a MDD producing a QROMDD

Fig. 2
figure 2

The suppression rule applied on a QROMDD producing a ROMDD

Definition 7

A Reduced Ordered Multivalued Decision Diagram (ROMDD) is a MDD reduced by the full reduction procedure (i.e. fusion and suppression rules) (Figs. 1 and 2).

Definition 8

A Quasi Reduced Ordered Multivalued Decision Diagram (QROMDD) is a MDD reduced by using only the fusion rule.

2.5 Complexity

The number of nodes of a MDD (both terminal and non-terminal nodes) is called the size of the MDD. We call height of a node its distance to the top. The set of nodes having a same height \({k \in \lbrace 0,1,\ldots,n \rbrace}\) is called level k of the MDD.

Definition 9

Let f be in M n (q, m), we define:

  • its complexity, noted c Q (f), the size of its QROMDD.

  • its reduced complexity, noted c R (f), the size of its ROMDD.

Definition 10

We define SCQ n (q, m) and SCR n (q, m) as the largest complexities of the functions in SM n (q, m): \(SCQ_{n}(q,m):=\max \left \lbrace c_Q(f), f \in SM_{n}(q,m) \right \rbrace \) and \(SCR_{n}(q,m):=\max \left \lbrace c_R(f), f \in SM_{n}(q,m) \right \rbrace \).

3 Symmetric functions representation by simplified MDD

We put forward an optimized representation of symmetric functions by MDD called a simplified MDD which is a partially reduced MDD using the fusion rule. The idea is to associate a symmetry class to each node of the MDD. So it is linked to the simplified value vector of the function.

The simplified MDD of a symmetric function f in SM n (q, m) is defined as follows. Let u k,j be the jth node from the left of level \({k \in \lbrace 0,\ldots,n \rbrace}\) and \({j \in \lbrace 1,\ldots, {C^{q-1}_{k+q-1}} \rbrace}\), then u k,j represents the jth symmetry class of the set \(E_{q}^{k}\).

  • If k = n then u k,j is terminal. Its value is equal to f(s j ) where \({s_j \in E_{q}^{n}}\) is the representative of the jth class of symmetry of the set \(E_{q}^{n}\). So these terminal nodes show the simplified value vector of f.

  • Else, u k,j is not terminal and it has q distinct children. Each child represents a distinct symmetry class of the set \(E_{q}^{k+1}\).

A simplified MDD has a height equal to n + 1 and has \(C^{q-1}_{k+q-1}\) nodes on each of its levels k.

Example 2

Let f ∈ SM 3(3, 4). The nodes are labeled by the variable number and have 3 children with edges tagged by values in E 3. The terminal nodes are labeled by values in E 4 (Fig. 3).

Fig. 3
figure 3

Simplified MDD of f in SM 3(3, 4)

The hockey sticks property on Pascal’s triangle enables us to infer the number of nodes of a simplified MDD, it says that: \(\sum_{i=k}^{n} C_{i}^{k}=C^{k+1}_{n+1} \).

Thus, the simplified MDD has \(C_{q+n}^{q}\) nodes, which enables us to deduce an upper bound for the complexity of any symmetric multivalued function.

Lemma 2

[6] Let f be in SM n (q, m), then \(c_Q(f)\leq C_{n+q}^{q}\) .

We notice that the representation of a symmetric function by a simplified MDD is significantly smaller than by a generic MDD since for large n, we have:

  • the number of nodes of a simplified MDD is:

    $$ C_{q+n}^{q} \approx \frac{2^{q+n}}{e^{\frac{(n+q-2q)^{2}}{2(n+q)}} \sqrt{\frac{\pi(n+q)}{2}}} \enspace. $$
    (5)
  • the number of nodes of a generic MDD is:

    $$ \frac{q^{n+1}-q}{q-1} \approx q^{n} \enspace. $$
    (6)

4 Symmetric functions of maximum complexity

Definition 11

A symmetric multivalued function f in SM n (q, m) is called hard symmetric if its complexity attains SCQ n (q, m), i.e. if: c Q (f) = SCQ n (q, m).

We denote by HSM n (q, m) the set of hard symmetric multivalued functions:

$$ HSM_{n}(q,m):= \left\lbrace f \in SM_{n}(q,m) ; c_Q(f)=SCQ_{n}(q,m) \right\rbrace \enspace. $$
(7)

Definition 12

A symmetric multivalued function f in SM n (q, m) is called super hard symmetric if its complexity attains SCR n (q, m), i.e. if: c R (f) = SCR n (q, m). We denote by SHSM n (q, m) the set of super hard symmetric multivalued functions:

$$ SHSM_{n}(q,m):= \left\lbrace f \in SM_{n}(q,m) ; c_R(f)=SCR_{n}(q,m) \right\rbrace \enspace. $$
(8)

In the multivalued case, we notice that:

  • \(C_{k+q-1}^{q-1}\) is the number of nodes of a simplified MDD on the level k,

  • \(m^{C_{n-k+q-1}^{q-1}}\) is the number of symmetric functions with n − k variables.

To compute the complexity for all integer \({k \in \lbrace 0,1,\ldots,n \rbrace}\), we define SR n,q,m (k) by:

$$ \begin{array}[t]{lccc} SR_{n,q,m}: & \lbrace 0,1,\ldots,n \rbrace & \longmapsto & \mathbb{N} \\ & k & \longmapsto & \min \left( C_{k+q-1}^{q-1},m^{C_{n-k+q-1}^{q-1}} \right) \enspace. \end{array} $$
(9)

Definition 13

We call symmetric inflection level, h(n, q, m), the integer h such that h(0, q, m) = 0, h(1, q, m) = 1 and when n ≥ 2 then h is the unique integer verifying:

$$ \left\{ \begin{array}{lcl} 0 < h& \leq &n\\[3pt] C_{h+q-2}^{q-1} & < & m^{C_{n-h+q}^{q-1}} \\[6pt] C_{h+q-1}^{q-1} & \geq & m^{C_{n-h+q-1}^{q-1}} \end{array} \right. $$
(10)

i.e.

$$ \left\{ \begin{array}{lcl} SR_{n,q,m}(h-1) & = & C_{h+q-2}^{q-1} \\[6pt] SR_{n,q,m}(h) & = & m^{C_{n-h+q-1}^{q-1}} \end{array} \right. $$
(11)

Theorem 1

$$ \begin{aligned} \forall n \geq 1, \ SCQ_{n}(q,m)&= \displaystyle\sum\limits_{k=0}^{n} SR_{n,q,m}(k)\\ &= C_{q+h(n,q,m)-1}^{q} + \displaystyle\sum\limits_{k=h(n,q,m)}^{n} m^{C_{n-k+q-1}^{q-1}} \end{aligned} $$

Proof

We compute the maximum number of nodes that can appear in the QROMDD of a symmetric function. We start from its simplified MDD. By applying the fusion rule to a MDD, we know that there is fusion of nodes if and only if sub-graphs are isomorphic. The remaining nodes counts are then summed. Then apply the hockey sticks formula to the h(n, q, m) − 1 first terms of the sum. The terms \(m^{C_{n-k+q-1}^{q-1}}\) after inflection point are then summed up also.□

5 Simplified value vector in the case q = 2 and any m

5.1 General results for any n

For the particular case q = 2 , the simplified value vector can be read directly on the last level of the simplified MDD for any m.

Theorem 2

Given an integer a when n takes all the values between a + m a − 2 and a + m a + 1 − 1, i.e. n = a + m a + b − 2, for all \({b \in \lbrace 0, \ldots, (m-1)m^{a}\rbrace}\) , then the symmetric inflection level h(n,2,m) has the following properties:

  1. (i)

    h(n, 2, m) = m a + b − 1 = n − a + 1,

  2. (ii)

    SR n,2,m (h(n, 2, m)) is equal to m a ,

  3. (iii)

    SR n,2,m (h(n, 2, m) − 1) is equal to h(n, 2, m).

Proof

In the case q = 2,

$$ SR_{n,q,m}(k)=\min\left(C_{k+1}^{1},m^{C_{n-k+1}^{1}}\right) = \min\left(k+1,m^{n-k+1}\right) \enspace. $$
(12)

For (i), it is sufficient to check that the value n − a + 1 satisfies the conditions (10) on h(n, 2, m). For (ii) and (iii), the pair of inflection is calculated with the new value of h(n, 2, m) in (i): SR n,2,m (h(n, 2, m)) = SR n, 2, m (n − a + 1), SR n,2,m (h(n,2,m) − 1) = SR n,2,m (n − a).□

Example 3

Let f ∈ SM 10(2, 2). In this case, n = 10, a = 3 (Fig. 4).

Fig. 4
figure 4

Simplified MDD and sub-graph of f

Let G f be the simplified MDD associated to the symmetric function f ∈ SM n (q, m). We call sub-graph sG f of G f any sub-graph of height equals to n + 1 − h(n, 2, m) = a and whose root is of level h(n, 2, m) in G f . We call terminal vector into a sub-graph sG f , the values of the terminal nodes read from left to right. The terminal vector of a subgraph has a length equal to a.

By definition, a simplified MDD has \({C_{h(n,2,m)+1}^{1}=h(n,2,m)+1}\) sub-graphs sG f in the case q = 2.

Lemma 3

A function is hard if and only if the number of its sub-graphs sG f being not isomorphic is maximum.

Proof

Let us consider the simplified MDD of a hard symmetric function, then by applying the fusion rule, there will be fusion of nodes for the levels k with k bigger or equal to the symmetric inflection point. However we know that there is fusion of nodes if and only if the nodes are the roots of isomorphic sub-graphs.□

Corollary 1

Let n = a + m a + b − 2 where a ≥ 0, \({b \in \lbrace 0,\ldots,(m-1)m^{a} \rbrace}\) . If a function f in SM n (2, m) is hard then in its simplified value vector, it appears m a consecutive letter patterns of length a .

Proof

According to the Theorem 2, we know that a hard symmetric function has exactly m a nodes at its symmetric inflection level h(n, 2, m). Thus the simplified MDD of a hard function must have m a non-isomorphic sub-graphs sG f . The sub-graphs sG f have exactly the same structure, so we deduce from it that two sub-graphs are isomorphic if and only if their terminal vectors are identical. Terminal vectors length of these sub-graphs is equal to a, hence the result.□

5.2 De Bruijn sequences and terminal vectors

According to the previous theorem, for n = m a + a − 2, h(n, 2, m) + 1 = m a, i.e. the simplified MDD of a hard symmetric function has the same number of nodes at the level h(n,2,m) as its QROMDD. So we deduce the following theorem.

Theorem 3

Let n and a be two positive integers such that n = a + m a − 2 and let f ∈ SM n (2, m). Then f is hard if and only if in its simplified value vector, it appears exactly m a distinct subsequences with length equal to a .

This property is typical of De Bruijn sequences [13]. Let A be an alphabet of m letters, then a De Bruijn sequence B(m, a) is a cyclic sequence such that each subsequence with length equal to a appears exactly once. Each sequence B(m, a) has a length equal to m a. De Bruijn sequences can be constructed using a De Bruijn graph or by using finite fields [13]. There are \(\frac{m!^{m^{a-1}}}{m^{a}}\) distinct sequences B(m, a) [13].

A simplified MDD of a symmetric function with n = m a + a − 2 variables contains m a + a − 1 terminal nodes. The following theorem settles the link between the De Bruijn sequences and the simplified value vectors of hard symmetric functions.

Theorem 4

Let n ≥ 1 and a ≥ 0 be integers such that n = a + m a − 2. Then the simplified value vector of f ∈ HSM n (2, m) is a rotation of a De Bruijn sequence B(m,a) at the end of which one the (a − 1) first letters of the sequence are concatenated.

Example 4

Simplified value vector read in the simplified MDD of a function f ∈ HSM 9(2, 3), i.e. a = 2 (Fig. 5).

Fig. 5
figure 5

Simplified value vector of f

In this example, we can read 32 = 9 subsequences whose lengths are equal to 2: 00, 01, 10, 02, 21, 11, 12, 22, 20. They never appear more than once.

Corollary 2

Let n ≥ 1 and a ≥ 0 be integers such that n = a + m a − 2, the number of hard symmetric functions of parameters (n, 2, m) is equal to \({m!^{m^{a-1}}}\) .

Proof

It is sufficient to note that the number of hard symmetric functions is equal to the total number of sequences obtained by all possible rotations of the De Bruijn sequences B(m, a), i.e. the number of all the sequences multiplied by their size. □

Conjecture 1

Let a ≥ 0 and n such that n = a + 2a − 4 or n = a + 2a − 5. Then the number of super hard symmetric Boolean functions of parameters (n, 2, 2) is equal to \({2^{2^{a-1}}} - {2^{2^{a-1}-a+1}}\).

To enforce the first results obtained from computer search up to the value of a = 5 (see Table 3 Appendix B) we can notice that the simplified value vectors of these functions look like truncated De Bruijn sequences among which some are discarded. This point is emphasised by the observation that they are all hard functions. Other questions are; why n = a + 2a − 4 and n = a + 2a − 5 produce the same number of super hard functions, or can we easily link those two sets?

5.3 Algebraic degree of hard symmetric boolean functions

The following theorem links the periodicity of the simplified value vector of a symmetric Boolean function and its algebraic degree [7].

Theorem 5

[7] Let f be in SM n (2, 2), then the simplified value vector of f , \({v_{s}=\left( v_{s}(0),\ldots,v_{s}(n) \right)}\) is periodic with period 2t , 2t < n, if and only if \({\deg(f) \leq 2^{t} - 1}\) .

For n = a + 2a − 2, the simplified value vector v s of a hard symmetric Boolean function is periodic with period 2a, and by properties of De Bruijn sequence this vector cannot be periodic with period 2k, where k < a. Thus, according to the theorem 5 and its contraposition, we obtain the following result.

Theorem 6

The hard symmetric Boolean functions with n = a + 2a − 2 variables have degree belonging to integer interval \(\left\{ {2^{a-1}}, \ldots, {2^{a}-1} \right\}\) when a > 2.

6 Conclusion

We were interested primarily in the multivalued symmetric functions and their representations by QROMDD and ROMDD. We initially set out an efficient way to represent these functions by a MDD, then we gave a formula for the exact value of the complexity of the hard symmetric functions. For q = 2 (and any m) we could generalise the results concerning the simplified value vectors of these hard functions. We highlighted the links between De Bruijn sequences and the simplified value vectors of the hard symmetric functions with n = a + m a − 2 variables. We thus could count these hard functions in this particular case. For some other singular cases we could only conjecture the number of functions. The generalisation of our results to higher values of q would be interesting.

These hard symmetric functions can be a good compromise for a use in cryptography; being symmetric they have a low number of nodes but being hard they also appear among the most robust particularly against BDD based cryptanalysis. We have also shown that in the binary case their algebraic degree takes interesting values. For odd values of n, except 17 and 19, a significant number of balanced hard symmetric functions exists. The further characterisation of more detailed cryptographic properties of these functions will be of great interest.