Abstract
A multivalued function is a function from a set \(E_{q}^{n}\) to a set E m , where E k is a set which contains k elements. These functions are used in cryptography: cipher design, hash function design and in theoretical computer science. In this paper, we study the representation of these functions with Multivalued Decision Diagrams (MDD). This representation can be used both to measure complexity and to implement efficiently the functions in hardware. We are especially interested in symmetric functions. We show that symmetric functions MDDs have much lower size than classical functions MDDs. One major result is to determine exactly their MDD’s maximum size. Notably, we highlight the links between De Bruijn sequences and the most complex symmetric functions and new functions are exhibited in the case q = 2 and any m. Enumeration of these functions are supplied, they are shown to be sufficiently numerous to allow many applications.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Today cryptography is spreading everywhere in a lot of devices and especially small, mobile, low energy and low cost pieces of equipment such as Bluetooth earpieces, RFID tags, sensors. To design and implement cipher algorithms on these devices, there is an eager need of small footprint Boolean or finite functions achieving a good trade-off in term of complexity and cryptographic properties [7]. Other discrete algorithms may also take advantage of these finite functions like hash tables computation for storing and sorting data.
Some works already deal with complexity issues of the partially symmetric Boolean functions [8] and the symmetric multivalued functions [6, 11, 12]. They show that decision diagrams [4, 15] are well suited to benefit from symmetries and we enhance this further on with excellent results for symmetric multivalued functions. Decision Diagrams (either binary or multivalued) are able both to provide a measure of the complexity of these functions and to achieve an efficient implementation. This is why our analysis makes huge use of this tool.
On general functions the size of the MDD is highly dependent on the order of the variables as can be shown with the direct storage access Boolean function of k + 2k variables, whose BDD size varies from 2k + 1 + 1 [5] to \(2^{2^{k+1}}\) [10]. Due to the symmetry, the MDD of symmetric functions have the same size whatever is the variables’ order. So, their study gives directly the best size of any MDD representation. Another asset of symmetric functions is that their MDD have bounds [6, 12] of small order, but no result expresses exactly the maximum size. No result provides functions achieving this maximum either. We investigate these points in order to establish the exact maximum value of these MDD, with different levels of reductions [4].
The balanced symmetric functions are already in use in some existing cryptographic algorithms like for instance θ function of SHA3 third round finalist Keccak [2]. Their use is also proposed in the tweaked version of SFINKS [3] and in the generic study on symmetric Boolean functions [7]. This shows that when properly combined with other functions they allow good results. The trade-off in term of complexity of the symmetric functions can help to withstand BDD based attacks [9, 14], especially with symmetric functions that maximise the MDD sizes.
In this paper, we study the structure of symmetric functions MDD and prove their maximum size. We exhibit the functions reaching this maximum value (the “hard” symmetric functions) and give an external characterization of such functions linked to De Bruijn sequences. We then derive some properties and counting results on these “hard” symmetric functions. Annexes provide tables of experimental results.
2 Definitions and notations
Let us consider \({E_{k}=\lbrace 0, 1, \ldots , k-1 \rbrace}\) a set of k elements, and n, q, m which are positive integers. Let \( C_{n}^{k} \) be the choose k among n binomial coefficient. Card(E) stands for the cardinal of the set E.
2.1 Multivalued functions
Definition 1
Given any positive integers n, q, m, we call multivalued function any function from \(E_{q}^{n}\) to E m . The set of multivalued functions is denoted by M n (q, m).
This set M n (q, m) contains \( m^{q^{n}}\) functions. A function f ∈ M n (q, m) is characterized by a vector \({f_{v} \in E_{m}^{q^{n}}}\) called its value vector, consisting in the evaluations of the function at every q n possible input:
2.2 Partitions
Definition 2
[1] A partition \({\pi=\left(\pi_{1},\ldots,\pi_{k}\right)}\) of an integer N bounded by an integer b is a sequence of numbers 0 ≤ π 1 ≤ ... ≤ π k ≤ b such that \({N=\sum_{i=1}^{k} \pi_{i}}\).
A partition can also be represented by its “number of repetitions”:
Thus, we have \({N=\sum_{i=0}^{b} i \times r_{i}}\) and \({k=\sum_{i=0}^{b} r_{i}}\). The two representations are equivalent.
We denote by Part(b, k, N) the set of all partitions of all integers lower or equal to N. For all \({x \in E_{q}^{n}}\), there is a single partition π(x) ∈ Part(q − 1, n,n(q − 1)) which represents x.
Example 1
n = 5, q = 3.
Let \({x=(2,1,1,0,1) \in E_{q}^{n}}\), then \({\pi(x)=(0,1,1,1,2)=\langle 1,3,1 \rangle}\).
Remark that two vectors which have the same partition, have the same components up to a permutation.
The Lemma of Andrew [1] gives the number of elements of the set Part(b, a, ab).
Lemma 1
\({\rm{Card}}({\rm{Part}}(b,a,ab))=C_{a+b}^{a}=C^{b}_{a+b}\) .
2.3 Symmetric multivalued functions
Definition 3
A multivalued function \(f:E_{q}^{n}\longrightarrow E_{m}\), is symmetric, if f is invariant under any permutation of its input’s variables:
The set of these symmetric multivalued functions is denoted by SM n (q, m).
Definition 4
We call symmetry class of \({x \in E_{q}^{n}}\), the set P n,q (x) of vectors obtained by permuting the coordinates of x defined by:
According to the lemma of Andrew, we deduce that there are \(C_{n+q-1}^{q-1}\) symmetry classes in \(E_{q}^{n}\). We designate as representative of a class of symmetry, the smallest element in the lexicographical order, \(s_j = \left(0^{r0}, 1^{r1}, \ldots i^{ri}, \ldots (q-1)^{r(q-1)}\right)\) . We call jth symmetry class of \(E_{q}^{n}\) the class of symmetry whose representative s j is classified jth among all representatives according to the lexicographical order.
A symmetric multivalued function can be represented by a vector with values in E m , whose length equals \(C_{n+q-1}^{q-1}\). The components of the vector are the evaluations of the function for each representative of the symmetry classes. We call this vector a simplified value vector of the function:
2.4 Multivalued decision diagrams
A Multivalued Decision Diagram (MDD) [15] is a generalization of a Binary Decision Diagram (BDD) [4]. In the same manner as the BDD represents and implements the Boolean functions, the MDD also represents and implements the multivalued functions.
Definition 5
A multivalued decision diagram (MDD) is a rooted directed acyclic graph G = (U, E) with two types of nodes:
-
the non-terminal nodes u which are labeled with a variable x i and have q outgoing edges e b labeled with the q possible values b in E q i.e. q children.
-
the terminal nodes u which are labeled with a value c in E m and have no outgoing edge.
A MDD is ordered if the variables labeling nodes in any path from the root to any terminal node are in the same order. Bryant [4] has defined a procedure reduce which reduces a BDD in a single and optimal way. This procedure applies two rules: the fusion rule and the suppression rule. Bryant’s procedures can easily be generalized to deal with the MDD.
Definition 6
The fusion rule says that two nodes are merged if their subgraphs are isomorphic (Fig. 1). The suppression rule says that a node is deleted if it has only one child node (Fig. 2).
Definition 7
A Reduced Ordered Multivalued Decision Diagram (ROMDD) is a MDD reduced by the full reduction procedure (i.e. fusion and suppression rules) (Figs. 1 and 2).
Definition 8
A Quasi Reduced Ordered Multivalued Decision Diagram (QROMDD) is a MDD reduced by using only the fusion rule.
2.5 Complexity
The number of nodes of a MDD (both terminal and non-terminal nodes) is called the size of the MDD. We call height of a node its distance to the top. The set of nodes having a same height \({k \in \lbrace 0,1,\ldots,n \rbrace}\) is called level k of the MDD.
Definition 9
Let f be in M n (q, m), we define:
-
its complexity, noted c Q (f), the size of its QROMDD.
-
its reduced complexity, noted c R (f), the size of its ROMDD.
Definition 10
We define SCQ n (q, m) and SCR n (q, m) as the largest complexities of the functions in SM n (q, m): \(SCQ_{n}(q,m):=\max \left \lbrace c_Q(f), f \in SM_{n}(q,m) \right \rbrace \) and \(SCR_{n}(q,m):=\max \left \lbrace c_R(f), f \in SM_{n}(q,m) \right \rbrace \).
3 Symmetric functions representation by simplified MDD
We put forward an optimized representation of symmetric functions by MDD called a simplified MDD which is a partially reduced MDD using the fusion rule. The idea is to associate a symmetry class to each node of the MDD. So it is linked to the simplified value vector of the function.
The simplified MDD of a symmetric function f in SM n (q, m) is defined as follows. Let u k,j be the jth node from the left of level \({k \in \lbrace 0,\ldots,n \rbrace}\) and \({j \in \lbrace 1,\ldots, {C^{q-1}_{k+q-1}} \rbrace}\), then u k,j represents the jth symmetry class of the set \(E_{q}^{k}\).
-
If k = n then u k,j is terminal. Its value is equal to f(s j ) where \({s_j \in E_{q}^{n}}\) is the representative of the jth class of symmetry of the set \(E_{q}^{n}\). So these terminal nodes show the simplified value vector of f.
-
Else, u k,j is not terminal and it has q distinct children. Each child represents a distinct symmetry class of the set \(E_{q}^{k+1}\).
A simplified MDD has a height equal to n + 1 and has \(C^{q-1}_{k+q-1}\) nodes on each of its levels k.
Example 2
Let f ∈ SM 3(3, 4). The nodes are labeled by the variable number and have 3 children with edges tagged by values in E 3. The terminal nodes are labeled by values in E 4 (Fig. 3).
The hockey sticks property on Pascal’s triangle enables us to infer the number of nodes of a simplified MDD, it says that: \(\sum_{i=k}^{n} C_{i}^{k}=C^{k+1}_{n+1} \).
Thus, the simplified MDD has \(C_{q+n}^{q}\) nodes, which enables us to deduce an upper bound for the complexity of any symmetric multivalued function.
Lemma 2
[6] Let f be in SM n (q, m), then \(c_Q(f)\leq C_{n+q}^{q}\) .
We notice that the representation of a symmetric function by a simplified MDD is significantly smaller than by a generic MDD since for large n, we have:
-
the number of nodes of a simplified MDD is:
$$ C_{q+n}^{q} \approx \frac{2^{q+n}}{e^{\frac{(n+q-2q)^{2}}{2(n+q)}} \sqrt{\frac{\pi(n+q)}{2}}} \enspace. $$(5) -
the number of nodes of a generic MDD is:
$$ \frac{q^{n+1}-q}{q-1} \approx q^{n} \enspace. $$(6)
4 Symmetric functions of maximum complexity
Definition 11
A symmetric multivalued function f in SM n (q, m) is called hard symmetric if its complexity attains SCQ n (q, m), i.e. if: c Q (f) = SCQ n (q, m).
We denote by HSM n (q, m) the set of hard symmetric multivalued functions:
Definition 12
A symmetric multivalued function f in SM n (q, m) is called super hard symmetric if its complexity attains SCR n (q, m), i.e. if: c R (f) = SCR n (q, m). We denote by SHSM n (q, m) the set of super hard symmetric multivalued functions:
In the multivalued case, we notice that:
-
\(C_{k+q-1}^{q-1}\) is the number of nodes of a simplified MDD on the level k,
-
\(m^{C_{n-k+q-1}^{q-1}}\) is the number of symmetric functions with n − k variables.
To compute the complexity for all integer \({k \in \lbrace 0,1,\ldots,n \rbrace}\), we define SR n,q,m (k) by:
Definition 13
We call symmetric inflection level, h(n, q, m), the integer h such that h(0, q, m) = 0, h(1, q, m) = 1 and when n ≥ 2 then h is the unique integer verifying:
i.e.
Theorem 1
Proof
We compute the maximum number of nodes that can appear in the QROMDD of a symmetric function. We start from its simplified MDD. By applying the fusion rule to a MDD, we know that there is fusion of nodes if and only if sub-graphs are isomorphic. The remaining nodes counts are then summed. Then apply the hockey sticks formula to the h(n, q, m) − 1 first terms of the sum. The terms \(m^{C_{n-k+q-1}^{q-1}}\) after inflection point are then summed up also.□
5 Simplified value vector in the case q = 2 and any m
5.1 General results for any n
For the particular case q = 2 , the simplified value vector can be read directly on the last level of the simplified MDD for any m.
Theorem 2
Given an integer a when n takes all the values between a + m a − 2 and a + m a + 1 − 1, i.e. n = a + m a + b − 2, for all \({b \in \lbrace 0, \ldots, (m-1)m^{a}\rbrace}\) , then the symmetric inflection level h(n,2,m) has the following properties:
-
(i)
h(n, 2, m) = m a + b − 1 = n − a + 1,
-
(ii)
SR n,2,m (h(n, 2, m)) is equal to m a ,
-
(iii)
SR n,2,m (h(n, 2, m) − 1) is equal to h(n, 2, m).
Proof
In the case q = 2,
For (i), it is sufficient to check that the value n − a + 1 satisfies the conditions (10) on h(n, 2, m). For (ii) and (iii), the pair of inflection is calculated with the new value of h(n, 2, m) in (i): SR n,2,m (h(n, 2, m)) = SR n, 2, m (n − a + 1), SR n,2,m (h(n,2,m) − 1) = SR n,2,m (n − a).□
Example 3
Let f ∈ SM 10(2, 2). In this case, n = 10, a = 3 (Fig. 4).
Let G f be the simplified MDD associated to the symmetric function f ∈ SM n (q, m). We call sub-graph sG f of G f any sub-graph of height equals to n + 1 − h(n, 2, m) = a and whose root is of level h(n, 2, m) in G f . We call terminal vector into a sub-graph sG f , the values of the terminal nodes read from left to right. The terminal vector of a subgraph has a length equal to a.
By definition, a simplified MDD has \({C_{h(n,2,m)+1}^{1}=h(n,2,m)+1}\) sub-graphs sG f in the case q = 2.
Lemma 3
A function is hard if and only if the number of its sub-graphs sG f being not isomorphic is maximum.
Proof
Let us consider the simplified MDD of a hard symmetric function, then by applying the fusion rule, there will be fusion of nodes for the levels k with k bigger or equal to the symmetric inflection point. However we know that there is fusion of nodes if and only if the nodes are the roots of isomorphic sub-graphs.□
Corollary 1
Let n = a + m a + b − 2 where a ≥ 0, \({b \in \lbrace 0,\ldots,(m-1)m^{a} \rbrace}\) . If a function f in SM n (2, m) is hard then in its simplified value vector, it appears m a consecutive letter patterns of length a .
Proof
According to the Theorem 2, we know that a hard symmetric function has exactly m a nodes at its symmetric inflection level h(n, 2, m). Thus the simplified MDD of a hard function must have m a non-isomorphic sub-graphs sG f . The sub-graphs sG f have exactly the same structure, so we deduce from it that two sub-graphs are isomorphic if and only if their terminal vectors are identical. Terminal vectors length of these sub-graphs is equal to a, hence the result.□
5.2 De Bruijn sequences and terminal vectors
According to the previous theorem, for n = m a + a − 2, h(n, 2, m) + 1 = m a, i.e. the simplified MDD of a hard symmetric function has the same number of nodes at the level h(n,2,m) as its QROMDD. So we deduce the following theorem.
Theorem 3
Let n and a be two positive integers such that n = a + m a − 2 and let f ∈ SM n (2, m). Then f is hard if and only if in its simplified value vector, it appears exactly m a distinct subsequences with length equal to a .
This property is typical of De Bruijn sequences [13]. Let A be an alphabet of m letters, then a De Bruijn sequence B(m, a) is a cyclic sequence such that each subsequence with length equal to a appears exactly once. Each sequence B(m, a) has a length equal to m a. De Bruijn sequences can be constructed using a De Bruijn graph or by using finite fields [13]. There are \(\frac{m!^{m^{a-1}}}{m^{a}}\) distinct sequences B(m, a) [13].
A simplified MDD of a symmetric function with n = m a + a − 2 variables contains m a + a − 1 terminal nodes. The following theorem settles the link between the De Bruijn sequences and the simplified value vectors of hard symmetric functions.
Theorem 4
Let n ≥ 1 and a ≥ 0 be integers such that n = a + m a − 2. Then the simplified value vector of f ∈ HSM n (2, m) is a rotation of a De Bruijn sequence B(m,a) at the end of which one the (a − 1) first letters of the sequence are concatenated.
Example 4
Simplified value vector read in the simplified MDD of a function f ∈ HSM 9(2, 3), i.e. a = 2 (Fig. 5).
In this example, we can read 32 = 9 subsequences whose lengths are equal to 2: 00, 01, 10, 02, 21, 11, 12, 22, 20. They never appear more than once.
Corollary 2
Let n ≥ 1 and a ≥ 0 be integers such that n = a + m a − 2, the number of hard symmetric functions of parameters (n, 2, m) is equal to \({m!^{m^{a-1}}}\) .
Proof
It is sufficient to note that the number of hard symmetric functions is equal to the total number of sequences obtained by all possible rotations of the De Bruijn sequences B(m, a), i.e. the number of all the sequences multiplied by their size. □
Conjecture 1
Let a ≥ 0 and n such that n = a + 2a − 4 or n = a + 2a − 5. Then the number of super hard symmetric Boolean functions of parameters (n, 2, 2) is equal to \({2^{2^{a-1}}} - {2^{2^{a-1}-a+1}}\).
To enforce the first results obtained from computer search up to the value of a = 5 (see Table 3 Appendix B) we can notice that the simplified value vectors of these functions look like truncated De Bruijn sequences among which some are discarded. This point is emphasised by the observation that they are all hard functions. Other questions are; why n = a + 2a − 4 and n = a + 2a − 5 produce the same number of super hard functions, or can we easily link those two sets?
5.3 Algebraic degree of hard symmetric boolean functions
The following theorem links the periodicity of the simplified value vector of a symmetric Boolean function and its algebraic degree [7].
Theorem 5
[7] Let f be in SM n (2, 2), then the simplified value vector of f , \({v_{s}=\left( v_{s}(0),\ldots,v_{s}(n) \right)}\) is periodic with period 2t , 2t < n, if and only if \({\deg(f) \leq 2^{t} - 1}\) .
For n = a + 2a − 2, the simplified value vector v s of a hard symmetric Boolean function is periodic with period 2a, and by properties of De Bruijn sequence this vector cannot be periodic with period 2k, where k < a. Thus, according to the theorem 5 and its contraposition, we obtain the following result.
Theorem 6
The hard symmetric Boolean functions with n = a + 2a − 2 variables have degree belonging to integer interval \(\left\{ {2^{a-1}}, \ldots, {2^{a}-1} \right\}\) when a > 2.
6 Conclusion
We were interested primarily in the multivalued symmetric functions and their representations by QROMDD and ROMDD. We initially set out an efficient way to represent these functions by a MDD, then we gave a formula for the exact value of the complexity of the hard symmetric functions. For q = 2 (and any m) we could generalise the results concerning the simplified value vectors of these hard functions. We highlighted the links between De Bruijn sequences and the simplified value vectors of the hard symmetric functions with n = a + m a − 2 variables. We thus could count these hard functions in this particular case. For some other singular cases we could only conjecture the number of functions. The generalisation of our results to higher values of q would be interesting.
These hard symmetric functions can be a good compromise for a use in cryptography; being symmetric they have a low number of nodes but being hard they also appear among the most robust particularly against BDD based cryptanalysis. We have also shown that in the binary case their algebraic degree takes interesting values. For odd values of n, except 17 and 19, a significant number of balanced hard symmetric functions exists. The further characterisation of more detailed cryptographic properties of these functions will be of great interest.
References
Andrews, G.E.: The theory of partitions. In: Encyclopedia of Mathematics and Its Applications, vol. 2. Addison-Wesley, Reading (1976)
Bertoni, G., Daemen, J., Peeters, M., Van Assche, G.: Keccak Sponge Function Family Main Document. Submission to NIST (Round 2) (2009)
Braeken, A., Lano, J., Mentens, N., Preneel, B., Verbauwhede, I.: SFINKS: a synchronous stream cipher for restricted hardware environments. In: Proceedings of SKEW—Symmetric Key Encryption Workshop, Network of Excellence in Cryptology ECRYPT. Aarhus, Denmark (2005)
Bryant, R.E.: Graph-based algorithms for boolean function manipulation. IEEE Trans. Comput. C35 8, 677–691 (1986)
Bollig, B., Range, N., Wegener, I.: Exact OBDD Bounds for some Fundamental Functions. Electronic Colloquium on Computational Complexity, Report N° 49 (2007)
Butler, J.T., Herscovici, D.S., Sasao, T., Barton, R.J.: Average and worst case number of nodes in decision diagrams of symmetric multiple-valued functions. IEEE Trans. Comput. 46(4), 491–494 (1997)
Canteaut, A., Videau, M.: Symmetric Boolean functions. IEEE Trans. Inf. Theory 51(8), 2791–2811 (2005)
Heinrich-Litan, L., Molitor, P.: Least upper bounds for the size of OBDDs using symmetry properties. IEEE Trans. Comput. 49(4), 271–281 (2000)
Krause, M.: BDD-based cryptanalysis of keystream generators. In: Advances in Cryptology—EUROCRYPT 2002, LNCS 2332, pp. 222–237 (2002)
Michon, J.F., Valarcher, P., Yunes, J.B.: On Maximal QROBDD’s of Boolean functions. Theor. Inf. Appl. 9, 677–686 (2005)
Mouffron, M.: Transitive q-ary functions over finite fields or finite sets: counts, properties and applications. In: International Workshop on the Arithmetic of Finite Fields 2008, WAIFI 08, Siena, Italy, LNCS 5130, pp. 19–35 (2008)
Mouffron, M.: Balanced alternating and symmetric functions over finite sets. In: Workshop on Boolean Functions Cryptography and Applications (BFCA08), Copenhagen, Denmark, pp. 27–44 (2008)
Rosenfeld, V.R.: Enumerating De Bruijn sequences. MATCH Commun. Math. Comput. Chem. 45, 71–83 (2002)
Stegemann, D.: Extended BDD-based cryptanalysis of keystream generators. In: Proceedings of the 14th International Conference on Selected Areas in Cryptography (SAC’07), LNCS 4876, pp. 17–35 (2007)
Wegener, I.: Branching programs and binary decision diagrams—theory and applications. In: SIAM Monograph on Discrete Mathematics and Applications. ISBN 0-89871- 458-3 (2000)
Acknowledgements
We thank Boris Batteux for his computations on functions enumeration. We also thank the anonymous referees for excellent suggestions which greatly improved the clarity of this paper.
Author information
Authors and Affiliations
Corresponding author
Appendices
Appendix A: Algebraic degree distribution
These tables give the distribution of the algebraic degree of super hard symmetric functions and hard symmetric functions.
Appendix B: Maximum complexities of symmetric functions from \(E_2^n\) to E 2
This table gives the cardinal of the sets HSM n (q, m), SHSM n (q, m) and HSM n (q, m) ∩ SHSM n (q, m) for any n up to 35. The special cases n = a + 2a − 2 are in bold.
Appendix C: Number of balanced symmetric Boolean functions hard and super hard
This table gives the number of balanced hard and super hard symmetric Boolean functions for any odd n up to 53. There is no balanced hard nor super hard symmetric Boolean functions when n is even except 2 for n = 2.
Appendix D: Boolean symmetric functions’ complexity c R (f) distribution n = 1 to 33
Rights and permissions
About this article
Cite this article
Rovetta, C., Mouffron, M. De Bruijn sequences and complexity of symmetric functions. Cryptogr. Commun. 3, 207–225 (2011). https://doi.org/10.1007/s12095-011-0054-2
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12095-011-0054-2