
1 Introduction

At EUROCRYPT 1993, Matsui [31] introduced linear cryptanalysis as a new known-plaintext attack on the block cipher DES. Linear cryptanalysis is based on probabilistic linear relations or linear approximations, a concept introduced by Tardy-Corfdir and Gilbert [36].

The success of Matsui’s attack led to the development of a myriad of extensions and variants of linear approximations, and to more advanced techniques for their analysis [16, 32]. Despite significant advances, many questions related to linear cryptanalysis and its theoretical foundations remain unresolved.

Kaliski and Robshaw [25] suggested using multiple linear approximations. Hermelin, Cho and Nyberg [23] proposed the related multidimensional linear attack. Both extensions are widely used. Generalizations of linear cryptanalysis to groups other than \(\mathbb {F}_2^n\) were proposed by Granboulan, Levieil and Piret [20] and Baignères, Stern and Vaudenay [3]. The use of nonlinear approximations is another natural extension, and has been attempted by Knudsen and Robshaw [26], Harpes, Kramer and Massey [21] with I/O sums, Harpes and Massey [22] with partitioning attacks and recently by Beierle, Canteaut and Leander [4].

All of the above techniques rely on heuristic methods to glue together several approximations over multiple rounds of a cipher. These methods will be collectively referred to as the piling-up principle. This principle has traditionally been justified using independence or Markov chain assumptions [2, 31, 42], which can be related to earlier work on Markov ciphers in the context of differential cryptanalysis [28]. However, such assumptions are hard to reconcile with the key-dependence of approximations and the increased importance of cryptographic permutations. In fact, key-dependence is one of the fundamental difficulties of nonlinear cryptanalysis. Alternatively, the correlation matrix framework of Daemen et al. [16] is more suitable for the fixed-key setting. It motivates the piling-up principle using the dominant trail hypothesis. Beierle et al. [4] extend this approach by applying linear cryptanalysis to a nonlinearly transformed variant of the cipher.

In a different direction, Bogdanov and Rijmen [13] introduced zero-correlation linear cryptanalysis to exploit unbiased linear approximations. The construction of zero-correlation approximations relies on the miss-in-the-middle technique as opposed to the piling-up principle. At ASIACRYPT 2012, Bogdanov et al. [12] established a link between multidimensional zero-correlation approximations and integral distinguishers [27].

Finally, several lightweight block ciphers have been found vulnerable to weak-key attacks based on invariant subspaces [30] and nonlinear invariants [39]. These attacks have led to renewed interest in linear cryptanalysis and its generalizations. Abdelraheem et al. [1] found links between invariant subspaces and linear cryptanalysis. Moreover, nonlinear invariants provide one of the most compelling examples of nonlinearity in cryptanalysis, with applications including the analysis of SCREAM, iSCREAM, Midori-64 and MANTIS [5, 39]. At ASIACRYPT 2018, it was shown that invariant subspaces and nonlinear invariants can be described as eigenvectors of correlation matrices [5]. Furthermore, one of the invariants discovered in [5] corresponds to a perfect linear approximation. These results established a strong link between nonlinear invariants and linear cryptanalysis, but a true statistical generalization of the nonlinear invariant attack was left open. Lastly, Beierle et al. [4] extended the links discovered by Abdelraheem et al. to some classes of nonlinear invariants.

Contribution. A conceptually new way of thinking about linear cryptanalysis is introduced. It provides an alternative viewpoint for the foundations of linear cryptanalysis and has a number of concrete benefits. Firstly, it results in a systematized and unified description of the above-mentioned variants of linear cryptanalysis. Secondly, it leads to generalizations of the connections between these attacks, such as the link between integral and zero-correlation cryptanalysis and the links between invariants and linear approximations. Some of these results are illustrated in Table 1, and are discussed in more detail below. Thirdly, it suggests a general form of the piling-up principle. Finally, to illustrate the relevance for the working cryptanalyst, the approach is used to solve a problem posed by Beierle et al. [4].

Section 3 introduces a correspondence between cryptanalytic properties and vector spaces of complex-valued functions on the domain of a primitive. This results in a uniform description of the properties (sets, linear and nonlinear Boolean functions, ...) that are used in different variants of linear cryptanalysis. The correspondence generalizes the idea introduced in [5] that invariant subspaces and nonlinear invariants can be represented by complex vectors, which led to their characterization as eigenvectors of correlation matrices.

Table 1. Approximations for a function \(\mathsf {F}\) from the geometric viewpoint. Here, U and V are vector spaces (of dimension d) of functions. The notation follows Sects. 3 to 5.

Definition 4.1 characterizes an approximation of a cipher as a pair of vector spaces (U, V), corresponding to input and output properties as sketched above. This results in a systematization of many variants of linear cryptanalysis, as summarized in Table 1. It will be shown that the type and quality of an approximation are related to the geometric properties of the spaces U and V. Section 4.1 illustrates how this results in new insight into block cipher invariants and gives a realistic example of invariants related to non-real eigenvalues of correlation matrices, a problem that was left open at ASIACRYPT 2018 [5]. Theorem 4.2 generalizes the links between zero-correlation and integral attacks discovered by Bogdanov et al. [12]. For general approximations, principal correlations are introduced as a natural extension of the correlation of a linear approximation and it is shown how they relate to the complexity of optimal distinguishers discussed by Baignères, Junod and Vaudenay [2].

A general piling-up principle is given in Theorem 5.1. It is motivated by geometric intuition. This avoids independence and Markov chain assumptions and simplifies working with fixed keys. Furthermore, the result evades the issues that are encountered when the dominant-trail approach of Daemen et al. is extended to the nonlinear case. Theorem 5.1 allows for much greater flexibility than previous formulations of the piling-up principle. In particular, it becomes possible to build trails that combine diverse cryptanalytic properties. This is illustrated in Sect. 5.3 by strengthening the links between linear approximations and invariants, extending previous work by Abdelraheem et al. [1] and Beierle et al. [4].

Finally, Sect. 6 introduces rank-one approximations to analyze cell-oriented ciphers. A tool to find optimal rank-one trails is introduced, and its application to searching for invariants is discussed. Perhaps surprisingly, the tool is based on numerical optimization on a Riemannian manifold. This is enabled by the generality of Sects. 3 to 5, which relaxes the search space by introducing many new types of approximations. The tool is provided as supplementary material. Rank-one approximations and the aforementioned tool are used in Sect. 7.3 to resolve a problem introduced by Beierle et al. [4], who describe it as “a major open problem”. It is representative of other concrete problems, and its solution relies on the general techniques that are introduced in Sects. 3 to 5.

2 Functions on Abelian Groups

The goal of this section is to introduce several concepts that will be used to develop a general theory of linear cryptanalysis in Sects. 3 to 5. These concepts provide the setting for the proposed geometric approach. It is assumed that the reader is familiar with finite Abelian groups and linear algebra in finite-dimensional inner product spaces.

It will be shown in Sect. 3 that many cryptanalytic properties can be described by complex-valued functions on the domain of a primitive. Section 2.1 discusses preliminaries related to the set of such functions. Section 2.2 introduces the Fourier transformation on finite Abelian groups. This will be an important tool to simplify the effect of constant (including key) additions. Finally, Sect. 2.3 discusses the geometry of subspaces of an inner product space.

2.1 Inner Product Space of Functions

Let G be a finite Abelian group, for example the domain of a block cipher. In fact, all of the properties in this section are valid for any set G. However, the results in Sect. 2.2 will require the assumption that G is a finite Abelian group. The \(\mathbb {C}\)-vector space of all functions from G to \(\mathbb {C}\), with the usual pointwise addition and scalar multiplication, will be denoted by \(\mathbb {C}G\). The standard inner product between two functions \(f, g \in \mathbb {C}G\) is defined by \(\langle f, g \rangle = \sum _{x \in G} \overline{f(x)} g(x)\), where \(\smash {\overline{f(x)}}\) denotes the complex conjugate of f(x). Hence, the vector space \(\mathbb {C}G\) is a finite-dimensional inner product space. One also has a norm \(\Vert f\Vert _2 = \sqrt{\langle f, f \rangle }\), which carries the geometric interpretation of length. The modulus of the inner product between two normalized vectors can be interpreted as the cosine of the smallest angle enclosed by them – although for non-real vectors, several definitions of angles are plausible. The theory developed in Sects. 4 and 5 will draw on these geometric concepts for intuition.

The functions \(\delta _x\), which are equal to one at \(x \in G\) and zero everywhere else, clearly form an orthonormal basis for \(\mathbb {C}G\). This basis will be referred to as the standard basis. It follows that \(\mathbb {C}G\) is isomorphic to \(\mathbb {C}^{|G|}\) as an inner product space.
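These conventions are elementary enough to experiment with numerically. The following minimal NumPy sketch represents functions in \(\mathbb {C}G\) as complex vectors for \(G = \mathbb {Z}_4\); the integer indexing of group elements and all variable names are illustrative choices of this sketch, not taken from the text.

```python
# A minimal numerical model of CG for G = Z_4 (illustrative conventions:
# group elements are indexed by integers).
import numpy as np

n = 4                                        # |G| for G = Z_4
f = np.exp(2j * np.pi * np.arange(n) / n)    # a function in CG (a character, in fact)
g = np.ones(n, dtype=complex)                # the constant function

def inner(f, g):
    """Standard inner product <f, g> = sum_x conj(f(x)) g(x)."""
    return np.vdot(f, g)                     # np.vdot conjugates its first argument

delta = np.eye(n, dtype=complex)             # delta[x] is the standard basis vector delta_x
assert np.isclose(inner(delta[1], delta[1]), 1)   # the delta_x are orthonormal
assert np.isclose(inner(delta[1], delta[2]), 0)
assert np.isclose(np.linalg.norm(f), np.sqrt(n))  # ||f||_2 = sqrt(<f, f>)
```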

Recall that the tensor product of \(\mathbb {C}\)-vector spaces \(V_1, \ldots , V_n\) is another \(\mathbb {C}\)-vector space \(V_1 \otimes \cdots \otimes V_n\) of dimension \(\smash {\prod _{i = 1}^n \dim V_i}\) together with a multilinear map \(\otimes : \prod _{i = 1}^n V_i \rightarrow \bigotimes _{i = 1}^n V_i\), which has the universal property that it uniquely linearizes arbitrary multilinear maps. Specifically, for any \(T : \prod _{i = 1}^n V_i \rightarrow W\) linear in each variable (multilinear), there exists a unique linear map \(L : \bigotimes _{i = 1}^n V_i \rightarrow W\) such that \(T(v_1, \ldots , v_n) = L(v_1 \otimes \cdots \otimes v_n)\).

For the purposes of this paper, readers who are not familiar with tensor products may take the following characterization as a definition. Let \(G = A \oplus B\) be a direct sum of Abelian groups A and B. That is, the group G consists of all pairs (a, b) with \(a \in A\) and \(b \in B\). The tensor product of \(\mathbb {C}A\) and \(\mathbb {C}B\) can then be characterized by \(\mathbb {C}A \otimes \mathbb {C}B \cong \mathbb {C}G\). Indeed, the linear map defined by \(\delta _{(a, b)} \mapsto \delta _{a} \otimes \delta _{b}\) for all \(a \in A\) and \(b \in B\) is an isomorphism. In this paper, \(\mathbb {C}G\) and \(\mathbb {C}A \otimes \mathbb {C}B\) will always be identified through this isomorphism. Hence, for \(f \in \mathbb {C}A\) and \(g \in \mathbb {C}B\), it can be said that \(f \otimes g \in \mathbb {C}G\) with \((f \otimes g)(a, b) = f(a)\,g(b)\).
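Numerically, this identification is realized by the Kronecker product, as the following sketch illustrates; the flattening convention \((a, b) \mapsto |B| \cdot a + b\) is an assumption made here for concreteness.

```python
# Sketch of CA (x) CB ≅ C(A ⊕ B): the tensor product of two function vectors
# is their Kronecker product, and (f ⊗ g)(a, b) = f(a) g(b).
import numpy as np

sizeA, sizeB = 2, 4
f = np.array([1.0, -1.0])                    # f in CA
g = np.arange(sizeB, dtype=float)            # g in CB
fg = np.kron(f, g)                           # f ⊗ g in C(A ⊕ B), dimension 8

for a in range(sizeA):
    for b in range(sizeB):
        assert fg[sizeB * a + b] == f[a] * g[b]   # (f ⊗ g)(a, b) = f(a) g(b)
```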

A rank-one vector \(v \in \bigotimes _{i = 1}^n V_i\) is a vector of the form \(v = v_1 \otimes \cdots \otimes v_n\). Given bases for \(V_1, \ldots , V_n\), the set of all their tensor products is a basis of rank-one vectors for \(\bigotimes _{i = 1}^n V_i\). More generally, for any vector v there exists an integer \(r \ge 0\) such that \(v = \sum _{i = 1}^r \lambda _i \,{\textstyle \bigotimes _{j = 1}^n v_{i, j}}\), for some vectors \(v_{i, j} \in V_j\) and scalars \(\lambda _i \in \mathbb {C}\). The smallest r for which such a decomposition exists is called the tensor rank of v.

2.2 Fourier Analysis

Given a function \(f \in \mathbb {C}G\) and a constant \(t \in G\), one can define a new function by \(x \mapsto f(x + t)\). The effect of translations on the coordinates of functions in the standard basis of \(\mathbb {C}G\) is inconvenient: the basis vectors are shuffled around by the permutation \(\delta _x \mapsto \delta _{x + t}\), which corresponds to multiplication by a permutation matrix. It would be more convenient if the effect of translation were a simple scaling of the coordinates, i.e. multiplication by a diagonal matrix. This can be achieved by working with respect to a different basis.

To achieve the goal of diagonalization, the new basis vectors should be eigenvectors of the set of translation operations. This is achieved for any homomorphism \(\chi : G \rightarrow \mathbb {C}^\times \) from G to the multiplicative group of complex numbers \(\mathbb {C}^\times = \mathbb {C}\setminus \{0\}\), since \(\chi (x + t) = \chi (t) \chi (x)\) for any \(x, t \in G\). This leads to the following definition.

Definition 2.1

(Group characters [37]). Let G be a finite Abelian group. A (complex) character of G is a group homomorphism \(G \rightarrow \mathbb {C}^\times \). The (Pontryagin) dual of G is the group \(\smash {\widehat{G}}\) of all characters of G with respect to the pointwise product.

It is not hard to see that \(\widehat{G}\) is indeed an Abelian group. For example, the inverse of \(\chi \in \widehat{G}\) is the character \(x \mapsto \chi (-x)\). Since the values of a character of a finite group lie on the complex unit circle, \(\chi (-x) = \overline{\chi (x)}\).

Example 2.1

The dual of the additive group \(\mathbb {F}_2\) is \(\widehat{\mathbb {F}}_2 = \{x \mapsto 1, x \mapsto (-1)^x\}\). Indeed, these are the only two group homomorphisms \(\mathbb {F}_2 \rightarrow \mathbb {C}^\times \).    \(\vartriangleright \)

The functions in the dual group \(\widehat{G}\) form a basis for \(\mathbb {C}G\) that behaves well with respect to translation. Further properties of the dual group are given in Theorem 2.1 below. In particular, property (2) shows that the basis of characters is orthogonal.

Theorem 2.1

(Properties of dual groups [37]). If G is a finite Abelian group with dual \(\widehat{G}\), then

  1. (1)

    The dual group \(\smash {\widehat{G}}\) is isomorphic to G.

  2. (2)

    For all \(\smash {\chi , \psi \in \widehat{G}}\), it holds that \(\langle \chi , \psi \rangle = |G|\,\delta _{\chi }(\psi )\).

  3. (3)

    If \(G = H_1 \oplus H_2\) with \(\oplus \) the internal direct sum, then \(\smash {\widehat{G} = \widehat{H}_1 \oplus \widehat{H}_2}\).

By Theorem 2.1 (1), \(\widehat{G}\) can be identified with G. In general, this identification is not unique. However, there is a functorial isomorphism between the double dual of G and G itself, which identifies \(g \in G\) with the evaluation map \(\chi \mapsto \chi (g)\) in the dual of \(\widehat{G}\) [37]. This result justifies the term ‘dual group’. In order to avoid arbitrary choices, isomorphisms between \(\widehat{G}\) and G will be avoided throughout this paper. This makes no difference in specific calculations, but it is theoretically more elegant.

Example 2.2

Since the additive group \(\mathbb {F}_2^n\) is the direct sum of n copies of \(\mathbb {F}_2\), it follows from Theorem 2.1 (3) that the dual group is essentially the direct sum of n copies of \(\widehat{\mathbb {F}}_2\). Specifically, \(\widehat{\mathbb {F}}_2^n = \{x \mapsto \prod _{i=1}^n (-1)^{u_ix_i} = (-1)^{u^\top x}~|~u \in \mathbb {F}_2^n\}\). Note that identifying \(\widehat{\mathbb {F}}_2^n\) and \(\mathbb {F}_2^n\) requires choosing a basis for \(\mathbb {F}_2^n\).
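The character table of \(\mathbb {F}_2^n\) and the orthogonality relation of Theorem 2.1 (2) can be verified directly, as in the following sketch; indexing n-bit vectors by integers is a convention of this sketch.

```python
# Characters chi_u(x) = (-1)^{u^T x} of F_2^3 and the orthogonality relation
# <chi_u, chi_v> = |G| delta_{u,v} from Theorem 2.1 (2).
import numpy as np

n = 3
G = 2 ** n
dot = np.array([[bin(u & x).count("1") & 1 for x in range(G)] for u in range(G)])
chi = (-1.0) ** dot                          # chi[u] is the character chi_u

# The characters are real-valued here, so no conjugation is needed:
assert np.allclose(chi @ chi.T, G * np.eye(G))
```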

The Fourier transformation \(\mathcal {F}\) is essentially a change of basis from the standard basis to the character basis. However, in order to avoid identifying \(\widehat{G}\) and G, we shall define \(\mathcal {F}\) as a transformation from \(\mathbb {C}G\) to \(\mathbb {C}\widehat{G}\). With this definition, the Fourier transformation maps a character \(\chi \in \widehat{G} \subset \mathbb {C}G\) directly to a multiple of the standard basis vector \(\delta _\chi \in \mathbb {C}\widehat{G}\). Since group characters are orthogonal, Definition 2.2 achieves the desired basis transformation.

Definition 2.2

(Fourier transformation [37]). Let \(f : G \rightarrow \mathbb {C}\) be a function. The Fourier transformation of f is the function \(\widehat{f} : \widehat{G} \rightarrow \mathbb {C}\) defined by

$$\begin{aligned} \widehat{f}(\chi ) = \langle \chi , f \rangle = \sum _{x \in G} \overline{\chi (x)} f(x) . \end{aligned}$$

The Fourier transformation is the map \(\mathcal {F} : \mathbb {C}G \rightarrow \mathbb {C}\widehat{G}\) such that \(\mathcal {F}f = \widehat{f}\).

The transformation \(\mathcal {F}\) is a vector space isomorphism. In fact, since \(\mathbb {C}G\) and \(\mathbb {C}\widehat{G}\) are algebras with either the pointwise product or convolution, \(\mathcal {F}\) is an isomorphism of algebras which swaps the pointwise product and convolution. This is by construction, since the set of convolution operators is generated by translations.

The vector space \(\mathbb {C}\widehat{G}\) is also an inner product space. In fact, due to the orthogonality of characters, the inner product between \(f_1, f_2 \in \mathbb {C}G\) coincides with the inner product of their Fourier transforms up to a constant factor: \(\langle \widehat{f}_1, \widehat{f}_2\rangle = |G|\,\langle f_1, f_2 \rangle .\) In other words, \(\mathcal {F} / \sqrt{|G|}\) is a unitary map and \(\mathcal {F}^{-1} = \mathcal {F}^* / |G|\) with \(\mathcal {F}^*\) the adjoint (conjugate transpose) of \(\mathcal {F}\).
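For \(G = \mathbb {F}_2^n\), the matrix of \(\mathcal {F}\) is the Walsh-Hadamard matrix, and the preceding statements can be checked numerically. The following sketch (again under the integer-indexing assumption) verifies that \(\mathcal {F} / \sqrt{|G|}\) is unitary and that \(\mathcal {F}\) diagonalizes translations, as intended.

```python
# The Fourier transformation on F_2^3 as the matrix F[u, x] = conj(chi_u(x))
# = (-1)^{u^T x}; F / sqrt(|G|) is unitary, and conjugating the translation
# delta_x -> delta_{x + t} (+ is XOR here) by F gives a diagonal matrix with
# entries chi_u(t).
import numpy as np

n, t = 3, 5
G = 2 ** n
F = np.array([[(-1.0) ** (bin(u & x).count("1") & 1) for x in range(G)]
              for u in range(G)])

assert np.allclose(F @ F.T, G * np.eye(G))   # F / sqrt(|G|) is unitary (F is real)

T = np.zeros((G, G))                         # translation operator delta_x -> delta_{x + t}
for x in range(G):
    T[x ^ t, x] = 1

D = F @ T @ np.linalg.inv(F)                 # diagonal, with entries chi_u(t)
expected = np.diag([(-1.0) ** (bin(u & t).count("1") & 1) for u in range(G)])
assert np.allclose(D, expected)
```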

To end this section, consider the case \(G = A \oplus B\). As mentioned above, one has \(\mathbb {C}G = \mathbb {C}A \otimes \mathbb {C}B\) (technically up to isomorphism). By Theorem 2.1 (3), the dual group satisfies \(\widehat{G} = \widehat{A} \oplus \widehat{B}\). Hence, one also has \(\mathbb {C}\widehat{G} = \mathbb {C}\widehat{A} \otimes \mathbb {C}\widehat{B}\). Consequently, the Fourier transformation on \(\mathbb {C}G\) is given by \(\mathcal {F}_{A} \otimes \mathcal {F}_{B}\). Equivalently, the matrix representation of \(\mathcal {F}\) in the standard basis is the Kronecker product of the matrix representations of \(\mathcal {F}_{A}\) and \(\mathcal {F}_{B}\) in the standard basis.
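For \(\mathbb {F}_2^n\), this Kronecker structure reproduces the familiar recursive construction of the Walsh-Hadamard matrix; a two-line check, under the same sketch conventions as before:

```python
# F_{A ⊕ B} = F_A ⊗ F_B: for F_2^3, the Fourier matrix is the threefold
# Kronecker power of the 2x2 Fourier matrix of F_2.
import numpy as np

H1 = np.array([[1.0, 1.0], [1.0, -1.0]])     # Fourier matrix of F_2
H3 = np.kron(np.kron(H1, H1), H1)            # Fourier matrix of F_2^3
F = np.array([[(-1.0) ** (bin(u & x).count("1") & 1) for x in range(8)]
              for u in range(8)])
assert np.allclose(H3, F)
```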

2.3 Subspaces of \(\mathbb {C}G\) and \(\mathbb {C}\widehat{G}\)

Sections 3 and 4 will demonstrate that subspaces of \(\mathbb {C}G\) and \(\mathbb {C}\widehat{G}\) are often more interesting for cryptanalysis than individual functions. For this reason, it will be convenient to extend the inner product notation \(\langle \cdot , \cdot \rangle \) to subspaces of \(\mathbb {C}G\). For subspaces \(U \subseteq \mathbb {C}G\) and \(V \subseteq \mathbb {C}G\), define the linear map \(\langle V, U \rangle _{} : U \rightarrow V\) by \(\langle V, U \rangle _{} = \pi _V \,\iota _U\), where \(\iota _U : U \rightarrow \mathbb {C}G\) is the inclusion map and \(\pi _V : \mathbb {C}G \rightarrow V\) is the orthogonal projection on V. A similar definition can be given for subspaces of \(\mathbb {C}\widehat{G}\). Note that \(\langle V, U \rangle _{} = \langle U, V \rangle _{}^*\) since projection and inclusion are adjoint.

Example 2.3

Let U and V be one-dimensional subspaces of \(\mathbb {C}G\) spanned by unit-norm vectors u and v respectively. By definition, \(\iota _U(\lambda u) = \lambda u\) and \(\pi _V(x) = v \langle v, x \rangle \). Consequently, \(\langle V, U \rangle _{} : U \rightarrow V\) is the map \(\lambda u \mapsto \langle v, u \rangle \lambda v\). The matrix representation of this map is thus simply the \(1 \times 1\) matrix containing the inner product \(\langle v, u \rangle \).

The transformation \(\langle V, U \rangle _{}\) comes with a geometric interpretation, which will be important in Sects. 4 and 5. Due to standard properties of orthogonal projection, \(\langle V, U \rangle _{}\) maps any \(u \in U\) to the nearest vector \(v \in V\). In addition, no other vector in V of the same length makes a smaller angle to u than v. This suggests that \(\langle V, U \rangle _{}\) encodes all information about the ‘angles’ between U and V. This can be made precise using the notion of principal angles between subspaces, which is due to Jordan [24]. The characterization below follows Björck and Golub [10].

Definition 2.3

(Principal angles). Let U and V be subspaces of an inner product space over \(\mathbb {C}\) of finite dimension and let \(d = \min \{\dim U, \dim V\}\). The principal angles \(0 \le \theta _1 \le \ldots \le \theta _d \le \pi / 2\) between U and V are recursively defined by (for \(i = 1, 2, \ldots , d\))

$$\begin{aligned} \cos \theta _i = \frac{\langle u_i, v_i \rangle }{\Vert u_i\Vert _2\Vert v_i\Vert _2} = \max _{\begin{array}{c} u \in U_i \setminus \{0\}\\ v \in V_i \setminus \{0\} \end{array}}\,\frac{|\langle u, v \rangle |}{\Vert u\Vert _2\Vert v\Vert _2}\,, \end{aligned}$$

where \(u_i \in U_i\) and \(v_i \in V_i\) are nonzero vectors for which the maximum on the right is achieved with \(\langle u_i, v_i \rangle \) a non-negative real number, \(U_i = U \cap \{u_1, \ldots , u_{i - 1}\}^\perp \) and \(V_i = V \cap \{v_1, \ldots , v_{i - 1}\}^\perp \).

The cosines of the principal angles are precisely the singular values of \(\langle V, U \rangle _{}\), and the singular vectors are the directions along which these angles are to be measured. This follows directly from the variational characterization of singular values. Further details may be found in [10].
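In practice, principal angles are computed exactly as this characterization suggests: orthonormalize bases of U and V and take a singular value decomposition of the resulting cross-Gram matrix [10]. A short numerical sketch, with randomly chosen subspaces purely for illustration:

```python
# Principal angles between random subspaces U and V of C^8, following
# Björck-Golub: the singular values of QV^* QU, with QU and QV orthonormal
# bases, are the cosines of the principal angles.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))  # spans U
B = rng.standard_normal((8, 2)) + 1j * rng.standard_normal((8, 2))  # spans V

QU, _ = np.linalg.qr(A)                      # orthonormal basis of U (columns)
QV, _ = np.linalg.qr(B)                      # orthonormal basis of V (columns)

cosines = np.linalg.svd(QV.conj().T @ QU, compute_uv=False)
angles = np.arccos(np.clip(cosines, 0.0, 1.0))
print(np.sort(angles))                       # d = min(dim U, dim V) = 2 angles
```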

3 Cryptanalytic Properties

Many cryptanalytic techniques rely only on partial information about the inputs and outputs of a primitive, such as membership of a set or the value taken by a Boolean function. Below, the structure of the inputs (or outputs) will be informally referred to as cryptanalytic input (or output) properties.

One of the obstacles to a more general approach to linear cryptanalysis and its variants is the fact that different cryptanalytic properties are often described by disparate mathematical objects (such as sets, linear or nonlinear functions, ...). In a few cases, overcoming this difficulty has resulted in new or generalized results. Examples include the projection function approach of Wagner [42] and Baignères et al. [2], which enables unifying the data-complexity analysis of several attacks, and the observation that both invariant subspaces and nonlinear invariants correspond to eigenvectors of correlation matrices [5].

Section 3.1 introduces a general correspondence between cryptanalytic properties and subspaces of the inner product space \(\mathbb {C}G\). It works for all properties relevant to linear cryptanalysis and its variants, and in particular generalizes both examples just mentioned above. Section 3.2 describes how properties change when a function is applied to the state. This leads to a more general perspective on correlation matrices.

3.1 Correspondence Between Properties and Subspaces

The purpose of this section is to show that the cryptanalytic properties used in linear cryptanalysis and its variants are naturally described by functions \(G \rightarrow \mathbb {C}\), i.e. functions in the inner product space \(\mathbb {C}G\) from Sect. 2.1. This will be motivated from two viewpoints, which are dual to one another. Specifically, the following two perspectives will be advanced:

  1. (i)

    Cryptanalytic properties correspond to functions in \(\mathbb {C}G\).

  2. (ii)

    Cryptanalytic properties correspond to linear functions \(\mathbb {C}G \rightarrow \mathbb {C}\).

From viewpoint (i), a cryptanalytic property characterizes the state of a collection of inputs or outputs. For instance, probability distributions on G can be represented by functions \(G \rightarrow [0, 1] \subset \mathbb {C}\). Similarly, any subset S of G has an indicator function \(\mathbbm {1}_S \in \mathbb {C}G\). It will be shown below that the general idea of associating not just positive numbers, but also arbitrary complex-valued weights, to the elements of G is necessary to describe other types of properties.

According to (ii), properties describe a measurement or observation of the state of a collection of inputs or outputs. Importantly, only linear functions of the state vector are considered in the present framework. The set of linear functions \(\mathbb {C}G \rightarrow \mathbb {C}\) is itself a vector space \(\mathbb {C}G^*\), i.e. the dual vector space of \(\mathbb {C}G\). However, the explicit choice of the inner product in Sect. 2.1 identifies \(\mathbb {C}G\) and \(\mathbb {C}G^*\). Indeed, \(f \in \mathbb {C}G\) corresponds to the function \(g \mapsto \langle f, g \rangle \) in \(\mathbb {C}G^*\). This correspondence will be used throughout this paper, and both (i) and (ii) will be represented by elements of \(\mathbb {C}G\). For example, for a subset S, the indicator function \(\mathbbm {1}_S\) is dual to the function \(f \mapsto \langle \mathbbm {1}_S, f \rangle = \sum _{x \in S} f(x)\).

More generally, consider a subspace V of \(\mathbb {C}G\). Any function in V can then be interpreted according to either (i) or (ii). The assumption that the property must correspond to a subspace of \(\mathbb {C}G\) implies that it is possible to make arbitrary linear combinations of these functions.

Representing properties as subspaces of \(\mathbb {C}G\) comes with a geometric interpretation. Specifically, the inner product yields the observed outcome when a pair of properties with interpretations (i) and (ii) are combined. This aspect will be discussed in detail in Sect. 4. The remainder of this section is intended as a dictionary between conventional cryptanalytic properties and their corresponding subspaces.

A short summary for \(\smash {G = \mathbb {F}_2^n}\) is given in Table 2. The table includes both the subspaces of \(\mathbb {C}G\) and their Fourier transforms, which are subspaces of \(\mathbb {C}\widehat{G}\). Importantly, there are other useful subspaces which do not correspond to any of the constructions discussed below. One example will be discussed in Sect. 6.

Table 2. Commonly used cryptanalytic properties and their corresponding subspaces. The characters of \(\mathbb {F}_2^n\) are denoted by \(\chi _u(x) = (-1)^{u^\top x}\), where \(u \in \mathbb {F}_2^n\).

Probability Distributions. Several properties correspond to subspaces spanned by one or more probability distributions. Sets are one example, since any set corresponds to the uniform distribution on that set (equivalently, its indicator function). Affine subspaces are an important special case and are used in the invariant subspace attack of Leander et al. [30].

Integral and division properties [17, 38] are also examples, but their analysis is not the main focus of this paper. In this case, the corresponding vector space could be spanned by the indicator function of a set which is balanced on certain bits. However, the intermediate and output properties typically correspond to higher-dimensional vector spaces because they express several possible sets in which the state could be contained. Equivalently, following (ii), one observes the marginal (but not necessarily joint) distribution of several state bits.

Not many variants of linear cryptanalysis are directly based on non-uniform probability distributions. The statistical saturation attack of Collard and Standaert [15], in its original form, may be considered an example. In this attack, one estimates the key-dependent probability distribution of the state of a block cipher when some of the plaintext bits are constant and the others are uniform random. However, depending on how the estimated distribution is used, it may be more appropriate to approach this attack using the projection functions discussed below.

Projection Functions. Let \(\mathsf {F}{} : G \rightarrow H\) be a function between finite Abelian groups G and H, with H typically much smaller than G. In fact, H need not be a group for the construction below to work, but this will be assumed for simplicity. Such functions play an important role in Wagner’s framework of ‘commutative diagram cryptanalysis’, where they are called projections [42]. Baignères et al. [2] analyze the statistical properties of distinguishers based on balanced projections, such as \(\chi ^2\)-attacks [41], partitioning cryptanalysis [22] and multidimensional linear attacks [23].

From the viewpoint of (ii), a projection property gives access to the evaluation of \(\mathsf {F}{}\) on the state. Equivalently, the property allows observing any linear combination of the functions \(\delta _h \circ \mathsf {F}{}\), where \(\{\delta _h~|~h \in H\}\) is the standard basis of \(\mathbb {C}H\). More generally, any function on H can be ‘pulled back’ to G along the projection function \(\mathsf {F}\) and the projection property corresponds to the vector space of all such functions. This leads to Definition 3.1 below.

Definition 3.1

(Pullback). Let \(\mathsf {F}{} : G \rightarrow H\) be a function. The pullback operator along \(\mathsf {F}{}\) is the linear operator \({T^{\mathsf {F}}}^* : \mathbb {C}H \rightarrow \mathbb {C}G\) defined by \(f \mapsto f \circ \mathsf {F}{}\). The pullback space of \(\mathbb {C}H\) along \(\mathsf {F}{}\) is the image of \({T^{\mathsf {F}}}^*\): \({{\,\mathrm{im}\,}}{T^{\mathsf {F}}}^* = \{f \circ \mathsf {F}{}~|~f \in \mathbb {C}H\} \subseteq \mathbb {C}G\). Similarly, the Fourier transformation \(\mathcal {F}({{\,\mathrm{im}\,}}{T^{\mathsf {F}}}^*)\) of \({{\,\mathrm{im}\,}}{T^{\mathsf {F}}}^*\) will be called the pullback of \(\mathbb {C}H\) to \(\mathbb {C}\widehat{G}\) along \(\mathsf {F}{}\).

Let V be the vector space corresponding to the projection property defined by \(\mathsf {F}{}\), i.e. the pullback of \(\mathbb {C}H\) along \(\mathsf {F}{}\). It was already mentioned above that \(\{\delta _h \circ \mathsf {F}{}~|~h \in H\}\) is a basis for V. However, it is often more convenient to use the basis of functions \(\chi \circ \mathsf {F}{}\) where \(\chi \in \widehat{H}\). This choice behaves particularly well for homomorphisms \(\mathsf {F}{} : G \rightarrow H\) when working with the Fourier transformation of V, since \(\smash {\widehat{\chi \circ \mathsf {F}{}}}\) is a multiple of \(\delta _{\chi \circ \mathsf {F}}\) in that case.

The following example describes the vector space corresponding to a Boolean projection function in more detail. Such properties are closely related to classical linear cryptanalysis, and more generally the I/O-sums of Harpes et al. [21] and the nonlinear approximations considered by Beierle et al. [4]. However, as discussed below, there is a subtle difference.

Example 3.1

Let \(\mathsf {F}{} : \mathbb {F}_2^n \rightarrow \mathbb {F}_2\) be a Boolean function. Denote the characters of \(\mathbb {F}_2^n\) by \(\chi _u(x) = (-1)^{u^\top x}\). The pullback space V of \(\mathbb {C}\mathbb {F}_2\) along \(\mathsf {F}{}\) is equal to \(V = \mathrm {span}\{ \delta _0 \circ \mathsf {F}, \delta _1 \circ \mathsf {F}\} = \mathrm {span}\{ \mathbbm {1}, (-1)^{\mathsf {F}} \},\) with \(\mathbbm {1}= \chi _0\) the trivial character of \(\mathbb {F}_2^n\). Hence, the Fourier transformation of V is given by \(\mathcal {F}(V) = \mathrm {span}\{ \delta _\mathbbm {1}, \mathcal {F}[(-1)^{\mathsf {F}}] \}\). The function \(\smash {\mathcal {F}[(-1)^{\mathsf {F}}]}\) is often called the Walsh-Hadamard transform of \(\mathsf {F}\). If \(\mathsf {F}\) is a linear function, then \(\mathsf {F}(x) = u^\top x\) for some \(u \in \mathbb {F}_2^n\). Hence, \(\smash {(-1)^{\mathsf {F}}} = \chi _u\) and consequently \(\mathcal {F}(V) = \mathrm {span}\{\delta _{\mathbbm {1}}, \delta _{\chi _u}\}\).
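The two bases of V in Example 3.1 are easy to exhibit explicitly. The following sketch does so for the arbitrarily chosen Boolean function \(\mathsf {F}(x) = x_1 x_2\) on \(\mathbb {F}_2^2\) and computes the Walsh-Hadamard transform of \((-1)^{\mathsf {F}}\); the function and indexing are illustrative assumptions of this sketch.

```python
# The pullback space V = span{1, (-1)^F} of Example 3.1 for the toy Boolean
# function F(x) = x_1 AND x_2 on F_2^2, and the Walsh-Hadamard spectrum of F.
import numpy as np

G = 4
F = np.array([0, 0, 0, 1])                   # F(x) = AND of the two bits of x
one = np.ones(G)                             # the trivial character
sign = (-1.0) ** F                           # the function (-1)^F

# {delta_0 ∘ F, delta_1 ∘ F} and {1, (-1)^F} span the same 2-dimensional space:
assert np.allclose((one + sign) / 2, F == 0)    # delta_0 ∘ F
assert np.allclose((one - sign) / 2, F == 1)    # delta_1 ∘ F

WH = np.array([[(-1.0) ** (bin(u & x).count("1") & 1) for x in range(G)]
               for u in range(G)])
print(WH @ sign)                             # Walsh-Hadamard transform of (-1)^F
```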

Example 3.1 suggests that ordinary linear properties correspond to a vector space \(V = \mathrm {span}\{\delta _{\mathbbm {1}}, \delta _{\chi }\}\), where \(\chi \) is a character of the additive group \(\mathbb {F}_2^n\). Table 2 instead lists the one-dimensional space \(\mathrm {span}\{\delta _\chi \} \subset V\). For the analysis of permutations, there is no significant difference since \(\delta _{\mathbbm {1}}\) corresponds to a trivial invariant for any permutation (its domain). However, for general functions, the vector space V represents a strictly stronger property.

In general, many commonly used cryptographic properties correspond to subspaces of pullback spaces. This difference is not easily expressed in the formalism of Baignères et al. [2] and Wagner [42]. The next paragraph discusses several important examples.

Subspaces of Pullbacks. Example 3.1 generalizes to other finite Abelian groups. Let \(\mathsf {F}: G \rightarrow H\) be a homomorphism. Since \(\chi \circ \mathsf {F}\in \widehat{G}\) for any character \(\chi \) of H, the pullback V of \(\mathbb {C}H\) to \(\mathbb {C}\widehat{G}\) is spanned by the functions \(\smash {\delta _{\chi \circ \mathsf {F}}}\) with \(\chi \in \widehat{H}\). Hence, \(\dim V = |H|\). However, the dimension could be reduced by one for permutations. This is essentially the generalization of linear cryptanalysis proposed by Granboulan et al. [20, §3]. Alternatively, it is also reasonable to consider only one of the functions \(\delta _{\chi \circ \mathsf {F}}\), since this results in one-dimensional subspaces and is closer to the spirit of ordinary linear cryptanalysis. This is essentially the generalization of linear cryptanalysis proposed by Baignères et al. [3]. The approach of Baignères et al. and its multidimensional generalization were recently used in the cryptanalysis of FF3.1 [8].

The difference between multiple and multidimensional linear cryptanalysis is of the same nature. For multiple linear properties, one uses a subspace spanned by one or more standard basis vectors \(\delta _{\chi }\). In multidimensional linear cryptanalysis, the considered characters form a subgroup of \(\smash {\widehat{G}}\) and consequently the subspace is the pullback of a homomorphism to some subgroup of G.

3.2 Transformations on \(\mathbb {C}G\) and \(\mathbb {C}\widehat{G}\)

This section investigates how properties, i.e. subspaces of \(\mathbb {C}G\), change when a function \(\mathsf {F}: G \rightarrow H\) is applied to the state of the primitive under analysis.

Definition 3.2

(Transition matrix). Let \(\mathsf {F}{} : G \rightarrow H\) be a function. Define \(T^\mathsf {F}{} : \mathbb {C}G \rightarrow \mathbb {C}H\) as the unique linear operator defined by \(\smash {\delta _x \mapsto \delta _{\mathsf {F}(x)}}\) for all \(x \in G\). The transition matrix of \(\mathsf {F}{}\) is the coordinate representation of \(T^\mathsf {F}{}\) with respect to the standard bases of \(\mathbb {C}G\) and \(\mathbb {C}H\).

Definition 3.2 only specifies the action of \(T^\mathsf {F}{}\) on the standard basis of \(\mathbb {C}G\), but this uniquely defines \(T^\mathsf {F}{}\) on all of \(\mathbb {C}G\). The choice of the notations \({T^{\mathsf {F}}}^*\) and \(T^\mathsf {F}{}\) for pullback (Definition 3.1) and transition (Definition 3.2) operators respectively is not arbitrary: these operators are indeed represented by conjugate-transposed matrices. In fact, \(T^\mathsf {F}{}\) could also be called the pushforward operator.

Note that the notation \(T^\mathsf {F}{}\) will be overloaded, referring to both the operator and its standard matrix representation. The coordinates of the matrix \(T^\mathsf {F}{}\) will be indexed by elements of G and H rather than by integers, since this avoids choosing an arbitrary ordering of the standard basis. In particular,

$$\begin{aligned} T_{y, x}^{\mathsf {F}} = \langle \delta _y, T^\mathsf {F}{} \delta _x \rangle = \langle \delta _y, \delta _{\mathsf {F}(x)} \rangle = \delta _{y}(\mathsf {F}(x)). \end{aligned}$$
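Concretely, the transition matrix of an S-box is filled in one entry per input, as in the following short sketch; the 3-bit S-box is an arbitrary illustrative choice.

```python
# Transition matrix from Definition 3.2: T^F delta_x = delta_{F(x)}, i.e.
# T[F(x), x] = 1.
import numpy as np

S = [3, 6, 5, 7, 0, 2, 1, 4]                 # a permutation of {0, ..., 7}
G = len(S)
T = np.zeros((G, G))
for x in range(G):
    T[S[x], x] = 1                           # T_{y,x} = delta_y(F(x))

# S is a bijection, so T is a permutation matrix (cf. Theorem 3.1 (1) below):
assert np.allclose(T @ T.T, np.eye(G))
```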

An analog of Definition 3.2 for \(\mathbb {C}\widehat{G}\) is given in Definition 3.3. It generalizes the definition of correlation matrices given in [5] to arbitrary finite Abelian groups. The term correlation matrix is due to Daemen et al. [16], who defined these matrices in terms of their coordinates.

Definition 3.3

(Correlation matrix). Let \(\mathsf {F}{} : G \rightarrow H\) be a function between finite Abelian groups G and H. Define \(C^\mathsf {F}{} : \smash {\mathbb {C}\widehat{G} \rightarrow \mathbb {C}\widehat{H}}\) as the Fourier transformation of \(T^{\mathsf {F}}\). That is, \(C^{\mathsf {F}} = \mathcal {F}_H \, T^\mathsf {F}{} \, \mathcal {F}^{-1}_G\), with \(\mathcal {F}_H\) and \(\mathcal {F}_G\) the Fourier transformation on \(\mathbb {C}H\) and \(\mathbb {C}G\) respectively. The correlation matrix of \(\mathsf {F}{}\) is the coordinate representation of \(C^\mathsf {F}{}\) with respect to the standard bases of \(\smash {\mathbb {C}\widehat{G}}\) and \(\smash {\mathbb {C}\widehat{H}}\).

The notation \(C^\mathsf {F}{}\) will refer to both the linear operator and its standard matrix representation. Contrary to [5, 16], the coordinates will be indexed by elements of \(\widehat{H}\) and \(\widehat{G}\) in order to avoid arbitrary choices. Since \(T_{y, x}^\mathsf {F}{} = \delta _{y}(\mathsf {F}{}(x))\), the coordinates are given by

$$\begin{aligned} C^\mathsf {F}_{\chi , \psi } = \langle \delta _\chi , C^\mathsf {F}\delta _\psi \rangle = \frac{1}{|G|}\langle \chi , T^\mathsf {F}\psi \rangle = \frac{1}{|G|}\sum _{x \in G} \overline{\chi (\mathsf {F}(x))} \psi (x). \end{aligned}$$

For \(G = \mathbb {F}_2^n\) and \(H = \mathbb {F}_2^m\), and after identifying these groups with their dual, the expression above coincides with the original definition of correlation matrices given by Daemen et al. [16]. The following two theorems list the main properties of transition and correlation matrices that will be used throughout this paper. The last two properties in Theorem 3.1 also apply to correlation matrices. For (2), this follows from the fact that \(\mathcal {F}_{G_1 \oplus G_2}\) is essentially the same as \(\mathcal {F}_{G_1} \otimes \mathcal {F}_{G_2}\).

Theorem 3.1

(Properties of transition matrices). Let \(\mathsf {F}: G \rightarrow H\) be a function. The transition matrix \(T^\mathsf {F}\) of \(\mathsf {F}\) has the following properties:

  1. (1)

    If \(\mathsf {F}\) is a bijection, then \(T^\mathsf {F}\) is a permutation matrix.

  2. (2)

    If \(\mathsf {F}= (\mathsf {F}_1, \ldots , \mathsf {F}_n)\) with \(\mathsf {F}_i : G_i \rightarrow H_i\), then \(\smash {T^\mathsf {F}= \bigotimes _{i = 1}^n T^{\mathsf {F}{}_i}}\).

  3. (3)

    If \(\mathsf {F}= \mathsf {F}_2 \circ \mathsf {F}_1\), then \(T^\mathsf {F}= T^{\mathsf {F}_2} T^{\mathsf {F}_1}\).

Proof

The first two claims directly follow from \(T^\mathsf {F}_{y, x} = \delta _{y}(\mathsf {F}(x))\). The third property is an immediate consequence of Definition 3.2.    \(\square \)

Theorem 3.2

(Properties of correlation matrices). Let \(\mathsf {F}: G \rightarrow H\) be a function between finite Abelian groups G and H. The correlation matrix \(C^\mathsf {F}\) of \(\mathsf {F}\) has the following properties:

  1. (1)

    If \(\mathsf {F}\) is a bijection, then \(C^\mathsf {F}\) is a unitary matrix.

  2. (2)

    If \(\mathsf {F}\) is a group homomorphism, then \(\smash {C^\mathsf {F}_{\chi , \psi } = \delta _{\chi \circ \mathsf {F}}(\psi )}\).

  3. (3)

    If \(G = H\) and \(\mathsf {F}(x) = x - t\) for some constant \(t \in G\), then \(C^\mathsf {F}\) is a diagonal matrix with \(C^\mathsf {F}_{\chi , \chi } = \chi (t)\).

Proof

By Theorem 3.1 (1), if \(\mathsf {F}\) is a permutation, then \(T^\mathsf {F}\) is a permutation matrix and thus unitary. Furthermore, since \(|G| = |H|\), both \(\mathcal {F}_H^* / \sqrt{|G|}\) and \(\mathcal {F}_G / \sqrt{|G|}\) are unitary matrices. Property (1) follows since the product of unitary matrices is unitary and \(C^\mathsf {F}= \mathcal {F}_H T^\mathsf {F}\mathcal {F}_G^{-1}\).

For (2), note that if \(\mathsf {F}\) is a group homomorphism, then so is \(\chi \circ \mathsf {F}: G \rightarrow \mathbb {C}^\times \). Hence, by the orthogonality of group characters, \(\smash {C^\mathsf {F}_{\chi , \psi } = \delta _{\chi \circ \mathsf {F}}(\psi )}\). As discussed in Sect. 2.2, property (3) holds by construction of the Fourier transformation. Indeed, note that the action of \(\mathsf {F}\) corresponds to a translation by t.    \(\square \)
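These properties can be validated numerically for a small example. The following sketch computes the correlation matrix of the arbitrary 3-bit S-box used earlier and checks Theorem 3.2 (1) together with the coordinate formula for \(C^\mathsf {F}_{\chi , \psi }\); all conventions are those of the earlier sketches.

```python
# Correlation matrix C^F = F_H T^F F_G^{-1} of the toy 3-bit S-box, with
# checks of Theorem 3.2 (1) and of the coordinate formula
# C_{chi,psi} = (1/|G|) sum_x conj(chi(F(x))) psi(x).
import numpy as np

S = [3, 6, 5, 7, 0, 2, 1, 4]
G = len(S)
WH = np.array([[(-1.0) ** (bin(u & x).count("1") & 1) for x in range(G)]
               for u in range(G)])           # Fourier matrix of F_2^3
T = np.zeros((G, G))
for x in range(G):
    T[S[x], x] = 1

C = WH @ T @ np.linalg.inv(WH)               # correlation matrix of S

assert np.allclose(C @ C.T, np.eye(G))       # Theorem 3.2 (1): C is unitary (real here)
for u in range(G):
    for v in range(G):
        cuv = sum((-1.0) ** ((bin(u & S[x]).count("1") + bin(v & x).count("1")) & 1)
                  for x in range(G)) / G
        assert np.isclose(C[u, v], cuv)      # matches the coordinate formula
```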

4 Approximations

An approximation of a function \(\mathsf {F}{} : G \rightarrow H\) is essentially a pair consisting of an input and an output property. By the correspondence in Sect. 3, these properties can be represented by subspaces U and V: \(u \in U\) represents a state and \(v \in V\) corresponds to a linear measurement function or observation. The inner product \(\langle v, T^{\mathsf {F}{}} u \rangle \) gives the outcome of such an observation. This leads to Definition 4.1 below, where the approximation map represents all such inner products without relying on the choice of a specific basis. Given orthonormal bases \(u_1, u_2, \ldots \) and \(v_1, v_2, \ldots \) for U and V respectively, the coordinates of the matrix representing the approximation map are given by the inner products \(\langle v_j, T^\mathsf {F}u_i\rangle \).

Definition 4.1

(Approximation). Let G and H be finite Abelian groups. An approximation of a function \(\mathsf {F}: G \rightarrow H\) is a pair (U, V) of subspaces \(U \subseteq \mathbb {C}\widehat{G}\) and \(V \subseteq \mathbb {C}\widehat{H}\). The approximation map of (U, V) is the linear transformation \(\langle V, U \rangle _{\mathsf {F}} : U \rightarrow V\) defined by \(\langle V, U \rangle _{\mathsf {F}} = \pi _V \, C^{\mathsf {F}{}} \, \iota _U\), with \(\iota _U : U \rightarrow \smash {\mathbb {C}\widehat{G}}\) the inclusion map and \(\pi _V : \smash {\mathbb {C}\widehat{H}} \rightarrow V\) the orthogonal projection on V.

Definition 4.1 refers to subspaces of \(\mathbb {C}\widehat{G}\) and \(\mathbb {C}\widehat{H}\). An equivalent definition could be given for the subspaces \(\mathcal {F}_G^*(U) \subseteq \mathbb {C}G\) and \(\mathcal {F}_H^*(V) \subseteq \mathbb {C}H\), taking into account that \(C^\mathsf {F}{}\) should be replaced by \(T^\mathsf {F}{}\). The same remark applies to all definitions in this section and Sect. 5.

Note that the notation \(\langle V, U \rangle _{\mathsf {F}}\) is intentionally similar to the ‘inner product of subspaces’ notation \(\langle V, U \rangle _{}\) from Sect. 2.3. It will be shown in Sect. 4.1 that the maps \(\langle V, U \rangle _{\mathsf {F}}\) and \(\langle V, C^\mathsf {F}{}U \rangle _{}\) are indeed closely related and encode the same geometric information.

Example 4.1

Consider a linear approximation for a function \(\mathsf {F}{} : \mathbb {F}_2^n \rightarrow \mathbb {F}_2^m\). As listed in Table 2, linear properties correspond to one-dimensional spaces \(U = \mathrm {span}\{\delta _{\chi _u}\}\) and \(V = \mathrm {span}\{\delta _{\chi _v}\}\) with masks \(u \in \mathbb {F}_2^n\) and \(v \in \mathbb {F}_2^m\). As in Example 2.3, one has the inclusion map \(\iota _U(x) = x\) and the orthogonal projection \(\pi _V(x) = \langle \delta _{\chi _v}, x \rangle \delta _{\chi _v}\). Hence, \(\langle V, U \rangle _{\mathsf {F}}\) is given by \(\lambda \delta _{\chi _u} \mapsto \langle \delta _{\chi _v}, C^\mathsf {F}{} \delta _{\chi _u}\rangle \lambda \delta _{\chi _v} = C_{\chi _v, \chi _u}^\mathsf {F}{}\lambda \delta _{\chi _v}\). The same result holds for any pair of finite Abelian groups.
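In coordinates, the approximation map is simply the correlation matrix compressed to the chosen subspaces: if the columns of \(B_U\) and \(B_V\) are orthonormal bases of U and V, then \(\langle V, U \rangle _{\mathsf {F}}\) is represented by \(B_V^* C^{\mathsf {F}} B_U\), and its singular values are the principal correlations introduced below. A sketch for the toy S-box used before, with arbitrarily chosen masks:

```python
# The approximation map <V, U>_F in coordinates: B_V^* C^F B_U for orthonormal
# basis matrices B_U, B_V. For one-dimensional U and V spanned by standard
# basis vectors this recovers Example 4.1.
import numpy as np

S = [3, 6, 5, 7, 0, 2, 1, 4]
G = len(S)
WH = np.array([[(-1.0) ** (bin(u & x).count("1") & 1) for x in range(G)]
               for u in range(G)])
T = np.zeros((G, G))
for x in range(G):
    T[S[x], x] = 1
C = WH @ T @ WH / G                          # correlation matrix (WH @ WH = G * I)

BU = np.eye(G)[:, [1]]                       # U = span{delta_{chi_1}}
BV = np.eye(G)[:, [3]]                       # V = span{delta_{chi_3}}
M = BV.T @ C @ BU                            # matrix of <V, U>_F
assert np.isclose(M[0, 0], C[3, 1])          # Example 4.1: the correlation itself

BV2 = np.eye(G)[:, [3, 5]]                   # a two-dimensional output property
print(np.linalg.svd(BV2.T @ C @ BU, compute_uv=False))   # principal correlation(s)
```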

The main purpose of this section is to show that Definition 4.1 indeed encompasses all variants of linear cryptanalysis mentioned in Sect. 1, and leads to new insights for several of them.

As illustrated in Fig. 1, two geometrically intuitive edge cases of Definition 4.1 can be identified: parallel or orthogonal spaces V and \(C^\mathsf {F}U\). Approximations in the former category will be called ‘perfect’. This includes the important case of invariants. The latter case corresponds to a broad generalization of zero-correlation linear approximations. In the remaining cases, the vector spaces V and \(C^\mathsf {F}U\) are neither completely parallel nor fully orthogonal. All three cases are discussed in detail in Sects. 4.1 to 4.3.

Fig. 1. Geometric interpretation of Definition 4.1.

The geometric intuitions illustrated in Fig. 1 can be quantified using the concept of principal angles that was introduced in Sect. 2.3. This leads to the following definition of ‘principal correlations’. For linear approximations, the unique principal correlation coincides with the ordinary absolute correlation. Further aspects of principal correlations, such as their relation to the ‘capacity’ in multiple linear cryptanalysis, are discussed in Sect. 4.3.

Definition 4.2

(Principal correlations). Let (U, V) be an approximation for a function \(\mathsf {F}: G \rightarrow H\) between finite Abelian groups G and H. Let \(d = \min \{\dim U, \dim V\}\). The principal correlations of the approximation (U, V) are the d largest singular values of the approximation map \(\langle V, U \rangle _{\mathsf {F}}\).

The geometric interpretation of the principal correlations is due to the following result, which relates them to the principal angles between the subspaces \(C^\mathsf {F}{} U\) and V.

Theorem 4.1

Let (U, V) be an approximation for a function \(\mathsf {F}: G \rightarrow H\) between finite Abelian groups G and H. Let \(d = \min \{\dim U, \dim V\}\). If \(\mathsf {F}{}\) is injective, then the principal correlations of the approximation (U, V) are equal to the cosines of the d smallest principal angles between the subspaces \(C^\mathsf {F}{} U\) and V.

Proof

By Theorem 3.2 (1), \(C^\mathsf {F}{}\) is a unitary matrix if \(\mathsf {F}\) is a permutation. More generally, \([C^{\mathsf {F}}]^* C^\mathsf {F}{}\) is a nonzero multiple of the identity map if \(\mathsf {F}\) is an injection. That is, \(C^\mathsf {F}{}\) preserves the inner product up to multiplication by a constant. To prove this, show that the result holds for \(T^\mathsf {F}{}\) (by direct calculation) and then apply the same argument as in the proof of Theorem 3.2 (1).

If \(C^{\mathsf {F}{}}\) preserves the inner product up to multiplication by a nonzero constant, then \(u_{i + 1} \perp u_i\) implies \(C^{\mathsf {F}{}} u_{i + 1} \perp C^{\mathsf {F}{}} u_i\). Hence, the result follows from the fact that the variational characterization of singular values is then equivalent to the definition of principal angles (Definition 2.3).    \(\square \)

4.1 Invariants and Perfect Approximations

If the subspaces \(C^\mathsf {F}{} U\) and V are aligned as in Fig. 1a, the approximation (U, V) will be called perfect. More formally, (U, V) is perfect if \(C^\mathsf {F}{} U \subseteq V\). Alternatively, an approximation over a permutation \(\mathsf {F}\) is perfect if its principal correlations are equal to one.

Integral and division properties are of this type, but these traditionally ‘algebraic’ properties are not the main focus of this work. However, the case \(U = V\) is of particular interest since it leads to a class of approximations that will be called invariants, and which includes the invariant subspaces of Leander et al. [30] and the nonlinear invariants of Todo et al. [39].

Definition 4.3

(Invariant). Let \(\mathsf {F}: G \rightarrow G\) be a function. An approximation (V, V) such that \(C^\mathsf {F}V \subseteq V\) will be called an invariant for \(\mathsf {F}\).

If \(\mathsf {F}{}\) is a permutation, all principal correlations of an invariant (V, V) are equal to one. For general functions, this is not necessarily true. For example, if two distinct input distributions result in the same output distribution, it is natural to consider the difference of their probability mass functions as an invariant.

Since transition matrices and correlation matrices of permutations have finite multiplicative order, they are diagonalizable. Consequently, by a standard linear algebra result for algebraically closed fields, any invariant V splits into one-dimensional invariant subspaces spanned by the eigenvectors of \(C^\mathsf {F}{}\). Hence, Definition 4.3 reduces to the characterization of invariants introduced in [5, Definition 2].

Despite the fact that the eigenvectors of \(C^\mathsf {F}\) determine all possible invariants, the more general characterization of invariants in Definition 4.3 sometimes leads to additional insight. This will be illustrated using the following example, which involves eigenvectors whose corresponding eigenvalue is imaginary – thereby addressing a problem left as future work by [5].

Example 4.2

Consider the following 4-bit S-box, defined in cycle notation:

$$\begin{aligned} \mathsf {S} = (\mathtt{0\ 7\ b\ 3\ \ d\ 5\ 9\ 6\ \ 8\ 2\ 1\ e\ \ a\ f\ c\ 4}). \end{aligned}$$

Further details about this S-box, including a lookup table representation, are given as supplementary material in the extended version [7]. From a cryptanalytic perspective, the properties of \(\mathsf S\) are seemingly excellent: the linear and differential properties are optimal, and it does not have any fixed points since it is a cyclic permutation. The last property implies that all eigenspaces of \(C^{\mathsf S}\) are one-dimensional, see for instance [5, §4.2]. An immediate consequence of this is that \(\mathsf S\) does not have any nontrivial invariant subspaces.

Denote the ring of integers modulo four by \(\mathbb {Z}_4\) and let \(f : \mathbb {F}_2^4 \rightarrow \mathbb {Z}_4\) be the function defined by \(f(\{\mathtt{0, d, 8, a}\}) = 0\), \(f(\{\mathtt{b, 9, 1, c}\}) = 2\), \(f(\{\mathtt{7, 5, 2, f}\}) = 1\) and \(f(\{\mathtt{3, 6, e, 4}\}) = 3\). By inspection of the cycle structure of \(\mathsf S\), one can see that \(f(\mathsf S(x)) = f(x) + 1\) for all \(x \in \mathbb {F}_2^4\). This property is reminiscent of nonlinear invariants, and in fact yields a nonlinear invariant for \(\mathsf S\) when reduced modulo two. Nevertheless, the property is more powerful than a nonlinear invariant since its defining function takes values in \(\mathbb {Z}_4\) rather than \(\mathbb {F}_2\). In fact, the use of \(\mathbb {Z}_4\)-approximations has been previously suggested by Parker and Raddum [33]. Properties such as f are to nonlinear invariants as nonlinear invariants are to invariant sets: just as a nonlinear invariant can be interpreted as a pair of sets that are potentially swapped by \(\mathsf S\), f can be interpreted as a pair of nonlinear invariants that are swapped by \(\mathsf S\).
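Since the cycle notation above determines the lookup table of \(\mathsf S\), the property \(f(\mathsf S(x)) = f(x) + 1\) can be checked mechanically. In the sketch below, the table is read off from the 16-cycle; the paper's own table is given in the extended version [7].

```python
# Check of the Z_4-valued property in Example 4.2. The lookup table is derived
# from the cycle notation (0 7 b 3 d 5 9 6 8 2 1 e a f c 4) given above, and
# f is the class function defined in the text.
S = [0x7, 0xe, 0x1, 0xd, 0x0, 0x9, 0x8, 0xb,
     0x2, 0x6, 0xf, 0x3, 0x4, 0x5, 0xa, 0xc]

f = {x: 0 for x in (0x0, 0xd, 0x8, 0xa)}
f.update({x: 1 for x in (0x7, 0x5, 0x2, 0xf)})
f.update({x: 2 for x in (0xb, 0x9, 0x1, 0xc)})
f.update({x: 3 for x in (0x3, 0x6, 0xe, 0x4)})

assert all(f[S[x]] == (f[x] + 1) % 4 for x in range(16))   # f ∘ S = f + 1 mod 4
assert all((f[S[x]] - f[x]) % 2 == 1 for x in range(16))   # mod 2: a nonlinear invariant
```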

To obtain a subspace V of \(\mathbb {C}\widehat{\mathbb {F}}^4_2\) from f, the pullback construction from Sect. 3.1 can be applied. Since \(\mathbb {Z}_4\) is cyclic of order four, one can deduce from Theorem 2.1 that \(\widehat{\mathbb {Z}}_4 = \{x \mapsto \zeta _4^{kx}~|~k \in \mathbb {Z}_4\}\) with \(\zeta _4\) a primitive fourth root of unity such as \(\sqrt{-1}\). Hence, using the basis of functions \(\smash {\widehat{\chi \circ f}}\) with \(\chi \in \widehat{\mathbb {Z}}_4\) yields

$$\begin{aligned} V &= \mathrm {span}\big \{\widehat{\zeta _4^{0}},\, \widehat{\zeta _4^{f}},\, \widehat{\zeta _4^{2f}},\, \widehat{\zeta _4^{3f}}\big \} \\ &= \mathrm {span}\big \{(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)^\top ,\\ &\qquad \qquad (0, \overline{\zeta _8}, 0, 0, 0, 2\zeta _8, 0, \overline{\zeta _8}, 0, 0, 0, \overline{\zeta _8}, 0, -\overline{\zeta _8}, 0, 0)^\top /\sqrt{8},\\ &\qquad \qquad (0, 0, 1, 0, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 1, 0)^\top /2,\\ &\qquad \qquad (0, \zeta _8, 0, 0, 0, 2\overline{\zeta _8}, 0, \zeta _8, 0, 0, 0, \zeta _8, 0, -\zeta _8, 0, 0)^\top /\sqrt{8}\big \}. \end{aligned}$$

The choice of \(\smash {\widehat{\chi \circ f}}\) (up to a scalar multiple) as a basis is not arbitrary: since \(\chi (f(\mathsf S(x))) = \chi (1)\chi (f(x))\), it ensures that each basis vector is an eigenvector of \(C^\mathsf S\). Consequently, it is immediately clear that V is indeed an invariant. Note that the first vector listed above is the trivial eigenvector with eigenvalue one. The second and fourth vectors are complex-conjugate eigenvectors corresponding to the conjugate eigenvalues \(\zeta _4\) and \(\overline{\zeta _4}\). Finally, the third vector is an eigenvector with eigenvalue \(\zeta _4^2 = -1\). It corresponds to the nonlinear invariant obtained by reduction modulo two that was mentioned above.

For the purpose of obtaining an interesting example, the S-box \(\mathsf S\) was carefully chosen. In particular, by taking appropriate linear combinations of the two complex-conjugate eigenvectors above, one can see that V is spanned by four real vectors \(v_1, \ldots , v_4\) such that \(v_1^{\otimes 16}, \ldots , v_4^{\otimes 16}\) are all eigenvectors of \(C^{\mathsf L}\), where \(\mathsf L\) is the linear layer of Midori-64. Furthermore, these vectors are invariant under the round-constant and key-additions for \(2^{32}\) weak keys. In fact, \(v_3^{\otimes 16}\) is itself a nonlinear invariant for the same number of weak keys, but it has been shown that there exists a stronger four-dimensional invariant.

Moreover, there is a larger set of \(2^{96}\) weak keys for which \(v_1^{\otimes 16}\) and \(v_2^{\otimes 16}\) are still invariants for the whole cipher. This is due to the fact that Midori-64 alternates round keys, and because \(C^{\mathsf S} v_2 = -v_4\) and \(C^{\mathsf S} v_4 = v_2\). However, neither \(v_2\) nor \(v_4\) corresponds to a nonlinear invariant for \(\mathsf S\). One can think of the invariant obtained here as a ‘remnant’ of the stronger – yet valid for fewer keys – invariant described above. The supplementary material of the extended version [7] contains additional details regarding the preceding claims.    \(\vartriangleright \)

In general, a one-dimensional perfect approximation that repeats with period \(l\), for some positive integer l, must be an eigenvector of \([C^\mathsf {F}]^l\) with eigenvalue one. These eigenvectors are linear combinations of the eigenvectors of \(C^\mathsf {F}\) with eigenvalues of order dividing l.

4.2 Zero-Correlation Approximations

Zero-correlation linear approximations were introduced by Bogdanov and Rijmen [13]. They correspond to linear approximations \((\mathrm {span}\{\delta _\psi \}, \mathrm {span}\{\delta _\chi \})\) such that \(C^\mathsf {F}_{\chi , \psi } = 0\). That is, \(\delta _\chi \) is orthogonal to \(C^\mathsf {F}\delta _\psi \). This corresponds to the geometric situation sketched in Fig. 1b, motivating the following definition.

Definition 4.4

(Zero-correlation approximation). Let \(\mathsf {F}: G \rightarrow H\) be a function. An approximation (U, V) such that \(V \perp C^\mathsf {F}U\) will be called a zero-correlation approximation for \(\mathsf {F}\). Equivalently, all principal correlations of a zero-correlation approximation (U, V) are zero.

Zero-correlation and perfect approximations are closely related, despite being opposite extremes. In fact, this is clear from a geometrical point of view, see for instance Figs. 1a and 1b.

Theorem 4.2

If (U, V) is a zero-correlation approximation, then \((U, V^\perp )\) is a perfect approximation and conversely.

Proof

Since (UV) is a zero-correlation approximation, any \(v \in C^\mathsf {F}{} U\) is orthogonal to V. Hence, \(C^\mathsf {F}{} U \subseteq V^\perp \). The proof of the converse result is analogous.

   \(\square \)

The statement and proof of Theorem 4.2 are deceptively simple, but the result is powerful. Indeed, it generalizes the well-known correspondence between multidimensional linear zero-correlation approximations and integral properties, first noted by Bogdanov et al. at ASIACRYPT 2012 [12] and discussed further by Sun et al. [35].

Definition 4.4 leads to a useful generalization of the miss-in-the-middle approach that is commonly used to find zero-correlation linear approximations. Suppose \(\mathsf {F}= \mathsf {F}_2 \circ \mathsf {F}_1\). Let \((U_1, V_1)\) and \((U_2, V_2)\) be approximations such that

$$\begin{aligned} C^{\mathsf {F}_1} U_1 \subseteq V_1 \perp V_2 \supseteq [C^{\mathsf {F}_2}]^* U_2. \end{aligned}$$

It then follows that \((U_1, U_2)\) is a zero-correlation approximation for \(\mathsf {F}_2 \circ \mathsf {F}_1\). Recall from Theorem 3.2 (1) that if \(\mathsf {F}_2\) is invertible, then \([C^{\mathsf {F}_2}]^* = C^{\mathsf {F}_2^{-1}}\).

Example 4.3

The key-recovery attacks on Midori-64 and MANTIS from ASIACRYPT 2018 [5] are based on a one-dimensional nonlinear zero-correlation approximation, and this property was obtained by connecting an ordinary integral property with a nonlinear invariant using the miss-in-the-middle approach discussed above. For completeness, a fully worked out version of this approximation is provided as supplementary material in the extended version [7].

The zero-correlation approximation in Example 4.3 can still be explained by mismatching activity patterns in the middle. The benefit of the geometric approach here is mainly that it clarifies that the combination of integral properties with invariants is a natural example of a more general principle, rather than just a ‘trick’. However, in some cases, a more refined and possibly key-dependent analysis is necessary to establish the orthogonality of the subspaces \(V_1\) and \(V_2\). Such an example will be encountered in Sect. 7.3.

4.3 General Approximations

It follows from Example 4.1 that the unique principal correlation for an ordinary linear approximation equals the absolute value of the (conventional) correlation of the linear approximation. For a fixed advantage, the data-complexity of a linear distinguisher is inversely proportional to the square of the correlation.

More generally, Baignères et al. [2] discuss the optimal data-complexity of distinguishers for a permutation \(\mathsf {F}{} : G_1 \rightarrow G_2\) based on balanced projections \(\mathsf P_1 : G_1 \rightarrow H_1\) and \(\mathsf P_2 : G_2 \rightarrow H_2\). As discussed in Sect. 3.1, these projections correspond to subspaces \(U = \mathrm {span}\{\delta _x \circ \mathsf P_1~|~x \in H_1\} \subseteq \mathbb {C}G_1\) and \(V = \mathrm {span}\{\delta _x \circ \mathsf P_2~|~x \in H_2\} \subseteq \mathbb {C}G_2\) by the pullback construction. The approximation map \(\langle V, U \rangle _{\mathsf {F}}\) can be represented by a matrix M with coordinates

$$\begin{aligned} M_{y, x}&= \frac{\langle \delta _y \circ \mathsf P_2, T^\mathsf {F}[\delta _x \circ \mathsf P_1] \rangle }{\Vert \delta _y \circ \mathsf P_2\Vert _2\,\Vert \delta _x \circ \mathsf P_1\Vert _2} \\&= \sqrt{\frac{|G_1|}{|G_2|}\frac{\Pr {[\mathsf P_1(\boldsymbol{z}_{1}) = x]}}{\Pr {[\mathsf P_{2}(\boldsymbol{z}_{2}) = y]}}} \;\Pr {[\mathsf P_{2}(\mathsf {F}(\boldsymbol{z}_1)) = y\,|\,\mathsf P_1(\boldsymbol{z}_1) = x]}, \end{aligned}$$

where \(\boldsymbol{z}_{1}\) is uniform random on \(G_1\) and \(\boldsymbol{z}_{2}\) is uniform random on \(G_{2}\). Since the approximations considered by Baignères et al. are balanced, \(\Pr {[\mathsf P_1(\boldsymbol{z}_{1}) = x]} = |H_1| / |G_1|\) and \(\Pr {[\mathsf P_2(\boldsymbol{z}_{2}) = y]} = |H_2| / |G_2|\), so the prefactor simplifies to \(\sqrt{|H_1|} /\sqrt{|H_2|}\). Recall that the Frobenius norm \(\Vert \cdot \Vert _F\) of a linear operator is the square root of the sum of its squared singular values. Equivalently, its square equals the sum of all squared coordinates of an arbitrary matrix representation with respect to an orthonormal basis. It follows that the Frobenius norm of \(\langle V, U \rangle _{\mathsf {F}}\) is given by

$$\begin{aligned} \Vert \langle V, U \rangle _{\mathsf {F}}\Vert _F^2 = \frac{|H_1|}{|H_2|} \sum _{\begin{array}{c} x \in H_1\\ y \in H_2 \end{array}} \Pr {[\mathsf P_{2}(\mathsf {F}(\boldsymbol{z}_1)) = y\,|\,\mathsf P_1(\boldsymbol{z}_1) = x]}^2. \end{aligned}$$

In particular, \(\Vert \langle V, U \rangle _{\mathsf {F}}\Vert _F^2 - 1\) is equal to the squared Euclidean imbalance as defined by Baignères et al. [2, Definition 7]. The term \(-1\) is due to the trivial invariant corresponding to the uniform distribution. If this invariant is excluded, one obtains that the data-complexity of an optimal distinguisher is inversely proportional to the sum of the squared principal correlations. This generalizes to multiple linear distinguishers (which are not necessarily of projection type), in which case the squared Frobenius norm corresponds to the fixed-key capacity.
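For illustration, the squared Euclidean imbalance of a projection-type approximation can be computed directly from the conditional transition probabilities. In the sketch below, the permutation and the balanced projection are arbitrary toy choices; since both projections take values in \(\mathbb {F}_2^2\), the prefactor \(\sqrt{|H_1|}/\sqrt{|H_2|}\) equals one:

```python
import numpy as np

rng = np.random.default_rng(5)
F = rng.permutation(16)        # toy permutation on F_2^4
P = lambda x: x >> 2           # balanced projection onto the top two bits

# M[y, x] = Pr[P(F(z)) = y | P(z) = x], accumulated over the four
# inputs x' with P(x') = x
M = np.zeros((4, 4))
for x in range(16):
    M[P(F[x]), P(x)] += 1 / 4
print(np.sum(M**2) - 1)        # squared Frobenius norm minus one
```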

5 Trails

Most cryptographic primitives \(\mathsf {F}\) do not allow for a direct computation of the approximation map \(\langle V, U \rangle _{\mathsf {F}}\), even when U and V are low-dimensional. Indeed, if \(\mathsf {F}\) is devoid of structure, one is forced to estimate the approximation map empirically. Consequently, finding good approximations of the general type discussed in Sect. 4.3 is nontrivial.

However, cryptographic primitives are often a composition of highly structured round functions. That is, \(\mathsf {F}= \mathsf {F}_r \circ \mathsf {F}_{r - 1} \circ \cdots \circ \mathsf {F}_1\). By exploiting the structure of the functions \(\mathsf {F}_i\), one can often find approximations \((V_i, V_{i+1})\) such that \(\langle V_{i+1}, V_i \rangle _{\mathsf {F}_i}\) can be efficiently computed. This is for instance the case for linear cryptanalysis, and Sect. 6 will introduce rank-one approximations as another example for cell-oriented ciphers. The remaining task is to combine or ‘pile-up’ the individual approximations \((V_{i}, V_{i+1})\) for \(\mathsf {F}_i\) in order to obtain an approximation \((V_1, V_{r + 1})\) for \(\mathsf {F}\). The purpose of the piling-up principle, which will be discussed in Sect. 5.1, is to obtain an estimate of the approximation map \(\langle V_{r + 1}, V_1 \rangle _{\mathsf {F}}\).

Definition 5.1

(Trail). Let \(G_1, G_2, \ldots , G_{r + 1}\) be finite Abelian groups. A trail of vector spaces for a function \(\mathsf {F}= \mathsf {F}_r \circ \cdots \circ \mathsf {F}_1\) with \(\mathsf {F}_i : G_i \rightarrow G_{i + 1}\) is a tuple \((V_1, V_2, \ldots , V_{r + 1})\) of subspaces \(V_1 \subseteq \mathbb {C}\widehat{G}_1, \ldots , V_{r + 1} \subseteq \mathbb {C}\widehat{G}_{r + 1}\).

Similarly to ordinary linear trails, Definition 5.1 defines a sequence of compatible intermediate approximations. In particular, if all vector spaces \(V_i\) are spanned by a standard basis vector \(\delta _{\chi _i} \in \mathbb {C}\widehat{G}_i\), one obtains ordinary linear trails as defined by Matsui [31] and generalized to other groups by Baignères et al. [3]. Note that the compatibility requirement does not exclude taking one or more of the functions \(\mathsf {F}_i\) as the identity map.

5.1 Piling-Up Principle

As discussed in Sect. 1, methods for piling-up the approximations within a trail are often motivated by Markov chain assumptions, or a dominant trail hypothesis. Unfortunately, when the former assumption fails, it is often hard to understand why or how to resolve the problem. The latter approach has been mostly limited to the case of simple linear cryptanalysis.

Theorem 5.1 below provides an alternative motivation for the piling-up principle. The premise is that each approximation in a trail corresponds to a transformation of its input space, followed by an orthogonal projection on the input space of the next approximation. Each of these successive projections introduces an error, but orthogonal projection is optimal in the sense that it keeps the inner product between the state and its approximation maximal and the norm of the error minimal (see Sect. 2.3).

Theorem 5.1

(Piling-up principle). Let \((V_1, V_2, \ldots , V_{r+1})\) be a trail for a function \(\mathsf {F}= \mathsf {F}_r \circ \cdots \circ \mathsf {F}_1\). The approximation map of the approximation \((V_1, V_{r + 1})\) for \(\mathsf {F}\) can be written as

$$\begin{aligned} \langle V_{r + 1}, V_1 \rangle _{\mathsf {F}} = \langle V_{r+1}, V_r \rangle _{\mathsf {F}_r} \cdots \langle V_{2}, V_1 \rangle _{\mathsf {F}_1} + E\,, \end{aligned}$$

where the error term E is the transformation given by

$$\begin{aligned} E = \sum _{i = 1}^{r - 1} \langle V_{r + 1}, V_{i+1} \rangle _{\mathsf {F}_r \circ \cdots \circ \mathsf {F}_{i + 1}} \langle V_{i + 1}^\perp , V_i \rangle _{\mathsf {F}_{i}} \cdots \langle V_2, V_1 \rangle _{\mathsf {F}_1}\,. \end{aligned}$$

Proof

The proof follows the above intuition of successive orthogonal projection, but keeps track of the error term. Recall from Definition 4.1 that \(\langle V, U \rangle _{\mathsf {F}} = \pi _V C^\mathsf {F}{} \iota _U\) where \(\pi _V\) is the orthogonal projector on V and \(\iota _U\) the inclusion map. Since \(\pi _V + \pi _{V^\perp }\) is equal to the identity map, one has the following decomposition for \(i = 1, \ldots , r - 1\):

$$\begin{aligned}&\langle V_{r + 1}, V_{i} \rangle _{\mathsf {F}_r \circ \cdots \circ \mathsf {F}_i}\\ =\;&\pi _{V_{r + 1}} C^{\mathsf {F}_r \circ \cdots \circ \mathsf {F}_{i + 1}} (\pi _{V_{i + 1}} + \pi _{V_{i + 1}^\perp }) C^{\mathsf {F}_i} \iota _{V_i}\\ =\;&\langle V_{r + 1}, V_{i + 1} \rangle _{\mathsf {F}_r \circ \cdots \circ \mathsf {F}_{i+1}} \langle V_{i + 1}, V_i \rangle _{\mathsf {F}_i} + \langle V_{r + 1}, V_{i+1} \rangle _{\mathsf {F}_r \circ \cdots \circ \mathsf {F}_{i + 1}} \langle V_{i + 1}^\perp , V_i \rangle _{\mathsf {F}_i}. \end{aligned}$$

The result follows by successively decomposing the factor \(\langle V_{r + 1}, V_{i + 1} \rangle _{\mathsf {F}_r \circ \cdots \circ \mathsf {F}_{i+1}}\) using the same expression.    \(\square \)

Theorem 5.1 generalizes the piling-up principle as used in many variants of linear cryptanalysis. This will be demonstrated in Sect. 5.2. Furthermore, allowing arbitrary subspaces \(V_i\) increases flexibility. Even if the spaces \(V_1\) and \(V_{r+1}\) correspond to a specific type of property, the intermediate vector spaces can represent seemingly unrelated properties. This will be illustrated in Theorem 5.3, and again in Sect. 6. In addition, since the formulation of Theorem 5.1 is basis-free, the choice of basis for these spaces can be arbitrary. This may have computational benefits.

5.2 Discussion of Theorem 5.1

In the one-dimensional case with \(V_i\) spanned by \(\delta _{\chi _i}\), Theorem 5.1 reduces to

$$\begin{aligned} C^\mathsf {F}_{\chi _{r + 1}, \chi _1} = \textstyle \prod _{i = 1}^r C^{\mathsf {F}_i}_{\chi _{i + 1}, \chi _i} + e, \end{aligned}$$

where the error term e can be written as a sum over all other linear trails. This is the fixed-key piling-up principle as stated in [16, §6.1] for \(\mathbb {F}_2^n\). It also implies the piling-up lemma as stated by Matsui [31] and generalized by Baignères et al. [3] to other groups (after taking the variance with respect to independent round keys). The composition result of Beierle et al. [4, Theorem 3] for one-dimensional nonlinear approximations is another special case.
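The one-dimensional case is easy to verify numerically. In the sketch below, the round functions are random toy 4-bit permutations and the masks \(\chi _1, \chi _2, \chi _3\) are arbitrary; the error term e is recovered as a sum over all other linear trails:

```python
import numpy as np

rng = np.random.default_rng(2)

def corr_matrix(perm, n):
    H = np.array([[(-1)**bin(u & x).count('1') for x in range(2**n)]
                  for u in range(2**n)])
    T = np.zeros((2**n, 2**n))
    for x in range(2**n):
        T[perm[x], x] = 1
    return H @ T @ H / 2**n

C1 = corr_matrix(rng.permutation(16), 4)    # toy round functions
C2 = corr_matrix(rng.permutation(16), 4)
chi1, chi2, chi3 = 1, 2, 3                  # arbitrary mask choices
exact = (C2 @ C1)[chi3, chi1]               # correlation over F_2 . F_1
piled = C2[chi3, chi2] * C1[chi2, chi1]     # single-trail estimate
e = sum(C2[chi3, w] * C1[w, chi1] for w in range(16) if w != chi2)
assert np.isclose(exact, piled + e)         # fixed-key piling-up principle
```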

A few examples of the higher-dimensional case can be found in the literature. Consider the case where all spaces \(V_i\) are pullbacks of \(\mathbb {C}H_i\) along balanced projection functions \(\mathsf {P}_i : G_i \rightarrow H_i\), as in Baignères et al. [2] and Wagner [42]. Like all results in this paper, Theorem 5.1 is basis-free and also applies to the spaces \(U_i = \mathcal {F}^{-1}(V_i) \subseteq \mathbb {C}G\) provided that one replaces \(C^{\mathsf {F}_i}\) by \(T^{\mathsf {F}_i}\). As shown in Sect. 4.3, relative to the bases \(\{\delta _x \circ \mathsf {P}_i / \Vert \delta _x \circ \mathsf {P}_i\Vert _2~|~x \in H_i\}\) for \(U_i\), the map \(\langle U_{i + 1}, U_i \rangle _{\mathsf {F}_i}\) can be represented by a matrix M with coordinates

$$\begin{aligned} M_{y, x} = \sqrt{\frac{|H_{i}|}{|H_{i + 1}|}} \;\Pr {[\mathsf P_{i + 1}(\mathsf {F}_i(\boldsymbol{z})) = y\,|\,\mathsf P_i(\boldsymbol{z}) = x]}, \end{aligned}$$

where \(\boldsymbol{z}\) is uniform random on \(G_i\). That is, there exist diagonal matrices \(D_i\) and \(D_{i + 1}\) such that \(D_{i + 1} M D_{i}^{-1}\) is the transition matrix considered in [2, 42]. These works follow the Markov chain assumption, which leads to using the product of round transition matrices as an approximation for the true transition matrix. The factors \(D_i\) and \(D_{i + 1}\) indeed cancel out, so that Theorem 5.1 yields the same result up to initial and final multiplication by \(D_{r + 1}\) and \(D_1^{-1}\) respectively.

In the case of multiple linear cryptanalysis [9, 25], it is common practice to combine many individual linear trails by adding their correlations. Alternatively, the squared correlations are added in order to estimate the variance of the correlation under the assumption of independent round keys. However, in general, strong approximations can often be found by taking into account the correlations between all pairs of approximations. Theorem 5.1 reflects this because, for multiple linear approximations, the coordinate representation of \(\langle V_{i + 1}, V_i \rangle _{\mathsf {F}_i}\) in the standard basis is a submatrix of the correlation matrix \(C^{\mathsf {F}_i}\). This approach has been (sometimes implicitly) used in several works, notably in analyses of Present [14], Puffin [29] and Spongent [11]. Note that this is often combined with key-averaging, but a careful analysis of the key-dependency would be both feasible and preferable in many cases.

5.3 Clustering and Linear Approximations from Invariants

A minimal condition for the applicability of the piling-up approximation is that one chooses the best trail from within a predetermined class of candidates, where the principal correlations can be used as a measure of quality. Indeed, by decomposing the error term in Theorem 5.1, one can see that it can be large if other trails result in better or comparable approximations.

However, it is also possible that the class of candidate trails is too limited to obtain a good estimate for \(\langle V_{r+1}, V_1 \rangle _{\mathsf {F}}\). In the context of linear cryptanalysis, this phenomenon has been called clustering by Daemen and Rijmen [18]. In some cases, clustering can be explained by broadening the set of candidate trails. At ASIACRYPT 2018, an example of a perfect linear approximation over full Midori-64 (with modified round constants) was presented [5]. However, full-round Midori-64 does not admit any high-correlation linear trails. This observation can be thought of as an extreme case of a more general phenomenon. At CRYPTO 2012, Abdelraheem et al. [1] showed that invariant subspaces give rise to linear approximations with higher-than-expected correlation. The same observation was later generalized to plateaued nonlinear invariants by Beierle et al. [4]. Plateaued nonlinear invariants are characterized by a flat Walsh-Hadamard transform, which takes only the values zero and \(\pm c\) for some constant c. The results of Beierle et al. [4] can be summarized and generalized as follows.

Theorem 5.2

Let \(\mathsf {F}: G \rightarrow G\) be a function on a finite Abelian group G. Let \(v \in \mathbb {C}\widehat{G}\) be any function such that \(|v(\chi )| = 1/\sqrt{|\mathrm {supp}\;v|}\) for all \(\chi \in \mathrm {supp}\;v\) and zero elsewhere. If \(\mathrm {span}\{v\}\) is an invariant of \(\mathsf {F}\) in the sense of Definition 4.3, then there exist characters \(\chi , \psi \in \mathrm {supp}\;v\) such that \(|C^{\mathsf {F}}_{\chi , \psi }| \ge 1 / |\mathrm {supp}\;v|\).

Proof

By Definition 4.3, it holds that (the sum is over \(\chi , \psi \in \mathrm {supp}\;v\))

$$\begin{aligned} 1 = |\langle v, C^\mathsf {F}v\rangle |&= \Big |\sum _{\chi , \psi } \overline{v(\chi )} v(\psi ) C^{\mathsf {F}}_{\chi , \psi }\Big | \le |\mathrm {supp}\;v|\max _{\chi , \psi } |C^{\mathsf {F}}_{\chi , \psi }|. \end{aligned}$$

It follows that \(|C^{\mathsf {F}}_{\chi , \psi }| \ge 1 / |\mathrm {supp}\;v|\) for at least one pair \((\chi , \psi )\).    \(\square \)

Note that the same result is spread over two theorems in previous work [4, Theorems 4 and 5]: one for invariant subspaces, and one for plateaued nonlinear invariants. This illustrates the convenience of the general definitions in Sect. 4. To apply the results to the case of invariant subspaces, one only needs to know that the Fourier transformation of the indicator function of a subgroup \(H \subseteq G\) is flat with support size |G|/|H|. This follows from the Poisson summation formula [37, Theorem 1]. See also the first entry of Table 2 for \(G = \mathbb {F}_2^n\).

Theorem 5.2 and the results above illustrate that a strong approximation using one kind of property tends to result in unexpectedly good approximations using other properties. This can be understood using Theorem 5.1. For example, let \(\mathrm {span}\{v\}\) with \(\Vert v\Vert _2 = 1\) be any one-dimensional invariant for \(C^\mathsf {F}{}\). Consider an ordinary linear approximation, i.e. a pair \((\mathrm {span}\{\delta _\psi \}, \mathrm {span}\{\delta _\chi \})\) where \(\psi , \chi \) are characters. Assuming \(\delta _{\psi } \not \perp v\) and \(\delta _{\chi } \not \perp v\), the correlation of the linear approximation over \(\mathsf {F}{}\) can be estimated using the following trail:

$$\begin{aligned} \delta _\psi \xrightarrow [\langle v, \delta _\psi \rangle _{}]{I} v \xrightarrow [1]{C^{\mathsf {F}{}}} v \xrightarrow [\langle \delta _\chi , v \rangle _{}]{I} \delta _\chi . \end{aligned}$$

Theorem 5.1 yields the estimate \(|\langle v, \delta _\psi \rangle \langle \delta _\chi , v \rangle | = |v(\psi ) v(\chi )|\) for the absolute correlation. If v is flat as in Theorem 5.2, then the piling-up approximation suggests that all approximations with \(\psi , \chi \in \mathrm {supp}\;v\) will have a correlation of roughly \(1 / |\mathrm {supp}\;v|\). This resolves a problem of Beierle et al., who note that “our arguments are non-constructive and therefore, we are not able to identify those highly-biased linear approximations” [4, §1]. In fact, it is easy to identify the highly-biased approximations in practice: generically, any approximation with \(\psi , \chi \in \mathrm {supp}\;v\) will do.
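The following sketch makes this concrete for an invariant subspace: the Fourier transform of the indicator of a (toy) subgroup is flat, and every pair of masks in its support attains the piling-up estimate \(1/|\mathrm {supp}\;v|\):

```python
import numpy as np

n = 4
H = np.array([[(-1)**bin(u & x).count('1') for x in range(2**n)]
              for u in range(2**n)])
# indicator of the toy subgroup H = {x : top two bits zero} of F_2^4
ind = (np.arange(16) < 4).astype(float)
v = H @ ind / 16                   # Fourier transform of the indicator
supp = np.flatnonzero(~np.isclose(v, 0))
print(supp, np.abs(v[supp]))       # flat; support size |G|/|H| = 4
v = v / np.linalg.norm(v)          # normalize as in Theorem 5.2
for psi in supp:                   # every mask pair in the support is
    for chi in supp:               # predicted to be highly biased
        assert np.isclose(abs(v[psi] * v[chi]), 1 / len(supp))
```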

6 Rank-One Approximations

It is often convenient to represent the domain of a cipher as an array of m cells of n-bit vectors, because most of the operations in the cipher act on the cells in an independent way. In fact, in ciphers such as the AES, only the linear layer results in diffusion between cells. That is, let \(G = (\mathbb {F}_2^n)^m\). Recall from Sect. 2 that \(\mathbb {C}(\mathbb {F}_2^n)^m \cong [\mathbb {C}\mathbb {F}_2^n]^{\otimes m}\) and similarly for the dual group. For example, the probability distribution of a state with independent cells having distributions \(p_1, \ldots , p_m\), is represented by the rank-one tensor \(p_1 \otimes \cdots \otimes p_m \in [\mathbb {C}\mathbb {F}_2^n]^{\otimes m}\) (see Sect. 2.1 for definitions).

A rank-one approximation (U, V) is any approximation such that U and V are spanned by a rank-one tensor. No further conditions are imposed on U and V. An important class of rank-one approximations is obtained from balanced Boolean functions \(f : (\mathbb {F}_2^n)^m \rightarrow \mathbb {F}_2\) such that \(f(x_1, \ldots , x_m) = \sum _{i = 1}^m f_i(x_i)\). As shown in Table 2, the corresponding vector space for such a property is spanned by the function \(\smash {(-1)^f = \bigotimes _{i = 1}^m\,(-1)^{f_i}}\). Equivalently, the Fourier transformation of the corresponding vector space is spanned by

$$\begin{aligned} \mathcal {F}[(-1)^f] = \textstyle \bigotimes _{i = 1}^m\,\mathcal {F}[(-1)^{f_i}], \end{aligned}$$

where \(\mathcal {F}[(-1)^{f_i}]\) is precisely the Walsh-Hadamard transform of \(f_i\). The invariants discussed in [5] and the nonlinear approximations considered by Beierle et al. [4] are of this type.
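This factorization is an instance of the mixed-product property of tensor products, as the following check with arbitrary toy cell functions \(f_1, f_2\) on \(\mathbb {F}_2^2\) confirms:

```python
import numpy as np

n = 2
H = np.array([[(-1)**bin(u & x).count('1') for x in range(2**n)]
              for u in range(2**n)])
f1 = np.array([0, 1, 1, 0])           # toy cell functions f_i
f2 = np.array([0, 0, 1, 1])
s1, s2 = (-1.0)**f1, (-1.0)**f2       # the functions (-1)^{f_i}
# Fourier transform of (-1)^f for f(x_1, x_2) = f_1(x_1) + f_2(x_2):
lhs = np.kron(H, H) @ np.kron(s1, s2) / 2**(2 * n)
rhs = np.kron(H @ s1 / 2**n, H @ s2 / 2**n)
assert np.allclose(lhs, rhs)          # rank one, as claimed
```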

6.1 Theoretical Analysis of Rank-One Trails

By Theorem 3.1 (2), the correlation matrix of a layer of m identical S-boxes \(\mathsf S\) is equal to \((C^{\mathsf S})^{\otimes m}\). Indeed, correlation matrices are themselves tensors and the tensor rank (not to be confused with matrix rank) of \((C^{\mathsf S})^{\otimes m}\) is one. This expresses the fact that the S-box layer preserves independence of cells. A similar result holds for the key-addition step. Whereas the S-box layer preserves the rank-one structure of approximations, the linear layer tends to increase the rank. In fact, it is reasonable to interpret the rank as a measure of diffusion between the state cells. The correlation matrix of any function \(\mathsf {F}: (\mathbb {F}_2^n)^m \rightarrow (\mathbb {F}_2^n)^m\) is itself a tensor and can be decomposed as \(C^{\mathsf {F}} = \smash {\sum _{i = 1}^r \lambda _i\,{\textstyle \bigotimes _{j = 1}^m C_{i, j}}}\), where \(C_{i, j}\) are \(2^n \times 2^n\) matrices and r is the tensor rank of \(C^{\mathsf {F}}\).

Lemma 6.1

Let \(\mathsf {F}: (\mathbb {F}_2^n)^m \rightarrow (\mathbb {F}_2^n)^m\) be a function such that \(\mathsf {F}= (\mathsf G, \mathsf G, \ldots , \mathsf G)\) for some \(\mathsf G : \mathbb {F}_2^n \rightarrow \mathbb {F}_2^n\). If \(C^{\mathsf G} = \sum _{i = 1}^r \lambda _i\,\bigotimes _{j = 1}^n C_{i, j}\), then

$$\begin{aligned} C^{\mathsf F} = \sum _{i_1, \ldots , i_m \in [r]} (\textstyle \prod _{k = 1}^m \lambda _{i_k}) \bigotimes _{k = 1}^m \bigotimes _{j = 1}^n C_{i_k, j}, \end{aligned}$$

where \([r] = \{1, \ldots , r\}\). In particular, the tensor rank of \(C^{\mathsf F}\) is at most \(r^m\).

Proof

By Theorem 3.1 (2), it holds that \(C^{\mathsf {F}} = (C^{\mathsf G})^{\otimes m}\). The result follows by expanding this expression using the multilinearity of tensor products.    \(\square \)

Lemma 6.1 can be used to obtain a decomposition of the correlation matrix of the MixColumn map of Midori-64 and MANTIS into \(2^8\) rank-one terms. This map \(\mathsf M : (\mathbb {F}_2^4)^4 \rightarrow (\mathbb {F}_2^4)^4\) can be represented by the following matrix over \(\mathbb {F}_{2^4}\):

$$\begin{aligned} \begin{pmatrix} 0 & 1 & 1 & 1\\ 1 & 0 & 1 & 1\\ 1 & 1 & 0 & 1\\ 1 & 1 & 1 & 0 \end{pmatrix}. \end{aligned}$$

Up to a reordering of the input bits, one can think of \(\mathsf M\) as a map \(\widetilde{\mathsf M} = (\mathsf L, \mathsf L, \mathsf L, \mathsf L)\) where \(\mathsf L\) corresponds to the same matrix as above, but over \(\mathbb {F}_2\). Specifically, \(\widetilde{\mathsf M} = \sigma \mathsf M \sigma \) where \(\sigma : (\mathbb {F}_2^4)^4 \rightarrow (\mathbb {F}_2^4)^4\) is the bit permutation defined by \(\sigma _i(x_1, \ldots , x_4) = (x_{1, i}, \ldots , x_{4, i})\). Since \(C^{\mathsf L}\) is a \(16 \times 16\) matrix, one can check that

$$\begin{aligned} C^{\mathsf L} = \frac{1}{2}\left[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}^{\otimes 4} + \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}^{\otimes 4} + \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}^{\otimes 4} - \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}^{\otimes 4} \right] . \end{aligned}$$

To see this, it is helpful to observe that \(C^{\mathsf L}\) is symmetric as a tensor. Since \(\widetilde{\mathsf M} = \sigma \mathsf M \sigma \) where \(\sigma \) is a linear map corresponding to a reordering of bits, it follows from Theorem 3.2 (2) and Lemma 6.1 that

$$\begin{aligned} C^{\mathsf M} = 2^{-4} \sum _{(i_1, i_2, i_3, i_4) \in [4]^4} (\textstyle \prod _{j = 1}^4 \lambda _{i_j}) \,\big [{\textstyle \bigotimes _{j = 1}^4 C_{i_j}}\big ]^{\otimes 4}, \end{aligned}$$

with \(\lambda _{1} = \lambda _2 = \lambda _3 = 1\) and \(\lambda _4 = -1\) and

$$\begin{aligned} C_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\quad C_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad C_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},\quad C_4 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \end{aligned}$$

Hence, the tensor rank of \(C^{\mathsf M}\) is at most \(2^8\). This is significantly lower than the worst case of \(2^{16}\). Practically speaking, this enables a detailed analysis of rank-one approximations for Midori-64 in Sect. 7.3. In fact, one can show that this decomposition is minimal, i.e. the rank of \(C^{\mathsf {M}}\) is equal to \(2^8\).
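The decomposition of \(C^{\mathsf L}\) can be verified numerically by constructing the correlation matrix directly from the binary matrix of \(\mathsf L\):

```python
import numpy as np

def apply_L(x):     # y_i = XOR of the other three bits of x
    bits = [(x >> i) & 1 for i in range(4)]
    s = sum(bits)
    return sum(((s - bits[i]) & 1) << i for i in range(4))

H = np.array([[(-1)**bin(u & x).count('1') for x in range(16)]
              for u in range(16)])
T = np.zeros((16, 16))
for x in range(16):
    T[apply_L(x), x] = 1
CL = H @ T @ H / 16                     # correlation matrix of L

I2, X = np.eye(2), np.array([[0., 1.], [1., 0.]])
Z, W = np.diag([1., -1.]), np.array([[0., 1.], [-1., 0.]])
def pow4(A):                            # fourfold Kronecker power
    return np.kron(np.kron(A, A), np.kron(A, A))
assert np.allclose(CL, (pow4(I2) + pow4(X) + pow4(Z) - pow4(W)) / 2)
```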

Lemma 6.2

(Lemma 3.5 in [19]). Let \(V_1, \ldots , V_d\) be finite-dimensional vector spaces over \(\mathbb {C}\). If \(x_{i, 1}, \ldots , x_{i, r} \in V_i\) are linearly independent for \(i = 1, \ldots , d\), then the vector \(\sum _{j = 1}^r x_{1, j} \otimes x_{2, j} \otimes \cdots \otimes x_{d, j}\) in \(\bigotimes _{i = 1}^d V_i\) has tensor rank r.

To see why Lemma 6.2 implies the result, let \(V_i\) be the vector space of \(16 \times 16\) matrices over \(\mathbb {C}\). This is an inner product space under the Frobenius inner product \(\mathrm {Tr}\left( A^* B\right) \) between matrices A and B. It is easy to check that the matrices \(C_i\) defined above are mutually orthogonal with respect to this inner product. This implies the mutual orthogonality of the matrices \(\smash {\big [\bigotimes _{j = 1}^{4} C_{i_j}\big ]^{\otimes 4}}\). The result follows by the linear independence of orthogonal vectors.

6.2 Automated Analysis of Rank-One Trails

Let \(\mathsf {F}= \mathsf {F}_r \circ \cdots \circ \mathsf {F}_1\) be a permutation on \((\mathbb {F}_2^n)^m\). By Theorem 5.1, an optimal rank-one trail for \(\mathsf {F}\) can be found by solving the following optimization problem:

$$\begin{aligned}&\text {maximize}~\sum _{i = 1}^r \log _2 \big |\big \langle {\textstyle \bigotimes _{j = 1}^m} v_{i + 1, j}, \,C^{\mathsf {F}_i} \textstyle \bigotimes _{j = 1}^m v_{i, j} \big \rangle \big |\\&\text {subject to}~\Vert v_{i, j}\Vert _2 = 1~\text {for}~i = 1, \ldots , r + 1,\; j = 1, \ldots , m\\&\qquad \qquad \,\,\, ~v_{i, j}(\mathbbm {1}) = 0~\text {for}~(i, j) \in A~\text {and}~v_{i, j} = \delta _{\mathbbm {1}}~\text {otherwise}, \end{aligned}$$

where the last condition ensures that the vectors \(v_{i, j}\) are active and balanced, i.e. orthogonal to \(\delta _{\mathbbm {1}}\), on a predetermined pattern of cells A. Clearly, at least one cell must be active to obtain a nontrivial result. The objective is formulated as a sum of logarithms because, in practice, this avoids vanishing gradients.

Restricting to real-valued \(v_{i, j}\), the above is an optimization problem over the product of several copies of the \((2^n-1)\)-dimensional unit sphere. This domain is a Riemannian manifold, and common iterative numerical optimization techniques such as steepest descent and conjugate gradient have been generalized to this setting [34]. This is the basic approach behind the automated method proposed in this section. The source code of the tool is provided as supplementary material and relies on the Pymanopt library [40].

The power of this method lies in the fact that it enables iterative convergence to an optimal trail. This is made possible because the general nature of rank-one approximations results in a relaxed, continuous optimization problem rather than a discrete one. Although it is sometimes necessary to ensure that the outermost vectors of the trail correspond to (for example) a Boolean function, there is no reason to impose the same condition on vectors which are internal to the trail.
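The following is a minimal self-contained sketch of this approach. It replaces Pymanopt's optimizers by plain projected gradient ascent with a retraction to the sphere, and uses the single-bit map \(\mathsf L\) from Sect. 6.1 (viewed here, purely for illustration, as acting on two 2-bit cells) as a stand-in for a full linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def apply_L(x):     # toy linear map: y_i = XOR of the other three bits
    bits = [(x >> i) & 1 for i in range(4)]
    s = sum(bits)
    return sum(((s - bits[i]) & 1) << i for i in range(4))

H = np.array([[(-1)**bin(u & x).count('1') for x in range(16)]
              for u in range(16)])
T = np.zeros((16, 16))
for x in range(16):
    T[apply_L(x), x] = 1
C = H @ T @ H / 16            # correlation matrix, on two 2-bit cells

def retract(v):
    v[0] = 0.0                # balanced: orthogonal to the trivial character
    return v / np.linalg.norm(v)

v1, v2 = retract(rng.normal(size=4)), retract(rng.normal(size=4))
for _ in range(500):
    w = np.kron(v1, v2)
    g = ((C + C.T) @ w).reshape(4, 4)   # Euclidean gradient of <w, C w>
    for v, grad in ((v1, g @ v2), (v2, g.T @ v1)):
        grad[0] = 0.0                   # stay in the balanced subspace
        grad -= (v @ grad) * v          # project onto the tangent space
        v += 0.1 * grad                 # ascent step ...
        v[:] = retract(v)               # ... followed by retraction
print(np.kron(v1, v2) @ C @ np.kron(v1, v2))
```

The printed correlation approaches one whenever the ascent reaches one of the exact rank-one invariants; as with any non-convex problem, random restarts may be needed to escape suboptimal critical points.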

Example 6.1

The tool can be applied to find rank-one invariants of arbitrary functions with a limited number of input and output bits, which is a difficult problem in general [5]. For example, Fig. 2 shows the iterative convergence towards an invariant of the Midori-64 linear layer. This process takes about a second on an ordinary computer. By optimizing over the ellipsoid of unit-norm vectors in the eigenspaces \(E_\lambda (C^{\mathsf S})\) of the correlation matrix \(C^{\mathsf S}\), joint invariants for the linear and S-box layer can be found. Instructions to reproduce this example are included in the supplementary material of the extended version [7]. The tool also implements a barrier method to find all rank-one invariants for a given linear layer.    \(\vartriangleright \)

Fig. 2. Correlation \(c_j\) at each step of the optimization process for finding invariants of the form \(v_1 \otimes v_2 \otimes v_3 \otimes v_4\) with \(v_i(\mathbbm {1}) = 0\) for the Midori-64 linear layer.

A number of challenges remain for larger problems. These include key-dependence, which is simplified by the use of the Fourier transform, and convergence issues. For completeness, the supplementary material of the extended version [7] summarizes the (somewhat technical) steps that were taken to address these challenges.

7 Open Problem of Beierle et al.

This section explains observations of Beierle et al. [4] regarding a nonlinear approximation for two rounds of Midori-64. More broadly, the results in this section lead to a deeper understanding of many nonlinear approximations of the Midori-64 round function.

7.1 Problem Statement

Beierle et al. [4, Section 4.4] consider a nonlinear approximation over two rounds of Midori-64, restricted to a single column of the state. Denote this function by \(\mathsf {F}\). Its correlation matrix is equal to \(C^{\mathsf F} = C^{\mathsf M} [C^{\mathsf S}]^{\otimes 4} C^{\mathsf K_2} C^{\mathsf M} [C^{\mathsf S}]^{\otimes 4} C^{\mathsf K_1}\), where \(\mathsf K_1\) and \(\mathsf K_2\) are key-addition maps, \(\mathsf S\) is the S-box and \(\mathsf M\) the matrix defined in Sect. 6.1. Recall from Sect. 1 that Beierle et al. [4] describe nonlinear approximations using linear properties of a nonlinearly transformed representation of the cipher. The details of their approach will not be discussed here; the geometric framework developed in Sects. 4 and 5 will be used instead. The nonlinear functions considered by Beierle et al. are of the form \(\sum _{i = 1}^4 f_i(x_i)\) with \(f_i : \mathbb {F}_2^4 \rightarrow \mathbb {F}_2\) and consequently, as discussed in Sect. 6, correspond to approximations spanned by rank-one vectors. Specifically, the pair of nonlinear functions considered in [4, Section 4.4] corresponds to a one-dimensional approximation \((\mathrm {span}\{u \otimes v^{\otimes 3}\}, \mathrm {span}\{u \otimes v^{\otimes 3}\})\) for \(\mathsf {F}\) with

$$\begin{aligned} u&= 1/4\cdot (0, 1, 0, -1, 0, 1, 0, -1, 0, -1, 0, 1, 0, -1, 0, -3)^\top \\ v&= 1/2\cdot (0, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, 1, 0, 0, 0, -1)^\top . \end{aligned}$$

The coordinates above are given for the character basis \(\delta _{\chi _w}\) with lexicographic ordering of w. Note that v is an eigenvector of \(C^{\mathsf S}\). Beierle et al. estimate the correlation of the above approximation by (from the perspective of this paper) the following one-round trail, which has absolute correlation at least 9/32:

$$\begin{aligned} u \otimes v \otimes v \otimes v \xrightarrow [\pm 1\,\mathrm {or}\,\pm 1/2]{[C^{\mathsf S}]^{\otimes 4} C^{\mathsf K_i}} u \otimes v \otimes v \otimes v \xrightarrow [9/16] {C^{\mathsf M}} u \otimes v \otimes v \otimes v. \end{aligned}$$
(1)

The computation of the correlation over \(C^{\mathsf M}\) was done by a direct evaluation of the inner product \(\langle u \otimes v^{\otimes 3}, C^{\mathsf M}\,u \otimes v^{\otimes 3} \rangle \). This trail was believed to hold whenever \(K_i \in \mathbb {F}_2^4 \times \mathcal {K}^3\) for \(i = 1, 2\), with \(\mathcal {K} = \{(0, 0, x, y)~|~x,y\in \mathbb {F}_2\}\). The weak key set \(\mathcal K\) ensures the invariance of the tensor product factor v under key addition. Based on the above, one estimates an absolute correlation of at least \((9 / 32)^2\) over \(\mathsf F\). However, Beierle et al. experimentally observe that this estimate is not accurate:

(i) When \(K_2 \in (\mathbb {F}_2^4 \setminus \mathcal {K}) \times \mathcal K^3\), the correlation is found to equal zero.

(ii) For other keys, the correlation takes on various values, but is always significantly larger than the estimated minimum of 81/1024. Specifically, for \(K_1, K_2 \in \mathcal K^4\), the correlation ranges from 35/64 to \(40/64 = 5/8\). For other keys, it lies between 39/256 and 65/256.

In their conclusion, the authors remark that understanding this phenomenon is “a major open problem”. Sections 7.2 and 7.3 completely explain the above observations using the methods developed in Sects. 4 and 5.

7.2 Optimal Rank-One Trail

As shown in Sect. 6.1, the effect of the linear layer is nontrivial and this makes finding an optimal rank-one trail difficult. Hence, a simple explanation for observation (ii) could be that the trail (1) proposed by Beierle et al. is not a good guess. Using the tool from Sect. 6.2, it is easy to find the optimal rank-one trail – ignoring the effect of key-addition for now. Running the tool (the configuration is given in the extended version [7]) yields the following trail with absolute correlation at most 9/16:

$$\begin{aligned} u \otimes v^{\otimes 3} \xrightarrow [\pm 3/4\,\mathrm {or}\,\pm 1/4]{[C^{\mathsf S}]^{\otimes 4} C^{\mathsf {K}_1}} v^{\otimes 4} \xrightarrow [1]{C^\mathsf {M}} v^{\otimes 4} \xrightarrow [\pm 1]{[C^\mathsf {S}]^{\otimes 4} C^{\mathsf {K}_2}} v^{\otimes 4} \xrightarrow [3/4]{C^\mathsf {M}} u \otimes v^{\otimes 3}. \end{aligned}$$

A short calculation shows that the third step requires \(K_2 \in \mathcal K^4\), otherwise the trail has correlation zero. Furthermore, the correlation 3/4 in the first step occurs if and only if \(K_1 \in \mathcal {K}^4\). In hindsight, one might have guessed the above trail without detailed analysis: the choice of \(v^{\otimes 4}\) as an intermediate step is natural, since \(v^{\otimes 4}\) is an invariant for the round function. This is an instance of the general phenomenon discussed in the last paragraph of Sect. 5.3.
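Since \(\mathsf M\) is an involution with a symmetric binary matrix, \(C^{\mathsf M}\) acts on mask functions as the permutation \(w \mapsto w \circ \mathsf M\). The middle and final steps of the trail can therefore be verified directly, using the vectors u and v from Sect. 7.1:

```python
import numpy as np

u = np.array([0, 1, 0, -1, 0, 1, 0, -1, 0, -1, 0, 1, 0, -1, 0, -3]) / 4
v = np.array([0, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, 1, 0, 0, 0, -1]) / 2

def M_mask(m):      # MixColumn on masks: each nibble -> XOR of the others
    nib = [(m >> (4 * i)) & 0xF for i in range(4)]
    x = nib[0] ^ nib[1] ^ nib[2] ^ nib[3]
    return sum((x ^ nib[i]) << (4 * i) for i in range(4))

w = np.kron(np.kron(v, v), np.kron(v, v))            # v tensored 4 times
CMw = w[[M_mask(m) for m in range(1 << 16)]]         # (C^M w)(m) = w(M m)
print(np.allclose(CMw, w))                           # True: correlation 1
print(np.kron(np.kron(u, v), np.kron(v, v)) @ CMw)   # 0.75: correlation 3/4
```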

7.3 Theoretical Analysis of the Problem

The correlations predicted by the rank-one trail obtained in Sect. 7.2 are within 10 to \(30\%\) of the observed correlations reported by Beierle et al. [4, Tables 1–4]. However, the trail does not yet explain the zero-correlation approximation. In this section, the results from Sect. 6.1 will be used to find a minimal and complete set of rank-one trails for the approximation.

The propagation of \(u \otimes v^{\otimes 3}\) under the Midori-64 round function will first be analyzed. For the zero-correlation case, the miss-in-the-middle strategy from Sect. 4.2 will be used. It will then be shown how a relatively short formula for the exact key-dependent correlation of the approximation can be computed.

Let \(K_1 = (k_1, k_2, \ldots , k_{16}) \in \mathbb {F}_2^{16}\) and \(K_2 = (k_1', k_2', \ldots , k_{16}') \in \mathbb {F}_2^{16}\). The results in Sect. 6.1 can be used to compute the image of \(u \otimes v^{\otimes 3}\) under one round:

$$\begin{aligned} C^{\mathsf M} [C^{\mathsf S}]^{\otimes 4} C^{\mathsf K_1}\,u \otimes v^{\otimes 3} = -\nu \,C^{\mathsf M} (C^{\mathsf S} C^{k_1\Vert \cdots \Vert k_4} u) \otimes v^{\otimes 3} = \nu \,v \otimes \big ({\textstyle \sum }_{i = 1}^{16} c_i\,v_i^{\otimes 3}\big ), \end{aligned}$$

where \(\nu = -\prod _{i = 2}^4 (-1)^{k_{4i - 1} + k_{4i}}\). The coefficients \(c_i\) and the vectors \(v_i\) are listed in the supplementary material of the extended version [7]. Note that, because \(C^{\mathsf M}\) has rank \(2^8\), one initially obtains \(2^8\) terms. However, this can be reduced to 16 by grouping terms appropriately. This can be done manually by exploiting the structure of the rank-decomposition, but Sage code to automate this is also provided as supplementary material. Since the vectors \(v_i\) are mutually orthogonal and this is preserved when multiplied with (the same) orthogonal matrices, Lemma 6.2 implies that the above decomposition is minimal. Interestingly, not all of the vectors \(v_i\) correspond to Boolean functions or probability distributions.

A similar computation can be performed for the inverse of the second round. Specifically, recalling that \(\mathsf S\) and \(\mathsf M\) are involutions,

$$\begin{aligned} C^{\mathsf K_2} [C^{\mathsf S}]^{\otimes 4} C^{\mathsf M} \,u \otimes v^{\otimes 3} = \nu '\, C^{k_{1}'\Vert \cdots \Vert k_{4}'} v \otimes \big ({\textstyle \sum }_{i = 1}^{8} c_i'\,{\textstyle \bigotimes }_{j = 1}^3 (C^{k_{4j + 1}'\Vert \cdots \Vert k_{4j + 4}'}v_i')\big ). \end{aligned}$$

The coefficients \(c_i'\) and the vectors \(v_i'\) are listed in the supplementary material of the extended version [7] and \(\nu ' = (-1)^{k_{3}' + k_{4}' + 1}\). The minimality of the above decomposition can again be established using Lemma 6.2.

Zero-Correlation Approximation. Let \(U = \mathrm {span}\{v\} \otimes (\mathbb {C}\widehat{\mathbb {F}}_2^4)^{\otimes 3}\) and \(V = \mathrm {span}\{C^{k_{1}'\Vert \cdots \Vert k_{4}'}\,v\} \otimes (\mathbb {C}\widehat{\mathbb {F}}_2^4)^{\otimes 3}\). The decompositions above clearly imply the following inclusions:

$$\begin{aligned} C^{\mathsf M} [C^{\mathsf S}]^{\otimes 4} C^{\mathsf K_1}\,u \otimes v^{\otimes 3} \in U \quad \text {and}\quad C^{\mathsf K_2} [C^{\mathsf S}]^{\otimes 4} C^{\mathsf M} \,u \otimes v^{\otimes 3} \in V. \end{aligned}$$

Consequently, if \(U \perp V\), the general miss-in-the-middle principle discussed in Sect. 4.2 implies that the approximation has correlation zero. This happens whenever \(\langle v, C^{k_{1}'\Vert \cdots \Vert k_{4}'}\,v \rangle = 0\). That is,

$$\begin{aligned}&\big \langle (0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 1)^\top ,\\&\;\; (0, 0, 0, 1, 0, 0, 0, (-1)^{k_2'}, 0, 0, 0, (-1)^{1 + k_1'}, 0, 0, 0, (-1)^{k_1' + k_2'})^\top \big \rangle \\&= 1 + (-1)^{k_1'} + (-1)^{k_2'} + (-1)^{k_1' + k_2'}, \end{aligned}$$

which equals zero unless \(k_1' = k_2' = 0\). This explains the condition \(K_2 \in (\mathbb {F}_2^4 \setminus \mathcal {K}) \times \mathcal {K}^3\) observed by Beierle et al. [4].
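This condition can be confirmed by enumerating the 16 possibilities for the first nibble of \(K_2\), with \(C^{k}\) acting diagonally in the character basis (bit conventions as in Sect. 7.1):

```python
import numpy as np

v = np.array([0, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, 1, 0, 0, 0, -1]) / 2
for k in range(16):            # k = k_1'k_2'k_3'k_4', k_1' most significant
    dk = np.array([(-1)**bin(w & k).count('1') for w in range(16)])
    print(format(k, '04b'), v @ (dk * v))  # zero unless k_1' = k_2' = 0
```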

Refining the Correlation Estimate. Now assume \(K_2 \in \mathcal {K}^4\), so that the correlation is nonzero. A closer inspection of the vectors \(v_i\) and \(v_j'\) reveals that \(\smash {|\langle v_i, C^{k_{4j+1}'\Vert \cdots \Vert k_{4j + 4}'} v_j'\rangle |} \le 1/2\) unless \(i = 3\) and \(j = 1\). That is, when the inner product \(\langle C^{\mathsf K_2} [C^{\mathsf S}]^{\otimes 4} C^{\mathsf M} \,u \otimes v^{\otimes 3}, C^{\mathsf M} [C^{\mathsf S}]^{\otimes 4} C^{\mathsf K_1} \,u \otimes v^{\otimes 3}\rangle \) is expanded using the two decompositions above, the term corresponding to \(c_3 c_1'\) has a weight of one whereas all other terms have weight at most \(2^{-3}\). Since \(v_3 = v_1' = v\), this term corresponds to the trail from Sect. 7.2.

The correlation estimate can be improved by including additional trails. In principle, all 128 terms in the expanded inner product between the forward and backward expressions can be computed. The supplementary material contains a Sage script that computes a short formula for the exact key-dependent correlation of the approximation, which is also listed in the extended version [7].

In fact, due to the low rank of \(C^{\mathsf M}\), the same technique can be used to analyze any rank-one approximation of \(\mathsf {F}{}\). This includes all linear approximations. In general, the minimal number of rank-one trails can be higher or lower than \(16 \times 8\) (depending on the choice of the input and output property).