1 Introduction

The Walsh functions form an orthonormal basis (ONB) for the Hilbert space \(L^{2}[0,1]\) that can be interpreted, roughly, as a discrete analog of the classic sines and cosines. For some applications the Walsh functions have several advantages: for example, they take only the values \(\pm 1\) on sub-intervals defined by dyadic fractions, which makes the computation of coefficients much easier. The Walsh functions are connected to probability: e.g., the Walsh expansion can be seen as a conditional expectation, and the partial sums form a Doob martingale. The Walsh functions have found a wide range of applications, for example in modern communication systems (through the so-called Hadamard matrices, to recover information in the presence of noise and interference) and in signal processing (reconstruction of signals by means of dyadic sampling theorems); see e.g. [1, 6, 13, 25], to mention a few.

There are certain features that make this ONB more desirable to work with than, for example, the Fourier system: the Walsh series associated to \(f\in L^{1}[0,1]\) converges pointwise a.e. to \(f\). Convergence also holds at every continuity point of a function \(f\) of bounded variation; see e.g. [19, 20, 24].

Various generalizations have been given, based on changing the \(L^{2}\) space [23] or changing the Rademacher functions [5]. For example, for the dyadic group \(G\) the Walsh functions can be viewed as characters on \(G\), or more generally, starting with [23], as characters of a zero-dimensional, separable group. The generalized Walsh system based on \(N\)-adic numbers and exponential functions in [5] has been used to construct algorithms for polynomial lattices (a particular kind of digital net which in turn can be used in sampling methods for multivariate integration); see [8, 9] and references therein.

In [11] a criterion is given for obtaining ONBs from Cuntz algebra representations. These bases were obtained through a general principle which incorporates wavelets and an assortment of various ONBs, from the classic Fourier and Walsh bases to bases on fractals (Cantor sets). This principle is based on the Cuntz relations and roughly asserts that a suitable representation of these relations gives rise to an ONB. By tinkering with the isometries satisfying the relations one obtains the above-mentioned variety of ONBs. One of the examples recovers the classic Walsh ONB, and another generalizes it (the ONB from [5] is also recovered by starting with an \(N\times N\) Hadamard matrix in the construction below). It is this generalized ONB that we continue to study in this paper. We mention that properties such as convergence, continuity, periodicity etc. are discussed in [10] and [14].

In Sect. 2 we show that the ONB property of the generalized Walsh system is equivalent to the irreducibility of the Cuntz algebra representation from which it is built. In Sect. 3 we analyze the restriction of the generalized Walsh bases to finite-dimensional spaces. We show that a signal’s coefficients with respect to these bases can be easily read off a tensor-product matrix. We also provide a change-of-basis formula between generalized Walsh bases. Building on these finite-dimensional considerations, in Sect. 4 we provide an algorithm for a fast generalized Walsh transform, generalizing ideas from [18]. The fast transform was implemented in Maple and used in all our examples from Sect. 6 to compare the performance of Walsh and generalized Walsh transforms from a statistical point of view. To this end we use a simple compression scheme based on the variance criterion. In Sect. 5, inspired by a discrete version of the uncertainty principle [12], we show that the Walsh and generalized Walsh transforms satisfy such a principle, albeit with a discrepancy which occurs when the matrix giving rise to the generalized Walsh basis is not Hadamard. We further develop a concept of uncertainty with respect to unitary matrices and prove a few properties.

We recall next the construction of the generalized Walsh basis and the Cuntz algebra representation it arises from.

Let \(A=(a_{ij})_{i,j=0,\dots ,N-1}\) be an \(N\times N\) unitary matrix with constant first row, i.e., \(a_{0j}=\frac{1}{\sqrt{N}}\) for all \(j=0,\dots ,N-1\).

Define the functions (almost everywhere with respect to the Lebesgue measure):

$$ m_{i}(x)=\sqrt{N}\sum_{j=0}^{N-1}a_{ij} \chi _{[j/N,(j+1)/N)}(x) \quad \bigl(x\in [0,1],i=0,\dots ,N-1\bigr), $$
(1.1)

where \(\chi _{A}\) is the characteristic function of the set \(A\).

Note that \(m_{0}\equiv 1\).

On \(L^{2}[0,1]\) define the operators

$$ S_{i}f(x)=m_{i}(x)f(Nx\operatorname{mod}1)\quad \bigl(x\in [0,1],f\in L ^{2}[0,1], i=0,\dots ,N-1\bigr). $$
(1.2)

Note that \(S_{0}1=1\).

Theorem 1.1

[11]

The operators \((S_{i})_{i=0,1,\dots ,N-1}\) form a representation of the Cuntz algebra \(\mathcal{O}_{N}\) on \(L^{2}[0,1]\), i.e.,

$$ S_{i}^{*}S_{j}=\delta _{i,j}I_{L^{2}[0,1]} \quad \bigl(i,j\in \{0,1,\dots ,N-1 \}\bigr), \quad \quad \sum _{i=0}^{N-1}S_{i}S_{i}^{*}=I_{L^{2}[0,1]}. $$
(1.3)

Theorem 1.2

[11]

The family

$$ \mathcal{W}^{A}:= \bigl\{ S_{w_{1}}\dots S_{w_{p}}1: p \in \mathbb{N}, w_{1},\dots , w_{p}\in \{0,\dots ,N-1\} \bigr\} $$
(1.4)

is an orthonormal basis for \(L^{2}[0,1]\) (discarding, of course, the repetitions generated by the fact that \(S_{0}1=1\)). We call it the generalized Walsh basis that corresponds to the matrix \(A\).
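
To make the construction concrete, here is a minimal numerical sketch (in Python/NumPy; the paper's own computations were done in Maple). It stores a function that is constant on the \(N^{p}\) intervals \([k/N^{p},(k+1)/N^{p})\) as its vector of values, implements \(S_{i}\) on such vectors, and checks numerically that the words \(S_{w_{1}}\dots S_{w_{p}}1\) are orthonormal in \(L^{2}[0,1]\). The particular \(3\times 3\) matrix is the one that reappears in (5.14) and (6.2); any unitary matrix with constant first row \(1/\sqrt{N}\) would do.

```python
import numpy as np
from itertools import product

N = 3
# a unitary matrix with constant first row 1/sqrt(3); see also (5.14) and (6.2)
A = np.array([[1/np.sqrt(3),  1/np.sqrt(3),  1/np.sqrt(3)],
              [0,             1/np.sqrt(2), -1/np.sqrt(2)],
              [-2/np.sqrt(6), 1/np.sqrt(6),  1/np.sqrt(6)]])
assert np.allclose(A @ A.conj().T, np.eye(N))

def S(i, f):
    """S_i f(x) = m_i(x) f(Nx mod 1) on the vector of values of a step function f:
    the j-th block of the result is sqrt(N) * a_{ij} times f."""
    return np.concatenate([np.sqrt(N) * A[i, j] * f for j in range(N)])

def walsh_word(word):
    """S_{w_1} ... S_{w_p} 1 as a vector of N**len(word) values."""
    f = np.ones(1)
    for i in reversed(word):
        f = S(i, f)
    return f

p = 2
funcs = [walsh_word(w) for w in product(range(N), repeat=p)]
# L^2[0,1] inner products of the step functions = (1/N^p) * dot products of the value vectors
G = np.array([[np.vdot(u, v) / N**p for v in funcs] for u in funcs])
print(np.allclose(G, np.eye(N**p)))   # True: the words S_w 1 of length p are orthonormal
```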

2 The Representation of the Cuntz Algebra \(\mathcal{O}_{N}\)

In this section we study the representation of the Cuntz algebra defined by the isometries in (1.2). We show that this is a permutative representation like the ones studied in [3, 7, 15–17]. We show that the ONB property is present precisely because the associated representation of the Cuntz algebra \(\mathcal{O}_{N}\) on \(L^{2}[0,1]\) is irreducible. We first find an irreducible representation, then show it is equivalent to the one in (1.2).

Let \(\varOmega \) be the set of finite words with digits in \(\{0,1,\dots ,N-1\}\) that do not end in 0, including the empty word \(\emptyset \). Define the canonical vectors in \(l^{2}(\varOmega )\),

$$ e_{\omega }(\gamma )=\left \{ \textstyle\begin{array}{l@{\quad }l} 1,&\text{if }\gamma =\omega \\ 0,&\text{if }\gamma \neq \omega \end{array}\displaystyle \right . \quad (\omega ,\gamma \in \varOmega ). $$

Define the linear operators \(s_{i}\), \(i=0,1,\dots ,N-1\) on \(l^{2}( \varOmega )\) by

$$ s_{0}e_{\omega }=\left \{ \textstyle\begin{array}{l@{\quad}l} e_{\emptyset },&\text{if }\omega =\emptyset \\ e_{0\omega },&\text{otherwise}, \end{array}\displaystyle \right . \quad s_{i}e_{\omega }=e_{i\omega } \quad (\omega \in \varOmega ,i\neq 0) $$
(2.1)

(\(i\omega \) represents here concatenation of the digit \(i\) to the word \(\omega \)).

Theorem 2.1

The operators \((s_{i})_{i=0}^{N-1}\) form an irreducible representation of the Cuntz algebra \(\mathcal{O}_{N}\) on \(l^{2}(\varOmega )\).

If \(A\) is a unitary matrix with constant first row \(1/\sqrt{N}\) and \((S_{i})_{i=0}^{N-1}\) is the associated representation of the Cuntz algebra as in (1.2), then the linear operator \(W:L^{2}[0,1] \rightarrow l^{2}(\varOmega )\), defined by

$$ W(S_{\omega }1)=e_{\omega } \quad (\omega \in \varOmega ), $$
(2.2)

is unitary and intertwines the two representations.

Proof

We prove that the operators \((s_{i})_{i=0,\dots ,N-1}\) satisfy the Cuntz relations. It is easy to see that the operators are norm preserving, hence isometries, and that their ranges are orthogonal. Thus \(s_{i}^{*}s_{j}=\delta _{i,j}I\). Also, \(s_{0}^{*}e_{\emptyset }=s_{0}^{*}s_{0}e_{\emptyset }=e_{\emptyset }\), and if \(\omega =\omega _{1}\omega _{2}\dots \omega _{n}\neq \emptyset \) then

$$ s_{0}^{*}e_{\omega }=s_{0}^{*}s_{\omega _{1}}e_{\omega _{2}\dots \omega _{n}}= \delta _{0,\omega _{1}}e_{\omega _{2}\dots \omega _{n}}, $$

and, for \(i\neq 0\), \(s_{i}^{*}e_{\emptyset }=0\), and if \(\omega = \omega _{1}\omega _{2}\dots \omega _{n}\neq \emptyset \) then

$$ s_{i}^{*}e_{\omega }=\delta _{i,\omega _{1}}e_{\omega _{2}\dots \omega _{n}}. $$

With this, we can easily check that

$$ \sum_{i=0}^{N-1}s_{i}s_{i}^{*}e_{\omega }=e_{\omega } \quad (\omega \in \varOmega ). $$

To prove that the representation is irreducible, we use [4, Theorem 5.1]. The subspace \(\mathcal{K}\) spanned by the vector \(e_{\emptyset }\) is invariant for the operators \(s_{i}^{*}\) and cyclic for the representation. Indeed \(s_{i}^{*}e_{\emptyset }=\delta _{0,i}e_{\emptyset }\), \(i=0,1,\dots ,N-1\). Also \(s_{\omega }\mathcal{K}\) contains \(e_{\omega }\), so \(\mathcal{K}\) is cyclic for the representation.

Define the operator \(V_{i}^{*}\) on \(\mathcal{K}\) by \(V_{i}^{*}\xi =s_{i}^{*}\xi \), \(\xi \in \mathcal{K}\), \(i=0,1,\dots ,N-1\). By [4, Theorem 5.1], the commutant of the representation is in linear one-to-one correspondence with the fixed points of the map on linear operators on \(\mathcal{K}\): \(A\mapsto \sum_{i=0}^{N-1}V_{i}AV_{i}^{*}\). Since \(\mathcal{K}\) is one-dimensional, the space of fixed points of this map is one-dimensional, and therefore the commutant is trivial and the representation is irreducible.

Since \(\{S_{\omega }1 :\omega \in \varOmega \}\) is an orthonormal basis, the operator \(W\) is unitary, and a simple check shows that \(WS_{i}(S_{\omega '}1)=s_{i} W(S_{\omega '}1)\) and \(WS_{i}^{*}(S_{\omega }1)=s_{i}^{*} W(S_{\omega }1)\) for all \(\omega ,\omega '\in \varOmega \), \(i=0,1,\dots ,N-1\), which shows that \(W\) intertwines the two representations. □

The construction of the generalized Walsh basis and the previous theorem suggest studying conditions under which vector families of the type \(\{S_{\omega }v_{0}\}\) become ONBs. Theorem 2.3 below provides a partial answer. We will need the following lemma (which is more general than Lemma 5 in [21], though its proof is essentially the same).

Lemma 2.2

Let \(S_{i}:H\to H\), \(i=0,\ldots,N-1\), be a representation of the Cuntz algebra \(\mathcal{O}_{N}\) on the separable Hilbert space \(H\), and let \(v_{0}\in H\) be such that \(\|v_{0}\|=1\) and \(S_{0}(v_{0})=v_{0}\). Then the family

$$ \bigl\{ S_{\omega }v_{0}: \omega \textit{ is a finite word over }\{0,1,\ldots, N-1\} \bigr\} $$

(with repeats removed) is orthonormal.

Proof

Notice that \(\langle S_{\omega } v_{0} , S_{\gamma } v_{0} \rangle =0\) whenever the first digit of \(\omega \) differs from the first digit of \(\gamma \). The same is true if \(\omega _{k} \neq \gamma _{k}\) for some \(k\). The remaining case is when \(\omega \) is a prefix of \(\gamma \) or vice versa. Assuming the former, the inner product reduces to one of the form \(\langle v_{0} , S_{\gamma '} v_{0} \rangle \) or its conjugate. Because \(S_{0}v_{0}=v_{0}\), the last inner product is again zero unless

$$ \gamma '= \underbrace{0\dots 0}_{n\geq 0\text{ times}}, $$

and in this case \(S_{\gamma '}v_{0}=v_{0}\), but the repetitions are removed. □

Theorem 2.3

Let \(\rho:=(S_{i}:H\to H)_{i=0,\ldots,N-1}\) be a representation of the Cuntz algebra \(\mathcal{O}_{N}\) on the separable Hilbert space \(H\), and let \(v_{0}\in H\) be such that \(\|v_{0}\|=1\) and \(S_{0}v_{0}=v_{0}\). The following statements are equivalent:

  (i) The family \(\{S_{\omega }v_{0}:\omega\ \textit{word over}\ \{0,1,\ldots, N-1\}\}\) (with repeats removed) is an ONB.

  (ii) The representation \(\rho \) is irreducible.

Proof

The implication (i) ⇒ (ii) follows by repeating the argument in the proof of Theorem 2.1, i.e., \(\rho \) is equivalent to an irreducible representation. For the converse, let \(K\) be the closed space spanned by all the vectors \(S_{\omega }v_{0}\). Clearly this space is invariant under all the isometries \(S_{i}\) and their adjoints \(S_{i}^{*}\) (due to the Cuntz relations). Therefore this subspace is invariant under the representation \(\rho \), so it must be equal to the entire space \(H\), because the representation is irreducible. □

3 Computations of Generalized Walsh Bases and of Coefficients

We can index the generalized Walsh basis by nonnegative integers \(\mathbb{N}_{0}\) as follows: for \(n\in \mathbb{N}_{0}\), write the base-\(N\) representation

$$ n=n_{1}+Nn_{2}+\cdots +N^{p}n_{p+1}+\cdots, \quad \text{where }n_{k}=0\text{ for }k\text{ large enough.} $$
(3.1)

Then

$$ W_{n}^{A}=S_{n_{1}}\dots S_{n_{p}}1, \quad \text{if }n_{k}=0\text{ for all }k>p\text{.} $$
(3.2)

Note that it does not matter if we pick a larger \(p\) in (3.2), because \(S_{0}1=1\).

We compute the generalized Walsh functions more explicitly.

Proposition 3.1

Let \(x\) be in \([0,1]\). We write the base \(N\) representation of \(x\)

$$ x=\frac{x_{1}}{N}+\frac{x_{2}}{N^{2}}+\cdots \quad \bigl(x_{1},x_{2}, \dots \in \{0,\dots ,N-1\}\bigr) $$
(3.3)

(we can ignore the cases when there are multiple representations of the same point, since we are working in \(L^{2}[0,1]\) and the set of such points has Lebesgue measure zero). For \(n\in \mathbb{N}_{0}\), consider the base \(N\) representation as in (3.1). Then

$$ W_{n}^{A}(x)=\sqrt{N^{p}}a_{n_{1}x_{1}}a_{n_{2}x_{2}} \dots a_{n_{p}x _{p}}, \quad \textit{if }n_{k}=0\textit{ for }k>p $$
(3.4)

(since \(n_{k}=0\) for \(k\) large, \(\sqrt{N}a_{n_{k}x_{k}}=1\) for \(k\) large, so we can use a larger \(p\) in (3.4) if needed). In other words,

$$ W_{n}^{A}(x)=\sqrt{N^{p}}\sum _{j_{1},\dots ,j_{p}=0}^{N-1}a_{n_{1} j _{1}}a_{n_{2}j_{2}}\dots a_{n_{p}j_{p}} \chi _{ [ \frac{j_{1}}{N}+\frac{j_{2}}{N^{2}}+\cdots +\frac{j_{p}}{N ^{p}},\frac{j_{1}}{N}+\frac{j_{2}}{N^{2}}+\cdots +\frac{j_{p}}{N^{p}}+\frac{1}{N ^{p}} ] }(x), $$
(3.5)

and here \(p\) has the same meaning as in (3.4).

Proof

By (3.2), with \(n\) as in (3.2), we have

$$ W_{n}^{A}=S_{n_{1}}\dots S_{n_{p}}1,\quad \text{if }n_{k}=0\text{ for }k>p. $$

Then

$$ W_{n}^{A}(x)=m_{n_{1}}(x)\,m_{n_{2}}(Nx \operatorname{mod}1)\cdots m_{n_{p}}\bigl(N^{p-1}x \operatorname{mod}1\bigr). $$

If \(x\) is as in (3.3), then \(x\in [{x_{1}}/N,(x_{1}+1)/N)\) so \(m_{n_{1}}(x)=\sqrt{N}a_{n_{1}x_{1}}\). Then \(Nx\operatorname{mod}1= x_{2}/N+x_{3}/N^{2}+\cdots \) so \(m_{n_{2}}(Nx\operatorname{mod}1)= \sqrt{N}a_{n_{2}x_{2}}\); by induction, \(N^{p-1}x\operatorname{mod}1=x_{p}/N+x_{p+1}/N^{2}+\cdots \) and \(m_{n_{p}}(N^{p-1}x\operatorname{mod}1)= \sqrt{N} a_{n_{p}x_{p}}\).

It follows that

$$\begin{aligned} W_{n}^{A}(x)=\sqrt{N^{p}}a_{n_{1}x_{1}}a_{n_{2}x_{2}} \dots a_{n_{p}x _{p}}. \end{aligned}$$

 □
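
For instance, formula (3.4) can be evaluated directly from the base-\(N\) digits of \(n\) and \(x\); a short Python sketch (points with two base-\(N\) expansions form a null set and are ignored here, as in the proposition):

```python
import numpy as np

def int_digits(n, N, p):
    """base-N digits n_1, ..., n_p of n, so that n = n_1 + N*n_2 + ... + N^(p-1)*n_p"""
    return [(n // N**k) % N for k in range(p)]

def point_digits(x, N, p):
    """first p base-N digits x_1, x_2, ... of x in [0,1): x = x_1/N + x_2/N^2 + ..."""
    return [int(np.floor(x * N**(k + 1))) % N for k in range(p)]

def W(n, x, A, p):
    """W_n^A(x) = sqrt(N^p) * a_{n_1 x_1} * ... * a_{n_p x_p}, formula (3.4)."""
    N = A.shape[0]
    nd, xd = int_digits(n, N, p), point_digits(x, N, p)
    return np.sqrt(N**p) * np.prod([A[nd[k], xd[k]] for k in range(p)])

# for the classic 2x2 Walsh matrix (6.1) the values are +/-1, as expected
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
print(W(5, 0.3, H, p=3))   # approximately 1.0 (classic Walsh functions take values +/-1)
```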

Definition 3.2

Let \(A\) be an \(N\times N\) matrix and \(B\) an \(M\times M\) matrix. Then the tensor product of the two matrices \(A\otimes B\) is an \(NM\times NM\) matrix with entries

$$ (A\otimes B)_{i_{1}+Ni_{2},j_{1}+N j_{2}}=A_{i_{1}j_{1}}B_{i_{2}j_{2}} \quad (i_{1},j_{1}=0,\dots ,N-1,\ i_{2},j_{2}=0, \dots ,M-1). $$

Thus the matrix \(A\otimes B\) has the block form

$$ A\otimes B= \begin{pmatrix} Ab_{0,0}&Ab_{0,1}& \ldots &Ab_{0,M-1} \\ Ab_{1,0}&Ab_{1,1}& \ldots &Ab_{1,M-1} \\ \vdots &\vdots &\ddots &\vdots \\ Ab_{M-1,0}&Ab_{M-1,1}& \ldots &Ab_{M-1,M-1} \\ \end{pmatrix} . $$

Note that the tensor operation is associative, \((A\otimes B)\otimes C=A \otimes (B\otimes C)\).

The matrix \(A^{\otimes n}\) is obtained by induction: \(A^{\otimes 1}:=A\), \(A^{\otimes n}=A\otimes A^{\otimes (n-1)}\).

For an \(N\times M\) matrix \(B\), we denote by \(\overline{B}\) the conjugate matrix of \(B\), i.e., with entries \(\overline{B}_{ij}:=\overline{(B _{ij})}\).
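
A word of caution when implementing Definition 3.2: because the row/column index is \(i_{1}+Ni_{2}\) (the \(A\)-index being the fast one), the paper's \(A\otimes B\) corresponds to the standard Kronecker product taken in the opposite order, e.g. np.kron(B, A) in NumPy; for the powers \(A^{\otimes p}\), where all factors coincide, the two conventions agree. A small sanity check of the index convention (random test matrices, purely illustrative):

```python
import numpy as np

def tensor(A, B):
    """The paper's A (x) B from Definition 3.2:
    (A (x) B)_{i1 + N*i2, j1 + N*j2} = A[i1, j1] * B[i2, j2].
    With NumPy's row-major Kronecker product this is kron(B, A)."""
    return np.kron(B, A)

def tensor_power(A, p):
    """A^{(x) p}; since all factors are equal, this coincides with the usual Kronecker power."""
    M = A.copy()
    for _ in range(p - 1):
        M = tensor(A, M)
    return M

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((2, 2))
N = A.shape[0]
T = tensor(A, B)
i1, i2, j1, j2 = 2, 1, 0, 1
print(np.isclose(T[i1 + N * i2, j1 + N * j2], A[i1, j1] * B[i2, j2]))   # True
```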

Proposition 3.3

Let \(x\) be in \([0,1]\) and let \(x=\frac{x_{1}}{N}+\frac{x_{2}}{N^{2}}+\cdots \), \(x_{1},x_{2},\dots \in \{0,\dots ,N-1\}\), be its base \(N\) representation. Let \(n=n_{1}+Nn_{2}+\cdots +N^{p-1}n_{p}\) be the base \(N\) representation of \(n\). Then

$$ W_{n}^{A}(x)=\sqrt{N^{p}}A^{\otimes p}_{n_{1}+Nn_{2}+\cdots +N^{p-1}n _{p}, x_{1}+Nx_{2}+\cdots +N^{p-1}x_{p}}. $$
(3.6)

Proof

By (3.4), we just have to prove, by induction, that

$$ A^{\otimes p}_{n_{1}+Nn_{2}+\cdots +N^{p-1}n_{p}, x_{1}+Nx_{2}+\cdots +N ^{p-1}x_{p}}=a_{n_{1},x_{1}}a_{n_{2},x_{2}}\cdots a_{n_{p},x_{p}}, $$
(3.7)

for all \(p\geq 1\) and \(n_{1},\dots ,n_{p},x_{1},\dots ,x_{p}\in \{0, \dots ,N-1\}\). This is clearly true for \(p=1,2\). Assuming it is true for \(p\), we have

$$\begin{aligned} A^{\otimes (p+1)}_{n_{1}+Nn_{2}+\cdots +N^{p}n_{p+1},x_{1}+Nx_{2}+ \cdots +N^{p}x_{p+1}} =&\bigl(A\otimes A^{\otimes p} \bigr)_{n_{1}+N(n_{2}+\cdots +N ^{p-1}n_{p+1}),x_{1}+N(x_{2}+\cdots +N^{p-1}x_{p+1})} \\ =&a_{n_{1},x_{1}}A^{\otimes p}_{n_{2}+Nn_{3}+\cdots +N^{p-1}n_{p+1},x_{2}+Nx_{3}+\cdots +N^{p-1}x_{p+1}} \\ =&a_{n_{1},x_{1}}a_{n_{2},x_{2}}\dots a_{n_{p+1},x_{p+1}}. \end{aligned}$$

 □

Proposition 3.4

Let \(n=n_{1}+Nn_{2}+\cdots +N^{p-1}n_{p}\) be the base \(N\) representation of \(n\). Suppose the function \(f\) in \(L^{2}[0,1]\) is piecewise constant, equal to \(f_{k_{1},\dots ,k_{p}}\) on each interval

$$ \biggl[ \frac{k_{1}}{N}+\frac{k_{2}}{N^{2}}+\cdots + \frac{k_{p}}{N^{p}}, \frac{k_{1}}{N}+\frac{k_{2}}{N^{2}}+\cdots +\frac{k _{p}}{N^{p}}+\frac{1}{N^{p}} \biggr] , $$

\(k_{1},\dots ,k_{p}\in \{0,\dots ,N-1\}\). Then

$$ \bigl\langle f , W_{n}^{A} \bigr\rangle =\sum _{k_{1},\dots ,k_{p}=0} ^{N-1}\frac{1}{\sqrt{N^{p}}}f_{k_{1},\dots ,k_{p}} \overline{A}^{\otimes p} _{n_{1}+Nn_{2}+\cdots +N^{p-1}n_{p},k_{1}+Nk_{2}+\cdots +N^{p-1}k_{p}}. $$
(3.8)

Proof

The result follows immediately from Propositions 3.1 and 3.3. □

Notations and Identifications

Fix a scale \(N\geq 2\) and some resolution level \(p\). We identify a non-negative integer \(n\leq N^{p}-1\) with its base \(N\) representation \(n=n_{1}+Nn_{2}+\cdots+ N^{p-1}n_{p}\), \(n_{1},\dots ,n_{p}\in \{0,\dots , N-1\}\), \(n\leftrightarrow n_{1}n_{2}\dots n_{p}\). We index vectors in \(\mathbb{C}^{N^{p}}\) by numbers between 0 and \(N^{p}-1\) and identify functions which are constant on the intervals \([k/N^{p},(k+1)/N^{p})\) with their corresponding vector of values.

With these identifications, the formula (3.6) becomes

$$ W_{n}^{A}(x)=\sqrt{N^{p}}A^{\otimes p}_{n,x_{1}x_{2}\dots x_{p}}, $$
(3.9)

and the formula (3.8) becomes a multiplication of the matrix \(\overline{A}^{\otimes p}\) by the vector \(f\):

$$ \bigl( \bigl\langle f , W_{n}^{A} \bigr\rangle \bigr)_{n=0}^{N^{p}-1}=\frac{1}{\sqrt{N ^{p}}}\, \overline{A}^{\otimes p}f. $$
(3.10)
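
A quick numerical sanity check of (3.10) under the above identifications (a Python/NumPy sketch; the \(2\times 2\) matrix is the classic Walsh matrix of (6.1)): the coefficient vector obtained from the matrix product agrees with the directly computed \(L^{2}[0,1]\) inner products.

```python
import numpy as np

N, p = 2, 3
A = np.array([[1, 1], [1, -1]]) / np.sqrt(2)      # the classic Walsh matrix, cf. (6.1)

Ap = A.copy()
for _ in range(p - 1):
    Ap = np.kron(Ap, A)                           # A^{(x) p}

rng = np.random.default_rng(1)
f = rng.standard_normal(N**p)                     # values of a step function on the N^p intervals

coeffs = Ap.conj() @ f / np.sqrt(N**p)            # formula (3.10)

# direct computation: <f, W_n^A> = sum_k f_k * conj(value of W_n^A on interval k) * (1/N^p),
# where W_n^A takes the value sqrt(N^p) * A^{(x) p}[n, k] on the k-th interval (Prop. 3.3)
direct = np.array([np.sum(f * np.conj(np.sqrt(N**p) * Ap[n, :])) / N**p for n in range(N**p)])
print(np.allclose(coeffs, direct))                # True
```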

Proposition 3.5

(Change of Basis Between Generalized Walsh Bases)

Let \(A= (a_{ij})_{i,j=0,1,\dots ,N-1}\) and \(B=(b_{ij})_{i,j=0,1,\dots ,N-1}\) be two \(N\times N\) unitary matrices with constant first row. Then the change of basis matrix from \(\{W_{n}^{A}\}_{n\in \mathbb{N}_{0}}\) to \(\{W_{n}^{B}\}_{n\in \mathbb{N}_{0}}\) is given by the entries:

$$ M_{nm}= \bigl\langle W_{n}^{B} , W_{m}^{A} \bigr\rangle =\bigl(BA^{*} \bigr)_{n _{1}m_{1}}\bigl(BA^{*}\bigr)_{n_{2}m_{2}}\dots \bigl(BA^{*}\bigr)_{n_{p}m_{p}}=\bigl(BA^{*} \bigr)^{ \otimes p}_{nm} \quad (m,n\in \mathbb{N}_{0}), $$
(3.11)

where \(n=n_{1}+Nn_{2}+\cdots \) and \(m=m_{1}+Nm_{2}+\cdots \) are the base \(N\) representations, and \(p\) is chosen such that \(n_{k}=m_{k}=0\) for \(k>p\).

Proof

Using Proposition 3.1, (3.5) we have

$$\begin{aligned} M_{nm} =&\sum_{j_{1},j_{2},\dots ,j_{p}=0}^{N-1}N^{p}b_{n_{1}j_{1}}b _{n_{2}j_{2}}\dots b_{n_{p}j_{p}}\overline{a}_{m_{1}j_{1}} \overline{a}_{m_{2}j_{2}}\dots \overline{a}_{m_{p}j_{p}}\cdot \frac{1}{N^{p}} \\ =& \Biggl( \sum_{j_{1}=0}^{N-1}b_{n_{1}j_{1}} \overline{a}_{m_{1}j_{1}} \Biggr) \Biggl( \sum_{j_{2}=0}^{N-1}b_{n_{2}j_{2}} \overline{a}_{m_{2}j_{2}} \Biggr) \dots \Biggl( \sum _{j_{p}=0}^{N-1}b_{n_{p}j_{p}}\overline{a}_{m_{p}j _{p}} \Biggr) \\ =&\bigl(BA^{*}\bigr)_{n_{1}m_{1}}\bigl(BA^{*} \bigr)_{n_{2}m_{2}}\dots \bigl(BA^{*}\bigr)_{n_{p}m_{p}}. \end{aligned}$$

 □
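
The entries (3.11) can also be verified numerically on the finite-dimensional restriction: identifying level-\(p\) step functions with their value vectors, the matrix of inner products \(\langle W_{n}^{B} , W_{m}^{A}\rangle \) coincides with \((BA^{*})^{\otimes p}\). A hedged Python sketch (the two matrices below are example unitary matrices with constant first row, chosen only for illustration):

```python
import numpy as np

N, p = 3, 2
A = np.array([[1/np.sqrt(3),  1/np.sqrt(3),  1/np.sqrt(3)],
              [0,             1/np.sqrt(2), -1/np.sqrt(2)],
              [-2/np.sqrt(6), 1/np.sqrt(6),  1/np.sqrt(6)]])
B = np.array([[1/np.sqrt(3),  1/np.sqrt(3),  1/np.sqrt(3)],
              [1/np.sqrt(2), -1/np.sqrt(2),  0],
              [1/np.sqrt(6),  1/np.sqrt(6), -2/np.sqrt(6)]])

def tpow(X, p):
    """p-fold tensor power in the paper's convention (equal factors, so np.kron suffices)."""
    M = X.copy()
    for _ in range(p - 1):
        M = np.kron(M, X)
    return M

WA = np.sqrt(N**p) * tpow(A, p)   # row n holds the values of W_n^A on the N^p intervals
WB = np.sqrt(N**p) * tpow(B, p)
M_direct = WB @ WA.conj().T / N**p                       # entries <W_n^B, W_m^A>
print(np.allclose(M_direct, tpow(B @ A.conj().T, p)))    # True, i.e. formula (3.11)
```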

4 A Fast Generalized Walsh Transform

Let \(I_{n}\) be the \(n\times n\) identity matrix. For a function \(f\) which is piecewise constant on intervals of length \(1/N^{p}\), as in Proposition 3.4, finding its coefficients in the generalized basis requires, according to (3.10), a multiplication by the matrix \(\overline{A}^{\otimes p}\). Since this is an \(N^{p}\times N ^{p}\) matrix, each coefficient requires \(N^{p}\) operations (by operation we mean a multiplication and an addition of complex numbers). Since there are \(M:=N^{p}\) coefficients, we need \((N^{p})^{2}\) operations. However, using ideas from [18], we can speed up these computations via a certain factorization, based on the following relations: if \(A,A'\) are \(n\times n\) matrices and \(B,B'\) are \(m\times m\) matrices then

$$ (A\otimes B)\cdot \bigl(A'\otimes B'\bigr)=\bigl(A \cdot A'\bigr)\otimes \bigl(B\cdot B'\bigr), $$

and in particular

$$ I_{n}\otimes \bigl(B\cdot B'\bigr)=(I_{n} \otimes B)\cdot \bigl(I_{n}\otimes B'\bigr). $$

Therefore, we have

$$\begin{aligned} \overline{A}^{\otimes p} =&(\overline{A}\otimes I_{N^{p-1}})\cdot \bigl(I _{N}\otimes \overline{A}^{\otimes (p-1)}\bigr)=(I_{N^{0}} \otimes \overline{A}\otimes I_{N^{p-1}})\\ &{}\cdot \bigl(I_{N}\otimes (\overline{A} \otimes I_{N^{p-2}})\cdot \bigl(I_{N}\otimes \overline{A}^{\otimes (p-2)}\bigr)\bigr)\\ =&(I_{N^{0}}\otimes \overline{A}\otimes I_{N^{p-1}})\cdot (I_{N}\otimes \overline{A}\otimes I_{N^{p-2}})\cdot \bigl(I_{N}\otimes I_{N}\otimes \overline{A}^{\otimes (p-2)} \bigr)\\ =&(I_{N^{0}}\otimes \overline{A}\otimes I_{N^{p-1}})\cdot (I_{N}\otimes \overline{A}\otimes I_{N^{p-2}})\cdot \bigl(I_{N^{2}}\otimes \overline{A} ^{\otimes (p-2)}\bigr)=\cdots \quad \text{by induction} \\ =&\prod_{k=0}^{p-1}(I_{N^{k}}\otimes \overline{A}\otimes I_{N^{p-1-k}}). \end{aligned}$$

Note that each term in the last product is a matrix which has at most \(N\) non-zero entries in each row/column. Thus for each coefficient, each multiplication will require at most \(N\) operations, and since there are \(p\) terms in the product, each coefficient will require \(Np\) operations. For all the \(N^{p}\) coefficients, one needs \(N^{p}\cdot p\cdot N\) operations, and with \(N\) fixed and \(p\) variable, and \(M=N^{p}\) this means \(O(M\log M)\) operations.

More precisely, we have, for \(k=0,\dots ,p-1\), and \(i_{1},j_{1}=0,1, \dots , N^{k}-1\), \(i_{2},j_{2}=0,1,\dots ,N-1\), \(i_{3},j_{3}=0,1, \dots ,N^{p-1-k}-1\),

$$ (I_{N^{k}}\otimes \overline{A}\otimes I_{N^{p-1-k}})_{i_{1}+N^{k}i _{2}+N^{k+1}i_{3}, j_{1}+N^{k}j_{2}+N^{k+1}j_{3}}= \delta _{i_{1},j_{1}} \overline{a}_{i_{2},j_{2}}\delta _{i_{3},j_{3}}. $$

Thus, for a vector \((v^{(k+1)}_{i})_{i=0}^{N^{p}-1}\), if \(v^{(k)}=(I _{N^{k}}\otimes \overline{A}\otimes I_{N^{p-1-k}})v^{(k+1)}\), then for \(i_{1}=0,1,\dots ,N^{k}-1\), \(i_{2}=0,1,\dots , N-1\), \(i_{3}=0,1, \dots ,N^{p-1-k}-1\),

$$ v^{(k)}\bigl(i_{1}+N^{k}i_{2}+N^{k+1}i_{3} \bigr)=\sum_{j_{2}=0}^{N-1} \overline{a}_{i_{2},j_{2}}v^{(k+1)} \bigl(i_{1}+N^{k}j_{2}+N^{k+1}i_{3} \bigr). $$

Using these operations \(p\) times, starting with a vector \(v\), we let \(v^{(p)}=v\) and then \(\overline{A}^{\otimes p}v=v^{(0)}\).
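
The factorization translates directly into code. Below is a hedged Python/NumPy sketch of the fast transform (the paper's implementation is in Maple); each factor \(I_{N^{k}}\otimes \overline{A}\otimes I_{N^{p-1-k}}\) is applied by reshaping the vector so that the digit \(i_{2}\) becomes its own axis, which is exactly the update formula above.

```python
import numpy as np

def fast_generalized_walsh(v, A):
    """Compute conj(A)^{(x) p} v in O(N^p * p * N) operations, using the factorization
    conj(A)^{(x) p} = prod_{k=0}^{p-1} ( I_{N^k} (x) conj(A) (x) I_{N^{p-1-k}} )."""
    N = A.shape[0]
    p = int(round(np.log(len(v)) / np.log(N)))
    assert N**p == len(v), "the length of v must be a power of N"
    v = np.asarray(v, dtype=complex)
    for k in range(p):
        # with i = i1 + N^k*i2 + N^(k+1)*i3, only the digit i2 is transformed by conj(A)
        w = v.reshape(N**(p - 1 - k), N, N**k)    # axes: (i3, i2, i1), row-major layout
        w = np.einsum('ij,ajb->aib', A.conj(), w)
        v = w.reshape(-1)
    return v

# compare against the plain O((N^p)^2) matrix-vector product
N, p = 3, 5
A = np.linalg.qr(np.random.default_rng(2).standard_normal((N, N)))[0]   # any unitary matrix
Ap = A.copy()
for _ in range(p - 1):
    Ap = np.kron(Ap, A)
v = np.random.default_rng(3).standard_normal(N**p)
print(np.allclose(fast_generalized_walsh(v, A), Ap.conj() @ v))         # True
```

The same reshaping trick applies verbatim to the factors \(I_{N^{k}}\otimes (BA^{*})\otimes I_{N^{p-1-k}}\) of the change-of-basis matrix mentioned below.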

Note also that the change of basis between two generalized Walsh bases (see Proposition 3.5) can be performed using a fast algorithm, by writing

$$ \bigl(BA^{*}\bigr)^{\otimes p}=\prod_{k=0}^{p-1} \bigl( I_{N^{k}}\otimes \bigl(BA ^{*}\bigr)\otimes I_{N^{p-1-k}} \bigr) . $$

5 Uncertainty Principles for Walsh Transforms

Let \(G \) be the finite abelian group \(\mathbb{Z}_{m}\) with \(m\) elements, and let \(f:G\rightarrow \mathbf{C}\) be a finite length signal on \(G\), with \(\hat{f}:G\to \mathbf{C}\) its (discrete) Fourier transform. The uncertainty principle in this setup relates the supports of \(f\) and \(\hat{f}\) as follows (see [12]):

$$ |\mathrm{supp}(f)|\cdot |\mathrm{supp}(\hat{f})|\geq m $$
(5.1)

As a simple consequence one obtains

$$ |\mathrm{supp}(f)| + |\mathrm{supp}(\hat{f})|\geq 2\sqrt{ m} $$
(5.2)

These inequalities are useful for signal recovery in the case of sparse signals. By exploiting a property of Chebotarev concerning the minors of the Fourier matrix when \(m\) is prime, inequality (5.2) is vastly improved in [22]: \(|\mathrm{supp}(f)| + |\mathrm{supp}( \hat{f})|\geq m+1 \).

In this section we explore the uncertainty principle (5.1) for the (classic and generalized) Walsh transforms. In the following, we let \(A\) be an \(N\times N\) unitary matrix with constant first row \(1/\sqrt{N}\), and \(p\geq 1\) an integer. We will denote by \(b_{ij}\), \(i,j=0,\ldots, N^{p}-1\), the entries of the matrix \(A^{\otimes p}\). In this set-up, if \(f\in \mathbf{C}^{N^{p}}\) is a signal then \(W_{A}(f)=\sqrt{N^{p}}A^{\otimes p}f\) encodes its ‘frequencies’. Note that \(\|A^{\otimes p}f\|= \|f\|\). Define the generalized Walsh-Fourier transform corresponding to \(A\) by

$$ Tf(k):=\frac{1}{\sqrt{N^{p}}}W_{A}(f) (k) =\bigl[ A^{\otimes p}f \bigr](k) \quad \bigl(k=0,\dots , N^{p}-1\bigr). $$

The next theorem shows that a certain discrepancy occurs between the classic Walsh and generalized Walsh transforms. While the classic Walsh transform satisfies (5.1), some generalized Walsh transforms may satisfy only a weaker version.

Theorem 5.1

Let \(N\), \(p\) and \(A=(a_{ij})_{i,j=0,\dots ,N-1}\) be as above and let \(f\in \mathbf{C}^{N^{p}}\) be a nonzero vector. If \(\alpha >0\) satisfies \(\max_{i,j}|a_{ij}|\leq (\frac{1}{\sqrt{N}})^{\alpha }\) then:

  (i) The following uncertainty principle holds:

    $$ |\mathrm{supp}(f)|\cdot |\mathrm{supp}(Tf)|\geq N^{p\alpha } $$
    (5.3)

  (ii) If \(A\) is the \(2\times 2\) matrix giving rise to the classic Walsh transform then:

    $$ |\mathrm{supp}(f)|\cdot |\mathrm{supp}(Tf)|\geq N^{p} $$
    (5.4)

  (iii) \(\alpha =1\) if and only if \(\sqrt{N}\cdot A\) is a Hadamard matrix, i.e., a matrix with all entries of absolute value one and orthogonal rows/columns.

  (iv) If \(A\) has real entries only and \(N>2\), then necessarily \(\alpha < 1\) unless \(N\) is even and \(A\) is a \(1/\sqrt{N}\) multiple of a Hadamard matrix.

Proof

(i) Using the calculations for \(A^{\otimes p}\) from Sect. 4, we know that its entries satisfy \(|b_{ij}|\leq 1/\sqrt{N ^{p\alpha }} \). Now we essentially follow the proof of (5.1). Using the Cauchy-Schwarz inequality we have:

$$\begin{aligned} \max_{k}|Tf(k)| \leq& \frac{1}{\sqrt{N^{p\alpha }}}\cdot \sum _{j\in \mathrm{supp}(f) }|f(j)|\cdot 1\\ \leq& \frac{1}{\sqrt{N^{p\alpha }}} \bigl( |\mathrm{supp}(f)|\bigr)^{1/2}\biggl(\sum _{j}|f(j)|^{2}\biggr)^{1/2}= \frac{1}{\sqrt{N^{p\alpha }}} \bigl( |\mathrm{supp}(f)|\bigr)^{1/2} \|f\|\\ =& \frac{1}{\sqrt{N^{p\alpha }}} \bigl( |\mathrm{supp}(f)|\bigr)^{1/2}\|A ^{\otimes p}f\|\\ =&\frac{1}{\sqrt{N^{p\alpha }}} \bigl( |\mathrm{supp}(f)|\bigr)^{1/2}\biggl(\sum _{j} |Tf(j)|^{2}\biggr)^{1/2} \\ \leq& \frac{1}{\sqrt{N^{p\alpha }}} \bigl( |\mathrm{supp}(f)|\bigr)^{1/2}\bigl( | \mathrm{supp}(Tf)|\bigr)^{1/2} \max_{k}|Tf(k)| \end{aligned}$$

Retaining the first and last terms and simplifying (\(f\) is nonzero) we get (5.3).

(ii) follows from (i) because the entries of the (unique) \(2\times 2\) matrix giving rise to the classic Walsh system satisfy \(|a_{ij}|=1/ \sqrt{2}\). Hence \(\alpha =1\) is applicable in (i).

(iii) If \(A\) is an \(N\times N\) unitary matrix having constant \(1/\sqrt{N}\) first row then \(\max |a_{ij}|\geq 1/\sqrt{N}\). On the other hand, each row must have norm 1. This means that, in case \(\alpha =1\), all \(|a_{ij}|\) must be equal to \(1/\sqrt{N}\), thus \(\sqrt{N}\cdot A\) is Hadamard.

(iv) follows from (iii) and the fact that, in this case, the entries of \(\sqrt{N}A\) must be \(\pm 1\). □

Using the inequality between the arithmetic and geometric means, we obtain:

Corollary 5.2

$$ |\mathrm{supp}(f)|+ |\mathrm{supp}(Tf)|\geq 2 N^{p\alpha /2} $$
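
For a quick numerical illustration of (5.3), here is a hedged Python sketch: the matrix is the one from (5.14)/(6.2), so \(\alpha \approx 0.369\) as in Example 5.3 below, and the sparse signal is randomly generated.

```python
import numpy as np

N, p = 3, 4
A = np.array([[1/np.sqrt(3),  1/np.sqrt(3),  1/np.sqrt(3)],
              [0,             1/np.sqrt(2), -1/np.sqrt(2)],
              [-2/np.sqrt(6), 1/np.sqrt(6),  1/np.sqrt(6)]])
alpha = np.log(np.abs(A).max()) / np.log(1 / np.sqrt(N))   # max |a_ij| = (1/sqrt(N))^alpha

Ap = A.copy()
for _ in range(p - 1):
    Ap = np.kron(Ap, A)                                    # A^{(x) p}

rng = np.random.default_rng(4)
f = np.zeros(N**p)
f[rng.choice(N**p, size=5, replace=False)] = rng.standard_normal(5)   # a 5-sparse signal
Tf = Ap @ f                                                # the transform T of this section

supp = lambda v: int(np.count_nonzero(np.abs(v) > 1e-12))
print(supp(f) * supp(Tf), ">=", N**(p * alpha))            # (5.3): the product dominates N^(p*alpha)
```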

Example 5.3

If \(f\) is a Dirac signal then equality is attained in (5.4). This follows easily from the form of the matrix \(A^{\otimes p}\) with \(A\) as in (6.1). The Dirac signal also produces equality in (5.3) when \(\sqrt{N}\cdot A\) is the \(N\times N\) Fourier matrix. The inequality (5.3) is not always optimal. For example, with \(A\) as in (6.2) we have \(\max |a_{ij}|=\sqrt{6}/3\) and \(\alpha \approx 0.369\). With \(p=3\) and \(f_{i}=0\) for all \(i\) except \(f_{14}=1\), inequality (5.3) simply says \(8>3.375\).

As in the context of the discrete Fourier transform [12], we exploit uncertainty to show that sparse signals can be uniquely recovered when a few transform coefficients are lost.

Theorem 5.4

With the matrix \(A\) and the constant \(\alpha \) as above, consider a signal \(f\in \mathbf{C}^{N^{p}}\) of which the following are known:

  (i) The number of nonzero components of \(f\), \(N_{f}=|\mathrm{supp}(f)|\).

  (ii) A subset \(B\subseteq \mathrm{supp}(Tf)\) of observed ‘frequencies’, from which we form the signal

    $$ \tilde{f}(k):= \left \{ \textstyle\begin{array}{l@{\quad}l} Tf(k), &\textit{if } k\in B \\ 0 ,&\textit{otherwise} \end{array}\displaystyle \right . $$

  (iii) The number of unobserved ‘frequencies’ \(N_{w}=|B^{c}|\) satisfies

    $$ 2N_{f}N_{w}< N^{p\alpha } $$
    (5.5)

Then \(f\) can be uniquely reconstructed from the data (i), (ii) and (iii).

Proof

We prove uniqueness: if \(g\in \mathbf{C}^{N^{p}}\) is such that \(N_{f}=N_{g}\) and \(\tilde{f}=\tilde{g}\), then the signal \(h:=f-g\) satisfies

$$\begin{aligned} |\mathrm{supp}(h)| \leq &N_{f}+N_{g}=2N_{f}\\ | \mathrm{supp}(Th)| \leq& N_{w}\quad \text{because } Th(k)=0\text{ for } k \in B \end{aligned}$$

It follows that \(|\mathrm{supp}(h)|\cdot |\mathrm{supp}(Th)|\leq 2N_{f}N_{w} < N^{p\alpha }\). This contradicts Theorem 5.1 (inequality (5.3)) unless \(h=0\), hence \(f=g\). □

Remark 5.5

As in [12], reconstruction of \(f\) is possible by solving the min-problem

$$ \min \bigl\{ \| T^{-1}\tilde{f}-v\|: |\mathrm{supp}(v)|\leq N_{f} \bigr\} . $$

The minimum must be attained, and by uniqueness this happens when \(v=f\).

In Theorem 5.1 we must have \(\alpha \leq 1\). In terms of recovery using uncertainty, the best value is \(\alpha =1\); otherwise inequality (5.5) severely restricts the number \(N_{w}\) of frequencies that can be missed or lost in transmission. It would be interesting to find efficient recovery algorithms similar to those in [12]. As mentioned in that paper, solving the min problem by brute force is inefficient.
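
To illustrate the kind of brute-force search meant here, consider the toy Python sketch below (the sizes, matrix and support are illustrative assumptions; the enumeration over candidate supports is exponential and, as noted, inefficient). A 2-sparse signal is recovered from its classic Walsh ‘frequencies’ with one of them lost; the uniqueness argument of Theorem 5.4 guarantees that the first exactly consistent candidate is the original signal.

```python
import numpy as np
from itertools import combinations

N, p = 2, 3
A = np.array([[1, 1], [1, -1]]) / np.sqrt(2)     # classic Walsh, so alpha = 1
T = A.copy()
for _ in range(p - 1):
    T = np.kron(T, A)                            # the transform matrix A^{(x) p}

n = N**p
f = np.zeros(n); f[[1, 6]] = [2.0, -1.0]         # the unknown signal, N_f = 2
Nf, lost = 2, [0]                                # sparsity known; frequency 0 is lost
B = [k for k in range(n) if k not in lost]
assert 2 * Nf * len(lost) < N**p                 # condition (5.5) with alpha = 1

observed = (T @ f)[B]
recovered = None
for S in combinations(range(n), Nf):             # try every candidate support
    sub = T[np.ix_(B, list(S))]
    vS = np.linalg.lstsq(sub, observed, rcond=None)[0]
    if np.allclose(sub @ vS, observed):          # exactly consistent with the observed data
        recovered = np.zeros(n); recovered[list(S)] = vS
        break
print(np.allclose(recovered, f))                 # True
```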

One can generalize Theorem 5.1 by considering unitary matrices without the constant-first-row requirement (which was needed in order to obtain a generalized Walsh basis, see [11]). For a given unitary matrix we introduce its uncertainty constant. As we will see below, this constant satisfies some stability properties with respect to tensor products.

Theorem 5.6

Let \(A\) be an \(n\times n\) unitary matrix. Let \(M:=\max_{i,j}|A(ij)|\). Then

$$ |\mathrm{supp}(f)|\cdot |\mathrm{supp}(Af)|\geq \frac{1}{M^{2}} \quad \bigl(f\in \mathbb{C}^{n}\setminus \{0\}\bigr). $$
(5.6)

Proof

Let \(f\in \mathbb{C}^{n}\setminus \{0\}\). We have, using the Cauchy-Schwarz inequality:

$$\begin{aligned} \max_{k}|Af(k)| =&\max_{k} \biggl\vert \sum_{j} A(kj)f(j) \biggr\vert \leq M \sum _{j\in \mathrm{supp}(f)}|f(j)|\cdot 1\leq M \bigl( | \mathrm{supp(f)}| \bigr) ^{\frac{1}{2}}\|f\|\\ =& M \bigl( |\mathrm{supp}(f)| \bigr) ^{\frac{1}{2}}\|Af\|=M \bigl( | \mathrm{supp}(f)| \bigr) ^{\frac{1}{2}} \biggl( \sum _{j}|Af(j)|^{2} \biggr) ^{\frac{1}{2}} \\ =&M \bigl( |\mathrm{supp}(f)| \bigr) ^{\frac{1}{2}} \bigl( | \mathrm{supp}(Af)| \bigr) ^{\frac{1}{2}}\max_{k} |Af(k)|. \end{aligned}$$

This implies (5.6). □

Definition 5.7

Let \(A\) be an \(n\times n\) unitary matrix. We define the uncertainty constant of \(A\) to be

$$ \mu (A):=\min_{f\in \mathbb{C}^{n}\setminus \{0\}}|\mathrm{supp}(f)| \cdot | \mathrm{supp}(Af)|. $$
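
For small \(n\), \(\mu (A)\) can be computed by brute force, using the elementary observation that a nonzero \(f\) with \(\mathrm{supp}(f)\subseteq S\) and \(\mathrm{supp}(Af)\subseteq T\) exists precisely when the submatrix of \(A\) with rows in the complement of \(T\) and columns in \(S\) has a nontrivial kernel. A hedged Python sketch (exponential in \(n\), meant only for small matrices):

```python
import numpy as np
from itertools import combinations

def mu(A, tol=1e-10):
    """Brute-force uncertainty constant of a unitary matrix A (Definition 5.7).
    A nonzero f with supp(f) in S and supp(Af) in T exists iff rank(A[T^c, S]) < |S|."""
    n = A.shape[0]
    best = n * n
    for s in range(1, n + 1):
        for t in range(1, n + 1):
            if s * t >= best:
                continue
            for S in combinations(range(n), s):
                for T in combinations(range(n), t):
                    Tc = [k for k in range(n) if k not in T]
                    if not Tc or np.linalg.matrix_rank(A[np.ix_(Tc, list(S))], tol=tol) < s:
                        best = s * t
    return best

# the matrix (5.14) of Example 5.12 below has a zero entry, and indeed mu(A) = 2
A = np.array([[1/np.sqrt(3),  1/np.sqrt(3),  1/np.sqrt(3)],
              [0,             1/np.sqrt(2), -1/np.sqrt(2)],
              [-2/np.sqrt(6), 1/np.sqrt(6),  1/np.sqrt(6)]])
print(mu(A))   # 2
```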

Corollary 5.8

For a unitary matrix \(A\), one has

$$ \mu (A)\geq \frac{1}{(\max_{i,j}|A(ij)|)^{2}}. $$
(5.7)

If \(\sqrt{N}A\) is an \(N\times N\) Hadamard matrix, then

$$ \mu (A)=N. $$
(5.8)

Proof

(5.7) follows from Theorem 5.6. If \(\sqrt{N}A\) is Hadamard, then \(\max_{i,j}|A(ij)|=\frac{1}{\sqrt{N}}\) so \(\mu (A) \geq N\). On the other hand \(|\mathrm{supp}(A\delta _{1})|=N\), so \(\mu (A)\leq 1\cdot N\), and (5.8) follows. □

With the relation (5.8), we obtain an interesting corollary about Hadamard matrices.

Corollary 5.9

Let \(\sqrt{N}A\) be an \(N\times N\) Hadamard matrix, assume \(N_{1}\cdot N_{2}< N\) with \(N_{1},N_{2}\in \mathbb{N}\), and let \(n=N-N_{2}\). Then any matrix obtained from \(A\) by selecting \(n\) rows and \(N_{1}\) columns has rank \(N_{1}\).

Proof

Let \(B\) be the matrix obtained from \(\sqrt{N}A\) by taking the rows \(i_{1},\dots ,i_{n}\) and the columns \(j_{1},\dots ,j_{N_{1}}\). Assume that the rank of \(B\) is strictly less than \(N_{1}\). That means the columns of \(B\) are linearly dependent, so there is a nonzero vector \((v_{1},\dots ,v_{N_{1}})\) with \(Bv=0\). Let \(f:=\sum_{k=1}^{N_{1}}v_{k}\delta _{j_{k}}\in \mathbb{C}^{N}\). Then \(Af\) is zero on the components \(i_{1},\dots , i_{n}\). Thus \(|\mathrm{supp}(f)|\leq N_{1}\) and \(|\mathrm{supp}(Af)|\leq N-n=N_{2}\). Then

$$ |\mathrm{supp}(f)|\cdot |\mathrm{supp}(Af)|\leq N_{1}\cdot N_{2}< N= \mu (A). $$

This contradicts (5.8). So \(B\) has rank \(N_{1}\). □

Proposition 5.10

If \(A_{1}\) and \(A_{2}\) are unitary matrices, then

$$ \mu (A_{1}\otimes A_{2})\leq \mu (A_{1})\cdot \mu (A_{2}). $$
(5.9)

In particular, for a unitary matrix \(A\) and \(p\in \mathbb{N}\),

$$ \mu \bigl(A^{\otimes p}\bigr)\leq \mu (A)^{p}. $$
(5.10)

Proof

Let \(A_{1}\) be \(n_{1}\times n_{1}\) and \(A_{2}\) be \(n_{2}\times n_{2}\) unitary matrices. Let \(f_{1}\in \mathbb{C}^{n_{1}}\setminus \{0\}\) and \(f_{2}\in \mathbb{C}^{n_{2}}\setminus \{0\}\) be such that

$$ \mu (A_{1})=|\mathrm{supp}(f_{1})|\cdot | \mathrm{supp}(A_{1}f_{1})|, \qquad \mu (A_{2})=| \mathrm{supp}(f_{2})|\cdot |\mathrm{supp}(A_{2}f _{2})|. $$

We have, for \(0\leq i_{1}\leq n_{1}-1\), \(0\leq i_{2}\leq n_{2}-1\),

$$ (f_{1}\otimes f_{2}) (i_{1}+n_{1}i_{2})=f_{1}(i_{1})f_{2}(i_{2}), $$

so \((f_{1}\otimes f_{2})(i_{1}+n_{1}i_{2})\neq 0\) if and only if \(f_{1}(i_{1})\neq 0\) and \(f_{2}(i_{2})\neq 0\). Therefore

$$ |\mathrm{supp}(f_{1}\otimes f_{2})|=|\mathrm{supp}(f_{1})| \cdot | \mathrm{supp}(f_{2})|. $$

Then

$$\begin{aligned} \mu (A_{1}\otimes A_{2}) \leq& |\mathrm{supp}(f_{1} \otimes f_{2})| \cdot |\mathrm{supp}\bigl( (A_{1}\otimes A_{2}) (f_{1}\otimes f_{2})\bigr)| \\ =&|\mathrm{supp}(f_{1})|\cdot |\mathrm{supp}(A_{1}f_{1})| \cdot | \mathrm{supp}(f_{2})|\cdot |\mathrm{supp}(A_{2}f_{2})|= \mu (A_{1}) \cdot \mu (A_{2}). \end{aligned}$$

The inequality (5.10) follows from (5.9) by induction. □

Corollary 5.11

Let \(A_{1}\) and \(A_{2}\) be unitary matrices and let \(M_{1}:=\max_{i,j}|A_{1}(ij)|\), \(M_{2}:=\max_{i,j}|A_{2}(ij)|\). Then

$$ \frac{1}{M_{1}^{2}M_{2}^{2}}\leq \mu (A_{1}\otimes A_{2})\leq \mu (A _{1})\cdot \mu (A_{2}). $$
(5.11)

In particular, if \(A\) is a unitary matrix and \(M:=\max_{i,j}|A(ij)|\), then, for \(p\geq 1\),

$$ \frac{1}{M^{2p}}\leq \mu \bigl(A^{\otimes p}\bigr)\leq \mu (A)^{p}. $$
(5.12)

If \(\sqrt{N}A\) is an \(N\times N\) Hadamard matrix then

$$ \mu \bigl(A^{\otimes p}\bigr)=N^{p}. $$
(5.13)

Proof

Note that

$$ \max_{i,j}|(A_{1}\otimes A_{2}) (ij)|=\max _{i,j}|A_{1}(ij)|\cdot \max_{i,j}|A_{2}(ij)|. $$

Then the result follows from Theorem 5.6 and Proposition 5.10. The inequality (5.12) follows from (5.11) by induction. If \(\sqrt{N}A\) is a Hadamard matrix then so is \(\sqrt{N^{p}}A^{\otimes p}\), and (5.13) follows from (5.8). □

Example 5.12

Let us investigate the uncertainty constant in dimension 3. Consider a \(3\times 3\) unitary matrix \(A\) with constant first row \(\frac{1}{\sqrt{3}}\).

\(\mu (A)=1\). If \(\mu (A)=1\), this means that there exists \(f\in \mathbb{C}^{3}\) with \(|\mathrm{supp}(f)|=|\mathrm{supp}(Af)|=1\). Then \(f=\delta _{a}\) and \(Af=\delta _{b}\) (up to unimodular constants) for some \(a,b\in \{1,2,3\}\), which means that one of the columns of \(A\) is a multiple of a canonical vector \(\delta _{b}\). Obviously, this cannot happen if the first row is constant, because then one of the columns is a multiple of \((1,0,0)^{T}\) (say the first one), and reading the rows in the other two columns, we get that the vectors \((1,1)\), \((A(22),A(23))\) and \((A(32),A(33))\) are nonzero and mutually orthogonal, which is impossible in dimension 2.

\(\mu (A)=2\). If \(\mu (A)=2\) then (case 1) there exists \(f\in \mathbb{C}^{3}\) with \(|\mathrm{supp}(f)|=1\) and \(|\mathrm{supp}(Af)|=2\), or (case 2) there exists \(f\in \mathbb{C}^{3}\) with \(|\mathrm{supp}(f)|=2\) and \(|\mathrm{supp}(Af)|=1\).

In case 1, \(f=\delta _{i}\) and, by a permutation, we can assume \(f=\delta _{1}\); then \(Af\) (the first column of \(A\)) has exactly one zero entry and, by a permutation, we can assume that it is on the second row. Therefore the matrix has the form

$$ \begin{pmatrix} \frac{1}{\sqrt{3}}&\frac{1}{\sqrt{3}}&\frac{1}{\sqrt{3}} \\ 0&a&b \\ c&d&e \end{pmatrix} . $$

Since the first two rows are orthogonal we get that \(a+b=0\), so \(b=-a\). Multiplying the second row by a scalar of absolute value 1, we can assume \(a\) is real and positive. Since the row has norm 1, we get that \(2a^{2}=1\), so \(a=\frac{1}{\sqrt{2}}\), \(b=-\frac{1}{\sqrt{2}}\). Since the last two rows are orthogonal, we get \(d-e=0\), so \(e=d\). As before, we can assume \(d\) is real and positive. Since row 1 and row 3 are orthogonal, we get \(c+2d=0\), so \(c=-2d\). Since the norm of row 3 is 1, we obtain \(6d^{2}=1\), so \(d=\frac{1}{\sqrt{6}}\) and \(c=-\frac{2}{\sqrt{6}}\). Thus the matrix is

$$ A= \begin{pmatrix} \frac{1}{\sqrt{3}}&\frac{1}{\sqrt{3}}&\frac{1}{\sqrt{3}} \\ 0&\frac{1}{\sqrt{2}}&-\frac{1}{\sqrt{2}} \\ -\frac{2}{\sqrt{6}}&\frac{1}{\sqrt{6}}&\frac{1}{\sqrt{6}} \end{pmatrix} . $$
(5.14)

In case 2, we have \(Af=\delta _{a}\) and \(|\mathrm{supp}(f)|=2\). Then \(|\mathrm{supp}(A^{*}\delta _{a})|=2\), so a row of \(A\) other than the first has a zero entry; after a permutation of columns this is the same configuration as in case 1.

In general, if we take \(f=\delta _{1}\) then \(|\mathrm{supp}(f)| \cdot |\mathrm{supp}(Af)|\leq 3\), so \(\mu (A)\leq 3\). Therefore, for all \(3\times 3\) unitary matrices with constant first row that cannot be obtained from the matrix in (5.14) by permutations of columns, permutations of the last two rows, or multiplications of the last two rows by unimodular constants, the uncertainty constant is \(\mu (A)=3\).

6 Generalized Walsh vs. Classic Walsh and DCT

In this section we take a look at how the generalized Walsh transforms compare statistically to the Walsh and DCT (discrete cosine) transforms. Recall that the classic Walsh transform corresponds to the (unique) \(2\times 2\) unitary matrix \(A\) with constant \(1/\sqrt{2}\) first row:

$$ A:=\left ( \textstyle\begin{array}{c@{\quad}c} \frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}}& -\frac{1}{\sqrt{2}} \end{array}\displaystyle \right ) $$
(6.1)

We will do this analysis on 1-dimensional signals \(X\in \mathbf{R} ^{n}\). We will implement the following compression scheme under a fixed orthogonal transform \(T:\mathbf{R}^{n}\to \mathbf{R}^{n}\), i.e. \(T^{*}T=I\) (a code sketch of the whole scheme follows the list below). This scheme is inspired by the so-called “variance criterion” (see e.g. [2]); however, we do not consider classes of signals and covariance matrices built from expected values. Instead we use the straightforward ‘covariance’ matrices below. In the following, \(v^{t}\) denotes a column vector (the transpose of the row vector \(v\)).

  • Fix an integer \(M< n\);

  • For a column vector \(X=(x_{1},x_{2},\dots ,x_{n})^{t}\) consider \(Y=TX=(y_{1},y_{2},\dots ,y_{n})^{t}\) in \(\mathbf{R}^{n}\);

  • Calculate the matrix \(\mathrm{cov}(Y):=YY^{t}\);

    From the diagonal pick the highest \(M\) ‘variances’ \(\sigma _{i_{1},i _{1}}\), \(\sigma _{i_{2},i_{2}},\ldots,\sigma _{i_{M},i_{M}}\);

    Form the compressed signal \(\tilde{Y}:=(0,\ldots,0,y_{{i_{1}}},0,\ldots,y _{{i_{2}}},\dots ,0, y_{{i_{M}}},0,\ldots,0)\);

  • \(\tilde{X}:=T^{-1}(\tilde{Y})=T^{*}(\tilde{Y})\) is an approximation of \(X\).

  • Graph the (normalized) variance distribution

    $$ k\to \frac{\sigma _{i_{k}, i_{k}}}{\mathrm{Trace}(YY^{t})} $$

    to visualize how efficient \(T\) is when the \(k\) components corresponding to the highest \(k\) variances are kept (and the other \(n-k\) are removed).
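
A compact sketch of this scheme (Python/NumPy; the paper's experiments were done in Maple, and the test signal below is an arbitrary stand-in, not one of the signals used in the examples):

```python
import numpy as np

def compress(X, T, M):
    """Keep the M transform components with the highest 'variances' (the diagonal of
    cov(Y) = Y Y^t, i.e. the squared components of Y = T X) and reconstruct."""
    Y = T @ X
    keep = np.argsort(Y**2)[-M:]                     # indices of the M highest variances
    Y_tilde = np.zeros_like(Y)
    Y_tilde[keep] = Y[keep]
    X_tilde = T.conj().T @ Y_tilde                   # T^{-1} = T^* for an orthogonal T
    err = np.sum((X - X_tilde)**2)                   # squared reconstruction error
    var_dist = np.sort(Y**2)[::-1] / np.sum(Y**2)    # normalized variance distribution
    return X_tilde, err, var_dist

# example: the classic Walsh transform at level p = 6 (T = A^{(x) 6}, a 64 x 64 matrix)
A = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
T = A.copy()
for _ in range(5):
    T = np.kron(T, A)
t = np.linspace(0, 6, 64)
X = np.cos(t) + 0.1 * np.sin(37 * t)                 # an arbitrary test signal
X_tilde, err, var_dist = compress(X, T, M=6)
print(err, var_dist[:6].sum())                       # error and energy packed into 6 components
```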

The reason this scheme should work is the fact that the minimum error is achieved when the transform matrix is made of the eigenvectors of \(\mathrm{cov}(X)\), i.e. \(\mathrm{cov}(Y)=T\mathrm{cov}(X)T^{*}\) is a diagonal matrix with entries the eigenvalues of \(\mathrm{cov}(X)\). We prove this result below by adapting ideas from [2], pp. 201–202. We do not have to deal with the expected value operator in the case of a fixed signal. Still, keeping the highest variances achieves compression. The error function we wish to minimize will be expressed in simple scalar products.

Proposition 6.1

Let \(X\in \mathbf{R}^{n}\) be a fixed vector and \(M< n\) a positive integer. For any orthogonal transform \(T:\mathbf{R}^{n}\to \mathbf{R}^{n}\) consider the vectors \(Y:=TX\) and \(\tilde{Y}:=(y_{1},y_{2},\ldots,y_{M}, 0,\ldots,0)^{t}\), and define the error \(E_{T}:=\|X-T^{*}\tilde{Y}\|^{2}\). Then:

  (i) \(\mathrm{cov}(Y)=T\mathrm{cov}(X)T^{*}\);

  (ii) \(\min \{ E_{T} \mid T^{*}T=I \}\) is attained when \(T^{*}\) is the transform whose columns \(v_{1}^{t}, v_{2}^{t}, \ldots, v_{n}^{t}\) are the eigenvectors of \(\mathrm{cov}(X)\). Hence \(\mathrm{cov}(Y)\) is a diagonal matrix.

Proof

(i) The formula follows easily from the definition \(\mathrm{cov}(Y)=TX(TX)^{t}\) and \(T^{t}=T^{*} \) because \(T\) has real valued entries.

(ii) For an arbitrary \(T\), let \(v_{i}^{t}\) be the column vectors of \(T^{*}\). If \(Y=TX=(y_{1},\dots ,y_{n})^{t}\), then \(X=T^{*}Y=\sum_{i=1} ^{n}y_{i}v_{i}^{t}\). Because \(T\) is orthogonal

$$ E_{T}=\|X-T^{*}\tilde{Y}\|^{2}= \|T^{*}Y-T^{*}\tilde{Y}\|^{2}=\|Y- \tilde{Y} \|^{2}=\sum_{i=M+1}^{n}y_{i}^{2}= \sum_{i=M+1}^{n} \bigl\langle v _{i}^{t} , X \bigr\rangle ^{2}. $$

The minimum is subject to the constraints \(v_{i}v_{i}^{t}=1\) (we discard the orthogonality conditions \(v_{i}v_{j}^{t}=0\) for \(i\neq j\), because, as we will see, the minimum, under these conditions, will be realized for eigenvectors of \(\mathrm{cov}(X)\), which can be chosen to be orthogonal).

With \(\lambda _{i}\) being the Lagrange multipliers we look for critical points of the function

$$ \tilde{E}(v)=\sum_{i=M+1}^{n} \bigl\langle v_{i}^{t} , X \bigr\rangle ^{2}-\sum _{i=M+1}^{n}\lambda _{i} \bigl(v_{i}v_{i}^{t}-1\bigr), $$

in variables \((v_{i})_{i=1,\dots ,n}\).

Note that, for row vectors \(v\), if \(f(v)= \langle v^{t} , X \rangle ^{2}\), then \(\nabla _{v}(f)=2 \langle v^{t} , X \rangle X=2(XX ^{t})v^{t}=2\mathrm{cov}(X)v^{t}\). Also \(\nabla _{v} (vv^{t})=2v^{t}\). Then \(\nabla _{v_{i}}(\tilde{E})=0\) implies \(\mathrm{cov}(X)v_{i}^{t}=XX ^{t}v_{i}^{t}=\lambda _{i} v_{i}^{t}\).

In conclusion, the minimum is obtained when the column vectors of \(T^{*}\) are eigenvectors for \(\mathrm{cov}(X)\), and therefore \(\mathrm{cov}(Y)=T\mathrm{cov}(X)T^{*}\) is a diagonal matrix in the canonical basis. □

By deleting a component \(y_{j}\) from the transformed signal we mean setting \(y_{j}=0\). If \(T^{*}\) diagonalizes \(\mathrm{cov}(X)\), the removal of the \(j\)th component will result in a mean square error increase by the corresponding eigenvalue \(\lambda _{j}\). To achieve compression it makes sense to discard the lowest \(n-M\) eigenvalues. This optimum transform (Karhunen-Loève) is not cheap to implement, however. The implementation of Walsh and other transforms is more suitable (we also analyze the DCT by the same method). For these transforms the covariance matrix has nonzero off-diagonal terms; nevertheless, discarding the lowest values from its diagonal still produces compression, as seen in the following examples.

Example 6.2

We illustrate the above principle for \(T_{1}=A^{\otimes 6}\) and \(T_{2}=B^{\otimes 6}\), with the \(3\times 3\) matrices \(A\) and \(B\) below (when entries contain decimals, as for the matrix \(B\), the matrices involved are ‘almost’ unitary because our Maple code approximates the solutions of the orthogonality equations). The signal \(X\) is a vector of length \(3^{6}\) with values in \([0,1]\) (the vector is a column in a black and white image). We set to zero \(90\%\) of the \(T_{i}X\) components, keeping the 70 components out of 729 that correspond to the highest variances. In Fig. 1, the signal \(X\) is shown with its approximations; the mean square error is 0.84 under the transform \(T_{1}\) and 0.95 under \(T_{2}\). We also graph the variance of both transforms with respect to the number of components (\(x\)-axis). The area under each curve for a given number of components indicates the energy contained in those components: e.g. with respect to the component interval \([1,20]\), the transform \(T_{1}\) performs better than \(T_{2}\).

$$ A:=\left ( \textstyle\begin{array}{c@{\quad }c@{\quad }c} \frac{1}{\sqrt{3}}&\frac{1}{\sqrt{3}}&\frac{1}{\sqrt{3}} \\ 0 & \frac{1}{\sqrt{2}}& -\frac{1}{\sqrt{2}} \\ -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{1}{ \sqrt{6}} \end{array}\displaystyle \right ) , \quad \quad B:=\left ( \textstyle\begin{array}{c@{\quad }c@{\quad }c} \frac{1}{\sqrt{3}}&\frac{1}{\sqrt{3}}&\frac{1}{\sqrt{3}} \\ - 0.2&- 0.58& 0.78 \\ - 0.79& 0.57& 0.22 \end{array}\displaystyle \right ) $$
(6.2)
Fig. 1

Variance distribution (left): the solid curve (\(T_{1}\)) packs more energy than the dashed curve (\(T_{2}\)). Approximation of a signal with \(T_{1}\) (middle) and \(T_{2}\) (right)

Example 6.3

We now compare the DCT, the classic Walsh transform and a generalized Walsh transform based on the 4 by 4 matrix

$$ \left ( \textstyle\begin{array}{c@{\quad }c@{\quad }c@{\quad }c} \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0 \\ 0 & 0 & \frac{1}{\sqrt{2}}&- \frac{1}{\sqrt{2}} \\ \frac{1}{2} & \frac{1}{2} & -\frac{1}{2 }&- \frac{1}{2} \end{array}\displaystyle \right ) $$
(6.3)

In this example we choose a vector of length \(2^{8}\) with high variation, defined as follows:

$$ X(i):=\left \{ \textstyle\begin{array}{l@{\quad}l} \frac{i}{3i+1},&\text{if }i|9 \\ \frac{i}{i+1},&\text{otherwise} \end{array}\displaystyle \right . $$

In Fig. 2 the variance distribution of the transforms (normalized variances in decreasing order of magnitude with respect to transform components) is depicted, in order to check that the best error is obtained for the transform whose variance is higher. We keep \(45\%\) of the \(TX\) components and replace the rest with zeros. The best approximation (see Fig. 3) is given by the DCT with error 0.01, followed by the classic Walsh with error 0.08 and the generalized Walsh with error 0.33.

Fig. 2

Variance distribution: DCT (solid), Walsh (dash) and generalized \(4\times 4\) Walsh (dot)

Fig. 3

Approximation with DCT (left), classic Walsh (middle) and generalized 4 by 4 Walsh (right) with 45% of components

It seems that high-variation signals are treated better by the DCT than by the (generalized) Walsh transforms. However, with the same signal now extended to 729 components \(X_{i}\), \(i=1,\dots,3^{6}\), the generalized Walsh transforms associated with the 3 by 3 matrices \(A\) and \(B\) above produce compression strikingly different from the DCT, see Fig. 4. The errors in recovering \(X\) after removing again \(90\%\) from \(T_{i}X\), with \(T_{1}=A^{\otimes 6}\) and \(T_{2}=B^{\otimes 6}\), are 0.02 and 0.08 respectively. Thus \(T_{1}\) is more efficient than \(T_{2}\); however, both are stronger than the DCT, the classic Walsh and the generalized Walsh transform (associated to the 4 by 4 matrix above) when \(90\%\) of the components \(X_{i}\), \(i=1,\dots,2^{8}\), are zeroed out. In this case the latter transforms produce errors 0.38, 5.25 and 6.43 respectively.

Fig. 4

Approximation of a high-variation signal with two generalized 3 by 3 Walsh transforms. 90% of the components are set to 0. Recovery errors 0.02 and 0.08

The phenomenon in the example above suggests the following question: given the matrix dimension \(N\) and a prescribed signal \(X\) in \(\mathbb{R}^{N^{p}}\), which \(N\times N\) unitary matrix \(A\) will produce a compression-efficient generalized Walsh transform?