1 Introduction

The short-time Fourier transform (STFT) of a signal \(x\in \mathbb {C}^N\) can be interpreted as the Fourier transform of the signal multiplied by a sliding window \(w\in \mathbb {C}^W\)

$$\begin{aligned} Y_{m,r}(x,w) = \sum _{n = 0}^{N-1} x[n] w[rL -n]e^{-2\pi \iota nm/N}, \end{aligned}$$
(1.1)

for \(0\le m\le N-1\) and \(0\le r\le R-1\), where L is the separation between sections, \(R = N/\gcd (N,L)\) is the number of short time sections, and \(w[n] =0\) for \(W \le n \le N-1\). We assume that all signals are periodic, and thus all indices should be considered modulo N.
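As an illustration of (1.1), the following minimal Python sketch computes the periodic STFT with NumPy. The function name `periodic_stft` and the array layout `Y[m, r]` are our own choices for illustration and are not notation used elsewhere in the paper; the window is zero-padded to length N and all indices are reduced modulo N, as assumed above.

```python
from math import gcd

import numpy as np


def periodic_stft(x, w, L):
    """Return Y[m, r] = sum_n x[n] w[rL - n] exp(-2*pi*i*n*m/N), as in (1.1)."""
    x = np.asarray(x, dtype=complex)
    N = x.size
    # Zero-pad the window so that w[n] = 0 for W <= n <= N-1.
    w_pad = np.zeros(N, dtype=complex)
    w_pad[: len(w)] = w
    R = N // gcd(N, L)  # number of distinct short-time sections
    n = np.arange(N)
    Y = np.empty((N, R), dtype=complex)
    for r in range(R):
        # Periodic model: the window index rL - n is reduced modulo N.
        windowed = x * w_pad[(r * L - n) % N]
        # np.fft.fft uses the kernel exp(-2*pi*i*n*m/N), matching (1.1).
        Y[:, r] = np.fft.fft(windowed)
    return Y
```

For instance, with N = 6 and L = 4 the returned array has shape 6 x 3, since R = 6/gcd(6, 4) = 3. Multiplying x by a unimodular constant leaves `np.abs(Y)` unchanged, which is the global phase ambiguity.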

This paper studies the fundamental conditions allowing unique signal recovery—up to unavoidable ambiguities that will be precisely defined later—from the magnitude of its STFT \(|Y_{m,r}(x,w)|\), namely, from its phaseless STFT measurements. In particular, we study two cases: (1) the window function w is known, and (2) the blind case when w is unknown and needs to be recovered simultaneously with the signal x. We prove near-optimal bounds for both cases. For the known-window case, we show that no more than 4N measurements suffice to recover the 2N parameters of \(x\in \mathbb {C}^N\), substantially improving upon previous results [18, 45]. In the blind case, we prove that merely \(\sim 4 N + 2W\) measurements determine the \(2N+2W\) parameters that define the signal and window. As far as we know, this is the first uniqueness result for the blind setup.

Section 2 introduces and discusses the main results of this paper, which are proved in Sect. 3. It should be emphasized that our results concern only the question of uniqueness, and do not imply that practical algorithms can robustly recover the signal with only O(N) measurements; the computational and stability properties of different algorithms were studied in [2, 16, 18, 25, 36, 37, 45, 47]. Nevertheless, in Sect. 4, we show numerical results suggesting that O(N) measurements might suffice for signal recovery when the window is known.

Motivation. The motivation of this paper is twofold. First, phaseless STFT measurements naturally arise in ptychography: a computational method of microscopic imaging, in which the specimen is scanned by a localized beam and Fourier magnitudes of overlapping windows are recorded [23, 41, 46, 50, 56, 57]. The precise structure of the window might be unknown a priori and thus standard algorithms in the field optimize over the signal and the window simultaneously [34, 42, 53]. This paper illustrates the fundamental conditions required for unique recovery in ptychography, regardless of the specific algorithm used. Second, this paper is part of ongoing efforts to unveil the mathematical and algebraic properties standing at the heart of the phase retrieval problem—the problem of recovering a signal from phaseless measurements [7, 10, 12, 30, 51]. Next, we succinctly present some of the main results in the field.

The Phase Retrieval Problem. Phase retrieval is the problem of recovering a signal \(x\in \mathbb {C}^N\) from

$$\begin{aligned} y = |Ax|, \end{aligned}$$
(1.2)

for some sensing matrix \(A\in \mathbb {C}^{M\times N}\), where the absolute value should be understood entry-wise. In some cases, we may also assume prior knowledge on the signal, such as sparsity or known support. The phaseless periodic STFT setup is a special case of (1.2), where the matrix A represents samples of the STFT operator. The first mathematical and statistical works on phase retrieval focused on a random “generic” matrix A, see for example [3, 4, 19, 20, 29, 52]. These works were extended to the coded diffraction model [19, 33], which resembles our model, but the deterministic sliding window is replaced by a set of random masks. Unfortunately, the measurements in practice are not random, and thus this line of work is of theoretical rather than applicable interest.

In recent years, there has been a growing interest in deterministic phase retrieval setups that better describe imaging applications. In particular, the non-periodic phaseless STFT problem with a known window was studied in [13, 38, 43, 44]. This setup differs from our case since out of range indices are set equal to zero, and there are \(\left\lceil (N+W-1)/L \right\rceil \) distinct short-time sections instead of \(R=N/\gcd (N,L)\) in the periodic case. The authors of [38] proved unique recovery with \(\sim N\) samples, and also proposed a convex program to recover the signal. The blind case was studied by two of the authors in [13], who proved that the signal and the window can be recovered, up to trivial ambiguities of dimension L, from \(\sim 10(N+W)\) measurements. In this work, we show that in the periodic case, \(\sim 4N\) and \(\sim 4N+2W\) measurements are enough in the known-window case and blind case, respectively. The continuous STFT setup was studied in [1, 28, 31, 32].

More phase retrieval applications whose fundamental conditions for unique recovery were studied include ultra-short pulse characterization using frequency-resolved optical gating (FROG) [14, 17, 54] or using multi-mode fibers [15, 55, 58], X-ray crystallography (recovering a sparse signal from its Fourier magnitude) [11, 26], recovering a one-dimensional signal from its Fourier magnitude [8, 9, 24, 35], holographic phase retrieval [5, 6], and vectorial phase retrieval [48, 49].

2 Main Results

We begin by stating our result for the known-window case.

Theorem 2.1

(Known window) For a generic known window vector \(w \in \mathbb {C}^W\), a generic vector \(x \in \mathbb {C}^N\) can be recovered, up to a global phase, from

$$\begin{aligned} 2(2W-1) + \left\lceil {{(4 \alpha -1)(N - (W+\alpha ))}\over {\alpha }}\right\rceil \end{aligned}$$

phaseless periodic STFT measurements of step length L, where \(\alpha = \gcd (L,N)\).

Remark 2.2

We say that a condition holds for generic signals x and windows w if the set of signals and windows for which the condition does not hold is defined by polynomial conditions. In particular, the set of pairs \((x,w) \in \mathbb {C}^N \times \mathbb {C}^W\) for which the conclusion of Theorem 2.1 holds is dense and its complement has measure 0. For a precise definition of the term generic, see Definition 3.1.

It is not hard to deduce that Theorem 2.1 implies that the number of required measurements for signal recovery is smaller than

$$\begin{aligned} 4N-\frac{N-W}{\alpha }-2<4N, \end{aligned}$$

while the number of parameters to be recovered is 2N. If N is a prime number, then \(\alpha =1\) (independently of L) and the bound improves to \(\sim 3N+W\). For a long window \(W\approx N\), the bound tends to 4N. Figure 1a presents the bound of Theorem 2.1 for \(N = 100\) as a function of W for various values of L. As can be seen, the curves are bounded by 4N.
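The bound of Theorem 2.1 is easy to tabulate. The sketch below evaluates it with exact integer arithmetic; the function name `known_window_bound` is hypothetical, and the range of W is restricted to \(W + \alpha \le N\), which is our assumption so that the second term is non-negative.

```python
from math import gcd


def known_window_bound(N, W, L):
    """Number of measurements in Theorem 2.1, with an exact integer ceiling."""
    alpha = gcd(L, N)
    numerator = (4 * alpha - 1) * (N - (W + alpha))
    ceil_term = -(-numerator // alpha)  # integer ceiling of numerator / alpha
    return 2 * (2 * W - 1) + ceil_term


# Tabulate the bound for N = 100 and a few step lengths L.
N = 100
table = {
    L: [known_window_bound(N, W, L) for W in range(1, N - gcd(L, N) + 1)]
    for L in (1, 2, 5, 10)
}
```

For \(\alpha = 1\) the value is exactly \(3N + W - 5\); for example, `known_window_bound(100, 10, 1)` evaluates to 305, and every entry of `table` stays below 4N = 400.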

Fig. 1 The bounds of Theorems 2.1 (panel a) and 2.4 (panel b) for \(N=100\) as a function of the window length W, for various values of L

Remark 2.3

Given a vector \(y \in \mathbb {C}^N\), let \(T_\ell y\) denote the cyclically shifted vector defined by \((T_\ell y)[n] = y[n - \ell ]\) with all indices taken modulo N. Likewise, define the modulated vector \(M_my\) by setting \((M_my)[n] = \omega ^{mn} y[n]\), where \(\omega = e^{2\pi \iota /N}\). For a given generic window vector \(w \in \mathbb {C}^W\), the vectors \(f_{m,r} = M_mT_{rL}w\) form an NR-element frame in \(\mathbb {C}^N\) consisting of vectors whose supports all have length W. With this notation, the phaseless STFT measurement \(|Y_{m,r}(x)|\) equals the phaseless frame measurement \(|\langle x,f_{m,r}\rangle |\). Theorem 2.1 implies that a subset of the \(\{f_{m,r}\}\) forms a highly structured frame with fewer than 4N elements for which it is possible to recover a generic vector, up to a global phase, from its phaseless frame measurements. By contrast, [4, Theorem 3.4] implies that if \(M \ge 2N\) then for a generic M-element frame it is possible to recover a generic vector, up to a global phase, from its phaseless frame measurements. Also, note that if \(M \ge 4N-4\) then [22, Theorem 1.1] states that for a generic M-element frame every vector can be recovered, up to a global phase, from its phaseless frame measurements.

Our second result deals with the blind case where the window w is unknown, and therefore there are \(2N+2W\) parameters to be recovered.

Theorem 2.4

(Unknown window) A generic pair \((x, w) \in \mathbb {C}^N \times \mathbb {C}^W\) can be recovered, up to a group of trivial ambiguities of dimension \(\alpha +2\) defined in Proposition 3.4, from at most

$$\begin{aligned} 3(2W-1) + \left\lceil {{(4 \alpha -1)(N -(W +2\alpha ))}\over {\alpha }} \right\rceil \end{aligned}$$

phaseless periodic STFT measurements of step length L, where \(\alpha = \gcd (L,N)\).

Once again, the set of pairs \((x,w) \in \mathbb {C}^N \times \mathbb {C}^W\) for which the conclusion of Theorem 2.4 holds is dense and its complement has measure 0.

Theorem 2.4 shows that the number of measurements is bounded by

$$\begin{aligned} 4N+2W-\frac{N-W}{\alpha } - 3<4N +2W, \end{aligned}$$

exceeding the number of parameters to be recovered by a factor smaller than 2. For \(\alpha =1\), the bound reads \(\sim 3N+3W\): much smaller than \(4N+2W\) for \(W\ll N\), which is the typical situation in ptychography, a chief motivation of this paper. However, in contrast to the known-window case, in the blind case \(\alpha \) has a big impact on the dimensionality of the ambiguity group: the dimension of the ambiguity group is \(\alpha +2\), substantially larger than the dimension one ambiguity in the known-window case. Therefore, if possible, in this case it is preferable to choose a prime N. Figure 1b presents the bound of Theorem 2.4 for \(N = 100\) as a function of W.
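The comparison between the bound of Theorem 2.4 and the number of unknowns can be checked numerically. The following sketch (the names `blind_bound` and `oversampling_ratio` are ours, and the restriction \(W + 2\alpha \le N\) is an assumption made so that the second term is non-negative) evaluates the bound and its ratio to the \(2N+2W\) real parameters.

```python
from math import gcd


def blind_bound(N, W, L):
    """Number of measurements in Theorem 2.4, with an exact integer ceiling."""
    alpha = gcd(L, N)
    numerator = (4 * alpha - 1) * (N - (W + 2 * alpha))
    return 3 * (2 * W - 1) + (-(-numerator // alpha))


def oversampling_ratio(N, W, L):
    """Bound of Theorem 2.4 divided by the 2N + 2W real parameters of (x, w)."""
    return blind_bound(N, W, L) / (2 * N + 2 * W)


# Ratios for N = 100 and several step lengths L.
N = 100
ratios = [
    oversampling_ratio(N, W, L)
    for L in (1, 2, 5, 10)
    for W in range(1, N - 2 * gcd(L, N) + 1)
]
```

For \(\alpha = 1\) the bound equals \(3N + 3W - 9\); for example, `blind_bound(100, 10, 1)` evaluates to 321, and all entries of `ratios` are below 2.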

The proofs of both theorems rest on extensions of technical results proved in [13]. The key point is that \(\sim 4W\) (known window) or \(\sim 6W\) (unknown window) phaseless periodic STFT measurements determine the Fourier intensity functions of short sequences of vectors in \(\mathbb {C}^W\) that satisfy certain polynomial constraints. Using the method of [24, Theorem 5.3], we show that the Fourier phase retrieval problem is solvable for generic vectors satisfying these constraints. Knowledge of these short sequences gives information about some of the entries in the vector x and, in the blind case, fully determines the window. We then use [13, Proposition IV.2] to bound the number of further phaseless STFT measurements needed to fully determine the signal x.

3 Proofs

3.1 Preliminaries

3.1.1 Notation About the Discrete Fourier Transform

In this section we establish some notation about the discrete Fourier transform and Fourier intensity function. For a reference, see [8, 24].

If \(y \in \mathbb {C}^W\) is a vector, let \({\hat{y}}(\omega )=y[0] + y[1] \omega + \cdots + y[W-1] \omega ^{W-1}\) be the polynomial on the unit circle \(\omega = e^{-\iota \theta }\in S^1\). The discrete Fourier transform vector \({\hat{y}}\) is obtained by evaluating this polynomial at the W-th roots of unity; i.e.,

$$\begin{aligned} {\hat{y}} = \left( {\hat{y}}(1), {\hat{y}}(\eta ), \ldots , {\hat{y}}(\eta ^{W-1})\right) , \end{aligned}$$

where \(\eta = e^{-2\pi \iota /W}\).

By abuse of notation, we will sometimes view \(\omega \) as a coordinate on the entire complex plane and then we can speak about the roots of \({\hat{y}}(\omega )\). We typically assume that our vectors satisfy \(y[0], y[W-1] \ne 0\) so the polynomial \({\hat{y}}(\omega )\) will have \(W-1\) (not necessarily distinct) roots in \(\mathbb {C}\). If \((\beta _1, \ldots , \beta _{W-1})\) are the roots of \({\hat{y}}(\omega )\), then we can write

$$\begin{aligned} {\hat{y}}(\omega ) = y[W-1](\omega - \beta _1) \cdots (\omega - \beta _{W-1}). \end{aligned}$$

Given a vector \(y=(y[0], \ldots , y[W-1])\), the Fourier intensity of y is \(A_y(\omega ) = |{\hat{y}}(\omega )|^2\). Expanding out and using the fact that \(\overline{\omega } = \omega ^{-1}\) on the circle \(S^1\), the Fourier intensity function factors as [24]

$$\begin{aligned} A_y(\omega ) = \omega ^{1-W} \overline{y[0]} y[W-1] (\omega - \beta _1)\left( \omega - {1\over {\overline{\beta _1}}}\right) \ldots (\omega - \beta _{W-1})\left( \omega - {1\over {\overline{\beta _{W-1}}}}\right) .\nonumber \\ \end{aligned}$$
(3.1)

(Note that for any complex number \(\beta \), \({1\over {\overline{\beta }}}= {\beta \over {|\beta |^2}}\), a fact we will use extensively.) If \(y'\) is another vector such that \(A_y = A_{y'}\), then the proof of [9, Theorem 3.1] implies

$$\begin{aligned} \hat{y'}(\omega ) = e^{\iota \theta }|y[W-1]| \prod _{i \in I}|\beta _i|\left( \omega - {1\over {\overline{\beta }_i}}\right) \prod _{i \notin I} (\omega - \beta _i), \end{aligned}$$

for some subset \(I \subset [1, W-1]\).
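Both the factorization (3.1) and the root-reflection description of vectors with the same Fourier intensity can be verified numerically. The sketch below is only an illustration under our own naming (`fourier_intensity`, `factored_intensity`, `flip_roots` are hypothetical helpers); it assumes \(y[0], y[W-1] \ne 0\), as in the text, and relies on `np.roots` and `np.poly`, which order coefficients from the highest degree down.

```python
import numpy as np


def fourier_intensity(y, omega):
    """A_y(omega) = |y_hat(omega)|^2 with y_hat(omega) = sum_k y[k] omega^k."""
    return np.abs(np.polyval(y[::-1], omega)) ** 2


def factored_intensity(y, omega):
    """Right-hand side of (3.1), built from the roots beta_i of y_hat."""
    W = len(y)
    beta = np.roots(y[::-1])  # coefficients passed highest degree first
    out = omega ** (1 - W) * np.conj(y[0]) * y[-1]
    for b in beta:
        out = out * (omega - b) * (omega - 1 / np.conj(b))
    return out


def flip_roots(y, I, theta=0.0):
    """Vector y' obtained by reflecting the roots indexed by I: beta -> 1/conj(beta)."""
    beta = np.roots(y[::-1])
    new_roots = beta.copy()
    scale = np.exp(1j * theta) * abs(y[-1])
    for i in I:
        new_roots[i] = 1 / np.conj(beta[i])
        scale *= abs(beta[i])
    # np.poly returns monic coefficients, highest degree first; reverse to y'[0..W-1].
    return scale * np.poly(new_roots)[::-1]
```

Evaluating both intensity functions on points of the unit circle for a random complex y shows that they agree to numerical precision, and that `flip_roots(y, I)` has the same Fourier intensity as y for any index subset I.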

3.1.2 Notation for the STFT Measurements

For our proofs, it is convenient to use the fact that x is periodic and that \(w[n] =0\) for \(W \le n \le N-1\) to rewrite the STFT (1.1) as

$$\begin{aligned} Y_{m,r}(x,w) = \eta _{-m}^{rL}\sum _{n=0}^{N-1} \eta _{m}^n x[rL -n]w[n], \end{aligned}$$
(3.2)

where \(\eta _m:=e^{2\pi \iota m/N}\), so \(\eta _{-m}=e^{-2\pi \iota m/N}\) and \(\eta _m^n=e^{2\pi \iota mn/N}\). Let

$$\begin{aligned} T_{rL}x = (x[rL], x[rL -1], \ldots , x[rL-N+1])\in \mathbb {C}^{N} \end{aligned}$$

be the vector x shifted by rL, with indices taken modulo N, and let \(\circ \) denote the entry-wise product. Thus, for fixed r, the measurements \(\{Y_{m,r}\}_{m=0}^{N-1}\) determine N values of the Fourier transform of the vector \(y_{rL} = T_{rL} x\circ w\). The phaseless STFT measurements \(\{|Y_{m,r}|\}_{m=0}^{N-1}\) give N values of the Fourier intensity function \(A_{y_{rL}}\) of the vector \(y_{rL}\).
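The relation between the STFT measurements and the section vectors \(y_{rL}\) can be checked numerically. The sketch below takes (1.1) as the definition of \(Y_{m,r}\) and compares magnitudes with the DFT of \(y_{rL}\); the helper names are hypothetical. Note that the frequency index appears as m or \(-m\) modulo N depending on the sign convention of the DFT kernel; here `N * np.fft.ifft`, which uses the positive exponent, pairs with the negative exponent in (1.1).

```python
from math import gcd

import numpy as np


def stft_column(x, w_pad, L, r):
    """Column r of (1.1): Y[m] = sum_n x[n] w[rL - n] exp(-2*pi*i*n*m/N)."""
    n = np.arange(x.size)
    return np.fft.fft(x * w_pad[(r * L - n) % x.size])


def section_vector(x, w_pad, L, r):
    """y_{rL}[n] = x[rL - n] * w[n], indices modulo N; zero for n >= W."""
    n = np.arange(x.size)
    return x[(r * L - n) % x.size] * w_pad


def check_sections(N=8, W=3, L=2, seed=0):
    """Compare |Y_{m,r}| with the DFT magnitudes of y_{rL} for every section r."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    w_pad = np.zeros(N, dtype=complex)
    w_pad[:W] = rng.standard_normal(W) + 1j * rng.standard_normal(W)
    ok = True
    for r in range(N // gcd(N, L)):
        y = section_vector(x, w_pad, L, r)
        Y = stft_column(x, w_pad, L, r)
        # N * ifft uses the kernel exp(+2*pi*i*n*m/N).
        ok = ok and bool(np.allclose(np.abs(Y), np.abs(N * np.fft.ifft(y))))
    return ok
```

Calling `check_sections()` for a few choices of (N, W, L) returns `True`, reflecting that each column of phaseless STFT data consists of Fourier magnitudes of a vector supported on W entries.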

3.1.3 Terminology from Algebraic Geometry

Definition 3.1

A property \(\textbf{P}\) holds generically on \(\mathbb {C}^M\) if the set \(Z\subset \mathbb {C}^M\) where property \(\textbf{P}\) does not hold is contained in a subset Y of \(\mathbb {C}^M\) defined by a non-zero polynomial. More generally, if \(X \subset \mathbb {C}^M\) is a subset defined by polynomial equations, then a property \(\textbf{P}\) holds generically on X if the set \(Z \subset X\) where property \(\textbf{P}\) does not hold is contained in a subset of X, which is defined by a polynomial which does not vanish identically on X.

3.2 Proof of Theorem 2.1

Since w and x are generic, we assume that \(w[0], \ldots , w[W-1]\) and \(x[0], \ldots , x[N-1]\) are all non-zero. By applying the action of the group of ambiguities, \(S^1\), we can also assume that x[0]w[0] is real and positive. Since w is fixed and known, in this section we will use the notation \(Y_{m,r}(x)\) instead of \(Y_{m,r}(x,w)\).

Let \(x'\) be a solution to the system of quadratic equations \(\{|Y_{m,r}(x')|^2 = |Y_{m,r}(x)|^2\}\). We will use a recursive method to show that for generic x, there is a unique solution \(x'\) with \(x'[0]w[0]\) real and positive and that \(x'\) can be determined using at most

$$\begin{aligned} 2(2W-1) + \left\lceil (4 \alpha -1) {N - (W+\alpha )\over {\alpha }}\right\rceil < 4N \end{aligned}$$

phaseless STFT measurements. The proof consists of two main stages, outlined below.

3.2.1 Step 1: Determining \(x[\alpha ], x[\alpha -1], \ldots , x[-W+1]\) with \(4W-2\) Phaseless STFT Measurements

Using \(2W-1\) phaseless measurements of the form \(|Y_{m,0}|\) for \(2W-1\) different values of m we can obtain the Fourier intensity function of the vector

$$\begin{aligned} y_0 = T_0x \circ w = (x[0]w[0], x[-1]w[1],\ldots , x[-W+1]w[W-1]). \end{aligned}$$

Likewise, \(2W-1\) phaseless measurements of the form \(|Y_{m,r_1}|\), where \(r_1L \equiv \alpha \bmod N\), determine the Fourier intensity function of the vector

$$\begin{aligned} y_\alpha = T_\alpha x \circ w = (x[\alpha ]w[0], \ldots , x[\alpha -W +1]w[W-1]). \end{aligned}$$

Note that because \(\alpha = \gcd (L,N)\) and \(R = N/\alpha \), there is a unique \(r_1\) with \(0 < r_1 \le R-1\) such that \(r_1L \equiv \alpha \bmod N\). The two vectors \(y_0\) and \(y_\alpha \) are not algebraically independent as they satisfy the linear equations

$$\begin{aligned} w[j+\alpha ] y_{0}[j] = w[j]y_{\alpha }[j+\alpha ], \quad j=0, \ldots , W-1-\alpha . \end{aligned}$$
(3.3)
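The index \(r_1\) and the linear relations (3.3) can be illustrated as follows. This is a sketch under our own conventions: `section_index` and `section` are hypothetical helpers, the congruence for the section index is taken modulo N so that rL and \(j\alpha \) describe the same cyclic shift of x, and the modular inverse uses Python's three-argument `pow` (Python 3.8 or later).

```python
from math import gcd

import numpy as np


def section_index(N, L, j):
    """Return r in [0, R-1] with r*L congruent to j*alpha mod N, alpha = gcd(L, N)."""
    alpha = gcd(L, N)
    R = N // alpha
    if R == 1:
        return 0
    # L/alpha is invertible modulo R because gcd(L/alpha, N/alpha) = 1.
    inverse = pow(L // alpha, -1, R)
    return (j * inverse) % R


def section(x, w, shift):
    """y_shift[j] = x[shift - j] * w[j] for j = 0..W-1, indices of x modulo N."""
    N = len(x)
    return np.array([x[(shift - j) % N] * w[j] for j in range(len(w))])


def relation_33_residual(x, w, alpha):
    """Largest violation of w[j+alpha] y_0[j] = w[j] y_alpha[j+alpha] in (3.3)."""
    W = len(w)
    y0, ya = section(x, w, 0), section(x, w, alpha)
    j = np.arange(W - alpha)
    return np.max(np.abs(w[j + alpha] * y0[j] - w[j] * ya[j + alpha]))
```

For example, with N = 12 and L = 8 we have \(\alpha = 4\), R = 3, and `section_index(12, 8, 1)` returns 2, since \(2 \cdot 8 = 16 \equiv 4 \bmod 12\). For random x and w the residual of (3.3) is zero up to rounding.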

The proof of the following result is somewhat technical and is given in Appendix A. Recall from Sect. 3.1.1 that if \(y \in \mathbb {C}^W\), \(A_y\) denotes the Fourier intensity function \(|{\hat{y}}(\omega )|^2\).

Proposition 3.2

A generic pair of vectors \((y_0, y_\alpha )\) satisfying equations (3.3) is determined, up to a global phase, from the Fourier intensity functions of \(y_0\) and \(y_\alpha \). Precisely, if \((y'_0, y'_\alpha )\) is a pair of vectors satisfying equations (3.3) such that \(A_{y_0} = A_{y'_0}\) and \(A_{y_\alpha } = A_{y'_\alpha }\), then \((y'_0,y'_\alpha ) = e^{\iota \theta }(y_0,y_\alpha )\) for some \(e^{\iota \theta } \in S^1\).

We also need the following lemma.

Lemma 3.3

If all coordinates of w are non-zero, then for any pair \((y_0, y_\alpha )\) satisfying equations (3.3) there exists a vector x such that \((y_0, y_\alpha ) = (x \circ w, T_\alpha x \circ w)\).

Proof

Given \(y_0, y_\alpha \) satisfying (3.3), define a vector x by setting

$$\begin{aligned} x[n] = {\left\{ \begin{array}{ll} y_0[-n]/w[-n]&{} \quad \text {if} \quad -W +1 \le n \le 0, \\ y_\alpha [\alpha -n]/w[\alpha -n]&{} \quad \text {if} \quad 0 < n \le \alpha , \\ \text {arbitrary}&{} \quad \text {else}. \end{array}\right. } \end{aligned}$$

Then, it is easy to check that \((y_0,y_\alpha ) = (x \circ w, T_\alpha x \circ w)\). \(\square \)
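The construction in the proof of Lemma 3.3 can be sketched in a few lines. The code below uses the indexing \(y_0[j] = x[-j]w[j]\) and \(y_\alpha [j] = x[\alpha -j]w[j]\) from the displays above; the helper names are hypothetical, the "arbitrary" entries are set to zero, and we assume \(\alpha \le W\) and \(W + \alpha \le N\) so that the two index ranges do not wrap onto each other.

```python
import numpy as np


def lift_to_signal(y0, ya, w, alpha, N):
    """Build x with (T_0 x o w, T_alpha x o w) = (y0, ya), given relations (3.3)."""
    W = len(w)
    x = np.zeros(N, dtype=complex)  # entries outside [-W+1, alpha] are arbitrary
    for n in range(-W + 1, 1):
        x[n % N] = y0[-n] / w[-n]
    for n in range(1, alpha + 1):
        x[n % N] = ya[alpha - n] / w[alpha - n]
    return x


def section(x, w, shift):
    """y_shift[j] = x[shift - j] * w[j] for j = 0..W-1, indices of x modulo N."""
    N = len(x)
    return np.array([x[(shift - j) % N] * w[j] for j in range(len(w))])


def random_compatible_pair(w, alpha, rng):
    """Draw (y0, ya) satisfying (3.3) without reference to any signal x."""
    W = len(w)
    y0 = rng.standard_normal(W) + 1j * rng.standard_normal(W)
    ya = rng.standard_normal(W) + 1j * rng.standard_normal(W)
    j = np.arange(W - alpha)
    ya[j + alpha] = w[j + alpha] * y0[j] / w[j]  # enforce (3.3)
    return y0, ya
```

Drawing a compatible pair, lifting it with `lift_to_signal`, and recomputing the two sections with `section` reproduces \((y_0, y_\alpha )\) up to rounding.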

Proposition 3.2 and Lemma 3.3 imply that for generic \((x,w)\) the vectors \(x \circ w\) and \(T_\alpha x \circ w\) are uniquely determined, up to a global phase, by \(2(2W-1)\) phaseless STFT measurements of the form \(|Y_{m,0}(x)|\) and \(|Y_{m,r_1}(x)|\). In particular, if \(x'\) is another vector such that \(|Y_{m,0}(x')| = |Y_{m,0}(x)|\) and \(|Y_{m,r_1}(x')| = |Y_{m,r_1}(x)|\) for \(2W-1\) distinct values of m, then \((x' \circ w, T_\alpha x' \circ w) = e^{\iota \theta }(x\circ w, T_\alpha x \circ w)\). By imposing the condition that x[0]w[0] is positive real, we can eliminate the global phase ambiguity and conclude that \((x'\circ w, T_\alpha x' \circ w) = (x\circ w, T_\alpha x \circ w)\). In other words, \((x'[0]w[0], \ldots , x'[-W +1]w[W-1]) = (x[0]w[0], \ldots , x[-W+1]w[W-1])\) and \((x'[\alpha ]w[0], \ldots , x'[\alpha -W +1] w[W-1]) = (x[\alpha ]w[0],\ldots , x[\alpha -W+1]w[W-1])\). Since \(w[0], \ldots , w[W-1]\) are assumed to be non-zero, it follows that \(x'[n] = x[n]\) for \(-W+1 \le n \le \alpha \). Therefore, we conclude that the \(2(2W-1)\) phaseless STFT measurements determine \(W+\alpha \) entries of the signal x, namely, \(x[-W+1], x[-W+2], \ldots , x[0], \ldots , x[\alpha ]\).

3.2.2 Determining the Remaining \(N-(W+\alpha )\) Entries of x using \((4\alpha -1)\left\lceil \frac{N-(W+\alpha )}{\alpha } \right\rceil \) Phaseless STFT Measurements

Consider the vector

$$\begin{aligned} y_{2\alpha } = (x[2\alpha ]w[0], \ldots , x[\alpha +1]w[\alpha -1], x[\alpha ]w[\alpha ], \ldots , x[-W + 2\alpha +1]w[W-1]). \end{aligned}$$

By Step 1 we know the entries \(y_{2\alpha }[n]\) for \(n \in [\alpha , W-1] \subset [0, W-1]\). In particular, all unknown entries of \(y_{2\alpha }\) lie in the subset \(S = [0, \alpha -1]\) of \([0, W-1]\). Hence, by [13, Proposition IV.3, Corollary IV.4], a generic vector \(y_{2\alpha }\) can be recovered from the values of its Fourier intensity function \(A_{y_{2\alpha }}\) at \(2|S-S| -1 + 2|S|\) distinct roots of unity. In our case, \(|S| = |S-S| = \alpha \). Hence, \(y_{2\alpha }\) can be recovered from the value of \(A_{y_{2\alpha }}\) at \(4 \alpha -1\) distinct roots of unity. Now, the phaseless STFT measurements \(|Y_{m,r_2}|\), where \(r_2 L \equiv 2\alpha \bmod N\), are the values of the Fourier intensity function of \(y_{2\alpha }\). Hence, we can recover \(y_{2\alpha }\) from \(|Y_{m,r_2}|\) for \(4\alpha -1\) values of m.

We can now complete the proof by induction. If \(x[-W+1], \ldots , x[j \alpha ]\) are known, then we require \(4 \alpha -1\) phaseless measurements of the form \(|Y_{m,r_{j+1}}|\) to determine the next \(\alpha \) entries \(x[j\alpha +1], \ldots , x[(j+1)\alpha ]\) of x. (Here, \(r_{j+1}L \equiv (j+1)\alpha \bmod N\).) It follows that we can determine all entries of x from (at most)

$$\begin{aligned} 2(2W-1) + \left\lceil \frac{(4\alpha -1)(N-(W+\alpha ))}{\alpha } \right\rceil \end{aligned}$$

phaseless STFT measurements.

3.3 Proof of Theorem 2.4

In this section, we prove that even if the window is not known, we can recover a generic pair \((x,w) \in \mathbb {C}^N \times \mathbb {C}^W\) from \(\sim 4N+2W\) measurements, up to the action of the group G of trivial ambiguities. The strategy of our proof follows the proof of Theorem 2.1. We begin by explicitly defining the group of ambiguities.

3.3.1 The Group of Ambiguities

Let G be the group \(S^1 \times (\mathbb {C}^*)^\alpha \times \mathbb {Z}_R\), where we identify \(\mathbb {Z}_R\) with the group of R-th roots of unity. We define an action of G on \(\mathbb {C}^N \times \mathbb {C}^W\) as follows:

  • \(e^{\iota \theta }\in S^1\) acts by \(e^{\iota \theta }(x,w) = (e^{\iota \theta }x, e^{\iota \theta }w)\).

  • \( \lambda = (\lambda [0], \ldots , \lambda [\alpha -1]) \in (\mathbb {C}^*)^\alpha \) acts on x by

    $$\begin{aligned} \begin{aligned} \left( \lambda [0] x[0], \lambda [\overline{1}]x[1], \ldots , \lambda [\overline{N-1}] x[N-1]\right) , \end{aligned} \end{aligned}$$

    and on w by

    $$\begin{aligned} \begin{aligned} \left( \lambda [0]^{-1} w[0], \lambda [\overline{-1}]^{-1} w[1], \ldots , \lambda [\overline{-W+1}]^{-1}w[W-1]\right) , \end{aligned} \end{aligned}$$

    where \(\overline{j}\) indicates the residue of j modulo \(\alpha \).

  • If \(\omega \) is an R-th root of unity, then \(\omega \) acts by \(\omega (x,w) = (x',w')\), where \(x'[n] = \omega ^{\lfloor n/\alpha \rfloor } x[n]\) and \(w'[n] = \omega ^{\lceil n/\alpha \rceil } w[n]\). Note that since R|N this action is well defined even though our indices are always taken modulo N.

Proposition 3.4

If \(g \in G\) then for all m, r, we have \(|Y_{m,r}(x,w)| =|Y_{m,r}(g(x,w))|\); i.e., the phaseless periodic STFT measurements are invariant under the action of G.

Proof

The action of \(S^1\) on \(\mathbb {C}^N \times \mathbb {C}^W\) clearly preserves the magnitude of the STFT measurements. The STFT measurements are values of the Fourier transform of the vectors \(y_{j\alpha }(x,w) = (x[j\alpha ]w[0], \ldots , x[j\alpha -W +1]w[W-1])\), where \(j\in [0,R-1]\) is defined by the equation \(j \alpha \equiv rL \bmod N\). If \(\lambda = (\lambda [0], \ldots , \lambda [\alpha -1])\), then \(y_{j\alpha }(\lambda (x,w))[n] = \lambda [\overline{j\alpha -n}] \lambda [\overline{-n}]^{-1} x[j\alpha -n]w[n].\) Since \(j\alpha -n \equiv -n \bmod \alpha \), we see that \(y_{j\alpha }(\lambda (x,w))[n] = y_{j\alpha }(x,w)[n]\). In other words, the action of \((\mathbb {C}^*)^\alpha \) preserves the STFT measurements. Finally, if \(\omega ^{R} = 1\) then \(y_{j\alpha }(\omega (x,w)) = \omega ^{j}y_{j\alpha }(x,w)\). Hence, \(y_{j\alpha }(x,w)\) and \(y_{j \alpha }(\omega (x,w))\) have the same Fourier intensity functions. \(\square \)
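Proposition 3.4 can also be checked numerically. The following sketch implements the three factors of G exactly as described in Sect. 3.3.1 and compares phaseless STFT data before and after the action; all function names are hypothetical, and the STFT is computed directly from (1.1).

```python
from math import gcd

import numpy as np


def stft_magnitudes(x, w, L):
    """|Y_{m,r}(x, w)| from (1.1), returned as an N x R array."""
    N = len(x)
    w_pad = np.zeros(N, dtype=complex)
    w_pad[: len(w)] = w
    n = np.arange(N)
    R = N // gcd(N, L)
    cols = [np.fft.fft(x * w_pad[(r * L - n) % N]) for r in range(R)]
    return np.abs(np.array(cols)).T


def act_phase(x, w, theta):
    """S^1 factor: multiply both x and w by exp(i*theta)."""
    return np.exp(1j * theta) * x, np.exp(1j * theta) * w


def act_torus(x, w, lam):
    """(C^*)^alpha factor: x[n] -> lam[n mod alpha] x[n], w[n] -> w[n] / lam[-n mod alpha]."""
    alpha = len(lam)
    nx, nw = np.arange(len(x)), np.arange(len(w))
    return lam[nx % alpha] * x, w / lam[(-nw) % alpha]


def act_root(x, w, alpha, k):
    """Z_R factor with omega = exp(2*pi*i*k/R): floor exponent on x, ceiling on w."""
    R = len(x) // alpha
    omega = np.exp(2j * np.pi * k / R)
    nx, nw = np.arange(len(x)), np.arange(len(w))
    return omega ** (nx // alpha) * x, omega ** (-(-nw // alpha)) * w
```

With, say, N = 12, L = 8 (so \(\alpha = 4\), R = 3) and W = 6, applying any of the three actions to a random pair (x, w) leaves `stft_magnitudes` unchanged up to rounding.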

3.3.2 Strategy of the Proof of Theorem 2.4

Our goal is to prove that for generic \((x,w)\), if \(|Y_{m,r}(x',w')| = |Y_{m,r}(x,w)|\) then \((x',w')\) is related to \((x,w)\) by the action of the ambiguity group G. Moreover, we will show that we can determine \((x', w')\) using at most

$$\begin{aligned} 3(2W-1) + \left\lceil {{(4 \alpha -1)(N -(W +2\alpha ))}\over {\alpha }} \right\rceil \end{aligned}$$

STFT measurements.

To begin, by applying the \(S^1 \times (\mathbb {C}^*)^\alpha \) factor in G, we may assume that \(w[0], \ldots , w[\alpha -1]\) are known (for example, we can assume that they are all equal to 1) and that x[0] is positive real. Hence, our goal is to show that if \((x',w')\) is a solution for \(|Y_{m,r}(x', w')| = |Y_{m,r}(x,w)|\) with \(x'[0]\) positive real and \(w'[0] = \cdots = w'[\alpha -1] =1\), then \((x',w')\) is obtained from \((x,w)\) by the action of the group of R-th roots of unity.

3.3.3 Recovery of \(y_0, y_\alpha , y_{-\alpha }\), up to a Phase, from \(3(2W-1)\) Measurements

Consider the three vectors

  1. \(y_{-\alpha }:=(x[-\alpha ]w[0],\ldots ,x[-\alpha -(W-1)]w[W-1])\);

  2. \(y_{0}:=(x[0]w[0],\ldots ,x[-(W-1)]w[W-1])\);

  3. \(y_{\alpha }:=(x[\alpha ]w[0],\ldots , x[0]w[\alpha ], \ldots , x[\alpha -(W-1)]w[W-1])\).

The \(3(2W-1)\) phaseless measurements of the form \(|Y_{m,0}(x,w)|, |Y_{m,r_1}(x,w)|, |Y_{m,r_{-1}}(x,w)|\), for \(2W-1\) distinct values of m, determine the Fourier intensity functions \(A_{y_0}, A_{y_\alpha }, A_{y_{-\alpha }}\), respectively. Here \(r_1, r_{-1} \in [0,R-1]\) are defined by the condition that \(\alpha \equiv r_1 L \bmod N\) and \(-\alpha \equiv r_{-1} L \bmod N\).

The triple \((y_{0},y_{-\alpha },y_{\alpha })\) satisfies the quadratic relations

$$\begin{aligned} y_{-\alpha }[\ell ]y_{\alpha }[\ell + \alpha ] = y_{0}[\ell ]y_{0}[\ell +\alpha ], \quad \ell =0,\ldots ,W-1- \alpha . \end{aligned}$$
(3.4)
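The quadratic relations (3.4) follow by writing each entry as a product \(x[\cdot ]w[\cdot ]\), and they can be confirmed numerically. The sketch below (with hypothetical helper names) forms the three section vectors from a random pair (x, w) and reports the largest violation of (3.4).

```python
import numpy as np


def section(x, w, shift):
    """y_shift[j] = x[shift - j] * w[j] for j = 0..W-1, indices of x modulo N."""
    N = len(x)
    return np.array([x[(shift - j) % N] * w[j] for j in range(len(w))])


def relation_34_residual(x, w, alpha):
    """Largest violation of y_{-a}[l] y_a[l+a] = y_0[l] y_0[l+a], l = 0..W-1-a."""
    W = len(w)
    y_minus = section(x, w, -alpha)
    y_zero = section(x, w, 0)
    y_plus = section(x, w, alpha)
    ell = np.arange(W - alpha)
    lhs = y_minus[ell] * y_plus[ell + alpha]
    rhs = y_zero[ell] * y_zero[ell + alpha]
    return np.max(np.abs(lhs - rhs))
```

Both sides of (3.4) equal \(x[-\alpha -\ell ]\,x[-\ell ]\,w[\ell ]\,w[\ell +\alpha ]\), so the residual is zero up to floating-point rounding for any choice of x and w.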

By construction, the map \(\Phi :\mathbb {C}^{N}\times \mathbb {C}^{W}\rightarrow \mathbb {C}^{W}\times \mathbb {C}^{W}\times \mathbb {C}^{W}\), \(\Phi (x,w)= (y_{0},y_{-\alpha }, y_{\alpha })\), has image contained in the algebraic subset of \((\mathbb {C}^W)^3\) defined by equations (3.4). Let Z be the closure of the image. The following proposition is proved in Appendix 1.

Proposition 3.5

For generic \((z_0,z_\alpha , z_{-\alpha }) \in Z \subset (\mathbb {C}^W)^3\), if \((z'_0, z'_\alpha , z'_{-\alpha }) \in Z\) have the same Fourier intensity functions as \((z_0,z_\alpha , z_{-\alpha })\), then there are angles \(\theta _0, \theta _\alpha \) such that \(z'_0 = e^{\iota \theta _0}z_0\), \(z'_\alpha = e^{\iota (\theta _0 + \theta _{\alpha })}z_\alpha \), and \(z'_{-\alpha } = e^{\iota (\theta _0- \theta _\alpha )} z_{-\alpha }\).

Applying the action of the subgroup \(S^1 \times (\mathbb {C}^*)^\alpha \) of the ambiguity group G we may assume that x[0] is real and positive and that

$$\begin{aligned} w[0] =\ldots = w[\alpha -1]=1. \end{aligned}$$

It then follows from Proposition 3.5 that if \((x',w')\) is a pair such that \(|Y_{m,r}(x',w')| = |Y_{m,r}(x,w)|\) for \(2W-1\) distinct values of m for \(r = 0, r_1, r_{-1},\) then we may assume that

$$\begin{aligned} y_0(x',w')&= y_0(x,w),\\ y_\alpha (x',w')&= e^{\iota \theta _\alpha }y_\alpha (x,w),\\ y_{-\alpha }(x',w')&= e^{-\iota \theta _\alpha }y_{-\alpha }(x,w), \end{aligned}$$

and

$$\begin{aligned} w'[0] = w[0],\ldots , w'[\alpha -1] =w[\alpha -1]. \end{aligned}$$

It follows that \(x'[-\ell ]=x[-\ell ]\) for \(\ell =0,\ldots ,\alpha -1\). The equality \(y_{\alpha }(x',w')[\alpha +\ell ] = e^{\iota \theta _\alpha } y_{\alpha }(x,w)[\alpha +\ell ]\) implies

$$\begin{aligned} w'[\alpha +\ell ]=e^{\iota \theta _{\alpha }} w[\alpha +\ell ]. \end{aligned}$$

Since \(y_{0}(x',w')[\alpha +\ell ]=y_{0}(x,w)[\alpha +\ell ]\), we conclude that

$$\begin{aligned} x'[-\alpha -\ell ]=e^{-\iota \theta _{\alpha }} x[-\alpha -\ell ]. \end{aligned}$$

The equality \(y_{\alpha }(x',w')[2\alpha +\ell ]=e^{\iota \theta _\alpha } y_{\alpha }(x,w)[2\alpha +\ell ]\), then implies that

$$\begin{aligned} w'[2\alpha +\ell ]=e^{2\iota \theta _{\alpha }}w[2\alpha +\ell ]. \end{aligned}$$

Going back to \(y_{0}(x',w')[2\alpha +\ell ]\) and \(y_{0}(x,w)[2\alpha +\ell ]\), we deduce that

$$\begin{aligned} x'[-2\alpha -\ell ]=e^{-2\iota \theta _{\alpha }}x[-2\alpha -\ell ]. \end{aligned}$$

This procedure goes on. In the end, we conclude that

$$\begin{aligned} w'[n]=e^{\iota \lfloor n/\alpha \rfloor \theta _{\alpha }}w[n], \end{aligned}$$

for \(n = 0, \ldots W-1\) and

$$\begin{aligned} x'[m]= e^{\iota \lceil m/\alpha \rceil \theta _{\alpha }} x[m], \end{aligned}$$

for the \(W+2\alpha \) values \(m = \alpha , \alpha -1, \ldots , 0, \ldots , -(W-1+\alpha )\).

3.3.4 Determining the Other Values of x[n]

We can now proceed recursively to compute x[n] for \(n \notin [-W+1-\alpha , \alpha ]\). Consider the vector

$$\begin{aligned} y_{2\alpha }(x,w) = (x[2\alpha ]w[0], x[2\alpha -1]w[1], \ldots , x[\alpha ]w[\alpha ], \ldots , x[2\alpha -W +1]w[W-1]). \end{aligned}$$

By our first step, we know the last \(W-\alpha \) entries of \(y_{2\alpha }\) up to the unknown common phase \(e^{2\iota \theta _\alpha }\). Precisely, \(y_{2\alpha }(x',w')[n] = e^{2 \iota \theta _\alpha } y_{2\alpha }(x,w)[n]\) for \(n \ge \alpha \). In particular, we know the last \(W-\alpha \) entries of the vector \(z_{2\alpha } = y_{2\alpha }(x',w')/(x'[\alpha ]w'[\alpha ])\). (Note that we assume that \(x[\alpha ]w[\alpha ]\) is non-zero.) Also, since \(|x[\alpha ]w[\alpha ]|\) is known, the normalized phaseless STFT measurements \(|Y_{m,r_2}(x,w)|/|x[\alpha ]w[\alpha ]|\) determine values of the Fourier intensity function \(A_{z_{2\alpha }}\) of \(z_{2\alpha }\). By [13, Corollary IV.3], the vector \(z_{2\alpha }\) can be determined from \(4 \alpha -1\) phaseless measurements. It follows that for \(0\le \ell \le \alpha -1\), \(x'[2\alpha -\ell ]w'[\ell ] = e^{ 2\iota \theta _\alpha } x[2\alpha -\ell ]w[\ell ]\). Since we have assumed that \(w'[\ell ] = w[\ell ] =1\) for \(0 \le \ell \le \alpha -1\), we deduce that \(x'[2\alpha - \ell ] = e^{2\iota \theta _\alpha }x[2\alpha -\ell ]\) for \(0 \le \ell \le \alpha -1\).

We can now continue by recursion, using \(4\alpha -1\) phaseless STFT measurements at each step, to determine that \(y_{j \alpha }(x',w') = e^{\iota j \theta _\alpha }y_{j\alpha }(x,w)\) for \(j = 3,\ldots , \lceil (N-W-2\alpha )/\alpha \rceil \). This in turn implies that \(x'[n] = e^{\iota \lceil n/\alpha \rceil \theta _\alpha }x[n]\). However, since our indexing is taken modulo N, \(x[-n] = x[N-n]\) so that \(e^{\iota \lceil -n/\alpha \rceil \theta _\alpha } = e^{\iota \lceil (N-n)/\alpha \rceil \theta _{\alpha }}\). Recalling that \(N = R \alpha \) we see that this condition is equivalent to the condition that \(R \theta _\alpha \equiv 0 \bmod 2 \pi \); i.e., \(e^{\iota \theta _\alpha }\) is an R-th root of unity. Hence, \((x',w')\) is equivalent to \((x,w)\) under the action of the ambiguity group G, as desired.

4 Numerical Experiments

We conducted numerical experiments to examine the bound of Theorem 2.1. To recover the signal from samples of its phaseless STFT measurements, we used the relaxed-reflect-reflect (RRR) algorithm, whose \((t+1)\)st iteration reads

$$\begin{aligned} y^{t+1} = y^{t} + \beta (P_1(2P_2(y^{t})-y^{t})-P_2(y^{t})), \end{aligned}$$
(4.1)

where \(P_1\) and \(P_2\) are projection operators, and \(\beta \) is a parameter; we set \(\beta =1/2.\) RRR is a general computational framework for constraint satisfaction problems, such as phase retrieval, graph coloring, sudoku, and protein folding [26, 27]. In our setting, the algorithm aims to estimate the full \(N^2\) STFT entries (with phases) from a random subset of its magnitudes. The full STFT uniquely determines the corresponding signal. In particular, in our setting, the first projection, \(P_1\), is the orthogonal projector onto the subspace of matrices which are the STFT of some signal. Namely,

$$\begin{aligned} P_1 = AA^\dagger , \end{aligned}$$
(4.2)

where A is the STFT operator as a matrix, and \(A^\dagger \) is its pseudo-inverse. The second projection, \(P_2\), uses the measured data and acts by

$$\begin{aligned} (P_2z)[i] = {\left\{ \begin{array}{ll} \text {sign}(z[i])|y[i]|,&{} \quad i\in M, \\ z[i],&{} \quad i\notin M, \end{array}\right. } \end{aligned}$$
(4.3)

where M denotes the set of STFT entries for which the magnitudes are known (the measurements), |y[i]| is the ith STFT magnitude, and \(\text {sign}(z[i]):=\frac{z[i]}{|z[i]|}\).
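The iteration (4.1) with the projections (4.2) and (4.3) can be sketched as follows. This is a minimal illustration rather than the code used for the experiments: the function names, the row ordering of the matrix, the convention that a zero entry is assigned phase 1, and the final read-out of the signal through the pseudo-inverse are all our assumptions.

```python
from math import gcd

import numpy as np


def stft_matrix(w, N, L):
    """Matrix A whose rows give Y_{m,r} of (1.1); the row index is r*N + m."""
    w_pad = np.zeros(N, dtype=complex)
    w_pad[: len(w)] = w
    n = np.arange(N)
    R = N // gcd(N, L)
    rows = [
        w_pad[(r * L - n) % N] * np.exp(-2j * np.pi * n * m / N)
        for r in range(R)
        for m in range(N)
    ]
    return np.array(rows)


def make_projections(A, magnitudes, mask):
    """Return (P1, P2): P1 = A A^+ as in (4.2), P2 as in (4.3); mask is boolean."""
    A_pinv = np.linalg.pinv(A)

    def P1(z):
        return A @ (A_pinv @ z)

    def P2(z):
        out = z.copy()
        absz = np.abs(z[mask])
        safe = np.where(absz > 0, absz, 1.0)
        # sign(z) = z/|z|; a zero entry is given phase 1 (our convention).
        sign = np.where(absz > 0, z[mask] / safe, 1.0)
        out[mask] = sign * magnitudes[mask]
        return out

    return P1, P2


def rrr_step(y, P1, P2, beta=0.5):
    """One RRR update (4.1)."""
    p2 = P2(y)
    return y + beta * (P1(2 * p2 - y) - p2)


def rrr(A, magnitudes, mask, beta=0.5, tol=1e-8, max_iter=10**4, seed=0):
    """Iterate (4.1) from a random start; return (signal estimate, iterations)."""
    P1, P2 = make_projections(A, magnitudes, mask)
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(A.shape[0]) + 1j * rng.standard_normal(A.shape[0])
    for t in range(1, max_iter + 1):
        y_next = rrr_step(y, P1, P2, beta)
        converged = np.linalg.norm(y_next - y) < tol * np.linalg.norm(y)
        y = y_next
        if converged:
            break
    # Read the signal off the data-consistent point P2(y) via the pseudo-inverse.
    return np.linalg.pinv(A) @ P2(y), t
```

A quick sanity check of the sketch is that the true STFT \(y = Ax\) is a fixed point of `rrr_step`: it lies in the range of A and already has the measured magnitudes, so both projections leave it unchanged. Convergence from a random start is not guaranteed by this check.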

We use RRR since it is guaranteed to halt only when both constraints are satisfied [39, Corollary 4]. Therefore, we expect (although this is not guaranteed) to find a point whose phaseless STFT matches the measurements after enough RRR iterations. The number of iterations required to find such a point provides a measure of hardness [26]. In our experiments, we stopped the algorithm when the ratio \(||y^{t+1}-y^{t}||/||y^{t}||\) dropped below \(10^{-8}\), or after a maximum of \(10^4\) iterations. We did not conduct experiments for the blind case (Theorem 2.4) since, as far as we know, there is no algorithm that is guaranteed to find a feasible point.

In our experiments, we set \(N=11\) and collected KN STFT magnitudes; the entries were chosen uniformly at random for \(K=2,4,6,8.\) The entries of the real underlying signal were drawn from a Gaussian distribution with mean zero and variance 1. Note that since the signal is real, the number of parameters to be recovered is N, and not 2N as in Theorem 2.1. The entries of the window were drawn from the same distribution. For each K, we conducted 100 trials for each pair \((L,W)\), where \(L=1,\ldots ,6\) and \(W=1,\ldots ,11.\) We declared a trial successful if the relative error between the estimated signal and the underlying signal (up to a sign) dropped below \(10^{-4}\).
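The setup of a single trial and the success criterion can be sketched as follows. This is an illustration under assumptions, not the experimental code: the helper names are hypothetical, the STFT (1.1) is evaluated with one FFT per section, and the KN measured entries are drawn without replacement.

```python
import numpy as np

def periodic_stft(x, w, L):
    """Periodic STFT (1.1) as an R x N array; row r is the DFT of x[n] * w[rL - n]."""
    N = len(x)
    R = N // np.gcd(N, L)
    w_pad = np.zeros(N, dtype=complex)
    w_pad[:len(w)] = w
    n = np.arange(N)
    return np.array([np.fft.fft(x * w_pad[(r * L - n) % N]) for r in range(R)])

def draw_trial(N, W, L, K, rng):
    """Draw a real Gaussian signal and window and a random set of K*N measured entries."""
    x = rng.standard_normal(N)
    w = rng.standard_normal(W)
    Y = periodic_stft(x, w, L)
    # Assumption: sample K*N distinct entries (requires K*N <= number of STFT entries).
    idx = rng.choice(Y.size, size=K * N, replace=False)
    mask = np.zeros(Y.size, dtype=bool)
    mask[idx] = True
    return x, w, np.abs(Y).ravel(), mask

def relative_error_up_to_sign(x_est, x):
    """Relative error between real signals, minimized over the global sign."""
    err = min(np.linalg.norm(x_est - x), np.linalg.norm(x_est + x))
    return err / np.linalg.norm(x)
```

A trial would then be declared successful when `relative_error_up_to_sign` falls below \(10^{-4}\). Only the entries of the magnitude vector selected by `mask` are treated as measurements.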

Figure 2 reports the success rate and the average number of RRR iterations for each triplet \((K,L,W)\). As expected, the success rate increases with K. For \(K=2\) (2N STFT magnitudes), we can see that for \(L\le 5\) and large enough W, RRR usually does not require many iterations, but it does not always find a solution. Nevertheless, the success rate is not negligible. For \(K=6\) and \(K=8\), the success rate tends to 1 for \(L\le 5.\) As can be seen, the true solution is found after a small number of iterations, indicating that the problem is rather easy in this regime. Overall, these experiments indicate that a signal can indeed be recovered from a subset of its phaseless STFT magnitudes, and in some cases, quite easily.

Fig. 2
The success rate (left column) and average number of iterations (right column) for recovering a signal from KN of its phaseless STFT measurements for \(K=2,4,6,8\)

5 Orbit Frame Phase Retrieval

The periodic STFT phase retrieval problem leads to a natural mathematical generalization which we refer to as phase retrieval for orbit frames. Let H be a compact group acting on \(\mathbb {C}^N\). The orbit of a possibly unknown generating kernel \(u \in \mathbb {C}^N\) is the set \(\{hu| h \in H\}\). An orbit frame is a matrix \(A\in \mathbb {C}^{M \times N}\) \((M\ge N)\) of rank N whose rows are samples of the vectors in this orbit. The phase retrieval problem for an orbit frame is to determine whether a vector x can be recovered, up to symmetries, from the phaseless measurements \(|Ax| \in \mathbb {R}_{\ge 0}^M\).

The definition of orbit frames is broad, and our main focus for future work is the case where the group H is of the form \(G \times \mathbb {T}\), where \(\mathbb {T}\) is a subgroup of \(S^1\) acting on \(\mathbb {C}^N\) with weights \((0,1, \ldots , N-1)\), and G is a finite group. In this model, our phaseless frame measurements on a vector x are samples of the Fourier intensity functions \(|{\widehat{D_1x}}(\omega )|^2, \ldots , |{\widehat{D_rx}}(\omega )|^2\), where \(D_1, \ldots , D_r\) are diagonal matrices obtained from the action of the group G on the kernel vector u, and \(\widehat{D_1x}, \ldots , \widehat{D_rx}\) are the Fourier transforms of \({D_1x}, \ldots , {D_rx}\). In particular, the periodic STFT model can be thought of as a special case, where \(\mathbb {T}= \mathbb {Z}_N\) is the group of N-th roots of unity, \(G = \mathbb {Z}_N\) is the group of cyclic translations, and the kernel \(u=w\) has support length W. (When the kernel u is arbitrary, this is a Gabor frame; perfect phase retrieval for full Gabor frames was studied in [18].) The diagonal matrices \(D_1, \ldots , D_r\) are \({{\,\textrm{diag}\,}}(w), {{\,\textrm{diag}\,}}(T_{L}w), \ldots , {{\,\textrm{diag}\,}}(T_{L(R-1)}w)\), where \(T_L\) is the translation operator shifting the entries of w by L entries. The phaseless periodic STFT measurements are obtained by sampling the functions \(|{\widehat{D_jx}}(\omega )|^2\) at the N-th roots of unity.
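The diagonal-matrix description of the periodic STFT can be checked numerically. The sketch below is an illustration with a hypothetical function name; it assumes the indexing convention of (1.1), under which the diagonal of the r-th matrix is \(n \mapsto w[rL-n]\), that is, a cyclic translate of the reflected, zero-padded window.

```python
import numpy as np

def orbit_frame_intensities(x, w, L):
    """Intensities |DFT(D_r x)|^2 for r = 0, ..., R-1, sampled at the N-th roots of unity."""
    N = len(x)
    R = N // np.gcd(N, L)
    u = np.zeros(N, dtype=complex)
    u[:len(w)] = w                                # kernel u = w, zero-padded to length N
    n = np.arange(N)
    rows = []
    for r in range(R):
        D_r = np.diag(u[(r * L - n) % N])         # diagonal matrix built from a translate of u
        rows.append(np.abs(np.fft.fft(D_r @ x)) ** 2)
    return np.array(rows)
```

Row r of the output agrees with \(|Y_{m,r}(x,w)|^2\) from (1.1), which can be confirmed by evaluating the sum in (1.1) directly for a few index pairs.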

The orbit frame phase retrieval problem has been previously studied by a number of authors [18, 21, 40, 45] with the main focus being on constructing large frames, typically of size \(M=O(N^2)\), which admit perfect reconstruction from phaseless measurements. As in this paper, we wish to construct smaller frames, of size O(N), for which generic vectors can be recovered from phaseless measurements. Although this problem is mathematically motivated, understanding the information-theoretic limits of the general model has the potential to inspire physicists and engineers to develop new measurement techniques.