
Introduction

The problem of identifying a time-variant communication channel arose in the 1950s, as secure long-range wireless communication became increasingly important due to the geopolitical situation at the time. Some of the theoretical and practical advances made then are described in this paper, and more recent advances that extend the theory to more general operators and place it on a more rigorous mathematical footing, known as sampling of operators, are developed here as well.

The launching point for the theory of operator sampling is the early work of the third-named author in his Master’s thesis at MIT, entitled “Sampling models for linear time-variant filters” [19] (see also [22, 23] and [21]), in which he reviews the identification problem for time-variant channels. The third-named author, as well as Bello in subsequent work [6], attempted to understand and describe the theoretical limits of identifiability of time-variant communication channels. Section “Historical Remarks” of this paper describes their work in some detail and explores some of the important mathematical challenges they faced. In Section “Operator Sampling”, we describe the more recently developed framework of operator sampling. Results addressing the problem considered by Bello are based on insights on finite-dimensional Gabor systems which are presented in Section “Linear Independence Properties of Gabor Frames”. Malikiosis’s recent result [32] allows for the generalization of those results to a higher-dimensional setting; these are stated and proven in Section “Generalizations of operator sampling to higher dimensions”. We conclude the paper in Section “Further results on operator sampling” with a short summary of the sampling of operators literature, that is, of results presented in detail elsewhere.

Historical Remarks

The Cold War Origins of the Rake System

In 1958, Price and Green published “A Communication Technique for Multi-path Channels” in Proc. IRE [56], in which they describe a communication system called Rake, designed to solve the multi-path problem. When a wireless transmitter does not have line-of-sight with the receiver, the transmitted signal is reflected, possibly multiple times, before reaching the receiver. Reflection by stationary objects such as the ground or buildings introduces random time delays to the signal, and reflection or refraction by moving objects such as clouds, the troposphere, ionosphere, or a moving vehicle produces random frequency or Doppler shifts in the signal as well. Due to scattering and absorption, the reflected signals are randomly amplitude-attenuated too. The problem is to recover the transmitted signal as accurately as possible from the superposition of time-frequency-shifted and randomly amplitude-attenuated versions of it. Since the locations and velocities of the reflecting objects change with time, the effects of the unknown, time-variant channel must be estimated and compensated for.

Price and Green’s paper [56] was the disclosure in the literature of a long-distance system of wide-band or spread-spectrum communications that had been developed in response to strategic needs related to the Cold War. This fascinating story has been described in several articles by those directly involved [12, 55, 57, 58]. We present a summary of those remarks and of the Rake system below. The goal is to motivate the original work of the third-named author on which the theory of operator sampling is based.

In the years following World War II, the Soviet Union was exercising its power in Eastern Europe with a major point of contention being Berlin, which the Soviets blockaded in the late 1940s. This made secure communication with Berlin a top priority. As Paul Green describes it,

[T]he Battle of Berlin was raging, the Russians having isolated the city physically on land, so that the Berlin Airlift was resorted to, and nobody knew when all the communication links would begin to be subjected to heavy Soviet jamming. [12]

By 1950, with a shooting war in Korea about to begin, the Army Signal Corps approached researchers at MIT to develop secure and reliable wireless communication with the opposite ends of the world. According to Green,

It is difficult today to recall the fearful excitement of those times. The Russians were thought to be 12 feet high in anything having to do with applying mathematics to communication problems (“all Russians were Kolmogorovs or Kotelnikovs”).... [T]here was a huge backlog of unexploited theory lying around, and people were beginning to build digital equipment with the unheard of complexity of a hundred or so vacuum tube-based bits (!). And the money flowed. [12]

The effort was called Project Lincoln (precursor to Lincoln Laboratory). The researchers were confronted by two main problems: 1) making a communications system robust to noise and deliberate jamming, and 2) enabling good signal recovery from multiple paths.

Spread Spectrum communications and NOMAC

The technique chosen to address the first problem is an application of the notion, already well understood and used by that time, that combatting distortions from noise and jamming can be achieved by spreading the signal over a wide frequency band. The idea of spreading the spectrum had been around for a long time [51, 55, 63] and can be found even in a now famous Hedy Lamarr-George Antheil patent of 1942 [33, 55], which introduced the concept later called “frequency hopping”. The system called NOMAC (Noise Modulation and Correlation) was developed in the early 1950s and used noise-like (pseudo-noise or PN) signals to achieve spectrum spreading. Detailed discussion of its history can be found in [12, 55, 64].

The huge backlog of “unexploited theory” mentioned above included the recent work of Claude Shannon on communication theory [61], of Norbert Wiener on correlation functions and least mean squares prediction and filtering [65], and recent applications of statistical decision theory to detection problems in radar and communications.

The communication problem addressed by NOMAC was to encode data represented by a string of ones and zeros into analog signals that could be electromagnetically transmitted over a noisy communication channel in a way that foiled “jamming” by enemies. The analog signals \(x_{1}(\cdot )\) and \(x_{0}(\cdot )\), commonly called Mark and Space, associated with the data digits 1 and 0, were chosen to be waveforms of approximate bandwidth B, and with small cross correlation. The target application was 60 wpm teletype, with 22 msec per digit (called a baud), which corresponds to a transmission rate of 1∕0.022 sec = 45 Hz. The transmitted signals were chosen to have a bandwidth of 10 KHz, which was therefore expected to yield a “jamming suppression ratio” of 10,000∕45 = 220, or 23 db [12, 64]. The jamming ratio is often called the “correlation gain”, because the receiver structure involves cross correlation of the received signal with each of the possible transmitted signals. If the correlation with the signal \(x_{1}(\cdot )\) is larger than the one with the signal \(x_{0}(\cdot )\), then it is decided that the transmitted signal corresponded to the digit 1. This scheme can be shown to be optimum in the sense of minimum probability of error provided that the transmitted signals are not distorted by the communications channel and that the receiver noise is white Gaussian noise (see, for example, [16]). The protection against jamming comes from the fact that, unless the jammer has good knowledge of the noise-like transmitted signals, any jamming signals would just appear as additional noise at the output of the crosscorrelations.

More details on the nontrivial ideas required for building a practical system can be found in the references. We may mention that the key ideas arose from three classified MIT dissertations by Basore [4], Pankowski [34], and Green [10]; in fact, documents on NOMAC remained classified until 1961 [12].

A transcontinental experiment was run on a NOMAC system, but was found to have very poor performance because of the presence of multiple paths; the signals arriving at the receiver by these different paths sometimes interfere destructively. This is the phenomenon of “fading”, which causes self-jamming of the system. Some improvement was achieved by adding circuitry at the receiver to separately identify and track the two strongest signals and combine them after phase correction; this use of time and space diversity enabled a correlation gain of 17 db, 6 db short of the expected performance. It was determined that this loss was because of the neglected weaker paths, of which there could be as many as 20 or 30. So attention turned to a system that would allow the use of all the different paths.

The RAKE system

One conceptual basis for this new system was provided by the doctoral thesis of Robert Price [52], the main results of which were published in 1956 [53]. In a channel with severe multi-path, the signal at the receiver is composed of a large number of signals of different amplitudes and phases, and so Price made the assumption that the received “signal” was a Gaussian random process. He studied the problem of choosing between the hypotheses

$$\displaystyle{H_{i}: w(\cdot ) = Ax_{i}(\cdot ) + n(\cdot ),\quad i = 0,1,}$$

where the random time-variant linear communication channel A is such that the \(\{Ax_{i}(\cdot )\}\) are Gaussian processes. In this case, the earlier cross correlation detection scheme makes no sense, because the “signal” arriving at the receiver is not deterministic but is a sample function of a random process, which is not available to the receiver because it is corrupted by the additive noise. Price worked out the optimum detection scheme and then ingeniously interpreted the mathematical formulas to conclude that the new receiver forms least mean-square estimates of the \(\{Ax_{i}(\cdot )\}\) and then crosscorrelates \(w(\cdot )\) against these estimates. In practice, of course, one does not have enough statistical information to form these estimates and therefore more heuristic estimates are used, and this was done in the actual system that was built. The main heuristic, from Wiener’s least mean-square smoothing filter solution and earlier insights, is that one should give greater weight to paths with higher signal-to-noise ratio.

So Price and Green devised a new receiver structure consisting of a delay line of length 3 ms (the maximum expected time spread in their channel), with 30 taps spaced every 1∕(10 KHz), or 100 μs. This would enable the capture of all the multi-path signals in the channel. Then the tap gains were made proportional to the strength of the signal received at that tap. Since a Mark/Space decision was only needed every 22 ms (for the transmission rate of 60 wpm), and since the fading rate of the channel was slow enough that the channel characteristics remain constant over even longer than 22 ms, tap gains could be averaged over several 3 ms intervals. The new system was called “Rake”, because the delay line structure resembled that in a typical garden rake!
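The tap-delay-line idea behind Rake can be illustrated with a small discrete-time simulation. The sketch below is not the original Rake implementation; the three-path channel, the noise level, the PN reference, and the rule of estimating tap gains by correlating the received signal against delayed copies of the reference are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative multipath channel: a few delayed, attenuated copies of the signal.
n_taps = 30                              # delay line covering 3 ms at 100 us per tap
true_delays = [0, 7, 19]                 # path delays, in taps (assumed for illustration)
true_gains = [1.0, 0.6, 0.3]

x = rng.choice([-1.0, 1.0], size=4000)   # noise-like (PN) reference waveform
y = sum(g * np.roll(x, d) for g, d in zip(true_gains, true_delays))
y = y + 0.2 * rng.standard_normal(len(x))   # receiver noise

# Rake principle: estimate the gain at each tap by correlating the received
# signal with the correspondingly delayed reference, then combine the delayed
# received signal weighted by those estimates (stronger paths weigh more).
tap_gains = np.array([np.dot(y, np.roll(x, d)) / len(x) for d in range(n_taps)])
combined = sum(g * np.roll(y, -d) for d, g in enumerate(tap_gains))

print(np.round(tap_gains, 2))            # peaks near taps 0, 7, and 19
```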

Trials showed that this scheme worked well enough to recover the 6 db loss experienced by the NOMAC system. The system was put into production and was successfully used for jam-proof communications between Washington DC and Berlin during the “Berlin crisis” in the early 60s.

HF communications is no longer very significant, but the Rake receiver has found application in a variety of problems such as sonar, the detection of underground nuclear explosions, and planetary radar astronomy (pioneered by Price and Green, [11, 54]), and currently it is much used in mobile wireless communications. It is interesting to note that the eight racks of equipment needed to build the Rake system in the 1960s are now captured in a small integrated circuit chip in a smart phone!

However, the fact that the Rake system did not perform satisfactorily when the fading rates of the communication channel were not very slow led MIT professor John Wozencraft (who had been part of the Rake project team at Lincoln Lab) to suggest in 1957 (even before the open 1958 publication of the Rake system) to his new graduate student Thomas Kailath a fundamental study of linear time-variant communication channels and their identifiability for his Master’s thesis. While time-variant linear systems had begun to be studied at least as early as 1950 (notably by Zadeh [66]), in communication systems there are certain additional constraints, notably limits on the bandwidths of the input signal and the duration of the channel memory. So a more detailed study was deemed to be worthwhile.

Kailath’s Time-Variant Channel Identification Condition

In the paper [19], the author considers the problem of measuring a channel whose characteristics vary rapidly with time. He considers the dependence of any theoretical channel estimation scheme on how rapidly the channel characteristics change and concludes that there are theoretical limits on the ability to identify a rapidly changing channel. He models the channel A as a linear time-variant filter and defines

\(A(\lambda,t) =\) response of A, measured at time t to a unit impulse input at time \(t-\lambda\).

\(A(\lambda,t)\) is one form of the time-variant impulse response of the linear channel that emphasizes the role of the “age” variable \(\lambda\). The channel response to an input signal x(⋅ ) is

$$\displaystyle{Ax(t) =\int A(\lambda,t)\,x(t-\lambda )\,d\lambda.}$$

An impulse response \(A(\lambda,t) = A(\lambda )\) represents a time-invariant filter. Further, the author states

Therefore the rate of variation of \(A(\lambda,t)\) with t, for fixed \(\lambda\), is a measure of the rate of variation of the filter. It is convenient to measure this variation in the frequency domain by defining a function \(\mathcal{A}\)

$$\displaystyle{\mathcal{A}(\lambda,f) =\int _{ -\infty }^{\infty }A(\lambda,t)e^{-2\pi ift}dt\quad }$$

Then he defines

$$\displaystyle{B =\max _{\lambda }[b - a,\text{ where }\mathcal{A}(\lambda,f) = 0\text{ for }f\notin [a,b]\,].}$$

While symmetric support is assumed in the paper, this definition makes clear that non-rectangular regions of support are already in view. Additionally, he defines the memory, that is, the maximum time-delay spread of the channel’s response to an impulse, as

$$\displaystyle{L =\max _{t}\big[\min \,\lambda '\ \text{ such that }A(\lambda,t) = 0\text{ for }\lambda \geq \lambda '\big].}$$

In short, the assumption in the continuation of the paper is that

$$\displaystyle{\mathop{\mathrm{supp}}\nolimits \mathcal{A}(\lambda,f) \subseteq [0,L] \times [-W,W]}$$

where W = B∕2. The function \(\mathcal{A}(\lambda,f)\) is often called the spreading function of the channel. He then asks under what assumptions on L and B = 2W can such a channel be measured? In the context of the Rake system, this translates to the question of whether there are limits on the rate of variation of the filter that can assure that the measurement filter can be presumed to be effective.

The author’s assertion is that as long as BL ≤ 1, a “simple measurement scheme” is sufficient.

We have assumed that the bandwidth of any “tap function”, \(A_{\lambda }(\cdot )\,[= A(\lambda,\cdot )]\), is limited to a frequency region of width B, say a low-pass region (−W, W) for which B = 2W. Such band-limited taps are determined according to the Sampling theorem, by their values at the instants i∕2W, i = 0, ±1, ±2, ….

If the memory, L, of the filter, \(A(\lambda,t)\) is less than 1∕2W these values are easily determined: we put in unit impulses to \(A(\lambda,t)\) at instants 0, 1∕2W, 2∕2W, …, T, and read off from the responses the desired values of the impulse response \(A(\lambda,t)\). [...] If L ≤ 1∕2W, the responses to the different input impulses do not interfere with one another and the above values can be unambiguously determined.

In other words, sufficiently dense samples of the tap functions can be obtained by sending an impulse train \(\sum _{n}\delta _{n/2W}\) through the channel. Indeed,

$$\displaystyle{A\big(\sum _{n}\delta _{n/2W}\big)(t) =\sum _{n}\int A(\lambda,t)\,\delta _{n/2W}(t-\lambda )\,d\lambda =\sum _{n}A(t - n/2W,t).}$$

Evaluating the operator response at \(t =\lambda _{0} + n_{0}/2W\), \(n_{0} \in \mathbb{Z}\), we obtain

$$\displaystyle\begin{array}{rcl} A\big(\sum _{n}\delta _{n/2W}\big)(\lambda _{0} + n_{0}/2W)& =& \sum _{n}A(\lambda _{0} + (n_{0} - n)/2W,\lambda _{0} + n_{0}/2W) {}\\ & =& A(\lambda _{0},\lambda _{0} + n_{0}/2W) {}\\ \end{array}$$

since L ≤ 1∕2W implies that \(A(\lambda _{0} + (n_{0} - n)/2W,\lambda _{0} + n_{0}/2W) = 0\) if \(n\neq n_{0}\). In short, for each \(\lambda\), the samples \(A(\lambda,\lambda +n/2W)\) for \(n \in \mathbb{Z}\) can be recovered.

The described Kailath sounding procedure is depicted in Figure 1. In this visualization, we plot the kernel \(\kappa (t,s) = A(t - s,t)\) of the operator A, that is,

$$\displaystyle{Ax(t) =\int A(\lambda,t)\,x(t-\lambda )\,d\lambda =\int A(t - s,t)\,x(s)\,ds =\int \kappa (t,s)\,x(s)\,ds.}$$
Fig. 1. Kailath sounding of A with \(\mathop{\mathrm{supp}}\nolimits \mathcal{A}(\lambda,f) \subseteq [0,L] \times [-W,W]\) and L = 1∕2W. The kernel κ(t, s) is displayed on the (t, s) plane, the impulse train \(\sum _{n}\delta _{n/2W}(s)\) on the s-axis, and the output signal \(Ax(t) = A\big(\sum _{n}\delta _{n/2W}\big)(t) =\sum _{n}A(t - n/2W,t) =\sum _{n}\kappa (t,n/2W)\). The sample values of the tap functions \(A_{\lambda }(t) = A(\lambda,t) =\kappa (t,t-\lambda )\) can be read off Ax(t).
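The sounding scheme just described is easy to check in a small discrete simulation. The sketch below works on an integer grid, with a randomly generated channel of memory M and impulse spacing D ≥ M (both choices are illustrative assumptions); it verifies that the channel output reproduces the samples of the tap functions, spaced D apart.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 240          # length of the time grid (integer samples)
M = 8            # channel memory in samples, playing the role of L
D = 8            # impulse spacing in samples, playing the role of 1/2W; here D >= M

# Time-variant channel: A[lam, t] = response at time t to a unit impulse applied
# at time t - lam, nonzero only for 0 <= lam < M (memory M).
A = rng.standard_normal((M, N))

# Corresponding kernel kappa[t, s] = A[t - s, t].
kappa = np.zeros((N, N))
for t in range(N):
    for lam in range(M):
        s = t - lam
        if 0 <= s < N:
            kappa[t, s] = A[lam, t]

# Sounding signal: impulse train with spacing D, and the channel output.
x = np.zeros(N)
x[::D] = 1.0
y = kappa @ x

# Since D >= M, the responses to successive impulses do not overlap, so
# y[n*D + lam] = A[lam, n*D + lam]: samples of each tap function, D apart.
n0, lam0 = 5, 3
print(np.isclose(y[n0 * D + lam0], A[lam0, n0 * D + lam0]))   # True
```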

Necessity of Kailath’s Condition for Channel Identification

For the “simple measurement scheme” to work, BL ≤ 1 is sufficient but could be restrictive.

We need, therefore, to devise more sophisticated measurement schemes. However, we have not pursued this question very far because for a certain class of channels we can show that the condition

$$\displaystyle{L \leq 1/2W,\text{ i.e. },BL \leq 1}$$

is necessary as well as sufficient for unambiguous measurement of \(A(\lambda,t)\). The class of channels is obtained as follows: We first assume that there is a bandwidth constraint on the possible input signals to \(A(\lambda,t)\), in that the signals are restricted to \((-W_{i},W_{i})\) in frequency. We can now determine a filter \(A_{W_{i}}(\lambda,t)\) that is equivalent to \(A(\lambda,t)\) over the bandwidth \((-W_{i},W_{i})\), and find necessary and sufficient conditions for unambiguous measurement of \(A_{W_{i}}(\lambda,t)\). If we now let \(W_{i} \rightarrow \infty \), this condition reduces to condition (1), viz: L ≤ 1∕2W. Therefore, condition (1) is valid for all filters \(A(\lambda,t)\) that may be obtained as the limit of band-limited channels. This class includes almost all filters of physical interest. The argument is worked out in detail in Ref. 6, but we give a brief outline here.

The class of operators in view here can be described as limits (in some unspecified sense) of operators whose impulse response \(A(\lambda,t)\) is bandlimited to \([-W_{i},W_{i}]\) in \(\lambda\) for each t and periodic with period T > 0 in t for each \(\lambda\). Here, T is assumed to have some value larger than the maximum time over which the channel will be operated. We could take it as the duration of the input signal to the channel.

The restriction to input signals bandlimited to \((-W_{i},W_{i})\) indicates that it suffices to know the values of \(A(\lambda,t)\) or \(\mathcal{A}(\lambda,f)\) for a finite set of values of \(\lambda\): \(\lambda = 0,\,1/2W_{i},\,2/2W_{i},\,\ldots,\,L\), assuming for simplicity that L is a multiple of \(1/2W_{i}\). Therefore, we can write

$$\displaystyle\begin{array}{rcl} A(\lambda,t) =\sum _{n}A(n/2W_{i},t)\,\mathrm{sinc}_{W_{i}}(\lambda -n/2W_{i}),& & {}\\ \end{array}$$

where \(\mathrm{sinc}_{W_{i}}(t) =\sin (2\pi W_{i}t)/(2\pi W_{i}t)\) so that as \(W_{i} \rightarrow \infty \), \(\mathrm{sinc}_{W_{i}}(t)\) becomes more concentrated at the origin.

Also, T-periodicity in t allows us to write

$$\displaystyle{A(\lambda,t) =\sum _{k}A(\lambda,k/T)\,e^{2\pi ikt/T},}$$

so that combining gives

$$\displaystyle{A(\lambda,t) =\sum _{n}\sum _{k}A(n/2W_{i},k/T)\,\mathrm{sinc}_{W_{i}}(\lambda -n/2W_{i})\,e^{2\pi ikt/T}.}$$

Based on the restriction to bandlimited input signals which are T periodic, we have obtained a representation of A which is neither compactly supported in \(\lambda\) nor bandlimited in t. However, the original restriction that

$$\displaystyle{\mathop{\mathrm{supp}}\nolimits \mathcal{A}(\lambda,f) \subseteq [0,L] \times [-W,W]}$$

motivates the assumption that we are working with finite sums, viz.

$$\displaystyle{A(\lambda,t) =\sum _{n/2W_{i}\in [0,L]}\sum _{k/t\in [-W,W]}A(n/2W_{i},k/T)\,\mathrm{sinc}_{W_{i}}(\lambda -n/2W_{i})\,e^{2\pi ikt/T}.}$$

This is how the author obtains the estimate that there are at most \((2W_{i}L + 1)(2WT + 1)\) degrees of freedom in any impulse response A in the given class.

For any input signal x(t) bandlimited to \([-W_{i},W_{i}]\), the output will be bandlimited to \([-W - W_{i},W + W_{i}]\). Specifically,

$$\displaystyle\begin{array}{rcl} Ax(t)& =& \int A(\lambda,t)\,x(t-\lambda )\,d\lambda {}\\ & =& \sum _{n/2W_{i}\in [0,L]}\sum _{k/T\in [-W,W]}A(n/2W_{i},k/T)\,e^{2\pi ikt/T} {}\\ & & \qquad \qquad \qquad \int x(t-\lambda )\,\mathrm{sinc}_{W_{i}}(\lambda -n/2W_{i})\,d\lambda {}\\ & =& \sum _{n/2W_{i}\in [0,L]}\sum _{k/T\in [-W,W]}A(n/2W_{i},k/T)\,e^{2\pi ikt/T} {}\\ & & \qquad \qquad \qquad (x {\ast}\mathrm{ sinc}_{W_{i}})(t - n/2W_{i}). {}\\ \end{array}$$

Since \(e^{2\pi ikt/T}\,(x {\ast}\mathrm{ sinc}_{W_{i}})(t - n/2W_{i})\) is bandlimited to \([-W_{i},W_{i}] + (k/T)\) for \(k/T \in [-W,W]\), it follows that Ax(t) is bandlimited to \([-W - W_{i},W + W_{i}]\).

If we restrict our attention to signals x(t) time-limited to [0, T], the output signal Ax(t) will have duration T + L, and Ax(⋅ ) will be completely determined by its samples at \(\frac{n} {2(W+W_{i})} \in [0,T + L]\), from which we can identify \(2(T + L)(W + W_{i}) + 1\) degrees of freedom.

In order for identification to be possible, the number of degrees of freedom of the output signal must be at least as large as the number of degrees of freedom of the operator, i.e.

$$\displaystyle\begin{array}{rcl} 2W_{i}T + 2W_{i}L + 2WT + 2WL + 1& =& {}\\ 2(T + L)(W_{i} + W) + 1& \geq & (2WT + 1)(2W_{i}L + 1) {}\\ & =& 2WT + 2W_{i}L + 1 + 4W_{i}WTL {}\\ \end{array}$$

which reduces ultimately to

$$\displaystyle\begin{array}{rcl} \frac{1} {1 - 1/(2W_{i}T)} \geq 2WL = BL.& & {}\\ \end{array}$$

That is, BL needs to be strictly smaller than 1 in the approximation while BL = 1 may work in the limiting case \(W_{i} \rightarrow \infty \) (and/or \(T \rightarrow \infty \)).
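The algebra behind this reduction is easy to check symbolically. The following sketch (using sympy, with symbols standing for the quantities above) expands the difference between the two degree-of-freedom counts and confirms the regrouping that leads to the stated condition.

```python
import sympy as sp

T, L, W, Wi = sp.symbols('T L W W_i', positive=True)

lhs = 2*(T + L)*(Wi + W) + 1          # degrees of freedom of the output
rhs = (2*W*T + 1)*(2*Wi*L + 1)        # degrees of freedom of the channel

diff = sp.expand(lhs - rhs)
print(diff)                            # -4*L*T*W*W_i + 2*L*W + 2*T*W_i

# lhs >= rhs  <=>  2*T*W_i >= 2*W*L*(2*T*W_i - 1)
#             <=>  2*W*L <= 1/(1 - 1/(2*T*W_i)),
# which tends to B*L = 2*W*L <= 1 as W_i*T -> infinity.
print(sp.simplify(diff - (2*T*Wi - 2*W*L*(2*T*Wi - 1))))   # 0, confirming the regrouping
```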

This result got a lot of attention because it corresponded with experimental evidence that Rake did not function well when the condition BL < 1 was violated. It led to the designation of “underspread” and “overspread” channels for which BL was less than or greater than 1.

Some Remarks on Kailath’s Results

This simple argument is surprising, particularly in light of the fact that the author obtained a deep result in time-frequency analysis with none of the tools of modern time-frequency analysis at his disposal. He very deftly uses the extremely useful engineering “fiction” that the dimension of the space of signals essentially bandlimited to [−W, W] and time-limited to [0, T] is approximately 2WT. The then recent papers of Landau, Slepian, and Pollak [28, 29], which are mentioned explicitly in [19], provided a rigorous mathematical framework for understanding the phenomenon of essentially simultaneous band- and time-limiting. While the existence of these results lent considerable mathematical heft to the argument, they were not incorporated into a fully airtight mathematical proof of his theorem.

In the proof we have used a degrees-of-freedom argument based on the sampling theorem which assumes strictly bandlimited functions. This is an unrealistic assumption for physical processes. It is more reasonable to call a process band (or time) limited if some large fraction of its energy, say 95%, is contained within a finite frequency (or time) region. Recent work by Landau and Slepian has shown the concept of approximately 2TW degrees of freedom holds even in such cases. This leads us to believe that our proof of the necessity of the BL ≤ 1 condition is not merely a consequence of the special properties of strictly band-limited functions. It would be valuable to find an alternative method of proof.

While Kailath’s Theorem is stated for channel operators whose spreading functions are supported in a rectangle, it is clear that the later work of Bello [6] was anticipated and more general regions were in view. This is stated explicitly.

We have not discussed how the bandwidth, B, is to be defined. There are several possibilities: we might take the nonzero f-region of \(\mathcal{A}(\lambda,f)\); or use a “counting” argument. We could proceed similarly for the definition of L. As a result of these several possibilities, the value 1 of the threshold in the condition BL ≤ 1 should be considered only as an order of magnitude value.

...constant and predictable variations in B and L, due for example to known Doppler shifts or time displacements, would yield large values for the absolute values of the time and frequency spreadings. However such predictable variations should be subtracted out before the BL product is computed; what appears to be important is the area covered in the time- and frequency-spreading plane rather than the absolute values of B and L. (emphasis added)

The reference to “counting” as a definition of bandwidth clearly indicates that essentially arbitrary regions of support for the operator spreading function were in view here, and that a necessity argument relying on degrees of freedom and not the shape of the spreading region was anticipated. The third-named author did not pursue the measurement problem studied in his MS thesis because he went on in his PhD dissertation to study the optimum (in the sense of minimum probability of error) detector scheme of which Rake is an intelligent engineering approximation. See [20, 21, 23].

The mathematical limitations of the necessity proof in [19] can be removed by addressing the identification problem directly as a problem on infinite-dimensional space rather than relying on finite-dimensional approximations to the channel. This approach also avoids the problem of dealing with simultaneously time and frequency-limited functions. In this way, the proof can be made completely mathematically rigorous. This approach is described in Section “Kailath’s necessity proof and operator identification”.

Bello’s time-variant Channel Identification Condition

Kailath’s Theorem was generalized by Bello in [6] along the lines anticipated in [19]. Bello’s argument follows that of [19] in its broad outlines but with some significant differences. Bello clearly anticipates some of the technical difficulties that have been solved more recently by the authors and others and which have led to the general theory of operator sampling.

Continuing with the notation of this section, Bello considers channels with spreading function \(\mathcal{A}(\lambda,f)\) supported in a rectangle [0, L] × [−W, W]. If L and W are all that is known about the channel, then Kailath’s criterion for measurability requires that 2WL ≤ 1. Bello considers channels for which 2WL may be greater than 1 but for which

$$\displaystyle{S_{A} = \vert \mathop{\mathrm{supp}}\nolimits \mathcal{A}(\lambda,f)\vert \leq 1}$$

and argues that this is the most appropriate criterion to assess measurability of the channel modeled by A.

In order to describe Bello’s proof we will fix parameters T ≫ L and \(W_{i} \gg W\) and, following the assumptions earlier in this section, assume that inputs to the channel are time-limited to [0, T] and (approximately) bandlimited to \([-W_{i},W_{i}]\). Under this assumption, Bello considers the spreading function of the channel to be approximated by a superposition of point scatterers, viz.

$$\displaystyle{\mathcal{A}(\lambda,f) =\sum _{n}\sum _{k}A_{n,k}\,\delta (f - (k/T))\,\delta (\lambda -(n/2W_{i})).}$$

Hence the response of the channel to an input x(⋅ ) is given by

$$\displaystyle\begin{array}{rcl} Ax(t)& =& \int \!\!\!\int x(t-\lambda )\,e^{2\pi if(t-\lambda )}\mathcal{A}(\lambda,f)\,d\lambda \,df \\ & =& \sum _{n}\sum _{k}A_{n,k}\,x(t - (n/2W_{i}))\,e^{2\pi i(k/T)(t-(n/2W_{i}))}.{}\end{array}$$
(1)

Note that this is a continuous-time Gabor expansion with window function x(⋅ ) (see, e.g., [13]). By standard density results in Gabor theory, the collection of functions \(\{x(t - (n/2W_{i}))\,e^{2\pi i(k/T)(t-(n/2W_{i}))}\}\) is overcomplete as soon as \(2TW_{i} > 1\). Consequently, without further discretization, the coefficients \(A_{n,k}\) are in principle unrecoverable. Taking into consideration support constraints on \(\mathcal{A}\), we assume that the sums are finite, viz.

$$\displaystyle{\bigg( \frac{n} {2W_{i}}, \frac{k} {T}\bigg) \in \mathop{\mathrm{supp}}\nolimits \mathcal{A}.}$$

Hence determining the channel characteristics amounts to finding \(A_{n,k}\) for those pairs (n, k). It should be noted that for a given spreading function \(\mathcal{A}(\lambda,f)\) for which \(\mathop{\mathrm{supp}}\nolimits \mathcal{A}\) is a Lebesgue measurable set, given ε > 0, there exist T and \(W_{i}\) sufficiently large that the number of such (n, k) is no more than \(2S_{A}W_{i}T(1+\epsilon )\). On the other hand, for a given T and \(W_{i}\), there exist spreading functions \(\mathcal{A}(\lambda,f)\) with non-convex support of arbitrarily small area \(S_{A}\) for which the number of nonzero coefficients \(A_{n,k}\) can be large. For example, given T and \(W_{i}\), \(\mathop{\mathrm{supp}}\nolimits \mathcal{A}\) could consist of rectangles centered on the points \((n/(2W_{i}),k/T)\) with arbitrarily small total area.

By sampling, (1) reduces to a discrete, bi-infinite linear system, viz.

$$\displaystyle{ Ax\bigg( \frac{p} {2W_{i}}\bigg) =\sum _{n}\sum _{k}A_{n,k}\,x\bigg(\frac{p - n} {2W_{i}} \bigg)\,e^{2\pi i \frac{k} {T}( \frac{p-n} {2W_{i}})} }$$
(2)

for \(p \in \mathbb{Z}\). Note that (2) is the expansion of a vector in a discrete Gabor system on \(\ell ^{2}(\mathbb{Z})\), a fact not mentioned by Bello, and of which he was apparently unaware. Specifically, defining the translation operator \(\mathcal{T}\) and the modulation operator \(\mathcal{M}\) on \(\ell ^{2}(\mathbb{Z})\) by

$$\displaystyle{ \mathcal{T} x(n) = x(n - 1),\qquad \mbox{ and}\qquad \mathcal{M}x(n) = e^{\pi in/(TW_{i})}x(n), }$$
(3)

(2) can be rewritten as

$$\displaystyle{ Ax\bigg( \frac{p} {2W_{i}}\bigg) =\sum _{n}\sum _{k}(\mathcal{T}^{n}\,\mathcal{M}^{k}x)(p)\,A_{ n,k}. }$$
(4)

Since there are only finitely many nonzero unknowns in this system, Bello’s analysis proceeds by looking at finite sections of (4) and counting degrees of freedom.

Necessity. Following the lines of the necessity argument in [19], we note that there are at least \(2(T + L)(W + W_{i})\) degrees of freedom in the output vector Ax(t), that is, at least that many independent samples of the form \(Ax(p/2W_{i})\), and as observed above, no more than \(2S_{A}W_{i}T(1+\epsilon )\) nonzero unknowns \(A_{n,k}\). Therefore, in order for the \(A_{n,k}\) to be determined in principle, it must be true that

$$\displaystyle{2W_{i}T(1+\epsilon )S_{A} \leq 2(T + L)(W + W_{i})}$$

or

$$\displaystyle{S_{A} \leq \frac{(T + L)(W + W_{i})} {W_{i}T(1+\epsilon )}.}$$

Letting \(T,\,W_{i} \rightarrow \infty \) and ε → 0, we arrive at \(S_{A} \leq 1\).

Sufficiency. Considering a section of the system (4) based on the assumption that \(\mathop{\mathrm{supp}}\nolimits \mathcal{A}\subseteq [0,L] \times [-W,W]\), the system has approximately \(2W_{i}(T + L)\) equations in \((2W_{i}T)(2WL)\) unknowns. Since L and 2W are simply the dimensions of a rectangle that encloses the support of \(\mathcal{A}\), 2WL may be quite large and independent of \(S_{A}\). Hence the system will not in general be solvable. However by assuming that \(S_{A} < 1\), only approximately \(S_{A}(2W_{i}T)\) of the \(A_{n,k}\) do not vanish and the system reduces to one in which the number of equations is roughly equal to the number of unknowns. In this case it would be possible to solve (4) as long as the collection of appropriately truncated vectors \(\{\mathcal{T}^{n}\mathcal{M}^{k}x: A_{n,k}\neq 0\}\) forms a linearly independent set for some vector x.
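Whether such a vector x exists is, in modern language, a question about linear independence of finitely many time-frequency shifts of x. The sketch below is only a numerical illustration in a truncated setting; the truncation length, the value of \(TW_{i}\), and the random support pattern are assumptions made for the example, not part of Bello’s argument.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameters: finite window of sample indices p and a random
# support pattern for the nonzero A_{n,k}.
TWi  = 8                  # value of T*W_i entering the modulation e^{pi i n/(T W_i)}
Npts = 200                # truncation: indices p = 0, ..., Npts-1
x    = rng.standard_normal(Npts)          # candidate sounding sequence x(p/2W_i)

def TM(x, n, k):
    """Truncated (T^n M^k x)(p) = exp(pi*1j*k*(p - n)/TWi) * x(p - n)."""
    p = np.arange(Npts)
    xs = np.where((p - n >= 0) & (p - n < Npts), x[(p - n) % Npts], 0.0)
    return np.exp(1j * np.pi * k * (p - n) / TWi) * xs

# Random set of active point scatterers (n, k).
support = {(rng.integers(0, 40), rng.integers(0, 16)) for _ in range(25)}
G = np.column_stack([TM(x, n, k) for (n, k) in support])

print(np.linalg.matrix_rank(G), "of", G.shape[1])   # full column rank for generic x
```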

In his paper, Bello was dealing with independence properties of discrete Gabor systems apparently without realizing it, or at least without stating it explicitly. Indeed, he argues in several different ways that a vector x that produces a linearly independent set should exist, and intriguingly suggests that a vector consisting of ± 1 should exist with the property that the Grammian of the Gabor matrix corresponding to the section of (4) being considered is diagonally dominant.

The setup chosen below to prove Bello’s assertion leads to the consideration of a matrix whose columns stem from a Gabor system on a finite-dimensional space, not on a sequence space.

Operator Sampling

The first key contribution of operator sampling is the use of frame theory and time-frequency analysis to remove assumptions of simultaneous band- and time-limiting, and also to deal with the infinite number of degrees of freedom in a functional analytic setting (Section “Operator classes and operator identification”). A second key insight is the development of a “simple measurement scheme” of the type used by the third-named author but that allows for the difficulties identified by Bello to be resolved. This insight is the use of periodically weighted delta-trains as measurement functions for a channel. Such measurement functions have three distinct advantages.

First, they allow for the channel model to be essentially arbitrary and clarify the reduction of the operator identification problem to a finite-dimensional setting without imposing a finite-dimensional model that approximates the channel. Second, they combine the naturalness of the simple measurement scheme described earlier with the flexibility of Bello’s idea for measuring channels with arbitrary spreading support. Third, they establish a connection between identification of channels and finite-dimensional Gabor systems and allow us to determine windowing vectors with appropriate independence properties.

In Section “Operator classes and operator identification”, we introduce some operator-theoretic descriptions of some of the operator classes that we are able to identify, and discuss briefly different ways of representing such operators. Such a discussion is beneficial in several ways. First, it contains a precise definition of identifiability, which comes into play when considering the generalization of the necessity condition for so-called overspread channels (Section “Kailath’s necessity proof and operator identification”). Second, we can extend the necessity condition to a very large class of inputs. In other words, we can assert that in a very general sense, no input can identify an overspread channel. Third, it allows us to include both convolution operators and multiplication operators (for which the spreading functions are distributions) in the operator sampling theory. The identification of multiplication operators via operator sampling reduces to the classical sampling formula, thereby showing that classical sampling is a special case of operator sampling. In Section “Kailath’s necessity proof and operator identification” we present a natural formalization of the original necessity proof of [19] (Section “Necessity of Kailath’s Condition for Channel Identification”) to the infinite-dimensional setting, which involves an interpretation of the notion of an under-determined system to that setting. Finally, in Section “Identification of operator Paley-Wiener spaces by periodically weighted delta-trains” we present the scheme given first in [41, 45] for the identification of operator classes using periodically weighted delta trains and techniques from modern time-frequency analysis.

Operator classes and operator identification

We formally consider an arbitrary operator as a pseudodifferential operator represented by

$$\displaystyle\begin{array}{rcl} Hf(x) =\int \sigma _{H}(x,\xi )\widehat{f}(\xi )\,e^{2\pi ix\xi }\,d\xi,& &{}\end{array}$$
(5)

where \(\sigma _{H}(x,\xi ) \in L^{2}(\mathbb{R}^{2})\) is the Kohn-Nirenberg (KN) symbol of H. The spreading function \(\eta _{H}(t,\nu )\) of the operator H is the symplectic Fourier transform of the KN symbol, viz.

$$\displaystyle\begin{array}{rcl} \eta _{H}(t,\nu ) =\int \!\!\!\!\int \sigma _{H}(x,\xi )\,e^{-2\pi i(\nu x-\xi t)}\,dx\,d\xi & &{}\end{array}$$
(6)

and we have the representation

$$\displaystyle\begin{array}{rcl} Hf(x) =\int \!\!\!\!\int \eta _{H}(t,\nu )\,\mathcal{T}_{t}\,\mathcal{M}_{\nu }f(x)\,d\nu \,dt& &{}\end{array}$$
(7)

where \(\mathcal{T}_{t}f(x) = f(x - t)\) is the time-shift operator and \(\mathcal{M}_{\nu }f(x) = e^{2\pi i\nu x}\,f(x)\) is the frequency-shift operator.

This is identical to the representation given in [19] where \(\eta _{H}(t,\nu ) = \mathcal{A}(\nu,t)\), see Section “Kailath’s Time-Variant Channel Identification Condition”.

To see more clearly where the spreading function arises in the context of communication theory, we can define the impulse response of the channel modeled by H, denoted \(h_{H}(x,t)\), by

$$\displaystyle{Hf(x) =\int h_{H}(x,t)\,f(x - t)\,dt.}$$

Note that if \(h_{H}\) were independent of x, then H would be a convolution operator and hence a model for a time-invariant channel. In fact, with \(\kappa _{H}(x,t)\) being the kernel of the operator H,

$$\displaystyle\begin{array}{rcl} Hf(x)& =& \int \kappa _{H}(x,t)\,f(t)\,dt {}\end{array}$$
(8)
$$\displaystyle\begin{array}{rcl} & =& \int h_{H}(x,t)\,f(x - t)\,dt {}\end{array}$$
(9)
$$\displaystyle\begin{array}{rcl} & =& \iint \eta _{H}(t,\nu )\,e^{2\pi i\nu (x-t)}\,f(x - t)\,d\nu \,dt{}\end{array}$$
(10)
$$\displaystyle\begin{array}{rcl} & =& \int \sigma _{H}(x,\xi )\,\widehat{f}(\xi )\,e^{2\pi ix\xi }d\xi,{}\end{array}$$
(11)

where

$$\displaystyle\begin{array}{rcl} h_{H}(x,t)& =& \kappa _{H}(x,x - t) \\ & =& \int \sigma _{H}(x,\xi )\,e^{2\pi i\xi t}\,d\xi, \\ & =& \int \eta _{H}(t,\nu )\,e^{2\pi i\nu (x-t)}\,d\nu.{}\end{array}$$
(12)

With this interpretation, the maximum support of \(\eta _{H}(t,\nu )\) in the first variable corresponds to the maximum spread of a delta impulse sent through the channel and the maximum support of \(\eta _{H}(t,\nu )\) in the second variable corresponds to the maximum spread of a pure frequency sent through the channel.
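The equivalence of the representations (8)–(10) is easy to verify in a finite, cyclic analog in which the integrals become sums over \(\mathbb{Z}_{N}\). This discretization is an assumption made purely for illustration and is not one of the operator classes considered below.

```python
import numpy as np

rng = np.random.default_rng(3)
N   = 32
eta = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))  # spreading function
f   = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Representation (10): superposition of time-frequency shifts weighted by eta.
Hf_spread = np.zeros(N, dtype=complex)
for t in range(N):
    for nu in range(N):
        Hf_spread += eta[t, nu] * np.exp(2j * np.pi * nu * (np.arange(N) - t) / N) \
                     * np.roll(f, t)

# Representation (8): integral kernel kappa(x, y) = sum_nu eta(x - y, nu) e^{2 pi i nu y / N}.
x, y = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
kappa = np.zeros((N, N), dtype=complex)
for nu in range(N):
    kappa += eta[(x - y) % N, nu] * np.exp(2j * np.pi * nu * y / N)

Hf_kernel = kappa @ f
print(np.allclose(Hf_spread, Hf_kernel))   # True: both representations agree
```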

Since we are interested in operators whose spreading functions have small support, it is natural to define the following operator classes, called operator Paley-Wiener spaces (see [38]).

Definition 1.

For \(S \subseteq \mathbb{R}^{2}\), we define the operator Paley-Wiener spaces OPW(S) by

$$\displaystyle\begin{array}{rcl} OPW(S)& =& \{H \in \mathcal{L}(L^{2}(\mathbb{R}),L^{2}(\mathbb{R})): \ \mathop{\mathrm{supp}}\nolimits \eta _{ H} \subseteq S,\,\|\sigma _{H}\|_{L^{2}} < \infty \}. {}\\ \end{array}$$

Remark 1.

In [38, 42], the spaces \(OPW^{p,q}(S)\), \(1 \leq p,\,q < \infty \), were considered, where \(L^{2}\)-membership of \(\sigma _{H}\) is replaced by

$$\displaystyle{\|\sigma _{H}\|_{L^{p,q}} =\Big (\int \Big(\int \vert \sigma _{H}(x,\xi )\vert ^{q}d\xi \Big)^{p/q}\,dx\Big)^{1/p}}$$

with the usual adjustments made when either \(p = \infty \) or \(q = \infty \). \(OPW^{p,q}(S)\) is a Banach space with respect to the norm \(\|H\|_{OPW^{p,q}} =\|\sigma _{H}\|_{L^{p,q}}\). Note that if S is bounded, then \(OPW^{\infty,\infty }(S)\) consists of all bounded operators whose spreading function is supported on S. In fact, the operator norm is then equivalent to the \(OPW^{\infty,\infty }(S)\) norm, where the constants depend on S [26].

The general definition is beneficial since it also allows the inclusion of convolution operators with kernels whose Fourier transforms lie in \(L^{q}(\mathbb{R})\) (\(OPW^{\infty,q}(\mathbb{R})\)) and multiplication operators whose multiplier is in \(L^{p}(\mathbb{R})\) (\(OPW^{p,\infty }(\mathbb{R})\)).

The goal of operator identification is to find an input signal g such that each operator H in a given class is completely and stably determined by Hg. In other words, we ask that the operator \(H\mapsto Hg\) be continuous and bounded below on its domain. In our setting, this translates to the existence of \(c_{1},\,c_{2} > 0\) such that

$$\displaystyle{ c_{1}\,\|\sigma _{H}\|_{L^{2}} \leq \| Hg\|_{L^{2}} \leq c_{2}\,\|\sigma _{H}\|_{L^{2}},\quad H \in OPW(S). }$$
(13)

This definition of identifiability of operators originated in [24]. Note that (13) implies that the mapping \(H\mapsto Hg\) is injective, that is, that Hg = 0 implies that H ≡ 0, but is not equivalent to it. The inequality (13) adds to injectivity the assertion that H is also stably determined by Hg in the sense that a small change in the output Hg would correspond to a small change in the operator H. Such stability is also necessary for the existence of an algorithm that will reliably recover H from Hg. In this scheme, g is referred to as an identifier for the operator class OPW(S) and if (13) holds, we say that operator identification is possible.

In trying to find an explicit expression for an identifier, we use as a starting point the “simple measurement scheme” of [19], in which g is a delta train, viz. \(g =\sum _{n}\delta _{nT}\) for some T > 0. In the framework of operator identification the channel measurement criterion in [19] takes the following form [24, 38, 41].

Theorem 1.

For \(H \in OPW\big([0,T]\times [-\varOmega /2,\varOmega /2]\big)\) with \(T\varOmega \leq 1\), we have

$$\displaystyle\begin{array}{rcl} \|H\sum _{k\in \mathbb{Z}}\delta _{kT}\|_{L^{2}(\mathbb{R})} = T\|\sigma _{H}\|_{L^{2}},& & \\ \end{array}$$

and H can be reconstructed by means of

$$\displaystyle\begin{array}{rcl} \kappa _{H}(x + t,x) =\chi _{[0,T]}(t)\sum _{n\in \mathbb{Z}}\big(H\sum _{k\in \mathbb{Z}}\delta _{kT}\big)(t + nT)\,\frac{\sin (\pi T(x - n))} {\pi T(x - n)} & &{}\end{array}$$
(14)

where \(\chi _{[0,T]}(t) = 1\) for t ∈ [0,T] and 0 elsewhere and with convergence in the \(L^{2}\) norm and uniformly in x for every t.

As was observed earlier, the key feature of this scheme is that the spacing of the deltas in the identifier is sufficiently large so as to allow the response of the channel to a given delta to “die out” before the next delta is sent. In other words, the parameter T must exceed the time-spread of the channel. On the other hand, the rate of change of the channel, as measured by its bandwidth Ω, must be small enough that its impulse response can be recovered from “samples” of the channel taken T time units apart. In particular, the samples of the impulse response T units apart can be easily determined from the output. In the general case considered by Bello, in which the spreading support of the operator is not contained in a rectangle of unit area, this intuition breaks down.

Specifically, suppose that we consider the operator class OPW(S) where \(S \subseteq [0,T_{0}] \times [-\varOmega _{0}/2,\varOmega _{0}/2]\) and \(T_{0}\varOmega _{0} \gg 1\) but where |S| < 1. Then sounding the channel with a delta train of the form \(g =\sum _{n}\delta _{nT_{0}}\) would severely undersample the impulse response function. Simply increasing the sampling rate, however, would produce overlap in the responses of the channel to deltas close to each other. An approach to the undersampling problem in the literature of classical sampling theory is to sample at the low rate transformed versions of the function, chosen so that the interference of the several undersampled functions can be dealt with. This idea has its most classical expression in the Generalized Sampling scheme of Papoulis [35]. Choosing shifts and constant multiples of our delta train results in an identifier of the form \(g =\sum _{n}c_{n}\,\delta _{nT}\) where the weights \((c_{n})\) have period P (for some \(P \in \mathbb{N}\)) and T > 0 satisfies \(PT > T_{0}\).

If g is discretely supported (for example, a periodically-weighted delta-train), then we refer to operator identification as operator sampling. The utility of periodically weighted delta trains for operator identification is a cornerstone of operator sampling and has far-reaching implications culminating in the developments outlined in Sections “Generalizations of operator sampling to higher dimensions” and “Further results on operator sampling”.

Kailath’s necessity proof and operator identification

In Section “Necessity of Kailath’s Condition for Channel Identification” we presented the proof of the necessity of the condition BL ≤ 1 for channel identification as given in [19]. The argument consisted of finding a finite-dimensional approximation of the channel H, and then showing that, given any putative identifier g, the number of degrees of freedom present in the output Hg must be at least as large as the number of degrees of freedom in the channel itself. For this to be true in any finite-dimensional setting, we must have BL < 1 and so in the limit we require BL ≤ 1. In essence, if BL > 1, we have a linear system with fewer equations than unknowns which necessarily has a nontrivial nullspace. The generalization of this notion to the infinite-dimensional setting is the basis of the necessity proof that appears in [24]. In this section, we present an outline of that proof, and show how the natural tool for this purpose once again comes from time-frequency analysis.

To see the idea of the proof, assume that BL > 1 and for simplicity let \(S = [-\frac{L} {2}, \frac{L} {2} ] \times [-\frac{B} {2}, \frac{B} {2} ]\). The goal is to show that for any sounding signal s in an appropriately large space of distributions, the operator \(\varPhi _{s}: OPW(S)\longrightarrow L^{2}(\mathbb{R})\), \(H\mapsto Hs\), is not stable, that is, it does not possess a lower bound in the inequality (13).

First, define the operator \(E: \ell _{0}(\mathbb{Z}^{2})\longrightarrow OPW(S)\), where \(\ell _{0}(\mathbb{Z}^{2})\) is the space of finite sequences equipped with the \(\ell ^{2}\) norm, by

$$\displaystyle{E(\sigma ) = E(\{\sigma _{k,l}\}) =\sum _{k,l}\sigma _{k,l}\mathcal{M}_{k\lambda /L}\mathcal{T}_{l\lambda /B}\,P\,\mathcal{T}_{-l\lambda /B}\mathcal{M}_{-k\lambda /L}}$$

where \(\lambda > 1\) is chosen so that \(1 <\lambda ^{4} < BL\) and where P is a time-frequency localization operator whose spreading function \(\eta _{P}(t,\nu )\) is infinitely differentiable, supported in S, and identically one on \([-\frac{L} {2\lambda }, \frac{L} {2\lambda } ] \times [-\frac{B} {2\lambda }, \frac{B} {2\lambda } ]\). It is easily seen that the operator E is well defined and has spreading function

$$\displaystyle{\eta _{E(\sigma )}(t,\nu ) =\eta _{P}(t,\nu )\,\sum _{k,l}\sigma _{k,l}\,e^{2\pi i(k\lambda t/L-l\lambda \nu /B)}.}$$

By construction, it follows that for some constant \(c_{1}\), \(\|E(\sigma )\|_{OPW(S)} \geq c_{1}\|\sigma \|_{l^{2}(\mathbb{Z}^{2})}\), for all \(\sigma\), and that for any distribution s, Ps decays rapidly in time and in frequency.

Next define the Gabor analysis operator \(C_{g}: L^{2}(\mathbb{R})\longrightarrow \ell ^{2}(\mathbb{Z}^{2})\) by

$$\displaystyle{C_{g}(s) =\{\langle s,\mathcal{M}_{k\lambda ^{2}/L}\mathcal{T}_{l\lambda ^{2}/B}g\rangle \}_{k,l\in \mathbb{Z}}}$$

where \(g(x) = e^{-\pi x^{2} }\). A well-known theorem in Gabor theory asserts that \(\{\mathcal{M}_{k\alpha }\mathcal{T}_{l\beta }g\}_{k,l\in \mathbb{Z}}\) is a Gabor frame for \(L^{2}(\mathbb{R})\) for every \(\alpha \beta < 1\) [31, 59, 60]. Consequently \(C_{g}\) satisfies, for some \(c_{2} > 0\), \(\|C_{g}(s)\|_{l^{2}(\mathbb{Z}^{2})} \geq c_{2}\,\|s\|_{L^{2}(\mathbb{R})}\) for all s, since \(\lambda ^{2}/L\, \cdot \lambda ^{2}/B =\lambda ^{4}/BL < 1\).

For any s, consider the composition operator

$$\displaystyle{C_{g} \circ \varPhi _{s} \circ E: l_{0}(\mathbb{Z}^{2})\longrightarrow l^{2}(\mathbb{Z}^{2}).}$$

The crux of the proof lies in showing that this composition operator is not stable, that is, it does not have a lower bound. Since \(C_{g}\) and E are both bounded below, it follows that \(\varPhi _{s}\) cannot be stable. Since \(s \in S_{0}'(\mathbb{R})\) was arbitrary, this completes the proof.

To complete this final step we examine the canonical bi-infinite matrix representation of the above defined composition of operators, that is, the matrix \(M = (m_{k',l',k,l})\) that satisfies

$$\displaystyle{(C_{g} \circ \varPhi _{s} \circ E(\sigma ))_{k',l'} =\sum _{k,l}m_{k',l',k,l}\,\sigma _{k,l}.}$$

It can be shown that M has the property that for some rapidly decreasing function w(x),

$$\displaystyle{ \vert m_{k',l',k,l}\vert \leq w(\max \{\vert \lambda k' - k\vert,\vert \lambda l' - l\vert \}). }$$
(15)

The proof is completed by the following Lemma. Its proof can be found in [24] and generalizations can be found in [37].

Lemma 1.

Given \(M = (m_{j',j})_{j',j\in \mathbb{Z}^{2}}\). If there exists a monotonically decreasing function \(w: \mathbb{R}_{0}^{+}\longrightarrow \mathbb{R}_{0}^{+}\) with \(w(x) = O(x^{-2-\delta })\), δ > 0, and constants \(\lambda > 1\) and \(K_{0} > 0\) with \(\vert m_{j',j}\vert < w(\|\lambda j' - j\|_{\infty })\) for \(\|\lambda j' - j\|_{\infty } > K_{0}\), then M is not stable.

Intuitively, this result asserts that a bi-infinite matrix whose entries decay rapidly away from a skew diagonal behaves like a finite matrix with more rows than columns (see Figure 2). Such a matrix will always have a nontrivial nullspace. In the case of an infinite matrix what can be shown is that at best its inverse will be unbounded.

Fig. 2. A \(1/\lambda\)-slanted matrix M. The matrix is dominated by entries on a slanted diagonal of slope \(1/\lambda\).
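This behavior can also be seen numerically. In the sketch below, the decay envelope w, the slant λ, and the random signs are artificial choices made only to illustrate the lemma: the smallest singular value of square finite sections of such a matrix collapses as the section grows, reflecting the fact that only about N∕λ of the rows carry non-negligible mass.

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 1.5                         # slant parameter lambda > 1
w = lambda r: np.exp(-r)          # rapidly decreasing envelope (illustrative choice)

for N in (20, 40, 80):
    jp, j = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
    # Entries bounded by w(|lambda*j' - j|), here with random signs.
    M = w(np.abs(lam * jp - j)) * rng.standard_normal((N, N))
    smin = np.linalg.svd(M, compute_uv=False)[-1]
    print(N, f"{smin:.2e}")       # smallest singular value shrinks rapidly with N
```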

We can make a more direct connection from this proof to the original necessity argument in [19] in the following way. If we restrict our attention to sequences \(\{\sigma _{k,l}\}\) with a fixed finite support of size, say, N, then the image of this subspace of sequence space under the mapping E is an N-dimensional subspace of OPW(S). The operator P is essentially a time-frequency localization operator. This fact is established in [24] and follows from the rapid decay of the Fourier transform of \(\eta _{P}\). Since \(\eta _{P}\) itself is concentrated on a rectangle of area \(BL/\lambda ^{2}\), its Fourier transform will be concentrated on a rectangle of area \(\lambda ^{2}/BL\). From this it follows that for \(\sigma\) as described above, the operator \(E(\sigma )\) essentially localizes a function to a region in the time-frequency plane of area \(N(\lambda ^{2}/BL)\).

Considering now the Gabor analysis operator \(C_{g}\), we observe that the Gaussian g(x) essentially occupies a time-frequency cell of area 1, and that this function is shifted in the time-frequency plane by integer multiples of \((\lambda ^{2}/B,\lambda ^{2}/L)\). Hence to “cover” a region in the time-frequency plane of area \(N(\lambda ^{2}/BL)\) would require only about

$$\displaystyle{\frac{N(\lambda ^{2}/BL)} {\lambda ^{4}/BL} = \frac{N} {\lambda ^{2}} }$$

time-frequency shifts. So roughly speaking, in order to resolve N degrees of freedom in the operator \(E(\sigma _{k,l})\), we have only \(N/\lambda ^{2} < N\) degrees of freedom in the output of the operator \(E(\sigma _{k,l})s\).

Identification of operator Paley-Wiener spaces by periodically weighted delta-trains

Theorem 1 is based on arguments outlined in Section “Kailath’s Time-Variant Channel Identification Condition” and applies only to OPW(S) if S is contained in a rectangle of area less than or equal to one. In the following, we will develop the tools that allow us to identify OPW(S) for any compact set S of Lebesgue measure less than one.

In our approach we discretize the channel by covering the spreading support S with small rectangles of fixed sidelength, which we refer to as a rectification of S. As long as the measure of S is less than one, it is possible to do this in such a way that the total area of the rectangles is also less than one. This idea seems to bear some similarity to Bello’s philosophy of sampling the spreading function on a fixed grid but with one fundamental difference. Bello’s approach is based on replacing t and x by samples, thereby approximating the channel. For a better approximation, sampling on a finer grid is necessary, which results in a larger system of equations that must be solved. In our approach, as soon as the total area of the rectification is less than one, the operator modeling the channel is completely determined by the discrete model. Once this is achieved, identification of the channel reduces to solving a single linear system of equations at each point (Figure 3).

Fig. 3. A set not satisfying Kailath’s condition is rectified with \(1/(T\varOmega ) = P \in \mathbb{N}\); the rectification has area ≤ 1, \(\varOmega _{\max } \leq 1/T\), and \(T_{\max } \leq 1/\varOmega\).

Given parameters T > 0 and \(P \in \mathbb{N}\), we assume that S is rectified by rectangles of size T ×Ω, where Ω = 1∕(TP), such that the total area of the rectangles is less than one. Given a period-P sequence \(c = (c_{n})_{n\in \mathbb{Z}}\), we then define the periodically weighted delta-train g by \(g =\sum _{n\in \mathbb{Z}}c_{n}\,\delta _{nT}\). The goal of this subsection is to describe the scheme by which a linear system of P equations in a priori \(P^{2}\) unknowns can be derived by which an operator H ∈ OPW(S) can be completely determined by Hg(x). In this sense, the “degrees of freedom” in the operator class OPW(S) and those of the output function Hg(x) are precisely defined and can be effectively compared (Figure 4).

Fig. 4. Channel sounding of \(OPW([0,2/3]\times [-1/4,1/4]\, \cup \, [4/3,2]\times [-1/2,1/2])\) using a P-periodically weighted delta train g. The kernel κ(x, y) takes values on the (x, y)-plane, the sounding signal g, a weighted impulse train, is defined on the y-axis, and the output signal \(Hg(x) =\int \kappa (x,y)g(y)dy\) is displayed on the x-axis. Here, the sample values of the tap functions \(h(x,t) =\kappa (x,x - t)\) are not easily read off the response Hg(x) as, for example, for x ∈ [2T, 3T] = [4∕3, 2] we have Hg(x) = 0.7κ(x, 0) + 0.6κ(x, 2T) = 0.7h(x, x) + 0.6h(x, x − 2T). In detail, we have \(g =\ldots +0.7\,\delta _{-2} + 0.5\,\delta _{-4/3} + 0.6\,\delta _{-2/3} + 0.7\,\delta _{0} + 0.5\,\delta _{2/3} + 0.6\,\delta _{4/3} + 0.7\,\delta _{2} + 0.5\,\delta _{8/3}+\ldots\), so P = 3, T = 2∕3, Ω = 1∕(PT) = 1∕2, \(c_{n} = 0.7\) if \(n\!\!\mod 3 = 0\), \(c_{n} = 0.5\) if \(n\!\!\mod 3 = 1\), \(c_{n} = 0.6\) if \(n\!\!\mod 3 = 2\).

The basic tool of time-frequency analysis that makes this possible is the Zak transform (see [13]).

Definition 2.

The non-normalized Zak Transform is defined for \(f \in \mathcal{S}(\mathbb{R})\) and a > 0 by

$$\displaystyle{Z_{a}f(t,\nu ) =\sum _{n\in \mathbb{Z}}f(t - an)\,e^{2\pi ian\nu }.}$$

\(Z_{a}f(t,\nu )\) satisfies the quasi-periodicity relations

$$\displaystyle{Z_{a}f(t + a,\nu ) = e^{2\pi ia\nu }\,Z_{ a}f(t,\nu )}$$

and

$$\displaystyle{Z_{a}f(t,\nu +1/a) = Z_{a}f(t,\nu ).}$$

\(\sqrt{a}\,Z_{ a}\) can be extended to a unitary operator from \(L^{2}(\mathbb{R})\) onto \(L^{2}([0,a]\times [0,1/a])\).
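A discrete (cyclic) analog of the Zak transform, with M samples per period and K periods, illustrates the definition and the quasi-periodicity relation. The finite periodic setting and the grid sizes are assumptions made only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(5)

M, K = 8, 16               # M samples per "period" a; K periods; total length L = M*K
L = M * K
f = rng.standard_normal(L) + 1j * rng.standard_normal(L)

def zak(f, M, K):
    """Discrete (cyclic) analog of Z_a f(t, nu) = sum_n f(t - a*n) e^{2 pi i a n nu}:
    Z[j, k] = sum_m f[(j - m*M) mod L] * exp(2 pi i m k / K)."""
    L = M * K
    Z = np.zeros((L, K), dtype=complex)
    for k in range(K):
        for m in range(K):
            Z[:, k] += f[(np.arange(L) - m * M) % L] * np.exp(2j * np.pi * m * k / K)
    return Z

Z = zak(f, M, K)

# Quasi-periodicity Z_a f(t + a, nu) = e^{2 pi i a nu} Z_a f(t, nu) becomes, cyclically,
# Z[j + M, k] = exp(2 pi i k / K) * Z[j, k].
j0, k0 = 3, 5
print(np.allclose(Z[(j0 + M) % L, k0], np.exp(2j * np.pi * k0 / K) * Z[j0, k0]))  # True
```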

A somewhat involved but elementary calculation yields the following (see [44, 46] and Section “Proof of Lemma 2”).

Lemma 2.

Let T > 0, \(P \in \mathbb{N}\), \(c = (c_{n})\), and g be given as above. Then for all \((t,\nu ) \in \mathbb{R}^{2}\), and \(p = 0,\,1,\,\ldots,\,P-1\),

$$\displaystyle\begin{array}{rcl} & & e^{-2\pi i\nu Tp}\,(Z_{ TP} \circ H)g(t + Tp,\nu ) \\ & & =\varOmega \,\sum _{ q,\,m=0}^{P-1}(\mathcal{T}^{q}\,\mathcal{M}^{m}c)_{ p}\,e^{-2\pi i\nu Tq}\,\eta _{ H}^{Q}(t + Tq,\nu +m/TP).{}\end{array}$$
(16)

Here \(\mathcal{T}\) and \(\mathcal{M}\) are the translation and modulation operators given in Definition 3, and \(\eta _{H}^{Q}(t,\nu )\) is the quasiperiodization of \(\eta _{H}\),

$$\displaystyle{ \eta _{H}^{Q}(t,\nu ) =\sum _{ k}\sum _{\ell}\eta _{H}(t + kTP,\nu +\ell/T)\,e^{-2\pi i\nu kTP} }$$
(17)

whenever the sum is defined (Figure 4).

Under the additional simplifying assumption that the spreading function \(\eta _{H}(t,\nu )\) is supported in the large rectangle [0, TP] × [0, 1∕T], and by restricting (16) to the rectangle [0, T] × [0, 1∕(TP)], we arrive at the \(P \times P^{2}\) linear system

$$\displaystyle{ \mathbf{Z}_{Hg}(t,\nu )_{p} =\sum _{ q,m=0}^{P-1}G(c)_{ p,(q,m)}\,\boldsymbol{\eta }_{H}(t,\nu )_{(q,m)} }$$
(18)

where

$$\displaystyle{ \mathbf{Z}_{Hg}(t,\nu )_{p} = (Z_{TP} \circ H)g(t + pT,\nu )\,e^{-2\pi i\nu pT}, }$$
(19)
$$\displaystyle{ \boldsymbol{\eta }_{H}(t,\nu )_{(q,m)} =\varOmega \,\eta _{H}(t + qT,\nu +m/TP)\,e^{-2\pi i\nu qT}\,e^{-2\pi iqm/P}, }$$
(20)

and where G(c) is a finite Gabor system matrix (23). If (18) can be solved for each (t, ν) ∈ [0, T] × [0, 1∕(TP)], then the spreading function for an operator H can be completely determined by its response to the periodically weighted delta-train g.

As anticipated by Bello, two issues now become relevant. (1) We require that \(\mathop{\mathrm{supp}}\nolimits \eta _{H}\) occupy no more than P of the shifted rectangles [0, T] × [0, 1∕(TP)] + (qT, k∕(TP)), so that (18) has at least as many equations as unknowns. This forces \(\vert \mathop{\mathrm{supp}}\nolimits \eta _{H}\vert \leq 1\). (2) We require that c be chosen in such a way that the P × P system formed by removing the columns of G(c) corresponding to vanishing components of \(\boldsymbol{\eta }_{H}\) is invertible. That such c exist is a fundamental cornerstone of operator sampling and is the subject of the next section.
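The recovery step can be sketched numerically. In the example below, P, the random complex weights c, and the randomly chosen set of at most P active cells are assumptions made for illustration; the matrix built is the Gabor system matrix G(c) of (23), and the linear solve mimics solving (18) at one fixed (t, ν) once the vanishing components of \(\boldsymbol{\eta }_{H}\) are known.

```python
import numpy as np

rng = np.random.default_rng(6)
P = 7                                   # number of weights per period (illustrative)
c = rng.standard_normal(P) + 1j * rng.standard_normal(P)   # weight sequence / window

def gabor_matrix(c):
    """Full P x P^2 Gabor system matrix G(c) of (23): blocks D_q W_P with
    diag(D_q) = T^q c, so the columns are (scalar multiples of) the
    time-frequency shifts of c."""
    P = len(c)
    W = np.exp(2j * np.pi * np.outer(np.arange(P), np.arange(P)) / P)   # Fourier matrix W_P
    return np.hstack([np.diag(np.roll(c, q)) @ W for q in range(P)])

G = gabor_matrix(c)                     # shape (P, P**2)

# Assume the rectified spreading support occupies at most P of the P^2 cells.
cells = rng.choice(P * P, size=P, replace=False)
eta_true = rng.standard_normal(P) + 1j * rng.standard_normal(P)   # eta_H on those cells

Z = G[:, cells] @ eta_true              # "measurements", playing the role of Z_Hg in (18)

# Recovery: solve the P x P system obtained by keeping only the active columns.
eta_rec = np.linalg.solve(G[:, cells], Z)
print(np.allclose(eta_rec, eta_true))   # True whenever these P columns are independent
```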

Based on the existence of c such that any set of P columns of G(c) form a linearly independent set, we can prove the following [45].

Theorem 2.

For \(S \subseteq (0,\infty )\times \mathbb{R}\) compact with |S| < 1, there exist T > 0, P ∈ ℕ, and a period-P sequence \(c = (c_{n})\) such that \(g =\sum _{n}c_{n}\,\delta _{nT}\) identifies OPW(S). In particular, there exist period-P sequences \(b_{j} = (b_{j,k})\) and integers \(0 \leq q_{j},m_{j} \leq P-1\), for \(0 \leq j \leq P-1\), such that

$$\displaystyle\begin{array}{rcl} h(x,t)& =& e^{-\pi it/T}\sum _{ k}\sum _{j=0}^{P-1}\big[b_{ j,k}\,Hg(t - (q_{j} - k)T) \\ & & e^{2\pi im_{j}(x-t)/PT}\,\phi ((x - t) + (q_{ j} - k)T)\,r(t - q_{j}T)\big]{}\end{array}$$
(21)

where \(r,\phi \in \mathcal{S}(\mathbb{R})\) satisfy

$$\displaystyle{ \sum _{k\in \mathbb{Z}}r(t + kT) = 1 =\sum _{n\in \mathbb{Z}}\widehat{\phi }(\gamma +n/PT), }$$
(22)

where \(r(t)\widehat{\phi }(\gamma )\) is supported in a neighborhood of [0,T]×[0,1∕PT], and where the sum in (21) converges unconditionally in \(L^{2}\) and, for each t, uniformly in x.

Equation (21) is a generalization of (14), as is easily seen by choosing \(\phi (x) =\sin (\pi PTx)/(\pi PTx)\) and r(t) to be the characteristic function of [0, T).

Linear Independence Properties of Gabor Frames

Finite Gabor Frames

Definition 3.

Given P ∈ ℕ, let \(\omega = e^{2\pi i/P}\) and define the translation operator \(\mathcal{T}\) on \((x_{0},\,\ldots,\,x_{P-1}) \in \mathbb{C}^{P}\) by

$$\displaystyle{\mathcal{T} x = (x_{P-1},x_{0},\,x_{1},\,\ldots,x_{P-2}),}$$

and the modulation operator \(\mathcal{M}\) on \(\mathbb{C}^{P}\) by

$$\displaystyle{\mathcal{M}x = (\omega ^{0}x_{ 0},\omega ^{1}x_{ 1},\,\ldots,\,\omega ^{P-1}x_{ P-1}).}$$

Given a vector \(c \in \mathbb{C}^{P}\) the finite Gabor system with window c is the collection \(\{\mathcal{T}^{q}\mathcal{M}^{p}c\}_{q,p=0}^{P-1}\). Define the full Gabor system matrix G(c) to be the \(P \times P^{2}\) matrix

$$\displaystyle{ G(c) = \left [\,\,D_{0}\,W_{P}\,\,\,\,D_{1}\,W_{P}\,\,\,\,\cdots \,\,\,\,D_{P-1}\,W_{P}\,\,\right ] }$$
(23)

where \(D_{k}\) is the diagonal matrix with diagonal

$$\displaystyle{\mathcal{T}^{k}c = (c_{ P-k},\,\ldots,\,c_{P-1},\,c_{0},\,\ldots,\,c_{P-k-1}),}$$

and \(W_{P}\) is the P × P Fourier matrix \(W_{P} = (e^{2\pi i\,nm/P})_{n,m=0}^{P-1}\).
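
A direct transcription of Definition 3 into NumPy may be useful for experimenting with the objects discussed below. The sketch is our own illustration; the function name gabor_matrix is ours and does not come from the cited references.

```python
import numpy as np

def gabor_matrix(c):
    """Assemble the P x P^2 Gabor system matrix G(c) = [D_0 W_P ... D_{P-1} W_P]
    of (23): D_k is diagonal with the translate T^k c on its diagonal, and
    W_P = (exp(2*pi*i*n*m/P))_{n,m=0}^{P-1}."""
    c = np.asarray(c)
    P = len(c)
    W = np.exp(2j * np.pi * np.outer(np.arange(P), np.arange(P)) / P)
    blocks = [np.diag(np.roll(c, k)) @ W for k in range(P)]   # np.roll(c, k) = T^k c
    return np.hstack(blocks)

# every column of G(c) is, up to a unimodular factor, a time-frequency shift of c
rng = np.random.default_rng(1)
c = rng.standard_normal(4) + 1j * rng.standard_normal(4)
print(gabor_matrix(c).shape)        # (4, 16)
```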

Remark 2.

(1) For 0 ≤ q, p ≤ P − 1, the (q + 1)st column of the submatrix \(D_{p}W_{P}\) is the vector \(\mathcal{M}^{p}\mathcal{T}^{q}c\) where the operators \(\mathcal{M}\) and \(\mathcal{T}\) are as in Definition 3. This means that each column of the matrix G(c) is a unimodular constant multiple of an element of the finite Gabor system with window c, namely \(\{e^{-2\pi ipq/P}\,\mathcal{T}^{q}\mathcal{M}^{p}c\}_{q,p=0}^{P-1}\).

(2) Note that the finite Gabor system defined above consists of \(P^{2}\) vectors in \(\mathbb{C}^{P}\) which form an overcomplete tight frame for \(\mathbb{C}^{P}\) [30]. For details on Gabor frames in finite dimensions, see [9, 27, 30] and the overview article [39].

(3) Note that we are abusing notation slightly by identifying a vector \(c \in \mathbb{C}^{P}\) with a P-periodic sequence c = (c n ) in the obvious way.

Definition 4.

[8] The Spark of an M × N matrix F is the size of the smallest linearly dependent subset of columns, i.e.,

$$\displaystyle{Spark(F) =\min \{\| x\|_{0}: Fx = 0,\ \ x\neq 0\}}$$

where \(\|x\|_{0}\) is the number of nonzero components of the vector x. If Spark(F) = M + 1, then F is said to have full Spark. Spark(F) = k implies that any collection of fewer than k columns of F is linearly independent.
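
For small matrices, the Spark can be computed by brute force directly from the definition. The following sketch (our own illustration, feasible only for small P) does this and can be combined with the gabor_matrix helper above.

```python
import itertools
import numpy as np

def spark(F, tol=1e-10):
    """Size of the smallest linearly dependent set of columns of F, found by
    exhaustive search over column subsets; returns M + 1 (full Spark) if every
    selection of M columns is linearly independent."""
    M, N = F.shape
    for k in range(1, M + 1):
        for cols in itertools.combinations(range(N), k):
            if np.linalg.matrix_rank(F[:, cols], tol=tol) < k:
                return k
    return M + 1
```

For generic c and small P one finds spark(gabor_matrix(c)) == P + 1, in line with Theorem 4 below.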

Finite Gabor frames are generically full Spark

The existence of Gabor matrices with full Spark has been addressed in [30, 32]. The results in these two papers are as follows.

Theorem 3.

[30] If P ∈ ℕ is prime, then there exists a dense, open subset of \(c \in \mathbb{C}^{P}\) such that every minor of the Gabor system matrix G(c) is nonzero. In particular, for such c, G(c) has full Spark.

Theorem 4.

[32] For every P ∈ ℕ, there exists a dense, open subset of \(c \in \mathbb{C}^{P}\) such that the Gabor system matrix G(c) has full Spark.

The goal of this subsection is to outline the proof of Theorems 3 and 4. We will adopt some of the following notation and terminology of [32].

Let P ∈ ℕ and let M be a P × P submatrix of G(c). For 0 ≤ κ < P let \(\ell_{\kappa }\) be the number of columns of M chosen from the submatrix \(D_{\kappa }W_{P}\) of (23). While the vector \(\ell= (\ell_{\kappa })_{\kappa =0}^{P-1}\) does not determine M uniquely, it describes the matrix M sufficiently well for our purposes. Define \(M_{\kappa }\) to be the \(P \times \ell_{\kappa }\) matrix consisting of those columns of M chosen from \(D_{\kappa }W_{P}\). Given the ordered partition \(B = (B_{0},\,B_{1},\,\ldots,\,B_{P-1})\) where \(\{B_{0},\,B_{1},\,\ldots,\,B_{P-1}\}\) forms a partition of \(\{0,\,\ldots,\,P - 1\}\), and where for each 0 ≤ κ < P, \(\vert B_{\kappa }\vert =\ell _{\kappa }\), let \(M_{\kappa }(B_{\kappa })\) be the \(\ell_{\kappa } \times \ell_{\kappa }\) submatrix of \(M_{\kappa }\) whose rows belong to \(B_{\kappa }\). Then \(\det (M) =\sum \prod _{ \kappa =0}^{P-1}\det (M_{\kappa }(B_{\kappa }))\) where the sum is taken over all such ordered partitions B. This formula is called the Lagrange expansion of the determinant.

Each ordered partition B corresponds to a permutation of \(\{0,\,\ldots,\,P - 1\}\) as follows. Define the trivial partition \(A = (A_{0},\,A_{1},\,\ldots,\,A_{P-1})\) by

$$\displaystyle{A_{j} =\{\sum _{ i=0}^{j-1}\ell_{ i},\big(\sum _{i=0}^{j-1}\ell_{ i}\big) + 1,\,\ldots,\,\big(\sum _{i=0}^{j}\ell_{ i}\big) - 1\}}$$

so that \(A_{0} = [0,\,\ell_{0} - 1]\), \(A_{1} = [\ell_{0},\,\ell_{0} +\ell _{1} - 1]\), \(\ldots\), \(A_{P-1} = [\ell_{0} + \cdots +\ell _{P-2},\,P - 1]\). Then given \(B = (B_{0},\,B_{1},\,\ldots,\,B_{P-1})\) there is a permutation \(\sigma \in S_{P}\) such that \(\sigma (A_{\kappa }) = B_{\kappa }\) for all κ. This \(\sigma\) is unique up to permutations that preserve A, that is, up to \(\tau \in S_{P}\) such that \(\tau (A_{\kappa }) = A_{\kappa }\) for all κ. Call such a permutation trivial and denote by Γ the subgroup of \(S_{P}\) consisting of all trivial permutations. Then the ordered partitions B of \(\{0,\,\ldots,\,P - 1\}\) can be indexed by equivalence classes of permutations \(\sigma \in S_{P}/\varGamma\).

The key observation is that \(\det (M)\) is a homogeneous polynomial in the P variables \(c_{0},\,c_{1},\,\ldots,\,c_{P-1}\) and we can write

$$\displaystyle{ \det (M) =\sum _{\sigma \in S_{P}/\varGamma }a_{\sigma }\,C^{\sigma } }$$
(24)

where the monomial \(C^{\sigma }\) is given by

$$\displaystyle{C^{\sigma } =\prod _{ \kappa =0}^{P-1}\,\prod _{ j\in \sigma (A_{\kappa })}c_{(j-\kappa )(mod\ P)}.}$$

If it can be shown that this polynomial does not vanish identically, then we can choose a dense, open subset of \(c \in \mathbb{C}^{P}\) for which \(\det (M)\neq 0\). Since there are only finitely many P × P submatrices of G(c) it follows that there is a dense, open subset of c for which \(\det (M)\neq 0\) for all M, and we conclude that, for these c, G(c) has full Spark.

Following [32], we say that a monomial \(C^{\sigma _{0}}\) appears uniquely in (24) if for every \(\sigma \in S_{P}/\varGamma\) such that \(\sigma \neq \sigma _{0}\), \(C^{\sigma }\neq C^{\sigma _{0}}\). Therefore, in order to show that the polynomial (24) does not vanish identically, it is sufficient to show that (1) there is a monomial \(C^{\sigma }\) that appears uniquely in (24) and (2) the coefficient \(a_{\sigma }\) of this monomial does not vanish.

Obviously, whether or not (24) vanishes identically does not depend on how the variables \(c_{i}\) are labelled. More specifically, if the variables are renamed by a cyclical shift of the indices, viz., \(c_{i}\mapsto c_{(i+\gamma )\mod P}\) for some 0 ≤ γ < P, then

$$\displaystyle{\det (M)(c_{\gamma +1},\,\ldots,\,c_{P-1},\,c_{0},\,\ldots,\,c_{\gamma }) = \pm \,\det (M')(c_{0},\,\ldots,\,c_{P-1})}$$

where M′ is a P × P submatrix described by the vector

$$\displaystyle{\ell' = (\ell_{\gamma +1},\,\ldots,\,\ell_{P-1},\,\ell_{0},\,\ldots,\,\ell_{\gamma }).}$$

The lowest index monomial

In [30], a monomial referred to in [32] as the lowest index (LI) monomial is defined that has the required properties when P is prime. In order to see this, note first that each coefficient \(a_{\sigma }\) in the sum (24) is the product of minors of the Fourier matrix \(W_{P}\) and, since P is prime, Chebotarev's Theorem says that such minors do not vanish [62]. More specifically,

$$\displaystyle{a_{\sigma }\,C^{\sigma } = \pm \,\prod _{\kappa =0}^{P-1}\det (M_{\kappa }(\sigma (A_{\kappa })))}$$

and for each κ, the columns of \(M_{\kappa }\) are columns of \(W_{P}\) in which each row has been multiplied by the same variable \(c_{j}\), and \(M_{\kappa }(\sigma (A_{\kappa }))\) is a square matrix formed by choosing \(\ell_{\kappa }\) rows of \(M_{\kappa }\). Hence for each κ, \(\det (M_{\kappa }(\sigma (A_{\kappa })))\) is a monomial in c whose coefficient is a constant multiple of a minor of \(W_{P}\). Since \(a_{\sigma }\) is the product of those minors, it does not vanish.

Note moreover that each submatrix \(M_{\kappa }(\sigma (A_{\kappa }))\) is an \(\ell_{\kappa } \times \ell_{\kappa }\) matrix, so that \(\det (M_{\kappa }(\sigma (A_{\kappa })))\) is a sum of multiples of the products along its \(\ell_{\kappa }!\) generalized diagonals. Hence \(a_{\sigma }\,C^{\sigma }\) is a sum of multiples of the products along \(\prod _{\kappa =0}^{P-1}\ell_{\kappa }!\) generalized diagonals of M.

We define the LI monomial as in [30] as follows. If M is 1 × 1, then \(\det (M)\) is a multiple of a single variable \(c_{j}\), and we define the LI monomial \(p_{M}\) by \(p_{M} = c_{j}\). If M is d × d, let \(c_{j}\) be the variable of lowest index appearing in M. Choose any entry of M in which \(c_{j}\) appears, eliminate the row and column containing that entry, and call the remaining (d − 1) × (d − 1) matrix M′. Define \(p_{M} = c_{j}\,p_{M'}\). It is easy to see that the monomial \(p_{M}\) is independent of the entry of M chosen at each step. In order to show that the LI monomial appears uniquely in (24), we observe as in [30] that the number of generalized diagonals in \(\det (M)\) that correspond to the LI monomial is \(\prod _{\kappa =0}^{P-1}\ell_{\kappa }!\). Since this is also the number of generalized diagonals appearing in the calculation of each \(\det (M_{\kappa }(\sigma (A_{\kappa })))\), it follows that this monomial appears only once. The details of the argument can be found in Section "Proof of Theorem 3". Note that because P is prime, this argument is valid no matter how large the matrix M is. In other words, M does not have to be a P × P submatrix in order for the result to hold. Consequently, given k < P and M an arbitrary P × k submatrix of G(c), we can form the k × k matrix \(M_{0}\) by choosing k rows of M in such a way that the LI monomial of \(M_{0}\) contains only the variables \(c_{0},\,\ldots,\,c_{k-1}\). This observation leads to the following theorem for matrices with arbitrary Spark.

Theorem 5.

[46] If P ∈ ℕ is prime, and 0 < k < P, there exists an open, dense subset of \(c \in \mathbb{C}^{k} \times \{ 0\} \subseteq \mathbb{C}^{P}\) with the property that Spark(G(c)) = k + 1.

This result has implications for relating the capacity of a time-variant communication channel to the area of the spreading support, see [46].
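
The recursive construction of the LI monomial described above is easy to make concrete in code. The sketch below is our own illustration with hypothetical names: it records only which variable \(c_{j}\) occupies each entry of a chosen submatrix of G(c), namely \(c_{(r-\kappa )\mod P}\) for row r of a column taken from the block \(D_{\kappa }W_{P}\), and then applies the recursion defining \(p_{M}\).

```python
import numpy as np

def li_monomial(var):
    """Indices of the LI monomial p_M of a submatrix M of G(c).

    var[r, s] is the index j such that the (r, s) entry of M is a unimodular
    multiple of c_j; the recursion picks the lowest index present, deletes the
    corresponding row and column, and repeats."""
    if var.size == 0:
        return []
    j = int(var.min())                         # lowest-index variable in M
    r, s = np.argwhere(var == j)[0]            # any entry containing c_j
    reduced = np.delete(np.delete(var, r, axis=0), s, axis=1)
    return [j] + li_monomial(reduced)

# a 3 x 3 submatrix of G(c) for P = 3, taking one column from each block
# D_0 W_3, D_1 W_3, D_2 W_3; row r of a column from block kappa carries c_{(r-kappa) mod 3}
P, kappas = 3, [0, 1, 2]
var = np.array([[(r - k) % P for k in kappas] for r in range(P)])
print(li_monomial(var))                        # [0, 0, 0], i.e. p_M = c_0^3
```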

The consecutive index monomial

In [32], a monomial referred to as the consecutive index (CI) monomial is defined that has the required properties for any P ∈ ℕ. The CI monomial, \(C^{I}\), is defined as the monomial corresponding to the identity permutation in \(S_{P}/\varGamma\), that is, to the equivalence class of the trivial partition \(A = (A_{0},\,A_{1},\,\ldots,\,A_{P-1})\). Hence

$$\displaystyle{C^{I} =\prod _{ \kappa =0}^{P-1}\,\prod _{ j\in A_{\kappa }}c_{(j-\kappa )mod\ P}.}$$

For each κ, the monomial appearing in \(\det (M_{\kappa }(A_{\kappa }))\), namely \(\prod _{j\in A_{\kappa }}c_{(j-\kappa )\mod P}\), consists of a product of \(\ell_{\kappa }\) variables \(c_{j}\) with consecutive indices modulo P.

That \(a_{I}\neq 0\) follows from the observation that for each κ, \(\det (M_{\kappa }(A_{\kappa }))\) is a monomial whose coefficient is a nonzero multiple of a Vandermonde determinant and hence does not vanish (for details, see [32]). The proof that \(C^{I}\) appears uniquely in (24) amounts to showing that, with respect to an appropriate cyclical renaming of the variables \(c_{i}\), the CI monomial uniquely minimizes the quantity \(\varLambda (C^{\sigma }) =\sum _{ i=0}^{P-1}i^{2}\,\alpha _{i}\), where \(\alpha _{i}\) is the exponent of the variable \(c_{i}\) in \(C^{\sigma }\). An abbreviated version of the proof of this result as it appears in [32] is given in Section "Proof of Theorem 4".

As a final observation, we quote the following corollary that provides an explicit construction of a unimodular vector c such that G(c) has full Spark.

Corollary 1.

[32] Let \(\zeta = e^{2\pi i/(P-1)^{4} }\) or any other primitive root of unity of order (P − 1) 4 where P ≥ 4. Then the vector

$$\displaystyle{c = (1,\,\zeta,\,\zeta ^{4},\,\zeta ^{9},\,\ldots,\,\zeta ^{(P-1)^{2} })}$$

generates a Gabor frame for which G(c) has full Spark.
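
Corollary 1 is easy to test numerically for small P, for instance by combining the hypothetical gabor_matrix and spark helpers sketched earlier. The following is our own illustration of such a check, not code from [32].

```python
import numpy as np
# reuses gabor_matrix and spark from the earlier sketches

def corollary_window(P):
    """c = (1, zeta, zeta^4, ..., zeta^{(P-1)^2}) with zeta a primitive
    root of unity of order (P - 1)^4, as in Corollary 1 (P >= 4)."""
    zeta = np.exp(2j * np.pi / (P - 1) ** 4)
    return zeta ** (np.arange(P) ** 2)

P = 4
print(spark(gabor_matrix(corollary_window(P))) == P + 1)   # expected: True (full Spark)
```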

Generalizations of operator sampling to higher dimensions

The operator representations (5), (6), and (7) hold verbatim for higher dimensional variables \(x,\xi,t,\nu \in \mathbb{R}^{d}\). In this section, we address the identifiability of

$$\displaystyle\begin{array}{rcl} OPW(S)& =& \{H \in \mathcal{L}(L^{2}(\mathbb{R}^{d}),L^{2}(\mathbb{R}^{d})): \mathop{\mathrm{supp}}\nolimits \mathcal{F}_{ s}\sigma _{H} \subseteq S,\,\|\sigma _{H}\|_{L^{2}} < \infty \} {}\\ \end{array}$$

where \(S \subseteq \mathbb{R}^{2d}\).

Looking at the components of the multidimensional variables separately, Theorem 1 easily generalizes as follows.

Theorem 6.

For \(H \in OPW\big(\prod _{\ell=1}^{d}[0,T_{\ell}]\times \prod _{\ell=1}^{d}[-\varOmega _{\ell}/2,\varOmega _{\ell}/2]\big)\) with \(T_{\ell}\varOmega _{\ell}\leq 1\), ℓ = 1,…,d, we have

$$\displaystyle\begin{array}{rcl} \|H\sum _{k_{1}\in \mathbb{Z}}\ldots \sum _{k_{d}\in \mathbb{Z}}\delta _{(k_{1}T_{1},\ldots,k_{d}T_{d})}\|_{L^{2}(\mathbb{R}^{d})} = T_{1}\ldots T_{d}\|\sigma _{H}\|_{L^{2}},& & {}\\ \end{array}$$

and H can be reconstructed by means of

$$\displaystyle\begin{array}{rcl} \kappa _{H}(x + t,x)& =& \chi _{\prod _{\ell=1}^{d}[0,T_{\ell}]}(t)\sum _{n_{1}\in \mathbb{Z}}\ldots \sum _{n_{d}\in \mathbb{Z}} {}\\ & & \quad \big(H\sum _{k_{1}\in \mathbb{Z}}\ldots \sum _{k_{d}\in \mathbb{Z}}\delta _{(k_{1}T_{1},\ldots,k_{d}T_{d})}\big)(t + (n_{1}T_{1},\ldots,n_{d}T_{d})) {}\\ & & \quad \frac{\sin (\pi T_{1}(x_{1} - n_{1}))} {\pi T_{1}(x_{1} - n_{1})} \ldots \frac{\sin (\pi T_{d}(x_{d} - n_{d}))} {\pi T_{d}(x_{d} - n_{d})} {}\\ \end{array}$$

with convergence in the \(L^{2}\) norm.

In the following, we address the situation where S is not contained in a set \(\prod _{\ell=1}^{d}[0,T_{\ell}]\times \prod _{\ell=1}^{d}[-\varOmega _{\ell}/2,\varOmega _{\ell}/2]\) with \(T_{\ell}\varOmega _{\ell}\leq 1\), ℓ = 1,…,d. For example, \(S = [0,1] \times [0,2] \times [0, \frac{1} {4}] \times [0,1] \subseteq \mathbb{R}^{4}\) of volume \(\frac{1} {2}\) is not covered by Theorem 6.

To give a higher dimensional variant of Theorem 2, we shall denote pointwise products of finite and infinite length vectors k and T by \(k \star T\), that is, \(k \star T = (k_{1}T_{1},\ldots,k_{d}T_{d})\) for \(k,T \in \mathbb{C}^{d}\). Similarly, \(kT = (k_{1}T_{1},\ldots,k_{d}T_{d})\).

Theorem 7.

If \(S \subseteq (0,\infty )^{d}\times \mathbb{R}^{d}\) is compact with |S| < 1, then OPW(S) is identifiable. Specifically, there exist T 1 ,…,T d > 0 and pairwise relatively prime natural numbers P 1 ,…,P d such that

$$\displaystyle{S \subseteq \prod _{\ell=1}^{d}[0,P_{\ell}T_{\ell}]\times \prod _{\ell =1}^{d}[-1/(2T_{\ell}),1/(2T_{\ell})],}$$

and a sequence \(c = (c_{n}) \in \ell^{\infty }(\mathbb{Z}^{d})\) which is \(P_{\ell}\)-periodic in the ℓ-th component \(n_{\ell}\) such that \(g =\sum _{n\in \mathbb{Z}^{d}}c_{n}\,\delta _{n\star T}\) identifies OPW(S). In fact, for such g there exists for each \(j \in J =\prod _{ \ell=1}^{d}\{0,1,\ldots,P_{\ell}-1\}\) a sequence \(b_{j} = (b_{j,k})\) which is \(P_{\ell}\)-periodic in \(k_{\ell}\) and 2d-tuples \((q_{j},m_{j}) \in J \times J\) with

$$\displaystyle\begin{array}{rcl} h(x,t)& =& e^{-\pi i\sum _{\ell=1}^{d}t_{\ell}/T_{\ell} }\sum _{k\in \mathbb{Z}^{d}}\sum _{j\in J}\big[b_{j,k}\,Hg(t - (q_{j} - k) \star T) \\ & & e^{2\pi im_{j}\cdot ((x-t)/P\star T)}\,\phi ((x - t) + (q_{ j} - k) \star T)\,r(t - q_{j} \star T)\big].{}\end{array}$$
(25)

The functions \(r,\phi \in \mathcal{S}(\mathbb{R}^{d})\) are assumed to satisfy

$$\displaystyle{ \sum _{k\in \mathbb{Z}^{d}}r(t + k \star T) = 1 =\sum _{n\in \mathbb{Z}^{d}}\widehat{\phi }(\gamma +n/(P \star T)), }$$
(26)

and \(r(t)\widehat{\phi }(\gamma )\) is supported in a neighborhood of \(\prod _{\ell=1}^{d}[0,T_{\ell}]\times \prod _{\ell=1}^{d}[0,1/P_{\ell}T_{\ell}]\) . The sum in (25) converges unconditionally in L 2 and for each t uniformly in x.

This result follows from adjusting the proof of Theorem 2 to the higher dimensional setting. For example, it will employ the Zak transform

$$\displaystyle{Z_{T\star P}f(t,\nu ) =\sum _{n\in \mathbb{Z}^{d}}f(t - n \star P \star T)\,e^{2\pi i\nu \cdot (n\star P\star T)},}$$

where P = (P 1, , P d ). We are then led again to a system of linear equations of the form

$$\displaystyle{ \mathbf{Z}_{Hg}(t,\nu )_{p} =\sum _{q\in J}\sum _{m\in J}G(c)_{p,(q,m)}\,\boldsymbol{\eta }_{H}(t,\nu )_{(q,m)} }$$
(27)

where as before

$$\displaystyle\begin{array}{rcl} & \mathbf{Z}_{Hg}(t,\nu )_{p} = (Z_{T\star P} \circ H)g(t + p \star T,\nu )\,e^{-2\pi i\nu \cdot (p\star T)}, & {}\\ & \boldsymbol{\eta }_{H}(t,\nu )_{(q,m)} = (T_{1}P_{1}\ldots T_{d}P_{d})^{-1}\,\eta _{H}(t + q \star T,\nu +m/(T \star P))\,e^{-2\pi i\nu \cdot (q\star T)}\,e^{-2\pi iq\cdot (m/P)}, & {}\\ \end{array}$$

and where G(c) is now a multidimensional finite Gabor system matrix similar to (23).

In order to show that the spreading function for operator H can be completely determined by its response to the periodically weighted d-dimensional delta-train g, we need to show that (27) can be solved for each \((t,\nu ) \in \prod _{\ell=1}^{d}[0,T_{\ell}]\times \prod _{\ell=1}^{d}[0,1/(T_{\ell}P_{\ell})]\) if \(c \in \mathbb{C}^{P_{1}\times \ldots \times P_{d}}\) is chosen appropriately.

To see that a choice of c is possible, observe that the product group \(\mathbb{Z}_{P_{1}} \times \ldots \times \mathbb{Z}_{P_{d}}\) is isomorphic to the cyclic group \(\mathbb{Z}_{P_{1}\cdot \ldots \cdot P_{d}}\) since the P are chosen pairwise relatively prime. Theorem 4 applied to the cyclic group \(\mathbb{Z}_{P_{1}\cdot \ldots \cdot P_{d}}\) guarantees the existence of \(\widetilde{c} \in \mathbb{C}^{P_{1}\cdot \ldots \cdot P_{d}}\) so that the Gabor system matrix \(G(\widetilde{c})\) is full spark. We can now define \(c \in \mathbb{C}^{P_{1}\times \ldots \times P_{d}}\) by setting

$$\displaystyle{c_{n_{1},\ldots,n_{d}} =\widetilde{ c}_{n_{1}+n_{2}\,P_{1}+n_{3}\,P_{1}P_{2}+\ldots +n_{d}\,P_{1}\ldots P_{d-1}},\quad n = (n_{1},\ldots,n_{d}) \in J}$$

and observe that G(c) is simply a rearrangement of \(G(\widetilde{c})\), hence, G(c) is full spark.
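
The index map used to define c from \(\widetilde{c}\) is simply a mixed-radix (column-major) reshape, so the rearrangement is a one-liner in practice. The following is our own minimal sketch.

```python
import numpy as np

def lift_window(c_tilde, Ps):
    """Rearrange c~ in C^{P_1*...*P_d} into c in C^{P_1 x ... x P_d} via
    c[n_1, ..., n_d] = c~[n_1 + n_2*P_1 + n_3*P_1*P_2 + ...]."""
    c_tilde = np.asarray(c_tilde)
    assert c_tilde.size == np.prod(Ps)
    # with n_1 varying fastest this is exactly a Fortran-order reshape
    return c_tilde.reshape(Ps, order="F")

print(lift_window(np.arange(6), (2, 3)))   # [[0 2 4], [1 3 5]]
```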

Further results on operator sampling

The results discussed in this paper are presented in detail in [6, 22, 24, 38, 41] and [46]. The last listed article contains the most extensive collection of operator reconstruction formulas, including extensions to some OPW(S) with S unbounded. Moreover, some hints are given on how to use parallelograms to rectify a set S efficiently for operator sampling.

A central result in [46] is the classification of all spaces OPW(S) that are identifiable for a given \(g =\sum _{n\in \mathbb{Z}}c_{n}\delta _{nT}\) with P-periodic \((c_{n})\).

The papers [38, 42] address some functional analytic challenges in operator sampling, and [26] focuses on the question of operator identification if we are restricted to using more realizable identifiers, for example, truncated and modified versions of g, namely, \(\widetilde{g}(t) =\sum _{ n=0}^{N}c_{n}\varphi (t - nT)\). The problem of recovering parametric classes of operators in OPW(S) is discussed in [2, 3].

In the following, we briefly review literature that addresses some other directions in operator sampling.

Multiple Input Multiple Output

A Multiple Input Multiple Output (MIMO) channel H with N transmitters and M receivers can be modeled by an N × M matrix whose entries are time-varying channel operators \(H_{mn} \in OPW(S_{mn})\). For simplicity, we write H ∈ OPW(S). Assuming that the operators \(H_{mn}\) are independent, a sufficient criterion for identifiability is given by \(\sum _{n=1}^{N}\vert S_{mn}\vert \leq 1\) for m = 1,…,M. Conversely, if for a single m, \(\sum _{n=1}^{N}\vert S_{mn}\vert > 1\), then OPW(S) is not identifiable by any collection \(s_{1},\ldots,s_{N}\) of input signals [36, 43].

A somewhat dual setup was considered in [18], namely a Single Input Single Output (SISO) channel with S being large, say S = [0, M] × [−N∕2, N∕2] with N, M ≥ 2. As illustrated above, OPW([0, M] × [−N∕2, N∕2]) is not identifiable, but if we are allowed to use MN (infinite duration) input signals \(g_{1},\ldots,g_{MN}\), then H ∈ OPW([0, M] × [−N∕2, N∕2]) can be recovered from the MN outputs \(Hg_{1},\ldots,Hg_{MN}\).

Irregular Sampling of Operators

The identifier \(g =\sum _{n\in \mathbb{Z}}c_{n}\delta _{nT}\) is supported on the lattice Tℤ in ℝ. In general, for stable operator identification, choosing a discretely supported identifier is reasonable; indeed, in [26] it is shown that identification of OPW(S) in full requires the use of identifiers that neither decay in time nor in frequency. (The characteristics of H on a fixed transmission band and during a fixed transmission interval can indeed be recovered using Schwartz class identifiers [26].)

In irregular operator sampling, we consider identifiers of the form \(g =\sum _{n\in \mathbb{Z}}c_{n}\delta _{\lambda _{n}}\) for nodes \(\lambda _{n}\) that are not necessarily contained in a lattice. If such g identifies OPW(S), then we refer to \(\mathop{\mathrm{supp}}\nolimits g =\{\lambda _{n}\}\) as a sampling set for OPW(S), and similarly, the sampling rate of g is defined to be

$$\displaystyle{D(g) = D(\mathop{\mathrm{supp}}\nolimits g) = D(\varLambda ) =\lim _{r\rightarrow \infty }\frac{n^{-}(r)} {r} }$$

where

$$\displaystyle{n^{-}(r) =\inf _{ x\in \mathbb{R}}\#\{\varLambda \cap [x,x + r]\}}$$

assuming that the limit exists [18, 46].

To illustrate a striking difference between irregular sampling of functions and operators, note that ℤ is a sampling set for \(OPW([0,1] \times [-\frac{1} {2}, \frac{1} {2}])\) as well as for the Paley–Wiener space \(PW([-\frac{1} {2}, \frac{1} {2}])\), but the distribution \(g = c_{0}\delta _{\lambda _{0}} +\sum _{n\in \mathbb{Z}\setminus \{0\}}c_{n}\delta _{n}\) does not identify \(OPW([0,1] \times [-\frac{1} {2}, \frac{1} {2}])\), regardless of our choice of \(c_{n}\) and \(\lambda _{0}\neq 0\). This shows that, for example, Kadec's \(\frac{1} {4}\) theorem does not generalize to the operator setting [18].

In [46] we give a necessary condition on the (operator) sampling rate, namely \(D(g) = D(\varLambda ) \geq B(S)\), in terms of the bandwidth B(S) of OPW(S), which is defined as

$$\displaystyle\begin{array}{rcl} B(S) =\sup _{t\in \mathbb{R}}\vert \mathop{\mathrm{supp}}\nolimits \eta (t,\nu )\vert =\Big\|\int _{\mathbb{R}}\chi _{S}(\cdot,\nu )\,d\nu \Big\|_{\infty }.& &{}\end{array}$$
(28)

Here, \(\chi _{S}\) denotes the characteristic function of S. This quantity can be interpreted as the maximum vertical extent of S and takes into account gaps in S. Moreover, in [46] we discuss the goal of constructing \(\{\lambda _{n}\}\) with small density and/or large gaps in order to reserve time-slots for information transmission. Results in this direction can be interpreted as giving bounds on the capacity of a time-variant channel in OPW(S) in terms of | S | [46].
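
For a set S given on a grid, the quantity (28) is straightforward to approximate numerically: integrate the indicator over ν for each t and take the maximum. The sketch below is our own illustration; note that, as stated above, gaps in the ν-direction reduce B(S) even if they do not reduce the vertical diameter.

```python
import numpy as np

def bandwidth(chi_S, dnu):
    """Approximate B(S) = || int chi_S(., nu) d nu ||_inf from a boolean grid
    chi_S[i, j] ~ chi_S(t_i, nu_j) with nu-spacing dnu."""
    return float((chi_S.sum(axis=1) * dnu).max())

# example: S = [0,1] x ([0,1/4] u [1/2,3/4]); the slice measure is 1/2,
# even though each slice spans a vertical interval of length 3/4
t = np.linspace(0, 2, 200, endpoint=False)
nu = np.linspace(0, 1, 400, endpoint=False)
T, NU = np.meshgrid(t, nu, indexing="ij")
S = (T < 1) & ((NU < 0.25) | ((NU >= 0.5) & (NU < 0.75)))
print(bandwidth(S, nu[1] - nu[0]))   # approximately 0.5
```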

Finally, we give in [46] an example of an operator class OPW(S) that cannot be identified by any identifier of the form \(g =\sum _{n\in \mathbb{Z}}c_{n}\delta _{nT}\) with T > 0 and periodic \((c_{n})\), but requires coefficients that form a bounded, non-periodic sequence. In this case, S is a parallelogram and B(S) = D(g) (see Figure 5).

Fig. 5

The operator class OPW(S) with \(S = \left(\begin{smallmatrix}2 & 2\\ \sqrt{2} & \sqrt{2} + 1/2\end{smallmatrix}\right)[0,1]^{2}\), whose area equals 1 and bandwidth equals 1∕2, is identifiable by a (non-periodically) weighted delta train with sampling density 1∕2. It is not identifiable using a periodically weighted delta train.

Sampling of OPW(S) with unknown S

In some applications, it is justified to assume that the set S has small area, but its shape and location are unknown. If further S satisfies some basic geometric assumptions that guarantee that S is contained in [0, TP] × [−1∕2T, 1∕2T] and only meets few rectangles of the rectification grid [kT, (k + 1)T] × [q∕TP, (q + 1)∕TP], then recovery of S and, hence, of an operator in OPW(S) is possible [15, 46].

The independently obtained results in [15, 46] employ the same identifiers \(g =\sum _{n\in \mathbb{Z}}c_{n}\delta _{\lambda _{n}}\) as introduced above. Operator identification is therefore again reduced to solving (18), that is, the system of P linear equations

$$\displaystyle{ \mathbf{Z}(t,\nu ) = G(c)\boldsymbol{\eta }(t,\nu ) }$$
(29)

for the vector \(\boldsymbol{\eta }(t,\nu ) \in \mathbb{C}^{P^{2} }\) for (t, ν) ∈ [0, T] × [−1∕2TP, 1∕2TP]. While the zero components of \(\boldsymbol{\eta }(t,\nu )\) are not known, the vector is known to be very sparse. Hence, for fixed (t, ν), we can use the fact that G(c) is full spark and recover \(\boldsymbol{\eta }(t,\nu )\) if it has at most P∕2 nonzero entries. Indeed, assume \(\boldsymbol{\eta }(t,\nu )\) and \(\widetilde{\boldsymbol{\eta }}(t,\nu )\) solve (29) and both have at most P∕2 nonzero entries. Then \(\boldsymbol{\eta }(t,\nu ) -\widetilde{\boldsymbol{\eta }} (t,\nu )\) has at most P nonzero entries and the fact that G(c) is full spark indicates that \(G(c)(\boldsymbol{\eta }(t,\nu ) -\widetilde{\boldsymbol{\eta }} (t,\nu )) = 0\) implies \(\boldsymbol{\eta }(t,\nu ) -\widetilde{\boldsymbol{\eta }} (t,\nu ) = 0\).
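The uniqueness argument above can be turned into a very naive recovery procedure for small P: search over candidate supports and accept the first one that reproduces the measurement exactly. The sketch below is our own illustration, reusing the hypothetical gabor_matrix helper from the section on finite Gabor frames; it is not the algorithm of [15, 46].

```python
import itertools
import numpy as np
# reuses gabor_matrix from the earlier sketch

def sparse_solve(G, Z, s, tol=1e-8):
    """Recover eta with at most s nonzero entries from Z = G @ eta by exhaustive
    search over supports; if G has full spark and 2*s <= P, the fit is unique."""
    P, N = G.shape
    if np.linalg.norm(Z) < tol:
        return np.zeros(N, dtype=complex)
    for k in range(1, s + 1):
        for supp in itertools.combinations(range(N), k):
            cols = list(supp)
            x, *_ = np.linalg.lstsq(G[:, cols], Z, rcond=None)
            if np.linalg.norm(G[:, cols] @ x - Z) < tol:
                eta = np.zeros(N, dtype=complex)
                eta[cols] = x
                return eta
    return None

P = 4
rng = np.random.default_rng(3)
c = np.exp(2j * np.pi * rng.random(P))          # generic unimodular window
G = gabor_matrix(c)
eta = np.zeros(P * P, dtype=complex)
eta[[2, 9]] = [1.0, -0.5j]                      # 2-sparse, 2 <= P/2
print(np.allclose(sparse_solve(G, G @ eta, s=P // 2), eta))   # expected: True
```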

Clearly, under the geometric assumptions alluded to above, the criterion that at most P∕2 rectangles in the grid are met can be translated into the requirement that the unknown set S has measure at most 1∕2.

In [15], this area 1/2 criterion is improved by showing that H can be identified whenever at most P − 1 rectangles in the rectification grid are met by S. This result is achieved by using a joint sparsity argument, based on the assumption that for all (t, ν), the same cells are active.

Alternatively, the “area 1/2” result can be strengthened by not assuming that for all (t, ν), the same cells are active. This requires solving (29), for \(\boldsymbol{\eta }(t,\nu )\) sparse, for each considered (t, ν) independently, see Figure 6 and [46].

Fig. 6

For S the union of the colored sets, OPW(S) is identifiable even though 7 > 3 boxes are active, implying that S cannot be rectified with P = 3 and T = 1 (see Section “Identification of operator Paley-Wiener spaces by periodically weighted delta-trains”). Recovering η from Hg requires solving three systems of linear equations, one to recover η on the yellow support set, one to recover η on the red support set, and one to recover η on the blue support set. The reconstruction formula (21) does not apply for this set S.

It must be added, though, that solving (29) for \(\boldsymbol{\eta }(t,\nu )\) being P∕2 sparse is computationally not feasible for moderately sized P, for example for P > 15. If we further reduce the number of active boxes, then compressive sensing algorithms such as Basis Pursuit and Orthogonal Matching Pursuit become available, as is discussed in the following section.

Finite dimensional operator identification and compressive sensing

Operator sampling in the finite dimensional setting translates into the following matrix probing problem [5, 7, 49]. For a class of matrices \(\boldsymbol{\mathcal{M}}\subseteq \mathbb{C}^{P\times P}\), find \(c \in \mathbb{C}^{P}\) so that we can recover \(M \in \boldsymbol{\mathcal{M}}\) from Mc (Figure 7).

Fig. 7

The matrix probing problem: find c so that the map \(\boldsymbol{\mathcal{M}}\longrightarrow \mathbb{C}^{P}\), \(M\mapsto Mc\), is injective and therefore invertible.

The classes of operators considered here are of the form \(M_{\boldsymbol{\eta }} =\sum _{\lambda }\boldsymbol{\eta }_{\lambda }B_{\lambda }\) with \(B_{\lambda } = B_{p,q} = \mathcal{T}^{p}\mathcal{M}^{q}\), and the matrix identification problem is reduced to solving

$$\displaystyle{ \boldsymbol{Z} = M_{\boldsymbol{\eta }}c =\sum _{ p,q=0}^{P-1}\boldsymbol{\eta }_{ p,q}\big(\ \mathcal{T}^{p}\mathcal{M}^{q}c\big) = G(c)\boldsymbol{\eta }, }$$
(30)

where c is chosen appropriately; this is just (29) with the dependence on (t, ν) removed.

If \(\boldsymbol{\eta }\) is assumed to be k-sparse (Figure 8), we arrive at the classical compressive sensing problem with measurement matrix \(G(c) \in \mathbb{C}^{P\times P^{2} }\), which depends on \(c = (c_{0},c_{1},\ldots,c_{P-1})\). To achieve recovery guarantees for Basis Pursuit and Orthogonal Matching Pursuit, averaging arguments have to be used that yield results on the expected qualities of G(c). This problem was discussed in [40, 49, 50] as well as, in slightly different terms, in [1, 17] (Figure 9). The strongest results were achieved in [25] by estimating Restricted Isometry Constants for c being a Steinhaus sequence. These results show that with high probability, G(c) has the property that Basis Pursuit recovers \(\boldsymbol{\eta }\) from \(G(c)\boldsymbol{\eta }\) for every k-sparse \(\boldsymbol{\eta }\) as long as \(k \leq C\,P/\log ^{2}P\) for some universal constant C.
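
As a small numerical illustration of this setup (our own sketch, not the analysis of the cited works), the following code draws a random unimodular window, forms the time-frequency structured measurement matrix G(c) with normalized columns, and recovers a k-sparse \(\boldsymbol{\eta }\) with a textbook Orthogonal Matching Pursuit.

```python
import numpy as np
# reuses gabor_matrix from the earlier sketch

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily select k columns of A to explain y."""
    support, residual = [], y.astype(complex)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.conj().T @ residual)))   # most correlated column
        if j not in support:
            support.append(j)
        x, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x
    out = np.zeros(A.shape[1], dtype=complex)
    out[support] = x
    return out

P, k = 31, 3
rng = np.random.default_rng(7)
c = np.exp(2j * np.pi * rng.random(P))      # Steinhaus-type window
G = gabor_matrix(c) / np.sqrt(P)            # unit-norm columns
eta = np.zeros(P * P, dtype=complex)
eta[rng.choice(P * P, size=k, replace=False)] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
err = np.linalg.norm(omp(G, G @ eta, k) - eta) / np.linalg.norm(eta)
print(err)   # typically of the order of machine precision for such small k
```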

Fig. 8

Time-frequency structured measurement matrix G(c) with c randomly chosen.

Fig. 9

Support sets of autocorrelation functions: the general case, the WSSUS case, and the tensor case.

Stochastic operators and channel estimation

It is common that models of wireless channels and radar environments take the stochastic nature of the medium into account. In such models, the spreading function η(t, ν) (and therefore the operator’s kernel and Kohn–Nirenberg symbol) are random processes, and the operator is split into the sum of its deterministic portion, representing the mean behavior of the channel, and its zero-mean stochastic portion that represents the noise and the environment.

The detailed analysis of the stochastic case was carried out in [47, 48]. One focus of these works is determining the second-order statistics of the (zero-mean) stochastic process η(τ, ν), that is, its so-called covariance function \(R(\tau,\nu,\tau ',\nu ') = \mathbb{E}\{\eta (\tau,\nu )\,\overline{\eta (\tau ',\nu ')}\}\). In [47, 48], it was shown that a necessary but not sufficient condition for the identifiability of R(τ, ν, τ′, ν′) from the output covariance \(A(t,t') = \mathbb{E}\{Hg(t)\,\overline{Hg(t')}\}\) is that R(τ, ν, τ′, ν′) is supported on a bounded set of 4-dimensional volume less than or equal to one. Unfortunately, for some sets \(S \subseteq \mathbb{R}^{4}\) of arbitrarily small measure, the respective stochastic operator Paley–Wiener space StOPW(S) of operators with \(R_{\eta }\) supported on S is not identifiable; this is a striking difference to the deterministic setup, where the geometry of S does not play a role at all.

In [67, 68] the special case of wide-sense stationary operators with uncorrelated scattering, or WSSUS operators, is considered. These operators are characterized by the property that

$$\displaystyle{ R_{\eta }(t,\nu,t',\nu ') = C_{\eta }(t,\nu )\,\delta (t - t')\,\delta (\nu -\nu '). }$$

The function \(C_{\eta }(t,\nu )\) is then called the scattering function of H. Our results on the identifiability of stochastic operator classes allowed for the construction of two estimators for scattering functions [67, 68]. The estimator given in [67] is applicable whenever the scattering function of H has bounded support. Note that the autocorrelation of a WSSUS operator is supported on a two-dimensional plane in \(\mathbb{R}^{4}\), which therefore has 4D volume 0, a fact that allows us to lift commonly assumed restrictions on the size of the 2D area of the support of the scattering function.

For details, formal definitions of identifiability and detailed statements of results we refer to the papers [47, 48, 67, 68].