Abstract
This paper reviews some results on the identifiability of classes of operators whose Kohn-Nirenberg symbols are band-limited (called band-limited operators), which we refer to as sampling of operators. We trace the motivation and history of the subject back to the original work of the third-named author in the late 1950s and early 1960s, and to the innovations in spread-spectrum communications that preceded that work. We give a brief overview of the NOMAC (Noise Modulation and Correlation) and Rake receivers, which were early implementations of spread-spectrum multi-path wireless communication systems. We examine in detail the original proof of the third-named author characterizing identifiability of channels in terms of the maximum time and Doppler spread of the channel, and do the same for the subsequent generalization of that work by Bello. The mathematical limitations inherent in the proofs of Bello and the third author are removed by using mathematical tools unavailable at the time. We survey more recent advances in sampling of operators and discuss the implications of the use of periodically weighted delta-trains as identifiers for operator classes that satisfy Bello’s criterion for identifiability, leading to new insights into the theory of finite-dimensional Gabor systems. We present novel results on operator sampling in higher dimensions, and review implications and generalizations of the results to stochastic operators, MIMO systems, and operators with unknown spreading domains.
The inversion of the traditional alphabetical ordering of authors is at the suggestion of the third author, who desires that those at the end of the alphabet get some recognition.
Access provided by Autonomous University of Puebla. Download chapter PDF
Similar content being viewed by others
Key words:
- Sampling
- Gabor frame
- Delta trains
- Kohn-Nirenberg symbol
- Spreading function
- Operator Paley-Wiener space
- Operator identification
- Operator sampling
- Channel identification
- Channel measurement
- Rake receiver
- Time-variant filters
- Band-limited operators
- Gabor matrices
- Stochastic operator
- Compressive sensing
- Matrix probing
- MIMO channel
Introduction
The problem of identification of a time-variant communication channel arose in the 1950s as the problem of secure long-range wireless communications became increasingly important due to the geopolitical situation at the time. Some of the theoretical and practical advances made then are described in this paper, and more recent advances extending the theory to more general operators, and onto a more rigorous mathematical footing, known as sampling of operators are developed here as well.
The launching point for the theory of operator sampling is the early work of the third-named author in his Master’s thesis at MIT, entitled “Sampling models for linear time-variant filters” [19], see also [22, 23], and [21] in which he reviews the identification problem for time-variant channels. The third named author as well as Bello in subsequent work [6] were attempting to understand and describe the theoretical limits of identifiability of time-variant communication channels. Section “Historical Remarks” of this paper describes in some detail their work and explores some of the important mathematical challenges they faced. In Section “Operator Sampling”, we describe the more recently developed framework of operator sampling. Results addressing the problem considered by Bello are based on insights on finite dimensional Gabor systems which are presented in Section “Linear Independence Properties of Gabor Frames”. Malikiosis’s recent result [32] allows for the generalization of those results to a higher-dimensional setting, these are stated and proven in Section “Generalizations of operator sampling to higher dimensions”. We conclude the paper in Section “Further results on operator sampling” with a short summary of the sampling of operators literature, that is, of results presented in detail elsewhere.
Historical Remarks
The Cold War Origins of the Rake System
In 1958, Price and Green published A Communication Technique for Multi-path Channels in Proc. IRE [56], in which they describe a communication system called Rake, designed to solve the multi-path problem. When a wireless transmitter does not have line-of-sight with the receiver, the transmitted signal is reflected possibly multiple times before reaching the receiver. Reflection by stationary objects such as the ground or buildings introduces random time delays to the signal, and reflection or refraction by moving objects such as clouds, the troposphere, ionosphere, or a moving vehicle produce random frequency or Doppler shifts in the signal as well. Due to scattering and absorption, the reflected signals are randomly amplitude-attenuated too. The problem is to recover the transmitted signal as accurately as possible from the superposition of time-frequency-shifted and randomly amplitude-attenuated versions of it. Since the location and velocities of the reflecting objects change with time, the effects of the unknown, time-variant channel must be estimated and compensated for.
Price and Green’s paper [56] was the disclosure in the literature of a long-distance system of wide-band or spread-spectrum communications that had been developed in response to strategic needs related to the Cold War. This fascinating story has been described in several articles by those directly involved [12, 55, 57, 58]. We present a summary of those remarks and of the Rake system below. The goal is to motivate the original work of the third-named author on which the theory of operator sampling is based.
In the years following World War II, the Soviet Union was exercising its power in Eastern Europe with a major point of contention being Berlin, which the Soviets blockaded in the late 1940s. This made secure communication with Berlin a top priority. As Paul Green describes it,
[T]he Battle of Berlin was raging, the Russians having isolated the city physically on land, so that the Berlin Airlift was resorted to, and nobody knew when all the communication links would begin to be subjected to heavy Soviet jamming. [12]
By 1950, with a shooting war in Korea about to begin, the Army Signal Corps approached researchers at MIT to develop secure, and reliable wireless communication with the opposite ends of the world. According to Green,
It is difficult today to recall the fearful excitement of those times. The Russians were thought to be 12 feet high in anything having to do with applying mathematics to communication problems (“all Russians were Komogorovs or Kotelnikovs”)....[T]here was a huge backlog of unexploited theory lying around, and people were beginning to build digital equipment with the unheard of complexity of a hundred or so vacuum tube-based bits (!). And the money flowed. [12]
The effort was called Project Lincoln (precursor to Lincoln Laboratory). The researchers were confronted by two main problems: 1) making a communications system robust to noise and deliberate jamming, and 2) enabling good signal recovery from multiple paths.
Spread Spectrum communications and NOMAC
The technique chosen to address the first problem is an application of the notion, already well understood and used by that time, that combatting distortions from noise and jamming can be achieved by spreading the signal over a wide frequency band. The idea of spreading the spectrum had been around for a long time [51, 55, 63] and can be found even in a now famous Hedy Lamarr-George Antheil patent of 1942 [33, 55], which introduced the concept later called “frequency hopping”. The system called NOMAC (Noise Modulation and Correlation) was developed in the early 1950s and used noise-like (pseudo-noise or PN) signals to achieve spectrum spreading. Detailed discussion of its history can be found in [12, 55, 64].
The huge backlog of “unexploited theory” mentioned above included the recent work of Claude Shannon on communication theory [61], of Norbert Wiener on correlation functions and least mean squares prediction and filtering [65], and recent applications of statistical decision theory to detection problems in radar and communications.
The communication problem addressed by NOMAC was to encode data represented by a string of ones and zeros into analog signals that could be electromagnetically transmitted over a noisy communication channel in a way that foiled “jamming” by enemies. The analog signals x 1(⋅ ) and x 0(⋅ ), commonly called Mark and Space, associated with the data digits 1 and 0, were chosen to be waveforms of approximate bandwidth B, and with small cross correlation. The target application was 60 wpm teletype, with 22 msec per digit (called a baud), which corresponds to a transmission rate of 1∕0. 022 sec = 45 Hz. The transmitted signals were chosen to have a bandwidth of 10 KHz, which was therefore expected to yield a “jamming suppression ratio” of 10, 000∕45 = 220, or 23 db [12, 64]. The jamming ratio is often called the “correlation gain”, because the receiver structure involves cross correlation of the received signal with each of the possible transmitted signals. If the correlation with the signal x 1(⋅ ) is larger than the one with the signal x 0(⋅ ), then it is decided that the transmitted signal corresponded to the digit 1. This scheme can be shown to be optimum in the sense of minimum probability of error provided that the transmitted signals are not distorted by the communications channel and that the receiver noise is white Gaussian noise (see, for example, [16]). The protection against jamming is because unless the jammer has good knowledge of the noise-like transmitted signals, any jamming signals would just appear as additional noise at the output of the crosscorrelations.
More details on the nontrivial ideas required for building a practical system can be found in the references. We may mention that the key ideas arose from three classified MIT dissertations by Basore [4], Pankowski [34], and Green [10], in fact, documents on NOMAC remained classified until 1961 [12].
A transcontinental experiment was run on a NOMAC system, but was found to have very poor performance because of the presence of multiple paths; the signals arriving at the receiver by these different paths sometimes interfere destructively. This is the phenomenon of “fading”, which causes self-jamming of the system. Some improvement was achieved by adding additional circuitry and the receiver to separately identify and track the two strongest signals and combine them after phase correction; this use of time and space diversity enabled a correlation gain of 17 db, 6 db short of the expected performance. It was determined that this loss was because of the neglected weaker paths, of which there could be as many as 20 or 30. So attention turned to a system that would allow the use of all the different paths.
The RAKE system
One conceptual basis for this new system was provided by the doctoral thesis of Robert Price [52], the main results of which were published in 1956 [53]. In a channel with severe multi-path the signal at the receiver is composed of large number of signals of different amplitudes and phases and so Price made the assumption that the received “signal” was a Gaussian random process. He studied the problem of choosing between the hypothesis
where the random time variant linear communication channel A is such that the {Ax i (⋅ )} are Gaussian processes. In this case, the earlier cross correlation detection scheme makes no sense, because the “signal” arriving at the receiver is not deterministic but is a sample function of a random process, which is not available to the receiver because it is corrupted by the additive noise. Price worked out the optimum detection scheme and then ingeniously interpreted the mathematical formulas to conclude that the new receiver forms least mean-square estimates of the {Ax i (⋅ )} and then crosscorrelates the w(⋅ ) against these estimates. In practice of course, one does not have enough statistical information to form these estimates and therefore more heuristic estimates are used and this was done in the actual system that was built. The main heuristic, from Wiener’s least mean-square smoothing filter solution and earlier insights, is that one should give greater weight to paths with higher signal-to-noise ratio.
So Price and Green devised a new receiver structure comprised of a delay line of length 3 ms intervals (the maximum expected time spread in their channel), with 30 taps spaced every 1∕10 Khz, or 100 μs. This would enable the capture of all the multi-path signals in the channel. Then the tap gains were made proportional to the strength of the signal received at that tap. Since a Mark/Space decision was only needed every 22 ms (for the transmission rate of 60 wpm), and since the fading rate of the channel was slow enough that the channel characteristics remain constant over even longer than 22 ms, tap gains could be averaged over several 3 ms intervals. The new system was called “Rake”, because the delay line structure resembled that in a typical garden rake!
Trials showed that this scheme worked well enough to recover the 6 db loss experienced by the NOMAC system. The system was put into production and was successfully used for jam-proof communications between Washington DC and Berlin during the “Berlin crisis” in the early 60s.
HF communications is no longer very significant, but the Rake receiver has found application in a variety of problems such as sonar, the detection of underground nuclear explosions, and planetary radar astronomy (pioneered by Price and Green, [11, 54]) and currently it is much used in mobile wireless communications. It is interesting to note that the eight racks of equipment needed to build the Rake system in the 1960s is now captured in a small integrated circuit chip in a smart phone!
However the fact that the Rake system did not perform satisfactorily when the fading rates of the communication channel were not very slow led MIT professor John Wozencraft, (who had been part of the Rake project team at Lincoln Lab) to suggest in 1957 (even before the open 1958 publication of the Rake system) to his new graduate student Thomas Kailath a fundamental study of linear time-variant communication channels and their identifiability for his Masters thesis. While time-variant linear systems had begun to be studied at least as early as 1950 (notably by Zadeh [66]), in communication systems there are certain additional constraints, notably limits on the bandwidths of the input signal and the duration of the channel memory. So a more detailed study was deemed to be worthwhile.
Kailath’s Time-Variant Channel Identification Condition
In the paper [19], the author considers the problem of measuring a channel whose characteristics vary rapidly with time. He considers the dependence of any theoretical channel estimation scheme on how rapidly the channel characteristics change and concludes that there are theoretical limits on the ability to identify a rapidly changing channel. He models the channel A as a linear time-variant filter and defines
\(A(\lambda,t) =\) response of A, measured at time t to a unit impulse input at time \(t-\lambda\).
\(A(\lambda,t)\) is one form of the time-variant impulse response of the linear channel that emphasizes the role of the “age” variable \(\lambda\). The channel response to an input signal x(⋅ ) is
An impulse response \(A(\lambda,t) = A(\lambda )\) represents a time-invariant filter. Further, the author states
Therefore the rate of variation of \(A(\lambda,t)\) with t, for fixed \(\lambda\), is a measure of the rate of variation of the filter. It is convenient to measure this variation in the frequency domain by defining a function \(\mathcal{A}\)
$$\displaystyle{\mathcal{A}(\lambda,f) =\int _{ -\infty }^{\infty }A(\lambda,t)e^{-2\pi ift}dt\quad }$$
Then he defines
While symmetric support is assumed in the paper, this definition makes clear that non-rectangular regions of support are already in view. Additionally, he defines the memory as the maximum time-delay spread in response to an impulse of the channel as
In short, the assumption in the continuation of the paper is that
where W = B∕2. The function \(\mathcal{A}(\lambda,f)\) is often called the spreading function of the channel. He then asks under what assumptions on L and B = 2W can such a channel be measured? In the context of the Rake system, this translates to the question of whether there are limits on the rate of variation of the filter that can assure that the measurement filter can be presumed to be effective.
The author’s assertion is that as long as BL ≤ 1, then a “simple measurement scheme” is sufficient.
We have assumed that the bandwidth of any “tap function”, \(A_{\lambda }(\cdot )\,[= A(\lambda,\cdot )]\), is limited to a frequency region of width B, say a low-pass region (−W, W) for which B = 2W. Such band-limited taps are determined according to the Sampling theorem, by their values at the instants i∕2W, i = 0, ±1, ±2, ….
If the memory, L, of the filter, \(A(\lambda,t)\) is less than 1∕2W these values are easily determined: we put in unit impulses to \(A(\lambda,t)\) at instants 0, 1∕2W, 2∕2W, …, T, and read off from the responses the desired values of the impulse response \(A(\lambda,t)\). [...] If L ≤ 1∕2W, the responses to the different input impulses do not interfere with one another and the above values can be unambiguously determined.
In other words, sufficiently dense samples of the tap functions can be obtained by sending an impulse train \(\sum _{n}\delta _{n/2W}\) through the channel. Indeed,
Evaluating the operator response at \(t =\lambda _{0} + n_{0}/2W\), n 0 ∈ ℤ, we obtain
since L ≤ 1∕2W implies that \(A(\lambda _{0} + (n_{0} - n)/2W,\lambda _{0} + n_{0}/2W) = 0\) if n ≠ n 0. In short, for each \(\lambda\), the samples \(A(\lambda,\lambda +n/2W)\) for n ∈ ℤ can be recovered.
The described Kailath sounding procedure is depicted in Figure 1. In this visualization, we plot the kernel κ(s, t) = A(t − s, t) of the operator A, that is,
Necessity of Kailath’s Condition for Channel Identification
For the “simple measurement scheme” to work, BL ≤ 1 is sufficient but could be restrictive.
We need, therefore, to devise more sophisticated measurement schemes. However, we have not pursued this question very far because for a certain class of channels we can show that the condition
$$\displaystyle{L \leq 1/2W,\text{ i.e. },BL \leq 1}$$is necessary as well as sufficient for unambiguous measurement of \(A(\lambda,t)\). The class of channels is obtained as follows: We first assume that there is a bandwidth constraint on the possible input signals to \(A(\lambda,t)\), in that the signals are restricted to (−W i , W i ) in frequency. We can now determine a filter \(A_{W_{i}}(\lambda,t)\) that is equivalent to \(A(\lambda,t)\) over the bandwidth (−W i , W i ), and find necessary and sufficient conditions for unambiguous measurement of \(A_{W_{i}}(\lambda,t)\). If we now let \(W_{i} \rightarrow \infty \), this condition reduces to condition (1), viz: L ≤ 1∕2W. Therefore, condition (1) is valid for all filters \(A(\lambda,t)\) that may be obtained as the limit of band-limited channels. This class includes almost all filters of physical interest. The argument is worked out in detail in Ref. 6 Footnote 1 but we give a brief outline here.
The class of operators in view here can be described as limits (in some unspecified sense) of operators whose impulse response \(A(\lambda,t)\) is bandlimited to [−W i , W i ] in \(\lambda\) for each t and periodic with period T > 0 in t for each \(\lambda\). Here, T is assumed to have some value larger than the maximum time over which the channel will be operated. We could take it as the duration of the input signal to the channel.
The restriction to input signals bandlimited to (−W i , W i ) indicates that it suffices to know the values of \(A(\lambda,t)\) or \(\mathcal{A}(\lambda,f)\) for a finite set of values of \(\lambda\): \(\lambda = 0\), 1∕2W i , 2∕2W i , …, L, assuming for simplicity that L is a multiple of 1∕2W i . Therefore, we can write
where \(\mathrm{sinc}_{W_{i}}(t) =\sin (2\pi W_{i}t)/(2\pi W_{i}t)\) so that as \(W_{i} \rightarrow \infty \), \(\mathrm{sinc}_{W_{i}}(t)\) becomes more concentrated at the origin.
Also, T-periodicity in t allows us to write
so that combining gives
Based on the restriction to bandlimited input signals which are T periodic, we have obtained a representation of A which is neither compactly supported in \(\lambda\) nor bandlimited in t. However, the original restriction that
motivates the assumption that we are working with finite sums, viz.
This is how the author obtains the estimate that there are at most (2W i L + 1)(2WT + 1) degrees of freedom in any impulse response A in the given class.
For any input signal x(t) bandlimited to [−W i , W i ], the output will be bandlimited to [−W − W i , W + W i ]. Specifically,
Since \(e^{2\pi ikt/T}\,(x {\ast}\mathrm{ sinc}_{W_{i}})(t - n/2W_{i})\) is bandlimited to [−W i , W i ] + (k∕T) for k∕T ∈ [−W, W], it follows that Ax(t) is bandlimited to [−W − W i , W + W i ].
If we restrict our attention to signals x(t) time-limited to [0, T], the output signal Ax(t) will have duration T + L, and Ax(⋅ ) will be completely determined by its samples at \(\frac{n} {2(W+W_{i})} \in [0,T + L]\), from which we can identify 2(T + L)(W + W i ) + 1 degrees of freedom.
In order for identification to be possible, the number of degrees of freedom of the output signal must be at least as large as the number of degrees of freedom of the operator, i.e.
which reduces ultimately to
That is, BL needs to be strictly smaller than 1 in the approximation while BL = 1 may work in the limiting case \(W_{i} \rightarrow \infty \) (and/or \(T \rightarrow \infty \)).
This result got a lot of attention because it corresponded with experimental evidence that Rake did not function well when the condition BL < 1 was violated. It led to the designation of “underspread” and “overspread” channels for which BL was less than or greater than 1.
Some Remarks on Kailath’s Results
This simple argument is surprising, particularly in light of the fact that the author obtained a deep result in time-frequency analysis with none of the tools of modern time-frequency analysis at his disposal. He very deftly uses the extremely useful engineering “fiction” that the dimension of the space of signals essentially bandlimited to [−W, W] and time-limited to [0, T] is approximately 2WT. The then recent papers of Landau, Slepian, and Pollak [28, 29], which are mentioned explicitly in [19], provided a rigorous mathematical framework for understanding the phenomenon of essentially simultaneous band- and time-limiting. While the existence of these results lent considerable mathematical heft to the argument, they were not incorporated into a fully airtight mathematical proof of his theorem.
In the proof we have used a degrees-of-freedom argument based on the sampling theorem which assumes strictly bandlimited functions. This is an unrealistic assumption for physical processes. It is more reasonable to call a process band (or time) limited if some large fraction of its energy, say 95%, is contained within a finite frequency (or time) region. Recent work by Landau and Slepian has shown the concept of approximately 2TW degrees of freedom holds even in such cases. This leads us to believe that our proof of the necessity of the BL ≤ 1 condition is not merely a consequence of the special properties of strictly band-limited functions. It would be valuable to find an alternative method of proof.
While Kailath’s Theorem is stated for channel operators whose spreading functions are supported in a rectangle, it is clear that the later work of Bello [6] was anticipated and more general regions were in view. This is stated explicitly.
We have not discussed how the bandwidth, B is to be defined. There are several possibilities: we might take the nonzero f-region of \(\mathcal{A}(\lambda,f)\); or use a “counting” argument. We could proceed similarly for the definition of L. As a result of these several possibilities, the value 1, of the threshold in the condition BL ≤ 1 should be considered only as an order of magnitude value.
...constant and predictable variations in B and L, due for example to known Doppler shifts or time displacements, would yield large values for the absolute values of the time and frequency spreadings. However such predictable variations should be subtracted out before the BL product is computed; what appears to be important is the area covered in the time- and frequency-spreading plane rather than the absolute values of B and L. (emphasis added)
The reference to “counting” as a definition of bandwidth clearly indicates that essentially arbitrary regions of support for the operator spreading function were in view here, and that a necessity argument relying on degrees of freedom and not the shape of the spreading region was anticipated. The third-named author did not pursue the measurement problem studied in his MS thesis because he went on in his PhD dissertation to study the optimum (in the sense of minimum probability of error) detector scheme of which Rake is an intelligent engineering approximation. See [20, 21, 23].
The mathematical limitations of the necessity proof in [19] can be removed by addressing the identification problem directly as a problem on infinite-dimensional space rather than relying on finite-dimensional approximations to the channel. This approach also avoids the problem of dealing with simultaneously time and frequency-limited functions. In this way, the proof can be made completely mathematically rigorous. This approach is described in Section “Kailath’s necessity proof and operator identification”.
Bello’s time-variant Channel Identification Condition
Kailath’s Theorem was generalized by Bello in [6] along the lines anticipated in [19]. Bello’s argument follows that of [19] in its broad outlines but with some significant differences. Bello clearly anticipates some of the technical difficulties that have been solved more recently by the authors and others and which have led to the general theory of operator sampling.
Continuing with the notation of this section, Bello considers channels with spreading function \(\mathcal{A}(\lambda,f)\) supported in a rectangle [0, L] × [−W, W]. If L and W are all that is known about the channel, then Kailath’s criterion for measurability requires that 2WL ≤ 1. Bello considers channels for which 2WL may be greater than 1 but for which
and argues that this is the most appropriate criterion to assess measurability of the channel modeled by A.
In order to describe Bello’s proof we will fix parameters T ≫ L and W i ≫ W and following the assumptions earlier in this section, assume that inputs to the channel are time-limited to [0, T] and (approximately) bandlimited to [−W i , W i ]. Under this assumption, Bello considers the spreading function of the channel to be approximated by a superposition of point scatterers, viz.
Hence the response of the channel to an input x(⋅ ) is given by
Note that this is a continuous-time Gabor expansion with window function x(⋅ ) (see, e.g., [13]). By standard density results in Gabor theory, the collection of functions \(\{x(t - (n/2W_{i}))\,e^{2\pi i(k/T)(t-(n/2W_{i}))}\}\) is overcomplete as soon as 2TW i > 1. Consequently, without further discretization, the coefficients A n, k are in principle unrecoverable. Taking into consideration support constraints on \(\mathcal{A}\), we assume that the sums are finite, viz.
Hence determining the channel characteristics amounts to finding A n, k for those pairs (n, k). It should be noted that for a given spreading function \(\mathcal{A}(\lambda,f)\) for which \(\mathop{\mathrm{supp}}\nolimits \mathcal{A}\) is a Lebesgue measurable set, given ε > 0, there exist T and W i sufficiently large that the number of such (n, k) is no more than 2S A W i T(1 +ε). On the other hand, for a given T and W i , there exist spreading functions \(\mathcal{A}(\lambda,f)\) with arbitrarily small non-convex S A for which the number of nonzero coefficients A n, k can be large. For example, given T and W i , S A could consist of rectangles centered on the points (n∕(2W i ), k∕T) with arbitrarily small total area.
By sampling, (1) reduces to a discrete, bi-infinite linear system, viz.
for p ∈ ℤ. Note that (2) is the expansion of a vector in a discrete Gabor system on ℓ 2(ℤ), a fact not mentioned by Bello, and of which he was apparently unaware. Specifically, defining the translation operator \(\mathcal{T}\) and the modulation operator \(\mathcal{M}\) on ℓ 2 by
(2) can be rewritten as
Since there are only finitely many nonzero unknowns in this system, Bello’s analysis proceeds by looking at finite sections of (4) and counting degrees of freedom.
Necessity. Following the lines of the necessity argument in [19], we note that there are at least 2(T + L)(W + W i ) degrees of freedom in the output vector Ax(t), that is, at least that many independent samples of the form Ax(p∕2W i ), and as observed above, no more than 2S A W i T(1 +ε) nonzero unknowns A n, k . Therefore, in order for the A n, k to be determined in principle, it must be true that
or
Letting \(T,\,W_{i} \rightarrow \infty \) and ε → 0, we arrive at S A ≤ 1.
Sufficiency. Considering a section of the system (4) based on the assumption that \(\mathop{\mathrm{supp}}\nolimits \mathcal{A}\subseteq [0,L] \times [-W,W]\), the system has approximately 2W i (T + L) equations in (2W i T)(2WL) unknowns. Since L and 2W are simply the dimensions of a rectangle that encloses the support of \(\mathcal{A}\), 2WL may be quite large and independent of S A . Hence the system will not in general be solvable. However by assuming that S A < 1, only approximately S A (2W i T) of the A n, k do not vanish and the system reduces to one in which the number of equations is roughly equal to the number of unknowns. In this case it would be possible to solve (4) as long as the collection of appropriately truncated vectors \(\{\mathcal{T}^{n}\mathcal{M}^{k}x: A_{n,k}\neq 0\}\) forms a linearly independent set for some vector x.
In his paper, Bello was dealing with independence properties of discrete Gabor systems apparently without realizing it, or at least without stating it explicitly. Indeed, he argues in several different ways that a vector x that produces a linearly independent set should exist, and intriguingly suggests that a vector consisting of ± 1 should exist with the property that the Grammian of the Gabor matrix corresponding to the section of (4) being considered is diagonally dominant.
The setup chosen below to prove Bello’s assertion leads to the consideration of a matrix whose columns stem from a Gabor system on a finite-dimensional space, not on a sequence space.
Operator Sampling
The first key contribution of operator sampling is the use of frame theory and time-frequency analysis to remove assumptions of simultaneous band- and time-limiting, and also to deal with the infinite number of degrees of freedom in a functional analytic setting (Section “Operator classes and operator identification”). A second key insight is the development of a “simple measurement scheme” of the type used by the third-named author but that allows for the difficulties identified by Bello to be resolved. This insight is the use of periodically weighted delta-trains as measurement functions for a channel. Such measurement functions have three distinct advantages.
First, they allow for the channel model to be essentially arbitrary and clarify the reduction of the operator identification problem to a finite-dimensional setting without imposing a finite dimensional model that approximates the channel. Second, it combines the naturalness of the simple measurement scheme described earlier with the flexibility of Bello’s idea for measuring channels with arbitrary spreading support. Third, it establishes a connection between identification of channels and finite-dimensional Gabor systems and allows us to determine windowing vectors with appropriate independence properties.
In Section “Operator classes and operator identification”, we introduce some operator-theoretic descriptions of some of the operator classes that we are able to identify, and discuss briefly different ways of representing such operators. Such a discussion is beneficial in several ways. First, it contains a precise definition of identifiability, which comes into play when considering the generalization of the necessity condition for so-called overspread channels (Section “Kailath’s necessity proof and operator identification”). Second, we can extend the necessity condition to a very large class of inputs. In other words, we can assert that in a very general sense, no input can identify an overspread channel. Third, it allows us to include both convolution operators and multiplication operators (for which the spreading functions are distributions) in the operator sampling theory. The identification of multiplication operators via operator sampling reduces to the classical sampling formula, thereby showing that classical sampling is a special case of operator sampling. In Section “Kailath’s necessity proof and operator identification” we present a natural formalization of the original necessity proof of [19] (Section “Necessity of Kailath’s Condition for Channel Identification”) to the infinite-dimensional setting, which involves an interpretation of the notion of an under-determined system to that setting. Finally, in Section “Identification of operator Paley-Wiener spaces by periodically weighted delta-trains” we present the scheme given first in [41, 45] for the identification of operator classes using periodically weighted delta trains and techniques from modern time-frequency analysis.
Operator classes and operator identification
We formally consider an arbitrary operator as a pseudodifferential operator represented by
where \(\sigma _{H}(x,\xi ) \in L^{2}(\mathbb{R}^{2})\) is the Kohn-Nirenberg (KN) symbol of H. The spreading function η H (t, ν) of the operator H is the symplectic Fourier transform of the KN symbol, viz.
and we have the representation
where \(\mathcal{T}_{t}f(x) = f(x - t)\) is the time-shift operator and \(\mathcal{M}_{\nu }f(x) = e^{2\pi i\nu x}\,f(x)\) is the frequency-shift operator.
This is identical to the representation given in [19] where \(\eta _{H}(t,\nu ) = \mathcal{A}(\nu,t)\), see Section “Kailath’s Time-Variant Channel Identification Condition”.
To see more clearly where the spreading function arises in the context of communication theory, we can define the impulse response of the channel modeled by H, denoted h H (x, t), by
Note that if h H were independent of x, then H would be a convolution operator and hence a model for a time-invariant channel. In fact, with κ H (x, t) being the kernel of the operator H,
where
With this interpretation, the maximum support of η H (t, ν) in the first variable corresponds to the maximum spread of a delta impulse sent through the channel and the maximum support of η H (t, ν) in the second variable corresponds to the maximum spread of a pure frequency sent through the channel.
Since we are interested in operators whose spreading functions have small support, it is natural to define the following operator classes, called operator Paley-Wiener spaces (see [38]).
Definition 1.
For \(S \subseteq \mathbb{R}^{2}\), we define the operator Paley-Wiener spaces OPW(S) by
Remark 1.
In [38, 42], the spaces OPW p, q(S), \(1 \leq p,\,q < \infty \), were considered, where L 2-membership of \(\sigma _{H}\) is replaced by
with the usual adjustments made when either \(p = \infty \) or \(q = \infty \). OPW p, q(S) is a Banach space with respect to the norm \(\|H\|_{OPW^{p,q}} =\|\sigma _{H}\|_{L^{p,q}}\). Note that if S is bounded, then \(OPW^{\infty,\infty }(S)\) consists of all bounded operators whose spreading function is supported on S. In fact, the operator norm is then equivalent to the \(OPW^{\infty,\infty }(S)\) norm, where the constants depend on S [26].
The general definition is beneficial since it also allows the inclusion of convolution operators with kernels whose Fourier transforms lie in L q(ℝ) (\(OPW^{\infty,q}(\mathbb{R})\)) and multiplication operators whose multiplier is in L p(ℝ) (\(OPW^{p,\infty }(\mathbb{R})\)).
The goal of operator identification is to find an input signal g such that each operator H in a given class is completely and stably determined by Hg. In other words, we ask that the operator H ↦ Hg be continuous and bounded below on its domain. In our setting, this translates to the existence of c 1, c 2 > 0 such that
This definition of identifiability of operators originated in [24]. Note that (13) implies that the mapping H ↦ Hg is injective, that is, that Hg = 0 implies that H ≡ 0, but is not equivalent to it. The inequality (13) adds to injectivity the assertion that H is also stably determined by Hg in the sense that a small change in the output Hg would correspond to a small change in the operator H. Such stability is also necessary for the existence of an algorithm that will reliably recover H from Hg. In this scheme, g is referred to as an identifier for the operator class OPW(S) and if (13) holds, we say that operator identification is possible.
In trying to find an explicit expression for an identifier, we use as a starting point the “simple measurement scheme” of [19], in which g is a delta train, viz. \(g =\sum _{n}\delta _{nT}\) for some T > 0. In the framework of operator identification the channel measurement criterion in [19] takes the following form [24, 38, 41].
Theorem 1.
For \(H \in OPW\big([0,T]\times [-\varOmega /2,\varOmega /2]\big)\) with TΩ≤1, we have
and H can be reconstructed by means of
where χ [0,T] (t) = 1 for t ∈ [0,T] and 0 elsewhere and with convergence in the L 2 norm and uniformly in x for every t.
As was observed earlier, the key feature of this scheme is that the spacing of the deltas in the identifier is sufficiently large so as to allow the response of the channel to a given delta to “die out” before the next delta is sent. In other words, the parameter T must exceed the time-spread of the channel. On the other hand, the rate of change of the channel, as measured by its bandwidth Ω, must be small enough that its impulse response can be recovered from “samples” of the channel taken T time units apart. In particular, the samples of the impulse response T units apart can be easily determined from the output. In the general case considered by Bello, in which the spreading support of the operator is not contained in a rectangle of unit area, this intuition breaks down.
Specifically, suppose that we consider the operator class OPW(S) where \(S \subseteq [0,T_{0}] \times [-\varOmega _{0}/2,\varOmega _{0}/2]\) and T 0 Ω 0 ≫ 1 but where | S | < 1. Then sounding the channel with a delta train of the form \(g =\sum _{n}\delta _{nT_{0}}\) would severely undersample the impulse response function. Simply increasing the sampling rate, however, would produce overlap in the responses of the channel to deltas close to each other. An approach to the undersampling problem in the literature of classical sampling theory is to sample at the low rate transformed versions of the function, chosen so that the interference of the several undersampled functions can be dealt with. This idea has its most classical expression in the Generalized Sampling scheme of Papoulis [35]. Choosing shifts and constant multiples of our delta train results in an identifier of the form \(g =\sum _{n}c_{n}\,\delta _{nT}\) where the weights (c n ) have period P (for some P ∈ ℕ) and T > 0 satisfies PT > T 0.
If g is discretely supported (for example, a periodically-weighted delta-train), then we refer to operator identification as operator sampling. The utility of periodically weighted delta trains for operator identification is a cornerstone of operator sampling and has far-reaching implications culminating in the developments outlined in Sections “Generalizations of operator sampling to higher dimensions” and “Further results on operator sampling”.
Kailath’s necessity proof and operator identification
In Section “Necessity of Kailath’s Condition for Channel Identification” we presented the proof of the necessity of the condition BL ≤ 1 for channel identification as given in [19]. The argument consisted of finding a finite-dimensional approximation of the channel H, and then showing that, given any putative identifier g, the number of degrees of freedom present in the output Hg must be at least as large as the number of degrees of freedom in the channel itself. For this to be true in any finite-dimensional setting, we must have BL < 1 and so in the limit we require BL ≤ 1. In essence, if BL > 1, we have a linear system with fewer equations than unknowns which necessarily has a nontrivial nullspace. The generalization of this notion to the infinite-dimensional setting is the basis of the necessity proof that appears in [24]. In this section, we present an outline of that proof, and show how the natural tool for this purpose once again comes from time-frequency analysis.
To see the idea of the proof, assume that BL > 1 and for simplicity let \(S = [-\frac{L} {2}, \frac{L} {2} ] \times [-\frac{B} {2}, \frac{B} {2} ]\). The goal is to show that for any sounding signal s in an appropriately large space of distributionsFootnote 2, the operator Φ s : OPW(S) ⟶ L 2(ℝ), H ↦ Hs, is not stable, that is, it does not possess a lower bound in the inequality (13).
First, define the operator E: l 0(ℤ 2) ⟶ O P W(S), where l 0(ℤ 2) is the space of finite sequences equipped with the l 2 norm, by
where \(1 <\lambda\) is chosen so that \(1 <\lambda ^{4} < BL\) and where P is a time-frequency localization operator whose spreading function η P (t, ν) is infinitely differentiable, supported in S, and identically one on \([-\frac{L} {2\lambda }, \frac{L} {2\lambda } ] \times [-\frac{B} {2\lambda }, \frac{B} {2\lambda } ]\). It is easily seen that the operator E is well defined and has spreading function
By construction, it follows that for some constant c 1, \(\|E(\sigma )\|_{OPW(S)} \geq c_{1}\|\sigma \|_{l^{2}(\mathbb{Z}^{2})}\), for all \(\sigma\), and that for any distribution s, Ps decays rapidly in time and in frequency.
Next define the Gabor analysis operator C g : L 2(ℝ) ⟶ l 2(ℤ 2) by
where \(g(x) = e^{-\pi x^{2} }\). A well-known theorem in Gabor theory asserts that \(\{\mathcal{M}_{k\alpha }\mathcal{T}_{l\beta }g\}_{k,l\in \mathbb{Z}}\) is a Gabor frame for L 2(ℝ) for every α β < 1 [31, 59, 60]. Consequently C g satisfies, for some c 2 > 0, \(\|C_{g}(s)\|_{l^{2}(\mathbb{Z}^{2})} \geq c_{2}\,\|s\|_{L^{2}(\mathbb{R})}\) for all s, since \(\lambda ^{2}/L\, \cdot \lambda ^{2}/B =\lambda ^{4}/BL < 1\).
For any s, consider the composition operator
The crux of the proof lies in showing that this composition operator is not stable, that is, it does not have a lower bound. Since C g and E are both bounded below, it follows that Φ s cannot be stable. Since s ∈ S′0(ℝ) was arbitrary, this completes the proof.
To complete this final step we examine the canonical bi-infinite matrix representation of the above defined composition of operators, that is, the matrix M = (m k′, l′, k, l ) that satisfies
It can be shown that M has the property that for some rapidly decreasing function w(x),
The proof is completed by the following Lemma. Its proof can be found in [24] and generalizations can be found in [37].
Lemma 1.
Given \(M = (m_{j',j})_{j',j\in \mathbb{Z}^{2}}\) . If there exists a monotonically decreasing function w: R 0 + ⟶ R 0 + with w = O(x −2−δ ), δ > 0, and constants \(\lambda > 1\) and K 0 > 0 with \(\vert m_{j',j}\vert < w(\|\lambda j' - j\|_{\infty })\) for \(\|\lambda j' - j\|_{\infty } > K_{0}\) , then M is not stable.
Intuitively, this result asserts that a bi-infinite matrix whose entries decay rapidly away from a skew diagonal behaves like a finite matrix with more rows than columns (see Figure 2). Such a matrix will always have a nontrivial nullspace. In the case of an infinite matrix what can be shown is that at best its inverse will be unbounded.
We can make a more direct connection from this proof to the original necessity argument in [19] in the following way. If we restrict our attention to sequences \(\{\sigma _{k,l}\}\) with a fixed finite support of size say N, then the image of this subspace of sequence space under the mapping E is an N-dimensional subspace of OPW(S). The operator P is essentially a time-frequency localization operator. This fact is established in [24] and follows from the rapid decay of the Fourier transform of η P . Since η P itself is concentrated on a rectangle of area \(BL/\lambda ^{2}\), its Fourier transform will be concentrated on a rectangle of area \(\lambda ^{2}/BL\). From this it follows that for \(\sigma\) as described above, the operator \(E(\sigma )\) essentially localizes a function to a region in the time-frequency plane of area \(N(\lambda ^{2}/BL)\).
Considering now the Gabor analysis operator C g , we observe that the Gaussian g(x) essentially occupies a time-frequency cell of area 1, and that this function is shifted in the time-frequency plane by integer multiples of \((\lambda ^{2}/B,\lambda ^{2}/L)\). Hence to “cover” a region in the time-frequency plane of area \(N(\lambda ^{2}/BL)\) would require only about
time-frequency shifts. So roughly speaking, in order to resolve N degrees of freedom in the operator \(E(\sigma _{k,l})\), we have only \(N/\lambda ^{2} < N\) degrees of freedom in the output of the operator \(E(\sigma _{k,l})s\).
Identification of operator Paley-Wiener spaces by periodically weighted delta-trains
Theorem 1 is based on arguments outlined in Section “Kailath’s Time-Variant Channel Identification Condition” and applies only to OPW(S) if S is contained in a rectangle of area less than or equal to one. In the following, we will develop the tools that allow us to identify OPW(S) for any compact set S of Lebesgue measure less than one.
In our approach we discretize the channel by covering the spreading support S with small rectangles of fixed sidelength, which we refer to as a rectification of S. As long as the measure of S is less than one, it is possible to do this in such a way that the total area of the rectangles is also less than one. This idea seems to bear some similarity to Bello’s philosophy of sampling the spreading function on a fixed grid but with one fundamental difference. Bello’s approach is based on replacing t and x by samples, thereby approximating the channel. For a better approximation, sampling on a finer grid is necessary, which results in a larger system of equations that must be solved. In our approach, as soon as the total area of the rectification is less than one, the operator modeling the channel is completely determined by the discrete model. Once this is achieved, identification of the channel reduces to solving a single linear system of equations at each point (Figure 3).
Given parameters T > 0 and P ∈ ℕ, we assume that S is rectified by rectangles of size T ×Ω, where Ω = 1∕(TP), such that the total area of the rectangles is less than one. Given a period-P sequence c = (c n ) n ∈ ℤ , we then define the periodically weighted delta-train g by \(g =\sum _{n\in \mathbb{Z}}c_{n}\,\delta _{nT}\). The goal of this subsection is to describe the scheme by which a linear system of P equations in a priori P 2 unknowns can be derived by which an operator H ∈ OPW(S) can be completely determined by Hg(x). In this sense, the “degrees of freedom” in the operator class OPW(S), and that of the output function Hg(x) are precisely defined and can be effectively compared (Figure 4).
The basic tool of time-frequency analysis that makes this possible is the Zak transform (see [13]).
Definition 2.
The non-normalized Zak Transform is defined for \(f \in \mathcal{S}(\mathbb{R})\) Footnote 3, and a > 0 by
Z a f(t, ν) satisfies the quasi-periodicity relations
and
\(\sqrt{a}\,Z_{ a}\) can be extended to a unitary operator from L 2(ℝ) onto L 2([0, a]×[0, 1∕a]).
A somewhat involved but elementary calculation yields the following (see [44, 46] and Section “Proof of Lemma 2”).
Lemma 2.
Let T > 0, P ∈ ℕ, c = (c n ), and g be given as above. Then for all (t,ν) ∈ ℝ 2 , and \(p = 0,\,1,\,\ldots,\,P-1\) ,
Here \(\mathcal{T}\) and \(\mathcal{M}\) are the translation and modulation operators given in Definition 3, and η H Q(t, ν) is the quasiperiodization of η H ,
whenever the sum is defined (Figure 4).
Under the additional simplifying assumption that the spreading function η H (t, ν) is supported in the large rectangle [0, TP] × [0, 1∕T], and by restricting (16) to the rectangle [0, T] × [0, 1∕(TP)], we arrive at the P × P 2 linear system
where
and where G(c) is a finite Gabor system matrix (23). If (18) can be solved for each (t, ν) ∈ [0, T] × [0, 1∕(TP)], then the spreading function for an operator H can be completely determined by its response to the periodically weighted delta-train g.
As anticipated by Bello, two issues now become relevant. (1) We require that \(\mathop{\mathrm{supp}}\nolimits \eta _{H}\) occupy no more than P of the shifted rectangles [0, T] × [0, 1∕(TP)] + (qT, k∕(TP)), so that (18) has at least as many equations as unknowns. This forces \(\vert \mathop{\mathrm{supp}}\nolimits \eta _{H}\vert \leq 1\). (2) We require that c be chosen in such a way that the P × P system formed by removing the columns of G(c) corresponding to vanishing components of \(\boldsymbol{\eta }_{H}\) is invertible. That such c exist is a fundamental cornerstone of operator sampling and is the subject of the next section.
Based on the existence of c such that any set of P columns of G(c) form a linearly independent set, we can prove the following [45].
Theorem 2.
For \(S \subseteq (0,\infty )\times \mathbb{R}\) compact with |S| < 1, there exists T > 0 and P ∈ ℕ, and a period-P sequence c = (c n ) such that \(g =\sum _{n}c_{n}\,\delta _{nT}\) identifies OPW(S). In particular, there exist period-P sequences b j = (b j,k ), and integers 0 ≤ q j , m j ≤ P−1, for 0 ≤ j ≤ P−1 such that
where \(r,\phi \in \mathcal{S}(\mathbb{R})\) satisfy
where \(r(t)\widehat{\phi }(\gamma )\) is supported in a neighborhood of [0,T]×[0,1∕PT], and where the sum in (21) converges unconditionally in L 2 and for each t uniformly in x.
Equation (21) is a generalization of (14) which is easily seen by choosing \(\phi (x) =\sin (\pi PTx)/(\pi PTx)\) and r(t) to be the characteristic function of [0, T).
Linear Independence Properties of Gabor Frames
Finite Gabor Frames
Definition 3.
Given P ∈ ℕ, let ω = e 2π i∕P and define the translation operator \(\mathcal{T}\) on \((x_{0},\,\ldots,\,x_{P-1}) \in \mathbb{C}^{P}\) by
and the modulation operator \(\mathcal{M}\) on \(\mathbb{C}^{P}\) by
Given a vector \(c \in \mathbb{C}^{P}\) the finite Gabor system with window c is the collection \(\{\mathcal{T}^{q}\mathcal{M}^{p}c\}_{q,p=0}^{P-1}\). Define the full Gabor system matrix G(c) to be the P × P 2 matrix
where D k is the diagonal matrix with diagonal
and W P is the P × P Fourier matrix W P = (e 2π i n m∕P) n, m = 0 P−1.
Remark 2.
(1) For 0 ≤ q, p ≤ P − 1, the (q + 1)st column of the submatrix D p W P is the vector \(\mathcal{M}^{p}\mathcal{T}^{q}c\) where the operators \(\mathcal{M}\) and \(\mathcal{T}\) are as in Definition 3. This means that each column of the matrix G(c) is a unimodular constant multiple of an element of the finite Gabor system with window c, namely \(\{e^{-2\pi ipq/P}\,\mathcal{T}^{q}\mathcal{M}^{p}c\}_{q,p=0}^{P-1}\).
(2) Note that the finite Gabor system defined above consists of P 2 vectors in \(\mathbb{C}^{P}\) which form an overcomplete tight frame for \(\mathbb{C}^{P}\) [30]. For details on Gabor frames in finite dimensions, see [9, 27, 30] and the overview article [39].
(3) Note that we are abusing notation slightly by identifying a vector \(c \in \mathbb{C}^{P}\) with a P-periodic sequence c = (c n ) in the obvious way.
Definition 4.
[8] The Spark of an M × N matrix F is the size of the smallest linearly dependent subset of columns, i.e.,
where ∥ x ∥ 0 is the number of nonzero components of the vector x. If Spark(F) = M + 1, then F is said to have full Spark. Spark(F) = k implies that any collection of fewer than k columns of F is linearly independent.
Finite Gabor frames are generically full Spark
The existence of Gabor matrices with full Spark has been addressed in [30, 32]. The results in these two papers are as follows.
Theorem 3.
[30] If P ∈ ℕ is prime, then there exists a dense, open subset of \(c \in \mathbb{C}^{P}\) such that every minor of the Gabor system matrix G(c) is nonzero. In particular, for such c, G(c) has full Spark.
Theorem 4.
[32] For every P ∈ ℕ, there exists a dense, open subset of \(c \in \mathbb{C}^{P}\) such that the Gabor system matrix G(c) has full Spark.
The goal of this subsection is to outline the proof of Theorems 3 and 4. We will adopt some of the following notation and terminology of [32].
Let P ∈ ℕ and let M be an P × P submatrix of G(c). For 0 ≤ κ < P let ℓ κ be the number of columns of M chosen from the submatrix D κ W P of (23). While the vector ℓ = (ℓ κ ) κ = 0 P−1 does not determine M uniquely, it describes the matrix M sufficiently well for our purposes. Define M κ to be the P × ℓ κ matrix consisting of those columns of M chosen from D κ W P . Given the ordered partition \(B = (B_{0},\,B_{1},\,\ldots,\,B_{P-1})\) where \(\{B_{0},\,B_{1},\,\ldots,\,B_{P-1}\}\) forms a partition of \(\{0,\,\ldots,\,P - 1\}\), and where for each 0 ≤ κ < P, | B κ | = ℓ κ , let M κ (B κ ) be the ℓ κ × ℓ κ submatrix of M κ whose rows belong to B κ . Then \(\det (M) =\sum \prod _{ \kappa =0}^{P-1}\det (M_{\kappa }(B_{\kappa }))\) where the sum is taken over all such ordered partitions B. This formula is called the Lagrange expansion of the determinant.
Each ordered partition B corresponds to a permutation on ℤ P as follows. Define the trivial partition \(A = (A_{0},\,A_{1},\,\ldots,\,A_{P-1})\) by
so that A 0 = [0, ℓ 0 − 1], A 1 = [ℓ 0, ℓ 0 + ℓ 1 + 1], \(\ldots\), A P−1 = [ℓ 0 + ⋯ + ℓ P−2, P − 1]. Then given \(B = (B_{0},\,B_{1},\,\ldots,\,B_{P-1})\) there is a permutation \(\sigma \in S_{P}\) such that \(\sigma (A_{\kappa }) = B_{\kappa }\) for all κ. This \(\sigma\) is unique up to permutations that preserve A, that is, up to τ ∈ S P such that τ(A κ ) = A κ for all κ. Call such a permutation trivial and denote by Γ the subgroup of S P consisting of all trivial permutations. Then the ordered partitions B of ℤ P can be indexed by equivalence classes of permutations \(\sigma \in S_{P}/\varGamma\).
The key observation is that \(\det (M)\) is a homogeneous polynomial in the P variables \(c_{0},\,c_{1},\,\ldots,\,c_{P-1}\) and we can write
where the monomial \(C^{\sigma }\) is given by
If it can be shown that this polynomial does not vanish identically, then we can choose a dense, open subset of \(c \in \mathbb{C}^{P}\) for which \(\det (M)\neq 0\). Since there are only finitely many P × P submatrices of G(c) it follows that there is a dense, open subset of c for which \(\det (M)\neq 0\) for all M, and we conclude that, for these c, G(c) has full Spark.
Following [32], we say that a monomial \(C^{\sigma _{0}}\) appears uniquely in (24) if for every \(\sigma \in S_{P}/\varGamma\) such that \(\sigma \neq \sigma _{0}\), \(C^{\sigma }\neq C^{\sigma _{0}}\). Therefore, in order to show that the polynomial (24) does not vanish identically, it is sufficient to show that (1) there is a monomial \(C^{\sigma }\) that appears uniquely in (24) and (2) the coefficient \(a_{\sigma }\) of this monomial does not vanish.
Obviously, whether or not (24) vanishes identically does not depend on how the variables c i are labelled. More specifically, if the variables are renamed by a cyclical shift of the indices, viz., c i ↦ c (i+γ)mod P for some 0 ≤ γ < P, then
where M′ is a P × P submatrix described by the vector
The lowest index monomial
In [30], a monomial referred to in [32] as the lowest index (LI) monomial is defined that has the required properties when P is prime. In order to see this, note first that each coefficient \(a_{\sigma }\) in the sum (24) is the product of minors of the Fourier matrix W P and since P is prime, Chebotarev’s Theorem says that such minors do not vanish [62]. More specifically,
and for each κ, the columns of M κ are columns of W P where each row has been multiplied by the same variable c j and \(M_{\kappa }(\sigma (A_{\kappa }))\) is a square matrix formed by choosing ℓ κ rows of M κ . Hence for each κ, \(\det (M_{\kappa }(\sigma (A_{\kappa })))\) is a monomial in c with coefficients a constant multiple of a minor of W P . Since \(a_{\sigma }\) is the product of those minors, it does not vanish.
Note moreover that each submatrix \(M_{\kappa }(\sigma (A_{\kappa }))\) is an ℓ κ × ℓ κ matrix, so that \(\det (M_{\kappa }(\sigma (A_{\kappa })))\) is the sum of a multiple of the product of ℓ κ ! diagonals of \(M_{\kappa }(\sigma (A_{\kappa }))\). Hence \(a_{\sigma }\,C^{\sigma }\) is the sum of multiples of the product of \(\prod _{\kappa =0}^{P-1}\ell_{\kappa }!\) generalized diagonals of M.
We define the LI monomial as in [30] as follows. If M is 1 × 1, then \(\det (M)\) is a multiple of a single variable c j and we define the LI monomial, p M by p M = c j . If M is d × d, let c j be the variable of lowest index appearing in M. Choose any entry of M in which c j appears, eliminate the row and column containing that entry, and call the remaining (d − 1) × (d − 1) matrix M′. Define p M = c j p M′. It is easy to see that the monomial p M is independent of the entry of M chosen at each step. In order to show that the LI monomial appears uniquely in (24), we observe as in [30] that the number of diagonals in \(\det (M)\) that correspond to the LI monomial is \(\prod _{\kappa =0}^{P-1}\ell_{\kappa }!\). Since this is also the number of generalized diagonals appearing in the calculation of each \(\det (M_{\kappa }(\sigma (A_{\kappa })))\), it follows that this monomial appears only once. The details of the argument can be found in Section “Proof of Theorem 3”. Note that because P is prime, this argument is valid no matter how large the matrix M is. In other words, M does not have to be a P × P submatrix in order for the result to hold. Consequently, given k < P and M an arbitrary P × k submatrix of G(c), we can form the k × k matrix M 0 by choosing k rows of M in such a way that the LI monomial of M 0 contains at most only the variables \(c_{0},\,\ldots,\,c_{k-1}\). This observation leads to the following theorem for matrices with arbitrary Spark.
Theorem 5.
[46] If P ∈ ℕ is prime, and 0 < k < P, there exists an open, dense subset of \(c \in \mathcal{C}^{k} \times \{ 0\} \subseteq \mathbb{C}^{P}\) with the property that Spark(G(c)) = k + 1.
This result has implications for relating the capacity of a time-variant communication channel to the area of the spreading support, see [46].
The consecutive index monomial
In [32], a monomial referred to as the consecutive index (CI) monomial is defined that has the required properties for any P ∈ ℕ. The CI monomial, C I, is defined as the monomial corresponding to the identity permutation in S P ∕Γ, that is, to the equivalence class of the trivial partition \(A = (A_{0},\,A_{1},\,\ldots,\,A_{P-1})\). Hence
For each κ, the monomial appearing in \(\det (M_{\kappa }(A_{\kappa }))\), \(\prod _{j\in A_{\kappa }}c_{(j-\kappa )mod\ P}\), consists of a product of ℓ k variables c j with consecutive indices modulo P.
That a I ≠ 0 follows from the observation that for each κ, \(\det (M_{\kappa }(A_{\kappa }))\) is a monomial whose coefficient is a nonzero multiple of a Vandermonde determinant and hence does not vanish (for details, see [32]). The proof that C I appears uniquely in (24) amounts to showing that, with respect to an appropriate cyclical renaming of the variables c i , the CI monomial uniquely minimizes the quantity \(\varLambda (C^{\sigma }) =\sum _{ i=0}^{P-1}i^{2}\,\alpha _{i}\), where α i is the exponent of the variable c i in \(C^{\sigma }\). An abbreviated version of the proof of this result as it appears in [32] is given in Section “Proof of Theorem 4”.
As a final observation, we quote the following corollary that provides an explicit construction of a unimodular vector c such that G(c) has full Spark.
Corollary 1.
[32] Let \(\zeta = e^{2\pi i/(P-1)^{4} }\) or any other primitive root of unity of order (P − 1) 4 where P ≥ 4. Then the vector
generates a Gabor frame for which G(c) has full Spark.
Generalizations of operator sampling to higher dimensions
The operator representations (5), (6), and (7) hold verbatim for higher dimensional variables \(x,\xi,t,\nu \in \mathbb{R}^{d}\). In this section, we address the identifiability of
where \(S \subseteq \mathbb{R}^{2d}\).
Looking at the components of the multidimensional variables separately, Theorem 1 easily generalizes as follows.
Theorem 6.
For \(H \in OPW\big(\prod _{\ell=1}^{d}[0,T_{\ell}]\times \prod _{\ell=1}^{d}[-\varOmega _{\ell}/2,\varOmega _{\ell}/2]\big)\) with T ℓ Ω ℓ ≤1, ℓ = 1,…,d, we have
and H can be reconstructed by means of
with convergence in the L 2 norm.
In the following, we address the situation where S is not contained in a set \(\prod _{\ell=1}^{d}[0,T_{\ell}]\times \prod _{\ell=1}^{d}[-\varOmega _{\ell}/2,\varOmega _{\ell}/2]\) with T ℓ Ω ℓ ≤ 1, ℓ = 1, …, d. For example, \(S = [0,1] \times [0,2] \times [0, \frac{1} {4}] \times [0,1] \subseteq \mathbb{R}^{4}\) of volume \(\frac{1} {2}\) is not covered by Theorem 6.
To give a higher dimensional variant of Theorem 2, we shall denote pointwise products of finite and infinite length vectors k and T by \(k \star T\), that is, \(k \star T = (k_{1}T_{1},\ldots,k_{d}T_{d})\) for \(k,T \in \mathbb{C}^{d}\). Similarly, k∕T = (k 1∕T 1, …, k d ∕T d ).
Theorem 7.
If \(S \subseteq (0,\infty )^{d}\times \mathbb{R}^{d}\) is compact with |S| < 1, then OPW(S) is identifiable. Specifically, there exist T 1 ,…,T d > 0 and pairwise relatively prime natural numbers P 1 ,…,P d such that
and a sequence \(c = (c_{n}) \in \ell^{\infty }(\mathbb{Z}^{d})\) which is P ℓ periodic in the ℓ-th component n ℓ such that \(g =\sum _{n\in \mathbb{Z}^{d}}c_{n}\,\delta _{n\star T}\) identifies OPW(S). In fact, for such g there exists for each \(j \in J =\prod _{ \ell=1}^{d}\{0,1,\ldots,P_{\ell}-1\}\) a sequence b j = (b j,k ) which is P ℓ periodic in k ℓ and 2d-tuples (q j ,m j ) ∈ J × J with
The functions \(r,\phi \in \mathcal{S}(\mathbb{R}^{d})\) are assumed to satisfy
and \(r(t)\widehat{\phi }(\gamma )\) is supported in a neighborhood of \(\prod _{\ell=1}^{d}[0,T_{\ell}]\times \prod _{\ell=1}^{d}[0,1/P_{\ell}T_{\ell}]\) . The sum in (25) converges unconditionally in L 2 and for each t uniformly in x.
This result follows from adjusting the proof of Theorem 7 to the higher dimensional setting. For example, it will employ the Zak transform
where P = (P 1, …, P d ). We are then led again to a system of linear equations of the form
where as before
and where G(c) is now a multidimensional finite Gabor system matrix similar to (23).
In order to show that the spreading function for operator H can be completely determined by its response to the periodically weighted d-dimensional delta-train g, we need to show that (27) can be solved for each \((t,\nu ) \in \prod _{\ell=1}^{d}[0,T_{\ell}]\times \prod _{\ell=1}^{d}[0,1/(T_{\ell}P_{\ell})]\) if \(c \in \mathbb{C}^{P_{1}\times \ldots \times P_{d}}\) is chosen appropriately.
To see that a choice of c is possible, observe that the product group \(\mathbb{Z}_{P_{1}} \times \ldots \times \mathbb{Z}_{P_{d}}\) is isomorphic to the cyclic group \(\mathbb{Z}_{P_{1}\cdot \ldots \cdot P_{d}}\) since the P ℓ are chosen pairwise relatively prime. Theorem 4 applied to the cyclic group \(\mathbb{Z}_{P_{1}\cdot \ldots \cdot P_{d}}\) guarantees the existence of \(\widetilde{c} \in \mathbb{C}^{P_{1}\cdot \ldots \cdot P_{d}}\) so that the Gabor system matrix \(G(\widetilde{c})\) is full spark. We can now define \(c \in \mathbb{C}^{P_{1}\times \ldots \times P_{d}}\) by setting
and observe that G(c) is simply a rearrangement of \(G(\widetilde{c})\), hence, G(c) is full spark.
Further results on operator sampling
The results discussed in this paper are discussed in detail in [6, 22, 24, 38, 41] and [46]. The last listed article contains the most extensive collection of operator reconstruction formulas, including extensions to some OPW(S) with S unbounded. Moreover, some hints on how to use parallelograms to rectify a set S for operator sampling efficiently are given.
A central result in [46] is the classification of all spaces OPW(S) that are identifiable for a given \(g =\sum _{n\in \mathbb{Z}}c_{n}\delta _{nT}\) for c n being P-periodic.
The papers [38, 42] address some functional analytic challenges in operator sampling, and [26] focuses on the question of operator identification if we are restricted to using more realizable identifiers, for example, truncated and modified versions of g, namely, \(\widetilde{g}(t) =\sum _{ n=0}^{N}c_{n}\varphi (t - nT)\). The problem of recovering parametric classes of operators in OPW(S) is discussed in [2, 3].
In the following, we briefly review literature that address some other directions in operator sampling.
Multiple Input Multiple Output
A Multiple Input Multiput Output (MIMO) channel H with N transmitters and M receivers can be modeled by an N × M matrix whose entries are time-varying channel operators H mn ∈ OPW(S mn ). For simplicity, we write H ∈ OPW(S). Assuming that the operators H mn are independent, a sufficient criterion for identifiability is given by \(\sum _{n=1}^{N}\vert S_{mn}\vert \leq 1\) for m = 1, …, M. Conversely, if for a single m, \(\sum _{n=1}^{N}\vert S_{mn}\vert > 1\), then OPW(S) is not identifiable by any collection s 1, …, s N of input signals [36, 43].
A somewhat dual setup was considered in [18]. Namely, a Single Input Single Output (SISO) channel with S being large, say S = [0, M] × [−N∕2, N∕2] with N, M ≥ 2. As illustrated above, OPW([0, M] × [−N∕2, N∕2]) is not identifiable, but if we are allowed to use MN (infinite duration) input signals g 1, …, g MN , then H ∈ OPW([0, M] × [−N∕2, N∕2]) can be recovered from the MN outputs Hg 1, …, Hg MN .
Irregular Sampling of Operators
The identifier \(g =\sum _{n\in \mathbb{Z}}c_{n}\delta _{nT}\) is supported on the lattice T ℤ in ℝ. In general, for stable operator identification, choosing a discretely supported identifier is reasonable, indeed, in [26] it is shown that identification for OPW(S) in full requires the use of identifiers that neither decay in time nor in frequency. (Recovery of the characteristics of H during a fixed transmission band and a fixed transmission interval can be indeed recovered when using Schwartz class identifiers [26].)
In irregular operator sampling, we consider identifiers of the form \(g =\sum _{n\in \mathbb{Z}}c_{n}\delta _{\lambda _{n}}\) for nodes \(\lambda _{n}\) that are not necessarily contained in a lattice. If such g identifies OPW(S), then we refer to \(\mathop{\mathrm{supp}}\nolimits g =\{\lambda _{n}\}\) as a sampling set for OPW(S), and similarly, the sampling rate of g is defined to be
where
assuming that the limit exists [18, 46].
To illustrate a striking difference between irregular sampling of functions and operators, note that ℤ is a sampling set for \(OPW([0,1] \times [-\frac{1} {2}, \frac{1} {2}])\) as well as for the Paley Wiener space \(PW([-\frac{1} {2}, \frac{1} {2}])\), but the distribution \(g = c_{0}\delta _{\lambda _{0}} +\sum _{n\in \mathbb{Z}\setminus \{0\}}c_{n}\delta _{n}\) does not identify \(OPW([0,1] \times [-\frac{1} {2}, \frac{1} {2}])\), regardless of our choice of c n and \(\lambda _{0}\neq 0\). This shows that, for example, Kadec’s \(\frac{1} {4}\) th theorem does not generalize to the operator setting [18].
In [46] we give with \(D(g) = D(\varLambda ) \geq B(S)\) a necessary condition on the (operator) sampling rate based on the bandwidth B(S) of OPW(S) which is defined as
Here, χ S denotes the characteristic function of S. This quantity can be interpreted as the maximum vertical extent of S and takes into account gaps in S. Moreover, in [46] we discuss the goal of constructing \(\{\lambda _{n}\}\) of small density, and/or large gaps in order to reserve time-slots for information transmission. Results in this direction can be interpreted as giving bounds on the capacity of a time-variant channel in OPW(S) in terms of | S | [46].
Finally, we give in [46] an example of an operator class OPW(S) that cannot be identified by any identifier of the form \(g =\sum _{n\in \mathbb{Z}}c_{n}\delta _{nT}\) with T > 0 and periodic c n , but requires coefficients that form a bounded but non-periodic sequence. In this case, S is a parallelogram and B(S) = D(g) (see Figure 5)
Sampling of OPW(S) with unknown S
In some applications, it is justified to assume that the set S has small area, but its shape and location are unknown. If further S satisfies some basic geometric assumptions that guarantee that S is contained in [0, TP] × [−1∕2T, 1∕2T] and only meets few rectangles of the rectification grid [kT, (k + 1)T] × [q∕TP, (q + 1)∕TP], then recovery of S and, hence, an operator in OPW(S) is possible [15, 46].
The independently obtained results in [15, 46] employ the same identifiers \(g =\sum _{n\in \mathbb{Z}}c_{n}\delta _{\lambda _{n}}\) as introduced above. Operator identification is therefore again reduced to solving (18), that is, the system of P linear equations
for the vector \(\boldsymbol{\eta }(t,\nu ) \in \mathbb{C}^{P^{2} }\) for (t, ν) ∈ [0, T] × [−1∕2TP, 1∕2TP]. While the zero components of \(\boldsymbol{\eta }(t,\nu )\) are not known, the vector is known to be very sparse. Hence, for fixed (t, ν), we can use the fact that G(c) is full spark and recover \(\boldsymbol{\eta }(t,\nu )\) if it has at most P∕2 nonzero entries. Indeed, assume \(\boldsymbol{\eta }(t,\nu )\) and \(\widetilde{\boldsymbol{\eta }}(t,\nu )\) solve (29) and both have at most P∕2 nonzero entries. Then \(\boldsymbol{\eta }(t,\nu ) -\widetilde{\boldsymbol{\eta }} (t,\nu )\) has at most P nonzero entries and the fact that G(c) is full spark indicates that \(G(c)(\boldsymbol{\eta }(t,\nu ) -\widetilde{\boldsymbol{\eta }} (t,\nu )) = 0\) implies \(\boldsymbol{\eta }(t,\nu ) -\widetilde{\boldsymbol{\eta }} (t,\nu ) = 0\).
Clearly, under the geometric assumptions alluded to above, the criterion that at most P∕2 rectangles in the grid are met can be translated to the unknown area of S has measure less than or equal to 1/2.
In [15], this area 1/2 criterion is improved by showing that H can be identified whenever at most P − 1 rectangles in the rectification grid are met by S. This result is achieved by using a joint sparsity argument, based on the assumption that for all (t, ν), the same cells are active.
Alternatively, the “area 1/2” result can be strengthened by not assuming that for all (t, ν), the same cells are active. This requires solving (29), for \(\boldsymbol{\eta }(t,\nu )\) sparse, for each considered (t, ν) independently, see Figure 6 and [46].
It must be added though, that solving (29) for \(\boldsymbol{\eta }(t,\nu )\) being P∕2 sparse is not possible for moderately sized P, for example for P > 15. If we further reduce the number of active boxes, then compressive sensing algorithms such as Basis Pursuit and Orthogonal Matching Pursuit become available, as is discussed in the following section.
Finite dimensional operator identification and compressive sensing
Operator sampling in the finite dimensional setting translates into the following matrix probing problem [5, 7, 49]. For a class of matrices \(\boldsymbol{\mathcal{M}}\in \mathbb{C}^{P\times P}\), find \(c \in \mathbb{C}^{P}\) so that we can recover \(M \in \boldsymbol{\mathcal{M}}\) from Mc (Figure 7).
The classes of operator considered here are of the form \(M_{\boldsymbol{\eta }} =\sum _{\lambda }\boldsymbol{\eta }_{\lambda }B_{\lambda }\) with \(B_{\lambda } = B_{p,q} = \mathcal{T}^{p}\mathcal{M}^{q}\), and the matrix identification problem is reduced to solving
where c is chosen appropriately; this is just (29) with the dependence on (t, ν) removed.
If \(\boldsymbol{\eta }\) is assumed to be k-sparse, (Figure 8) we arrive at the classical compressive sensing problem with measurement matrix \(G(c) \in \mathbb{C}^{P\times P^{2} }\) which depends on c = (c 0, c 1, …, c P−1). To achieve recovery guarantees for Basis Pursuit and Orthogonal Matching Pursuit, averaging arguments have to be used that yield results on the expected qualities of G(c). This problem was discussed in [40, 49, 50] as well as, in slightly different terms, in [1, 17] (Figure 9). The strongest results were achieved in [25] by estimating Restricted Isometry Constants for c being a Steinhaus sequence. These results show that with high probability, G(c) has the property that Basis Pursuit recovers \(\boldsymbol{\eta }\) from \(G(c)\boldsymbol{\eta }\) for every k sparse \(\boldsymbol{\eta }\) as long as \(k \leq C\,P/\log ^{2}P\) for some universal constant C.
Stochastic operators and channel estimation
It is common that models of wireless channels and radar environments take the stochastic nature of the medium into account. In such models, the spreading function η(t, ν) (and therefore the operator’s kernel and Kohn–Nirenberg symbol) are random processes, and the operator is split into the sum of its deterministic portion, representing the mean behavior of the channel, and its zero-mean stochastic portion that represents the noise and the environment.
The detailed analysis of the stochastic case was carried out in [47, 48]. One of the foci of these works lies in the goal of determining the second-order statistics of the (zero mean) stochastic process η(τ, ν), that is, its so-called covariance function \(R(\tau,\nu,\tau ',\nu ') = \mathbb{E}\{\eta (\tau,\nu )\,\overline{\eta (\tau ',\nu ')}\}\). In [47, 48], it was shown that a necessary but not sufficient condition for the identifiability of Rη(τ, ν, τ′, ν′) from the output covariance \(A(t,t') = \mathbb{E}\{Hg(t)\,\overline{Hg(t')}\}\) is that R(τ, ν, τ′, ν′) is supported on a bounded set of 4-dimensional volume less than or equal to one. Unfortunately, for some sets \(S \subseteq \mathbb{R}^{4}\) of arbitrary small measure, the respective stochastic operator Paley–Wiener space StOPW(S) of operators with R η supported on S is not identifiable; this is a striking difference to the deterministic setup where the geometry of S does not play a role at all.
In [67, 68] the special case of wide-sense stationary operators with uncorrelated scattering, or WSSUS operators is considered. These operators are characterized by the property that
The function C η (t, ν) is then called the scattering function of H. Our results on the identifiability of stochastic operator classes allowed for the construction of two estimators for scattering functions [67, 68]. The estimator given in [67] is applicable, whenever the scattering function of H has bounded support. Note that the autocorrelation of a WSSUS operator is supported on a two-dimensional plane in ℝ 4 which therefore has 4D volume 0, a fact that allows us to lift commonly assumed restrictions on the size of the 2D area of the support of the scattering function.
For details, formal definitions of identifiability and detailed statements of results we refer to the papers [47, 48, 67, 68].
Notes
- 1.
Ref. 6 is [19].
- 2.
- 3.
\(\mathcal{S}(\mathbb{R})\) denotes the Schwartz class of infinitely differentiable, rapidly decreasing functions.
References
L. Applebaum, S.D. Howard, S. Searle, R. Calderbank, Chirp sensing codes: deterministic compressed sensing measurements for fast recovery. Appl. Comput. Harmon. Anal. 26(2), 283–290 (2008)
W.U. Bajwa, K. Gedalyahu, Y.C. Eldar, Identification of parametric underspread linear systems and super-resolution radar. IEEE Trans. Signal Process. 59(6), 2548–2561 (2011)
W.U. Bajwa, K. Gedalyahu, Y.C. Eldar, On the identification of parametric underspread linear systems, in EEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2011), pp. 4088–4091
B.L. Basore, Noise-like signals and their detection by correlation. Thesis (Sc. D.)-Massachusetts Institute of Technology (1952). http://hdl.handle.net/1721.1/34322
R. Belanger-Rioux, L. Demanet, Compressed absorbing boundary conditions via matrix probing. (2014). arXiv.org/abs/1401.4421
P.A. Bello. Measurement of random time-variant linear channels. IEEE Trans. Commun. 15, 469–475 (1969)
J. Chiu, L. Demanet, Matrix probing and its conditioning. SIAM J. Numer. Anal. 50(1), 171–193 (2012)
D. Donoho, M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ 1 minimization. Proc. Natl. Acad. Sci. 100(5), 2197–2202 (2003)
H.G. Feichtinger, W. Kozek, F. Luef, Gabor analysis over finite abelian groups. Appl. Comput. Harmon. Anal. 26(2), 230–248 (2009)
P.E. Green, Correlation detection using stored signals. Thesis (Sc. D.)-Massachusetts Institute of Technology (1953). http://hdl.handle.net/1721.1/34880
P.E. Green, Radar measurements of target scattering properties, in Radar Astronomy, ed. by J.V. Evans, T. Hagfors (McGraw-Hill, New York, 1968), pp. 1–78
P. Green, Early Spread-Spectrum and automatic equalization – NOMAC and Rake. in IEEE GLOBECOM 2008: Global Telecommunications Conference, 2008 (2008), pp. 1–5
K. Gröchenig, Foundations of Time-Frequency Analysis. Applied and Numerical Harmonic Analysis (Birkhäuser, Boston, 2001)
G.L. Hardy, J.E. Littlewood, G. Pólya, Inequalities, 2nd edn. (Cambridge University Press, Cambridge, 1952)
R. Heckel, H. Bölcskei, Identification of sparse linear operators. IEEE Trans. Inf. Theory 59(12), 7985–8000 (2013)
C.W. Helstrom, Statistical Theory of Signal Detection (Pergamon Press, London, 1960)
M.A. Herman, T. Strohmer, High-resolution radar via compressed sensing. IEEE Trans. Signal Process. 57(6), 2275–2284 (2009)
Y.M. Hong, G.E.Pfander, Irregular and multi-channel sampling of operators. Appl. Comput. Harmon. Anal. 29(2), 214–231 (2010)
T. Kailath, Sampling models for linear time-variant filters. Technical Report 352, Massachusetts Institute of Technology, Research Laboratory of Electronics (1959)
T. Kailath, Correlation detection of signals perturbed by a random channel. IRE Trans. Inf. Theory 6(3), 361–366 (1960)
T. Kailath, Communication via randomly varying channels. Thesis (Sc. D.)–Massachusetts Institute of Technology, Dept. of Electrical Engineering (1961). http://hdl.handle.net/1721.1/11319
T. Kailath, Measurements on time–variant communication channels. IEEE Trans. Inf. Theory 8(5), 229–236 (1962)
T. Kailath, Time–variant communication channels. IEEE Trans. Inf. Theory. Inf. Theory. Prog. Rep. 1960–1963 9, 233–237 (1963)
W. Kozek, G.E. Pfander, Identification of operators with bandlimited symbols. SIAM J. Math. Anal. 37(3), 867–888 (2006)
F. Krahmer, S. Mendelson, H. Rauhut, Suprema of chaos processes and the restricted isometry property. Commun. Pure Appl. Math. 67(11), 1877–1904 (2014)
F. Krahmer, G.E. Pfander, Local sampling and approximation of operators with bandlimited Kohn-Nirenberg symbols. Construct. Approx. 39(3), 541–572 (2014)
F. Krahmer, G.E. Pfander, P. Rashkov, Uncertainty in time-frequency representations on finite abelian groups and applications. Appl. Comput. Harmon. Anal. 25(2), 209–225 (2008)
H.J. Landau, H.O. Pollak, Prolate spheroidal wave functions, Fourier analysis and uncertainty– I. Bell Syst. Tech. J. 40, 43–64 (1961)
H.J. Landau, D. Slepian, H.O. Pollak, Prolate spheroidal wave functions, fourier analysis and uncertainty–II. Bell Syst. Tech. J. 40(1), 65–84 (1961)
J. Lawrence, G.E. Pfander, D.F. Walnut, Linear independence of Gabor systems in finite dimensional vector spaces. J. Fourier Anal. Appl. 11(6), 715–726 (2005)
Y.I. Lyubarskii, Frames in the Bargmann space of entire functions. Adv. Sov. Math. 429, 107–113 (1992)
R.-D. Malikiosis, A note on Gabor frames in finite dimensions. Appl. Comput. Harmon. Anal. 38(2), 318–330 (2015)
H.K. Markey, G. Antheil, Secret Communication System, US 2292387 (11 August 1942)
B.J. Pankowski, Multiplexing a radio teletype system using a random carrier and correlation detection. Thesis (Sc. M.)-Massachusetts Institute of Technology (1952)
A. Papoulis, Generalized sampling expansion. IEEE Trans. Circuits Syst. 24(11), 652–654 (1977)
G.E. Pfander, Measurement of time-varying multiple-input multiple-output channels. Appl. Comput. Harmon. Anal. 24(3), 393–401 (2008)
G.E. Pfander, On the invertibility of rectangular bi-infinite matrices and applications in time-frequency analysis, Linear Algebra Appl. 429(1), 331–345 (2008)
G.E. Pfander, Sampling of operators. http://arxiv.org/abs/1010.6165, preprint (2010)
G.E. Pfander, Gabor frames in finite dimensions. in Finite Frames: Theory and Applications, ed. by P.G. Casazza, G. Kutyniok (Springer, New York, 2013)
G.E. Pfander, H. Rauhut, Sparsity in time–frequency representations. J. Fourier Anal. Appl. 16(2), 233–260 (2010)
G.E. Pfander, D. Walnut, Measurement of time–variant channels. IEEE Trans. Inf. Theory 52(11), 4808–4820 (2006)
G.E. Pfander, D. Walnut, Operator identifcation and Feichtinger’s algebra. Sampling Theory Signal Image Process. 5(2), 151–168 (2006)
G.E. Pfander, D. Walnut, On the sampling of functions and operators with an application to Multiple–Input Multiple–Output channel identification, in Proceedings SPIE Vol. 6701, Wavelets XII, ed. by D. Van De Ville, V.K. Goyal, M. Papadakis (2007), pp. 67010T-1–67010T-14
G.E. Pfander, D. Walnut, Operator identification and sampling, in Proceedings of the Conference on Sampling Theory and Applications (Aix-Marseilles Université, Marseilles, 2009)
G.E. Pfander, D. Walnut, Sparse finite Gabor frames for operator sampling. Proceedings of the Conference on Sampling Theory and Applications (Jacobs University, Bremen, 2013)
G.E. Pfander, D. Walnut, Sampling and reconstruction of operators. IEEE Trans. Inf. Theory (2014, submitted). arXiv.org/abs/1503.00628
G.E. Pfander, P. Zheltov, Identification of stochastic operators. Appl. Comput. Harmon. Anal. 36(2), 256–279 (2014)
G.E. Pfander, P. Zheltov, Sampling of stochastic operators. IEEE Trans. Inf. Theory 60(4), 2359–2372 (2014)
G.E. Pfander, H. Rauhut, J. Tanner, Identification of matrices having a sparse representation. IEEE Trans. Signal Process. 56(11), 5376–5388 (2008)
G.E. Pfander, H. Rauhut, J.A. Tropp, The restricted isometry property for time-frequency structured random matrices. Probab. Theory Relat. Fields 156(3–4), 707–737 (2013)
R.I. Pickholtz, D.L. Schilling, L.B. Milstein, Theory of spread-spectrum communications – a tutorial. IEEE Trans. Commun. 30, 855–884 (1982)
R. Price, Statistical theory applied to communication through multipath disturbances. RLE Tech. Rpt. No. 266, Lincoln Laboratory Tech. Rpt. No. 34 (1953)
R. Price, Optimum detection of random signals in noise, with applications to scatter-multipath communication, I. IRE Trans. Inf. Theory IT-2, 125–135 (1956)
R. Price, Detectors for Radar astronomy, in Radar Astronomy, ed. by J.V. Evans, T. Hagfors (McGraw-Hill, New York, 1968), pp. 547–614
R. Price, Further notes and anecdotes on spread-spectrum origins. IEEE Trans. Commun. 31, 85–97 (1982)
R. Price, P. Green, A communication technique for multipath channels. Proc. IRE 46, 555–570 (1958)
R.A. Scholtz, The origins of spread-spectrum communications., IEEE Trans. Commun. 30, 822–854 (1982)
R.A. Scholtz, Notes on spread-spectrum history. IEEE Trans. Commun. 31, 82–84 (1983)
K. Seip, Density theorems for sampling and interpolation in the Bargmann-Fock space. I. J. Reine Angew. Math. 429, 91–106 (1992)
K. Seip, R. Wallstén, Density theorems for sampling and interpolation in the Bargmann-Fock space. II. J. Reine Angew. Math. 429, 107–113 (1992)
C. Shannon, Communication in the presence of noise. Proc. IRE 37(1), 10–21 (1949)
P. Stevenhagen, H.W. Lenstra Jr., Chebotarëv and his density theorem. Math. Intell. 18(2), 26–37 (1996)
G. Turin, Introduction to spread-spectrum antimultipath techniques and their application to urban digital radio. Proc. IEEE 68, 328–353 (1980)
W. Ward, The NOMAC and rake systems. Lincoln Lab. J. 5, 351–366 (1992)
N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary Time Series, with Engineering Applications (M.I.T. Press, Cambridge, 1949)
L.A. Zadeh, Frequency analysis of variable networks. Proc. IRE 67, 291–299 (1950)
P. Zheltov, G.E. Pfander, Estimation of overspread scattering functions. IEEE Trans. Signal Process. 63(10), 2451–2463 (2015)
P. Zheltov, G.E. Pfander, O. Oktay, Reconstruction of the scattering function of overspread radar targets. IET Signal Process. 8(9), 1018–1024 (2014)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Appendix: Proofs of Theorems
Appendix: Proofs of Theorems
Proof of Lemma 2
In order to see how the time-frequency shifts of c arise, we will briefly outline the calculation that leads to (16). It can be seen by direct calculation using the representation given by (7), that if \(g =\sum _{n}\delta _{nTP}\) then 〈Hg, s〉 = 〈η H , Z TP s〉 for all \(s \in \mathcal{S}(\mathbb{R})\) where the bracket on the left is the L 2 inner product on ℝ and that on the right the L 2 inner product on the rectangle [0, TP]×[0, 1∕(TP)]. Periodizing the integral on the left gives
Since this holds for every \(s \in \mathcal{S}(\mathbb{R})\), we conclude that
Given \(g =\sum _{n\in \mathbb{Z}}c_{n}\,\delta _{nT}\), for a period-P sequence c = (c n ), and letting n = mP − q for m ∈ ℤ and 0 ≤ q < P, we obtain
Since for α ∈ ℝ, the spreading function of \(H \circ \mathcal{T}_{\alpha }\) is η H (t −α, ν) e 2π i ν α, we arrive at
Letting m = jP + ℓ for j ∈ ℤ and 0 ≤ ℓ < P, we obtain
Finally, replacing t by t + pT for \(p = 0,\,1,\,\ldots,\,P-1\), and changing indices by replacing q by q − p, we obtain
The observation that \((\mathcal{T}^{q}\,\mathcal{M}^{m}c)_{p} = c_{p-q}\,e^{2\pi im(p-q)/P}\) completes the proof.
Proof of Theorem 3
To see why this is true, define μ(M) to be the number of diagonals of M whose product is a multiple of p M , and proceed by induction on the size of the matrix M. If M is 1 × 1, then the result is obvious. Suppose that M is n × n and that it is described by the vector \(\ell= (\ell_{0},\,\ldots,\,\ell_{P-1})\). Assuming without loss of generality that the variable of smallest index in p M with a nonzero exponent is c 0, there is a row of M in which the variable c 0 appears ℓ j times for some index j. Choose one of these terms and delete the row and column in which it appears. Call the remaining matrix M′. The vector ℓ describing M′ is \((\ell_{0},\,\ldots,\,\ell_{j-1},\,\ell_{j} - 1,\,\ell_{j+1},\,\ldots,\,\ell_{P-1})\), and is independent of which term was chosen from the given row to form M′. By the construction of the LI monomial, p M = c 0 p M′ and by the induction hypothesis
Since there are ℓ j ways to choose a term from the given row to produce M′ we have that
which was to be proved.
Since each term \(a_{\sigma }\,C^{\sigma }\) in (24) is made up of a sum of precisely this many terms, it follows that exactly one of these terms is a multiple of the LI monomial. Alternatively, we can think of the LI monomial as the one corresponding to the \(\sigma \in S_{P}/\varGamma\) that minimizes the functional \(\varLambda _{0}(C^{\sigma }) =\sum _{ i=0}^{L-1}i^{2}\,H(\alpha _{i})\) where α i is the exponent of c i in \(C^{\sigma }\) and where H(α i ) = 0 if α i = 0 and 1 otherwise.
Because by Chebotarev’s Theorem, \(a_{\sigma }\neq 0\) for all \(\sigma\) the proof works for any square submatrix M, no matter what size. This gives us Theorem 3.
Proof of Theorem 4
We first need to assert the existence of a cyclical renumbering of the variables such that with respect to the new trivial partition A′ = (A κ ′) κ = 0 P−1, the CI monomial is given by
in other words, if j ∈ A κ ′ then 0 ≤ j −κ < P. Note first that since \(\min (A_{\kappa }') =\sum _{ i=0}^{\kappa -1}\ell_{i}'\) for all κ, j ∈ A κ ′ implies that \(j \geq \sum _{i=0}^{\kappa -1}\ell_{i}'\). Therefore, it will suffice to find a 0 ≤ γ < P such that for all κ, \(\sum _{i=0}^{\kappa -1}\ell_{i}'-\kappa \geq 0\) so that \(j-\kappa \geq \sum _{i=0}^{\kappa -1}\ell_{i}'-\kappa \geq 0\).
Let 0 ≤ γ < P be such that the quantity \(\sum _{i=0}^{\gamma -1}\ell_{i}-\gamma\) is minimized, let
and let A′ = (A κ ′) κ = 0 P−1 be the corresponding trivial partition. Now fix κ and assume that κ +γ ≤ P. Then
since the second term in the difference is minimal. If κ +γ ≥ P + 1, then remembering that \(\sum _{i=0}^{P-1}\ell_{i} = L\)
In order to complete the proof, we must show that \(\varLambda (C^{\sigma }) \geq \varLambda (C^{I})\) for all \(\sigma \in S_{P}/\varGamma\) with equality holding if and only if \(\sigma\) is trivial. This will follow by direct calculation together with the following lemma which follows from a classical result on rearrangements of series [14, Theorems 368, 369]. This result is Lemma 3.3 in [32].
First, however, we adopt the following notation. For 0 ≤ n < P, let b n = κ if n ∈ A κ . With this notation, given \(\sigma \in S_{P}/\varGamma\),
and under the above assumptions,
Moreover,
Lemma 3.
Given two finite sequences of real numbers (α n ) and (β n ) defined up to rearrangement, the sum
is maximized when α and β are both monotonically increasing or monotonically decreasing. Moreover, if for every rearrangement α′ of α,
then α and β are similarly ordered , that is, for every j, k,
In particular, for every \(\sigma \in S_{P}\) ,
with equality holding if and only if \(\sigma\) is trivial.
Proof.
The first part of the lemma is simply a restatement of Theorems 368 and 369 of [14]. To prove the second part, note first that b n is a non-decreasing sequence and in particular is constant on each A κ . Theorem 368 in [14] states that a sum of the form \(\sum _{n=0}^{P-1}\sigma (n)\,b_{n}\) is maximized when \(\sigma (n)\) is monotonically increasing, which proves the given inequality. Since b n is constant on each A κ , it follows that if \(\sigma\) is trivial, then we have equality.
If \(\sigma\) is not trivial, then we will show that the sequences \(\sigma (n)\) and b n are not similarly ordered. Letting κ be the minimal index such that A κ is not left invariant by \(\sigma\), there exists m ∈ A κ such that \(\sigma (m) \in A_{\mu }\) for some μ > κ, and for some \(\lambda >\kappa\) there exists \(k \in A_{\lambda }\) such that \(\sigma (k) \in A_{\kappa }\). Therefore, \(b_{m} =\kappa <\lambda = b_{k}\) but since μ > κ, \(\sigma (m) >\sigma (k)\), and so \(\sigma (n)\) and b n are not similarly ordered.
In order to complete the proof, define \(\mathcal{C}_{1},\,\mathcal{C}_{2} \subseteq \{ 0,\,\ldots,\,P - 1\}\) by \(n \in \mathcal{C}_{1}\) if \(0 \leq \sigma (n) - b_{n} < P\), and \(n \in \mathcal{C}_{2}\) if \(-P + 1 \geq \sigma (n) - b_{n} < 0\) (note that always \(\vert \sigma (n) - b_{n}\vert < P\)) so that when n ∈ C 2, \((\sigma (n) - b_{n})\ mod\ P =\sigma (n) - b_{n} + P\). Let \(\sigma '(n) =\sigma (n)\) if \(n \in \mathcal{C}_{1}\) and \(\sigma (n) + P\) if \(n \in \mathcal{C}_{2}\), and let (a n ) n = 0 P−1 be an increasing sequence enumerating the set \(\sigma (\mathcal{C}_{1}) \cup (\sigma (\mathcal{C}_{2}) + P)\). Therefore,
Since a n is increasing, I ≥ 0 by Lemma 3, and since a n ≥ n for all n, (a n − b n ) ≥ (n − b n ) ≥ 0 so that (a n − b n )2 ≥ (n − b n )2 and hence II ≥ 0. It remains to show that equality holds only if \(\sigma\) is trivial. If \(\varLambda (C^{\sigma }) =\varLambda (C^{I})\), then I = II = 0. Since II = 0, \(\mathcal{C}_{2} =\emptyset\) for if \(a_{n} \in \sigma (\mathcal{C}_{2}) + P\) then a n > n and we would have II > 0. Since \(\mathcal{C}_{2} =\emptyset\), \(\sigma '(n) =\sigma (n)\) so that
which by Lemma 3 implies that \(\sigma\) is trivial. The proof is complete.
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Walnut, D., Pfander, G.E., Kailath, T. (2015). Cornerstones of Sampling of Operator Theory. In: Balan, R., Begué, M., Benedetto, J., Czaja, W., Okoudjou, K. (eds) Excursions in Harmonic Analysis, Volume 4. Applied and Numerical Harmonic Analysis. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-20188-7_12
Download citation
DOI: https://doi.org/10.1007/978-3-319-20188-7_12
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-319-20187-0
Online ISBN: 978-3-319-20188-7
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)