1 Introduction

Signal processing may broadly be considered as the recovery of information from physical observations. The signals can be observed from different sources, and they are usually corrupted by noise due to electrical, mechanical, atmospheric or intentional interferences. Since the observed signals are random in nature, statistical techniques are needed to extract the original signals. Statistics is used in formulating proper models to describe the behavior of the underlying process, in developing techniques to estimate the unknown model parameters, and in assessing model performance. Statistical signal processing broadly refers to the analysis of different random signals using appropriate statistical procedures.

Professor Rao worked on statistical signal processing in the early 1990s, for a period of six to seven years. He, along with his collaborators, worked on three fundamental problems in this area, namely (a) the one-dimensional superimposed exponential model, (b) the two-dimensional superimposed exponential model and (c) the direction of arrival (DOA) model. These are classical problems, and they have applications in several areas including telecommunications, radio location of objects, seismic signal processing, image processing, computer-assisted medical diagnostics, etc. Although these problems have several applications, they are highly non-linear in nature. The first two problems can be seen as non-linear regression problems; both can be formulated as the estimation of the unknown parameters of a non-linear model in the presence of additive noise. The third problem involves the estimation of the unknown parameters in a non-linear random effects model in the presence of additive noise. It will be seen that finding efficient estimators and deriving the properties of these estimators are quite challenging problems due to the nature of the models.

It may be mentioned that an extensive amount of work has been done on standard non-linear regression models; see, for example, the books by Bates and Watts [8] and Seber and Wild [52] in this respect. Least squares estimators seem to be the natural choice in a non-linear regression problem, but finding the least squares estimators and establishing their properties are quite non-trivial in general. Jennrich [20] and, later on, Wu [55] provided several sufficient conditions for establishing the consistency and asymptotic normality of the least squares estimators of a non-linear regression model. The one-dimensional and two-dimensional superimposed exponential models do not satisfy those sufficient conditions. Therefore, it is not immediate whether the least squares estimators will be consistent in these cases. Professor Rao and his collaborators established the consistency and asymptotic normality properties in these cases quite differently from the methods developed by Jennrich [20] and Wu [55] for general non-linear regression models.

Although the least squares estimators are the most efficient estimators, finding the least squares estimators in the case of superimposed exponential models is well known to be a numerically difficult problem. The standard Newton–Raphson or Gauss–Newton algorithms do not work well, as the problems are highly non-linear in nature. The least squares surface has several local minima; hence, most of the standard iterative procedures converge to a local minimum rather than the global minimum. For this reason, an extensive amount of work has been done to find efficient estimators which behave like the least squares estimators. Professor Rao and his collaborators established an efficient algorithm which converges in a finite number of steps and produces estimates which have the same asymptotic convergence rates as the least squares estimators.

The main aim of this paper is two-fold. First, we would like to define the three classical problems from a statistical perspective and provide the solutions developed by Professor Rao and his collaborators. Our second aim is to show how his work has influenced subsequent research in this area. There are several interesting open problems in statistical signal processing for which sophisticated statistical techniques are needed to provide efficient solutions. We will provide several open problems and relevant references for the statistical community, who may not be very familiar with this area.

The rest of the paper is organized as follows. We need some preliminaries for basic understanding of these problems, and that will be provided in Section 2. In Section 3, we describe the one-dimensional superimposed exponential models and provide different estimation procedures as provided by Professor Rao and his collaborators. Two-dimensional superimposed exponential model will be considered in Section 4 and DOA model will be described in Section 5. In Section 6, we provide several related problems which have been considered in the statistical signal processing literature in the last two to three decades and we conclude the paper.

2 Preliminaries

We provide one important result which has been used quite extensively in the statistical signal processing literature and is well known as Prony’s equation. Prony proposed the method in 1795, mainly to estimate the unknown parameters of a sum of real exponentials. It is available in several numerical analysis textbooks, for example, in [15] or [18]. It can be described as follows. Suppose

$$\begin{aligned} \mu (t) = \alpha _1 \mathrm{e}^{\beta _1 t} + \cdots + \alpha _M \mathrm{e}^{\beta _M t}, \quad t = 1, \ldots , n. \end{aligned}$$

Here \(\alpha _1, \ldots , \alpha _M\) are arbitrary real numbers and \(\beta _1, \ldots , \beta _M\) are distinct real numbers. Then, for given \(\{\mu (1), \ldots , \mu (n)\}\), there exist \(M+1\) constants \(\{g_0, \ldots , g_M\}\), such that

$$\begin{aligned} {\mathbf{{A}}} {\mathbf{{g}}} = {\mathbf{{0}}}, \end{aligned}$$
(1)

where

$$\begin{aligned} {\mathbf{{A}}} = \left[ \begin{array}{ccc} \mu (1) &{} \cdots &{} \mu (M+1) \\ \vdots &{} \ddots &{} \vdots \\ \mu (n-M) &{} \cdots &{} \mu (n) \end{array} \right] , \ \ \ {\mathbf{{g}}} = \left[ \begin{array}{c} g_0 \\ \vdots \\ g_M \end{array} \right] \ \ \ \hbox {and} \ \ \ {\mathbf{{0}}} = \left[ \begin{array}{c} 0 \\ \vdots \\ 0 \end{array} \right] . \end{aligned}$$

It can be shown that the rank of the matrix \({\mathbf{{A}}}\) is M. Hence, the null space of the matrix \({\mathbf{{A}}}\) is of dimension one. Note that to make \({\mathbf{{g}}}\) unique, we can put restrictions on \(g_0, \ldots , g_M\) such that \(\sum _{j=0}^M g_j^2 = 1\) and \(g_0 > 0\). The set of linear equations (1) is known as Prony’s equations. It can be shown that the roots of the following polynomial equation

$$\begin{aligned} p(x) = g_0 + g_1 x + \cdots + g_M x^M = 0 \end{aligned}$$
(2)

are \( \mathrm{e}^{\beta _1}, \ldots , \mathrm{e}^{\beta _M}\). Therefore, there is a one-to-one correspondence between \(\{\beta _1, \ldots , \beta _M\}\) and \(\{g_0, \ldots , g_M\}\), such that \(\sum _{j=0}^M g_j^2 = 1\) and \(g_0 > 0\). Moreover, \(\{g_0, \ldots , g_M\}\) do not depend on \(\{\alpha _1, \ldots , \alpha _M\}\), but depend only on \(\{\beta _1, \ldots , \beta _M\}\).

One natural question is how to recover \(\alpha _1, \ldots , \alpha _M\) and \(\beta _1, \ldots , \beta _M\) for a given \(\mu (1), \ldots , \mu (n)\). Note that

$$\begin{aligned} {\varvec{\upmu }} = {\mathbf{{X}}} {\varvec{\upalpha }}, \end{aligned}$$

where

$$\begin{aligned} {\varvec{\upmu }} = \left[ \begin{array}{c} \mu (1) \\ \vdots \\ \mu (n) \end{array} \right] , \ \ \ {\mathbf{{X}}} = \left[ \begin{array}{ccc} \mathrm{e}^{\beta _1} &{} \ldots &{} \mathrm{e}^{\beta _M} \\ \vdots &{} \ddots &{} \vdots \\ \mathrm{e}^{n \beta _1} &{} \ldots &{} \mathrm{e}^{n \beta _M} \\ \end{array} \right] \ \ \ \ \hbox {and} \ \ \ \ {\varvec{\upalpha }} = \left[ \begin{array}{c} \alpha _1 \\ \vdots \\ \alpha _M \end{array} \right] . \end{aligned}$$

It is immediate that for \(\beta _1 \ne \ldots \ne \beta _M\), the rank of the matrix \({\mathbf{{X}}}\) is M. Hence, for \(n > M\), the matrix \({\mathbf{{X}}}^{\top } {\mathbf{{X}}}\) is of full rank, and

$$\begin{aligned} {\varvec{\upalpha }} = ({\mathbf{{X}}}^{\top } {\mathbf{{X}}})^{-1} {\mathbf{{X}}}^{\top } {\varvec{\upmu }} . \end{aligned}$$
(3)

Therefore, for given \(\mu (1), \ldots , \mu (n)\), one first obtains the vector \({\mathbf{{g}}}\) which satisfies (1), then obtains \(\beta _1, \ldots , \beta _M\) from the roots of the polynomial equation p(x) as given in (2), and finally obtains \(\alpha _1, \ldots , \alpha _M\) from (3).
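As an illustration, the following is a minimal numerical sketch of this three-step recovery in Python/numpy; the function name, the use of the SVD to obtain the null vector of \({\mathbf{{A}}}\), and the toy signal at the end are ours.

```python
import numpy as np

def prony_recover(mu, M):
    """Recover alpha's and beta's from noiseless samples mu(1), ..., mu(n)."""
    n = len(mu)
    # Step 1: build the (n-M) x (M+1) matrix A of Prony's equations (1);
    # row i is (mu(i+1), ..., mu(i+M+1))
    A = np.column_stack([mu[j:n - M + j] for j in range(M + 1)])
    # g spans the one-dimensional null space of A: take the right singular
    # vector corresponding to the smallest singular value
    _, _, Vh = np.linalg.svd(A)
    g = Vh[-1].conj()
    # Step 2: the roots of p(x) = g_0 + g_1 x + ... + g_M x^M are exp(beta_j)
    roots = np.roots(g[::-1])              # np.roots expects highest degree first
    beta = np.log(roots.astype(complex))
    # Step 3: recover the linear parameters alpha from (3) by least squares
    t = np.arange(1, n + 1)
    X = np.exp(np.outer(t, beta))
    alpha, *_ = np.linalg.lstsq(X, mu, rcond=None)
    return alpha, beta

# toy check: mu(t) = 2 exp(-0.1 t) + exp(-0.5 t), M = 2
t = np.arange(1, 41)
print(prony_recover(2.0 * np.exp(-0.1 * t) + np.exp(-0.5 * t), M=2))
```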

Note that although we have described Prony’s equations when the \(\alpha \)’s and \(\beta \)’s are real valued, they remain valid when the \(\alpha \)’s and \(\beta \)’s are complex-valued as well. For this reason, Prony’s equations play an important role in the statistical signal processing literature. Several algorithms have been developed using Prony’s equations. They have also been extended to the two-dimensional superimposed exponential model; see, for example, [7].

Since, as mentioned above, Prony’s equations are valid even when the \(\alpha \)’s and \(\beta \)’s are complex-valued, let us consider the following special case, which is known as the undamped exponential model:

$$\begin{aligned} \mu (t) = A_1 \mathrm{e}^{i \omega _1 t} + \cdots + A_M \mathrm{e}^{i \omega _M t}, \quad t = 1, \ldots , n. \end{aligned}$$
(4)

Here, \(A_1, \ldots , A_M\) are arbitrary complex numbers, \(0< \omega _1 \ne \ldots \ne \omega _M < 2 \pi \), and \(i = \sqrt{-1}\). In this case also, there exist complex-valued \(\{g_0, \ldots , g_M\}\) which satisfy (1). Moreover, the roots of the polynomial equation p(z), as in (2), are \(z_1 = \mathrm{e}^{i \omega _1}, \ldots , z_M = \mathrm{e}^{i \omega _M}\). Observe that

$$\begin{aligned} |z_1| = \cdots = |z_M| = 1, \quad \bar{z}_1 = z_1^{-1}, \ldots , \bar{z}_M = z_M^{-1}. \end{aligned}$$
(5)

Here, \(\bar{z}_k\) denotes the complex conjugate of \(z_k\), for \(k = 1, \ldots , M\). Define the new polynomial

$$\begin{aligned} Q(z) = z^{M}\, \overline{p(1/\bar{z})} = \bar{g}_M + \bar{g}_{M-1} z + \cdots + \bar{g}_0 z^{M}. \end{aligned}$$

From (5), it is clear that p(z) and Q(z) have the same roots. Hence, by comparing the coefficients of the two polynomials p(z) and Q(z), we obtain

$$\begin{aligned} \frac{g_k}{g_M} = \frac{\bar{g}_{M-k}}{\bar{g}_0}, \quad k = 0, \ldots , M. \end{aligned}$$
(6)

Therefore, if we denote

$$\begin{aligned} b_k = g_k \left( \frac{\bar{g}_0}{g_M} \right) ^{\frac{1}{2}}, \quad k = 0, \ldots , M, \end{aligned}$$

then

$$\begin{aligned} b_k = \bar{b}_{M-k}, \quad k = 0, \ldots , M. \end{aligned}$$

Hence, if we denote

$$\begin{aligned} {\mathbf{{b}}} = \left[ \begin{array}{c} b_0 \\ \vdots \\ b_M \end{array} \right] , \quad \bar{\mathbf{{b}}} = \left[ \begin{array}{c} \bar{b}_0 \\ \vdots \\ \bar{b}_M \end{array} \right] , \quad {\mathbf{{J}}} = \left[ \begin{array}{ccccc} 0 &{} 0 &{} \cdots &{} 0 &{} 1 \\ 0 &{} 0 &{} \cdots &{} 1 &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} \vdots &{} \vdots \\ 0 &{} 1 &{} \cdots &{} 0 &{} 0 \\ 1 &{} 0 &{} \cdots &{} 0 &{} 0 \end{array} \right] , \end{aligned}$$

then \(\displaystyle {\mathbf{{b}}} = {\mathbf{{J}}} \bar{\mathbf{{b}}}\). Therefore, it is immediate that for a given \(\mu (1), \ldots , \mu (n)\), as in (4), there exists a vector \({\mathbf{{g}}} = (g_0, \ldots , g_M)^{\top }\), such that \( \sum _{k=0}^M |g_k|^2 = 1\), which satisfies (1) and also \({\mathbf{{g}}} = {\mathbf{{J}}} \bar{\mathbf{{g}}}\). For more details, interested readers are referred to [36].

3 One-dimensional superimposed exponential model

We observe periodic phenomena every day in our lives. For example, the number of tourists visiting the famous Taj Mahal, the daily temperature of the capital, New Delhi, the ECG pattern of a normal human being or musical sounds clearly follow a periodic pattern. Sometimes it may not be exactly periodic, but nearly periodic. Moreover, the periodic signal need not be in one dimension only; it can be in two or three dimensions also. One major problem in statistical signal processing is to analyze periodic or nearly periodic data when they are observed with noise. Some natural questions are: what is meant by a periodic signal, and why do we want to analyze periodic data? We will not provide a formal definition of periodic data, but informally speaking, periodic data indicate a repeated pattern in one dimension and a symmetric pattern in two or three dimensions. See, for example, the following figures, which indicate a periodic nature. Figure 1 shows the ECG plot of a normal human being and Figure 2 shows the ’UUU’ vowel sound of a male.

Fig. 1. ECG plot of a normal human being.

Fig. 2. UUU-sound of a male.

Now the second question is: why would somebody want to analyze periodic data? One reason might be purely theoretical in nature; otherwise, providing a proper periodic model can be very useful for compression or prediction purposes.

Note that the simplest periodic function is the sinusoidal function, and it can be written in the following form:

$$\begin{aligned} y(t) = A \cos (\omega t) + B \sin (\omega t). \end{aligned}$$

Clearly, the period of the function y(t) is the shortest time taken for y(t) to repeat itself, and it is \(2 \pi /\omega \). In general, a smooth periodic function (mean adjusted) with period \(2\pi /\omega \) can be written in the form

$$\begin{aligned} y(t) = \sum _{k=1}^{\infty } \left[ A_k \cos (k \omega t) + B_k \sin (k \omega t) \right] , \end{aligned}$$
(7)

and it is well known as the Fourier expansion of y(t). From y(t), \(A_k\) and \(B_k\) can be obtained uniquely for \(k \ge 1\) as

$$\begin{aligned} \int _0^{2\pi /\omega } \cos (k \omega t) y(t) \mathrm{d}t = \frac{\pi A_k}{\omega } \quad \hbox {and} \quad \int _0^{2\pi /\omega } \sin (k \omega t) y(t) \mathrm{d}t = \frac{\pi B_k}{\omega }. \end{aligned}$$

Since, in practice, y(t) is always corrupted with noise, it is reasonable to assume that we have the following observation:

$$\begin{aligned} y(t) = \sum _{k=1}^{\infty } \left[ A_k \cos (k \omega t) + B_k \sin (k \omega t) \right] + e(t), \end{aligned}$$

where e(t) is the noise component. It is impossible to estimate an infinite number of parameters. Hence, the model is approximated by the following model:

$$\begin{aligned} y(t) = \sum _{k=1}^p \left[ A_k \cos (\omega _k t) + B_k \sin (\omega _k t) \right] + e(t), \end{aligned}$$
(8)

for some \(p < \infty \). The aim is to extract (estimate) the deterministic component \(\mu (t)\), where

$$\begin{aligned} \mu (t) = \sum _{k=1}^p \left[ A_k \cos (\omega _k t) + B_k \sin (\omega _k t) \right] , \end{aligned}$$

in the presence of the random error component e(t), based on the available data \(y(t), t = 1, \ldots , n\). Hence, the problem becomes the estimation of p, \(A_k, B_k\) and \(\omega _k\), for \(k = 1, \ldots , p\).

It should be mentioned that often, instead of working with the model (8), it is more convenient to work with the associated complex-valued model. With a slight abuse of notation, we write the corresponding complex-valued model as

$$\begin{aligned} y(t) = \mu (t) + e(t) = \sum _{k=1}^p \alpha _k \mathrm{e}^{i \omega _k t} + e(t). \end{aligned}$$
(9)

In the model (9), the y(t)’s, \(\alpha _k\)’s and e(t)’s are all complex valued and \(i = \sqrt{-1}\). The model (9) can be obtained by taking the Hilbert transformation of (8); therefore, the two models are equivalent. Although any observed signal is always real-valued, by taking the Hilbert transformation of the signal, the corresponding complex-valued model can be used. Any analytical result or numerical procedure for model (9) can be used for model (8) and vice versa. Although Rao and his collaborators have mainly dealt with model (9), all the results are valid for model (8) also, as we have just mentioned. For this reason, in this paper we provide the results either for model (8) or for model (9), and it should be clear from the context.
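As a small illustration of this passage from the real model to the complex one, the analytic (complex-valued) signal of a real record can be formed through the discrete Hilbert transform; the following sketch uses scipy, and the toy signal is ours.

```python
import numpy as np
from scipy.signal import hilbert

# toy real signal of the form (8) with a single component
t = np.arange(1, 201)
y_real = 2.0 * np.cos(0.5 * t) + 1.0 * np.sin(0.5 * t)

# analytic signal y_real + i * H(y_real); its single exponential
# component sits at the same frequency 0.5 as in model (9)
y_complex = hilbert(y_real)
```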

Now consider the model (8); the problem is to estimate the unknown parameters based on a sample \(\{y(t); t = 1, \ldots , n\}\). It is necessary to make certain assumptions on the random error component e(t). At this moment, we simply assume that e(t) has mean zero and finite variance; more explicit assumptions will be made later when needed. The most popular estimation procedure is based on the periodogram (see, for example, [53]). The periodogram at a particular frequency is defined as

$$\begin{aligned} I(\theta ) = \left| \frac{1}{n} \sum _{t=1}^n y(t) \cos (\theta t) \right| ^2 + \left| \frac{1}{n} \sum _{t=1}^n y(t) \sin (\theta t) \right| ^2. \end{aligned}$$
(10)

The main reason to use the periodogram function to estimate the frequencies and the number of components p is that \(I(\theta )\) has local maxima at the true frequencies if there is no noise in the data. Therefore, if the noise variance is not high, it is expected that \(I(\theta )\) can be used quite effectively to estimate the frequencies and p. This may be observed from the following example. Let us consider the following signal:

$$\begin{aligned} y(t)= & {} 3.0 \cos (0.2 \pi t) + 3.0 \sin (0.2 \pi t) + 3.0 \cos (0.5 \pi t) \nonumber \\&+ 3.0 \sin (0.5 \pi t) + e(t). \end{aligned}$$
(11)

Here the e(t)’s are assumed to be independent and identically distributed (i.i.d.) normal random variables with mean 0 and variance 2. The plot of the periodogram function \(I(\theta )\) for the model (11) is provided in Figure 3. From Figure 3, it is clear that p = 2 and the local maxima are close to the true frequencies. But the same may not always be true. For example, let us consider the following signal:

$$\begin{aligned} y(t)= & {} 3.0 \cos (0.2 \pi t) + 3.0 \sin (0.2 \pi t) + 0.25 \cos (0.5 \pi t) \nonumber \\&+ 0.25 \sin (0.5 \pi t) + e(t). \end{aligned}$$
(12)

Here the e(t)’s are i.i.d. normal random variables with mean 0 and variance 5.0. The periodogram plot of y(t) obtained from the model (12) is provided in Figure 4. From Figure 4, neither the value p = 2 nor the true locations of the frequencies are clear.
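The following is a minimal sketch of the periodogram (10) evaluated on a grid for data simulated from model (11); the sample size, grid, seed and the crude peak-picking rule are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
t = np.arange(1, n + 1)
# data from model (11): error variance 2
y = (3.0 * np.cos(0.2 * np.pi * t) + 3.0 * np.sin(0.2 * np.pi * t)
     + 3.0 * np.cos(0.5 * np.pi * t) + 3.0 * np.sin(0.5 * np.pi * t)
     + rng.normal(0.0, np.sqrt(2.0), n))

def periodogram(y, thetas):
    """I(theta) of (10) evaluated at each grid point in `thetas`."""
    t = np.arange(1, len(y) + 1)
    c = np.array([np.mean(y * np.cos(th * t)) for th in thetas])
    s = np.array([np.mean(y * np.sin(th * t)) for th in thetas])
    return c ** 2 + s ** 2

thetas = np.linspace(0.0, np.pi, 2000)
I = periodogram(y, thetas)
# crude peak picking: the two largest local maxima of I on the grid
loc = np.where((I[1:-1] > I[:-2]) & (I[1:-1] > I[2:]))[0] + 1
peaks = loc[np.argsort(I[loc])[-2:]]
print(np.sort(thetas[peaks]))   # expected to be close to 0.2*pi and 0.5*pi
```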

Fig. 3. Periodogram plot of y(t) obtained from Model (11).

Fig. 4. Periodogram plot of y(t) obtained from Model (12).

Therefore, the least squares estimators seem to be a natural choice. In the case of model (9), for a given p, the least squares estimators can be obtained as the argument minimum of the following residual sum of squares:

$$\begin{aligned} Q({\varvec{\upalpha }} , {\varvec{ \upomega }}) = \sum _{t=1}^n \left| y(t) - \sum _{k=1}^p \alpha _k \mathrm{e}^{i \omega _k t} \right| ^2. \end{aligned}$$
(13)

Here \({\varvec{\upalpha }} = (\alpha _1, \ldots , \alpha _p)^{\top }\) and \({\varvec{ \upomega }} = (\omega _1, \ldots , \omega _p)^{\top }\). Note that the model (9) is a complex-valued non-linear regression model. There exists an extensive literature on non-linear regression models (interested readers may refer to the book by Seber and Wild [52]). When the e(t)’s are i.i.d. random variables with mean zero and finite variance, Jennrich [20] and, later on, Wu [55] provided several sufficient conditions for the least squares estimators to be consistent and asymptotically normally distributed. Some of these results have been generalized to complex-valued models by Kundu [26]. It can be easily shown (see, for example, [33]) that the model (9) does not satisfy the sufficient conditions of [20, 55] or [26]. Hence, the consistency of the least squares estimators under the assumption of i.i.d. errors is not guaranteed.

Rao and Zhao [48] provided the consistency and asymptotic normality properties of the maximum likelihood estimators of \({\varvec{\upalpha }}\), \({\varvec{ \upomega }}\) and \(\sigma ^2\), under the following assumptions on e(t), for known p. Let us denote \(\hat{\varvec{\upalpha }}\) and \(\hat{\varvec{ \upomega }}\) as the least squares estimators of \({\varvec{\upalpha }}\) and \({\varvec{ \upomega }}\), respectively, and

$$\begin{aligned} \widehat{\sigma ^2} = \frac{1}{n} Q(\hat{\varvec{\upalpha }}, \hat{\varvec{ \upomega }}). \end{aligned}$$

We need the following assumptions for further development.

Assumption 3.1

\(\{e(t),\ t = 1, 2, \ldots \}\) are i.i.d. complex-valued random variables. If Re(e(t)) and Im(e(t)) denote the real and imaginary parts of e(t), then it is assumed that Re(e(t)) and Im(e(t)) are independent normal random variables with mean zero and variance \(\sigma ^2/2\).

Assumption 3.2

\(\alpha _k \ne 0, 0< \omega _k < 2 \pi \), and \(\omega _j \ne \omega _k\), for \(j \ne k\), \(k = 1, \ldots , p\).

Theorem 1

Let \({\mathbf{{D}}}\) be the diagonal matrix \({\mathbf{{D}}}\) = diag{\(\alpha _1, \ldots , \alpha _p\)}, and denote \({\mathbf{{A}}}_1\) = Im \({\mathbf{{D}}}\), \({\mathbf{{A}}}_2\) = Re \({\mathbf{{D}}}\) and \({\mathbf{{B}}} = ({\mathbf{{D}}}^H{\mathbf{{D}}})^{-1}\). Here ‘H’ denotes the complex conjugate transpose of a matrix or of a vector. If the e(t)’s satisfy Assumption 3.1 and the model parameters satisfy Assumption 3.2, then

  1. (i)

    \(\hat{\varvec{\upomega }}\), \(\hat{\varvec{\upalpha }}\) and \(\widehat{\sigma ^2}\) are strongly consistent estimators of \({\varvec{ \upomega }}\), \({\varvec{ \upalpha }}\) and \(\sigma ^2\), respectively.

  2. (ii)

    The limiting distribution of

    $$\begin{aligned} \big ( n^{3/2}(\hat{\varvec{ \upomega }}-{\varvec{ \upomega }}),\ n^{1/2}\, \mathrm{Re} (\hat{\varvec{ \upalpha }} - {\varvec{ \upalpha }}),\ n^{1/2}\, \mathrm{Im} (\hat{\varvec{ \upalpha }} - {\varvec{ \upalpha }}),\ n^{1/2}\big (\widehat{\sigma ^2} - \sigma ^2\big ) \big )^{\top } \end{aligned}$$

    is a \((3p+1)\)-variate normal distribution with mean vector zero, and covariance matrix \(\sigma ^2 {\mathbf{{\Sigma }}}\), where

    $$\begin{aligned} {\mathbf{{\Sigma }}} = \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} 6{\mathbf{{B}}} &{} 3 {\mathbf{{A}}}_1 {\mathbf{{B}}} &{} - 3 {\mathbf{{A}}}_2 {\mathbf{{B}}} &{} {\mathbf{{0}}} \\ 3 {\mathbf{{A}}}_1 {\mathbf{{B}}} &{} \frac{1}{2} {\mathbf{{I}}} + \frac{3}{2} {\mathbf{{A}}}_1^2 {\mathbf{{B}}} &{} -\frac{3}{2} {\mathbf{{A}}}_2 {\mathbf{{A}}}_1 {\mathbf{{B}}} &{} {\mathbf{{0}}} \\ - 3 {\mathbf{{A}}}_2 {\mathbf{{B}}} &{} -\frac{3}{2} {\mathbf{{A}}}_2 {\mathbf{{A}}}_1 {\mathbf{{B}}} &{} \frac{1}{2} {\mathbf{{I}}} + \frac{3}{2} {\mathbf{{A}}}_2^2 {\mathbf{{B}}} &{} {\mathbf{{0}}} \\ {\mathbf{{0}}} &{} {\mathbf{{0}}} &{} {\mathbf{{0}}} &{} \sigma ^2 \\ \end{array} \right] . \end{aligned}$$

Note that Theorem 1 provides the strong consistency and asymptotic normality of the least squares estimators under the assumption that the errors are normally distributed, even though the model does not satisfy some of the standard sufficient conditions which are available for a general non-linear regression model. The results can be used for constructing asymptotic confidence intervals of the unknown parameters or for testing hypotheses. The most important point of Theorem 1 is that the rate of convergence of the linear parameters is of the order \(O_p(n^{-1/2})\), which is quite common, whereas for the frequencies it is \(O_p(n^{-3/2})\), which is quite uncommon for a general non-linear regression model. It implies that the least squares estimators of the frequencies are more efficient than the least squares estimators of the linear parameters. The results of Rao and Zhao [48] were extended by Kundu and Mitra [34] to the case when the errors are i.i.d. random variables with mean zero and finite variance. They have been further extended to the case of stationary errors (for details, see, for example, the monograph by Kundu and Nandi [36]).
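For instance, the frequency block of Theorem 1 gives the asymptotic variance \(6 \sigma ^2/(n^3 |\alpha _k|^2)\) for \(\hat{\omega }_k\), so a rough asymptotic confidence interval can be formed as in the following sketch (the plug-in values are purely illustrative):

```python
import numpy as np

# illustrative plug-in values in place of actual estimates
n, sigma2_hat, alpha_hat_k, omega_hat_k = 500, 1.2, 2.0 + 1.0j, 0.75

# Theorem 1: n^{3/2}(omega_hat_k - omega_k) is asymptotically normal with
# variance 6*sigma^2/|alpha_k|^2, so Var(omega_hat_k) ~ 6*sigma^2/(n^3 |alpha_k|^2)
se = np.sqrt(6.0 * sigma2_hat / (n ** 3 * np.abs(alpha_hat_k) ** 2))
print((omega_hat_k - 1.96 * se, omega_hat_k + 1.96 * se))   # approximate 95% CI
```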

Now we will discuss how to compute the least squares estimators of the unknown parameters of the model (9). The model (9) can be written as follows:

$$\begin{aligned} {\mathbf{{Y}}} = {\mathbf{{X}}} {\varvec{\upalpha }} + {\mathbf{{e}}}. \end{aligned}$$
(14)

Here

$$\begin{aligned} {\mathbf{{Y}}} = \left[ \begin{array}{c} y(1) \\ \vdots \\ y(n) \end{array} \right] , \quad {\textit{\textbf{X}}} = \left[ \begin{array}{ccc} \mathrm{e}^{i \omega _1} &{} \ldots &{} \mathrm{e}^{i \omega _p} \\ \vdots &{} \ddots &{} \vdots \\ \mathrm{e}^{i n \omega _1} &{} \ldots &{} \mathrm{e}^{i n \omega _p} \end{array} \right] , \quad {\varvec{\upalpha }} = \left[ \begin{array}{c} \alpha _1 \\ \vdots \\ \alpha _p \end{array} \right] , \quad {\mathbf{{e}}} = \left[ \begin{array}{c} e(1) \\ \vdots \\ e(n) \end{array} \right] . \end{aligned}$$

Therefore, \(Q({\varvec{\upalpha }}, {\varvec{ \upomega }} )\) as in (13) can be written as

$$\begin{aligned} Q({\varvec{\upalpha }}, {\varvec{ \upomega }}) = \left( {\mathbf{{Y}}} - {\mathbf{{X}}}({\varvec{ \upomega }}) {\varvec{\upalpha }} \right) ^{H} \left( {\mathbf{{Y}}} - {\mathbf{{X}}}({\varvec{ \upomega }} ) {\varvec{\upalpha }} \right) . \end{aligned}$$
(15)

For a given \({\varvec{ \upomega }} \), the least squares estimators of \({\mathbf{{\alpha }}}\), say \(\hat{\varvec{\upalpha }}({\varvec{ \upomega }} )\), can be obtained as

$$\begin{aligned} \hat{\varvec{\upalpha }}({\varvec{ \upomega }} ) = \big ({\mathbf{{X}}}^{H}({\varvec{ \upomega }} ){\mathbf{{X}}}({\varvec{ \upomega }})\big )^{-1}{\mathbf{{X}}}^{H}({\varvec{ \upomega }} ) {\mathbf{{Y}}}. \end{aligned}$$

Hence,

$$\begin{aligned} Q(\hat{\varvec{\upalpha }}({\varvec{ \upomega }} ), {\varvec{ \upomega }} ) = {\mathbf{{Y}}}^H \big ( {\mathbf{{I}}} - {\mathbf{{P}}}_{{\mathbf{{X}}}({\varvec{ \upomega }} )}\big )^H \big ( {\mathbf{{I}}} - {\mathbf{{P}}}_{{\mathbf{{X}}}({\varvec{ \upomega }} )}\big ) {\mathbf{{Y}}} = {\mathbf{{Y}}}^H \big ( {\mathbf{{I}}} - {\mathbf{{P}}}_{{\mathbf{{X}}}({\varvec{ \upomega }} )}\big ) {\mathbf{{Y}}}. \end{aligned}$$

Here \(\displaystyle {\mathbf{{P}}}_{{\mathbf{{X}}}({\varvec{ \upomega }} )} = {\mathbf{{X}}}({\varvec{ \upomega }} ) ({\mathbf{{X}}}^H({\varvec{ \upomega }} ){\mathbf{{X}}}({\varvec{ \upomega }} ))^{-1}{\mathbf{{X}}}^{H}({\varvec{ \upomega }} )\) is the projection matrix on the column space of \({\mathbf{{X}}}({\varvec{ \upomega }} )\). Hence, the least squares estimator of \({\varvec{ \upomega }} \), say \(\hat{\varvec{ \upomega }} \) can be obtained as

$$\begin{aligned} \hat{\varvec{ \upomega }} = \hbox {argmin}\, {\mathbf{{Y}}}^H \big ({\mathbf{{I}}} - {\mathbf{{P}}}_{{\mathbf{{X}}}({\varvec{ \upomega }} )}\big ) {\mathbf{{Y}}} = \hbox {argmax}\,{\mathbf{{Y}}}^H {\mathbf{{P}}}_{{\mathbf{{X}}}({\varvec{ \upomega }} )} {\mathbf{{Y}}}. \end{aligned}$$
(16)

The problem of finding \(\hat{\varvec{ \upomega }} \) from (16) is a non-linear optimization problem. It has been observed by Bresler and Macovski [10] that the standard Newton–Raphson algorithm or its variants do not work well in this case because the problem is highly non-linear in nature. Using (1), it can be easily seen that there exists an \(n\times (n-p)\) matrix \({\mathbf{{G}}}({\mathbf{{g}}})\), such that \({\mathbf{{G}}}^H({\mathbf{{g}}}) {\varvec{ \upmu }} = {\mathbf{{0}}}\), where

$$\begin{aligned} {\mathbf{{G}}}^H({\mathbf{{g}}}) = \left[ \begin{array}{ccccccc} g_0 &{} g_1 &{} \cdots &{} g_p &{} 0 &{} \cdots &{} 0 \\ 0 &{} g_0 &{} g_1 &{} \cdots &{} g_p &{} \cdots &{} 0 \\ \vdots &{} &{} \ddots &{} &{} \ddots &{} &{} \vdots \\ 0 &{} \cdots &{} 0 &{} g_0 &{} g_1 &{} \cdots &{} g_p \end{array} \right] , \end{aligned}$$

and \({\mathbf{{g}}} = (g_0, \ldots , g_p)^{\top }\). Since, \({\mathbf{{G}}}^H({\mathbf{{g}}}) {\varvec{ \upmu }} = {\mathbf{{0}}}\), it implies \({\mathbf{{G}}}^H({\mathbf{{g}}}) {\mathbf{{X}}}({\varvec{ \upomega }} ) = {\mathbf{{0}}}\). Hence,

$$\begin{aligned} {\mathbf{{Y}}}^H {\mathbf{{P}}}_{{\mathbf{{X}}}({\varvec{ \upomega }} )} {\mathbf{{Y}}} = {\mathbf{{Y}}}^H ({\mathbf{{I}}} - {\mathbf{{P}}}_{{\mathbf{{G}}}({\mathbf{{g}}})}){\mathbf{{Y}}}. \end{aligned}$$

Therefore, if

$$\begin{aligned} \hat{\mathbf{{g}}} = \hbox {argmin}\, {\mathbf{{Y}}}^H {\mathbf{{P}}}_{{\mathbf{{G}}}({\mathbf{{g}}})}{\mathbf{{Y}}}, \end{aligned}$$

and \(\hat{\mathbf{{g}}} = (\hat{g}_0, \ldots , \hat{g}_p)^{\top }\), then \(\hat{\varvec{ \upomega }} \) can be obtained from the roots of the polynomial equation (2) by replacing the \(g_i\)’s by \(\hat{g}_i\), for \(i = 0, 1, \ldots , p\). Kundu [27] converted this problem into a non-linear eigenvalue problem and provided an efficient estimation procedure for the frequencies.
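As an illustration of the reduced criterion (16), the following sketch treats the single-component case p = 1, where \({\mathbf{{Y}}}^H {\mathbf{{P}}}_{{\mathbf{{X}}}(\omega )} {\mathbf{{Y}}} = |{\mathbf{{X}}}^H(\omega ){\mathbf{{Y}}}|^2/n\); the grid search followed by a local refinement is our illustrative choice of optimizer, and the simulated data are our own.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def profile_criterion(omega, y):
    # Y^H P_X(omega) Y for p = 1: X^H X = n, so the criterion is |X^H Y|^2 / n
    t = np.arange(1, len(y) + 1)
    x = np.exp(1j * omega * t)
    return np.abs(np.vdot(x, y)) ** 2 / len(y)

# simulated single-component data from model (9) (illustrative values)
rng = np.random.default_rng(1)
n, omega0, alpha0 = 200, 0.5, 2.0 + 1.0j
t = np.arange(1, n + 1)
y = alpha0 * np.exp(1j * omega0 * t) + (rng.normal(0, 0.5, n) + 1j * rng.normal(0, 0.5, n))

# coarse grid search for (16), then a local refinement around the best grid point
grid = np.linspace(0.01, 2 * np.pi - 0.01, 4000)
w0 = grid[np.argmax([profile_criterion(w, y) for w in grid])]
res = minimize_scalar(lambda w: -profile_criterion(w, y),
                      bounds=(w0 - 0.01, w0 + 0.01), method="bounded")
omega_hat = res.x
# linear parameter from the projection formula: alpha_hat = (X^H X)^{-1} X^H Y
x = np.exp(1j * omega_hat * t)
alpha_hat = np.vdot(x, y) / n
print(omega_hat, alpha_hat)
```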

Since it is difficult to obtain the least squares estimators, or the maximum likelihood estimators when the errors are normally distributed, an extensive amount of work has been done in the signal processing literature to develop sub-optimal algorithms which are non-iterative in nature and which behave like the maximum likelihood estimators. Most of these algorithms use Prony’s equations in some way or another. It is difficult to list, in this limited space, all the algorithms which are available today; among the different algorithms, the most popular ones are the FBLP and MUSIC algorithms. Interested readers are referred to the monograph by Kundu and Nandi [36] in this respect. We describe a non-iterative procedure, proposed by Bai et al. [2] based on Prony’s equations and referred to below as the EVLP method, which is very easy to implement in practice. Moreover, it does not require the assumption of normality on the error term; it only requires the errors to be i.i.d. random variables with mean zero and finite variance. Consider the \((n-p)\times (p+1)\) data matrix \({\mathbf{{E}}}\) and the \((p+1) \times (p+1)\) matrix \({\mathbf{{F}}}\) defined as follows:

$$\begin{aligned} {\mathbf{{E}}} = \left[ \begin{array}{ccc} y(1) &{} \ldots &{} y(p+1) \\ \vdots &{} \vdots &{} \vdots \\ y(n-p) &{} \ldots &{} y(n) \\ \end{array} \right] , \quad {\mathbf{{F}}} = \frac{1}{n-p} {\mathbf{{E}}}^H {\mathbf{{E}}}. \end{aligned}$$

Obtain the eigenvector corresponding to the minimum eigenvalue of \({\mathbf{{F}}}\). It gives an estimator \(\tilde{\mathbf{{g}}} = (\tilde{g}_0, \ldots , \tilde{g}_p)^{\top }\) of \({\mathbf{{g}}}\). Construct a polynomial equation of the form

$$\begin{aligned} \tilde{g}_0 + \tilde{g}_1 z + \cdots + \tilde{g}_p z^p = 0, \end{aligned}$$
(17)

and obtain the roots, which are of the form \(\tilde{\rho }_1 \mathrm{e}^{i \tilde{\omega }_1}, \ldots , \tilde{\rho }_p \mathrm{e}^{i \tilde{\omega }_p}\). Take \(\tilde{\omega }_1, \ldots , \tilde{\omega }_p\) as the estimates of \(\omega _1, \ldots , \omega _p\), respectively. It has been shown by Bai et al. [2] that \(\tilde{\omega }_j\) is a consistent estimate of \(\omega _j\), and \(\tilde{\omega }_j - \omega _j = O_p(n^{-1/2})\), for \(j = 1, \ldots , p\). Using the conjugate symmetry property of the polynomial coefficients, as described in (6), the method has been modified by Kannan and Kundu [23].
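A minimal numpy sketch of this procedure is given below; the toy data, the sample size and the noise level are ours, and the frequencies are read off as the arguments of the roots, which, with \({\mathbf{{E}}}\) built as above, cluster around \(\mathrm{e}^{i\omega _k}\).

```python
import numpy as np

def evlp_frequencies(y, p):
    """Sketch of the non-iterative estimator of Bai et al. [2] described above."""
    n = len(y)
    # (n-p) x (p+1) data matrix E, as in the text, and F = E^H E / (n - p)
    E = np.column_stack([y[j:n - p + j] for j in range(p + 1)])
    F = E.conj().T @ E / (n - p)
    # eigenvector of F corresponding to its smallest eigenvalue estimates g
    _, vecs = np.linalg.eigh(F)
    g = vecs[:, 0]
    # roots of g_0 + g_1 z + ... + g_p z^p; their arguments estimate the frequencies
    roots = np.roots(g[::-1])
    return np.sort(np.mod(np.angle(roots), 2.0 * np.pi))

# toy check on model (9) with p = 2 (illustrative frequencies and amplitudes)
rng = np.random.default_rng(2)
n = 500
t = np.arange(1, n + 1)
y = (2.0 * np.exp(1j * 0.6 * t) + 1.5 * np.exp(1j * 1.8 * t)
     + 0.3 * (rng.normal(size=n) + 1j * rng.normal(size=n)))
print(evlp_frequencies(y, p=2))   # expected to be close to 0.6 and 1.8
```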

Now we describe an efficient estimation procedure developed by Professor Rao and his colleagues (see Bai et al. [5]) which is iterative in nature, but which is guaranteed to stop in a fixed number of iterations. Like the EVLP method described above, it does not need the assumption of normality on the error terms. Moreover, the most salient feature of this algorithm is that it produces frequency estimators which have the same asymptotic efficiency as the least squares estimators. It is quite a unique algorithm in the sense that no other iterative procedure available today guarantees convergence in a fixed number of steps. It does not use the whole data set at each step; it starts with a smaller data set, and then gradually increases to the complete data set. The algorithm can be described as follows:

Let \(\tilde{\omega }_j\) be a consistent estimator of \(\omega _j\), for \(j = 1, \ldots , p\), and compute \(\hat{\omega }_j\) for \(j = 1, \ldots , p\) as follows:

$$\begin{aligned} \hat{\omega }_j = \tilde{\omega }_j + \frac{12}{n^2} \hbox {Im} \left[ \frac{C_n}{D_n} \right] , \end{aligned}$$

where

$$\begin{aligned} C_n = \sum _{t=1}^n y(t) (t-n/2) \mathrm{e}^{-i \tilde{\omega }_j t}, \quad D_n = \sum _{t=1}^n y(t) \mathrm{e}^{-i \tilde{\omega }_j t}. \end{aligned}$$

Then we have the following result whose proof can be found in Bai et al. [5].

Theorem 2

If \(\tilde{\omega }_j - \omega _j = O_p(n^{-1-\delta })\) for \(\delta \in (0, 1/2]\), and for \(j = 1, \ldots , p\), then

  1. (1)

    \(\hat{\omega }_j - \omega _j = O_p(n^{-1-2\delta })\), for \(\delta \le 1/4\).

  2. (2)

    \(n^{3/2}(\hat{\varvec{ \upomega }} - {\varvec{ \upomega }}) \rightarrow N_p({\mathbf{{0}}}, 6 \sigma ^2 ({\mathbf{{D}}}^H{\mathbf{{D}}})^{-1})\) if \(\delta > 1/4\).

Here \({\mathbf{{D}}}\) is the \( p \times p\) diagonal matrix \({\mathbf{{D}}}\) = diag\(\{\alpha _1, \ldots , \alpha _p\}\).

We start with a consistent estimate of \(\omega _j\) and improve it step by step by a recursive algorithm. The m-th stage estimate \(\hat{\omega }_j^{(m)}\) is computed from the \((m-1)\)-th stage estimate \(\hat{\omega }_j^{(m-1)}\) by the formula

$$\begin{aligned} \hat{\omega }_j^{(m)} = \hat{\omega }_j^{(m-1)} + \frac{12}{n_m^2} \hbox {Im} \left[ \frac{C_{n_m}}{D_{n_m}} \right] , \end{aligned}$$
(18)

where \(C_{n_m}\) and \(D_{n_m}\) can be obtained from \(C_n\) and \(D_n\) by replacing n and \(\tilde{\omega }_j\) with \(n_m\) and \(\hat{\omega }_j^{(m-1)}\), respectively. We apply the formula (18) repeatedly choosing \(n_m\) at each stage as follows:

Step 1. With m = 1, choose \(n_1 = n^{0.40}\) and \(\hat{\omega }_j^{(0)} = \tilde{\omega }_j\), the FBLP estimator. Note that

$$\begin{aligned} \tilde{\omega }_j - \omega _j = O_P\big (n^{-1/2}\big ) = O_p\big (n_1^{-1-1/4}\big ). \end{aligned}$$

Then substituting \(n_1 = n^{0.40}\), \(\hat{\omega }_j^{(0)} = \tilde{\omega }_j\) in (18) and applying Theorem 2, we obtain

$$\begin{aligned} \hat{\omega }_j^{(1)} - \omega _j = O_p\big (n_1^{-1-1/2}\big ) = O_p\big (n^{-0.60}\big ). \end{aligned}$$

Step 2. With m = 2, choose \(n_2 = n^{0.48}\) and compute \(\hat{\omega }_j^{(2)}\) from \(\hat{\omega }_j^{(1)}\) using (18). Since \(\hat{\omega }_j^{(1)} - \omega _j = O_p(n^{-0.60}) = O_p(n_2^{-1-1/4})\) and using Theorem 2, we obtain

$$\begin{aligned} \hat{\omega }_j^{(2)} - \omega _j = O_p\big (n_2^{-1-1/2}\big ) = O_p\big (n^{-0.72}\big ). \end{aligned}$$

Choosing \(n_3, \ldots , n_6\) as given below and applying the main theorem in the same way as above, we have

  • Step 3. \(n_3 = n^{0.57}\) provides \(\hat{\omega }_j^{(3)} - \omega _j = O_p\big (n^{-0.87}\big )\).

  • Step 4. \(n_4 = n^{0.70}\) provides \(\hat{\omega }_j^{(4)} - \omega _j = O_p\big (n^{-1.04}\big )\).

  • Step 5. \(n_5 = n^{0.83}\) provides \(\hat{\omega }_j^{(5)} - \omega _j = O_p\big (n^{-1.25}\big )\).

  • Step 6. \(n_6 = n^{0.92}\), provides \(\hat{\omega }_j^{(6)} - \omega _j = O_p\big (n^{-1.58}\big )\).

Finally take \(n_7 = n\), and compute \(\hat{\omega }_j^{(7)}\) from \(\hat{\omega }_j^{(6)}\). Now applying Theorem 2, we have

$$\begin{aligned} n^{3/2}(\hat{\varvec{ \upomega }}^{(7)} - {\varvec{ \upomega }}) \rightarrow N_p({\mathbf{{0}}}, 6 \sigma ^2 ({\mathbf{{D}}}^H{\mathbf{{D}}})^{-1}). \end{aligned}$$
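A compact sketch of this seven-step refinement, applied to a single frequency, is given below; it assumes that y is a one-dimensional complex numpy array and that a consistent starting value (for example, a periodogram or FBLP estimate) is available.

```python
import numpy as np

def refine_frequency(y, omega0, exponents=(0.40, 0.48, 0.57, 0.70, 0.83, 0.92, 1.0)):
    """Fixed-number-of-steps refinement (18) in the spirit of Bai et al. [5]."""
    n = len(y)
    omega = omega0                      # any n^{-1/2}-consistent initial estimate
    for a in exponents:
        m = n if a == 1.0 else int(np.floor(n ** a))   # sub-sample size n_m
        t = np.arange(1, m + 1)
        e = np.exp(-1j * omega * t)
        C = np.sum(y[:m] * (t - m / 2.0) * e)
        D = np.sum(y[:m] * e)
        omega = omega + (12.0 / m ** 2) * np.imag(C / D)
    return omega
```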

So far, all the methods we have discussed are based on the assumptions that the number of exponential components is known and that the data are equispaced, namely observed at \(t = 1, \ldots , n\). Professor Rao and his colleagues (see Bai et al. [6]) provided a method to estimate the number of components and the frequencies simultaneously when some observations are missing, i.e. the data need not be equispaced. Clearly, the method is also applicable when the complete data are available. The method is developed based on the following assumptions on the error terms. The error random variables e(t) are i.i.d. and

$$\begin{aligned} E(e(1)) = 0, \quad E|e(1)|^2 = \sigma ^2, \quad E|e(1)|^4 < \infty . \end{aligned}$$
(19)

It is further assumed that p is also unknown and \(0 \le p \le P < \infty \), where P is known. The method can be described as follows: For a given p, consider the \((n-p)\times (p+1)\) data matrix \({\mathbf{{Z}}}\) as given below:

$$\begin{aligned} {\mathbf{{Z}}} = \left[ \begin{array}{cccc} y(p+1) &{} y(p) &{} \ldots &{} y(1) \\ y(p+2) &{} y(p+1) &{} \ldots &{} y(2) \\ \vdots &{} \vdots &{} \vdots &{} \vdots \\ y(n) &{} y(n-1) &{} \ldots &{} y(n-p) \\ \end{array} \right] = [ {\mathbf{{y}}}_1, {\mathbf{{y}}}_2, \ldots , {\mathbf{{y}}}_{n-p}]^{\top }. \end{aligned}$$

It is assumed that the observations \(\{y(k); k \in \kappa _n\}\) are missing. Here \(\kappa _n \subset \{1, 2, \ldots , n\}\). Let \({\mathbf{{A}}}_p\) be the matrix obtained from the matrix \({\mathbf{{Z}}}\) after removing the rows having missing observations. Let \(r_p\) be the number of rows of \({\mathbf{{A}}}_p\). Let us define

$$\begin{aligned} \hat{\mathbf{{\Gamma }}}^{(p)} = \frac{1}{r_p} {\mathbf{{A}}}_p^H {\mathbf{{A}}}_p = \big (\big (\hat{\gamma }_{mj}^{(p)}\big )\big ) \end{aligned}$$

and

$$\begin{aligned} S_p = \hbox {min}\big \{r_p^{-1} ||{\mathbf{{A}}}_p {\mathbf{{g}}}^{(p)}||^2: ||{\mathbf{{g}}}^{(p)}|| = 1 \big \}, \end{aligned}$$
(20)

for \(p = 0, \ldots , P\), where \({\mathbf{{g}}}^{(p)} = (g_0^{(p)}, g_1^{(p)}, \ldots , g_p^{(p)})^{\top }\). Note that \(S_p\) is the smallest eigenvalue of \(\hat{\mathbf{{\Gamma }}}^{(p)}\). Consider

$$\begin{aligned} R_p = S_p + p C_n, \quad p = 0, 1, \ldots , P, \end{aligned}$$

where \(C_n\) satisfies the following assumptions:

$$\begin{aligned} \lim _{n \rightarrow \infty } C_n = 0 \quad \hbox {and} \quad \lim _{n \rightarrow \infty } \frac{\sqrt{r_p} C_n}{\sqrt{\ln \ln r_P}} = \infty . \end{aligned}$$

Then find \(\hat{p} \le P\), such that \(R_{\hat{p}} = \min _{0 \le p \le P} R_p\); \(\hat{p}\) is an estimate of p. Once \(\hat{p}\) is obtained, \(\hat{\mathbf{{g}}}^{(\hat{p})}\) can be obtained from (20) by replacing p with \(\hat{p}\), and the estimates of \(\omega _1, \ldots , \omega _{\hat{p}}\) can then be obtained from the polynomial equation (17). It has been shown by the authors that, under the error assumptions (19), the above method provides consistent estimates of p as well as of \(\omega _1, \ldots , \omega _p\). If p is known, consistent estimates of the frequencies when some observations are missing can be obtained under slightly weaker conditions, namely

$$\begin{aligned} E(e(1)) = 0, \quad E|e(1)|^2 = \sigma ^2 < \infty \end{aligned}$$

(see for example, [32]).
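The order-selection step of this procedure can be sketched as follows; the function and variable names are ours, and the choice of the penalty sequence \(C_n\), subject to the conditions stated above, is left to the user.

```python
import numpy as np

def estimate_order_missing(y, missing, P, Cn):
    """Pick p_hat minimizing R_p = S_p + p*C_n, with rows containing missing
    observations removed; `missing` is a set of 1-based missing time points."""
    n = len(y)
    R = np.empty(P + 1)
    for p in range(P + 1):
        rows, keep = [], []
        for t in range(1, n - p + 1):
            idx = list(range(t + p, t - 1, -1))      # (t+p, t+p-1, ..., t)
            rows.append([y[i - 1] for i in idx])
            keep.append(not any(i in missing for i in idx))
        A_p = np.array(rows, dtype=complex)[np.array(keep)]
        r_p = A_p.shape[0]
        Gamma = A_p.conj().T @ A_p / r_p
        S_p = np.linalg.eigvalsh(Gamma)[0]           # smallest eigenvalue
        R[p] = S_p + p * Cn
    return int(np.argmin(R))
```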

So far, we have discussed mainly the one-dimensional superimposed exponential model. Now we discuss the contributions of Professor Rao to two-dimensional superimposed exponential signals.

4 Two-dimensional superimposed exponential model

Rao et al. [49] considered the two-dimensional superimposed exponential signal, which has the following form:

$$\begin{aligned} y(s,t) = \sum _{j=1}^p \sum _{k=1}^q \gamma _{jk} \mathrm{e}^{i(s \mu _j + t \nu _k)} + w(s,t), \quad s = 1, \ldots , m, \ t = 1, \ldots , n. \end{aligned}$$
(21)

Here, y(s, t) is the observed signal, \(\mu _j \in (0, 2 \pi )\) and \(\nu _k \in (0, 2 \pi )\) are the unknown frequencies of the signal, the \(\gamma _{jk}\)’s are the unknown complex-valued linear parameters, and w(s, t) is an array of two-dimensional random variables with mean 0. The explicit assumptions on w(s, t) will be given later. The problem is to extract the original signal from the noise-corrupted data.

The problem has a long history. It was first considered by McClellan [38] to analyze two-dimensional symmetric images. Since then, several other authors have also considered this model (see, for example, [7, 19, 24] and the references cited therein). The corresponding real two-dimensional sum of sinusoids model can be written as follows:

$$\begin{aligned} y(s,t)= & {} \sum _{j=1}^p \sum _{k=1}^q \left[ \gamma _{jk} \cos (s \mu _j + t \nu _k) + \delta _{jk} \sin (s \mu _j + t \nu _k) \right] \nonumber \\&+ w(s,t), \quad s = 1, \ldots , m, \ t = 1, \ldots , n. \end{aligned}$$
(22)

In this case, similar to the one-dimensional model, y(s, t), \(\gamma _{jk}\), \(\delta _{jk}\) and w(s, t) are all real-valued. Some two-dimensional plots based on model (22) are provided in Figures 5 and 6.

Fig. 5. Two-dimensional plot obtained from Model (22).

Fig. 6. Another two-dimensional plot obtained from Model (22).

In this case also, the most popular method of estimating the frequencies is based on the periodogram function. The two-dimensional periodogram function can be written as follows:

$$\begin{aligned} I(\mu , \nu ) = \left| \frac{1}{mn} \sum _{s=1}^m \sum _{t=1}^n y(s,t) \mathrm{e}^{-i(s \mu + t \nu )} \right| ^2, \end{aligned}$$
(23)

and the estimates of the frequencies can be obtained from the local maxima of \(I(\mu , \nu )\). Similar to the one-dimensional model, it can be easily shown that when there is no noise, the two-dimensional periodogram function has local maxima at the true frequencies. Therefore, when there is no noise or the noise variance is small, the true frequencies can be easily estimated, but the same is not true when the error variance is high. For this reason, several methods have been proposed in the literature, and most of them are based on a two-dimensional extension of Prony’s equations.
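A minimal sketch of the two-dimensional periodogram, evaluated on the Fourier grid via the FFT (our computational shortcut), is given below; the toy data correspond to model (21) with p = q = 1 and illustrative parameter values.

```python
import numpy as np

def periodogram_2d(y):
    """Two-dimensional periodogram (23) at the Fourier frequencies
    mu_a = 2*pi*a/m, nu_b = 2*pi*b/n, for an m x n data array y."""
    m, n = y.shape
    return np.abs(np.fft.fft2(y) / (m * n)) ** 2

rng = np.random.default_rng(3)
m, n = 50, 50
s, t = np.meshgrid(np.arange(1, m + 1), np.arange(1, n + 1), indexing="ij")
y = (2.0 * np.exp(1j * (0.6 * s + 1.1 * t))
     + 0.5 * (rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))))

I = periodogram_2d(y)
a, b = np.unravel_index(np.argmax(I), I.shape)
print(2 * np.pi * a / m, 2 * np.pi * b / n)   # Fourier grid point nearest (0.6, 1.1)
```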

In this case also, the most reasonable estimators will be the least squares estimators, and they can be obtained by minimizing

$$\begin{aligned} Q({\varvec{ \upgamma }}, {\varvec{\upmu }} , {\varvec{ \upnu }}) = \frac{1}{mn} \sum _{s=1}^m \sum _{t=1}^n \left| y(s,t) - \sum _{j=1}^p \sum _{k=1}^q \gamma _{jk} \mathrm{e}^{i(s \mu _j + t \nu _k)} \right| ^2, \end{aligned}$$
(24)

with respect to \({\varvec{ \upgamma }}\), \({\varvec{ \upmu }}\) and \({\varvec{ \upnu }}\), where \({\varvec{\upgamma }} = (\gamma _{11}, \ldots , \gamma _{pq})^{\top }\), \({\varvec{\upmu }} = (\mu _1, \ldots , \mu _p)^{\top }\) and \({\varvec{\upnu }} = (\nu _1, \ldots , \nu _q)^{\top }\). If the least squares estimators of \(\gamma _{jk}\), \(\mu _j\) and \(\nu _k\) are denoted by \(\hat{\gamma }_{jk}\), \(\hat{\mu }_j\) and \(\hat{\nu }_k\), respectively, then an estimator of \(\sigma ^2\) becomes

$$\begin{aligned} \hat{\sigma }^2 = \frac{1}{mn} \sum _{s=1}^m \sum _{t=1}^n \left| y(s,t) - \sum _{j=1}^p \sum _{k=1}^q \hat{\gamma }_{jk} \mathrm{e}^{i(s \hat{\mu }_j + t \hat{\nu }_k)} \right| ^2. \end{aligned}$$

The model (21) is also a non-linear regression model, but it can be shown (see for example, [31]) that the model does not satisfy some of the standard sufficient conditions which are available in the literature for the least squares estimators to be consistent and asymptotically normally distributed, similar to the one-dimensional model. Hence, although the least squares estimators seem to be the most reasonable estimators, the consistency cannot be guaranteed. Rao et al. [49] provided the consistency and asymptotic normality results of the least squares estimators of the complex linear parameters and the frequencies under the following assumptions.

Assumption 4.1

\(\{w(s,t)\}\) is an array of complex-valued i.i.d. random variables. Moreover, Re(w(s, t)) and Im(w(s, t)) are normal random variables with mean 0 and variance \(\sigma ^2/2\) each, and they are independently distributed.

Assumption 4.2

The frequencies \(\mu _1, \ldots , \mu _p\) are different from each other and so are \(\nu _1, \ldots , \nu _q\).

Assumption 4.3

\(||{\varvec{\upgamma }}_{j\cdot }||^2 = \sum _{k=1}^q |\gamma _{jk}|^2 > 0\), \(||{\varvec{\upgamma }}_{\cdot k}||^2 = \sum _{j=1}^p |\gamma _{jk}|^2 > 0\), where \( {\varvec{\upgamma }}_{j \cdot } = (\gamma _{j1}, \ldots , \gamma _{jq})^{\top }\), \( {\varvec{\upgamma }}_{\cdot k} = (\gamma _{1k}, \ldots , \gamma _{pk})^{\top }\) and \(|| \cdot ||\) denotes the \(L_2\) norm of a vector or a matrix.

Assumption 4.4

There are positive constants \(c_1\), \(c_2\), \(\alpha _1\) and \(\alpha _2\), such that

$$\begin{aligned} c_1 n^{\alpha _1} \le m \le c_2 n^{\alpha _2}. \end{aligned}$$

We need the following notations:

$$\begin{aligned}&\Delta _j = \hat{\mu }_j - \mu _j, \quad \delta _k = \hat{\nu }_k - \nu _k, \quad {\varvec{\Delta }} = (\Delta _1, \ldots , \Delta _p)^{\top }, \quad {\varvec{ \updelta }} = (\delta _1, \ldots , \delta _q)^{\top }, \\&{\varvec{\Gamma }}_{\varvec{\Delta }} = \hbox {diag}(||{\varvec{\upgamma }}_{1 \cdot }||, \ldots , ||{\varvec{\upgamma }}_{p \cdot }||); \ \ \ p \times p, \ \\&{\varvec{\Gamma }}_{\varvec{ \updelta }} = \hbox {diag}(||{\varvec{\upgamma }}_{\cdot 1}||, \ldots , ||{\varvec{\upgamma }}_{\cdot q}||); \ \ \ q \times q, \\&{\mathbf{{G}}}_{\varvec{\Delta }} = \hbox {diag}({\varvec{\upgamma }}_{1 \cdot }, \ldots , {\varvec{\upgamma }}_{p \cdot })^{\top }; \ \ \ pq \times p, \\&{\mathbf{{G}}}_{\varvec{\updelta }} = (\hbox {diag}(\gamma _{11}, \ldots , \gamma _{1q}), \ldots , \hbox {diag}(\gamma _{p1}, \ldots , \gamma _{pq}))^{\top }; \ \ \ pq \times q. \end{aligned}$$

Now, based on the above notation, we can state the main result without proof.

Theorem 3

If Assumptions 4.1–4.4 are met, then \(\hat{\varvec{\upgamma }}\) and \(\hat{\sigma }^2\) are strongly consistent estimates of \({\varvec{\upgamma }}\) and \(\sigma ^2\), respectively, as \(n \rightarrow \infty \). Moreover, the limiting distribution of

$$\begin{aligned}&\big (m^{3/2}n^{1/2} {\mathbf{{\Delta }}}^{\top },\ m^{1/2} n^{3/2} {\mathbf{{\delta }}}^{\top }, \sqrt{mn} \ { \mathrm Re}(\hat{\varvec{\upgamma }} - {\varvec{\upgamma }})^{\top }, \\&\quad \sqrt{mn} \ \mathrm{Im}\big (\hat{\varvec{\upgamma }} - {\varvec{\upgamma }}\big )^{\top }, \sqrt{mn} \ \big (\hat{\sigma }^2 - \sigma ^2\big )\big )^{\top } \end{aligned}$$

is a \((2 pq + p + q + 1)\)-variate normal distribution with mean vector zero, and the covariance matrix \(\sigma ^2 \varvec{\Sigma }\) = \(\sigma ^2 (\varvec{\Sigma }_{kl})\), \(k,l = 1, \ldots , 5\), where

$$\begin{aligned}&\varvec{\Sigma }_{11} = 6 {\varvec{\Gamma }}_{\varvec{\Delta }}^{-2}, \ \ {\varvec{\Sigma }}_{13} = {\varvec{\Sigma }}_{31}^{\top } = 3 \ \mathrm{Im} \big ({\varvec{\Gamma }}_{\varvec{\Delta }}^{-2}{\mathbf{{G}}}_{\varvec{\Delta }}^{\top }\big ), \ \ {\varvec{\Sigma }}_{14} = {\varvec{\Sigma }}_{41}^{\top } = -3 \ \mathrm{Re} \big ({\varvec{\Gamma }}_{\varvec{\Delta }}^{-2}{\mathbf{{G}}}_{\varvec{\Delta }}^{\top }\big ), \\&{\varvec{\Sigma }}_{22} = 6 \ {\varvec{\Gamma }}_{{\varvec{\updelta }}}^{-2}, \ \ {\varvec{\Sigma }}_{23} = {\varvec{\Sigma }}_{32}^{\top } = 3 \, \mathrm{Im} \big ({\varvec{\Gamma }}_{\varvec{\updelta }}^{-2}{\mathbf{{G}}}_{\varvec{\updelta }}^{\top }\big ) , \ \ {\varvec{\Sigma }}_{24} = \varvec{\Sigma }_{42}^{\top } = 3 \ \mathrm{Re} \big (\varvec{\Gamma }_{\varvec{\updelta }}^{-2}{\mathbf{{G}}}_{\varvec{\updelta }}^{\top }\big ), \\&\varvec{\Sigma }_{33} = \frac{3}{2} \big (\mathrm{Im} ({\mathbf{{G}}}_{\varvec{\Delta }}) \varvec{\Gamma }_{\varvec{\Delta }}^{-2} \mathrm{Im} \big ({\mathbf{{G}}}^{\top }_{\varvec{\Delta }}\big )\big ) + \big (\mathrm{Im} \big ({\mathbf{{G}}}_{\varvec{\updelta }}\big ) {\varvec{\Gamma }}_{\varvec{\updelta }}^{-2} \mathrm{Im} \big ({\mathbf{{G}}}^{\top }_{\varvec{\updelta }}\big )\big ) + \frac{1}{2} {\mathbf{{I}}}_{pq}, \\&\varvec{\Sigma }_{34} = \varvec{\Sigma }_{43}^{\top } = -\frac{3}{2} \big (\mathrm{Im} \big ({\mathbf{{G}}}_{\varvec{\Delta }}\big ) \varvec{\Gamma }_{\varvec{\Delta }}^{-2} \mathrm{Re} \big ({\mathbf{{G}}}^{\top }_{\varvec{\Delta }}\big )\big ) + \big (\mathrm{Im} ({\mathbf{{G}}}_{\varvec{\updelta }}) \varvec{\Gamma }_{\varvec{\updelta }}^{-2} \mathrm{Re} \big ({\mathbf{{G}}}^{\top }_{\varvec{\updelta }}\big )\big ), \\&\varvec{\Sigma }_{44} = \frac{3}{2} \big (\mathrm{Re} \big ({\mathbf{{G}}}_{\varvec{\Delta }}\big ) \varvec{\Gamma }_{\varvec{\Delta }}^{-2} \mathrm{Re} \big ({\mathbf{{G}}}^{\top }_{\varvec{\Delta }}\big )\big ) + \big (\mathrm{Re} \big ({\mathbf{{G}}}_{\varvec{\updelta }}\big ) \varvec{\Gamma }_{\varvec{\updelta }}^{-2} \mathrm{Re} \big ({\mathbf{{G}}}^{\top }_{\varvec{\updelta }}\big )\big ) + \frac{1}{2} {\mathbf{{I}}}_{pq}, \\&\varvec{\Sigma }_{55} = \sigma ^2,\\ \end{aligned}$$

and the remaining \(\varvec{\Sigma }_{kl}\)’s are zeros.

In this case also, the most interesting feature is that the maximum likelihood estimators of the linear parameters have the convergence rate \(O_p(m^{-1/2} n^{-1/2})\), whereas the maximum likelihood estimators of the frequencies have the convergence rates \(O_p(m^{-3/2}n^{-1/2})\) and \(O_p(m^{-1/2} n^{-3/2})\). Rao et al. [49] provided the results when the errors follow normal distributions. The results were extended by Kundu and Gupta [31] to i.i.d. errors and further extended by Zhang and Mandrekar [56] and Kundu and Nandi [35] to stationary random fields.

Although the theoretical properties of the least squares estimators have been established, finding the least squares estimates is a numerically challenging problem. For this reason, several sub-optimal estimators have been proposed by different authors (see, for example, [7, 12, 13, 19, 24] and the references cited therein), but none of these authors discussed the convergence rates of the estimators theoretically. Miao et al. [39] and Nandi et al. [40] provided two efficient algorithms which produce estimators having the same convergence rates as the least squares estimators.

Professor Rao also worked on another important signal processing model, known as the direction of arrival (DOA) model. We briefly describe his contribution in the next section.

5 Direction of arrival (DOA) model

The DOA model can be written as follows:

$$\begin{aligned} {\mathbf{{x}}}(t) = {\mathbf{{A}}} {\mathbf{{s}}}(t) + {\mathbf{{n}}}(t), \quad t = 1, \ldots , n. \end{aligned}$$
(25)

Here, at time t, \({\mathbf{{x}}}(t)\) is a \(p \times 1\) complex vector of observations received by p uniformly spaced sensors, \({\mathbf{{s}}}(t)\) is a \(q \times 1\) complex vector of unobservable signals emitted from q sources, and \({\mathbf{{n}}}(t)\) is a \(p \times 1\) complex-valued noise vector. The number of sources is less than the number of sensors. The matrix \({\mathbf{{A}}}\) has a special structure, with its k-th column \({\mathbf{{a}}}_k = {\mathbf{{a}}}(\tau _k)\) of the form

$$\begin{aligned} {\mathbf{{a}}}_k^{\top } = {\mathbf{{a}}}(\tau _k)^{\top } = (1, \mathrm{e}^{-i \omega _0 \tau _k}, \ldots , \mathrm{e}^{-i \omega _0 (p-1) \tau _k}), \end{aligned}$$

Here, as before, \(i = \sqrt{-1}\), \(\displaystyle \tau _k = c^{-1} \Delta \sin \theta _k\), c is the speed of propagation, \(\theta _k\) is the DOA of the k-th source, and \(\Delta \) is the inter-sensor distance. Without loss of generality, \(\omega _0\) is taken to be unity. It is assumed that \(\{{\mathbf{{s}}}(t)\}\) and \(\{{\mathbf{{n}}}(t)\}\) are independent sequences of i.i.d. random variables with

$$\begin{aligned} \hbox {E}[{\mathbf{{n}}}(t) {\mathbf{{n}}}(t)^H] = \sigma ^2 {\mathbf{{I}}}_p, \quad \hbox {E}[{\mathbf{{s}}}(t) {\mathbf{{s}}}(t)^H] = {\varvec{\Gamma }} \quad \hbox {and} \quad \hbox {Rank}({\varvec{\Gamma }}) = q. \end{aligned}$$
(26)

There are two important problems connected with this model. One is the estimation of q, the number of sources and the other is the estimation of \(\tau _1, \ldots , \tau _q\), providing the estimates of \(\theta _1, \ldots , \theta _q\), the DOA of signals.

This model has been used in sensor array processing [11, 21, 25], in harmonic analysis [41], in retrieving the poles of a system from natural responses [54] and also in retrieving overlapping echoes from radar backscatter [43]. The model has received a considerable amount of attention in the signal processing literature. Estimation of \(\tau _1, \ldots , \tau _q\), assuming q known, is usually carried out by some eigen-decomposition method available in the literature (for example, MUSIC [9, 51], ESPRIT, TLS-ESPRIT [50], GEESE [43]). For detailed discussions of the different eigen-decomposition methods, interested readers are referred to the Ph.D. thesis of Kannan [22] or the review article of Paulraj et al. [42]. The estimation of q is usually carried out using some information theoretic criterion (see, for example, [57, 58] in this respect).

First, let us address the estimation of q; then we consider the estimation of the DOAs of the signals. In addition to the assumption (26), if it is further assumed that \(\{{\mathbf{{n}}}(t)\}\) has a p-variate complex normal distribution, then the sample variance-covariance matrix

$$\begin{aligned} {\mathbf{{S}}} = \frac{1}{n} \sum _{t=1}^n {\mathbf{{x}}}(t) {\mathbf{{x}}}(t)^H \end{aligned}$$

has a complex Wishart distribution with n degrees of freedom and the covariance matrix

$$\begin{aligned} {\varvec{\Sigma }} = {\mathbf{{A}}} {\varvec{\Gamma }} {\mathbf{{A}}}^H + \sigma ^2 {\mathbf{{I}}}. \end{aligned}$$

Since \({\varvec{\Gamma }}\) is non-singular, the number of signals is \(q=p-s\), where s is the multiplicity of the smallest eigenvalue of \({\varvec{\Sigma }}\). Hence, the problem of estimating q can be studied within the framework of testing the equality of a given number of the smallest eigenvalues of \({\varvec{\Sigma }}\).

The log-likelihood function of the observed data without the additive constant can be written as

$$\begin{aligned} l_n^{(q)} = -\frac{n}{2} ( \ln |{\varvec{\Sigma }}| + \hbox {tr} \ {\varvec{\Sigma }}^{-1} {\mathbf{{S}}}). \end{aligned}$$

The maximum of \(\displaystyle l_n^{(q)}\) for a given q is

$$\begin{aligned} \hat{l}_n^{(q)} = -\frac{n}{2} \left( \sum _{i=1}^q \ln \hat{\lambda }_i + (p-q) \ln \frac{\hat{\lambda }_{q+1} + \cdots + \hat{\lambda }_p}{p-q}\right) , \end{aligned}$$

where \(\hat{\lambda }_1> \cdots > \hat{\lambda }_p\) are the ordered eigenvalues of \({\mathbf{{S}}}\) and they are distinct with probability 1. The likelihood ratio test statistic for testing the equality of the last \(p-q\) eigenvalues of \({\varvec{\Sigma }}\) is

$$\begin{aligned} G_n^{(q)} = 2\big (\hat{l}_n^{(p)} - \hat{l}_n^{(q)}\big ) = n \left( (p-q) \ln \frac{\hat{\lambda }_{q+1} + \cdots + \hat{\lambda }_p}{p-q} - \sum _{i=q+1}^p \ln \hat{\lambda }_i \right) \end{aligned}$$

and it is asymptotically distributed as \(\chi ^2\) with \((p-q)^2 - 1\) degrees of freedom. Let

$$\begin{aligned} \hat{q} = \hbox {max} \{q: G_n^{(q)} \le c_{\alpha }\}, \end{aligned}$$

where \(c_{\alpha }\) is the upper \(\alpha \)-percentile point of the \(\chi ^2\) distribution with \((p-q)^2 - 1\) degrees of freedom; then \(\hat{q}\) is an upper 100\((1-\alpha )\)% confidence limit for q.

Further, to get a point estimate of q, a general information theoretic criterion can be used as follows. Consider

$$\begin{aligned} \hbox {GIC}(k) = 2 \hat{l}_n^{(k)} - \nu (k) c_n, \end{aligned}$$

where \(\nu (k)\) is the number of free parameters in the model and \(c_n\) is a function of n, such that as \(n \rightarrow \infty \),

$$\begin{aligned} \frac{c_n}{n} \rightarrow 0 \quad \hbox {and} \quad \frac{c_n}{\ln \ln n} \rightarrow \infty . \end{aligned}$$
(27)

In this case, \(\nu (k) = k(2p-k)+1\) and the estimate of q is \(\hat{q}\), where

$$\begin{aligned} \hbox {GIC}(\hat{q}) = \max _k \hbox {GIC}(k). \end{aligned}$$

It has been shown that \(\hat{q}\) is a strongly consistent estimate of q. Note that the choice of \(c_n\) is quite arbitrary, except that it must satisfy (27). Moreover, the performance of the method depends on the choice of \(c_n\), and not enough studies have been performed on how to choose it. Some attempts have been made by Kundu [30] in this direction.
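A minimal sketch of the GIC rule is given below; the function name is ours, and the suggested penalty is only one of many sequences satisfying (27).

```python
import numpy as np

def gic_num_sources(S, n, c_n):
    """Estimate the number of sources q by maximizing GIC(k).

    `S` is the p x p sample covariance of the sensor outputs, `n` the number
    of snapshots and `c_n` the penalty, e.g. c_n = np.log(n) * np.log(np.log(n)).
    """
    p = S.shape[0]
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]        # ordered eigenvalues
    gic = np.empty(p)
    for k in range(p):                                # candidate values k = 0, ..., p-1
        loglik = -0.5 * n * (np.sum(np.log(lam[:k])) +
                             (p - k) * np.log(np.mean(lam[k:])))
        gic[k] = 2.0 * loglik - (k * (2 * p - k) + 1) * c_n
    return int(np.argmax(gic))
```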

Now we discuss the estimation of the DOAs of the signals. From now on, it is assumed that q is known. Note that for a given q, the eigenvalues of \({\varvec{\Sigma }}\) are of the form

$$\begin{aligned} \lambda _1 \ge \cdots \ge \lambda _q > \lambda _{q+1} = \cdots = \lambda _p = \sigma ^2. \end{aligned}$$

Let \({\mathbf{{e}}}_1, \ldots , {\mathbf{{e}}}_q, {\mathbf{{e}}}_{q+1}, \ldots , {\mathbf{{e}}}_p\) be the corresponding eigenvectors and define matrices

$$\begin{aligned} {\mathbf{{E}}}_s = ({\mathbf{{e}}}_1, \ldots , {\mathbf{{e}}}_q) \quad \hbox {and} \quad {\mathbf{{E}}}_n = ({\mathbf{{e}}}_{q+1}, \ldots , {\mathbf{{e}}}_p). \end{aligned}$$

The vector space generated by \({\mathbf{{E}}}_s\) is called the signal space and by \({\mathbf{{E}}}_n\) is called the noise space. Observe that

$$\begin{aligned} {\mathbf{{a}}}(\tau )^{H} {\mathbf{{E}}}_n = {\mathbf{{0}}} \ \Leftrightarrow \ D(\tau ) = {\mathbf{{a}}}(\tau )^{H} {\mathbf{{E}}}_n {\mathbf{{E}}}_n^{H} {\mathbf{{a}}}(\tau ) = 0, \end{aligned}$$

for \(\tau = \tau _i\), \(i = 1, 2, \ldots , q\). In practice we do not have \({\varvec{\Sigma }}\), we only have an estimate of \({\varvec{\Sigma }}\), i.e. \({\mathbf{{S}}}\). Let the eigenvalues and the corresponding eigenvectors of \({\mathbf{{S}}}\) be

$$\begin{aligned}&\hat{\lambda }_1> \cdots> \hat{\lambda }_q> \cdots > \hat{\lambda }_p, \\&\hat{\mathbf{{e}}}_1, \ldots , \hat{\mathbf{{e}}}_q, \ldots , \hat{\mathbf{{e}}}_p \end{aligned}$$

and they provide consistent estimates of the corresponding eigenvalues and eigenvectors of \({\varvec{\Sigma }}\). Moreover, an estimate of \({\mathbf{{E}}}_n\), whose columns span the noise eigenspace, is

$$\begin{aligned} \hat{\mathbf{{E}}}_n = (\hat{\mathbf{{e}}}_{q+1}, \ldots , \hat{\mathbf{{e}}}_p). \end{aligned}$$

The function

$$\begin{aligned} \hat{D}(\tau ) = {\mathbf{{a}}}(\tau )^{\top } \hat{\mathbf{{E}}}_n \hat{\mathbf{{E}}}_n^{\top } {\mathbf{{a}}}(\tau ) \end{aligned}$$

may not vanish at any \(\tau \), but it is expected to be small in the neighborhoods of \(\tau _1, \ldots , \tau _q\). This is the main idea of the MUSIC algorithm, as proposed by Bienvenu and Kopp [9]. Using the conjugate symmetry property of the polynomial coefficients, as described in Section 2, the MUSIC algorithm has been modified by Kundu [28]. It is important to note that in both the MUSIC and the modified MUSIC algorithms, the solutions are obtained by a search method. Hence, although the consistency of the estimators can be established, their asymptotic distributions are difficult to study. The following approach may be used to study the asymptotic properties of the estimators.
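Before turning to that approach, the following minimal sketch illustrates the MUSIC idea just described. It assumes the standard steering vector \({\mathbf{{a}}}(\tau ) = (1, \mathrm{e}^{i\tau }, \ldots , \mathrm{e}^{i(p-1)\tau })^{\top }\) and uses the conjugate transpose when projecting onto the estimated noise space, as is customary for complex data; the function name and the grid-based search are illustrative choices, not the modified algorithm of Kundu [28].

```python
import numpy as np

def music_pseudospectrum(S, q, tau_grid):
    """Evaluate D_hat(tau) on a grid; the q deepest minima give the DOA estimates."""
    p = S.shape[0]
    eigvals, eigvecs = np.linalg.eigh(S)
    E_n = eigvecs[:, np.argsort(eigvals)[::-1][q:]]   # estimated noise eigenspace
    D = np.empty(len(tau_grid))
    for j, tau in enumerate(tau_grid):
        a = np.exp(1j * tau * np.arange(p))           # assumed steering vector a(tau)
        proj = E_n.conj().T @ a                       # components of a(tau) in the noise space
        D[j] = np.vdot(proj, proj).real               # = ||E_n^H a(tau)||^2
    return D
```

The estimates \(\hat{\tau }_1, \ldots , \hat{\tau }_q\) are then read off as the locations of the q smallest well-separated values of the returned array.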

Note that there exists a matrix \({\mathbf{{G}}}\) of order \(p\times (p-q)\),

$$\begin{aligned} {\mathbf{{G}}} = ({\mathbf{{G}}}_1, \ldots , {\mathbf{{G}}}_{p-q}), \end{aligned}$$
(28)

where the \(p\times 1\) vector \({\mathbf{{G}}}_k\) for \(k = 1, \ldots , (p-q)\) is of the form

$$\begin{aligned} {\mathbf{{G}}}_k = (0, \ldots , 0, g_0, \ldots , g_{q}, 0, \ldots , 0)^{\top }, \end{aligned}$$

with the first \((k-1)\) and the last \((p-q-k)\) elements equal to zero, \(g_{q} > 0\), \(\displaystyle g_0 \bar{g}_0 + \cdots + g_{q} \bar{g}_{q} = 1\), such that \({\mathbf{{a}}}(\tau )^{\top } {\mathbf{{G}}} = {\mathbf{{0}}}\), for \(\tau = \tau _1, \ldots , \tau _q\), and \(\mathrm{e}^{i \tau _1}, \ldots , \mathrm{e}^{i \tau _q}\) are the solutions of the polynomial equation

$$\begin{aligned} g_0 + g_1 z + \cdots + g_q z^q = 0. \end{aligned}$$

Therefore, the statistical problem is to estimate the vector \({\mathbf{{g}}} = (g_0, \ldots , g_q)^{\top }\) from the given observations. Since the columns of \({\mathbf{{G}}}\) span the same space as the columns of \({\mathbf{{E}}}_n\), namely the noise eigenspace, the problem of estimating \({\mathbf{{G}}}\) can be thought of as fitting a basis of the type \({\mathbf{{G}}}_1, \ldots , {\mathbf{{G}}}_{p-q}\) to the estimated noise eigenspace \(\hat{\mathbf{{E}}}_n\). Mathematically, it can be formulated as minimizing the Euclidean norm

$$\begin{aligned} ||\hat{\mathbf{{E}}}_n - {\mathbf{{G}}}{\mathbf{{B}}}|| \end{aligned}$$

with respect to a matrix \({\mathbf{{G}}}\) of the form (28) and an arbitrary matrix \({\mathbf{{B}}}\) of order \((p-q)\times (p-q)\). Since this optimization problem is numerically quite complicated and an explicit solution does not exist, Bai and Rao [4] proposed the following method. By a Householder transformation, i.e. by multiplying by a unitary matrix \({\mathbf{{O}}}\) of order \((p-q) \times (p-q)\), convert \(\hat{\mathbf{{E}}}_n\) into the form

$$\begin{aligned} \hat{\mathbf{{E}}}_n {\mathbf{{O}}} = ({\mathbf{{u}}}_{q+1}, \ldots , {\mathbf{{u}}}_p), \end{aligned}$$

where \(\displaystyle {\mathbf{{u}}}_{q+i} = (u_{0,q+i}, \ldots , u_{q+i-1,q+i},0,\ldots , 0)^{\top }\), for \(i = 1, \ldots , (p-q)\), with \(u_{q+i-1,q+i} \ne 0\) (with probability one). Solve the equation

$$\begin{aligned} u_{0,q+1} + \cdots + u_{q,q+1} z^q = 0, \end{aligned}$$

obtain the roots of the form \(\hat{\rho }_1 \mathrm{e}^{i \hat{\tau }_1}, \ldots , \hat{\rho }_q \mathrm{e}^{i \hat{\tau }_q}\) and choose \(\hat{\tau }_1, \ldots , \hat{\tau }_q\) as estimates of \(\tau _1, \ldots , \tau _q\).
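A rough numerical sketch of this root-finding step is given below (numpy assumed; the function name is ours). For simplicity it obtains the required column \({\mathbf{{u}}}_{q+1}\) from the null space of the bottom block of \(\hat{\mathbf{{E}}}_n\) via an SVD rather than by an explicit Householder transformation; this is an implementation shortcut, not the exact computation of Bai and Rao [4].

```python
import numpy as np

def bai_rao_root_step(S, q):
    """Extract a vector u = (u_0, ..., u_q, 0, ..., 0)^T from the estimated noise
    eigenspace and return the arguments of the roots of its degree-q polynomial.
    Assumes q < p - 1."""
    p = S.shape[0]
    eigvals, eigvecs = np.linalg.eigh(S)
    E_n = eigvecs[:, np.argsort(eigvals)[::-1][q:]]   # estimated noise eigenspace, p x (p-q)
    # A combination of the columns of E_n whose last p-q-1 coordinates vanish is
    # obtained from the null space of the bottom (p-q-1) x (p-q) block.
    _, _, vh = np.linalg.svd(E_n[q + 1:, :])
    u = E_n @ vh.conj().T[:, -1]
    roots = np.roots(u[:q + 1][::-1])                 # np.roots wants the z^q coefficient first
    return np.sort(np.angle(roots))                   # arguments of the roots, in (-pi, pi]
```

Working with a single suitably structured column of the noise eigenspace is what makes this a non-iterative root-finding procedure, in contrast to the grid search used by MUSIC.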

We need the following assumptions to establish the asymptotic properties of the proposed estimators:

Assumption 5.1

The second moment of \({\mathbf{{x}}}(t)\) exists.

Assumption 5.2

The fourth moment of \({\mathbf{{x}}}(t)\) exists.

Assumption 5.3

Var[Re \({\mathbf{{s}}}(t)\)] = Var[Im \({\mathbf{{s}}}(t)\)] = \(2^{-1}\) Re \({\varvec{\Gamma }}\), E[(Re \({\mathbf{{s}}}(t)\))(Im \({\mathbf{{s}}}(t)\))\(^{\top }\)] = −E[(Im \({\mathbf{{s}}}(t)\))(Re \({\mathbf{{s}}}(t)\))\(^{\top }\)] = \(2^{-1}\) Im \({\varvec{\Gamma }}\).

Assumption 5.4

Var[Re \({\mathbf{{n}}}(t)\)] = Var[Im \({\mathbf{{n}}}(t)\)] = \(2^{-1}\) \(\sigma ^2 {\mathbf{{I}}}_p\), Cov[(Re \({\mathbf{{n}}}(t)\), Im \({\mathbf{{n}}}(t)\))] = \({\mathbf{{0}}}\).

Assumption 5.5

E[(Re \(n_k(t))^4\)] = E[(Im \(n_k(t))^4\)] = \(\frac{3}{4} \sigma ^4\), \(k = 1, \ldots , p\); E[(Re \(n_k(t))^2\)(Re \(n_h(t))^2\)] = E[(Im \(n_k(t))^2\)(Im \(n_h(t))^2\)] = \(4^{-1} \sigma ^4\), \(k \ne h = 1, \ldots , p\); E[(Re \(n_k(t))^2\)(Im \(n_h(t))^2\)] = \(4^{-1} \sigma ^4\), \(k,h = 1, \ldots , p\); all other fourth-order moments are zero.

The following results regarding the asymptotic properties of the proposed estimators have been obtained by Bai et al. [3].

Theorem 4

Under Assumption 5.1, \((\hat{\tau }_1, \ldots , \hat{\tau }_q)^{\top }\) is a strongly consistent estimator of \((\tau _1, \ldots , \tau _q)^{\top }\).

Theorem 5

Under Assumptions 5.2–5.4, the limiting distribution of \(\sqrt{n}(\hat{\tau }_1 - \tau _1, \ldots , \hat{\tau }_q - \tau _q)^{\top }\) is a q-variate normal with mean vector zero and covariance matrix

$$\begin{aligned} 2^{-1} \mathrm{Re}[{\mathbf{{G}}}^{-1}(\sigma ^4 {\varvec{\Gamma }}^{-1} ({\mathbf{{A}}}^H {\mathbf{{A}}})^{-1} {\varvec{\Gamma }}^{-1} + \sigma ^2 {\varvec{\Gamma }}^{-1}) {\mathbf{{G}}}^{-1} ], \end{aligned}$$

where

$$\begin{aligned} {\mathbf{{G}}}&= \mathrm{diag}(D(\tau _1), \ldots , D(\tau _q)), \\ D(\tau _k)&= g_{q} \prod _{j=1, j\ne k}^q (\mathrm{e}^{i \tau _k} - \mathrm{e}^{i \tau _j})\,\mathrm{e}^{i \tau _k}, \quad k = 1, \ldots , q. \end{aligned}$$

Using the conjugate symmetry property of the polynomial coefficients, as discussed in Section 2, the method has been modified and the associated asymptotic results have been established by Kundu [29]. The method has been extended to multiple moving targets in a sequence of papers by Rao et al. [44,45,46,47].

6 Some recent developments and conclusions

Professor Rao worked in the area of statistical signal processing for about six to seven years, but his work has left a long-lasting impact on the statisticians working in this field. In recent times, one related model, known as the chirp model, has received considerable attention in the signal processing literature. A one-dimensional multiple chirp model can be expressed as

$$\begin{aligned} y(t) = \sum _{k=1}^p \alpha _k \mathrm{e}^{i(\omega _k t + \theta _k t^2)} + e(t). \end{aligned}$$
(29)

Here also, as before, the y(t)'s, \(\alpha _k\)'s and e(t)'s are complex valued. As in the multiple exponential model, the \(\omega _k\)'s are known as the frequencies, and the \(\theta _k\)'s are known as the frequency rates. This model appears in many areas of signal processing, one of the most important being the radar problem. It also appears in active and passive sonar systems. The problem of parameter estimation of chirp signals has received a considerable amount of attention in the signal processing literature (see, for example, [1, 14, 17] and the references cited therein for different estimation procedures). Several statistical aspects of this model, the asymptotic properties of different estimators and efficient estimation procedures have been developed by Lahiri [37] and Grover [16]. It should be mentioned that both Lahiri [37] and Grover [16] have successfully extended many of the methods proposed by Rao and his collaborators to establish their results.
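For readers who wish to experiment with model (29), the following sketch simulates data from it and locates a single chirp component by a coarse periodogram-type grid search. The complex Gaussian noise, the grid search and all function names are illustrative assumptions; they do not reproduce the procedures of Lahiri [37] or Grover [16].

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def simulate_chirp(n, alphas, omegas, thetas, sigma=0.5):
    """Generate y(1), ..., y(n) from the multiple chirp model (29);
    the complex Gaussian errors are an illustrative choice."""
    t = np.arange(1, n + 1)
    y = sum(a * np.exp(1j * (w * t + th * t ** 2))
            for a, w, th in zip(alphas, omegas, thetas))
    e = sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return y + e

def single_chirp_grid_search(y, omega_grid, theta_grid):
    """Coarse maximization of |sum_t y(t) exp(-i(omega t + theta t^2))|^2,
    a common starting point before any iterative refinement."""
    t = np.arange(1, len(y) + 1)
    best, arg = -np.inf, None
    for w in omega_grid:
        for th in theta_grid:
            val = abs(np.sum(y * np.exp(-1j * (w * t + th * t ** 2)))) ** 2
            if val > best:
                best, arg = val, (w, th)
    return arg
```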

Grover [16] has proposed a chirp-like model which has the following form:

$$\begin{aligned} y(t) = \sum _{k=1}^p \alpha _k \mathrm{e}^{i\omega _k t} + \sum _{k=1}^q \beta _k \mathrm{e}^{i\theta _k t^2} + e(t). \end{aligned}$$
(30)

It is observed that the proposed chirp-like model (30) behaves very similarly to the traditional chirp model (29). On the other hand, estimation of the parameters of model (30) is less challenging than for model (29). An efficient estimation procedure for model (30) has been developed by Grover [16]. In principle, it should be possible to generalize it to the two-dimensional case as well, which has several applications in image processing, particularly in fingerprinting. More work is needed in this direction.
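The computational appeal of model (30) can be illustrated by the following caricature: since the frequency part and the frequency-rate part enter additively, each can be located by its own one-dimensional periodogram-type search. This is only a heuristic sketch under that separability assumption, with grids supplied as numpy arrays, and is not the estimation procedure of Grover [16].

```python
import numpy as np

def chirp_like_periodogram_estimates(y, omega_grid, theta_grid, p, q):
    """Locate the frequencies and frequency rates of model (30) by two separate
    one-dimensional periodogram-type searches (a proper implementation would
    also enforce separation between the selected peaks)."""
    t = np.arange(1, len(y) + 1)
    I_omega = np.array([abs(np.sum(y * np.exp(-1j * w * t))) ** 2 for w in omega_grid])
    I_theta = np.array([abs(np.sum(y * np.exp(-1j * th * t ** 2))) ** 2 for th in theta_grid])
    omegas = omega_grid[np.argsort(I_omega)[::-1][:p]]   # p largest peaks
    thetas = theta_grid[np.argsort(I_theta)[::-1][:q]]   # q largest peaks
    return np.sort(omegas), np.sort(thetas)
```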