1 Introduction

Source signal separation is one of the fundamental issues in communication systems in which a set of source signals have been mixed together into a combined signal and the objective is to recover the source signals. This mixture mainly happens due to co-channel interference (CCI) and inter-symbol interference (ISI). CCI is caused by simultaneously serving several users which transmit data at the same frequency from different directions, and ISI is caused either by the inherent frequency-selective characteristic of the communication channels, or by multipath propagation [14]. The first type of ISI, known as temporal ISI, results in successive symbols being blurred, and in the second, called spatial ISI, several delayed versions of the same data are received by the antenna from different direction of arrivals (DOAs) as a result of reflections from different objects. In order to separate all users from a set of mixed signals received by the antenna and recover the data transmitted by each user, CCI and ISI have to be cancelled.

One of the techniques to combat CCI is using antenna arrays and beamforming [22]. For narrowband signals, beamforming can be achieved by an instantaneous linear combination of the received signals in the antenna array. A comprehensive review of narrowband beamforming techniques can be found in [22]. If a pilot signal is available, the beamformer’s weights can be adjusted so that the error between the output and the reference signal is minimized. Another approach for combating CCI using antenna arrays consists of two main stages: separating different users based on their locations using DOA estimation techniques and then designing a beamformer (spatial filter) to pass the desired signal propagating from the user of interest while rejecting signals from all other users with different DOAs. Some of the best-known techniques for DOA estimation are MUSIC [17] and ESPRIT [15] and their many variations. MODE is another DOA estimation technique with performance close to the maximum likelihood method at a cost of more computation than MUSIC [20]. MODE is statistically efficient when either the SNR or the number of snapshots is sufficiently large [3]. MODE with extra roots (MODEX) is a MODE-based DOA estimation algorithm with an improved performance for lower SNR [3]. These methods entail high computational complexity. In contrast, DOA estimation techniques using matrix pencil (MP) [16] are fast, but the DOA estimation capacity (maximum number of users which can be detected) is less than that of MUSIC and ESPRIT. The aforementioned methods estimate DOAs without employing training sequences (pilot data). If pilot data are available, it can be used to obtain DOAs using, for example, the phase difference between subarray beamformers as it is done in [24]. The performance of this method is very good in terms of accuracy, capacity and computational complexity at the cost of decreasing the bit rate due to transmitting a training sequence. When the DOA of the signal of interest is known, three well-known beamforming techniques are delay-and-sum, linearly constrained minimum variance (LCMV) and its efficient implementation known as generalized sidelobe canceller (GSC) [22].

After reducing CCI and capturing the desired signal by the beamformer, the last stage in recovering the transmitted data is equalization which reduces ISI. Linear and decision-feedback equalizers (DFEs) are two common approaches. The former is very simple but not very effective when fading is very deep [14]. In such a case, the latter shows better performance, but it may suffer from error propagation, due to feedback of wrong decisions, resulting in performance degradation.

Separate CCI and ISI cancellation may not result in satisfactory performance, as was articulated in [7]. Instead, a beamformer and an equalizer can be combined into one device called space–time equalizer (STE) to jointly combat CCI and ISI. The optimum STE is the multi-user maximum likelihood sequence estimator (MLSE) [13] which requires the channel information of all users and entails high computational complexity. Several suboptimal hybrid STEs have been proposed in the literature [5, 79] which function properly provided that either training sequence [5, 8, 9] is available, or the DOA is known [7].

Blind source separation (BSS) refers to the case where the transmitted signals are recovered without using any information, such as training sequences or DOAs. A well-known and simple BSS approach is the multistage constant modulus algorithm (CMA) [2, 10, 18, 19, 23]. The constant modulus property [4], which is true for many modulation schemes such as QAM, PSK and FSK, is instrumental in CMA. Thanks to multistage structure, this approach is able to capture multiple co-channel sources and provides estimates of their DOAs. Each stage consists of a weight-and-sum CMA-based adaptive beamformer which tries to capture (lock onto) one of the sources and an adaptive signal canceller that removes the captured source from the array input before processing it by the next stage. The canceller weights can be used to estimate the DOA of the captured source. A blind STE has been recently proposed in [11] based on the multi-modulus algorithm (MMA) [26], an advanced version of CMA. In [11], the DOA is estimated using subarray beamformers [24] which is then used to compute the input for the next stage. Multistage STEs, first, have the inherent problem of inter-stage error propagation as a result of feeding each stage with the output of the previous stage leading to performance degradation [25]. Second, an adaptive weight-and-sum beamformer [11] may not lead to deep nulls at the DOAs of all interferes and consequently may not provide strong CCI cancelation. The method proposed in this paper addresses these two issues.

In this paper, a new multistage STE is proposed for blind source separation. Each stage is responsible for locking onto one of the sources and is equipped with a beamformer, a DOA estimator and an equalizer. An adaptive version of GSC, called adaptive GSC (AGSC), is presented which can adaptively track a user and strongly attenuate other users with different DOAs. The beamformer and equalizer are jointly being updated (consistent with the STE concept) to combat both CCI and ISI effectively. Using the concept of subarray beamformers [24], the DOA, possibly time-varying, of the captured signal is estimated and tracked. The estimated DOA at each stage is being used by the AGSC to provide strong CCI cancellation and to form the input to the next stage. In order to significantly alleviate inter-stage error propagation, a sorting algorithm is suggested to assign detected sources to different stages based on the reconstruction error at different stages. Further, to speed up the convergence, a simple, yet efficient DOA estimation algorithm is presented which can provide good initial DOAs for the multistage STE. Simulation results illustrate the good performance of the proposed STE and show that it can effectively deal with changing DOAs and time-variant channels.

This paper is organized as follows: In Sect. 2, the problem description is presented. The proposed multistage STE is discussed in Sect. 3. In Sect. 4, two algorithms are outlined which are utilized to speed up the convergence of the proposed STE. Section 5 summarizes the implementation steps discussed in Sects. 3 and 4. The performance of the proposed method will be illustrated in Sect. 6 using simulations.

The mathematical notation used in this paper is as follows: Bold capital and small letters represent matrix and vector, respectively, (e.g., “B” is a matrix and “r” is a vector). Also, regular letters indicate a scalar value. \(T,H\) and “*” represent transpose, conjugate transpose and conjugate, respectively. The index \(n\) shows the parameter values at \(n\)th sample. Re{} and Im{} denotes the real and imaginary parts, respectively, and \(E\){} represents the expected value.

2 Problem Description

\(Q\) uncorrelated and narrowband signals with the common wavelength \((\lambda )\) are being transmitted by different users through time-varying fading and noisy channels. The DOA for each user is different and possibly time-varying. Using the signals received by an antenna array at the base station, the goal is to separate all sources and recover the original data transmitted by each one of them in a blind manner. The only assumption is that the transmitted signals satisfy the constant modulus property. DOAs are not known and there are no pilot signals.

Consider an \(M\)-element uniform linear array at the base station with half wavelength space \(\lambda /2\) between elements and \(Q\) users transmitting data [\(s_{q} (n)\), for \(q=1, 2, \ldots , Q\)] to the base station. The users’ DOAs to the antenna are denoted by \(\theta _{1} , \theta _{2} , \ldots , \theta _Q \) which might be time-varying due to user movement. After sampling, the baseband received signal at the antenna is given by:

$$\begin{aligned} \mathbf{x}_{1} (n)=[{x_{1} (n)}\quad {x_{2} (n)}\quad {\ldots }\quad {x_M (n)}]^\mathrm{T}=\sum _{q =1}^Q { t_{q} (n)\mathbf{a}_{q} (n)} +\varpi (n), \end{aligned}$$
(1)

where \(t_{q} (n)\), the signal received at the antennas from each user, is as follows:

$$\begin{aligned}&t_{q} (n)=\mathbf{h}_{q}^T (n) \mathbf{s}_{q} (n) \nonumber \\&\mathbf{s}_{q} (n)=[s_{q} (n), s_{q} (n-1), \ldots , s_{q} (n-(L-1))] ^\mathrm{T} \nonumber \\&\mathbf{h}_{ q} (n)=[h _{q, 1} (n), h _{q, 2} (n), \ldots , h _{q, L} (n)] ^\mathrm{T} \end{aligned}$$
(2)

with \(\mathbf{h}_{q} (n)\) representing a fading channel between the \(q\)th user and the base station which might be time-varying due to the changes in the channel and \(L\) being the maximum delay of all channels. Further, \(\varpi (n)\) is the white Gaussian noise and \(\mathbf{a}_{q} (n)\) is the array manifold vector [22] given by:

$$\begin{aligned} \mathbf{a}_{q} (n)=\left[ 1 \quad {e^{-j\pi \sin (\theta _{q} (n))}}\quad {e^{-j2\pi \sin (\theta _{q} (n))}}\quad {\ldots }\quad {e^{-j(M-1) \pi \sin (\theta _{q} (n))}}\right] ^\mathrm{T} \end{aligned}$$
(3)

In the above model, it is assumed that CCI and temporal ISI are present while spatial ISI is neglected. The goals, as mentioned before, are to accurately estimate the DOA of all users and recover the original data transmitted by each user.

3 The Proposed Multistage Space–Time Equalizer

To deal with the problem formulated in the previous section, a multistage STE as shown in Fig. 1 is proposed. The STE consists of an antenna with \(M\) elements, \(Q\) stages, one for each of the \(Q\) users, and a sorting algorithm which assigns users to the stages. The structure of all stages is the same, and therefore, just the first stage is shown in Fig. 2. Each stage consists of three main parts: a beamformer, a DOA estimator and an equalizer which includes a decision-feedback equalizer (DFE) and a channel estimator. The major mathematical relationships between different blocks are summarized in Table 1.

Fig. 1
figure 1

Structure of the proposed multistage STE

Fig. 2
figure 2

First stage of the proposed STE in Fig. 1

Table 1 Major mathematical relationships between different blocks

The beamformer at each stage locks onto one of the users and attenuates the signals from the others. Based on the basic structure of the GSC, an adaptive beamforming scheme called AGSC is proposed which consists of two adaptive branches denoted as the top [\({\mathbf {w}}_\mathbf{p} (n)\) in Fig. 2] and bottom [\(\mathbf{B}(n)\) and \({\mathbf {w}}_\mathbf{a} (n)\) in Fig. 2] branches. The objective for the top branch is to capture and track one of the sources. The bottom branch consists of an adaptive blocking matrix [\(\mathbf{B}(n)\)] and an adaptive CCI weight canceller [\({\mathbf {w}}_\mathbf{a} (n)\)], and its objective is to cancel CCI. The function of the blocking matrix \(\mathbf{B}(n)\) is to eliminate the signal of the captured source from the bottom branch, and the CCI weight canceller \({\mathbf {w}}_\mathbf{a} (n)\) minimizes the energy of the signal passing through the bottom branch (which is ideally pure CCI and noise) by placing nulls in the array response at the DOAs of CCIs. To recover data transmitted by each user, the effect of fading in the channel between the detected user and the base station will be compensated using an equalizer in each stage. The equalization scheme includes a DFE and a channel estimator to improve the performance of both ISI and CCI cancellation and is motivated by a similar structure in [7]. A joint MMA-based scheme is used to update both \({\mathbf {w}}_\mathbf{p} (n)\), top branch of the beamformer, and the feedforward filter of the DFE \({\mathbf {w}}_\mathbf{f} (n)\) [11]. Similarly, the CCI weight canceller \({\mathbf {w}}_\mathbf{a} (n)\) and channel estimator \({\mathbf {w}}_\mathbf{h} (n)\) are jointly updated using the performance index proposed in [7], leading to fast convergence.

The DOA of a user is estimated using subarray beamformers motivated by the approach proposed in [24]. The first subarray beamformer (see Fig. 2) is receiving input from the top \(M-1\) antenna elements and the second one from the \(M-1\) bottom elements. The DOA of the corresponding user is a function of the phase shift between the outputs of the two subarray beamformers [\({u}_\mathrm{top} (n)\) and \(u_\mathrm{bottom} (n)\) in Fig. 2]. A phase tracking algorithm is implemented to estimate and track the possibly time-varying DOA of the captured signal. The estimated DOA is used to update the blocking matrix \(\mathbf{B}(n)\) of the AGSC and to form the input to the next stage by eliminating the detected user from the input signal.

The MSE-based sorting algorithm assigns detected sources to different stages based on the reconstruction mean square error (MSE) for each source signal. Intuitively, the algorithm tries to assign sources with small reconstruction error to the early stages and push the sources with larger error to the last stages. This arrangement significantly alleviates inter-stage error propagation and speeds up the convergence. In the rest of this section, the function of each part of the first stage will be described. The other stages are the same.

3.1 Beamformer: Adaptive Generalized Sidelobe Canceller

The beamformer used at each stage is based on the GSC structure. Since the DOA of the user is assumed not known a priori, or it may change due to user movement, the standard GSC cannot be used here. Instead, an adaptive version of the GSC called adaptive GSC, with all the components [\({\mathbf {w}}_\mathbf{p} (n), \mathbf{B}(n)\) and \({\mathbf {w}}_\mathbf{a} (n)\)] being adaptively updated, is being proposed here.

The objective for \({\mathbf {w}}_\mathbf{p} (n)\), a vector of length \(M-1\), is to lock onto one of the users. The updating rule is based on the MMA [26] with the cost function given by:

$$\begin{aligned} CF_\mathrm{MMA} = E\left\{ { \left[ {\left( {\hbox {Re} \{{y}_{t} (n)\}} \right) ^{L}-R^{L}} \right] ^{ 2}+\left[ {\left( {\hbox {Im} \{{y}_{t} (n)\}} \right) ^{L}-R^{L}} \right] ^{ 2}} \right\} , \end{aligned}$$
(16)

where \(R\) is a constant determined by the statistics of the modulation scheme used and \({y}_{t} (n)\) is the input to the “Decision Maker” block (Eq. 14). As was proposed in [4], \(L=2\) provides a good compromise between performance and implementation complexity. Taking advantage of the stochastic gradient algorithm [1] and MMA, the updating equation for \({\mathbf {w}}_\mathbf{p} (n)\) is given by (\({\mu }_\mathbf{p}\) is the learning step size):

$$\begin{aligned} {\mathbf {w}}_\mathbf{p} (n+1)={\mathbf {w}}_\mathbf{p} (n)-{\mu }_\mathbf{p} \mathbf{x}_{1}^{(\mathrm{top})} (n) {e}_\mathrm{MMA}^{*} (n), \end{aligned}$$
(17)

where \(\mathbf{x}_{1}^{(\mathrm{top})} (n)\) is defined in Eq. (6.a) and:

$$\begin{aligned} {e}_\mathrm{MMA} (n)=( \hbox {Re} \{{y}_{t} (n)\} )^{3} +i ( \hbox {Im} \{{y}_{t} (n)\} )^{3} -R^{2} {y}_{t} (n). \end{aligned}$$
(18)

The role of the (\(M-1\)) \(\times \) (\(M\)-2) matrix \(\mathbf{B}(n)\) can be intuitively described as blocking the signal from the desired user in the lower branch of the beamformer [desired user at each stage means the user which has been captured by \({\mathbf {w}}_\mathbf{p} (n)\)]. The updating of \(\mathbf{B}(n)\) is based on the estimated phase difference \(\hat{{\varphi }}(n)\) (see Fig. 2) between the outputs of the two subarray beamformers [\({u}_\mathrm{top} (n)\) and \(u_\mathrm{bottom} (n)\) in Fig. 2] which will be discussed in Sect. 3.2. Given the phase difference \(\hat{{\varphi }}(n), \mathbf{B}(n)\) can be updated by first forming the \((M-1) \times 1\) array manifold vector:

$$\begin{aligned} {\tilde{\mathbf{a}}} (n)=\left[ 1 \quad {e^{-j\hat{{\varphi }} (n)}}\quad {\ldots }\quad {e^{-j(M-2)\hat{{\varphi }} (n)}}\right] ^\mathrm{T}. \end{aligned}$$
(19)

Then, using singular value decomposition (SVD), \({\tilde{\mathbf{a}}}(n)\) can be expressed as:

$$\begin{aligned} {\tilde{\mathbf{a}}}(n)=\mathop {\mathbf{U}}^{\frown } \mathop {\mathbf{S}}^{\frown } \mathop {\mathbf{V}}^{\frown }{^{H}}, \end{aligned}$$
(20)

where \(\mathop {\mathbf{U}}\limits ^{\frown }\) and \(\mathop {\mathbf{V}}\limits ^{\frown }\) are unitary matrices and \(\mathop {\mathbf{U}}\limits ^{\frown }\) is a diagonal matrix. \(\mathbf{B}(n)\) is the null space of \({\tilde{\mathbf{a}}}^{H}(n)\) and can be formed by deleting the first column of the (\(M-1\)) \(\times \) (\(M-1\)) matrix \(\mathop {\mathbf{U}}\limits ^{\frown }\).

In conventional GSCs, the purpose of \({\mathbf {w}}_\mathbf{a} (n)\) is to increase CCI and noise cancellation, and it is usually implemented using the least-mean-square (LMS) approach. LMS-based GSC is simple but converges very slowly. This is due to the fact that the conventional GSC algorithm uses \({u}_\mathrm{top} (n)\) (see Fig. 2) as the error to be minimized. Since \({u}_\mathrm{top} (n)\) contains the desired signal, it has large amplitude, and to guarantee the algorithm’s stability, the learning step size must be very small. This makes the convergence very slow. In [7], it was shown that the use of \({e}_{s} (n)\) (see Fig. 2) instead of \({u}_\mathrm{top} (n)\) can speed up the convergence. This stems from the fact that at steady state, \({e}_{s} (n)\) does not contains the desired signal and therefore has smaller value compared to \({u}_\mathrm{top} (n)\). Thus, the learning step size for the updating algorithm can be larger which makes the convergence faster. Accordingly, in order to update the (\(M\)-2) \(\times \) 1 vector \({\mathbf {w}}_\mathbf{a} (n)\), the following cost function is minimized [7]:

$$\begin{aligned} CF_{{\mathbf {w}}_\mathbf{a}}= & {} E\left\{ |{e}_{s} (n)|^{2}\right\} \nonumber \\ {e}_{s} (n)= & {} {u}_\mathrm{top} (n-\Delta -1)-{y}_{h} (n) \end{aligned}$$
(21)

where \({y}_{h} (n)\) is defined in Eq. (9) and \(\Delta \) is the decision delay. Using the LMS algorithm, the updating equation for \({\mathbf {w}}_\mathbf{a} (n)\) is given by:

(22)

where \({\mu }_\mathbf{a}\) is the step size and \(\mathbf{r} (n)\) is defined in Eq. (5).

3.2 DOA Estimation

The DOA estimation approach used in this paper is motivated by the technique proposed in [24] where the phase difference between the output signals of two subarray beamformers is being used. As mentioned earlier, given an array of \(M\) elements, the first subarray is formed using the top \(M -1\) elements while the second subarray is formed using the \(M -1\) bottom ones. The signals from these two subarrays are fed to two identical beamformers. It was shown in [24] that the DOA \(\hat{{\theta }}\) is a function of the phase difference \(\hat{{\varphi }}\) between the output signals of the two beamformers. \(\hat{{\theta }}\) is given by:

$$\begin{aligned} \hat{{\theta }}(n)=\arcsin (\hat{{\varphi }} (n)/ \pi ) \end{aligned}$$
(23)

In the proposed STE, the \(M\) input signals at each stage are used to form the two (\(M-1\)) subarrays, each feeding into two subarray beamformers which have the same coefficient values, \({\mathbf {w}}_\mathbf{p} (n), \mathbf{B}(n)\) and \({\mathbf {w}}_\mathbf{a} (n)\) as shown in Fig. 2. Thus, when the top beamformer locks onto one of the users, the bottom beamformer will also lock onto the same user. Using the outputs of these two subarray beamformers, the DOA of a detected user \(\hat{{\theta }}(n)\) can be estimated as a function of the phase difference \(\hat{{\varphi }}(n)\) between \({u}_\mathrm{top} (n)\) and \(u_\mathrm{bottom} (n)\) in Fig. 2.

In order to estimate and track the phase difference between \({u}_\mathrm{top} (n)\) and \(u_\mathrm{bottom} (n)\), the second-order phase tracking loop algorithm [6] is used (\({\mu }_\varphi \) is learning step size, and \(\tau \) is a positive constant):

$$\begin{aligned}&\hat{{u}}_\mathrm{top} (n)=u_\mathrm{bottom} (n) e^{j \hat{{\varphi }} (n)} \end{aligned}$$
(24)
$$\begin{aligned}&{e}_\varphi (n) = \hbox {Im}\left\{ \hat{{u}}_\mathrm{top} (n) [\hat{{u}}_\mathrm{top} (n)-{u}_\mathrm{top} (n)]^{*} \right\} \end{aligned}$$
(25)
$$\begin{aligned}&\hat{{\varphi }} (n+1) = \hat{{\varphi }} (n)+{\mu }_\varphi \left[ {e}_\varphi (n)+\tau \sum _{i=1}^n {{e}_\varphi (i)}\right] \end{aligned}$$
(26)

The estimated phase difference \(\hat{{\varphi }}(n)\) will be used for two things: first, to update \(\mathbf{B}(n)\) and, second, to form the input for the next stage. Updating \(\mathbf{B}(n)\) has been discussed in Eqs. (19) and (20). The preparation of the signal for the next stage will be discussed in Sect. 3.4.

3.3 Equalizer

The equalization scheme used in the proposed method is similar to [7] and includes a DFE with \({\mathbf {w}}_\mathbf{f} (n)\) and \({\mathbf {w}}_\mathbf{b} (n)\) as the feedforward and feedback FIR filters, respectively, and a channel estimator \({\mathbf {w}}_\mathbf{h} (n)\). As discussed in [7], the addition of the channel estimator \({\mathbf {w}}_\mathbf{h} (n)\) can improve the performance of the structure in terms of both stronger CCI attenuation and more effective ISI compensation. The updating of \({\mathbf {w}}_\mathbf{f} (n)\) is done using the signal \({y}_{t} (n)\) (Eq. 14) and accordingly \({e}_\mathrm{MMA} (n)\) in Eq. (18). As was proposed in [11], \({\mathbf {w}}_\mathbf{f} (n)\) and \({\mathbf {w}}_\mathbf{p} (n)\) can be updated jointly. Further, \({\mathbf {w}}_\mathbf{h} (n)\) and \({\mathbf {w}}_\mathbf{a} (n)\) can also be jointly updated using \({e}_{s} (n)\) defined in Eq. (21), as proposed in [7]. The updating equations are then given by:

$$\begin{aligned} {\mathbf {w}}_\mathbf{f} (n+1)= & {} {\mathbf {w}}_\mathbf{f} (n)-{\mu }_\mathbf{f} \mathbf{u}_\mathrm{top} (n) {e}_\mathrm{MMA}^{*} (n) \end{aligned}$$
(27)
$$\begin{aligned} {\mathbf {w}}_\mathbf{h} (n+1)= & {} {\mathbf {w}}_\mathbf{h} (n)+{\mu }_\mathbf{h} \mathbf{d}_\mathbf{h} (n) {e}_{s}^{*} (n), \end{aligned}$$
(28)

where \(\mathbf{u}_\mathrm{top} (n)\) and \(\mathbf{d}_\mathbf{h} (n) \) are defined in Eqs. (8) and (10). \({\mu }_{\mathbf{f}}\) and \({\mu }_\mathbf{h}\) are learning step sizes.

The postcursor response of the channel convolved with \({\mathbf {w}}_\mathbf{f} (n)\) would be cancelled by \({\mathbf {w}}_\mathbf{b} (n)\) [7]. Thus, \({\mathbf {w}}_\mathbf{b} (n)\) can be designed using Eq. (12). The decision maker maps its input to the nearest alphabet in terms of Euclidean distance.

3.4 Preparing the Signal for the Next Stage

The input to the second stage, \(\mathbf{x}_{2} (n)\) (see Fig. 2), is supposed to include all signals received by the antenna except the signal from the user detected at the first stage. \(\mathbf{x}_{2} (n)\) can be obtained as follows. First, the estimated phase \(\hat{{\varphi }}(n)\) will be used to compensate for any gain mismatch of the AGSC at the DOA of the first user. The beampattern of the AGSC at \(\hat{{\varphi }}(n)\) is obtained using [22]:

$$\begin{aligned} \varOmega =\sum _{m=0}^{M-2} e^{-jm \hat{{\varphi }}(n)}{w}_{m}^{*}, \end{aligned}$$
(29)

where

$$\begin{aligned} \mathbf{w}=[w_0 , w_{1} ,\ldots , w_m ,\ldots , w_{M-2}]^\mathrm{T}={\mathbf {w}}_\mathbf{p} (n)-\mathbf{B}(n) {\mathbf {w}}_\mathbf{a} (n). \end{aligned}$$
(30)

Then, in order to normalize the beampattern to unity at \(\hat{{\varphi }}(n), {u}_\mathrm{top} (n)\) is multiplied by \(G=1/\varOmega \). The resulting signal, \(\hat{{t}}(n)=G {u}_\mathrm{top} (n)\;\) in Fig. 2, is an approximation of the signal from the first user received at the first stage [i.e., \(t_{q} (n)\) in Eq. (2) plus noise]. In order to form the input for the second stage, \(\mathbf{x}_{2} (n)\), the received signal from the first user will be subtracted from the antenna input \(\mathbf{x}_{1} (n)\). To this end, \(\hat{{t}}(n)\) will be multiplied by the estimated array manifold vector formed by \(\hat{{\varphi }}(n)\) as follows:

$$\begin{aligned} {\hat{\mathbf{a}}}(n)=[ 1 \quad {e^{-j\hat{{\varphi }} (n)}} \quad {\ldots }\quad {e^{-j(M-1)\hat{{\varphi }} (n)}}]^\mathrm{T} \end{aligned}$$
(31)

The input for the second stage then becomes:

$$\begin{aligned}&\mathbf{x}_{2} (n)=\mathbf{x}_{1} (n)-{\hat{\mathbf{t}}}(n) \nonumber \\&\quad \hbox {where } {\hat{\mathbf{t}}}(n)=\hat{{t}}(n)\;{\hat{\mathbf{a}}}(n). \end{aligned}$$
(32)

In general, the input to the \((q+1)\)th stage is \(\mathbf{x}_{q +1} (n)=\mathbf{x}_{q} (n)-{\hat{\mathbf{t}}}(n)\) (see Fig. 1), obtained in the same way as described in Eqs. (29)–(32).

3.5 Stage-Switching Scheme

A problem with multistage methods [2, 10, 11, 18, 19, 23] is that the residual error from the \(q\)th stage \((q=1, 2, \ldots , Q)\) will propagate to the next one due to the fact that each stage is fed from the previous one. This drawback results in slow convergence and larger errors for the last stages. To alleviate this problem, the MSE value of the reconstruction error at each iteration will be used to reassign users to the stages. Since the true transmitted data are not available, the MSE (see Fig. 2) can be estimated using [6] (for \(q=1, 2, \ldots , Q\)):

$$\begin{aligned} \hbox {MSE}^{(q)}(n+1)=\lambda \hbox {MSE}^{(q)}(n)+(1-\lambda ) \left| { \hat{{d}}(n-\Delta )-{y}_{t} (n) } \right| ^{2}, \end{aligned}$$
(33)

where \(\lambda \) is a constant forgetting factor (set to 0.99). If :

$$\begin{aligned} \hbox {MSE}^{(q+1)}(n+1)<\beta \hbox {MSE}^{(q)}(n+1) \quad \hbox {where} \ 0<\beta \le 1 \end{aligned}$$
(34)

then all the parameters of the \((q+1)\)th stage will be replaced by the parameters of the \(q\)th stage. This is equivalent with switching the inputs of stages \(q+1\) and \(q\).

Remark 1

The STE proposed here is based on combining existing methods [6, 7, 11, 18, 19, 24, 26] in an innovative way to achieve blind source separation. The main contributions of this paper are highlighted as follows:

  • In the proposed STE, a multistage algorithm is combined with a STE to perform joint CCI and ISI cancelation. The multistage algorithm is motivated by Shynk and Gooch [18], Shynk et al. [19] where only the beamforming aspect of blind source separation was considered without any equalization. The STE is motivated by Lee and Wu [7] where the case of the DOA of the desired user being assumed to be fixed as well as known a priori is dealt with. In the STE proposed here, there are two innovations. First, an AGSC is introduced to account for the changing DOA. Second, a blind approach is proposed which can dynamically estimate the unknown and possibly time-varying DOA of each source. This is done using the phase difference between two subarray beamformers, a concept used for DOA estimation in [24]. However, in [24], pilot data were required (which is not available here). The contribution of this paper is to combine the aforementioned methods and propose a multistage STE which can separate sources in a blind manner (neither DOA nor training sequence is available).

  • Further, the proposed STE addresses in an innovative way another problem of multistage methods, which is the inter-stage error propagation as a result of feeding each stage with the output of the previous stage. This can lead to significant performance degradation [25]. The STE proposed in this paper includes a stage-switching scheme which not only can alleviate the error propagation but also speeds up the convergence as simulations will show. This is the other important contribution of the paper.

4 Convergence Acceleration Algorithms for the Proposed STE

To speed up the convergence of the proposed multistage STE, two algorithms are suggested in this section. The first algorithm, which is a modified version of the one in [21], estimates initial DOAs for all users. The second algorithm updates the learning step sizes to provide a good compromise between convergence speed and stability.

4.1 The Phase Shift Initialization

During the initialization of the STE, an initial estimate for the DOAs can be obtained using any of the well-known DOA estimation algorithms. However, algorithms such as MUSIC, ESPRIT or MODEX require many computations and algorithms like matrix pencil, which entails less computations and requires at least \(2Q\) antenna elements (\(Q\) is the number of users). To speed up the convergence of the proposed STE, a simple approach is proposed here which can provide initial estimates of the DOAs and the associated \(\hat{{\varphi }} (0)\) very fast.

Consider \(\mathbf{x}_{1} (n)\) in Eq. (1) and assume that there is no noise or that noise power is very small. By applying \(N\)-point discrete Fourier transform (DFT) to \(\mathbf{x}_{1} (n)\), i.e., DFT in the spatial dimension and zero-padding by adding \(N-M\) zeros, we get:

$$\begin{aligned} X_{1} (k)= & {} \sum _{i=0}^{N-1} {x_i (n) e^{-j \frac{2\pi }{N}i k}} =\sum _{i=0}^{N-1} {\left( {\sum _{q=1}^Q { t_{q} (n)e^{-j\varphi _{q} i}}} \right) e^{-j\frac{2\pi }{N}i k}} \nonumber \\= & {} \sum _{q=1}^Q {t_{q} (n) \left( {\sum _{i=0}^{N-1} { e^{-j(\varphi _{q} +\frac{2\pi }{N}k)i}}} \right) } ,\quad \quad \hbox {for} \quad 0\le k\le N-1 \end{aligned}$$
(35)

where \(\varphi _{q}\) is \(\pi \sin \theta _{q}\). It can be easily shown that Eq. (35) can be expressed as:

$$\begin{aligned} X_{1} (k)=\sum _{q=1}^Q { t_{q} (n)e^{-j\left( {\varphi _{q} +\frac{2 \pi k}{N}} \right) \left( {\frac{N-1}{2}} \right) }} \frac{\sin \left( {\frac{N\varphi _{q}}{2}+\pi k} \right) }{\sin \left( {\frac{\varphi _{q}}{2}+\frac{\pi k}{N}} \right) },\quad \hbox {for } 0\le k\le N-1 \end{aligned}$$
(36)

where \(X_{1} (k)\) is the DFT of the input signal (in the spatial dimension) for the \(n\)th snapshot. The \(Q\) largest peaks of \(X_{1} (k)\) would occur at \(k=-N\varphi _{q} /2 \pi \,(q=1, 2, \ldots , Q)\) or equivalently at:

$$\begin{aligned} \theta _{q} =\arcsin \left( -\frac{2 k}{N}\right) \quad \hbox {for } q=1, 2, \ldots , Q \end{aligned}$$
(37)

Thus, by finding the value of \(k\) at the maxima of \(X_{1} (k)\), an initial estimate of the DOAs can be obtained. In the presence of noise, this estimate may not be accurate if \(M\) (the number of sensors) is not large compared to \(Q\) (the number of users). However, in such a case, the accuracy can be improved by averaging the peaks of the DFTs, \(X_{1} (k)\), obtained for a sequence of \(J\) snapshots. This will be illustrated with simulations in Sect. 6.

4.2 Choosing Appropriate Learning Step Sizes

In order to choose an appropriate value for the step sizes \({\mu }_\mathbf{p} , {\mu }_{\mathbf{f}} , {\mu }_\mathbf{h}\) and \({\mu }_{\mathbf{a}}\), a trade-off between convergence speed and stability should be considered. In the updating equations, Eqs. (17), (22) and (27), \(\mathbf{x}_{1}^{(\mathrm{top})} (n), \mathbf{r} (n)\) and \(\mathbf{u}_\mathrm{top} (n)\) are obtained from the STE input. Therefore, in the presence of several users, their amplitude may be quite large. In this case to guarantee convergence, \({\mu }_\mathbf{p} , {\mu }_{\mathbf{a}}\) and \({\mu }_{\mathbf{f}}\) should be set to small values. From Fig. 2, it can be seen that the \({\mathbf {w}}_\mathbf{h} (n)\) is fed from the output of the decision maker, \(\mathbf{d}_\mathbf{h} (n)\) in Eq. (10), which has a considerably smaller amplitude than \(\mathbf{x}_{1}^{(\mathrm{top})} (n), \mathbf{r} (n)\) and \(\mathbf{u}_\mathrm{top} (n)\), especially in the presence of several users. Thus, \({\mu }_\mathbf{h}\) in Eq. (28) can be set to a larger value compared to \({\mu }_\mathbf{p} , {\mu }_{\mathbf{f}}\) and \({\mu }_{\mathbf{a}}\).

Based on these observations, the convergence speed of the proposed STE can be increased. All stages start with small step sizes to ensure convergence. When the MSE at a stage converges to a satisfactory small value, this indicates that both errors \({e}_\mathrm{MMA} (n)\) and \({e}_{s} (n)\) are small for this stage, as well as for all previous stages since all stages are sorted from minimum to maximum MSE. It can therefore be expected that all previous stages have locked onto a user and the signals received from the detected users are (almost) eliminated from the inputs to the next stages. Consequently, the input signals for the next stages are expected to have smaller amplitudes than the input signals to the earlier stages, and therefore, the magnitudes of \({\mu }_\mathbf{p} , {\mu }_{\mathbf{f}}\) and \({\mu }_{\mathbf{a}}\) for all stages can be increased. This is implemented using the following algorithm:

Given a constant \(\xi \), an appropriate threshold depending on the modulation scheme [6] and \(\gamma _{{i}}\), constants arranged in this order \(\gamma _{{Q-1}} >\gamma _{{Q-2}} >\cdots >\gamma _\mathrm{1}\), then:

\(\hbox {if}\quad \sum \limits _{k=1}^{Q-1} {\hbox {MSE}^{(k)}} <(Q-1) \xi \)

   Multiply the initial values of \({\mu }_\mathbf{p} , {\mu }_\mathbf{f}\) , and \({\mu }_\mathbf{a}\) with \(\gamma _{\mathrm{Q-1}}\)

\(\hbox {elseif}\quad \sum \limits _{k=1}^{Q-2} {\hbox {MSE}^{(k)}} <(Q-2) \xi \)

   Multiply the initial values of \({\mu }_\mathbf{p} , {\mu }_\mathbf{f}\) , and \({\mu }_\mathbf{a}\) with \(\gamma _{\mathrm{Q-2}}\)

\(\begin{array}{l} \vdots \\ \hbox {elseif}\quad \hbox {MSE}^{(1)}< \xi \\ \end{array}\)

   Multiply the initial values of \({\mu }_\mathbf{p} , {\mu }_\mathbf{f}\) , and \({\mu }_\mathbf{a}\) with \(\gamma _\mathrm{1}\)

\(\hbox {else}\)

   Keep the initial values of \({\mu }_\mathbf{p} , {\mu }_\mathbf{f}\) , and \({\mu }_\mathbf{a}\)

5 Implementation of the Proposed STE

The multistage STE algorithm can be summarized as follows:

For all stages:

Initialization Mode:

         Initialize the phase shift \(\hat{\varphi } (0)\) using the algorithm described in Sect. 4.1. Based on \(\hat{\varphi } (0)\), initialize \({\mathbf {w}}_\mathbf{p} (n)\) using Eq. (19). Set \({\mathbf {w}}_\mathbf{a} (0)\) and \({\mathbf {w}}_\mathbf{h} (0)\) to all-zero vectors. Set \({\mathbf {w}}_\mathbf{f} (0)\) to an all-zero vector except for the first element which is set to one. Set the initial \(\hbox {MSE}^{(q)}(0)\) to one for \(q=1, 2, \ldots , Q\). Set \(n\) to zero and go to the operating mode.

Operating Mode:

      1.    Using \(\hat{\varphi } (n)\), form \(\mathbf{B}(n)\) using the method described in Sect. 3.1.

      2.    Form w defined in Eq. (30), and normalize the beampattern to unity at \(\hat{{\varphi }} (n)\) [multiplying the beampattern by \(1/\varOmega \), where \(\varOmega \) is defined in Eq. (29)].

      3.    Using \(\hat{\varphi } (n)\), form the input for all stages using Eqs. (31) and (32).

      4.    Using Eq. (12), find \({\mathbf {w}}_\mathbf{b} (n)\). Set the length of \({\mathbf {w}}_\mathbf{b} (n)\) to \({\ell }_{1} +{\ell }_{2} -2-\Delta \) (\({\ell }_{1}\) and \({\ell }_{2}\) are the lengths of \({\mathbf {w}}_\mathbf{f} (n)\) and \({\mathbf {w}}_\mathbf{h} (n)\), and \(\Delta \) is the decision delay in Fig. 2).

      5.    Find \({\mathbf {w}}_\mathbf{p} (n+1), {\mathbf {w}}_\mathbf{a} (n+1), {\mathbf {w}}_\mathbf{f} (n+1)\), and \({\mathbf {w}}_\mathbf{h} (n+1)\) using Eqs. (17), (22), (27) and (28), respectively.

      6.    Estimate the new phase difference (\(\hat{{\varphi }} (n+1)\)) using Eq. (26).

      7.    Compute \(\hbox {MSE}^{(q)}(n+1)\) for \(q=1, 2, \ldots , Q\) using Eq. (33), and sort the stages based on the rule described in Eq. (34) so that the stage with lowest MSE is first and the one with the largest is last.

      8.    Update the learning step size \({\mu }_\mathbf{p} , {\mu }_{\mathbf{f}}\), and \({\mu }_{\mathbf{a}}\) using the algorithm described in Sect. 4.2.

      9.    Increase \(n\) by one and go back to step 1.

Remark 2

The proposed algorithm has some parameters that need to be appropriately selected. The choice of learning step sizes has been already discussed in Sect. 4.2. In the rest of this section, other important parameters are discussed.

Equalizer

  • \({\ell }_{1}\) , length of \({\mathbf {w}}_\mathbf{f}\): This FIR filter is updated using the MMA (Sect. 3.3). To choose the appropriate length, one should consider the compromise between convergence speed and error rate. As mentioned in [26], fewer taps can result in faster convergence but higher error rate. On the other hand, a longer filter during blind startup might degrade the performance. One way to provide a good trade-off between convergence speed and accuracy is to add more taps after MSE drops below an acceptable level (which depends on the modulation scheme) [26].

  • \({\ell }_{2}\), length of \({\mathbf {w}}_\mathbf{h}\): This filter is used to estimate the channel between each source and the antenna. Given \(L\) [maximum delay of all channels, see Eq. (2)] and \(\Delta \) (decision delay, see Fig. 2), \({\ell }_{2}\) should satisfy \({\ell }_{2} \ge L+\Delta \).

  • \({\ell }_{3}\), length of \({\mathbf {w}}_\mathbf{b}\): As was discussed in Sect. 3.3, \({\mathbf {w}}_\mathbf{b}\) is responsible for canceling the postcursor response of the channel convolved with \({\mathbf {w}}_\mathbf{f}\). To this end, \({\ell }_{3}\) should be at least \({\ell }_{1} +{\ell }_{2} -2-\Delta \) [7].

Beamformer

  • If \(Q\) sources are present, ideally the proposed method locks onto one source at each stage and places up to \(Q-1\) nulls at the beam pattern of each stage to strongly attenuate other sources. Therefore, each sub-beamformer of the proposed method needs to have at least \(Q\) taps. As shown in Fig. 2, each sub-beamformer has \(M-1\) elements, thus \(M\ge Q +1\). A high value for \(M\) would improve spatial selectivity at the cost of more computation and slower convergence. In our simulations presented in Sect. 6, \(M=Q +2\) works well.

Stage-Switching Scheme

  • \(\beta \) controls the stage-switching scheme [see Eq. (34)]. If \(\beta \) is too small, unnecessary switching might occur. If \(\beta \) is close to one, switching will rarely happen which degrades the performance (higher MSE and slower convergence). The choice of \(\beta \) depends on the modulation scheme which is 4-QAM in our simulations. For this type of modulation, the proposed switching scheme showed a good behavior when \(\beta \) is set to 0.25.

6 Simulation Results

In this section, five simulations are presented to illustrate the performance of the proposed STE:

  • The first simulation shows the performance of the DOA initialization method discussed in Sect. 4.1.

  • The second simulation demonstrates the effectiveness of the stage-switching scheme.

  • In the third simulation, the robustness and reliability of the proposed STE is evaluated for 250 independent simulations with randomly generated signals and noise.

  • In the fourth simulation, the performance of the proposed algorithm in terms of accuracy of the estimated DOA is compared with the multistage algorithm presented in [18, 19] to highlight the value of the suggested stage-switching scheme.

  • The fifth simulation illustrates the algorithm’s ability to track moving users in the presence of time-varying channels.

In all simulations, the modulation scheme used is 4-QAM, and the data transmitted from each user are randomly generated. Further, for the first four simulations, four users are considered with the fixed DOAs, \(\theta _{1} =- 60^{\circ }, \theta _{2} =-30^{\circ }, \theta _{3} =15^{\circ }\) and \(\theta _4 =45^{\circ }\). The channels in these four simulations are also fixed and given by:

$$\begin{aligned}&\mathbf{h}_{1} =1 \hbox { (fading-free)}, \quad \mathbf{h}_{2} =[0.8+0.6j, 0.1, 0.4+0.6j], \\&\mathbf{h}_{3} =[0.407, 0.815, 0.407],\quad \mathbf{h}_4 =[0.6, 1, 0.8]. \end{aligned}$$

The zeros of \(\mathbf{h}_{2}, \mathbf{h}_{3}\) and \(\mathbf{h}_4\) are shown in Fig. 3. As it can be seen, \(\mathbf{h}_{3}\) and \(\mathbf{h}_4\) are non-minimum phase channels and quite challenging to be equalized since they have zeros close to the unit circle [14]. The simulation parameters for the second, third and fifth simulations are summarized in Table 2.

Fig. 3
figure 3

Zeros location for different channels. a \(\mathbf{h}_{2}\), b \(\mathbf{h}_{3}\), c \(\mathbf{h}_4\)

Table 2 Simulations parameters

6.1 First Simulation

In this simulation, the performance of the DOA estimation algorithm described in Sect. 4.1 will be evaluated for different number of sensors (\(M\)) and SNR. To this end, for a fixed \(M\) and SNR, the DOAs are estimated for 1000 independent simulations. For all cases, 512-point DFT is considered (\(N=512\)). The results are shown in Fig. 4 for one snapshot (\(J=1\)). Red triangles indicate the real DOA in this figure. It can be seen that the estimation accuracy gets better as \(M\) increases. The estimated DOA is accurate even in the presence of severe noise provided that \(M\) is large enough which is consistent with the results in [21]. The accuracy of the estimated DOA for small M can be increased, as mentioned in Sect. 4.1, by finding the peaks of the average of \(X_{1} (k)\) for a sequence of \(J\) snapshots. To illustrate the effect of averaging, the results for two different SNRs, \(M=6\) and \(J=20\) are shown in Fig. 5.

Fig. 4
figure 4

First simulation, DOAs estimation for different SNRs and \(M\) (number of antenna elements) with one snapshot (\(J=1\))

Fig. 5
figure 5

First simulation, DOAs are estimated by averaging over a sequence of 20 snapshots

6.2 Second Simulation

In this simulation, the effectiveness of the proposed stage-switching scheme is illustrated and it will be shown that the proposed STE is robust to mismatch between the initial DOA estimate and the real one. For the system considered here, the first and second channels are the easiest to be equalized since \(\mathbf{h}_{1} (\theta _{1} =-60^{\circ })\) is fading-free and \(\mathbf{h}_{2}\) (\(\theta _{2} =- 30^{\circ }\)) is minimum phase. Further, the zeros of \(\mathbf{h}_{3} \,(\theta _{3} =15^{\circ })\) are closer to the unit circle than the zeros of \(\mathbf{h}_4 (\theta _4 =45^{\circ })\), which implies that fading is deeper for \(\mathbf{h}_{3}\) [14] and, thus, equalizing \(\mathbf{h}_{3}\) is more challenging than equalizing \(\mathbf{h}_{4}\). Therefore, the expected behavior of the proposed STE after convergence is to lock onto the users according to the following order: the first user at the first stage, the second user at the second stage, the fourth user at the third stage and the third user at the final stage. Let us now initialize the STE as follows: stage 1 with the DOA of \(10^{\circ }\) (close to the third user), stage 2 with the DOA of \(50^{\circ }\) (close to the fourth user), stage 3 with the DOA of \(-35^{\circ }\) (close to second user) and stage 4 with the DOA of \(-65^{\circ }\) (close to the first user). This initialization implies \(5^{\circ }\) initial DOA mismatch for all users and, further, assigns the users in the opposite order than the one expected based on the above discussion. The third user being the most challenging to be equalized is expected to produce a significant residual error at the beginning which will be propagating to the other stages. The level of residual error at each stage is an indication of the difficulty of equalizing the channel for the user detected by this stage. Thus, the initialization chosen here represents one of the worst possible initializations. The function of the switching scheme in the proposed STE is to place the stage with the largest MSE last and thus lead to a small residual error for all stages at convergence. To illustrate this, the MSE of all stages as defined in Eq. (33) is plotted in Fig. 6a, b for the following two cases: when all stages are fixed and no switching is allowed and when the stages are switched based on the rule defined by Eq. (34). Clearly, the switching scheme will lead to faster convergence and the final MSE is considerably lower than in the case where there is no switching. The estimated DOAs for these two cases are shown in Fig. 6c, d. As it can be seen, the estimation accuracy is better when switching is used. To illustrate the sequence of switching in time, the estimated DOAs for different stages at the switching instances as well as the initial (\(n=0\)) and final (\(n=15{,}000\)) DOA estimations are shown in Table 3. Colors in this table represent different users and are also indicating the level of difficulty for equalizing each user from green (the easiest user) to red (the most difficult user). Clearly, after convergence, the order of the user detected by each stage is from the easiest to the most difficult, consistent with the earlier discussion.

Fig. 6
figure 6

Second simulation, MSE and estimated DOAs for two cases: when all stages are fixed and no switching is allowed and when the stages are switched based on the rule defined in Eq. (34). a MSE with no switching, b MSE with switching, c DOAs with no switching, d DOAs with switching

Table 3 Second simulation, the Estimated DOAs (in degrees) at \(n\)th sample. Entries with the same color correspond to the same user

In order to clarify the improvement of using the proposed AGSC instead of the beamformer used in [11] (consisting of only updating \({\mathbf {w}}_\mathbf{p} (n))\), the final beampatterns normalized with \(G=1/\varOmega \) [Fig. 2 and Eq. (29)] for these two cases are shown in Fig. 7. It can be seen that the AGSC of the first stage tries to pass data propagating from \(-60^{\circ }\) and reject other users. That is why the beampattern of the first beamformer has three deep nulls at \(-30^{\circ }, 15^{\circ }\) and \(45^{\circ }\). For the second stage, since the signal of one of the users at \(-60^{\circ }\) has been eliminated by the first stage from the input of the second stage [Eq. (32) and Fig.1], the beampattern has just two deep nulls corresponding to the two remaining users at \(15^{\circ }\) and \(45^{\circ }\). Due to the same reason, the third beampattern has just one deep null at \(15^{\circ }\). Clearly, the use of AGSC leads to deep nulls and strong CCI attenuation for the proposed STE.

Fig. 7
figure 7

Second simulation, the normalized beampattern for a \({\mathbf {w}}_\mathbf{p} (n)-\mathbf{B}(n){\mathbf {w}}_\mathbf{a} (n)\), and b \({\mathbf {w}}_\mathbf{p} (n)\)

6.3 Third Simulation

In this simulation, a similar scenario as in the second simulation is considered. The difference is that no initial DOAs are assumed as it was done in the second simulation. Instead, the method proposed in Sect. 4.1 is used for initialization of the DOAs. The simulation is repeated 250 times with randomly generated signals and noise, and the averaged MSE (dB) over 250 independent simulations is shown in Fig. 8. The initial DOAs along with the final estimated DOAs are shown in Fig. 9a, b. Red triangles show the real DOAs. Both Figs. 8 and 9 illustrate the good performance of the proposed STE in terms of low MSEs and accuracy of the estimated DOAs.

Fig. 8
figure 8

Third simulation, the averaged MSE over 250 simulations

Fig. 9
figure 9

Third simulation, the estimated DOAs. a Initial DOAs, b final DOAs

6.4 Fourth Simulation

The problem with multistage methods, as mentioned before, is that the residual error from the \(q\)th stage (\(q=1, 2, \ldots , Q\)) will propagate to the next one due to the fact that the input to each stage is the output of the previous one. To illustrate this drawback, the performance of the multistage algorithm proposed in [18, 19] in terms of accuracy of the estimated DOAs is considered for two different cases: when all four channels are ideal (non-fading) and when the four channels are the ones used in the third simulation. The algorithm in [18, 19] is comprised of two sets of weights: (1) the adaptive beamformer (based on the constant modulus algorithm) and (2) an adaptive signal canceller (based on the LMS algorithm). To do a fair comparison between this method and the proposed approach, the length for these weights is set to 6 (same as \(M\) in the third simulation). Further, \({N}_{s}\) and SNR are set to the same value as the third simulation. The estimated DOAs for 250 independent simulations are shown in Fig. 10. The actual DOAs are shown by the red triangles. As it can be seen in Fig. 10, the deviation of the estimated DOA from the actual DOA for the case of non-ideal channels is more than that for the ideal case. This is due to error propagation. The stage-switching scheme proposed here can significantly alleviate this problem as is shown in Fig. 9b. To provide a numerical comparison for the accuracy of the estimated DOAs, the root-mean-square error (RMSE) was calculated for the simulations of Figs. 9b and 10b. For the proposed method, the RMSE (0.009\(^\circ \)) in Fig. 9b is significantly less than the RMSE (1.469\(^\circ \)) of the multistage algorithm of [18, 19] in Fig. 10b.

Fig. 10
figure 10

Fourth simulation, the estimated DOAs using [18, 19]. a DOAs for ideal channels, b DOAs for non-ideal channels

6.5 Fifth Simulation

In this simulation, the performance of the proposed STE in the case of users with varying DOA and time-varying channel will be illustrated. Consider the case of three users with initial DOAs \(\theta _{1} =-35^{\circ }, \theta _{2} =0^{\circ }\) and \(\theta _{3} =45^{\circ }\). The initial channel for the first user is the same as in Fig. 3b, and the two other channels are fading free. From the \(3000\)th sample to the \(5000\)th sample, all DOAs are changing very fast in a linear fashion as shown in Fig. 11a. It can be seen that at around the \(4000\)th sample, the first and second user’s DOA overlap, which makes the scenario very challenging. Also, at the \(3000\)th sample, a time-varying zero at \(0.8\exp ( j2\pi /3)+0.4\exp ( j2\pi [n-3000]/10{,}000 )\) will be added to the second channel \(\mathbf{h}_{2}\), as in [6], and the fading-free channel \(\mathbf{h}_{3}\) will be replaced by \([1, 0.8, 0.6]\). For the whole time, \(\mathbf{h}_{1}\), a channel with deep fading, remains the same. At the \(5000\)th sample, all DOAs and channels get fixed and remain unchanged. The zero locations for \(\mathbf{h}_{2}\) at the \(3000\)th and \(5000\)th samples are shown in Fig. 11b which indicates that the channel is changing from minimum phase to non-minimum phase. The zeros of \(\mathbf{h}_{3}\) after the \(5000\)th sample are shown in Fig. 11c, indicating a minimum phase channel. The MSE averaged over 250 simulations is shown in Fig. 12a. Further, the final beampattern [normalized with \(G=1/\varOmega \), see Eq. (29)] for all three stages is shown in Fig. 12b. It can be seen that the beampattern at each stage has a unity gain at the DOA of the detected user (shown by arrows) and deep nulls (63–66 dB) in the direction of the other users. This indicates a good performance in CCI cancellation for the proposed STE. Finally, the average of the estimated DOAs is shown in Fig. 12c. The dotted green lines show the real DOAs. As it can be seen, the estimated DOA converges fast and is accurate which demonstrates the ability of STE to track moving users in the presence of time-varying channels.

Fig. 11
figure 11

Fifth simulation, a time-varying DOAs for the three users, b moving zero of \(\mathbf{h}_{2}\), and c \(\mathbf{h}_{3}\) after the 5000th sample

Fig. 12
figure 12

Fifth simulation, a averaged MSE, b the normalized beampattern, c the average of estimated DOAs

7 Conclusions

In this paper, a new multistage STE is proposed for blind source separation. Each stage is equipped with an adaptive version of the GSC, called AGSC, a DOA estimator and an equalizer. The beamformer and equalizer are jointly being updated to combat both CCI and ISI effectively. The DOA, possibly time-varying, of the captured signal is estimated and tracked using the phase difference of two subarray beamformers. The estimated DOA is being used by the AGSC to provide strong co-channel interference cancellation. Further, the estimated DOAs will be used to form the input to the next stage. A simple, yet efficient, DOA estimation algorithm is also proposed for estimating the initial DOAs and speed up the convergence. In order to significantly alleviate inter-stage error propagation, a sorting algorithm is used which assigns detected sources to different stages based on the value of the MSE. Simulation results demonstrate the good performance of the proposed STE and show that it can effectively deal with changing DOAs and time-varying channels.

Future work includes the theoretical analysis of the DOA estimation algorithm (presented in Sect. 4.1 and used for the initialization) as well as the space–time frequency analysis of the proposed STE for the steady state. Further, it would be interesting to extend the proposed STE to the case when spatial ISI due to multipath is present in addition to temporal ISI and CCI.