1 Introduction

Massive MIMO, or large-scale MIMO, systems are under active investigation because of their remarkable potential to increase spectral efficiency and power efficiency even with very simple linear transmitters and receivers [13]. To avoid per-antenna pilot training on the downlink, TDD (time-division duplexing) is adopted and the downlink CSI is obtained by exploiting channel reciprocity; thus the overhead related to channel training scales linearly with the number of mobile users per cell (\(K\)) and is independent of the number of antennas per base station (\(M\)) [4]. In the multi-cell scenario, non-orthogonal training sequences must be used because pilot resources are limited. This non-orthogonality causes pilot contamination: the channel estimate at the target cell is polluted by users in other cells. Many papers have studied the impact of pilot contamination; for example, the spectral efficiency of massive MIMO with pilot contamination has been studied in [1, 5, 6] for both uplink and downlink transmission under i.i.d. Rayleigh fading channels, and [7–9] study the spectral efficiency under correlated channels.

The papers mentioned above all assume that the channel state information (CSI) stays constant during pilot transmission and uplink data transmission. However, in practical TDD systems, user mobility, transmission delay and processing delay cause the CSI to become outdated. The assumption of a locally time-invariant (block-fading) channel then breaks down, and an effective approach is to predict the channel values at the times when they will be used [10, 11]. How, then, does the predicted CSI affect the spectral efficiency of a practical massive MIMO system?

In this work, we study the uplink sum-rate of massive MIMO under correlated channels, taking both pilot contamination and CSI delay into consideration. Using the correlation between the actual channel and its estimate, together with the channel's time correlation, we derive an equivalent channel model with MMSE channel estimation and one-tap prediction. Employing this equivalent channel model, we then obtain a lower bound on the uplink sum-rate and study its asymptotic behavior as the number of base station antennas grows without bound. We find that if the scheduled \(k\)-th users of all cells have the same prediction coefficient, the uplink sum-rate equals the one with no CSI delay when the number of BS antennas grows without bound at a much greater rate than the number of users.

The notation adopted in this paper conforms to the following convention. Matrices are represented with uppercase boldface and vectors (meaning column vectors) with lowercase boldface. \({\left( \cdot \right) ^ * }\), \({\left( \cdot \right) ^{\mathrm{T}}}\) and \({\left( \cdot \right) ^{\mathrm{H}}}\) represent conjugate, transpose and Hermitian transpose, respectively. \(\left| \cdot \right| \) denotes the modulus and \(\left\| \cdot \right\| \) denotes the spectral norm. \(Tr\left( {\varvec{A}} \right) \) is the trace of \(\varvec{A}\) and \(\det \left( {\varvec{A}} \right) \) denotes the determinant of \(\varvec{A}\). \({\mathrm{Diag}}\left( {\varvec{x}} \right) \) is a diagonal matrix with the entries of \({\varvec{x}}\) on its diagonal. \({{\varvec{I}}_M}\) denotes an \(M \times M\) identity matrix. The operator \(\mathcal{E}\left( \cdot \right) \) denotes expectation, and the covariance operator is given by \({\mathrm{cov}} \left( {{\varvec{x}},{\varvec{y}}} \right) \mathop {=}\limits ^{\Delta } \mathcal{E}\left( {{\varvec{x}}{{\varvec{y}}^{\mathrm{H}}}} \right) - \mathcal{E}\left( {\varvec{x}} \right) \mathcal{E}\left( {{{\varvec{y}}^{\mathrm{H}}}} \right) \). \(\mathcal{C}\mathcal{N}\left( {{\varvec{0}},{\varvec{\varSigma }}} \right) \) denotes the complex Gaussian distribution with mean \({\varvec{0}}\) and covariance matrix \({\varvec{\varSigma }}\).

2 System Model of the Uplink Multi-cell Multi-user Massive MIMO

We consider a system of \(L\) cells with one BS and \(K\) mobile users in each cell. Each BS is equipped with \(M\) antennas, and each user has a single antenna. We assume that the system operates under the TDD protocol with full frequency reuse. Because of TDD operation and reciprocity, the propagation is the same for downlink and uplink transmission. At the BS of cell \(l\), the uplink received baseband signal vector at time \(t\) reads

$$\begin{aligned} {{\varvec{y}}_l}\left( t \right) = {{\varvec{G}}_{l,l}}\left( t \right) {{\varvec{x}}_l}\left( t \right) + \sum \limits _{i \ne l}^L {{{\varvec{G}}_{l,i}}\left( t \right) {{\varvec{x}}_i}\left( t \right) } + {{\varvec{w}}_l}\left( t \right) \end{aligned}$$
(1)

where \({{\varvec{y}}_l}\left( t \right) = {\left[ {{y_{l,1}}\left( t \right) \cdots {y_{l,M}}\left( t \right) } \right] ^T}\) is the received signal vector at time \(t\), \({{\varvec{x}}_l}\left( t \right) = {\left[ {{x_{l,1}}\left( t \right) \cdots {x_{l,K}}\left( t \right) } \right] ^T} \sim \mathcal{C}\mathcal{N}\left( {{\varvec{0}},{{\varvec{I}}_K}} \right) \) is the overall transmitted signal vector of the \(l\)-th cell, \({x_{l,k}}\left( t \right) \) is the transmit signal of user \(k\) in cell \(l\), and \({{\varvec{w}}_l}\left( t \right) \sim \mathcal{C}\mathcal{N}\left( {{\varvec{0}},{\gamma _{{\mathrm{UL}}}}{{\varvec{I}}_M}} \right) \) is the complex additive noise vector. \({{\varvec{G}}_{l,i}}\left( t \right) = \left[ {{{\varvec{g}}_{l,i,1}}\left( t \right) \cdots {{\varvec{g}}_{l,i,K}}\left( t \right) } \right] \) is the uplink channel matrix at time index \(t \) from the \(K\) users of cell \(i\) to the BS of cell \(l\).

We assume all of the \(K\) users of each cell have the same antenna correlation, and model the channel vector \({{\varvec{g}}_{l,i,k}}\left( t \right) \) as

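$$\begin{aligned} {{\varvec{g}}_{l,i,k}}\left( t \right) = \sqrt{cd_{l,i,k}^{ - \alpha }{s_{l,i,k}}} \,{\varvec{R}}_{l,i}^{\frac{1}{2}}{{\varvec{h}}_{l,i,k}}\left( t \right) \end{aligned}$$
(2)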

where \(\alpha \) is the path loss exponent, typically between 3.0 and 5.0, \(c\) is the median of the mean path gain at a reference distance \({d_{l,i,k}} = 1\,{\mathrm{km}}\), and \({s_{l,i,k}}\) is a log-normal shadow fading variable. \({{\varvec{h}}_{l,i,k}}\left( t \right) \sim \mathcal{C}\mathcal{N}\left( {{\varvec{0}},{{\varvec{I}}_M}} \right) \) represents the small-scale fast fading. \({{\varvec{R}}_{l,i}}\) is the deterministic receive correlation matrix from the \(K\) users of cell \(i\) to the BS of cell \(l\).

Then the composite channel \({{\varvec{G}}_{l,i}}\left( t \right) \) can be expressed as

$$\begin{aligned} {{\varvec{G}}_{l,i}}\left( t \right)&= {\varvec{R}}_{l,i}^{\frac{1}{2}}\left[ {{{\varvec{h}}_{l,i,1}}\left( t \right) \cdots {{\varvec{h}}_{l,i,K}}\left( t \right) } \right] {\varvec{\varLambda }}_{l,i}^{\frac{1}{2}} \nonumber \\&= {\varvec{R}}_{l,i}^{\frac{1}{2}}{{\varvec{H}}_{l,i}}\left( t \right) {\varvec{\varLambda }}_{l,i}^{\frac{1}{2}} \end{aligned}$$
(3)

where \({\lambda _{l,i,k}} \triangleq cd_{l,i,k}^{ - \alpha }{s_{l,i,k}}\) and \({{\varvec{\varLambda }}_{l,i}} = {\mathrm{diag}}\left\{ {\left[ {\begin{array}{*{20}{c}} {{\lambda _{l,i,1}}}&\cdots&{{\lambda _{l,i,K}}} \end{array}} \right] } \right\} \).

Throughout the paper we adopt the following assumptions:

A1: \({h_{l,i,m,k}}\left( t \right) \) is the small-scale fast fading between the \(k\)-th user of cell \(i\) and the \(m\)-th antenna of BS \(l\). It is slowly time-varying according to the Jakes model with Doppler spread \({f_d}\), and \({\rho _{l,i,k}}\left( \tau \right) \triangleq \mathcal{E}\left\{ {h_{l,i,m,k}^ * \left( t \right) {h_{l,i,m,k}}\left( {t + \tau } \right) } \right\} = {J_0}\left( {2\pi {f_d}\left| \tau \right| {T_s}} \right) \), where \({T_s}\) is the symbol period and \({J_0}\left( \cdot \right) \) is the zeroth-order Bessel function of the first kind.

A2: \(\lim {\sup _M}\left\| {{{\varvec{R}}_{l,i}}} \right\| < \infty \) for all \(l\) and \(i\).

A3: \(\lim {\inf _M}\frac{1}{M}Tr\left( {{{\varvec{R}}_{l,i}}} \right) > 0\) for all \(l\) and \(i\).
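As a small numerical illustration of A1, the sketch below evaluates \({\rho _{l,i,k}}\left( \tau \right) = {J_0}\left( {2\pi {f_d}\left| \tau \right| {T_s}} \right) \) for a given user velocity. It is only a sketch under stated assumptions: the function names `bessel_j0` and `jakes_rho` are hypothetical, the Doppler spread is taken as the maximum Doppler shift \(v{f_c}/c\), and \({J_0}\) is evaluated from its standard integral representation with a midpoint rule rather than a library call. The example values (2.3 GHz carrier, \({T_s} = 1/14\) ms, a delay of 4 symbols) mirror the simulation setup of Sect. 4.

```python
import math

def bessel_j0(x, steps=2000):
    """J0(x) = (1/pi) * integral over [0, pi] of cos(x*sin(theta)), midpoint rule."""
    h = math.pi / steps
    total = sum(math.cos(x * math.sin((n + 0.5) * h)) for n in range(steps))
    return total * h / math.pi

def jakes_rho(v_kmh, carrier_hz, delay_symbols, symbol_period_s):
    """Time-correlation coefficient of assumption A1 for a user moving at v_kmh."""
    c = 299_792_458.0                       # speed of light in m/s
    f_d = (v_kmh / 3.6) * carrier_hz / c    # assumed: Doppler spread = max Doppler shift
    return bessel_j0(2 * math.pi * f_d * abs(delay_symbols) * symbol_period_s)

# Example with the parameters of the numerical section (velocities are illustrative).
rho_30 = jakes_rho(30.0, 2.3e9, 4, 1e-3 / 14)
rho_120 = jakes_rho(120.0, 2.3e9, 4, 1e-3 / 14)
```

Under these assumptions the coefficient stays close to 1 at pedestrian and low vehicular speeds and decreases as the velocity grows, which is the dependence on \(v\) exploited in the sum-rate analysis.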

3 Uplink Sum-Rate Analysis of Massive MIMO with Pilot Contamination and CSI Delay

In this section, we first review minimum mean-square-error (MMSE) channel estimation for massive MIMO under correlated channels, and then use a one-tap prediction filter to predict the delayed channel. We derive an equivalent channel model that allows further analysis of the achievable uplink sum-rate for finite and infinite numbers of BS antennas, and obtain several insightful observations. We also analyze the performance for the i.i.d. channel model.

3.1 Channel Estimation and Prediction

When TDD is adopted, the minimum required number of pilot symbols equals the number of mobile users per cell \(K\) and is independent of the number of antennas per base station \(M\) [4]. Without loss of generality, we therefore assume that the pilot matrix is an identity matrix of size \(K\). Channel estimation is performed by sending pilot sequences of length \(K\) to the BSs, and we assume that all \(L\) base stations share the same frequency band and the same set of \(K\) orthogonal pilot signals. Furthermore, we assume synchronized transmission and reception, which constitutes a worst-case scenario from the standpoint of pilot contamination [1]. There is no cooperation between the BSs. For a slowly varying channel, we assume that the channel is invariant during the pilot transmission phase. For simplicity of notation, from here on we omit the time index \(\left( t \right) \) of the channel matrix and use the subscript P to distinguish the pilot transmission phase from the data transmission phase. Then, from (1), the received pilot signal at the BS of cell \(l\) is given by

$$\begin{aligned} {{\varvec{Y}}_{{\mathrm{P}},l}} = {{\varvec{G}}_{l,l}}{{\varvec{X}}_{{\mathrm{P}},l}} + \sum \limits _{i \ne l}^L {{{\varvec{G}}_{l,i}}{{\varvec{X}}_{{\mathrm{P}},i}}} + {{\varvec{W}}_{{\mathrm{P}},l}} \end{aligned}$$
(4)

where \({{\varvec{Y}}_{{\mathrm{P}},l}}\) is an \(M \times K\) received pilot signal matrix, in which the dimension \(K\) refers to the length of the pilot sequence, \({{\varvec{X}}_{{\mathrm{P}},i}} = {{\varvec{I}}_K}\) is the \(K \times K\) pilot matrix, and \({{\varvec{W}}_{{\mathrm{P}},l}}\) is an \(M \times K\) noise matrix whose elements are i.i.d. zero-mean circularly symmetric complex Gaussian (ZMCSCG) random variables with variance \({\gamma _{\mathrm{P}}}\). Since \(\mathcal{E}\left( {{{\varvec{g}}_{l,i,k}}{\varvec{g}}_{l,i',j}^{\mathrm{H}}} \right) = 0\) for \(j \ne k\), each \({{\varvec{g}}_{l,i,k}}\) can be estimated individually. Taking the \(k\)-th user as an example,

$$\begin{aligned} {{\varvec{y}}_{{\mathrm{P}},l,k}} = {{\varvec{g}}_{l,l,k}} + \sum \limits _{i \ne l} {{{\varvec{g}}_{l,i,k}}} + {{\varvec{w}}_{{\mathrm{P}},l,k}} \end{aligned}$$
(5)

where \({{\varvec{y}}_{{\mathrm{P}},l,k}}\) and \({{\varvec{w}}_{{\mathrm{P}},l,k}}\) are the \(k\)-th columns of \({{\varvec{Y}}_{{\mathrm{P}},l}}\) and \({{\varvec{W}}_{{\mathrm{P}},l}}\), respectively. Given (5), the MMSE estimate of the channel vector \({{\varvec{g}}_{l,i,k}}\) is obtained as

$$\begin{aligned} {{\varvec{\hat{g}}}_{l,i,k}} = {\lambda _{l,i,k}}{{\varvec{R}}_{l,i}}{{\varvec{Q}}_{l,k}}{{\varvec{y}}_{{\mathrm{P}},l,k}} \end{aligned}$$
(6)

where

$$\begin{aligned} {{\varvec{Q}}_{l,k}} = {\left( {\sum \limits _{i = 1}^L {{\lambda _{l,i,k}}{{\varvec{R}}_{l,i}}} + {\gamma _{\mathrm{P}}}{{\varvec{I}}_M}} \right) ^{ - 1}} \end{aligned}$$
(7)

Invoking the orthogonality property of the MMSE estimate [12], the estimation error \({{\varvec{\tilde{g}}}_{l,i,k}}\) is uncorrelated with the estimate \({{\varvec{\hat{g}}}_{l,i,k}}\) and, because the two vectors are jointly Gaussian, statistically independent of it. We can therefore decompose the channel as \({{\varvec{g}}_{l,i,k}} = {{\varvec{\tilde{g}}}_{l,i,k}} + {{\varvec{\hat{g}}}_{l,i,k}}\), and the covariance matrix of the estimation error can be expressed as

$$\begin{aligned} {\mathop {\mathrm{cov}}} \left( {{{{\varvec{\tilde{g}}}}_{l,i,k}},{{{\varvec{\tilde{g}}}}_{l,i,k}}} \right) = {\lambda _{l,i,k}}{{\varvec{R}}_{l,i}} - \lambda _{l,i,k}^2{{\varvec{R}}_{l,i}}{{\varvec{Q}}_{l,k}}{{\varvec{R}}_{l,i}} \end{aligned}$$
(8)
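As a numerical illustration of (6) to (8), the following sketch builds \({{\varvec{Q}}_{l,k}}\), the estimator matrices \({\lambda _{l,i,k}}{{\varvec{R}}_{l,i}}{{\varvec{Q}}_{l,k}}\) and the error covariances for one user index \(k\). It is a sketch under stated assumptions, not the simulation code of this paper: the names `mmse_matrices` and `exp_corr`, the gains, the correlation coefficients and \({\gamma _{\mathrm{P}}} = 0.1\) are illustrative choices, and the exponential correlation model is borrowed from Sect. 4.

```python
import numpy as np

def mmse_matrices(lams, Rs, gamma_p):
    """lams[i], Rs[i]: gain and M x M correlation of cell i toward BS l (fixed user k)."""
    M = Rs[0].shape[0]
    # Eq. (7): Q = (sum_i lam_i R_i + gamma_P I)^(-1)
    Q = np.linalg.inv(sum(lam * R for lam, R in zip(lams, Rs)) + gamma_p * np.eye(M))
    # Eq. (6): g_hat_i = (lam_i R_i Q) y_P
    filters = [lam * R @ Q for lam, R in zip(lams, Rs)]
    # Eq. (8): cov(g_tilde_i) = lam_i R_i - lam_i^2 R_i Q R_i
    err_cov = [lam * R - lam**2 * R @ Q @ R for lam, R in zip(lams, Rs)]
    return Q, filters, err_cov

def exp_corr(M, kappa):
    """Exponential correlation model [R]_{m,n} = kappa^|m-n| (illustrative choice)."""
    idx = np.arange(M)
    return kappa ** np.abs(idx[:, None] - idx[None, :])

lams = [1.0, 0.3, 0.1]                                  # illustrative gains, cell 0 is the target
Rs = [exp_corr(8, kappa) for kappa in (0.5, 0.7, 0.2)]  # illustrative correlation coefficients
Q, filters, err_cov = mmse_matrices(lams, Rs, gamma_p=0.1)
```

Each error covariance returned here is Hermitian and positive semidefinite, and its trace is smaller than that of \({\lambda _{l,i,k}}{{\varvec{R}}_{l,i}}\), reflecting the part of the channel that the pilot observation resolves.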

From (6), we see that, for a given \(k\), the channel estimates \({{\varvec{\hat{g}}}_{l,i,k}}\) are correlated across \(i\) because of pilot contamination. As in [9], we define \({{\varvec{\hat{h}}}_{l,k}} \triangleq {{\varvec{Q}}_{l,k}}{{\varvec{y}}_{{\mathrm{P}},l,k}}\), which obeys \({{\varvec{\hat{h}}}_{l,k}} \sim \mathcal{C}\mathcal{N}\left( {{\varvec{0}},{{\varvec{Q}}_{l,k}}} \right) \). The estimated channel vector can then be expressed as

$$\begin{aligned} {{\varvec{\hat{g}}}_{l,i,k}} = {\lambda _{l,i,k}}{{\varvec{R}}_{l,i}}{{\varvec{\hat{h}}}_{l,k}} \end{aligned}$$
(9)

where \({{\varvec{\hat{h}}}_{l,k}}\) represents the Rayleigh fading part of the estimated channel.

Because of user mobility, transmission delay and processing delay, the CSI during data transmission differs from that during pilot transmission; we call this CSI delay. The time variation of the small-scale fading therefore introduces additional estimation error. We now refine the initial channel estimate by taking CSI delay into account.

Suppose that we use a one-tap prediction filter that exploits the time-domain correlation described in A1 [10, 11]:

$$\begin{aligned} \left[ {{\varvec{h}}_{l,i,1}^\tau \cdots {\varvec{h}}_{l,i,K}^\tau } \right]&= \left[ {{{\varvec{h}}_{l,i,1}} \cdots {{\varvec{h}}_{l,i,K}}} \right] \textit{diag}\left\{ {\left[ {\rho _{l,i,1}^\tau \cdots \rho _{l,i,K}^\tau } \right] } \right\} \nonumber \\&\quad +\left[ {{\varvec{e}}_{l,i,1}^\tau \cdots {\varvec{e}}_{l,i,K}^\tau } \right] \textit{diag}\left\{ {\left[ {\sqrt{1 - {{\left| {\rho _{l,i,1}^\tau } \right| }^2}} \cdots \sqrt{1 - {{\left| {\rho _{l,i,K}^\tau } \right| }^2}} } \right] } \right\} \end{aligned}$$
(10)

As before, for simplicity of notation we write \({\varvec{h}}_{l,i,k}^\tau \) for the delayed channel (the superscript \(\tau \) indicates the CSI delay between channel estimation and uplink transmission). \(\rho _{l,i,k}^\tau \) is the correlation coefficient between the delayed channel and the current one, determined by assumption A1. \({\varvec{e}}_{l,i,k}^\tau \) is a vector of i.i.d. zero-mean circularly symmetric complex Gaussian random variables with unit variance, and the second term of (10) represents the prediction error.
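The one-tap model (10) can be checked by simulation for a single channel coefficient. The sketch below is a minimal Monte Carlo illustration with hypothetical helper names (`cn01`, `delayed_channel`) and an illustrative value \(\rho = 0.9\); it draws \(h\) and \(e\) as independent \(\mathcal{C}\mathcal{N}\left( {0,1} \right) \) samples and estimates the power of the delayed coefficient and its correlation with \(h\).

```python
import math
import random

def cn01(rng):
    """One CN(0,1) sample: real and imaginary parts each have variance 1/2."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def delayed_channel(h, rho, rng):
    """Eq. (10) for one coefficient: h_tau = rho*h + sqrt(1 - |rho|^2) * e."""
    return rho * h + math.sqrt(1.0 - abs(rho) ** 2) * cn01(rng)

rng = random.Random(0)      # fixed seed so the run is repeatable
rho, n = 0.9, 20000         # illustrative values
pairs = []
for _ in range(n):
    h = cn01(rng)
    pairs.append((h, delayed_channel(h, rho, rng)))

# Sample power of the delayed coefficient (should be close to 1).
power = sum(abs(h_tau) ** 2 for _, h_tau in pairs) / n
# Sample estimate of E{h^* h_tau} (should be close to rho).
corr = sum(h.conjugate() * h_tau for h, h_tau in pairs).real / n
```

The scaling \(\sqrt{1 - {{\left| \rho \right| }^2}}\) of the innovation is what keeps the delayed coefficient at unit variance, so that only the correlation with the estimated channel, and not the channel power, is affected by the delay.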

Substituting (10) into (3), we obtain

$$\begin{aligned} {\varvec{G}}_{l,i}^\tau&= {\varvec{R}}_{l,i}^{\frac{1}{2}}\left[ {{{\varvec{h}}_{l,i,1}} \cdots {{\varvec{h}}_{l,i,K}}} \right] \textit{diag}\left\{ {\left[ {\rho _{l,i,1}^\tau \cdots \rho _{l,i,K}^\tau } \right] } \right\} \varLambda _{l,i}^{\frac{1}{2}} \nonumber \\&\quad + {\varvec{R}}_{l,i}^{\frac{1}{2}}\left[ {{\varvec{e}}_{l,i,1}^\tau \cdots {\varvec{e}}_{l,i,K}^\tau } \right] \textit{diag}\left\{ {\left[ {\sqrt{1 - {{\left| {\rho _{l,i,1}^\tau } \right| }^2}} \cdots \sqrt{1 - {{\left| {\rho _{l,i,K}^\tau } \right| }^2}} } \right] } \right\} \varLambda _{l,i}^{\frac{1}{2}} \nonumber \\&= \left[ {{{\varvec{g}}_{l,i,1}} \cdots {{\varvec{g}}_{l,i,K}}} \right] \textit{diag}\left\{ {\left[ {\rho _{l,i,1}^\tau \cdots \rho _{l,i,K}^\tau } \right] } \right\} + {\varvec{\mathrm{Z}}}_{l,i}^\tau \end{aligned}$$
(11)

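where

$$\begin{aligned} {\varvec{\mathrm{Z}}}_{l,i}^\tau = {\varvec{R}}_{l,i}^{\frac{1}{2}}\left[ {{\varvec{e}}_{l,i,1}^\tau \cdots {\varvec{e}}_{l,i,K}^\tau } \right] \textit{diag}\left\{ {\left[ {\sqrt{1 - {{\left| {\rho _{l,i,1}^\tau } \right| }^2}} \cdots \sqrt{1 - {{\left| {\rho _{l,i,K}^\tau } \right| }^2}} } \right] } \right\} {\varvec{\varLambda }}_{l,i}^{\frac{1}{2}} \end{aligned}$$

is the matrix of prediction error with covariance matrix \(\mathcal{E}\left[ {{\varvec{\mathrm{Z}}}_{l,i}^\tau {{\left( {{\varvec{\mathrm{Z}}}_{l,i}^\tau } \right) }^{\mathrm{H}}}} \right] = \left( {\sum _{k = 1}^K {{\lambda _{l,i,k}}\left( {1 - {{\left| {\rho _{l,i,k}^\tau } \right| }^2}} \right) } } \right) {{\varvec{R}}_{l,i}}\).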

According to \({{\varvec{g}}_{l,i,k}} = {{\varvec{\tilde{g}}}_{l,i,k}} + {{\varvec{\hat{g}}}_{l,i,k}}\), we get

$$\begin{aligned} {\varvec{G}}_{l,i}^\tau&= \left[ {{{{\varvec{\hat{g}}}}_{l,i,1}} \cdots {{{\varvec{\hat{g}}}}_{l,i,K}}} \right] \textit{diag}\left\{ {\left[ {\rho _{l,i,1}^\tau \cdots \rho _{l,i,K}^\tau } \right] } \right\} \nonumber \\&\quad + \left[ {{{{\varvec{\tilde{g}}}}_{l,i,1}} \cdots {{{\varvec{\tilde{g}}}}_{l,i,K}}} \right] \textit{diag}\left\{ {\left[ {\rho _{l,i,1}^\tau \cdots \rho _{l,i,K}^\tau } \right] } \right\} + {\varvec{\mathrm{Z}}}_{l,i}^\tau \end{aligned}$$
(12)

Substituting (9) into (12), we obtain

$$\begin{aligned} {\varvec{G}}_{l,i}^\tau&= {{\varvec{R}}_{l,i}}\left[ {{{{\varvec{\hat{h}}}}_{l,1}} \cdots {{{\varvec{\hat{h}}}}_{l,K}}} \right] \textit{diag}\left\{ {\left[ {\lambda _{l,i,1}^\tau \cdots \lambda _{l,i,K}^\tau } \right] } \right\} \nonumber \\&\quad \; + \left[ {{{{\varvec{\tilde{g}}}}_{l,i,1}} \cdots {{{\varvec{\tilde{g}}}}_{l,i,K}}} \right] \textit{diag}\left\{ {\left[ {\rho _{l,i,1}^\tau \cdots \rho _{l,i,K}^\tau } \right] } \right\} + {\varvec{\mathrm{Z}}}_{l,i}^\tau \end{aligned}$$
(13)

where \(\lambda _{l,i,k}^\tau \triangleq {\lambda _{l,i,k}}\rho _{l,i,k}^\tau \).

We define the following matrices

$$\begin{aligned} {\varvec{\varLambda }}_{l,i}^\tau&\triangleq {\mathrm{diag}}\left\{ {\left[ {\lambda _{l,i,1}^\tau \cdots \lambda _{l,i,K}^\tau } \right] } \right\} \\ {{\varvec{\hat{H}}}_l}\,&\triangleq \left[ {{{{\varvec{\hat{h}}}}_{l,1}} \cdots {{{\varvec{\hat{h}}}}_{l,K}}} \right] \\ {\varvec{\hat{G}}}_{l,i}^\tau&\triangleq \left[ {{\varvec{\hat{g}}}_{l,i,1}^\tau \cdots {\varvec{\hat{g}}}_{l,i,K}^\tau } \right] \\&= {{\varvec{R}}_{l,i}}\left[ {{{{\varvec{\hat{h}}}}_{l,1}} \cdots {{{\varvec{\hat{h}}}}_{l,K}}} \right] \textit{diag}\left\{ {\left[ {\lambda _{l,i,1}^\tau \cdots \lambda _{l,i,K}^\tau } \right] } \right\} \\ {\varvec{\tilde{G}}}_{l,i}^\tau&\triangleq \left[ {{\varvec{\tilde{g}}}_{l,i,1}^\tau \cdots {\varvec{\tilde{g}}}_{l,i,K}^\tau } \right] \\&= \left[ {{{{\varvec{\tilde{g}}}}_{l,i,1}} \cdots {{{\varvec{\tilde{g}}}}_{l,i,K}}} \right] \textit{diag}\left\{ {\left[ {\rho _{l,i,1}^\tau \cdots \rho _{l,i,K}^\tau } \right] } \right\} \end{aligned}$$

The predicted channel estimation can be expressed as

$$\begin{aligned} {\varvec{\hat{G}}}_{l,i}^\tau = {{\varvec{R}}_{l,i}}{{\varvec{\hat{H}}}_l}{\varvec{\varLambda }}_{l,i}^\tau \end{aligned}$$
(14)

and the actual channel matrix can be expressed as

$$\begin{aligned} {\varvec{G}}_{l,i}^\tau = {\varvec{\hat{G}}}_{l,i}^\tau + {\varvec{\tilde{G}}}_{l,i}^\tau + {\varvec{Z}}_{l,i}^\tau \end{aligned}$$
(15)

3.2 Lower Bound of the Sum-Rate and Its Asymptotical Performance Analysis for Large \(M\)

Consider the uplink transmission represented by (1). According to (15), the overall uplink received signal at the BS of cell \(l\) can be written as

$$\begin{aligned} {{\varvec{y}}_l} = {\varvec{\hat{G}}}_{l,l}^\tau {{\varvec{x}}_l} + \sum \limits _{i \ne l}^L {{\varvec{\hat{G}}}_{l,i}^\tau {{\varvec{x}}_i}} + \sum \limits _{i = 1}^L {{\varvec{\tilde{G}}}_{l,i}^\tau {{\varvec{x}}_i}} + \sum \limits _{i = 1}^L {{\varvec{Z}}_{l,i}^\tau {{\varvec{x}}_i}} + {\varvec{w}}_l^{{\mathrm{UL}}} \end{aligned}$$
(16)

Then the sum-rate of cell \(l\) in bits per second per channel use (bps/channel), omitting the rate loss due to uplink channel training, is defined as

(17)

Theorem 1

For linear multi-user MMSE detection, taking channel estimation error and CSI delay into consideration, a lower bound of (17) is given by

$$\begin{aligned} {{\hat{C}}^\tau } \ge \hat{C}_{LB}^\tau&= {\log _2}\det \left( {\sum \limits _{i = 1}^L {{\varvec{\hat{G}}}_{l,i}^\tau {{\left( {{\varvec{\hat{G}}}_{l,i}^\tau } \right) }^{\mathrm{H}}}{{\left( {{\varvec{\varSigma }}_l^\tau } \right) }^{ - 1}}} + {{\varvec{I}}_M}} \right) \nonumber \\&\quad - {\log _2}\det \left( {\sum \limits _{i \ne l}^L {{\varvec{\hat{G}}}_{l,i}^\tau {{\left( {{\varvec{\hat{G}}}_{l,i}^\tau } \right) }^{\mathrm{H}}}{{\left( {{\varvec{\varSigma }}_l^\tau } \right) }^{ - 1}}} + {{\varvec{I}}_M}} \right) \end{aligned}$$
(18)

where \({\varvec{\varSigma }}_l^\tau \) is the covariance matrix of the noise and of the interference caused by prediction and estimation errors, which is computed as

$$\begin{aligned} {\varvec{\varSigma }}_l^\tau&= {\mathop {\mathrm{cov}}} \left( {\sum \limits _{i = 1}^L {{\varvec{\tilde{G}}}_{l,i}^\tau {{\varvec{x}}_i}} + \sum \limits _{i = 1}^L {{\varvec{Z}}_{l,i}^\tau {{\varvec{x}}_i}} + {\varvec{w}}_l^{{\mathrm{UL}}},\sum \limits _{i = 1}^L {{\varvec{\tilde{G}}}_{l,i}^\tau {{\varvec{x}}_i}} + \sum \limits _{i = 1}^L {{\varvec{Z}}_{l,i}^\tau {{\varvec{x}}_i}} + {\varvec{w}}_l^{{\mathrm{UL}}}} \right) \nonumber \\&= \sum \limits _{i = 1}^L {\left( {\sum \limits _{k = 1}^K {{\lambda _{l,i,k}}{{\varvec{R}}_{l,i}} - {{\varvec{R}}_{l,i}}\left( {\sum \limits _{k = 1}^K {\lambda _{l,i,k}^2{{\left| {\rho _{l,i,k}^\tau } \right| }^2}{{\varvec{Q}}_{l,k}}} } \right) {{\varvec{R}}_{l,i}}} } \right) } + {\gamma _{{\mathrm{UL}}}}{{\varvec{I}}_M} \end{aligned}$$
(19)

Proof

Because the estimation error, the prediction error and the additive noise are all uncorrelated with the signal, the Worst Case Uncorrelated Additive Noise Theorem in [13] gives

$$\begin{aligned} \hat{C}_{LB}^\tau = {\log _2}\det \left( {{{\left( {\sum \limits _{i \ne l}^L {{\varvec{\hat{G}}}_{l,i}^\tau {{\left( {{\varvec{\hat{G}}}_{l,i}^\tau } \right) }^{\mathrm{H}}}} + {\varvec{\varSigma }}_l^\tau } \right) }^{ - 1}}{\varvec{\hat{G}}}_{l,l}^\tau {{\left( {{\varvec{\hat{G}}}_{l,l}^\tau } \right) }^{\mathrm{H}}} + {{\varvec{I}}_M}} \right) \end{aligned}$$
(20)

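Since \(\det \left( {{{\varvec{A}}^{ - 1}}{\varvec{B}} + {\varvec{I}}} \right) = {{\det \left( {{\varvec{A}} + {\varvec{B}}} \right) } / {\det \left( {\varvec{A}} \right) }}\), splitting off the determinant of \({\left( {\sum \limits _{i \ne l}^L {{\varvec{\hat{G}}}_{l,i}^\tau {{\left( {{\varvec{\hat{G}}}_{l,i}^\tau } \right) }^{\mathrm{H}}}} + {\varvec{\varSigma }}_l^\tau } \right) ^{ - 1}}\) and multiplying numerator and denominator by \(\det \left( {{{\left( {{\varvec{\varSigma }}_l^\tau } \right) }^{ - 1}}} \right) \), we obtain (18).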
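The equivalence between (20) and (18) is a determinant identity and can be verified numerically. The sketch below does so with random stand-ins: the matrices `G[i]` play the role of \({\varvec{\hat{G}}}_{l,i}^\tau \) and `Sigma` is an arbitrary Hermitian positive definite matrix standing in for \({\varvec{\varSigma }}_l^\tau \); the dimensions, the seed and the helper names are illustrative assumptions, and the identity matrix in (20) is taken as \(M \times M\) so that the dimensions agree with (18).

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, L, l = 6, 2, 3, 0     # illustrative sizes; l is the target cell index

def cgauss(*shape):
    """Matrix of i.i.d. CN(0,1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def log2det(A):
    """log2 |det A| through slogdet for numerical stability."""
    _, logabs = np.linalg.slogdet(A)
    return logabs / np.log(2)

G = [cgauss(M, K) for _ in range(L)]     # stand-ins for the predicted channel estimates
B = cgauss(M, M)
Sigma = B @ B.conj().T + np.eye(M)       # stand-in for the error-plus-noise covariance
S_inv = np.linalg.inv(Sigma)

all_cells = sum(g @ g.conj().T for g in G)
others = sum(g @ g.conj().T for i, g in enumerate(G) if i != l)
own = G[l] @ G[l].conj().T

# Right-hand side of (18): difference of two log-determinants.
rate_18 = log2det(all_cells @ S_inv + np.eye(M)) - log2det(others @ S_inv + np.eye(M))
# Right-hand side of (20): single log-determinant.
rate_20 = log2det(np.linalg.inv(others + Sigma) @ own + np.eye(M))
```

Both expressions agree to numerical precision for any such choice of matrices, since the check relies only on the multiplicativity of the determinant.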

Note the similarity between our result and Theorem 1 in [9]. Using similar manipulations and invoking the strong law of large numbers, we can deduce Theorem 2.

Theorem 2

As \(M \rightarrow \infty \), \(\hat{C}_{LB}^\tau \) obeys \(\hat{C}_{LB}^\tau - \hat{C}_{LB,\inf }^\tau {\xrightarrow {M \rightarrow \infty }} 0\), where

$$\begin{aligned} \hat{C}_{LB,\inf }^\tau&= \sum \limits _{k = 1}^K {{{\log }_2}\left[ {\frac{{\det \left( {{{\varvec{\varXi }}_{l,k}} + {{\varvec{I}}_L}} \right) }}{{\det \left( {{{{\varvec{\varXi '}}}_{l,k}} + {{\varvec{I}}_{L - 1}}} \right) }}} \right] }\nonumber \\&= \sum \limits _{k = 1}^K {{{\log }_2}\left[ {{\xi _{l,l,k}} + 1 - \left[ {{\xi _{l,2,k}} \cdots {\xi _{l,L,k}}} \right] {{\left( {{{{\varvec{\varXi '}}}_{l,k}} + {{\varvec{I}}_{L - 1}}} \right) }^{ - 1}}{{\left[ {{\xi _{2,l,k}} \cdots {\xi _{L,l,k}}} \right] }^T}} \right] } \end{aligned}$$
(21)

where

$$\begin{aligned} {\xi _{i,i',k}}&= {\left| {\lambda _{l,i,k}^\tau } \right| ^2}{\mathrm{Tr}}\left( {{{\varvec{Q}}_{l,k}}{{\varvec{R}}_{l,i}}{{\left( {{\varvec{\varSigma }}_l^\tau } \right) }^{ - 1}}{{\varvec{R}}_{l,i'}}} \right) \\ {{\varvec{\varXi }}_{l,k}}&= \left[ \begin{array}{l} {\xi _{1,1,k}} \cdots {\xi _{1,L,k}}\\ \;\;\, \vdots \,\;\,\, \cdots \;\;\, \vdots \\ {\xi _{L,1,k}} \cdots {\xi _{L,L,k}} \end{array} \right] \end{aligned}$$

and \({{\varvec{\varXi '}}_{l,k}}\) is the submatrix of \({{\varvec{\varXi }}_{l,k}}\) formed by deleting the \(l\)-th row and \(l\)-th column.

Proof

According to Theorem 2 in [9], and using the property of the determinant of the block matrix, one can obtain the result.

If \({{\varvec{R}}_{l,i}} = {{\varvec{R}}_l},\forall i\), \(\hat{C}_{LB,\inf }^\tau \) simplifies to

$$\begin{aligned} \hat{C}_{LB,\inf }^\tau = \sum \limits _{k = 1}^K {{{\log }_2}\left( {1 + \frac{{{{\left| {{\lambda _{l,l,k}}\rho _{l,l,k}^\tau } \right| }^2}}}{{\sum _{i \ne l}^L {{{\left| {{\lambda _{l,i,k}}\rho _{l,i,k}^\tau } \right| }^2}} + {{\left[ {{\mathrm{Tr}}\left( {{{\varvec{Q}}_{l,k}}{{\varvec{R}}_l}{{\left( {{\varvec{\varSigma }}_l^\tau } \right) }^{ - 1}}{{\varvec{R}}_l}} \right) } \right] }^{ - 1}}}}} \right) } \end{aligned}$$
(22)

The receive correlation matrix can be decomposed as \({{\varvec{R}}_l} = {\varvec{UV}}{{\varvec{U}}^{\mathrm{H}}}\), where \(\varvec{U}\) is a unitary matrix. Then \({{\varvec{Q}}_{l,k}}\) and \({\left( {{\varvec{\varSigma }}_l^\tau } \right) ^{ - 1}}\) can be written in a similar way as \({{\varvec{Q}}_{l,k}} = {\varvec{UV}}_{l,k}^Q{{\varvec{U}}^{\mathrm{H}}}\) and \({\left( {{\varvec{\varSigma }}_l^\tau } \right) ^{ - 1}} = {\varvec{U}}{{\varvec{V}}_\varSigma }{{\varvec{U}}^{\mathrm{H}}}\), and, under assumptions A2 and A3, when \({M / K} \rightarrow \infty \), \({\left[ {{\mathrm{Tr}}\left( {{{\varvec{Q}}_{l,k}}{{\varvec{R}}_l}{{\left( {{\varvec{\varSigma }}_l^\tau } \right) }^{ - 1}}{{\varvec{R}}_l}} \right) } \right] ^{ - 1}} \rightarrow 0\). We get the following corollary.

Corollary 1

Let \({{\varvec{R}}_{l,i}} = {{\varvec{R}}_l},\forall i\). As \(M \rightarrow \infty \) with \({M / K} \rightarrow \infty \), (22) becomes

$$\begin{aligned} \hat{C}_{LB,\inf }^\tau = \sum \limits _{k = 1}^K {{{\log }_2}\left( {1 + \frac{{{{\left| {{\lambda _{l,l,k}}\rho _{l,l,k}^\tau } \right| }^2}}}{{\sum _{i \ne l}^L {{{\left| {{\lambda _{l,i,k}}\rho _{l,i,k}^\tau } \right| }^2}} }}} \right) } \end{aligned}$$
(23)

If there is no CSI delay, that is, \(\rho _{l,i,k}^\tau = 1,\forall i,k\), (23) reduces to

$$\begin{aligned} \hat{C}_{LB,\inf }^\tau = \sum \limits _{k = 1}^K {{{\log }_2}\left( {1 + \frac{{{{\left( {{\lambda _{l,l,k}}} \right) }^2}}}{{\sum _{i \ne l}^L {{{\left( {{\lambda _{l,i,k}}} \right) }^2}} }}} \right) } \end{aligned}$$
(24)

which coincides with the conclusions derived by [1].

Remark Theorem 2 shows that for large \(M\), \(\hat{C}_{LB}^\tau \) can be approximated by \(\hat{C}_{LB,\inf }^\tau \). Setting \(\rho _{l,i,k}^\tau = 1,\forall i,k\), (18) and (21) reduce to the results with no CSI delay, which coincide with Theorems 1 and 2 of [9]. Corollary 1 tells us that as \(M \rightarrow \infty \) and \({M / K} \rightarrow \infty \), the effects of uncorrelated receiver noise, prediction error, estimation error and fast fading are eliminated completely, and transmissions from users within one's own cell do not interfere. However, transmissions from users in other cells that use the same pilot sequence constitute a residual interference, called pilot contamination. Because of CSI delay, the direct gains and the cross gains are both proportional to the square of their respective prediction coefficients. Corollary 1 also tells us that as \(M \rightarrow \infty \) and \({M / K} \rightarrow \infty \), the sum-rate is upper-bounded by the same limiting rate, which is independent of the receive correlation matrix.

More interestingly, as \(M \rightarrow \infty \) and \({M / K} \rightarrow \infty \), if the scheduled \(k\)-th user of the target cell moves at a lower speed than the \(k\)-th users of the other \(L-1\) cells, its uplink rate is greater than when all the users are immobile, and vice versa. It can be shown that if all the \(k\)-th users move at different velocities, the uplink rates of some of them increase while those of others decrease. But if the \(k\)-th users scheduled in all cells have the same velocity, none of their uplink rates degrades in spite of the mobility. This observation is very useful for mobile communications. The reason is that, in the presence of pilot contamination, CSI delay reduces not only the signal power but also the interference power.
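The scheduling observation above can be reproduced directly from (23). The following sketch is illustrative only: the function name `asymptotic_rate`, the helper `coeffs`, the gains in `lam` and the prediction coefficients are assumed values, with `lam[i][k]` and `rho[i][k]` denoting \({\lambda _{l,i,k}}\) and \(\rho _{l,i,k}^\tau \) for a fixed target cell \(l\).

```python
import math

def asymptotic_rate(lam, rho, l=0):
    """Eq. (23): limiting uplink sum-rate of cell l as M and M/K grow without bound."""
    L, K = len(lam), len(lam[0])
    total = 0.0
    for k in range(K):
        signal = abs(lam[l][k] * rho[l][k]) ** 2
        interference = sum(abs(lam[i][k] * rho[i][k]) ** 2 for i in range(L) if i != l)
        total += math.log2(1.0 + signal / interference)
    return total

lam = [[1.0, 0.8], [0.2, 0.1], [0.1, 0.3]]   # illustrative gains, cell 0 is the target cell

def coeffs(target, others):
    """Prediction coefficients: one value for the target cell, one for the other cells."""
    return [[target] * 2] + [[others] * 2 for _ in range(2)]

no_delay = asymptotic_rate(lam, coeffs(1.0, 1.0))      # eq. (24)
same_speed = asymptotic_rate(lam, coeffs(0.9, 0.9))    # equal coefficients in all cells
slow_target = asymptotic_rate(lam, coeffs(0.99, 0.9))  # target users slower than the others
fast_target = asymptotic_rate(lam, coeffs(0.9, 0.99))  # target users faster than the others
```

With equal coefficients the factor \({\left| {\rho ^\tau } \right| ^2}\) cancels between numerator and denominator, so `same_speed` matches `no_delay`, while `slow_target` lies above and `fast_target` below that value.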

If , we define

$$\begin{aligned} {R_\infty } = \sum \limits _{k = 1}^K {{{\log }_2}\left( {1 + \frac{{{{\left( {{\lambda _{l,l,k}}} \right) }^2}}}{{\sum _{i \ne l}^L {{{\left( {{\lambda _{l,i,k}}} \right) }^2}} }}} \right) } \end{aligned}$$
(25)

3.3 Sum-Rate Analysis for i.i.d Channel Model

To gain further insight, we now consider the i.i.d. Rayleigh fading channel model

$$\begin{aligned} {{\varvec{G}}_{l,i}} = {{\varvec{H}}_{l,i}}{\varvec{\varLambda }}_{l,i}^{\frac{1}{2}} \end{aligned}$$
(26)

The covariance matrix of the estimated channel vector \({{\varvec{\hat{g}}}_{l,i,k}}\) then simplifies to

$$\begin{aligned} {\mathop {\mathrm{cov}}} \left( {{{{\varvec{\hat{g}}}}_{l,i,k}},{{{\varvec{\hat{g}}}}_{l,i,k}}} \right)&= \lambda _{l,i,k}^2{{\varvec{Q}}_{l,k}} \nonumber \\&= \lambda _{l,i,k}^2{\left[ {\left( {\sum \limits _{i = 1}^L {{\lambda _{l,i,k}}} } \right) + {\gamma _{\mathrm{P}}}} \right] ^{ - 1}}{{\varvec{I}}_M} \end{aligned}$$
(27)

So in this case, the equivalent predicted channel estimation model (14) can be rewritten as

$$\begin{aligned} {\varvec{\hat{G}}}_{l,i}^\tau = {\varvec{H}}{{\varvec{\varLambda }}_{l,i}}{\left( {\sum \limits _{i = 1}^L {{{\varvec{\varLambda }}_{l,i}}} + {\gamma _{\mathrm{P}}}{{\varvec{I}}_K}} \right) ^{ - \frac{1}{2}}}\textit{diag}\left\{ {\left[ {\rho _{l,i,1}^\tau \cdots \rho _{l,i,K}^\tau } \right] } \right\} \end{aligned}$$
(28)

where \({\varvec{H}}\) is an \(M \times K\) standard complex Gaussian matrix, and the covariance matrix of the noise and of the interference caused by prediction and estimation errors reads

$$\begin{aligned} {\varvec{\varSigma }}_l^\tau = \left( {{\varepsilon ^\tau } + {\gamma _{{\mathrm{UL}}}}} \right) {{\varvec{I}}_M} \end{aligned}$$
(29)

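where

$$\begin{aligned} {\varepsilon ^\tau } = \sum \limits _{k = 1}^K {\left[ {\sum \limits _{i = 1}^L {{\lambda _{l,i,k}}} - \left( {\sum \limits _{i = 1}^L {\lambda _{l,i,k}^2{{\left| {\rho _{l,i,k}^\tau } \right| }^2}} } \right) {{\left( {\sum \limits _{i = 1}^L {{\lambda _{l,i,k}}} + {\gamma _{\mathrm{P}}}} \right) }^{ - 1}}} \right] } \end{aligned}$$
(30)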

So for the physical channel model (26), the lower bound of the sum-rate defined in (18) can be given by

$$\begin{aligned} \hat{C}_{LB}^\tau&= {\log _2}\det \left( {{\varepsilon ^{ - \tau }}{\varvec{H}}\left( {\sum \limits _{i = 1}^L {{\varvec{\varLambda }}_{l,i}^\tau {{\left( {{\varvec{\varLambda }}_{l,i}^\tau } \right) }^{\mathrm{H}}}} } \right) {{\left( {\sum \limits _{i = 1}^L {{{\varvec{\varLambda }}_{l,i}}} + {\gamma _{\mathrm{P}}}{{\varvec{I}}_K}} \right) }^{ - 1}}{{\varvec{H}}^{\mathrm{H}}} + {{\varvec{I}}_M}} \right) \nonumber \\&\quad - {\log _2}\det \left( {{\varepsilon ^{ - \tau }}{\varvec{H}}\left( {\sum \limits _{i \ne l}^L {{\varvec{\varLambda }}_{l,i}^\tau {{\left( {{\varvec{\varLambda }}_{l,i}^\tau } \right) }^{\mathrm{H}}}} } \right) {{\left( {\sum \limits _{i = 1}^L {{{\varvec{\varLambda }}_{l,i}}} + {\gamma _{\mathrm{P}}}{{\varvec{I}}_K}} \right) }^{ - 1}}{{\varvec{H}}^{\mathrm{H}}} + {{\varvec{I}}_M}} \right) \end{aligned}$$
(31)

where \({\varepsilon ^{ - \tau }}\triangleq {{{\left( {{\varepsilon ^\tau } + {\gamma _{{\mathrm{UL}}}}} \right) }^{ - 1}}}\).

Theorem 3

For the physical channel model (26), as \(M \rightarrow \infty \), \(\hat{C}_{LB}^\tau \) obeys \(\hat{C}_{LB}^\tau - \hat{C}_{LB,\inf }^\tau {\xrightarrow {M \rightarrow \infty }} 0\), where the infinite sum-rate is expressed as

$$\begin{aligned} \hat{C}_{LB,\inf }^\tau = \sum \limits _{k = 1}^K {{{\log }_2}\left( {1 + \frac{{{{\left| {{\lambda _{l,l,k}}\rho _{l,l,k}^\tau } \right| }^2}}}{{\sum _{i \ne l}^L {{{\left| {{\lambda _{l,i,k}}\rho _{l,i,k}^\tau } \right| }^2}} + \frac{\left( {{\varepsilon ^\tau } + {\gamma _{{\mathrm{UL}}}}} \right) }{M}\left( {\sum _{i = 1}^L {{\lambda _{l,i,k}}} + {\gamma _{\mathrm{P}}}} \right) }}} \right) } \end{aligned}$$
(32)

Proof

See “Appendix”.
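For finite \(M\), (32) is simple to evaluate. The sketch below does so under one explicit assumption: \({\varepsilon ^\tau }\) is taken as the sum over \(k\) of the per-user quantity \(\varepsilon _k^\tau \) given later in this subsection, which is what (19) yields for \({{\varvec{R}}_{l,i}} = {{\varvec{I}}_M}\). The function names, gains, noise levels and antenna numbers are illustrative and not taken from the simulations of this paper.

```python
import math

def eps_tau(lam, rho, gamma_p):
    """Assumed form of eps^tau: sum over k of the per-user error terms eps_k^tau."""
    L, K = len(lam), len(lam[0])
    total = 0.0
    for k in range(K):
        s1 = sum(lam[i][k] for i in range(L))
        s2 = sum(lam[i][k] ** 2 * abs(rho[i][k]) ** 2 for i in range(L))
        total += s1 - s2 / (s1 + gamma_p)
    return total

def rate_iid(lam, rho, M, gamma_p, gamma_ul, l=0):
    """Eq. (32): large-M approximation of the sum-rate lower bound, i.i.d. channel."""
    L, K = len(lam), len(lam[0])
    eps = eps_tau(lam, rho, gamma_p)
    total = 0.0
    for k in range(K):
        signal = abs(lam[l][k] * rho[l][k]) ** 2
        interf = sum(abs(lam[i][k] * rho[i][k]) ** 2 for i in range(L) if i != l)
        noise = (eps + gamma_ul) / M * (sum(lam[i][k] for i in range(L)) + gamma_p)
        total += math.log2(1.0 + signal / (interf + noise))
    return total

lam = [[1.0, 0.8], [0.2, 0.1], [0.1, 0.3]]    # illustrative gains, cell 0 is the target
ones = [[1.0, 1.0] for _ in range(3)]         # no CSI delay
delayed = [[0.8, 0.8] for _ in range(3)]      # equal prediction coefficients in all cells

r_64 = rate_iid(lam, ones, M=64, gamma_p=0.1, gamma_ul=0.1)
r_1024 = rate_iid(lam, ones, M=1024, gamma_p=0.1, gamma_ul=0.1)
r_64_delayed = rate_iid(lam, delayed, M=64, gamma_p=0.1, gamma_ul=0.1)
```

In this example the rate grows with \(M\), and at a finite \(M\) the delayed case stays below the no-delay case even though all cells share the same prediction coefficient; the gap closes only as the \(1/M\) term in the denominator vanishes.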

We can make several observations from (32). Consider a group of users with the same CSI delay in all cells, so that their channel prediction coefficients depend only on their velocity. If \(\rho _l^\tau \triangleq \rho _{l,i,k}^\tau \ne 0\) for all \(i\), \(k\), then substituting (30) into (32) and carrying out some algebraic manipulations, the ratio inside the logarithm becomes

$$\begin{aligned} \frac{{\lambda _{l,l,k}^2}}{{\sum _{i \ne l}^L {\lambda _{l,i,k}^2} + {{\lambda '}_{l,k}}\left( {\frac{{\left( {\sum _{k = 1}^K {\sum _{i = 1}^L {{\lambda _{l,i,k}}} } + {\gamma _{{\mathrm{UL}}}}} \right) }}{{M{{\left| {\rho _l^\tau } \right| }^2}}} - \frac{{\sum _{k = 1}^K {\left( {\sum _{i = 1}^L {\lambda _{l,i,k}^2} } \right) {{{{\lambda '}_{l,k}} }^{ - 1}}} }}{M}} \right) }} \end{aligned}$$

where \({{{\lambda '}_{l,k}} = \left( {\sum _{i = 1}^L {{\lambda _{l,i,k}}} + {\gamma _{\mathrm{P}}}} \right) }\).

Since \({\left| {\rho _l^\tau } \right| ^2} \le 1\), \(\hat{C}_{LB,\inf }^\tau \le \hat{C}_{LB,\inf }^0\). That is to say, for finite \(M\), if the scheduled users all have the same velocity, the uplink sum-rate degrades because of mobility. But if the users scheduled in the different cells have different velocities, how does the sum-rate change? It seems that the sum-rate may increase as well as decrease. To verify this conjecture, we make the following approximation.

According to (29), we know \({\varepsilon ^\tau }{{\varvec{I}}_M}\) is the covariance matrix of channel error caused by prediction and estimation, so \({\varepsilon ^\tau } \ge 0\). For each \(k\),

$$\begin{aligned} \varepsilon _k^\tau&= \sum \limits _{i = 1}^L {\lambda _{l,i,k}} - \left( \sum \limits _{i = 1}^L {\lambda _{l,i,k}^2 \left| \rho _{l,i,k}^\tau \right| ^2} \right) \left( \sum \limits _{i = 1}^L {\lambda _{l,i,k}} + {\gamma _{\mathrm{P}}} \right) ^{ - 1} \\&= \frac{\sum _{i = 1}^L {\lambda _{l,i,k}^2} - \sum _{i = 1}^L {\lambda _{l,i,k}^2 \left| \rho _{l,i,k}^\tau \right| ^2} + \sum _{i = 1}^L {\sum _{j \ne i}^L {\lambda _{l,i,k}}{\lambda _{l,j,k}}} + {\gamma _{\mathrm{P}}}\sum _{i = 1}^L {\lambda _{l,i,k}}}{\sum _{i = 1}^L {\lambda _{l,i,k}} + {\gamma _{\mathrm{P}}}} \end{aligned}$$

Since \({\left| {\rho _{l,i,k}^\tau } \right| ^2} \le 1\), we have \(\sum _{i = 1}^L {\lambda _{l,i,k}^2} - \sum _{i = 1}^L {\lambda _{l,i,k}^2{{\left| {\rho _{l,i,k}^\tau } \right| }^2}} \ge 0\); the remaining terms of the numerator are positive, so \(\varepsilon _k^\tau > 0\).
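The positivity can also be read off from a simple lower bound. As a sketch, assume \(\lambda _{l,i,k} \ge 0\) with at least one nonzero entry and \({\gamma _{\mathrm{P}}} > 0\) (assumptions that are not restated in this section). Using \(\sum _{i = 1}^L {\lambda _{l,i,k}^2{{\left| {\rho _{l,i,k}^\tau } \right| }^2}} \le \sum _{i = 1}^L {\lambda _{l,i,k}^2} \le \left( \sum _{i = 1}^L {\lambda _{l,i,k}} \right) ^2\) in the first line of the expression for \(\varepsilon _k^\tau \) gives

$$\begin{aligned} \varepsilon _k^\tau \ge \sum \limits _{i = 1}^L {\lambda _{l,i,k}} - \frac{\left( \sum _{i = 1}^L {\lambda _{l,i,k}} \right) ^2}{\sum _{i = 1}^L {\lambda _{l,i,k}} + {\gamma _{\mathrm{P}}}} = \frac{{\gamma _{\mathrm{P}}}\sum _{i = 1}^L {\lambda _{l,i,k}}}{\sum _{i = 1}^L {\lambda _{l,i,k}} + {\gamma _{\mathrm{P}}}} > 0 \end{aligned}$$

so each \(\varepsilon _k^\tau \) is bounded below by a quantity that does not depend on the prediction coefficients.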

Since every \(\varepsilon _k^\tau \) is positive, \({\varepsilon ^\tau }\) is monotonically increasing in \(K\). The interference therefore depends mainly on the ratio \({K / M}\). As a result, when \({M / K} \rightarrow \infty \),

$$\begin{aligned} \frac{{\left( {{\varepsilon ^\tau } + {\gamma _{{\mathrm{UL}}}}} \right) }}{M}\left( {\sum \limits _{i = 1}^L {{\lambda _{l,i,k}}} + {\gamma _{\mathrm{P}}}} \right) \rightarrow 0 \end{aligned}$$

and (32) can be further simplified as

$$\begin{aligned} \hat{C}_{LB,\inf }^\tau = \sum \limits _{k = 1}^K {{{\log }_2}\left( {1 + \frac{{{{\left| {{\lambda _{l,l,k}}\rho _{l,l,k}^\tau } \right| }^2}}}{{\sum _{i \ne l}^L {{{\left| {{\lambda _{l,i,k}}\rho _{l,i,k}^\tau } \right| }^2}} }}} \right) } \end{aligned}$$
(33)

which coincides with (23).
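The convergence of (32) to (33) can be checked numerically. The following is a minimal sketch, not the simulation code behind the figures: the values in `LAM` and `RHO` are hypothetical, the function names are ours, and \({\varepsilon ^\tau }\) is taken to be the sum of the \(\varepsilon _k^\tau \) over the \(K\) users, which is inferred from the expressions above rather than quoted from (30).

```python
import math


def eps_k(lam_k, rho_k, gamma_p):
    """Per-user error term eps_k^tau; lam_k[i], rho_k[i] run over cells i."""
    total = sum(lam_k)
    weighted = sum((a * abs(r)) ** 2 for a, r in zip(lam_k, rho_k))
    return total - weighted / (total + gamma_p)


def sinr_terms(lam_k, rho_k, target):
    """Signal and pilot-contamination powers shared by (32) and (33)."""
    powers = [abs(a * r) ** 2 for a, r in zip(lam_k, rho_k)]
    return powers[target], sum(powers) - powers[target]


def rate_finite_m(lam, rho, m, gamma_p, gamma_ul, target=0):
    """Lower bound (32) for m antennas; lam[k][i] and rho[k][i]."""
    # ASSUMPTION: eps^tau is the sum of eps_k^tau over the K users.
    eps = sum(eps_k(lk, rk, gamma_p) for lk, rk in zip(lam, rho))
    rate = 0.0
    for lk, rk in zip(lam, rho):
        signal, contamination = sinr_terms(lk, rk, target)
        residual = (eps + gamma_ul) / m * (sum(lk) + gamma_p)
        rate += math.log2(1 + signal / (contamination + residual))
    return rate


def rate_limit(lam, rho, target=0):
    """Limit (33), in which the residual term has vanished."""
    rate = 0.0
    for lk, rk in zip(lam, rho):
        signal, contamination = sinr_terms(lk, rk, target)
        rate += math.log2(1 + signal / contamination)
    return rate


# Hypothetical values: K = 2 users, L = 3 cells, target cell has index 0.
LAM = [[1.0, 0.2, 0.1], [0.8, 0.3, 0.05]]
RHO = [[0.95, 0.95, 0.95], [0.9, 0.7, 0.7]]

for m in (10, 100, 1000, 10**6):
    print(m, round(rate_finite_m(LAM, RHO, m, 1.0, 1.0), 4))
print("limit", round(rate_limit(LAM, RHO), 4))
```

For fixed \(K\), the printed values increase with \(M\) and approach the limit from below, because the residual term in the denominator of (32) is positive and scales as \(1/M\).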

4 Numerical Results

In this section, we validate the analyses presented above through a set of Monte Carlo simulations. As in [9], a 7-cell hexagonal system layout is adopted. The inner cell radius is normalized to one, the distance between two adjacent cells is normalized to 2, and we assume a distance-based path loss model with path loss exponent \(\alpha = 3.7\). To allow for reproducibility of our results, we distribute \(K = 30\) UTs uniformly on a circle of radius 2/3 around each BS and do not consider shadowing. The entries of the correlation matrix are modeled via the common exponential correlation model \({\left[ {{{\varvec{R}}_{l,i}}} \right] _{m,n}} = {\kappa ^{\left| {m - n} \right| }}\) with \(\kappa \) being the transmit correlation coefficient [14]. The carrier frequency is 2.3 GHz and the symbol interval is \(\frac{1}{14}\) ms. We assume that the CSI delay is fixed at 4 symbols, so the prediction coefficient depends only on the mobile velocity \(v\). We further assume \({\gamma _{\mathrm{P}}} = {\gamma _{{\mathrm{UL}}}}\).
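To give a feel for these parameters, the sketch below computes the maximum Doppler shift for a given velocity, the resulting normalized delay, and the exponential correlation matrix. The Jakes temporal correlation \(J_0(2\pi f_D \tau )\) is included only as an illustrative assumption. This section does not state which time-correlation model underlies the prediction coefficient, and all function and constant names are ours.

```python
import math

C_LIGHT = 299_792_458.0   # speed of light in m/s
CARRIER_HZ = 2.3e9        # carrier frequency from the simulation setup
SYMBOL_S = 1e-3 / 14      # symbol interval of 1/14 ms
DELAY_SYMBOLS = 4         # CSI delay in symbols


def doppler_hz(v_kmh):
    """Maximum Doppler shift f_D = v * f_c / c for a velocity in km/h."""
    return (v_kmh / 3.6) * CARRIER_HZ / C_LIGHT


def bessel_j0(x, n=2000):
    """J0(x) = (1/pi) * int_0^pi cos(x sin t) dt, via the midpoint rule."""
    h = math.pi / n
    return sum(math.cos(x * math.sin((j + 0.5) * h)) for j in range(n)) / n


def jakes_correlation(v_kmh):
    """ASSUMPTION: Jakes model, temporal correlation J0(2 pi f_D tau)."""
    tau = DELAY_SYMBOLS * SYMBOL_S
    return bessel_j0(2 * math.pi * doppler_hz(v_kmh) * tau)


def exp_correlation(m_antennas, kappa):
    """Exponential correlation matrix with entries kappa^|m - n|."""
    return [[kappa ** abs(m - n) for n in range(m_antennas)]
            for m in range(m_antennas)]


for v in (60, 120, 240):
    print(v, "km/h:", round(doppler_hz(v), 1), "Hz,",
          "Jakes correlation", round(jakes_correlation(v), 3))
```

With \(\kappa = 0\), `exp_correlation` returns the identity matrix, which corresponds to the i.i.d. channel case.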

Note that in Figs. 1, 2, 3 and 5, users in all cells move at the same given velocity. Figure 1 plots the achievable sum-rate against the number of antennas \(M\) for users moving at different velocities, with the SNR equal to 0 dB and \(\kappa = 0.9\). We can see that for moderate \(M\) and low velocity, our approximation is not very accurate. Figure 2 therefore plots the achievable sum-rate against the signal-to-noise ratio (SNR) for different numbers of antennas \(M\), with the mobile velocity set to 120 km/h. As expected, the approximation is very accurate for large \(M\). More importantly, as the mobile velocity increases, the sum-rate degrades more severely for finite \(M\). Interestingly, our approximation is more accurate at higher velocities for finite \(M\). As Fig. 1 shows, the faster the users move, the more antennas are needed to achieve a given sum-rate; and as Fig. 2 shows, the lower the SNR (that is, the lower the transmit power for a fixed noise power), the more antennas are needed to achieve a given sum-rate. In other words, a large antenna array can not only potentially reduce the uplink transmit power, but also compensate for the degradation due to user mobility.

Fig. 1 Ergodic sum-rate for different \(v\), \(\kappa = 0.9\)

Fig. 2 Ergodic sum-rate for different \(M\), \(\kappa = 0.9\)

Fig. 3 Ergodic sum-rate for different \(v\), \(\kappa = 0\)

Figure 3 plots the achievable sum-rate against the number of antennas \(M\) for the i.i.d. channel model with users moving at different given velocities and the SNR equal to 0 dB. Compared with Fig. 1, we see that high correlation degrades the sum-rate significantly, and that low correlation makes our approximation more accurate.

To verify our conjecture that the sum-rate may increase as well as decrease when the scheduled users in different cells move at different velocities, we plot Fig. 4. In Fig. 4, the curve marked immobile is the case with pilot contamination only; the curve marked mobile is the case with both pilot contamination and CSI delay, where all users in the target cell move at 60 km/h and all users in the interfering cells move at 240 km/h. The SNR is 0 dB in both cases. As predicted by the analysis, owing to pilot contamination the sum-rate can even increase when the scheduled users in the target cell move at a lower velocity than the users in the interfering cells.

Fig. 4 Ergodic sum-rate for immobile and mobile users, \(\kappa = 0.9\)
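The direction of this effect can be read off from the asymptotic expression (33). The sketch below evaluates one user's term of (33) for a few combinations of prediction coefficients. The values of \(\lambda \) and \(\rho \) are hypothetical and chosen only for illustration; they are not the values behind Fig. 4.

```python
import math


def asymptotic_rate(lam, rho, target=0):
    """One user's term of (33); lam[i] and rho[i] run over cells i."""
    powers = [(a * abs(r)) ** 2 for a, r in zip(lam, rho)]
    interference = sum(p for i, p in enumerate(powers) if i != target)
    return math.log2(1 + powers[target] / interference)


# Hypothetical large-scale coefficients: target cell first, two interferers.
lam = [1.0, 0.3, 0.2]

no_delay = asymptotic_rate(lam, [1.0, 1.0, 1.0])      # no CSI delay
same_speed = asymptotic_rate(lam, [0.8, 0.8, 0.8])    # equal coefficients
slow_target = asymptotic_rate(lam, [0.95, 0.6, 0.6])  # interferers move faster
fast_target = asymptotic_rate(lam, [0.6, 0.95, 0.95]) # target moves faster

print(no_delay, same_speed, slow_target, fast_target)
```

With equal coefficients the common factor cancels, so the rate equals the no-delay rate. When the interfering users have smaller prediction coefficients than the target user, the contamination term shrinks faster than the signal term and the rate exceeds the no-delay rate; the opposite ordering lowers it.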

Figure 5 plots the average rate per user against the number of antennas \(M\), for large \(M\), under the i.i.d. channel model with users moving at different given velocities and the SNR equal to 0 dB. We use the average rate per user instead of the sum-rate so that the curves are easier to distinguish. As expected, when the \(k\)-th users of all cells are scheduled with the same mobile velocity, the average rate per user approaches \(R_\infty \) as \(M\) goes to infinity, whatever that velocity is.

Fig. 5 Average rate per user for different \(v\), \(\kappa = 0\)

5 Conclusions

We have analyzed the spectral efficiency of massive MIMO taking practical issues such as antenna correlation, pilot contamination and CSI delay into consideration. An equivalent channel model for massive MIMO with pilot contamination and CSI delay is derived. With this model, a lower bound on the sum-rate is obtained, and the asymptotic performance of the sum-rate is analyzed for \(M \rightarrow \infty \). The results are general and cover the conclusions of [9]. Simulation results show that the asymptotic approximation is accurate for large \(M\). We also find that a large antenna array can not only potentially reduce the uplink transmit power, but also compensate for the degradation due to user mobility. Simulation results also support our conjecture that, because of pilot contamination, CSI delay does not necessarily decrease the uplink sum-rate. When the number of BS antennas grows without bound at a much greater rate than the number of users, and the \(k\)-th users scheduled in all cells have the same prediction coefficient, the uplink sum-rate is the same as that with no CSI delay, which the simulations confirm.