1 Introduction

The Internet of Things (IoT) has grown immensely during the last decade, with many emerging applications built around different types of sensors. Alongside issues at the device and protocol levels, there is a growing trend toward the integration of sensors and sensor-based systems with cyber-physical systems as well as device-to-device (D2D) communications [1,2,3]. Recently, research directions have shifted towards 5G networks, and one active area of research in 5G is the IoT [4]. One of the most important components of the IoT paradigm is the wireless sensor network (WSN). WSNs can sense the environment, process the sensed data, and communicate for monitoring and diagnostics [6]. The sensor data originating from the future IoT is expected to be diverse and to grow manifold with each passing year. During the last few years, multi-sensor data fusion has attracted a lot of attention in many applications (such as radar and target detection, target tracking, smart grids, intelligent transportation systems, robot networks, and smart cities) as it provides a cost-effective solution.

Unknown parameters or phenomena of interest can be estimated using centralized or distributed estimation systems. In centralized estimation systems, the sensor nodes collect data and transmit it to a central processor for signal processing. The central node is also responsible for communication with the individual nodes. The resulting estimate is as accurate as if each sensor had access to the complete information available across the network. However, this centralized approach has several disadvantages, including communication overhead, power consumption, and a single point of failure. In order to overcome these issues, distributed estimation systems have been proposed in the literature. Each sensor node senses data and collaborates with its neighboring nodes to arrive at an estimate of the unknown parameter instead of transmitting the entire data to a central node. This approach offers less communication overhead and a lower processing load at the cost of added complexity at the nodes [7,8,9,10,11].

Various algorithms have been proposed in the literature for distributed estimation over WSNs [9,10,11,12,13,14,15,16,17,18,19,20,21], and a detailed survey of distributed algorithms is available in [22]. The focus is primarily on least mean-square (LMS)-based algorithms due to their simplicity and effectiveness [23]. However, an important limiting factor of LMS is the trade-off between convergence speed and steady-state misadjustment. One of the most popular fully distributed algorithms is diffusion LMS [12]. To overcome this limitation of LMS-based algorithms, several variable step-size (VSS) strategies have been proposed, which provide fast initial convergence while reducing the step-size over time in order to achieve a low steady-state error [24,25,26,27,28,29,30].

1.1 VSS strategies in literature

The work in [24] introduces a normalization factor to overcome the problem of slow convergence in the case of highly auto-correlated input data, thus making the step-size variable. A novel VSS algorithm is introduced in [25], which has become the most popular VSS algorithm in the literature due to its low complexity and excellent performance. The algorithm uses the energy of the instantaneous error to update the step-size. However, the algorithm may suffer in performance if the noise power is high. To overcome this problem, another VSS strategy is proposed in [26], which uses the cross-correlation between the current and previous error values to update the step-size. The work in [27] proposes an improved version of [26] by also incorporating the cross-correlation of the input signal in the update equation. However, despite these improvements, the VSS strategy of [25] generally outperforms these two algorithms as well as most other VSS algorithms. A noise-constrained VSS algorithm is proposed in [28]. The algorithm uses Lagrange multipliers under the constraint that the noise variance is known. The work in [29] proposes a VSS normalized LMS algorithm that adds a VSS strategy to the algorithm of [24]. The authors in [30] propose a sparsity-aware VSS strategy, which updates the step-size using the absolute value of the instantaneous error instead of its energy. The algorithm is shown to perform better for sparse systems compared with other VSS strategies.

1.2 VSS strategies for WSNs

The idea of varying the step-size has been extended to estimation in WSNs. In some cases, the authors have proposed to simply incorporate existing VSS strategies within the WSN framework [13,14,15,16]. The work in [13, 14] incorporates the VSS strategy proposed in [25] directly into the distributed estimation framework. The same VSS strategy has been applied to transform-domain distributed estimation by the authors in [15]. The authors in [16] use the sparse VSS technique recently proposed in [30] for distributed estimation of compressible systems. Other works have used the setup of the network to derive new VSS strategies specifically [17,18,19,20,21]. The work in [17] improves the strategy introduced in [13] by diffusing the error values into the network as well. The work in [18] proposes a noise-constrained distributed algorithm, derived using Lagrange multipliers. The authors in [19,20,21] derive VSS strategies from mean square deviation calculations.

1.3 Analysis for VSS strategies

In general, each VSS algorithm aims to improve performance at the cost of computational complexity. This is usually an acceptable trade-off, as the improvement in performance is considerable. The drawback is that the additional complexity makes the analysis of the algorithm tedious. This complexity is further increased for WSNs, where different sensor nodes collaborate with each other to improve performance. Authors have used various assumptions in order to perform the analysis of these algorithms. However, each algorithm has been dealt with independently in order to find a closed-form solution. An exact method of analysis has recently been proposed in [31, 32]. Although the results are accurate, this method is mathematically complex as well as algorithm-specific and cannot be generalized to all algorithms. A generic treatment of VSS algorithms has been presented in [33], but without taking a WSN into account. In a WSN, data is shared between nodes, and this fact needs to be taken into account while performing the analysis. Due to this very important factor, the analysis of [33] cannot be extended to the WSN scenario.

Table 1 lists the analyses that have been performed for the VSS algorithms of [13,14,15,16,17,18,19,20,21] and compares them with the proposed work. As can be seen, most of these works include either no analysis or only a partial one. None of the authors have, however, performed the analysis in a generic way.

Table 1 Distributed VSS algorithms and analysis in literature

1.4 Contributions

This work presents a generalized analysis approach for LMS-based VSS techniques applied to WSNs. The proposed generalized analysis can be applied to most existing as well as any forthcoming VSS approaches. The main contributions of this work are as follows:

  • Derivation of the step-size limit for stability of distributed VSSLMS algorithms in WSNs.

  • Generalized transient analysis for distributed VSSLMS algorithms in WSNs, including derivation of iterative equations for mean square deviation (MSD) and excess mean square error (EMSE).

  • Derivation of steady-state equations for MSD and EMSE for distributed VSSLMS algorithms in WSNs.

  • Derivation of steady-state step-size terms for the VSSLMS algorithms used as case studies in this work.

  • Validation of theoretical analysis through experimentation.

A list of the acronyms used within the paper is given below:

Table 2 Description of the acronyms used in this paper

The rest of the paper is organized as follows. Section 2 presents the system model and problem statement. Section 3 details the complete theoretical analysis for diffusion-based distributed VSSLMS algorithms. Simulation results are presented in Sect. 4. Section 5 concludes the paper (Table 2).

2 System model

We consider an adaptive network consisting of N sensor nodes, as shown in Fig. 1, deployed over a geographical area to estimate an M-dimensional unknown parameter vector, whose optimum value is denoted by \({\mathbf{w}}^o \in {\mathbb {R}}^{M}\). We denote the neighborhood of any node k by \({\mathcal {N}}_k\) and the number of neighbors of node k by \(n_k\). The neighborhood of a node k is the set of nodes in close vicinity that have a single-hop communication link with node k, i.e., for \(l =1, 2, \ldots , N\) and \(l \ne k\), nodes l and k communicate over a single hop if \(l \in {\mathcal {N}}_k\).

Fig. 1 An illustration of an adaptive network of N nodes

Each node k has access to a time realization of a known regressor row vector \({\mathbf{u}}_{k} (i)\) of length M and a scalar measurement \(d_k(i)\), which are related by

$$\begin{aligned} d_{k}(i)=\mathbf {u}_{k}(i)\mathbf {w}^{o}+v_{k}(i), \end{aligned}$$
(1)

where \(k=1,\ldots ,N\), \(v_k(i)\) is zero-mean additive white Gaussian noise with variance \(\sigma ^2_{v_k}\), which is spatially uncorrelated across nodes, and i is the time index. At each node, these measurements are used to generate an estimate of the unknown vector \(\mathbf{w}^o\), denoted by \({\mathbf{w}}_{k}(i)\).

The fully distributed Adapt-Then-Combine (ATC) diffusion LMS (DLMS) algorithm proposed in [12] is summarized below.

$$\begin{aligned} e_k (i)= & {} d_k (i) - {\mathbf{u}}_k (i){\mathbf{w}}_k (i) \end{aligned}$$
(2)
$$\begin{aligned} {\varvec{\Psi }}_k (i)= & {} {\mathbf{w}}_k (i) + \mu _k {\mathbf{u}}_k^T (i)e_k (i) \end{aligned}$$
(3)
$$\begin{aligned} {\mathbf{w}}_k \left( i + 1 \right)= & {} \sum \limits _{l \in {\mathcal {N}}_k } {c_{lk} {\varvec{\Psi }}_l \left( {i} \right) } , \end{aligned}$$
(4)

where \({\varvec{\Psi }}_k (i)\) is the intermediate update, \(e_k (i)\) is the instantaneous error and \(c_{lk}\) are the combination weights [12].
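For an algorithmic view of (2)–(4), the following is a minimal sketch of one ATC iteration across the whole network (a NumPy illustration under the model of Sect. 2; the function and variable names are ours, not taken from [12]):

```python
import numpy as np

def atc_dlms_step(w, U, d, mu, C):
    """One ATC diffusion LMS iteration over the whole network, per (2)-(4).

    w  : (N, M) array; row k holds the current estimate w_k(i)
    U  : (N, M) array; row k holds the regressor u_k(i)
    d  : (N,) measurements d_k(i)
    mu : (N,) step-sizes mu_k
    C  : (N, N) combination matrix with C[l, k] = c_lk and columns summing to one
    """
    e = d - np.sum(U * w, axis=1)            # (2): e_k(i) = d_k(i) - u_k(i) w_k(i)
    psi = w + mu[:, None] * U * e[:, None]   # (3): adaptation (intermediate update)
    w_next = C.T @ psi                       # (4): combination over the neighborhood
    return w_next, e
```

Iterating this step, with \(\mu _k\) additionally updated through (6), yields the generic VSSDLMS recursion described next.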

The step-size \(\mu _k\) can be varied using any one of several available strategies [13,14,15,16,17,18,19,20,21]. Thus, the generic VSSDLMS algorithm is described by (2), (4) and the following set of equations

$$\begin{aligned} {\varvec{\Psi }}_k (i)= & {} {\mathbf{w}}_k (i) + \mu _k (i) {\mathbf{u}}_k^T (i)e_k (i) \end{aligned}$$
(5)
$$\begin{aligned} \mu _k (i + 1)= & {} f\{ \mu _k (i) \}, \end{aligned}$$
(6)

where \(f\{ \mu _k (i) \}\) is a function that defines the update equation for the step-size and varies for each VSS strategy.
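For concreteness, the following is a minimal sketch of one possible choice of \(f\{ \mu _k (i) \}\), namely the KJ rule of [25] that is revisited as a case study in Sect. 3.4 (a NumPy illustration; the clipping bounds and parameter values are placeholders, not taken from the cited works):

```python
import numpy as np

def kj_step_size_update(mu, e, alpha=0.97, gamma=1e-3, mu_min=1e-6, mu_max=0.05):
    """One possible f{mu_k(i)}: the KJ rule of [25],
    mu_k(i+1) = alpha * mu_k(i) + gamma * e_k(i)^2, kept within [mu_min, mu_max].
    All parameter values here are illustrative placeholders."""
    return np.clip(alpha * mu + gamma * e ** 2, mu_min, mu_max)
```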

While performing the analysis of the LMS algorithm, the input regressor vector is assumed to be independent of the estimated weight vector. For VSS algorithms, it is generally assumed that the control parameters are chosen such that the step-size and the input regressor vector are asymptotically independent of each other, which yields a closed-form steady-state solution that closely matches the simulation results. For some VSS algorithms, the analytical and simulation results also match closely during the transient stage, but this is not always the case. The results are nevertheless acceptable for all algorithms since a closed-form solution is obtained.

The main objective of this work is to provide a generalized analysis for diffusion-based VSSLMS algorithms under the assumptions mentioned above. A list of symbols used in this paper is presented in Table 3.

Table 3 Description of the symbols used in the model

3 Proposed unified analysis framework

In a distributed network, the nodes exchange data, as can be seen from (4). As a result, correlations exist across the data of the entire network. To account for this inter-node dependence, the performance of the whole network needs to be studied. Some new variables are therefore introduced, transforming the local variables into global ones as follows:

$$\begin{aligned} {\mathbf{w}}(i)= & {} \text{ col } \left\{ {{\mathbf{w}}_{1}(i) ,\ldots ,{\mathbf{w}}_{N}(i) } \right\} ,\\ {\varvec{\Psi }}(i)= & {} \text{ col } \left\{ {{\varvec{\Psi }}_{1}(i) ,\ldots ,{\varvec{\Psi }}_{N}(i) } \right\} , \\ {\mathbf{U}}(i)= & {} \text{ diag } \left\{ {{\mathbf{u}}_{1}(i) ,\ldots ,{\mathbf{u}}_{N}(i) } \right\} , \\ {\mathbf{D}}(i)= & {} \text{ diag } \left\{ {\mu _{1}(i) \mathrm{{\mathbf{I}}}_M ,\ldots ,\mu _{N}(i) \mathrm{{\mathbf{I}}}_M } \right\} , \\ {\mathbf{d}}(i)= & {} \text{ col } \left\{ {d_1 (i),\ldots ,d_N (i)} \right\} , \\ {\mathbf{v}}(i)= & {} \text{ col } \left\{ {v_1 (i),\ldots ,v_N (i)} \right\} . \end{aligned}$$

Using the new variables, the set of equations representing the entire network is given by

$$\begin{aligned} {\mathbf{d}}(i)= & {} {\mathbf{U}}(i) {\mathbf{w}}^{\left( o \right) } + {\mathbf{v}}(i), \nonumber \\ {\varvec{\Psi }}\left( i+1\right)= & {} {\mathbf{w}}(i) + {\mathbf{D}}(i) {\mathbf{U}}^T(i) \left( {{\mathbf{d}}(i) - {\mathbf{U}}(i) {\mathbf{w}}(i) } \right) , \nonumber \\ {\mathbf{w}}\left( i+1\right)= & {} {\mathbf{G\Psi }}\left( i+1\right) , \nonumber \\ {\mathbf{D}}\left( i+1\right)= & {} f \{{\mathbf{D}}(i)\}, \end{aligned}$$
(7)

where \({\mathbf{w}}^{\left( o \right) } = {\mathbf{Q}} {\mathbf{w}}^o\), \({\mathbf{Q}} = \text{ col }\left\{ \mathrm {{\mathbf{I}}}_M,\mathrm {{\mathbf{I}}}_M,\ldots ,\mathrm {{\mathbf{I}}}_M\right\} \) is an \(MN \times M\) matrix, \({\mathbf{G}} = {\mathbf{C}} \otimes \mathrm{{\mathbf{I}}}_M\), \({\mathbf{C}}\) is an \(N \times N\) weighting matrix with \(\left\{ \mathbf{C}\right\} _{lk} = c_{lk}\), and \(\otimes \) denotes the Kronecker product.
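The block structure above translates directly into code. The following is a minimal sketch of how the global quantities appearing in (7) can be assembled (a NumPy illustration; function and variable names are ours):

```python
import numpy as np

def build_global_variables(u_rows, mu, C):
    """Assemble the network-level quantities appearing in (7); names illustrative.

    u_rows : list of N regressor row vectors u_k(i), each of length M
    mu     : (N,) instantaneous step-sizes mu_k(i)
    C      : (N, N) combination matrix
    """
    N, M = len(u_rows), len(u_rows[0])
    U = np.zeros((N, M * N))                 # U(i) = diag{u_1(i), ..., u_N(i)}
    for k, u in enumerate(u_rows):
        U[k, k * M:(k + 1) * M] = u
    D = np.kron(np.diag(mu), np.eye(M))      # D(i) = diag{mu_1(i) I_M, ..., mu_N(i) I_M}
    G = np.kron(C, np.eye(M))                # G = C (Kronecker) I_M
    Q = np.tile(np.eye(M), (N, 1))           # Q = col{I_M, ..., I_M}, an MN x M matrix
    return U, D, G, Q
```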

3.1 Mean analysis

The analysis is now carried out using the new set of equations. The global weight-error vector is given by

$$\begin{aligned} {{\tilde{\mathbf{w}}}}(i) = {\mathbf{w}}^{\left( o \right) } - {\mathbf{w}}(i). \end{aligned}$$
(8)

Since \({\mathbf{G}}{\mathbf{w}^{\left( o\right) }} = {\mathbf{w}^{\left( o\right) }}\), which follows from the combination weights summing to unity, incorporating the global weight-error vector into (7) gives

$$\begin{aligned} {{\tilde{\mathbf{w}}}}\left( i+1\right)= & {} {\mathbf{G}{\tilde{{\varvec{\Psi }}}} }\left( i+1\right) \nonumber \\= & {} {\mathbf{G}{\tilde{\mathbf{w}}}}(i) - {\mathbf{GD}}(i) {\mathbf{U}}^T(i) \left( {\mathbf{U}}(i) {{\tilde{\mathbf{w}}}}(i) + {\mathbf{v}}(i) \right) \nonumber \\= & {} {\mathbf{G}}\left( {{{\mathbf{I}}}_{MN} - {\mathbf{D}}(i) {\mathbf{U}}^T(i) {\mathbf{U}}(i) } \right) {{\tilde{\mathbf{w}}}}(i) \nonumber \\&-\,{\mathbf{GD}}(i) {\mathbf{U}}^T(i) {\mathbf{v}}(i). \end{aligned}$$
(9)

Using the assumption that the step-size matrix \({{\mathbf{D}}(i) }\) is independent of the regressor matrix \({\mathbf{U}}(i)\) [14, 16, 18, 25,26,27,28,29], the following relation holds true asymptotically

$$\begin{aligned} {\mathbb {E}}\left[ \mathbf{D} (i) \mathbf{U}^T(i) \mathbf{U} (i) \right] \approx {\mathbb {E}}\left[ \mathbf{D} (i) \right] {\mathbb {E}}\left[ {\mathbf{U}}^T(i) \mathbf{U} (i) \right] , \end{aligned}$$
(10)

where \({\mathbb {E}}\left[ {{\mathbf{U}}^T(i) {\mathbf{U}}(i) } \right] = {\mathbf{R}_{\mathbf{U}}}\) is the auto-correlation matrix of \({{\mathbf{U}}(i)}\). Now, taking the expectation on both sides of (9) and simplifying gives

$$\begin{aligned} {\mathbb {E}}\left[ {{{\tilde{\mathbf{w}}}}\left( i+1\right) } \right] = {\mathbf{G}} \left( {\mathrm{{\mathbf{I}}}_{MN} - {\mathbb {E}}\left[ {{\mathbf{D}}(i) } \right] {\mathbf{R}}_{\mathbf{U}} } \right) {\mathbb {E}}\left[ {{{\tilde{\mathbf{w}}}}(i) } \right] . \end{aligned}$$
(11)

The expectation of the second term on the right-hand side of (9) is zero because the measurement noise is zero-mean and statistically independent of the input regressors.

It can be seen from (11) that the term defining the stability of the algorithm is \(\left( {\mathrm{{\mathbf{I}}}_{MN} - {\mathbb {E}}\left[ {{\mathbf{D}}(i) } \right] {\mathbf{R}}_{\mathbf{U}} } \right) \). As the matrix \({{\mathbf{D}}(i) }\) is diagonal, this term can be further simplified to the node-level as \({\left( {\mathrm{\mathbf{I}} - {\mathbb {E}}\left[ {\mu _{k}(i) } \right] {\mathbf{R}}_{{\mathbf{u}},k} } \right) }\). Thus, the stability condition is given by

$$\begin{aligned} \prod \limits _{i = 0}^n {\left( {\mathrm{\mathbf{I}} - {\mathbb {E}}\left[ {\mu _{k}(i) } \right] {\mathbf{R}}_{{\mathbf{u}},k} } \right) } \rightarrow 0, \quad \text{ as } n \rightarrow \infty \end{aligned}$$
(12)

which holds true if the mean of the step-size is governed by

$$\begin{aligned} 0< {\mathbb {E}}\left[ {\mu _{k}(i) } \right] < \frac{2}{{\lambda _{\max } \left( {{\mathbf{R}}_{{\mathbf{u}},k} } \right) }}, \quad 1 \le k \le N , \end{aligned}$$
(13)

where \({{\lambda _{\max } \left( {{\mathbf{R}}_{{\mathbf{u}},k} } \right) }}\) is the maximum eigenvalue of the auto-correlation matrix \({{\mathbf{R}}_{{\mathbf{u}},k} }\).
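As a quick numerical check of (13), \({\mathbf{R}}_{{\mathbf{u}},k}\) can be estimated from regressor samples and the mean step-size compared against the resulting bound. A minimal sketch under a sample-average estimate (illustrative only, names are ours) is:

```python
import numpy as np

def mean_step_size_bound(u_samples):
    """Upper bound in (13) for one node: 2 / lambda_max(R_u,k).

    u_samples : (T, M) array of regressor realizations for node k, used to form
                a sample-average estimate of R_u,k (illustrative, not from the paper).
    """
    R_u = u_samples.T @ u_samples / u_samples.shape[0]
    return 2.0 / np.linalg.eigvalsh(R_u).max()
```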

3.2 Mean-square analysis

Following the analysis procedure of [23], we take the weighted norm of (9) and then apply the expectation operator. After simplifying we get

$$\begin{aligned} {\mathbb {E}}\left[ {\left\| {{{\tilde{\mathbf{w}}}}\left( i+1\right) } \right\| _{\varvec{\Sigma }} ^2 } \right]= & {} {\mathbb {E}}\left[ {\left\| {{{\tilde{\mathbf{w}}}}(i) } \right\| _{{{\hat{\varvec{\Sigma }}}}}^2 } \right] \nonumber \\&+\,{\mathbb {E}}\left[ {{\mathbf{v}}^T(i) {\mathbf{Y}}^T(i) {\varvec{\Sigma }} {\mathbf{Y}}(i) {\mathbf{v}}(i) } \right] , \end{aligned}$$
(14)

where

$$\begin{aligned} {\mathbf{Y}}(i)= & {} {\mathbf{GD}}(i) {\mathbf{U}}^T(i) \end{aligned}$$
(15)
$$\begin{aligned} {{\hat{\varvec{\Sigma }}}}= & {} {\mathbf{G}}^T {\varvec{\Sigma }} {\mathbf{G}} - {\mathbf{G}}^T {\varvec{\Sigma }} {\mathbf{Y}}(i) {\mathbf{U}}(i) \nonumber \\&-\,{\mathbf{U}}^T(i) {\mathbf{Y}}^T(i) {\varvec{\Sigma }} {\mathbf{G}} + {\mathbf{U}}^T(i) {\mathbf{Y}}^T(i) {\varvec{\Sigma }} {\mathbf{Y}}(i) {\mathbf{U}}(i). \end{aligned}$$
(16)

The analysis becomes quite tedious for non-Gaussian data. Therefore, the data is assumed to be Gaussian, without loss of generality [23]. The auto-correlation matrix is decomposed as \({\mathbf{R}}_{{\mathbf{U}}} = {\mathbf{T}} {\varvec{\Lambda }} {\mathbf{T}}^T\), where \(\varvec{\Lambda }\) is a diagonal matrix of eigenvalues and \(\mathbf{T}\) is the matrix of corresponding eigenvectors, such that \({\mathbf{T}}^T {\mathbf{T}} = {\mathbf{I}}\). Using \(\mathbf{T}\), the variables are redefined as

$$\begin{aligned} {{\bar{\mathbf{w}}}}(i)= & {} {\mathbf{T}}^T {{\tilde{\mathbf{w}}}}(i) \quad {{\bar{\mathbf{U}}}}(i) = {\mathbf{U}}(i) {\mathbf{T}} \\ {{\bar{\mathbf{G}}}}= & {} {\mathbf{T}}^T {\mathbf{GT}} \quad {\bar{\varvec{\Sigma }} = {\mathbf{T}}^T {\varvec{\Sigma }} {\mathbf{T}}} \\ \bar{\varvec{\Sigma }} '= & {} {\mathbf{T}}^T {\varvec{\Sigma }} '{\mathbf{T}} \quad {{\bar{\mathbf{D}}}}(i) = {\mathbf{T}}^T {\mathbf{D}}(i) {\mathbf{T}} = {\mathbf{D}}(i) , \end{aligned}$$

where the input regressors are assumed independent across nodes and the step-size matrix \({\mathbf{D}} (i)\) is block-diagonal, so it is left unchanged by the transformation.
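A minimal numerical sketch of this change of basis, assuming \({\mathbf{R}}_{\mathbf{U}}\) is available (or estimated) and using illustrative names, is:

```python
import numpy as np

def transform_to_eigenbasis(R_U, w_tilde, U, G, Sigma):
    """Rotate the network variables into the eigenbasis of R_U (Sect. 3.2)."""
    lam, T = np.linalg.eigh(R_U)             # R_U = T Lambda T^T, with T^T T = I
    return {
        "Lambda":    np.diag(lam),
        "w_bar":     T.T @ w_tilde,          # w_bar(i)  = T^T w_tilde(i)
        "U_bar":     U @ T,                  # U_bar(i)  = U(i) T
        "G_bar":     T.T @ G @ T,            # G_bar     = T^T G T
        "Sigma_bar": T.T @ Sigma @ T,        # Sigma_bar = T^T Sigma T
    }
```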

Using the data independence assumption [23] and simplifying, we arrive at the final recursive update equation

$$\begin{aligned} {\mathbb {E}} \left[ \left\| {\bar{\mathbf{w}}}(i+1) \right\| ^2_{\bar{\varvec{\sigma }}} \right]= & {} {\mathbb {E}} \left[ \left\| {\bar{\mathbf{w}}}(i) \right\| ^2_{\bar{\varvec{\sigma }}} \right] + {\mathbf{b}}^T(i) \bar{\varvec{\sigma }} \nonumber \\&+\,\left\| {\bar{\mathbf{w}}}^{(o)} \right\| ^2_{{{\mathcal {A}}}(i)\left[ \mathbf{F}(i) - \mathbf{I}_{M^2 N^2} \right] \bar{\varvec{\sigma }}} \nonumber \\&+\,{{\mathcal {B}}} (i) \left[ \mathbf{F}(i) - \mathbf{I}_{M^2 N^2} \right] \bar{\varvec{\sigma }}, \end{aligned}$$
(17)

where

$$\begin{aligned} {\mathbf{F}}(i)= & {} \left[ {{\mathbf{I}}_{M^2 N^2 } - \left( {{\mathbf{I}}_{MN} \odot \varvec{\Lambda } {\mathbb {E}}\left[ {\mathbf{D}} (i) \right] } \right) } \right. \nonumber \\&\left. -\,\left( {\varvec{\Lambda } {\mathbb {E}}\left[ {\mathbf{D}} (i) \right] \odot {\mathbf{I}}_{MN} } \right) + \left( {{\mathbb {E}}\left[ {{\mathbf{D}}(i) \odot {\mathbf{D}}(i)} \right] } \right) \mathbf{A} \right] \nonumber \\&\cdot \left( {{\mathbf{G}}^T \odot {\mathbf{G}}^T } \right) . \end{aligned}$$
(18)
$$\begin{aligned} {{\mathcal {A}}}(i+1)= & {} {{\mathcal {A}}}(i) \mathbf{F}(i). \end{aligned}$$
(19)
$$\begin{aligned} {{\mathcal {B}}}(i+1)= & {} {{\mathcal {B}}} (i) \mathbf{F}(i) + \mathbf{b}^T (i) \mathbf{I}_{M^2 N^2}. \end{aligned}$$
(20)

Finally, taking the weighting matrix \(\varvec{\Sigma } = \mathbf{I}_{M^2 N^2}\) gives the mean-square deviation (MSD), while \(\varvec{\Sigma } = \varvec{\Lambda }\) gives the EMSE. The detailed analysis and a description of the variables are given in Appendix A.

3.3 Steady-state analysis

At steady-state, (31) and (18) become

$$\begin{aligned} {\mathbb {E}}\left[ {\left\| {{{\bar{\mathbf{w}}}}_{ss} } \right\| _{\overline{\varvec{\sigma }} }^2 } \right]= & {} {\mathbb {E}}\left[ {\left\| {{{\bar{\mathbf{w}}}}_{ss} } \right\| _{{\mathbf{F}}_{ss} \overline{\varvec{\sigma }} }^2 } \right] + {\mathbf{b}}_{ss}^T \overline{\varvec{\sigma }}, \end{aligned}$$
(21)
$$\begin{aligned} {\mathbf{F}}_{ss}= & {} \left[ {\mathbf{I}}_{M^2 N^2 } - \left( {{\mathbf{I}}_{MN} \odot \varvec{\Lambda } {\mathbb {E}}\left[ {{\mathbf{D}}_{ss} } \right] } \right) \right. \nonumber \\&\left. -\,\left( {\varvec{\Lambda } {\mathbb {E}}\left[ {{\mathbf{D}}_{ss} } \right] \odot {\mathbf{I}}_{MN} } \right) \right. \nonumber \\&\left. +\,\left( {{\mathbb {E}}\left[ {{\mathbf{D}}_{ss} \odot {\mathbf{D}}_{ss} } \right] } \right) \mathbf{A} \right] \nonumber \\&\cdot \left( {{\mathbf{G}}^T \odot {\mathbf{G}}^T } \right) , \end{aligned}$$
(22)

where \({\mathbf{D}}_{ss} = \text{ diag } \left\{ \mu _{ss,k} {\mathbf{I}}_M \right\} \), \({\mathbf{b}}_{ss} = {\mathbf{R}}_{\mathbf{v}} {\mathbf{D}}_{ss}^2 {\varvec{\Lambda }}\), and \({\mathbf{D}}_{ss}^2 = \text{ diag } \left\{ \mu _{ss,k}^2 {\mathbf{I}}_M \right\} \). Simplifying (21), we get

$$\begin{aligned} {\mathbb {E}}\left[ {\left\| {{{\bar{\mathbf{w}}}}_{ss} } \right\| _{\overline{\varvec{\sigma }} }^2 } \right] = {\mathbf{b}}_{ss}^T \left[ {{\mathbf{\mathbf{{I}}}}_{M^2 N^2 } - {\mathbf{F}}_{ss} } \right] ^{ - 1} \overline{\varvec{\sigma }}. \end{aligned}$$
(23)

This equation gives the steady-state performance measure for the entire network. In order to solve for the steady-state values of MSD and EMSE, we take \(\bar{\varvec{\sigma }} = \text{ bvec }\{\mathbf{I}_{M^2 N^2}\}\) and \(\bar{\varvec{\sigma }} = \text{ bvec }\{\varvec{\Lambda }\}\), respectively.
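As an illustration of how (23) can be evaluated numerically, the following sketch assumes \({\mathbf{F}}_{ss}\), \({\mathbf{b}}_{ss}\), and \(\bar{\varvec{\sigma }}\) have already been assembled per (22) and Appendix A (a NumPy illustration; names are ours):

```python
import numpy as np

def steady_state_metric(F_ss, b_ss, sigma_bar):
    """Evaluate (23): b_ss^T (I - F_ss)^{-1} sigma_bar.

    F_ss      : steady-state matrix from (22)
    b_ss      : steady-state vector b_ss (as defined after (22), in vectorized form)
    sigma_bar : bvec of the chosen weighting matrix (identity for MSD, Lambda for EMSE)
    """
    I = np.eye(F_ss.shape[0])
    return float(b_ss @ np.linalg.solve(I - F_ss, sigma_bar))
```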

3.4 Steady-state step-size analysis

The analysis presented in the preceding sections is generic for any VSS algorithm. In this section, three different VSS algorithms are chosen as case studies for the steady-state analysis of the step-size. Their steady-state step-size values are then inserted directly into (22) and (23). The three VSS algorithms and their step-size update equations are given in Table 4. The first algorithm, denoted KJ, is the work of Kwong and Johnston [25], as used in [14]. The NC algorithm refers to the noise-constrained LMS algorithm [28], as used in [18]. Finally, Sp refers to the sparse VSSLMS algorithm of [30], as used in [16]. It should be noted that the step-size matrix for the network, \({\mathbf{D}}\), is diagonal; therefore, the step-size for each node can be studied independently.

Applying the expectation operator and simplifying yields the equations presented in Table 5, where \(\zeta _k(i)\) is the EMSE of node k.

Table 4 Step-size update equations for the VSSLMS algorithms
Table 5 Expectations of the update equations from Table 4

At steady-state, the step-size of node k is denoted by \(\mu _{k,ss}\), and the approximate steady-state expressions are given in Table 6. It should be noted that the steady-state EMSE values are assumed to be small enough to be ignored.
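As a worked illustration of the entries in Tables 5 and 6, consider the KJ rule, assuming it takes its standard form \(\mu _k (i+1) = \alpha \mu _k (i) + \gamma \, e_k^2 (i)\) from [25]. Applying the expectation operator, using \({\mathbb {E}}\left[ e_k^2 (i) \right] = \sigma ^2_{v_k} + \zeta _k (i)\), and then letting \(i \rightarrow \infty \) gives

$$\begin{aligned} {\mathbb {E}}\left[ \mu _k (i+1) \right]= & {} \alpha \, {\mathbb {E}}\left[ \mu _k (i) \right] + \gamma \left( \sigma ^2_{v_k} + \zeta _k (i) \right) , \\ \mu _{k,ss}= & {} \frac{\gamma \left( \sigma ^2_{v_k} + \zeta _{k,ss} \right) }{1 - \alpha } \approx \frac{\gamma \, \sigma ^2_{v_k}}{1 - \alpha }, \end{aligned}$$

where the final approximation invokes the assumption that the steady-state EMSE \(\zeta _{k,ss}\) is negligible.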

Table 6 Steady-state step-size values for equations from Table 4

4 Results and discussion

In this section, the analysis presented above is tested on the three VSS algorithms listed in Table 4 when they are incorporated within the DLMS framework. The analysis is verified through three different experiments. The first experiment plots the theoretical transient MSD obtained from (17) and compares it with simulation results. Further, steady-state results are tabulated to examine the effects of the assumptions; the simulated steady-state results are compared with the theoretical results obtained from both (17) and (23). In the second experiment, steady-state MSD results are compared for different network sizes while the signal-to-noise ratio (SNR) is varied. Finally, a tabular comparison is presented to quantify the difference between the results obtained from (23) and those from [14, 18] and [16].

4.1 Experiment 1

For the first experiment, the network size is \(N = 10\), the length of the unknown parameter vector is \(M = 4\), and the SNR is varied among 0, 10 and 20 dB. The step-size control parameters chosen for this experiment are given in Table 7. The values differ slightly in some cases in order to maintain similar convergence speeds. The results are shown in Figs. 2, 3 and 4.

Table 7 Step-size control parameters for experiment 1
Fig. 2 Theory (17) vs. simulation MSD comparison for the KJ algorithm [14]

Fig. 3 Theory (17) vs. simulation MSD comparison for the NC algorithm [18]

Fig. 4 Theory (17) vs. simulation MSD comparison for the Sp algorithm [16]

The comparison for the KJ algorithm is shown in Fig. 2. There is a slight discrepancy due to the assumptions that have been made. However, this discrepancy is acceptable as the steady-state results match very closely. The results for the NC algorithm are shown in Fig. 3. There is a mismatch in the transient stage in this case as well, but the results again match very closely at steady-state. Finally, the comparison for the Sp algorithm is shown in Fig. 4. The mismatch during the transient stage is greater for this algorithm than for the previous two cases. However, since the step-size is assumed to be only asymptotically independent of the regressor data and there is an excellent match at steady-state, these results are acceptable. The steady-state results are tabulated in Table 8. Due to the assumption that the EMSE is small enough to be ignored at steady-state, there exists a slight mismatch between the results of (17) and (23). Overall, however, the results are closely matched and acceptable.

Table 8 Theory versus simulation comparison for steady-state MSD for the different VSS algorithms

4.2 Experiment 2

For the second experiment, the SNR is varied from 0 to 40 dB, and the steady-state results are plotted and compared with the theoretical results obtained from (23). The network size is varied between \(N = 10\) and \(N = 20\). The step-size control parameters used for this experiment are given in Table 9. The initial step-size is kept small for the NC algorithm because the initial step-size affects its steady-state value, as shown in Table 6. For the other algorithms, however, the steady-state step-size does not depend on the initial value; this has also been shown for the KJ algorithm in [14]. The results for the KJ algorithm are shown in Fig. 5. There is a close match and a steady downward trend in MSD with increasing SNR. The results for the NC algorithm are shown in Fig. 6. There is a slight mismatch, but it is not significant and can be ignored. Finally, the results for the Sp algorithm are shown in Fig. 7. There is an excellent match and a steady improvement in performance with increasing SNR.

Table 9 Step-size control parameters for experiment 2
Fig. 5 Theory (23) vs. simulation MSD comparison for the KJ algorithm [14] for different network sizes and varying SNR

Fig. 6 Theory (23) vs. simulation MSD comparison for the NC algorithm [18] for different network sizes and varying SNR

Fig. 7 Theory (23) vs. simulation MSD comparison for the Sp algorithm [16] for different network sizes and varying SNR

4.3 Experiment 3

In this final experiment, the theoretical steady-state MSD results from [14, 18] and [16] are compared with the results obtained from (23). The analysis performed in [14, 18] and [16] gives exact expressions for \(\mu _{k,ss}\) as well as \(\mu ^2_{k,ss}\) for evaluating the steady-state MSD, whereas the present work proposes a generic expression. This experiment shows that the results from the exact analyses and those obtained from (23) match closely. The unknown vector length is varied among \(M=2\), \(M=4\) and \(M=6\), and the SNR is varied among 0, 10 and 20 dB. The step-size control parameters are given in Table 10 and the results are shown in Table 11. As can be seen, there are slight mismatches among certain values. However, the values are very closely matched for all cases. Thus, the proposed analysis procedure has been verified as generic and applicable to different VSS strategies in WSNs.

5 Conclusion

Parameter estimation is an important aspect of WSNs, which form an integral part of the IoT paradigm. This work presents a generalized approach for the theoretical analysis of LMS-based VSS algorithms employed for estimation in diffusion-based WSNs. The analysis provided here can serve as an excellent tool for analyzing any VSS approach applied to the WSN framework in the future. Various existing algorithms have been tested thoroughly to verify the results of the presented work under different conditions. Simulation results confirm the generic behavior of the proposed analysis for both the transient and steady-state regimes. Furthermore, a comparison has been performed between the proposed analysis results and results obtained from analyses that already exist in the literature. The results were found to be closely matched, further strengthening our claim that the proposed analysis is generic for VSS strategies applied to WSNs.

Table 10 Step-size control parameters for experiment 3
Table 11 Comparison between proposed and existing analyses