13.1 Introduction

13.1.1 Spectral Unmixing

Hyperspectral remote sensing sensors collect spectral information from the Earth’s surface using hundreds of narrow and contiguous wavelength bands [1]. Hyperspectral imaging has been widely applied in various fields, such as target detection, material mapping, and material identification [2]. However, due to insufficient spatial resolution and spatial complexity, pixels in remotely sensed hyperspectral images are likely to be formed by a mixture of pure spectral constituents (endmembers) rather than a single substance [3]. The existence of mixed pixels complicates the exploitation of hyperspectral images [4]. Spectral unmixing, aimed at estimating the fractional abundances of the pure spectral signatures or endmembers, was proposed to deal with the problem of spectral mixing, and it effectively identifies the components of the mixed spectra in each pixel [5].

Unmixing algorithms rely on specific mixing models, which can be characterized as either linear or nonlinear [5, 6]. On the one hand, the linear model assumes that the spectral response of a pixel is given by a linear combination of the endmembers present in the pixel. On the other hand, the nonlinear mixture model assumes that the incident radiation interacts with more than one component and is affected by multiple scattering effects [3, 7]. As a result, nonlinear unmixing generally requires prior knowledge about object geometry and the physical properties of the observed objects [8]. The linear mixture model exhibits practical advantages, such as ease of implementation and flexibility in different applications. In this chapter, we will focus exclusively on the linear mixture model.

Under the linear mixture model, a group of unmixing approaches has been proposed [9,10,11,12,13]. Depending on whether a spectral library is available or not, we classify these methods into two categories, i.e., unsupervised and semi-supervised unmixing algorithms. With the wide availability of spectral libraries, sparse unmixing [8], a semi-supervised approach in which mixed pixels are expressed as combinations of a number of pure spectral signatures from a large spectral library, sidesteps two drawbacks of unsupervised endmember extraction, namely the generation of virtual endmembers with no physical meaning and the possible absence of pure pixels in the data.

13.1.2 Sparse Unmixing

The sparse unmixing approach exhibits significant advantages over unsupervised approaches, as it needs neither to extract endmembers from the hyperspectral data nor to estimate their number. Another advantage of sparse unmixing is that it offers great potential for accurate estimation of the fractional abundances, as all endmembers are normally represented in the library. However, these algorithms fully rely on the availability of a library in advance; hence, they are semi-supervised in nature.

The success of sparse unmixing relies on the fact that the unmixing solution is sparse, as the number of endmembers used to represent a mixed pixel is generally much smaller than the number of spectral signatures in the library [8]. As a result, new algorithms have been developed to enforce sparsity on the solution. The sparse unmixing algorithm via variable splitting and augmented Lagrangian (SUnSAL) [8] adopts the \(\ell _{1}\) regularizer on the abundance matrix, which introduces sparsity in the spectral domain, reflecting the fact that a pixel is unlikely to be a mixture of a high number of components. The introduction of SUnSAL brought new insights into the concept of sparse unmixing. However, the real degree of sparsity is beyond the reach of the \(\ell _{1}\) regularizer, due to the imbalance between the number of endmembers in the library and the number of components that generally participate in a mixed pixel. New algorithms have been developed in order to better characterize the degree of sparsity. Some techniques have introduced new norms for the sparse regularizer, such as the collaborative SUnSAL (CLSUnSAL) algorithm [14] and the graph-regularized \(\ell _{1/2}\)-NMF (GLNMF) method [15]. Other algorithms have introduced weighting factors to penalize the nonzero coefficients of the sparse solution [16], such as the reweighted sparse unmixing method [17] and the double reweighted sparse unmixing (DRSU) algorithm [18]. Although these methods obtained promising results, they treat the pixels of a hyperspectral image as independent entities, and the spatial–contextual information in the image is generally disregarded. Since hyperspectral images naturally follow specific spatial arrangements, it is important to consider spatial information for their characterization [19].

Following this observation, several algorithms have focused on incorporating spatial correlation into the final solution. For instance, sparse unmixing via variable splitting augmented Lagrangian and total variation (SUnSAL-TV) [20] represents one of the first attempts to include spatial information in sparse unmixing. It exploits the spatial information via a first-order pixel neighborhood system. Like SUnSAL, SUnSAL-TV opened new avenues and brought new insights into the concept of spatial sparse unmixing, as it is able to promote piece-wise smooth transitions in the estimated abundances. However, its performance strongly relies on the parameter settings [21]. At the same time, its model complexity results in a heavy computational cost, further limiting its practical applicability. New developments aimed at fully exploiting the spatial correlation among image features (and further imposing sparsity on the abundance matrix) have followed two main directions. The first introduces high-order neighborhood information through spatial regularizers. For instance, the nonlocal sparse unmixing (NLSU) algorithm [22] can take advantage of high-order structural information. However, the neighborhood of a pixel changes randomly, thus limiting the continuity of spectral information. Another drawback of NLSU is that its model is more complex than that of SUnSAL-TV, which limits its practical application. The second direction uses spatially weighted factors (aimed at characterizing spatial information through the inclusion of a weight in the sparse regularizer) to account for the spatial information in sparse unmixing. For example, the local collaborative sparse unmixing (LCSU) method uses a spatial weight to impose local collaborativity, thus addressing some of the issues observed in SUnSAL-TV (including oversmoothed boundaries and blurred abundance maps) [23]. With complexity similar to that of SUnSAL-TV, LCSU exhibits comparable unmixing performance, which indicates that spatial weights (as compared to spatial regularizers) have good potential in terms of both unmixing performance and computational complexity. In [24], the spectral–spatial weighted sparse unmixing (S\(^{2}\)WSU) method was proposed, which simultaneously exploits the spectral and spatial information contained in hyperspectral images via weighting factors, aiming at enhancing the sparsity of the solution. As a framework with an open structure, the S\(^{2}\)WSU algorithm can accept multiple types of spectral and spatial weighting factors, thus providing great flexibility for the exploration of different spatial scenarios, such as edge information, nonlocal similarity, homogeneous neighborhood information, etc.

13.1.3 Deep Learning for Spectral Unmixing

With advances in computer technology, learning-based approaches for unmixing have developed rapidly in the past few years. Joint Bayesian unmixing is a typical example of a learning-based approach, which leads to good abundance estimates due to the incorporation of full additivity (i.e., sum-to-one) and nonnegativity constraints [25,26,27]. Approaches based on artificial neural networks (ANNs) have also been developed for learning abundance fractions, assuming prior knowledge of the endmember signatures [28,29,30]. These approaches exhibit better performance when compared with handcrafted methods, but they assume that endmembers are known in advance and, therefore, need to incorporate endmember extraction algorithms to perform unmixing. More recently, auto-encoders, a common tool in deep learning, have developed rapidly in unmixing applications. The nonnegative sparse auto-encoder (NNSAE) and the denoising auto-encoder were employed to obtain the endmember signatures and abundance fractions simultaneously, with advanced denoising and intrinsic self-adaptation capabilities [31,32,33]. However, their strength lies in noise reduction, and they exhibit limitations when dealing with outliers. Since outliers are likely to cause initialization problems, their presence can strongly interfere with the unmixing solutions. In [34], a stacked nonnegative sparse auto-encoder (SNSA) was proposed to address the issue of outliers. For linear mixing model (LMM)-based hyperspectral unmixing, the physical meaning of the model implies that the abundance fractions sum to one when every material in a pixel can be identified [3, 33, 35]. However, similar to the NMF-based approaches, SNSA adopts an additivity penalty on the abundance coefficients, i.e., a penalty coefficient is used to control the approximation to sum-to-one. As this is not a hard constraint, the sum-to-one constraint is not necessarily satisfied [34].

In [36], the fully unsupervised deep auto-encoder network (DAEN) unmixing method was recently proposed to address the presence of outliers in hyperspectral data. The DAEN has two main steps. In the first step, spectral features are learned by stacked auto-encoders (SAEs), aiming at generating good initializations for the network. In the second step, a variational auto-encoder (VAE) performs the unmixing, estimating the endmembers and abundances. VAEs combine variational inference for unsupervised learning with the auto-encoder architecture, which can be trained with gradient descent [37]. Different from conventional auto-encoders, VAEs include a reparameterization which strictly ensures the abundance sum-to-one constraint during unmixing. Compared with other NMF-based algorithms, the DAEN has three main advantages: (1) with the use of SAEs, it can effectively tackle the problem of outliers and generate a good initialization of the unmixing network; (2) with the adoption of a VAE, it can ensure the nonnegativity and sum-to-one constraints, resulting in good performance on abundance estimation; and (3) the endmember signatures and abundance fractions are obtained simultaneously. We emphasize the fully unsupervised nature of DAEN as one of its most powerful features.

13.1.4 Contributions of This Chapter

In this chapter, we focus on two types of techniques that are currently at the forefront of spectral unmixing. First, we provide an overview of advances in sparse unmixing algorithms, which improve over traditional sparse unmixing algorithms by including spatial–contextual information that is crucial for better scene interpretation. As these algorithms are semi-supervised and dependent on a library, we then describe new developments in the use of deep learning to perform spectral unmixing in a fully unsupervised fashion, focusing on the DAEN method. Our experiments with simulated and real hyperspectral datasets demonstrate the competitive advantages of these innovative approaches over well-established unmixing methods.

The remainder of this chapter is organized as follows. The principles of sparse unmixing theory are presented in Sect. 13.2. The DAEN unmixing method is described in detail in Sect. 13.3. Section 13.4 describes several experiments to evaluate sparse unmixing algorithms. Section 13.5 describes several experiments to evaluate the DAEN algorithm. Finally, Sect. 13.6 concludes with some remarks and hints at plausible future research lines.

13.2 Sparse Unmixing Techniques

13.2.1 Sparse Versus Spectral Unmixing

The linear mixture model assumes that the spectral response of a pixel in any given spectral band is a linear combination of all of the endmembers present in the pixel at the respective spectral band. For each pixel, the linear model can be written as follows:

$$\begin{aligned} \begin{array}{ccl} \mathbf{y} &{} = &{} \mathbf{M} {\varvec{\alpha }} + \mathbf{n}\\ \text{ s.t.: } &{} &{} {\varvec{\alpha }}_{j} \ge 0,\;\;\sum \limits _{j=1}^q{\varvec{\alpha }}_{j}=1, \end{array} \end{aligned}$$
(13.1)

where \(\mathbf{y}\) is a \({d\times 1}\) column vector (the measured spectrum of the pixel), with d denoting the number of bands, \(\mathbf{M}\) is a \({d \times q}\) matrix containing q pure spectral signatures (endmembers), \({\varvec{\alpha }}\) is a \({q \times 1}\) vector containing the fractional abundances of the endmembers, and \(\mathbf{n}\) is a \({d \times 1}\) vector collecting the errors affecting the measurements at each spectral band. The two constraints in (13.1) are the so-called abundance nonnegativity constraint (ANC), \({\varvec{\alpha }}_{j}\ge 0\) for \(j=1,2,\ldots ,q\), and the abundance sum-to-one constraint (ASC), \(\sum _{j=1}^q{\varvec{\alpha }}_{j}=1\).

Sparse unmixing reformulates (13.1) assuming the availability of a library of spectral signatures a priori as follows:

$$\begin{aligned} \mathbf{y} = \mathbf{A}{} \mathbf{h}+ \mathbf{n}, \end{aligned}$$
(13.2)

where \(\mathbf{h}\in \mathbb {R}^{m\times 1}\) is the fractional abundance vector compatible with spectral library \(\mathbf{A} \in \mathbb {R}^{d\times m}\) and m is the number of spectral signatures in \(\mathbf{A}\).
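To make the notation in (13.2) concrete, the following minimal NumPy sketch simulates a mixed pixel from a random stand-in library; the band count, library size, number of active endmembers, and noise level are illustrative assumptions rather than values prescribed by the model.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m = 224, 240                       # bands and library size (illustrative)
A = rng.uniform(0.0, 1.0, (d, m))     # stand-in for a real spectral library

# Sparse abundance vector: only 3 of the m library signatures are active.
h = np.zeros(m)
active = rng.choice(m, size=3, replace=False)
h[active] = rng.dirichlet(np.ones(3))          # nonnegative, sums to one

# Observed pixel, Eq. (13.2): linear mixture plus measurement noise.
y = A @ h + rng.normal(0.0, 1e-3, d)

print(np.count_nonzero(h), h.sum())            # -> 3 1.0
```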

Assuming that the dataset contains n pixels, organized in the matrix \(\mathbf{Y}=[\mathbf{y}_1,\dots ,\mathbf{y}_n]\in \mathbb {R}^{d \times n}\), we may then write

$$\begin{aligned} \mathbf{Y} = \mathbf{A}{} \mathbf{H}+ \mathbf{N}~~\text {s.t.:}\ \ \ \ ~\mathbf{H}\ge 0, \end{aligned}$$
(13.3)

where \(\mathbf{N}=[\mathbf{n}_1,\dots ,\mathbf{n}_n]\in \mathbb {R}^{d\times n}\) is the error. \(\mathbf{H} = [\mathbf{h}_1,\dots ,\mathbf{h}_n]\in \mathbb {R}^{m\times n}\) denotes the abundance maps corresponding to library \(\mathbf{A}\) for the observed data \(\mathbf{Y}\), and \(\mathbf{H}\ge 0\) is the so-called abundance nonnegativity constraint (ANC). It should be noted that we explicitly enforce the ANC constraint without the abundance sum-to-one constraint (ASC), due to some criticisms about the ASC in the literature [8].

As the number of endmembers involved in a mixed pixel is usually very small when compared with the size of the spectral library, the abundance matrix \(\mathbf{H}\) is sparse. With these considerations in mind, the unmixing problem can be formulated as an \(\ell _2-\ell _0\) optimization problem,

$$\begin{aligned} \min _{\mathbf{H}} \,\,\,\,\,\, \frac{1}{2}||\mathbf{A}{} \mathbf{H}-\mathbf{Y}||^2_F+\lambda ||\mathbf{H}||_0~~\,\text {s.t.:}\ \ \ \ ~\mathbf{H}\ge 0, \end{aligned}$$
(13.4)

where \(\Vert \cdot \Vert _F\) is the Frobenius norm and \(\lambda \) is a regularization parameter. Problem (13.4) is nonconvex and difficult to solve [38, 39]. The SUnSAL alternatively uses the \(\ell _2-\ell _1\) norm to replace the \(\ell _2-\ell _0\) norm and solves the unmixing problem as follows [40]:

$$\begin{aligned} \min _{\mathbf{H}} \,\,\,\,\,\, \frac{1}{2}||\mathbf{A}{} \mathbf{H}-\mathbf{Y}||^2_F+\lambda ||\mathbf{H}||_{1,1}~~\,\text {s.t.:}\ \ \ \ ~\mathbf{H}\ge 0, \end{aligned}$$
(13.5)

where \(||\mathbf{H}||_{1,1}=\sum ^n_{i=1}||\mathbf{h}_{i}||_{1}\) with \(\mathbf{h}_{i}\) (\(i=1,\ldots ,n\)) being the ith column of \(\mathbf{H}\). SUnSAL solves the optimization problem in (13.5) efficiently using the ADMM [40]. However, as stated before, the real degree of sparsity is generally beyond the reach of the \(\ell _{1}\) regularizer.
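For readers who prefer code to equations, the following sketch solves (13.5) with projected ISTA (proximal gradient with a nonnegativity clip). This is a didactic stand-in, not the ADMM solver actually used by SUnSAL [40], but it targets the same objective.

```python
import numpy as np

def sparse_unmix_ista(A, Y, lam=1e-3, n_iter=1000):
    """Solve (13.5): min_H 0.5*||AH - Y||_F^2 + lam*||H||_{1,1}, H >= 0,
    via projected ISTA; a didactic stand-in for SUnSAL's ADMM [40]."""
    H = np.zeros((A.shape[1], Y.shape[1]))
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L with L = ||A||_2^2
    for _ in range(n_iter):
        grad = A.T @ (A @ H - Y)                # gradient of the data term
        # prox of lam*||.||_1 combined with H >= 0: shifted clipping
        H = np.maximum(H - step * grad - step * lam, 0.0)
    return H
```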

13.2.2 Collaborative Regularization

Similar to (13.5), in [14], an \(\ell _{2, 1}\) mixed norm (called collaborative regularization) was proposed, which globally imposes sparsity among the endmembers in collaborative fashion for all pixels. According to the collaborative sparse unmixing model described in [14], the objective function can be defined as follows:

$$\begin{aligned} \min _{\mathbf{H}} \frac{1}{2}||\mathbf{A H}-\mathbf{Y}||^2_F+\lambda \sum _{k=1}^m||\mathbf{h}^k||_2~~\text {s.t.}~\mathbf{H}\ge 0, \end{aligned}$$
(13.6)

where \(\mathbf{h}^k\) denotes the kth row of \(\mathbf{H}\) (\(k=1,2,\ldots ,m\)) and \(\sum _{k=1}^m||{\mathbf{h}^k}||_2\) is the so-called \(\ell _{2, 1}\) mixed norm. Note that the main difference between SUnSAL and CLSUnSAL is that the former employs pixel-wise independent regressions, while the latter enforces joint sparsity among all the pixels.
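The collaborative term in (13.6) is separable across the rows of \(\mathbf{H}\), and its proximal operator is a row-wise group shrinkage. The sketch below implements just this prox step, the key ingredient that an ADMM solver such as CLSUnSAL applies internally (the full algorithm is given in [14]):

```python
import numpy as np

def prox_l21_rows(H, t):
    """Proximal operator of t * sum_k ||h^k||_2 (the l_{2,1} norm of (13.6)):
    group soft-thresholding that shrinks entire rows of H, zeroing out
    endmembers jointly for all pixels."""
    norms = np.linalg.norm(H, axis=1, keepdims=True)    # row norms ||h^k||_2
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return H * scale
```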

13.2.3 Total Variation Regularization

In order to take into account the spatial information of the image, a total variation (TV) regularizer can be integrated with SUnSAL (called SUnSAL-TV) to promote spatial homogeneity among neighboring pixels [20]:

$$\begin{aligned}&~&\min _{\mathbf{H}}\ \ \ \ \ \ \frac{1}{2}||\mathbf{A H}-\mathbf{Y}||^2_F+\lambda ||\mathbf{H}||_{1,1}+\lambda _{TV}TV{(\mathbf{H})} \nonumber \\&~&\text {s.t.:}\ \ \ \ ~ \ \ \mathbf{H}\ge 0, \end{aligned}$$
(13.7)

where \(TV{{(\mathbf{H})}\equiv \sum _{\{k,i\}\in \mathscr {N}}||\mathbf{h}_k-\mathbf{h}_i||_1}\), \(\mathscr {N}\) represents the set of (horizontal and vertical) pixel neighbors in the image, and \(\mathbf{h}_k\) and \(\mathbf{h}_i\) denote the abundance vectors of neighboring pixels in the abundance matrix \(\mathbf{H}\). SUnSAL-TV shows great potential to exploit the spatial information for sparse unmixing. However, it may lead to oversmoothing and blurred boundaries.
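For intuition on what the TV term penalizes, the following sketch evaluates the anisotropic TV of (13.7) for an abundance matrix whose pixels form a rows × cols grid; the reshaping convention is an assumption of this sketch.

```python
import numpy as np

def abundance_tv(H, rows, cols):
    """Anisotropic TV term of (13.7): sum of |h_k - h_i| over horizontal and
    vertical neighbor pairs. H is m x n with n = rows * cols pixels."""
    cube = H.reshape(H.shape[0], rows, cols)  # one abundance map per endmember
    tv_vertical = np.abs(np.diff(cube, axis=1)).sum()
    tv_horizontal = np.abs(np.diff(cube, axis=2)).sum()
    return tv_vertical + tv_horizontal
```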

13.2.4 Local Collaborative Regularization

In [23], the LCSU method assumes that endmembers tend to appear localized in spatially homogeneous areas rather than distributed over the full image. The approach can also preserve global collaborativity (e.g., when an endmember appears throughout the whole image), since it generalizes to global collaborativity through local searching:

$$\begin{aligned} \min _{\mathbf{H}} \,\, \frac{1}{2}||\mathbf{A}\mathbf{H}-\mathbf{Y}||^2_F+\lambda \sum _{i=1}^{n}\sum _{k=1}^m||\mathbf{h}^k_{\mathscr {N}(i)}||_2~~\,\text {s.t.:}\ \ \ \ ~\mathbf{H}\ge 0, \end{aligned}$$
(13.8)

where \(\mathbf{h}^k\) denotes the kth row of matrix \(\mathbf{H}\) (\(k=1,2,\ldots ,m\)), \(\mathbf{h}^k_{\mathscr {N}(i)}\) restricts that row to the neighborhood \(\mathscr {N}(i)\) of pixel i (\(i=1,2,\ldots ,n\)), and \(\lambda \) is a regularization parameter controlling the degree of sparseness. The main difference between LCSU and SUnSAL-TV is that LCSU imposes collaborative sparsity among neighboring pixels, while SUnSAL-TV aims at promoting piece-wise smooth transitions in the abundance estimates. In other words, SUnSAL-TV enforces that neighboring pixels share similar fractional abundances for the same endmember, while LCSU focuses on imposing local collaborativity among the full set of endmembers, thus addressing problems observed in SUnSAL-TV such as oversmoothed or blurred abundance maps. The main difference between problem (13.8) and problem (13.6) is that LCSU introduces spatial information to promote local collaborativity, while CLSUnSAL focuses on global collaborativity. In comparison with CLSUnSAL, LCSU assumes that neighboring pixels share the same support. This is more realistic, as a given endmember is likely to appear localized in a spatially homogeneous region rather than across the whole image.

13.2.5 Double Reweighted Regularization

Inspired by the success of weighted \(\ell _1\) minimization in sparse signal recovery, the double reweighted sparse unmixing and total variation (DRSU-TV) [41] was proposed to simultaneously exploit the spectral dual sparsity as well as the spatial smoothness of fractional abundances, as follows:

$$\begin{aligned} \begin{aligned} \min _{\mathbf{H}} \,\,\,\,\,\, \frac{1}{2}||\mathbf{A}{} \mathbf{H}-\mathbf{Y}||^2_F+ \lambda ||(\mathbf{W}_\mathrm{spe2}{} \mathbf{W}_\mathrm{spe1}) \odot \mathbf{H}||_{1,1}+ \lambda _\mathrm{TV}\mathrm{TV}{(\mathbf{H})}, ~~\,\text {s.t.:}\ \ \ \ ~\mathbf{H}\ge 0, \end{aligned} \end{aligned}$$
(13.9)

where the operator \(\odot \) denotes element-wise multiplication. The first regularizer, \(\lambda ||(\mathbf{W}_\mathrm{spe2}{} \mathbf{W}_\mathrm{spe1})\odot \mathbf{H}||_{1,1}\), introduces a prior with spectral sparsity, where \(\lambda \) is the regularization parameter and \(\mathbf{W}_\mathrm{spe1}=\{w_{\mathrm{{spe1}},{ki}} | k=1,\ldots ,m,\ i=1,\ldots ,n\}\in \mathbb {R}^{m\times n}\) and \(\mathbf{W}_\mathrm{spe2}=\text {diag}(w_{\mathrm{spe2},11}, \ldots , w_{\mathrm{spe2},{kk}},\ldots ,w_{\mathrm{spe2},{mm}})\in \mathbb {R}^{m\times m}\) are the dual weights, with \(\mathbf{W}_\mathrm{spe1}\) being the original weight introduced in [16], aimed at penalizing the nonzero coefficients of the solution, and \(\mathbf{W}_\mathrm{spe2}\) promoting nonzero row vectors. The second regularizer, \(\lambda _\mathrm{TV}\mathrm{TV}{(\mathbf{H})}\), exploits the spatial prior, with \(\lambda _\mathrm{TV}\) being the parameter controlling the degree of smoothness. Compared to DRSU, DRSU-TV thus incorporates a TV-based regularizer to enforce the spatial smoothness of the abundances.

In [41], Problem (13.9) is optimized via ADMM under an iterative scheme. The dual weights \(\mathbf{W}_\mathrm{spe1}\) and \(\mathbf{W}_\mathrm{spe2}\) are updated as follows, at iteration \(t+1\):

$$\begin{aligned} {w}_{\mathrm{spe1},\mathrm{ki}}^{(t+1)}=\frac{1}{{h}_{ki}^{(t)}+\varepsilon }, \end{aligned}$$
(13.10)

where \(\varepsilon >0\) is a small positive value and

$$\begin{aligned} {w}^{(t+1)}_{\mathrm{spe2},\mathrm{kk}}= \frac{1}{{||\mathbf{H}^{(t)}(k,:)||_2}+\varepsilon }, \end{aligned}$$
(13.11)

where \(\mathbf{H}^{(t)}(k,:)\) is the kth row of the abundance matrix estimated at the tth iteration. Notice that, as shown in (13.10) and (13.11), large weights are used to discourage nonzero entries in the recovered signal, while small weights encourage nonzero entries. By exploiting the spectral and spatial priors simultaneously under the sparse unmixing model, DRSU-TV exhibits good potential in comparison with \(\ell _1\)- or TV-based methods. However, as an adaptation of the \(\ell _1\)- and TV-based approach, the limitations of DRSU-TV are associated with the use of a regularizer-based spatial prior: its computational complexity is similar to that of SUnSAL-TV, which constrains its practical application. Furthermore, the unmixing performance of the method is sensitive to the regularization parameter \(\lambda _\mathrm{TV}\).
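The weight updates (13.10) and (13.11) translate directly into a few lines of NumPy; a minimal sketch (with \(\varepsilon \) as the small constant from the text):

```python
import numpy as np

def update_dual_weights(H, eps=1e-6):
    """Weight updates of (13.10)-(13.11): entries/rows with small abundances
    receive large weights (a stronger penalty), and vice versa."""
    W_spe1 = 1.0 / (H + eps)                      # element-wise weight (13.10)
    row_norms = np.linalg.norm(H, axis=1)         # ||H(k, :)||_2 per row
    W_spe2 = np.diag(1.0 / (row_norms + eps))     # diagonal weight (13.11)
    return W_spe1, W_spe2
```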

13.2.6 Spectral–Spatial Weighted Regularization

In [24], the S\(^{2}\)WSU algorithm was developed, which aims at exploiting the spatial information more efficiently for sparse unmixing purposes. As opposed to approaches that exploit a regularizer-based spatial prior (which have one additional parameter for the spatial regularizer and often exhibit high complexity), the S\(^{2}\)WSU algorithm includes the spatial correlation via a weighting factor, resulting in good computational efficiency and fewer regularization parameters. Let \(\mathbf{W}_\mathrm{spe}\in \mathbb {R}^{m\times m}\) be the spectral weighting matrix and \(\mathbf{W}_\mathrm{spa}\in \mathbb {R}^{m\times n}\) be the spatial one. Following [16], the objective function of the S\(^{2}\)WSU is given as follows:

$$\begin{aligned} \min _{\mathbf{H}} \,\, \frac{1}{2}||\mathbf{A}{} \mathbf{H}-\mathbf{Y}||^2_F+\lambda ||(\mathbf{W}_\mathrm{spe}{} \mathbf{W}_\mathrm{spa})\odot \mathbf{H}||_{1,1}, ~~\,\text {s.t.:}\ ~\mathbf{H}\ge 0. \end{aligned}$$
(13.12)

For the spectral weighting factor \(\mathbf{W}_\mathrm{spe}\), building on the success of [14, 18], row collaborativity is adopted to enforce joint sparsity among all the pixels. Similar to \(\mathbf{W}_\mathrm{spe2}\) in DRSU-TV, \(\mathbf{W}_\mathrm{spe}\) aims at enhancing the sparsity of the endmembers in the spectral library. In detail, at iteration \(t+1\), it can be updated as

$$\begin{aligned} {w}_{\mathrm{spe},{kk}}^{(t+1)}=\frac{1}{||\mathbf{H}^{(t)}(k,:)||_2+\varepsilon }. \end{aligned}$$
(13.13)

For the spatial weighting factor \(\mathbf{W}_\mathrm{spa}\), let \({w}_{\mathrm{spa},{ki}}^{(t+1)}\) be the element in the kth row and ith column of \(\mathbf{W}_{\mathrm{spa}}\) at iteration \(t+1\); the neighboring information is incorporated as follows:

$$\begin{aligned} {w}_{\mathrm{spa},{ki}}^{(t+1)}=\frac{1}{f_{x\in \mathscr {N}(i)}({h}_{kx}^{(t)})+\varepsilon }, \end{aligned}$$
(13.14)

where \(\mathscr {N}(i)\) denotes the neighboring set for element \({h}_{ki}\), and \(f(\cdot )\) is a function explicitly exploiting the spatial correlations through the neighborhood system. It uses the neighboring coverage and importance to incorporate the spatial correlation as follows:

$$\begin{aligned} f({h}_{ki}) =\frac{\sum _{x\in \mathscr {N}(i)}{\theta }_{kx} h_{kx}}{\sum _{x\in \mathscr {N}(i)}{\theta }_{kx}}, \ \ \end{aligned}$$
(13.15)

where \(\mathscr {N}(i)\) corresponds to the neighboring coverage and \(\theta \) represents the neighborhood importance. The 8-connected neighborhood (a \(3\times 3\) window) is considered for algorithm design and experiments. With respect to the neighboring importance, for any two entries k and i, it is computed as follows:

$$\begin{aligned} {\theta }_{ki}=\frac{1}{im(k,i)}, \end{aligned}$$
(13.16)

where the function \(\mathrm{im}(\cdot )\) is the importance measure over the two elements \(h_k\) and \(h_i\). Let (a, b) and (c, d) be the spatial coordinates of \({h}_k\) and \({h}_i\). The Euclidean distance is specifically considered, that is, \({\theta }_{ki}={1}/\sqrt{(a-c)^2+(b-d)^2}\).

It should be noted that the optimization problem of S\(^{2}\)WSU can be solved iteratively by an outer–inner looping scheme, where the inner loop updates the unmixing coefficients via ADMM and the outer loop updates the spectral and spatial weights [24].
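The following sketch illustrates the spatial weight computation of (13.14)–(13.16) with the 8-connected window and inverse Euclidean distance importances; the explicit loops are deliberately naive to keep the correspondence with the equations visible.

```python
import numpy as np

# Importances (13.16) for the 8-connected window: inverse Euclidean distance,
# i.e., 1 for side neighbors and 1/sqrt(2) for diagonal neighbors.
OFFSETS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
           if (di, dj) != (0, 0)]
THETA = {off: 1.0 / np.hypot(*off) for off in OFFSETS}

def update_spatial_weights(H, rows, cols, eps=1e-6):
    """Spatial weights of (13.14)-(13.15): the inverse of an importance-
    weighted average of each entry's 3x3 neighborhood."""
    cube = H.reshape(H.shape[0], rows, cols)
    W = np.empty_like(cube)
    for i in range(rows):
        for j in range(cols):
            num, den = 0.0, 0.0
            for (di, dj), theta in THETA.items():
                r, c = i + di, j + dj
                if 0 <= r < rows and 0 <= c < cols:
                    num = num + theta * cube[:, r, c]   # weighted sum (13.15)
                    den += theta
            W[:, i, j] = 1.0 / (num / den + eps)        # Eq. (13.14)
    return W.reshape(H.shape)
```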

13.3 Deep Learning for Hyperspectral Unmixing

As one of the very few unsupervised approaches available, the deep auto-encoder network (DAEN) unmixing method specifically addresses the presence of outliers in hyperspectral data [36]. In the following subsections, we describe the different processing modules that compose this promising approach for deep hyperspectral unmixing.

13.3.1 NMF-Based Unmixing

Let \(\mathbf{Y} \equiv [\mathbf{y}_1,\ldots ,\mathbf{y}_n]\in \mathbb {R}^{d \times n}\) be the matrix representation of a hyperspectral dataset with n spectral vectors and d spectral bands. Under the linear mixing model, we have [3, 42]

$$\begin{aligned} \mathbf{Y}&= \;\mathbf{WH} +\mathbf{N}\\ \text {s.t.:}\;\mathbf{H}&\ge 0, {\mathbf{1}^{T}_m\mathbf{H}= \mathbf{1}_n^T,}\nonumber \end{aligned}$$
(13.17)

where \(\mathbf{W}\equiv [\mathbf{w}_1,\ldots ,\mathbf{w}_m]\in \mathbb {R}^{d\times m}\) is the mixing matrix containing m endmembers, \(\mathbf{w}_i\) denotes the ith endmember, \(\mathbf{H} \ge 0\) and \(\mathbf{1}^{T}_m\mathbf{H}= \mathbf{1}_n^T\) are the so-called abundance nonnegativity and sum-to-one constraints, which stem from a physical interpretation of the abundance vectors, and \(\mathbf{1}_m= [1, 1, \ldots ,1]^T\) is a column vector of size m (the notation \([\cdot ]^T\) stands for vector or matrix transpose). Finally, \(\mathbf{N} \in \mathbb {R}^{d\times n}\) is the error matrix that may affect the measurement process (e.g., noise). It should be noted that the symbol naming in this section differs from that in Sect. 13.2; each symbol is therefore described in detail where it is introduced.

For a given observation \(\mathbf{Y}\), unmixing aims at obtaining the mixing matrix \(\mathbf{W}\) and the abundance matrix \(\mathbf{H}\). In this work, we tackle the simultaneous estimation of \(\mathbf{W}\) and \(\mathbf{H}\) by seeking a solution with the following NMF-based optimization:

$$\begin{aligned} (\mathbf{W}, \mathbf{H}) \ \ = \ \ \displaystyle \arg \min _{\mathbf{W}, \mathbf{H}} \;\; \frac{1}{2}\Vert \mathbf{Y}-\mathbf{W}{} \mathbf{H}\Vert ^2_F + \mu \;{f_1}(\mathbf{W}) + \lambda f_2(\mathbf{H}), \end{aligned}$$
(13.18)

where \(\Vert \cdot \Vert _F^2\) denotes the Frobenius norm, \(f_1(\mathbf{W})\) and \(f_2(\mathbf{H})\) are two regularizers on the mixing matrix \(\mathbf{W}\) and the abundance fractions \(\mathbf{H}\), respectively, with \(\mu \) and \(\lambda \) being the regularization parameters.

13.3.2 Deep Auto-Encoder Network

In this section, the DAEN unmixing method [36] is described (see Fig. 13.1), where \(\mathbf{U}\) and \(\mathbf{V}\) are the latent variables (LVs) of the reparameterization of the VAE. As shown in Fig. 13.1, the endmember matrix \(\mathbf{W}\) corresponds to the last weight matrix of the decoder in the VAE, and the abundances \(\mathbf{H}\) are estimated from the hidden layers of the VAE, while \(\widehat{\mathbf{W}}\) and \(\widehat{\mathbf{H}}\) denote the initializations for the VAE generated by the SAEs.

Fig. 13.1

The flowchart of the proposed DAEN, which includes two parts, i.e., stacked auto-encoders (SAEs) and a variational auto-encoder (VAE). The SAEs generate the initializations \(\widehat{\mathbf{W}}\) and \(\widehat{\mathbf{H}}\) for the VAE, while the VAE performs the NMF-based unmixing, yielding the endmembers \(\mathbf{W}\) and abundances \(\mathbf{H}\)

13.3.3 Stacked Auto-Encoders for Initialization

Based on the geometric assumption that endmembers are generally located around the vertices of the data simplex, we use a pure pixel-based method to extract a set of candidate pixels as the training set for the SAEs. Specifically, we adopt the vertex component analysis (VCA) algorithm [43] to obtain a set of k candidates, with \(k>m\). As VCA considers random directions in the subspace projection [3, 43], we run it p times, resulting in \(q = p\cdot k\) candidates. These q candidates are then grouped into m training sets \(\{\mathbf{C}_i\}_{i=1}^{m}\) based on the spectral angle distance (SAD) and clustering, with \(\mathbf{C}_i=[\mathbf{c}_1,\ldots , \mathbf{c}_{i_n}]\in \mathbb {R}^{d\times i_n}\) and \(i_n\) being the number of samples in \(\mathbf{C}_i\). Let \(\mathbf{c}_{i_o}\) and \(\mathbf{c}_{j_o}\) be the cluster centers of \(\mathbf{C}_i\) and \(\mathbf{C}_j\), respectively. For any candidate \(\mathbf{c}_{i_s}\) in \(\mathbf{C}_i\), with \(i_s=1,\ldots ,i_n\), we have \(\text {SAD}(\mathbf{c}_{i_o},\mathbf{c}_{i_s})\le \text {SAD}(\mathbf{c}_{j_o},\mathbf{c}_{i_s})\) for any \(j=1,\ldots ,m\), \(j\ne i\), where

$$\begin{aligned} \text {SAD}(\mathbf{c}_{i_o},\mathbf{c}_{i_s}) = \arccos \Big ( \frac{[\mathbf{c}_{i_o},{\mathbf{c}_{i_s}}]}{\Vert \mathbf{c}_{i_o} \Vert \cdot \Vert {\mathbf{c}_{i_s}} \Vert } \Big ). \end{aligned}$$
(13.19)

In this work, we empirically set \(p=30\) and \(k=3m\). By enforcing nonnegativity, the training of the SAEs minimizes the reconstruction error as follows:

$$\begin{aligned} \min \sum _{s=1}^{i_n} \Vert \mathbf{c}_{s}-{\widehat{\mathbf{w}}_i}\Vert ^2_2, \end{aligned}$$
(13.20)

where \({\widehat{\mathbf{w}}_i}\) is the reconstructed signature of the ith endmember and \(\widehat{\mathbf{W}}=[\widehat{\mathbf{w}}_1,\ldots ,\widehat{\mathbf{w}}_m]\) is the reconstructed endmember matrix. Following [44], the reconstructed signature is given by

$$\begin{aligned} {\widehat{\mathbf{w}}_i}={\mathbf{{M}}_i{{f}}({\mathbf{{M}}_i^T}{{{\mathbf{C}}_i}})}, \end{aligned}$$
(13.21)

where \(\mathbf{{M}}_i\) is the matrix of weights between the input and hidden neurons or those from hidden to output neurons, and \({ f}(\cdot )\) is the activation function [44] given by

$$\begin{aligned} {f}(\mathbf{g}_i)=\frac{1}{1+\exp (\mathbf{-a}_i.*\mathbf{g}_i-\mathbf{b}_i)}, \end{aligned}$$
(13.22)

where \(\mathbf{g}_i={\mathbf{{M}}_i^T}{\mathbf{C}}_i\), \(\mathbf{a}_i\) and \(\mathbf{b}_i\) are parameters aimed at controlling the information transmission between neurons, and \(.*\) denotes the element-wise (dot) product. Notice that the numbers of input and output neurons are the same as the number of hidden neurons, which is here set to the number of bands. We can then use a gradient rule to update \(\mathbf{a}_i\) and \(\mathbf{b}_i\) as follows:

$$\begin{aligned} \begin{aligned} {\left\{ \begin{array}{ll} &{}{{\Delta {} \mathbf{a}_i}=\gamma (1-(2+\frac{1}{\tau }){ f}_i+\frac{1}{\tau }{ f}_i^2)}, \\ &{} \\ &{}{{\Delta {} \mathbf{b}_i}=\gamma \frac{1}{\mathbf{b}_i}+\mathbf{g}_i\Delta \mathbf{a}_i}, \end{array}\right. } \end{aligned} \end{aligned}$$
(13.23)

where \(\gamma \) and \(\tau \) are hyper-parameters in the learning process controlling the mean activity level of the desired output distribution. Following the empirical settings in [44], we set \(\gamma = 0.0001\) and \(\tau = 0.2\). With the aforementioned definition in hand, the learning reduces to the following update rule:

$$\begin{aligned} \Delta \mathbf{M}_{i}\Leftarrow {\eta }\Delta \widehat{\mathbf{w}}_i{{ f}_i^T}+ \vert {\mathbf{M}_{i}}\vert , \end{aligned}$$
(13.24)

where \(\Delta \widehat{\mathbf{w}}_i\) is the gradient of candidate i for update, \(\vert {\mathbf{M}_{i}}\vert \) enforces the weight matrix to be nonnegative, and \(\eta \) is an adaptive learning rate. In this work, following [44], we set \(\eta =\hat{\eta }({\Vert {f}_i\Vert }^2+\epsilon )^{-1}\) with \(\hat{\eta }=0.002\), where \(\epsilon =0.001\) is a small parameter to ensure the positivity of \(\eta \).

Finally, let \({\widehat{\mathbf{w}}_i^t}\) and \({\widehat{\mathbf{w}}_i^{t+1}}\) be the reconstructions from the tth and (\(t+1\))th auto-encoders, respectively. The training of the SAEs ends when \(\Vert {\widehat{\mathbf{w}}_i^{t+1}}-{\widehat{\mathbf{w}}_i^{t}}\Vert ^2_2\) converges.

After the endmember matrix \(\widehat{\mathbf{W}}\) is reconstructed, based on the linear mixing model (13.17), the abundances \(\widehat{\mathbf{H}}\) can be obtained via the fully constrained least square (FCLS) [42]. In the learning of the VAE, \(\widehat{\mathbf{W}}\) and \(\widehat{\mathbf{H}}\) are used as initializations of \(\mathbf{W}\) and \(\mathbf{H}\), respectively.
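One classical way to implement the FCLS step is the augmented nonnegative least-squares trick of [42]: a row of ones scaled by a large \(\delta \) is appended to the endmember matrix so that the sum-to-one constraint is enforced (softly), while NNLS enforces nonnegativity exactly. A sketch using SciPy:

```python
import numpy as np
from scipy.optimize import nnls

def fcls(W, Y, delta=1e3):
    """FCLS via the augmented-NNLS trick of [42]: a delta-scaled row of ones
    softly enforces sum-to-one, while NNLS enforces nonnegativity exactly."""
    m = W.shape[1]
    W_aug = np.vstack([W, delta * np.ones((1, m))])
    H = np.empty((m, Y.shape[1]))
    for j in range(Y.shape[1]):
        y_aug = np.append(Y[:, j], delta)       # target sum-to-one value
        H[:, j], _ = nnls(W_aug, y_aug)
    return H
```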

13.3.4 Variational Auto-Encoders for Unmixing

First, let us recall the NMF-based objective function in (13.18), which contains two regularizers on the mixing matrix and abundance matrix, respectively. For the first regularizer \(f_1(\mathbf{W})\) on the mixing matrix, following [11], we have

$$\begin{aligned} f_1(\mathbf{W})=\text {MinVol}(\mathbf{W}), \end{aligned}$$
(13.25)

where \(\text {MinVol}(\cdot )\) is a function aiming at enclosing all the pixels into the simplex constructed by the endmembers. Specifically, following [11], we set \(\text {MinVol}(\mathbf{W})=\Vert \text {det}(\mathbf{W})\Vert \), with \(\Vert \text{ det }(\mathbf{W})\Vert \) being the volume defined by the origin and the columns of \(\mathbf{W}\).

With respect to regularizer \(f_2(\mathbf{H})\) on the abundance matrix, in order to ensure the nonnegativity and sum-to-one constraints, we employ the variational auto-encoder (VAE) to penalize the solution of \(\mathbf{H}\), denoted as

$$\begin{aligned} f_2(\mathbf{H})=\text {VAE}(\mathbf{H}), \end{aligned}$$
(13.26)

where the number of neurons in each hidden layer is set to the number of endmembers, while the numbers of inputs and outputs correspond to the number of pixels.

With these definitions in mind, we obtain the following objective function:

$$\begin{aligned} \begin{array}{ll} (\mathbf{W}, \mathbf{H}) &{}= \displaystyle \arg \min _{\mathbf{W}, \mathbf{H}} \frac{1}{2}\Vert \mathbf{Y}-\mathbf{W}{} \mathbf{H}\Vert ^2_F \\ &{}\quad +\, {\mu } \;\text {MinVol}(\mathbf{W}) + {\lambda } \text {VAE}(\mathbf{H}). \end{array} \end{aligned}$$
(13.27)

In the following, we present the VAE-based regularizer in detail. Let \(\mathbf{U}\) and \(\mathbf{V}\) be the LVs; we define \(f_2(\mathbf{H})\) as

$$\begin{aligned} {{f}_{2}}(\mathbf{H}(\mathbf{U},\mathbf{V}))=\left\| \frac{1}{2n}(\mathbf{1}_{m \times n}+\ln {\mathbf{V}^{2}}-{\mathbf{U}^{2}}-{\mathbf{V}^{2}}){\mathbf{1}_{n}} \right\| _{2}^{2}, \end{aligned}$$
(13.28)

where \(\mathbf{1}_{m \times n}\in \mathbb {R}^{m\times n}\) is the all-ones matrix, \(\mathbf{1}_{n}=[1,\ldots ,1]^{T}\in \mathbb {R}^{n}\), \(\mathbf{U}=\{ \mathbf{u}_1,\ldots ,\mathbf{u}_n\}\in \mathbb {R}^{m\times n}\), and \(\mathbf{V}=\{ \mathbf{v}_1,\ldots ,\mathbf{v}_n\}\in \mathbb {R}^{m \times n}\). The derivation of (13.28) is given in [36]. Following [37], let \(\mathbf{u}_j=[u_{1,j},\ldots ,u_{m,j}]^T\in \mathbb {R}^{m}\) and \(\mathbf{v}_j=[v_{1,j},\ldots ,v_{m,j}]^T\in \mathbb {R}^{m}\) be the reparameterization of the LVs; we then define \({h}_{i,j}=\mathrm{Cons}({u}_{i,j},{v}_{i,j})\), where \(\mathrm{Cons}(\cdot )\) is the following decay (clipping) function:

$$\begin{aligned} \mathrm{Cons}(u_{i,j},v_{i,j})={\left\{ \begin{array}{ll} u_{i,j}+{\sigma }v_{i,j}, \;0<(u_{i,j}+{\sigma }v_{i,j})<1\\ \\ 0, \;\; \mathrm{otherwise},\end{array}\right. } \end{aligned}$$
(13.29)

where \({\sigma }\) is a parameter that, as indicated in [25], can be obtained via Monte Carlo (MC) sampling. In order to meet the abundance sum-to-one constraint, we have

$$\begin{aligned} {h_{m,j}}=1-\sum _{i=1}^{m-1}h_{i,j}. \end{aligned}$$
(13.30)
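Equations (13.29) and (13.30) admit a direct transcription; the sketch below builds the abundance matrix from the latent reparameterization, with \(\sigma \) treated as a given scalar. Note that (13.30) overwrites the clipped value in the last row.

```python
import numpy as np

def abundances_from_latents(U, V, sigma):
    """Transcription of Eqs. (13.29)-(13.30): clip u + sigma*v to the open
    interval (0, 1), then set the last row so each column sums to one."""
    H = U + sigma * V
    H[(H <= 0.0) | (H >= 1.0)] = 0.0        # the decay function Cons (13.29)
    H[-1, :] = 1.0 - H[:-1, :].sum(axis=0)  # sum-to-one on the mth row (13.30)
    return H
```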

The objective function in (13.27) is a joint optimization problem over \(\mathbf{W}\) and \(\mathbf{H}\), which is nonconvex and therefore difficult to solve. In [36], an iterative scheme is proposed to optimize \(\mathbf{W}\) and \(\mathbf{H}\) alternately, both by gradient descent. The first-order derivatives of the objective function are computed as follows:

$$\begin{aligned} \begin{aligned} {\left\{ \begin{array}{ll} &{}\nabla _\mathbf{U}(\mathbf{W},\mathbf{H})=d(\mathbf{U})-\frac{2\lambda }{n}{} \mathbf{z}{{({\mathbf{1}_{n}})}^{T}}.*\mathbf{U}, \\ \\ &{}\nabla _\mathbf{V}(\mathbf{W},\mathbf{H})=d(\mathbf{V})\text {+}\frac{2\lambda }{n}{} \mathbf{z}{{({\mathbf{1}_{n}})}^{T}}.*({\ln \mathbf{V}}./\mathbf{V}-\mathbf{V}), \\ \end{array}\right. } \end{aligned} \end{aligned}$$
(13.31)

where ./ denotes element-wise division and \(\mathbf{z}=\frac{1}{2n}(\mathbf{1}_{m \times n}+\ln {\mathbf{V}^{2}}-{\mathbf{U}^{2}}-{\mathbf{V}^{2}}){\mathbf{1}_{n}}\). Here, \(d(\mathbf{U})\) and \(d(\mathbf{V})\) are the gradients of the reconstruction error, given by

$$\begin{aligned} \begin{aligned} {\left\{ \begin{array}{ll} &{}d(\mathbf{U})={{\mathbf{W}}^{T}}({\mathbf{W}}{\mathbf{H}}-\mathbf{Y})).*\mathbf{{\mathbb C}}_{cons}, \\ \\ &{}d(\mathbf{V})=\sigma {{\mathbf{W}}^{T}}({\mathbf{W}}{\mathbf{H}}-\mathbf{Y}).*\mathbf{{\mathbb C}}_{cons},\\ \end{array}\right. } \end{aligned} \end{aligned}$$
(13.32)

where \(\mathbf{\mathbb C}_{cons}\) is an indicator function, \({\mathbb C}_{cons}=\mathbf{1}_{m \times n}\{0<(\mathbf{U}+\sigma \mathbf{V})<1\}\). For more details, the derivation of (13.31) is given in [36].
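As a sanity check on the notation, the gradients (13.31)–(13.32) can be coded directly; in this sketch \(\mathbf{V}\) is assumed entry-wise positive (it plays the role of a standard deviation), and the outer product \(\mathbf{z}\mathbf{1}_n^T\) is realized by broadcasting.

```python
import numpy as np

def latent_gradients(W, H, U, V, Y, sigma, lam):
    """Eqs. (13.31)-(13.32): gradients of the objective w.r.t. U and V.
    V must be entry-wise positive, since log(V) appears in (13.31)."""
    n = U.shape[1]
    C = ((U + sigma * V > 0.0) & (U + sigma * V < 1.0)).astype(float)
    R = W.T @ (W @ H - Y)                       # shared data-fit term
    # z = (1/2n)(1 + ln V^2 - U^2 - V^2) 1_n, an m x 1 vector
    z = (1.0 + np.log(V**2) - U**2 - V**2).sum(axis=1, keepdims=True) / (2 * n)
    grad_U = R * C - (2 * lam / n) * z * U      # z*U broadcasts z 1_n^T .* U
    grad_V = sigma * R * C + (2 * lam / n) * z * (np.log(V) / V - V)
    return grad_U, grad_V

def abundance_step(W, H, U, V, Y, sigma, lam, phi):
    """Eq. (13.33): gradient-descent update direction for H."""
    gU, gV = latent_gradients(W, H, U, V, Y, sigma, lam)
    return -phi * (gU + sigma * gV)
```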

Algorithm 1: Pseudocode of the proposed DAEN

With respect to the updates of \(\mathbf{H}\) and \(\mathbf{W}\), we employ the gradient descent method for the solutions as follows:

$$\mathbf{H} \Leftarrow \mathbf{H} + \Delta \mathbf{H},$$

and

$$\mathbf{W} \Leftarrow \mathbf{W}+\Delta \mathbf{W},$$

where \(\Delta \mathbf{H}\) and \(\Delta \mathbf{W}\) are the gradients for \(\mathbf{H}\) and \(\mathbf{W}\), respectively. Specifically,

  • For \(\mathbf{H}\), we have

    $$\begin{aligned} \Delta \mathbf{H}= - {\varphi } (\nabla _\mathbf{U}(\mathbf{W},\mathbf{H})+{\sigma } \nabla _\mathbf{V}(\mathbf{W},\mathbf{H})), \end{aligned}$$
    (13.33)

    where \(\varphi \) is the learning rate, which can be estimated by the Armijo rule [45].

  • For \(\mathbf{W}\), we obtain \(\Delta \mathbf{W}\) via Adadelta [46] as follows:

    $$\begin{aligned} \Delta \mathbf{W}=-\frac{\text {RMS}{{[\Delta \mathbf{W}]}}}{\text {RMS}{{[\nabla _{\mathbf{W}}(\mathbf{W},\mathbf{H})]}}}\nabla _{\mathbf{W}}(\mathbf{W},\mathbf{H}), \end{aligned}$$
    (13.34)

    where \(\text {RMS} [\cdot ]\) is the root-mean-square [46]. The first-order derivatives of the objective function (13.27) are calculated as follows:

    $$\begin{aligned} \nabla _\mathbf{W}(\mathbf{W},\mathbf{H})=(\mathbf{WH}-\mathbf{Y}){\mathbf{H}^{T}}+{\mu } d(\text {MinVol}(\mathbf{W})), \end{aligned}$$
    (13.35)

    where \(d(\text {MinVol}(\mathbf{W}))\) is the gradient for the volume function, which can be computed as the one in [47].

Finally, the pseudocode of the proposed DAEN is given in Algorithm 1. As shown in Algorithm 1, DAEN consists of two main parts: a set of SAEs for initialization and one VAE for unmixing. Specifically, in Line 1, \(\mathbf{M}_i\) is randomly initialized. In Line 2, the hyper-parameters are set following [44], while in Line 3, the candidate samples used for training are generated via VCA. In Lines 4 and 5, \(\{\widehat{\mathbf{w}}_i\}\) and \(\{{\mathbf{M}_i}\}\) are iteratively updated until the SAE terminates. In Line 6, the abundance estimate \(\widehat{\mathbf{H}}\) is computed via FCLS. In Line 7, the LVs \(\mathbf{U}\) and \(\mathbf{V}\) are randomly initialized. Finally, in Lines 8 and 9, the endmember matrix \(\mathbf{W}\) and the abundance matrix \(\mathbf{H}\) are iteratively updated.

13.4 Experiments and Analysis: Sparse Unmixing

In this section, we illustrate the performance of the discussed sparse unmixing methods using simulated hyperspectral datasets. For quantitative analysis, the signal-to-reconstruction error (SRE, measured in dB) is used to evaluate the unmixing accuracy. For comparative purposes, the results obtained by the SUnSAL [8], SUnSAL-TV [20], LCSU [23], DRSU [18], DRSU-TV [41], and S\(^{2}\)WSU [24] algorithms are reported. Let \(\widehat{\mathbf{h}}\) be the estimated abundance and \(\mathbf{h}\) the true abundance. The SRE (dB) is computed as follows:

$$\begin{aligned} \text {SRE}(\text {dB})=10\cdot \log _{10}(E(||\mathbf{h}||_2^2)/E(||\mathbf{h}- {\widehat{\mathbf{h}}}||_2^2)), \end{aligned}$$
(13.36)

where \(E(\cdot )\) denotes the expectation. Furthermore, we use another indicator, the probability of success \(p_{s}\), which is an estimate of the probability that the relative error power is smaller than a certain threshold. It is formally defined as \(p_{s}\equiv P(\Vert \mathbf{\widehat{h}}-\mathbf{h}\Vert ^{2}/\Vert \mathbf{h}\Vert ^{2}\le threshold)\). In our case, an estimate is considered successful when \(\Vert \mathbf{\widehat{h}}-\mathbf{h}\Vert ^{2}/\Vert \mathbf{h}\Vert ^{2}\le 3.16 \) (5 dB); this threshold was demonstrated to be appropriate in [8]. The larger the SRE (dB) or the \(p_{s}\), the more accurate the unmixing results.
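Both quality measures reduce to a few lines of NumPy, with the expectation taken as a mean over pixels; a minimal sketch using the 5 dB threshold quoted above:

```python
import numpy as np

def sre_db(H_true, H_est):
    """Signal-to-reconstruction error of (13.36), in dB; the expectation is
    taken as the mean over pixels (columns)."""
    num = np.mean(np.sum(H_true ** 2, axis=0))
    den = np.mean(np.sum((H_true - H_est) ** 2, axis=0))
    return 10.0 * np.log10(num / den)

def prob_success(H_true, H_est, threshold=3.16):
    """Fraction of pixels whose relative error power is below the threshold."""
    err = np.sum((H_est - H_true) ** 2, axis=0)
    sig = np.sum(H_true ** 2, axis=0)
    return np.mean(err / sig <= threshold)
```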

13.4.1 Simulated Datasets

The spectral library used in our synthetic image experiments is a dictionary of minerals extracted from the United States Geological Survey (USGS) library. This library, denoted by \(\mathbf{A}\), contains \(m=240\) materials (different mineral types), whose spectral signatures comprise reflectance values in \(L=224\) spectral bands distributed uniformly in the interval 0.4–2.5 \(\upmu \)m. Following the work in [20], a simulated data cube is generated with \(100\times 100\) pixels and nine spectral signatures (Adularia GDS57 Orthoclase, Jarosite GDS99 K Sy 200C, Jarosite GDS101 Na Sy 200, Anorthite HS349.3B, Calcite WS272, Alunite GDS83 Na63, Howlite GDS155, Corrensite CorWa-1, and Fassaite HS118.3B), which are randomly chosen from the spectral library \(\mathbf{A}\). The fractional abundances are piece-wise smooth, i.e., they are smooth with sharp transitions; moreover, they are subject to the ANC and ASC. This design makes the dataset well suited to assessing how the different unmixing algorithms exploit spatial features. For illustrative purposes, Fig. 13.2 shows the true abundance maps of the endmembers. After generating the data cube, it was contaminated with i.i.d. Gaussian noise at three signal-to-noise ratio (SNR) levels: 30, 40, and 50 dB.

Fig. 13.2

True fractional abundances of the endmembers in the simulated data cube

Table 13.1 shows the SRE (dB) and \(p_{s}\) results achieved by the different tested algorithms under different SNR levels. For all the tested algorithms, the input parameters were carefully tuned for optimal performance. From Table 13.1, we can see that the reweighted methods (DRSU, DRSU-TV, and S\(^{2}\)WSU) obtained better SRE (dB) results than the other algorithms in all cases. Furthermore, the S\(^{2}\)WSU achieved better SRE (dB) results than its competitors in all cases, which indicates that the inclusion of a spatial factor in the sparse regularizer can further promote the spatial correlation of the solution and improve the unmixing performance. The \(p_s\) obtained by the S\(^{2}\)WSU is also much better than those obtained by the other algorithms at low SNR values, which reveals that the inclusion of spatial information leads to high robustness. Based on these results, we can conclude that the spatially weighted strategy offers the potential to improve sparse unmixing performance.

Table 13.1 SRE (dB) and \(p_{s}\) scores achieved after applying the different unmixing methods to the simulated data cube (the optimal parameters for which the reported values were achieved are indicated in parentheses)

13.4.2 Real Hyperspectral Data

In this section, we resort to the well-known Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Cuprite dataset, a common benchmark for the validation of spectral unmixing algorithms, to evaluate the considered approaches. The data are available online in reflectance units. The portion used in our experiments corresponds to a \(350\times 350\)-pixel subset of the scene, with 224 spectral bands in the range 0.4–2.5 \(\upmu \)m and a nominal spectral resolution of 10 nm. Prior to the analysis, bands 1–2, 105–115, 150–170, and 223–224 were removed due to water absorption and low SNR, leaving a total of 188 spectral bands. The spectral library used in this experiment is the same library \(\mathbf{A}\) used in our simulated experiments, with the noisy bands also removed from \(\mathbf{A}\). Figure 13.3 shows a mineral map produced in 1995 by USGS, in which the Tricorder 3.3 software product [48] was used to map different minerals present in the Cuprite mining district; the classification maps of the minerals produced by the Tricorder software are also displayed alongside the estimated abundances. The USGS map serves as a good indicator for qualitative assessment of the fractional abundance maps produced by the different unmixing algorithms. Note that the publicly available AVIRIS Cuprite data were collected in 1997, whereas the Tricorder map was produced in 1995. In addition, the true abundances of the real hyperspectral data are unavailable. Thus, we can only make a qualitative analysis of the performance of the different sparse unmixing algorithms by comparing their estimated abundances with the mineral maps.

Fig. 13.3

USGS map showing the location of different minerals in the Cuprite mining district in Nevada

Fig. 13.4

Fractional abundance maps estimated by SUnSAL, SUnSAL-TV, LCSU, DRSU, DRSU-TV, and S\(^{2}\)WSU as compared to the classification maps produced by USGS Tricorder software for the considered \(350\times 350\)-pixel subset of the AVIRIS Cuprite scene

Figure 13.4 provides a qualitative comparison between the classification maps produced by the USGS Tricorder algorithm and the fractional abundances estimated by the SUnSAL, SUnSAL-TV, LCSU, DRSU, DRSU-TV, and S\(^{2}\)WSU algorithms for three highly representative minerals in the Cuprite mining district (Alunite, Buddingtonite, and Chalcedony). In this experiment, the regularization parameters used for SUnSAL, LCSU, DRSU, and S\(^{2}\)WSU were empirically set to \(\lambda =0.001\), \(\lambda =0.001\), \(\lambda =0.0001\), and \(\lambda =0.002\), respectively, while the parameters for SUnSAL-TV and DRSU-TV were set to \(\lambda =0.001, \lambda _\mathrm{TV}=0.001\) and \(\lambda =0.002, \lambda _\mathrm{TV}=0.0001\), respectively. As shown in Fig. 13.4, all the algorithms obtained reasonable unmixing results, with high abundances for the pixels showing the presence of the considered minerals. This indicates that sparse unmixing algorithms can lead to a good interpretation of the considered hyperspectral dataset. However, some of the abundance maps estimated by SUnSAL and SUnSAL-TV (e.g., for the Buddingtonite mineral) look noisy, and the results obtained by SUnSAL-TV are oversmoothed. In addition, DRSU yields abundance maps without good spatial consistency for some minerals of interest (e.g., Chalcedony), while the abundances estimated by the S\(^{2}\)WSU are generally comparable or higher in the regions classified as the respective minerals in comparison with DRSU. Finally, the sparsity values obtained by SUnSAL, SUnSAL-TV, LCSU, DRSU, DRSU-TV, and S\(^{2}\)WSU are 0.0682, 0.0743, 0.0734, 0.0430, 0.0423, and 0.0420, respectively, which indicates that the weighted approaches use a smaller number of library elements to explain the data, thus obtaining higher sparsity. Therefore, from a qualitative viewpoint, we can conclude that the S\(^{2}\)WSU method exhibits good potential to improve the results obtained by the other algorithms in real analysis scenarios.

13.5 Experiments and Analysis: Deep Learning

In this section, the DAEN approach is applied to two real hyperspectral images, the Mangrove [49] and Samson [50] datasets, for evaluation. In these experiments, the parameters involved in the considered algorithms follow the settings used in the simulated experiments reported in [36], i.e., \(\mu = 0.1\) and \(\lambda = 0.1\).

We compare the DAEN approach presented in this work with other advanced unmixing algorithms, specifically with the N-FINDR [51], VCA [43], MVC-NMF [47], Bayesian [25], PCOMMEND [52], and SNSA [34] methods.

Three indicators, i.e., SAD, reconstruction error (RE), and root-mean-square error (RMSE) are used to measure the accuracy of the unmixing results, which are defined as follows:

$$\begin{aligned} \begin{aligned} {\left\{ \begin{array}{ll} &{} \text {SAD}(\mathbf{w}_i,{\widehat{\mathbf{w}}_i})=\arccos \Big ( \frac{[\mathbf{w}_i,{\widehat{\mathbf{w}}_i}]}{\Vert \mathbf{w}_i \Vert \cdot \Vert {\widehat{\mathbf{w}}_i} \Vert } \Big ),\\ &{} {\text {RE}(\{\mathbf{y}_j\}_{j=1}^{n},\{{\widehat{\mathbf{y}}_j}\}_{j=1}^{n}) = \frac{1}{n}} {\sum _{j=1}^{n}} {\sqrt{\Vert \mathbf{y}_j-{\widehat{\mathbf{y}}_j} \Vert _{2}^2}},\\ &{} {\text {RMSE}(\{\mathbf{h}_j\}_{j=1}^{n},\{{\widehat{\mathbf{h}}_j}\}_{j=1}^{n})= {{\frac{1}{n}}{\sum _{j=1}^{n}}}{\sqrt{\Vert \mathbf{h}_j-{\widehat{\mathbf{h}}_j} \Vert _{2}^2}}}, \end{array}\right. } \end{aligned} \end{aligned}$$
(13.37)

where \({\widehat{\mathbf{w}}_i}\) and \(\mathbf{w}_i\) denote the extracted endmember and the library spectrum, \({\widehat{\mathbf{y}}_j}\) and \(\mathbf{y}_j\) are the reconstruction and original signature of pixel j, and \({\widehat{\mathbf{h}}_j}\) and \(\mathbf{h}_j\) are the corresponding estimated and actual abundance fractions, respectively.
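The three indicators in (13.37) can be transcribed as follows, with each function operating on column-stacked spectra or abundances; a minimal sketch:

```python
import numpy as np

def sad(w, w_hat):
    """Spectral angle distance between two signatures, in radians."""
    cos = np.dot(w, w_hat) / (np.linalg.norm(w) * np.linalg.norm(w_hat))
    return np.arccos(np.clip(cos, -1.0, 1.0))   # clip guards rounding errors

def re(Y, Y_hat):
    """Mean per-pixel reconstruction error (columns are pixels)."""
    return np.mean(np.linalg.norm(Y - Y_hat, axis=0))

def rmse(H, H_hat):
    """Mean per-pixel abundance error (columns are pixels)."""
    return np.mean(np.linalg.norm(H - H_hat, axis=0))
```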

13.5.1 Mangrove Dataset

The Mangrove data is an EO-1 Hyperion (hyperspectral) image collected over Henry Island of the Sunderban Biosphere Reserve of West Bengal, India, and obtained from the USGS Earth Resources Observation and Science (EROS) Center through a data acquisition request to the satellite data provider [49]. After applying atmospheric correction, we converted the radiance data to reflectance units using the FLAASH model in the ENVI software. The endmembers (pure signatures of mangrove species) were identified through a ground survey of the study area and include Avicennia, Bruguiera, Excoecaria, and Phoenix. The Mangrove data, shown in Fig. 13.5, comprise \(137 \times 187\) pixels and 155 bands, with a spatial resolution of 30 m. For detailed information on the Mangrove data, we refer to [49].

Fig. 13.5

The \(45\times 45\) pixel subscene of the Mangrove data used in our experiment

In our experiment, a subscene of \(45\times 45\) pixels of the Mangrove data has been used to further evaluate the DAEN. Following [49], the considered subscene contains four endmembers, i.e., \(m=4\).

Table 13.2 presents the quantitative results obtained for the Mangrove data. It can be seen that the DAEN achieved very promising results for the four considered mangrove species, whereas the other competitors produced errors when detecting or estimating the endmembers. This is because, according to our observation, the Mangrove scene contains many outliers across the whole image, which poses serious difficulties for general unmixing methods. This point was verified in our experiment, in which we detected a total of 17 outliers. For illustrative purposes, Fig. 13.6 shows scatterplots of the unmixing results obtained by the considered methods, in which the detected outliers are also indicated. From Fig. 13.6, we can observe that the DAEN produced good unmixing results for this dataset, while all the other methods were affected by the outliers.

Table 13.2 SADs (in radians) and REs along with their standard deviations obtained by different methods for the Mangrove data from 10 Monte Carlo runs, where the best results are in bold
Fig. 13.6

Unmixing results for the subscene of the Mangrove data, where the data are projected onto the first two principal components (PCs)

Fig. 13.7

The estimated endmember signatures (in red), along with the ground reference signatures (in blue), and the corresponding abundance maps obtained by the DAEN. a Avicennia, b Bruguiera, c Excoecaria, d Phoenix

Finally, for illustrative purposes, the estimated endmember signatures, along with their ground references and the corresponding abundance maps obtained by the DAEN, are shown in Fig. 13.7. A good agreement between estimates and references can be observed in these figures.

In summary, our experiments with this challenging Mangrove dataset demonstrate the effectiveness of the DAEN in real scenarios with outliers, which is a common situation in practice.

13.5.2 Samson Dataset

In this experiment, we use the Samson dataset, which includes 156 bands covering wavelengths from 0.401 to 0.889 \(\upmu \)m and \(95 \times 95\) pixels, as shown in Fig. 13.8, for validation [50]. There are three endmembers in the ground truth image: Soil, Tree, and Water.

Fig. 13.8

The Samson image (a) and its corresponding ground truth (b)

Table 13.3 reports the quantitative results obtained by the considered methods. It can be observed that the DAEN obtained the best mean SAD and RMSE. For illustrative purposes, the endmember signatures and the estimated abundances are shown in Fig. 13.9. These figures reveal that the endmembers and abundances estimated by the DAEN match their counterparts in the ground truth well.

13.6 Conclusions and Future Work

Spectral unmixing provides a way to quantitatively analyze sub-pixel components in remotely sensed hyperspectral images [19]. Sparse unmixing has been widely used as a semi-supervised approach that requires the presence of a library of spectral signatures. In this context, spectral–spatial sparse unmixing methods, which aim at collaboratively exploiting spectral and spatial–contextual information, offer a powerful unmixing strategy when a complete spectral library is available a priori. If no spectral library is available in advance, we suggest the fully unsupervised deep auto-encoder network (DAEN) unmixing method as a powerful approach that can effectively deal with the presence of outliers in hyperspectral data. Our experimental results reveal that the two aforementioned techniques are currently at the forefront of spectral unmixing. Specifically, we empirically found that the S\(^{2}\)WSU algorithm consistently achieves better unmixing performance than other advanced spectral unmixing algorithms when a spectral library is available. This implies that the integration of spectral and spatial–contextual information via the considered spectral–spatial weighted strategy has great potential for improving unmixing performance. Our experiments also indicate that the fully unsupervised DAEN approach can handle problems with significant outliers more effectively than other popular spectral unmixing approaches. This is an important observation, since the presence of outliers is common in real problems, and traditional unmixing algorithms are often misguided by outliers (which can be mistaken for endmembers due to their singularity). Our future work will focus on exploring the combination of sparse unmixing and deep learning algorithms to further improve unmixing performance.

Table 13.3 SADs (in radians) and REs along with their standard deviations obtained by different methods for the Samson data from 10 Monte Carlo runs, where the best results are in bold
Fig. 13.9

Results obtained by the proposed DAEN on the Samson dataset. Top: Ground truth abundance maps on Samson data. Middle: Estimated abundance maps from the proposed DAEN. Bottom: Estimated endmember signatures (in red) along with their corresponding reference signatures (in blue). a Soil, b tree, c water