1 Introduction

Hyperspectral imaging (HSI) acquires images using hundreds of contiguous narrow electromagnetic wavelength bands, so that each spectral pixel represents the spectral signature of the materials in a scene [1]. Because every material has its own distinct spectral signature, HSI could be used to discriminate between the different materials present in an acquired scene. That is why HSI has been used in a wide variety of civilian, environmental and military applications [2, 3]. Common digital processing tasks applied to HSI data include spectral unmixing [4], anomaly detection [3], spectral classification [5] and material identification [6]. Spectral unmixing, i.e., the separation of a spectral pixel into its individual pure spectral signatures, commonly referred to as endmembers, has been an important data processing task from the earliest days of hyperspectral imaging.

Spectral unmixing is needed because hyperspectral cameras typically have low spatial resolution and image acquisition usually takes place from very large distances, e.g., in the case of satellite imaging. Therefore, an acquired spectral pixel is usually a linear mixture of the endmembers present in the imaged region corresponding to it. The proportion of each endmember in this linear mixture is called its abundance. Spectral unmixing of a group of spectral pixels involves determining the endmembers, in addition to their abundances, present in each spectral pixel.

An overview of approaches to linear spectral unmixing in HSI is given in [4]. These approaches fall into four broad categories. First, geometrical approaches, which exploit the fact that acquired spectral pixels must lie in a simplex formed by the available endmembers [7]. Second, statistical approaches, e.g., Bayesian techniques that use probabilistic data models, utilize prior distributions to impose model constraints and estimate parameters of posterior probability distributions [8, 9]. Third, spatial–spectral contextual approaches, which exploit both spatial and spectral features in HSI data by computing spatial correlations between different spectral pixels [10]. Fourth, sparse signal recovery approaches, which assume that the unknown endmembers are sparse in some dictionary, so that they could be obtained using a sparse signal recovery algorithm.

Sparse signal recovery-based spectral unmixing techniques could be implemented in a supervised fashion. This is done by assuming that each acquired spectral pixel is a linear combination of a priori known endmembers. These endmembers, i.e., pure spectral signatures, would have been obtained in a laboratory beforehand using a spectroradiometer. Supervised spectral unmixing then aims to find the optimal subset of endmembers from a very large library that can best fit each spectral pixel [11, 12].

Alternatively, sparse signal recovery-based spectral unmixing could be solved using an unsupervised approach, where a priori knowledge of the endmembers is not required. This unsupervised spectral unmixing could be viewed as a blind source separation (BSS) problem [13], where the unknown sources are assumed to be sparse signals. Two common mathematical formulations of this sparse BSS problem are sparse nonnegative matrix factorization (S-NMF) [14,15,16,17,18,19] and generalized morphological component analysis (GMCA) [20,21,22]. Both formulations are typically solved using a coordinate descent optimization approach [23], where one alternates between estimating endmembers while keeping their abundances constant (endmember estimation step), and estimating abundances while keeping endmembers constant (abundance estimation step). Both steps are repeated until convergence to the sought optimal values of both the endmembers and their abundances.

In its endmember estimation step, GMCA solves an l1-norm minimization problem, i.e., a basis pursuit problem [24], using thresholding methods [25] to obtain sparse estimates of the endmembers from the given multichannel hyperspectral data. Any basis pursuit problem requires a data-dependent regularization parameter that balances the sparsity level of the unknown variables against the error in their estimates; this parameter also sets the threshold value in the GMCA algorithm. Computationally inefficient trial and error is therefore typically needed to find a suitable value for this parameter. GMCA, however, exploits the fact that it solves the basis pursuit problem only as a step within the larger coordinate descent procedure. In its first coordinate descent iteration, GMCA solves the basis pursuit problem with a large initial value of the regularization parameter (obtained using trial and error), corresponding to endmembers with a high level of sparsity. In subsequent iterations, GMCA solves the basis pursuit problem with progressively lower values of this parameter, thereby adding detail to the estimated endmembers. However, trial and error is a computationally inefficient way to choose the initial value of the regularization parameter, and the arbitrary schedule used to lower it is not optimal and could affect both convergence and the spectral unmixing results obtained using GMCA.

Least angle regression (LARS) could solve the basis pursuit minimization problem efficiently [26], by simultaneously obtaining solutions corresponding to all relevant values of \( \lambda \), with a computational complexity comparable to solving a single unconstrained least squares problem. However, despite this clear computational advantage, LARS has not been applied to the spectral unmixing problem before. This is likely because the LARS algorithm is not directly suitable for multichannel and multicomponent HSI data, where data vectorization [27] would be necessary before its application. Such HSI data vectorization would result in extremely large matrices (\( \sim 10^{10} \) elements) and vectors that would be very challenging to store and process on a typical computer.

In this paper, we exploit the properties of Kronecker products [28] to extend the LARS algorithm to handle large multichannel and multicomponent data without the need to construct or process very large arrays. This extension makes the application of LARS to HSI spectral unmixing practical. It is also of general importance, as it could be used for practical and efficient sparse signal recovery from large multichannel data. We refer to our extended LARS method as the Kronecker LARS (K-LARS) algorithm.

This paper is arranged as follows: Sect. 2 presents a mathematical formulation of the HSI spectral unmixing problem. Section 3 describes the sparsity-based approach to solving this problem. Section 4 presents our new K-LARS algorithm. In Sect. 5, we apply K-LARS to successfully achieve unsupervised spectral unmixing of both synthetic and AVIRIS HSI data. We also compare our results to ones obtained using an earlier basis pursuit-based algorithm (GMCA). We show that the two sets of results are similar, although ours were obtained without arbitrary choices in specifying the regularization parameter.

2 Spectral unmixing problem definition

2.1 Linear mixture model (LMM)

The linear mixture model (LMM) is widely used to represent the mixing of spectra in HSI [4]. Any spectral pixel, \( \varvec{y}_{i} \), could be represented as a superposition of the pure endmembers available in the imaged scene,

$$ \varvec{y}_{i} = \mathop \sum \limits_{j = 1}^{M} \varvec{s}_{j} a_{ji} + \varvec{e}_{i} $$
(1)

where \( \varvec{s}_{j} = \left[ {s_{j1} , \ldots , s_{jl} , \ldots , s_{jL} } \right]^{T} \) is the absorption spectrum of the \( j{\text{th}} \) of the \( M \) available endmembers and \( L \) is the number of HSI wavelengths. The abundance, \( a_{ji} \ge 0 \), is the proportion of the \( j{\text{th}} \) endmember in the \( i{\text{th}} \) spectral pixel and is constrained as

$$ \mathop \sum \limits_{j = 1}^{M} a_{ji} = 1 $$
(2)

The vector \( \varvec{e}_{\varvec{i}} \) represents additive noise. Equation (1) could be written in matrix form as

$$ \varvec{Y} = \varvec{SA} + \varvec{E} $$
(3)

where \( \varvec{Y} = \left[ {\varvec{y}_{1} , \ldots , \varvec{y}_{N} } \right] \) is an \( L \times N \) HSI data matrix, \( N \) is the number of spectral pixels, \( \varvec{S} = \left[ {\varvec{s}_{1} , \ldots , \varvec{s}_{M} } \right] \) is an \( L \times M \) matrix whose columns are the available endmembers, and \( \varvec{A} \) is an \( M \times N \) matrix that contains the abundances of all endmembers in all spectral pixels. The spectral unmixing problem is to estimate the endmember matrix \( \varvec{S} \) and the abundance matrix \( \varvec{A} \) given the spectral pixel matrix \( \varvec{Y} \). It could be considered a blind source separation problem [29].
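As a concrete illustration of (1)–(3), the following minimal NumPy sketch builds a small synthetic data matrix \( \varvec{Y} \) from stand-in endmembers \( \varvec{S} \) and abundances \( \varvec{A} \) whose columns satisfy the sum-to-one constraint (2); all dimensions and spectra are hypothetical and not taken from any spectral library.

```python
import numpy as np

rng = np.random.default_rng(0)
L, M, N = 200, 3, 500                      # wavelengths, endmembers, spectral pixels (hypothetical)

S = np.abs(rng.standard_normal((L, M)))    # stand-in endmember spectra (columns of S)
A = rng.random((M, N))
A /= A.sum(axis=0, keepdims=True)          # enforce the sum-to-one constraint (2)
E = 0.01 * rng.standard_normal((L, N))     # additive noise, as in (1)

Y = S @ A + E                              # linear mixture model (3)
print(Y.shape)                             # (200, 500), i.e., L x N
```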

2.2 Sparsity-constrained multichannel data fitting

For the LMM in (3), spectral unmixing could be formulated as a matrix least squares-based multichannel data fitting problem,

$$ \mathop {\hbox{min} }\limits_{{\varvec{S},\varvec{A}}} \frac{1}{2}\left\| {\varvec{Y} - \varvec{SA}} \right\|_{2}^{2} $$
(4)

This problem is equivalent to decomposing the multichannel data matrix \( \varvec{Y} \) into a sum of rank-one matrices. Typically, such a decomposition would not be unique, so additional constraints are needed to obtain a unique solution. In sparsity-based spectral unmixing, such a constraint could be added as a penalty term on the l1-norm of \( \varvec{S} \),

$$ \mathop {\hbox{min} }\limits_{{\varvec{S},\varvec{A}}} \frac{1}{2}\left\| {\varvec{Y} - \varvec{SA}} \right\|_{2}^{2} + \lambda \left\| \varvec{S} \right\|_{1} $$
(5)

2.3 Sparse endmember representation

An endmember, \( \varvec{s}_{l} \), is considered sparse in a dictionary, \( \boldsymbol{\varPhi} = \left[ {\boldsymbol{\varphi }_{1} , \ldots , \boldsymbol{\varphi }_{T} } \right] \), if it can be represented as a superposition

$$ \varvec{s}_{l} = \boldsymbol{\varPhi \alpha }_{l} = \mathop \sum \limits_{{\varvec{i} = 1}}^{\varvec{T}} \alpha_{l} \left[ i \right]\boldsymbol{\varphi }_{i} $$
(6)

where only a small number of the coefficients \( \alpha_{l} \left[ i \right] \), compared to the dimension of \( \varvec{s}_{l} \), are nonzero. This sparsity could be quantified exactly by the l0-norm of \( \boldsymbol{\alpha}_{l} \), \( \left\| {\boldsymbol{\alpha}_{l} } \right\|_{0} \), or approximately by the l1-norm of \( \boldsymbol{\alpha}_{l} \), \( \left\| {\boldsymbol{\alpha}_{l} } \right\|_{1} \). We note that any lp-norm with p < 1 does not fulfill the triangle inequality and thus is not strictly a norm [30].

2.4 Representation to promote endmember sparsity

To facilitate solving the minimization problem (5), we could transform it into a different domain, represented by, e.g., an \( L \times L \) dictionary \( \boldsymbol{\varPhi} \), in which the transformed endmembers \( \boldsymbol{\alpha}=\boldsymbol{\varPhi}^{T} \varvec{S} \) would likely be sparser. The optimization problem (5) would then become

$$ \mathop {{\mathbf{min}} }\limits_{{\boldsymbol{\alpha}, {\mathbf{A}}}} \frac{1}{2}\left\| {\boldsymbol{\beta}- \boldsymbol{\alpha A}} \right\|_{2}^{2} + \lambda \left\|\boldsymbol{\alpha}\right\|_{1} $$
(7)

where \( \boldsymbol{\beta}=\boldsymbol{\varPhi}^{T} \varvec{Y} \) is an \( L \times N \) matrix whose columns represent the coefficients of each transformed pixel \( \varvec{y}_{i} \). After solving minimization problem (7), the original endmembers \( \varvec{S} \) could be restored as \( \varvec{S} = \boldsymbol{\tilde{\varPhi }\alpha } \), where \( \tilde{\boldsymbol{\varPhi }} \) represents the dual of the dictionary \( \boldsymbol{\varPhi} \) [31].
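The following sketch illustrates this transformation using an orthonormal DCT as a stand-in for the dictionary \( \boldsymbol{\varPhi} \) (the paper itself uses a wavelet dictionary; see Sect. 5.1). For an orthonormal dictionary the dual \( \tilde{\boldsymbol{\varPhi }} \) equals \( \boldsymbol{\varPhi} \), so the synthesis step simply inverts the analysis transform.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(1)
L, N = 256, 100
t = np.linspace(0, 1, L)
y0 = np.where(t < 0.5, 0.2 + t, 1.5 - t)          # a piecewise-smooth stand-in spectrum
Y = y0[:, None] + 0.01 * rng.standard_normal((L, N))

beta = dct(Y, norm='ortho', axis=0)               # beta = Phi^T Y (analysis step)

# approximate sparsity: energy captured by the 20 largest coefficients of each column
k = 20
mags = np.sort(np.abs(beta), axis=0)[::-1]
energy_frac = (mags[:k] ** 2).sum(axis=0) / (mags ** 2).sum(axis=0)
print(energy_frac.mean())                         # close to 1 for piecewise-smooth spectra

Y_rec = idct(beta, norm='ortho', axis=0)          # synthesis with the dual dictionary
print(np.allclose(Y, Y_rec))                      # True: orthonormal Phi, so Phi_tilde = Phi
```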

3 Sparsity-based spectral unmixing

3.1 Solving the spectral unmixing problem

The minimization problem (7) is nonconvex, but it could be solved using a coordinate descent method [32]. To this end, it is divided into two convex problems, \( \mathop {\hbox{min} }\limits_{\boldsymbol{\alpha}} \left( \cdot \right)|_{\varvec{A}} \) and \( \mathop {\hbox{min} }\limits_{\varvec{A}} \left( \cdot \right)|_{\boldsymbol{\alpha}} \), which are solved alternately until convergence to optimal values of both \( \boldsymbol{\alpha} \) and \( \varvec{A} \). The first of these two convex problems,

$$ \mathop {\hbox{min} }\limits_{\boldsymbol{\alpha}} \frac{1}{2}\left\| {\boldsymbol{\beta}- \boldsymbol{\alpha A}} \right\|_{2}^{2} + \lambda \left\|\boldsymbol{\alpha}\right\|_{1} $$
(8)

obtains an optimal value of \( \boldsymbol{\alpha} \) by solving a matrix least squares minimization problem with an l1-norm sparsity constraint on \( \boldsymbol{\alpha} \), i.e., a matrix basis pursuit problem [24]. The second convex problem

$$ \mathop {\hbox{min} }\limits_{\varvec{A}} \frac{1}{2}\left\| {\boldsymbol{\beta}- \boldsymbol{\alpha A}} \right\|_{2}^{2} $$
(9)

obtains an optimal value of \( \varvec{A} \) by solving a matrix least squares minimization problem, under the constraints given by (2) [33].
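The sketch below illustrates this alternating scheme; it is not the solver used in this paper. The \( \boldsymbol{\alpha} \)-step uses plain iterative soft thresholding (ISTA) as a stand-in for K-LARS or GMCA thresholding, and the \( \varvec{A} \)-step uses projected gradient descent with an approximate projection onto the constraints in (2).

```python
import numpy as np

def soft_threshold(X, tau):
    """Element-wise soft thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def unmix_coordinate_descent(beta, M, lam=0.1, outer_iters=20, inner_iters=50, seed=0):
    """Alternate between the alpha-step (8) and the A-step (9); a hedged sketch."""
    rng = np.random.default_rng(seed)
    L, N = beta.shape
    alpha = np.zeros((L, M))
    A = rng.random((M, N))
    A /= A.sum(axis=0, keepdims=True)

    for _ in range(outer_iters):
        # alpha-step: min_alpha 0.5*||beta - alpha A||^2 + lam*||alpha||_1, via ISTA
        step = 1.0 / np.linalg.norm(A @ A.T, 2)
        for _ in range(inner_iters):
            grad = (alpha @ A - beta) @ A.T
            alpha = soft_threshold(alpha - step * grad, lam * step)

        # A-step: min_A 0.5*||beta - alpha A||^2 with A >= 0 and columns summing to one
        step = 1.0 / max(np.linalg.norm(alpha.T @ alpha, 2), 1e-12)
        for _ in range(inner_iters):
            grad = alpha.T @ (alpha @ A - beta)
            A = np.maximum(A - step * grad, 0.0)                      # nonnegativity
            A /= np.maximum(A.sum(axis=0, keepdims=True), 1e-12)     # approximate sum-to-one projection
    return alpha, A
```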

3.2 Vectorizing of HSI multichannel data

To solve minimization problem (8), one would write the matrix linear system of equations, \( \boldsymbol{\beta}= \boldsymbol{\alpha A} \), as a vector linear system of equations

$$ \left( {\varvec{A}^{\varvec{T}} \otimes \varvec{I}_{\varvec{L}} } \right){\text{vec}}\left(\boldsymbol{\alpha}\right) = {\text{vec}}\left(\boldsymbol{\beta}\right) $$
(10)

where \( {\text{vec}}\left( \cdot \right) \) denotes the vectorization of a matrix obtained by stacking its columns on top of one another, \( \otimes \) denotes Kronecker product, and \( \varvec{I}_{L} \) is an \( L \times L \) identity matrix. Therefore, the matrix basis pursuit in (8) could be written in vectorized form as

$$ \mathop {\hbox{min} }\limits_{\alpha } \frac{1}{2}\left\| {\left( {\varvec{A}^{T} \otimes \varvec{I}_{L} } \right){\text{vec}}\left(\boldsymbol{\alpha}\right) - {\text{vec}}\left(\boldsymbol{\beta}\right)} \right\|_{2}^{2} + \lambda \left\| {{\text{vec}}\left(\boldsymbol{\alpha}\right)} \right\|_{1} $$
(11)

On the other hand, solving minimization problem (9) in its matrix form is straightforward.
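Before moving on, a quick NumPy check of the vectorization identity in (10), using tiny hypothetical dimensions, is shown below; vec(·) corresponds to column-major (Fortran-order) flattening.

```python
import numpy as np

rng = np.random.default_rng(2)
L, M, N = 6, 3, 5                            # tiny hypothetical sizes
alpha = rng.standard_normal((L, M))
A = rng.standard_normal((M, N))

lhs = np.kron(A.T, np.eye(L)) @ alpha.flatten(order='F')   # (A^T kron I_L) vec(alpha)
rhs = (alpha @ A).flatten(order='F')                       # vec(alpha A)
print(np.allclose(lhs, rhs))                               # True, as stated in (10)
```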

3.3 Choice of regularization parameter \( \lambda \) value

Different solutions for the vector basis pursuit problem (11), corresponding to different values of the regularization parameter \( \lambda \), could be obtained [26]. As shown by the Pareto curve in Fig. 1, different values of \( \boldsymbol{\lambda} \) quantify the trade-off between the sparsity level of the unknown endmembers, \( \left\| {{\text{vec}}\left(\boldsymbol{\alpha}\right)} \right\|_{1} \), and the data fitting error resulting from using their estimates, \( \frac{1}{2}\left\| {\left( {\varvec{A}^{T} \otimes \varvec{I}_{L} } \right){\text{vec}}\left(\boldsymbol{\alpha}\right) - {\text{vec}}\left(\boldsymbol{\beta}\right)} \right\|_{2}^{2} \).

Fig. 1. Typical Pareto curve for an arbitrary vector basis pursuit problem

From Fig. 1, we note that when the value of \( \lambda \) is high (low), the sparsity of the endmembers would be high (low), i.e., \( \left\| {{\text{vec}}\left(\boldsymbol{\alpha}\right)} \right\|_{1} \) would be small (large) and the data fitting error, \( \frac{1}{2}\left\| {\left( {\varvec{A}^{T} \otimes \varvec{I}_{L} } \right){\text{vec}}\left(\boldsymbol{\alpha}\right) - {\text{vec}}\left(\boldsymbol{\beta}\right)} \right\|_{2}^{2} \), would be high (low). Therefore, a traditional way to solve the vector basis pursuit problem (11) is to solve it many times using different values of \( \lambda \), and then pick the solution that best solves the spectral unmixing problem. This approach, however, is not practical, as it could be both time and resource consuming.
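The sketch below traces this trade-off on a small random problem using scikit-learn's lasso_path; it is only an illustration of the Pareto behavior in Fig. 1, not the solver used in this paper. Note that scikit-learn scales the l1 penalty by the number of samples, so its alpha corresponds to \( \lambda/n \) in the notation of (11).

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(3)
n, p = 100, 50
H = rng.standard_normal((n, p))
x_true = np.zeros(p)
x_true[:5] = rng.standard_normal(5)                 # sparse ground truth
b = H @ x_true + 0.01 * rng.standard_normal(n)

alphas, coefs, _ = lasso_path(H, b, n_alphas=20)    # sweep of regularization values
for a, x in zip(alphas, coefs.T):
    err = 0.5 * np.sum((H @ x - b) ** 2)
    print(f"lambda ~ {a * n:8.3f}   ||x||_1 = {np.abs(x).sum():6.3f}   fit error = {err:8.4f}")
```

As the regularization value decreases, the l1-norm of the solution grows while the data fitting error shrinks, tracing the Pareto curve of Fig. 1.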

3.4 Least angle regression (LARS)

Least angle regression (LARS) is a computationally efficient method to solve the vector basis pursuit problem (11) for all important values of the regularization parameter \( \lambda \) [34]. To avoid rewriting the cumbersome mathematical expressions in (11), we will explain the basic steps of LARS using a generic vector basis pursuit problem,

$$ \mathop {\hbox{min} }\limits_{\varvec{x}} \frac{1}{2}\left\| {\varvec{Hx} - \varvec{b}} \right\|_{2}^{2} + \lambda \left\| \varvec{x} \right\|_{1} $$
(12)

which we will consider in this subsection only.

Because the sub-differential of the objective function in (12) must contain the zero vector at a minimizer, the optimal solution \( \tilde{\varvec{x}}_{\lambda} \) must satisfy the following optimality conditions [35]

$$ \varvec{H}_{I}^{T} \left( {\varvec{H\tilde{x}}_{\lambda } - \varvec{b}} \right) = - \lambda \varvec{z} $$
(13)
$$ \left\| {\varvec{H}_{{I^{c} }}^{T} \left( {\varvec{H\tilde{x}}_{\lambda } - \varvec{b}} \right)} \right\|_{\infty } \le \lambda $$
(14)

where \( I \) denotes the active set, i.e., the indices where \( \tilde{\varvec{x}}_{\lambda} \) is nonzero, \( I^{c} \) denotes the corresponding inactive set, \( \varvec{z} \) is the sign sequence of \( \tilde{\varvec{x}}_{\lambda} \) on \( I \), and \( \varvec{H}_{I} \) is the matrix whose columns are the columns of \( \varvec{H} \) whose indices comprise the active set \( I \). For a given active set, using (13), \( \tilde{\varvec{x}}_{\lambda} \) could be written as

$$ \tilde{\varvec{x}}_{\lambda } = \left\{ {\begin{array}{*{20}l} {\left( {\varvec{H}_{I}^{T} \varvec{H}_{I} } \right)^{ - 1} \left( {\varvec{H}_{I}^{T} \varvec{b} - \lambda \varvec{z}} \right),} \hfill & {{\text{on}}\,I} \hfill \\ {0,} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right. $$
(15)

LARS is a homotopy algorithm [36] that obtains an optimal solution \( \tilde{\varvec{x}}_{\lambda} \) at every critical value of \( \lambda \), i.e., every value of \( \lambda \) at which an element should be added to or removed from the active set. LARS starts with a very high value of \( \lambda \), an empty active set and a zero vector for \( \tilde{\varvec{x}}_{\lambda} \). As \( \lambda \) is decreased, using (15), the update direction that keeps \( \tilde{\varvec{x}}_{\lambda} \) optimal will be

$$ \varvec{d} = \left\{ {\begin{array}{*{20}l} {\left( {\varvec{H}_{I}^{T} \varvec{H}_{I} } \right)^{ - 1} \varvec{z}} \hfill & {{\text{on}}\,I} \hfill \\ {0,} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right. $$
(16)

Thus, the optimal solution \( \tilde{\varvec{x}}_{\lambda } \) is moved in direction \( \varvec{d} \) until a constraint in either (13) or (14) is violated. At that point, either an additional column of \( \varvec{H} \) must be added to \( \varvec{H}_{I} \), so that a new nonzero value appears in \( \tilde{\varvec{x}}_{\lambda } \), or a column of \( \varvec{H}_{I} \) must be removed, so that a nonzero value of \( \tilde{\varvec{x}}_{\lambda } \) is set to zero.

The smallest step size that would violate one of these constraints is given by \( \delta^{*} = \hbox{min} \left( {\delta^{ + } ,\delta^{ - } } \right) \), where

$$ \delta^{ + } = \mathop {\hbox{min} }\limits_{{i \in I^{c} }}^{ + } \left\{ {\frac{{\lambda - \varvec{p}\left( i \right)}}{{1 + \varvec{q}\left( i \right)}}, \frac{{\lambda + \varvec{p}\left( i \right)}}{{1 - \varvec{q}\left( i \right)}}} \right\} $$
(17)
$$ \delta^{ - } = \mathop {\hbox{min} }\limits_{i \in I}^{ + } \left\{ { - \frac{{\tilde{\varvec{x}}_{\lambda } \left( i \right)}}{{\varvec{d}\left( i \right)}}} \right\} $$
(18)

where \( \varvec{p} = \varvec{H}_{{I^{c} }}^{T} \left( {\varvec{H\tilde{x}}_{\lambda } - \varvec{b}} \right) \) and \( \varvec{q} = \varvec{H}_{{I^{c} }}^{T} \varvec{Hd} \), and \( \min^{ + } \) denotes the minimum over positive components only.

At every step of this homotopy algorithm, the new critical value of \( \boldsymbol{\lambda} \) would become \( \lambda - \delta^{*} \), the new optimal solution \( \tilde{\varvec{x}}_{\boldsymbol{\lambda}} \) would become \( \tilde{\varvec{x}}_{\lambda } + \delta^{*} \varvec{d} \), and both active set \( I \) and sign sequence \( \varvec{z} \) would be updated accordingly. This procedure is repeated until \( \lambda \) has been lowered to a required minimum value [35].
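A minimal, unoptimized NumPy sketch of this homotopy procedure for the generic problem (12) is given below. Practical implementations update the inverse of the active Gram matrix incrementally rather than re-solving it at every step; the function and variable names are ours.

```python
import numpy as np

def lars_homotopy(H, b, lam_min=1e-6, max_iter=None):
    """Sketch of the LARS/homotopy method for (12), following (13)-(18).
    Returns the solution path as a list of (lambda, x) pairs."""
    n, p = H.shape
    max_iter = max_iter or 4 * p
    x = np.zeros(p)
    lam = np.max(np.abs(H.T @ b))                    # initial critical value of lambda
    active = [int(np.argmax(np.abs(H.T @ b)))]       # active set I
    path = [(lam, x.copy())]

    for _ in range(max_iter):
        if lam <= lam_min or len(active) == p:
            break
        I = np.array(active)
        z = np.sign(H[:, I].T @ (b - H @ x))         # sign sequence on the active set
        d = np.zeros(p)
        d[I] = np.linalg.solve(H[:, I].T @ H[:, I], z)   # update direction (16)

        Ic = np.setdiff1d(np.arange(p), I)
        with np.errstate(divide='ignore', invalid='ignore'):
            p_vec = H[:, Ic].T @ (H @ x - b)             # p in (17)
            q_vec = H[:, Ic].T @ (H @ d)                 # q in (17)
            cands = np.concatenate([(lam - p_vec) / (1 + q_vec),
                                    (lam + p_vec) / (1 - q_vec)])
            cross = -x[I] / d[I]                         # zero crossings (18)
        idx = np.concatenate([Ic, Ic])
        pos = cands > 1e-12
        delta_plus = cands[pos].min() if np.any(pos) else np.inf
        pos2 = cross > 1e-12
        delta_minus = cross[pos2].min() if np.any(pos2) else np.inf

        delta = min(delta_plus, delta_minus, lam - lam_min)
        x = x + delta * d
        lam = lam - delta
        if delta == delta_plus:                      # a column enters the active set
            active.append(int(idx[pos][np.argmin(cands[pos])]))
        elif delta == delta_minus:                   # a column leaves the active set
            active.remove(int(I[pos2][np.argmin(cross[pos2])]))
        path.append((lam, x.copy()))
    return path
```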

3.5 Limitations of LARS for HSI large multichannel data

In the optimization problem (11), we note that \( {\text{vec}}\left(\boldsymbol{\beta}\right) \) has dimensions \( \left( {L*N} \right) \times 1 \), \( {\text{vec}}\left(\boldsymbol{\alpha}\right) \) has dimensions \( \left( {L*M} \right) \times 1 \), and \( \varvec{A}^{T} \otimes \varvec{I}_{L} \) has dimensions \( \left( {L*N} \right) \times \left( {L*M} \right) \). Therefore, even a relatively small region of interest in a hyperspectral data cube, e.g., \( N = 75 \times 75 \) spectral pixels, \( L = 512 \) wavelengths, and \( M = 3 \) endmembers, would result in \( {\text{vec}}\left(\boldsymbol{\beta}\right) \) with dimensions \( \left( {2.88 \times 10^{6} } \right) \times 1 \), \( {\text{vec}}\left(\boldsymbol{\alpha}\right) \) with dimensions \( \left( {1.54 \times 10^{3} } \right) \times 1 \), and \( \varvec{A}^{T} \otimes \varvec{I}_{L} \) with dimensions \( \left( {2.88 \times 10^{6} } \right) \times \left( {1.54 \times 10^{3} } \right) \). Such extremely large arrays (\( \sim 10^{9} \) elements) would be very challenging to process and would require very large computer storage (\( \sim 10^{10} \) bytes).
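A back-of-the-envelope check of these sizes, assuming 8-byte double-precision storage, confirms the scale of the problem:

```python
L, M = 512, 3
N = 75 * 75                                   # spectral pixels in the region of interest

elements = (L * N) * (L * M)                  # entries of A^T kron I_L in (11)
gigabytes = elements * 8 / 1e9                # 8 bytes per double-precision element
print(f"{elements:.2e} elements, about {gigabytes:.0f} GB")   # ~4.4e9 elements, ~35 GB
```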

4 Kronecker least angle regression (K-LARS)

In this section, we use properties of Kronecker products to extend the standard LARS algorithm to solve the basis pursuit problem in (11) without constructing, storing or processing the very large arrays that would typically result from Kronecker products. This allows practical and efficient recovery of sparse multicomponent signals from multichannel data. We call this extension of standard LARS the Kronecker LARS (K-LARS) algorithm.

4.1 Steps of our K-LARS algorithm

Initial steps:

  • Divide all elements of \( \boldsymbol{\beta} \) by \( \sqrt {{\text{trace}}\left( {\boldsymbol{\beta}^{T}\boldsymbol{\beta}} \right)} \). This is equivalent to normalizing \( {\text{vec}}\left(\boldsymbol{\beta}\right) \) to have unit l2-norm, i.e., \( \left\| {{\text{vec}}\left(\boldsymbol{\beta}\right)} \right\|_{2} = 1 \). Subtract the mean of all elements of \( \boldsymbol{\beta} \) from all the elements of \( \boldsymbol{\beta} \). A simple post-processing step could easily offset this normalization and centering of \( \boldsymbol{\beta} \).

  • Normalize all columns of matrix \( \varvec{A}^{T} \) to have unit l2-norm. A simple post-processing step could easily offset this normalization.

  • Set all elements of matrix \( \boldsymbol{\alpha} \) to zero.

  • Define an initial residual matrix \( \varvec{R}_{0} \) and set it equal to \( \boldsymbol{\beta} \).

  • Define an \( L \times M \) active set matrix \( \varvec{Z}_{0} \), where the indices of its nonzero elements would specify active columns selected from \( \left( {\varvec{A}^{T} \otimes \varvec{I}_{L} } \right) \), and set all its elements to zero.

  • Obtain correlations between all columns of \( \left( {\varvec{A}^{T} \otimes \varvec{I}_{L} } \right) \) and \( {\text{vec}}(\varvec{R}_{0} ) \) as \( \varvec{C}_{1} = \varvec{R}_{0} \varvec{A}^{T} \). Select the column that is most correlated with the residual \( \varvec{R}_{0} \), i.e.,

    $$\left( {i_{1} ,j_{1} }\right) = \mathop {\arg \max }\limits_{1 \le i \le L,\; 1 \le j \le M} \left| {\varvec{C}_{1}\left( {i,j} \right)} \right|$$
    (19)
  • Generate an \( L \times M \) mask matrix, \( \varvec{K}^{i,j} = \left( {\varvec{I}_{L} } \right)_{:, i} \times \left( {\varvec{I}_{M} } \right)_{j,:} \), whose elements are all zeros, except a single element of unit value at \( \left( {i,j} \right) \). Therefore, any column of \( \left( {\varvec{A}^{T} \otimes \varvec{I}_{L} } \right) \) specified by \( \left( {i,j} \right) \) would be equal to \( {\text{vec}}\left( {\varvec{K}^{i,j} \varvec{A}} \right) \).

  • Update the active set matrix, \( \varvec{Z}_{1} = \varvec{Z}_{0} + \varvec{K}^{{i_{1} ,j_{1} }} \), to indicate that the column of \( \left( {\varvec{A}^{\varvec{T}} \otimes \varvec{I}_{\varvec{L}} } \right) \) specified by \( \left( {i_{1} ,j_{1} } \right) \) now belongs to the active set.

For each iteration, \( t \) = 1, 2…

  • Obtain correlations between all columns of \( \left( {\varvec{A}^{\varvec{T}} \otimes \varvec{I}_{\varvec{L}} } \right) \) and \( \varvec{R}_{t - 1} \) as \( \varvec{C}_{t} = \varvec{R}_{t - 1} \varvec{A}^{\varvec{T}} \).

  • Set \( \lambda_{t} \) to the maximum value of \( \left| {\varvec{C}_{t} \left( {i,j} \right)} \right| \).

  • Obtain the Gram matrix \( \varvec{G} \) of the active columns of \( \left( {\varvec{A}^{\varvec{T}} \otimes \varvec{I}_{\varvec{L}} } \right) \) corresponding to all nonzero entries of \( \varvec{Z}_{t} \) specified by \( \left( {i,j} \right) \). Each element \( \varvec{G}\left( {m,n} \right) \) is given by the inner product of the \( m{\text{th}} \) and \( n{\text{th}} \) active columns specified by \( \left( {i_{m} ,j_{m} } \right) \) and \( \left( {i_{n} ,j_{n} } \right) \), respectively,

    $$ \varvec{G}_{t} \left( {m,n} \right) = {\text{trace}}\left( {\varvec{A}^{\varvec{T}} \left( {\varvec{K}^{{i_{m} ,j_{m} }} } \right)^{T} \varvec{K}^{{i_{n} ,j_{n} }} \varvec{A}} \right) . $$
    (20)
  • Obtain a vector, \( \varvec{q}_{t} \), indexed by \( k \), where \( \left( {i_{k} ,j_{k} } \right) \) corresponds to all nonzero elements of \( \varvec{Z}_{t} \), and whose length equals the number of active columns of \( \left( {\varvec{A}^{T} \otimes \varvec{I}_{L} } \right) \). Its elements are the inner products between these active columns and \( {\text{vec}}\left( {\varvec{R}_{t - 1} } \right) \),

    $$ \varvec{q}_{t} \left( k \right) = {\text{trace}}\left( {\varvec{A}^{T} \left( {\varvec{K}^{{i_{k} ,j_{k} }} } \right)^{T} \varvec{R}_{t - 1} } \right). $$
  • Obtain the nonzero elements of the matrix \( \varvec{D}_{t} \left( {i,j} \right) \) that represents the update direction of the solution \( \boldsymbol{\alpha} \), as the vector

    $$ \varvec{u}_{t} = \frac{1}{{\lambda_{t} }}\varvec{G}_{t}^{ - 1} \varvec{q}_{t} . $$
    (21)

    Using \( \left( {i_{k} ,j_{k} } \right) \), place the elements of \( \varvec{u}_{t} \) at the corresponding positions of \( \varvec{D}_{t} \); all other elements of \( \varvec{D}_{t} \) are zero.

  • For all \( \left( {i,j} \right) \) where \( \varvec{Z}_{t} \left( {i,j} \right) = 0 \), i.e., indices of the inactive set, obtain the positive step size \( \delta_{t}^{ + } \) using

    $$ \delta_{t}^{ + } = \mathop {\min^{ + } }\limits_{ } \left\{ {\frac{{\lambda_{t} - \varvec{C}_{t} \left( {i,j} \right)}}{{1 - \varvec{V}_{t} \left( {i,j} \right)}}, \frac{{\lambda_{t} + \varvec{C}_{t} \left( {i,j} \right)}}{{1 + \varvec{V}_{t} \left( {i,j} \right)}}} \right\} $$
    (22)

    where \( \varvec{V}_{t} = \varvec{D}_{t} \varvec{AA}^{T} \). Set (\( i_{\text{add}} \),\( j_{\text{add}} \)) to the indices corresponding to \( \delta_{t}^{ + } \) in (22).

  • For all \( \left( {i,j} \right) \) where \( \varvec{Z}_{t} \left( {i,j} \right) = 1 \), i.e., indices of the active set, obtain the negative step size \( \boldsymbol{\delta}_{\varvec{t}}^{ - } \) using

    $$ \delta_{t}^{ - } = \min^{ + } \left\{ { - \frac{{ \boldsymbol{\alpha}_{t - 1} \left( {i,j} \right)}}{{\varvec{D}_{t} \left( {i,j} \right)}}} \right\} $$
    (23)

    Set (\( i_{\text{remove}} \), \( j_{\text{remove}} \)) to the indices corresponding to \( \delta_{t}^{ - } \) in (23).

  • Obtain the step size \( \delta_{t}^{*} \) using

    $$ \delta_{t}^{*} = \hbox{min} \left\{ {\delta_{t}^{ + } ,\delta_{t}^{ - } } \right\} $$
    (24)
  • Update both solution matrix \( \boldsymbol{\alpha} \) and residual matrix using

    $$ \boldsymbol{\alpha}_{t} =\boldsymbol{\alpha}_{t - 1} + \delta_{t}^{*} \varvec{D}_{t} $$
    (25)
    $$ \varvec{R}_{t} = \varvec{R}_{t - 1} -{\delta}_{t}^{*} \varvec{D}_{t} \varvec{A} $$
    (26)
  • If \( \delta_{t}^{*} = \delta_{t}^{ + } \), set \( \varvec{Z}_{t + 1} = \varvec{Z}_{t} + \varvec{K}^{{i_{\text{add}} ,j_{\text{add}} }} \), to indicate that the column of \( \left( {\varvec{A}^{T} \otimes \varvec{I}_{L} } \right) \) specified by \( \left( {i_{\text{add}} ,j_{\text{add}} } \right) \) now belongs to the active set.

  • Else if \( \delta_{t}^{*} = \delta_{t}^{ - } \), set \( \varvec{Z}_{t + 1} = \varvec{Z}_{t} - \varvec{K}^{{i_{\text{remove}} ,j_{\text{remove}} }} \), to indicate that the column of \( \left( {\varvec{A}^{T} \otimes \varvec{I}_{L} } \right) \) specified by \( \left( {i_{\text{remove}} ,j_{\text{remove}} } \right) \) now belongs to the inactive set.

Continue iterations until all columns of \( \left( {\varvec{A}^{T} \otimes \varvec{I}_{L} } \right) \) are included in the active set, i.e., all entries of the active set matrix \( \varvec{Z} \) are ones, all values of the correlation matrix \( \varvec{C} \) become equal, and \( \lambda \) reaches its minimum value. This is equivalent to reaching the full least squares solution. In practice, however, one would typically terminate the iterations when a desired sparsity level is reached or when the norm of the residual falls within a prescribed tolerance. The complete K-LARS procedure is summarized below.

Figure a: Summary of the K-LARS algorithm steps
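The NumPy sketch below follows these steps. It exploits two structural facts: the Gram entries in (20) reduce to entries of \( \varvec{AA}^{T} \) (and are nonzero only when the row indices of the two active columns match), and the inner products \( \varvec{q}_{t} \) are simply the entries of \( \varvec{C}_{t} \) at the active positions. It is a simplified illustration written by us, not the authors' reference implementation, and it omits the rank-one Gram updates a practical code would use.

```python
import numpy as np

def k_lars(beta, A, max_active=None, tol=1e-8, max_iter=1000):
    """Sketch of K-LARS (Sect. 4.1): LARS on (11) using the structure of A^T kron I_L,
    so only L x M and M x M arrays are ever formed."""
    L, N = beta.shape
    M = A.shape[0]
    max_active = max_active or L * M

    # initial normalization (a post-processing step could undo this)
    beta = beta / np.sqrt(np.trace(beta.T @ beta))
    beta = beta - beta.mean()
    A = A / np.linalg.norm(A, axis=1, keepdims=True)   # unit-norm columns of A^T

    AAt = A @ A.T                                      # M x M, used for (20) and (22)
    alpha = np.zeros((L, M))
    R = beta.copy()                                    # residual R_0
    Z = np.zeros((L, M), dtype=bool)                   # active set matrix

    C = R @ A.T                                        # correlations C_1
    i1, j1 = np.unravel_index(np.argmax(np.abs(C)), C.shape)
    Z[i1, j1] = True                                   # first active column, (19)

    for _ in range(max_iter):
        C = R @ A.T                                    # C_t
        lam = np.max(np.abs(C))                        # lambda_t
        act = np.argwhere(Z)                           # active positions (i_k, j_k)
        G = np.array([[AAt[jm, jn] if im == i_n else 0.0
                       for (i_n, jn) in act] for (im, jm) in act])   # Gram matrix (20)
        q = np.array([C[i, j] for (i, j) in act])      # inner products with the residual
        u = np.linalg.solve(G, q) / lam                # update direction (21)
        D = np.zeros((L, M))
        D[act[:, 0], act[:, 1]] = u

        V = D @ AAt                                    # V_t in (22)
        with np.errstate(divide='ignore', invalid='ignore'):
            cand = np.stack([(lam - C) / (1 - V), (lam + C) / (1 + V)])   # (22)
            cross = -alpha / np.where(D != 0, D, np.nan)                  # (23)
        cand[:, Z] = np.inf                            # (22) ranges over the inactive set only
        cand[~(cand > tol)] = np.inf
        cross[~Z] = np.inf                             # (23) ranges over the active set only
        cross[~(cross > tol)] = np.inf
        d_plus, d_minus = cand.min(), cross.min()

        delta = min(d_plus, d_minus)                   # (24)
        if not np.isfinite(delta):
            break
        alpha = alpha + delta * D                      # (25)
        R = R - delta * (D @ A)                        # (26)

        if d_plus <= d_minus:                          # a column enters the active set
            _, i_add, j_add = np.unravel_index(np.argmin(cand), cand.shape)
            Z[i_add, j_add] = True
        else:                                          # a column leaves the active set
            i_rem, j_rem = np.unravel_index(np.argmin(cross), cross.shape)
            Z[i_rem, j_rem] = False
        if Z.sum() >= max_active or lam - delta <= tol:
            break
    return alpha, Z
```

In the spectral unmixing loop of Sect. 5.2, k_lars would play the role of the \( \boldsymbol{\alpha} \)-step, with max_active (or a residual tolerance) serving as the termination condition discussed there.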

5 Spectral unmixing results

To demonstrate the validity of our K-LARS-based spectral unmixing method, we applied it to two HSI datasets. The first is a synthetic dataset whose spectral pixels are linear mixtures of endmembers, from the ASTER spectral library [37], with uniformly random abundances. The second is an actual HSI dataset from NASA’s AVIRIS website [38]. We also compare our results to ones obtained using GMCA. We show that the two sets of results are similar, although ours were obtained without arbitrary choices in specifying the regularization parameter.

5.1 Preprocessing of HSI multichannel data

Assuming the presence of white noise in the HSI data, a denoising step is necessary before solving optimization problem (7), because white noise cannot be sparsely represented in any dictionary [39]. Therefore, in this paper, we denoised each raw spectral pixel using a wavelet (Symmlet) thresholding method with a universal threshold [40].

As discussed in Sect. 2.4, to promote endmember sparsity, we transformed the HSI datasets to the wavelet domain, where the transformed endmembers \( \boldsymbol{\alpha} \) would likely be sparser than in the original measurement domain. We used the Coiflet 5 wavelet, because both its scaling and wavelet functions have vanishing moments, thereby significantly promoting sparsity of piecewise smooth functions, e.g., endmember spectra.
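A sketch of this preprocessing using the PyWavelets package is shown below. The paper does not state the Symmlet order, decomposition level, thresholding mode, or noise estimator; 'sym8', five levels, soft thresholding, and a median-absolute-deviation noise estimate are assumptions made here for illustration.

```python
import numpy as np
import pywt

def denoise_pixel(y, wavelet='sym8', level=5):
    """Wavelet (Symmlet) thresholding of one spectral pixel with the universal
    threshold sigma * sqrt(2 * log(L))."""
    L = len(y)
    coeffs = pywt.wavedec(y, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # robust noise estimate (assumed)
    thr = sigma * np.sqrt(2.0 * np.log(L))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode='soft') for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:L]

def to_wavelet_domain(Y, wavelet='coif5', level=5):
    """Transform each (denoised) pixel to the Coiflet 5 domain, i.e., beta = Phi^T Y.
    The concatenated coefficient length may differ slightly from L at the boundaries."""
    return np.array([np.concatenate(pywt.wavedec(y, wavelet, level=level)) for y in Y.T]).T
```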

5.2 Applying K-LARS to HSI spectral unmixing

As discussed in Sect. 3.1, the sparsity-based spectral unmixing problem, problem (7), could be solved by an iterative and alternate estimation of \( \boldsymbol{\alpha} \), problem (8), and \( \varvec{A} \), problem (9). Initially, \( \boldsymbol{\alpha} \) is set to zero and \( \varvec{A} \) is set to random abundance values. At each iteration, we fixed \( \varvec{A} \) and estimated \( \boldsymbol{\alpha} \) using our K-LARS algorithm, then we fixed \( \boldsymbol{\alpha} \) and estimated \( \varvec{A} \) by solving a simple constrained linear least squares problem.

The sparsity level of each estimate of \( \boldsymbol{\alpha} \) is crucial to the accuracy of our final spectral unmixing results. Therefore, in our first spectral unmixing iteration, we terminated the K-LARS algorithm when the data fitting error corresponding to the sparse solution was within 0.5% of the data fitting error corresponding to the full least squares solution. We then used this sparsity level of \( \boldsymbol{\alpha} \) as the K-LARS termination condition for all subsequent spectral unmixing iterations.
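One plausible reading of this stopping rule is sketched below: with \( \varvec{A} \) fixed, the full least squares solution of the fitting problem is \( \boldsymbol{\alpha}_{\text{LS}} = \boldsymbol{\beta}\varvec{A}^{+} \), and K-LARS stops once its sparse solution's data fitting error is within 0.5% of that benchmark. The exact form of the comparison is our assumption.

```python
import numpy as np

def fit_error(beta, alpha, A):
    """Data fitting error 0.5 * ||beta - alpha A||_F^2."""
    return 0.5 * np.linalg.norm(beta - alpha @ A, 'fro') ** 2

def within_tolerance(beta, alpha_sparse, A, rel_tol=0.005):
    """True once the sparse solution's error is within rel_tol of the full LS error."""
    alpha_ls = beta @ np.linalg.pinv(A)          # full least squares solution (A fixed)
    e_ls = fit_error(beta, alpha_ls, A)
    return fit_error(beta, alpha_sparse, A) <= (1.0 + rel_tol) * e_ls
```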

5.3 Spectral unmixing of synthetic HSI multichannel data

We generated a synthetic HSI dataset of 3000 spectral pixels. These pixels are linear mixtures of the spectra of grass, concrete and asphalt, obtained from the ASTER spectral library [37], with uniformly random abundances. Gaussian noise was added to these spectral pixels to obtain a signal-to-noise ratio (SNR) of \( 30\, {\text{dB}} \). Therefore, in this dataset, the number of endmembers is \( M = 3 \), the number of spectral pixels is \( N = 3000 \) and the number of wavelengths is \( L = 491 \) (0.42–14 μm). Figure 2a–c shows the spectra of grass, concrete and asphalt, whose different linear mixtures make up the different spectral pixels in this dataset.

Fig. 2. (a–c) Exact spectra of grass, concrete, and asphalt, respectively. (d–f) Estimated spectra obtained using the GMCA-based spectral unmixing algorithm. (g–i) Estimated spectra obtained using our K-LARS-based spectral unmixing algorithm

Figure 2d–f shows the estimated spectra obtained by applying the GMCA-based spectral unmixing algorithm to this dataset. To obtain these GMCA-based results, we used computationally costly trial and error to obtain both the initial and final values of the regularization parameter, \( \lambda_{\text{initial}} = 15 \) and \( \lambda_{\text{final}} = 4 \), and we arbitrarily chose a linear schedule for its reduction over the coordinate descent iterations. The correlation coefficients between the exact spectra of grass, concrete and asphalt and their GMCA-estimated counterparts are 0.9673, 0.9942 and 0.9583, respectively.

Figure 2g–i shows the estimated spectra obtained by applying our K-LARS-based spectral unmixing algorithm to this dataset. The correlation coefficients between the exact spectra of grass, concrete and asphalt and their K-LARS-estimated counterparts are 0.9970, 0.9993 and 0.9727, respectively. These quantitative results demonstrate the validity of our new K-LARS algorithm and its successful application to HSI spectral unmixing. Our K-LARS results are very close to those obtained by GMCA, although they were obtained without any trial and error, or arbitrary choices, in specifying the regularization parameter.

5.4 Spectral unmixing of AVIRIS HSI multichannel data

We downloaded an actual HSI dataset, the Moffett Field dataset, from NASA’s AVIRIS website [38]. This dataset comprises images obtained at 224 wavelengths (0.365–2.497 μm). We selected an arbitrary region (70 × 60 pixels) as our region of interest, and then used Google Maps to determine that our selected region contained three endmembers: water, grass and land. Therefore, in the dataset corresponding to our selected region, the number of endmembers is \( M = 3 \), the number of spectral pixels is \( N = 4200 \) and the number of wavelengths is \( L = 224 \).

To validate the spectral unmixing results obtained by applying our K-LARS-based unmixing algorithm to this dataset, we needed its ground truth, i.e., the spectra of the endmembers present in our selected region. For this purpose, we used N-FINDR [41], a software package that uses a geometrical approach to find approximately pure spectral pixels, i.e., spectral pixels composed of a single endmember only, in the given HSI data. The locations of these approximately pure spectral pixels in our selected region are shown in Fig. 3.

Fig. 3. Locations of approximately pure spectral pixels obtained by N-FINDR

Figure 4a–c shows the ground truth spectra of the three endmembers present in our selected region, whose different linear mixtures make up the different spectral pixels in this dataset. Figure 4d–f shows the estimated spectra of these three endmembers obtained by applying the GMCA-based spectral unmixing algorithm to this dataset. To obtain these GMCA-based results, we used computationally costly trial and error to obtain both the initial and final values of the regularization parameter, \( \lambda_{\text{initial}} = 15 \) and \( \lambda_{\text{final}} = 2.8 \), and we arbitrarily chose a linear schedule for its reduction over the coordinate descent iterations. The correlations between the ground truth spectra and their GMCA-estimated counterparts are 0.9923, 0.8514 and 0.8607, respectively. Figure 4g–i shows the estimated spectra of these three endmembers obtained by applying our K-LARS-based spectral unmixing algorithm to this dataset. The correlations between the ground truth spectra and their K-LARS-estimated counterparts are 0.8358, 0.9892 and 0.8878, respectively. These quantitative results further demonstrate the validity of our new K-LARS algorithm and its successful application to HSI spectral unmixing. Our K-LARS results are very close to those obtained by GMCA, although they were obtained without any trial and error, or arbitrary choices, in specifying the regularization parameter.

Fig. 4. (a–c) Ground truth spectra of the three endmembers obtained by N-FINDR. (d–f) Estimated spectra obtained using the GMCA-based spectral unmixing algorithm. (g–i) Estimated spectra obtained using our K-LARS-based spectral unmixing algorithm

6 Discussion and conclusions

A recent approach to unsupervised spectral unmixing of HSI data requires that the unknown endmembers be sparse in some dictionary. This requires solving a sparse signal recovery optimization problem using basis pursuit, which in turn requires the specification of a data-dependent regularization parameter. LARS is an efficient method for sparse signal recovery that recovers all sparse solutions corresponding to all relevant values of this regularization parameter. However, the application of LARS to large multichannel data could be very challenging in practice, due to the need to generate and store extremely large arrays (\( \sim 10^{10} \) bytes even in a relatively small spectral unmixing problem).

In this paper, we extended the standard LARS algorithm, using Kronecker products, to achieve practical and efficient recovery of sparse multicomponent signals from multichannel data, i.e., without the need to construct and process very large arrays or to use trial and error to specify regularization parameters. We then applied our new K-LARS algorithm to successfully achieve unsupervised spectral unmixing of both synthetic and AVIRIS hyperspectral imaging data. Our quantitative spectral unmixing results for both synthetic HSI multichannel data (Sect. 5.3) and AVIRIS HSI multichannel data (Sect. 5.4) demonstrate the validity of our new K-LARS algorithm and its successful application to HSI spectral unmixing. Compared to the spectral unmixing results obtained by GMCA, our K-LARS results are very close, although they were obtained without any trial and error, or arbitrary choices, in specifying the regularization parameter. Avoiding this trial and error, which could consume significant computational time and effort, is the main advantage of using K-LARS, instead of GMCA, for sparse spectral unmixing.

More importantly, our K-LARS algorithm could be a valuable research tool for the signal processing community, where it could be used to solve sparse least squares problems involving large multichannel data. On the other hand, K-LARS has the disadvantage of being an intricate algorithm that requires careful programming. As future work, we are generalizing the K-LARS algorithm introduced in this paper to efficiently represent and recover multidimensional signals using Kronecker dictionaries.