1 Introduction

Hyperspectral images (HSIs) constitute the main data source for remote sensing problems [1] such as agricultural and environmental monitoring [2], mineral exploration [3], and military surveillance [4]. An HSI is represented as a three-dimensional data cube whose third dimension contains the spectral information, so each pixel is a vector of spectral measurements. To acquire such data, hyperspectral cameras collect 2-D spatial photographs over many adjacent spectral bands, commonly covering the visible, near-infrared, and shortwave infrared regions in the range 0.35–\(2.5\,\upmu \)m [2]. However, due to the low spatial resolution of imaging sensors, each single pixel represents a mixture of different materials located in the field of view [5]. In some remote sensing applications, we are interested in identifying the materials involved in each pixel, which motivates further research on unmixing techniques. For this purpose, each pixel of a hyperspectral image is decomposed into a group of pure spectral signatures and their corresponding proportions, known as endmembers and abundances, respectively [6]. In practice, this is performed under two physical constraints on the abundances, namely non-negativity and sum-to-one. The first constraint requires every abundance fraction to be non-negative, while the second implies that the abundance fractions of each pixel sum to one. Moreover, a difficulty that arises in unmixing problems is the variability of the measured spectral signatures of endmembers, caused by changing atmospheric, illumination, and temporal conditions [7]. A number of methods have addressed spectral variability [8], among which multiple endmember spectral mixture analysis (MESMA) is well known [7]. However, when the spectral library becomes large, MESMA becomes computationally prohibitive because it requires an exhaustive search over all possible combinations of endmembers. Support vector machines (SVMs) have also been employed for spectral unmixing while accounting for spectral variations [9].

On the other hand, the spectral variability of endmembers may be modeled statistically using the Gaussian and beta distributions. Accordingly, the normal compositional model (NCM) and the beta compositional model (BCM) [10, 11] have been developed, along with the corresponding Bayesian estimators. In these works, a uniform prior is placed over the set of proportion values that satisfy the non-negativity and sum-to-one constraints [10,11,12,13]. However, this prior is essentially more suitable for supervised unmixing scenarios, in which the exact endmembers are assumed known. In contrast, in semi-supervised unmixing problems, only a few endmembers are active out of a large spectral dictionary, and thus the abundance vector is assumed sparse. Although sparsity combined with endmember variability has already been studied for reducing the unmixing error [14,15,16,17], developing more effective tools remains of interest to researchers.

In this paper, we propose a new Bayesian method for unmixing of hyperspectral images, referred to as the normal compositional model with the sparse Dirichlet prior (NCM-SDP). We consider a semi-supervised scenario with the NCM and use the Dirichlet prior to represent the sparsity of the abundance vector. A Markov chain Monte Carlo (MCMC) sampler is used to generate samples from the posterior distributions.

2 NCM-SDP method

To introduce the proposed NCM-SDP method, we elaborate on the NCM definition, prior selection, and the posterior derivation as follows.

2.1 Normal compositional model

In stochastic mixing approaches, the spectral variability of the rth endmember is modeled as a random vector:

$$\begin{aligned} {\varvec{e}}_{\varvec{r}} \sim \mathcal{F}\left( {.|{\varvec{\theta }}_r } \right) , \end{aligned}$$
(1)

where \(\mathcal{F}\) denotes the conditional probability density function (pdf) of a material and \({\varvec{\theta }} _r \) is the vector of hyperparameters of the distribution corresponding to the rth endmember. An L-band pixel vector \({\varvec{y}}=\left[ {y_1 ,\ldots ,y_L } \right] ^{T}\) is then given by a stochastic linear mixture of endmembers [18]:

$$\begin{aligned} {\varvec{y}}=\sum \limits _{r=1}^R {\varvec{e}}_r \alpha _r , \end{aligned}$$
(2)

where \({\varvec{e}}_r \) is the spectral signature of the \(r{\mathrm{th}}\) endmember defined by (1), R is the number of endmembers, and \(\alpha _r \) denotes the abundance of the \(r{\mathrm{th}}\) endmember. To define each endmember in a Bayesian sense, gamma and beta priors have already been applied [18, 19]. In such cases, however, exact knowledge of the endmember distributions is required, which may not be available in practice. Alternatively, a Gaussian distribution may be considered, whose unknown parameters can be estimated jointly with the abundance fractions. In the NCM, endmembers are modeled as independent multivariate Gaussian vectors. We assume that the mean of each endmember is known and that the covariance matrix of the endmembers is a scalar matrix. The pdf of the \(r\mathrm{th}\) endmember is then defined as:

$$\begin{aligned} {\varvec{e}}_r \sim \mathcal{N}\left( {{\varvec{m}}_r, \sigma ^{2}{\varvec{I}}_L } \right) , \end{aligned}$$
(3)

where \({\varvec{m}}_r =\left[ {m_{r,1} , \ldots ,m_{r,L} } \right] ^{T}\) is the known mean of \({\varvec{e}}_r \) for \(r=1,\ldots ,R\), \({\varvec{I}}_L \) is the \(L\times L\) identity matrix, and \(\sigma ^{2}\) denotes the unknown variance of the endmembers in each spectral band. Since the endmember spectra are independent of each other, the likelihood function of each mixed hyperspectral pixel is expressed as:

$$\begin{aligned} f\left( {{\varvec{y}}|{\varvec{\alpha }} ,\sigma ^{2}} \right) =\frac{1}{\left( {2\pi \sigma ^{2}c\left( {\varvec{\alpha }} \right) } \right) ^{L/2}}\exp \left( {-\frac{\Vert {\varvec{y}}-\mu \left( {\varvec{\alpha }} \right) \Vert _2^2 }{2\sigma ^{2}c\left( {\varvec{\alpha }} \right) }} \right) , \end{aligned}$$
(4)

where \(\left\| \cdot \right\| _2 \) denotes the standard \(\ell _2 \) norm, \(c\left( {\varvec{\alpha }} \right) =\sum \nolimits _{r=1}^R \alpha _r^2\), \(\mu ({\varvec{\alpha }})=\sum \nolimits _{r=1}^R {\varvec{m}}_r \alpha _r\), and \({\varvec{\alpha }}=\left[ {\alpha _1 ,\ldots ,\alpha _R } \right] ^{T}\) is the abundance vector. The unknown parameters \({\varvec{\alpha }}\) and \(\sigma ^{2}\) are estimated using a hierarchical Bayesian algorithm.
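As a concrete illustration, the likelihood (4) can be evaluated as in the following minimal sketch, which assumes the known endmember means are stacked column-wise into a matrix M; all variable names here are ours, not part of the model definition.

```python
# A minimal sketch of the NCM log-likelihood in (4); M is assumed to hold the
# known endmember means m_1, ..., m_R as columns (our own convention).
import numpy as np

def ncm_log_likelihood(y, alpha, sigma2, M):
    """log f(y | alpha, sigma2) for the normal compositional model.

    y      : (L,) observed pixel spectrum
    alpha  : (R,) abundance vector (non-negative, sums to one)
    sigma2 : scalar endmember variance per band
    M      : (L, R) matrix of known endmember means
    """
    L = y.shape[0]
    c = np.sum(alpha ** 2)          # c(alpha) = sum_r alpha_r^2
    mu = M @ alpha                  # mu(alpha) = sum_r m_r alpha_r
    resid = y - mu
    return (-0.5 * L * np.log(2.0 * np.pi * sigma2 * c)
            - resid @ resid / (2.0 * sigma2 * c))
```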

2.2 Prior selection

2.2.1 Endmember variance prior

As in [12], a conjugate inverse gamma distribution is chosen as the prior for the endmember variance:

$$\begin{aligned} f\left( {\sigma ^{2}|\delta } \right) \sim \mathcal{I}\mathcal{G}\left( {\nu ,\delta } \right) , \end{aligned}$$
(5)

where \(\nu \) and \(\delta \) denote the shape and scale parameters, respectively. We set \(\nu =1\), and the hyperparameter \(\delta \) is assigned the non-informative Jeffreys’ prior:

$$\begin{aligned} f\left( \delta \right) \sim \frac{1}{\delta }1_{{\mathbb {R}}^{+}} \left( \delta \right) , \end{aligned}$$
(6)

where \(1_{{\mathbb {R}}^{+}}(.)\) is the indicator function defined on \({\mathbb {R}}^{+}\) as:

$$\begin{aligned} 1_{{\mathbb {R}}^{+}} (\delta )=\left\{ {{\begin{array}{ll} {1,} &{}\quad {\hbox {if }\delta \in {\mathbb {R}}^{+};} \\ {0,} &{}\quad {\hbox {otherwise}.} \\ \end{array}}} \right. \end{aligned}$$
(7)

2.2.2 Abundance prior

The abundance vector should be estimated under the non-negativity and sum-to-one constraints shown as:

$$\begin{aligned} \alpha _r \ge 0, \quad r=1,\ldots ,R, \qquad \sum \limits _{r=1}^R \alpha _r =1. \end{aligned}$$
(8)

Then, the fractional abundance vector \({\varvec{\alpha }}\) lies in the standard \(\left( {R-1} \right) \)-simplex. Under these constraints, choosing an appropriate prior for the abundance vector is not straightforward. To fulfill the above constraints, most recent works adopt a uniform distribution over the set of fractional values [11]. However, since the number of endmembers contributing to a mixed pixel is usually much smaller than the dictionary size, this distribution cannot describe the abundance vector properly. Instead, we may exploit the sparsity of the abundance vector arising from its small number of nonzero elements [19].

In this paper, we propose the symmetric Dirichlet distribution as the abundance prior defined as [20]:

$$\begin{aligned} f({\varvec{\alpha }}) \sim \mathcal{D}\left( {{\varvec{\alpha }} ;\beta } \right) =\frac{\varGamma \left( {\beta R} \right) }{\varGamma \left( \beta \right) ^{R}}\prod \limits _{r=1}^R \alpha _r^{\beta -1} , \end{aligned}$$
(9)

where \(\varGamma (\cdot )\) is the gamma function and \(\beta \) determines the concentration of the Dirichlet distribution and, accordingly, the sparsity of the abundance vector. This distribution satisfies the non-negativity and sum-to-one constraints on the abundance vector. For \(\beta =1\), it reduces to the uniform distribution over the standard \(\left( {R-1} \right) \)-simplex, and for \(\beta >1\), it becomes denser around its mean. On the other hand, for \(\beta <1\), the Dirichlet prior concentrates most components close to zero, in which case most of the elements of \({\varvec{\alpha }}\) are extremely small. This case properly describes the sparse behavior of an abundance vector in a semi-supervised scenario. Clearly, if the uniform prior (\(\beta =1\)), which is more appropriate for a supervised scheme as used in [11,12,13], is applied to a semi-supervised scheme, large unmixing errors may result. Here, we consider a semi-supervised scheme, which is more common in practice, propose to incorporate the sparse Dirichlet prior, and demonstrate its superior performance later.
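The following small numerical check, using our own illustrative settings (dictionary size, number of draws, and random seed), shows how decreasing \(\beta \) pushes most of the probability mass of a Dirichlet draw onto a few components.

```python
# A small numerical check (ours, not the paper's) of how the concentration
# parameter beta of the symmetric Dirichlet prior (9) shapes the abundance
# vector: smaller beta pushes most of the mass onto a few components.
import numpy as np

rng = np.random.default_rng(0)
R = 10                                               # illustrative dictionary size
for beta in (1.0, 0.5, 0.1):
    a = rng.dirichlet(np.full(R, beta), size=5000)   # rows sum to one, entries >= 0
    print(f"beta={beta:4.2f}: mean largest entry {a.max(axis=1).mean():.2f}, "
          f"mean median entry {np.median(a, axis=1).mean():.4f}")
```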

2.3 Derivation of posterior distribution

Based on Bayes’ theorem, the joint posterior distribution of the unknown variables is defined as:

$$\begin{aligned} f\left( {{\varvec{\alpha }}, \sigma ^{2},\delta |{\varvec{y}}} \right) \propto f\left( {{\varvec{y}}|{\varvec{\alpha }}, \sigma ^{2}} \right) f\left( {{\varvec{\alpha }}, \sigma ^{2}| \delta } \right) f\left( \delta \right) , \end{aligned}$$
(10)

where, by assuming independence between the unknown parameters, we have \(f\left( {{\varvec{\alpha }},\sigma ^{2}| \delta } \right) =f\left( {\varvec{\alpha }}\right) f\left( {\sigma ^{2}|\delta } \right) \). Then, substituting (4), (5), (6), and (9) into (10), \(f\left( {{\varvec{\alpha }}, \sigma ^{2},{{\delta }}|{\varvec{y}}} \right) \) is obtained as:

$$\begin{aligned}&f\left( {{\varvec{\alpha }}, \sigma ^{2},\delta |{\varvec{y}}} \right) \propto \frac{\mathbf{1}_{{\mathbb {R}}^{+}} (\delta )\prod \nolimits _{r=1}^R \alpha _r^{\beta -1} }{\left( {\sigma ^{2}c({\varvec{\alpha }})} \right) ^{\frac{L}{2}}\sigma ^{2}}\nonumber \\&\quad \times \exp \left( {-\frac{\Vert {\varvec{y}}-\mu \left( {\varvec{\alpha }} \right) \Vert ^{2}}{2\sigma ^{2}c\left( {\varvec{\alpha }} \right) }-\delta } \right) . \end{aligned}$$
(11)

Due to the complexity of (11), the MMSE or MAP estimates of the abundances and endmember variance cannot be obtained in closed form. A solution is to generate samples according to (11) and then approximate the Bayesian estimators from these samples [21].
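For reference, the joint log-posterior in (11) can be evaluated up to an additive constant as in the sketch below; the matrix M of stacked endmember means and the small constant guarding the logarithm are our own conventions, and the sum-to-one constraint on the abundances is assumed to be enforced by the sampler's proposal.

```python
# A minimal sketch evaluating the joint log-posterior (11), up to an additive
# constant, exactly as written there; variable names and numerical guards are ours.
import numpy as np

def log_posterior(alpha, sigma2, delta, y, M, beta=0.1):
    """Unnormalized log f(alpha, sigma2, delta | y) from (11)."""
    if np.any(alpha < 0) or sigma2 <= 0 or delta <= 0:
        return -np.inf                                      # outside the support
    L = y.shape[0]
    c = np.sum(alpha ** 2)
    resid = y - M @ alpha
    return ((beta - 1.0) * np.sum(np.log(alpha + 1e-300))   # Dirichlet prior term
            - 0.5 * L * np.log(sigma2 * c)                   # likelihood normalization
            - np.log(sigma2)                                 # variance/hyperprior term
            - resid @ resid / (2.0 * sigma2 * c)             # data misfit
            - delta)                                         # exp(-delta) factor
```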

3 MCMC sampling

MCMC methods iteratively draw samples from a probability distribution by generating a Markov chain [21]. To do so, we derive \(f\left( {{\varvec{\alpha }}|{\varvec{y}},\sigma ^{2}} \right) \), \(f\left( {\sigma ^{2}|{\varvec{y}},{\varvec{\alpha }},\delta } \right) ,\) and \(f\left( {\delta |\sigma ^{2}} \right) \) to estimate the unknown parameters \({\varvec{\alpha }}\) and \(\sigma ^{2}\) and the unknown hyperparameter \(\delta \). Using (4) and (9) in Bayes’ theorem, the posterior \(f({{\varvec{\alpha }}|{\varvec{y}}, \sigma ^{2}})\) is given by:

$$\begin{aligned}&f\left( {{\varvec{\alpha }} |{\varvec{y}}, \sigma ^{2}} \right) \propto \frac{1}{\left( {\sigma ^{2}c\left( \alpha \right) } \right) ^{\frac{L}{2}}}\nonumber \\&\quad \times \exp \left( {-\frac{\Vert {\varvec{y}}-\mu \left( {\varvec{\alpha }} \right) \Vert ^{2}}{2\sigma ^{2}c\left( {\varvec{\alpha }} \right) }} \right) \prod \limits _{r=1}^R \alpha _r^{\beta -1} . \end{aligned}$$
(12)

According to (12), samples of the abundance vector are generated using the MCMC sampler [12]. Using (4), (5), and (6), the posteriors of \(\sigma ^{2}\) and \(\delta \) are also obtained, respectively, as:

$$\begin{aligned} f\left( {\sigma ^{2}|{\varvec{y}},{\varvec{\alpha }} , \delta } \right) \propto \mathcal{I}\mathcal{G}\left( {\frac{L}{2}+1, \frac{\Vert {\varvec{y}}-\mu \left( {\varvec{\alpha }} \right) \Vert ^{2}}{2c\left( {\varvec{\alpha }} \right) }+\delta } \right) , \end{aligned}$$
(13)

and

$$\begin{aligned} f\left( {\delta |\sigma ^{2}} \right) \propto \mathcal{G}\left( { 1,\frac{1}{\sigma ^{2}}} \right) . \end{aligned}$$
(14)

The required procedure for using the NCM-SDP is summarized in Alg. 1.

Algorithm 1
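A bare-bones Metropolis-within-Gibbs implementation consistent with the conditionals (12)–(14) could look as follows. This is only our illustrative sketch, not the authors' Algorithm 1: the abundance update uses a simple independence Metropolis step with a uniform Dirichlet proposal (the paper instead follows the sampler of [12]), and \(\mathcal{G}(1, 1/\sigma^2)\) in (14) is read as a gamma distribution with shape 1 and rate \(1/\sigma^2\), an assumption about the parametrization.

```python
# Illustrative Metropolis-within-Gibbs sketch of NCM-SDP sampling; not the
# authors' code. y is a pixel spectrum, M holds the endmember means columnwise.
import numpy as np

def ncm_sdp_sampler(y, M, beta=0.1, n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    L, R = M.shape
    alpha = np.full(R, 1.0 / R)          # start at the simplex centre
    sigma2, delta = 1.0, 1.0
    chain = np.empty((n_iter, R))

    def log_target(a, s2):
        # log f(alpha | y, sigma^2) from (12), up to an additive constant
        c = np.sum(a ** 2)
        r = y - M @ a
        return (-0.5 * L * np.log(s2 * c)
                - r @ r / (2.0 * s2 * c)
                + (beta - 1.0) * np.sum(np.log(a + 1e-300)))

    for t in range(n_iter):
        # --- abundances: independence Metropolis with a uniform Dirichlet proposal
        prop = rng.dirichlet(np.ones(R))
        if np.log(rng.uniform()) < log_target(prop, sigma2) - log_target(alpha, sigma2):
            alpha = prop
        # --- endmember variance: inverse gamma conditional (13)
        c = np.sum(alpha ** 2)
        resid = y - M @ alpha
        scale_ig = resid @ resid / (2.0 * c) + delta
        sigma2 = 1.0 / rng.gamma(L / 2.0 + 1.0, 1.0 / scale_ig)
        # --- hyperparameter: gamma conditional (14), read as rate 1/sigma^2
        delta = rng.exponential(scale=sigma2)
        chain[t] = alpha
    return chain                          # posterior samples of the abundances
```

After discarding a burn-in period, the MMSE estimate can be approximated by the sample mean of the chain and the MAP estimate by the retained sample with the largest posterior value; the simple uniform proposal mixes slowly for small \(\beta \) and is used here only to keep the sketch short.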

4 Experimental results

To evaluate the performance of the NCM-SDP, our experiments are performed on both simulated and real hyperspectral images as follows.

4.1 Simulated data

We first compare the performance of the NCM-SDP method to that of the classical Bayesian NCM algorithm with the uniform prior [12], which we call the “NCM uniform.” Ten endmembers, each consisting of 2151 spectral bands in the wavelength range 0.35–\(2.5\,\upmu \)m, are randomly selected from the USGS digital spectral library [22] to build our spectral dictionary. The materials are “asphalt,” “brick,” “cedar,” “concrete,” “fabric,” “fiberglass,” “nylon,” “pipe,” “plastic,” and “polyester.” Hyperspectral pixels are generated as mixtures of asphalt and brick, shown in Fig. 1, with proportions \(\left[ {0.3, 0.7} \right] \). Hence, \({\varvec{\alpha }}=\left[ {0.3, 0.7, 0, 0, 0, 0, 0, 0, 0, 0} \right] \) is the true abundance vector, with a sparsity level of 0.2. Also, the endmember variance is set to \(\sigma ^{2}=0.05\) for all spectral bands of all 10 endmembers.
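The simulated pixels described above could be generated along the following lines; since the USGS signatures cannot be reproduced here, the sketch uses random placeholder spectra in place of the asphalt and brick means, which is an assumption of the example only.

```python
# A sketch of generating a simulated pixel under the NCM with the settings
# described above; M is a placeholder for the (2151 x 10) USGS dictionary.
import numpy as np

rng = np.random.default_rng(1)
L, R = 2151, 10
M = rng.uniform(0.0, 1.0, size=(L, R))       # placeholder for the USGS signatures
alpha_true = np.array([0.3, 0.7] + [0.0] * 8)
sigma2 = 0.05

def simulate_pixel(M, alpha, sigma2, rng):
    # draw each endmember e_r ~ N(m_r, sigma2 I) per (3) and mix per (2)
    E = M + np.sqrt(sigma2) * rng.standard_normal(M.shape)
    return E @ alpha

y = simulate_pixel(M, alpha_true, sigma2, rng)
```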

To choose a proper value of \(\beta \) in the NCM-SDP, we repeat the unmixing algorithm for \(0 \le \beta \le 1\) and compare the absolute abundance estimation errors using:

$$\begin{aligned} \left| e \right| =\Vert {\varvec{\alpha }} -\hat{{\varvec{\alpha }}}\Vert _1 =\sum \limits _{r=1}^R \left| \alpha _r -{\hat{\alpha }}_r \right| , \end{aligned}$$
(15)

where \({\hat{\alpha }}_r \) denotes the MAP abundance estimate of the \(r{\mathrm{th}}\) endmember.

To determine the sparsity level, we generate thousands of Dirichlet-distributed samples for each specified \(\beta \) and compute the percentage of significant entries relative to the total number of entries.
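A sketch of this sparsity-level estimate is given below; the significance threshold of 0.01 and the number of draws are our own illustrative choices.

```python
# Empirical sparsity level of the symmetric Dirichlet prior: the fraction of
# "significant" entries across many draws (threshold and sizes are illustrative).
import numpy as np

def empirical_sparsity_level(beta, R=10, n_samples=10000, thresh=0.01, seed=0):
    rng = np.random.default_rng(seed)
    samples = rng.dirichlet(np.full(R, beta), size=n_samples)
    return np.mean(samples > thresh)      # fraction of significant entries

for b in (1.0, 0.5, 0.1):
    print(f"beta={b}: sparsity level ~ {empirical_sparsity_level(b):.3f}")
```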

Fig. 1 Endmember spectral signatures for “asphalt” and “brick” [22]

The results in Table 1 show a salient reduction in \(\left| e \right| \) for smaller values of \(\beta \). Note, however, that in practice the number of endmembers participating in a hyperspectral pixel is unknown a priori, and choosing a very small \(\beta \) can lead to a large error. In fact, there is a trade-off between the sparsity level of the prior and the resulting abundance estimation error. Taking these considerations into account, we choose \(\beta =0.1\) and later show that this value is appropriate for our unmixing problem. Note that for dictionaries of different sizes, a proper value of \(\beta \) should be reselected correspondingly. This subject can be regarded as an open problem for future research.

Table 1 Abundance estimation errors for different \( \beta '\hbox {s}\)

For \(\beta =0.1\), the posteriors of the abundances are generated by the NCM-SDP method over 300 independent trials of the experiment. The averages of the generated posteriors for the first and second abundances are illustrated in Fig. 2a, b, respectively. Since \({\varvec{\alpha }}\) contains 10 entries, Fig. 2c shows only the posterior of the third entry, which is very similar to those of the fourth to tenth entries. As seen, the peaks of the posteriors obtained from the NCM-SDP are more concentrated around the true values.

Using MAP estimation, the first and second abundances are estimated as 0.31 and 0.68, respectively, which are very close to the corresponding true values in \({\varvec{\alpha }}\), i.e., 0.3 and 0.7. The corresponding estimates from the NCM uniform [12] are 0.47 and 0.47, respectively, showing the lower accuracy of that method. Moreover, for the other 8 entries of \({\varvec{\alpha }}\) (zero values), the NCM-SDP generates negligible values on the order of \(10^{-11}\), while they are on the order of \(10^{-4}\) for [12]. Also, the values of |e| achieved by the NCM-SDP and [12] are 0.137 and 0.484, respectively. This enhancement is effectively achieved by applying the Dirichlet prior with a proper value of \(\beta \).

Next, the posteriors of the endmember variance estimated based on the Gaussian prior are plotted in Fig. 3. As seen, the posterior resulting from the NCM-SDP is much closer to the true value compared to that of [12], which logically leads to a lower estimation error.

Fig. 2 Estimated posteriors of the a first, b second, and c third abundance values

Fig. 3 Estimated posteriors of the variance of endmembers

4.2 Real hyperspectral data

The performance of the NCM-SDP is now compared to that of [12] for the real hyperspectral image shown in Fig. 4a. This image was collected by the airborne visible/infrared imaging spectrometer (AVIRIS) over Cuprite, Nevada, USA [23]. A square patch of \(50\times 50\) pixels is cropped as the region of interest (ROI), as shown in Fig. 4b. The reconstructed images using the NCM uniform and NCM-SDP methods are shown in Fig. 4c, d, respectively.

Fig. 4 a Cuprite image [23], b ROI, c reconstructed image by [12], and d reconstructed image by the NCM-SDP

Comparing Fig. 4c, d to Fig. 4b, one can observe that the NCM-SDP reconstructs the real image more faithfully. To assess this quantitatively, the MSEs of both methods, computed using (16), are reported in Table 2.

$$\begin{aligned} \hbox {MSE}=\frac{1}{N}\sum \limits _{n=1}^N \left\| {\varvec{y}}_n -\sum \limits _{r=1}^R {\varvec{e}}_r \alpha _{r,n} \right\| _2^2 \end{aligned}$$
(16)
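For clarity, (16) amounts to the following computation, assuming the pixels, endmember signatures, and estimated abundances are stored column-wise in matrices Y, M, and A; this layout is adopted only for this sketch.

```python
# A sketch of the reconstruction MSE in (16): average squared l2 norm of the
# per-pixel residuals. Y is (L, N), M is (L, R), A is (R, N); names are ours.
import numpy as np

def reconstruction_mse(Y, M, A):
    resid = Y - M @ A                            # per-pixel reconstruction errors
    return np.mean(np.sum(resid ** 2, axis=0))   # average over the N pixels
```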

To further evaluate the NCM-SDP, in another experiment we use the real “Gulfport hyperspectral image” data set collected over Long Beach, MS [11]. A block of \(13\times 19\) pixels of this image, for which accurate endmembers are available, is cropped. For each endmember of this region, 10 different samples are available. Note that for the NCM-SDP, a single one of these samples suffices as the mean of the corresponding endmember signature, while the BCM algorithms require more samples to extract the parameters of the beta distribution. Here, we use the average of the samples as the mean of each Gaussian-distributed endmember. The NCM-SDP is compared to the FCLS [24], BCM QP [25], BCM sampling [25], BCM-spatial QP [11], BCM-spatial sampling [11], NCM QP [10], NCM sampling [10], and NCM uniform [12].

Table 2 MSEs of reconstructed images
Fig. 5 Abundance maps of “asphalt,” “yellow curb,” “grass,” and “oak leaves” in the Gulfport image for a FCLS, b BCM-spectral QP, c BCM-spectral sampling, d BCM-spatial QP, e BCM-spatial sampling, f Gaussian QP, g Gaussian sampling, h NCM uniform, i NCM-SDP, and j ground truth

The maps of the estimated abundances for 4 endmembers are depicted in Fig. 5. We use the ground truth provided in [11] for the 4 endmembers “asphalt,” “yellow curb,” “grass,” and “oak leaves,” shown in Fig. 5j. In each map, the yellow and blue regions correspond to abundance values of 1 and 0, respectively. For a quantitative evaluation, the average per-pixel per-endmember proportion error is defined as [11]:

$$\begin{aligned} \hbox {PError}=\frac{1}{NR}\sum \limits _{n=1}^N \Vert {\varvec{\alpha }}_n-\hat{{\varvec{\alpha }}}_n \Vert _2 , \end{aligned}$$
(17)

where \({\varvec{\alpha }}_n \) is the true abundance vector of pixel n, \(\hat{{\varvec{\alpha }}}_n\) is the corresponding estimated abundance vector, N is the number of pixels, and R is the number of endmembers. Each method is run 10 times, and the average total error is calculated. The results are presented in Table 3. It is important to note that in this experiment all the above comparative methods are supervised schemes with a dictionary of 4 endmembers, as opposed to the NCM-SDP, which is a semi-supervised scheme with 30 endmembers in the dictionary. It is interesting to note that although this scenario is intrinsically more favorable to supervised cases, the corresponding techniques are still unable to reconstruct the images as accurately as the NCM-SDP. From Table 3, one can conclude that the NCM-SDP estimates the abundance vector more accurately than the BCM- and NCM-based unmixing methods.
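The error measure (17) corresponds to the following computation, with the true and estimated abundance vectors stored row-wise per pixel; the layout is chosen only for this sketch.

```python
# A sketch of the average per-pixel per-endmember proportion error in (17).
import numpy as np

def proportion_error(A_true, A_est):
    """A_true, A_est: (N, R) arrays of true/estimated abundance vectors."""
    N, R = A_true.shape
    return np.sum(np.linalg.norm(A_true - A_est, axis=1)) / (N * R)
```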

Table 3 Average per pixel per endmember proportion error for the Gulfport hyperspectral image

All the improved results achieved by the NCM-SDP reveal that incorporating the sparsity of abundance vectors into the mixing model via the sparse Dirichlet prior is a rational and realistic proposition.

5 Conclusions

A new hierarchical Bayesian method was derived for unmixing of hyperspectral images. Endmembers were considered variable based on the Gaussian distribution. Also, the abundance vectors were assumed sparse. The sparse Dirichlet prior was proposed for sparse modeling and, accordingly, the NCM-SDP method was developed. Using simulated data, it was shown that the error of the estimated abundance vector is approximately 7 times smaller than that obtained with the uniform prior. Also, in the experiments on real data, a 20% improvement in the MSE sense was achieved.