
1 Introduction

Inverse imaging problems such as denoising and inpainting play an important role as pre-processing steps for many machine vision tasks. Traditionally, these problems have been solved effectively using compressive sensing (CS) techniques. The main aim of CS is to exploit the sparsity of natural images in the domain of a standard basis such as the discrete cosine transform (DCT) or wavelets. Dictionary learning (DL) has shown that inverse imaging problems can also be solved by adaptively estimating the sparsifying basis rather than using the fixed basis of CS. These are transductive learning techniques. Another class of techniques, which has gained importance in the recent past, is based on neural networks. These are inductive learning approaches, where the main assumption is that the training data can be used to learn a generalized model, which in turn can predict the outcome of an unseen test sample. Large amounts of data are needed to build such a model, which is feasible for natural images but not for other scientific data. Tariyal et al.  [19] proposed a deep dictionary learning (DDL) framework which combines the advantages of the transductive and inductive natures of dictionary learning and deep learning, respectively, and is well suited to settings where training data is scarce. In recent works, DDL has been shown to be effective for classification problems  [13,14,15], load monitoring  [18], and speech recognition  [17]. In this paper, the DDL framework is suitably modified to predict the intensity values in the missing regions of a single band of a multispectral image, which is essentially a regression problem.

Every material has its own unique spectral signature. However, the human eye can record this signature only within the red, green, and blue bands of the spectrum. Multispectral imaging can be used in scenarios where the human eye might fail to distinguish materials. Although this approach has been used predominantly in satellite imaging, it has recently found applications in biology and medicine  [11] and in agriculture  [10]. A multispectral image is captured using sensors that are sensitive to wavelengths beyond the visible range, i.e., infrared and ultraviolet. For example, LandSat7 has seven bands. One of the most important challenges in multispectral imaging is the malfunctioning of on-board sensors. This typically results in missing information that appears as streaks in the acquired multispectral image. Filling in this missing information is popularly referred to as inpainting in the image processing literature.

Most classical inpainting techniques use local or non-local information within the degraded image itself. Efros et al.  [5] proposed to fill holes in an image by finding similar textures within the same image. Other techniques exploit natural image priors such as statistics of patch offsets [6], planarity [8], or low rank [7] to improve inpainting performance. All of these techniques use information from the input image alone, and when the missing region is large, such single-image inpainting techniques fail to perform well. Another set of techniques that has emerged rapidly in the recent past is learning based. Mairal et al.  [12] proposed a dictionary learning based technique solved using K-SVD. Xie et al.  [20] combined sparse coding with deep networks pretrained by denoising auto-encoders to perform both denoising and inpainting. Yeh et al.  [21] proposed a deep generative network based approach for blind semantic inpainting. Zhang et al.  [22] proposed a spatio-temporal deep network architecture to address different kinds of streaks that might be present in multispectral images. However, the main limitation of any deep network based technique is that it requires large amounts of training data. The prime advantage of DDL is that it needs far less training data than deep networks. Hence, for multispectral imaging, where public datasets are scarce, DDL is a better way to perform inpainting.

Yet another important challenge with multispectral images is storage, owing to their large size. Sparse coding techniques have been shown to perform well in obtaining sparser representations, which in turn aid the effective storage of these images. In this paper, we show that a streaked multispectral image can be inpainted as well as sparsely represented with performance similar to or better than state-of-the-art inpainting techniques. The main contributions of this paper are:

  • An alternating minimization methodology for DDL is proposed for addressing regression problems in image processing.

  • The proposed DDL framework has been shown to be useful in inpainting multispectral images.

  • We leverage the multi-level architecture to derive the sparse representations of multispectral images.

  • The proposed method is experimentally validated and compared with state-of-the-art inpainting techniques.

2 Deep Dictionary Learning

In this section, the mathematical framework of deep dictionary learning, first proposed in  [19], is described; it is adapted to regression problems in image processing such as inpainting, as explained in Sect. 3. Shallow (single layer) dictionary learning provides a sparse representation of the input image. Let \(\mathbf {D_1}\) and \(\mathbf {Z_1}\) be the dictionary and sparse codes, respectively, for the input matrix \(\mathbf {X}\), whose columns are the lexicographically ordered input image patches. Mathematically, the relation between \(\mathbf {X}\), \(\mathbf {D_1}\), and \(\mathbf {Z_1}\) is expressed as \(\mathbf {X} = \mathbf {D_1}\mathbf {Z_1}\). The extension of shallow dictionary learning to multiple layers is termed deep dictionary learning, motivated by the concept of deep learning. If \(\mathbf {D_N}\) and \(\mathbf {Z_N}\) denote the learnt dictionary and sparse codes, respectively, at the \(N^{th}\) layer, then the relation between \(\mathbf {X}\), \(\mathbf {D_N}\), and \(\mathbf {Z_N}\) can be expressed as \(\mathbf {X} = \mathbf {D_1} \phi \left( \mathbf {D_2} \phi \left( \mathbf {D_3} \phi \left( ...\phi \left( \mathbf {D_N} \mathbf {Z_N} \right) \right) \right) \right) \), where \(\phi \) is a non-linear activation function. In this multi-level architecture, it is important to note that the sparse codes derived at one level are passed on to the next higher level. Since the higher level dictionaries are learnt from sparse codes rather than from the intensity image itself, we show in Sect. 4.1 that the degree of sparsity of the sparse codes estimated at the final stage of the proposed framework is much higher than that of sparse codes derived from their single-level counterparts.
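As a concrete illustration of this factorization, the following minimal NumPy sketch builds a three-level deep dictionary representation of a patch matrix. It is not from the paper: all dimensions are hypothetical, and the choice of \(\phi\) here simply follows the activation discussed in Sect. 3.

```python
import numpy as np

# Minimal sketch of the deep factorization X = D1 * phi(D2 * phi(D3 * Z3)).
# All sizes below are hypothetical, chosen only to make the shapes explicit.
n_features, n_samples = 64, 500        # 8x8 patches stacked as columns
k1, k2, k3 = 128, 96, 64               # number of atoms at each level

rng = np.random.default_rng(0)
D1 = rng.standard_normal((n_features, k1))
D2 = rng.standard_normal((k1, k2))
D3 = rng.standard_normal((k2, k3))
Z3 = rng.standard_normal((k3, n_samples))

phi = np.tan                           # element-wise non-linear activation

X = D1 @ phi(D2 @ phi(D3 @ Z3))        # one reconstructed patch per column
print(X.shape)                         # (64, 500)
```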

3 Alternating Minimization Methodology

In this section, we describe the alternating minimization methodology used to implement the DDL framework. The methodology is described for three layers; it can be extended to \(N\) layers in the same manner. Let \(\mathbf {Y}\) be the input data matrix whose columns are the lexicographically ordered streaked image patches, let \(\mathbf {R}\) be the mask indicating the streaks, and let \(\mathbf {X}\) be the data to be reconstructed. Then

$$\begin{aligned} \mathbf {Y} = \mathbf {R} \odot \mathbf {X} \end{aligned}$$
(1)

where \(\odot \) denotes element-wise (pixel-wise) multiplication. For a three layer DDL framework, the clean image \(\mathbf {X}\) can be factorized as \(\mathbf {X} = \mathbf {D_1} \phi \left( \mathbf {D_2} \phi \left( \mathbf {D_3} \mathbf {Z_3} \right) \right) \). The dictionary and the coefficient matrix at each layer can be estimated by minimizing the following cost function:

$$\begin{aligned} \begin{aligned}&\underset{\mathbf {D_{1}},\mathbf {D_{2}},\mathbf {D_{3}},\mathbf {Z_3}}{\text {min}}&\Vert { \mathbf {Y}- \mathbf {R} \odot \mathbf {D_{1}} \phi \left( \mathbf {D_{2}} \phi \left( \mathbf {D_{3}}\mathbf {Z_3} \right) \right) }\Vert ^{2}_{F} + \lambda _3 \Vert \mathbf {Z_3} \Vert _{1}\\&\text {s.t}&\mathbf {Z_{2}} = \phi \left( \mathbf {D_{3}}\mathbf {Z_3} \right) , \mathbf {Z_{1}} = \phi \left( \mathbf {D_{2}}\mathbf {Z_{2}} \right) \end{aligned} \end{aligned}$$
(2)

The augmented Lagrangian form [9] of Eq. 2 is given by:

$$\begin{aligned} \begin{aligned}&\underset{\mathbf {D_{1}},\mathbf {D_{2}},\mathbf {D_{3}},\mathbf {Z_1},\mathbf {Z_2},\mathbf {Z_3}}{\text {min}}&\Vert { \mathbf {Y}- \mathbf {R} \odot \mathbf {D_{1}} \mathbf {Z_1}}\Vert ^{2}_{F} + \mu _1 \Vert \mathbf {Z_1} - \phi \left( \mathbf {D_{2}}\mathbf {Z_2} \right) \Vert ^{2}_{F} + \mu _2 \Vert \mathbf {Z_2} - \phi \left( \mathbf {D_{3}}\mathbf {Z_3} \right) \Vert ^{2}_{F} \\&&+ \lambda _1 \Vert \mathbf {Z_1} \Vert _{1} + \lambda _2 \Vert \mathbf {Z_2} \Vert _{1} + \lambda _3 \Vert \mathbf {Z_3} \Vert _{1} \end{aligned} \end{aligned}$$
(3)

where \(\lambda _1\), \(\lambda _2\), \(\lambda _3\), \(\mu _1\), and \(\mu _2\) are regularization constants. The above cost function is non-convex, since all the unknowns are coupled through multiplication (see Section 1 of the supplementary material for a proof). Hence, we propose to solve it using an alternating minimization (AM) approach [2]. At the first layer of the DDL framework, Eq. 3 simplifies as follows:

$$\begin{aligned} \widehat{\mathbf {Z_1}} \leftarrow \underset{\mathbf {Z_1}}{\text {min}} \Vert { \mathbf {Y}- \mathbf {R} \odot \mathbf {D_{1}} \mathbf {Z_1}}\Vert ^{2}_{F} + \mu _1 \Vert \mathbf {Z_1} - \phi \left( \mathbf {D_{2}}\mathbf {Z_2} \right) \Vert ^{2}_{F} + \lambda _1 \Vert \mathbf {Z_1} \Vert _{1} \end{aligned}$$
(4)
$$\begin{aligned} \begin{aligned} \widehat{\mathbf {D_1}} \leftarrow&\underset{\mathbf {D_1}}{\text {min}}&\Vert { \mathbf {Y} - \mathbf {R} \odot \mathbf {D_1}\mathbf {Z_1}}\Vert ^{2}_{F} \end{aligned} \end{aligned}$$
(5)

For mathematical convenience, we introduce \(\mathbf {R}\) and \(\mathbf {D_1}\) into the regularization term of Eq. 4, which can then be modified as:

$$\begin{aligned} \widehat{\mathbf {Z_1}} \leftarrow \underset{\mathbf {Z_1}}{\text {min}} \Vert { \mathbf {Y}- \mathbf {R} \odot \mathbf {D_{1}} \mathbf {Z_1}}\Vert ^{2}_{F} + \mu _1 \Vert \mathbf {R} \odot \mathbf {D_{1}} \mathbf {Z_1} - \mathbf {R} \odot \mathbf {D_{1}} \phi \left( \mathbf {D_{2}}\mathbf {Z_2} \right) \Vert ^{2}_{F} + \lambda _1 \Vert \mathbf {Z_1} \Vert _{1} \end{aligned}$$
(6)

Equation 6 can be further simplified as

$$\begin{aligned} \widehat{\mathbf {Z_1}} \leftarrow \underset{\mathbf {Z_1}}{\text {min}} \Vert \mathbf {R} \odot \mathbf {A_1} \mathbf {Z_1} - \mathbf {B_1} \Vert ^2_F + \lambda _1 \Vert \mathbf {Z_1} \Vert _{1} \end{aligned}$$
(7)

where \(\mathbf {A_1} = \left[ {\begin{matrix} \mathbf {D_1} \\ \sqrt{\mu _1}\mathbf {D_1} \end{matrix}} \right] \) and \(\mathbf {B_1} = \left[ {\begin{matrix} \mathbf {Y} \\ \sqrt{\mu _1}\,\mathbf {R} \odot \mathbf {D_1}\phi (\mathbf {D_2}\mathbf {Z_2}) \end{matrix}} \right] \). Similarly, at the second layer, Eq. 3 simplifies as

$$\begin{aligned} \begin{aligned} \widehat{\mathbf {D_2}} \leftarrow&\underset{\mathbf {D_2}}{\text {min}}&\Vert { \phi ^{-1}(\mathbf {Z_1}) -\mathbf {D_2}\mathbf {Z_2}}\Vert ^{2}_{F} \end{aligned} \end{aligned}$$
(8)
$$\begin{aligned} \widehat{\mathbf {Z_2}} \leftarrow \underset{\mathbf {Z_2}}{\text {min}} \Vert \mathbf {A_2} \mathbf {Z_2} - \mathbf {B_2} \Vert ^2_F + \lambda _2 \Vert \mathbf {Z_2} \Vert _{1} \end{aligned}$$
(9)

where \(\mathbf {A_2} = \left[ {\begin{matrix} \mathbf {D_2} \\ \sqrt{\mu _2}\mathbf {I} \end{matrix}} \right] \) and \(\mathbf {B_2} = \left[ {\begin{matrix} \phi ^{-1}\left( \mathbf {Z_1} \right) \\ \sqrt{\mu _2}\phi (\mathbf {D_3}\mathbf {Z_3}) \end{matrix}} \right] \), where \(\mathbf {I}\) is the identity matrix. At the third layer, Eq. 3 simplifies as

$$\begin{aligned} \begin{aligned} \widehat{\mathbf {D_3}} \leftarrow&\underset{\mathbf {D_3}}{\text {min}}&\Vert { \phi ^{-1}(\mathbf {Z_2}) -\mathbf {D_3}\mathbf {Z_3}}\Vert ^{2}_{F} \end{aligned} \end{aligned}$$
(10)
$$\begin{aligned} \widehat{\mathbf {Z_3}} \leftarrow \underset{\mathbf {Z_3}}{\text {min}} \Vert \mathbf {D_3}\mathbf {Z_3} - \phi ^{-1} \left( \mathbf {Z_2} \right) \Vert ^2_F + \lambda _3 \Vert \mathbf {Z_3} \Vert _1 \end{aligned}$$
(11)

Equations 7, 9, and 11 are similar to the sparse coding stage of the K-SVD algorithm [1], and we solve them using standard orthogonal matching pursuit (OMP) [16], which we found to be effective and simple to use. After obtaining the coefficients \(\mathbf {Z_1}\), \(\mathbf {Z_2}\), and \(\mathbf {Z_3}\), we solve Eqs. 5, 8, and 10 using the approach employed in the dictionary update stage of K-SVD [1]. The final inpainted image is reconstructed using the following equation

$$\begin{aligned} \widehat{\mathbf {X}} = \widehat{\mathbf {D_1}}\phi \left( \widehat{\mathbf {D_2}}\phi \left( \widehat{\mathbf {D_3}} \widehat{\mathbf {Z_3}}\right) \right) \end{aligned}$$
(12)

The alternating minimization approach used for the proposed DDL framework is summarized in Algorithm 1.

Algorithm 1: Alternating minimization for the proposed DDL framework.
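The following Python sketch outlines one possible realization of this alternating minimization loop. It is illustrative only, not the authors' implementation: the helper names are hypothetical, scikit-learn's OMP is used for the sparse coding steps, the \(\mu\)-coupling terms of Eq. 3 are dropped for brevity, and a simple least-squares fit with column normalization stands in for the K-SVD dictionary update.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def sparse_code(D, B, k):
    """Column-wise OMP: min_Z ||D Z - B||_F^2  s.t.  ||Z[:, j]||_0 <= k."""
    return orthogonal_mp(D, B, n_nonzero_coefs=k)

def masked_sparse_code(D, Y, R, k):
    """OMP restricted to the observed rows of each patch (handles R in Eq. 7)."""
    Z = np.zeros((D.shape[1], Y.shape[1]))
    for j in range(Y.shape[1]):
        obs = R[:, j] > 0
        if obs.sum() == 0:
            continue                                   # fully streaked patch: leave zeros
        kj = min(k, int(obs.sum()))
        Z[:, j] = orthogonal_mp(D[obs], Y[obs, j], n_nonzero_coefs=kj)
    return Z

def dict_update(B, Z, eps=1e-8):
    """Simplified dictionary update (least squares + column normalization);
    the paper uses the K-SVD update instead."""
    D = B @ np.linalg.pinv(Z)
    return D / (np.linalg.norm(D, axis=0, keepdims=True) + eps)

def ddl_inpaint(Y, R, atoms=(128, 96, 64), sparsity=(20, 10, 5), n_iters=10,
                phi=np.tan, phi_inv=np.arctan, seed=0):
    """Three-layer DDL inpainting sketch (Algorithm 1, simplified)."""
    rng = np.random.default_rng(seed)
    D1 = rng.standard_normal((Y.shape[0], atoms[0]))
    D2 = rng.standard_normal((atoms[0], atoms[1]))
    D3 = rng.standard_normal((atoms[1], atoms[2]))
    D1, D2, D3 = (D / np.linalg.norm(D, axis=0) for D in (D1, D2, D3))
    for _ in range(n_iters):
        Z1 = masked_sparse_code(D1, Y, R, sparsity[0])   # Eq. 7 (simplified)
        Y_fill = np.where(R > 0, Y, D1 @ Z1)             # fill holes with current estimate
        D1 = dict_update(Y_fill, Z1)                     # Eq. 5 (simplified)
        Z2 = sparse_code(D2, phi_inv(Z1), sparsity[1])   # Eq. 9 (without the mu_2 term)
        D2 = dict_update(phi_inv(Z1), Z2)                # Eq. 8
        Z3 = sparse_code(D3, phi_inv(Z2), sparsity[2])   # Eq. 11
        D3 = dict_update(phi_inv(Z2), Z3)                # Eq. 10
    X_hat = D1 @ phi(D2 @ phi(D3 @ Z3))                  # Eq. 12
    return X_hat, (D1, D2, D3), (Z1, Z2, Z3)
```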

Note that in Algorithm 1 we initialize \(\mathbf {D_1}\), \(\mathbf {D_2}\), and \(\mathbf {D_3}\) as over-complete dictionaries, since we impose sparsity at all three levels. Even though our approach looks similar to that of [19], it differs from the original formulation in the following ways:

  • The DDL framework developed in [19] is used for classification, while ours is used for inpainting, which is a regression problem.

  • We use OMP at the sparse coding stage, while [19] use the iterative soft thresholding algorithm [4], which we found to be slow in computing the dictionary coefficients.

  • Unlike [19], we add additional constraints enforcing the dependency of the sparse codes on the previous layers, which improves our restored results.

  • They use ‘\(\tanh \)’ as the non-linearity between the layers, while we found ‘\(\tan \)’ to be more appropriate, since the range of the sparse coefficient values lies beyond the domain of ‘\(\tanh ^{-1}\)’ in our scenario (see the short illustration after this list).

  • Tariyal et al.  [19] impose sparsity only at the final stage of DDL, while we impose it at every stage, which we found advantageous in obtaining a higher degree of sparsity without compromising reconstruction quality.
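A two-line check (illustrative, not from the paper) makes the point about the non-linearity concrete: the inverse step \(\phi^{-1}\) in Eqs. 8-11 is applied to sparse coefficients whose magnitude routinely exceeds 1.

```python
import numpy as np

z = np.array([-3.2, 0.4, 2.7])   # hypothetical sparse coefficients; |z| can exceed 1

print(np.arctan(z))              # tan^{-1}: finite for every real value
print(np.arctanh(z))             # tanh^{-1}: nan (RuntimeWarning) wherever |z| >= 1
```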

4 Experimental Results

In this section, experimental results on inpainting of multispectral images are presented. We compare the proposed technique with the conventional exemplar based method [3], the deep dictionary learning approach of [19], and the state-of-the-art deep neural network technique of [22]. Instead of working on originally streaked images, we synthetically generate streaked observations from clean images so that a quantitative comparison with state-of-the-art approaches is possible. Patches of size \(8 \times 8\) are extracted from the streaked image and arranged lexicographically along the columns of \(\mathbf {Y}\). For all our experiments, we found the optimal values of the regularization constants to be \(\mu _{1} = \mu _{2} = 0.1\). The numbers of sparse coefficients at the first, second, and third layers were empirically chosen as 20, 10, and 5, respectively. We tried multiple non-linear activation functions and found that the ‘tangent’ function suited our experiments best. A comparative study of the performance of the proposed approach with different non-linear functions is included in the supplementary material.
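The patch handling used in these experiments can be sketched as follows. This is an assumption-laden illustration: the use of scikit-learn's overlapping patch extraction and the random placeholders for the band and mask are not specified in the paper.

```python
import numpy as np
from sklearn.feature_extraction.image import (extract_patches_2d,
                                               reconstruct_from_patches_2d)

patch_size = (8, 8)                                   # as in the experiments
band = np.random.rand(64, 64)                         # placeholder single band
mask = (np.random.rand(64, 64) > 0.1).astype(float)   # placeholder streak mask (1 = observed)

# Lexicographically ordered patch columns for Y and R.
P = extract_patches_2d(band * mask, patch_size)       # (n_patches, 8, 8)
Y = P.reshape(len(P), -1).T                           # (64, n_patches)
R = extract_patches_2d(mask, patch_size).reshape(len(P), -1).T

# After estimating X_hat via Eq. 12, the band is re-assembled by averaging overlapping patches.
X_hat = Y                                             # stand-in for the DDL output
restored = reconstruct_from_patches_2d(X_hat.T.reshape(-1, *patch_size), band.shape)
```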

Fig. 1. Results of inpainting: (a) example 1, (b) example 2, (c) example 3, (d) example 4, and (e) example 5. First row: original image. Second row: streaked image. Third row: output of [3]. Fourth row: result of [19]. Fifth row: restoration by [22]. Sixth row: proposed approach.

Figure 1 shows the results on the LandSat7 dataset  [22]. The first and second rows show the original and streaked images, respectively. The output of [3] is presented in the third row of Fig. 1. Note that the reconstruction by [3] has boundary artifacts in the filled regions. The main reason for such artifacts might be the absence of smoothly varying regions in multispectral images. The restored result of [19] (fourth row of Fig. 1) could not fill the missing regions completely. This can be attributed to the reasons mentioned in Sect. 2. The same is reflected in the quantitative analysis presented in Table 1. Finally, we also compare with the deep neural network approach of [22]. The architecture proposed in [22] needs temporal observations as test input, one of which is a clean observation taken at a different time. Since all the competing methods work on a single degraded observation, for a fair evaluation we feed both inputs of the architecture in [22] with the same streaked observation; the corresponding outputs are shown in the fifth row of Fig. 1. It can be clearly seen that the streaks are not completely filled, which affects the visual quality of the output. This might be due to the architecture's need for one clean observation among its test inputs. The neural network in [22] might also lack the ability to generalize across different kinds of degradation. The outputs of our approach are presented in the last row of Fig. 1. The reconstruction quality is better than the state of the art both qualitatively and quantitatively.

Table 1. Quantitative analysis using PSNR.

4.1 Sparsifying the Sparsity

In this subsection, we present a synthetic experiment that shows another interesting aspect of the proposed architecture: its sparse representation capability. Figures 2(a) and (b) show the clean and the synthesized streaked image, respectively, of a single band of a multispectral image from the LandSat7 dataset. Here we compare the degree of sparsity attained by the proposed approach with that of the single-level dictionary learning technique in [1], without loss of reconstruction quality. The outputs of [1] and of the proposed approach (3 levels), with the number of sparse coefficients constrained to 5, are shown in Figs. 2(c) and (d), respectively. The quantitative analysis is given in the caption of Fig. 2. The output of [1] clearly has many artifacts that significantly affect its visual quality. For the same degree of sparsity, we outperform the single-level approach by a large quantitative margin, and the visual quality of our reconstruction is much better than that of [1]. This is possible because the higher degree of sparsity is obtained by working in the domain of the sparse latent codes (\(\mathbf {Z_1}\) and \(\mathbf {Z_2}\)) rather than in the intensity image domain, as is done by the single-level dictionary learning approach in [1]. This ability to produce sparse codes with a higher degree of sparsity helps in the effective storage of large multispectral images without loss of quality.
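For reference, the two quantities compared in this experiment can be computed as below; this is a generic sketch of the metrics, not the authors' evaluation code.

```python
import numpy as np

def psnr(clean, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak depends on the image bit depth)."""
    mse = np.mean((clean.astype(float) - restored.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def sparsity_degree(Z):
    """Fraction of non-zero entries in a coefficient matrix (lower = sparser)."""
    return np.count_nonzero(Z) / Z.size

# e.g. compare sparsity_degree(Z_single_level) against sparsity_degree(Z3)
```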

Fig. 2. Sparsifying the sparsity. (a) Clean image, (b) streaked observation. Output of: (c) Aharon et al.  [1] with 5 non-zero coefficients (PSNR = 20.7288 dB), (d) proposed approach (PSNR = 36.3664 dB).

5 Conclusions and Future Work

In this paper, an alternating minimization based strategy for deep dictionary learning is proposed for addressing regression problems in image processing. The DDL framework is shown to be effective in terms of sparsity while achieving state-of-the-art performance. Even though we have taken inpainting as the application, the same framework can be used for other inverse problems, such as denoising, where the size of the artifact is small compared to the size of the patch. Here, we inpainted streaks in multispectral images. Compared to traditional inpainting approaches, a drawback of the DDL method is the need to retrain the model for different scenarios. Further, the time taken by DDL depends on the number of layers and, from the formulation, it is clear that it takes more time than K-SVD. The main aim of this work was to achieve state-of-the-art performance with smaller model sizes. In future work, we plan to address these issues by incorporating local neighborhood information into the DDL framework, along with new methods for the dictionary update. We would also like to further analyse the image compression ability of the proposed DDL framework, and we plan to incorporate spectral consistency priors to improve its reconstruction capability while keeping model sizes small.