1 Introduction

Images in which all objects are in focus are more useful in image processing, remote sensing, and robotics applications than those with only a few focused objects (Li and Yang 2008). The amount and quality of information in captured images is directly limited by the depth of field of the optical system. Objects in the imaged scene are defocused if they fall outside the effective depth of field (EDOF) limits (Pertuz et al. 2013). The EDOF of optical systems can be enhanced by acquiring numerous images of the same scene with different focus settings and then fusing them in such a way that all regions of the scene are in focus (De et al. 2006).

There are numerous methods in the literature to enhance the EDOF. Multiresolution (pyramid or wavelet transform) fusion techniques (Burt and Kolczynski 1993; Yang et al. 2000) are based on the fact that an image contains relevant features at different scales. The gradient pyramid with variance as the activity measure produces blocking effects in the fused image (Burt and Kolczynski 1993). Discrete wavelet transform (DWT) based multisource image fusion using spatial frequency and a simplified pulse-coupled neural network suffers from shift variance because of its downsamplers (Wang et al. 2014a, b; Geng et al. 2013). Multiscale geometric analysis tools (the curvelet transform, Choi et al. 2004; the contourlet transform (CT), Do and Vetterli 2005) obtain an asymptotically optimal representation by exploiting the geometric regularity of intrinsic image structures. Localization, multidirectionality, and anisotropy are characteristics of the CT (Li et al. 2013). However, it lacks shift invariance, which results in artifacts along edges to some extent (Li et al. 2013).

The nonsubsampled contourlet transform (NSCT) selects the lowpass and highpass coefficients (using the sum-modified Laplacian and the local neighbour sum of the Laplacian) to obtain the fused image (Geng et al. 2013). Similarly, an NSCT-based multi-focus fusion method combines the advantages of transform- and spatial-domain methods (Li et al. 2013). The surfacelet transform with a compound pulse-coupled neural network selects the fusion coefficients in an optimized fashion (Zhang et al. 2014). A multi-scale weighted gradient-based fusion technique reduces the problems of anisotropic blur and mis-registration (Zhou et al. 2014). The limitations of these schemes include computational complexity and limited robustness (Liu et al. 2014).

Fusion based on robust principal component analysis and local sparse features assumes that the images have a sparse nature (Wan et al. 2013). A block-based image fusion method uses a quad-tree structure to obtain an optimal subdivision of blocks (De and Chanda 2013). Pertuz et al. (2013) proposed a selective all-in-focus algorithm (SAF) for the fusion of noisy images; the technique is based on a three-step procedure (measure, selectivity, and fusion). However, the all-in-focus image obtained by SAF (Pertuz et al. 2013) appears blurry in some portions: some of the details are enhanced while others are flattened out.

To overcome the above issues, a guided filter based fusion scheme is proposed for multi-focus images. Source images are decomposed into base and detail layers. The base layers of the source images are averaged to obtain the base layer of the fused image. The detail layer weights are computed based on whether an object in a particular image is in focus compared to the same object in all other images. Guided filtering is performed to refine the fusion weights. Simulation results show that the proposed scheme is more efficient and accurate compared to state-of-the-art schemes. The salient features of the proposed scheme are: (i) it is based on a fast two-scale decomposition method; (ii) it preserves the details in the fused image; and (iii) it efficiently reduces noise.

2 Proposed image fusion

Let \(\digamma \) be the fused image obtained by combining a sequence of input images, \(I_{1}, I_{2}, \ldots , I_{K}\), of the same scene acquired with different focus settings. Each source image is decomposed into a base layer \(B\) and a detail layer \(D\) (Jameel et al. 2014):

$$\begin{aligned} B_{k}&= I_{k} * f \nonumber \\ D_{k}&= I_{k}-B_{k} \end{aligned}$$
(1)

where \(I_{k}\) represents the \(k\)th source image, \(B_{k}\) and \(D_{k}\) are the base and detail layers of the \(k\)th source image respectively, \(f\) is an average filter of size \(31 \times 31\), and \(*\) denotes convolution. The base and detail layers contain the large-scale and small-scale variations respectively.
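As an illustration, the two-scale decomposition of Eq. (1) can be written in a few lines of Python. The sketch below assumes grayscale images stored as floating-point NumPy arrays; the function name `two_scale_decompose` and the reflective boundary handling are our assumptions, not details from a published implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def two_scale_decompose(I, size=31):
    """Split a grayscale image into base and detail layers (Eq. 1).

    The base layer is the output of a size x size average filter;
    the detail layer is the residual. Boundary handling ('reflect')
    is an assumption not specified in the text.
    """
    I = np.asarray(I, dtype=np.float64)
    B = uniform_filter(I, size=size, mode='reflect')  # average filter f
    D = I - B                                         # small-scale variations
    return B, D
```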

2.1 Base layer weight map assignment

The decomposition is followed by the assignment of appropriate weights to these layers. The base layers of all the images contain the large-scale variations and are averaged to obtain the base layer of the fused image.

$$\begin{aligned} B_{\digamma }=\frac{1}{K}\sum _{k=1}^{K}B_{k} \end{aligned}$$
(2)

where \(B_{\digamma }\) represents the base layer of the fused image. The averaging process also helps to suppress noise.

2.2 Detail layer weight map assignment

The detail layer weights, in contrast, are computed based on whether the objects in a particular image are in focus compared to the same objects in all other images. This is done by calculating a focus measure for every pixel over a small area around it. The focus measure (gray-level variance; Sun et al. 2004) \(\mathfrak {I}_{k}(i, j)\) of a pixel at coordinates \((i,j)\) in the \(k\)th source image is calculated as

$$\begin{aligned} \mathfrak {I}_{k}(i,j) = \sum _{(x,y)\in \varOmega (i,j)}\Big (I_{k}(x,y)-\mu \Big )^{2} \end{aligned}$$
(3)

where \(\varOmega (i,j)\) is the \(\ell \times \ell \) area around \((i,j)\) and \(\mu \) is the mean gray level of the pixels within \(\varOmega (i,j)\). The focus measure \(\mathfrak {I}_{k}\) is calculated for each image \(I_{k}\) in the sequence. Both the spatial resolution and the robustness to noise depend on the choice of \(\ell \) (Malik and Choi 2007). This work uses \(\ell = 9\).
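The per-pixel variance of Eq. (3) can be evaluated efficiently with box filters instead of an explicit loop over windows, using the identity that the summed squared deviation over a window equals the window area times the local variance. A minimal sketch, with an illustrative function name, follows.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def focus_measure(I, ell=9):
    """Gray-level variance focus measure over an ell x ell window (Eq. 3)."""
    I = I.astype(np.float64)
    mu  = uniform_filter(I, size=ell, mode='reflect')      # local mean
    mu2 = uniform_filter(I * I, size=ell, mode='reflect')  # local mean of I^2
    # sum over the window of (I - mu)^2  =  ell^2 * (E[I^2] - mu^2)
    return ell * ell * (mu2 - mu * mu)
```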

The focus measures \(\mathfrak {I}_{k}\) are then compared to determine the weight maps for the detail layers as follows:

$$\begin{aligned} P_k^p= \left\{ \begin{array}{ll} 1 &{}\quad \text {if}\;\mathfrak {I}_k^p = \max \Big (\mathfrak {I}_1^p, \mathfrak {I}_2^p, \ldots , \mathfrak {I}_K^p \Big ) \\ 0 &{}\quad \text {otherwise} \end{array} \right. \end{aligned}$$
(4)

where \(\mathfrak {I}_{k}^{p}\) is the value of the focus measure of pixel \(p\) in the \(k\)th source image and \(K\) is the total number of source images. However, the fused image obtained directly from these weight maps may contain artifacts, since the maps are noisy and not aligned with object boundaries. Each weight map \(P_{k}\) is therefore passed through a guided filter \(G\) (He et al. 2013), with the corresponding source image \(I_{k}\) serving as the guidance image, to obtain the refined weights for the detail layers:

$$\begin{aligned} \varGamma _{k}^{D}=G_{\nu ,\omega }(P_{k},I_{k}) \end{aligned}$$
(5)

where \(\nu \) and \(\omega \) are the parameters of the guided filter, representing the filter size and blur degree respectively, and \(k=1,2,\ldots ,K\). In this paper, the default parameters are set as \(\nu = 9\) and \(\omega = 10^{-6}\). \(\varGamma _{k}^{D}\) is the resulting detail layer weight map for the \(k\)th source image.
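Equations (4) and (5) can be sketched as below. The guided filter is written out using the box-filter formulation of He et al. (2013) so that the example depends only on SciPy; OpenCV's `cv2.ximgproc.guidedFilter` (contrib modules) could be substituted. The helper names are ours, and ties in the per-pixel maximum are left unresolved, as the text does not specify a tie-breaking rule.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(P, I, nu=9, omega=1e-6):
    """Guided filter (He et al. 2013): smooth P guided by image I.

    nu is the window size and omega the regularization (blur degree).
    """
    def box(x):
        return uniform_filter(x, size=nu, mode='reflect')

    mean_I, mean_P = box(I), box(P)
    cov_IP = box(I * P) - mean_I * mean_P
    var_I  = box(I * I) - mean_I * mean_I
    a = cov_IP / (var_I + omega)      # per-pixel linear coefficients
    b = mean_P - a * mean_I
    return box(a) * I + box(b)        # locally averaged linear model

def detail_weights(focus_maps, images, nu=9, omega=1e-6):
    """Binary maps of Eq. (4), refined per Eq. (5)."""
    F = np.stack(focus_maps)                      # shape (K, H, W)
    P = (F == F.max(axis=0)).astype(np.float64)   # 1 where pixel is sharpest
    return [guided_filter(P[k], img.astype(np.float64), nu, omega)
            for k, img in enumerate(images)]
```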

2.3 Fusion

The fused detail layer \(D_{\digamma }\) is obtained as (He et al. 2013),

$$\begin{aligned} D_{\digamma } = \sum _{k=1}^{K}\varGamma _{k}^{D}\times D_{k} \end{aligned}$$
(6)

The fused base and detail layers are then combined to obtain the fused image \(\digamma \):

$$\begin{aligned} \digamma = B_{\digamma }+D_{\digamma } \end{aligned}$$
(7)

The proposed algorithm is robust against noise (since the base layers are averaged) and enhances the details (by assigning specific weights to each detail layer).
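Combining the helpers sketched above, the whole pipeline of Eqs. (1)–(7) reduces to a short driver. This is again an illustrative sketch under the assumption of pre-registered grayscale inputs, not the authors' reference implementation; with three aligned images it would be invoked as `F = fuse([img1, img2, img3])`.

```python
def fuse(images, size=31, ell=9, nu=9, omega=1e-6):
    """Fuse a sequence of multi-focus grayscale images (Eqs. 1-7)."""
    layers  = [two_scale_decompose(I, size) for I in images]
    bases   = [B for B, _ in layers]
    details = [D for _, D in layers]

    B_F = np.mean(bases, axis=0)                        # Eq. (2)
    W   = detail_weights([focus_measure(I, ell) for I in images],
                         images, nu, omega)             # Eqs. (3)-(5)
    D_F = sum(w * d for w, d in zip(W, details))        # Eq. (6)
    return B_F + D_F                                    # Eq. (7)
```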

3 Results and analysis

The proposed method is tested on several sequences of source images obtained from http://www.sayonics.com/downloads.html. Each synthetic sequence is generated by blurring every scene point through convolution with the point spread function corresponding to its depth; the defocused image at pixel \((i_{0}, j_{0})\) is obtained by adding the contributions of every defocused point. Complete details of the process are given in Pertuz et al. (2013). Quantitative analysis is performed using the signal to noise ratio (SNR), the universal image quality index (QI) (Wang and Bovik 2002), the peak signal to noise ratio (PSNR), and the root mean square error (RMSE). The higher the SNR value, the lower the noise in the image. Similarly, higher QI and PSNR values indicate a better fused image, while lower RMSE values indicate less error in the fused image.
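For reproducibility, the four measures can be computed as sketched below. The SNR definition and the 8-bit peak for PSNR are common conventions rather than choices stated in the text, and the QI shown is the single-window (global) form of Wang and Bovik's (2002) index, which the original paper averages over sliding windows.

```python
import numpy as np

def rmse(ref, img):
    return np.sqrt(np.mean((ref.astype(np.float64) - img) ** 2))

def psnr(ref, img, peak=255.0):
    # assumes an 8-bit dynamic range (a common convention)
    return 20.0 * np.log10(peak / rmse(ref, img))

def snr(ref, img):
    err = ref.astype(np.float64) - img
    return 10.0 * np.log10(np.sum(ref.astype(np.float64) ** 2)
                           / np.sum(err ** 2))

def qi(x, y):
    """Universal image quality index (Wang and Bovik 2002), global form."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 4.0 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))
```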

Figure 1a shows a sequence of castle images with a high noise level. Figure 2a, b shows the fused images obtained from the weighted and pyramid based methods. Figure 2c shows the fused image obtained using the SAF algorithm (Pertuz et al. 2013), while Fig. 2d shows the fused image obtained from the proposed scheme for low level noise. Visually, the fused images obtained from the different methods appear similar. However, the quantitative analysis shows that the proposed scheme provides better results compared to existing state-of-the-art schemes (Pertuz et al. 2013; Helicon Soft, Helicon Focus 2011; Zerene Systems 2011).

Fig. 1 Example 1: castle images with high noise level

Fig. 2 Example 1: fused images with different techniques; a–d low level noise, e–h medium level noise, i–l high level noise

Figure 2e–h shows the fused images obtained for the castle sequence corrupted with medium level noise. Figure 2e, f shows the images obtained from the weighted and pyramid based methods. Figure 2g is obtained using the SAF algorithm (Pertuz et al. 2013), while Fig. 2h shows the fused image obtained from the proposed scheme. As the noise in the input images is increased, Fig. 2e takes on a blurred appearance and the details in the bridge walls are totally lost. A large amount of noise is still present in Fig. 2f. In Fig. 2g the details in the water as well as those in the walls are flattened out, while the details can be clearly seen in the proposed fused image. Quantitative analysis also shows that the proposed scheme provides better results compared to the existing schemes (Pertuz et al. 2013; Helicon Soft, Helicon Focus 2011; Zerene Systems 2011).

Figure 2i–l shows the fused castle images obtained for the sequence corrupted with high level noise. Figure 2i has a very blurred appearance, noise is still present in Fig. 2j, and the details are again smoothed out in the fused image obtained using the SAF algorithm (Pertuz et al. 2013) (Fig. 2k), while the details are preserved in the fused image obtained from the proposed scheme (Fig. 2l).

The fused cameraman images obtained using the weighted (Helicon Soft, Helicon Focus 2011) and pyramid (Zerene Systems 2011) based methods are shown in Fig. 3a, e, i and b, f, j respectively, for input sequences corrupted with low, medium, and high levels of noise. Figure 3c, g, k shows the fused images obtained using the SAF algorithm (Pertuz et al. 2013), while Fig. 3d, h, l shows the fused images obtained from the proposed scheme for low, medium, and high levels of noise respectively. As the noise level in the input sequence is increased, blurriness appears in the images obtained from the weighted method (Fig. 3e, i), noise is not eliminated in the images obtained from the pyramid based method (Fig. 3f, j), and the left portions of Fig. 3g, k are flat; the details, however, are preserved in the fused images obtained from the proposed method (Fig. 3h, l). The proposed scheme provides better quantitative results compared to the existing schemes (Pertuz et al. 2013; Helicon Soft, Helicon Focus 2011; Zerene Systems 2011).

Fig. 3 Example 2: fused images with different techniques; a–d low level noise, e–h medium level noise, i–l high level noise

Figure 4 shows the fusion results obtained on the Lena image. The fused images obtained using the weighted (Helicon Soft, Helicon Focus 2011) and pyramid based (Zerene Systems 2011) schemes are shown in Fig. 4a, e, i and b, f, j respectively. Figure 4c, g, k shows the fused images obtained using the SAF algorithm (Pertuz et al. 2013), while Fig. 4d, h, l shows the fused images obtained from the proposed scheme for low, medium, and high levels of noise respectively. There is a clear visual improvement in the images obtained using the SAF and proposed algorithms compared to the weighted and pyramid based schemes. The superiority of the proposed scheme, however, can be seen in the hat portion: the details in the hat are visible in Fig. 4h, l but missing in Fig. 4g, k. The proposed scheme provides better quantitative results compared to the existing schemes (Pertuz et al. 2013; Helicon Soft, Helicon Focus 2011; Zerene Systems 2011).

Fig. 4 Example 3: fused images with different techniques; a–d low level noise, e–h medium level noise, i–l high level noise

Figure 5 shows the results obtained on a sequence of vegetable images. Figure 5a, b shows the fused images obtained using the weighted and pyramid based methods. Figure 5c shows the fused image obtained using the SAF algorithm (Pertuz et al. 2013), while Fig. 5d shows the fused image obtained from the proposed scheme for low level noise. Visually, the fused images obtained from the different methods appear similar; however, the quantitative analysis shows that the proposed scheme provides better results compared to the state-of-the-art schemes. Figure 5e–h shows the fused images obtained for the vegetable sequence corrupted with medium level noise. Figure 5e, f shows the images obtained from the weighted and pyramid based schemes. Figure 5g is obtained using the SAF algorithm (Pertuz et al. 2013), while Fig. 5h shows the fused image obtained from the proposed scheme. As the noise in the input images is increased, Fig. 5e takes on a blurred appearance while noise is still present in Fig. 5f. In Fig. 5g the details and texture of the vegetables are flattened out, while the details can be clearly seen in the proposed fused image (Fig. 5h). Quantitative analysis also shows that the proposed scheme provides better results compared to the existing schemes. Figure 5i–l shows the fused vegetable images obtained for the sequence corrupted with high level noise. Figure 5i again has a blurred appearance, noise is still present in Fig. 5j, and the details are again smoothed out in the fused image obtained using the SAF algorithm (Pertuz et al. 2013) (Fig. 5k), while they are preserved in the fused image obtained using the proposed scheme (Fig. 5l).

Fig. 5 Example 4: fused images with different techniques; a–d low level noise, e–h medium level noise, i–l high level noise

Table 1 shows that the proposed scheme provides better results in terms of SNR, QI, PSNR, and RMSE compared to the existing related schemes (Pertuz et al. 2013; Helicon Soft, Helicon Focus 2011; Zerene Systems 2011). For low level noise, the pyramid based scheme (Zerene Systems 2011) generally gives the worst performance, and the SAF algorithm (Pertuz et al. 2013) shows better results than the pyramid based scheme. The proposed scheme has slightly better SNR, QI, PSNR, and RMSE values than the SAF algorithm (Pertuz et al. 2013). As the noise is increased to a medium level, the weighted scheme (Helicon Soft, Helicon Focus 2011) produces the worst results, and the proposed scheme yields much better SNR, QI, PSNR, and RMSE values than the SAF algorithm (Pertuz et al. 2013). For higher noise levels, the proposed scheme again yields much better results than the SAF algorithm (Pertuz et al. 2013). The details in the images produced by the proposed method are preserved while noise is significantly reduced. It is important to note that the higher the noise level, the greater the performance advantage of our algorithm.

Table 1 Quantitative comparison of proposed and existing schemes

4 Conclusions

A guided filter based multi-focus image fusion scheme is proposed. Input images are decomposed into base and detail layers. The base layers contain the large-scale variations and are averaged to obtain the base layer of the fused image. The detail layer weights are computed based on whether the objects in a particular image are in focus compared to the same objects in all other images. Guided filtering is performed to refine the weights. Simulation results show that the proposed scheme is a significant improvement over existing schemes.