
15.1 Introduction

Image fusion is the process of combining multiple images into a single image in such a way that it not only removes redundant information, integrates complementary information and increases the amount of image information, but also removes noise, improves contrast and visual effect, and enhances scene details. The Human Visual System (HVS) is good at identifying salient features, such as colors, edges and contrast, present in an image. Devices nowadays have a limited set of colors with which to represent a color or grayscale image at a more vibrant level. This can deteriorate the image quality, leading to an image that is unpleasant to human eyes.

In general, image fusion techniques fall under two domains: the spatial domain and the transform domain [1]. The spatial domain approach is straightforward and includes methods such as taking the average, maximum or minimum of the source images' pixel intensities. However, along with its simplicity, this technique generates side effects such as unwanted noise, artifacts and reduced contrast [1]. To eliminate the shortcomings of the spatial domain approach, a transform domain approach was suggested in [2], where a pixel-wise comparison was made between two source images and the final result was obtained after processing every single pixel present in both images. The method in [2] worked well in improving the quality of the fused image and enhancing the contrast, but it was lengthy in terms of processing time. In order to reduce the computational complexity and further improve the contrast, a region-based image fusion technique in the transform domain is proposed in this paper.

The rest of the paper is organized as follows. Section 15.2 gives a brief account of the related work and our previous work. Section 15.3 discusses the proposed method. Section 15.4 presents the experimental results, obtained using MATLAB, together with a discussion of the results, followed by the conclusion in Sect. 15.5.

15.2 Related Work

In [2], Nirmala et al. have proposed a fusion method based on standard deviation. Their method uses the DWT (Discrete Wavelet Transform) to achieve multi-level image fusion and proposes that standard deviation can be applied to the approximation coefficients before the final fused image is reconstructed using the IDWT (Inverse Discrete Wavelet Transform). The method is novel, and that motivated our previous work on contrast enhancement in image fusion.

In [3], a robust sparse representation (RSR) model was proposed to extract detailed information from a set of input images; it replaces the conventional least-squares reconstruction error with a so-called sparse reconstruction error. In [3], the local information from each input image patch and its spatial contextual information are employed collaboratively to determine the focused and defocused regions in multi-focus input images. In [4], the authors proposed a novel boundary-finding based multi-focus image fusion algorithm, in which the task of detecting the focused regions is treated as finding the boundaries between the focused and defocused regions of the source images. Since these recent works prefer region-based processing over pixel-based processing, they served as a motivation for the work proposed in this paper.

Two major shortcomings which have been identified in [1, 2] are:

  1. Loss of contrast: A shift in the input signal can lead to a significant variation in the energy distribution of the DWT coefficients across scales, which can result in a loss of contrast [5].

  2. Long processing time: Since our previous work relies on pixel-based processing, each and every pixel present in the source images needs to be processed before the right selection is made. Hence the process is very lengthy and time-consuming.

15.3 Proposed Method

Wavelet-based image fusion is achieved by performing the wavelet transform \(\omega\) on the input images and applying a fusion rule \(\phi\). A subsequent step performs the inverse wavelet transform \(\omega^{ - 1}\) to retrieve the final fused image. This process is expressed in Eq. (15.1).

$$C = \omega^{ - 1} \left( {\phi \left( {\omega \left( A \right),\omega \left( B \right)} \right)} \right)$$
(15.1)
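
For illustration, a minimal sketch of Eq. (15.1) for a single decomposition level is given below. It assumes the third-party PyWavelets package (pywt) and a Haar wavelet; `fusion_rule` is a hypothetical placeholder for the rule \(\phi\) developed in Sect. 15.3.1 and is not part of Eq. (15.1) itself.

```python
import pywt

def fuse(A, B, fusion_rule, wavelet='haar'):
    """Sketch of C = w^-1( phi( w(A), w(B) ) ) for one DWT level."""
    cA_A, (cH_A, cV_A, cD_A) = pywt.dwt2(A, wavelet)   # w(A)
    cA_B, (cH_B, cV_B, cD_B) = pywt.dwt2(B, wavelet)   # w(B)
    fused = (fusion_rule(cA_A, cA_B),                  # phi(...) applied
             (fusion_rule(cH_A, cH_B),                 # subband by subband
              fusion_rule(cV_A, cV_B),
              fusion_rule(cD_A, cD_B)))
    return pywt.idwt2(fused, wavelet)                  # w^-1(...)
```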

In the proposed method, we deal with regions of the image rather than individual pixels, as was done in [2]. The word 'region' here stands for a group of pixels taken together. Various region-based image fusion algorithms are available, but we have concentrated on region-based image fusion using energy estimation [6]. Two registered source images are shown in Fig. 15.1; one image is out of focus on the right-hand side and the other is out of focus on the left-hand side. They are taken as inputs and the DWT is performed on them to extract the wavelet coefficients as in Eq. (15.1). The coefficients are then grouped under a \(3 \times 3\) mask. The mask is moved over both input images simultaneously, and at every step the energy of the region under the mask is computed (i.e. all the coefficient values under the mask are squared and summed). Once the energy has been computed for a particular region in both images (say, images A and B), the regions are compared using a fusion rule.

Fig. 15.1

Parent images (a) and (b) are fused together to give image (c) and image (d). Although image (c) has less noise than both parent images, it has lost sharpness and contrast. Image (d) has been obtained by applying a contrast enhancement technique during fusion. Compared with image (c), it further suppresses the noise while maintaining better contrast characteristics (yellow oval) and better detail preservation (red rectangle)

15.3.1 Fusion Rule Used

The proposed method uses an absolute maximum selection rule in the transform domain [7]. Let \(A(i,j)\) and \(B(i,j)\) be the two images to be fused, and let \(CAA(i,j)\) and \(CAB(i,j)\) be their approximation wavelet coefficients corresponding to the low-frequency subbands. A \(3 \times 3\) mask is defined over these coefficients, and the energy associated with the mask is calculated for every location \(i = 1, \ldots ,M\) and \(j = 1, \ldots ,N\) using Eq. (15.2), where M and N are the numbers of rows and columns of the image (the image size is \(M \times N\)), i and j give the row and column location of the mask centre, m and n index the offsets within the mask, and \(\alpha\) denotes the wavelet coefficient values of the respective input image.

$$\begin{aligned} ECAA(i,j) & = \sum\limits_{m = - 1}^{1} {\sum\limits_{n = - 1}^{1} {\alpha_{AA} (i + m,j + n)^{2} } } \\ ECAB(i,j) & = \sum\limits_{m = - 1}^{1} {\sum\limits_{n = - 1}^{1} {\alpha_{AB} (i + m,j + n)^{2} } } \\ \end{aligned}$$
(15.2)

Once the energies have been calculated, the two regions are compared using the absolute maximum rule in the following manner.

$$ECAF\left( {i,j} \right) = \left\{ {\begin{array}{*{20}l} {ECAA(i,j)} & {{\text{if}}\;\left| {ECAA(i,j)} \right| \ge \left| {ECAB(i,j)} \right|} \\ {ECAB(i,j)} & {\text{otherwise}} \\ \end{array} } \right.$$
(15.3)

In Eq. (15.3), \(ECAF\left( {i,j} \right)\) is the energy of the region that is carried into the final fused image. The same steps are repeated until the mask has traversed both input images completely and the final image C is fully populated. The detail coefficients corresponding to the high-frequency subbands are processed with the same fusion rule and placed in the final image in the same way.
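
A possible NumPy realisation of Eqs. (15.2) and (15.3) is sketched below. The helper names `region_energy` and `fuse_band` are ours, and the reflective border padding is an assumption made so that the \(3 \times 3\) mask is defined at the image boundary; the boundary handling is not specified in the text.

```python
import numpy as np

def region_energy(coeffs, size=3):
    """Eq. (15.2): sum of squared coefficients over a size x size mask
    centred at every location (reflective padding at the borders, assumed)."""
    pad = size // 2
    sq = np.pad(coeffs.astype(float) ** 2, pad, mode='reflect')
    M, N = coeffs.shape
    energy = np.zeros((M, N))
    for dm in range(-pad, pad + 1):          # offsets m within the mask
        for dn in range(-pad, pad + 1):      # offsets n within the mask
            energy += sq[pad + dm: pad + dm + M, pad + dn: pad + dn + N]
    return energy

def fuse_band(cA, cB):
    """Eq. (15.3): keep the coefficient whose region energy is larger."""
    eA, eB = region_energy(cA), region_energy(cB)
    return np.where(np.abs(eA) >= np.abs(eB), cA, cB)
```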

To achieve the proposed region-based fusion, the DWT is carried out up to 3 levels, as was done in our previous work [2]. It can be observed that the fused image turns out better than the input images but still has a certain level of blurriness associated with it. To eliminate this shortcoming, the approximation coefficients at level 1 are scaled by a factor of 0.75, selected empirically. As a result, the overall energy within the fused image is reduced. This makes the image appear darker, although the features within it become more prominent and definite, as illustrated in Fig. 15.1. Once the features are captured, the contrast can be improved so that the final image appears sharper and crisper. This is achieved using an image enhancement technique, namely Gamma Correction [8], given as:

$$S = P * R^{\gamma }$$

where S is the output pixel value, R is the input pixel value and \({\text{P}}\;\& \;\gamma\) are non-negative real numbers. We have empirically set \(\gamma = 1.1\).
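
A short sketch of this power-law correction is given below; normalising the 8-bit image to [0, 1] before applying the power law and the default P = 1 are our assumptions, not stated in the text.

```python
import numpy as np

def gamma_correct(image, gamma=1.1, P=1.0):
    """Power-law (gamma) correction S = P * R^gamma for an 8-bit image."""
    R = image.astype(float) / 255.0                    # normalise to [0, 1] (assumption)
    S = P * np.power(R, gamma)
    return np.clip(S * 255.0, 0.0, 255.0).astype(np.uint8)
```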

15.3.2 Flowchart for the Proposed Method

As indicated in Fig. 15.2, the proposed method first reads the two test images and performs a three-level DWT, which yields the approximation and detail coefficients. The next step is to generate the masks and calculate the energies of the defined regions. The regions comprising the approximation coefficients, i.e. ECAA and ECAB, are then compared, and the maximum fusion rule generates the corresponding region ECAF in the final image. In a similar fashion, the remaining regions comprising the detail coefficients of images A and B are compared. After analysing and generating the appropriate regions, the image is reconstructed with the IDWT to obtain the fused image. Finally, the power law is applied to enhance the contrast and obtain the final fused image.

Fig. 15.2

Flowchart for the proposed region-based method

15.4 Experimental Results

The test images used to demonstrate the research work are shown in Fig. 15.3. Yellow circles in Fig. 15.3 mark out-of-focus or blurry regions. Figure 15.4 compares the proposed method with our previous method and an existing method. The left column shows the results of our previous work [2]; both reconstructed images have a less grayish and blurry appearance than the images in the far-right column. The far-right column of Fig. 15.4 shows the results of the method proposed in [1]; the reconstructed images possess a uniform gray texture and consequently lack edges and features, looking blurry and soft. The results of the proposed method can be seen in the middle column of Fig. 15.4. An overall improvement in contrast can be seen in comparison to the other methods. This improvement in contrast makes the features more evident and improves the quality of the edges, which constitute the details of an image. Red markings in the middle column highlight the improved features and edges produced by the proposed method. In Fig. 15.4b, the hat linings are more prominent in contrast, as depicted by the red rectangle, and the yellow oval illustrates the clarity of the hair. In Fig. 15.4a, the red rectangles likewise show the visible improvement in contrast, which makes the markings on the surface look crisper and clearer.

Fig. 15.3

Test images from left to right: plane and Lena

Fig. 15.4

Comparison of reconstructed images from left column to right column: method used in [2], the proposed method, method used in [1]

The quality parameters taken into consideration are the Peak Signal-to-Noise Ratio (PSNR) and entropy [9]. If \(I\left( {i,j} \right)\) represents the grey level of the input image at the ith row and jth column and \(D\left( {i,j} \right)\) stands for the corresponding value in the output image, then the error \(e\left( {i,j} \right)\) is defined as \(e\left( {i,j} \right) = I\left( {i,j} \right) - D\left( {i,j} \right)\). The mean squared error (MSE) is defined as:

$$MSE = \frac{1}{MN}\sum\limits_{\begin{subarray}{l} 0 \le i < M \\ 0 \le j < N \end{subarray} } {\left[ {I\left( {i,j} \right) - D\left( {i,j} \right)} \right]}^{2}$$
(15.4)

where M and N are the numbers of rows and columns of the image. Once the MSE is calculated, the PSNR is obtained as \(PSNR = 10\log_{10} \left( {\frac{{Max_{i}^{2} }}{MSE}} \right)\), where \(Max_{i} = 255\). The PSNR should be as high as possible. Table 15.1 makes it evident that the proposed method performs better in terms of PSNR than the other two methods.
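
A straightforward NumPy sketch of Eq. (15.4) and the PSNR computation is given below; it assumes 8-bit images and a strictly positive MSE.

```python
import numpy as np

def psnr(I, D, max_val=255.0):
    """MSE of Eq. (15.4) followed by PSNR = 10 log10(Max^2 / MSE)."""
    e = I.astype(float) - D.astype(float)     # e(i, j) = I(i, j) - D(i, j)
    mse = np.mean(e ** 2)                     # Eq. (15.4)
    return 10.0 * np.log10(max_val ** 2 / mse)
```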

Table 15.1 Comparisons among the existing methods and the proposed method

The entropy [9] of an image is a measure of the information contained in the fused image. Higher values of entropy indicate that the fused image contains more information. The entropy is given by \(E = - \sum\nolimits_{l = 0}^{L - 1} {P_{l} \log_{2} P_{l} }\), where L represents the number of gray levels and \(P_{l}\) is the ratio between the number of pixels with gray level l and the total number of pixels.
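
The entropy can be computed from the grey-level histogram as sketched below, assuming an 8-bit image (L = 256); empty bins are skipped since \(0\log_{2} 0\) is taken as 0.

```python
import numpy as np

def entropy(image, levels=256):
    """E = -sum_l P_l log2 P_l over the grey-level histogram."""
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    P = hist.astype(float) / hist.sum()       # P_l = count at level l / total pixels
    P = P[P > 0]                              # skip empty bins (0 log 0 := 0)
    return float(-np.sum(P * np.log2(P)))
```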

Table 15.1 shows that the entropy is higher for the proposed method than for the other methods, which means that the fused image contains more information. In addition, the computational time taken by the proposed method is lower than that of the other two methods, and hence it is more time efficient.

15.5 Conclusion

In this research, we have enhanced the contrast of images obtained by fusion. The proposed method produces a superior fused image with a reduction in processing time in comparison to the two existing methods.

The proposed method can be extended to the enhancement of colour images, and further improvements can be made in the selection of regions or masks for processing.