1 Introduction

WITH the development of computer vision, consumers are demanding richer and more vivid visual experiences. Due to the limitations of the image sensor, the dynamic range of an ordinary digital camera is far lower than that of a natural scene, so captured images often contain regions that are “too bright” or “too dark” [1,2,3]. High dynamic range (HDR) imaging (HDRI) technology is designed to solve this problem [4,5,6,7]. In general, existing HDRI methods can be divided into two categories: tone mapping based methods and multi-exposure image fusion based methods [8,9,10]. Tone mapping based methods require acquiring HDR image data in advance and then display the HDR images on low dynamic range (LDR) displays using tone mapping techniques [11,12,13,14]. Methods based on multi-exposure fusion skip the step of acquiring HDR image data and yield a tone-mapped-like fused image directly; therefore, they generally take less time to depict a high dynamic range scene than tone mapping methods [15,16,17]. In this paper, we achieve HDRI with a multi-exposure image fusion algorithm that merges a sequence of images captured at different exposures.

The primary goal of a multi-exposure fusion algorithm is to depict a high dynamic range (HDR) target scene in an LDR image [18]. In recent years, many researchers have investigated multi-exposure fusion algorithms. Mertens et al. [19] proposed a multi-exposure image fusion algorithm based on pyramid decomposition, i.e., Laplacian pyramids of the LDR images and Gaussian pyramids of weight maps constructed from contrast, saturation and well-exposedness, but at the cost of local detail and context information. Li et al. [20] proposed an image fusion algorithm based on guided filtering. It divided the input image into a base layer and a detail layer, and guided filtering was used to construct the weight map. This method could preserve local detail information well. Ma et al. [21] proposed a multi-exposure fusion method based on structural patch decomposition. While it could preserve good global contrast, halo artifacts occur in regions with large differences in intensity values. In addition, Kou et al. [22] proposed a gradient-domain multi-scale exposure fusion algorithm based on weighted guided image filtering.

Ma et al. [23] fed a low-resolution version of the input sequence to a fully convolutional network to predict the weight maps, then used a guided filter to jointly up-sample the weight maps, and computed the final image by weighted fusion. Compared with the algorithm in this paper, the algorithm proposed by Ma has higher time efficiency. The algorithm in this paper enhances the details of the detail layer and the base layer at each scale separately, so it takes more time, but the fused details are clearer. Li et al. [24] proposed a new multi-scale exposure fusion algorithm that smooths the Gaussian pyramids of the weight maps of all LDR images using weighted guided image filters, and its detail extraction component is designed to enhance the details of the fused image. The algorithm in this paper decomposes the image into a base layer and a detail layer and performs multi-scale fusion on each of them. During multi-scale fusion, a corresponding detail enhancement scheme is applied to enhance the details of the detail layer while retaining the structure of the base layer. Compared with the algorithm proposed by Ma, the algorithm in this paper retains more detail information in the fused image.

The previously mentioned methods all have some problems in generating HDR images, mainly due to imbalanced global contrast, the loss of local details, and the existence of halo artifacts. To solve these problems, an improved multi-exposure fusion algorithm based on a multi-resolution pyramid with detail enhancement is proposed in this paper. Our main contributions are as follows: (1) the weight map measurements are designed based on brightness information, colorimetric information, and detail information, respectively. (2) The fusion framework based on Laplacian pyramids is improved. (3) Gain control is applied to the high-frequency layers to highlight details, and gain control of the low-frequency layers makes the fused image more consistent with the realistic brightness distribution.

The remainder of this paper is organized as follows. Section 2 discusses the weight map measurement functions. Section 3 presents the improved fusion framework, and Sect. 4 analyzes the experimental results. The final section discusses conclusions and future work.

2 Weight map measurement

Mertens et al. [19] designed weight measurement functions based on contrast, saturation and well-exposedness. Li et al. [20] designed a weighting factor based on Gaussian saliency. Ma et al. [21] decomposed each image patch into signal strength, signal structure and mean intensity. Xu et al. [25] designed weight measurements based on phase congruency, local contrast and color saturation, and then used the guided filter to refine the weight map.

In contrast to the weight map construction of the previously mentioned methods, this paper constructs the weight map using three measurements: the well-exposedness evaluation function, the chromatic information evaluation function and the local detail preserved function. Figure 1 shows the construction process of the weight map. Firstly, the well-exposedness evaluation function is designed in the gray space. Then, we convert the image from the RGB color space to the CIE-Lab color space; the local detail preserved function and the chromatic information evaluation function are designed from the luminance information and the chrominance information of the CIE-Lab color space, respectively.

Fig. 1

The construction process of weight map

2.1 Well-exposedness evaluation function

The response of the human visual system is directly related to the exposure brightness of pixels: in regions that are too dark or too bright, the human eye cannot extract the details of the scene. Therefore, this paper designs the well-exposedness evaluation function from the brightness distribution. The “optimum exposed value” is the brightness value that performs best across the entire brightness range; the closer a pixel is to this value, the better it is exposed. To improve computational efficiency, we use the grayscale image to evaluate well-exposedness. This paper sets the “optimum exposed value” to the median of the entire brightness range, which is 0.5 after normalization.

$$ w_{k}^{e} (p) = \exp \left( { - \frac{{(I_{k}^{\text{gray}} (p) - 0.5)^{2} }}{{2\sigma^{2} }}} \right) , $$
(1)

where \( I_{k}^{\text{gray}} \) is the grayscale version of the \( k \)-th LDR image, \( p \) denotes the pixel coordinates, and the standard deviation \( \sigma \) controls the impact of \( I_{k}^{\text{gray}} \) on \( w_{k}^{e} \); \( \sigma \) is set to 0.2 in our implementation.
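As an illustration, the following Python sketch computes the well-exposedness weight of Eq. (1); it is a minimal example, assuming the grayscale image has already been normalized to [0, 1].

```python
import numpy as np

def well_exposedness_weight(img_gray, optimum=0.5, sigma=0.2):
    """Eq. (1): Gaussian weight centered on the optimum exposed value.

    img_gray is assumed to be a grayscale image normalized to [0, 1].
    """
    return np.exp(-((img_gray - optimum) ** 2) / (2.0 * sigma ** 2))
```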

It can be seen from Table 1 that the image quality is best when the optimum exposure value is 0.5, and more details are preserved.

Table 1 Image quality evaluation index IL-NIQE results with different optimum exposure values

2.2 Chromatic information evaluation function

Color information is an important factor in measuring image quality, and it is also essential to how the human eye perceives the outside world. For the chromatic information of the image, the algorithm in this paper works in the CIE-Lab color space. CIE-Lab is a physiological color system established on the basis of the international standard of color measurement defined by the International Commission on Illumination in 1931. The \( I^{\text{Lum}} \) component represents the luminance information of the image, and the \( I^{a} \) and \( I^{b} \) components represent the chrominance information of the image. This paper calculates the chromatic information evaluation function by Eq. (2).

$$ w_{k}^{c} (p) = \sqrt {\left( {I_{k}^{a} (p)} \right)^{2} + \left( {I_{k}^{b} (p)} \right)^{2} } , $$
(2)

where \( I_{k}^{a} \) and \( I_{k}^{b} \) are the a-channel and b-channel of the \( k \)-th source image in the CIE-Lab color model, respectively. The chromatic information evaluation function is designed to preserve the ample color of the source LDR images.
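A minimal sketch of Eq. (2) is given below; it assumes an RGB input in [0, 1] and uses scikit-image's rgb2lab for the RGB-to-CIE-Lab conversion, since the paper does not specify a particular conversion routine.

```python
import numpy as np
from skimage.color import rgb2lab

def chromatic_weight(img_rgb):
    """Eq. (2): chroma magnitude from the a and b channels of CIE-Lab."""
    lab = rgb2lab(img_rgb)              # convert the k-th source image to CIE-Lab
    a, b = lab[..., 1], lab[..., 2]     # chrominance channels
    return np.sqrt(a ** 2 + b ** 2)
```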

2.3 Local detail preserved function

For multi-exposure image fusion, the preservation of local detail information is very important. This paper uses a Laplacian filter to extract local details from the luminance information of the source image.

$$ w_{k}^{d} (p) = |L*I_{k}^{\text{lum}} (p)| , $$
(3)

where \( L \) is a \( 3 \times 3 \) Laplacian filter; \( * \) represents the convolution operation; \( I_{k}^{\text{lum}} \) represents the luminance component of the \( k \)-th image; and \( | \cdot | \) denotes the absolute value. The local detail preserved function is designed to give more weight to details.
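For illustration, a sketch of Eq. (3) follows; the paper only states that a 3 × 3 Laplacian filter is used, so the standard 4-neighbour kernel is assumed here.

```python
import numpy as np
from scipy.ndimage import convolve

# 4-neighbour Laplacian kernel (an assumption; the paper only specifies a "3x3 Laplacian filter").
LAPLACIAN_3X3 = np.array([[0,  1, 0],
                          [1, -4, 1],
                          [0,  1, 0]], dtype=float)

def detail_weight(lum):
    """Eq. (3): absolute Laplacian response of the luminance channel."""
    return np.abs(convolve(lum, LAPLACIAN_3X3, mode='nearest'))
```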

According to the previous three weight map measurements, the initial weight map is constructed as shown in Eq. (4).

$$ W_{k} (p) = \left( {w_{k}^{e} (p)} \right)^{{\theta_{1} }} \times \left( {w_{k}^{c} (p)} \right)^{{\theta_{2} }} \times \left( {w_{k}^{d} (p)} \right)^{{\theta_{3} }} , $$
(4)

where \( w_{k}^{e} \), \( w_{k}^{c} \) and \( w_{k}^{d} \) represent the well-exposedness evaluation function, the chromatic information evaluation function and the local detail preserved function of the \( k \)-th input image, respectively; \( \theta_{1} \), \( \theta_{2} \) and \( \theta_{3} \) are the exponential parameters of \( w_{k}^{e} \), \( w_{k}^{c} \) and \( w_{k}^{d} \), respectively, all of which are set to 1.

To obtain a consistent result, the weight maps are normalized so that the weights at each pixel sum to one.

$$ \hat{W}_{k} (p) = \frac{{W_{k} (p)}}{{\sum\nolimits_{k = 1}^{N} {\left( {W_{k} (p) + \varepsilon } \right)} }} , $$
(5)

where \( N \) represents the number of input images, and \( \varepsilon \) is a small number that prevents the denominator from being zero.
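Combining Eqs. (4) and (5), a minimal sketch of the weight construction and normalization is shown below, with θ1 = θ2 = θ3 = 1 as in the paper and ε chosen as a small constant (its exact value is not specified).

```python
import numpy as np

def normalized_weights(w_e, w_c, w_d, thetas=(1.0, 1.0, 1.0), eps=1e-12):
    """Eqs. (4)-(5): per-image weight maps and their per-pixel normalization.

    w_e, w_c, w_d: lists of N arrays (one per input image) with identical shapes.
    """
    W = [we ** thetas[0] * wc ** thetas[1] * wd ** thetas[2]
         for we, wc, wd in zip(w_e, w_c, w_d)]
    denom = sum(w + eps for w in W)     # per-pixel sum over the N input images
    return [w / denom for w in W]
```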

The final weight map is the normalized weight map. Figure 2 shows three input images and their corresponding weight maps. The input images are the under-exposed, normal-exposed, and over-exposed images, respectively. It can be seen from Fig. 2b that regions of the input images that are well exposed or have ample color information receive large weights, whereas under-exposed or over-exposed regions receive small weights, or even zero.

Fig. 2

(Source image courtesy of Jacques Joffre)

Input images and weight maps

3 Fusion

Traditional direct weighted fusion methods often result in discontinuous regions and gaps in the fused images. To ensure that the fused image has global consistency, the method in [19] fuses the Laplacian pyramids of the input images with the Gaussian pyramids of the weight maps, as shown in Eq. (6). However, the traditional Laplacian pyramid often loses details and is time consuming. In addition, there are also multi-resolution image fusion methods based on the frequency domain, such as the wavelet pyramid [18, 24]. Image fusion in the frequency domain is also time consuming and prone to fading.

$$ \text{Pyr} \{ R\}^{l} = \sum\limits_{k = 1}^{N} {\text{G} \{ W_{k} \}^{l} \text{L} \{ I_{k} \}^{l} } , $$
(6)

where \( \text{Pyr} \{ R\}^{l} \) represents the \( l \)-th layer of the fusion pyramid, \( \text{L} \{ I_{k} \}^{l} \) is the \( l \)-th level of the Laplacian pyramid of the \( k \)-th input image, and \( \text{G} \{ W_{k} \}^{l} \) is the \( l \)-th level of the Gaussian pyramid of the \( k \)-th weight map.

This paper proposes a new detail-enhanced fusion framework based on pyramid decomposition. From the input LDR image sequence \( \{ I_{k} \} \), this paper constructs the Laplacian pyramids \( \text{L} \{ I_{k} \}^{l} \) of the source images and the Gaussian pyramids \( \text{G} \{ W_{k} \}^{l} \) of the weight maps using the method in [19]. The improved fusion framework of this paper is shown in Fig. 3.

Fig. 3

Improved fusion framework

Firstly, we calculate the number of pyramid layers based on the size of the input image sequence. Let \( r \) and \( c \) be the numbers of pixels in the height and width of the source image, respectively. Then the number of pyramid layers \( L \) is calculated by Eq. (7),

$$ L = \left\lfloor {\log_{2} \min (r,c)} \right\rfloor - 2 . $$
(7)

As shown in Fig. 3, we apply a gain control factor to the high-frequency and low-frequency layers of the Laplacian pyramids, respectively. \( A_{k}^{l} \) represents the high-frequency information of the \( k \)-th source image, and \( D_{k} \) represents the low-frequency information of the \( k \)-th source image. The fusion strategy for the high-frequency information is shown in Eq. (8),

$$ L\{ R\}_{A}^{l} = \sum\limits_{k = 1}^{N} {G\{ \hat{W}_{k} \}^{l} L\{ I_{k} \}^{l} \cdot \alpha } , $$
(8)

where \( \alpha \) represents the high-frequency information gain factor, designed to enhance the texture details of the original image; it is set to the constant \( \alpha = 1.1 \) in this paper.

The fusion strategy for the low-frequency information is shown in Eq. (9).

$$ L\{ R\}_{D}^{L} = \sum\limits_{k = 1}^{N} {G\{ \hat{W}_{k} \}^{L} L\{ I_{k} \}^{L} \cdot mI_{k} } , $$
(9)

where \( mI_{k} \) indicates the true average brightness of the \( k \)-th source image.

Referring to the method in [26], this paper calculates the average brightness as follows:

$$ mI_{k} = \exp \left( {\frac{1}{M}\sum\limits_{\varOmega } {\log \left( {{\text{lum}}_{k} (p) + \varepsilon } \right)} } \right) , $$
(10)

where \( M \) is the total number of pixels; \( \varOmega \) is the set of spatial coordinates of the \( k \)-th image; \( {\text{lum}}_{k} (p) \) is the brightness of the \( k \)-th image at \( p \); and \( \varepsilon \) is a small constant to avoid singularity. To avoid interference from improperly exposed pixels, \( \varOmega \) only takes pixels with values between 0.4 and 0.6 (the image is normalized to 0–1 beforehand). The fusion process used in this paper is shown in Fig. 4.

Fig. 4

Image fusion process
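To make the fusion strategy concrete, the sketch below combines Eqs. (7)–(10). It is only an illustration under stated assumptions: the Laplacian pyramids of the source images and the Gaussian pyramids of the normalized weight maps are assumed to be precomputed as in [19], the luminance images are assumed to be normalized to [0, 1], and the coarsest pyramid level is treated as the low-frequency layer.

```python
import numpy as np

def num_pyramid_levels(r, c):
    """Eq. (7): number of pyramid levels from the image height r and width c."""
    return int(np.floor(np.log2(min(r, c)))) - 2

def log_average_luminance(lum, eps=1e-6):
    """Eq. (10): geometric mean brightness over well-exposed pixels (0.4-0.6)."""
    mask = (lum >= 0.4) & (lum <= 0.6)
    region = lum[mask] if mask.any() else lum   # fall back to all pixels if the range is empty
    return float(np.exp(np.mean(np.log(region + eps))))

def fuse_pyramids(lap_pyrs, weight_pyrs, lums, alpha=1.1):
    """Eqs. (8)-(9): gain-controlled weighted blending of the pyramid levels.

    lap_pyrs[k][l]    -- level l of the Laplacian pyramid of image k
    weight_pyrs[k][l] -- level l of the Gaussian pyramid of the normalized weight map of image k
    lums[k]           -- luminance channel of image k, used for the low-frequency gain mI_k
    """
    n_levels = len(lap_pyrs[0])
    fused = []
    for l in range(n_levels):
        level = np.zeros_like(lap_pyrs[0][l])
        for k in range(len(lap_pyrs)):
            if l < n_levels - 1:                # high-frequency levels, Eq. (8)
                gain = alpha
            else:                               # coarsest (low-frequency) level, Eq. (9)
                gain = log_average_luminance(lums[k])
            level = level + weight_pyrs[k][l] * lap_pyrs[k][l] * gain
        fused.append(level)
    return fused  # collapse with the usual Laplacian pyramid reconstruction to obtain the fused image
```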

4 The experimental results and analysis

In this paper, we compare our fusion method with the traditional Mertens’ method [19], the fusion method based on the guided filter [20], and the fusion method based on structural patch decomposition [21]. All experiments were done in MATLAB (R2016a) on a PC platform with an Intel i5 processor and 4 GB of memory. Some of the comparison results are shown in Figs. 5, 6, 7, 8, 9, 10, 11, 12 and 13. Each figure contains four images labeled (a–d), which correspond to the four methods.

Fig. 5

(courtesy of Dani Lischinski)

Comparison on image set “Belgium House”

Fig. 6

(courtesy of Bartlomiej Okonek)

Comparison on image set “Cave”

Fig. 7

(courtesy of Tom Mertens)

Comparison on image set “House”

Fig. 8

(courtesy of Bartlomiej Okonek)

Comparison on image set “Kluki”

Fig. 9

(courtesy of Bartlomiej Okonek)

Comparison on image set “Laurenziana”

Fig. 10

(courtesy of Paul Debevec)

Comparison on image set “Memorial”

Fig. 11

(courtesy of MATLAB)

Comparison on image set “Office”

Fig. 12

(courtesy of Shree K. Nayar)

Comparison on image set “Garage”

Fig. 13

(courtesy of Jacques Joffre)

Comparison on image set “Studio”

4.1 Comparison of global contrast

Figures 5, 6, 7, 8, 9, 10, 11, 12 and 13 show the comparison of nine experimental results. In general, Mertens’ method shows a good global contrast distribution. However, there is a certain fading phenomenon, as shown in Figs. 5a and 10a. Li’s method uses a guided filter to smooth the weight map, as shown in Figs. 5b and 11b. Ma’s method is based on structural patch decomposition, which decomposes each image patch into signal strength, signal structure, and mean intensity, and shows a good global contrast. However, it may lead to brightness inversion, as shown in Figs. 12c and 13c. The above methods do not take the real brightness distribution of the HDR scene into consideration. The proposed method produces better global contrast with a realistic brightness distribution and also avoids brightness inversion.

4.2 Comparison of local contrast

The preservation of local details is an important factor in measuring the fusion quality of multi-exposure images. We also compare the detail preservation performance of the four methods. Figures 14, 15, 16 and 17 show close-up views of local regions of the images in Figs. 5, 6, 7 and 13, respectively.

Fig. 14

Details of the image set “Belgium House”

Fig. 15

Details of the image set “Cave”

Fig. 16

Details of the image set “House”

Fig. 17

Details of the image set “Studio”

Although Mertens’ method [19] maintains a good global contrast, the loss of details is serious and some details are not clear enough, as shown in Figs. 14a, 16a and 17a. Li’s method [20] uses a guided filter and can preserve local details to a great extent; however, it may still result in halo artifacts, such as the black side of the pillar outside the window in Fig. 14b. Ma’s method [21] may also lead to halo artifacts and brightness inversion in regions with large contrast differences, as shown in Figs. 14c and 17c. The algorithm proposed in this paper has better detail preservation performance while presenting clearer color information, which is better than the aforementioned methods, as shown in Figs. 14d, 15d, 16d and 17d.

4.3 Quantitative comparison

Image evaluation indexes are divided into reference-based and no-reference methods [27,28,29]. Reference-based evaluation methods usually require the ground truth as a reference, whereas multi-exposure images are fused without reference images. Therefore, we use no-reference image evaluation indexes. The Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) [30] and the Integrated Local Natural Image Quality Evaluator (IL-NIQE) [31] are two no-reference evaluation indexes that evaluate image quality directly from the input image; for both, a smaller score (closer to 0) indicates better perceptual quality. BRISQUE is a no-reference image quality evaluation algorithm in the spatial domain. Its general principle is to extract mean subtracted contrast normalized (MSCN) coefficients from the image, fit the MSCN coefficients to an asymmetric generalized Gaussian distribution (AGGD), extract the features of the fitted distribution, and feed them into support vector regression to obtain the image quality score. IL-NIQE is a blind image quality assessment (BIQA) method that compares images against a model computed from images of natural scenes. In this paper, it is used to evaluate the distortion of fused images in static scenes, including the sharpness of details and the noise in color components.

In addition to the nine sets of comparative experiments presented above, six further sets of comparative experiments are performed in this paper. Figure 18 shows the fusion results of the remaining six image sets obtained with the proposed algorithm. It can be seen from Tables 2 and 3 that the evaluation indexes of the proposed algorithm are superior to those of the traditional fusion methods on most image sets (the bold values in Tables 2 and 3 indicate the minimum value of each row).

Fig. 18

The fusion results of the remaining six image sets by our proposed algorithm

Table 2 Image quality evaluation index BRISQUE results
Table 3 Image quality evaluation index IL-NIQE results

In summary, Mertens’ method maintains a good global contrast, but the loss of details is serious and some details are not clear enough. Li’s method uses a guided filter and can preserve local details to a great extent, but it may still result in halo artifacts. Ma’s method may also lead to halo artifacts and brightness inversion in regions with large contrast differences. The algorithm proposed in this paper has better detail preservation performance while presenting clearer color information, which is better than the aforementioned methods.

5 Conclusions

In this paper, we investigated how to measure global contrast and the preservation of local detail. We proposed a new multi-exposure image fusion algorithm with detail enhancement, whose weight map measurement functions are based on luminance, colorimetric, and detail information. Experiments show that the proposed algorithm preserves global contrast and details well while avoiding halo artifacts, making the fusion result more colorful. Two image quality evaluation indexes show that the proposed algorithm is superior to traditional multi-exposure image fusion methods.

At present, this paper only studies multi-exposure image fusion for static scenes. In the future, we plan to investigate the removal of ghosting in multi-exposure image fusion for dynamic scenes and look for further improvements to the algorithm.