1 Introduction

With the rapid progress of data acquisition technology, a growing variety of imaging sensors is emerging. Although each sensor has irreplaceable advantages within its appropriate working conditions and range, the information obtained from a single imaging sensor is incomplete. Image fusion is an important technique that integrates image data of the same target collected by multiple sensors, extracting the favorable information from each sensor to the greatest extent and synthesizing it into high-quality images that are better suited to human visual perception and further image processing tasks [1]. Image fusion has been extensively used in computer vision, military applications, remote sensing, etc. [2].

Fig. 1

The pipeline of the proposed novel and fast EMD-based image fusion method for two multi-focus images, where the input source images are decomposed into several IMFs and a residue; the patch size of the partition is set to \(4\times 4\) and the overlapping number of rows/columns is set to 2 to better illustrate our method

The classical empirical mode decomposition (EMD) proposed by Huang et al. [3] is a powerful tool for processing non-stationary and nonlinear one-dimensional signals, which can adaptively decompose a 1D time series into several intrinsic mode functions (IMFs) and a residue through an iterative sifting process. It has also been extended to 2D images, called bidimensional empirical mode decomposition (BEMD), which can adaptively represent feature information at different scales of the input source images. Many researchers have applied EMD to image fusion to overcome the artifacts of the predefined basis functions used in fusion methods based on classical transforms such as the Fourier transform and wavelet transform. Unfortunately, the existing EMD-based image fusion approaches [4,5,6,7,8,9,10,11] have had limited impact on image fusion, since several limitations still need to be addressed. Firstly, their computational efficiency is unsatisfactory: envelope generation in the sifting process for 2D images becomes very time-consuming as the image size increases, especially for surface interpolation-based EMD methods. Secondly, they mainly employ a pixel-based fusion strategy to merge each EMD component, which may introduce spatial artifacts generated by the noisy characteristics of pixel-wise maps. Furthermore, the fusion rules in such methods cannot capture the salient information of the input source images well and therefore produce fuzzy fusion results.

To improve the performance of EMD-based image fusion methods, we propose a novel and fast EMD-based image fusion method in this paper (Fig. 1). We first develop a multi-channel bidimensional EMD method based on morphological filters (which we name MF-MBEMD) to decompose the input source images into several IMFs with different scales and a residue; it significantly accelerates the computation of EMD for multi-channel images. We then adopt a patch-based fusion strategy with overlapping partition, instead of the pixel-based fusion commonly used in EMD-based image fusion, to fuse the IMFs and residue, where energy-based rules are designed to extract the salient information of the input source images. Finally, we obtain the final result by adding the fused IMFs and the fused residue together. The main contributions of this paper can be summarized as follows:

  • We develop a multi-channel bidimensional EMD method based on morphological filters (MF-MBEMD) to conduct image fusion. It uses morphological dilation and erosion filters to compute the upper and lower envelopes of a multi-channel image, where the size of the morphological filter window is determined by the minimal average extremum distance over all image channels. This significantly improves the computational efficiency of EMD for multi-channel images while maintaining the decomposition quality.

  • We adopt a patch-based fusion strategy with overlapping partition to replace the pixel-based fusion commonly used in EMD-based image fusion, where an energy-based maximum selection rule is designed to fuse the IMFs, and the feature information extracted by the IMFs is used as a guide to merge the residue. This captures the salient information of the source images well and also reduces the spatial artifacts introduced by the noisy characteristics of pixel-wise maps.

  • We evaluate the fusion performance of several state-of-the-art EMD methods for multi-channel images under the popular fusion strategies used in EMD-based image fusion methods on commonly used data sets of multi-focus and multi-modal images. The evaluation results demonstrate that our proposed MF-MBEMD obtains better time performance, and our proposed patch- and energy-based fusion strategy achieves better visual quality and objective metrics.

2 Related work

2.1 Bidimensional empirical mode decomposition

The original 1D EMD [3] employs a sifting process to extract IMFs with different scales, which generates the upper and lower envelopes from all local extrema (maxima and minima) by cubic spline interpolation. To generalize it to 2D images, Nunes et al. [12] generated envelope surfaces by radial basis function interpolation. Al-Baddai et al. [13] proposed a novel envelope surface interpolation method based on Green's function to improve the stability of BEMD. The approach using bi-Laplacian operator interpolation to compute the upper and lower envelopes [14,15,16] was developed to decompose signals defined on 3D surfaces and can be applied to 2D images naturally [17, 18]. To avoid computing envelope surfaces in the sifting process, Pan and Yang [8] generated the mean surface by interpolating the centroids of neighbouring extremum points in a Delaunay triangulation, while Qin et al. [6] used a window function as an alternative to compute the mean surface, obtaining a window BEMD method. Wang et al. [9] presented a bidimensional ensemble empirical mode decomposition method (BEEMD) that optimizes the decomposition by averaging the modes of all noise-added images, which incurs a very high time cost. To improve the computational efficiency of the surface interpolation-based BEMD methods, a direct envelope estimation method based on order-statistics filters [19] was proposed to generate the envelope surfaces of an input image, where the distance map between adjacent maxima/minima is computed to determine the filter size. Trusiak et al. developed an enhanced fast empirical mode decomposition method [20] that further accelerates envelope computation by using morphological operations instead of order-statistics filters. In addition, it determines the filter size only from the number of extrema and thus avoids the time-consuming computation of the distance map between adjacent extrema.

The aforementioned BEMD methods are designed to decompose 2D images with one data channel (single-channel images). Although they can also be applied separately to each channel of a bivariate/multivariate 2D signal (two/multiple data channels), they suffer from several problems (e.g. mode misalignment and nonuniqueness), as illustrated in [21]. These issues limit the application of EMD in data fusion, which requires the same-index IMFs to contain same-scale information. To make the modes extracted from each of multiple 2D signals match in feature scale, Yeh [5] proposed the complex BEMD to decompose a bivariate (complex) 2D signal, which applies the standard surface interpolation-based BEMD [12] to four real-valued 2D signals to generate complex-valued IMFs. Rehman et al. [7] reshaped a multi-channel 2D image into a multivariate 1D signal and adopted the 1D multivariate EMD [22] to obtain the multi-scale decomposition. Xia et al. [10] improved this decomposition by using n-dimensional surface projections to estimate the mean surface of a multi-channel 2D image, which preserves its spatial information. Bhuiyan et al. [23] used a direct envelope estimation method based on order-statistics filters [19] to generate the upper and lower envelopes of color images, which is less time-consuming than the surface interpolation-based BEMD methods for multi-channel images. In this paper, we obtain the envelope surfaces of a multi-channel image using morphological filters so as to further reduce the decomposition time, inspired by the enhanced fast empirical mode decomposition [20] developed for single-channel images. It not only generates good decomposition results but also significantly accelerates the computation.

2.2 Image fusion

With the development of signal processing and analysis, a large number of image fusion methods have been proposed, such as transform domain methods, spatial domain methods, and deep learning methods. Here we mainly discuss the transform domain methods, which are most relevant to EMD-based image fusion. More comprehensive surveys can be found in [2, 24, 25].

The transform domain methods merge the transformed coefficients of the input source images obtained by a transform, and generate the fused image by a reconstruction step with the corresponding inverse transform. Undoubtedly, the selection of the transform domain plays an important role in these methods. So far, many transforms have been employed for image fusion, including the Laplacian pyramid [26], multi-scale geometric analysis [27,28,29,30], the fast Fourier transform [31], the wavelet transform [32, 33], empirical mode decomposition (EMD) [4, 5, 7, 10], etc.

Different from many classical transforms such as the Fourier transform and wavelet transform, which use predefined basis functions, EMD is fully adaptive and data-driven. Pan and Yang [8] used the mean approximation-based BEMD to decompose the source images and then employed a standard deviation-based weighted averaging rule to fuse each component pixel by pixel. Wang et al. [9] proposed a pixel-based image fusion method based on BEEMD and an entropy-based weighted averaging rule. Qin et al. [6] used the window BEMD to obtain the decomposition and fused all IMFs and the residue by a maximum selection rule based on two saliency measures in a pixel-based manner, which may introduce some artifacts. To make the decompositions of different source images match in number and property, the multivariate 1D EMD [22] is used to decompose the source images, and each component is merged by a variance-based weighted averaging rule pixel by pixel [7]. Xia et al. [10] adopted the multivariate BEMD based on surface projection to obtain the multi-scale decomposition, which improves the fusion quality over the multivariate 1D EMD-based fusion method [7]. Yeh [5] proposed a pixel-based multi-focus image fusion approach based on the complex BEMD. Ahmed and Manic [4] used the order-statistics filter-based EMD [19, 23] to decompose the source images and performed the fusion using a variance-based maximum selection rule pixel by pixel. Zhu et al. [11] employed the bivariate BEMD and sparse representation to fuse infrared-visible images. Although many attempts have been made in EMD-based image fusion, they are limited by the computational efficiency of the underlying EMD methods and by the fusion schemes.
In this paper, we propose a novel and fast EMD-based image fusion method based on a fast multi-channel bidimensional EMD method and a patch and energy-based fusion strategy, which obtains better performance than the existing EMD-based image fusion approaches on several commonly used image data sets with multi-focus and multi-modal images.

Fig. 2

Decompositions of two color multi-focus images using MF-MBEMD. a Source images. b IMF 1. c IMF 2. d IMF 3. e Residue

3 Multi-channel bidimensional EMD based on morphological filter

To our knowledge, the enhanced fast empirical mode decomposition method [20] obtains the best time performance among existing BEMD methods for decomposing a greyscale image. It adopts morphological filters to implement the direct envelope estimation method based on order-statistics filters [19] and uses the mean extremum distance as the filter size to avoid computing the distance map between adjacent maxima/minima. However, it is only designed for single-channel images for fringe pattern processing. In this paper, we present a multi-channel bidimensional EMD method based on morphological filters (MF-MBEMD), where the relevant modifications of the enhanced fast empirical mode decomposition method [20] are made to make it suitable for more general images with multiple channels.

The proposed MF-MBEMD employs morphological filters with the same window size for each channel to generate the envelope surfaces of a multi-channel image, which extracts similar spatial scales from each channel during the decomposition. Specifically, given a multi-channel image \({\varvec{{I}}}=(I_1,...,I_n)\) of size \(W \times H\), its upper (lower) envelope \({\varvec{{{U}}}}=(U_1,...,U_n)\) (\({\varvec{{{D}}}} = (D_1,...,D_n)\)) is generated by

$$\begin{aligned} \begin{array}{ll} U_k(x,y)|_{k=1,...,n} = (I_k \oplus b)(x,y) = \max \limits _{(s,t)\in Z_{xy}}{I_k(s,t)},\\ D_k(x,y)|_{k=1,...,n} = (I_k \ominus b)(x,y) = \min \limits _{(s,t)\in Z_{xy}}{I_k(s,t)}, \end{array} \end{aligned}$$
(1)

where \(\oplus \) denotes the morphological dilation filter, \(\ominus \) denotes the morphological erosion filter, \(Z_{xy}\) represents the set of pixels in the \(w \times w\) window centered at the pixel (x, y), and b is a binary set indicator function on \(Z_{xy}\). An averaging filter is then used to obtain smoother envelopes by

$$\begin{aligned} \begin{array}{ll} U'_k(x,y)\big |_{k=1,...,n} = \frac{1}{w\times w}\sum \limits _{(s,t)\in Z_{xy}}{U_k(s,t)},\\ D'_k(x,y)\big |_{k=1,...,n} = \frac{1}{w\times w}\sum \limits _{(s,t)\in Z_{xy}}{D_k(s,t)}. \end{array} \end{aligned}$$
(2)
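Assuming a NumPy/SciPy environment, the envelope estimation of Eqs. (1) and (2) can be sketched as follows; the function name `envelopes` and the use of `scipy.ndimage` filters are illustrative choices, not the paper's actual implementation:

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion, uniform_filter

def envelopes(channels, w):
    """Smoothed upper/lower envelopes of a multi-channel image, Eqs. (1)-(2):
    a w-by-w morphological dilation (max) / erosion (min), followed by a
    w-by-w averaging filter on each channel."""
    uppers, lowers = [], []
    for I in channels:
        U = grey_dilation(I, size=(w, w))          # Eq. (1), upper envelope
        D = grey_erosion(I, size=(w, w))           # Eq. (1), lower envelope
        uppers.append(uniform_filter(U, size=w))   # Eq. (2), smoothed upper
        lowers.append(uniform_filter(D, size=w))   # Eq. (2), smoothed lower
    return uppers, lowers
```

Because the averaging filter is linear and order-preserving, the smoothed upper envelope still dominates the smoothed lower envelope at every pixel.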

To account for the feature extraction of all data channels of the input image, we set the window size w in Eqs. (1) and (2) to the minimal average extremum distance over all image channels,

$$\begin{aligned} w = \min \{w_1,...,w_n\}, \end{aligned}$$
(3)

where \(w_k(k=1,...,n)\) denotes the average extremum distance of the k-th channel image \(I_k\), computed by

$$\begin{aligned} w_k = \sqrt{\frac{W \times H}{ N_k}}, \end{aligned}$$
(4)

where \(N_k\) is the average of the numbers of local maxima and minima of \(I_k\). We find all local maxima (minima) of \(I_k\) by comparing the value of each pixel with its neighbours in the \(3\times 3\) window centered on it in each iteration. This differs from the extraction of local extrema in the enhanced fast empirical mode decomposition method [20], where the size of the extremum window is set to the average extremum distance of the previous iteration and the number of extracted extrema is thus reduced. In contrast, our strategy obtains more extrema in each iteration and can extract finer feature scales from each channel image.
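The window-size selection of Eqs. (3) and (4) can be sketched in the same spirit; the 3×3 extremum test via rank filters and the rounding of w to an odd integer are our assumptions for illustration:

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def filter_window_size(channels):
    """Window size per Eqs. (3)-(4): for each channel, count the local
    maxima and minima of 3x3 neighbourhoods, average the two counts,
    and take the minimal average extremum distance over all channels."""
    H, W = channels[0].shape
    sizes = []
    for I in channels:
        n_max = int((I == maximum_filter(I, size=3)).sum())
        n_min = int((I == minimum_filter(I, size=3)).sum())
        N = 0.5 * (n_max + n_min)          # average extremum count N_k
        sizes.append(np.sqrt(W * H / N))   # w_k in Eq. (4)
    w = int(min(sizes))                    # Eq. (3)
    return w if w % 2 == 1 else w + 1      # odd size for a centred window (assumption)
```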


The proposed MF-MBEMD extracts the IMFs with different scales iteratively by a sifting process based on the above envelope computation until the specified number of IMFs is reached or the residue is a constant or monotonic function (see Algorithm 1). Figures 1 and 2 show the decomposition results of some multi-focus images using MF-MBEMD. It can be seen that the leading IMFs extract finer-scale features while the trailing IMFs describe coarser-scale features. Furthermore, the feature scales extracted from the two multi-focus images match very well for the same-index IMFs, which is very important for the application of EMD in image fusion.

Table 1 Time performance comparison (in seconds) between our proposed MF-MBEMD and several state-of-the-art multi-channel bidimensional EMD methods in generating three IMFs for the image in Fig. 2a with different sizes

Table 1 compares the time performance of our proposed MF-MBEMD with several state-of-the-art multi-channel bidimensional EMD methods in generating three IMFs for the image in Fig. 2a at different sizes. The compared methods include the multivariate BEMD based on surface projection [10], the multi-channel extension of the EMD based on bi-Laplacian operator interpolation [14, 16], and the color bidimensional EMD method based on order-statistics filters [23], which we call SP-MBEMD, BL-MEMD, and OSF-CBEMD, respectively. Benefiting from the direct envelope estimation via morphological filters with a mean extremum distance-based filter size, MF-MBEMD obtains the best time performance among the compared EMD methods for multi-channel images, as shown in Table 1. Furthermore, as the image size increases, the time of MF-MBEMD grows slowly while that of the other methods grows very quickly; in the worst case, SP-MBEMD [10] cannot complete the decomposition of a 1024\(\times \)1024 image within twenty hours. Therefore, MF-MBEMD can significantly improve the efficiency of EMD-based image fusion algorithms.

4 Proposed EMD-based image fusion

First, our EMD-based image fusion method decomposes the input source images into several IMFs and a residue by the aforementioned MF-MBEMD. Second, it fuses the IMFs and the residue separately by a patch-based fusion strategy with overlapping partition. An energy-based maximum selection rule is developed for the fusion of IMFs, and two different rules based on the feature information extracted by the IMFs are devised for the fusion of the residue according to the image type (multi-focus or multi-modal images) in order to obtain high-quality fusion results. Finally, the fused image is reconstructed from the fused IMFs and the fused residue. The whole pipeline of our method is shown in Fig. 1.

4.1 Multi-scale decomposition by MF-MBEMD and overlapping patch partition

Given two source images \(I_1\) and \(I_2\), we combine them into a two-channel image \({\varvec{{{I}}}} = (I_1,I_2)\), which is decomposed by MF-MBEMD into K IMFs with scales from fine to coarse and a residue,

$$\begin{aligned} {\varvec{{I}}} = \sum _{i=1}^K{{\varvec{{F}}}_i} + {\varvec{{R}}}_K, \end{aligned}$$
(5)

where \({\varvec{{F}}}_i = (F_{i1},F_{i2})(i=1,\cdots , K)\) is the i-th IMF and \({\varvec{{R}}}_K = (R_{K1},R_{K2})\) is the corresponding residue.

We divide all IMFs and the residue into patches of size \(M \times M\) with N overlapping rows/columns. Figure 3 illustrates this overlapping patch partition scheme for a \(3 \times 6\) image, where \(M = 3\) and N is set to different values. If some pixels are not covered by the partition, as shown in Fig. 3c, we add patches to cover them (Fig. 4b). This overlapping patch strategy is designed to reduce the artifacts that may be produced around partition boundaries in patch-based fusion methods.
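The partition above, including the extra boundary patches of Fig. 4b, can be sketched as follows; the helper names `patch_starts` and `partition` are illustrative:

```python
def patch_starts(length, M, N):
    """1-D start indices for patches of size M with N overlapping
    rows/columns (stride M - N); an extra patch anchored at the image
    boundary covers any leftover pixels (cf. Fig. 4b)."""
    stride = M - N
    starts = list(range(0, length - M + 1, stride))
    if starts[-1] + M < length:        # uncovered boundary pixels remain
        starts.append(length - M)      # additional boundary-aligned patch
    return starts

def partition(H, W, M, N):
    """All (row, col) top-left corners of the overlapping partition."""
    return [(r, c) for r in patch_starts(H, M, N)
                   for c in patch_starts(W, M, N)]
```

For the \(3 \times 6\) example of Fig. 3 with \(M = 3\): \(N = 0\) gives column starts \([0, 3]\), \(N = 1\) gives \([0, 2, 3]\) (the last patch is the added boundary patch), and \(N = 2\) gives \([0, 1, 2, 3]\).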

4.2 Patch-based fusion for IMFs and residue

Fusion of IMFs. For the j-th patch \({\varvec{{F}}}_i^j= (F_{i1}^j,F_{i2}^j)\) of the i-th IMF \({\varvec{{F}}}_i\), we adopt an energy-based maximum selection rule to obtain the fused patch \(G_i^j\) by comparing the energies of the two corresponding patches as follows,

$$\begin{aligned} G_i^j = \left\{ \begin{array}{ll} F_{i1}^j, &{} E(F_{i1}^j) \ge E(F_{i2}^j)\\ F_{i2}^j, &{} E(F_{i1}^j) < E(F_{i2}^j)\\ \end{array} \right. . \end{aligned}$$
(6)

The energy of each patch in Eq. (6) is computed by

$$\begin{aligned} E(F_{ip}^j) = \sum _{(s,t)\in Z_j}{F_{ip}^j(s,t)^2},p=1,2, \end{aligned}$$
(7)

where \(Z_j\) represents the set of pixels in the j-th patch. This rule captures the salient information of the input source images.

Fusion of residue. For the j-th patch \({\varvec{{R}}}_K^j= (R_{K1}^j,R_{K2}^j)\) of the residue \({\varvec{{{R}}}}_K\), we design two different fusion rules based on the feature information extracted by the IMFs to obtain the fused residue patch \(H_K^j\), according to the image type (multi-focus or multi-modal images).

The first is an energy-based maximum selection rule, used to fuse multi-focus images. It compares the energies of the two corresponding patches in the first IMF as

$$\begin{aligned} H_K^j= \left\{ \begin{array}{ll} R_{K1}^j, &{} E(F_{11}^j) \ge E(F_{12}^j)\\ R_{K2}^j, &{} E(F_{11}^j) < E(F_{12}^j)\\ \end{array} \right. , \end{aligned}$$
(8)

where \(E(F_{1p}^j)(p=1,2)\) denotes the energy of the first IMF, which captures the finest-scale features. This scheme effectively identifies the focused regions of multi-focus images.
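The multi-focus rules of Eqs. (6)-(8) can be sketched together, assuming NumPy arrays for the patches; the function names are illustrative:

```python
import numpy as np

def patch_energy(F):
    """Energy of a patch (Eq. (7)): sum of squared values."""
    return float(np.sum(np.asarray(F, dtype=float) ** 2))

def fuse_multifocus_patch(imf_patches1, imf_patches2, R1, R2):
    """Fuse the j-th patch of all K IMFs (Eq. (6)) and of the residue
    (Eq. (8)) for two multi-focus images: each IMF patch is selected by
    its own energy, the residue patch by the energy of the first IMF."""
    fused_imfs = [F1 if patch_energy(F1) >= patch_energy(F2) else F2
                  for F1, F2 in zip(imf_patches1, imf_patches2)]
    fused_residue = (R1 if patch_energy(imf_patches1[0])
                           >= patch_energy(imf_patches2[0]) else R2)
    return fused_imfs, fused_residue
```

The residue follows the first IMF's decision because the finest-scale energy is the activity measure for the focused region.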

Fig. 3

Illustration of overlapping patch partitions for a \(3 \times 6\) image (a), where the patch size M is set to 3 and the overlapping row/column number N is set to 0 in (b), 1 in (c), and 2 in (d), respectively

Fig. 4

Processing of the boundary pixels not contained in the partition. a The partition in Fig. 3c. b The partition with an additional blue patch covering these boundary pixels

The second is an energy-based weighted averaging rule, used to merge multi-modal images. It uses the mean of the residue patch as a guide to fuse the brightness of each modal image, and adopts the feature information extracted by the IMFs as a guide to merge the mean-separated part of the residue. The fusion formula is given by

$$\begin{aligned} \begin{array}{ll} H_K^j = \sum \limits _{p=1}^2{a_p^j}\big (R_{Kp}^j-\mu _{Kp}^j\big ) + \sum \limits _{p=1}^2 {b_p^j} \mu _{Kp}^j, \\ a_{p}^j = \Big |\sum \limits _{i=1}^K{E\big (F_{ip}^j\big )}\Big |^l \Big / \Big (\Big |\sum \limits _{i=1}^K{E\big (F_{i1}^j\big )}\Big |^l + \Big |\sum \limits _{i=1}^K{E\big (F_{i2}^j\big )}\Big |^l\Big ),\\ b_{p}^j = | \mu _{Kp}^j |^m \big / \big (| \mu _{K1}^j |^m + | \mu _{K2}^j |^m \big ), \end{array} \end{aligned}$$
(9)

where \(\mu _{Kp}^j\) is the mean of the j-th patch of the residue \(R_{Kp}^j, p=1,2\), and l and m are two nonnegative exponent parameters that control the feature guide intensity and the brightness fusion intensity, respectively. If we set \(l = 0\) and \(m = 0\), the rule reduces to a simple averaging of the residues of the two multi-modal images. If we set \(l > 0\) and \(m > 0\), it adopts the feature information extracted by the IMFs as a guide to merge the mean-separated part of the residue of each modal image, and the mean of the residue patch as a guide to fuse the brightness of each modal image. A larger value of l means stronger features are transmitted to the fusion result, and a larger value of m means more bright targets are incorporated into the merged result. In our experiments, setting both l and m to 6 generates good results.
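A minimal sketch of the weighted averaging rule of Eq. (9), assuming NumPy arrays for the residue and IMF patches; the function name is illustrative:

```python
import numpy as np

def fuse_residue_multimodal(R1, R2, imf_patches1, imf_patches2, l=6, m=6):
    """Energy-based weighted averaging rule for the residue patch of two
    multi-modal images (Eq. (9)): the mean-separated parts are weighted by
    the accumulated IMF energies (exponent l), the patch means by their
    magnitudes (exponent m)."""
    mu1, mu2 = R1.mean(), R2.mean()
    E1 = sum(np.sum(F ** 2) for F in imf_patches1)   # total IMF energy, image 1
    E2 = sum(np.sum(F ** 2) for F in imf_patches2)   # total IMF energy, image 2
    a1 = E1 ** l / (E1 ** l + E2 ** l)               # feature guide weight a_1^j
    b1 = abs(mu1) ** m / (abs(mu1) ** m + abs(mu2) ** m)  # brightness weight b_1^j
    return (a1 * (R1 - mu1) + (1 - a1) * (R2 - mu2)
            + b1 * mu1 + (1 - b1) * mu2)
```

With \(l = m = 0\) the weights collapse to 1/2 and the rule reduces to a simple average of the two residue patches, as stated in the text.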

4.3 Multi-scale image reconstruction

Once all patches of the IMFs and residue are fused, we first obtain the value at each pixel (x, y) of the fused IMFs and residue by averaging the values of the pixel (x, y) over all overlapping patches by

$$\begin{aligned} \begin{array}{ll} G'_i(x,y)&{}=\frac{1}{S(x,y)}\sum _{j}{G_i^j(x,y)},\\ H'_K(x,y)&{}=\frac{1}{S(x,y)}\sum _{j}{H_K^j(x,y)}, \end{array} \end{aligned}$$
(10)

where S(x, y) denotes the number of overlapping patches at the pixel (x, y); we then add the fused IMFs and the fused residue together to obtain the final fused image \(I'\) by

$$\begin{aligned} I'(x,y) = \sum _{i=1}^K{G'_i}(x,y) + H'_K(x,y). \end{aligned}$$
(11)
Fig. 5

Fusion results of two greyscale multi-focus images generated by the pixel-based strategy and our patch-based strategy. a Source image 1. b Source image 2. c The pixel-based strategy based on our energy-based maximum selection rule. d Our fusion strategy based on overlapping patch division

4.4 Implementation

We implement our newly proposed EMD-based image fusion method efficiently by accumulating the fused value at each pixel patch by patch (Algorithm 2). Specifically, we first initialize the fused IMFs, the fused residue, and the overlapping patch count at each pixel (x, y) by \(G'_i(x,y)|_{i=1,...,K} = 0\), \(H'_K(x,y) = 0\), and \(S(x,y) = 0\). After each patch is merged, we update the fused values by

$$\begin{aligned} G'_i(x,y) \rightarrow G'_i(x,y) + G_i^j(x,y), \end{aligned}$$
(12)

and

$$\begin{aligned} H'_K(x,y) \rightarrow H'_K(x,y) + H_K^j(x,y), \end{aligned}$$
(13)

and update the overlapping patch number at each pixel of the j-th patch by

$$\begin{aligned} S(x,y) \rightarrow S(x,y) + 1. \end{aligned}$$
(14)

Once all patches of the K IMFs and the residue are fused, we get the final fused IMFs and residue at each pixel by

$$\begin{aligned} \begin{array}{ll} G'_i(x,y)&{}\rightarrow \frac{1}{S(x,y)}G'_i(x,y),\\ H'_K(x,y)&{}\rightarrow \frac{1}{S(x,y)}H'_K(x,y). \end{array} \end{aligned}$$
(15)
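The accumulation scheme of Eqs. (12)-(15) can be sketched as follows, assuming NumPy arrays and (row, column) top-left patch corners; the function name is illustrative:

```python
import numpy as np

def accumulate_fused(shape, corners, fused_patches):
    """Accumulation scheme of Eqs. (12)-(15): add each fused patch into a
    running sum, count the overlapping patches per pixel, then divide.
    `corners` are (r, c) top-left positions, `fused_patches` the fused
    patches in the same order."""
    total = np.zeros(shape)
    count = np.zeros(shape)
    for (r, c), P in zip(corners, fused_patches):
        h, w = P.shape
        total[r:r + h, c:c + w] += P      # Eqs. (12)-(13): accumulate values
        count[r:r + h, c:c + w] += 1      # Eq. (14): accumulate patch counts
    return total / count                   # Eq. (15): average the overlaps
```

Run once per fused IMF and once for the fused residue; the results are then summed as in Eq. (11).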

Our fusion strategy based on overlapping patch partition reduces the spatial artifacts introduced by the noisy characteristics of the pixel-wise maps in pixel-based fusion methods. Figure 5 compares these two fusion strategies based on our MF-MBEMD, where the pixel-based scheme obtains the fused IMFs and the fused residue pixel by pixel by comparing the IMFs' energies of the two patches centered at each pixel according to Eqs. (6) and (8), respectively. It can be seen that the pixel-based fusion scheme produces some spatial artifacts in Fig. 5c, while none are observed with our fusion strategy based on overlapping patch division (Fig. 5d).


5 Experiments and discussions

In this section, we first introduce the fusion image data sets used in our experiments and the objective metrics for evaluating the algorithms. Then, we discuss how to select the main parameters of our method. Afterwards, we make various comparisons to further illustrate the effectiveness of the proposed algorithm. All experiments in this paper are carried out with MATLAB 2019 on a laptop with an Intel Core i7 CPU and 16.0 GB RAM.

Fig. 6

Four data sets in our experiments. a Greyscale multi-focus set. b Color multi-focus set. c Medical multi-modal set. d Infrared-visible multi-modal set

Fig. 7

Fusion results with different overlapping numbers N of rows/columns. a Source image 1. b Source image 2. c \(N = 0\). d \(N = M/6\). e \(N = M/2\). f \(N = M-2\). M is set to 31 in the first row and to 15 in the second row

Fig. 8

The average values of ten fusion metrics for four data sets with the patch size varying from 1 to 50. a Greyscale multi-focus set. b Color multi-focus set. c Medical multi-modal set. d Infrared-visible multi-modal set

Fig. 9

Comparison of EMD-based image fusion methods on two multi-focus images (a). The top results in (b), (c), (d) and (e) are generated in a pixel-based manner using SP-MBEMD [10] with the fusion rule in [10], BL-MEMD [16] with the fusion rule in [8], OSF-CBEMD [23] with the fusion rule in [4], and the proposed MF-MBEMD with the fusion rule in [9], respectively. The bottom results in (b), (c), (d) and (e) are obtained in a patch-based manner by the corresponding EMD methods in the top row and our fusion strategy

5.1 Data sets and objective metrics

Many widely used multi-focus and multi-modal images from the literature are selected to assess fusion performance (Fig. 6). The multi-focus images are divided into two data sets: a greyscale set with 10 pairs of greyscale images and a color set with 20 pairs of color images, which were used to evaluate the state-of-the-art methods in a recent survey of multi-focus image fusion [25]. The multi-modal images are also divided into two data sets: a medical set with 16 pairs of medical images and an infrared-visible set with 16 pairs of infrared-visible images, which cover almost all multi-modal images tested in [26, 30, 34].

Fig. 10

Comparison of EMD-based image fusion methods on two multi-focus images (a). The top results in (b), (c), (d) and (e) are generated in a pixel-based manner using SP-MBEMD [10] with the fusion rule in [10], BL-MEMD [16] with the fusion rule in [8], OSF-CBEMD [23] with the fusion rule in [4], and the proposed MF-MBEMD with the fusion rule in [9], respectively. The bottom results in (b), (c), (d) and (e) are obtained in a patch-based manner by the corresponding EMD methods in the top row and our fusion strategy

To measure the fusion quality of our method objectively, ten widely used fusion metrics are employed: mutual information \(Q_{MI}\) [25], feature mutual information \(Q_{FMI}\) [35], nonlinear correlation information entropy \(Q_{NCIE}\) [25], the gradient-based metric \(Q_G\) [36], the phase congruency-based metric \(Q_P\) [25], the structural similarity-based metrics \(Q_E\) [25] and \(Q_Y\) [37], the human perception-inspired metrics \(Q_{CB}\) and \(Q_{CV}\) [25], and visual information fidelity \(Q_{VIF}\) [38]. Larger values indicate better fusion quality for \(Q_{MI}\), \(Q_{FMI}\), \(Q_{NCIE}\), \(Q_G\), \(Q_P\), \(Q_E\), \(Q_Y\), \(Q_{CB}\), and \(Q_{VIF}\), while smaller values indicate better quality for \(Q_{CV}\).

5.2 Selection of main parameters

Decomposition level K of MF-MBEMD. We use the energy-based maximum selection rule to fuse all IMFs in order to extract the salient information of the source images. The decomposition level K of MF-MBEMD is set to 1 for multi-focus images, since the first IMF effectively captures the focused regions of the multi-focus images, while K is set to 2 for multi-modal images, considering that the salient information of the source images is concentrated in the leading two IMFs of MF-MBEMD on the tested multi-modal image sets.

Overlapping number N of rows/columns. A larger overlapping number N of rows/columns in the patch division usually reduces the artifacts of our method, as shown in Fig. 7, but it increases the computational cost at the same time. In our experiments, N is set to M/6 for the multi-focus data sets and to \(M-2\) for the multi-modal data sets, where M is the block size of the division. Extensive experiments demonstrate that these choices produce satisfactory fusion results.

Block size M of the division. To select a proper M, we conduct the fusion of each data set with M varying from 1 to 50 and evaluate the fusion performance by the average values of the ten fusion metrics (Fig. 8), where the overlapping number N of rows/columns is set to M/6 for the multi-focus data sets and to \(M-2\) for the multi-modal data sets. In our experiments, we obtained good results with \(M = 31\) for the greyscale multi-focus set, \(M = 15\) for the color multi-focus set, \(M = 15\) for the medical multi-modal set, and \(M = 33\) for the infrared-visible multi-modal set.

5.3 Comparisons

Comparison with EMD-based image fusion methods. Two factors determine the fusion quality of EMD-based image fusion methods: the EMD method and the fusion strategy employed. To demonstrate the advantage of our newly proposed EMD-based image fusion method, we compare the fusion results generated by SP-MBEMD [10], BL-MEMD [16], OSF-CBEMD [23], and the proposed MF-MBEMD, each combined with the popular pixel-based fusion strategies [4, 8,9,10] and with the patch-based strategy proposed in this paper (Figs. 9, 10 and Table 2). The pixel-wise weighted averaging rules based on variance [10], standard deviation [8], and entropy [9] cannot capture the salient information of the input source images well and produce fuzzy fusion results, as shown at the top of Fig. 9b, c, e. In contrast, the pixel-wise variance-based maximum selection rule [4] obtains much clearer results, but it also generates some artifacts, as shown at the top of Figs. 9d and 10d. Compared with the pixel-based fusion strategies in the existing EMD-based image fusion methods [4, 8,9,10], the fusion strategy based on overlapping patch division proposed in this paper improves the fusion quality of each EMD method both visually and in objective metrics (Figs. 9, 10 and Table 2), since the energy-based maximum selection rule for the fusion of each IMF captures more salient information, while the patch-based strategy reduces the artifacts introduced by pixel-wise fusion. Furthermore, for the fusion of the residue, the activity level generated by the first IMF captures the focused regions of the multi-focus images more effectively, and the energy-based weighted averaging rule driven by the extracted IMFs captures more structural information of the multi-modal images. The visual results of the four BEMD methods under our fusion strategy are close, as observed in Figs. 9 and 10.
When considering all fusion objective metrics together (Table 2), MF-MBEMD and OSF-CBEMD [23] generate better results than SP-MBEMD [10] and BL-MEMD [16]. In terms of time performance, SP-MBEMD [10] is extremely time-consuming, while MF-MBEMD is the fastest of the four EMD methods, as reported in Table 2.
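To make the patch-based strategy concrete, the following sketch fuses one IMF pair with an energy-based maximum selection rule over overlapping patches and accumulates the selected values pixel by pixel. This is our own simplified illustration, not the reference implementation; the patch size `M`, step `N`, and function name are illustrative assumptions.

```python
import numpy as np

def fuse_imf_energy_max(imf_a, imf_b, M=8, N=4):
    """Illustrative sketch: fuse one IMF pair by selecting, for each
    overlapping M-by-M patch, the patch with larger energy (sum of
    squared coefficients), then normalizing by per-pixel coverage."""
    h, w = imf_a.shape
    acc = np.zeros((h, w))   # accumulated fused values
    cnt = np.zeros((h, w))   # number of patches covering each pixel
    for i in range(0, h, N):
        for j in range(0, w, N):
            pa = imf_a[i:i + M, j:j + M]
            pb = imf_b[i:i + M, j:j + M]
            # energy-based maximum selection rule
            chosen = pa if (pa ** 2).sum() >= (pb ** 2).sum() else pb
            acc[i:i + M, j:j + M] += chosen
            cnt[i:i + M, j:j + M] += 1
    return acc / np.maximum(cnt, 1)
```

Because each pixel is covered by several overlapping patches, the final division by the coverage count smooths patch-boundary transitions, which is what suppresses the blocking artifacts that a pixel-wise or non-overlapping scheme would exhibit.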

Table 2 Comparison of EMD-based image fusion methods on the objective metrics and the average running time for the data sets in Fig. 6
Fig. 11

Comparison with non-EMD-based methods for the fusion of two color multi-focus images. a Source image 1. b Source image 2. c CVT [29]. d DTCWT [33]. e NSCT [28]. f NSCT-SR [30]. g BRW [39]. h MADCNN [40]. i IFCNN [34]. j Ours

Fig. 12

Comparison with non-EMD-based methods for the fusion of an infrared image and a visible image. a Source image 1. b Source image 2. c CVT [29]. d DTCWT [33]. e NSCT-SR [30]. f NSCT-PCLE [41]. g MISF [42]. h MSID [43]. i IFCNN [34]. j Ours

Comparison with non-EMD-based image fusion methods. We also compare our method with many state-of-the-art non-EMD-based methods, including six transform domain methods (CVT [29], DTCWT [33], NSCT [28], NSCT-SR [30], NSCT-PCLE [41], IMA [26]), three spatial domain methods (BRW [39], MISF [42], MSID [43]), and two deep learning methods (MADCNN [40], IFCNN [34]). The source code of these methods is publicly available online. Among them, CVT [29], DTCWT [33], NSCT-SR [30], and IFCNN [34] can be used for various types of images; NSCT [28], BRW [39], and MADCNN [40] are designed for multi-focus images; and NSCT-PCLE [41], MISF [42], IMA [26], and MSID [43] are developed for multi-modal images. We therefore conduct the comparison experiments so as to respect the preferred image types of each method. Fig. 11 shows the fusion results of the compared methods for two color multi-focus images. It can be observed that CVT [29], DTCWT [33], NSCT [28], NSCT-SR [30], and BRW [39] suffer from a visible ghosting artifact around the hat, while MADCNN [40], IFCNN [34], and our method reduce this effect. Figure 12 gives the fusion results of the compared methods for an infrared image and a visible image. It can be seen that NSCT-SR [30], NSCT-PCLE [41], and MISF [42] produce some obvious artifacts; CVT [29] and DTCWT [33] cannot incorporate the bright target of the infrared image into the fusion result well; and IFCNN [34] loses some detail features of the visible image. In contrast, MSID [43] and our method obtain better visual results than the other methods. Table 3 lists the objective metrics of the compared fusion methods on the tested data sets in Fig. 6. Our method and BRW [39] have an advantage on most objective metrics for the greyscale and color multi-focus sets, respectively.
As for multi-modal images, MISF [42] obtains the best performance on most objective metrics, while our method compares favorably with the other methods when all metrics are considered together. In terms of time performance, our method is much faster for the fusion of multi-focus images than the other methods whose running times are listed in Table 3. Although the efficiency of our method decreases for the fusion of multi-modal images, since a larger overlapping number is used, it still needs only around one second on average to complete the fusion of two input images.

Table 3 Comparison with non-EMD-based image fusion methods on the objective metrics and the average running time for the data sets in Fig. 6

6 Conclusion

In this paper, we have presented a novel and fast EMD-based image fusion method via morphological filters that generates high-quality fusion images. A multi-channel bidimensional EMD method based on morphological filters (MF-MBEMD) is first developed to decompose the input images into several IMFs with different scales and a residue; it uses morphological dilation and erosion filters to compute the upper and lower envelopes of a multi-channel image, which significantly improves the computation efficiency of EMD-based image fusion techniques. Then, a patch-based fusion strategy with overlapping partition is adopted in place of the pixel-based fusion commonly used in EMD-based image fusion, where an energy-based maximum selection rule is designed to fuse the IMFs, and the feature information extracted from the IMFs guides the merging of the residue. Finally, the fusion result is generated by adding all fused IMFs and the fused residue together. Our method can be implemented efficiently by accumulating the fused values at each pixel patch by patch. The performance evaluation of EMD-based image fusion methods on several commonly used data sets with multi-focus and multi-modal images shows that our newly proposed image fusion method obtains better results. Furthermore, a large number of comparative experiments have demonstrated that our method is very competitive with the state-of-the-art image fusion methods in visual quality, objective metrics, and time performance.
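The envelope computation at the heart of this decomposition can be illustrated with a minimal sketch of one sifting step, assuming grey-scale morphological dilation/erosion followed by an averaging filter as the envelope estimator. The window size, smoothing choice, and function name are our assumptions for illustration, not the exact MF-MBEMD formulation.

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion, uniform_filter

def sift_once(img, win=7):
    """One illustrative sifting step: estimate the upper/lower
    envelopes with morphological dilation/erosion (smoothed by an
    averaging filter), then subtract the envelope mean. Returns the
    proto-IMF and the local mean (which is sifted further or kept
    as the residue in a full EMD)."""
    upper = uniform_filter(grey_dilation(img, size=win), size=win)
    lower = uniform_filter(grey_erosion(img, size=win), size=win)
    mean_env = (upper + lower) / 2.0
    return img - mean_env, mean_env
```

Compared with surface interpolation through scattered extrema, these filters run in time roughly linear in the number of pixels, which is the source of the efficiency gain claimed above.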

The performance of our method can still be improved in the following aspects. Our method reduces the artifacts in the fusion of the boundary regions of multi-focus images, as illustrated in Fig. 11. However, the fusion of boundary regions remains a challenge, especially for regions with irregular shapes, and is an open problem in multi-focus image fusion, as discussed in a recent survey of this topic [25]. In addition, our method treats each patch separately in the fusion processing, which results in the loss of detail features in the fusion of multi-modal images. The relationships among different patches should be considered in order to further improve the fusion quality.