1 Introduction

Image fusion is an effective tool in much theoretical research and many practical applications [1, 2, 9, 15]. Because of the limitations of imaging mechanisms, a single imaging sensor can hardly provide a comprehensive presentation of a scene. Consequently, image fusion can be employed to combine images captured by several sensors from the same scene into one informative image. The attractive feature of image fusion is that the fused image is more valuable than the sum of the source images, because image fusion can extract complementary information from each source image and eliminate redundant information.

Recent years have witnessed an increase in the number of image fusion algorithms, among which multi-scale image fusion accounts for a large percentage. Such algorithms first decompose the source images into several layers with different morphological components, then design different fusion rules for the different layers, and finally reconstruct the fused coefficients to get the fused image. Until now, many multi-scale image representation models have been utilized in image fusion. The pyramid family (including the Laplacian, contrast, and ratio pyramids) was widely used in the early stage. Then wavelets were brought into this field [9, 15]. As wavelets are well established in the scientific community, with a strict theoretical foundation and more directional information, they perform better than pyramid-based algorithms. Recently, several geometric extensions of wavelets have been introduced into image fusion, such as the Ripplet transform [2], Ridgelet transform [1], Shearlet transform [14], Curvelet transform [23], and Contourlet transform [10, 22].

However, multi-scale image representation models use fixed filters for any image, so there is no guarantee that the corresponding filters are the optimal ones to represent a given image. A better alternative is adaptive image representation, where the filters are generated according to the characteristics of the images. In [7], Huang et al. proposed a new signal analysis method based on empirical mode decomposition (EMD), whose aim is to find narrow-band oscillatory components called intrinsic mode functions (IMFs). The EMD method is fully driven by the signal and needs no prior assumptions. The method has been applied to image fusion [6, 13]. However, there are two basic difficulties in applying EMD in this field. The first is its lack of a mathematical theory; the second is that it does not ensure that the decompositions of different source images are matched (i.e., that the corresponding sub-bands of all the source images share the same morphological components), which makes multi-scale comparison meaningless [13]. In [3], the authors presented a new signal analysis method called the empirical wavelet transform (EWT). It can separate an input signal into amplitude modulated-frequency modulated (AM-FM) components by building an adaptive wavelet tight frame. Compared with EMD, EWT possesses a strict theoretical foundation and hence performs better than EMD in theory. However, as EWT generates wavelets according to the data to be processed, it cannot guarantee the same wavelets for all the source images, and consequently different source images may be projected onto different basis sets. As a result, it is nearly impossible to design proper fusion rules.

With the aim of solving the problems mentioned above, we propose a straightforward extension of EWT named the Simultaneous Empirical Wavelet Transform (SEWT). With it, all the source images can be projected onto the same basis set, which is generated according to the source images themselves. Based on SEWT, we further propose a novel image fusion method, abbreviated as SEWTF.

The remainder of the paper is organized as follows: in Section 2, the theory of SEWT is explained; in Section 3, the proposed image fusion algorithm is presented in detail; Section 4 presents the experimental results and discussions; finally, conclusions are drawn in Section 5.

2 Simultaneous empirical wavelet transform

Wavelets are useful tools for signal (image) analysis. However, their basis sets are fixed, which makes it impossible to design a filter that is optimal for representing any given signal, and consequently affects the results of signal analysis. It is therefore desirable to design different filter sets according to the signal to be processed and to decompose each signal with its optimal filter set. In [3], the authors did pioneering work by proposing EWT, with the aim of extracting the amplitude modulated-frequency modulated components of an input signal. EWT builds a wavelet tight frame corresponding to an adaptive filter bank.

However, signal decomposition is widely used in cases where several signals need to be processed at the same time, for example, signal comparison, image fusion, and the like. In such cases, EWT cannot be used directly, because its filter bank is not fixed but determined by the processed signal. Different signals may produce quite different filter banks, and these signals will then unavoidably be projected into different wavelet spaces. An adaptively designed filter bank may be an attractive feature of EWT in some cases, but it causes trouble in the case of simultaneous signal decomposition.

In order to address the problem mentioned above, we present a straightforward revision of EWT named the Simultaneous Empirical Wavelet Transform (SEWT). The algorithm uses the same filter bank for all the signals that need to be processed at the same time. In this way, the signals can be projected onto the same wavelet basis without losing the adaptivity of the filter bank. 1D SEWT and 2D SEWT are presented in subsections 2.1 and 2.2, respectively.

2.1 1D SEWT

The traditional EWT algorithm contains three steps [4]: first, the signal to be processed is transformed into the Fourier domain; then the Fourier supports are detected and a tight frame of wavelets is built on them; finally, the signal is decomposed with the assistance of these empirical wavelets. In order to get the same tight frame of wavelets for several signals, we need to find Fourier supports based on all the signals rather than on one single signal.

Let \( f_i(t) \) \( (i=1,\dots,K) \) denote the signals to be processed, and \( {\mathfrak{F}}_t\left({f}_i\right)\left(\omega \right) \) denote the Fourier transform of \( f_i(t) \), namely \( {\mathfrak{F}}_t\left({f}_i\right)\left(\omega \right)={\int}_{-\infty}^{+\infty }{f}_i(t){e}^{-i\omega t}dt \). Based on \( \left|{\mathfrak{F}}_t\left({f}_i\right)\left(\omega \right)\right| \), we can detect the Fourier supports, which cut the spectrum into adjacent intervals. The Fourier boundaries of each signal are detected by finding the lowest local minima. For the Fourier spectrum \( {\mathfrak{F}}_t\left({f}_i\right)\left(\omega \right) \), assume that it contains \( N_i \) local maxima \( {\nabla}_i^n \) \( (n=1,\dots,N_i) \), located at \( {\widehat{\omega}}_i^n \) \( (n=1,\dots,N_i) \). Then there is at least one local minimum, denoted \( {\varDelta}_i^n \), in each interval \( [{\widehat{\omega}}_i^{n-1},{\widehat{\omega}}_i^n] \). The locations \( {\omega}_i^n \) \( (n=0,\dots,N_i) \) of these local minima are given by

$$ {\omega}_i^n=\left\{\begin{array}{ll}0, & n=0\\ \underset{\omega }{ \arg \min }\;{\varDelta}_i^n, & 0<n<{N}_i\\ \pi, & n={N}_i\end{array}\right. $$
(1)

From the procedure discussed above, it is straightforward to see that different signals may generate quite different Fourier supports. However, in order to generate the same tight frame for all of them, we need a unique boundary set; here we take the union \( \omega ={\displaystyle \underset{i=1}{\overset{K}{\cup }}{\omega}_i} \). Based on the set \( \omega \), we can define a wavelet tight frame \( B=\left\{{\phi}_1(t),{\left\{{\varphi}_m(t)\right\}}_{m=1}^{M-1}\right\} \), where \( {\phi}_1(t) \) denotes the scaling function and \( {\left\{{\varphi}_m(t)\right\}}_{m=1}^{M-1} \) denote the empirical wavelets. Following the construction in [4], based on Meyer's wavelets and the Littlewood-Paley decomposition, we can get their Fourier transforms, denoted \( \left\{{\mathfrak{F}}_t\left({\phi}_1\right)\left(\omega \right),{\left\{{\mathfrak{F}}_t\left({\varphi}_m\right)\left(\omega \right)\right\}}_{m=1}^{M-1}\right\} \).

Once the wavelet tight frame is determined, we can get the approximation layer of each signal

$$ \mathcal{W}\left({f}_i\right)\left(0,t\right)={\mathfrak{F}}_{\omega}^{-1}\left({\mathfrak{F}}_t\left({f}_i\right)\left(\omega \right)\overline{{\mathfrak{F}}_t\left({\phi}_1\right)\left(\omega \right)}\right) $$
(2)

and the detail layers for each signal

$$ \mathcal{W}\left({f}_i\right)\left(m,t\right)={\mathfrak{F}}_{\omega}^{-1}\left({\mathfrak{F}}_t\left({f}_i\right)\left(\omega \right)\overline{{\mathfrak{F}}_t\left({\varphi}_m\right)\left(\omega \right)}\right), $$
(3)

where \( {\mathfrak{F}}_{\omega}^{-1} \) denotes the inverse Fourier Transform.

The inverse 1D Simultaneous Empirical Wavelet Transform can be defined by

$$ {f}_i(t)=\mathcal{W}\left({f}_i\right)\left(0,t\right)\circ \left({\phi}_1\right)(t)+{\displaystyle \sum_{m=1}^{M-1}\mathcal{W}\left({f}_i\right)\left(m,t\right)\circ {\varphi}_m(t)} $$
(4)

where \( \circ \) denotes the convolution operator.
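To make the procedure above concrete, the following is a minimal NumPy sketch of 1D SEWT. It is only an illustration under simplifying assumptions, not the reference implementation of [4]: boundary detection keeps a fixed number of dominant spectral maxima and takes the lowest minimum between consecutive ones, the shared boundary set is the union over all signals as described above, and ideal (brick-wall) band-pass filters stand in for the smooth Meyer/Littlewood-Paley windows. All function names and the num_bands parameter are ours.

```python
import numpy as np
from scipy.signal import argrelextrema


def detect_boundaries(f, num_bands=4):
    """Fourier boundaries of one signal: the lowest local minimum between
    consecutive dominant maxima of its magnitude spectrum (cf. Eq. (1))."""
    spec = np.abs(np.fft.rfft(f))
    maxima = argrelextrema(spec, np.greater)[0]
    # keep the num_bands largest maxima (a simple heuristic of ours)
    maxima = np.sort(maxima[np.argsort(spec[maxima])[-num_bands:]])
    bounds = [lo + int(np.argmin(spec[lo:hi + 1]))
              for lo, hi in zip(maxima[:-1], maxima[1:])]
    nyquist = len(spec) - 1
    return [b * np.pi / nyquist for b in bounds]    # normalised to (0, pi)


def shared_boundaries(signals, num_bands=4):
    """Union of the per-signal boundaries, so that every signal is
    decomposed on the same (simultaneous) filter bank."""
    omega = sorted({w for f in signals for w in detect_boundaries(f, num_bands)})
    return [0.0] + omega + [np.pi]


def sewt_1d(f, omega):
    """Decompose one signal on the shared supports [omega_m, omega_{m+1}]
    with ideal band-pass filters (brick-wall stand-ins for Meyer windows)."""
    F = np.fft.rfft(f)
    grid = np.linspace(0.0, np.pi, len(F))
    layers = []
    for j, (lo, hi) in enumerate(zip(omega[:-1], omega[1:])):
        last = j == len(omega) - 2
        mask = (grid >= lo) & ((grid <= hi) if last else (grid < hi))
        layers.append(np.fft.irfft(F * mask, n=len(f)))
    return layers                                    # layers[0] ~ approximation


def isewt_1d(layers):
    """Inverse transform: with ideal filters the layers simply sum back."""
    return np.sum(layers, axis=0)


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 1024, endpoint=False)
    f1 = np.cos(2 * np.pi * 5 * t) + 0.5 * np.cos(2 * np.pi * 60 * t)
    f2 = np.cos(2 * np.pi * 7 * t) + 0.3 * np.cos(2 * np.pi * 90 * t)
    omega = shared_boundaries([f1, f2])              # same basis for f1 and f2
    layers1 = sewt_1d(f1, omega)
    print(len(layers1), np.allclose(isewt_1d(layers1), f1))
```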

2.2 2D SEWT

In the real world, we often need to process 2D signals, so it is of great significance to extend the newly proposed SEWT to 2D. One straightforward option is to reshape each image into 1D row by row or column by column, perform 1D SEWT on the reshaped signal, and finally map the 1D approximation and detail layers back into 2D. However, such a process may lose vital spatial information among neighboring pixels. Here, the 2D Littlewood-Paley wavelet transform is used instead. The transform can be regarded as a filter bank, obtained in the Fourier domain on annuli supports, for filtering images.

Let \( I_i\left(\mathbf{p}\right) \) \( (i=1,\dots,N_i) \) denote the images (2D signals) to be processed, with \( \mathbf{p} \) representing pixel locations, and let \( {\mathfrak{F}}_P\left({I}_i\right)\left(\theta, \left|\omega \right|\right) \) denote their Pseudo-Polar Fourier Transform, where \( \theta \) refers to the angle. For any image \( I_i \), each \( \theta \) produces a spectrum \( {\mathfrak{F}}_P\left({I}_i\right)\left(\theta, \left|\omega \right|\right) \). Before detecting the Fourier boundaries of each image, we compute the average spectrum with respect to \( \theta \), namely

$$ \mathfrak{F}\left({I}_i\right)\left(\left|\omega \right|\right)=\frac{1}{N_{\theta }}{\displaystyle \sum_{k=1}^{N_{\theta }}{\mathfrak{F}}_P\left({I}_i\right)\left({\theta}_k,\left|\omega \right|\right)}. $$
(5)

The Fourier spectrum indirectly determines the empirical Littlewood-Paley wavelets, so it should take into account all the images that need to be decomposed. Here, we define an average-average spectrum with respect to \( i \):

$$ \mathfrak{F}\left(\left|\omega \right|\right)=\frac{1}{N_i}{\displaystyle \sum_{i=1}^{N_i}\mathfrak{F}\left({I}_i\right)\left(\left|\omega \right|\right)=\frac{1}{N_i{N}_{\theta }}{\displaystyle \sum_{i=1}^{N_i}{\displaystyle \sum_{l=1}^{N_{\theta }}{\mathfrak{F}}_P\left({I}_i\right)\left({\theta}_l,\left|\omega \right|\right)}}}. $$
(6)
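As a rough illustration of Eqs. (5) and (6), the sketch below approximates the angular average of each image's spectrum by sampling the ordinary 2D FFT magnitude along rays from the DC bin. A faithful implementation would use a true pseudo-polar Fourier transform, so this nearest-neighbour resampling is our simplifying assumption, and the function names are ours.

```python
import numpy as np


def average_spectrum(image, n_theta=180):
    """Approximate Eq. (5): average the 2D magnitude spectrum over angles by
    nearest-neighbour sampling along rays that start at the DC bin."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    rows, cols = spec.shape
    cy, cx = rows // 2, cols // 2
    radii = np.arange(min(cy, cx))
    acc = np.zeros(radii.size)
    for theta in np.linspace(0.0, np.pi, n_theta, endpoint=False):
        r = np.clip(np.round(cy + radii * np.sin(theta)).astype(int), 0, rows - 1)
        c = np.clip(np.round(cx + radii * np.cos(theta)).astype(int), 0, cols - 1)
        acc += spec[r, c]
    return acc / n_theta            # 1D spectrum as a function of |omega|


def average_average_spectrum(images, n_theta=180):
    """Eq. (6): average the per-image spectra so that boundary detection,
    and hence the shared wavelet set, depends on all source images at once."""
    return np.mean([average_spectrum(im, n_theta) for im in images], axis=0)
```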

Fourier boundary detection can then be conducted on the average-average spectrum in Eq. (6), and the detected boundaries are denoted \( {\omega}^m \) \( (m=1,\dots,M) \). Based on them, we build a set of 2D Simultaneous Empirical Wavelets \( B=\left\{{\phi}_1\left(\mathbf{p}\right),{\left\{{\varphi}_m\left(\mathbf{p}\right)\right\}}_{m=1}^{M-1}\right\} \), where \( {\phi}_1\left(\mathbf{p}\right) \) denotes the 2D scaling function and \( {\left\{{\varphi}_m\left(\mathbf{p}\right)\right\}}_{m=1}^{M-1} \) denote the 2D empirical wavelets. The approximation layer of image \( I_i\left(\mathbf{p}\right) \) is defined as

$$ \mathcal{W}\left({I}_i\right)\left(0,\mathbf{p}\right)={\mathfrak{F}}_P^{-1}\left({\mathfrak{F}}_P\left({I}_i\right)\left(\omega \right)\overline{{\mathfrak{F}}_P\left({\phi}_1\right)\left(\omega \right)}\right), $$
(7)

and the detail layers of image I i (p) are defined as

$$ \mathcal{W}\left({I}_i\right)\left(m,\mathbf{p}\right)={\mathfrak{F}}_P^{-1}\left({\mathfrak{F}}_P\left({I}_i\right)\left(\omega \right)\overline{{\mathfrak{F}}_P\left({\varphi}_m\right)\left(\omega \right)}\right). $$
(8)

The inverse 2D Simultaneous Empirical Wavelet Transform can be defined by

$$ {I}_i\left(\mathbf{p}\right)=\mathcal{W}\left({I}_i\right)\left(0,\mathbf{p}\right)\circ \left({\phi}_1\right)\left(\mathbf{p}\right)+{\displaystyle \sum_{m=1}^{M-1}\mathcal{W}\left({I}_i\right)\left(m,\mathbf{p}\right)\circ {\varphi}_m\left(\mathbf{p}\right)}. $$
(9)

Here, \( \circ \) denotes the 2D convolution operator.

EWT in [4] and the SEWT presented in this paper can be regarded as extensions of the traditional Wavelet Transform (WT), and they share the same mathematical foundation. The difference between EWT/SEWT and WT is that their wavelet functions are not fixed for all images but are generated according to the signals to be processed. Hence, EWT and SEWT inherit the desirable properties of WT, including linearity, shift invariance, multi-scale analysis, the tight-frame property, and so on.

3 The proposed fusion algorithm

3.1 Framework

The image fusion method based on the 2D Simultaneous Empirical Wavelet Transform (SEWTF) is summarized in Fig. 1. The proposed SEWTF algorithm contains no registration step, so it is assumed that the source images are well registered. If the source images are not geometrically aligned in practice, users need to pre-process them with registration methods such as those in [12, 18] before running this fusion algorithm.

Fig. 1 Framework of the image fusion method based on 2D Simultaneous Empirical Wavelet Transform

In this paper, let \( I_A \) and \( I_B \) denote the well-registered source images of size M × N, and let \( I_F \) denote the fused image. The fusion framework takes the following three steps:

(1) Based on the two source images \( I_A \) and \( I_B \), build a set of 2D Simultaneous Empirical Wavelets \( B=\left\{{\phi}_1,{\left\{{\varphi}_i\right\}}_{i=1}^{L}\right\} \). Decompose the source images \( I_A \) and \( I_B \) by 2D SEWT into the subbands \( {\mathcal{W}}_A=\left\{{\mathcal{W}}_A^0,{\mathcal{W}}_A^1,\cdots, {\mathcal{W}}_A^L\right\} \) and \( {\mathcal{W}}_B=\left\{{\mathcal{W}}_B^0,{\mathcal{W}}_B^1,\cdots, {\mathcal{W}}_B^L\right\} \) using Eqs. (7) and (8), respectively. Here, \( {\mathcal{W}}_{*}^0 \) refers to the approximation layer and \( {\mathcal{W}}_{*}^1,\cdots, {\mathcal{W}}_{*}^L \) represent the detail layers.

(2) With the assistance of the designed fusion rules, the approximation layer and the detail layers are combined to get the fused layers \( {\mathcal{W}}_F=\left\{{\mathcal{W}}_F^0,{\mathcal{W}}_F^1,\cdots, {\mathcal{W}}_F^L\right\} \). Let \( r=\left\{{r}_0,{\left\{{r}_i\right\}}_{i=1}^{L}\right\} \) denote the fusion map set, where \( r_0 \) is the fusion map for the approximation layer and \( {\left\{{r}_i\right\}}_{i=1}^{L} \) are the ones for the detail layers. Then we have

$$ {\mathcal{W}}_F^0={r}_0{\mathcal{W}}_A^0+\left(1-{r}_0\right){\mathcal{W}}_B^0 $$
(10)

and

$$ {\mathcal{W}}_F^i={r}_i{\mathcal{W}}_A^i+\left(1-{r}_i\right){\mathcal{W}}_B^i,\kern2em i=1,\dots, L $$
(11)
(3) Conduct the inverse 2D SEWT on \( {\mathcal{W}}_F \) to obtain the final fused image \( I_F \) using Eq. (9).

In the proposed fusion algorithm, the 2D Simultaneous Empirical Wavelet Transform plays a vital role. It decomposes each source image into several subbands with different morphological components. In addition, the 2D wavelets are generated adaptively according to the source images themselves, which guarantees that the wavelets are the optimal ones for decomposing these images. Note that each group of source images generates its own tight frame of wavelets.
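The three steps above can be wired together as in the following sketch. It assumes that a 2D SEWT decomposition/reconstruction pair and the two fusion rules of Section 3.2 are supplied as callables operating on NumPy-like layers (sewt2d, isewt2d, rule_approx, and rule_detail are placeholder names of ours); only the combination of Eqs. (10) and (11) is spelled out.

```python
def sewtf(I_A, I_B, sewt2d, isewt2d, rule_approx, rule_detail):
    """Skeleton of the SEWTF framework in Fig. 1; only the wiring of
    Eqs. (10)-(11) is shown, the callables are supplied by the user."""
    # Step 1: decompose both source images on the SAME adaptive wavelet set
    W_A, W_B = sewt2d([I_A, I_B])           # each is a list [W^0, W^1, ..., W^L]
    # Step 2: fuse layer by layer with the fusion maps r_0, r_1, ..., r_L
    r0 = rule_approx(W_A[0], W_B[0])        # Eq. (13), per-pixel weight map
    W_F = [r0 * W_A[0] + (1 - r0) * W_B[0]]                 # Eq. (10)
    for a, b in zip(W_A[1:], W_B[1:]):
        ri = rule_detail(a, b)              # Eq. (15), binary map
        W_F.append(ri * a + (1 - ri) * b)                   # Eq. (11)
    # Step 3: reconstruct the fused image from the fused layers
    return isewt2d(W_F)                     # Eq. (9)
```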

Figure 2 displays an example of 2D SEWT: Fig. 2a shows a pair of source images (a CT image and an MRI image), and their layers decomposed by 2D SEWT are listed in Fig. 2b. The tight frame of wavelets is determined adaptively according to the two source images shown in Fig. 2a.

Fig. 2 An illustration of 2D SEWT: (a) source images; (b) decomposed layers

The proposed SEWTF algorithm can easily be extended to fuse an image sequence containing more than two source images. First, two source images are selected and fused using SEWTF; the fused result is then combined with the next source image using SEWTF, and this operation is repeated until all the source images have been fused, yielding the fused image of the whole sequence.
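Assuming a two-image fusion routine fuse_pair (a placeholder name for the SEWTF routine above), this sequential extension is simply a left fold over the image list; the particular pairing order is taken as given here.

```python
from functools import reduce


def fuse_sequence(images, fuse_pair):
    """Fuse an arbitrary-length sequence by repeatedly fusing the running
    result with the next source image (pairwise SEWTF, as described above)."""
    if len(images) < 2:
        raise ValueError("need at least two source images")
    return reduce(fuse_pair, images)
```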

3.2 Fusion rules

2D SEWT decomposes each source image into one approximation layer and several detail layers. The approximation layer preserves the basic visual information of the source image, such as intensity level and structure, while the detail layers capture high-frequency information, such as strong edges and textures. It is therefore necessary to design different fusion rules for the two kinds of layers in order to extract salient rather than implicit visual information from each source image.

Theory on biological vision shows that the human visual system is sensitive to contrast in visual signals, so we need to preserve contrast information in all the layers of each source image, even though the approximation layer contains little such information. Here, a global contrast measure is used to estimate the contrast of each pixel in the approximation layer of a source image

$$ \mathcal{S}\left({\mathcal{W}}_{*}^0\left(\mathbf{p}\right)\right)=\frac{1}{M\times N}{\displaystyle \sum_{{\mathcal{W}}_{*}^0\left(\mathbf{q}\right)\in {\mathcal{W}}_{*}^0}D\left({\mathcal{W}}_{*}^0\left(\mathbf{p}\right),{\mathcal{W}}_{*}^0\left(\mathbf{q}\right)\right)}, $$
(12)

where \( D\left({\mathcal{W}}_{*}^0\left(\mathbf{p}\right),{\mathcal{W}}_{*}^0\left(\mathbf{q}\right)\right) \) refers to the intensity distance between the coefficients at \( \mathbf{p} \) and \( \mathbf{q} \), namely \( D\left({\mathcal{W}}_{*}^0\left(\mathbf{p}\right),{\mathcal{W}}_{*}^0\left(\mathbf{q}\right)\right)=\left|{\mathcal{W}}_{*}^0\left(\mathbf{p}\right)-{\mathcal{W}}_{*}^0\left(\mathbf{q}\right)\right| \). The fusion rule for the approximation coefficients is defined as the following sigmoid function

$$ {r}_0=\frac{1}{1+{e}^{-a\left(\mathcal{S}\left({\mathcal{W}}_A^0\right)-\mathcal{S}\left({\mathcal{W}}_B^0\right)\right)}}, $$
(13)

where a is a positive constant that determines how threshold-like the sigmoid function is. If the parameter is assigned a high value, \( r_0 \) becomes very close to the following threshold function

$$ {r}_0=\left\{\begin{array}{ll}1,\hfill & \mathcal{S}\left({\mathcal{W}}_A^0\right)>\mathcal{S}\left({\mathcal{W}}_B^0\right)\hfill \\ {}0,\hfill & otherwise\hfill \end{array}\right.. $$
(14)

However, if a is close to zero, \( r_0 \) is close to 0.5. It is straightforward to see that the sigmoid function is a tradeoff between averaging and maximum selection. Figure 3 displays a sigmoid function and a threshold function. In this algorithm, we choose the sigmoid function instead of the threshold function because the former is smooth and continuous, which makes the fused images smoother and effectively avoids blocking artifacts.

Fig. 3 Sigmoid function and threshold function
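A NumPy transcription of Eqs. (12), (13), and (10) is sketched below. The literal double sum in Eq. (12) costs O((MN)^2); the sketch rewrites it exactly with sorting and prefix sums, which is only an implementation choice of ours, not part of the method. The default a = 0.5 follows the setting chosen in Section 4.2; all function names are ours.

```python
import numpy as np


def global_contrast(layer):
    """Eq. (12): for every coefficient, the mean absolute difference to all
    coefficients of the same approximation layer, computed exactly in
    O(MN log MN) with sorting and prefix sums instead of the double sum."""
    v = layer.ravel().astype(np.float64)
    n = v.size
    order = np.sort(v)
    prefix = np.concatenate(([0.0], np.cumsum(order)))   # prefix[k] = sum of k smallest
    k = np.searchsorted(order, v, side="right")          # number of values <= v
    s = v * k - prefix[k] + (prefix[-1] - prefix[k]) - v * (n - k)
    return (s / n).reshape(layer.shape)


def approx_fusion_map(W_A0, W_B0, a=0.5):
    """Eq. (13): per-pixel sigmoid weight comparing the two contrast maps."""
    diff = global_contrast(W_A0) - global_contrast(W_B0)
    return 1.0 / (1.0 + np.exp(-a * diff))


def fuse_approx(W_A0, W_B0, a=0.5):
    """Eq. (10) with the map r_0 from Eq. (13)."""
    r0 = approx_fusion_map(W_A0, W_B0, a)
    return r0 * W_A0 + (1.0 - r0) * W_B0
```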

As the detail layers contain the high-frequency information of the source images, we adopt absolute-maximum selection as their fusion rule. In images, high frequencies always correspond to salient information, such as edges or textures. This fusion rule, on the one hand, is conducive to preserving salient information in the final fused image; on the other hand, it has low computational complexity. The fusion rule for the detail layers is formally defined as

$$ {r}_i=\left\{\begin{array}{ll}1, & \left|{\mathcal{W}}_A^i\right|>\left|{\mathcal{W}}_B^i\right|\\ 0, & otherwise\end{array}\right.\kern2em \left(i=1,\dots, L\right) $$
(15)

Once the fusion map set \( r=\left\{{r}_0,{\left\{{r}_i\right\}}_{i=1}^{L}\right\} \) is obtained, the fused approximation layer and the L fused detail layers are computed using Eqs. (10) and (11).
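The detail-layer rule of Eq. (15), combined with Eq. (11), reduces to a per-coefficient selection; a small sketch (function name ours):

```python
import numpy as np


def fuse_detail(W_Ai, W_Bi):
    """Eqs. (15) and (11): keep, per coefficient, whichever source layer has
    the larger absolute value (binary fusion map r_i)."""
    r_i = (np.abs(W_Ai) > np.abs(W_Bi)).astype(float)
    return r_i * W_Ai + (1.0 - r_i) * W_Bi
```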

4 Experimental results and discussions

4.1 Experimental settings

In order to examine the performance of the proposed fusion algorithm, experiments on two types of images are performed, namely multi-modal medical images and Visible-IR images. The medical images cover CT (Computed Tomography), MRI (Magnetic Resonance Imaging), CBF (Cerebral Blood Flow), and SPECT (Single-Photon Emission Computed Tomography). In the experiments, the medical images come from two websites (http://www.imagefusion.org and http://www.med.harvard.edu/AANLIB/home.html), and the Visible-IR images are downloaded from Dr. Liu's homepage (http://home.ustc.edu.cn/~liuyu1/).

The proposed fusion algorithm is compared with five popular methods: the Laplacian pyramid based algorithm (LP), the discrete wavelet transform based algorithm (DWT), the nonsubsampled contourlet transform based algorithm (NSCT), the guided filtering based fusion algorithm (GFF) [8], and the algorithm proposed by Liu et al. (LIU) [11]. In LP, DWT, and NSCT, the source images are decomposed into four layers; the coefficients in the approximation layers are combined by averaging, and the ones in the detail layers are combined by selecting the maximum absolute values. The settings of GFF and LIU are the default ones in the provided code.

To quantitatively compare the fusion performance of the proposed algorithm with the others, four quality measures are adopted in the experiments: \( Q^{AB/F} \) [16], the two Piella measures (\( Q_0 \), \( Q_E \)) [17], and VIFF [5]. \( Q^{AB/F} \) evaluates how much edge information is transferred from the source images to the fused image. \( Q_0 \) and \( Q_E \), built on structural similarity theory [20, 21], measure the similarity between the fused image and the source images in terms of correlation, luminance, and contrast. VIFF measures the quality of the fused image using visual information fidelity. For all the measures, a higher value corresponds to better fusion quality.

4.2 Parameter selections

There is only one parameter to be determined, namely a in Eq. (13). It controls the sharpness of the sigmoid function. As discussed above, a = 0 corresponds to \( r_0 = 0.5 \), which means that the approximation layer of the fused image is obtained by averaging; a = + ∞ corresponds to an indicator function, which means that the approximation layer of the fused image is obtained by selecting the maximum global contrast. Clearly, neither 0 nor + ∞ is the optimal value for a. In order to determine a suitable value, an empirical study is performed.

Six pairs of source images, shown in Fig. 4, are adopted here to empirically investigate the influence of the parameter a on the performance of the proposed SEWTF algorithm. First, these source images are fused by the proposed method with different parameter settings. Then, the objective evaluation measures are calculated. The results with respect to the different settings are displayed in Fig. 5. In the figure, the x-axis corresponds to the value of a, and the y-axis corresponds to the scores predicted by the objective evaluation measures, including \( Q^{AB/F} \), the two Piella measures (\( Q_0 \), \( Q_E \)), and VIFF. For all the measures, larger scores indicate better fused results. The curves in Fig. 5 show that the performance of the proposed method is worst when a is set to zero, which suggests that averaging the coefficients in the approximation layers is not a proper fusion rule. Meanwhile, we note that there is nearly no difference in the performance of the proposed method when a varies from 1 to 5. We also checked the performance when a lies in the interval [10, 100]; it is the same as that for a varying from 1 to 5.
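The empirical search over a can be reproduced with a sweep of the following form, where fuse and the entries of metrics are placeholders for the SEWTF routine and the quality measures cited above; the grid of candidate values is illustrative.

```python
import numpy as np


def sweep_a(image_pairs, fuse, metrics, a_values=np.arange(0.0, 5.01, 0.5)):
    """For every candidate a, fuse each test pair and record the mean score of
    every quality measure, giving one curve per metric as in Fig. 5."""
    curves = {name: [] for name in metrics}
    for a in a_values:
        fused = [fuse(I_A, I_B, a=a) for I_A, I_B in image_pairs]
        for name, metric in metrics.items():
            scores = [metric(F, I_A, I_B)
                      for F, (I_A, I_B) in zip(fused, image_pairs)]
            curves[name].append(float(np.mean(scores)))
    return a_values, curves
```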

Fig. 4 Source images for parameter selections

Fig. 5 The relationship of the metrics with the parameter a within the interval of [0, 5]

Figure 6 shows the relationship of the metrics with the parameter a in steps of 0.2 within the interval [0.1, 0.9]. Here, we note that \( Q^{AB/F} \) and \( Q_0 \) suggest that a larger a may be conducive to a better fusion, while \( Q_E \) and VIFF suggest the contrary. So in our experiments we set a to 0.5, with which the proposed algorithm achieves satisfactory fusion performance.

Fig. 6 The relationship of the metrics with the parameter a within the interval of [0.1, 0.9]

Note that the proposed fusion algorithm performs only one decomposition and reconstruction pass (DRP). Here, we test whether more DRP passes are needed. The six groups of source images shown in Fig. 4 are again adopted to check the influence of the number of DRP passes on the performance of the proposed SEWTF algorithm, and the objective evaluation measures are employed to compare the fused images. The results with respect to the number of DRP passes are displayed in Fig. 7. The figure reveals that the scores given by the evaluation measures \( Q_0 \), \( Q_E \), and VIFF do not increase with the number of passes, which means that more DRP passes do not result in better fusion. In addition, more passes need more computation time. Hence, in the following experiments, only one DRP pass is performed.

Fig. 7 The relationship of the metrics with the number of DRP passes

4.3 Experiments on multi-modal medical images

Nowadays, medical images play an increasingly important role in clinical diagnosis and treatment. For example, CT (Computed Tomography) images are widely used in orthopaedics because they clearly capture the status of bones; MRI (Magnetic Resonance Imaging) images clearly reflect soft-tissue information, so they are usually adopted for diagnosing intracerebral hematoma, brain tumors, and the like. CBF (Cerebral Blood Flow) images objectively reflect changes in the tension and elasticity of cerebral blood vessels. SPECT (Single-Photon Emission Computed Tomography) can measure biological activity at the cellular and molecular level. However, each modality has its own limitations, which may restrict its use in practice. For example, SPECT images are not clear enough and cannot capture the structure of tissues, so they are usually used along with MRI or CT in clinical diagnosis.

In this subsection, the proposed algorithm is compared with the others on various types of medical images. The source images, shown in Fig. 8, contain CT, MRI, CBF, and SPECT images. Each row of the figure displays two pairs of source images, and each pair contains two images captured by different medical imaging equipment.

Fig. 8 Medical source images: each row contains two pairs of source images. The top row lists CT-MRI images; the middle row lists MRI-CBF images; and the bottom row lists MRI-SPECT images

Figure 9 shows the fused images obtained by the different methods. Taking a close look at the figure, we can note the differences between the proposed algorithm and the other schemes. As shown in Fig. 9-(f1), the image fused by the proposed SEWTF algorithm well preserves the complementary visual information of the different source images; namely, the fused image simultaneously preserves the soft tissue of the human brain from the MRI image and the bone from the CT image. However, in the brain-tissue part of the fused images obtained by LP and DWT, the brightness is decreased. In the fused image obtained by NSCT, the brain tissue is neglected. The LIU algorithm also does not work well for this example, because it produces some visible artifacts around the bone in the fused image.

Fig. 9 Fused medical images obtained by different algorithms

Figures 9-(a3) to (f6) list four groups of color fused images. Different from gray-level image fusion, color image fusion needs to preserve the salient visual information of each source image while avoiding color distortion. Figure 9-(f3) shows that the proposed SEWTF algorithm preserves the color from the CBF image and the soft tissue of the human brain from the MRI. From Fig. 9(a3)-(e3), we can note that the fused images generated by the LP, DWT, GFF, and LIU methods lose the color of the CBF image, making some details invisible. Figure 9-(c3) shows that the NSCT-based algorithm fails to produce a satisfactory fused image, because it tends to lose texture.

In order to compare the proposed SEWTF algorithm with the others objectively, \( Q^{AB/F} \) [16], the two Piella measures (\( Q_0 \), \( Q_E \)) [17], and VIFF [5] are employed to predict the quality of the fused images obtained by the different algorithms. The objective evaluations for the medical test images are shown in Table 1. They indicate that SEWTF performs better than the other methods, which supports the subjective conclusions drawn above.

Table 1 Objective evaluation results of various algorithms on medical images

4.4 Experiments on Visible-IR images

In the real world, visible images and IR (infrared) images are often fused into one image. A visible image captures only surface information, while an IR image can reflect information about objects based on their temperature. The fusion of Visible-IR images allows improved detection and directly recognizable localization of a target in the IR image with respect to its background in the visible image [19].

In this subsection, the proposed algorithm is compared with the others on Visible-IR images. Each row of Fig. 10 displays two pairs of source images, and each pair contains two images captured by different scanners from the same scene.

Fig. 10 Visible-IR source images

Figure 11 shows the fused images obtained by the different methods. Taking a careful look at the figure, we note that the image fused by the proposed SEWTF well preserves the complementary visual information of the different source images. Taking the first group of images as an example, Fig. 11-(f1) shows that SEWTF produces a more informative and natural fused image. It can also be noted that the images fused by the NSCT-based algorithm nearly neglect the information from the IR image. The reason is that only the major principal components are preserved and the minor components are neglected in this fusion algorithm.

Fig. 11 Fused Visible-IR images obtained by different algorithms

The objective evaluations for the Visible-IR test images are shown in Table 2. They indicate that SEWTF performs better than the other methods, which supports the subjective conclusions drawn above.

Table 2 Objective evaluation results of various algorithms on Visible-IR images

4.5 Image sequence fusion

Figure 12 displays an image sequence constructed by blurring different regions of an image. Two source images are first selected and fused using the proposed SEWTF algorithm; the fused result is then combined with another source image using SEWTF, and this operation is repeated until all the source images have been fused. The final fused image of the sequence is shown in Fig. 13. It can be observed that all the coins in the fused image are clear.

Fig. 12 An image sequence

Fig. 13 Fused image

4.6 Computation time comparison

Image fusion algorithms have been widely employed in various fields, such as remote sensing, optical imaging systems, and clinical diagnosis. Nearly all these applications demand that fusion algorithms run in real time, so it is necessary to analyze the computational cost of the proposed algorithm. The algorithm contains three parts with different computation times: first, the source images are decomposed into several layers using SEWT; then, the approximation layer and the detail layers are fused, respectively; finally, the fused approximation layer and detail layers are reconstructed to get the final fused image.

The computation time of the proposed algorithm is compared with that of the other methods (LP, DWT, NSCT, GFF, and LIU). All experiments are performed on a PC with an Intel Xeon 2.66 GHz CPU and 3.25 GB RAM using Matlab 7.9.0. The elapsed times of the fusion algorithms when processing 1024 pixels are listed in Table 3.

Table 3 Comparison on time consumption

From Table 3, it can be seen that the computation time of GFF is the lowest, followed by DWT, LP, and LIU. As the NSCT-based fusion algorithm needs complicated image decomposition and reconstruction, its computation time is the highest among these fusion algorithms. The computation time of the proposed algorithm is longer than that of the other four algorithms (all except the NSCT-based one). In the proposed fusion algorithm, time is mainly consumed in the image decomposition process, because the algorithm needs to find the optimal wavelets rather than use fixed ones.

5 Conclusions

In this paper, we presented a novel multi-scale image fusion algorithm for multi-sensor images based on the simultaneous empirical wavelet transform. Like other multi-scale fusion algorithms, it contains three steps: image decomposition, coefficient combination, and image reconstruction. Compared with traditional methods, an attractive feature of the proposed algorithm is that it simultaneously decomposes the source images onto the same wavelet set, which is the optimal one for these images. A global-contrast-based salience detection model is utilized to fuse the smooth parts, and maximum-absolute-value selection is employed to fuse the detail layers. The proposed algorithm is tested on various types of image fusion, including multi-modal medical image fusion and Visible-IR fusion. The experimental results show that it can preserve the salient and complementary information of the source images. In addition, the results show good performance on color image fusion.

In real applications, the algorithm can be used in various fields, such as clinical diagnosis, target detection, and digital cameras. For example, in clinical diagnosis, MRI captures structural and anatomical information with high resolution, and SPECT images reveal metabolic changes that are significant for clinical diagnosis. The proposed algorithm can combine the two images into one image that contains more information and can thus help doctors make correct diagnoses. In target detection, a visible image captures only surface information, while an IR image can reflect information about objects based on their temperature. The fusion of Visible-IR images allows improved detection and directly recognizable localization of a target in the IR image with respect to its background in the visible image.

However, there are still some limitations. First, the time consumption of the algorithm is relatively high, which restricts its application in real-time systems. Time is mainly consumed in the image decomposition process, because the optimal wavelets have to be found for the signals. Second, we did not test the proposed algorithm on noisy images. Considering that noise unavoidably affects the generation of the optimal wavelets, it is necessary to conduct denoising before running the proposed fusion algorithm.