1 Introduction

Biomedical images are extensively utilized for the automated diagnosis of diseases such as COVID-19, pneumonia, tuberculosis, and cancer. Biomedical imaging systems play an important role in monitoring and diagnosing internal body organs without any kind of surgery. Biomedical images are acquired in various modalities, each offering a different view of the internal organs (Du et al. 2016). These modalities include positron emission tomography (PET), magnetic resonance imaging (MRI), computerized tomography (CT), X-ray, ultrasound, etc. (James and Dasarathy 2014). Every modality has its own significance in the diagnosis process. However, a single modality captures only limited information and is therefore restricted to certain diseases or issues. Multi-modality biomedical images are thus desirable. A multi-modality image can be obtained by using an efficient fusion approach (Daniel et al. 2017). Such images carry more information than classical single-modality images and are therefore more helpful for diagnosing various kinds of diseases (Du et al. 2016).

Recently, many researchers have designed and implemented fusion approaches to obtain efficient multi-modality biomedical images (Ravi and Krishnan 2018; Hu et al. 2020). However, many of them have relied on existing image fusion approaches, and the resulting multi-modality fused images may therefore suffer from issues such as gradient and texture distortion, especially in the infected region. To overcome these issues, researchers have turned to deep learning and dictionary learning based approaches, including dictionary learning (Hu et al. 2020), local-features fuzzy sets (Ullah et al. 2020), deep learning (Algarni 2020; Xia et al. 2019; Zhou et al. 2019), and deep learning combined with NSST (Wang et al. 2019), which have been found to be effective tools for obtaining efficient multi-modality fused images.

The design and development of an efficient multi-modality biomedical image fusion approach is still an open area of research. Deep learning has been found to be one of the best fusion approaches for obtaining promising results, and a deep transfer learning-based multi-modality biomedical fusion approach can provide even better results.

The main contributions of this paper are summarized as follows:

  • A multi-modality biomedical image fusion model based on multi-objective differential evolution and the Xception model is proposed.

  • The proposed approach initially decomposes the source images into sub-bands using the non-subsampled contourlet transform (NSCT).

  • An extreme version of the Inception model (Xception) is then used for feature extraction from the source images.

  • The multi-objective differential evolution is used to select the optimal features.

  • To obtain the fused coefficients, fusion functions based on the coefficient of determination and local energy are used.

  • Finally, a fused image is computed by applying the inverse NSCT.

  • The proposed and the competitive approaches are compared by considering the benchmark multi-modality image fusion dataset.

The remainder of this paper is organized as follows: existing literature is presented in Sect. 2. The proposed medical image fusion approach is illustrated in Sect. 3. Experimental results and comparative analyses are discussed in Sect. 4. Section 5 concludes the proposed work.

2 Literature review

Zhu et al. (2019) implemented a local Laplacian energy and phase congruency based fusion approach in the NSCT domain (LEPN). The local Laplacian energy combined weighted local energy and the sum of Laplacian coefficients to capture the regulated details and features of the input images. Zhu et al. (2020) designed a diffusion-based approach using synchronized-anisotropic operators (DSA). A maximum absolute value constraint was also utilized for base layer fusion, and the fusion decision map was computed from the sum of the modified anisotropic Laplacian applied to the similar corrosion sub-bands obtained from the anisotropic diffusion. Kumar et al. (2020) proposed co-learning based fusion maps to obtain more efficient multi-modality fused biomedical images, with a convolutional neural network (CNN) used for the prediction and segmentation of potential objects.

Lu et al. (2014) designed an edge-guided dual-modality (EGDM) approach to obtain multi-modality images; it performs significantly better even on highly under-sampled data. Lifeng et al. (2001) utilized wavelets and quality analysis to obtain biomedical multi-modality fused images, with a pyramid wavelet used for the fusion process. Ma et al. (2020) designed a dual-discriminator conditional generative adversarial network (DDcGAN) to obtain multi-modality fused images. It produces a realistic fused image by using a content loss to dupe both discriminators, which are in turn trained to differentiate the compositional differences between the fused image and each source image, respectively.

Wang et al. (2019) developed 3D auto-context-based locality adaptive multi-modality GANs (3GANs) to obtain more efficient multi-modality fused images, in which a non-unified kernel was used along with the adaptive approach for multi-modality fusion. Gai et al. (2019) utilized a pulse coupled neural network (PCNN) with edge preservation and enhanced sparse representation in the non-subsampled shearlet transform (NSST) domain. It fully exploits the features of the various modalities, handles edge details well, and improves the results.

Liu et al. (2020) utilized a VGG16-based deep transfer learning approach for image fusion to improve the classification process. VGG16 can extract more efficient features than traditional deep learning approaches, and the obtained features were then used to fuse the multi-modality biomedical images. Tavard et al. (2014) designed a multi-modal registration and fusion approach to improve cardiac re-synchronization, which helps in therapy optimization.

Zhu et al. (2016) designed a novel dictionary learning-based image fusion approach for multi-modality biomedical images. Owing to the use of dictionary learning, this approach achieves higher accuracy, but it is computationally complex in nature. Liu et al. (2018) proposed a biomedical image decomposition approach using NSST to fuse multi-modality images. This approach has shown significant improvements over existing approaches but suffers from an edge degradation issue. Wang et al. (2020) utilized a CNN and a contrast pyramid to fuse biomedical multi-modality images. A CNN can fuse the images efficiently; however, it is computationally expensive and sometimes does not provide promising results when the biomedical images are very similar to each other.

From the literature review, it has been observed that multi-modality image fusion is still an open area of research. Deep learning has been found to be one of the most promising techniques for obtaining better multi-modality fused biomedical images, but such approaches provide better results when pre-trained deep transfer learning models are used. Additionally, the initial parameter selection of deep learning and deep transfer learning approaches is a challenging issue (Pannu et al. 2019; Kaur and Singh 2019; Pannu et al. 2018). Therefore, in this paper, the well-known multi-objective differential evolution is used to enhance the results.

3 Proposed multi-objective differential evolution based deep transfer learning model for multi-modality image fusion

Assume that \({I_1}\) and \({I_2}\) denote the source images. Initially, NSCT is used to decompose \({I_1}\) and \({I_2}\) into sub-bands. The primary objective is to fuse the respective sub-bands of both source images. The fusion of the high sub-bands is achieved by using the extreme version of the Inception (Xception) model, and the coefficient of determination is utilized to evaluate the significance of the computed fused high sub-bands. The fusion of the low sub-bands is achieved by using a local energy function. Finally, the inverse NSCT is utilized to compute the multi-modality fused image. Figure 1 shows the step-by-step methodology of the proposed model.

Fig. 1

Diagrammatic flow of the proposed multi-objective differential evolution based deep transfer learning model for multi-modality image fusion
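Complementing Fig. 1, the following is a minimal end-to-end sketch of the pipeline in Python. All helper names (`nsct_decompose`, `xception_features`, `mode_select_features`, `fuse_high`, `fuse_low`, `nsct_reconstruct`) are hypothetical placeholders for the steps detailed in Sects. 3.1-3.5; they are not part of the original implementation.

```python
# Hypothetical end-to-end sketch of the proposed fusion pipeline (Sects. 3.1-3.5).
# The helper names below are illustrative only; they mirror the paper's steps.

def fuse_multimodality(I1, I2):
    # 1. NSCT decomposition of both source images into low/high sub-bands (Sect. 3.1)
    low1, high1 = nsct_decompose(I1)
    low2, high2 = nsct_decompose(I2)

    # 2. Xception-based feature extraction from the high sub-bands (Sect. 3.2)
    f1, f2 = xception_features(high1), xception_features(high2)

    # 3. Multi-objective differential evolution selects the optimal features (Sect. 3.3)
    f1, f2 = mode_select_features(f1, f2)

    # 4. High sub-bands fused via the coefficient-of-determination rule (Sect. 3.4)
    fused_high = fuse_high(high1, high2, f1, f2)

    # 5. Low sub-bands fused via the local-energy rule (Sect. 3.5)
    fused_low = fuse_low(low1, low2)

    # 6. Inverse NSCT reconstructs the fused image
    return nsct_reconstruct(fused_low, fused_high)
```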

3.1 Nonsubsampled contourlet transform

The non-subsampled contourlet transform (NSCT) is a well-known transform used to decompose images in the wavelet domain. It is a shift-invariant transform that provides rich directional detail, and this directionality allows the transformed images to be converted back to the original with minimal root mean square error (for more details, please see Da Cunha et al. 2006).
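Off-the-shelf Python implementations of the NSCT are uncommon, so the sketch below only illustrates the low/high sub-band separation that the transform provides, using a Gaussian low-pass residual as a crude stand-in. It is not an NSCT (it offers no directional sub-bands) and is given purely for orientation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lowhigh_split(img, sigma=2.0):
    """Crude low/high split used here only as a stand-in for NSCT decomposition."""
    low = gaussian_filter(img.astype(np.float64), sigma=sigma)  # low (approximation) sub-band
    high = img.astype(np.float64) - low                         # high (detail) sub-band
    return low, high

def lowhigh_reconstruct(low, high):
    """Inverse of the stand-in split (analogous to the inverse NSCT in the paper)."""
    return low + high
```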

3.2 Feature extraction using deep Xception model

A CNN may suffer from under-fitting, as many potential features may not be extracted. To overcome this issue, an extreme version of the Inception model (Xception) is used. Figure 2 represents the block diagram of the Xception model (for the mathematical and other details, please see Chollet 2017).

Fig. 2

Architecture of Xception model (obtained from Chollet 2017)

The high sub-bands of the two source images are fed to the Xception model in a parallel fashion. Let \(\eta {I_1}(p,q)\) and \(\eta {I_2}(p,q)\) denote the features obtained from the respective high sub-bands by using the Xception model.
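A minimal feature-extraction sketch with tf.keras is given below. The 299×299 resizing, the replication of the single-channel sub-band across three channels, and the average pooling are our assumptions; the paper does not specify these pre-processing choices.

```python
import numpy as np
import tensorflow as tf

# Pre-trained Xception used purely as a feature extractor (no classification head).
backbone = tf.keras.applications.Xception(weights="imagenet", include_top=False, pooling="avg")

def xception_features(subband):
    """Extract a feature vector from one high sub-band (assumed single-channel 2-D array)."""
    x = np.asarray(subband, dtype=np.float32)
    x = tf.image.resize(x[..., None], (299, 299))            # Xception's default input size
    x = tf.repeat(x, 3, axis=-1)                             # replicate the channel to RGB
    x = tf.keras.applications.xception.preprocess_input(x)   # scale to [-1, 1]
    return backbone(x[None, ...], training=False).numpy().ravel()
```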

3.3 Feature selection using multi-objective differential evolution

In this step, the optimal features are selected from those obtained from the Xception model, with the fusion factor and entropy metrics serving as the fitness functions. Multi-objective differential evolution can solve many computationally complex problems (Babu et al. 2005) and provides a good balance between fast convergence and population diversity. It can be described by the following steps (a compact sketch of these steps is given after Step IV):

I. Initialization: First of all, the parameters related to differential evolution are defined, such as the population size (\(t_p\)), crossover rate (\(c_r\)), and mutation rate (\(m_r\)). A random distribution is used to generate the initial solutions \(\beta _\alpha ^0 (\alpha = 1, 2, \ldots ,t_p)\). h denotes the number of function evaluations; it is used to control the iterative process of differential evolution up to the maximum number of function evaluations (\({h}_{M}\)).

II. Iterative step: Mutation and crossover operators are used to obtain the optimal number of features.

Mutation is applied to \(\beta _\alpha ^{h}\) to generate a child vector \(\Pi _\alpha ^{h}\). In this paper, the following mutation is used:

$$\begin{aligned} \Pi _\alpha ^{h} =\beta ^{h} _{{d1}}+m_r \cdot (\beta ^{h}_{d2}-\beta ^{h}_{d3}) \end{aligned}$$
(1)

Here, \(\alpha \) denotes the index value, \({d}_i \ne \alpha \; \forall i=1:3\), and \({d}_1\), \({d}_2\), and \({d}_3\) are random indices selected from \([1,\;t_p]\).

Crossover is used to obtain new solutions. A child \(\epsilon _\alpha ^{h}\) is obtained from each \(\beta _\alpha ^{h}\) as:

$$\begin{aligned} \epsilon _{\alpha _\kappa }^{h} ={\left\{ \begin{array}{ll} \Pi _{\alpha _\kappa }^{h}, \quad \beta _{\kappa } \le c_r \quad \text {or}\quad \kappa =\kappa _{{d_n}} \\ \beta _{\alpha _\kappa }^{h}, \quad otherwise \end{array}\right. } \kappa =1,2,\dots ,{D}, \end{aligned}$$
(2)

where D shows the dimensionality of the problem, \(\beta _{\kappa } \in [0,\; 1]\) is a random number, and \(\kappa _{{d_n}} \in [1,\; {D}]\) is a randomly chosen index.

III. Selection: The child vector \(\epsilon _\alpha ^{h}\) is compared with its parent vector \(\beta _\alpha ^{h}\), and the better one survives, as:

$$\begin{aligned} \beta _\alpha ^{{h}+1}={\left\{ \begin{array}{ll} \epsilon _\alpha ^{h}, \quad f(\epsilon _\alpha ^{h}) \le f (\beta _\alpha ^{h}) \\ \beta _\alpha ^{h}, \quad otherwise \end{array}\right. } \end{aligned}$$
(3)

IV. Stopping condition: If the number of function evaluations is less than the maximum available evaluations (\({h}_{M}\)), Steps II and III are repeated.
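The sketch below is a compact NumPy rendering of Steps I-IV following Eqs. (1)-(3). Encoding a candidate solution as a real-valued vector and supplying the fitness as a generic callable `f` (in the paper, a combination of fusion factor and entropy) are simplifying assumptions.

```python
import numpy as np

def differential_evolution(f, dim, t_p=30, m_r=0.5, c_r=0.9, h_max=3000, rng=None):
    """Minimal DE loop following Eqs. (1)-(3); `f` is the fitness to be minimized."""
    rng = np.random.default_rng() if rng is None else rng
    pop = rng.random((t_p, dim))                 # Step I: random initial population
    fit = np.array([f(b) for b in pop])
    evals = t_p
    while evals < h_max:                         # Step IV: stop at h_max evaluations
        for a in range(t_p):
            d1, d2, d3 = rng.choice([i for i in range(t_p) if i != a], 3, replace=False)
            mutant = pop[d1] + m_r * (pop[d2] - pop[d3])      # Step II: mutation, Eq. (1)
            k_dn = rng.integers(dim)
            cross = rng.random(dim) <= c_r
            cross[k_dn] = True
            child = np.where(cross, mutant, pop[a])           # Step II: crossover, Eq. (2)
            child_fit = f(child)
            evals += 1
            if child_fit <= fit[a]:                           # Step III: selection, Eq. (3)
                pop[a], fit[a] = child, child_fit
    return pop[np.argmin(fit)]
```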

3.4 Fusion of high sub-bands

The features extracted and selected from the high sub-bands by using the Xception model are then fused by using the coefficient of determination (R). The coefficient R between \(\eta {I_1}(p,q)\) and \(\eta {I_2}(p,q)\) can be computed as:

$$\begin{aligned}&R_{N}(\eta {I_1}, \eta {I_2}) \nonumber \\&\quad = \frac{\Big (\sum _{p=1}^{m} \sum _{q=1}^{n} (\eta {I_1}(p,q)-\overline{\eta {I_1}})(\eta {I_2}(p,q)- \overline{\eta {I_2}})\Big )^2}{\Big (\sum _{p=1}^{m}\sum _{q=1}^{n} (\eta {I_1}(p,q)-\overline{\eta {I_1}})^2\Big )\; \times \; \Big (\sum _{p=1}^{m}\sum _{q=1}^{n}(\eta {I_2}(p,q)-\overline{\eta {I_2}})^2\Big )} \end{aligned}$$
(4)

Here, \(\overline{\eta {I_1}}\) and \(\overline{\eta {I_2}}\) denote the means of the respective high sub-band features.
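A direct NumPy translation of Eq. (4) could look as follows; the feature maps are assumed to be 2-D arrays of equal size.

```python
import numpy as np

def coefficient_of_determination(f1, f2):
    """Coefficient of determination R_N between two feature maps, per Eq. (4)."""
    d1 = f1 - f1.mean()
    d2 = f2 - f2.mean()
    num = (d1 * d2).sum() ** 2
    den = (d1 ** 2).sum() * (d2 ** 2).sum()
    return num / den if den != 0 else 0.0
```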

The dominant features are preserved in the obtained feature maps as:

$$\begin{aligned} F_{s}(p,q) = max(s{I_1} \times R_{N} + s{I_2} \times (1-R_{N})) \end{aligned}$$
(5)

Here, \(s{I_1}\) and \(s{I_2}\) denote the high sub-bands of \({I_1}\) and \({I_2}\), respectively.
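Following Eq. (5), the fused high sub-band can be sketched as a weighting of the two sub-bands by \(R_N\); reading Eq. (5) as this element-wise weighted sum is our interpretation.

```python
def fuse_high_subbands(s1, s2, r_n):
    """Fuse two high sub-bands per Eq. (5): weight the first source's sub-band by R_N
    and the second's by (1 - R_N); the weighted sum is taken as the fused map."""
    return s1 * r_n + s2 * (1.0 - r_n)
```

With `coefficient_of_determination` from the previous sketch, `fuse_high_subbands(s1, s2, coefficient_of_determination(f1, f2))` would produce the fused high sub-band.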

3.5 Fusion of low sub-bands

Motivated by Hermessi et al. (2018), local energy is used to fuse the low sub-bands as:

$$\begin{aligned} \chi _I(p,q) = \sum _{p' \in \gamma } \sum _{q' \in \delta }|I(p+p', q+q')| \end{aligned}$$
(6)

Here, \(I = {I_1}\) or \({I_2}\), and \(\gamma \; \times \; \delta \) represents the neighborhood of the patch centered at (p, q). The size of the local patch is set to \(5 \times 5\). The fused coefficients of the low sub-bands can then be computed as:

$$\begin{aligned} \psi _f(p,q) = {\left\{ \begin{array}{ll} \psi {I_1}(p,q)\quad |\chi {I_1}(p,q)|\ge |\chi {I_2}(p,q)|\\ \psi {I_2}(p,q) \quad |\chi {I_1}(p,q)|< |\chi {I_2}(p,q)|\\ \end{array}\right. } \end{aligned}$$
(7)
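Eqs. (6)-(7) can be sketched as follows; using `scipy.ndimage.uniform_filter` on the absolute coefficients as the \(5 \times 5\) windowed sum (up to a constant factor, which does not affect the comparison) is an implementation choice.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_low_subbands(l1, l2, window=5):
    """Fuse low sub-bands per Eqs. (6)-(7): pick, pixel-wise, the coefficient
    whose 5x5 local energy (sum of absolute values) is larger."""
    e1 = uniform_filter(np.abs(l1), size=window)  # proportional to the windowed sum of Eq. (6)
    e2 = uniform_filter(np.abs(l2), size=window)
    return np.where(e1 >= e2, l1, l2)             # selection rule of Eq. (7)
```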

4 Experimental results

To evaluate the performance of the proposed approach, a benchmark multi-modality biomedical image dataset is obtained from Ullah et al. (2020). Fifteen different pairs of multi-modality images are taken for comparison, and the goal is to fuse each pair into a single multi-modality image. To draw comparisons, six competitive multi-modality biomedical fusion approaches, namely LEPN (Zhu et al. 2019), DSA (Zhu et al. 2020), CNN (Kumar et al. 2020), EGDM (Lu et al. 2014), DDcGAN (Ma et al. 2020), and 3GANs (Wang et al. 2019), are also implemented on the same set of images. The hyper-parameters of these approaches are set as mentioned in their respective papers.

4.1 Visual analysis

Figures 3 and 4 present the source images and their respective multi-modality fused biomedical images obtained from LEPN (Zhu et al. 2019), DSA (Zhu et al. 2020), CNN (Kumar et al. 2020), EGDM (Lu et al. 2014), DDcGAN (Ma et al. 2020), 3GANs (Wang et al. 2019), and the proposed approach. The obtained results clearly preserve the modality information better than the competitive approaches. Although the existing approaches such as LEPN, DSA, CNN, EGDM, DDcGAN, and 3GANs provide significant visual results, they exhibit slight edge and texture distortion. Figures 3i and 4i show the results obtained from the proposed approach; these images demonstrate that the proposed approach provides a better visual appearance of the obtained multi-modality fused images.

Fig. 3

Analysis of multi-modality biomedical fusion approaches: a MRI, b CT, c LEPN (Zhu et al. 2019), d DSA (Zhu et al. 2020), e CNN (Kumar et al. 2020), f EGDM (Lu et al. 2014), g DDcGAN (Ma et al. 2020), h 3GANs (Wang et al. 2019), and i the proposed approach

Fig. 4

Analysis of multi-modality biomedical fusion approaches: a MRI, b CT, c LEPN (Zhu et al. 2019), d DSA (Zhu et al. 2020), e CNN (Kumar et al. 2020), f EGDM (Lu et al. 2014), g DDcGAN (Ma et al. 2020), h 3GANs (Wang et al. 2019), and i the proposed approach

4.2 Quantitative analysis

In this section, we compare the proposed approach with the existing approaches, namely LEPN (Zhu et al. 2019), DSA (Zhu et al. 2020), CNN (Kumar et al. 2020), EGDM (Lu et al. 2014), DDcGAN (Ma et al. 2020), and 3GANs (Wang et al. 2019), by considering some well-known performance metrics. The selected performance measures are edge strength, fusion symmetry, entropy, mutual information, and fusion factor (for the mathematical details, see Prakash et al. 2019).
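As an illustration, the entropy and mutual information of 8-bit images can be computed as below; this generic sketch is not the evaluation code used in the paper, and the remaining metrics follow their definitions in Prakash et al. (2019).

```python
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy (bits) of an 8-bit image's grey-level histogram."""
    h, _ = np.histogram(img, bins=bins, range=(0, 255))
    p = h / h.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(a, b, bins=256):
    """Mutual information (bits) between a source image and the fused image."""
    pab, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins, range=[[0, 255], [0, 255]])
    pab = pab / pab.sum()                         # joint grey-level distribution
    pa, pb = pab.sum(axis=1), pab.sum(axis=0)     # marginal distributions
    nz = pab > 0
    return float((pab[nz] * np.log2(pab[nz] / np.outer(pa, pb)[nz])).sum())
```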

A good multi-modality biomedical image fusion approach generally yields high entropy values, since entropy reflects the amount of information contained in the fused image. Table 1 presents the entropy analysis of the proposed deep transfer learning-based multi-modality biomedical image fusion approach. It shows that the proposed approach provides significantly higher entropy values than the existing multi-modality biomedical image fusion approaches, achieving a \(1.8343\%\) improvement over the best available approach.

Table 1 Comparative analysis among the proposed deep transfer learning based multi-modality image fusion and the competitive approaches in terms of entropy (maximum is better)

Mutual information represents the details preserved from the source images in the fused image; therefore, higher values are desirable. Table 2 presents the mutual information analysis of the proposed approach against the competitive approaches. The proposed approach shows an average improvement of \(1.8373\%\).

Table 2 Comparative analysis among the proposed deep transfer learning based multi-modality image fusion and the competitive approaches in terms of mutual information (maximum is better)

The fusion factor is a well-known performance metric that reflects the strength of the fusion process, and higher values are desirable. Table 3 shows the fusion factor analysis of the proposed and competitive approaches. The proposed approach shows an average improvement of \(1.3928\%\) over the competitive fusion models.

Table 3 Comparative analysis among the proposed deep transfer learning based multi-modality image fusion and the competitive approaches in terms of fusion factor (maximum is better)

Fusion symmetry evaluates the symmetric details between the source and fused images, and higher values are desirable. Table 4 shows the fusion symmetry analysis of the proposed deep transfer learning based multi-modality fusion model. It is found that the proposed model achieves an average improvement of \(1.1974\%\) over the competitive models.

Table 4 Comparative analysis among the proposed deep transfer learning based multi-modality image fusion and the competitive approaches in terms of fusion symmetry (maximum is better)

Edge strength evaluates the degree of edge preservation, and higher values are desirable (Xydeas and Petrovic 2000). Table 5 shows the edge strength analysis of the proposed deep transfer learning-based multi-modality image fusion model. The proposed model achieves an average improvement of \(1.6928\%\) over the competitive approaches.

Table 5 Comparative analysis among the proposed deep transfer learning based multi-modality image fusion and the competitive approaches in terms of edge strength (maximum is better)

5 Conclusion

From the literature review, it has been found that multi-modality image fusion is still an open area of research. Deep learning-based fusion approaches are among the most promising techniques for obtaining better multi-modality fused biomedical images. However, these approaches are computationally complex and may still suffer from under-fitting. The proposed approach initially decomposes the source images into sub-bands using the non-subsampled contourlet transform (NSCT). Thereafter, an extreme version of the Inception model (Xception) is used for feature extraction from the source images, and multi-objective differential evolution is used to select the optimal features. To obtain the fused coefficients, fusion functions based on the coefficient of determination and local energy are used. Finally, the fused image is computed by applying the inverse NSCT. Extensive experimental results have shown that the proposed approach outperforms the competitive multi-modality image fusion approaches in terms of various performance metrics. In the near future, the proposed model may be applied to other domains such as remote sensing images (Singh et al. 2018; Singh and Kumar 2019a) and other medical images. Additionally, the proposed hyper-parameter tuning approach can be used to tune the hyper-parameters of other approaches, such as visibility restoration models (Osterland and Weber 2019; Singh and Kumar 2018, 2019b; Wang et al. 2019; Singh et al. 2019a, 2019b), filtering models (Gupta et al. 2019; Kaur et al. 2020; Wiens 2019), and deep learning models (Jaiswal et al. 2020; Basavegowda and Dagnew 2020; Kaur et al. 2019, 2020; Ghosh et al. 2020).