1 Introduction

Clinical applications have changed considerably with advances in technology, and the number of imaging modalities has increased enormously. Each medical imaging technique offers specialized information that is not otherwise accessible, and doctors need details from more than one modality to diagnose an illness properly [1, 2]. CT scans provide details of dense bone structures for accurate radiation dose assessment but offer little insight into soft tissues and internal organs. MRI images provide good soft tissue contrast but lack bone information. Radiologists therefore require the fusion of two or more modalities, because no single image can contain all the details. By integrating information from more than one modality, image fusion enables a better disease analysis. CT and MRI image fusion allows a combined representation of the connective tissue information provided by the MRI image and the bony anatomy provided by the CT image, which helps doctors understand the disease and provide good healthcare. Image fusion attempts to integrate the necessary information from both images into one image, and care must be taken during fusion to avoid introducing artifacts [27, 28, 2]. Fusion may be implemented in the spatial domain, in transform domains, with neural networks, or with guided image filtering. PCA, DWT, CVT, NSST, and GIF are the basic traditional fusion methods. However, these methods are not very successful at transferring specific source image information into the fused result and can introduce distortions around certain objects.

The study of pixel-level image fusion has lasted for more than 30 years, during which around one thousand related scientific papers have been published. Recently, deep learning (DL) has achieved many breakthroughs in various computer vision and image processing problems, such as classification, segmentation, and super-resolution. In the field of image fusion, DL-based study has also become an active topic in the last three years. A variety of DL-based image fusion methods have been proposed for digital photography (e.g., multifocus image fusion, multi-exposure image fusion) and multimodality imaging (e.g., medical image fusion, infrared/visible image fusion). In this paper, this issue is addressed from another viewpoint to overcome the difficulty in designing robust activity level measurements and weight assignment strategies. Specifically, a convolutional neural network (CNN) is trained to encode a direct mapping from the input medical images to a weight map. In this way, activity level measurement and weight assignment are jointly achieved in an "optimal" manner by learning the network parameters. Considering the different imaging modalities of multimodal medical images, we adopt a multiscale approach via image pyramids to make the fusion process more consistent with human visual perception. In addition, a local similarity-based strategy is applied to adaptively adjust the fusion mode for the decomposed coefficients of the input multimodal medical images [4].
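To make the multiscale idea concrete, the following is a minimal sketch of pyramid-based weighted fusion, assuming the per-pixel weight map has already been produced (e.g., by a trained CNN as in [4]). The function names, the use of OpenCV pyramid routines, and the three-level default are illustrative assumptions, not the exact implementation of [4].

```python
import cv2
import numpy as np

def gaussian_pyramid(img, levels):
    """Build a Gaussian pyramid with `levels` downsampling steps."""
    pyr = [img.astype(np.float32)]
    for _ in range(levels):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    """Build a Laplacian pyramid; the last element is the low-pass residual."""
    gp = gaussian_pyramid(img, levels)
    lp = []
    for i in range(levels):
        up = cv2.pyrUp(gp[i + 1], dstsize=(gp[i].shape[1], gp[i].shape[0]))
        lp.append(gp[i] - up)
    lp.append(gp[-1])
    return lp

def fuse_with_weight_map(img_a, img_b, weight_map, levels=3):
    """Fuse two images (floats in [0, 1]) with a per-pixel weight map in
    [0, 1], merging the decomposed coefficients band by band."""
    lp_a = laplacian_pyramid(img_a, levels)
    lp_b = laplacian_pyramid(img_b, levels)
    gp_w = gaussian_pyramid(weight_map, levels)  # smooth weights per scale
    fused_pyr = [w * a + (1.0 - w) * b
                 for a, b, w in zip(lp_a, lp_b, gp_w)]
    # Collapse the fused pyramid back to a single image.
    fused = fused_pyr[-1]
    for band in reversed(fused_pyr[:-1]):
        fused = cv2.pyrUp(fused, dstsize=(band.shape[1], band.shape[0])) + band
    return np.clip(fused, 0.0, 1.0)
```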

Hybrid fusion methods therefore seek to overcome these shortcomings of conventional methods. Owing to its appealing advantages for the medical decision-making team, multimodal medical image fusion has received focused attention in recent years [5].

2 Literature survey

James et al. [6] catalogue existing methods and describe the specific technical challenges facing the medical image fusion field. Karthikeyan et al. [7] proposed a process for fusing medical images using DTCWT and SOFM. Gupta [8] suggested fusing CT and MR medical images in the NSST domain using an adaptive spiking neural technique. Daniel proposed an optimum homomorphic wavelet fusion using a hybrid GWO algorithm, and Daniel et al. [9] suggested an optimum spectrum mask fusion using the traditional GWO algorithm for MIF. For both CT and MRI medical images, Bhadauria et al. [10] proposed a noise reduction approach that fuses the images by processing them via CVT. Hermessi et al. [11] proposed a CNN-based fusion process for CT and MR medical images in the shearlet transform domain. Shahdoosti et al. [12] proposed the tetrolet transform for MMIF. Heba et al. [13] analyze several clinical image fusion techniques and discuss their most important strengths and limitations for developing hybrid techniques that improve the consistency of the fused image. Xi et al. [14] suggested an MMIF algorithm for clinical condition research that combines sparse representation and PCNN. Xia et al. [15] suggested a novel fusion scheme for MMI using both multiscale transformation features and a DCNN. El-Hoseny et al. [16] discuss several MIF techniques for improving hybrid fusion algorithms that enhance the fused image quality. Chavan et al. [17] suggested an NSxRW transform-based image fusion used for NCC review and post-treatment inspection. Sharma et al. proposed an image fusion algorithm based on NSST with a simplified PCNN model. Sreeja et al. [18] suggested a fusion algorithm that fuses diagnostic images and improves the quality of the fused image. Xu [19] proposed the DFRWT process for medical image fusion. Liu et al. [20] suggested a tensor structure and NSST to extract geometric features and apply a unified optimization model for image fusion. Liu et al. [21] suggested an NSST-based fusion algorithm that exploits moving-frame-based decomposition. Liu et al. [22] proposed a medical image fusion method based on convolutional neural networks (CNNs). Liu et al. [23] proposed a multifocus image fusion method based on deep learning, aiming to learn a direct mapping between the source images and a focus map; a deep CNN trained on high-quality image patches and their blurred versions is adopted to encode the mapping. Liu et al. [24] presented a systematic review of the DL-based pixel-level image fusion literature; the survey summarizes the main difficulties in conventional image fusion research and discusses the advantages DL can offer for each of them. Rajalingam et al. [4] proposed an efficient multimodal medical image fusion approach based on deep convolutional neural networks. Du and Gao [25] proposed a new all-CNN (ACNN)-based multifocus image fusion method in the spatial domain; the main idea is that the max-pooling layers of the CNN are replaced by convolution layers, the residuals are propagated backwards by gradient descent, and the training parameters of the individual layers are updated layer by layer.

3 Proposed hybrid image fusion algorithm (NSCT-GIF)

Fusion is performed on the registered input medical images X and Y using NSCT and guided image filtering, as shown in Fig. 1.

Fig. 1 Block diagram for the proposed hybrid fusion algorithm (NSCT-GIF)

3.1 Procedural steps for the hybrid fusion algorithm (NSCT-GIF)

Step 1: Read the two source medical images X and Y.

Step 2: Decompose the source images.

Decompose the source modalities using NSCT. Each input modality is decomposed into a low-frequency subimage and high-frequency subimages at each level and direction, i.e., \(X:\left\{ C_{L}^{X}, C_{L,\theta}^{X} \right\}\) and \(Y:\left\{ C_{L}^{Y}, C_{L,\theta}^{Y} \right\}\), where \(C_{L}\) denotes the low-frequency subimage and \(C_{L,\theta}\) the high-frequency subimage at level \(L\) and orientation \(\theta\). The number of decomposition levels is 3, as described in [26].

Step 3: Fuse the low-frequency coefficients. Phase congruency (Sect. 3.3) is used as the activity measure P:

$$ P_{x,y} = \frac{\sum\nolimits_{n} W_{x,y}\, A_{n,x,y} \cos \left( \phi_{n,x,y} - \overline{\phi}_{x,y} \right)}{\sum\nolimits_{n} A_{n,x,y} + \epsilon }. $$
(1)

where \(A_{n,x,y}\) and \(\phi_{n,x,y}\) are the amplitude and phase of the \(n\)-th frequency component at pixel \((x, y)\), \(\overline{\phi}_{x,y}\) is the weighted mean phase, \(W_{x,y}\) is a frequency-spread weighting factor, and \(\epsilon\) avoids division by zero. The fused low-frequency coefficients are then selected as:
$$ C_{L}^{F} (a,b) = C_{L}^{X} (a,b),\;{\text{if}}\;P_{{cl}}^{X} (a,b) > P_{{cl}}^{Y} (a,b) $$
(2)
$$ C_{L}^{F} (a,b) = C_{L}^{Y} (a,b),\;{\text{if}}\;P_{{cl}}^{Y} (a,b) > P_{{cl}}^{X} (a,b) $$
(3)
$$ C_{L}^{F} (a,b) = (C_{L}^{Y} (a,b) + C_{L}^{X} (a,b))/2,\;{\text{if}}\;P_{{cl}} ^{X} (a,b) = P_{{cl}}^{Y} (a,b) . $$
(4)
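A minimal sketch of this selection rule, assuming the low-frequency subbands \(C_{L}^{X}\), \(C_{L}^{Y}\) and their phase-congruency activity maps \(P^{X}\), \(P^{Y}\) from Eq. (1) are available as NumPy arrays; `np.isclose` stands in for the exact-equality case of Eq. (4):

```python
import numpy as np

def fuse_low_frequency(c_x, c_y, p_x, p_y):
    """Low-frequency fusion rule of Eqs. (2)-(4): keep the coefficient
    whose phase-congruency activity P is larger; average on ties."""
    c_x = np.asarray(c_x, dtype=np.float64)
    c_y = np.asarray(c_y, dtype=np.float64)
    fused = np.where(p_x > p_y, c_x, c_y)   # Eqs. (2) and (3)
    tie = np.isclose(p_x, p_y)              # Eq. (4): equal activity
    fused[tie] = 0.5 * (c_x[tie] + c_y[tie])
    return fused
```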

Step 4: Fuse the high-frequency coefficients using guided image filtering (Sect. 3.4).

$$ Q_{i} = \frac{1}{|w|}\sum\limits_{k:\, i \in w_{k}} {\left( a_{k} I_{i} + b_{k} \right)} $$
(5)

At each level and orientation, the high-frequency coefficients of modality X, \(C_{L,\theta}^{X}\), serve as the guidance image, while the high-frequency coefficients of modality Y, \(C_{L,\theta}^{Y}\), are the filter input, and an output image is generated per subband. The filtered high-frequency images are then fused by simple averaging to produce \(C_{L,\theta}^{F}\); here \(Q_{i}\) denotes the output of the linear transform of the guidance image within each local window.
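One plausible reading of Step 4 is sketched below, under the assumption that each modality's subband serves in turn as guidance for the other before the simple average; `guided_filter` refers to the sketch given in Sect. 3.4.

```python
def fuse_high_frequency(bands_x, bands_y, radius=4, eps=1e-3):
    """For each directional subband, guided-filter one modality's
    coefficients using the other modality's coefficients as guidance,
    then average the two filtered outputs (simple-average fusion)."""
    fused_bands = []
    for c_x, c_y in zip(bands_x, bands_y):
        q_xy = guided_filter(guide=c_x, src=c_y, radius=radius, eps=eps)
        q_yx = guided_filter(guide=c_y, src=c_x, radius=radius, eps=eps)
        fused_bands.append(0.5 * (q_xy + q_yx))
    return fused_bands
```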

Step 5: Apply the inverse NSCT to the fused coefficients \(C_{L}^{F}\) to obtain the fused image F.

3.2 NSCT

The NSCT is based on the Contourlet Transform, which produces good results in representing the geometry of images. The Contourlet Transform, however, is shift-variant, as it involves down-samplers and up-samplers in both the Laplacian pyramid stage and the directional filter bank. The NSCT, by contrast, is a shift-invariant, multiscale, and multidirectional transform, although its implementation is computationally complex. It is realized via the nonsubsampled pyramid filter bank (NSPFB) and the nonsubsampled directional filter bank (NSDFB) [27].
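A full NSCT implementation is beyond a short sketch, but the shift-invariance idea can be illustrated with an à trous (undecimated) pyramid, which plays the role the NSPFB plays in the NSCT; the NSDFB directional stage is omitted here. This is an illustrative stand-in, not the NSCT itself.

```python
import numpy as np
from scipy.ndimage import convolve

def atrous_pyramid(img, levels=3):
    """Undecimated (a trous) pyramid: like the NSPFB, it avoids
    down-sampling, so every subband keeps the full image size and the
    decomposition is shift-invariant. (The directional NSDFB stage of
    a real NSCT is omitted.)"""
    # B3-spline kernel commonly used for a trous decompositions.
    h = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    kernel = np.outer(h, h)
    low = img.astype(np.float64)
    highs = []
    for level in range(levels):
        # Dilate the kernel by inserting 2**level - 1 zeros between taps.
        step = 2 ** level
        dilated = np.zeros((4 * step + 1, 4 * step + 1))
        dilated[::step, ::step] = kernel
        smooth = convolve(low, dilated, mode='nearest')
        highs.append(low - smooth)   # band-pass detail at this scale
        low = smooth
    return low, highs                # reconstruction: low + sum(highs)
```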

3.3 Phase congruency

Phase congruency is a contrast-invariant feature analysis method based on a local energy model, which follows the concept that significant features occur at points in an image where the Fourier components are maximally in phase. It is insensitive to changes in illumination and contrast, which makes it well suited to fusing multimodal medical images, whose intensity mappings differ across modalities. In the proposed method (NSCT-GIF), it is therefore used as the fusion rule for the low-frequency components [28, 26].
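For illustration, a simplified phase congruency measure can be computed from the local energy model with a small log-Gabor filter bank; this sketch omits the noise compensation and frequency-spread weighting \(W\) of full implementations such as Kovesi's, and all parameter values are illustrative defaults.

```python
import numpy as np

def phase_congruency(img, scales=4, orients=6, min_wavelength=3,
                     mult=2.1, sigma_f=0.55, eps=1e-4):
    """Simplified phase congruency from the local energy model:
    PC = |sum of complex log-Gabor responses| / sum of their magnitudes.
    Values near 1 mark points where Fourier components are in phase
    (edges, lines), independent of local contrast."""
    rows, cols = img.shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                       # avoid log(0) at the DC term
    theta = np.arctan2(-fy, fx)
    img_fft = np.fft.fft2(img)

    energy = np.zeros((rows, cols))
    amp_sum = np.zeros((rows, cols))
    for o in range(orients):
        angle = o * np.pi / orients
        # Angular spread around this orientation (one-sided in frequency).
        d_theta = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
        spread = np.exp(-d_theta ** 2 / (2 * (np.pi / orients) ** 2))
        resp_sum = np.zeros((rows, cols), dtype=complex)
        for s in range(scales):
            wavelength = min_wavelength * mult ** s
            log_gabor = np.exp(-np.log(radius * wavelength) ** 2 /
                               (2 * np.log(sigma_f) ** 2))
            log_gabor[0, 0] = 0.0            # zero DC response
            resp = np.fft.ifft2(img_fft * log_gabor * spread)
            resp_sum += resp
            amp_sum += np.abs(resp)
        energy += np.abs(resp_sum)
    return energy / (amp_sum + eps)          # approximately in [0, 1]
```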

3.4 Guided image filtering (GIF)

$$ Q_{i} = a_{k} I_{i} + b_{k} \forall i \in w_{k} $$
(6)
$$ a_{k} = \frac{{\frac{1}{|w|}\sum\nolimits_{{i \in w_{k} }} {I_{i} t_{i} } - u_{k} E[t_{k} ]}}{{\sigma_{k}^{2} + \epsilon }} $$
(7)
$$ b_{k} = E(t_{k} ) - a_{k} u_{k} $$
(8)

where \(a_{k}\) and \(b_{k}\) are linear coefficients assumed constant in the window \(w_{k}\); they are computed using Eqs. (7) and (8). \(E(t_{k})\) is the mean of the input image \(t\) in the window \(w_{k}\), while \(u_{k}\) and \(\sigma_{k}^{2}\) are the mean and variance of the guidance image \(I\) in \(w_{k}\). The guided image filter has been used extensively for image fusion and gives good results when combining multimodal clinical data [3].
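A compact sketch of Eqs. (6)-(8) using box filters, following the standard guided filter of He et al.; the window radius and the regularization \(\epsilon\) are illustrative defaults.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, radius=4, eps=1e-3):
    """Guided image filter following Eqs. (6)-(8): q_i = a_k I_i + b_k
    within each window w_k, with a_k and b_k chosen by linear regression
    of src on guide inside the window."""
    size = 2 * radius + 1                      # box window w_k
    mean = lambda x: uniform_filter(x, size=size, mode='nearest')
    mean_i = mean(guide)                       # u_k in Eq. (7)
    mean_t = mean(src)                         # E[t_k] in Eqs. (7)-(8)
    cov_it = mean(guide * src) - mean_i * mean_t
    var_i = mean(guide * guide) - mean_i ** 2  # sigma_k^2 in Eq. (7)
    a = cov_it / (var_i + eps)                 # Eq. (7)
    b = mean_t - a * mean_i                    # Eq. (8)
    # Average a_k, b_k over all windows covering each pixel (Eq. (5)),
    # then apply the linear model of Eq. (6).
    return mean(a) * guide + mean(b)
```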

4 Experimental outcomes and discussion

The proposed hybrid multimodal medical image fusion (HMMIF) methodology is evaluated on pilot study sets of brain CT/MRI, MRI/PET, and MRI/SPECT images from patients diagnosed with neurocysticercosis, neurodegenerative, and neoplastic disorders. Each pair of input CT/MRI, MRI/PET, and MRI/SPECT slices was chosen from the same individual based on clinical and operational comparability. The analysis results of the proposed hybrid image fusion strategy and other current strategies are shown in Figs. 2, 3, 4, 5, 6, and 7, which allow the output of the proposed fusion strategy to be compared and analyzed. The multimodality medical input images were gathered from the Harvard Medical School [30] and radiopedia.org [31] online medical image databases. All images are of size 256 × 256 for the execution process.

Fig. 2 Experimental results for neurocysticercosis disease affected images (set 1)

Fig. 3 Experimental results for metastatic bronchogenic carcinoma disease affected images (set 2)

Fig. 4 Experimental results for astrocytoma disease affected images (set 3)

Fig. 5 Experimental results for anaplastic astrocytoma affected images (set 4)

Fig. 6 Experimental results for Alzheimer’s affected images (set 5)

Fig. 7 Experimental results for mild Alzheimer’s disease affected images (set 6)

Six separate sets of CT/MRI, MRI/PET, and MRI/SPECT images are used for the image fusion experiments. Set 1 contains diagnostic CT and MRI images of a patient affected by neurocysticercosis.

Set 2 contains MRI/SPECT brain images affected by metastatic bronchogenic carcinoma. Sets 3 and 4 contain MRI/SPECT brain images affected by astrocytoma and anaplastic astrocytoma, respectively. Sets 5 and 6 contain brain images affected by Alzheimer's disease and mild Alzheimer's disease in the MRI/SPECT and MRI/PET combinations, respectively.

For each set of input images, fusion results are obtained using PCA, DWT, GIF, PCNN, NSCT, CNN, and the proposed hybrid technique (NSCT-GIF). Among all the conventional fusion strategies, the fusion outcome of the proposed hybrid methodology delivers better efficiency in both subjective and quantitative analysis.

The comprehensive analysis of performance measures for the traditional and proposed hybrid fusion strategies, namely fusion factor, IQI, mSSIM, cross entropy, EQM, MI, PSNR, and standard deviation, is shown in Tables 1 and 2. For an effective image fusion methodology, the fusion factor, EQM, MI, PSNR, and standard deviation should be as high as possible, the cross entropy should be as low as possible, and IQI and mSSIM values closest to 1 indicate a higher quality fused output image.

Table 1 Performance metrics comparative analysis for different fusion methods (set 1, set 2, and set 3)
Table 2 Performance metrics comparative analysis for different fusion methods (set 4, set 5, and set 6)
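For reference, the following sketch shows how two of the listed metrics can be computed under common definitions: mutual information from the joint histogram, the fusion factor as MI(A, F) + MI(B, F), and PSNR. The exact definitions behind Tables 1 and 2 are not spelled out in the text, so these should be read as assumptions.

```python
import numpy as np

def mutual_information(a, b, bins=256):
    """MI between two uint8 images, estimated from the joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    nz = p_ab > 0                      # avoid log(0) terms
    return float((p_ab[nz] * np.log2(p_ab[nz] / (p_a @ p_b)[nz])).sum())

def fusion_factor(src_a, src_b, fused):
    """Fusion factor: MI(A, F) + MI(B, F); higher means the fused image
    retains more information from both sources."""
    return mutual_information(src_a, fused) + mutual_information(src_b, fused)

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB for images with the given peak."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))
```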

The proposed hybrid fusion method is contrasted with other traditional fusion methods, namely PCA, DWT, GIF, NSCT, PCNN, and CNN. These baseline methods are implemented with an averaging rule for the low-pass sub-bands and a maximum-selection rule for the high-pass sub-bands as their fusion rules. The proposed methodology is evaluated through both subjective analysis and numerical metrics.

Figures 8 and 9 display the fusion factor and IQI for the six CT-MRI, MRI-SPECT, and MRI-PET image sets. The experimental findings are contrasted with the PCA, DWT, GIF, PCNN, NSCT, and CNN techniques. Compared with the other existing traditional methods, the proposed NSCT-GIF has higher fusion factor and IQI values.

Fig. 8 Comparative analysis for fusion factor

Fig. 9 Comparative analysis for image quality index

Figures 10 and 11 show the comparative mSSIM and cross entropy analysis for the six CT-MRI, MRI-SPECT, and MRI-PET image sets. The experimental outcomes are contrasted with the PCA, DWT, GIF, PCNN, NSCT, and CNN techniques. The proposed NSCT-GIF has higher mSSIM and lower cross entropy than the other existing traditional methods.

Fig. 10 Comparative analysis for mean structural similarity index

Fig. 11 Comparative analysis for cross entropy

Figures 12 and 13 show the comparative EQM and MI study of the CT-MRI, MRI-SPECT, and MRI-PET images for the six sets. The experimental outcomes are contrasted with the PCA, DWT, GIF, PCNN, NSCT, and CNN strategies. Compared with the other existing traditional methods, the proposed NSCT-GIF has higher values for both EQM and MI.

Fig. 12 Comparative analysis for edge quality measure

Fig. 13 Comparative analysis for mutual information

Figures 14 and 15 show the detailed PSNR and standard deviation analysis for the six sets of CT-MRI, MRI-SPECT, and MRI-PET images. The experimental findings are contrasted with the PCA, DWT, GIF, PCNN, NSCT, and CNN methods. Compared with the other existing traditional methods, the proposed NSCT-GIF has higher values for PSNR and standard deviation.

Fig. 14 Comparative analysis for PSNR

Fig. 15 Comparative analysis for standard deviation

5 Conclusions

This work examined the efficacy of various traditional fusion methods based on transforms, neural networks, and guided filtering. A convolutional Siamese network is used to generate a direct mapping from the input medical images to a weight map containing integrated pixel activity information, and the hybrid multimodal medical image fusion approaches are compared using various assessment parameter values. The proposed hybrid approach (NSCT-GIF) provides better results than all the other traditional methods: it preserves more image detail, yields better image quality, requires less processing time, and offers better visual perception. These advantages make it a strong choice for supporting effective treatment, such as assisting with medical diagnosis. Beyond the proposed algorithm itself, another contribution of this work is that it demonstrates the potential of deep learning techniques for image fusion, which will be studied further in the future.