Introduction

Modern technology has brought tremendous improvements in the capabilities of medical imaging systems and an increase in the number of imaging modalities in clinical use. Each medical imaging modality provides specific information about the human body that is not available from other modalities. Medical imaging modalities can be classified into two types, i.e. functional imaging modalities and anatomical imaging modalities. Functional imaging modalities such as positron emission tomography (PET), single-photon emission computed tomography (SPECT), and functional magnetic resonance imaging (fMRI) provide metabolic or physiological information about the human body, whereas anatomical imaging modalities such as X-ray computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound (US) imaging mainly provide structural information. CT provides the electron density map required for accurate radiation dose estimation and gives superior cortical bone contrast; however, it is limited in soft tissue contrast. MRI provides excellent soft tissue contrast, which permits better visualization of tumors and tissue abnormalities in different parts of the body; however, MRI lacks signal from cortical bone, and its intensity values have no relation to electron density. For precise diagnosis of disease and for more effective interventional treatment procedures, radiologists need the information from two or more imaging modalities [1]. Through image fusion, it is possible to integrate and present the information from two or more imaging modalities in a more effective way. Image fusion finds applications in oncology, neurology, cardiology, and other fields [2, 3]. Fusion of CT and MR images is used to improve lesion delineation for radiation therapy planning and prostate seed implant quality analysis [1], and to plan the correct surgical procedure in computer-assisted navigated neurosurgery of temporal bone tumors [4] and orbital tumors [5]. This paper aims at designing an efficient CT and MR image fusion method.

Image fusion is the process of integrating information from two or more images into a single composite image that is more suitable for human visual perception and for further computer processing tasks [6]. The fusion process must retain both the redundant and the complementary information present in the source images and should not introduce any artefacts into the fused image. Depending on the merging stage, image fusion can be classified into three categories, i.e. pixel level, feature level, and decision level, as shown in Fig. 1. Pixel-level image fusion directly combines the pixel data of the source images to obtain the fused image; it requires registration of the source images to sub-pixel accuracy. Feature-level fusion involves extracting representative features from the source images (e.g. by segmentation) and then combining those features into a single feature vector by using neural networks, clustering algorithms, or template methods [7]. Decision-level fusion is a high-level method in which the source images are first processed individually for information extraction and the extracted information is then combined (e.g. by a voting method). Compared to the others, pixel-level image fusion is the most computationally efficient. Pixel-level fusion can be done in the spatial domain or in the transform domain (multiscale decomposition-based image fusion). Spatial domain pixel-level techniques include simple averaging of the source images and the principal component analysis (PCA) method. Transform domain methods are more effective than spatial domain methods because multiscale decomposition (MSD) makes it possible to analyze the images at different resolutions, and the features to which the human visual system (HVS) is sensitive are present at different resolutions or scales of an image.
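For reference, a minimal sketch of the simplest spatial domain pixel-level scheme, averaging two registered source images, is shown below (Python with NumPy is assumed here and in all later sketches):

```python
import numpy as np

def fuse_by_averaging(img_a, img_b):
    """Pixel-level fusion by simple averaging of two registered images.

    Illustrative only: averaging reduces contrast and can cancel
    complementary patterns, which motivates the MSD-based approach
    developed in this paper."""
    a = img_a.astype(np.float64)
    b = img_b.astype(np.float64)
    return (a + b) / 2.0
```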

Fig. 1 Classification of image fusion methods

The work presented in this paper is a transform domain/multiscale decomposition-based image fusion method. A wide variety of MSD-based image fusion techniques have been proposed by various researchers. The key steps in MSD-based image fusion methods are as follows. The first step is decomposing the source images with an MSD transform into low- and high-frequency sub-bands at different resolutions and spatial orientations. The second step is combining the coefficients of the different sub-bands of the source images by using a fusion rule. The third step is taking the inverse MSD transform of the composite coefficients to obtain the fused image. The quality of the fused image depends mainly on two factors: the MSD transform used for decomposition and the fusion rule used to combine the coefficients. Initially, Toet [6] and Toet et al. [8] introduced different pyramid schemes for multisensor image fusion. Pyramid schemes fail to provide any spatial orientation selectivity in the decomposition process and hence often cause blocking artefacts in the fused image. Next, the discrete wavelet transform (DWT) was introduced for image fusion by Manjunath et al. [9]. An enhancement of the DWT, the shift-invariant dual-tree complex wavelet transform proposed by N.G. Kingsbury, has also been used for image fusion [10]. The DWT can provide only limited directional information, i.e. horizontal, vertical, and diagonal details. It cannot capture the smoothness along the contours of images and hence often causes artefacts along the edges. Subsequently, advanced MSD transforms that provide more directional information, such as the curvelet [11–13], ripplet [14, 15], bandelet [16], shearlet [17], and contourlet [18, 19] transforms, have been used for image fusion. But these transforms lack shift-invariance and cause pseudo-Gibbs phenomena around singularities. The shift-invariant version of the contourlet transform, the nonsubsampled contourlet transform (NSCT) proposed by da Cunha et al. [20], has been used for image fusion in different applications. NSCT performs better for medical image fusion owing to its flexible multiscale, multidirection, and shift-invariant image decomposition [21, 22]. In the proposed method, NSCT is used for the multiscale decomposition of the source images.

The other important factor that influences fusion quality is the fusion rule used for combining the coefficients of the different sub-bands. Since the low- and high-frequency sub-bands carry different information about the source images, different fusion rules are used for combining the LF sub-band and the HF sub-bands. The LF sub-band is a smoothed version of the original image and represents its outline. The HF sub-bands represent details such as edges and contours of the original image. The basic fusion rule for the LF sub-band is averaging the source image sub-bands. It has a serious drawback: reduction in contrast and hence possible cancellation of some patterns present in the source images. The most commonly used fusion rule for the HF sub-bands is selecting the source image coefficient having the maximum absolute value; this scheme is sensitive to noise. Such simple schemes, which combine coefficients based on a single coefficient value, may not retain the important information present in the source images, because image features to which the human visual system is sensitive are not completely defined by a single pixel or coefficient. Hence, a window- or region-based activity level measurement is computed at each coefficient and used for coefficient combination. The coefficient combination schemes used are (1) the choose-max scheme, in which the source image coefficient having the maximum activity level is selected at each location; (2) weighted averaging, in which the coefficient weights are calculated from the activity level values; and (3) a hybrid scheme that switches between the above two based on a match measure value at each location. A comprehensive overview of possible fusion rules is given in references [23, 24].

Statistical parameters and texture features are used as activity level measurements. For low-frequency coefficient fusion, features such as energy [22], visibility [25], weighted energy, entropy [26], and spatial frequency [27] have been used as activity measures by researchers. For high-frequency coefficient fusion, features such as contrast [28], gradient, variance [25], sum-modified Laplacian (SML) [22], energy of gradient, and energy of Laplacian have been used. In the proposed method, a new activity level measurement is used for low-frequency sub-band fusion, and the weighted sum-modified Laplacian is used for high-frequency sub-band fusion.

The remainder of this paper is organized as follows: “Nonsubsampled Contourlet Transform” gives a brief overview of NSCT, “Proposed Fusion Scheme” describes the proposed fusion rules, “Experimental Results and Comparative Analysis” presents the experimental results and comparative analysis, and finally “Conclusions” gives the conclusions.

Nonsubsampled Contourlet Transform

NSCT is a shift-invariant, multiscale, and multidirection image decomposition transform. Its design is based on the nonsubsampled pyramid (NSP) structure and the nonsubsampled directional filter bank (NSDFB). The NSP provides the multiscale property and the NSDFB provides the multidirection property of NSCT. Shift-invariance is obtained by eliminating the upsamplers and downsamplers in both the NSP and the NSDFB. The NSCT is constructed by combining the NSP and the NSDFB as shown in Fig. 2a.

Fig. 2 Nonsubsampled contourlet transform. a NSFB structure that implements the NSCT. b Two-channel NSP filter bank. c Two-channel NSDFB. d Four-channel analysis NSDFB structure. e Frequency partitioning obtained with NSCT

Nonsubsampled Pyramid

The basic building block of the NSP is a two-channel filter bank without upsamplers and downsamplers; its ideal frequency response is shown in Fig. 2b. \(H_k(z)\) (k = 0, 1) are the first-stage analysis filters and \(G_k(z)\) (k = 0, 1) are the synthesis filters. Each stage of the NSP produces one low-pass filtered image (\(y_0\)) and one bandpass filtered image (\(y_1\)). To obtain the subsequent stages of decomposition, the low-frequency sub-band is filtered iteratively. The two-stage decomposition structure of the NSP is shown in Fig. 2a. The filters for subsequent stages are obtained by upsampling the filters of the previous stage, which gives the multiscale property without the need for additional filter design. The first-stage low-pass and bandpass filters are denoted \(H_0(z)\) and \(H_1(z)\), respectively, and the second-stage low-pass and bandpass filters are \(H_0(z^2)\) and \(H_1(z^2)\), respectively.
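To make the iterated upsampled-filter idea concrete, the following simplified sketch (our assumption: a single low-pass kernel with difference-of-lowpass band extraction, standing in for the actual \(H_k\)/\(G_k\) filter pairs) performs an à-trous-style nonsubsampled pyramid decomposition in which every sub-band keeps the full image size:

```python
import numpy as np
from scipy.ndimage import convolve

def nsp_decompose(img, h0, levels=2):
    """A-trous-style nonsubsampled pyramid sketch: instead of
    downsampling the image, the low-pass kernel is zero-upsampled at
    each stage, so all sub-bands keep the full image size (this is what
    gives shift-invariance). The bandpass image is taken as the
    difference of successive low-pass images, a simplification of the
    actual analysis/synthesis filter pairs."""
    low = img.astype(np.float64)
    bands = []
    h = np.asarray(h0, dtype=np.float64)
    for _ in range(levels):
        smoothed = convolve(low, h, mode='nearest')
        bands.append(low - smoothed)   # bandpass sub-band (like y1)
        low = smoothed                 # low-pass sub-band (like y0)
        # Zero-upsample the kernel by 2 for the next, coarser stage:
        # this realizes H(z) -> H(z^2) without touching the image grid.
        up = np.zeros((2 * h.shape[0] - 1, 2 * h.shape[1] - 1))
        up[::2, ::2] = h
        h = up
    return low, bands
```

For instance, a 5 × 5 binomial kernel, `np.outer([1, 4, 6, 4, 1], [1, 4, 6, 4, 1]) / 256.0`, can serve as `h0`.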

Nonsubsampled Directional Filter Bank

The bandpass images from the NSP structure are fed to the NSDFB for directional decomposition. The basic building block of the NSDFB is a two-channel fan filter bank; its ideal frequency response is shown in Fig. 2c. \(U_k(z)\) (k = 0, 1) are the analysis filters and \(V_k(z)\) (k = 0, 1) are the synthesis filters. To obtain finer directional decomposition, this two-channel filter bank is iterated in a tree structure after upsampling all filters by the quincunx matrix given by

$$ Q=\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} $$
(1)

The second-stage analysis filters are denoted \(U_k(z^Q)\) (k = 0, 1) and these have checkerboard frequency support. The two-stage analysis NSDFB structure, which gives four directional sub-bands (\(y_k\), k = 0, 1, 2, 3), is shown in Fig. 2d. The resulting structure divides the 2D frequency plane into directional wedges. An L-stage NSDFB produces \(2^L\) directional sub-bands. The NSCT is flexible in allowing any number of directions at each scale. The frequency partitioning with eight and four directional sub-bands in scales 1 and 2, respectively, is shown in Fig. 2e, in which \(\omega_1\) and \(\omega_2\) represent the frequencies in the two dimensions.
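As an illustration of the quincunx filter upsampling of Eq. (1), the sketch below moves each filter sample h(i, j) to output location (i + j, i − j), i.e. m = Qn, with the second coordinate offset so that indices stay non-negative; this shows the upsampling step only, not the full NSDFB:

```python
import numpy as np

def quincunx_upsample(h):
    """Upsample a 2D filter by the quincunx matrix Q = [[1, 1], [1, -1]]:
    sample h[i, j] moves to location (i + j, i - j), the second
    coordinate shifted by (cols - 1) to stay non-negative. All other
    samples are zero, which rotates and expands the filter's frequency
    support for the next tree level."""
    rows, cols = h.shape
    out = np.zeros((rows + cols - 1, rows + cols - 1))
    for i in range(rows):
        for j in range(cols):
            out[i + j, i - j + (cols - 1)] = h[i, j]
    return out
```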

Proposed Fusion Scheme

The steps involved in the proposed image fusion method are represented in Fig. 3. The first step is to decompose the source images into different resolutions and different directions by using NSCT. As a larger number of decomposition levels introduces artefacts into the fused image, the source images are decomposed into two levels, with eight and four directional sub-bands in the first and second decomposition levels, respectively. Then the low-frequency sub-band and the directional (high-frequency) sub-bands are combined by using different fusion rules, as discussed in the following subsections; a code sketch of the overall pipeline is given below. Finally, the fused image is obtained by taking the inverse NSCT of the composite fused MSD coefficients.
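The overall pipeline can be summarized as follows. Here `nsct_decompose` and `nsct_reconstruct` are hypothetical stand-ins for an NSCT implementation (none is assumed to ship with NumPy/SciPy), and `fuse_low` and `fuse_high` are the rules sketched in the next two subsections:

```python
def fuse_images(img_a, img_b, nsct_decompose, nsct_reconstruct,
                fuse_low, fuse_high):
    """Skeleton of the proposed scheme. nsct_decompose is assumed to
    return (low, highs), where `low` is the LF sub-band and `highs` is
    a list (one entry per level) of lists of directional HF sub-bands;
    nsct_reconstruct is assumed to be its inverse."""
    low_a, highs_a = nsct_decompose(img_a, levels=2, directions=[8, 4])
    low_b, highs_b = nsct_decompose(img_b, levels=2, directions=[8, 4])

    fused_low = fuse_low(low_a, low_b)                        # Eqs. (2)-(5)
    fused_highs = [[fuse_high(ca, cb) for ca, cb in zip(lev_a, lev_b)]
                   for lev_a, lev_b in zip(highs_a, highs_b)]  # Eqs. (6)-(12)

    return nsct_reconstruct(fused_low, fused_highs)
```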

Fig. 3 Block diagram of the proposed image fusion method

Low-Frequency Sub-Band Fusion

The low-frequency sub-band is a smoothed version of the original image and represents its outline. As the number of decomposition levels is restricted to two in this work, most of the signal energy and a few details of the original image are still present in the low-frequency sub-band. Hence, it is important to fuse the low-frequency sub-band in such a way as to retain both the detailed and the approximate information present in it. In the proposed method, the activity measure used for low-frequency sub-band fusion is the entropy of the squares of the coefficients within a 3 × 3 window, given by the following equation.

$$ a_L^A\left(m,n\right)=\frac{1}{9}\sum_{i=-1}^{1}\sum_{j=-1}^{1}{C_L^A}^2\left(m+i,n+j\right)\,\log\!\left({C_L^A}^2\left(m+i,n+j\right)\right) $$
(2)

where \(C_L^A(m,n)\) is the low-frequency sub-band coefficient of source image A at location (m, n). Similarly, for source image B, the activity of the low-frequency coefficient \(C_L^B(m,n)\) at location (m, n) is given by

$$ a_L^B\left(m,n\right)=\frac{1}{9}\sum_{i=-1}^{1}\sum_{j=-1}^{1}{C_L^B}^2\left(m+i,n+j\right)\,\log\!\left({C_L^B}^2\left(m+i,n+j\right)\right) $$
(3)

The initial fusion decision map is obtained by the choose-max combination scheme, i.e. selecting the coefficient having the maximum activity measure:

$$ d_i\left(m,n\right)=\begin{cases} 1 & \text{if } a_L^A\left(m,n\right)\ge a_L^B\left(m,n\right)\\ 0 & \text{if } a_L^A\left(m,n\right)< a_L^B\left(m,n\right)\end{cases} $$
(4)

This implies that if \(d_i(m,n)=1\), the image A coefficient is selected at location (m, n), and if \(d_i(m,n)=0\), the image B coefficient is selected. The final fusion decision map (\(d_f\)) is then obtained through consistency verification in a 3 × 3 window by using a majority filtering operation: in each 3 × 3 window, if most of the coefficients come from image A while the centre coefficient comes from image B, the centre coefficient is also made to come from image A, and vice versa; otherwise it is kept as it is. This verification is done at each coefficient. It makes neighbouring coefficients in the composite MSD representation come from the same source image, which reduces the effect of noise and helps guarantee the homogeneity of the fused image. The fused low-frequency sub-band coefficients \(C_L^F(m,n)\) are then calculated from the final fusion decision map (\(d_f\)) as follows.

$$ C_L^F\left(m,n\right)=\begin{cases} C_L^A\left(m,n\right) & \text{if } d_f\left(m,n\right)=1\\ C_L^B\left(m,n\right) & \text{if } d_f\left(m,n\right)=0\end{cases} $$
(5)
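A sketch of this low-frequency rule (Eqs. (2)–(5)), assuming NumPy/SciPy; a small epsilon is added inside the logarithm to keep Eq. (2) defined at zero-valued coefficients:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def entropy_activity(c, eps=1e-12):
    """Eq. (2)/(3): mean of C^2 log(C^2) over a 3x3 window; the uniform
    filter supplies the 1/9 factor, and eps guards log(0)."""
    s = c.astype(np.float64) ** 2
    return uniform_filter(s * np.log(s + eps), size=3, mode='nearest')

def consistency_verify(d):
    """3x3 majority filtering of the binary decision map, so that
    neighbouring coefficients come from the same source image."""
    return (uniform_filter(d.astype(np.float64), size=3,
                           mode='nearest') > 0.5).astype(np.uint8)

def fuse_low(c_a, c_b):
    """Eqs. (4)-(5): choose-max on the entropy activity measure,
    consistency verification, then coefficient selection."""
    d_i = (entropy_activity(c_a) >= entropy_activity(c_b)).astype(np.uint8)
    d_f = consistency_verify(d_i)
    return np.where(d_f == 1, c_a, c_b)
```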

High-Frequency Sub-Band Fusion Rule

High-frequency sub-bands represent the detail components of the source images, such as edges, contours, and object boundaries. The most commonly used fusion rule for high-frequency sub-bands is selecting the coefficient having the maximum absolute value. But this scheme is sensitive to noise, and some important information may be lost because the selection is based on a single coefficient value without considering neighbouring coefficients. Another scheme is coefficient selection based on an activity level measurement. In the proposed method, the weighted sum-modified Laplacian (WSML) is used as the activity level measurement for the high-frequency sub-band coefficients. The complete expression for WSML is as follows:

The modified Laplacian of f(x,y) is

$$ \mathrm{ML}_f\left(x,y\right)=\left|2f\left(x,y\right)-f\left(x-1,y\right)-f\left(x+1,y\right)\right|+\left|2f\left(x,y\right)-f\left(x,y-1\right)-f\left(x,y+1\right)\right| $$
(6)

The WSML of f(x,y) is

$$ \mathrm{WSML}\left[f\left(x,y\right)\right]=\sum_{i=-1}^{1}\sum_{j=-1}^{1}w\left(i+1,j+1\right)\,\mathrm{ML}_f\left(x+i,y+j\right) $$
(7)

where w is the weight matrix. In the proposed method, the city block distance weight matrix is used, that is

$$ w=\frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} $$
(8)

WSML is calculated at each high-frequency sub-band coefficient of images A and B as their activity measure.

$$ {a}_{d,k}^A\left(m,n\right)=\mathrm{WSML}\left[{C}_{d,k}^A\left(m,n\right)\right] $$
(9)

where \(C_{d,k}^A(m,n)\) is the kth directional sub-band coefficient at decomposition level d of image A at location (m, n).

$$ {a}_{d,k}^B\left(m,n\right)=\mathrm{WSML}\left[{C}_{d,k}^B\left(m,n\right)\right] $$
(10)

where \(C_{d,k}^B(m,n)\) is the kth directional sub-band coefficient at decomposition level d of image B at location (m, n). The initial fusion decision map is obtained by the choose-max activity scheme.

$$ d_i\left(m,n\right)=\begin{cases} 1 & \text{if } a_{d,k}^A\left(m,n\right)\ge a_{d,k}^B\left(m,n\right)\\ 0 & \text{if } a_{d,k}^A\left(m,n\right)< a_{d,k}^B\left(m,n\right)\end{cases} $$
(11)

The final fusion decision map (\(d_f\)) is then obtained through consistency verification as discussed for the low-frequency fusion rule, and the fused high-frequency sub-band coefficients are calculated from the final fusion decision map.

$$ C_{d,k}^F\left(m,n\right)=\begin{cases} C_{d,k}^A\left(m,n\right) & \text{if } d_f\left(m,n\right)=1\\ C_{d,k}^B\left(m,n\right) & \text{if } d_f\left(m,n\right)=0\end{cases} $$
(12)
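A corresponding sketch of the WSML-based high-frequency rule (Eqs. (6)–(12)), under the same NumPy/SciPy assumptions as the low-frequency sketch:

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

# City block distance weight matrix of Eq. (8).
W = np.array([[1, 2, 1],
              [2, 4, 2],
              [1, 2, 1]], dtype=np.float64) / 16.0

def modified_laplacian(f):
    """Eq. (6): |2f - left - right| + |2f - up - down|, computed with
    two one-dimensional second-difference kernels."""
    kx = np.array([[-1.0, 2.0, -1.0]])   # differences along columns
    ky = kx.T                            # differences along rows
    f = f.astype(np.float64)
    return (np.abs(convolve(f, kx, mode='nearest')) +
            np.abs(convolve(f, ky, mode='nearest')))

def wsml(f):
    """Eq. (7): 3x3 weighted sum of the modified Laplacian."""
    return convolve(modified_laplacian(f), W, mode='nearest')

def consistency_verify(d):
    """3x3 majority filtering of a binary decision map (the same
    operation as in the low-frequency sketch)."""
    return (uniform_filter(d.astype(np.float64), size=3,
                           mode='nearest') > 0.5).astype(np.uint8)

def fuse_high(c_a, c_b):
    """Eqs. (11)-(12): choose-max on WSML activity, consistency
    verification, then coefficient selection."""
    d_i = (wsml(c_a) >= wsml(c_b)).astype(np.uint8)
    d_f = consistency_verify(d_i)
    return np.where(d_f == 1, c_a, c_b)
```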

Experimental Results and Comparative Analysis

The proposed image fusion method has been tested on different sets of CT and MR images. Dataset-1 consists of one pair of CT and MRI brain images (shown in Fig. 4a, b) collected from www.imagefusion.org. Dataset-2 consists of nine pairs of CT and MR brain images corresponding to various pathologies (shown in columns a and b of Fig. 5), collected from the Harvard University site (http://www.med.harvard.edu/AANLIB/home.html). The proposed method is compared with other image fusion methods: (1) the pixel averaging method (Pixel_avg), and the (2) DWT, (3) contourlet transform (CT), and (4) NSCT domain fusion methods with the basic fusion rule, i.e. averaging the low-frequency sub-bands and selecting the absolute maximum for the high-frequency sub-bands (DWT_avg_max, CT_avg_max, and NSCT_avg_max, respectively) [21]. Experimental results are shown in Figs. 4 and 5.

Fig. 4 Comparison of different image fusion methods using brain images (Dataset-1). a CT image. b MR image. c Fused image by Pixel_avg. d Fused image by DWT_avg_max. e Fused image by CT_avg_max. f Fused image by NSCT_avg_max. g Fused image by the proposed method

Fig. 5 Comparison of different image fusion methods using brain images of Dataset-2. Columns: a CT images. b MR images. c Fused images by Pixel_avg. d Fused images by DWT_avg_max. e Fused images by CT_avg_max. f Fused images by NSCT_avg_max. g Fused images by the proposed method

Visual analysis of the experimental results reveals that the proposed method retains both the clear bony structure of the CT image and the soft tissue details of the MR image, with good contrast and without introducing artefacts into the fused image.

Quantitative evaluation of the proposed method is done with well-defined image fusion quality metrics: (1) information entropy (IE), (2) overall cross entropy (OCE) [19], (3) spatial frequency (SF), (4) ratio of spatial frequency error (RSFE) [29], (5) mutual information (MI) [30], (6) cross correlation coefficient (CC) [19], (7) the Xydeas and Petrovic metric (\(Q^{AB/F}\)) [31], and (8) universal image quality index (UIQI)-based metrics (a) Q, (b) \(Q_W\), and (c) \(Q_E\) [32].

  1. IE: Information entropy measures the amount of information present in an image; an image with high information content has high entropy. Based on the principle of Shannon information theory, the IE of an image is given by the following formula (a code sketch of the IE, SF, MI, and CC metrics follows this list):

    $$ \mathrm{IE}=-\sum_{i=0}^{L-1}P_F(i)\,{\log}_2 P_F(i) $$
    (13)

    where \(P_F(i)\) is the ratio of the number of pixels with gray value equal to i to the total number of pixels in the fused image, and L is the maximum gray value of the fused image, set to 256 in our case. A larger entropy value implies better fusion quality.

  2. OCE: Cross entropy measures the difference between a source image and the fused image. The overall cross entropy is given by

    $$ \mathrm{OCE}\left(I_A,I_B;I_F\right)=\frac{\mathrm{CE}\left(I_A,I_F\right)+\mathrm{CE}\left(I_B,I_F\right)}{2} $$
    (14)

    where \(I_A\) and \(I_B\) are the source images, \(I_F\) is the fused image, and

    $$ \mathrm{CE}\left(I_A,I_F\right)=\sum_{i=0}^{L-1}P_A(i)\,{\log}_2\left|\frac{P_A(i)}{P_F(i)}\right| $$
    (15)
    $$ \mathrm{CE}\left(I_B,I_F\right)=\sum_{i=0}^{L-1}P_B(i)\,{\log}_2\left|\frac{P_B(i)}{P_F(i)}\right| $$
    (16)

    A small OCE value corresponds to good fusion quality.

  3. SF: Spatial frequency reflects the activity level and clarity of an image. It is defined as

    $$ \mathrm{SF}=\sqrt{\mathrm{RF}^2+\mathrm{CF}^2} $$
    (17)

    where RF is the row frequency

    $$ \mathrm{RF}=\sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=2}^{N}\left[I_F\left(i,j\right)-I_F\left(i,j-1\right)\right]^2} $$
    (18)

    and CF is the column frequency

    $$ \mathrm{CF}=\sqrt{\frac{1}{MN}\sum_{j=1}^{N}\sum_{i=2}^{M}\left[I_F\left(i,j\right)-I_F\left(i-1,j\right)\right]^2} $$
    (19)

    A larger spatial frequency value denotes better fusion quality.

  4. RSFE: The spatial frequency error gives the difference in activity between the fused image and an ideal fused reference image. It is given by

    $$ \mathrm{RSFE}=\frac{\mathrm{SF}_F-\mathrm{SF}_R}{\mathrm{SF}_R} $$
    (20)
    $$ \mathrm{SF}_F=\sqrt{\mathrm{RF}^2+\mathrm{CF}^2+\mathrm{MDF}^2+\mathrm{SDF}^2} $$
    (21)

    where RF and CF are the row and column frequencies defined above, and MDF and SDF are the main diagonal and secondary diagonal frequencies calculated as follows; all four are first-order gradients along the four directions.

    $$ \mathrm{MDF}=\sqrt{w_d\,\frac{1}{MN}\sum_{i=2}^{M}\sum_{j=2}^{N}\left[I_F\left(i,j\right)-I_F\left(i-1,j-1\right)\right]^2} $$
    (22)
    $$ \mathrm{SDF}=\sqrt{w_d\,\frac{1}{MN}\sum_{j=1}^{N-1}\sum_{i=2}^{M}\left[I_F\left(i,j\right)-I_F\left(i-1,j+1\right)\right]^2} $$
    (23)

    where \( w_d=\frac{1}{\sqrt{2}} \).

    \(\mathrm{SF}_R\) is the reference spatial frequency, calculated by taking the maximum gradients of the input images along the four directions:

    $$ \mathrm{Grad}^D\left(I_R\left(i,j\right)\right)=\max\left\{\mathrm{abs}\left[\mathrm{Grad}^D\left(I_A\left(i,j\right)\right)\right],\mathrm{abs}\left[\mathrm{Grad}^D\left(I_B\left(i,j\right)\right)\right]\right\} $$
    (24)

    for each of the four directions, i.e. D = {H, V, MD, SD}.

    An ideal fusion has an RSFE of zero, so a smaller absolute RSFE value corresponds to better fusion quality. Furthermore, RSFE > 0 means that distortion or noise has been introduced into the fused image, while RSFE < 0 denotes that some meaningful information has been lost.

  5. MI: Mutual information measures the amount of information that one image contains about another. Considering two source images A and B and a fused image F, the amount of information that F contains about A and about B can be calculated as

    $$ I_{FA}\left(f,a\right)=\sum_{f,a}P_{FA}\left(f,a\right)\,\log\frac{P_{FA}\left(f,a\right)}{P_F(f)\,P_A(a)} $$
    (25)
    $$ I_{FB}\left(f,b\right)=\sum_{f,b}P_{FB}\left(f,b\right)\,\log\frac{P_{FB}\left(f,b\right)}{P_F(f)\,P_B(b)} $$
    (26)

    Thus the image fusion performance measure MI is defined as

    $$ \mathrm{MI}\left(F,A,B\right)=I_{FA}\left(f,a\right)+I_{FB}\left(f,b\right) $$
    (27)

    The larger the MI value, the better the image fusion quality.

  6. CC: The correlation coefficient shows the similarity of the small structures in an input image and the fused image; a higher value means that more information is preserved. The CC between an input image \(I_A\) and the fused image \(I_F\) is given by

    $$ \mathrm{CC}\left(I_F,I_A\right)=\frac{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\left(I_F\left(i,j\right)-\overline{I}_F\right)\left(I_A\left(i,j\right)-\overline{I}_A\right)}{\sqrt{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\left(I_F\left(i,j\right)-\overline{I}_F\right)^2\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\left(I_A\left(i,j\right)-\overline{I}_A\right)^2}} $$
    (28)

    where \(\overline{I}_F\) and \(\overline{I}_A\) are the mean values of the corresponding images. The correlation coefficient between image \(I_B\) and the fused image \(I_F\) is calculated similarly.

  7. Xydeas and Petrovic metric (\(Q^{AB/F}\)): Xydeas and Petrovic proposed an objective performance metric that measures the relative amount of edge information transferred from the source images A and B into the fused image F. The method uses the Sobel edge operator to calculate the edge strength g(n,m) and orientation \(\alpha(n,m)\) for each pixel p(n,m). Thus, for an input image A,

    $$ g_A\left(n,m\right)=\sqrt{S_A^x{\left(n,m\right)}^2+S_A^y{\left(n,m\right)}^2} $$
    (29)
    $$ \alpha_A\left(n,m\right)={\tan}^{-1}\left(\frac{S_A^y\left(n,m\right)}{S_A^x\left(n,m\right)}\right) $$
    (30)

    where \(S_A^x(n,m)\) and \(S_A^y(n,m)\) are the outputs of the horizontal and vertical Sobel templates centred on pixel \(p_A(n,m)\) and convolved with the corresponding pixels of image A. The relative strength and orientation values \(G^{AF}(n,m)\) and \(A^{AF}(n,m)\) of input image A with respect to F are formed as

    $$ G^{AF}\left(n,m\right)=\begin{cases}\frac{g_F\left(n,m\right)}{g_A\left(n,m\right)}, & \text{if } g_A\left(n,m\right)>g_F\left(n,m\right)\\ \frac{g_A\left(n,m\right)}{g_F\left(n,m\right)}, & \text{otherwise}\end{cases} $$
    (31)
    $$ A^{AF}\left(n,m\right)=\frac{\left|\left|\alpha_A\left(n,m\right)-\alpha_F\left(n,m\right)\right|-\pi/2\right|}{\pi/2} $$
    (32)

    The edge strength and orientation preservation values are

    $$ Q_g^{AF}\left(n,m\right)=\frac{\Gamma_g}{1+e^{K_g\left(G^{AF}\left(n,m\right)-\sigma_g\right)}} $$
    (33)
    $$ Q_{\alpha}^{AF}\left(n,m\right)=\frac{\Gamma_{\alpha}}{1+e^{K_{\alpha}\left(A^{AF}\left(n,m\right)-\sigma_{\alpha}\right)}} $$
    (34)

    The edge information preservation values are then defined as

    $$ Q^{AF}\left(n,m\right)=Q_g^{AF}\left(n,m\right)\,Q_{\alpha}^{AF}\left(n,m\right) $$
    (35)

    and the fusion performance metric \(Q^{AB/F}\) is obtained as

    $$ Q^{AB/F}=\frac{\sum_{n=1}^{N}\sum_{m=1}^{M}Q^{AF}\left(n,m\right)w^A\left(n,m\right)+Q^{BF}\left(n,m\right)w^B\left(n,m\right)}{\sum_{n=1}^{N}\sum_{m=1}^{M}\left(w^A\left(n,m\right)+w^B\left(n,m\right)\right)} $$
    (36)

    where \(w^A(n,m)=\left[g_A(n,m)\right]^L\) and \(w^B(n,m)=\left[g_B(n,m)\right]^L\) are the weights and L is a constant. The range of \(Q^{AB/F}\) is [0, 1]: a value of 0 corresponds to complete loss of the edge information transferred from A and B into F, while a value of 1 indicates that A and B are fused into F with no loss of edge information.

  8. UIQI-based metrics (Q, \(Q_W\), \(Q_E\)): Gemma Piella proposed three image fusion quality metrics based on the UIQI introduced by Wang and Bovik [33]. The UIQI quantifies the structural distortion between two images in a local region as

    $$ Q_0\left(A,F/w\right)=\frac{4\sigma_{af}\,\overline{a}\,\overline{f}}{\left({\overline{a}}^2+{\overline{f}}^2\right)\left(\sigma_a^2+\sigma_f^2\right)} $$
    (37)

    where \(\sigma_a^2\) and \(\sigma_f^2\) are the variances of images A and F, \(\sigma_{af}\) is their covariance, and \(\overline{a}\) and \(\overline{f}\) are the means of images A and F.

    The image fusion quality metrics based on UIQI are given by

    $$ Q\left(A,B,F\right)=\frac{1}{\left|W\right|}\sum_{w\in W}\left(\lambda_a(w)\,Q_0\left(a,f/w\right)+\lambda_b(w)\,Q_0\left(b,f/w\right)\right) $$
    (38)
    $$ \lambda_a(w)=\frac{s\left(a/w\right)}{s\left(a/w\right)+s\left(b/w\right)},\qquad \lambda_b(w)=1-\lambda_a(w) $$
    (39)

    where s(a/w) and s(b/w) are the local saliencies of images A and B, respectively, in the window w. In this paper, variance is used as the local saliency.

    $$ Q_W\left(A,B,F\right)=\sum_{w\in W}c(w)\left(\lambda_a(w)\,Q_0\left(a,f/w\right)+\lambda_b(w)\,Q_0\left(b,f/w\right)\right) $$
    (40)

    where \( c(w)=C(w)/\sum_{w^{\prime}\in W}C\left(w^{\prime}\right) \) and \( C(w)=\max\left(s\left(a/w\right),s\left(b/w\right)\right) \).

    $$ Q_E\left(A,B,F\right)=Q_W{\left(A,B,F\right)}^{1-\alpha}\,Q_W{\left(A^{\prime},B^{\prime},F^{\prime}\right)}^{\alpha},\qquad \alpha\in\left[0,\ 1\right] $$
    (41)

    where \(A^{\prime}\), \(B^{\prime}\), and \(F^{\prime}\) are the edge images of A, B, and F, respectively. All three image fusion quality measures (Q, \(Q_W\), \(Q_E\)) have a dynamic range of [−1, 1]; the closer the value is to 1, the higher the quality of the fused image.
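As referenced in item 1, the IE (Eq. (13)) and SF (Eqs. (17)–(19)) metrics can be sketched as follows, assuming 8-bit gray-scale images:

```python
import numpy as np

def information_entropy(img, levels=256):
    """Eq. (13): Shannon entropy of the gray-level histogram."""
    hist = np.bincount(img.astype(np.uint8).ravel(), minlength=levels)
    p = hist / hist.sum()
    p = p[p > 0]                 # empty bins contribute 0 log 0 := 0
    return -np.sum(p * np.log2(p))

def spatial_frequency(img):
    """Eqs. (17)-(19): SF from row and column first differences,
    normalized by MN as in the definitions above."""
    f = img.astype(np.float64)
    rf2 = np.sum(np.diff(f, axis=1) ** 2) / f.size   # row frequency^2
    cf2 = np.sum(np.diff(f, axis=0) ** 2) / f.size   # column frequency^2
    return np.sqrt(rf2 + cf2)
```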
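Similarly, a sketch of the MI (Eqs. (25)–(27)) and CC (Eq. (28)) metrics, estimating the probabilities from gray-level histograms:

```python
import numpy as np

def mutual_information(img_x, img_f, bins=256):
    """Eqs. (25)-(26): mutual information between one source image and
    the fused image, estimated from the joint gray-level histogram."""
    joint, _, _ = np.histogram2d(img_x.ravel(), img_f.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of the source image
    pf = pxy.sum(axis=0, keepdims=True)   # marginal of the fused image
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ pf)[nz]))

def fusion_mi(img_a, img_b, img_f):
    """Eq. (27): MI(F, A, B) = I_FA + I_FB."""
    return (mutual_information(img_a, img_f) +
            mutual_information(img_b, img_f))

def cross_correlation(img_f, img_a):
    """Eq. (28): zero-mean normalized cross correlation between the
    fused image and one source image."""
    f = img_f.astype(np.float64) - np.mean(img_f)
    a = img_a.astype(np.float64) - np.mean(img_a)
    return np.sum(f * a) / np.sqrt(np.sum(f ** 2) * np.sum(a ** 2))
```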

Tables 1 and 2 show the quantitative evaluation of the image fusion methods for Dataset-1 and Dataset-2, respectively. The average value of each quality metric over all image pairs of Dataset-2 is given in Table 2, and the best value of each quality metric is highlighted in both tables. The proposed method gives the highest value of mutual information, which implies that the amount of information transferred from the source images to the fused image is the maximum for the proposed method. The proposed method also gives the maximum SF value and the minimum absolute RSFE value, which means that the detail information present in the source images is better preserved than with the other methods. The proposed method gives the highest \(Q^{AB/F}\) value, which implies that more edge information is transferred from the source images to the fused image. It also shows the best performance with respect to the universal image quality index-based metrics, which implies that the structural distortion between the source images and the fused image is lower than for the other methods. The proposed method gives a comparatively high value for the information entropy metric, implying that the amount of information present in its fused image is comparatively high. The minimum overall cross entropy is given by the proposed method for Dataset-1, whereas the pixel averaging method gives the minimum overall cross entropy for Dataset-2. The correlation coefficient values between the two source images and the fused image are comparatively high for the proposed method, which implies that the similarity of the fused image to both source images is comparatively good.

Table 1 Quantitative comparison of five image fusion methods with the Dataset-1 image pair (shown in Fig. 4a, b)
Table 2 Quantitative comparison of five image fusion methods with the Dataset-2 image pairs (shown in columns a and b of Fig. 5)

Further, the proposed method is compared with two other methods by using Dataset-1: (I) the energy- and contrast-based fusion rule in the contourlet transform domain (CT-Energy-Contrast) [19] and (II) the energy- and SML-based fusion rule in the contourlet transform domain (CT-Energy-SML) [22]. The comparison with the CT-Energy-Contrast method is done with respect to the entropy (EN), overall cross entropy (OCE), spatial frequency (SF), and cross correlation coefficient (CC(A,F), CC(B,F)) quality metrics, following [19]; Table 3 shows this comparison. The proposed method gives higher values of EN, SF, and CC than the CT-Energy-Contrast method, which implies that more details are preserved, and its lower OCE value indicates that the fused image is more similar to the source images. Hence, the performance of the proposed method is better than that of the CT-Energy-Contrast method. Universal image quality index-based metrics are used for the comparison of the proposed method with the CT-Energy-SML method, following [22]; these results are shown in Table 4. The proposed method gives slightly better values of Q and \(Q_E\) than the CT-Energy-SML method, so the structural distortion is lower and the amount of edge information transferred is higher. Hence, the performance of the proposed method is comparable to or slightly better than that of the energy- and SML-based fusion rule.

Table 3 Quantitative comparison of the proposed method with the energy- and contrast-based fusion rule in the contourlet transform domain [19] by using Dataset-1
Table 4 Quantitative comparison of the proposed method with the energy- and SML-based fusion rule in the contourlet transform domain [22] by using Dataset-1

The quantitative evaluation results show the superiority of the proposed method over the compared methods and are consistent with the visual analysis results. Hence, the proposed fusion rule in the NSCT domain is suitable for CT and MR image fusion.

Conclusions

An efficient CT and MR image fusion scheme in the NSCT domain has been proposed. Novel window-based activity level measurements are used for low- and high-frequency sub-band fusion. The proposed method has been compared with the spatial domain averaging method, a discrete wavelet transform-based method, a contourlet transform-based method, an NSCT-based method with the basic fusion rule, and contourlet transform-based methods with two different fusion rules, and it has been tested on CT and MR brain images of different cases. Quantitative evaluation results demonstrate that the proposed method is superior to the existing methods compared in this paper. Visual analysis of the experimental results reveals that the proposed method retains the bony structure details present in the CT image and the soft tissue details present in the MR image with good contrast.

There is further scope to improve the proposed method by pre-processing the CT image and then fusing the resulting image with the MR image. The proposed method can also be extended to the fusion of anatomical and functional medical images, which are usually represented as gray scale and colour images, respectively.