Introduction

In the medical field, radiologists need images with high resolution and rich information content to diagnose diseases. Computer-aided imaging techniques provide a quantitative assessment of the images under evaluation and thus help radiologists arrive at an objective decision in a short span of time [1, 2]. Several imaging modalities are popular in the medical field, such as computed tomography (CT), magnetic resonance imaging (MRI) and positron emission tomography (PET). CT gives details about bone structures but provides little information about soft tissues, whereas MRI captures soft tissue well.

Thus, the integration of CT and MRI images gives more details about both bone structures and soft tissues with higher accuracy and reliability by removing redundant information. Therefore, studying how complementary information from different modalities can be combined to obtain more effective particulars through image fusion is of great significance for clinical use [1]. During the past two decades, many fusion algorithms have been developed, and these algorithms are generally categorized into two domains, i.e., spatial and transform. Spatial domain methods operate directly on the pixel values of the source images, whereas transform domain methods project the images onto localized bases that provide significant information [3,4,5,6,7]. One of the spatial fusion methods is principal component analysis (PCA), which improves the resolution, reduces the redundancy of the image and transforms correlated variables into uncorrelated variables [6]. Nandi et al. [8] have explained the effect of applying PCA in the fusion of biomedical images. However, spatial domain fusion techniques produce spatial and edge distortions in the fused image [9].

Discrete wavelet transform (DWT) is one of the most commonly used image fusion methods, furnishing increased directional information with three spatial orientations [10, 11]. Wei et al. [12] proposed a technique that provides 3-D fiber architecture properties of the human heart using wavelet-based image fusion. Prakash et al. [13] have suggested the use of biorthogonal wavelet transform-based image fusion in the presence of noise. However, the real-valued wavelet transform suffers from shift sensitivity and the lack of phase information [5, 14]. A fractional time-shift of the input may introduce significant differences in the energy of the wavelet coefficients; this can be overcome by the stationary wavelet transform (SWT), also known as the redundant wavelet transform (RWT) or undecimated wavelet transform (UDWT), which detects curved shapes more precisely than DWT [15]. However, UDWT suffers from a lack of directionality [15, 16]. Therefore, UDWT and PCA are combined in order to increase the contrast and morphological details of an image [16]. Harpreet and Rachna [17] have presented a combined DWT- and PCA-based image fusion approach for neuro-images from different modalities, but the edges are not preserved [18].

Furthermore, the real-valued wavelet transform does not provide any details related to amplitude and local behavior of the function, while the problem of lack of directionality also remains unsolved [19, 20]. These problems have been overcome by using a special wavelet transform with shift invariance property and phase information called dual-tree complex wavelet transform (DTCWT). This captures additional edge and structural information of the image [21,22,23]. The high directionality and shift invariant properties of DTCWT make it suitable for image fusion.

In this paper, a novel fusion technique based on the cascade of two different shift invariant wavelet transforms (UDWT and DTCWT) and PCA is introduced. The combined effect of shift invariant transform domain features and distinguishable spatial domain features provides more visual information with fewer artifacts. The rest of the paper is organized as follows. The “Proposed cascaded image fusion framework” and “Image fusion algorithm” sections discuss the proposed cascaded image fusion framework and algorithm, respectively. The “Results and discussion” section presents the experiments and discusses the results, followed by the “Conclusion” section.

Proposed cascaded image fusion framework

Intensity-based image registration

Image registration is a prerequisite step to align medical images obtained from different modalities. The input images might be of different coordinate systems and have to be aligned properly for efficient fusion. The main goal of image registration is to find the optimal transformation that best aligns the structures of interest in the input images [24]. In our proposed fusion method, intensity-based registration method is used. This registration method directly operates on image pixel or voxel values. The basic principle of this method is to search maximum similarity measures between fixed (CT) and moving (MRI) images within a certain space of transformation [25, 26].
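For illustration, the following is a minimal sketch of such an intensity-based rigid registration in Python with the SimpleITK library, using mutual information as the similarity measure. The file names, transform type and optimizer settings are illustrative assumptions and are not taken from the paper.

```python
# Hedged sketch: intensity-based rigid registration of an MRI slice (moving)
# to a CT slice (fixed) with SimpleITK. Paths and parameters are illustrative.
import SimpleITK as sitk

fixed = sitk.ReadImage("ct_slice.png", sitk.sitkFloat32)     # fixed image (CT)
moving = sitk.ReadImage("mri_slice.png", sitk.sitkFloat32)   # moving image (MRI)

reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)  # similarity measure
reg.SetOptimizerAsRegularStepGradientDescent(
    learningRate=1.0, minStep=1e-4, numberOfIterations=200)
reg.SetInterpolator(sitk.sitkLinear)
reg.SetInitialTransform(
    sitk.CenteredTransformInitializer(
        fixed, moving, sitk.Euler2DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY))

transform = reg.Execute(fixed, moving)                        # maximize the similarity measure
registered_mri = sitk.Resample(moving, fixed, transform, sitk.sitkLinear, 0.0)
```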

Fig. 1 Block diagram of proposed image fusion methodology

Figures 3 and 4 show the registration of MRI and CT images using their intensity values. The intensity values of bone are higher in CT images and lower in MRI images; hence, hard tissues (bones) are more visible in CT images and less visible in MRI images. There are gray level variations between the two pixel classes, lesion and normal tissue, and based on these, lesion areas can be accurately mapped during registration of the MRI and CT images. For preserving structural details, atlas construction is performed in the spatio-temporal wavelet domain [27, 28]. The resultant registered images are shown in Figs. 3c and 4c for the two sets of brain images; the MRI image is shown in magenta and the CT image in green.

Cascaded PCA and shift invariant wavelet fusion

The next step after image registration is image fusion. The schematic representation of the generic image fusion framework is shown in Fig. 1. The registered source images are first decomposed into low-frequency and high-frequency sub-bands at different scales using UDWT, which provides details in three directions at each scale. The approximation and detail coefficients are extracted from the low- and high-frequency bands, respectively. The spatial features are obtained by applying PCA, which also reduces the dimension of the data and thereby the redundancy in both input images. In the next step, the resultant images (i.e., images A and B) are again decomposed using a complex wavelet transform, the DTCWT, yielding the real and imaginary parts of the image in the complex wavelet domain. The resulting components are fused based on a specific fusion rule. Finally, the fused image is obtained by taking the inverse DTCWT. The steps involved in the proposed methodology are explained in the subsequent sections.

Image fusion algorithm

The proposed algorithm for the image fusion framework consists of

Module I: Wavelet-based PCA fusion algorithm

In this module, linear transformations based on eigenvalue decomposition (EVD) are used to map the data from a high-dimensional space to a low-dimensional one, projecting the features from the original domain onto the PCA domain; this reduces redundancy and enhances the image.

Step 1 Find the undecimated wavelet coefficients of the CT and MRI images using the following 2-D basis functions:

$$\begin{aligned}&\hbox {Approximation Coefficients:}\;\;\varphi ^{1}(x,y)=\phi (x)\phi (y) \end{aligned}$$
(1)
$$\begin{aligned}&\hbox {Vertical Coefficients:}\;\;\varphi ^{2}(x,y)=\varphi (x)\phi (y) \end{aligned}$$
(2)
$$\begin{aligned}&\hbox {Horizontal Coefficients:}\;\;\varphi ^{3}(x,y)=\phi (x)\varphi (y) \end{aligned}$$
(3)
$$\begin{aligned}&\hbox {Diagonal Coefficients:}\;\;\varphi ^{4}(x,y)=\varphi (x)\varphi (y) \end{aligned}$$
(4)

where \(\phi (x)\) is the scaling function and \(\varphi (x)\) is the wavelet function.

Step 2 Represent the wavelet coefficients of the image as a matrix of column vectors \(A=[a_1 ,a_2 ,\ldots ,a_n ]\), where each \(a_i \) represents one of the ‘n’ features.

Step 3 Compute the covariance matrix using these vectors,

$$\begin{aligned} C=\hbox {cov}(A)=E\{AA^{T}\} \end{aligned}$$
(5)

Step 4 Determine the eigenvalues \(\lambda _i \) and the corresponding eigenvectors \(E_i \) of the covariance matrix from the characteristic equation

$$\begin{aligned} (C-\lambda _i I)E_i =0 \end{aligned}$$
(6)

Step 5 Select the eigenvector corresponding to the largest eigenvalue, i.e., the principal component. Normalize this column vector; its elements act as the weight values \(W^{T}\).

Step 6 Multiply each term of the wavelet coefficient matrix by the normalized weights, i.e., \(C_V =C_A W^{T}\), where \(C_A \) is the wavelet coefficient matrix of a source image and \(W^{T}\) contains the normalized weights obtained in Step 5 from the eigenvectors of the real, symmetric covariance matrix.

Step 7 Repeat the above steps for all the approximation and detail coefficients.

Step 8 Find the inverse wavelet transform of the scaled matrices calculated in step 7.

Step 9 Generate the fused image matrix by adding the two scaled matrices obtained in Step 8.
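A minimal Python sketch of this module is given below, using the stationary (undecimated) wavelet transform from PyWavelets and assuming 256 × 256 gray-scale inputs. The function names, the wavelet, the decomposition level and the choice to derive the PCA weights from the approximation band at each level are illustrative assumptions rather than the authors' exact implementation.

```python
# Hedged sketch of Module I: UDWT decomposition with PyWavelets' stationary
# wavelet transform and PCA-weighted fusion of each sub-band.
import numpy as np
import pywt

def pca_weights(band_a, band_b):
    """Normalized weights from the principal eigenvector of the 2x2 covariance."""
    data = np.vstack([band_a.ravel(), band_b.ravel()])   # 2 x N observation matrix
    cov = np.cov(data)                                   # Eq. (5), 2 x 2
    eigvals, eigvecs = np.linalg.eigh(cov)               # Eq. (6), EVD
    w = np.abs(eigvecs[:, np.argmax(eigvals)])           # principal component (Step 5)
    return w / w.sum()                                   # normalized weights W^T

def udwt_pca_fuse(img_a, img_b, wavelet="bior5.5", level=2):
    """Fuse two registered images sub-band by sub-band (Steps 1-9)."""
    coeffs_a = pywt.swt2(img_a.astype(float), wavelet, level=level)
    coeffs_b = pywt.swt2(img_b.astype(float), wavelet, level=level)
    fused = []
    for (ca, details_a), (cb, details_b) in zip(coeffs_a, coeffs_b):
        w = pca_weights(ca, cb)                          # weights per decomposition level
        approx = w[0] * ca + w[1] * cb                   # Step 6 applied to the approximation band
        details = tuple(w[0] * da + w[1] * db
                        for da, db in zip(details_a, details_b))
        fused.append((approx, details))
    return pywt.iswt2(fused, wavelet)                    # Steps 8-9: inverse UDWT
```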

Fig. 2 Image decomposition using DTCWT

Module II: DTCWT-based fusion algorithm

The conventional DWT produces aliasing due to its shift variant nature, which results from the subsampling at each level; a small shift in the input signal can therefore produce a very different set of wavelet coefficients. The DTCWT, in contrast, provides better orientation selectivity than DWT while still allowing perfect reconstruction [22].

A and B represent the source images derived after applying the wavelet-based PCA fusion algorithm to the individual source images. Consider the 2-D wavelet \(\varphi (a,b)=\varphi (a)\varphi (b)\) associated with the row-column implementation of the wavelet transform, where \(\varphi (a)\) is a complex wavelet given by

$$\begin{aligned}&\varphi (a)=\varphi _h (a)+j\varphi _g (a) \end{aligned}$$
(7)
$$\begin{aligned}&\varphi (a,b)=[\varphi _h (a)+j\varphi _g (a)][\varphi _h (b)+j\varphi _g (b)] \end{aligned}$$
(8)
$$\begin{aligned}&Re\{\varphi (a,b)\}=\varphi _h (a)\varphi _h (b)-\varphi _g (a)\varphi _g (b) \end{aligned}$$
(9)
$$\begin{aligned}&Im\{\varphi (a,b)\}=\varphi _h (a)\varphi _g (b)+\varphi _g (a)\varphi _h (b) \end{aligned}$$
(10)

The real part of this complex wavelet is obtained as the difference of two separable wavelets and is oriented at \(-\,45^{\circ }\). At every decomposition level of DTCWT, six directional high-frequency sub-bands are generated along with two low-frequency sub-bands, as shown in Fig. 2.
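This sub-band structure can be checked quickly with the open-source dtcwt Python package (our choice of implementation; the paper does not name one): each level of the forward transform returns one complex array whose last dimension holds the six directional sub-bands.

```python
# Hedged sketch: inspecting the DTCWT sub-band structure with the `dtcwt` package.
import numpy as np
import dtcwt

image = np.random.rand(256, 256)                  # stand-in for a 256 x 256 brain slice
pyramid = dtcwt.Transform2d().forward(image, nlevels=2)

print(pyramid.lowpass.shape)                      # low-frequency (scaling) coefficients
for level, band in enumerate(pyramid.highpasses, start=1):
    print(level, band.shape, band.dtype)          # (rows, cols, 6) complex arrays per level
```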

Fig. 3 Intensity-based registration of image set 1. a MRI. b CT. c Registered MRI-CT image

Fig. 4 Intensity-based registration of image set 2. a MRI. b CT. c Registered MRI-CT image

The two-dimensional DTCWT decomposes a 2D image into different scales. The scaling functions \(\phi _{h}(a)\) and \(\phi _{g}(a)\) are implemented using low-pass filters, and the wavelet functions \(\varphi _{h}(a)\) and \(\varphi _{g}(a)\) are implemented using high-pass filters; the wavelets of the two trees form a Hilbert transform pair.

The essential steps involved in DTCWT-based fusion are arranged as follows:

Step 1 j-level decomposition of the images A and B is performed. The six directional high-frequency coefficients \(a_j\) and low-frequency coefficients \({b}_{j}\) are then extracted.

Step 2 The high-frequency coefficients \(a_j \) are fused based on the maximum fusion rule.

Maximum fusion rule The images are fused by choosing, at each pixel location, the maximum of the corresponding coefficients from the two input source images [29].

$$\begin{aligned} F(x,y)=\hbox {Max}\left( A(x,y),B(x,y)\right) ,\quad 0\le x\le m,\;0\le y\le n \end{aligned}$$
(11)

where A(x, y) and B(x, y) are the input images, F(x, y) is the fused image and (x, y) denotes the pixel location.

Step 3 The final image F is reconstructed by taking inverse DTCWT of the derived high- and low-frequency coefficients.
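A compact sketch of Module II, again using the dtcwt package, is shown below. Averaging the two low-pass bands and comparing the complex coefficients by magnitude in the maximum rule are our own illustrative choices where the text does not specify details.

```python
# Hedged sketch of Module II: DTCWT decomposition and fusion with the maximum rule.
import numpy as np
import dtcwt

def dtcwt_max_fuse(img_a, img_b, nlevels=3):
    """Fuse two images A and B in the dual-tree complex wavelet domain."""
    transform = dtcwt.Transform2d()
    pa = transform.forward(img_a, nlevels=nlevels)       # Step 1: decompose A
    pb = transform.forward(img_b, nlevels=nlevels)       # Step 1: decompose B

    # Low-frequency coefficients: simple average of the two images (assumed rule).
    lowpass = 0.5 * (pa.lowpass + pb.lowpass)

    # Step 2: for each level, keep the directional coefficient with the larger
    # magnitude (maximum fusion rule, Eq. 11).
    highpasses = tuple(np.where(np.abs(ha) >= np.abs(hb), ha, hb)
                       for ha, hb in zip(pa.highpasses, pb.highpasses))

    # Step 3: reconstruct the fused image with the inverse DTCWT.
    return transform.inverse(dtcwt.Pyramid(lowpass, highpasses))
```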

Results and discussion

The input images of the human brain under different modalities (MRI and CT), of size \(256 \times 256\), have been collected from the whole brain atlas data distributed by Harvard University and have been fused using cascaded PCA and shift invariant wavelet transforms. Even though MRI, CT and PET are different medical imaging modalities, only MRI and CT have been considered in this study. MRI and CT give structural information, whereas PET images give functional information. MRI and CT give a clear picture of both higher-grade and lower-grade (slowly growing) tumors, whereas PET works better only for higher-grade tumors. To evaluate the performance of the proposed fusion approach, two different image sets of the human brain have been considered, and the present work is compared with other image fusion methods based on shift variant transforms and different fusion rules. Figure 6a, b represents the MRI and CT images, respectively. The light portion of the MRI image provides the soft tissue details, whereas the brighter or white portion of the CT image represents the presence of denser matter or hard tissue. The source images are decomposed based on the fusion algorithm discussed in the “Image fusion algorithm” section.

Fig. 5 Fused result of UDWT and PCA. a Image set 1. b Image set 2

Fig. 6 Comparison of fused images set 1. a MRI. b CT. c DWT (\(E=6.5735\)). d UDWT (\(E=6.7715\)). e UDWT and PCA (\(E=7.0967\)). f DTCWT. g UDWT and DTCWT (\(E=6.8225\)). h Cascaded shift invariant WTs and PCA (\(E=7.1180\))

Fig. 7 Comparison of fused images set 2. a MRI(1). b CT(1). c DWT (\(E=4.9694\)). d UDWT. e UDWT and PCA (\(E=5.4610\)). f DTCWT. g UDWT and DTCWT (\(E=5.8587\)). h Cascaded shift invariant WTs and PCA (\(E=5.9433\))

First, the input images are registered using the intensity-based registration method, and the results are shown in Figs. 3 and 4. The registered MRI and CT images are decomposed by applying UDWT. Here a 2-level decomposition is performed because the first-level sub-bands contain edges that are difficult to recognize because of noise, whereas the second-level sub-bands contain more useful information and less noise. The decomposed coefficients are fused using the PCA fusion rule, and the resulting fused image is obtained by taking the inverse UDWT. The fused images obtained using UDWT and PCA are shown in Fig. 5. Figure 6c–h compares the fused images obtained with the proposed method and the other fusion methods. The fused image in Fig. 6c shows both soft and hard tissue details but with less contrast and more artifacts; the edges are also not well preserved. This is due to the lack of shift invariance in DWT-based fusion. The introduction of shift invariance in the wavelet domain, i.e., UDWT, significantly improves the visual information of the image. Figure 6d shows the fused image based on UDWT, in which the quality of the image is high but the edge information is not well preserved. The wavelets together with PCA minimize redundancy and extract more details with a slight improvement in entropy (\(E=7.0967\)), as shown in Fig. 6e, although the fused image still has limited directionality and fewer contrast features. The directional features are further improved by the introduction of DTCWT, as depicted in Fig. 6f, g. The cascaded combination of PCA and shift invariant wavelet transforms minimizes the redundancy, preserves more information at the edges (\(E=7.1180\)) and provides more directional features with fewer artifacts, as shown in Fig. 6h.

The proposed fusion technique is also applied to the other pair of human brain images, MRI(1) and CT(1), shown in Fig. 7a, b. The fused images are shown in Fig. 7c–h.

It is evident that the shift invariant transforms with PCA provide fine details at the edges with fewer artifacts (Fig. 7h) and more information (\(E=5.9433\)), whereas Fig. 7c shows lower image contrast and a lower entropy value due to the lack of directionality and shift invariance in DWT. Thus, the fused images give better details than the original CT and MRI images. The quality and information content of the fused image are analyzed using suitable quality metrics such as entropy, peak signal-to-noise ratio, standard deviation, edge-based similarity measure (\(Q^{AB/F}\)), spatial frequency, mean square error, normalized cross-correlation and average difference.

Entropy (E) The entropy can be used to measure the richness of information in an image which is given by the equation

$$\begin{aligned} E=-\,\sum _{i=0}^{L-1} {P_i } \log P_i \end{aligned}$$
(12)

where L is the number of gray levels of an image, and \({P} = \{{P}_{0},{P}_{1}{\ldots }..{P}_{L-1}\}\) is the probability distribution of each level. A higher value of entropy indicates more amount of information in the fused image.
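For illustration, a minimal NumPy sketch of Eq. (12) for an 8-bit fused image is given below; the base-2 logarithm and the function name are our assumptions.

```python
# Hedged sketch of the entropy metric in Eq. (12) for an 8-bit gray-scale image.
import numpy as np

def entropy(image, levels=256):
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = hist / hist.sum()                    # probability P_i of each gray level
    p = p[p > 0]                             # drop empty bins (0 * log 0 := 0)
    return float(-np.sum(p * np.log2(p)))    # entropy in bits
```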

Peak signal-to-noise ratio (PSNR) PSNR gives the relationship between the fused and the reference image.

$$\begin{aligned} \hbox {PSNR}=10\log _{10} \frac{(255)^{2}}{\hbox {MSE}} \end{aligned}$$
(13)

where MSE is the mean square error.

A higher value of PSNR indicates better quality of the fused image.
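A simple sketch of Eq. (13) follows, together with the MSE of Eq. (19) defined later in this section; both images are assumed to be 8-bit arrays of identical size, and the names are illustrative.

```python
# Hedged sketch of PSNR (Eq. 13) built on MSE (Eq. 19).
import numpy as np

def mse(reference, fused):
    diff = reference.astype(float) - fused.astype(float)
    return float(np.mean(diff ** 2))                                    # Eq. (19)

def psnr(reference, fused):
    return float(10.0 * np.log10(255.0 ** 2 / mse(reference, fused)))   # Eq. (13)
```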

Standard deviation (SD) Standard deviation is the measure of the contrast of the fused image.

$$\begin{aligned} \sigma =\sqrt{\frac{1}{M\times N}\sum _{m=1}^{M} {\sum _{n=1}^{N} {\left( f(m,n)-\mu \right) ^{2}} } }, \end{aligned}$$
(14)

where f(m, n) is the pixel value of the fused image of size \(M\times N\) and \(\mu \) represents the mean gray level. The SD reflects the dispersion of the gray levels about the mean; a larger SD indicates a more dispersed gray-level distribution and greater image contrast, and hence more information.
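Equation (14) corresponds to the population standard deviation of the fused image's gray levels, as in this hedged sketch.

```python
# Hedged sketch of the standard deviation metric in Eq. (14).
import numpy as np

def standard_deviation(fused):
    f = fused.astype(float)
    mu = f.mean()                                   # mean gray level
    return float(np.sqrt(np.mean((f - mu) ** 2)))   # Eq. (14)
```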

Edge-based similarity measure (\(Q^{AB/F}\)) \(Q^{AB/F}\) measures the amount of edge information correctly transferred from the input source images to the fused image [30].

$$\begin{aligned} Q^{AB/F}=\frac{\mathop \sum \nolimits _{n=1}^N \mathop \sum \nolimits _{m=1}^M \left[ Q^{AF}\left( {n,m} \right) W^{A}\left( {n,m} \right) +Q^{BF}\left( {n,m} \right) W^{B}\left( {n,m} \right) \right] }{\mathop \sum \nolimits _{n=1}^N \mathop \sum \nolimits _{m=1}^M \left[ W^{A}\left( {n,m} \right) +W^{B}\left( {n,m} \right) \right] } \end{aligned}$$
(15)

where \({W}^{{A}}\) (nm) and \({W}^{{B}}\) (nm) are weights for edge preservation values \({Q}^{{AF}}\) (nm) and \({Q}^{{BF}}\) (nm), respectively.

The range of \({Q}^{{AB/F}}\) is \(0\le {Q}^{{AB/F}}\le \) 1.

Table 1 Comparison of fusion metrics

A higher value of \({Q}^{{AB/F}}\) implies that fused image has better edge information.

Spatial frequency (S.F) Spatial frequency measures the overall activity in an image. For an image with gray value f(mn) at position (mn), the spatial frequency is defined as

$$\begin{aligned} \hbox {S.F}=\sqrt{\hbox {RF}^{2}+\hbox {CF}^{2}} \end{aligned}$$
(16)

where row frequency

$$\begin{aligned} \hbox {RF}=\sqrt{\frac{1}{\hbox {MN}}\sum _{m=1}^M {\sum _{n=2}^N {[f(m,n)-f(m,n-1)]^{2}} } } \end{aligned}$$
(17)

Column frequency

$$\begin{aligned} \hbox {CF}=\sqrt{\frac{1}{\hbox {MN}}\sum _{n=1}^N {\sum _{m=2}^M {[f(m,n)-f(m-1,n)]^{2}} } } \end{aligned}$$
(18)

The higher the value of spatial frequency, the better the image quality.
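A sketch of Eqs. (16)–(18) follows; the sums of squared first differences are divided by M × N to match the equations above, even though each sum contains one fewer term per row or column.

```python
# Hedged sketch of spatial frequency (Eqs. 16-18).
import numpy as np

def spatial_frequency(fused):
    f = fused.astype(float)
    m, n = f.shape
    rf = np.sqrt(np.sum((f[:, 1:] - f[:, :-1]) ** 2) / (m * n))   # row frequency, Eq. (17)
    cf = np.sqrt(np.sum((f[1:, :] - f[:-1, :]) ** 2) / (m * n))   # column frequency, Eq. (18)
    return float(np.sqrt(rf ** 2 + cf ** 2))                      # Eq. (16)
```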

Mean square error (MSE) Mean square error between the original image and fused image with a size of (\(m \times n\)) is given as follows:

$$\begin{aligned} \hbox {MSE }=\frac{1}{m\times n} \mathop \sum \limits _{i=1}^m \mathop \sum \limits _{j=1}^n (A_{ij} -B_{ij} )^{2} \end{aligned}$$
(19)

where \({A}_{ij}\) and \({B}_{ij}\) are the image pixel values of the original image and fused image, respectively.

A smaller value of MSE represents better fused result.

Normalized cross-correlation (NCC) Normalized cross-correlation measures the similarity between the fused image and the original image.

It is mathematically expressed as follows:

$$\begin{aligned} \hbox {NCC}=\mathop \sum \limits _{i=1}^m \mathop \sum \limits _{j=1}^n (A_{ij} B_{ij} )\Big /\mathop \sum \limits _{i=1}^m \mathop \sum \limits _{j=1}^n A_{ij} ^{2} \end{aligned}$$
(20)

where \({A}_{ij}\) and \({B}_{ij}\) are the image pixel values of the original image and fused image, respectively.

A higher value of NCC represents a better fused result.

Average difference (AD) Average difference gives the average change between the original image and the fused image.

It is mathematically given as follows:

$$\begin{aligned} \hbox {AD }=\frac{1}{m\times n}\mathop \sum \limits _{i=1}^m \mathop \sum \limits _{j=1}^n [A_{ij} -B_{ij} ] \end{aligned}$$
(21)

A lower value of AD represents a better fused performance.
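Hedged sketches of Eqs. (20) and (21) are given below; as in the definitions above, A is the original (reference) image and B the fused image.

```python
# Hedged sketches of normalized cross-correlation (Eq. 20) and average difference (Eq. 21).
import numpy as np

def ncc(original, fused):
    a, b = original.astype(float), fused.astype(float)
    return float(np.sum(a * b) / np.sum(a ** 2))   # Eq. (20)

def average_difference(original, fused):
    a, b = original.astype(float), fused.astype(float)
    return float(np.mean(a - b))                   # Eq. (21)
```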

The objective evaluations of the fused images of the proposed method and the other comparable fusion methods, namely DWT, UDWT, UDWT and PCA, and UDWT and DTCWT, for the medical images (set 1 and set 2) are listed in Table 1. The higher values of entropy, PSNR and standard deviation among the listed fusion methods are highlighted in Table 1. It is observed that classical DWT-based fusion yields an average entropy of about 6.5, whereas the cascaded shift invariant wavelet transforms with PCA yield an entropy of about 7.1 for image set 1. The PSNR value changes only slightly across the different wavelet transforms listed in Table 1. Moreover, other effective fusion metrics, such as \(Q^{AB/F}\), spatial frequency, normalized cross-correlation, mean square error and average difference, for image set 1 are listed in Table 2, with the higher parameter values highlighted.

Table 2 Comparison of performance metrics for image set 1

The introduction of DTCWT yields better results since it retains more orientation information than DWT. Moreover, there is a marked improvement in standard deviation, PSNR, spatial frequency, \(Q^{AB/F}\), mean square error, normalized cross-correlation and average difference due to the introduction of the shift invariant wavelet transforms and PCA. The proposed fusion approach captures fine details such as edge, phase and directional information more precisely. Thus, it is concluded that the performance of image fusion is greatly improved by the shift invariant property of the wavelet transform.

Fig. 8 Comparison results of entropy for different families (biorthogonal, Daubechies)

The wavelet families adopted in this fusion approach are Daubechies (dbN, \(N = 1{\ldots }20\)), symlets (symN, \(N = 1{\ldots }20\)), coiflets (coifN, \(N = 1{\ldots }5\)) and biorthogonal (biorM.N, \(M =1{\ldots }6\), \(N = 1{\ldots }9\)). The entropies obtained for the fused image using different wavelets are shown in Fig. 8, and it is found that the biorthogonal family performs better than Daubechies. The visual comparison results of the fused images using different wavelet families for the two image sets are shown in Figs. 9 and 10.

Fig. 9 Comparison of fused image (set 1) using cascaded shift invariant WTs and PCA method using different wavelets. a bior5.5 (\(E=7.1700\)). b db2 (\(E=7.0999\)). c coif1 (\(E=7.1013\)). d sym2 (\(E=7.0999\)). e rbio5.5 (\(E=7.1105\))

Fig. 10 Comparison of fused images (set 2) using cascaded shift invariant WTs and PCA method using different wavelets. a bior5.5 (\(E=6.2809\)). b db2 (\(E=5.9007\)). c coif1 (\(E=5.9179\)). d sym2 (\(E=5.9007\)). e rbio5.5 (\(E=5.8164\))

It is observed that Figs. 9a and 10a yield more visual information, with entropies of 7.1700 and 6.2809, respectively, than the other wavelet families, and that the proposed fusion approach gives better entropy and PSNR values. The selection of the fusion rule is also an important criterion for the enhancement of information and quality metrics. Table 3 shows the comparison results using different fusion rules; the higher values of entropy, PSNR and standard deviation obtained using the maximum fusion rule for the two sets of images are highlighted.

Table 3 Comparison of performance metrics using different fusion rules
Fig. 11 Comparison of entropy and PSNR for different fusion techniques

The results show that the maximum fusion rule provides higher values of entropy (E), PSNR and standard deviation (SD). The entropy and PSNR of the images obtained using different fusion methods, namely DWT, UDWT, PCA, UDWT and PCA, UDWT and DTCWT, and the combined UDWT, PCA and DTCWT, have also been compared, and the results are depicted in Fig. 11.

Experimental outcomes presented in this section illustrate that the proposed methodology performs better than conventional multiscale transforms. The improved performance of the fused image is due to the shift invariance and the availability of phase information in the imaginary part of the DTCWT. The biorthogonal wavelet family offers a more desirable outcome, as it retains features of the individual images, such as lines, curves, edges and boundaries, in a better way. The spatial domain representation of an image imparts high spatial resolution, but the images suffer from blurring. The proposed fusion technique further reduces artifacts and produces smoother transitions at boundaries with less blurring. Thus, the combination of spatial and shift invariant transform domain fusion improves the performance compared to the individual fusion algorithms.

Conclusion

In this paper, a novel image fusion framework based on the cascade of shift invariant wavelet transforms (UDWT and DTCWT) and PCA has been presented. The property of shift invariance is important in image fusion for enhancing directional features and extracting fine edge details. Furthermore, the artifacts and distortions are reduced by the introduction of the undecimated wavelet transform, and the redundant details of the image are removed by the application of PCA. The complex wavelet transform (DTCWT) can even fuse misregistered images and significantly preserves the edges. Since DTCWT operates in the complex domain, it provides phase information, a feature that transforms such as DWT and UDWT lack. Thus, the directional and spatial information extracted from the biomedical images greatly improves the objective metric performance. The experimental results demonstrate that the proposed method outperforms the other fusion methods in terms of objective evaluation.