1 Introduction

Image fusion has become an important part of image processing. It refers to integrating multiple images of the same or different scenes into a new image that provides more comprehensive support for a specific field [43]. The fused image should minimize information redundancy while retaining all useful information of the source images [5]. In recent years, image fusion technology has received widespread attention and is widely used in many areas, including multi-focus imaging [34, 61], medical imaging [10, 60], infrared imaging [48] and remote sensing [19].

Image fusion can be divided into three levels: data-level fusion, feature-level fusion and decision-level fusion. Data-level fusion, also known as pixel-level fusion, directly processes the data collected by the sensor. Its advantage is that it retains as much of the raw data as possible, providing subtle information that the other fusion levels cannot. Feature-level fusion first extracts regional features from the acquired source images according to a feature extraction principle, then analyzes the extracted feature information and summarizes the most representative characteristics, which are then integrated further. Decision-level fusion first applies filtering and signal enhancement, continues with feature extraction as in feature-level fusion, and focuses on the decisions supported by the target region itself. According to their characteristics, image fusion algorithms are usually divided into two categories: algorithms based on the spatial domain and algorithms based on the frequency domain. The former processes images directly in the spatial domain; because it works on pixels instead of transform-domain coefficients, it can preserve image details at all scales rather than only the finite scales determined by the decomposition layers, while also avoiding the additional computational burden of multi-scale decomposition. The latter obtains the fused image in the frequency domain: coefficients are extracted according to the local features of the source images, appropriate fusion rules are selected, and the fused image is reconstructed from the coefficients [2]. Common frequency-domain fusion methods include the contourlet transform [25], discrete wavelet transform [30, 44], dual-tree complex wavelet transform [9], stationary wavelet transform [16], curvelet transform [6], slicelet transform [22] and so on.

With the increasing application of image fusion, research on image fusion has become more and more extensive. Fusion algorithms based on the discrete cosine transform (DCT) and the discrete wavelet transform (DWT) are the most common. Image fusion based on the wavelet transform has been successful and widely used. For example, Naeem et al. [38] used the DWT to fuse an image with few details and another image with rich details, which can change the uniformity of the image with encrypted details. Yang [59] proposed a DWT-based image fusion method that uses the maximum-coefficient fusion rule. Haghighat [24] proposed an efficient multi-focus image fusion method based on wavelet-domain variance, which improves the quality of the fused image and reduces the computational complexity. Tang [52] proposed a new image fusion method based on local contrast measurement in the DCT domain; however, the fused images obtained by this method tend to be blurred. In [21], the maximum pixel replacement and pixel average fusion rules are proposed; experimental results show that this method is more sensitive to noise and artifacts. Abdollahzadeh et al. [1] proposed to calculate the Sum-Modified-Laplacian (SML) in the DCT domain.

Multi-modal medical image fusion can improve the clinical accuracy of medical images. Two medical source images are fused through a fusion algorithm so that the fused image contains the effective information of both sources. Medical image fusion based on the wavelet transform has achieved good results. Vijayarajan [53] used a DWT-based averaging principal component fusion method to fuse computed tomography (CT) and magnetic resonance (MRI) images, decomposing the source images into multi-scale inputs and obtaining good experimental results. In [23], the authors proposed an image fusion algorithm based on DWT-DBSS and used the maximum selection rule to obtain the detail fusion coefficients. Rajarshi et al. [46] proposed using the maximum local extrema fusion rule to fuse MRI and CT images; experimental results show that the fused image obtained by the DWT algorithm retains most of the useful information of the source images. However, DWT-based image fusion has low efficiency due to its high computational complexity and long running time. Moreover, the fused image suffers from problems such as blocking effects and quality loss. Therefore, the authors of [39] proposed a new multi-focus image fusion algorithm based on correlation coefficients. In addition, the authors of [40] used singular value decomposition (SVD) to fuse multi-focus images directly in the DCT domain. The results show that fused images obtained by DCT-based algorithms are relatively clear, and the experiments take less time and are more efficient. However, existing DWT- and DCT-based image fusion algorithms are mainly designed for grayscale images. For color images, the three channels are usually processed separately. In [7], the authors used a DCT algorithm to fuse satellite images, processing the multiple channels separately and finally integrating them into the fused image. Such methods ignore the correlation between the image channels, resulting in incomplete information in the fused image.

Fortunately, geometric algebra (GA) provides a computational framework for multi-dimensional signal processing that can treat multi-channel images as a whole [20, 42, 51]. Wang et al. [57] proposed the Sparse Fast Clifford Fourier Transform (SFCFT), which selectively uses input data in scalar and vector fields to deal with big-data problems. Felsberg [18] used Clifford algebra to define a corresponding Clifford-Fourier transform (CFT). Berthier et al. [8] focused on geometric methods based on group actions and performed a Clifford Fourier transform for spectral analysis of color images. Julia et al. [17] proposed a Clifford Fourier transform that extends the Fourier transform to general elements of Clifford algebra. The DCT has been a basic tool for signal and image processing for many years: experiments can be performed directly in the DCT domain, avoiding the complicated image encoding and decoding process, which saves time and improves efficiency. The Geometric Algebra Discrete Cosine Transform (GA-DCT) represents a multi-modal medical image in a holistic way and considers the correlation between channels, so we propose to extend the DCT to the geometric algebra domain to fuse multi-modal medical images.

Building on the development of image fusion algorithms and the application of GA to image fusion, a novel multi-vector image fusion algorithm is proposed. Firstly, the source images are divided into several blocks. Then, each multi-modal medical image block is represented as a multi-vector using GA, and GA-DCT is performed on each block. The fusion coefficients are obtained by averaging the coefficients of the corresponding GA-DCT blocks. The Inverse Geometric Algebra Discrete Cosine Transform (IGA-DCT) is then applied to each block, and the fused image is reconstructed by merging all the blocks. To test the performance of the proposed algorithm, this paper conducts fusion experiments on four sets of multi-modal color medical images of the brain. The experimental results show that the fused images obtained by the proposed algorithm have higher resolution and more comprehensive information, and hold a clear advantage in both subjective visual quality and objective evaluation.

The rest of this paper is organized as follows. Section 2 introduces the fundamentals of geometric algebra. Section 3 introduces the GA-DCT algorithm and the fusion steps of the proposed algorithm in detail. Section 4 presents the experimental analysis, including subjective and objective fusion image quality evaluations. Finally, we conclude in Sect. 5.

2 Geometric Algebra

Geometric algebra (GA) [26], also known as Clifford algebra, was proposed by William K. Clifford and provides a new approach to the research and application of image representation. It supports geometric operations and analysis in high-dimensional spaces [12, 15, 27, 32, 50] and has become an important research tool in theoretical mathematics, computer vision and physics [13, 33].

In this section, we will introduce the relevant knowledge of GA in detail.

2.1 Fundamentals of Geometric Algebra

Let Gn denote an n-dimensional GA. Its set of orthogonal generators \( \left\{ 1, \beta _{1}, \beta _{2}, \ldots , \beta _{n}\right\} \) leads to a complete basis under the geometric product,

$$\begin{aligned} \{ 1 , \{ \beta _ { i } \} , \{ \beta _ { i } \beta _ { j } \} , \ldots , \{ \beta _ { 1 } \beta _ { 2 } \ldots \beta _ { n } \} \}. \end{aligned}$$
(2.1)

The orthogonal basis vectors introduced above are non-commutative and satisfy the following formulas,

$$\begin{aligned}&\beta _{i}^{2}=1, \quad i=1, \ldots , n, \end{aligned}$$
(2.2)
$$\begin{aligned}&\quad \beta _ { i } \beta _ { ij } = \beta _ { i } \beta _ { i } \beta _ { j } = \beta _ { j } , \quad i , j = 1 , \ldots , n, \quad i \ne j, \end{aligned}$$
(2.3)
$$\begin{aligned}&\quad \beta _{i j}=\beta _{i} \beta _{j}=-\beta _{j} \beta _{i}=-\beta _{j i}, \quad i, j=1, \ldots , n, \quad i \ne j. \end{aligned}$$
(2.4)

It can be seen from the above formulas that Gn has \( 2^{n} \) basis elements. For example, G2 contains four basis elements and G3 contains eight. The structure of the bases of G2 and G3 is shown below,

$$\begin{aligned}&G _ { 2 } : \{ 1 , \{ \beta _ { 1 } , \beta _ { 2 } \} , \beta _ { 1 } \beta _ { 2 } \} = \{ 1 , \beta _ { 1 } , \beta _ { 2 } , \beta _ { 12 } \}, \end{aligned}$$
(2.5)
$$\begin{aligned}&\quad \left. \begin{array} { c } { G _ { 3 } : \{ 1 , \{ \beta _ { 1 } , \beta _ { 2 } , \beta _ { 3 } \} , \{ \beta _ { 1 } \beta _ { 2 } , \beta _ { 2 } \beta _ { 3 } , \beta _ { 1 } \beta _ { 3 } \} , \beta _ { 1 } \beta _ { 2 } \beta _ { 3 } \} } \\ { = \{ 1 , \beta _ { 1 } , \beta _ { 2 } , \beta _ { 3 } , \beta _ { 12 } , \beta _ { 23 } , \beta _ { 13 } , \beta _ { 123 } \} }. \end{array} \right. \end{aligned}$$
(2.6)

Just as vectors are the basic elements of linear algebra, multi-vectors are the basic elements of GA. Comparing the forms of complex numbers, quaternions and GA shows that the multi-vector structure of GA is the n-dimensional extension of complex numbers and quaternions [36, 49]. If a multi-vector \( a \in G_{n} \) consists of scalar and vector parts only, then a can be represented as

$$\begin{aligned} a = a _ { 0 } + \sum _ { i = 1 } ^ { n } a _ { i } \beta _ { i }, \end{aligned}$$
(2.7)

where \( a_{0}, a_{1}, \ldots , a_{n} \in \mathbb {R} \).
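For concreteness, a multi-vector can be stored as an array of its \( 2^{n} \) blade coefficients. The following minimal Python sketch (an illustrative aid with hypothetical helper names, not part of the original formulation) does this for G2:

```python
import numpy as np

# Blade order for G2: (1, b1, b2, b12), i.e. 2^2 = 4 coefficients.
BLADES = ("1", "b1", "b2", "b12")

def multivector(a0=0.0, a1=0.0, a2=0.0, a12=0.0):
    """Pack the blade coefficients of a G2 multi-vector into an array."""
    return np.array([a0, a1, a2, a12], dtype=float)

# Eq. (2.7) with n = 2: a = a0 + a1*b1 + a2*b2.
a = multivector(a0=1.0, a1=2.0, a2=3.0)
```

For G3 the array would have \( 2^{3}=8 \) entries, one per blade listed in Eq. (2.6).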

2.2 Basic Operation of Geometric Algebra

The product operation in GA space is called the geometric product. For vectors, the geometric product is composed of an inner product and an outer product. For vectors p and q, the geometric product is defined as follows,

$$\begin{aligned} p q = p \cdot q + p \wedge q, \end{aligned}$$
(2.8)

where \( p \cdot q \) is the scalar part, which represents the inner product within the geometric product, and \( p \wedge q \) is the bivector part, which represents the outer product. Since the outer product is anti-commutative, that is, \( p \wedge q=-q \wedge p \), the geometric product is also non-commutative. The relationship among the geometric, inner and outer products is shown in Eqs. (2.9) and (2.10).

$$\begin{aligned}&p \cdot q = \frac{ 1 }{ 2 } ( p q + q p ), \end{aligned}$$
(2.9)
$$\begin{aligned}&p \wedge q = \frac{ 1 }{ 2 } ( p q - q p ). \end{aligned}$$
(2.10)

If p and q are grade-1 vectors, then \( p \wedge q \) is called a bivector, which is interpreted in geometric algebra as the oriented plane segment spanned by the two vectors, as shown in Fig. 1; the trivector \( p \wedge q \wedge m \) can be interpreted as the oriented volume element spanned by the plane segment \( p \wedge q \) and the vector m, as shown in Fig. 2.
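The geometric product of G2 can be tabulated directly from Eqs. (2.2)-(2.4). The sketch below (with a hypothetical helper gp, written for G2 only) multiplies two multi-vectors via a Cayley table and verifies the symmetric/antisymmetric split of Eqs. (2.9) and (2.10):

```python
import numpy as np

# Cayley table of G2 over the blade order (1, b1, b2, b12):
# CAYLEY[i][j] = (sign, index) of the geometric product blade_i * blade_j,
# derived from b1^2 = b2^2 = 1 and b1*b2 = -b2*b1 (Eqs. 2.2-2.4).
CAYLEY = [[(1, 0), (1, 1), (1, 2), (1, 3)],
          [(1, 1), (1, 0), (1, 3), (1, 2)],
          [(1, 2), (-1, 3), (1, 0), (-1, 1)],
          [(1, 3), (-1, 2), (1, 1), (-1, 0)]]

def gp(a, b):
    """Geometric product of two G2 multi-vectors (length-4 arrays)."""
    out = np.zeros(4)
    for i in range(4):
        for j in range(4):
            s, k = CAYLEY[i][j]
            out[k] += s * a[i] * b[j]
    return out

# Two grade-1 vectors: p = 3*b1 + 1*b2 and q = 1*b1 + 2*b2.
p = np.array([0.0, 3.0, 1.0, 0.0])
q = np.array([0.0, 1.0, 2.0, 0.0])

inner = 0.5 * (gp(p, q) + gp(q, p))  # Eq. (2.9): scalar part only
outer = 0.5 * (gp(p, q) - gp(q, p))  # Eq. (2.10): bivector (b12) part only
print(inner)  # [5. 0. 0. 0.]  ->  p . q = 3*1 + 1*2 = 5
print(outer)  # [0. 0. 0. 5.]  ->  p ^ q = (3*2 - 1*1) * b12 = 5*b12
```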

Fig. 1 Bivector graph

Fig. 2 3D outer product space

2.3 Geometric Algebraic Representation of Multi-modal Image

As we know, a complex number is composed of a scalar part and an imaginary part, and a quaternion is composed of a scalar part and three imaginary parts. The GA space Gn is a geometric extension of Rn. Therefore, any multi-vector \( Z \in G_{n} \) can be expressed as in Eq. (2.11).

$$\begin{aligned} {\mathbf {Z}}=E_{0}({\mathbf {Z}})+\sum _{1 \le i \le n} E_{i}({\mathbf {Z}}) \beta _{i}+\sum _{1 \le i<j \le n} E_{i j}({\mathbf {Z}}) \beta _{i j}+\cdots +E_{1 \ldots n}({\mathbf {Z}}) \beta _{1 \ldots n}. \end{aligned}$$
(2.11)

Expressing a multi-modal image in GA form allows the image to be processed in a holistic manner, which takes the correlation between the channels of a color image into account; this representation is therefore widely used in image processing [14, 54,55,56]. Given a multi-modal image \( K \in G_{n} \), its GA form is

$$\begin{aligned} {\mathbf {K}}=0+\sum _{1 \le i \le n} E_{i}({\mathbf {K}}) \beta _{i}+\sum _{1 \le i<j \le n} E_{i j}({\mathbf {K}}) \beta _{i j}+\cdots +E_{1 \ldots n}({\mathbf {K}}) \beta _{1 \ldots n}, \end{aligned}$$
(2.12)

where \( {E}({\mathbf {K}}) \in \mathbb {R} \) represents the value of each channel of the multi-modal image and \(\beta _ { i }\) denotes an orthogonal basis element of the geometric algebra. All spectral channels of the multi-modal image are represented by one set of orthogonal basis elements. Since the scalar part is not used, it is set to zero.
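As an illustration of Eq. (2.12), an RGB image can be embedded into G2 with a zero scalar part. The sketch below assumes the channel-to-blade assignment used later in Eq. (3.11), namely R to β1, G to β2 and B to β12; the helper names are hypothetical:

```python
import numpy as np

def rgb_to_g2(img):
    """Embed an RGB image of shape (H, W, 3) as a field of G2 multi-vectors.

    Channel-to-blade assignment as in Eq. (3.11): R -> b1, G -> b2,
    B -> b12; the scalar part is zero as in Eq. (2.12).  Output shape is
    (4, H, W) over the blade order (1, b1, b2, b12).
    """
    h, w, _ = img.shape
    K = np.zeros((4, h, w))
    K[1], K[2], K[3] = img[..., 0], img[..., 1], img[..., 2]
    return K

def g2_to_rgb(K):
    """Inverse of rgb_to_g2: drop the (zero) scalar part."""
    return np.stack([K[1], K[2], K[3]], axis=-1)
```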

3 Our Proposed Algorithm

The discrete cosine transform (DCT) is an effective tool in signal and image processing. As it has become more widely used, researchers have tried to extend it to higher-dimensional signals. For multi-modal images, the traditional approach is to first split the multi-modal image into its channels and apply the DCT to each spectral channel separately. The disadvantage of this approach is that it ignores the correlation between the spectral channels. Therefore, this paper proposes a geometric algebra form of the discrete cosine transform, named the geometric algebra discrete cosine transform (GA-DCT). The GA-DCT treats a multi-modal image as a multi-vector and processes it holistically by mapping each spectral channel to a blade of GA.

3.1 Geometric Algebra Discrete Cosine Transform

For a multi-modal image f(x, y) of size \( {M} \times {N} \), and in view of the non-commutativity of geometric algebra, the GA-DCT can be defined in two forms. Formulas (3.1) and (3.2) define the left-sided and right-sided GA-DCT, respectively.

$$\begin{aligned} C_{L}(u, v)=\alpha (u) \alpha (v) \sum _{x=0}^{M-1} \sum _{y=0}^{N-1} \lambda f(x, y) \cos \Big [\frac{\pi (2 x+1) u}{2 M}\Big ] \cos \Big [\frac{\pi (2 y+1) v}{2 N}\Big ], \end{aligned}$$
(3.1)
$$\begin{aligned} C_{R}(u, v)=\alpha (u) \alpha (v) \sum _{x=0}^{M-1} \sum _{y=0}^{N-1} f(x, y) \cos \Big [\frac{\pi (2 x+1) u}{2 M}\Big ] \cos \Big [\frac{\pi (2 y+1) v}{2 N}\Big ] \lambda . \end{aligned}$$
(3.2)

Each form of the GA-DCT has a corresponding inverse transform. Formulas (3.3) and (3.4) give the inverse transforms of the left-sided and right-sided GA-DCT, respectively.

$$\begin{aligned} f_{L}(x, y)=-\sum _{u=0}^{M-1} \sum _{v=0}^{N-1} \alpha (u) \alpha (v) \lambda C(u, v) \cos \Big [\frac{\pi (2 x+1) u}{2 M}\Big ] \cos \Big [\frac{\pi (2 y+1) v}{2 N}\Big ], \end{aligned}$$
(3.3)
$$\begin{aligned} f_{R}(x, y)=-\sum _{u=0}^{M-1} \sum _{v=0}^{N-1} \alpha (u) \alpha (v) C(u, v) \cos \Big [\frac{\pi (2 x+1) u}{2 M}\Big ] \cos \Big [\frac{\pi (2 y+1) v}{2 N}\Big ] \lambda , \end{aligned}$$
(3.4)

where \( \lambda \) is a GA multi-vector of unit magnitude with no scalar part, i.e., \( \lambda \) satisfies the following properties,

$$\begin{aligned} \lambda =\sum _{1 \le i \le n} E_{i}(\lambda ) \beta _{i}+\sum _{1 \le i<j \le n} E_{i j}(\lambda ) \beta _{i j}+\cdots +E_{1 \ldots n}(\lambda ) \beta _{1 \ldots n}, \quad E(\lambda ) \in \mathbb {R}, \end{aligned}$$
(3.5)
$$\begin{aligned}&\quad | \lambda | ^ { 2 } = \lambda * \tilde{ \lambda } = 1. \end{aligned}$$
(3.6)

Similar to the traditional DCT, \( \alpha (u) \) and \( \alpha (v) \) are defined in formula (3.7),

$$\begin{aligned} \alpha (u)=\left\{ \begin{array}{ll}\frac{1}{\sqrt{M}}, &{} u=0 \\ \sqrt{\frac{2}{M}}, &{} u \ne 0,\end{array}\right. \quad \alpha (v)=\left\{ \begin{array}{ll}\frac{1}{\sqrt{N}}, &{} v=0 \\ \sqrt{\frac{2}{N}}, &{} v \ne 0.\end{array}\right. \end{aligned}$$
(3.7)
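Since the cosine kernel in Eqs. (3.1)-(3.4) is scalar, the right-sided GA-DCT reduces to a per-blade orthonormal 2-D DCT followed by a right geometric product with λ. The sketch below is one possible realization for G2, assuming λ = β12 (a unit multi-vector with zero scalar part satisfying Eqs. (3.5)-(3.6); its property λ² = −1 is what the leading minus sign of the inverse transform cancels for this particular choice):

```python
import numpy as np
from scipy.fft import dctn, idctn

# Assumed choice of lambda: the G2 pseudoscalar beta_12.  It has zero
# scalar part and unit magnitude (Eqs. 3.5-3.6), and lambda^2 = -1,
# which the leading minus sign of Eqs. (3.3)-(3.4) cancels.
def rmul_lambda(a):
    """Right geometric product a * beta_12 over blade order (1, b1, b2, b12)."""
    a0, a1, a2, a12 = a
    return np.stack([-a12, -a2, a1, a0])

def ga_dct(block):
    """Right-sided GA-DCT (Eq. 3.2) of a (4, n, n) G2 multi-vector block.

    The cosine kernel is scalar, so the transform is a per-blade
    orthonormal 2-D DCT (the alpha(u)alpha(v) scaling of Eq. 3.7)
    followed by right-multiplication with lambda.
    """
    return rmul_lambda(dctn(block, axes=(1, 2), norm="ortho"))

def iga_dct(coeffs):
    """Right-sided IGA-DCT (Eq. 3.4)."""
    return -rmul_lambda(idctn(coeffs, axes=(1, 2), norm="ortho"))

# Round trip on a random 8x8 block with zero scalar part:
blk = np.random.rand(4, 8, 8)
blk[0] = 0.0
assert np.allclose(iga_dct(ga_dct(blk)), blk)
```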

3.2 Algorithm Details

Consider a multi-modal image \( F \in \left( G_{n}\right) ^{M \times N} \), which is divided into \( n \times n \) pixel blocks. Let \( \left\{ f_{i, j}\right\} \) be an \( n \times n \) block of a source image and \( \left\{ D_{u, v}\right\} \) be its GA-DCT coefficients. In this paper, the right-sided GA-DCT and IGA-DCT are adopted for multi-modal medical image fusion. The GA-DCT of an image block is shown in formula (3.8),

$$\begin{aligned} D_{u, v}=\alpha (u) \alpha (v) \sum _{i=0}^{n-1} \sum _{j=0}^{n-1} f_{i, j} \cos \left[ \frac{\pi (2 i+1) u}{2 n}\right] \cos \left[ \frac{\pi (2 j+1) v}{2 n}\right] \lambda , \end{aligned}$$
(3.8)

where \( u, v=0,1, \ldots , n-1 \).

The source image block \( \left\{ f_{i, j}\right\} \) can be recovered from the GA-DCT coefficients by employing the IGA-DCT as shown in formula (3.9),

$$\begin{aligned} f_{i, j}=-\sum _{u=0}^{n-1} \sum _{v=0}^{n-1} \alpha (u) \alpha (v) D_{u, v} \cos \left[ \frac{\pi (2 i+1) u}{2 n}\right] \cos \left[ \frac{\pi (2 j+1) v}{2 n}\right] \lambda , \end{aligned}$$
(3.9)

where \(i, j=0,1, \ldots , n-1\).

The GA-DCT coefficients of an image block of size \( n \times n \) are arranged as in Eq. (3.10). The source image is usually divided into \( 8 \times 8 \) blocks; each image block is then a 64-point discrete signal. GA-DCT takes these samples as input and decomposes them over 64 orthogonal basis signals, so the output of GA-DCT consists of the amplitudes of these 64 basis signals, which are the GA-DCT coefficients. The transform coefficients are functions of the two-dimensional frequency-domain variables u and v. The coefficient corresponding to \( u=0 \) and \( v=0 \) is called the DC component (the DC coefficient), and the remaining 63 coefficients are called AC components (AC coefficients). Each data block is therefore a matrix of 64 coefficients. Among them, the DC coefficient is located in the upper left corner of the block and is proportional to the average of the 64 samples; the remaining 63 entries are the AC coefficients. The farther a coefficient lies from the DC component, the higher the spatial frequency of the AC component it represents. The distribution of the GA-DCT frequency band coefficients is shown in Fig. 3.

$$\begin{aligned} D=\left[ \begin{array}{llllllll}d_{00} &{} d_{01} &{} d_{02} &{} d_{03} &{} d_{04} &{} d_{05} &{} d_{06} &{} d_{07} \\ d_{10} &{} d_{11} &{} d_{12} &{} d_{13} &{} d_{14} &{} d_{15} &{} d_{16} &{} d_{17} \\ d_{20} &{} d_{21} &{} d_{22} &{} d_{23} &{} d_{24} &{} d_{25} &{} d_{26} &{} d_{27} \\ d_{30} &{} d_{31} &{} d_{32} &{} d_{33} &{} d_{34} &{} d_{35} &{} d_{36} &{} d_{37} \\ d_{40} &{} d_{41} &{} d_{42} &{} d_{43} &{} d_{44} &{} d_{45} &{} d_{46} &{} d_{47} \\ d_{50} &{} d_{51} &{} d_{52} &{} d_{53} &{} d_{54} &{} d_{55} &{} d_{56} &{} d_{57} \\ d_{60} &{} d_{61} &{} d_{62} &{} d_{63} &{} d_{64} &{} d_{65} &{} d_{66} &{} d_{67} \\ d_{70} &{} d_{71} &{} d_{72} &{} d_{73} &{} d_{74} &{} d_{75} &{} d_{76} &{} d_{77}\end{array}\right] \end{aligned}$$
(3.10)
Fig. 3 Frequency band information of GA-DCT coefficients
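A quick numerical check of the DC-coefficient interpretation (an illustrative sketch using the orthonormal scaling of Eq. (3.7); for an 8 × 8 block the DC entry equals the block sum divided by 8, i.e. 8 times the mean of the 64 samples):

```python
import numpy as np
from scipy.fft import dctn

blk = np.random.rand(8, 8)           # one blade channel of an 8x8 block
D = dctn(blk, norm="ortho")          # orthonormal 2-D DCT, Eq. (3.7) scaling
# DC entry: alpha(0)*alpha(0)*sum = sum/8 = 8 * mean of the 64 samples.
assert np.isclose(D[0, 0], 8 * blk.mean())
```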

According to the above introduction, a multi-modal medical image fusion rule can be designed. The common fusion rules based on the discrete cosine transform include the local energy maximum rule, the image contrast maximum rule and the coefficient average rule. This paper uses the GA-DCT with the coefficient average rule. Let M1 and M2 be two source color images of size \( M \times N \), suppose they are divided into \( n \times n \) blocks, and represent each block in GA multi-vector form. Let \( X=x_{i, j} \) and \( Y=y_{i, j} \) be the GA forms of two corresponding image blocks of the source color images M1 and M2,

$$\begin{aligned} \left. \begin{array} { l } { x _ { i , j } = x _ { R } \beta _ { 1 } + x _ { G } \beta _ { 2 } + x _ { B } \beta _ { 12 } }, \\ { y _ { i , j } = y _ { R } \beta _ { 1 } + y _ { G } \beta _ { 2 } + y _ { B } \beta _ { 12 } }, \end{array} \right. \end{aligned}$$
(3.11)

where \( x_{i, j} \) and \( y_{i, j} \) represent blocks of the two source images respectively.

Then, (3.8) can be applied to obtain the GA-DCT coefficients of \( x_{i, j} \) and \( y_{i, j} \), denoted \( D_{x}=\left\{ d_{x, u, v}\right\} \) and \( D_{y}=\left\{ d_{y, u, v}\right\} \). The coefficients of the fused image are obtained by averaging the corresponding block coefficients of the two source images, as shown in (3.12),

$$\begin{aligned} D _ { f , u , v } = 0.5 \times ( d _ { x , u , v } + d _ { y , u , v } ), \end{aligned}$$
(3.12)

where \( d_{x, u, v} \) and \( d_{y, u, v} \) are the corresponding DC and AC coefficients of the input image blocks \( x_{i, j} \) and \( y_{i, j} \) respectively. The fused image block is then obtained by applying the IGA-DCT.

These steps are repeated for all image blocks to obtain the fused blocks of the two source images, and all fused blocks are then combined to produce the final fused image.

In conclusion, the steps of the GA-DCT algorithm are:

1. Let M1 and M2 represent two source color images of size \( M \times N \) and suppose that they can be divided into \( n \times n \) blocks;

2. Represent each divided image block in geometric algebra form;

3. Perform GA-DCT on each block to obtain the transform coefficients;

4. Apply the coefficient-averaging fusion rule to the corresponding coefficients of the two source images to obtain the coefficients of the fused image;

5. Apply IGA-DCT to obtain the fused image.

Figure 4 shows the framework of the multi-modal image fusion based on the GA-DCT.
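Putting steps 1-5 together, the following minimal end-to-end sketch reuses the hypothetical rgb_to_g2, g2_to_rgb, ga_dct and iga_dct helpers from the earlier sketches and assumes image sides divisible by the block size:

```python
import numpy as np

def fuse_ga_dct(img1, img2, n=8):
    """Fuse two equally sized RGB images via GA-DCT coefficient averaging.

    Sketch of steps 1-5: block partition, GA embedding, GA-DCT,
    coefficient averaging (Eq. 3.12), IGA-DCT, block reassembly.
    Assumes H and W are multiples of the block size n.
    """
    K1, K2 = rgb_to_g2(img1), rgb_to_g2(img2)    # step 2
    fused = np.zeros_like(K1)
    _, H, W = K1.shape
    for r in range(0, H, n):                     # step 1: n x n blocks
        for c in range(0, W, n):
            D1 = ga_dct(K1[:, r:r+n, c:c+n])     # step 3
            D2 = ga_dct(K2[:, r:r+n, c:c+n])
            Df = 0.5 * (D1 + D2)                 # step 4: Eq. (3.12)
            fused[:, r:r+n, c:c+n] = iga_dct(Df) # step 5
    return g2_to_rgb(fused)
```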

Fig. 4 The framework of multi-modal image fusion based on GA-DCT

4 Experimental Analysis

To test the effectiveness of the proposed algorithm, experiments are conducted on four sets of multi-modal medical images of the brain in the MATLAB environment. For comparison, we choose five commonly used fusion algorithms with good fusion performance: Laplacian Pyramid [35], DWT-DBSS [23], SIDWT-Haar [58], Morphological Difference Pyramid [37] and DCT based on variance [3]. The source image sets are selected from the medical image database provided by Harvard Medical School [11]. Each image set contains a SPECT-T1 image and a SPECT-TC image, each of size \( 256 \times 256 \).

4.1 Evaluation Standard

The performance of image fusion algorithms is usually evaluated with subjective and objective indicators. For subjective measurement, we mainly compare the fusion results through visual observation. Objective quality evaluation of color fused images usually requires an ideal fused image, but it is difficult to agree on a standard for such an ideal; in this article, we take the two source images as the reference images. The most widely used fused image quality evaluation indicators include Multi-scale Structural Similarity (MSSSIM) [29], Peak Signal-to-Noise Ratio (PSNR) [28], Root-Mean-Square Error (RMSE) [47], Mutual Information (MI) [45], Entropy [41] and Correlation Coefficient (CC) [31]. This article uses these six objective evaluation standards to quantify the fused images.

SSIM measures the structural similarity between the source images and the fused image. Its value lies between 0 and 1: the larger the SSIM value, the more similar the fused image is to the source images and the better the fusion effect. The calculation of SSIM is shown in formulas (4.1) and (4.2).

$$\begin{aligned} {\text {SSIM}}_{(x, y, f)}=0.5 \times \left( {\text {SSIM}}_{(x, f)}+{\text {SSIM}}_{(y, f)}\right) . \end{aligned}$$
(4.1)

where

$$\begin{aligned} \begin{aligned} {\text {SSIM}}_{(x, f)}&=\frac{\left( 2 \mu _{x} \mu _{f}+C_{1}\right) \left( 2 \sigma _{x f}+C_{2}\right) }{\left( \mu _{x}^{2}+\mu _{f}^{2}+C_{1}\right) \left( \sigma _{x}^{2}+\sigma _{f}^{2}+C_{2}\right) }, \\ {\text {SSIM}}_{(y, f)}&=\frac{\left( 2 \mu _{y} \mu _{f}+C_{1}\right) \left( 2 \sigma _{y f}+C_{2}\right) }{\left( \mu _{y}^{2}+\mu _{f}^{2}+C_{1}\right) \left( \sigma _{y}^{2}+\sigma _{f}^{2}+C_{2}\right) }, \end{aligned} \end{aligned}$$
(4.2)

\( \mu _{x}, \mu _{y} \) and \( \mu _{f} \) represent the mean values of the source images x, y and the fused image f; \( \sigma _{x}^{2}, \sigma _{y}^{2} \) and \( \sigma _{f}^{2} \) represent the variances of the source images and the fused image respectively; \( \sigma _{x f} \) and \( \sigma _{y f} \) represent the covariances between the two source images and the fused image respectively; \( \mathrm {C}_{1} \) and \( \mathrm {C}_{2} \) are constants that avoid a zero denominator and maintain stability, with \( C_{1}=\left( K_{1} \times L\right) ^{2} \), \( C_{2}=\left( K_{2} \times L\right) ^{2} \) and usually \(K _ { 1 } = 0.01 , K _ { 2 } = 0.03 , L = 255\).
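A direct transcription of Eqs. (4.1)-(4.2) using global image statistics (a sketch; windowed SSIM implementations, such as the multi-scale variant cited above, differ in detail):

```python
import numpy as np

K1, K2, L = 0.01, 0.03, 255.0
C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2

def ssim_pair(x, f):
    """Global-statistics SSIM between one source x and the fused f (Eq. 4.2)."""
    mx, mf = x.mean(), f.mean()
    vx, vf = x.var(), f.var()
    cov = ((x - mx) * (f - mf)).mean()
    return ((2 * mx * mf + C1) * (2 * cov + C2)) / \
           ((mx**2 + mf**2 + C1) * (vx + vf + C2))

def fusion_ssim(x, y, f):
    """Eq. (4.1): average similarity of the fused image to both sources."""
    return 0.5 * (ssim_pair(x, f) + ssim_pair(y, f))
```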

PSNR is an indicator based on the mean square error. The higher the PSNR value, the closer the fused image is to the source images. The PSNR calculation is shown in formula (4.3),

$$\begin{aligned} P S N R=10 \times \log _{10}\left( \frac{L^{2}}{M S E}\right) =20 \times \log _{10}\left( \frac{L}{R M S E}\right) . \end{aligned}$$
(4.3)

RMSE denotes the root-mean-square error between the source image and the fused image, and is inversely related to the quality of the fused image: the lower the RMSE value, the better the quality. The calculation formula is

$$\begin{aligned} R M S E=\sqrt{\frac{\sum _{m=1}^{M} \sum _{n=1}^{N}[{\text {source}}(m, n)-{\text {fused}}(m, n)]^{2}}{M \times N}}. \end{aligned}$$
(4.4)

MI represents the degree of interdependence between the source image and the fused image. The greater the MI value, the better the fusion effect.

$$\begin{aligned} M I=\frac{J E(x, f)+J E(y, f)}{I E_{x}+I E_{y}}. \end{aligned}$$
(4.5)

where

$$\begin{aligned} \left. \begin{array} { l } { JE ( x , f ) = \sum _ { i = 0 } ^ { L - 1 } \sum _ { k = 0 } ^ { L - 1 } P _ { x , f } ( i , k ) \log \frac{ P _ { x , f } ( i , k ) }{ P _ { x } ( i ) \times P _ { f } ( k ) } }, \\ { JE ( y , f ) = \sum _ { i = 0 } ^ { L - 1 } \sum _ { k = 0 } ^ { L - 1 } P _ { y , f } ( i , k ) \log \frac{ P _ { y , f } ( i , k ) }{ P _ { y } ( i ) \times P _ { f } ( k ) } }, \end{array} \right. \end{aligned}$$
(4.6)

JE(x, f) and JE(y, f) denote the joint entropy between each source image and the fused image respectively. IE denotes the information entropy of an image.

Entropy reflects the richness of image information from the perspective of information theory. The information entropy reflects the amount of information carried by an image: the greater the entropy, the richer the information and the better the quality. The formula is

$$\begin{aligned} E N=-\sum _{i=0}^{L-1} P_{i} \times \log _{2} P_{i}, \end{aligned}$$
(4.7)

where L represents the number of gray levels and \( P_{i} \) represents the proportion of pixels with gray value i among all pixels. The larger the EN, the larger the amount of information in the fused image.

The Correlation Coefficient reflects the degree of correlation between the fused image and the source image. The larger the correlation coefficient, the higher the similarity between the two images. The calculation formula is as follows,

$$\begin{aligned} C C(X, Y)=\frac{\sum _{i=1}^{M} \sum _{j=1}^{N}\left( X_{i, j}-\bar{X}\right) \left( Y_{i, j}-\bar{Y}\right) }{\sqrt{\left( \sum _{i=1}^{M} \sum _{j=1}^{N}\left( X_{i, j}-\bar{X}\right) ^{2}\right) \left( \sum _{i=1}^{M} \sum _{j=1}^{N}\left( Y_{i, j}-\bar{Y}\right) ^{2}\right) }}, \end{aligned}$$
(4.8)

where X and Y represent the source image and the fused image respectively.
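The remaining indicators, Eqs. (4.3)-(4.8), can be transcribed as follows. This is a sketch assuming single-channel images with integer gray levels in [0, 255]; the helper names are hypothetical, and a base-2 logarithm is assumed in Eq. (4.6):

```python
import numpy as np

def rmse(src, fused):
    """Eq. (4.4)."""
    return np.sqrt(np.mean((src.astype(float) - fused.astype(float)) ** 2))

def psnr(src, fused, L=255.0):
    """Eq. (4.3)."""
    return 20 * np.log10(L / rmse(src, fused))

def entropy(img, levels=256):
    """Eq. (4.7): Shannon entropy of the gray-level histogram (integer input)."""
    p = np.bincount(img.ravel(), minlength=levels) / img.size
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def joint_entropy_term(a, f, levels=256):
    """JE(a, f) of Eq. (4.6) via the joint gray-level histogram."""
    joint = np.histogram2d(a.ravel(), f.ravel(), bins=levels,
                           range=[[0, levels], [0, levels]])[0] / a.size
    pa, pf = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return np.sum(joint[nz] * np.log2(joint[nz] / np.outer(pa, pf)[nz]))

def mutual_information(x, y, f):
    """Eq. (4.5)."""
    return (joint_entropy_term(x, f) + joint_entropy_term(y, f)) / \
           (entropy(x) + entropy(y))

def cc(X, Y):
    """Eq. (4.8): correlation coefficient of two images."""
    xd, yd = X - X.mean(), Y - Y.mean()
    return np.sum(xd * yd) / np.sqrt(np.sum(xd**2) * np.sum(yd**2))
```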

Fig. 5 Fusion results of image set 1: a source image SPECT-T1, b source image SPECT-TC, c Laplacian Pyramid, d DWT-DBSS, e SIDWT-Haar, f Morphological Difference Pyramid, g DCT-Variance, h GA-DCT-Average

Fig. 6 Fusion results of image set 2: a source image SPECT-T1, b source image SPECT-TC, c Laplacian Pyramid, d DWT-DBSS, e SIDWT-Haar, f Morphological Difference Pyramid, g DCT-Variance, h GA-DCT-Average

Fig. 7 Fusion results of image set 3: a source image SPECT-T1, b source image SPECT-TC, c Laplacian Pyramid, d DWT-DBSS, e SIDWT-Haar, f Morphological Difference Pyramid, g DCT-Variance, h GA-DCT-Average

Fig. 8 Fusion results of image set 4: a source image SPECT-T1, b source image SPECT-TC, c Laplacian Pyramid, d DWT-DBSS, e SIDWT-Haar, f Morphological Difference Pyramid, g DCT-Variance, h GA-DCT-Average

4.2 Subjective Fusion Image Quality Evaluations

The visual results of the fusion experiments on image sets 1-4 are shown in Figs. 5, 6, 7 and 8 respectively. Each figure shows the brain medical source images and the fused images obtained using the Laplacian Pyramid, DWT-DBSS, SIDWT-Haar, Morphological Difference Pyramid, DCT-Variance and GA-DCT-Average algorithms.

Subjectively, Figs. 5, 6, 7 and 8c-f are the fused images obtained by the Laplacian Pyramid, DWT-DBSS, SIDWT-Haar and Morphological Difference Pyramid algorithms. It can be clearly seen that the boundary regions of these fused images are relatively complete, but the central region is dark overall, and the sharpness and contrast are very low. This indicates that these four algorithms do not fuse the two source images well, distorting the fused image and leaving its information incomplete. From Figs. 5, 6, 7 and 8g, we can see that the resolution and contrast of the fused image obtained by the DCT-Variance algorithm are improved. However, comparing the white frames in Figs. 5 and 6, it is obvious that image (g) contains a large red area that obscures the original information and may mislead medical workers. The four sets of images obtained by the DCT-Variance algorithm also lose key areas of source image (a), as shown by the red frame in each image (g). This means that the DCT-Variance algorithm cannot accurately fuse the information of the source images, which is likely to cause confusion in subjective judgment and makes it harder for doctors to obtain accurate information. Figures 5, 6, 7 and 8h are the fused images obtained by the GA-DCT-Average algorithm. The fusion results obtained by GA-DCT-Average are generally clearer than the others, and the fused images contain essentially all the key information of the source images.

Table 1 Qualitative results of image-set 1
Table 2 Qualitative results of image-set 2
Table 3 Qualitative results of image-set 3
Table 4 Qualitative results of image-set 4

4.3 Objective Fusion Image Quality Evaluations

Tables 1, 2, 3 and 4 show the objective quality evaluation of the results obtained by fusing the four groups of images with the different fusion algorithms. The bold values in each table mark the algorithm with the best score for each index among the six algorithms.

Table 5 Time consumption of different fusion algorithms

Objectively, the four groups of fused images obtained by the GA-DCT-Average algorithm have a clear advantage in the PSNR and RMSE indicators, which shows that, by these two measures, its fusion results are closer to the source images and its fusion effect is better than that of the other algorithms. The correlation coefficients of the images obtained by GA-DCT-Average are significantly higher than those of the other algorithms in Tables 1, 2 and 3, while in Table 4 the result is only slightly lower than that of the Laplacian Pyramid algorithm. This indicates that the images obtained by GA-DCT-Average have the highest correlation with, and are the most similar to, the source images. From Tables 1, 2, 3 and 4 we can also see that the GA-DCT-Average results are close to the best on the SSIM indicator, with only a slight gap to the DCT-Variance algorithm. Entropy indicates the amount and richness of information carried by an image. Tables 1 and 2 show that the fused images obtained by GA-DCT-Average have the highest entropy, indicating that these images carry the most information and have better quality; Tables 3 and 4 show that on the entropy indicator the GA-DCT-Average results are only slightly lower than those of the DCT-Variance algorithm. In general, the proposed algorithm also holds an advantage on the objective evaluation indicators.

4.4 Time Consumption with Different Fusion Algorithms

Running time is an important criterion for evaluating the performance of an algorithm. Table 5 compares the time consumed by the six algorithms on the four sets of medical images. Since the running times of the six algorithms are very short, each individual measurement carries some error; to ensure the accuracy of the data, the average over ten runs is taken as the time consumed by each algorithm. It can be seen from Table 5 that the algorithm proposed in this article takes longer than the others because of the complexity of some geometric algebra computations. In general, all six algorithms require relatively little time and are reasonably efficient.

Fig. 9 Fusion performance of the first group medical image under various compression ratios

Fig. 10 Fusion performance of the second group medical image under various compression ratios

Fig. 11 Fusion performance of the third group medical image under various compression ratios

Fig. 12 Fusion performance of the fourth group medical image under various compression ratios

4.5 Fusion Performance with Different Compression Ratios

Figures 9, 10, 11 and 12 show the PSNR values of the four groups of color medical images fused by the six fusion algorithms under different compression ratios [4]. The compression ratio is defined as the ratio between the compressed image and the source image. It can be seen from Figs. 9, 10, 11 and 12 that the PSNR value of GA-DCT-Average is significantly higher than that of the other algorithms at every ratio, and it keeps increasing as the compression ratio grows. This means that the proposed algorithm retains a clear advantage under different compression ratios.

Overall, the algorithm proposed in this paper holds a clear advantage in multi-modal medical image fusion, both subjectively and objectively. Its fusion effect is better than that of several common fusion algorithms, which can provide great help to medical staff in diagnosis.

5 Conclusion

This paper proposes a multi-modal medical image fusion algorithm based on the GA-DCT and conducts fusion experiments on four groups of brain medical color images. Considering the connection between the color image channels, we use a multi-vector to represent each source image as a whole. Firstly, the source images are divided into several blocks, each expressed as a multi-vector in GA form; then GA-DCT is applied to each block, and the DC and AC coefficients of the corresponding blocks of the two source images are averaged to form the coefficients of the fused image; finally, IGA-DCT is performed to obtain the result. Experimental results show that the proposed algorithm overcomes the problem of image blur and considerably improves sharpness and contrast. Under different compression ratios, the PSNR of the fused image obtained by the proposed algorithm is better than that of the other algorithms, so it can serve as an effective method for multi-modal medical image fusion.

According to these results, the performance of the proposed algorithm is improved compared with traditional algorithms, but it does not lead on every objective evaluation indicator. The algorithm therefore needs continued improvement, and the application of geometric algebra based sparse representation and neural networks to image fusion will be studied in subsequent research.