1 Introduction

As an important branch of image fusion, multi-focus image fusion has been widely used in machine vision, object recognition and artificial intelligence. Due to the limited focusing ability of imaging devices, many natural scenes cannot be focused homogeneously: only the focused region is sharp and rich in detail, while the remaining regions are difficult for human visual perception to observe and interpret (Liu et al. 2015; Wang et al. 2010). To obtain a single clear image that contains more targets and more detail of the true scene, researchers have proposed many different fusion techniques (Yang 2011; Jin et al. 2016; Zheng et al. 2007; Jin et al. 2017; Tian and Chen 2012).

The key point of image fusion is to extract useful information from images of the same scene collected through multiple source channels, so that the fused image has higher accuracy, quality and reliability than any single image (Zhang and Guo 2009). In recent years, multi-focus image fusion methods based on multi-scale transforms (MST) have become popular, such as the Laplacian pyramid (LP), the wavelet transform (WT), the discrete wavelet transform (DWT) and the nonsubsampled contourlet transform (NSCT) (Amolins et al. 2007; Zhong et al. 2014; Kavitha and Thyagharajan 2016; He et al. 2017; Li et al. 2017). These MST-based methods fuse pixels directly according to different fusion strategies, and the visual quality of the fused image is better than that of other traditional methods. However, such pixel-based methods, which operate only on the pixels of the source images, are easily disturbed by noise and misregistration; moreover, most of them cannot completely transfer the clear pixels and the detail information of the focused region into the fused image (Aslantas and Toprak 2014; Li et al. 2006). Another popular class of multi-focus fusion methods is block-based: the focused regions are extracted from the source images and then processed and combined to produce a new image. The advantage of this approach is that the sharpness and accuracy of the corresponding regions can be preserved in the fused image to the greatest extent. Its key problems, however, are how to judge the sharpness of each block and whether it belongs to the focused region; if a block containing both focused and non-focus regions is fused into the final image directly, the detail information of the non-focus region will be lost (Li and Yang 2008; Li et al. 2004). To address these problems, we propose a new fusion scheme named multi-focus image fusion combining focus-region-level partition and pulse-coupled neural network (PCNN). The proposed scheme combines the advantages of MST-based and block-based methods.

In the first step, we distinguish the regions of the source images according to their sharpness. To this end, we construct evaluation functions to measure these regions. Between two fully registered images of the same scene, the clearer image contains more edges and detail information, and the edges and detail information can be measured by an edge intensity function and the gray-level difference; by analyzing the difference of these two indexes between the images, we can determine which one is focused and which one is not. In addition, the eigenvalues and eigenvectors obtained by principal component analysis (PCA) directly reflect the basic characteristics of the image data matrix, so we use the largest eigenvalue of the image as a third index of sharpness. To verify the effectiveness of the three indexes, we take a completely focused image and apply Gaussian blur to obtain a new image, which can be regarded as a non-focus image. The three indexes of the two images are then calculated, and the results confirm the feasibility of the indexes. The same procedure is valid for measuring the focused regions and the corresponding non-focus regions of two images; the method is described in Sect. 2.

The back-propagation (BP) neural network has been widely used in the artificial neural network area because of its excellent characteristics (Wang and Jeong 2017). We therefore train a BP neural network on the three indexes and use the trained model to measure the sharpness of the regions of the source images. After partitioning the source images, we obtain the focused regions and the non-focus regions and select different fusion strategies for them. Since a focused region contains most of the information and the clear pixels compared with the corresponding non-focus region, it is copied directly into the final fused image. The non-focus regions contain both clear and fuzzy pixels and cannot be fused into the final image directly, so we process them with an MST-based method.

Among the MST-based methods, NSCT is used to process the non-focus regions in our scheme. Since NSCT was proposed by da Cunha et al. (2006), it has been widely used in image fusion because of its many advantages: it is a multi-scale and multi-direction decomposition tool with good properties such as time–frequency localization, shift invariance and anisotropy, and it effectively avoids the frequency aliasing phenomenon (Li et al. 2013; Zhao et al. 2015). The non-focus regions of the two source images are each decomposed by NSCT into one lowpass subimage and a series of bandpass subimages. Owing to the characteristics of the low-frequency content, a Gaussian-function-based weighting (referred to below as the Gaussian blurred rule) is applied to the lowpass subimage in the proposed algorithm, while the bandpass subimages are fused by a PCNN-based method. PCNN is a neural network model that, thanks to its excellent characteristics, has been widely applied in image processing areas such as image segmentation, image enhancement, edge detection and image fusion (Johnson and Padgett 1999; Qu et al. 2008; Jin et al. 2016a, b), and PCNN-based fusion methods have proved effective (Xiang et al. 2015). We use the spatial frequency (SF) of the high-frequency subimages to set the linking strength of the PCNN model, which makes it better able to deal with overexposed or weakly exposed images.

The remaining sections are organized as follows: the clarity evaluation functions and the focus-region-level partition are presented in Sect. 2. Section 3 briefly reviews the nonsubsampled contourlet transform and the pulse-coupled neural network model and introduces the proposed fusion algorithm in detail. Experimental results and analysis are given in Sect. 4. Finally, the conclusion is given in Sect. 5.

2 Focus-region-level partition

The first step of the proposed algorithm is to divide the source multi-focus image into focused regions and non-focus regions according to the focusing level. In order to evaluate the sharpness of the image, we select three evaluation functions, which are described in detail as follows.

Fig. 1 Comparison between the non-focus image and the simulated non-focus images

2.1 Principal component analysis

The principal component analysis (PCA) is a commonly used method of data analysis which is mainly used to extract the main characteristic components of data (Kaya et al. 2017). By finding a projection, PCA maps high-dimensional data onto a low-dimensional subspace; its purpose is to find the orthogonal directions of strong variability in the source data. Assume that a data set X consists of n independent vectors {\(x_{i}\)}, \(i=1,\ldots,n\); the dimension-reduced data y can be obtained by

$$\begin{aligned} y_i =A^\mathrm{T}(x_i -\mu ), \end{aligned}$$
(1)

where \(\mu =\frac{1}{n}\sum _{i=1}^n {x_i }\) is the sample mean of {\(x_{i}\)} and A is the orthogonal transformation matrix, which consists of the orthonormal eigenvectors of the sample covariance matrix; the covariance matrix can be defined as

$$\begin{aligned} S=\frac{1}{n}\sum _{i=1}^n {(x_i -\mu )(x_i -\mu )^\mathrm{T}}. \end{aligned}$$
(2)

The eigenvalues of S and their corresponding eigenvectors can then be calculated and the eigenvalues arranged in descending order. We use the largest eigenvalue to compare images of the same scene with different sharpness; the following experiment shows the feasibility and effectiveness of this index.
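For illustration, the largest eigenvalue of an image block can be computed with a minimal MATLAB sketch such as the following (this is not the exact code of our experiments, and the file name is only a placeholder):

% Largest eigenvalue of the sample covariance matrix of an image block;
% each row of the block is treated as one observation of the column variables.
I  = double(imread('block.png'));           % gray-level image block (placeholder file name)
mu = mean(I, 1);                            % sample mean of the columns
Xc = I - repmat(mu, size(I, 1), 1);         % centered data
S  = (Xc' * Xc) / size(I, 1);               % sample covariance matrix, Eq. (2)
lambda = max(eig(S));                       % largest eigenvalue used as the sharpness index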

Figure 2 shows the images: the first is the source image, which is fully focused, and the second to fourth are obtained by Gaussian blurring with different \(\sigma \) and can be regarded as non-focus images. To show that a Gaussian blurred image can be used to simulate a non-focus image, a comparison is given in Fig. 1; the difference between a real non-focus image and the Gaussian blurred image is small, so it is feasible to use Gaussian blurred images to simulate non-focus images (Kumar et al. 2018). It can also be seen that the image becomes increasingly blurred as \(\sigma \) increases. The largest eigenvalues of these images are given in Table 1: the more blurred the image is, the larger the corresponding largest eigenvalue. The source image in Fig. 2 can be regarded as the focused region of a multi-focus image and the blurred images as the non-focus region; we can therefore calculate the largest eigenvalues of the corresponding regions of two multi-focus images to determine which regions are focused.

Table 1 The largest eigenvalues of the images
Fig. 2 Images with different degrees of Gaussian blur

Fig. 3 Focused images and corresponding non-focus images

2.2 Edge intensity function

The Sobel operator is a discrete differential operator mainly used for edge detection (Al-Nima et al. 2017; Gupta et al. 2016), and it can be computed by

$$\begin{aligned} G =\sqrt{G_x ^{2}+G_y ^{2}}, \end{aligned}$$
(3)
$$\begin{aligned} G_x =\left[ {\begin{array}{ccc} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \\ \end{array}} \right] {*}{{\varvec{I}}},\quad G_y =\left[ {\begin{array}{ccc} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \\ \end{array}} \right] {*}{{\varvec{I}}}, \end{aligned}$$
(4)

where I is the source image. Equation (4) can be written as

$$\begin{aligned} G_x =[f(x-1,y+1)+2f(x,y+1)+f(x+1,y+1)]-[f(x-1,y-1)+2f(x,y-1)+f(x+1,y-1)], \end{aligned}$$
(5)
$$\begin{aligned} G_y =[f(x-1,y-1)+2f(x-1,y)+f(x-1,y+1)]-[f(x+1,y-1)+2f(x+1,y)+f(x+1,y+1)], \end{aligned}$$
(6)

where f(x,y) is the gray value at pixel (x,y) of the source image I. For each pixel (x,y) we obtain the corresponding value of G. Assume that the source image is I=(\(a_{ij})_{m\times n}\), as shown in Fig. 3; the edge intensity image G=(\(G_{ij})_{m\times n}\) can then be obtained by (3), as shown in Fig. 4. We define \(g=\frac{1}{m\times n}\sum _{i=1}^m {\sum _{j=1}^n {G_{ij}}}\) and use g to represent the edge intensity of the image. The edge intensity values g are given in Table 2.

The Sobel edge operator provides rich edge information and accurate edge direction information at the pixels where the gradient is highest. In general, a focused image contains more edge detail than the corresponding non-focus image, so it is feasible to use edge intensity to measure sharpness. The focused region images and the corresponding non-focus region images of six multi-focus images are given in Fig. 3, and their edge intensity images are shown in Fig. 4.
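A possible MATLAB sketch of the edge intensity index g (a simple illustration rather than the exact implementation; the file name is a placeholder) is:

% Edge intensity index g based on the Sobel operator, Eqs. (3)-(6).
I  = double(imread('region.png'));          % gray-level image region (placeholder file name)
Kx = [-1 0 1; -2 0 2; -1 0 1];              % kernel of G_x
Ky = [ 1 2 1;  0 0 0; -1 -2 -1];            % kernel of G_y
Gx = imfilter(I, Kx, 'replicate');          % horizontal gradient component
Gy = imfilter(I, Ky, 'replicate');          % vertical gradient component
G  = sqrt(Gx.^2 + Gy.^2);                   % edge intensity image, Eq. (3)
g  = mean(G(:));                            % mean edge intensity of the region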

Fig. 4 Edge intensity images of Fig. 3

It can be seen from Fig. 3 that the focused image contains more detail and clearer edge information. Table 2 shows the edge intensity values of Fig. 3, in which a larger edge intensity value means that the image contains more edge details.

Table 2 Edge intensity values of Fig. 4
Fig. 5 The gray-level difference images of Fig. 3

Table 3 Absolute values of gray-level difference of Fig. 3

2.3 Gray-level difference

A gray image is usually obtained by measuring the brightness of each pixel in the visible spectrum; therefore, the gray values of an image reflect its local features, and the gray-level difference is commonly used in image enhancement and image segmentation (Ji et al. 2016). In this paper, an improved image gray-level difference is adopted as the third evaluation index. Suppose that an image is of size \(m\times n\); the mean absolute gray-level difference d is defined as follows:

$$\begin{aligned} I_x =I(x+1,y)-I(x,y), \end{aligned}$$
(7)
$$\begin{aligned} I_y =I(x,y+1)-I(x,y), \end{aligned}$$
(8)
$$\begin{aligned} d =\frac{\sum \limits _{x=1}^m {\left| {I_x } \right| } +\sum \limits _{y=1}^n {\left| {I_y } \right| } }{m\times n}. \end{aligned}$$
(9)

The gray-level difference images of Fig. 3 are shown in Fig. 5, and their d values are given in Table 3.
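The index d can be obtained with a short MATLAB sketch like the one below (a simple illustration; the absolute differences are summed over all pixels and normalized by \(m\times n\)):

% Mean absolute gray-level difference d, Eqs. (7)-(9).
I  = double(imread('region.png'));          % gray-level image region (placeholder file name)
Ix = diff(I, 1, 1);                         % I(x+1,y) - I(x,y), differences along the rows
Iy = diff(I, 1, 2);                         % I(x,y+1) - I(x,y), differences along the columns
[m, n] = size(I);
d  = (sum(abs(Ix(:))) + sum(abs(Iy(:)))) / (m * n);   % normalized absolute difference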

Fig. 6 Three-parameter diagram of two multi-focus images. a \(\lambda \) of the source image. b g of the source image. c d of the source image. d Source image and focus-region-level partition

2.4 Focus-region-level partition by BP neural network

The key idea of the BP neural network is to adjust the network weights on training data so as to minimize the error and make the output approximate the desired output. It has been widely used in function approximation, pattern recognition, classification and other fields. The BP neural network can be simply described by (10)–(15):

$$\begin{aligned} X_i =\sum {\omega _{ij} u_j }, \end{aligned}$$
(10)
$$\begin{aligned} Y_i =f(X_i), \end{aligned}$$
(11)

where \(u_j\) is the j-th input, \(\omega _{ij}\) is the corresponding weight, \(X_i\) is the weighted input of the i-th hidden neuron, \(Y_i\) is its output, and the transfer function is:

$$\begin{aligned} f(x)=\frac{1}{1+e^{-x}}. \end{aligned}$$
(12)

And the output O of the output layer can be defined as follows:

$$\begin{aligned} S_k =\sum {\omega _{jk} Y_j }, \end{aligned}$$
(13)
$$\begin{aligned} O_k =f(S_k ), \end{aligned}$$
(14)

where \(Y_j\) is the input of the output layer; assuming the desired output is P, the output error is:

$$\begin{aligned} E=\frac{\sum _k {\left( {O_k -P_k } \right) ^{2}} }{2}. \end{aligned}$$
(15)

From Sects. 2.1 to 2.3 we obtain the three parameters (\(\lambda \), g, d) that measure the sharpness of an image. In order to train the BP neural network, the differences of the three parameters between the two images are used as training features. An experimental result for a pair of multi-focus images is given in Fig. 6; the size of the source image is \(640\times 480\). We divide it into \(40\times 40\) blocks and obtain 192 subimages, and then calculate the differences of the normalized \(\lambda \), g and d between the two source images, which are defined as:

$$\begin{aligned} \lambda =\lambda _1 /\max (\lambda _1 )-\lambda _2 /\max (\lambda _2 ), \end{aligned}$$
(16)
$$\begin{aligned} g =g_1 /\max (g_1 )-g_2 /\max (g_2 ), \end{aligned}$$
(17)
$$\begin{aligned} d =d_1 /\max (d_1 )-d_2 /\max (d_2 ). \end{aligned}$$
(18)
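In MATLAB, these differences can be computed for all blocks at once; a minimal sketch (assuming lambda1, g1, d1 and lambda2, g2, d2 are vectors of the per-block indexes of the two source images computed as in Sects. 2.1–2.3) is:

% Normalized feature differences between corresponding blocks of images A and B,
% Eqs. (16)-(18); the per-block indexes are assumed to be available as vectors.
dLambda = lambda1 ./ max(lambda1) - lambda2 ./ max(lambda2);
dG      = g1 ./ max(g1) - g2 ./ max(g2);
dD      = d1 ./ max(d1) - d2 ./ max(d2);
P       = [dLambda(:)'; dG(:)'; dD(:)'];    % 3 x N feature matrix fed to the BP network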

The three parameters of the two multi-focus images are shown in Fig. 6, where the x-axis is the sequence number of the blocks and the y-axis is the difference of the parameter between the two corresponding multi-focus images. The values of \(\lambda \), g and d can be used to decide which region is the focused one. To show the differences of these parameters in detail, the values of g from Fig. 6c are plotted in Fig. 7.

Fig. 7 Edge intensity g of two multi-focus images

Fig. 8 BP neural network model

Fig. 9 Experimental results of focus region partition. a Multi-focus images. b Location of the focused region. c Focused region of the multi-focus image. d Focus-region-level partition

Figure 6 shows the relationship between the three indexes and the clarity of the corresponding subimages. We use the three indexes as the training data of the BP network, whose model is shown in Fig. 8.

The BP training parameters are set as follows: the expected error is \(10^{-6}\) and the maximum number of training iterations is 5000. We selected ten fully focused images, applied Gaussian blur to them, divided each image into \(40\times 40\) blocks to obtain 192 subimages per image, and thus obtained 1920 pairs of focused and blurred subimages to train the BP neural network. The network is created as net=newff(minmax(P),[6,6,1],{‘tansig’,‘tansig’,‘purelin’},‘trainlm’); it consists of one input layer with three neurons, two hidden layers with six neurons each, and one output layer that decides whether the input block is focused or not. The transfer function of the hidden layers is tansig, described as:

$$\begin{aligned} f(x)=\frac{2}{1+e^{-2x}}-1. \end{aligned}$$
(19)

Then the focused region of a multi-focus image can be separated by the trained BP neural network; the experimental results of the focus region partition are given in Fig. 9.
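For completeness, a rough MATLAB sketch of training and applying the network (assuming the feature matrix P and the target labels T of the 1920 training pairs are prepared as above; the 0.5 decision threshold is our own illustrative choice) is:

% Training the BP network (Neural Network Toolbox) and partitioning a new image pair.
% P: 3 x N matrix of feature differences; T: 1 x N labels (1 = block of image A focused).
net = newff(minmax(P), [6, 6, 1], {'tansig', 'tansig', 'purelin'}, 'trainlm');
net.trainParam.goal   = 1e-6;               % expected error
net.trainParam.epochs = 5000;               % maximum number of training iterations
net = train(net, P, T);                     % supervised training

y = sim(net, Pnew);                         % Pnew: feature differences of a new image pair
focusedA = y > 0.5;                         % blocks judged focused in image A (illustrative threshold)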

3 Proposed fusion scheme

The proposed fusion scheme, shown in Fig. 10, is discussed in detail in this section. As discussed above, the focused regions separated from the source multi-focus images should be transferred into the final image as completely as possible. We next discuss the fusion rules for the non-focus regions.

Fig. 10 Proposed fusion method flow chart

Fig. 11 NSCT framework. a The decomposition framework of NSCT. b Ideal frequency partitioning

The non-focus regions are fused by the Gaussian blurred rule and a PCNN-based method in the NSCT domain. NSCT and PCNN are briefly introduced as follows.

3.1 Nonsubsampled contourlet transform

The nonsubsampled contourlet transform (NSCT) is an effective tool for image decomposition derived from the contourlet transform (CT); its framework is shown in Fig. 11.

As shown in Fig. 11, NSCT consists of two parts, the nonsubsampled pyramid filter banks (NSPFB) and the nonsubsampled directional filter banks, which endow NSCT with multi-scale and multi-direction decomposition ability and allow it to avoid the frequency aliasing phenomenon. An image decomposed by NSCT yields one lowpass subimage and a series of bandpass subimages, all of the same size as the source image. Figure 12 gives an example of NSCT decomposition.

Fig. 12 NSCT decomposition example. a Source image. b Lowpass image. c, d Images of the detailed coefficients at level 1. e–h Images of the detailed coefficients at level 2

3.2 Pulse-coupled neural network (PCNN)

The basic neuron of PCNN can be divided into three parts: the receptive field, the modulation field and the pulse generator, which are shown in Fig. 13.

Fig. 13 The typical structure of PCNN

The receptive field, modulation field and pulse generator can be described in detail by (20)–(25):

$$\begin{aligned} F_{ij} (n) =e^{-\alpha _F }F_{ij} (n-1)+V_F \sum _{k,l} {W_{ijkl} Y_{kl} (n-1)} +S_{ij} , \end{aligned}$$
(20)
$$\begin{aligned} L_{ij} (n) =e^{-\alpha _L }L_{ij} (n-1)+V_L \sum _{k,l} {M_{ijkl} Y_{kl} (n-1)} , \end{aligned}$$
(21)
$$\begin{aligned} U_{ij} (n) =F_{ij} (n)[1+\beta L_{ij} (n)], \end{aligned}$$
(22)
$$\begin{aligned} \theta _{ij} (n) =e^{-\alpha _\theta }\theta _{ij} (n-1)+V_\theta Y_{ij} (n-1), \end{aligned}$$
(23)
$$\begin{aligned} Y_{ij} (n) =\left\{ {\begin{array}{ll} 1, & U_{ij} (n)>\theta _{ij} (n) \\ 0, & \mathrm{otherwise} \\ \end{array}} \right. \end{aligned}$$
(24)
$$\begin{aligned} T_{ij} (n) =\left\{ {\begin{array}{ll} n, & \hbox {if } Y_{ij} (n)=1 \hbox { for the first time} \\ T_{ij} (n-1), & \mathrm{otherwise} \\ \end{array}} \right. \end{aligned}$$
(25)

In (20) and (21), \(S_{ij}\) is the input stimulus at pixel (i, j) of the source image, \(F_{ij}\) is the feeding input of the neuron, and M and W are the constant synaptic weight matrices. \(\beta \) is the linking strength of the neuron, \(\alpha _F\), \(\alpha _L\) and \(\alpha _\theta \) are time decay constants, and \(Y_{ij}\) is the output of the neuron at (i, j). The linking strength \(\beta \) is a key parameter of the PCNN model because it varies the weighting of the linking channel; in the proposed fusion method, the spatial frequency (SF) of the bandpass subimages is used to set \(\beta \).
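The iteration of (20)–(25) can be summarized by the following MATLAB sketch. It is a simplified illustration: \(\alpha _F\), \(V_F\) and the scalar \(\beta \) are assumed values here, whereas in the proposed method \(\beta \) is set from the SF of the bandpass subimage, and the remaining parameters follow Sect. 4.1.

% Minimal PCNN iteration over a normalized stimulus image S, Eqs. (20)-(25).
S = double(imread('band.png'));  S = S / max(S(:));    % external stimulus (placeholder file name)
[rows, cols] = size(S);
W = [0.707 1 0.707; 1 0 1; 0.707 1 0.707];  M = W;     % synaptic weight matrices
alphaF = 0.1;  alphaL = 0.05;  alphaT = 0.2;           % decay constants (alphaF assumed)
VF = 0.5;  VL = 0.02;  VT = 40;  beta = 0.2;  N = 200; % VF and beta assumed for illustration
F = zeros(rows, cols);  L = F;  Y = F;  T = F;  Theta = ones(rows, cols);
for n = 1:N
    F = exp(-alphaF) * F + VF * conv2(Y, W, 'same') + S;  % feeding input, Eq. (20)
    L = exp(-alphaL) * L + VL * conv2(Y, M, 'same');      % linking input, Eq. (21)
    U = F .* (1 + beta * L);                              % internal activity, Eq. (22)
    Y = double(U > Theta);                                % pulse output, Eq. (24)
    Theta = exp(-alphaT) * Theta + VT * Y;                % dynamic threshold, Eq. (23)
    T(Y == 1 & T == 0) = n;                               % time matrix: first firing time, Eq. (25)
end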

3.3 Non-focus region fusion rules

For the non-focus regions of the source multi-focus images, one lowpass subimage and a series of bandpass subimages are first obtained by NSCT decomposition; they are then fused with different rules.

3.3.1 Lowpass subband fusion rules

The common fusion rules in the low-frequency domain are weighted-average methods, which usually cannot retain the low-frequency information in the final result perfectly and lose part of the details of the source image. To solve this problem, we use a Gaussian blurred method (Jamal and Karim 2012) to fuse the lowpass subimages in the proposed method, which can be described as follows:

$$\begin{aligned} C_F^L (i,j) =w_A (i,j)\times C_A^L (i,j)+w_B (i,j)\times C_B^L (i,j), \end{aligned}$$
(26)
$$\begin{aligned} w_A (i,j) =\exp \left[ {-\frac{(C_B^L (i,j)-\mu )^{2}}{2(\tau \sigma )^{2}}} \right] , \end{aligned}$$
(27)
$$\begin{aligned} w_B (i,j) =1-w_A (i,j). \end{aligned}$$
(28)

In (26), \(C_F^L (i,j)\) is the low-frequency coefficient of the final result, and \(C_A^L (i,j)\) and \(C_B^L (i,j)\) are the low-frequency coefficients of the non-focus regions of source images A and B. Equations (27) and (28) define the fusion weights, where \(\mu \) and \(\sigma \) are the mean and standard deviation of the non-focus region of source image B and \(\tau \) is the adjustment factor of the Gaussian blurred function; the Gaussian blurred curves are shown in Fig. 14. In the proposed method we set \(\tau =1\).
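A minimal MATLAB sketch of this rule (assuming CA and CB hold the lowpass NSCT coefficients of the non-focus regions of A and B, and interpreting \(\sigma \) as the standard deviation) is:

% Lowpass fusion by Gaussian weighting, Eqs. (26)-(28).
tau   = 1;                                            % adjustment factor used in the proposed method
mu    = mean(CB(:));                                  % mean of the non-focus region of image B
sigma = std(CB(:));                                   % standard deviation of the same region
wA = exp(-(CB - mu).^2 ./ (2 * (tau * sigma)^2));     % weight of image A, Eq. (27)
wB = 1 - wA;                                          % weight of image B, Eq. (28)
CF = wA .* CA + wB .* CB;                             % fused lowpass coefficients, Eq. (26)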

Fig. 14 Gaussian blurred curve with different \(\tau \)

3.3.2 Bandpass subband fusion rules

As can be seen from Fig. 12, the high-frequency detail, texture and edge information is contained in the bandpass subimages. In the proposed method, PCNN is used to process these bandpass subimages. Since the spatial frequency (SF) reflects the overall clarity of an image, we use the SF of the input subimage to determine the linking strength \(\beta \). SF is defined as follows:

$$\begin{aligned} \mathrm{SF} =\sqrt{\mathrm{RF}^{2}+\mathrm{CF}^{2}}, \end{aligned}$$
(29)
$$\begin{aligned} \mathrm{RF} =\sqrt{\frac{1}{M\times N}\sum _{i=1}^M {\sum _{j=2}^N {[F(i,j)-F(i,j-1)]^{2}} } }, \end{aligned}$$
(30)
$$\begin{aligned} \mathrm{CF} =\sqrt{\frac{1}{M\times N}\sum _{i=2}^M {\sum _{j=1}^N {[F(i,j)-F(i-1,j)]^{2}} } }. \end{aligned}$$
(31)

In (29), RF is the spatial row frequency, CF is the spatial column frequency, and F is the subimage of size \(M\times N\). The fused bandpass coefficients \(C_{F,ij}\) are then determined as follows:

$$\begin{aligned} C_{F,ij} =\left\{ {\begin{array}{ll} C_{\mathrm{A},ij} , & T_{\mathrm{A},ij} (n)\ge T_{\mathrm{B},ij} (n) \\ C_{\mathrm{B},ij} , & T_{\mathrm{A},ij} (n)<T_{\mathrm{B},ij} (n) \\ \end{array}} \right. \end{aligned}$$
(32)

where \(C_{\mathrm{A},ij}\) and \(C_{\mathrm{B},ij}\) are the bandpass coefficients of the non-focus regions of source images A and B, and \(T_{\mathrm{A},ij} (n)\) and \(T_{\mathrm{B},ij} (n)\) denote the time matrices of the corresponding neurons obtained by (25).
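A short MATLAB sketch of the SF computation and the coefficient selection (assuming F is a bandpass subimage, CA and CB are the bandpass coefficients, and TA and TB are the time matrices produced by the PCNN sketch above) is:

% Spatial frequency of a bandpass subimage, Eqs. (29)-(31), used to set beta.
[M, N] = size(F);
RF    = sqrt(sum(sum((F(:, 2:end) - F(:, 1:end-1)).^2)) / (M * N));   % row frequency
CFreq = sqrt(sum(sum((F(2:end, :) - F(1:end-1, :)).^2)) / (M * N));   % column frequency
SF    = sqrt(RF^2 + CFreq^2);
beta  = SF;                                  % linking strength of the PCNN

% Selection of the fused bandpass coefficients by the firing time matrices, Eq. (32).
mask   = (TA >= TB);
Cfused = CA .* mask + CB .* (~mask);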

3.4 Fusion steps

The fusion framework presented in this paper is shown in Fig. 10, which can be described concretely as follows:

Input: Source multi-focus images A and B.

Step 1 :

Perform the focus-region-level partition by the trained BP neural network to obtain the focused regions and the non-focus regions of the source images.

Step 2 :

Perform NSCT on the non-focus regions to obtain one lowpass subband image and a series of bandpass subband images for each source image.

Step 3 :

For the lowpass subband images, the Gaussian blurred fusion rule described by (26)–(28) is used to produce the fused lowpass coefficients.

Step 4 :

For the bandpass subband images, SF-PCNN is used to produce the fused bandpass coefficients, as described by (29)–(32).

Step 5 :

Fused non-focus regions are produced by NSCT reconstruction.

Step 6 :

Fuse the focused regions and the fused non-focus regions to produce the final fusion image.

Fig. 15 First experimental results using different methods. a, b Multi-focus images. c PCA. d DWT. e PCNN. f NSCT. g LP-PCNN. h NSCT-PCNN. i Proposed method

Fig. 16 Details of enlarged scale. a PCA. b DWT. c PCNN. d NSCT. e LP-PCNN. f NSCT-PCNN. g Proposed method

Fig. 17 Bar chart comparison of MI, VIF and \(Q^{AB/F}\) values for the first example

Fig. 18 Second experimental results using different methods. a, b Multi-focus images. c PCA. d DWT. e PCNN. f NSCT. g LP-PCNN. h NSCT-PCNN. i Proposed method

Fig. 19 Details of enlarged scale. a Source image A. b Source image B. c PCA. d DWT. e PCNN. f NSCT. g LP-PCNN. h NSCT-PCNN. i Proposed method

Fig. 20 Third experimental results using different methods. a, b Multi-focus images. c PCA. d DWT. e PCNN. f NSCT. g LP-PCNN. h NSCT-PCNN. i Proposed method

Fig. 21 Details of enlarged scale. a Source image A. b Source image B. c PCA. d DWT. e PCNN. f NSCT. g LP-PCNN. h NSCT-PCNN. i Proposed method

Table 4 Objective evaluation indexes for various fusion results (the best result for each evaluation is highlighted in bold)

4 Experimental results and analysis

In this section, three groups of experiments are used to illustrate the feasibility and practicability of the proposed fusion algorithm; all simulations are conducted in MATLAB 2014a. We first introduce the experimental parameter settings and then discuss the fusion results in comparison with other methods.

4.1 Experiments introduction

In all experiments, the parameters of PCNN are set as \(\alpha _\theta =0.2\), \(\alpha _L =0.05\), \(V_L =0.02\), \(V_\theta =40\), \(N=200\) and \(M=W=[0.707\ 1\ 0.707;\ 1\ 0\ 1;\ 0.707\ 1\ 0.707]\); for the NSCT-based methods, the “9-7” pyramid filter and the “pkva” directional filter are used. Moreover, six existing fusion methods are chosen as comparison algorithms: the PCA-based method, the discrete wavelet transform (DWT)-based method, the PCNN-based method, the NSCT-based method, the Laplacian pyramid (LP)-PCNN-based method and the NSCT-PCNN-based method. For the multi-scale decomposition methods, the decomposition level is set to 3, the lowpass subimages are fused by averaging, and the bandpass subimages are fused by choosing the absolute maximum.

In order to evaluate the fusion results objectively, we adopt three indicators: mutual information (MI), visual information fidelity (VIF) and the gradient-based edge information metric \(Q^{\mathrm{AB}/F}\). MI measures how much information of the source images is retained in the fused image, VIF is an evaluation index based on the human visual system (Sheikh and Bovik 2006), and \(Q^{\mathrm{AB}/F}\) measures how much edge information the fused image obtains from the source images. Generally, the larger the values of these three indexes, the better the quality of the fused image.
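As an example of the objective evaluation, MI between one source image and the fused image can be computed from the joint gray-level histogram; a minimal MATLAB sketch (assuming 8-bit images; the fusion metric is then usually reported as the sum MI(A,F) + MI(B,F)) is:

% Mutual information between a source image A and the fused image F (both uint8).
function mi = image_mi(A, F)
    jh  = accumarray([double(A(:)) + 1, double(F(:)) + 1], 1, [256 256]);  % joint histogram
    pAF = jh / sum(jh(:));                    % joint probability distribution
    pA  = sum(pAF, 2);  pF = sum(pAF, 1);     % marginal distributions
    pp  = pA * pF;                            % product of the marginals
    nz  = pAF > 0;                            % avoid log of zero
    mi  = sum(pAF(nz) .* log2(pAF(nz) ./ pp(nz)));
end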

4.2 Fusion results and discussion

The first set of results is given in Fig. 15, where Fig. 15a, b shows the multi-focus source images and Fig. 15c–i shows the fusion results of PCA, DWT, PCNN, NSCT, LP-PCNN, NSCT-PCNN and the proposed method. In Fig. 15a, b only the focused region is clear and the other regions are fuzzy, so a single source multi-focus image lacks information and is of poor quality. As shown in Fig. 15c–i, the fused images contain more information than either source image and combine the details of both, so their quality and reliability are improved. However, the quality of the fused images obtained by the different methods is not the same; to show the differences more clearly, Fig. 16 gives enlarged details of Fig. 15.

Figure 17 compares the MI, VIF and \(Q^{AB/F}\) values of the different methods for the first example; the x-axis denotes the fusion method and the y-axis the value of the index. From Figs. 16 and 17 we can see that the proposed fusion method outperforms the other traditional fusion methods: the fused image has a better visual effect, contains more details and preserves more information of the focused regions of the source images.

The second and third experiments are carried out on different multi-focus source images, and the results are given in Figs. 18, 19, 20 and 21; Figs. 19 and 21 show enlarged details of Figs. 18 and 20, respectively.

From Figs. 19i and 21i, we can see that the fused images obtained by the proposed method contain more edge information and details, which is easily observed by the human visual system. The objective evaluation indexes are given in Table 4. From these experiments and Table 4, we can conclude that the proposed fusion algorithm is feasible and outperforms the other traditional fusion algorithms.


5 Conclusion

This paper presents a novel fusion scheme for multi-focus images based on focus-region-level partition. First, three evaluation functions are put forward to measure the sharpness of the image; the source multi-focus images are then partitioned by a BP neural network to obtain the focused regions and the non-focus regions. Next, the non-focus regions are fused in the NSCT domain: the Gaussian blurred rule produces the lowpass coefficients, SF-PCNN produces the bandpass coefficients, and the fused non-focus regions are obtained by NSCT reconstruction. Finally, the fused image is produced by combining the focused regions and the fused non-focus regions. Experimental results show that the proposed fusion scheme not only retains the clear pixels of the two source images, but also preserves more details and edge information of the non-focus regions. Compared with other traditional fusion methods, the proposed method achieves superior results in both visual inspection and objective evaluation.