1 Introduction

Nowadays, research groups around the globe are working to develop systems for monitoring the health of aquatic systems. The world's ecosystems, including aquatic (marine and freshwater) environments, integrate both abiotic and biotic components. Aquatic ecosystems are mainly divided into two categories: freshwater ecosystems and marine ecosystems. Marine ecosystems cover roughly three-fourths of the Earth's surface, whereas freshwaters cover only about 0.78%. Diatoms are important organisms for monitoring the health of any aquatic ecosystem because of their biological features (see Chap. 2). Diatoms are single-celled organisms containing light-absorbing molecules that are responsible for photosynthesis. It has been estimated that about 20% of the Earth's oxygen is produced by marine microalgae (mainly diatoms), which also remove a huge amount of carbon dioxide (CO2) from the atmosphere. The silica layers outside the cell membrane are called frustules. The nano-scale patterns in the frustules yield unique features that are used for the identification and classification of diatoms.

Almost all diatoms are microscopic, with cell sizes ranging from 2 μm to 2 mm. To visualize the features of the cell structure, microscopy imaging techniques have been developed. In light microscopy (LM), visible light and magnifying lenses are used to visualize the cell structure. When diatoms are seen with a LM, the nano-scale patterns in the frustules appear transparent. Since most diatom walls have a 3D structure, the majority cannot be adequately represented in a single focal plane. Recent computational photography techniques play a significant role in overcoming this limitation of LM when imaging 3D diatom walls ornamented with intricate and striking patterns of silica. In many of these techniques, it is desirable to fuse details from images captured at different focal planes so that both contour and striation are well defined for feature extraction. These cell wall features are vital for distinguishing diatom species.

In recent years, several fusion algorithms have been developed to combine substantial information from multiple input images into a single composite image. Most image fusion techniques are based on multi-resolution decomposition [1, 2]. The principal motivations for image fusion are to extend the depth of field (DOF) [3], to improve spatial and temporal resolution [4, 5], and, in the case of multiexposure techniques, to extend the dynamic range of the fused image [6]. A fundamental difficulty in image fusion is to prevent artifacts and preserve local contrast when combining the characteristics recorded in the source data, such as exposure value, focus, modality, and environmental conditions. In particular, the choice of parameter settings during acquisition of the input stacks has a large impact on the outcome of the image fusion framework. Two examples of multifocus and multiexposure stacks, and the corresponding fusion results, are shown in Fig. 10.1. The main motivation of image fusion is thus an automated procedure that transfers all the meaningful details from the input images into a single fused image.

Fig. 10.1
figure 1

Top row: (a) example of a multifocus stack and (b) example of a multiexposure stack. Bottom row: (c) fused image obtained from image stack shown in (a) and (d) fused image obtained from image stack shown in (b)

In this chapter, we present a two-scale decomposition-based weighted average multifocus image fusion (TSD-MF) method. The aim of two-scale decomposition (TSD) is to produce a detail-enhanced image from a set of images taken at different focal lengths of a camera sensor. Since the goal of multifocus image fusion is to create a new image that is focused throughout, choosing a local measure of the information content of the source images is always challenging. When optical defocus is the major source of quality degradation, it is natural to assume that the image region with sharper edges is more active and thus more informative [7]. In existing multifocus image fusion techniques, a common implicit assumption is that finding the sharper-edge regions is equivalent to finding the focused regions.

This chapter is organized as follows. In Sect. 10.2, we explore in detail a multifocus image fusion algorithm that extends the DOF by combining in-focus details from several images of diatom species captured by LM. In order to obtain a sharp valve contour and striation pattern, images captured at different exposure settings can be fused; this multiexposure image fusion is described in Sect. 10.3. Depth maps and 3-D surface visualization of the fusion results are presented in Sect. 10.4, and the fusion quality metrics in Sect. 10.5. The efficient implementation of the proposed fusion method, including rapid computation on a graphics processing unit (GPU), is discussed in Sect. 10.6. Finally, Sect. 10.7 contains concluding remarks.

2 Multifocus Fusion Methods

2.1 Two-Scale Decomposition (TSD)

The first TSD-based image fusion method was introduced in 2013 by Li et al. [8]. Their aim was to propose a fast two-scale image fusion technique that does not rely heavily on a specific multi-resolution method. A simple average filter was used to decompose the source images into base layers (BLs) and detail layers (DLs). To perform weighted average fusion, a spatial consistency principle based on guided filtering (GF) [9] was introduced. In this study, we selected an edge-preserving filter (EPF) based two-scale fusion approach [10]. The aim of EPF-based decomposition is to better approximate the DL for detail enhancement. This fusion technique requires additional computation but performs better for both multifocus and multiexposure microscopy data sets.

Let In be the nth source image to be processed by an EPF. In order to compute the BL and DL, we first decompose the source images into two-scale representations using anisotropic diffusion [11]. The BL Bn of each source image is obtained as follows:

$$\displaystyle \begin{aligned} B_{s,n}^{t+1} &= I_{s,n}^{t} + \frac{\gamma}{|\eta_{s}|}\left[g_{N} \cdot \nabla_{N_{s}}I_{n} + g_{S} \cdot \nabla_{S}I_{n} + g_{E} \cdot \nabla_{E}I_{n} \right. \\ & \quad\left. + g_{W} \cdot \nabla_{W}I_{n}\right]_{s}^{t} \end{aligned} $$
(10.1)

where ∇N, ∇S, ∇E, and ∇W denote the differences with the North, South, East, and West neighbors of pixel position s, respectively. The corresponding diffusion coefficients [11] gN, gS, gE, and gW are computed from a local window of size (3 × 3). Gradient computation on 1-D and 2-D grid structures is illustrated in Fig. 10.2. The diffusion function g(⋅) used in our TSD approach is defined as follows:

$$\displaystyle \begin{aligned} g(\nabla I)=\mathrm{e}^{\left (-{\left (\frac {\| \nabla I \|}{K}\right )}^2 \right)} \end{aligned} $$
(10.2)
Fig. 10.2
figure 2

Gradient computation: (a) from 1-D grid structure by considering left (L) and right (R) neighbors and (b) from 2-D grid structure by considering North (N), South (S), East (E), and West (W) neighbors

In Eq. 10.1, the variable t indexes the iterations, the constant γ is a scalar that determines the rate of diffusion, ηs represents the spatial neighborhood of the current sample position s, and |ηs| is the number of neighbors. In this chapter, these parameters are determined empirically; in practice, t = 5 and γ = 1∕7 yield plausible results.

Once the BL is computed for each nth input image, the DL Dn can be calculated directly by subtracting Bn from the corresponding source image In as follows:

$$\displaystyle \begin{aligned} D_{n}=I_{n}-B_{n} \end{aligned} $$
(10.3)
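For concreteness, the following sketch illustrates the decomposition of Eqs. 10.1–10.3 under the parameter choices given in the text. The chapter's reference implementation is in MATLAB, so this NumPy version is our own illustrative approximation, not the authors' code; the wrap-around handling of image borders is a simplification.

```python
import numpy as np

def two_scale_decompose(I, t=5, gamma=1.0 / 7.0, K=30.0):
    """Two-scale decomposition of Eqs. (10.1)-(10.3): the base layer is the
    result of t iterations of Perona-Malik anisotropic diffusion and the
    detail layer is the residual. Parameter names follow the text; np.roll
    wraps at the image borders, a simplification of replicate padding."""
    B = I.astype(np.float64)
    for _ in range(t):
        # North, South, East, West neighbor differences (Eq. 10.1)
        dN = np.roll(B, 1, axis=0) - B
        dS = np.roll(B, -1, axis=0) - B
        dE = np.roll(B, -1, axis=1) - B
        dW = np.roll(B, 1, axis=1) - B
        # Diffusion coefficients g(.) of Eq. (10.2)
        gN, gS = np.exp(-(dN / K) ** 2), np.exp(-(dS / K) ** 2)
        gE, gW = np.exp(-(dE / K) ** 2), np.exp(-(dW / K) ** 2)
        # |eta_s| = 4 neighbors in the 2-D four-neighborhood of Eq. (10.1)
        B = B + (gamma / 4.0) * (gN * dN + gS * dS + gE * dE + gW * dW)
    D = I.astype(np.float64) - B  # detail layer, Eq. (10.3)
    return B, D
```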

To see the behavior of the Perona et al. [11] filter at edges, we first decompose a 1-D signal into a BL and a DL. As can be seen in Fig. 10.3, the high-frequency textures disappear in the BL (i.e., the coarser level after diffusion). The weak texture details filtered out of the BL are exactly reconstructed in the DL.

Fig. 10.3
figure 3

The TSD of 1-D signal based on EPF after 5 iterations with K = 30, γ = 1∕3, and |ηs| = 2 (left and right neighbors). The 1-D input signal (I) is decomposed into two main components: a low-frequency BL and a high-frequency DL. Notice that the edges are preserved in the diffused image (i.e., BL) and the DL yields fine details only

The BL and DL decomposition of the Actinocyclus ralfsii image data set is illustrated in Fig. 10.4. It can be seen that the BL retains the coarse details while the textures are almost eliminated. The texture details filtered out of the BL are exactly reconstructed in the DL.

Fig. 10.4
figure 4

(a–c) Multifocus images of Actinocyclus ralfsii diatom species. (d–f) Example of BL decomposition. (g–i) Example of DL decomposition. In order to better visualize the DLs, a constant value is added to each pixel, e.g., 100

2.2 Detection of a Focused Region and Weight Map Computation

Sharp edge details are important determinants in weight map computation. A traditional view is that an in-focus region yields sharper edge details than an out-of-focus region. A wide variety of criterion functions have been proposed in the literature for selecting the in-focus region from source images to construct an all-in-focus fused image. Examples include variance (VAR), Tenengrad (TNG), spatial frequency (SF), energy of image gradients (EOG), energy of Laplacian (EOL), sum-modified Laplacian (SML), Laplacian of Gaussian (LOG) [12], and the frequency selective weighted median filter (FSWM) [13]. The authors in [13] provided comparative studies of different criterion functions. In the literature [14, 15], the EOL-based criterion has provided better performance for multifocus image fusion than SF and EOG.

In this chapter, to identify blocks containing high frequencies, the EOL is computed in each M × N block of the input images. A higher EOL value indicates that an image block is in focus, whereas a lower value indicates that the block is blurred/out of focus. The EOL of the nth input image is computed as follows:

$$\displaystyle \begin{aligned} {\mathrm{EOL}}_{n}=\sum_{i}\sum_{j} (\nabla^2 I_{n}(i,j))^2 \end{aligned} $$
(10.4)

The 3 × 3 Laplacian kernel [0, −1, 0; −1, 4, −1; 0, −1, 0] is used for gradient computation. The result of computing the EOL with this Laplacian kernel is a filtered image that contains strong edges in areas where rich details are present. In order to compute the initial saliency maps SMn, we apply a Gaussian low-pass filter G with a symmetric kernel of size r = 5 × 5 and standard deviation σ = 5, formulated as follows:

$$\displaystyle \begin{aligned} SM_{n}=|{\mathrm{EOL}}_{n}| \otimes G_{r,\sigma} \end{aligned} $$
(10.5)

where ⊗ denotes the convolution operator. Next, the saliency maps are compared to determine the weight maps as follows:

$$\displaystyle \begin{aligned} WM_{n}^k= \begin{cases} 1 & \text{if}~SM_{n}^k={\mathrm{max}}\left(SM_{1}^k,SM_{2}^k, \ldots, SM_{N}^k\right), \\ 0 & \text{otherwise} \end{cases} \end{aligned} $$
(10.6)

where \(WM_{n}^k\) and \(SM_{n}^k\) are, respectively, the weight map and saliency value of the pixel k in the nth image.
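The EOL-based saliency and winner-take-all weight maps of Eqs. 10.4–10.6 can be sketched as follows. A per-pixel squared-Laplacian response smoothed by a small Gaussian stands in for an explicit block sum, and the 5 × 5 kernel size is approximated via the filter truncation; this is an illustrative implementation, not the chapter's code.

```python
import numpy as np
from scipy.ndimage import convolve, gaussian_filter

def initial_weight_maps(images, sigma=5.0):
    """Sketch of Eqs. (10.4)-(10.6): squared Laplacian response (EOL),
    Gaussian-smoothed saliency maps, and binary winner-take-all weight maps.
    `images` is a list of 2-D grayscale arrays of equal size."""
    lap = np.array([[0, -1, 0],
                    [-1, 4, -1],
                    [0, -1, 0]], dtype=np.float64)  # Laplacian kernel from the text
    saliency = []
    for I in images:
        eol = convolve(I.astype(np.float64), lap) ** 2           # Eq. (10.4), per pixel
        # truncate=0.4 yields a 5x5 support for sigma=5, as in Eq. (10.5)
        saliency.append(gaussian_filter(np.abs(eol), sigma, truncate=0.4))
    S = np.stack(saliency, axis=0)
    winners = np.argmax(S, axis=0)
    # Eq. (10.6): weight is 1 where image n has the maximum saliency
    W = [(winners == n).astype(np.float64) for n in range(len(images))]
    return S, W
```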

Figure 10.5a–c shows the saliency maps computed using Eq. 10.5, and the initial weight maps (computed using Eq. 10.6) are shown in Fig. 10.5d–f. From Fig. 10.5d–f, we can easily notice that the initial weight maps computed using Eq. 10.6 yield noisy outputs that are not suitable for the fusion process. To refine these noisy weight maps, a weighted least squares (WLS) [16] based refinement process is introduced in our fusion algorithm, as discussed in the next section.

Fig. 10.5
figure 5

(a–c) Saliency maps of Actinocyclus ralfsii diatom species, computed using Eq. 10.5 and (d–f) noisy weight maps computed using Eq. 10.6

2.3 Weight Map Refinement

The weight maps computed in Eq. 10.6 are hard, noisy, and not aligned with the object boundaries. In our implementation, the WLS framework [16] is used for weight map refinement. The WLS-based edge-preserving operator may be viewed as a compromise between two possibly contradictory goals. Given an input image v, we seek a new image w, which, on the one hand, is as close as possible to v and, at the same time, is as smooth as possible everywhere, except across significant gradients in v. To achieve these objectives, we seek to minimize the following quadratic functional:

$$\displaystyle \begin{aligned} \sum_{p}\left(\left(w_p-v_p\right)^2 + \gamma \left( q_{x,p}(v) \left( \frac{\partial w}{\partial x} \right)_p^2 + q_{y,p}(v) \left( \frac{\partial w}{\partial y} \right)_{p}^{2}\right)\right) \end{aligned} $$
(10.7)

where the subscript p denotes the spatial location of a pixel. The data term (wp − vp)2 minimizes the distance between w and v, while the second (regularization) term strives to achieve smoothness by minimizing the partial derivatives of w. The smoothness requirement is enforced in a spatially varying manner via the smoothness weights qx and qy, which depend on v:

$$\displaystyle \begin{aligned} q_{x,p}(v)&=\left(\left|{{\frac{\partial l}{\partial x}(p)}}\right|{}^\alpha +\epsilon\right)^{-1},\\ q_{y,p}(v)&=\left(\left|{{\frac{\partial l}{\partial y}(p)}}\right|{}^\alpha +\epsilon\right)^{-1} {} \end{aligned} $$
(10.8)

where l is the log-luminance channel of the input image v, the exponent α (typically between 1.2 and 2.0) determines the sensitivity to the gradients of v, while 𝜖 is a small constant (typically 0.0001) that prevents division by zero in areas where v is constant.

Let w = WLSγ,α,𝜖(v) represent the WLS filtering operation. In our case, \(WM_{n}^k\) computed in Eq. 10.6 serves as the input image to the WLS filter (i.e., \(v= WM_{n}^k\)), and \(W_{n}^B\) or \(W_{n}^D\) is the output of the WLS filter (\(w=W_{n}^B\) for the BL and \(w=W_{n}^D\) for the DL). More specifically, the smoothed version of the weight map \(WM_{n}^k\) serves as the refined weight map for the nth base layer and detail layer:

$$\displaystyle \begin{aligned} W_{n}^B=WLS_{\gamma_{1},\alpha_{1},\epsilon}(v)\end{aligned} $$
(10.9)
$$\displaystyle \begin{aligned} W_{n}^D=WLS_{\gamma_{2},\alpha_{2},\epsilon}(v) \end{aligned} $$
(10.10)

where \(W_{n}^B\) and \(W_{n}^D\) are the refined weight maps for the corresponding Bn and Dn, respectively. We have found that, for most data sets, the parameters γ1 = 1.2 and α1 = 0.9 are suitable for refining the weight maps used to fuse the BLs. In Eq. 10.10, γ2 = 0.2 and α2 = 0.1 work well for preserving details in the fused image.
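A minimal sparse-matrix sketch of the WLS refinement in Eqs. 10.7–10.10 is given below. It solves the normal equations (I + γ L) w = v with SciPy; the use of the corresponding source image (rather than the weight map itself) as the guide for the smoothness weights, and the small constants, are our assumptions for illustration.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def wls_refine(v, guide, gamma=1.2, alpha=0.9, eps=1e-4):
    """Sketch of the WLS operator of Eqs. (10.7)-(10.10): smooth the noisy
    weight map `v` while stopping at significant gradients of `guide`.
    gamma/alpha default to the BL setting (gamma1 = 1.2, alpha1 = 0.9); use
    gamma=0.2, alpha=0.1 for the DL weights, as suggested in the text."""
    r, c = v.shape
    n = r * c
    l = np.log(guide.astype(np.float64) + eps)   # log-luminance of the guide
    # Smoothness weights q_x, q_y of Eq. (10.8); zero weight on wrap-around edges
    ax = np.zeros((r, c)); ay = np.zeros((r, c))
    ax[:, :-1] = 1.0 / (np.abs(np.diff(l, axis=1)) ** alpha + eps)
    ay[:-1, :] = 1.0 / (np.abs(np.diff(l, axis=0)) ** alpha + eps)
    ax, ay = ax.ravel(), ay.ravel()
    # Forward-difference operators (row-major ordering): +1 for x, +c for y
    Dx = sp.diags([-np.ones(n), np.ones(n - 1)], [0, 1], shape=(n, n))
    Dy = sp.diags([-np.ones(n), np.ones(n - c)], [0, c], shape=(n, n))
    A = sp.identity(n) + gamma * (Dx.T @ sp.diags(ax) @ Dx +
                                  Dy.T @ sp.diags(ay) @ Dy)
    w = spla.spsolve(A.tocsc(), v.ravel())
    return w.reshape(r, c)
```

For megapixel images the direct solve is usually replaced by an iterative solver such as preconditioned conjugate gradients, as discussed in Sect. 10.6.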

The refined weight maps computed from Eq. 10.9 are shown in Fig. 10.6a–c. To illustrate the in-focus region selection, the selected regions are visualized in false color in Fig. 10.6d. The false-color image shows that the sharp regions across the source images (shown in Fig. 10.4a–c) are detected correctly: red marks the contribution of the first input image, green the second, and blue the third.

Fig. 10.6
figure 6

(a–c) Refined weight maps computed from Eq. 10.9 and (d) false color visualization of in-focus regions detected across source images

2.4 Weighted Average Fusion of BL and DL

The final step of weight map refinement is normalization, so that the weights sum to one at each pixel k. These normalized weight maps are used to compute the fused base layer BF and the fused detail layer DF as follows:

$$\displaystyle \begin{aligned} B_{F}=\displaystyle\sum_{n=1}^{N} W_{n}^B B_{n}\end{aligned} $$
(10.11)
$$\displaystyle \begin{aligned} D_{F}=\displaystyle\sum_{n=1}^{N} W_{n}^D D_{n}\end{aligned} $$
(10.12)

and the resulting fused image IF can be directly calculated as follows:

$$\displaystyle \begin{aligned} I_{F}=B_{F}+D_{F}\end{aligned} $$
(10.13)

To enhance details, we found that a simple interactive tool is very effective for DL manipulation. The enhanced detail fused image can be computed by:

$$\displaystyle \begin{aligned} I_{F}^e=B_{F}+\displaystyle\sum_{n=1}^{N} W_{n}^D S(a,D_{n})\end{aligned} $$
(10.14)

where S is a sigmoid function, S = 1∕(1 + exp(−ax)), applied to the nth DL for detail manipulation. The parameter a is a user-defined boosting factor, which can be selected empirically. We have found that a = 4 is very effective for boosting fine details in the fused image with few visible artifacts near strong edges. The effective manipulation range is very wide and varies with the texture details present in the source images.
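The weighted-average fusion and detail boosting of Eqs. 10.11–10.14 then reduce to a few array operations, as sketched below. The sigmoid output is re-centered at zero so that flat detail regions are not shifted; that centering is our assumption, since the text specifies only S = 1∕(1 + exp(−ax)).

```python
import numpy as np

def fuse_layers(base_layers, detail_layers, WB, WD, a=4.0):
    """Sketch of Eqs. (10.11)-(10.14): per-pixel normalization of the refined
    weight maps, weighted averaging of BLs and DLs, and sigmoid detail boost."""
    WB = np.stack(WB); WD = np.stack(WD)
    WB /= WB.sum(axis=0, keepdims=True) + 1e-12   # weights sum to one per pixel
    WD /= WD.sum(axis=0, keepdims=True) + 1e-12
    B = np.stack(base_layers); D = np.stack(detail_layers)
    B_F = (WB * B).sum(axis=0)                    # Eq. (10.11)
    D_F = (WD * D).sum(axis=0)                    # Eq. (10.12)
    I_F = B_F + D_F                               # Eq. (10.13)
    # Eq. (10.14): boosted details; the -0.5 keeps zero detail mapped to zero boost
    S = 1.0 / (1.0 + np.exp(-a * D)) - 0.5
    I_F_e = B_F + (WD * S).sum(axis=0)
    return I_F, I_F_e
```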

Figure 10.7a shows a fused base layer BF produced from the sequence of three base layers of the Actinocyclus ralfsii diatom species shown in Fig. 10.4d–f. An example of detail layer fusion using Eq. 10.12 is shown in Fig. 10.7b, along with the final fused image in Fig. 10.7c. From Fig. 10.7c, we can see that a single all-in-focus image is produced from three partially focused images. Most notably, the fine details appear without introducing visible artifacts.

Fig. 10.7
figure 7

(a) Fused base layer BF was computed using Eq. 10.11, (b) fused detail layer DF was computed using Eq. 10.12, and (c) fused image IF was computed using Eq. 10.13

3 Exposure Fusion (EF) vs High Dynamic Range (HDR)

Much research is currently devoted to developing exposure fusion techniques for real-world scenes, yet multiexposure image fusion, which enhances the contrast of the fused image, has not been applied to diatoms. The study and characterization of the intricate micrometer-sized silica patterns of diatoms is increasingly desirable for classifying species in different aquatic systems. Recently, Ferrara et al. found that the exposure time of the sensor plays an important role in exploiting the optical properties of the micro- and nano-structures of diatoms [17]. Beyond the visualization and localization of these intricate silica patterns, the full dynamic range of the sensor can be exploited by varying the exposure time to capture more details. In this section, we hypothesize that multiexposure image fusion techniques can provide better performance in diatom identification and classification.

In recent years, several techniques have been developed to precisely represent the complete information in the shadows and highlights of real-world natural scenes [6]. The direct 8-bit gray or 24-bit RGB representation of visual data captured by a standard digital camera at a single exposure setting often causes loss of information, because the dynamic range of most scenes exceeds what the camera can capture. Such a representation is referred to as a low dynamic range (LDR) image. To handle such cases, digital cameras provide aperture, exposure time, and ISO settings that regulate the amount of light reaching the sensor, so an appropriate exposure setting must be chosen to control the response of the charge-coupled device (CCD). With a single exposure setting, either the detail in poorly illuminated areas (i.e., shadows) is visible with a long exposure, or the detail in brightly illuminated areas (i.e., highlights) is visible with a short exposure. Thus, an image captured by a standard digital camera at a single exposure setting is partially over- or underexposed, and capturing the detail of the entire scene requires a sufficient number of suitably chosen exposures.

The idealized response of a digital camera sensor to the highlights, mid-tones, and shadows present in a scene is shown in Fig. 10.8. The graph plots the collected charge versus exposure time for three illumination levels. We assume that both the aperture and the ISO setting are kept fixed over the integration times (tint1, tint2, and tint3), leaving a single setting (the exposure time) that determines how long the sensor is exposed to light. The charge (Q) produced at the end of integration is a functional f[.] of the current I(t) over the integration time 0 ≤ t ≤ tint. When the sensor is operating in integration mode, the functional f[.] is given by:

$$\displaystyle \begin{aligned} f[x]= \int_{0}^{t_{int}} I(t)\,dt \end{aligned} $$
(10.15)
Fig. 10.8
figure 8

Idealized response of a CCD sensor to highlights, mid-tones, and shadows present in the scene. Image courtesy of [18]

As depicted in Fig. 10.8, in difficult lighting situations where highlights, shadows, and mid-tones appear simultaneously, the camera sensors under the influence of highlights saturate at integration time tint1. As the exposure time increases further, the sensors under the influence of mid-tones saturate at integration time tint2. Similarly, at integration time tint3, the sensors under the influence of shadows still produce a non-saturated signal, while the sensors under the influence of highlights and mid-tones have already saturated. Therefore, highlights are captured with a short integration time, before the sensor saturates, whereas an adequate integration time is required to capture the shadows present in the scene.

3.1 HDR and Tone-Mapping

In principle, there are two major approaches to handling the limitations of existing image capture devices. The first is HDR reconstruction from multiple exposures. Debevec and Malik [19] estimated the camera response function (CRF) from images acquired at different exposure settings. The CRF recovered from the differently exposed images is used to create an HDR image whose pixel values are proportional to the true radiance values of the scene. Let multiple exposures of a scene be captured with different exposure times Δtj, where j = 1, 2, …, N indexes the exposures. The pixel value Zij of the ith spatial location in the jth exposure is then given by:

$$\displaystyle \begin{aligned} Z_{ij}=f\left(E_{i}\,\Delta t_{j}\right)\end{aligned} $$
(10.16)

where Ei is the irradiance at the ith pixel and f(⋅) is the CRF. If the CRF is assumed to be monotonic and invertible, Eq. 10.16 can be rewritten as:

$$\displaystyle \begin{aligned} f^{-1}\left(Z_{ij}\right)=E_{i}\,\Delta t_{j}\end{aligned} $$
(10.17)

This equation can be solved by taking the natural logarithm of both sides:

$$\displaystyle \begin{aligned} g\left(Z_{ij}\right)=\ln E_{i}+\ln \Delta t_{j}\end{aligned} $$
(10.18)

where g = ln f−1 is a monotonic and invertible function [19]. In this equation, Ei and g are the unknowns. A quadratic objective function based on linear least-squares optimization was proposed by Debevec and Malik to derive the response function, defined as:

$$\displaystyle \begin{aligned} \mathcal{O}=\sum_{i=1}^{P}\sum_{j=1}^{N}\left\{ w\left(Z_{ij}\right)\left[g\left(Z_{ij}\right)-\ln E_{i}-\ln \Delta t_{j}\right]\right\}^{2}+\gamma_{1}\sum_{z=Z_{\mathrm{min}}+1}^{Z_{\mathrm{max}}-1}\left[w(z)\,g''(z)\right]^{2}\end{aligned} $$
(10.19)

where P is the number of sampled pixel locations. The second term is a smoothness penalty on the sum of squared second differences, with g′′(z) = g(z − 1) − 2g(z) + g(z + 1), and γ1 is a scalar weight chosen according to the noise level expected in Zij. The weighting function w(z) chosen in [19] is a simple hat function:

$$\displaystyle \begin{aligned} w(z)= \begin{cases} z-Z_{\mathrm{min}} & \text{for }~z \leq \frac{1}{2}(Z_{\mathrm{min}}+Z_{\mathrm{max}}), \\ Z_{\mathrm{max}}-z & \text{for }~z > \frac{1}{2}(Z_{\mathrm{min}}+Z_{\mathrm{max}}) \end{cases}\end{aligned} $$
(10.20)

Once the response function g is recovered, the desired HDR radiance values are computed using the weighting function as:

$$\displaystyle \begin{aligned} \ln E_{i}=\frac{\sum_{j=1}^{N} w\left(Z_{ij}\right)\left(g\left(Z_{ij}\right)-\ln \Delta t_{j}\right)}{\sum_{j=1}^{N} w\left(Z_{ij}\right)}\end{aligned} $$
(10.21)

This reconstructs a dynamic range of up to eight orders of magnitude. HDR imaging is called a scene-referred representation because it represents the originally captured scene values as closely as possible [19]. Such a representation is sometimes also referred to as an extrasensory data representation. After acquiring HDR data, an efficient encoding technique is needed to avoid consuming excessive disk space. Various formats for storing radiance maps have been developed and are described by Reinhard et al. [6].
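For reference, a compact NumPy sketch of the least-squares recovery of g (Eqs. 10.18–10.20) and of the radiance merge (Eq. 10.21) is given below. It assumes 8-bit samples (Zmin = 0, Zmax = 255), pins g at the mid-level, and uses an illustrative smoothness weight; it follows the published algorithm of [19] rather than any code associated with this chapter.

```python
import numpy as np

def hat_weight(z, zmin=0, zmax=255):
    """Hat weighting function of Eq. (10.20)."""
    mid = 0.5 * (zmin + zmax)
    return np.where(z <= mid, z - zmin, zmax - z).astype(np.float64)

def gsolve(Z, log_dt, gamma1=100.0):
    """Recover the log inverse response g from sampled pixels (Eqs. 10.18-10.19).
    Z: (P, N) integer array of 8-bit samples, log_dt: (N,) log exposure times.
    gamma1 is the smoothness scalar of Eq. (10.19); its value here is illustrative."""
    P, N = Z.shape
    A = np.zeros((P * N + 255, 256 + P))
    b = np.zeros(A.shape[0])
    k = 0
    for i in range(P):                       # data-fitting equations
        for j in range(N):
            wij = hat_weight(Z[i, j])
            A[k, Z[i, j]] = wij
            A[k, 256 + i] = -wij
            b[k] = wij * log_dt[j]
            k += 1
    A[k, 127] = 1.0                          # fix g(127) = 0 to remove the gauge freedom
    k += 1
    for z in range(1, 255):                  # smoothness equations on g''(z)
        wz = hat_weight(z)
        A[k, z - 1], A[k, z], A[k, z + 1] = gamma1 * wz, -2 * gamma1 * wz, gamma1 * wz
        k += 1
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:256]                           # g(z) for z = 0..255

def radiance_map(exposures, g, log_dt):
    """Eq. (10.21): weighted average of g(Z) - ln(dt) over the exposures."""
    Z = np.stack(exposures)                  # (N, H, W) uint8 frames
    log_dt = np.asarray(log_dt, dtype=np.float64)
    W = hat_weight(Z)
    lnE = (W * (g[Z] - log_dt[:, None, None])).sum(0) / (W.sum(0) + 1e-8)
    return np.exp(lnE)
```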

HDR reconstruction recovers a much wider range of brightness from the input exposures, but such images cannot be displayed faithfully on standard display devices or printing media, as shown in Fig. 10.9b. Prototype HDR display devices by Sunnybrook Technologies, BrightSide, and Dolby [20] can display HDR data directly, but until such devices become widespread, conventional displays and printers introduce inconsistencies that lead to a loss of detail in the output. To avoid these inconsistencies, tone-mapping operators [21] must be used to prepare HDR imagery for depiction on LDR devices.

Fig. 10.9
figure 9

(a) Images acquired at different exposure settings, (b) depiction of unprocessed HDR image constructed by Photomatix Pro [23] on a standard monitor, (c) the tone-mapped image, and (d) exposure fusion results

The problem of recovering an HDR image by combining multiple frames captured at variable exposure settings is well described in the literature, and several software packages for building HDR photographs have been developed, e.g., HDRshop [22] and Photomatix [23]. Photomatix, developed by HDRsoft, fuses a series of differently exposed images. Photomatix Pro and Photomatix Essentials are two standalone versions of Photomatix that run on Windows and Mac OS X. Photomatix Essentials is an excellent, user-friendly tool for constructing HDR images, while Photomatix Pro offers more options and includes advanced features such as batch processing and selective deghosting. An example of HDR construction and tone-mapping is shown in Fig. 10.9.

Here, to visualize all important details of the diatom, we captured five multiexposure images (shown in Fig. 10.9a); the appearance of details clearly depends on the exposure setting. These five frames are merged to construct a 32-bit HDR image, whose dynamic range is significantly larger than the display limits of a standard monitor. For demonstration purposes, the unprocessed HDR image as depicted on a standard monitor is shown in Fig. 10.9b. Dynamic-range reduction based on a tone-mapping operator therefore needs to be applied to the HDR image; the resulting tone-mapped image is shown in Fig. 10.9c. Alternatively, we may directly generate an 8-bit LDR image that looks like a tone-mapped image (as shown in Fig. 10.9d), as described in the following section.

3.2 Exposure Fusion

The second approach combines the multiexposure images directly into a single 8-bit LDR image that contains neither underexposed nor overexposed regions [10]. It provides a convenient and consistent way of preserving details in both brightly and poorly illuminated areas while skipping the construction of an HDR image and the use of tone-mapping operators. Combining multiple exposures without the typical HDR and tone-mapping steps is known as exposure fusion (EF), as shown in Fig. 10.10. The underlying idea of the various exposure fusion approaches [24] is to use different local measures to generate weight maps that preserve the details present in the several exposures. In [24], pyramid-based multi-resolution decomposition (MRD) [25] was utilized as the analysis and synthesis tool for image fusion: the Laplacian pyramids of the source images and the Gaussian pyramids of the weight maps were combined to produce a seamless fused image.

Fig. 10.10
figure 10

Comparison of high resolution imaging pipeline (i.e., HDR imaging followed by tone-mapping process, and exposure fusion process). The yellow color depicts HDR and tone-mapping pipeline, and blue color depicts exposure fusion pipeline

In our exposure fusion approach, we additionally incorporate an exposure measure into the weighting function. To compute it, we normalize the intensity values of the input image to lie between 0 and 1. A correctly exposed region is one whose intensities are neither near zero (underexposed) nor near one (overexposed); therefore, a pixel is said to be well exposed if its intensity is close to 0.5. The weight of each pixel k with intensity I is computed from how close I is to 0.5 using a Gauss curve:

$$\displaystyle \begin{aligned} EX_{n}=\mathrm{e}^{\left (-{\frac {\left (I-0.5\right)^2}{2\sigma^2}} \right)} \end{aligned} $$
(10.22)

Therefore, the saliency map computation in Eq. 10.5 can be redefined as follows:

$$\displaystyle \begin{aligned} \hat {SM}_{n}=|EX_{n}| \otimes G_{r,\sigma} \end{aligned} $$
(10.23)
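A sketch of the well-exposedness weighting in Eqs. 10.22–10.23 follows; the curve width σ = 0.2 is our assumption (the value commonly used in exposure fusion), since the chapter does not specify it.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def exposure_saliency(I, sigma_w=0.2, sigma_g=5.0):
    """Eqs. (10.22)-(10.23): per-pixel well-exposedness of an image normalized
    to [0, 1], smoothed into a saliency map by a small Gaussian."""
    EX = np.exp(-((I - 0.5) ** 2) / (2.0 * sigma_w ** 2))   # Eq. (10.22)
    return gaussian_filter(EX, sigma_g, truncate=0.4)        # Eq. (10.23)
```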

The effect of exposure settings on multifocus images of the Triceratium favus diatom species acquired by dark-field microscopy is shown in Fig. 10.11a–j. The image series shown in the top row (a–e) was acquired with a shutter speed of 1/160 s. To acquire sufficient detail, a second series was acquired with a shutter speed of 1/100 s (middle row, f–j). For demonstration purposes, Fig. 10.11k,l shows the multifocus fusion results for the image series captured at low and high exposure settings, respectively. These images were fused using the saliency maps computed from Eq. 10.5. Thus, a single fused image with improved DOF is obtained from two or more images of the same scene acquired at different focal lengths of the camera sensor.

Fig. 10.11
figure 11

(a–e) Multifocus image series of Triceratium favus diatom species, acquired at low exposure settings (i.e., shutter speed of 1/160 s and ISO 400), (f–j) multifocus image series acquired at high exposure settings (i.e., shutter speed of 1/100 s and ISO 400), (k) multifocus fusion result obtained from total five images shown in top row, (l) multifocus fusion result obtained from total five images shown in middle row, and (m) exposure fusion result obtained from total two images shown in (k) and (l)

The fused result shown in Fig. 10.11m is generated from the two multifocus fusion results shown in Fig. 10.11k,l, this time using the saliency maps computed from Eq. 10.23. For detail enhancement, the multifocus results are generated using Eq. 10.14. The details produced in Fig. 10.11m are therefore more pronounced than in the multifocus fusion results shown in Fig. 10.11k,l. The fusion results for the multifocus images of the Triceratium pentacrinus diatom species shown in Fig. 10.12 are generated in the same manner. In our opinion, a careful choice of exposure settings on the acquisition device leads to plausible results, since it keeps very bright and very dark areas usable across the exposures. Finally, the fusion results show that a sharp and enhanced image can be generated by combining multifocus image fusion with multiexposure image fusion. This is desirable for capturing the micro- and nano-structural features of diatom species that help distinguish these organisms.

Fig. 10.12
figure 12

(a–e) Multifocus image series of Triceratium pentacrinus diatom species, acquired at low exposure settings (i.e., shutter speed of 1/160 s and ISO 400), (f–j) multifocus image series acquired at high exposure settings (i.e., shutter speed of 1/100 s and ISO 400), (k) multifocus fusion result obtained from total five images shown in top row, (l) multifocus fusion result obtained from total five images shown in middle row, and (m) exposure fusion result obtained from total two images shown in (k) and (l)

4 Depth Map and 3-D Surface Visualization of Fusion Results

In this section, depth maps and 3-D visualizations are obtained from the fusion results of the Triceratium favus diatom species. To indicate the regions where sharp structures are detected in the fusion results, a depth map was generated using PICOLAY [26]. PICOLAY, developed by Heribert Cypionka, is a scientific software package for focus stacking and for generating three-dimensional views from image series acquired at sequential focus levels. The depth maps of the fusion results are shown in Fig. 10.13a–c; the colors range from yellow (top) over green (middle) to blue (bottom), and gray indicates regions with no detected structures and hence no depth localization.

Fig. 10.13
figure 13

2-D and 3-D visualization of depth map of Triceratium favus diatom: (a) depth map of fusion result shown in 10.11k, (b) depth map of fusion result shown in 10.11l, (c) depth map of fusion result shown in 10.11m, and (d–f) their 3-D surface visualization

Another extremely useful way to analyze diatom shape and its structural variation is to model the fused image as a 3-D surface. The depth map can also be used to generate a 3-D view of the depth levels; depth maps were used for 3-D surface reconstruction by the authors in [27]. The depth maps constructed in Fig. 10.13a–c and the corresponding 3-D surface visualizations are shown in Fig. 10.13d–f. The additional dimension allows each region to be inspected from different viewing angles, which has better physical relevance. The 3-D surface visualizations shown in Fig. 10.13d–f are constructed using Fiji [28]. Fiji was developed at the Laboratory for Optical and Computational Instrumentation (LOCI) at the University of Wisconsin-Madison and is maintained by Curtis Rueden; it allows users to explore scientific data and provides features in the form of plugins and scripts.

A 3-D projection of the fusion results gives a better visualization of structures that belong to different layers, and different structures can be inspected by changing the viewing angle. For demonstration purposes, Fig. 10.14 shows 3-D reconstructions of the fusion results of the Triceratium favus diatom. In this way, the 3-D surface visualization of a fused image with sharp contour and striation patterns can be used by diatomists to attempt diatom identification and classification.

Fig. 10.14
figure 14

3-D surface visualization of fusion results of Triceratium favus diatom: (a) 3-D surface visualization of fusion result shown in 10.11k, (b) 3-D surface visualization of fusion result shown in 10.11l, and (c) 3-D surface visualization of fusion result shown in 10.11m

5 Fusion Quality Metrics

Quality assessment of fused images is necessary before they are used in applications such as machine vision, surveillance, and scientific and medical imagery, in which data are analyzed and visualized to record more details. Subjective evaluation of fusion results is a time-consuming process that requires expert viewers to assess the performance; in addition, as per the ITU recommendation [29], equal viewing conditions are needed for all viewers to ensure accuracy and fair comparison. Fusion quality metrics, on the other hand, quantify the performance of a fusion process without involving expert viewers. The goal is to measure the amount of complementary information transferred from the source images to the fused image. It is therefore essential to use objective image quality metrics that correlate with subjective quality measures for validation purposes [30].

To quantify the performance of an image fusion framework, two assessment strategies are used: referenced and non-referenced assessment. In full-reference assessment, the fused image is compared with a reference image (or ground truth). However, since ground truth is rarely obtainable in practical applications, the input source images are used as the reference for quality assessment. In the second strategy, called blind assessment, the quality is estimated from the fused image alone, without any reference or ground truth. Numerous conventional non-referenced objective performance measures, including spatial frequency (QSF), average gradient (QAG), and entropy (QH), have been proposed [31, 32]. A referenced fusion performance measure was introduced by Xydeas and Petrovic [33], in which the amount of edge information transferred from the input images to the fused image is evaluated. Detailed descriptions of these fusion performance measures are given in the following sections.

5.1 Gradient-Based Fusion Performance (QABF)

Xydeas and Petrovic [33] proposed a feature-based fusion quality metric that evaluates the amount of edge information transferred from the input images to the fused image. A Sobel operator is applied to yield the edge strength and orientation information for each pixel. For two input images A and B and a resulting fused image F, the Sobel edge operator is applied to compute the edge strength eA(m, n) and orientation βA(m, n) of input image A at each pixel, defined as follows:

$$\displaystyle \begin{aligned} e_{A}(m,n)=\sqrt{s_{A}^ x(m,n)^2 + s_{A}^y (m,n)^2} \end{aligned} $$
(10.24)
$$\displaystyle \begin{aligned} \beta_{A}(m,n)=\tan^{-1} \left[ \frac{s_{A}^{x}(m,n)}{s_{A}^{y}(m,n)} \right] \end{aligned} $$
(10.25)

where \(s_{A}^{x}(m,n)\) and \(s_{A}^{y}(m,n)\) are the outputs of the horizontal and vertical Sobel templates centered on pixel (m, n) and convolved with the corresponding pixels of image A. The relative strength and orientation values of an input image A with respect to F are formed as:

$$\displaystyle \begin{aligned} &\left( G^{AF}(m,n), A^{AF} (m,n) \right)\\ & \quad=\left( \left( \frac{e^F (m,n)}{e^A (m,n)}\right) ^{\psi}, {1-\frac{\left|\beta_{A}(m,n)-\beta_{F}(m,n)\right|}{\pi/2}} \right) \end{aligned} $$
(10.26)

where ψ is

$$\displaystyle \begin{aligned} \psi= \begin{cases} 1 & \text{if }~e_{A}(m,n) > e_{F}(m,n), \\ -1 & \text{otherwise} \end{cases} \end{aligned} $$
(10.27)

The edge strength and orientation preservation values can be derived:

$$\displaystyle \begin{aligned} Q_{e}^{AF}(m,n)=\frac{{\varGamma}_{e}}{1+e^{\kappa_{e}\left(G^{AF}(m,n)- {\sigma}_e \right)}} \end{aligned} $$
(10.28)
$$\displaystyle \begin{aligned} Q_{\beta}^{AF}(m,n)=\frac{{\varGamma}_{\beta}}{1+e^{\kappa_{\beta}\left(A^{AF}(m,n)- {\sigma}_\beta \right)}}\end{aligned} $$
(10.29)

where Γe,κe,σe and Γβ,κβ,σβ determine the shape of sigmoid functions used to form the edge strength and orientation preservation. Edge information preservation value is then defined as follows:

$$\displaystyle \begin{aligned} Q^{AF}(m,n)=Q_{e}^{AF}(m,n)Q_{\beta}^{AF}(m,n)\end{aligned} $$
(10.30)

Finally, the metric value QABF is defined as:

$$\displaystyle \begin{aligned} & Q^{AB/F}\\ & =\frac{\sum_{n=1}^N \sum_{m=1}^M \left (Q^{AF}(m,n)\omega^A (m,n)+Q^{BF}(m,n) \omega^B (m,n) \right)}{\sum_{n=1}^N \sum_{m=1}^M \left( \omega^A (m,n)+\omega^B(m,n) \right)}\end{aligned} $$
(10.31)

which evaluates the sum of the edge information preservation values for both inputs, QAF and QBF, weighted by the local perceptual importance factors ωA and ωB. We define ωA(m, n) = [eA(m, n)]L and ωB(m, n) = [eB(m, n)]L, where L is a constant.
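A compact sketch of the metric is given below. The Sobel responses and the min/max ratio implement Eqs. 10.24–10.27; the sigmoid constants, the constant L, and the wrapped orientation difference are illustrative assumptions on our part, since the published metric fixes its own values for these parameters.

```python
import numpy as np
from scipy.ndimage import sobel

def edge_strength_orientation(img):
    """Sobel edge strength and orientation, Eqs. (10.24)-(10.25).
    The orientation follows the s^x / s^y convention of Eq. (10.25),
    extended to the full circle via arctan2."""
    sx = sobel(img.astype(np.float64), axis=1)
    sy = sobel(img.astype(np.float64), axis=0)
    return np.hypot(sx, sy), np.arctan2(sx, sy)

def q_abf(A, B, F, Gamma=1.0, kappa=-15.0, sig=0.5, L=1.0):
    """Sketch of Q^{AB/F}, Eqs. (10.26)-(10.31), for two inputs and a fused image."""
    def preservation(eX, bX, eF, bF):
        G = np.where(eX > eF, eF / (eX + 1e-12), eX / (eF + 1e-12))  # Eqs. (10.26)-(10.27)
        d = np.abs(bX - bF)
        Aor = 1.0 - np.minimum(d, 2.0 * np.pi - d) / (np.pi / 2.0)   # wrapped orientation term
        Qe = Gamma / (1.0 + np.exp(kappa * (G - sig)))               # Eq. (10.28)
        Qb = Gamma / (1.0 + np.exp(kappa * (Aor - sig)))             # Eq. (10.29)
        return Qe * Qb                                               # Eq. (10.30)

    eA, bA = edge_strength_orientation(A)
    eB, bB = edge_strength_orientation(B)
    eF, bF = edge_strength_orientation(F)
    QAF = preservation(eA, bA, eF, bF)
    QBF = preservation(eB, bB, eF, bF)
    wA, wB = eA ** L, eB ** L                                        # perceptual importance weights
    return float((QAF * wA + QBF * wB).sum() / ((wA + wB).sum() + 1e-12))  # Eq. (10.31)
```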

For an "ideal fusion," the sum of QABF, the total loss of information LABF, and the noise added to the fused image by the fusion process, NABF, should equal unity [32], as shown in Eq. 10.32.

$$\displaystyle \begin{aligned} Q^{AB/F}+L^{AB/F}+N^{AB/F}=1\end{aligned} $$
(10.32)

In most cases, however, the fusion artifact measure introduced in [33] does not lead to unity. To overcome this problem, a revised fusion artifact measure was proposed by B. K. S. Kumar [32], defined as follows:

$$\displaystyle \begin{aligned} N^{AB/F}=\frac{\sum_{n=1}^N \sum_{m=1}^M AM(m,n) \left[ \left(1-Q^{AF}(m,n)\right)\omega^A (m,n)+\left (1-Q^{BF}(m,n)\right) \omega^B (m,n) \right]}{\sum_{n=1}^N \sum_{m=1}^M \left( \omega^A (m,n)+\omega^B(m,n) \right)}\end{aligned} $$
(10.33)

where AM(m, n) indicates the location of fusion artifacts in the fused image and is defined as follows:

$$\displaystyle \begin{aligned}AM(m,n)= \begin{cases} 1 & \text{if }~e_{F}(m,n) > e_{A}(m,n)~\text{and}~e_{F}(m,n) > e_{B}(m,n), \\ 0 & \text{otherwise} \end{cases}\end{aligned} $$
(10.34)

The outcomes of objective performance measures are tabulated in Table 10.1.

Table 10.1 Quantitative assessments of multifocus image fusion results

5.2 Image Fusion Metric Based on Spatial Frequency (QSF)

Spatial frequency, which originates from the human visual system (HVS), indicates the overall activity level in an image and has led to an effective objective quality index for image fusion [34]. The total spatial frequency of the fused image is computed from its row (RF) and column (CF) frequencies, and QSF is defined as:

$$\displaystyle \begin{aligned} Q^{SF}=\sqrt{RF^{2}+CF^{2}} \end{aligned} $$
(10.35)
$$\displaystyle \begin{aligned} RF=\sqrt{{\frac{1}{MN}}\displaystyle\sum_{m=1}^{M}\displaystyle\sum_{n=2}^{N} \left( I_{F}(m,n)-I_{F}(m,n-1)\right)^2} \end{aligned} $$
(10.36)
$$\displaystyle \begin{aligned} CF=\sqrt{{\frac{1}{MN}}\displaystyle\sum_{m=2}^{M}\displaystyle\sum_{n=1}^{N} \left( I_{F}(m,n)-I_{F}(m-1,n)\right)^2}\end{aligned} $$
(10.37)

where IF(m, n) is the gray value of pixel at position (m, n) of image IF.
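A direct translation of Eqs. 10.35–10.37 (illustrative, with the 1∕MN normalization taken over the full image size):

```python
import numpy as np

def q_sf(F):
    """Spatial-frequency metric of Eqs. (10.35)-(10.37)."""
    F = F.astype(np.float64)
    RF = np.sqrt((np.diff(F, axis=1) ** 2).sum() / F.size)   # row frequency, Eq. (10.36)
    CF = np.sqrt((np.diff(F, axis=0) ** 2).sum() / F.size)   # column frequency, Eq. (10.37)
    return float(np.sqrt(RF ** 2 + CF ** 2))                 # Eq. (10.35)
```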

5.3 Average Gradient-Based Fusion Metric (QAG)

This metric estimates the degree of clarity and sharpness of the fused image and is computed as

$$\displaystyle \begin{aligned} Q^{AG} =\frac{\sum_{i} \sum_{j} \left(\left(I_{F}(i,j)-I_{F}(i+1,j)\right)^{2} + \left(I_{F}(i,j)-I_{F}(i,j+1)\right)^{2}\right)^{1/2}}{mn} \end{aligned} $$
(10.38)

In general, we desire that a good image fusion method should yield a higher score in terms of QAG.
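A sketch of Eq. 10.38, evaluated on the common interior grid so that both forward differences are defined:

```python
import numpy as np

def q_ag(F):
    """Average-gradient metric of Eq. (10.38)."""
    F = F.astype(np.float64)
    gx = F[:, 1:] - F[:, :-1]            # horizontal forward differences
    gy = F[1:, :] - F[:-1, :]            # vertical forward differences
    mag = np.sqrt(gx[:-1, :] ** 2 + gy[:, :-1] ** 2)
    return float(mag.sum() / F.size)     # normalized by m*n as in the text
```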

5.4 Entropy-Based Fusion Metric (QH)

This metric is based on information theory. It quantifies the amount of information present in the fused image, and is defined as follows:

$$\displaystyle \begin{aligned} {Q}_{H}=-\sum_{k=0}^{255} p_{k} {\mathrm{log}}_{2}p_{k} \end{aligned} $$
(10.39)

where pk is the probability of intensity value k in an 8-bit fused image.
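A sketch of Eq. 10.39 for an 8-bit fused image:

```python
import numpy as np

def q_h(F):
    """Entropy metric of Eq. (10.39)."""
    hist, _ = np.histogram(F.astype(np.uint8), bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                              # skip empty bins (0*log 0 = 0)
    return float(-(p * np.log2(p)).sum())
```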

To analyze the performance of the fusion approaches with the help of the assessment metrics, the outcomes of the two-scale decomposition-based weighted average multifocus image fusion (TSD-MF) and of the multi-resolution decomposition (MRD) [24] approach are given in Table 10.1; better values are shown in bold. The higher the value of QABF, the better the quality of the composite image; conversely, the lower the values of LABF and NABF, the better the quality. For the non-referenced quality metrics QSF, QAG, and QH, higher values are expected from an ideal fusion process. The analysis presented in Table 10.1 shows that TSD-MF outperformed MRD on the six microscopy data sets. TSD-MF scores higher in terms of QABF and lower in terms of LABF for all data sets except Triceratium pentacrinus (acquired at an exposure time of 1/160 s). MRD performs better in terms of NABF, giving the lower metric outcome, but does not perform well in terms of QABF and LABF. In terms of QSF, QAG, and QH, TSD-MF scores higher for all six data sets. Thus, the performance of TSD-MF is excellent at improving the DOF while avoiding visual artifacts.

6 Efficient Implementations

The average execution time of the TSD-MF algorithm on microscopy data is presented in Table 10.2. The algorithm is implemented in MATLAB 2014a and executed on a machine with a 3.70 GHz Intel Core i3 processor and 8 GB RAM. The total execution time includes the time to read the n source images, to compute the BLs and DLs, to compute the n weight maps for the BLs and the n weight maps for the DLs, and to generate the resulting fused image; the time to write the resulting fused image is not included.

Table 10.2 Comparison of execution time in seconds (s). The number of input images is shown in brackets

As shown in Table 10.2, the weight map refinement based on the WLS operator in Eqs. 10.9 and 10.10 is the most time-consuming step. In our fusion approach, the WLS operator uses preconditioned conjugate gradients (PCG) [35]. It was reported in [16] that the average time of the PCG-based sparse matrix solver is 3.5 s per megapixel on a 2.2 GHz Intel Core 2 Duo. We believe that the overall execution time of the weight map optimization can be reduced through an efficient GPU implementation of the solver proposed by Weber et al. [36].

7 Discussion and Conclusions

In this chapter, we have presented the TSD-MF scheme, which extends the DOF of the fused image by selecting the in-focus regions from source images of diatom species. The EOL is utilized as a criterion function that identifies the in-focus region to compute the initial saliency maps. The weight maps are determined by comparing edge strengths, which explicitly define which region should be selected from the source images to obtain a single composite image. These weight maps are hard, noisy, and not aligned with the object boundaries, and are therefore not suitable for pixel-level weighted average fusion. We have therefore introduced a weight map refinement approach based on the WLS edge-preserving operator. The results show that significant improvements are obtained by fusing BLs and DLs separately, with different weight maps for BL and DL fusion rather than a single shared weight map.

We have observed that the appearance of details in the multiexposure images of diatom species depends on the exposure settings. By using the exposure measure as a criterion function, multiexposure images can be used to enhance the intricate micrometer-sized silica patterns of diatoms. This study opens up a promising way to clarify their role in diatom identification and classification. The fusion results from two different data sets acquired at shutter speeds of 1/160 s and 1/100 s suggest that further developments along these lines could improve the identification and classification of diatom species.

The quantitative analysis of the fusion results using four quality metrics clearly demonstrates that the TSD-MF method preserves more details, improving the clarity and sharpness of the fused image. It should be noted that a fixed parameter setting was used to obtain the fusion results for all data sets. In future work, we would like to experiment with more precise parameter selection to improve the DOF of the fused image. Another direction for future work is to explore which GPU implementation of the sparse matrix solver might reduce the execution time of our fusion method. In particular, we would like to develop a more computationally efficient scheme for generating a fused image with improved DOF and robustness.