1 Fusion of Multi-Sensor Images

Fusion is a process that uses mathematical techniques to produce a single image from a set of input images obtained through different sensors or modalities [8, 59, 83]. The fused image provides the advantage of reliability based on redundant information, and is more informative as a result of the complementary information [6, 9]. These advantages are reflected in computer processing and human visual perception [35, 47].

Fig. 1 Types of sensing

The goal of a fusion system is to provide a composite image which can be used as essential preprocessed data for various applications such as target detection, tracking, identification, and security defense systems. It also has applicability in medical fields such as diagnostics [17, 36, 49]. Single-sensor data often do not provide sufficient information about a scene or object; the combination of data from different sensors solves this issue through the fusion process [61, 104].

Multi-sensor image fusion has been an ongoing problem for the past three decades for remote sensing data available from earth observation satellites [39]. Fusion is an application-dependent process; the desired information will be fused according to application need [9, 26]. Let us consider the following scenarios, depicted in Fig. 1, to understand fusion.

In Fig. 1, three scenarios for sensing the same object are provided. The object and background are the same in each case.

  1. Scenario 1: Two sensors having the same capability are kept at different distances.

  2. Scenario 2: Two sensors having different capabilities are kept at the same distance.

  3. Scenario 3: Two sensors having different capabilities are kept at different distances.

Table 1 Scenario analysis

As Fig. 1 and Table 1 suggest, fusion combines complementary information with supplementary information to provide comprehensive information about a particular object or scene [23, 95]. For fusion to succeed, the researcher should be aware of the supplementary and complementary information in the input images [12, 75]. The third scenario is usually the best suited for the fusion of remote sensing images.

Fusion categorization is depicted in Fig. 2.

Fig. 2 Types of fusion

Fusion can be categorized into three types:

  1. Pixel-level fusion

  2. Feature-level fusion

  3. Decision-level fusion

Pixel-level fusion is a basic type of fusion in which the pixel values are manipulated using minimum or maximum conditions [60]. Structural information is ignored, which leads to undesirable or poor results [5, 90]. Thus, while pixel-level fusion has advantages such as low computational complexity and applicability to all types of images [1, 4], it ignores structural components and features, resulting in poor information gain [74]. Feature-based systems, by contrast, provide optimal quality of fused products based on the types of features selected [52]. Feature selection is non-trivial in this method, because the sensors provide different features [66]. In a feature-based system, the supplementary and complementary information of the fusion product should be analyzed and computed in order to ensure adequate results [62, 91]. Finally, decision-level fusion is carried out after classification is performed [76]. Although this type carries a high computational cost, the results are much more promising than those of the other two methods [3, 31]. Each type of fusion has pros and cons; methods are chosen depending on the application desired.
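To make the pixel-level case concrete, here is a minimal sketch (our illustration, not from the cited works; it assumes two co-registered, equally sized grayscale arrays, and `pixel_fuse` is a hypothetical helper):

```python
import numpy as np

def pixel_fuse(img_a, img_b, rule="max"):
    """Pixel-level fusion: combine pixel values directly, ignoring structure."""
    if rule == "max":             # keep the stronger response at each pixel
        return np.maximum(img_a, img_b)
    if rule == "min":
        return np.minimum(img_a, img_b)
    return (img_a + img_b) / 2.0  # simple averaging

a = np.random.rand(4, 4)          # stand-in for the sensor 1 image
b = np.random.rand(4, 4)          # stand-in for the sensor 2 image
print(pixel_fuse(a, b, rule="max"))
```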

For remote sensing data, some of the prerequisites for fusion are provided below [1].

  1. Geometric corrections

  2. Radiometric corrections

  3. Feature extraction

  4. Labeling

  5. Ground observations

To perform the fusion of remotely sensed images, the default prerequisites should be carried out first [103].

What should an image fusion processing algorithm provide?

  1. Pattern conservation: the ability of the fusion process to retain the structural information present in the input images when they are transformed into the fused product.

  2. Artifact-free output: avoidance of any inconsistencies or artifacts in the final product that would affect subsequent processing.

  3. Shift and rotational invariance: the algorithm should not depend on the orientation of the inputs [89].

  4. Stability: the final fused product should contain maximum details of the supplementary information from the input images [103].

  5. Consistency: the algorithm should be consistent in terms of the resultant product.

  6. Low time complexity: as fusion is often a real-time practical application, time complexity plays an important role in addition to result quality.

Fusion algorithms must be written based on the above-mentioned constraints.

2 Related Reviews on Fusion

The development of algorithms for the fusion of multi-sensor images presents an interesting challenge for researchers, and many techniques have been proposed. Algorithms continue to increase in complexity because of increasing data availability. Several reviews have been published on different aspects of multi-sensor fusion. In this section, we provide a detailed summary of the existing survey literature.

In [7], the authors presented a comprehensive discussion on the operators used for data fusion and the impact of the operators on algorithm behavior. In [54], the authors discussed the fundamentals of fusion, as well as the tools utilized for image fusion and their various characteristics.

In [69], the authors provided a review of multi-sensor fusion in remote sensing, and presented a comprehensive analysis of pixel-based approaches. In [88], the author introduced a protocol to determine the efficiency of fusion using reference metrics. This approach is now widely used to improve fusion algorithms based on the reference metrics. In [79], the authors presented fusion techniques for remote sensing applications. They described three different applications that utilize the fused image as a primary source for post-processing. In [72], the authors introduced a new concept called ARSIS, which involves three operations for fusion. Operation 1 performs information extraction from Image 1, operation 2 performs inference of the missing information from Image 2 using information extracted from Image 1, and operation 3 performs construction of the synthesized output. The authors tested the ARSIS concept in different schemes with good success rates.

In [28], the authors provided guidelines for data fusion. They discussed the benefits of fusion and the parameters considered (generalized for all types of fusion). In [80], the authors presented a comprehensive review of image fusion algorithms. They noted that image fusion algorithms dedicated for use in military applications are especially prominent in the published literature.

In [93], the authors undertook a detailed comparative analysis of image fusion algorithms, which included a comprehensive analysis of generalized fusion algorithms within the context of their proposed framework. In [73], the authors discussed pan-sharpening algorithms, providing a comprehensive analysis and detailing how spectral and structural information can be retained when performing pan sharpening.

In [2], the authors presented a detailed survey of wavelet-based methods utilized for fusion, including a comparative analysis of different wavelet-based methods and a discussion of the information gain after fusion using the experimental results. In [105], the authors discussed quality metrics that can be used to assess the performance of fusion algorithms in terms of spectral, spatial, and structural information gain. The methods used for quality assessment and their impact on algorithm behavior were detailed in the paper.

In [22], the authors presented a detailed survey on existing techniques used in the fusion of remote sensing data. They included an exhaustive discussion of conventional techniques, wavelet-based methods, and fuzzy-based approaches, and the spectral and spatial fidelity of the existing fusion methods were evaluated. In [98], the authors provided a survey of pixel-based methods in image fusion, with an emphasis on the fusion rules used and the implications of the rules and their impact on fusion.

In [102], the authors discussed recent developments, future prospects, and challenges in the fusion of remote sensing data. They discussed fusion of multi-source data from remote sensing and the pros/cons with respect to algorithm complexity and performance, as well as future trends and associated challenges. In [40], the authors presented a general discussion of multispectral image fusion techniques. A detailed survey of existing methods was provided, including an analysis of performance with respect to spectral and structural information preservation.

In [107], the authors compared several commercial packages for fusion of remote sensing data, and examined the challenges and performance issues through experimental testing. In [41], the authors presented a review of multi-sensor data fusion. They discussed the concepts of the various fusion methodologies currently available and associated challenges. Future directions and prospects were also briefly addressed.

As the above discussion reveals, the state of the art in the fusion domain has addressed problems regarding the fundamentals of fusion, tools used and their limitations, pixel- and wavelet-based approaches, fusion used for defense applications, recent developments, and current trends. Some of the reviews are based on an evaluation of fusion methods by reference metrics, parameter considerations, and quality assessments. The existing state of the art in remote sensing is limited to the general factors which affect fusion algorithms; image fusion constraints have not yet been discussed in the literature in the context of the tools utilized for fusion. Since multi-resolution analysis (MRA) and multi-scale geometric analysis (MGA) are the most widely used tools in image fusion, this paper concentrates on reviewing MRA- and MGA-based techniques and addressing the performance issues of MRA- and MGA-based fusion algorithms. Throughout the review, we have excluded hyperspectral pan-sharpening methods, since extensive studies of that state of the art have recently been reported [50, 67, 87].

2.1 Image Representation and Image Fusion

Image representation can be classified as spatial- or frequency-based. Spatial representation ignores the frequency components during processing, and frequency representation ignores the spatial content during processing. MRA was introduced to address these basic issues [55]. Multi-resolution processing was efficiently achieved using a tool called wavelets [19]. Wavelets have advantages such as space-frequency localization [16, 34]. However, while wavelets are well suited to image-processing tasks, they also have certain disadvantages, which will be discussed later. One such property is limited directionality in decomposition, which leads to a loss of intrinsic geometrical information when decomposing the image at different levels. To overcome this issue, MGA-based tools have been introduced in the literature.

MGA solved the issue regarding loss of intrinsic geometric information during decomposition. MGA was achieved using several tools, including contourlets [21], non-subsampled contourlet transform (NSCT) [18], curvelets [24], and shearlets [48]. Fusion based on these tools has increased over the past decade due to the advantages of MGA.

3 Literature Survey: Overview of Wavelet-Based Fusion Methods

MRA for wavelets was first carried out by Mallat in 1989. This detailed analysis provided the basis for utilizing wavelets in the image-processing domain. The use of MRA-based fusion techniques has increased because fusion can take place at different levels (i.e., different resolutions) [10]. Wavelets can decompose an image into multiple levels, each at a different resolution. Figure 3 shows a three-level decomposition. The low-pass filtered component is referred to as the approximation component, and the high-pass filtered components are referred to as the detail components.

Fig. 3 Three-level decomposition

In Fig. 3, A1 represents the approximation components at level 1, and H1, V1, and D1 represent the horizontal, vertical, and diagonal detail components at level 1. Similarly, A2 represents the approximation components at level 2, and H2, V2, and D2 represent the corresponding detail components at level 2. If the image is \(2^n \times 2^n\) in size, then decomposition can be performed up to \(n-1\) levels using wavelets. This is a basic property of wavelets.
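As an illustrative sketch, assuming the PyWavelets package (`pywt`) and an arbitrary 256 × 256 test array, the decomposition of Fig. 3 can be reproduced as follows; `wavedec2` returns the deepest approximation first, followed by the (H, V, D) detail tuples of each level:

```python
import numpy as np
import pywt

img = np.random.rand(256, 256)  # stand-in for a 2^n x 2^n input image

# Three-level 2D DWT: coeffs = [A3, (H3, V3, D3), (H2, V2, D2), (H1, V1, D1)]
coeffs = pywt.wavedec2(img, wavelet="haar", level=3)
A3, (H3, V3, D3) = coeffs[0], coeffs[1]
print(A3.shape, H3.shape)  # (32, 32) (32, 32): each level halves the resolution
```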

Wavelets possess an oscillatory characteristic over finite intervals. They have outstanding localization properties, which makes them useful as a sophisticated tool for signal processing. Within the framework of MRA, these properties allow experts to utilize wavelets in image-processing applications. A wavelet is basically a window of finite duration, which operates over a continuous signal at different shifted instances so as to extract information from the signal. Figure 4 presents the basic concept underlying MRA-based fusion techniques.

Fig. 4 MRA-based fusion general framework

Based on the fundamental framework shown in Fig. 4, many MRA-based fusion techniques have been introduced in the literature.
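A minimal sketch of the Fig. 4 framework, again assuming PyWavelets and two co-registered single-band inputs. The rule choices here (averaging for the approximation, maximum-absolute selection for the details) are common in the surveyed literature but are only one of many possible schemes:

```python
import numpy as np
import pywt

def wavelet_fuse(img_a, img_b, wavelet="haar", level=3):
    """Illustrative MRA fusion: mean rule on approximation, max-abs rule on details."""
    ca = pywt.wavedec2(img_a, wavelet, level=level)
    cb = pywt.wavedec2(img_b, wavelet, level=level)
    fused = [(ca[0] + cb[0]) / 2.0]  # approximation coefficients: averaging
    for details_a, details_b in zip(ca[1:], cb[1:]):
        fused.append(tuple(
            np.where(np.abs(da) >= np.abs(db), da, db)  # details: max-abs selection
            for da, db in zip(details_a, details_b)))
    return pywt.waverec2(fused, wavelet)

a = np.random.rand(128, 128)  # e.g. resampled multispectral band
b = np.random.rand(128, 128)  # e.g. PAN image of the same scene
print(wavelet_fuse(a, b).shape)  # (128, 128)
```

Swapping the two rules inside `wavelet_fuse` is essentially where the methods surveyed below differ from one another.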

3.1 Literature Review: Wavelet-Based Methods

In [46], the authors introduced a method utilizing wavelets for the fusion of multi-sensor remotely sensed images. They applied an area-based maximum rule for fusing the decomposed coefficients. This approach is a generalized method for utilizing wavelets for fusion, and maximum preservation of information from images was achieved. However, the presence of artifacts in the final output image was an issue due to the wavelet reconstruction schema. In [14], the authors developed a prototype for combining various data using wavelets, and applied coefficient comparisons for fusing the decomposed coefficients. The structural information (edges) was retained in good proportions in the fused output. However, blocky and wavy effects in the final fused output were an issue with this method.

In [64], the authors developed a method using wavelets to generate a super-resolution image from multispectral and panchromatic (PAN) images. Fusion was carried out by injection of PAN components into the multispectral components. Artifacts and localization failure due to substitution were observed issues. In [63], the authors presented a chapter on wavelets in image processing, which discussed wavelets and their implications for fusion. They provided information related to three types of wavelets used for fusion.

In [29], the authors carried out an urban analysis by fusion of multi-sensor images using wavelets. They integrated the decomposed coefficients instead of substituting for fusing. A directional rule based on edge detection was applied. This method performed well when heterogeneity in the scene was dominant, but visual degradation due to the skeletonization was a problem. In [66], the authors conducted a detailed study on Daubechies and Haar wavelets for the fusion of remotely sensed images. They used a decision-based rule in the context of the definition of the urban object. Their study showed that the fusion of multi-sensor images was best achieved with the Daubechies wavelet basis technique.

In [43], the authors developed a pan-sharpening algorithm utilizing wavelets, and used the substitution method for fusing the decomposed coefficients. The method showed good capacity for retention of spectral information. As with other methods, however, the presence of artifacts and blocky effects were an issue with this algorithm. In [68], the authors presented a formalized method for fusion using wavelets. Region-based decision rules based on segmentation were used in the formalized framework. The authors addressed existing issues such as the presence of artifacts and blurry effects. They found that the time complexity increased with their proposed framework compared with existing methods.

In [65], the authors presented a tutorial on wavelet-based methods for fusion, and used weighted averaging for fusing the decomposed coefficients. Their comparative study provided future perspectives on fusion. In [106], the authors developed a method for pan sharpening using an intensity-hue-saturation (IHS) color model and wavelets. The injection of the intensity component was carried out at the fusion level. Color distortions were decreased to a good extent with the proposed method, but artifacts and blocky effects in the reconstructed output image were an issue.

In [38], the authors presented a method for the fusion of multi-sensor images using dual-tree complex wavelet transforms. They compared their method with both the decimated and undecimated discrete wavelet transform-based methods. The proposed method was able to generate both visually and quantitatively better results than the other methods. This method also addressed existing issues such as artifacts and blocky effects in the reconstructed output. Increased time complexity was noted for this method compared with the other methods due to the complex filter banks utilized in the architecture of the dual-tree complex wavelet transform. In [30], the authors introduced a variational wavelet-based method for fusion. Variational-based approaches have performed well for data with a high level of heterogeneity, but increasing the textural content of the structural components was unsuccessful.

In [32], the authors developed a fusion algorithm based on interband structure modeling. The method performed well when the images contained more structural information. Increasing the textural content of the structural components was not successful with this method.

In [33], the authors undertook a detailed study of Mallat and á trous wavelet transforms utilized for pan sharpening. Using different types of fusion rules, they provided a comprehensive analysis of decimated and undecimated wavelet-based approaches for image fusion. In [42], the authors proposed an improved additive wavelet transform for fusion, in which they were able to retain the radiometric information along with geometric information. They used the component injection method at the fusion phase to address existing issues such as artifacts and blocky effects in the reconstructed output.

In this section, we have provided a comprehensive outline of wavelet-based approaches for the fusion of remote sensing images. Wavelet-based techniques rely on both decimated and undecimated approaches. Decimated approaches are prone to aliasing, which leads to artifacts. Undecimated wavelet approaches favor fusion due to their shift-invariant decomposition and restoration schema. Based on this discussion, it can be observed that fusion rules are application-dependent. For example, injection-based rules are used for urban applications, and substitution-based fusion rules are used for change detection. The overview presented above discusses the various methods in terms of the methodology used, advantages, and drawbacks. Observations from publications based on wavelet-based fusion can be summarized as follows.

  1. From 1995 to 2001, regular improvements were made to wavelet-based fusion methods.

  2. From 2001 to 2005, formalized frameworks for the fusion of multi-sensor images were developed and studied.

  3. Since 2005, the use of wavelet-based approaches has decreased drastically in the fusion domain; this is because of the emergence of multi-geometric-based approaches, with their respective advantages.

3.2 Discussion of Wavelet-Based Methods

Here we draw some conclusions regarding wavelet-based methods from the above discussion. Wavelet-based methods perform well compared with conventional fusion methods such as principal component analysis (PCA), IHS, and Brovey transform. The authors mentioned above found that their methods performed well relative to the methods in their comparisons. However, it is difficult to reach a general conclusion regarding the best method among these, because of the differences and subjectivity in the comparisons. Published remarks on method performance, such as the time and computational complexity of the processes, were discussed, and some approaches have clear advantages over others based on processing time and complexity. In this review, we focus on the utilization of wavelet tools and the implications of the fusion schema. In the above discussion, we provided details on the methods and fusion schema, along with their results with regard to structural, spatial, and radiometric information.

Various fusion schemes have been introduced in the literature, including simple (min, max, and averaging), substitution, injection, PCA, IHS, clustering, segmentation, region, object (decision level), decision map, and pulse-coded neural network (PCNN)-based schemes. Different types of wavelets have been introduced and utilized for fusion, including decimated, undecimated, and non-separable wavelets. The use of a particular fusion schema along with corresponding wavelets and their performance can be judged based on loss of radiometric information, spatial distortions, time complexity, memory utilization, and subjective analysis (visual). Different combinations provide different levels of performance, and consequently we cannot determine the absolute efficiency of any one method due to limited or subjective comparisons and the application-dependent nature of the particular method.

The simplest or earliest wavelet-based fusion methods were capable of producing better results than conventional methods such as IHS, PCA, and Brovey transform, and additional improvements in wavelet-based methods have been seen over the past three decades. Hybrid combinations such as wavelets along with PCA and IHS produce better results than the simplest methods, but they also have limitations. For example, IHS can be applied only on three bands. In substitution-based methods, the substitution is usually carried out on the basis of statistical parameters which require a priori knowledge regarding the data distribution on the particular image. In advanced approaches such as object detection, decision maps and neural network-based methods in combination with wavelets generate visually good results, with good quantitative measures, but these methods are limited by the need for a strong training data set. Obtaining training data in the remote sensing domain is an expensive and tedious task. These advanced approaches are typically application- and data-dependent.

The selection of wavelets also affects the performance of the fusion algorithm. Decimated wavelet-based methods commonly introduce artifacts due to the disturbance of the continuity of linear structures. Undecimated wavelet-based methods exact a greater premium in the computation process, but address the issue of artifacts in the final product. The fusion schema or rules applied to the method significantly affect the performance of the algorithms. When an averaging rule is applied to the decomposed coefficients, it degrades the structural information by over-smoothing; the application of min-max rules on the decomposed coefficients leads to no loss of either radiometric or structural information. It is important to apply the appropriate rule for the specific coefficients. Whereas the application of averaging on detailed coefficients leads to severe degradation of the structural components, averaging applied on approximated coefficients leads to over-smoothing. In region-based methods, the selection of regions is crucial, because the types of regions present in the test image affect the results. Some schemes will perform better on vegetation but fail in the urban region. In clustering-based methods, the result is dependent on the number of clusters considered for the particular data; the same number of clusters may produce better or poorer results for different data.
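A small numeric illustration of this point (synthetic detail coefficients of our own, not taken from any surveyed method): where two sensors see complementary edges, averaging halves the edge response that only one sensor captured, while max-abs selection preserves it:

```python
import numpy as np

d1 = np.array([0.0, 0.9, 0.0, 0.0])  # edge visible only to sensor 1
d2 = np.array([0.0, 0.0, 0.8, 0.0])  # edge visible only to sensor 2

mean_fused = (d1 + d2) / 2.0                               # [0.  0.45 0.4  0. ]
maxabs_fused = np.where(np.abs(d1) >= np.abs(d2), d1, d2)  # [0.  0.9  0.8  0. ]
print(mean_fused, maxabs_fused)
```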

It is clear from this discussion that every wavelet-based fusion scheme has its own set of advantages and limitations. More comprehensive testing is needed in order to fully assess the specific conditions under which each one is most appropriate.

4 Literature Survey: Overview of MGA-Based Fusion

A major drawback in wavelet-based fusion systems is the loss of intrinsic geometrical information due to the limited directionality of the decomposition schema. To overcome this issue, multi-scale geometric analysis (MGA) was introduced, which is based on directional representation. This has been achieved through tools such as contourlets [21], curvelets [24], wedgelets, ridgelets, non-subsampled contourlets [18], and shearlets [48]. Figure 5 shows the basic concept underlying MGA-based fusion techniques.

Fig. 5 MGA-based fusion general framework

Several MGA-based fusion techniques considering the basic idea shown in Fig. 5 have been introduced in the literature. The following section describes the research work carried out on MGA-based approaches for the fusion of multi-sensor images and their corresponding objectives, along with their advantages and drawbacks.

5 Contourlets

Contourlets were introduced by Do and Vetterli [21] as a means of obtaining sparse expansions for images with smooth contours. In the contourlet transform, a multi-scale transform is first applied to the image to capture edge and point discontinuities, and a local directional transform is then applied to the high-pass coefficients to obtain smooth contours. The contourlet transform is depicted in Fig. 6.

Fig. 6 Contourlets

Contourlets offer a flexible multi-scale representation and directional decomposition of images. The intrinsic geometrical representation of the image is retained through the obtained smooth contours and linear structures. This property of the contourlet transform has motivated the use of MGA in image processing. The utilization of contourlets in the image fusion domain has increased over the past decade due to their advantages over wavelet techniques and other conventional fusion methods. In the following, we provide details regarding the fusion methods employed using contourlets and their objectives, along with the pros and cons.
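The two-stage structure (multi-scale first, directional second) can be sketched as follows. This is only a schematic stand-in, not the actual contourlet filter bank: a Gaussian residual plays the role of the Laplacian-pyramid stage, and angular masks in the Fourier plane play the role of the directional filter bank:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def directional_bands(band, n_dirs=8):
    """Split a high-pass band into angular frequency wedges (crude DFB stand-in)."""
    F = np.fft.fftshift(np.fft.fft2(band))
    h, w = band.shape
    yy, xx = np.mgrid[0:h, 0:w]
    theta = np.arctan2(yy - h / 2, xx - w / 2) % np.pi  # orientation in [0, pi)
    wedges = []
    for d in range(n_dirs):
        lo, hi = d * np.pi / n_dirs, (d + 1) * np.pi / n_dirs
        mask = (theta >= lo) & (theta < hi)
        wedges.append(np.real(np.fft.ifft2(np.fft.ifftshift(F * mask))))
    return wedges

img = np.random.rand(128, 128)
lowpass = gaussian_filter(img, sigma=2.0)  # multi-scale stage (one level)
highpass = img - lowpass                   # edges and point singularities
print(len(directional_bands(highpass)))    # 8 directional sub-bands
```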

5.1 Literature Survey: Contourlet-Based Fusion

In [108], the authors developed a method for fusing multiband synthetic aperture radar (SAR) images using contourlets. They used averaging for the approximated coefficients and edge information measurements for the decomposed directional coefficients in the fusion phase. This method enabled the retention of the structural information, but in the homogeneous region, over-smoothing of small details was observed. In [71], the authors developed a method for fusing multi-sensor images using contourlets, without loss of energy constraints or structural information. They utilized averaging for the low-pass coefficients and regional energy measurement for the directional coefficients in the fusion phase. Their method was tested for consistency with different sets of data and obtained encouraging results. However, time complexity was an issue due to the evaluation of regional energy components.

In [57], the authors presented a method to fuse multi-sensor images in remote sensing using contourlets. They performed averaging for the low-pass components and applied the maximum selection (MS) rule for the directional coefficients in the fusion phase. Good retention of the structural component was achieved in the reconstructed output image, but a loss of spectral information occurred during the fusion process. In [81], the authors developed a pan-sharpening method using contourlets. In their proposed work, they considered the low-pass components of MS images as fused low-pass components based on threshold T, and then selected the directional components. This method maintained a good trade-off between spectral and spatial resolution during the fusion process. However, a ringing effect was found in the fused image due to downsampling at the reconstruction end.

In [94], the authors presented a method for fusion of multiband SAR images using contourlets with an expectation–maximization (EM) algorithm. They used edge information measurements for low-pass components and EM algorithm-based parameter estimation for directional components. The method was shown to be capable of retaining the textural information due to the adaptability of the fusion parameter selection. A high premium was paid, however, because of the utilization of an iterative EM algorithm for the directional coefficient selection. In [85], the authors developed a method for fusion of multi-sensor images in remote sensing using a wavelet-based contourlet schema. They performed averaging for low-pass components and applied region-based rules for high-pass directional components. Improvements were observed in the sharpness of the fused output and retention of structural information, but poor performance was an issue when radiometric information was dominant in the input images.

In [77], the authors introduced a method for pan sharpening using PCA and contourlets. They injected the principal components of the multispectral image into the detailed components of the panchromatic (PAN) image. The spectral (radiometric) information was successfully injected using PCA, and spatial enhancement was achieved using contourlets. Filter selection then became a crucial factor and was dependent on the input pair of images in the proposed method. In [82], the authors developed a method for pan sharpening using contourlets and IHS transformations. They used a weighted model at the low-pass coefficients and maximum rule at high-pass coefficients during the fusion phase. The radiometric information was retained well with this method, but degradation of spatial information was observed in their quantitative analysis.

In [37], the authors developed a method for fusion of visible and infrared (IR) data using the contourlet transform and K-means clustering. They utilized K-means clustering to obtain regions, and applied a max-min rule to the regions to obtain fusion results. This method was found to be efficient in detecting hidden objects, and also in dealing with misregistration issues. Determining the number of clusters according to the nature (heterogeneity) of the image, however, was difficult. In [15], the authors proposed a pan-sharpening algorithm based on contourlet and spectral response. A weighted fusion rule was applied on the low-pass coefficients, and directional coefficients of the PAN image were injected to obtain the fused result. Spectral and spatial improvements were observed in the fused product. However, because a static weight parameter was set through experiments based on the selected images, application to different pairs of inputs was tedious.

In [11], the authors presented a method for the fusion of IR and visible remote sensing images (optical), and the method was employed using multi-contourlet transform. A weighted fusion rule was applied for the low-pass coefficients and local energy-based rules for the directional coefficients. The authors determined the weight for the fusion rule by using the golden-section algorithm. The performance of the algorithm was encouraging in the fusion of IR and visible images both quantitatively and qualitatively, but a high premium was paid due to the multi-contourlet decomposition schema and iterative golden-section search algorithm. In [100], the authors developed a method for the fusion of multi-sensor images in remote sensing using contourlet packets along with a PCNN. They applied the PCNN rules on the low-pass coefficients, and a region-based rule was applied to the directional coefficients. Restoration in the fused coefficients was more efficient than with contourlet-based methods due to utilization of a non-subsampled directional filter bank, but severe color degradation occurred when the radiometric information was dominant in the inputs.

In [51], the authors developed a method for fusion of multi-sensor images using local energy and sharp frequency-localized contourlet transform. They employed a local energy-based rule for low-pass coefficients and a sum-modified Laplacian-based rule for directional coefficients in the fusion phase. This method addressed the issue of the pseudo-Gibbs phenomenon that was seen in existing methods, but suffered from degradation of radiometric information during fusion. In [86], the authors developed a fusion algorithm in the contourlet domain by retaining the structural information using discontinuity preservation based on edge learning. They used the maximum a posteriori (MAP) approach to determine the low-pass coefficients and Markov random field (MRF) prior method to determine the fusion coefficients in the directional components. Improvements in sharpness and structural information were observed due to the MAP-MRF combined approach. However, time complexity was an issue due to the utilization of iterative minimization algorithms.

In [99], the authors presented a method for fusing multi-sensor images using hidden Markov tree and PCNN. Their method relies on the consideration of statistical dependence to obtain a mature output. The authors utilized a maximum selection rule for low-pass coefficients and saliency-based rules for directional coefficients using PCNN. This method addressed issues of minor distortions of the structural components. However, it was found that with multispectral images, if the radiometric information was dominant, the method would fail when transferring radiometric information by generating a saliency map using the EM algorithm. In [92], the authors investigated the contourlet representation and introduced a tunable contourlet transform to attain an efficient transformation for fusion. They employed averaging for the low-pass coefficients and absolute maximum selection for the directional coefficients. Minor distortions of the structural components and blurring of the radiometric components were rectified using the proposed method, but the method was limited due to the selection of parameter Q. The parameter Q selected to tune the decomposition schema of the contourlet is dependent on the entropy change. In the case of the homogeneous region, this factor would have severe effects, leading to failure.

6 Non-Subsampled Contourlet Transform (NSCT)

The NSCT divides the two-dimensional signal into multiple components which are shift-invariant. The 2D signal is decomposed into different levels of decomposition, referred to as a non-subsampled pyramid structure (NSPS). The non-subsampled directional filter bank (NSDFB) is applied to the high-frequency component to obtain directional components, as shown in Fig. 7a. The filter bank, which splits the frequency plane in 2D, is pictured in Fig. 7b. The NSPS provides the multi-scale properties, and the NSDFB provides the directionality.

Fig. 7 NSCT overall structure
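A minimal sketch of the non-subsampled (à trous-style) pyramid stage, assuming numpy/scipy; unlike a decimated pyramid, every band keeps the full image size, which is what buys shift invariance (the directional filter bank stage is omitted here):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def nonsubsampled_pyramid(img, levels=3):
    """Band-pass details at each scale without any downsampling."""
    bands, current = [], img.astype(float)
    for j in range(levels):
        smoothed = gaussian_filter(current, sigma=2.0 ** j)
        bands.append(current - smoothed)  # detail band at scale j, full size
        current = smoothed
    bands.append(current)                 # final low-pass residual
    return bands

img = np.random.rand(64, 64)
bands = nonsubsampled_pyramid(img)
assert all(b.shape == img.shape for b in bands)  # no decimation anywhere
assert np.allclose(sum(bands), img)              # trivially invertible by summation
```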

6.1 NSCT Representation [56]

The representation of NSCT comprises the following properties.

  1. Multi-resolution

  2. Multidirectional

  3. Shift invariance

  4. Regularity

  5. Redundancy of \(J+1\), where J is the number of levels

6.2 Multi-Resolution Analysis

Multi-resolution analysis is carried out on a family of subspaces

$$\begin{aligned} \{V_j\}_{j \in Z} \end{aligned}$$
(1)

where Z denotes the set of integers. This family forms a sequence of nested multi-resolution subspaces, as given below

$$\begin{aligned} ...V_{-2} \subset V_{-1} \subset V_{0} \subset V_1 \subset V_2 ... \end{aligned}$$
(2)

where \( V_j \) is associated with the uniform grid of \( 2^j \times 2^j \)

The difference images lie in the subspace \(W_j\), which is the orthogonal complement of \(V_j\) in \(V_{j-1}\):

$$\begin{aligned} V_{j-1} = {V_j}\oplus {W_j} \end{aligned}$$
(3)

6.3 Multidirectional Analysis

Equation 4 shows the result when the filter bank is applied to approximation subspace \(V_j\).

$$\begin{aligned} V_j = \bigoplus _{k=0}^{{2^l}-1}V_{j,k}^l \end{aligned}$$
(4)

where \(k = 0,1,2,\ldots , 2^l-1\) indexes the \(2^l\) wedges. Wedges represent the directional elements.
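As a worked instance of Eq. 4: with \(l = 3\), the filter bank splits \(V_j\) into \(2^3 = 8\) directional wedge subspaces \(V_{j,k}^3\), \(k = 0, 1, \ldots , 7\).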

The multi-scaling of the NSCT is achieved as follows

$$\begin{aligned} L_2(R^2) = \bigoplus _{j=Z}W_j \end{aligned}$$
(5)

where \(W_j\) is shift-invariant.

6.4 Shift Invariance

To make the representation shift-invariant, a lifting theorem is applied to the filters as follows

$$\begin{aligned} \begin{pmatrix} H_0^{2D}f(z)\\ H_1^{2D}f(z) \end{pmatrix} = \prod _{i=0}^{N} \begin{pmatrix} 1 & 0 \\ P_i^{2D} & 1 \end{pmatrix} \begin{pmatrix} 1 & Q_i^{2D} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1\\ 0 \end{pmatrix} \end{aligned}$$
(6)

where f(z) is a 2D function, and P and Q have the same complexity.

6.5 Regularity

\(\varPhi (\omega )=\prod _{j=0}^{\infty } H_0(2^{-j}\omega )\) is obtained by scaling the detail of the low-pass approximation, and the regularity is controlled by the low-pass filter

$$\begin{aligned} H_0(\omega )=\left( \frac{1+e^{j\omega _1}}{2}\right) ^{N_1}\left( \frac{1+e^{j\omega _2}}{2}\right) ^{N_2} \end{aligned}$$
(7)

6.6 Literature Survey: NSCT-Based Fusion

In [84], the authors investigated the utilization of NSCT in image fusion and developed a method for fusing multi-sensor images using NSCT. They employed averaging for low-pass coefficients and a regional energy-based rule for the directional coefficients. The method showed successful utilization of NSCT in image fusion, with good results in terms of structural preservation. However, a high premium was paid due to the dynamic change in the windowing-based fusion rule. In [97], the authors developed an algorithm for the fusion of remote sensing images based on NSCT and a luminance hue saturation (LHS) color model. They employed the substitution of decomposed coefficients to achieve fusion. Their method preserved the structural information to a good extent, but failed to retain the spectral information.

In [101], the authors developed a method for fusion of multiparametric SAR images by utilizing stationary wavelet-based NSCT (SW-NSCT) and PCNN. They employed PCNN to apply fusion rules on the decomposed coefficients for final fusion. The performance of the method was good when a large number of iterations were used, which led to increased time complexity. In [44], the authors presented a novel method for fusion of IR and visible images using NSCT and region segmentation. They employed substitution of coefficients at the low-pass level and local region-based fusion rules for the directional coefficients. The limitation in this method was the increased brightness in the final reconstructed image, leading to misclassification of small linear structures.

In [45], the authors developed a method for the fusion of multi-sensor images in remote sensing using NSCT and PCA. They employed substitution for the low-pass coefficients performed by PCA, and maximum value selection for the directional coefficients. This method performed well in terms of spectral preservation, but a blurring of structural components was observed in the output. In [53], the authors introduced a pan-sharpening method by considering the spectral and spatial qualities of PAN and MS images in the NSCT domain. They used a fourth-order correlation coefficient (FOCC) as a decision-level parameter to determine which component was injected. Improved spectral enhancement was observed in the fused product, and noise and blurry effects decreased considerably. However, the FOCC-based results were not convincing in terms of structural component preservation compared with correlation coefficient (CC)-based methods.

In [96], the authors introduced a novel region-based fusion approach utilizing NSCT and particle swarm optimization. A maximum value-based rule was applied at the approximation level; the bandpass coefficients were segregated into smooth and edge regions using particle swarm optimization, and the maximum value-based rule was applied to the segregated components. This method performed well in terms of spatial enhancement, and minor blurry effects were also addressed. A decrease in the color values was observed, which may have been due to the use of the IHS color model. In [27], the authors introduced a method for pan sharpening in the NSCT domain by considering the issue of interpolation during upsampling. Upsampling was performed after the decomposition of the PAN image to avoid the interpolation issue. A decision map was used for the application of fusion rules. Structural components were retained well in the fusion process, but patches were observed due to spectral distortion.

7 Shearlets

Shearlets have received much attention in the image processing domain over the past few years, as they are well suited for continuous and discrete signal representations. Without directional representation, intrinsic features such as curves are approximated by straight-line segments in 2D data, which leads to roughness of the curves in the results. Directional representations have solved this issue by providing directional components for representing features in images. Shearlets exhibit time and frequency localization with long and narrow elements that hold intrinsic geometrical information. The discrete shearlet transform provides a good approximation with tight frames, producing optimal directional representation. The directional representations provided through shearlets are scale-, shearing-, and translation-invariant.

Here, discrete signal representations are considered; hence, the work is based on discrete 2D signals (images). Shearlets are constructed on the basis of affine systems with composite dilations (the theory of wavelets with composite dilations).

Affine systems with composite dilations are of the form \({\varPhi _{AB}{(\psi )}}\), where \(\psi \in L^2(R^2)\) and A and B are invertible matrices with \(|\det B|= 1\); for example,

$$\begin{aligned} B =\begin{pmatrix} 1 & 1\\ 0 & 1 \end{pmatrix}, \qquad A = \begin{pmatrix} 4 & 0\\ 0 & 2 \end{pmatrix} \end{aligned}$$

where the matrix A is referred to as the anisotropic dilation matrix, and matrix B is referred to as the shear matrix. The affine system with composite dilations is defined by considering the prior conditions provided above, as follows.

$$\begin{aligned} \varPhi _{AB}(\psi )=\left\{ \psi _{j,k,l}(x)=|\det A|^{j/2}\, \psi (B^lA^j(x-k))\right\} , \end{aligned}$$
(8)

where \(j \in Z\), \(l \in Z\), and \(k \in Z^2\). Shearlets are a good example of such affine systems. The shearlet elements are supported by Parseval frames, and are defined [21] as follows

$$\begin{aligned} \psi _{j,k,l}^0=2^{3{j/2}} \psi (B_0^lA_0^j(x-k)) \end{aligned}$$
(9)

for \(j \ge 0\), \(-2^j \le l \le 2^j \), and \(k \in Z^2\); similarly,

$$\begin{aligned} \psi _{j,k,l}^1=2^{3{j/2}} \psi (B_1^lA_1^j(x-k)) \end{aligned}$$
(10)

In general,

$$\begin{aligned} \psi _{j,k,l}^d=2^{3{j/2}} \psi (B_d^lA_d^j(x-k)), \end{aligned}$$
(11)

where “d” represents the total number of decomposed elements present after decomposition. The decomposition schema is shown in Fig. 8.

Fig. 8 Decomposition schema of shearlets

Each element “d” has trapezoidal support.

The size of the trapezoid is \(2^{2j} \times 2^j\), where j is the level of decomposition.

To reduce the computational complexity in Eq. 11, the parameters l, j, and k are replaced with a single index value i. The count of the index value starts when the slope is 1 (i.e., \(\theta = +\,45^{\circ }\) or \(-\,45^{\circ }\)) and proceeds in the clockwise direction up to \(180^{\circ }\). The incremental count continues from the next scale. The indexing schema is shown in Fig. 9. Hence, the total number of indices at the \(J_0-1\) scale is given as \( \eta = 1+\sum \nolimits _{j=0}^{j_0-1}2^{j+2} \)

Fig. 9 Indexing in shearlets

which simplifies to

$$\begin{aligned} \eta = 2^{j_0+2}-3 \end{aligned}$$
(12)

If the level of decomposition is 2, then 13 decomposed elements will be obtained.

If the level of decomposition is 3, then 29 decomposed elements will be obtained.
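A quick sanity check of the index count (our own verification of Eq. 12, matching the two examples above):

```python
# eta = 1 + sum_{j=0}^{j0-1} 2^(j+2), which telescopes to 2^(j0+2) - 3
for j0 in (2, 3, 4):
    eta = 1 + sum(2 ** (j + 2) for j in range(j0))
    assert eta == 2 ** (j0 + 2) - 3
    print(j0, eta)  # -> 2 13, 3 29, 4 61
```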

7.1 Advantages of Shearlet Transform

  • Shearlets are characterized by the isometry of the pseudo-polar transform.

  • The closeness to isometry helps to avoid information dissipation that occurs in decomposition and restoration.

  • The invertibility characteristic of shearlets makes the system translation-invariant.

  • Space-frequency localization is provided to a good extent to avoid decays in spatial and frequency domains.

  • The smoothness provided by the shearlet transform in the spatial and frequency domains helps to retain the intrinsic geometrical information present in the data.

7.2 Literature Survey: Shearlet-Based Fusion

In [20], the authors investigated the utilization of shearlets in the fusion of multi-sensor remote sensing images. They introduced a pan-sharpening method using shearlets and an LHS color model. They employed substitution of PAN coefficients for multispectral coefficients at the approximated level and maximum selection on the directional coefficients. This method provided good results in terms of spatial enhancement and low spectral distortion, but it produced better results when there was less scene heterogeneity. In [58], the authors developed a method for image fusion based on the shearlet transform along with region-based fusion rules. In their proposed method, averaging was performed at the approximated level, and a regional consistency-based rule was applied on the directional coefficients. This method showed greater improvement in spatial enhancement but failed to preserve the spectral information.

In [13], the authors proposed a method for image fusion based on a pulse-coded neural network (PCNN) in the shearlet domain. They employed the PCNN to apply rules based on the decision map generated from the decomposed coefficients. This method produced a better result than the previously discussed shearlet-based methods in terms of retaining structural and radiometric information, but a high premium was paid due to the utilization of PCNN for all of the decomposed coefficients. In [25], the authors investigated a dual-tree compactly supported shearlet transform (DTCST) for pan sharpening. They utilized PCA and DTCST, and employed substitution of PAN coefficients for multispectral coefficients at the approximated level and applied the maximum selection rule for the directional coefficients. Spatial and spectral enhancement were observed in the fused product, but the method suffered from time complexity, which was greater due to the use of a Gaussian Markov random field (GMRF).

In [78], the authors presented a method for pan sharpening in the shearlet domain using a regional division strategy. They employed fusion rules based on regional correlation for all of the decomposed coefficients. Remarkable spatial enhancements were observed due to the application of region-based rules. However, because IHS-based color transformation was used, a brightness issue was seen in the final pan-sharpened image. In [70], the authors developed a method for pan sharpening in the shearlet domain by considering the regional relevance metrics. They employed a local region-based fusion rule for the approximated coefficients, and a gradient-based rule for the directional coefficients. The results showed good retention of structural and radiometric information, with a lower premium paid by this method than by the other methods in the shearlet domain discussed in this section.

8 Discussion of MGA Tool-Based Methods

Wavelets are best suited for the fusion of remote sensing images, but their limitations play an important role in the fusion results. Wavelets are good at isolating the discontinuities at edge points, but fail to retain smoothness along the contours. The limited directional decomposition of wavelets degrades the continuity of contours and linear structures, which leads to the loss of intrinsic geometrical information and introduces artifacts in the fused image when restoring the decomposed coefficients. MGA-based tools aim to solve the two-dimensional discontinuities by offering a flexible multi-resolution and multidirectional decomposition of images.

MGA-based tools provide sparse expansions for images with smooth contours and linear structures. In practice, the multi-scale transform is applied to the image to obtain the different frequency components, and a directional filter is applied to the decomposed components to obtain smooth contours and linear structures of the same scale. Since MGA-based tools have the capacity to retain the intrinsic geometrical information in the image by their representation, their use has been encouraged in image fusion algorithms. In previous sections, we have discussed fusion methods that rely on MGA tools such as contourlets, NSCT, and shearlets.

Many fusion schemes have been introduced in the literature for MGA-based fusion methods, including averaging, maximum selection, minimum selection, weighted averaging, PCA, IHS, edge information, regional energy measurement, threshold, expectation–maximization algorithm, region-based rules, K-means clustering, local energy, sum-modified Laplacian, maximum a posteriori, Markov random field, Gaussian Markov random field, particle swarm optimization, regional divisional strategy, and regional relevance metric-based schemes. It is difficult to provide an absolute conclusion regarding the value of these fusion schemes given the differences in the comparisons and experiments. We have provided some general remarks based on the tools utilized rather than the schemes used in the methodologies, as the schemes developed are application- and data-dependent. However, although evaluating the fusion scheme in absolute terms is very difficult, in previous sections we have provided the advantages and drawbacks of the schemes corresponding to the various methods.

Contourlets provide optimality for the analysis of linear or curved structures in 2D data (images), whereas wavelets provide optimality in the analysis of zero-dimensional or point singularities. The basic idea behind contourlets is to generate multi-resolution and multidirectional expansion using non-separable filter banks. By performing parabolic scaling, the contourlets achieve continuously differentiable curves with directional vanishing moments. In the fusion of multi-sensor imagery in remote sensing, the above-mentioned property helps to retain the structural information during the decomposition of the image into approximated and directional components. The contourlet-based methods provide better results in terms of structural preservation, and the radiometric information retained is dependent on the fusion schema introduced in the literature. On the other hand, the drawback of utilizing the contourlet transform is that it is shift-variant, and this property introduces a pseudo-Gibbs phenomenon in the fused image. Downsampling and upsampling operations at the decomposition and reconstruction levels lead to the introduction of artifacts in the fused results. However, the better retention of structural information than that achieved with wavelets has encouraged the use of contourlets in the fusion of remotely sensed multi-sensor images.

The non-subsampled contourlet transform (NSCT) has a process of decomposition and reconstruction similar to that of the contourlet transform. The difference in NSCT is that it uses a non-subsampled pyramid in place of a Laplacian pyramid and a non-subsampled directional filter bank in place of a directional filter bank to avoid decimation when performing decomposition and reconstruction. By avoiding decimation, the shift invariance property is achieved by NSCT. The NSCT provides better selectivity of frequency components, and regularity is well maintained for two-dimensional signals. Studies have proven that NSCT coefficients are strongly dependent on their neighborhood and cousin coefficients. The above-mentioned properties have provided a platform for the use of NSCT in the field of fusion. The correlation maintained by the coefficients with the neighborhood helps to retain the radiometric information when fusion is performed, and the strong selectivity helps the fusion algorithms retain the structural components. The shift-invariance property helps to address the issue of artifacts caused by contourlets and wavelets, as discussed in the literature. However, even though NSCT provides maximum optimality of signal decomposition, the NSPS and NSDFB construction is highly complex, which leads to an increase in the time and space complexity of the fusion algorithms.

Shearlets have the same advantages as NSCT, and also provide a simplified mathematical structure and added flexibility due to their discretization. They provide multi-resolution analysis similar to that associated with classical wavelets, which is encouraging for developing faster algorithmic implementations. The advantages of lower complexity in time and space, along with the other advantages as discussed for NSCT, encourage the use of shearlets in image fusion algorithms. From the discussion, it is clear that every tool possesses its own advantages and drawbacks; the selection of the tool plays an important role in the fusion result.

9 Conclusion

Fusion of multi-sensor images in remote sensing offers the potential for application in a variety of areas including defense, agriculture, disaster management, and urban development. The fusion arena has developed extensively over the past three decades, yet it continues to evolve in new directions and with wider scope. Challenges also continue to emerge with the increasing availability of data. Research communities are developing advanced techniques, methods, and architectures as a result of the increased interest and challenges in fusion.

In this paper, we have provided a detailed study of the widely used MRA- and MGA-based tools and their implications for the fusion of remote sensing images. We have discussed the efficiency of the tools and of the corresponding fusion schema in retaining the desired information. From these discussions, we have drawn the following conclusions.

  • Tool selection for fusion is a critical task, because each tool has its own potential advantages and drawbacks.

  • The fusion schema plays a vital role in the fusion process depending on the application and the type of data utilized.

  • It is difficult to determine the best methodology among existing methods due to differences and the subjective nature of comparisons and experiments.

Because of improvements in sensors, image fusion in remote sensing has increased the possibilities for research. The development of tools used for image representation enhances the fusion domain and provides greater scope. We hope that the advances in image representation tools and their implications for remote sensing image fusion will expand the work in this field.

Since fusion is an interesting problem in the remote sensing domain, a fusion contest is conducted each year by the IEEE Geoscience and Remote Sensing Society (GRSS). The society distributes data along with ground truth values for algorithm evaluation. The contest encourages researchers to pursue work on multi-sensor remote sensing image fusion.