1 Introduction

Image processing is a technology that has recently seen significant development. Digital image processing comprises the modifications applied to images on a digital computer [1]. Although it focuses primarily on images, image processing is a subfield of signals and systems. Digital image processing is a common procedure that combines general digital signal processing with image-specific techniques [2]. The most familiar example is Adobe Photoshop, currently one of the most popular tools for processing digital photographs.

Digital image processing is the act of processing digital images using a computer, and it covers both fundamental and sophisticated principles [3].

The image fusion technique collects the essential information from multiple images and combines it into fewer images, usually a single one [4]. This single image is more accurate and informative than any individual source image, since it contains all the necessary information. Besides reducing the amount of data, image fusion strives to produce images that are more relevant and understandable for both human and machine perception [5, 6, 7]. In computer vision, multisensor image fusion is the process of combining pertinent information from two or more images into one image; each input image is less informative than the fused result [8, 9]. Image fusion is particularly relevant to remote sensing, where an expanding array of space-based sensors is available, and it can be performed in a variety of ways.
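To make the idea concrete, here is a minimal sketch of two naive pixel-wise fusion rules (simple averaging and magnitude-based selection) for registered, identically sized images. It illustrates the concept only; it is not any of the specific methods surveyed below, and the function names are ours.

```python
# A minimal illustration of pixel-wise image fusion with NumPy.
import numpy as np

def fuse_average(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Fuse two registered, same-sized images by simple averaging."""
    return (img_a.astype(np.float64) + img_b.astype(np.float64)) / 2.0

def fuse_max_abs(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Keep, at each pixel, the value with the larger magnitude
    (a crude 'most salient wins' rule)."""
    return np.where(np.abs(img_a) >= np.abs(img_b), img_a, img_b)
```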

2 Literature Survey

2.1 High Dynamic Range Imaging

A remarkably simple way of considerably boosting the dynamic range of practically any imaging system was proposed by the researchers [11, 12]. The fundamental idea is to sample the spatial and exposure dimensions of image irradiance simultaneously. One of several ways to accomplish this is to place an optical mask next to a conventional image detector array. Because of the mask's pattern of spatially varying transmittance, adjacent pixels on the detector receive different exposures to the scene. An efficient image reconstruction technique then maps the captured image to a high dynamic range image [13]. The end result is an imaging system that can measure a very wide range of scene irradiances and produce a significantly larger number of brightness levels, with only a slight reduction in spatial resolution.
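A toy sketch of the idea follows, under simplifying assumptions: an 8-bit capture, a known per-pixel exposure array, and naive neighbourhood filling in place of the authors' reconstruction technique.

```python
# A toy sketch of spatially varying exposure (SVE) reconstruction.
import numpy as np
from scipy.ndimage import generic_filter

def sve_reconstruct(raw: np.ndarray, exposures: np.ndarray) -> np.ndarray:
    """raw: 8-bit capture whose pixels received per-pixel exposures
    (arrays of identical shape). Returns a relative irradiance estimate."""
    valid = (raw > 5) & (raw < 250)              # discard clipped measurements
    irr = np.where(valid, raw / exposures, 0.0)  # irradiance = value / exposure
    # Fill invalid pixels with the average of valid neighbours in a 3x3 window.
    local_sum = generic_filter(irr, np.sum, size=3)
    local_cnt = generic_filter(valid.astype(float), np.sum, size=3)
    filled = local_sum / np.maximum(local_cnt, 1.0)
    return np.where(valid, irr, filled)
```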

2.2 Multi-Exposure Image Fusion

A. Goshtasby developed a technique for combining multiple exposures of a static scene, taken with a stationary camera, into an image containing the maximum amount of information. The technique divides the image domain into uniform blocks and, within each block, selects the image carrying the most information. The selected images are then blended using monotonically decreasing blending functions that are centred at the blocks and sum to 1 over the entire image domain. A gradient-ascent algorithm computes the block size and blending-function width that maximise the information contained in the fused image. The key problem is therefore to find, at each location, the image that carries the most information.
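A simplified sketch of the block-wise selection idea follows, assuming entropy as the information measure, a fixed block size, and hard selection in place of the optimised smooth blending functions.

```python
# Block-wise selection by entropy (simplified: no smooth blending,
# no block-size optimisation).
import numpy as np

def block_entropy(block: np.ndarray, bins: int = 64) -> float:
    """Shannon entropy of a greyscale block with values in [0, 1]."""
    hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def fuse_by_blocks(images: list[np.ndarray], bs: int = 32) -> np.ndarray:
    """images: registered greyscale exposures in [0, 1], identical shape."""
    h, w = images[0].shape
    out = np.zeros((h, w))
    for y in range(0, h, bs):
        for x in range(0, w, bs):
            blocks = [img[y:y + bs, x:x + bs] for img in images]
            best = max(range(len(images)),
                       key=lambda i: block_entropy(blocks[i]))
            out[y:y + bs, x:x + bs] = blocks[best]
    return out
```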

2.3 Exposure Fusion

A method for combining a bracketed exposure sequence into a high-quality image without first assembling an HDR image was proposed by T. Mertens, J. Kautz, and F. Van Reeth. Skipping the physically-based HDR assembly simplifies the acquisition workflow and saves computation by avoiding calibration of the camera response curve. It also permits the sequence to include flash pictures. The approach blends multiple exposures guided by simple quality measures such as saturation and contrast, and does so in a multi-resolution manner to account for the brightness variation across the sequence. Exposure fusion computes the desired image from the input sequence, guided by a scalar-valued weight map composed of a collection of quality measures. It helps to think of the input sequence as a stack of images.
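A single-scale sketch of the weight-map construction follows, using contrast, saturation, and well-exposedness measures; the published method performs the blending in a multi-resolution (Laplacian pyramid) fashion, which this sketch omits, and the sigma value is an assumption.

```python
# Single-scale exposure fusion via quality-measure weight maps.
import numpy as np
from scipy.ndimage import laplace

def weight_map(img: np.ndarray, sigma: float = 0.2) -> np.ndarray:
    """img: RGB exposure in [0, 1], shape (H, W, 3)."""
    grey = img.mean(axis=2)
    contrast = np.abs(laplace(grey))         # local contrast
    saturation = img.std(axis=2)             # colour saturation
    well_exposed = np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2)).prod(axis=2)
    return contrast * saturation * well_exposed + 1e-12

def fuse_exposures(images: list[np.ndarray]) -> np.ndarray:
    weights = np.stack([weight_map(im) for im in images])  # (N, H, W)
    weights /= weights.sum(axis=0, keepdims=True)          # sum to 1 per pixel
    return sum(w[..., None] * im for w, im in zip(weights, images))
```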

2.4 Random Walks for Multi-Exposure Image Fusion

The multi-exposure image fusion problem was approached from a new angle by Rui Shen, Irene Cheng, Jianbo Shi, and Anup Basu. A probabilistic approach is used to strike a balance between two quality measures: local contrast and colour consistency. Based on these two measures, the probability that each pixel in the fused image originates from each input image is computed pixel by pixel, and these probabilities are then used as fusion weights in the composition step. The local contrast measure indicates how much detail a pixel contributes to the fused image. The colour consistency measure requires consistency both within a local neighbourhood and across a broader region; it rests on the assumption that neighbouring pixels with similar colours in the majority of the input images should also receive similar colours in the fused image.
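As a crude stand-in for this probabilistic weighting, the sketch below derives per-pixel scores from local contrast and smooths them to mimic neighbourhood consistency. The published method propagates the probabilities by solving a random-walk linear system; the Gaussian smoothing here is our simplifying substitute, not the authors' algorithm.

```python
# Per-pixel fusion probabilities from local contrast, with Gaussian
# smoothing standing in for random-walk propagation.
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def fusion_probabilities(images: list[np.ndarray],
                         sigma: float = 5.0) -> np.ndarray:
    """images: registered greyscale exposures; returns weights (N, H, W)."""
    scores = np.stack([np.abs(laplace(im)) + 1e-6 for im in images])
    probs = scores / scores.sum(axis=0, keepdims=True)   # per-pixel probabilities
    probs = np.stack([gaussian_filter(p, sigma) for p in probs])
    return probs / probs.sum(axis=0, keepdims=True)      # renormalise
```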

3 Methods

3.1 Existing Method

The CDL problem can be expressed as

$$\min_{\{d_k\},\{x_{k,t}\}} \frac{1}{2}\sum_{t=1}^{T}\Big\lVert \sum_{k=1}^{K} d_k * x_{k,t} - s_t \Big\rVert_2^2 + \lambda \sum_{t=1}^{T}\sum_{k=1}^{K} \lVert x_{k,t} \rVert_1 \quad \text{s.t.}\ \lVert d_k \rVert_2 \le 1,\ k = 1, \dots, K,$$

where $D = \{d_k\}$ is the dictionary of $K$ filters, $\{s_t\}$ are the training signals, and $\{x_{k,t}\}$ are the associated convolutional SRs. Given $T$ sets of $N$ dependent input signals and their associated concurrent SRs, this is a common CDL problem and can be solved using batch or online CDL methods. Batch CDL requires all training data to be available at once, whereas online CDL observes training samples sequentially over time. Online CDL is also more computationally efficient when there are more filters in the dictionary (here $K$) than training samples (here $TN$). If the input signals are multimodal and the order of modalities is fixed across all $T$ sets of training samples, the CDL problem extends to learning multimodal convolutional dictionaries:

$$\min \sum_{n=1}^{N} \Big( \frac{1}{2}\sum_{t=1}^{T}\Big\lVert \sum_{k=1}^{K} d_{n,k} * x_{n,k,t} - s_{n,t} \Big\rVert_2^2 + \lambda \sum_{t=1}^{T}\sum_{k=1}^{K} \lVert x_{n,k,t} \rVert_1 \Big) \quad \text{s.t.}\ \lVert d_{n,k} \rVert_2 \le 1,$$

which decouples into $N$ distinct CDL problems that can be solved independently. Through the corresponding filters in the multimodal dictionaries, this problem can be understood as learning correlated (coupled) features in multimodal data (Fig. 1).
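For illustration, a convolutional dictionary of this kind can be learned with an off-the-shelf CDL solver. The sketch below uses the SPORCO library's batch ConvBPDNDictLearn solver (the experiments in Sect. 5 use the online variant); the random training data, filter shape, and all hyperparameters are illustrative assumptions, not the values used in this work.

```python
# Learning a convolutional dictionary with SPORCO (pip install sporco).
import numpy as np
from sporco.dictlrn import cbpdndl

S = np.random.rand(256, 256, 10)   # stand-in for T = 10 training images
D0 = np.random.randn(8, 8, 32)     # initial dictionary: K = 32 filters, 8 x 8
opt = cbpdndl.ConvBPDNDictLearn.Options({'Verbose': False,
                                         'MaxMainIter': 100})
learner = cbpdndl.ConvBPDNDictLearn(D0, S, lmbda=0.1, opt=opt)
D = learner.solve()                # learned dictionary, shape (8, 8, 32)
```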

Fig. 1
A flow diagram. Includes a database with visible and NIR images, image acquisition, colour mapping, low-pass filtering, convolutional dictionary learning, the alternating direction method of multipliers, and the multimodal fused image.

Block diagram

4 Proposed Method

NIR images are distinguished by high contrast resolution, which is useful for imaging in low-visibility atmospheric conditions such as fog or haze and for capturing scenes with vegetation. These qualities are exploited to enhance outdoor VL photographs using NIR photographs. In this section we present an NIR-VL image fusion approach based on CSSA and CDL. The CSSA is performed with both $\ell_1$ and $\ell_{2,1}$ regularisations as well as multimodal dictionaries. The stages of the proposed procedure for merging two identically sized NIR and VL images ($s_n$ and $s_v$, respectively) are as follows. Because NIR images are presented in greyscale, they are fused with the intensity components of the VL images. Using the proposed CSSA technique and two pre-learned multimodal NIR-VL dictionaries ($D_n$ and $D_v$), the convolutional SRs $X_n$ and $X_v$ are generated for the greyscale components $s^h_n$ and $s^h_v$, respectively. The convolutional SRs are fused using the max-absolute-value fusion criterion, so that the fused convolutional SRs $F_k$ contain only the most significant representation coefficient at each entry:

$$F_k(i, j) = \begin{cases} X_{v,k}(i, j), & \text{if } |X_{v,k}(i, j)| > |X_{n,k}(i, j)|, \\ X_{n,k}(i, j), & \text{otherwise,} \end{cases} \qquad k = 1, \dots, K,$$

where the points $(i, j)$ are the pixel locations in $s^h_n$ and $s^h_v$, $|\cdot|$ denotes the absolute value, and $K$ is the number of filters in the dictionaries. The fused greyscale component is then reconstructed from the fused SRs to obtain the final fused image (Figs. 2, 3 and 4).
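As a rough illustration, the sketch below applies an elementwise max-absolute-value rule to two coefficient arrays and reconstructs a greyscale component by summing filter convolutions. The array shapes, variable names, and the FFT-based convolution are assumptions for illustration, not this paper's implementation.

```python
# Max-absolute-value fusion of convolutional SRs and reconstruction.
import numpy as np
from scipy.signal import fftconvolve

def fuse_sparse_reps(Xn: np.ndarray, Xv: np.ndarray) -> np.ndarray:
    """Xn, Xv: convolutional SRs of the NIR and VL components, shape (K, H, W).
    Keeps, per coefficient, whichever representation has larger magnitude."""
    return np.where(np.abs(Xv) > np.abs(Xn), Xv, Xn)

def reconstruct(F: np.ndarray, D: np.ndarray) -> np.ndarray:
    """F: fused SRs, shape (K, H, W); D: filters, shape (K, h, w).
    The fused component is the sum over k of d_k convolved with f_k."""
    return sum(fftconvolve(F[k], D[k], mode='same') for k in range(F.shape[0]))
```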

Fig. 2
A photograph of a tree by a winding road on a lawn. A house is at a distance. The objects are not sharp.

Visible image

Fig. 3
A photograph of a tree by a winding road on a lawn with a house at a distance. The objects lack their original colours and appear in two tonal shades.

NIR image

Fig. 4
A photograph of a tree by a winding road on a lawn. A house is at a distance. The objects are well defined.

Fused image

5 Results

First, pairs of NIR-VL images are sparsely approximated using the proposed CSSA algorithms with various sparsity structures. We then apply the proposed methods to multifocus and multimodal image fusion tasks. The convolutional dictionaries used in the experiments contain 32 filters of size 8 × 8 and are learned using the online CDL approach. The training data consist of a multifocus image dataset and an NIR-VL image dataset, each with 10 pairs of images. The NIR-VL and multifocus images are taken from the RGB-NIR Scene dataset and the Lytro dataset, respectively. The fusion results are assessed both visually and with objective evaluation metrics, including the average peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), and the average entropy (EN).
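As a sketch of how such metrics can be computed, the following uses scikit-image for PSNR and SSIM and NumPy for entropy. Treating one input as the reference image is an assumption for illustration; fused-image evaluation in practice often relies on specialised no-reference metrics.

```python
# Objective evaluation metrics: PSNR, SSIM, and entropy (EN).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def entropy(img: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy of a greyscale image with values in [0, 1]."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def evaluate(fused: np.ndarray, reference: np.ndarray) -> dict:
    """fused, reference: greyscale images in [0, 1], identical shape."""
    return {
        'PSNR': peak_signal_noise_ratio(reference, fused, data_range=1.0),
        'SSIM': structural_similarity(reference, fused, data_range=1.0),
        'EN': entropy(fused),
    }
```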

Figs. 5, 6 and 7 show a pair of input images: a visible image and an NIR image. The visible image is more amenable to human perception and contains a wealth of textural information. By capturing considerable thermal radiation data, infrared images can emphasise significant targets such as cars and people even in low light or other extremely hostile conditions. Like visible light, NIR is reflected energy, so an NIR light source is required to form an image: during the day the sun supplies plenty of IR light, but at night an IR source is needed to illuminate the scene. Merging the two input images, a visible image and an NIR image, yields a final image with more information (Figs. 8, 9 and 10).

Fig. 5
A photograph of a circular flight of stairs. The objects are not sharp.

Visible image

Fig. 6
A photograph of a circular flight of stairs. The objects lack their original colours and appear in two tonal shades.

NIR image

Fig. 7
A photograph of the fused image of the visible and NIR images of the stairs. The objects are well defined.

Fused image

Fig. 8
A photograph of 2 sofa sets arranged perpendicularly in a room. The objects are not sharp.

Visible image

Fig. 9
A photograph of 2 sofa sets arranged perpendicularly in a room. The objects lack their original colours and appear in two tonal shades.

NIR image

Fig. 10
A photograph of 2 sofa sets arranged perpendicularly in a room. The objects are well defined.

Fused image

To create the fused convolutional SRs, we fuse the convolutional SRs (which have identical supports) using the elementwise maximum-absolute-value rule. The remaining steps of the two algorithms are identical. The fusion results show that CSSA yields significant gains in contrast resolution and better fusion of multifocus edges (boundaries where one side is in focus and the other side is out of focus). The figure depicts an example of fusion results obtained with the two procedures. The objective evaluation results in Table 2 also show that CSSA improves the overall performance of the CSA-based multifocus image fusion approach.

6 Conclusion

Algorithms for convolutional simultaneous sparse approximation with diverse sparsity structures, based on the alternating direction method of multipliers, have been presented. We tested the efficacy of the proposed approaches by applying them to two distinct types of image fusion problems and comparing the results with those of current image fusion methods. In particular, a novel near-infrared and visible light image fusion approach based on convolutional simultaneous sparse approximation was proposed.