
Stitching two or more images together to create a photo-mosaic that enables the interpretation of the benthos by a scientist (biologist, geologist, archeologist, etc.) requires the use of a blending technique to obtain a seamless mosaic (see Fig. 3.1).

Building a photomosaic requires performing a geometrical registration to align the images involved as well as a photometrical registration to equalize color and luminance appearances [1]. Both kinds of registrations may lead to image inconsistencies in the mosaic. The visibility of such inconsistencies should be minimized in order to provide the mosaic with a homogeneous appearance, which is important from not only the aesthetical but also the cognitive point of view. Geometrical misalignments result in distinguishable object discontinuities and incongruence, while photometrical misalignments make the visibility of seams more evident, reducing the consistency of the global appearance of the mosaic.

Fig. 3.1

Photo-mosaic built from six images of two megapixels. The mosaic shows noticeable seams in (Left), where the images have only been geometrically transformed and sequentially rendered on the final mosaic canvas, the last image on top of the previous one. After applying a blending algorithm, the artifacts (image edges) disappear from the resulting mosaic (Right). Images courtesy of Dan Fornari (Woods-Hole Oceanographic Institution)

Due to the above stated reasons, there are three main concepts guiding image blending algorithms. Firstly, the effects of different illumination or exposure times between images should be minimized. Secondly, an adequate seam should be found in order to reduce the visibility of micro-registration misalignments and moving objects. Lastly, a smooth transition along the selected seam must be applied to reduce the prominence of transitions between images.

The basic principles of image blending were established four decades ago [2] and include two main concepts, which lead to two groups of algorithms [3]: transition smoothing and optimal seam finding. On the one hand, transition smoothing methods (also known as feathering [4] or alpha blending methods [5]) attempt to minimize the visibility of seams by smoothing the common overlapping regions of the combined images. On the other hand, optimal seam finding methods place the seam between images where photometric differences in their joining boundaries are minimal [6, 7]. Image blending methods often combine the benefits of both groups of algorithms (e.g. [2, 8]) in order to produce more plausible results and to further reduce the noticeability of the joining regions. A smooth transition between the fused images is applied, but along an optimally selected seam, a combination which helps to avoid double contours and blurring effects when image registration is not accurate enough. These methods are referred to from now on as hybrid methods.

This chapter provides a review of the most relevant blending techniques in the literature since 1975. The methods listed are divided into three different groups, according to their main principle: transition smoothing methods, optimal seam finding methods and hybrid methods. A classification of the approaches according to several features and properties is also proposed in order to highlight their benefits and drawbacks in different scenarios.

3.1 Transition Smoothing Methods

The main concern of transition smoothing methods is to produce a non-perceptible transition between two images over a given overlapping region (see Fig. 3.2). The information of this common area is fused in such a way that the boundaries of the images involved become invisible. Even though a totally indistinguishable transition may be achieved, the content and coherency of the overlapping region is not guaranteed, as the information is fused without taking into account the content of the scene.

Fig. 3.2

Example of the application of a transition smoothing method on the overlapping area of two images. The images show different exposures and significantly different sizes once registered. As a result of the blending algorithm, the transition between both images is smooth though noticeable

In the mid-1970s, D. Milgram [2] addressed the problem of the seamless combination of two satellite images. The approach was intended to deal with only one pair of horizontally registered images, which is a limiting factor for the application of the method to different and more complex scenarios. This constraint led to a method which searches for the smoothest transition in a row-wise manner. At each row, an arbitrary surrounding range is defined around a given selected seam pixel, and the transition is smoothed in that direction using a weighted average of the luminance values. Consequently, the method achieves a smooth transition in the horizontal direction, but this smoothness cannot be guaranteed in the vertical direction. The weighted average of luminance values (of grayscale images) became the first approach to the transition smoothing problem and a basic principle used by several methods that arose in the following decades.
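The row-wise weighted average can be illustrated with a minimal sketch. It is not Milgram's original implementation: the linear weight ramp, the function names `blend_row` and `blend_rows`, and the `half_width` parameter are assumptions made here for illustration, and the two images are assumed to be registered grayscale arrays of equal size.

```python
def blend_row(left_row, right_row, seam, half_width):
    """Blend one row of two registered grayscale images around a seam column.

    Outside the transition window the output copies left_row (left side) or
    right_row (right side); inside it, the weight of right_row ramps linearly
    from 0 to 1 (the linear ramp is an assumption of this sketch).
    """
    n = len(left_row)
    start = max(seam - half_width, 0)
    stop = min(seam + half_width, n - 1)
    out = []
    for x in range(n):
        if x <= start:
            w = 0.0
        elif x >= stop:
            w = 1.0
        else:
            w = (x - start) / (stop - start)
        out.append((1.0 - w) * left_row[x] + w * right_row[x])
    return out


def blend_rows(left, right, seams, half_width):
    """Apply blend_row independently to every row (one seam column per row)."""
    return [blend_row(l, r, s, half_width)
            for l, r, s in zip(left, right, seams)]


if __name__ == "__main__":
    left = [[100.0] * 8]
    right = [[200.0] * 8]
    print(blend_rows(left, right, seams=[4], half_width=2))
```

Because every row is processed independently, nothing in this sketch couples neighbouring rows, which is why smoothness in the vertical direction is not guaranteed.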

Still in the context of low-scale (order of megapixels) aerial photo-mosaicing, the limitation of using only two overlapping images was first addressed by Peleg [9], who introduced the concept of the Seam-Eliminating Function (SEF). The SEF is a luminance smoothing function (i.e. a weighting map), obtained using a computationally expensive iterative relaxation algorithm, which smooths the transition between an arbitrary number of overlapping images by setting the intensity differences along the seams to zero. However, the overlapping information is not used and the seams are not optimal. The main advantage of the method is that the gradual, smooth change does not affect the detail of the picture near the seams. Nevertheless, lacking an optimal seam finding strategy, the method may produce mosaics with noticeable illumination artifacts when the images suffer from vignetting.

In 1983, Burt and Adelson [10] introduced the concept of image spline to obtain a smooth transition among several images. The approach was multipurpose, extending its fields of application to any imaging scenario, as opposed to that of Milgram [2], which focused on satellite imaging. It was also the first approach to image compositing, i.e., the first method able to seamlessly fuse several images from different and unrelated scenes. The images to be fused are decomposed into a set of band-pass component images, and a separate spline with an appropriate transition width is applied to each band. The goal is to fuse the features from the same scale at each band-pass level. Finally, the splined band-pass components are recombined into the desired mosaic image using a simple addition. The method suppresses the visibility of the seams and reduces the noticeability of the misalignments when registration is imperfect. However, it leads to double contouring and ghosting effects when the misalignment is significant (see Fig. 3.3). In 1996, Hsu and Wu [11] extended the idea of Burt and Adelson [10] by applying the method to wavelet subspaces with the aim of avoiding the undesired oversampling nature of the Laplacian pyramid. However, the results obtained are similar, with a negligible improvement, and come at a higher computational cost.
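The band-pass decomposition and per-band splining can be sketched on one-dimensional signals. This is a simplified illustration rather than the implementation of [10]: the 5-tap kernel, the border clamping, the linear-interpolation expand step and all function names are assumptions of this sketch.

```python
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)  # assumed 5-tap low-pass kernel


def smooth(signal):
    """Low-pass filter with border clamping."""
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def reduce_level(signal):
    """Smooth, then keep every second sample."""
    return smooth(signal)[::2]


def expand_level(signal, size):
    """Upsample to `size` samples by linear interpolation (a simplification)."""
    last = len(signal) - 1
    out = []
    for i in range(size):
        lo = min(i // 2, last)
        hi = min(lo + 1, last)
        frac = 0.5 if i % 2 else 0.0
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


def gaussian_pyramid(signal, levels):
    pyramid = [list(signal)]
    for _ in range(levels):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid


def laplacian_pyramid(signal, levels):
    """Band-pass components plus a final low-pass residual."""
    pyramid = []
    current = list(signal)
    for _ in range(levels):
        smaller = reduce_level(current)
        up = expand_level(smaller, len(current))
        pyramid.append([c - u for c, u in zip(current, up)])
        current = smaller
    pyramid.append(current)
    return pyramid


def collapse(pyramid):
    """Recombine the bands by expanding and adding, coarse to fine."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        up = expand_level(current, len(band))
        current = [b + u for b, u in zip(band, up)]
    return current


def multiband_blend(a, b, mask, levels=3):
    """Blend a (mask = 1) and b (mask = 0), one band at a time."""
    bands_a = laplacian_pyramid(a, levels)
    bands_b = laplacian_pyramid(b, levels)
    masks = gaussian_pyramid(mask, levels)
    blended = [[m * x + (1.0 - m) * y for x, y, m in zip(ba, bb, bm)]
               for ba, bb, bm in zip(bands_a, bands_b, masks)]
    return collapse(blended)
```

Smoothing the mask once per level widens the transition at coarse scales and keeps it narrow at fine scales, which is the per-band transition width described above.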

Fig. 3.3

Sample photo-mosaic region with (a) and without (b) ghosting and double contouring in the transition region due to registration inaccuracies. Seabed structures 1 and 2 are noticeably blurry in (a) while having a sharp appearance in (b). (c) shows two overlapping images of a given photo-mosaic (\(I_1\) and \(I_2\)) represented in the red (\(I_1\)) and green (\(I_2\)) channels. Consequently, perfectly registered regions should appear in yellow, while the regions affected by misalignments present a reddish or greenish appearance. The image without ghosting and double contouring has been obtained using the blending approach proposed in this book. Images courtesy of Dan Fornari (Woods-Hole Oceanographic Institution)

In 2003, Pérez et al. [12] proposed a generic interpolation machinery based on solving Poisson equations for seamless editing and cloning of selected regions. Although the main focus of that framework is image composition, it may also be applied in the underwater photo-mosaicing context when combined with an appropriate optimal seam finding strategy. The approach suppresses the visibility of the seams along the joining regions. Beyond the luminance and wavelet domains, this is the first important approach to image mosaicing in the gradient domain. The method is based on the idea that, by suitably mixing the gradient of a given image with that of another, it becomes possible to convincingly fuse image regions (namely objects) with a transparent appearance. The framework is based on a partial differential equation with Dirichlet boundary conditions, which specifies the Laplacian of an unknown function over the domain of interest along with the values of that function over the boundary of the domain. As an extension of the technique presented by Bertalmio in [13], Pérez et al. proposed modifying the problem of image interpolation through the Poisson equation by introducing further constraints in the form of a guidance field. In the same context, Levin et al. [3] proposed a method based on several cost functions, defined in the gradient domain, for evaluating the quality of the stitching. Levin et al. named the framework developed from this method GIST (Gradient-domain Image STitching). GIST provides two main approaches to image stitching. In the first one, images are combined in the gradient domain, reducing global inconsistencies between the stitched parts due to illumination changes and variations in the camera photometric response. The stitched image is computed by minimizing a cost function that measures the dissimilarity between the derivatives of the stitched image and the derivatives of the input images. In the second one, the mosaic image is inferred by optimization over the image gradients, reducing seam artifacts and edge duplications. In this case, the stitching is performed using feathering, pyramid blending [14] or optimal seam [15]. The main drawback of the methods working exclusively in the gradient domain is the substantial computational resources required to deal with large datasets.
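The Poisson formulation with Dirichlet boundary conditions and a guidance field can be illustrated in one dimension. The sketch below is not the solver used in [12]: it assumes a 1D signal, takes the gradient of the source as guidance field, and solves the resulting tridiagonal system with the Thomas algorithm. The function name `poisson_clone_1d` and its parameters are hypothetical.

```python
def poisson_clone_1d(dest, source, start, stop):
    """Clone source[start:stop] into dest in the gradient domain (1D sketch).

    For every interior sample start <= i < stop the unknown f satisfies
        f[i-1] - 2 f[i] + f[i+1] = s[i-1] - 2 s[i] + s[i+1]
    i.e. the Laplacian of f equals the Laplacian of the source (guidance),
    with Dirichlet values f[start-1] = dest[start-1] and f[stop] = dest[stop].
    """
    assert len(dest) == len(source)
    assert 1 <= start < stop <= len(dest) - 1
    n = stop - start
    # Right-hand side: Laplacian of the source; known boundary terms moved over.
    rhs = [source[i - 1] - 2.0 * source[i] + source[i + 1]
           for i in range(start, stop)]
    rhs[0] -= dest[start - 1]
    rhs[-1] -= dest[stop]
    # Thomas algorithm for the tridiagonal system with stencil (1, -2, 1).
    c = [0.0] * n
    d = [0.0] * n
    c[0] = 1.0 / -2.0
    d[0] = rhs[0] / -2.0
    for i in range(1, n):
        denom = -2.0 - c[i - 1]
        c[i] = 1.0 / denom
        d[i] = (rhs[i] - d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    out = [float(v) for v in dest]
    out[start:stop] = x
    return out


if __name__ == "__main__":
    dest = [10.0] * 6
    source = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0]
    print(poisson_clone_1d(dest, source, 1, 5))
```

In this example the cloned samples keep the shape of the source while their absolute level is shifted to meet the destination at both boundaries, so no step appears at the seam.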

Following the idea of gradient domain image blending, Agarwala et al. proposed a technique in 2004 that combined methods belonging to the two main classes of blending algorithms [8]. Firstly, graph-cut optimization [16, 17] was used to find the optimal place for the seam within the overlapping region. Secondly, gradient-domain fusion [12] was applied to reduce or remove any remaining visible artifacts along the image seams. The method has multiple applications in the image photomontage field and achieves convincingly seamless results. The framework developed was mainly intended to rely on user guidance to select the image regions of interest, and is thus unsuited to the automatic generation of photo-mosaics. In 2007, Agarwala [18] presented a hierarchical approach to improve the efficiency of gradient-domain compositing. The efficiency increase was achieved by observing that the difference between a simple color composite and its associated gradient-domain composite is largely smooth, and that the pattern of this smoothness can be predicted a priori. This difference is solved by adaptively subdividing the domain using a quadtree hierarchical structure [19]. Unfortunately, the efficiency gains of this method only occur if the problem can be transformed into a space where the solution is mostly smooth with a predictable smoothness pattern. Consequently, when the number of overlapping images increases and the overlapping regions become smaller, the performance of the method decreases. In 2011, Szeliski et al. [20] presented a technique for fast Poisson blending and gradient domain compositing which associates a separate low-resolution offset map with each input image. The resulting linear system is much smaller than either the original Poisson system or the quadtree spline approximation of a single offset map. Since each of the offset fields is represented using a low-dimensional spline, the resultant representation is called multi-spline.

Still in the context of gradient domain blending, Su et al. [21] proposed a method based on the minimization of a blending energy function, considering not only gradient values but also luminance. Within this blending energy function, intended to combine low-level image properties, two variation terms are measured and minimized: image value variation and first derivative variation. Image value variation measures the difference between corresponding pixel values of the images to be combined and the photo-mosaic itself. First derivative variation, on the other hand, measures the difference between the blended values of the respective first derivatives and the first derivative of the mosaic. The resultant image is obtained by minimizing the blending energy function. Unfortunately, the computational cost of the method (according to the authors, between six and eight times slower than [10]) makes it unsuitable for large image datasets.
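For illustration only, an energy with this two-term structure could be written as follows. This is a generic form assumed here, not the exact formulation of [21]: \(M\) denotes the mosaic, \(I_k\) the registered input images, \(w_k(p)\) hypothetical per-pixel blending weights summing to one, and \(\lambda\) a hypothetical trade-off parameter.

\[
E(M) = \sum_{p} \sum_{k} w_k(p)\,\bigl(M(p) - I_k(p)\bigr)^2 \;+\; \lambda \sum_{p} \Bigl\lVert \nabla M(p) - \sum_{k} w_k(p)\,\nabla I_k(p) \Bigr\rVert^2
\]

The first sum plays the role of the image value variation and the second that of the first derivative variation. Under these assumptions, minimizing \(E\) over \(M\) is a linear least-squares problem.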

The problem of stitching images in real time for online photo-mosaicing was addressed by Zhao [22] in 2006. The author proposed an efficient image blending method, called flexible blending, for creating good-quality, real-time dynamic image mosaics from an arbitrary number of input images. The technique has three main advantages: (a) good results and possible implementation in embedded systems for real-time performance, (b) comprehensive treatment of geometry, time and user control and (c) capability of handling exposure imbalance among frames. Flexible blending has its basis in the sequential implementation of image blending features. Unfortunately, some drawbacks prevent its application in large scale underwater mosaicing. Firstly, the blending step is based on an improved multi-resolution weighted average [10] which prioritizes pixels close to the image centers, but does not offer good enough results when registration problems appear. Secondly, the exposure correction mechanism takes as a reference the exposure of the photo-mosaic built up to the point when a new image is added. This may lead to a global exposure degeneration when some of the images involved are over- or underexposed. Lastly, the method is intended to deal with small input images, and its behavior when confronted with large input image sequences is unknown.

Few approaches in the literature have specifically dealt with the problem of underwater imagery mosaicing. Gu and Rzhanov [23] proposed, as a blending step, the application of a pure gradient domain fusion of the boundary pixels only, around an optimally found boundary. The method claims to overcome the shortcomings of gradient domain fusion, which produces blurring in the case of misalignment inasmuch as it uses information from all the images involved to build the fused gradient field. The authors do not define a criterion for selecting the contributing image in the case where multiple images overlap the same region. Thus, [23] is limited to panoramic mosaics where only two images overlap over the same area. Color is not treated, so the method is assumed to be intended for grayscale images.

3.2 Optimal Seam Finding Methods

The objective of optimal seam finding methods is to find an optimal placement for a seam line through a given overlapping region between two images (see Fig. 3.4). This seam should minimize the photometric differences on both sides of the line and determine the contribution of the involved images to the final mosaic. Unlike transition smoothing techniques, optimal seam finding approaches consider the content of the scene in the overlapping region, allowing us to deal with problems such as moving objects or parallax. In contrast, no information is fused, and the step between the images can be easily noticeable when illumination conditions or exposure times change from frame to frame.

Fig. 3.4

Example of the application of an optimal seam finding method on the overlapping region between two images. The images show different exposures and significantly different sizes once registered. As a result of the blending algorithm, the transition between both images is still noticeable due to the different exposures and different sizes, which leads to a visible contrast in detail richness

Milgram [2] proposed a non-optimal seam definition strategy that searches for the seam pixel offering the smoothest transition in a row-wise manner, inasmuch as it is intended to deal only with pairs of horizontally registered images. This random positioning of the edge was referred to as “feathering”, and was claimed to help reduce visual cues, but with the disadvantage of introducing discontinuities in the vertical direction. In order to deal with this drawback, a restriction of the candidate seam points, depending on the magnitude of the minimum edge difference, was imposed. This restriction yields a more continuous and consistent seam line. The same author later proposed an improved approach, adding a pixel selection criterion in the illumination compensation step in order to deal with shadows and moving objects and considering only the most informative gray level values [24]. Furthermore, a cost function was included in the seam definition strategy, permitting control of the start and end pixel coordinates of the optimal seam path.
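A row-wise seam search with restricted candidates can be sketched as follows. The restriction used here, a fixed window of `max_shift` columns around the seam of the previous row, is an assumption made for illustration and differs from the magnitude-based criterion described above. The function name `rowwise_seam` is likewise hypothetical.

```python
def rowwise_seam(left, right, max_shift):
    """Choose one seam column per row of two registered grayscale images.

    In each row the seam is placed where |left - right| is smallest, but only
    columns within max_shift of the previous row's seam are considered (an
    assumed restriction that keeps the seam line continuous).
    """
    seams = []
    for row_l, row_r in zip(left, right):
        diffs = [abs(a - b) for a, b in zip(row_l, row_r)]
        if seams:
            lo = max(seams[-1] - max_shift, 0)
            hi = min(seams[-1] + max_shift, len(diffs) - 1)
        else:
            lo, hi = 0, len(diffs) - 1
        seams.append(min(range(lo, hi + 1), key=lambda x: diffs[x]))
    return seams


if __name__ == "__main__":
    left = [[9, 1, 9, 9, 9],
            [9, 9, 2, 9, 0],
            [9, 9, 9, 1, 9]]
    right = [[0] * 5 for _ in range(3)]
    print(rowwise_seam(left, right, max_shift=1))  # continuous seam
    print(rowwise_seam(left, right, max_shift=4))  # effectively unrestricted
```

With a narrow window the second row settles for a slightly worse column next to the previous seam instead of jumping to the far side of the overlap, trading a little photometric cost for vertical continuity.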

The problem of non-static objects in the overlapping regions was addressed by Davis [6] in 1998, who found an optimal seam using Dijkstra’s algorithm [25] through the photometric differences computed between two registered images. The path obtained tends to cut around the moving object, leaving it either totally in or out of the final mosaic image. As a drawback, at least one image must contain a complete view of the moving object so as not to bisect it. Furthermore, some photometric issues that can disturb the seam localization, such as automatic exposure or vignetting, are not taken into account by the method.
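A minimal sketch of a Dijkstra-based seam search over photometric differences is given below. It is not the implementation of [6]: it assumes two registered grayscale images of equal size given as nested lists, a seam running from the top row to the bottom row, 4-connectivity, and the absolute intensity difference as the per-pixel cost. The name `dijkstra_seam` is hypothetical.

```python
import heapq


def dijkstra_seam(img_a, img_b):
    """Return (path, cost) of the cheapest top-to-bottom seam.

    The cost of a pixel is |img_a - img_b| at that pixel, so the path prefers
    regions where both images agree and avoids regions where they differ
    (for instance, where an object has moved).
    """
    rows, cols = len(img_a), len(img_a[0])
    cost = [[abs(img_a[r][c] - img_b[r][c]) for c in range(cols)]
            for r in range(rows)]
    dist = [[float("inf")] * cols for _ in range(rows)]
    prev = {}
    heap = []
    for c in range(cols):  # every top-row pixel is a possible start
        dist[0][c] = cost[0][c]
        heapq.heappush(heap, (dist[0][c], 0, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale heap entry
        if r == rows - 1:  # first bottom-row pixel popped is optimal
            path = [(r, c)]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1], d
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, nr, nc))
    return [], float("inf")


if __name__ == "__main__":
    a = [[5, 0, 5, 5],
         [5, 0, 0, 5],
         [5, 9, 0, 5],
         [5, 5, 0, 5]]
    b = [[0] * 4 for _ in range(4)]
    print(dijkstra_seam(a, b))
```

In the example the seam sidesteps the high-difference pixel in the third row, which mirrors how the path tends to cut around a moving object.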

Focusing mainly on the panoramic imaging context using a rotating camera, Uyttendaele et al. [4] proposed, in 2001, a method to suppress the ghosting effect in mosaic images due to moving objects, along with a procedure to adjust the exposure over multiple images to eliminate visible shifts in brightness and hue. The aim of the method is to deal with the complicated problem of multiple overlapping regions with moving objects. To handle ghosting artifacts, the authors proposed a search for Regions of Difference (RODs) in the overlapping areas in order to use information from only one image per ROD. RODs in different images are defined to be corresponding, i.e. to belong to the same scene object, if they have any overlap at all. The RODs are then used to build a graph in which the minimum weight vertex cover [26] must be computed. However, this method is not entirely robust, and situations can appear where the wrong elimination of an ROD causes holes in the mosaic image. Nevertheless, according to the authors, such conflicting situations are rare in practice. Concerning the exposure artifacts, a block-based exposure adjustment technique was applied. The exposure compensation solution obtains smooth but still noticeable transitions between images in some cases.
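The vertex cover step can be illustrated with a brute-force sketch, which is only practical for a handful of RODs and is not the algorithm used in [4]. Each vertex stands for one ROD and each edge links two overlapping RODs from different images. The RODs in the cover are discarded, so that no two remaining RODs overlap. The weights are hypothetical placeholders for whatever importance measure is assigned to an ROD.

```python
from itertools import combinations


def min_weight_vertex_cover(weights, edges):
    """Exhaustively find the cheapest set of vertices touching every edge.

    weights: dict mapping ROD name -> non-negative weight (hypothetical).
    edges:   iterable of (u, v) pairs of overlapping RODs.
    Runs in exponential time; intended for small illustrative graphs only.
    """
    nodes = sorted(weights)
    best_cover, best_weight = None, float("inf")
    for size in range(len(nodes) + 1):
        for subset in combinations(nodes, size):
            chosen = set(subset)
            if all(u in chosen or v in chosen for u, v in edges):
                total = sum(weights[n] for n in chosen)
                if total < best_weight:
                    best_cover, best_weight = chosen, total
    return best_cover, best_weight


if __name__ == "__main__":
    # ROD "B" overlaps both "A" and "C"; removing it resolves both conflicts.
    weights = {"A": 3, "B": 1, "C": 3}
    edges = [("A", "B"), ("B", "C")]
    print(min_weight_vertex_cover(weights, edges))
```

Removing the cover leaves at most one ROD per conflict, which corresponds to using information from only one image per ROD.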

In the context of image compositing, Agarwala et al. [8] proposed, in 2004, a technique which combined methods belonging to the two main classes of blending algorithms. As a seam finding strategy, which selects the image regions that will contribute to the composite, a graph-cut optimization [16] was used. This graph-cut was guided, depending on user preferences, by several features, such as color, luminance or likelihood, among others. The method has multiple applications in the image photomontage field and achieves convincingly seamless results. The framework developed was mainly intended to rely on user guidance to select the image regions of interest, and is thus unsuitable for the automatic generation of photo-mosaics.

Regarding the reduction of the computational and memory cost of Dijkstra-based optimal seam finding, Gracias et al. [27] proposed a method using watersheds and graph cuts intended to achieve fast execution when building large photo-mosaics. The use of watershed segmentation to find possible cuts over areas with low photometric differences allowed their algorithm to reduce the search to a smaller set of watershed segments, at the cost of sacrificing a certain degree of precision of the computed path, which is conditioned by the initial watershed segmentation. Furthermore, the use of graph cuts over image pairs guarantees a globally optimal solution for each intersection region. While the authors applied the algorithm developed to underwater images, the method can be extended to other contexts.

Eden et al. [28] presented, in 2006, a blending approach that includes a two-step graph cut procedure to deal with both highly different exposures and misregistration problems, and works on a global radiance space for all the images involved. This is one of the first methods applied to the global radiance space domain. Firstly, the positions of the moving objects in the scene are defined (manually or automatically). Secondly, the entire available dynamic range is used to render the photo-mosaic. Therefore, a High Dynamic Range (HDR) image can be obtained from the photo-mosaicing process. Furthermore, two kinds of costs are introduced. Firstly, a data cost is computed to ensure consistency and a high signal-to-noise ratio. Secondly, a seam cost is applied to favor smooth transitions. Nonetheless, such extreme exposure differences are not common in underwater photo-mosaicing. The gradient blending step is performed as in [8].

More recently, Mills and Dudek [29] presented a combination of techniques to create good quality image mosaics despite the presence of moving objects in the scene. The technique uses heuristic measures to determine the optimal seam, in both intensity and gradient domains, combined with a multiresolution splining [10] algorithm to refine the results around the selected seam. Concerning underwater imagery, the strong differences in appearance between images and the sequential nature of the approach may prevent its application. The exposure compensation of newly added images is performed based on the already generated photo-mosaic, which may lead to mosaic degeneration as the number of stitched images grows. Furthermore, the blending method used by the approach may lead to double contouring, especially in the presence of complex seabed structures.

In the underwater context, Gu and Rzhanov [23], similar to [3], proposed a graph-cut technique in order to select the optimal seam between two images, and the application of a pure gradient domain fusion around this boundary. The graph-cut is performed in the gradient domain with the aim of correctly dealing with images showing inhomogeneous illumination but, as opposed to [3], it is performed on the overall image values, which according to the authors makes it more flexible in defining the cut area.

3.3 Hybrid Methods

The third group of methods, which we refer to as hybrid methods, does not in fact comprise any novel blending method, but rather appropriate combinations of transition smoothing and optimal seam finding techniques. This group of approaches typically applies a transition smoothing method around an optimally calculated (or selected by some criterion) seam in order to improve the quality of the joined image regions, reducing the noticeability of the join to an even higher degree. As a result of the combination, problems such as blurring or double contouring presented by transition smoothing methods, and others such as different exposures presented by optimal seam finding methods, can be reduced or even totally avoided. One of the (evident) drawbacks of hybrid methods is their computational cost, inasmuch as at least two different strategies must be applied sequentially.

Fig. 3.5

Example of the application of a hybrid method. A multiresolution spline [10] is applied around a seam determined by the distance from the pixels to the corresponding image centers in order to give more weight to the pixels close to the optical axis. The images show different exposures and significantly different sizes once registered. As a result of the blending algorithm, the transition between both images is smooth although not perfect, and the difference in detail richness between them is still noticeable

In fact, and as mentioned above, one of the pioneers in the image blending field, Milgram [2, 24], had already proposed, in 1975 and later updated in 1977, a hybrid approach based on the selection, in a row-wise manner, of an optimal seam (in terms of photometric differences) and the application around this seam of a weighted average, allowing a noticeable reduction of the image transition. Furthermore, a “zero-order” adjustment to compensate illumination differences between images was also used. This strategy was intended for satellite imaging and limited to grayscale images registered horizontally (regardless of rotation or scale changes). Nevertheless, it dealt with the most relevant concerns of image blending, i.e. the equalization of image appearance over a sequence (a pair of images in that case), the selection of a seam that minimizes photometric differences at the boundary and the application of a smoothing method around the seam to make the transition even less noticeable.

Agarwala et al. [8] proposed, as an optimal seam finding strategy, a graph-cut optimization [16] guided by several parameters, such as color, luminance or likelihood, among others. The transition smoothing in this case is performed in the gradient domain [12, 30]. Using the same labeling obtained after the graph-cut, the color gradients are used to form a composite vector field. The best-fit image in a least-squares sense is thereafter calculated by solving a discretization of the Poisson equations. Each color channel is processed independently, and in order to keep color channel coherency, the color of a given pixel is added to the Poisson equations to constrain the linear system. No overlap information around the boundaries is used, and according to the authors, in the case of high-gradient edges, complications such as objectionable blurring artifacts may appear. In order to solve this problem, the linear constraints corresponding to these problematic pixels are removed. The gradient blending method acts in practice as an exposure compensation mechanism when all the images of the composite belong to the same scene. The approach of Agarwala et al. is intended for image compositing, requiring human intervention when selecting the image regions to be fused, and is consequently not suitable for automatic image mosaicing. Furthermore, performing the blending in the gradient domain regardless of any pixel overlap information, even if the equations corresponding to problematic pixels are dropped from the linear system, cannot guarantee a smooth transition in all scenarios.

Similar to Agarwala et al. [8], Eden et al. [28] combined the benefits of both an optimal seam finding strategy using a two-step graph-cut, and an optional transition smoothing method in the gradient domain. The main novelties of this approach are the use of a global radiance space for all the images involved, and the possibility of obtaining an HDR image as a result. In the first step of the graph-cut, the optimal boundaries are found in the same way as in Agarwala et al. [8] but in the radiance domain over a subset of geometrically and photometrically registered images covering the full field of view. After this step, the position of moving objects is defined, and can be manually changed or automatically selected. In the second step, an image selection strategy is applied, which determines the best radiance values in all the images of a given patch after the graph-cut in order to provide more detail, if possible, to the final composite. A secondary labeling is performed based on two cost functions: one determining the data cost of adding a given image pixel to the composite, and another determining the seam cost over each neighbor of this pixel. The goal of this second step of the image selection is to find the labeling of the final composite that minimizes both data and seam costs. Finally, the composite can be obtained by either directly copying the corresponding radiance values into the final HDR mosaic after the graph-cut labeling, or applying a gradient blending of the original images using the Poisson equations [3, 8, 12]. Additionally, in order to visualize the final HDR image, a tone mapping algorithm is used [30–32].

Gu and Rzhanov [23] proposed, as an optimal approach for underwater image blending, a graph-cut strategy in the gradient domain in order to find the optimal seam placement, and a gradient domain blending as a transition smoothing method. The authors argued that performing a graph-cut in the gradient domain allows dealing with different exposures and inhomogeneous illumination more robustly than in the luminance domain, inasmuch as gradients are not affected by these factors. The gradient domain transition smoothing is performed in a similar way to [8, 12, 30], but applying a weight to a few pixels around the seam in order to reduce the artifacts caused by simple gradient blending, especially in the presence of misalignments. In practice, the weighting leads to the usage of the average value of the gradients of pixels around the chosen seam. Nevertheless, this weighting is not able to fully remove ghosting artifacts around the image boundaries.

In 2009, Mills and Dudek [29] presented a full mosaicing approach to create pleasant and physically consistent image mosaics despite the presence of moving objects. The authors proposed performing a graph-cut along the differences between the luminance of two registered images in order to find an optimal seam. This graph-cut is computed, similarly to Davis [6], using Dijkstra’s algorithm [25]. As a transition smoothing strategy, the multiresolution splining of Burt and Adelson [10] is applied, which, in contrast to some gradient domain methods, uses the common overlapping pixels to smooth the transition. Inasmuch as the graph-cut is performed in the luminance differences domain, it cannot appropriately deal with different exposures or changes in the illumination conditions in the scene. On the other hand, the multiresolution splining strategy may lead to ghosting and double contouring in the case of misregistration, and cannot deal with different image exposures or illuminations.

3.4 Classification

The list of papers that form the state of the art in image blending is large, and the main requirements for conventional image panorama generation have been satisfactorily addressed by several of them. Unfortunately, blending in underwater photo-mosaicing is a specific application that has not been treated in depth in the literature. Consequently, not all the methods are appropriate for this context. In order to highlight the properties, benefits and drawbacks of the current methods, and to evaluate their suitability for underwater mosaicing, a classification is proposed.

Table 3.1 Blending techniques comparison table

There are several criteria that determine the behavior and performance of a given blending algorithm, including its capability of dealing with high resolution underwater photo-mosaics. Table 3.1 provides a comprehensive comparison of the most relevant blending techniques proposed in the literature. The categories of particular importance for underwater applications (mostly working with monochrome images) are exposure correction and elimination of ghosting and double contouring, concerning image quality, and scalability, concerning large scale photo-mosaicing.

3.4.1 Basic Principle

Two main groups of algorithms can be found in the literature in the context of image blending  [3]: transition smoothing methods (also known as feathering [4] or alpha blending methods [5]) and optimal seam finding methods [6, 7]. The benefits of both groups of algorithms are combined into a third group, the hybrid methods [2, 8], in order to produce more plausible results and to reduce to an even higher degree the noticeability of the joining regions. Additionally, those methods avoid double contours and blurring effects when image registration is not accurate enough.
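As a point of reference for the transition smoothing group, the following minimal sketch shows feathering on a single row of an overlap region. It assumes that both rows are already registered and cover exactly the overlap, and it uses a linear weight ramp. The function name `feather_row` is hypothetical and the sketch does not reproduce any specific method from [4] or [5].

```python
def feather_row(row_a, row_b):
    """Blend two registered rows that cover the same overlap region using
    a linear alpha ramp: image A dominates on the left, image B on the right."""
    n = len(row_a)
    if n == 1:
        return [(row_a[0] + row_b[0]) / 2.0]
    out = []
    for i in range(n):
        alpha = i / (n - 1)  # 0.0 on the side of image A, 1.0 on the side of image B
        out.append((1.0 - alpha) * row_a[i] + alpha * row_b[i])
    return out
```

For two flat rows of luminance 100 and 200 the result is a smooth ramp between both values. If the rows contained misregistered structures, the same weighted average would superimpose them, which is the origin of the ghosting and double contouring attributed to this group of methods.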

Each method uses a basic approach (Principle): Transition Smoothing (TS); Optimal Seam Finding (OS); or an appropriate Hybrid combination (OS/TS). The first set of methods (TS) often suffers from Ghosting, which concerns image blurriness of the finest details (i.e. low frequency image components), and Double Contouring, consisting in practice of a partial duplication of certain scene structures (i.e. high frequency image components), if registration is not accurate enough or the scenario considerably violates the planar scene assumption for 2D mosaicing. The second set (OS) is not able to deal with images with different Exposures, as is often the case in underwater imagery due to 3D relief, oblique terrain, variations in vehicle altitude, etc. Finally, Hybrid methods are able to compensate for these drawbacks to a certain degree.

Concerning the main principle of the techniques, the combination of a transition smoothing around an estimated boundary seems to be the most adequate approach and has been the most popular methodology in the literature since 2004, independent of the application context. The tolerance to moving objects is tied to this main principle. Optimal seam finding based methods naturally deal with this problem. In most cases, this tolerance is not actively treated, but is a result of the optimal seam search, which tends to make the cut in areas where photometric differences are small; overlapping areas with moving objects will thus be avoided.

3.4.2 Domain

The Domain in which the process is carried out (Luminance/Radiance, Wavelet or Gradient) has a double effect on the blending process. On the one hand, the image domain strongly influences the properties of the blending that will be performed. As an example, Gradient blending methods are able to unify different Exposures seamlessly and can lead implicitly to a high dynamic range from a set of low dynamic range images. On the other hand, the domain determines the computational cost: Gradient methods require solving large sparse equation systems to recover the Luminance from the gradient vectors, and thereby their computational cost is significant, whereas Luminance based methods typically have lower computational requirements.
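The insensitivity of the gradient domain to exposure offsets can be seen in a one-dimensional sketch. It assumes two registered signals that differ only by a constant offset (for instance, a gain difference expressed in the log domain), takes the gradients of one signal on each side of a seam, and integrates them by cumulative summation. The function name `gradient_blend_1d` and the seam convention are assumptions; in two dimensions the integration step becomes the large sparse system mentioned above.

```python
def gradient_blend_1d(sig_a, sig_b, seam):
    """Stitch two registered 1D signals in the gradient domain.

    Gradients are taken from sig_a up to index `seam` and from sig_b
    afterwards; the result is recovered by integrating them, starting
    from the first sample of sig_a."""
    grads = []
    for i in range(1, len(sig_a)):
        src = sig_a if i <= seam else sig_b
        grads.append(src[i] - src[i - 1])
    out = [sig_a[0]]
    for g in grads:
        out.append(out[-1] + g)  # 1D integration; a Poisson solve in 2D
    return out
```

If `sig_b` is `sig_a` plus a constant, the stitched result equals `sig_a` and no step appears at the seam, whereas copying luminance values directly would leave a visible jump of the size of the offset.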

Luminance and gradient domains are widely used, and the latter has become the preferred choice in the latest publications [29, 51, 54]. This is due to the nature of the gradient domain, which allows easy reduction of the exposure differences between neighboring images. Nevertheless, methods actively applying an exposure correction algorithm obtain more visually pleasing results. Ghosting and double contouring are closely related, and methods that remove the former generally avoid the latter as well.

3.4.3 Scalability

A particularly important property of blending methods is the Scalability, which we define as the ability to deal with more than two overlapping images. This property might be constrained by two main factors. The first one is the nature of the method itself, as in [2, 11, 24], which cannot work with more than two overlapping images. The second one is related to computational requirements: non-optimized Gradient algorithms suffer from poor computational scalability when the input dataset is extremely large, as in the case of Giga-Mosaics.

Leaving aside the first blending methods in the literature [2, 10, 11, 24], throughout the last decade most of the approaches have been scalable up to a certain point. Approaches such as [51] are intended to reduce computational requirements, allowing the efficient processing of high resolution photo-mosaics. Unfortunately, these benefits only appear in the case of mosaics with images showing low overlap. In that case it is possible to avoid storage and computations for image regions that remain unchanged after blending. This situation mainly happens in image panoramas, but not in underwater mosaics, where image registrations are unpredictable and geometrically non-uniform.

3.4.4 Color and Dynamic Range

Color is another critical factor when building visually plausible images. Colors change significantly as a function of the distance between the camera and the seafloor (known as robot altitude) due to the wavelength-dependent spectral absorption of the media. Mosaic blending techniques generally use a Channel Wise approach, where three color channels are processed independently and later reunified into a single color image. These methods have no control over perceptual color attributes. Several approaches in the literature address the color balancing problem in the image photo-mosaicing pipeline, based on exposure compensation in single [4, 37] or multiple channels [38, 59], and based on color transfer techniques [60, 61]. Unfortunately, dealing with extremely large datasets to generate photo-mosaics of large dimensions and keeping the consistency of the global image appearance is a difficult task when using methods available in the literature.

The treatment of color channels is common to all the methods in the literature, with the blending always being performed separately over each channel, independently of the number of channels of the source images. Consequently, a different smooth transition and location of the optimal seam are calculated for each channel. In this sense, Agarwala et al. [8] require user intervention to specify some preferred color values, and [51] adds some constraints to the color variations in order to avoid significant color shifting. These corrections are performed channel-wise and do not treat the deep nature of the real colors. As a consequence, their performance when dealing with images evidencing different appearances due to light attenuation and illumination inhomogeneities is unpredictable.
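The channel-wise treatment can be summarised by the following sketch, in which a single-channel blending function is applied to each channel in turn. The names `blend_channel_wise` and `average` are hypothetical, and plain averaging merely stands in for any single-channel blending method.

```python
def blend_channel_wise(pixels_a, pixels_b, blend_channel):
    """Apply a single-channel blending function to every color channel
    independently and re-interleave the results into color pixels."""
    channels_a = zip(*pixels_a)  # split [(r, g, b), ...] into per-channel tuples
    channels_b = zip(*pixels_b)
    blended = [blend_channel(list(ca), list(cb))
               for ca, cb in zip(channels_a, channels_b)]
    return [tuple(px) for px in zip(*blended)]


def average(channel_a, channel_b):
    """Placeholder single-channel blend: plain averaging."""
    return [(a + b) / 2.0 for a, b in zip(channel_a, channel_b)]
```

Since each channel is processed without knowledge of the others, nothing in this scheme constrains perceptual attributes such as hue, which is the limitation discussed above.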

The Dynamic Range of the image and the quantization of the data provided by the camera sensor strongly influence the accuracy of the final scene representation. Although some of the methods reviewed are able to work with high dynamic range images (with more than the common 8 bits per pixel and channel), they are not reported to do so. In fact, any High Dynamic Range blending method will require a Tone Mapping algorithm in order to display the High Dynamic Range mosaic image on a Low Dynamic Range device, such as conventional screens or printers [28].

Few blending methods claim to work with high dynamic range images. Nevertheless, gradient based blending methods are able to intrinsically deal with this kind of imagery, requiring the application of tone mapping algorithms to the generated mosaic image in order to visualize the results, since a high dynamic range must be reduced before it can be displayed on low dynamic range devices.
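As an illustration of the tone mapping step, the sketch below compresses non-negative high dynamic range values with the simple global curve v / (1 + v) and quantizes the result to display levels. This curve is an assumption chosen for brevity and is not taken from [28] or from any of the blending methods reviewed; the function name `tone_map` is hypothetical.

```python
def tone_map(radiance, levels=255):
    """Compress non-negative HDR values into integer display levels
    in the range 0..levels using the global curve v / (1 + v)."""
    return [[int(round(levels * v / (1.0 + v))) for v in row]
            for row in radiance]
```

The curve is monotonic, so the ordering of intensities is preserved, while very bright values are pushed asymptotically toward the top display level instead of being clipped.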

3.4.5 Multiresolution

The use of a Multiresolution approach was first published in 1983 by Burt and Adelson [10]. Its main advantage is the significant reduction, but not suppression, of the noticeability of Double Contours due to registration inaccuracies. Under this approach, the images are decomposed into a set of band-pass components. For each different band, an appropriately selected width for the transition region T is applied, ensuring a smooth fusion at this spatial frequency band. An important shortcoming is that the method requires keeping several representations of the same image in memory, increasing memory requirements. The price of the seamless appearance is the loss of high frequency details. The multiresolution approach, based on the idea of Burt and Adelson [10], is applied by Su et al. [21] to the wavelet domain, which is the only variation of this idea in the literature.
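The band-wise transition widths can be illustrated with a simplified one-dimensional sketch. It departs from Burt and Adelson [10] in two labeled ways: it uses a stack of progressively smoothed signals without downsampling instead of a pyramid, and a box filter instead of their weighting kernel. The names `box_smooth` and `multiband_blend` are hypothetical.

```python
def box_smooth(signal, radius):
    """Box filter of half-width `radius` with edge clamping."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def multiband_blend(sig_a, sig_b, seam, levels=3):
    """Blend two registered 1D signals band by band: fine bands use a
    narrow transition around `seam`, coarse bands a wider one."""
    n = len(sig_a)
    mask = [1.0 if i < seam else 0.0 for i in range(n)]  # weight of signal A
    low_a = [float(v) for v in sig_a]
    low_b = [float(v) for v in sig_b]
    out = [0.0] * n
    for k in range(levels):
        radius = 2 ** k
        next_a = box_smooth(low_a, radius)
        next_b = box_smooth(low_b, radius)
        for i in range(n):
            band_a = low_a[i] - next_a[i]  # band-pass component of A
            band_b = low_b[i] - next_b[i]  # band-pass component of B
            out[i] += mask[i] * band_a + (1.0 - mask[i]) * band_b
        mask = box_smooth(mask, radius)    # widen the transition for the next band
        low_a, low_b = next_a, next_b
    for i in range(n):                     # remaining low-pass residual
        out[i] += mask[i] * low_a[i] + (1.0 - mask[i]) * low_b[i]
    return out
```

The lists `low_a`, `next_a` and their counterparts for the second signal are the additional image representations that must be held in memory, which corresponds to the memory shortcoming mentioned above.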

3.4.6 Local/Global and Real-Time Operation

With respect to the Locality of the methods, Global methods require knowing all the final mosaic information a priori in order to perform the blending procedure, while Local methods can work on small parts of the final photo-mosaic, joining them together upon completion. Global methods often require higher computational resources than Local ones, while Local methods may not be able to solve some problematic situations, such as loop closing, i.e. visiting a given scene region two or more times, or exposure compensation during a pair-wise sequential processing.

Methods that are able to deal with most of the mosaicing and blending issues in Real Time [22], though uncommon, are optimized towards high performance for large sequences. The results obtained are not as accurate as those from off-line approaches, but acceptable when on-line feedback is required. Real-time techniques are typically based on the Sequential Processing of the input data. Some methods, like Milgram [2] or Hsu and Wu [11], can process the images pair-wise and add the result to a final mosaic canvas. The pair-wise processing is a limiting factor for the scalability of these methods. They are not appropriate for sequences where a given place is visited more than once, as the drift accumulated due to the sequential registration, without a global alignment correction, results in inconsistent overlapping regions. Methods that do not perform a sequential processing are better positioned to deal with problems like exposure compensation and to ensure global appearance consistency.

3.4.7 Relevant Visual Performance Criteria

Different exposures between images are especially common in underwater imaging. Frequently, the AUV or ROV cannot keep a perfectly constant altitude (distance to the seafloor) during the survey, requiring the automatic adjustment of the exposure time between frames. Exposure differences may be corrected actively, by preprocessing the image sequence to be blended, or by means of gradient domain techniques, inasmuch as this domain is not sensitive to the exposure time.
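An active exposure correction can be as simple as the following sketch, which assumes that the exposure difference between two images is a single multiplicative gain and estimates it from the pixels they share. The function name `match_exposure` and the flat pixel lists are assumptions for illustration; practical methods typically estimate gains jointly over all overlapping images.

```python
def match_exposure(overlap_a, overlap_b, image_b):
    """Scale image B so that its mean intensity in the overlap region
    matches that of image A (single global gain assumption)."""
    total_b = sum(overlap_b)
    if total_b == 0:
        raise ValueError("overlap of image B is completely dark")
    gain = sum(overlap_a) / total_b
    return [v * gain for v in image_b]
```

A single gain cannot model the non-uniform illumination and altitude-dependent attenuation typical of underwater images, so this sketch only addresses the exposure time component of the appearance difference.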

Fig. 3.6

Registration of two images acquired at significantly different altitudes. The image acquired at higher altitude shows strong light attenuation and scattering. These effects cause a noticeably different appearance between the two images

As already pointed out above, ghosting and double contouring are mainly due to geometrical registration inaccuracies. When two overlapping images are not properly aligned, non-coincident features are smoothed, and thereby ghosted, when fused, while strong contours appear twice in the blended photo-mosaic. Underwater, the forward scattering phenomenon is responsible for a loss of contrast [62] and, therefore, ghosting appears when merging images with significantly different depths (see Fig. 3.6). Double contouring underwater is sometimes unavoidable due to the limited camera distance to the seabed, which leads to parallax.

Moving objects often appear in underwater imaging, e.g. fish, algae, crustacea and other life forms or floating objects. Most of the Optimal Seam Finding algorithms are able to deal with moving objects, actively or passively, and cut them out of the overlapping regions, keeping a single representation of each object in the final map.

Finally, the parallax robustness determines the ability of a given blending algorithm to deal with a sequence where the 2D assumptions were considerably violated. Underwater scenarios are characterized by frequent seabed depth changes, as well as by changes in the direction of shadows produced by the artificial lighting systems of the AUV or ROV. Optimal Seam Finding techniques are typically the most suitable methods to deal with this problem.

The parallax robustness of a method is strongly related to its tolerance toward moving objects, and methods able to deal with moving objects are often able to handle parallax. In fact, parallax robustness can be considered in practice as the ability of a method to avoid repeated objects or shapes.

3.5 Conclusions

The generation of terrestrial and aerial photo-mosaics from a set of images is a problem widely treated in the literature. The number of approaches confronting this problem is large and the main imaging issues, such as exposure variations, vignetting effects and the presence of moving objects, have been largely solved.

Nevertheless, the underwater medium presents additional problems which tend to make the common approaches fail when applied in this context. The problems of extreme non-uniform illumination, backward and forward scattering and parallax, in addition to significant exposure variations and frequent moving objects, are specific to the medium, and few approaches have been presented in that direction.

Consequently, a different processing pipeline is required to deal with all the problems affecting underwater imagery. This pipeline should also be computationally efficient to allow processing large data sets, whose images might be affected to various degrees by the underwater phenomena presented. Obtaining consistent high-resolution large-scale geo-referenced photomosaics is the goal of the developed pipeline, comparable in terms of visual agreeability to terrestrial and common aerial photo-mosaics.