1 Introduction

In the field of visual arts, color is a crucial element that not only brings life and vibrancy to a work but also serves as a medium for conveying emotions, ideas, and messages. Color transfer refers to the process in which the color palette of one image (termed the “original image”) is adapted to that of another, known as the “reference image”. Such techniques are widely used across different fields, from enhancing images and videos and improving the aesthetics of artifacts and artworks to correcting images for further processing [1]. For example, in film and TV special effects, color transfer can change the color scheme of a scene, enhancing its emotional expression or setting a particular atmosphere. In advertising, these techniques can help highlight brand features, leading to more captivating visuals. In image restoration, color transfer can repair damaged or faded images, rendering them more natural and appealing. Given its multifaceted utility, color transfer is a versatile asset in image processing that can be applied in many fields to solve various image-related problems. Accordingly, various color transfer methods have been proposed to improve the intricacy and accuracy of such effects. Researchers have carried out numerous studies on color transfer [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22], with some based on traditional image processing [2,3,4,5,6,7,8,9,10], while others leverage the power of deep learning [11,12,13,14,15,16,17,18,19,20,21,22].

In 2001, Reinhard et al. [2] proposed a global color transfer method based on the statistical information of color spaces. This method adjusts the pixel values of the original image and offers a simple and effective algorithm. However, when applied to colorful reference images, it can blend color regions, resulting in unnatural outcomes. Despite these shortcomings, the simplicity of the approach has ensured its integration into subsequent studies. Pitie et al. [3] proposed a new idea: matching an N-dimensional distribution to another by estimating a continuous transformation. Essentially, they constructed a color mapping model to fit the color relationship between the original and reference images. While this method adeptly preserves the structure of the original image, post-processing is necessary after color matching to remove noise and visual artifacts. Revisiting the domain in 2011, Reinhard et al. [4] proposed an incremental color transfer method catering to images with different dynamic ranges. To address artifacts in iterative distribution transfer (IDT), Ueda et al. [5] incorporated the results of Reinhard et al.’s method. By adaptively mixing the two methods, they obtained colors more similar to those of the reference image, although issues with pseudo-contours persisted. In 2014, Hwang et al. [6] proposed a color transfer method based on moving least squares, notable for being applicable to a wide range of situations. Whether faced with differences due to camera parameters, shooting times, or illumination conditions, this method proved resilient. However, it requires a large number of feature points as control points, implying that the target and source images must share the same scene, which limits its practical applicability. Grogan et al. [7, 8] adopted a shape alignment-based approach. While it operates locally, its implementation is time-intensive. More recently, Wu et al. [9] proposed a color transfer method using saliency feature mapping. This addressed the problem of color region mixing evident in Reinhard et al.’s method and achieved reasonable results. In 2022, Xu et al. [10] proposed a new IDT-based color transfer approach. Relying on color component projection, they adjusted weights to suppress pseudo-colors and improve color similarity.

As the field of computer vision has advanced, methods based on statistics and machine learning have come into their own. Methods based on probabilistic models [11,12,13] are adept at modeling color distributions, yielding more accurate transfer results. The advent of deep learning has ushered in further possibilities for color transfer. Deep learning methodologies [14] transform the transfer problem into a nonlinear regression problem, seeking to obtain the appearance mapping relationship between original and reference images [15]. Utilizing advanced architectures such as convolutional neural networks (CNNs) [16,17,18,19] and generative adversarial networks (GANs) [20,21,22], researchers have managed to learn more complex color mappings and transfer rules, achieving higher-quality results.

While these strategies have achieved some success in dealing with different image types and application scenarios, many directions remain worth exploring and improving. Color transfer still faces numerous technical challenges. A pressing concern is aligning color distributions across different images to ensure the naturalness and consistency of the transfer. Furthermore, avoiding the inadvertent introduction of artifacts and distortions during color transfer remains paramount.

In this paper, image regions exhibiting large local pixel-value fluctuations are designated as “salient regions”, whereas regions with small fluctuations are termed “non-salient regions”. Predominantly, the color of the salient region of the reference image should be transferred to the salient region of the original image, while the color of the non-salient region of the reference image should be transferred to the non-salient region of the original image; in most cases, the image resulting from such a transfer has good picture quality. Therefore, we divide the original and reference images into salient and non-salient regions and perform color transfer between corresponding regions. Our method first calculates the local variances of both images to establish temporary saliency feature maps. These maps are then refined through minimum filtering, binarization, and iterated expansion and erosion. Subsequently, colors are transferred between the salient regions and between the non-salient regions of the two images. To suppress the generation of false contours, we employ base projection. Finally, the output image is derived by fusing the base-projected image with the result obtained using Reinhard et al.’s method.

2 Color transfer method of Reinhard et al. [2]

This section delves into a seminal technique in the field of color transfer: the method proposed by Reinhard et al. [2].

Introduced in 2001, Reinhard et al.’s method seeks to match the color distributions of the original and reference images. By mapping both images into a common color space, their color distributions can be better aligned to achieve color transfer. The steps are outlined as follows: (1) Convert the original and reference images to the YUV color space. (2) Compute the mean and the standard deviation of each channel for the original and reference images. (3) For each pixel of the original image and each channel, subtract the mean value of the original image, multiply by the ratio of the standard deviation of the reference image to that of the original image, and then add the mean value of the reference image. (4) Transform the processed original image back to the original (RGB) color space to obtain the final result. This can be mathematically represented by Eq. (1):

$$\begin{aligned} I_{i;C}^{(R)} = (I_{i;C}^{(src)} - \mu _C^{(src)})\frac{{\sigma _C^{(ref)}}}{{\sigma _C^{(src)}}} + \mu _C^{(ref)}, \end{aligned}$$
(1)

where \(C \in \{ Y, U,V\}\); \(\mu _C^{(src)}\) and \(\mu _C^{(ref)}\) denote the mean of each channel of the original and reference images, respectively, and \(\sigma _C^{(src)}\) and \(\sigma _C^{(ref)}\) denote the corresponding standard deviations. Reinhard et al.’s approach stands out for its simple and intuitive concept, which is easy to understand and implement. Although it holds a place in the field of color transfer as a classic algorithm, it has certain shortcomings when dealing with complex scenes and preserving details. Therefore, this paper builds on Reinhard et al.’s method, aiming to develop a more efficient and accurate color transfer approach.
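For concreteness, the following is a minimal sketch of this global transfer, assuming 8-bit BGR inputs handled with OpenCV; the function name and the small guard against flat channels are illustrative choices of ours rather than part of the original method.

```python
# Minimal sketch of Reinhard et al.'s global transfer (Eq. (1)), per channel in YUV.
import cv2
import numpy as np

def reinhard_transfer(src_bgr, ref_bgr):
    src = cv2.cvtColor(src_bgr, cv2.COLOR_BGR2YUV).astype(np.float64)
    ref = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2YUV).astype(np.float64)
    out = np.empty_like(src)
    for c in range(3):  # C in {Y, U, V}
        mu_s, mu_r = src[..., c].mean(), ref[..., c].mean()
        sd_s, sd_r = src[..., c].std(), ref[..., c].std()
        sd_s = sd_s if sd_s > 1e-6 else 1e-6  # guard against flat channels (our addition)
        out[..., c] = (src[..., c] - mu_s) * (sd_r / sd_s) + mu_r
    out = np.clip(out, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_YUV2BGR)
```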

3 Proposed method

In this paper, we propose a color transfer method for color images based on saliency features; the flowchart of the proposed method is shown in Fig. 1.

Fig. 1
figure 1

Flowchart of method proposed in this paper: a original image, b reference image, c temporary saliency feature map of original image, d temporary saliency feature map of reference image, e saliency feature map of original image, f saliency feature map of reference image, g image after sub-region transfer, h resulting image of Reinhard et al.’s method, i resulting image of base projection, j final output image

The method proposed in this paper aims at more accurate color transfer while preserving the distinct style of the image. First, we obtain a temporary saliency feature map of each input image by calculating pixel variance in the \(\mathrm{{CIE}}{L^*}{a^*}{b^*}\) color space. Second, an enhanced saliency feature map is derived from the temporary map through operations such as minimum filtering and binarization. Third, according to the obtained saliency feature maps, both the original and reference images are divided into salient and non-salient regions; color transfer is then performed between the corresponding regions, and the results are summed. Fourth, we produce a projection image by projecting the color components of the original image and adjusting weights to process the summed result. Finally, the outputs from Reinhard et al.’s method and the projection are fused. In the experimental section, we apply our method and previous methods to several images, and the efficacy of our method is verified through objective evaluation.

Figure 1 provides an overview of the framework of the proposed algorithm, covering the derivation of the temporary saliency feature maps, the computation of the saliency feature maps, the region division and color transfer processes, the base projection result, and the final fused result.

3.1 Temporary saliency feature map

Humans often effortlessly determine the significance of image regions, naturally focusing their attention toward significant parts [23]. Therefore, identifying the salient image regions is crucial. The \(\mathrm{{CIE}}{L^*}{a^*}{b^*}\) color space covers all the colors that the human eye can perceive, and computing the image variance in this space better captures the differences between colors within a local neighborhood. Therefore, we first convert the original and reference images in Fig. 1 to the \(\mathrm{{CIE}}{L^*}{a^*}{b^*}\) color space. The pixel variance is then computed, as described by Eq. (2), and the variance values are mapped to the range [0, 255] using the tanh function in Eq. (3) to obtain the temporary saliency feature maps for both images.

$$\begin{aligned} {v_i} = \frac{1}{{\left| {{S_{i;\rho }}} \right| }}\sum \limits _{j \in {S_{i;\rho }}} {[{{(L_j^* - \left\langle {L_i^*} \right\rangle )}^2} + {{(a_j^* - \left\langle {a_i^*} \right\rangle )}^2} + {{(b_j^* - \left\langle {b_i^*} \right\rangle )}^2}]}, \end{aligned}$$
(2)
$$\begin{aligned} M_i^{(v)} = 255*\tanh (\frac{{{v_i}}}{\alpha }), \end{aligned}$$
(3)

where \({S_{i;\rho }}\) denotes the set of pixels in a window centered at the ith pixel with chessboard distance \(\rho = 2\), and \(\left| {{S_{i;\rho }}} \right|\) is the number of pixels in the window; the region within distance \(\rho =2\) of the central pixel is the \(5\times 5\) area centered at that pixel. \(\left\langle {L_i^*} \right\rangle\), \(\left\langle {a_i^*} \right\rangle\), and \(\left\langle {b_i^*} \right\rangle\) represent the average values of each channel within the \(5\times 5\) window centered on the ith pixel, and \(L_j^*\), \(a_j^*\), and \(b_j^*\) denote the channel values of the jth pixel in the window. \(\alpha\) is set to 15, and \(M_i^{(v)}\) from Eq. (3) is taken as the value of the ith pixel in the temporary saliency feature map. Figure 3 presents the temporary saliency feature maps, \({M^{(src)}}\) and \({M^{(ref)}}\), computed from the original and reference images in Fig. 2.
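As an illustration, the following is a minimal sketch of Eqs. (2) and (3), assuming 8-bit BGR inputs and OpenCV’s 8-bit \(L^*a^*b^*\) conversion (whose channel scaling only approximates the nominal CIE ranges); the windowed variance is computed with box filters via the identity \(\mathrm{Var}[x]=E[x^2]-E[x]^2\).

```python
# Minimal sketch of the temporary saliency feature map (Eqs. (2)-(3)):
# per-pixel local variance in L*a*b* over a 5x5 window, mapped to [0, 255] with tanh.
import cv2
import numpy as np

def temporary_saliency_map(img_bgr, rho=2, alpha=15.0):
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB).astype(np.float64)
    k = 2 * rho + 1                                     # 5x5 window for rho = 2
    local_mean = cv2.blur(lab, (k, k))                  # <L*>, <a*>, <b*> per pixel
    local_mean_sq = cv2.blur(lab * lab, (k, k))         # local mean of squared values
    var_per_channel = local_mean_sq - local_mean ** 2   # E[x^2] - E[x]^2 per channel
    v = np.maximum(var_per_channel.sum(axis=2), 0.0)    # sum over channels; clip FP noise
    return 255.0 * np.tanh(v / alpha)                   # Eq. (3)
```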

Fig. 2
figure 2

Input image: a original image, b reference image

Fig. 3
figure 3

Temporary saliency feature map: a original image, b reference image

3.2 Saliency feature map

Fig. 4
figure 4

Flowchart for calculating saliency feature map

The temporary saliency feature maps of the original and reference images are processed as depicted in the flowchart in Fig. 4 to derive the saliency feature maps. In our proposed method, the temporary saliency feature maps \({M^{(src)}}\) and \({M^{(ref)}}\) are first subjected to a minimum filter. This step prevents the threshold used in the subsequent binarization from becoming excessively large. The minimum filter is defined as follows:

$$\begin{aligned} M_i' = \textrm{MIN}\{ M_j \mid j \in S_{i;\rho } \}, \end{aligned}$$
(4)

where \(\mathrm{{MIN}}(\cdot )\) returns the minimum value over the set of pixels \({S_{i;\rho }}\) within a window centered at pixel i with chessboard distance \(\rho\) of 2. Substituting \({M^{(src)}}\) and \({M^{(ref)}}\) yields \({M^{'(src)}}\) and \({M^{'(ref)}}\), respectively. The results of this minimum filtering, \({M^{'(src)}}\) and \({M^{'(ref)}}\), are then binarized, dividing each map into significant and non-significant regions.

$$\begin{aligned} \begin{array}{l} M_i^{''} = \textrm{BF}(M_i^{'}),\\ \mathrm{{BF}}(z) = \left\{ {\begin{array}{*{20}{c}} {255,} &{} {z \ge \left\langle {M^{'}} \right\rangle ,}\\ {0,}&{}{\mathrm{{otherwise}},} \end{array}} \right. \end{array} \end{aligned}$$
(5)

where \(\mathrm{{BF}}(\cdot )\) is the binarization function, and \(\left\langle {M^{'}} \right\rangle\) is the average of the minimum-filtered temporary saliency feature map, which serves as the binarization threshold; \({M^{'(src)}}\) and \({M^{'(ref)}}\) are each binarized in this way. The purpose of expansion is to remove non-significant pixels (black pixels within a white region) that appear inside the salient regions of the binarization result due to improper delineation; conversely, erosion removes white pixels from the black regions. In this paper, the window size of both expansion and erosion was set to \(7 \times 7\). The number of iterations was five for expansion and seven for erosion; the erosion was made stronger than the expansion to counterbalance the growth of the black area into the white area caused by the minimum filtering. This ensures a more accurate color transfer within the corresponding regions. The saliency feature maps obtained in this step are \(S{M^{(src)}}\) and \(S{M^{(ref)}}\), as shown in Fig. 5.
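A minimal sketch of this step is given below; it assumes the temporary saliency feature map is a 2-D array and uses SciPy’s morphology routines in place of whatever implementation the authors used, with the paper’s window size and iteration counts as defaults.

```python
# Minimal sketch of Sect. 3.2 (Eqs. (4)-(5)): minimum filtering, binarization at the
# map's own mean, then expansion (dilation) and erosion with a 7x7 window.
import numpy as np
from scipy import ndimage

def saliency_map(temp_map, rho=2, expand_iters=5, erode_iters=7):
    m_min = ndimage.minimum_filter(temp_map, size=2 * rho + 1)   # Eq. (4)
    binary = m_min >= m_min.mean()                               # Eq. (5), threshold <M'>
    struct = np.ones((7, 7), dtype=bool)                         # 7x7 structuring element
    binary = ndimage.binary_dilation(binary, structure=struct, iterations=expand_iters)
    binary = ndimage.binary_erosion(binary, structure=struct, iterations=erode_iters)
    return np.where(binary, 255, 0).astype(np.uint8)             # SM with values in {0, 255}
```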

Fig. 5
figure 5

Saliency feature map: a original image, b reference image

3.3 Region division and color transfer

This section aims to transfer colors between regions that receive the same level of attention from the human eye, thereby avoiding the problem of mixing color regions. Since the color transfer step uses Reinhard et al.’s method, this part operates in the YUV color space. Figure 6 compares the output of Reinhard et al.’s method in the \(\mathrm{{CIE}}{L^*}{a^*}{b^*}\) and YUV color spaces: the first and second rows show the original and reference images, and the third and fourth rows show the outputs obtained in the \(\mathrm{{CIE}}{L^*}{a^*}{b^*}\) and YUV color spaces, respectively.

Fig. 6
figure 6

Comparison of effect images for Reinhard et al. method in YUV and \(\mathrm{{CIE}}{L^*}{a^*}{b^*}\) color spaces

From this comparison, it can be seen that the output image in the YUV color space is more natural and its colors more closely match the color distribution of the reference image. The criterion for region division is the average, \(\left\langle {SM}\right\rangle\), of all pixel values of the saliency feature map.

$$\begin{aligned} \left\{ {\begin{array}{*{20}{c}} {{I_{y;i}} = {I_i},}&{}{S{M_i} \ge \left\langle {SM} \right\rangle ,}\\ {{I_{n;i}} = {I_i},}&{}{S{M_i} < \left\langle {SM} \right\rangle .} \end{array}} \right. \end{aligned}$$
(6)

According to Eq. (6), the original image \({I^{(src)}}\) and the reference image \({I^{(ref)}}\) are each divided into significant and non-significant regions: substituting \({I^{(src)}}\) and \({I^{(ref)}}\) for \({I_i}\) in Eq. (6), the original image is divided into a significant region, \(I_y^{(src)}\), and a non-significant region, \(I_n^{(src)}\), and the reference image is likewise divided into \(I_y^{(ref)}\) and \(I_n^{(ref)}\). Next, color transfer is performed between the corresponding regions using Reinhard et al.’s method, which avoids mixing the statistics of different regions and gives the resulting image a more natural color appearance. The color transfer for the significant regions is performed according to Eq. (7).

$$\begin{aligned} I_{y;i}^{(SM;C)} = (I_{y;i}^{(src;C)} - m_y^{(src;C)})\frac{{\sigma _y^{(ref;C)}}}{{\sigma _y^{(src;C)}}} + m_y^{(ref;C)}, \end{aligned}$$
(7)

where \(C \in \{ Y,U,V\}\); \(\sigma _y^{(src;C)}\) and \(\sigma _y^{(ref;C)}\) denote the standard deviation of each channel in regions \(I_y^{(src)}\) and \(I_y^{(ref)}\), respectively, and \(m_y^{(src;C)}\) and \(m_y^{(ref;C)}\) denote the corresponding means, calculated as shown in Eq. (8).

$$\begin{aligned} \left\{ {\begin{array}{*{20}{c}} {m_y^{(src;C)} = \frac{1}{{\left| {S_y^{(src)}} \right| }}\sum \limits _{i \in S_y^{(src)}} {I_{y;i}^{(src;C)}},}\\ {m_y^{(ref;C)} = \frac{1}{{\left| {S_y^{(ref)}} \right| }}\sum \limits _{i \in S_y^{(ref)}} {I_{y;i}^{(ref;C)}},} \end{array}} \right. \end{aligned}$$
(8)

where \(S_y^{(src)}\) is the set of pixels with \(S{M^{(src)}} = 255\) in the original image, and \(\left| {S_y^{(src)}} \right|\) is the number of pixels in this set; \(S_y^{(ref)}\) is the set of pixels with \(S{M^{(ref)}} = 255\) in the reference image, and \(\left| {S_y^{(ref)}} \right|\) is the number of pixels in that set.

Similarly, the color transfer in the non-salient region operates as follows:

$$\begin{aligned} I_{n;i}^{(SM;C)} = (I_{n;i}^{(src;C)} - m_n^{(src;C)})\frac{{\sigma _n^{(ref;C)}}}{{\sigma _n^{(src;C)}}} + m_n^{(ref;C)}, \end{aligned}$$
(9)

where \(\sigma _n^{(src;C)}\) and \(\sigma _n^{(ref;C)}\) denote the standard deviation of each channel in regions \(I_n^{(src)}\) and \(I_n^{(ref)}\), respectively. \(m_n^{(src;C)}\) and \(m_n^{(ref;C)}\) denote the mean of each channel in regions \(I_n^{(src)}\) and \(I_n^{(ref)}\), respectively. The calculation is further detailed in Eq. (10).

$$\begin{aligned} \left\{ {\begin{array}{*{20}{c}} {m_n^{(src;C)} = \frac{1}{{\left| {S_n^{(src)}} \right| }}\sum \limits _{i \in S_n^{(src)}} {I_{n;i}^{(src;C)}},}\\ {m_n^{(ref;C)} = \frac{1}{{\left| {S_n^{(ref)}} \right| }}\sum \limits _{i \in S_n^{(ref)}} {I_{n;i}^{(ref;C)}},} \end{array}} \right. \end{aligned}$$
(10)

where \(S_n^{(src)}\) is the set of pixels with \(S{M^{(src)}} = 0\) in the original image, and \(\left| {S_n^{(src)}} \right|\) is the number of pixels in this set; \(S_n^{(ref)}\) is the set of pixels with \(S{M^{(ref)}} = 0\) in the reference image, and \(\left| {S_n^{(ref)}} \right|\) is the number of pixels in that set. The results for the salient and non-salient regions are summed according to Eq. (11). Following this, the color space is converted back to RGB to obtain the sub-region transfer result \({I^{(SM)}}\), shown in Fig. 7. It is evident that the summed image exhibits false contours due to color differences between the regions; this problem is dealt with in Sect. 3.4.

$$\begin{aligned} {I^{(SM;C)}} = I_y^{(SM;C)} + I_n^{(SM;C)}. \end{aligned}$$
(11)
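A minimal sketch of the sub-region transfer is given below; it assumes 8-bit BGR inputs, binary saliency maps with values in {0, 255}, and non-empty regions, and the loop structure is our own arrangement of Eqs. (6)–(11).

```python
# Minimal sketch of Sect. 3.3: Reinhard-style statistics matching applied separately
# to the salient (SM = 255) and non-salient (SM = 0) regions in YUV.
import cv2
import numpy as np

def region_transfer(src_bgr, ref_bgr, sm_src, sm_ref):
    src = cv2.cvtColor(src_bgr, cv2.COLOR_BGR2YUV).astype(np.float64)
    ref = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2YUV).astype(np.float64)
    out = np.empty_like(src)
    for salient in (True, False):
        ms = (sm_src == 255) if salient else (sm_src == 0)   # region in the original image
        mr = (sm_ref == 255) if salient else (sm_ref == 0)   # corresponding reference region
        for c in range(3):                                   # C in {Y, U, V}
            mu_s, mu_r = src[ms, c].mean(), ref[mr, c].mean()
            sd_s = max(src[ms, c].std(), 1e-6)               # guard against flat regions
            sd_r = ref[mr, c].std()
            out[ms, c] = (src[ms, c] - mu_s) * (sd_r / sd_s) + mu_r  # Eqs. (7)/(9)
        # writing each region into `out` realizes the summation of Eq. (11)
    out = np.clip(out, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_YUV2BGR)
```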
Fig. 7
figure 7

Image after sub-region transfer

3.4 Base projection

False contours are generated between the salient and non-salient regions of the sub-region transferred image. They arise when pixels of the same or similar color in the original image are mapped to different colors. Base projection maps identical input colors through the same bases and coefficients, so identical colors remain identical after projection. Therefore, in this study, the false contours that appear after sub-region transfer are suppressed using the base projection method. At the same time, base projection ensures that consistent colors in the original image are transferred uniformly. Figure 8 shows the flowchart for the base projection.

Fig. 8
figure 8

Flowchart of base projection

Within the RGB color space, 12 bases \(\mathrm{{Base}} = \left\{ {R,G,B,{R^2},{G^2},{B^2},{R^{0.5}},{G^{0.5}},{B^{0.5}},R*G,R*B,G*B} \right\}\) are constructed from the RGB components of the original image, and all elements of Base are multiplied by 100. The resulting projection image is denoted \({I^{'(SM)}}\) and is obtained from the linear model in Eq. (12).

$$\begin{aligned} I_i^{'(SM)} = \sum \limits _{l \in \mathrm{{Base}}} {k(l) \cdot {x_i}(l)}, \end{aligned}$$
(12)

where \({x_i}(l)\) denotes the lth base of the ith pixel, and \(k(l)\) is the projection coefficient for base l, obtained by solving the objective function in Eq. (13).

$$\begin{aligned} {\tilde{k}} = \arg \mathop {\min }\limits _{k \in \Re ^{12}} \left[ {\sum \limits _{i = 1}^n {{{(I_i^{'(SM)} - I_i^{(SM)})}^2}} } \right] . \end{aligned}$$
(13)

Here, \(k \in \Re ^{12}\) means that k consists of 12 real coefficients, and n is the number of pixels in the image. Figure 9 shows the result obtained by the base projection step.
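A minimal sketch of the base projection is shown below; it assumes the coefficients are fitted separately for each output channel by ordinary least squares (the paper does not spell out the per-channel handling) and that the channels are normalized to [0, 1] beforehand, which is an implementation choice of ours.

```python
# Minimal sketch of Sect. 3.4: build the 12 bases from the RGB components of the
# original image and fit the projection coefficients k by least squares (Eqs. (12)-(13)).
import numpy as np

def base_projection(src_rgb, i_sm_rgb):
    x = src_rgb.astype(np.float64) / 255.0
    r, g, b = x[..., 0].ravel(), x[..., 1].ravel(), x[..., 2].ravel()
    bases = np.stack([r, g, b, r**2, g**2, b**2,
                      np.sqrt(r), np.sqrt(g), np.sqrt(b),
                      r * g, r * b, g * b], axis=1) * 100.0     # 12 bases, scaled by 100
    out = np.empty_like(x)
    for c in range(3):                                          # fit each output channel (assumption)
        target = i_sm_rgb[..., c].astype(np.float64).ravel() / 255.0
        k, *_ = np.linalg.lstsq(bases, target, rcond=None)      # least-squares solution of Eq. (13)
        out[..., c] = (bases @ k).reshape(x.shape[:2])          # projection of Eq. (12)
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)
```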

Fig. 9
figure 9

Base projection results

3.5 Image fusion

Occasionally, the image obtained after base projection exhibits artifactual colors in specific regions. To suppress these pseudo-colors, this study employs the saturation of the original image as a weight for fusing the base projection result with the result of Reinhard et al.’s method. Where the saturation of the original image is low, the color is close to neutral and little color transfer is needed; the projected result is therefore emphasized only in regions of high saturation.

For fusion, the color information from the result \({I^{(R)}}\) of Reinhard et al.’s method and the result \({I^{'(SM)}}\) obtained in Sect. 3.4 are combined, using the saturation information of the original image \({I^{(src)}}\), to obtain the final image \({I^{(out)}}\). The original image is converted to the \(\mathrm{{CIE}}{L^*}{a^*}{b^*}\) color space, and its saturation is computed as the weight in Eq. (14) [24], which controls the weighted combination of the \({I^{(R)}}\) and \({I^{'(SM)}}\) images.

$$\begin{aligned} {w_i} = \tanh (\sqrt{{{(a_i^*)}^2} + {{(b_i^*)}^2}} /\beta ), \end{aligned}$$
(14)
$$\begin{aligned} I_i^{(out)} = [(1 - {w_i})I_i^{(R)} + {w_i}I_i^{'(SM)}]. \end{aligned}$$
(15)

Equation (15) prioritizes Reinhard et al.’s result \({I^{(R)}}\) in low-saturation regions and the projected image \({I^{'(SM)}}\) in high-saturation regions.
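A minimal sketch of the fusion step is given below; it assumes 8-bit BGR inputs and uses OpenCV’s 8-bit \(L^*a^*b^*\) conversion, in which \(a^*\) and \(b^*\) are stored with an offset of 128 that must be removed before computing the weight. \(\beta = 30\) follows the value chosen in the experiments.

```python
# Minimal sketch of Sect. 3.5 (Eqs. (14)-(15)): fuse the Reinhard result and the
# projected result using the original image's chroma as a saturation weight.
import cv2
import numpy as np

def fuse(src_bgr, i_reinhard_bgr, i_proj_bgr, beta=30.0):
    lab = cv2.cvtColor(src_bgr, cv2.COLOR_BGR2LAB).astype(np.float64)
    a, b = lab[..., 1] - 128.0, lab[..., 2] - 128.0        # re-center a*, b*
    w = np.tanh(np.sqrt(a * a + b * b) / beta)             # Eq. (14)
    w = w[..., None]                                       # broadcast over channels
    out = (1.0 - w) * i_reinhard_bgr.astype(np.float64) + w * i_proj_bgr.astype(np.float64)
    return np.clip(out, 0, 255).astype(np.uint8)           # Eq. (15)
```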

4 Experimental results

In this section, we describe the experimental design and provide an in-depth analysis of the evaluation metrics for our proposed method. For the experiments, we selected photographs commonly used in the field of color transfer, encompassing a diverse range of images, from natural landscapes and architecture to artistic paintings. In the parameter-tuning experiments, we examined the parameter \(\alpha\) from Sect. 3.1, the number of iterations of erosion and expansion from Sect. 3.2, and the parameter \(\beta\) from Sect. 3.5.

Fig. 10
figure 10

Temporary saliency feature maps of the output at different values of parameter \(\alpha\): a input image, b \(\alpha\)=5, c \(\alpha\)=10, d \(\alpha\)=15, e \(\alpha\)=20, f \(\alpha\)=25

First, Fig. 10 illustrates the effect of different values of the parameter \(\alpha\) on the temporary saliency feature map. Taking the input image in Fig. 10 as an example, the temporary saliency feature map should mark the mountains and grass as salient regions and the sky as a non-salient region. As Fig. 10 shows, when \(\alpha = 5\), the sky in subfigure (b) contains too many white pixels, whereas when \(\alpha = 25\), the grass in subfigure (f) contains too many black pixels. The same trend is observed for most of the images; as a compromise, we therefore set \(\alpha = 15\), which works for all the experimental images.

Fig. 11
figure 11

Saliency feature maps for different number of iterations: a input image, b 2, c 3, d 4, e 5

Next, Fig. 11 shows the effect of the number of erosion and expansion iterations on the saliency feature maps, with subfigures (b) to (e) showing the maps after 2, 3, 4, and 5 iterations, in that order. The experimental results show that for most images, the features of the input image are extracted and highlighted most effectively when the number of iterations is set to 5. Therefore, considering the generality and robustness of the method, we chose 5 iterations.

Fig. 12
figure 12

Detailed images of output results for different values of parameter \(\beta\)

Finally, we conducted tuning experiments for the parameter \(\beta\) in the fusion phase to investigate the effect of different values on the results. The images generated at different parameter values are shown side by side in Fig. 12. From these experiments, it is evident that the setting of \(\beta\) affects the results for a given image set. In Fig. 12, the first column shows the output image, and the second to sixth columns show the portion of the output image marked by the dashed box for \(\beta\) values of 10, 20, 30, 40, and 50, respectively. As \(\beta\) increases from 10 to 30, clarity improves and pixelation disappears, which is most noticeable in the first and second rows. However, as \(\beta\) increases further from 30 to 50, the entire image progressively darkens, which is most obvious in the third row. Given these observations, we chose \(\beta = 30\) as the value producing the most favorable results.

To ensure a fair comparison, we applied our method to the same set of photographs used by Reinhard et al., Pitie et al., Ueda et al., Wu et al., and Xu et al. The comparative results of these methods on eight distinct image sets are shown in Fig. 13.

Fig. 13
figure 13

Results of the method proposed in this paper alongside those of Reinhard et al., Pitie et al., Ueda et al., Wu et al., and Xu et al. The first row is the original image, the second row is the fusion weight image, and the third row is the reference image

In Fig. 13, the original image (first row) and the reference image (third row) exhibit different color distributions. The second row shows the fusion weight maps, i.e., the weights used in the fusion step of Sect. 3.5 rendered as images; these weight maps indicate the image regions emphasized during fusion. Brighter sections draw on the projected image \({I^{'(SM)}}\), whereas darker areas incorporate the result of Reinhard et al.’s method. Our proposed method produces a color distribution that not only appears more natural but also mirrors the colors of the reference image, while the intricacies and details within the images are preserved.

Fig. 14
figure 14

Diagrams representing individual details of red-boxed portion of Fig. 13: a Reinhard et al. method, b Pitie et al. method, c Ueda et al. method, d Wu et al. method, e Xu et al. method, f our method

Fig. 15
figure 15

Detailed images of the results of this paper’s method compared to Reinhard’s results

We have focused on preserving the color fidelity of the original image, which has historically been susceptible to distortion in previous methods. Taking image group 4 as an example, Fig. 14 shows a zoomed-in view. This close-up reveals that the output images of the methods by Pitie et al., Ueda et al., and Xu et al. undergo noticeable color shifts, leading to an unnatural appearance of the wall segment. While Wu et al.’s method exhibits improved color, a pronounced black area is still evident. This recurring appearance of black pixels is attributed to the parameter criteria chosen by Wu et al. being unsuitable for demarcating regions across diverse image types; their results otherwise resemble those of Reinhard et al. Figure 15 shows a detailed comparison of our results against those of Reinhard et al.: the first column is the output image, and the second and third columns show the dashed-box portions of the outputs of our method and of Reinhard et al.’s method, respectively. The results of our method are similar to those of Reinhard et al.; however, a closer comparison reveals a difference in brightness, with Reinhard et al.’s outcomes visibly dimmer. Our method, on the other hand, produces vibrant, authentic colors and faithfully preserves the colors of the reference image.

Fig. 16
figure 16

Diagrams representing various details of yellow boxed portion of Fig. 13: a Reinhard et al. method, b Pitie et al. method, c Ueda et al. method, d Wu et al. method, e Xu et al. method, f Our method

Another important aspect is the ability to retain details. The second image group highlights the texture and fine details of the original image. Figure 16 displays detailed views from the various methods. The outputs of Pitie et al. and Ueda et al. exhibit severe distortions in the sun region. Furthermore, while the method by Wu et al. produces discernible black areas, Xu et al.’s approach performs somewhat better, though it still yields a slightly blurred sky region. In contrast, our method better preserves the detailed features of the original image, such as texture and sharp edges, making the output image visually clearer. To evaluate the performance of the different methods, we used three objective evaluation metrics: the Kullback–Leibler divergence (KLD) [25], the false color index (FCI) [10], and the structural similarity index measure (SSIM) [26]. The KLD quantifies the difference in color distribution between the reference and output images.

$$\begin{aligned} \mathrm{{KLD}}(P||Q) = \sum \limits _i {P_i}\ln \frac{{P_i}}{{Q_i}}, \end{aligned}$$
(16)

where P and Q are the color histograms of the reference image and the output image, respectively. FCI evaluates the prevalence of pseudo-colors in the transferred output image.

$$\begin{aligned} \begin{array}{l} \mathrm{{FCI}} = \sum \limits _{i = 1}^n {\sum \limits _{j = i + 1}^n {{\Phi _{ij}}} },\\ {\Phi _{ij}} = \left\{ {\begin{array}{*{20}{c}} {1,}&{}{\Delta E_{ij}^{(src)} < \xi \ \mathrm{{and}}\ \Delta E_{ij}^{(out)} > \eta ,}\\ {0,}&{}{\mathrm{{otherwise}},} \end{array}} \right. \end{array} \end{aligned}$$
(17)

where \(\Delta {E_{ij}} = \sqrt{{{({L_i} - {L_j})}^2} + {{({a_i} - {a_j})}^2} + {{({b_i} - {b_j})}^2}}\), and the values of \(\xi\) and \(\eta\) are 5 and 15, respectively. SSIM measures the structural similarity between the transferred output image and the original image.

$$\begin{aligned} \mathrm{{SSIM}}(src,out) = \frac{{(2{\mu ^{(src)}}{\mu ^{(out)}} + {C_1})(2{\sigma ^{(src,out)}} + {C_2})}}{{({{({\mu ^{(src)}})}^2} + {{({\mu ^{(out)}})}^2} + {C_1})({{({\sigma ^{(src)}})}^2} + {{({\sigma ^{(out)}})}^2} + {C_2})}}, \end{aligned}$$
(18)

where \({\mu ^{(src)}}\) and \({\mu ^{(out)}}\) denote the means of the original and output images, \({({\sigma ^{(src)}})^2}\) and \({({\sigma ^{(out)}})^2}\) denote their variances, \({\sigma ^{(src,out)}}\) is the covariance between the two images, and \({C_1}\) and \({C_2}\) are constants that prevent the denominator from being zero.
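As an illustration of the first two metrics, the following sketch computes the KLD from per-channel color histograms and the FCI from a random subsample of pixel pairs; the 64-bin histograms, the per-channel summation, the small epsilon that avoids division by zero, and the pair subsampling (the full pairwise sum is quadratic in the number of pixels) are our own implementation choices rather than part of the metrics’ definitions.

```python
# Illustrative sketches of KLD (Eq. (16)) and a subsampled FCI (Eq. (17)).
import numpy as np

def kld(ref_img, out_img, bins=64, eps=1e-10):
    total = 0.0
    for c in range(3):                                     # per color channel
        p, _ = np.histogram(ref_img[..., c], bins=bins, range=(0, 255))
        q, _ = np.histogram(out_img[..., c], bins=bins, range=(0, 255))
        p = p / p.sum() + eps
        q = q / q.sum() + eps
        total += float(np.sum(p * np.log(p / q)))          # Eq. (16)
    return total

def fci_sampled(lab_src, lab_out, n_pairs=200000, xi=5.0, eta=15.0, seed=0):
    rng = np.random.default_rng(seed)
    src = lab_src.reshape(-1, 3).astype(np.float64)
    out = lab_out.reshape(-1, 3).astype(np.float64)
    i = rng.integers(0, src.shape[0], n_pairs)
    j = rng.integers(0, src.shape[0], n_pairs)
    d_src = np.linalg.norm(src[i] - src[j], axis=1)        # Delta E in the original image
    d_out = np.linalg.norm(out[i] - out[j], axis=1)        # Delta E in the output image
    return int(np.sum((d_src < xi) & (d_out > eta)))       # count of false-color pairs
```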

Table 1 KLD of output images for each method

Table 1 shows the KLD values for the output images produced by the different methods. The data suggest that the KLD values of the proposed method are intermediate: the methods of Pitie et al., Ueda et al., and Xu et al. yield smaller KLD values than ours, while those of Reinhard et al. and Wu et al. are larger. However, it is worth noting that the KLD metric does not fully reflect the color distribution similarity between output and reference images, because it does not take into account factors such as color saturation and luminance; other metrics are therefore needed to assess the quality of the output images more comprehensively.

Table 2 FCI of output images for each method

Table 2 shows the pseudo-color counts calculated from the output images of the different methods, with the lowest value in each row highlighted. Our method consistently registered lower FCI values than the other methods and has the smallest average value. This indicates that the proposed method produces fewer pseudo-colors in its output images, thus enhancing the quality of the results.

To assess image generation quality, we employed the SSIM, which measures structural similarities between the output images of different methods and the original images. Table 3 shows the SSIM values associated with each method for different image sets, with the maximum value in each row duly marked.

Table 3 SSIM of output images for each method

From the table, it is evident that the proposed method achieves higher SSIM scores in seven of the image sets, with improvements of 2%, 15%, 8%, 6%, and 9% over the mean values of Reinhard et al., Pitie et al., Ueda et al., Xu et al., and Wu et al., respectively. This indicates the superiority of our method in preserving image structure and details. Notably, our approach excels at retaining edge information and offers higher fidelity in areas with complex textures; this adherence to the structure of the original image contributes to the elevated SSIM scores.

In conclusion, the experimental results demonstrate the advantages of our method, especially when assessed using the SSIM metric. Additionally, our method is more computationally efficient and easier to employ than deep-learning-based methods. These findings confirm our method’s ability to preserve image structure and detail.

5 Conclusion

This paper proposes a color transfer method based on dividing images into regions using saliency feature maps. Our experimental results show that this approach adeptly maintains the detailed features of the original image. Simultaneously, it imparts more vibrant and authentic colors and displays superior fidelity in complex texture regions.

Looking ahead, we aim to refine the current process, seeking to optimize steps without losing the existing accuracy. We are also invested in enhancing the precision of reference image color transfers through improved color space matching.