1 Introduction

Non-photorealistic rendering (NPR) is a computer graphics technique that mimics human artistic expression. NPR has been studied since the 1990s [7], and video-based NPR was studied mainly in the 2000s. Most research studies focused on the expression of a specific style: painterly [6, 9, 15], pen and ink [13, 20], and watercolor [2]. These are stroke-based rendering (SBR) methods, which use brush strokes as the basic primitive [10]. Various styles can easily be expressed by modeling each brush stroke according to the style. However, each style requires a distinct painting method, which makes it difficult to represent various styles in a single framework. To overcome this limitation, texture transfer methods have been proposed.

Texture transfer is a method that copies the texture of a reference image to a target image [4]. This technique has the advantage that various styles, such as painterly, pen and ink, and watercolor, can be expressed in a single framework, according to the reference image. However, this method expresses the textureness by compositing the pixels of the reference image, so it is not easy to control the effect of each style. In addition, when this technique is extended to video, maintaining temporal coherence is very difficult. Because of these problems, texture transfer has been studied widely for single target images, but only rarely for video.

In this paper, we propose an algorithm that transfers the texture of a reference image to a target video while retaining the directionality of the target video. The algorithm maintains the temporal coherency of the transferred texture and controls the style of the texture transfer. To make the transferred texture move along the motion of the target video, we synthesize the texture using the previous frame's result, with its pixels shifted along the motion. In addition, we control the textureness and the directionality using the saliency of each video frame. When the saliency value is high, the detail of the target frame is retained to a greater extent; when the saliency value is low, strong textureness is expressed. To control the textureness effectively, we employ pyramid dilation based on the saliency.

The main contribution of this paper is a directional texture transfer algorithm for video targets that maintains the temporal coherency of the transferred texture. Additionally, we propose a technique that estimates weights from the saliency to control the effect of the texture transfer.

2 Related works

NPR is divided into painterly rendering, pen and ink, watercolor, etc., according to the style. SBR is a popular NPR technique that expresses artistic style by using the brush stroke as the basic primitive. It can easily control stylization by defining the attributes of the brush stroke and the painting method. In the 2000s, SBR was extended to animation, and research studies were therefore conducted whose objective was to maintain the temporal coherency of stroke-based animation [8, 12, 19, 23, 24]. These studies achieved temporally coherent animations by moving, adding, and modifying strokes according to the motion between frames. However, the attributes of the stroke and the painting method vary according to the style, so different styles cannot be expressed in a single framework. Therefore, techniques that allow different styles to be expressed in a single framework were studied. Texture transfer is one of them.

Texture transfer grew out of research on texture synthesis, whose purpose is to create an arbitrarily large, high-quality texture from a small example texture. Methods of texture synthesis have been studied extensively [3, 5, 17, 18]; however, these methods do not express an artistic effect, but rather imitate the texture using self-similarity. We concentrate primarily on artistic texture transfer algorithms.

The objective of image analogies [11] is to broaden the range of non-photorealistic rendering techniques and produce results by employing coherent local and optimal global texture synthesis. However, this approach is rather time consuming, and it requires users to provide additional unfiltered information about the target image/frame in order to achieve a precise correlation with the source texture. Subsequently, a fast texture transfer algorithm was proposed that straightforwardly extends the coherent local texture synthesis technique [25]. This approach improves on image analogies in terms of imitating artistic painting effects, is much faster than the image analogies technique of [11], and, most importantly, needs only one example image. In the study reported in [22], a patch of the reference image was used as the basic primitive: the authors extracted patches from the reference image and generated textures by synthesizing them. In [16], a texture transfer algorithm with a directional effect was proposed; it transfers the texture of the reference image into the target image while retaining the direction of the target image. Unlike previous SBR methods, these methods have the advantage that various styles can be expressed according to the style of the reference image. However, it is not easy to control the effect of the texture transfer.

In [25], texture transfer for single images was extended for video images. The texture is synthesized by selecting the best pixels from among the candidates in the reference image. The authors additionally considered the motion between target video frames; their method therefore allows the texture to be transferred along the motion of the target video. However, it cannot express the directional effect of the target video. Like previous texture transfer methods, this method has the limitation that it cannot control various effects of the texture transfer. In this paper, we extend this method to express directional effects and control various effects of the texture transfer.

3 Directional texture transfer for video

3.1 Overview of texture transfer for video

In this section, we present an overview of a pixel-wise texture transfer algorithm for target video images. The algorithm is divided into two main steps. In the first step, texture transfer is performed using a reference image and the first frame of the target video. In the second step, a temporary result for the next frame is generated using the motion and the texture transfer result of the previous frame, and a final result frame is generated by our texture transfer algorithm. Table 1 shows an overview of each step.

Table 1 The pseudo code of directional texture transfer for video
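Since the body of Table 1 is not reproduced here, the following minimal Python sketch illustrates the two-step pipeline just described; `texture_transfer`, `estimate_flow`, and `warp` are hypothetical helpers standing in for the components detailed in Sections 3.2 and 3.3.

```python
# Minimal sketch of the two-step pipeline of Table 1. texture_transfer,
# estimate_flow, and warp are hypothetical helpers standing in for the
# components described in Sections 3.2 and 3.3.
def stylize_video(frames, reference):
    results = []
    # Step 1: plain texture transfer on the first frame.
    prev = texture_transfer(frames[0], reference, prev_result=None)
    results.append(prev)
    for t in range(1, len(frames)):
        # Step 2: move the previous result along the motion to obtain a
        # temporary result, then transfer the texture again using it.
        flow = estimate_flow(frames[t - 1], frames[t])
        temp = warp(prev, flow)
        prev = texture_transfer(frames[t], reference, prev_result=temp)
        results.append(prev)
    return results
```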

The quality of the result is dominated by the selection of the best pixel among the candidates; the design of the distance function is therefore most important. We design our distance function by considering the texture pattern similarity, intensity similarity, directionality, and temporal coherency. We present the details of the distance function in Section 3.2. In addition, we control the effect of texture transfer by using an adaptive kernel and directional weight control according to the gradient value; we explain this in detail in Section 3.3.

3.2 Distance function

In this paper, the distance function for choosing the best pixel among the candidates is as follows.

$$ D(r,q) = D_{N}(r,q) + D_{L}(r,q) + D_{F}(r,q) + w_{I}D_{I}(r,q) $$
(1)

The equation consists of four terms. The first term is related to the similarity in intensity: it makes the function select a candidate pixel that is similar to the original pixel in the target image. The second term is related to the texture similarity: it makes the function select a candidate pixel whose texture pattern is similar to the current texture pattern. The third term maintains temporal coherency: it makes the function select a candidate pixel that is similar to the corresponding pixel of the previous frame's result. The last term is related to directionality: it makes the transferred texture follow the direction of the target image. In the following subsections, we describe each term in detail.
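As a concrete illustration, a minimal sketch of how (1) selects the best pixel is given below; `d_intensity`, `d_texture`, `d_temporal`, and `d_direction` are hypothetical stand-ins for the per-term functions defined in the following subsections.

```python
# Minimal sketch of (1): pick the candidate with the smallest combined
# distance. d_intensity, d_texture, d_temporal and d_direction are
# hypothetical stand-ins for D_N, D_L, D_F and D_I (Sections 3.2.1-3.2.4).
def best_candidate(r, candidates, w_I):
    return min(candidates,
               key=lambda q: d_intensity(r, q) + d_texture(r, q)
                             + d_temporal(r, q) + w_I * d_direction(r, q))
```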

3.2.1 Intensity similarity

Similarity of the pixel intensity is one of the criteria for choosing, from among the candidates selected from the reference image, the pixel that will be placed at a given location in the target image. The pixels from the reference image must accurately express the target image; therefore, a pixel must be selected whose intensity is similar to that of the target pixel. The equation for measuring the intensity similarity is as follows.

$$ D_{N}(r,q) = ||N_{r} - N_{q}||^{2} $$
(2)
$$ N_{x} = avg(N(x)) $$
(3)

Here, N(x) denotes the intensities of the pixels in the predefined kernel. The average intensity of the pixels in a specific kernel is used to measure the intensity similarity because it is less sensitive to image noise. In previous research efforts [1, 16, 25], a rectangular kernel was used. However, averaging the intensities of the pixels in a rectangular kernel produces a blurring effect around the edges in the target image, and this blurring effect becomes more pronounced as the kernel size increases.

In order to solve this problem, we employ an anisotropic Kuwahara kernel [14] instead of a rectangular kernel. Anisotropic Kuwahara filtering is used for edge-preserving smoothing, which prevents this blurring effect around the edges. We obtain the flow directions of the target image using the edge tangent flow [13]. Then, using this flow, we calculate the Kuwahara kernel [14]. Figure 1 presents the flow of a target image and its corresponding anisotropic Kuwahara kernel.

Fig. 1 Flow direction of image and anisotropic kernel for preventing blurring
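A minimal numpy sketch of (2) and (3) on grayscale images is given below. For brevity it uses a plain rectangular kernel; substituting the anisotropic Kuwahara kernel [14] amounts to averaging over the anisotropic mask instead. All names and the default kernel size are illustrative.

```python
import numpy as np

def kernel_mean(img, x, y, k=5):
    # average intensity N_x over a (k x k) rectangular kernel around (x, y);
    # the paper uses an anisotropic Kuwahara kernel here instead
    half = k // 2
    patch = img[max(0, y - half):y + half + 1,
                max(0, x - half):x + half + 1]
    return float(patch.mean())

def d_intensity(target, reference, r, q, k=5):
    # D_N(r, q) = ||N_r - N_q||^2, Eq. (2)
    return (kernel_mean(target, *r, k) - kernel_mean(reference, *q, k)) ** 2
```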

3.2.2 Texture similarity

In texture transfer methods, the pixel whose texture is similar to the texture of the pixel in the target image is selected from among the candidates. This process is represented by the following equations:

$$ D_{L}(r,q) = \frac{1}{|L(r)|}||H_{r} - H_{q}||^{2} $$
(4)
$$ H_{r} = [R(x_{i}) - \overline{R(x_{i})}], x_{i} \in L(r) $$
(5)
$$ H_{q} = [S(x_{i}) - \overline{S(x_{i})}], x_{i} \in L(q) $$
(6)

Here, ||·|| denotes the L2 norm, L(x) denotes the L-shaped neighbors, R(·) and S(·) denote the intensities of pixels of the result and reference images, respectively, and |·| denotes the size of a set. Since our algorithm proceeds in scan-line order, the L-shaped neighbors are the neighboring pixels that were synthesized before the current pixel. We calculate the texture similarity by comparing the deviations of the intensities of the L-shaped neighbors. When two textures are similar, this term has a low value, so a pixel from that texture is likely to be selected for synthesizing the texture.
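A sketch of (4)-(6) under the same illustrative conventions: the mean-centered intensities over the causal L-shaped neighborhood are compared. The neighborhood extent and boundary handling are assumptions (boundary checks are omitted for brevity).

```python
import numpy as np

def l_shape(x, y, half=2):
    # causal (already-synthesised) neighbours: the rows above the current
    # pixel plus the left part of the current row
    coords = [(x + dx, y + dy)
              for dy in range(-half, 0) for dx in range(-half, half + 1)]
    coords += [(x + dx, y) for dx in range(-half, 0)]
    return coords

def d_texture(result, reference, r, q, half=2):
    # D_L(r, q), Eqs. (4)-(6); boundary checks omitted for brevity
    h_r = np.array([result[y, x] for x, y in l_shape(*r, half)], float)
    h_q = np.array([reference[y, x] for x, y in l_shape(*q, half)], float)
    h_r -= h_r.mean()
    h_q -= h_q.mean()
    return float(np.sum((h_r - h_q) ** 2)) / len(h_r)
```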

3.2.3 Temporal coherency

If texture transfer designed for a single target image is applied to each frame of the target video independently, flickering occurs in the resulting frames, because the motion between frames is not considered. To solve this problem, we choose the pixel whose intensity is similar to that of the pixel inherited from the previous result frame. We estimate the motion between frames using an optical flow algorithm. However, if we select the best pixel strictly based on motion, the shower door effect is produced near the occlusion/disocclusion boundaries [25]. To avoid this, we weight the motion term by a confidence measure of the optical flow quality [21].

$$ D_{F}(r,q) = C(r)\cdot ||S(q) - S(g(r+F(r)))||^{2} $$
(7)

Here, F represents the backward optical flow, which is computed using the previous and current frames. C(r) is the confidence at r, which is defined as follows:

$$ C(r) = G\left(\nabla\cdot F(r);\sigma^{2}_{f}\right)\cdot G\left(T(r)-T_{-1}(r+F(r));\sigma^{2}_{I}\right), $$
(8)

where ∇·F represents the divergence of the flow field, T is the current frame of the target video, T_{-1} is the previous frame of the target video, and G(·; σ²) represents a zero-mean Gaussian function with variance σ². Please refer to [21] for further details.
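A minimal numpy/OpenCV sketch of the confidence term (8) follows, assuming grayscale frames and a dense backward flow field of shape (H, W, 2) (e.g. from cv2.calcOpticalFlowFarneback with the frame order reversed); the σ values are illustrative assumptions.

```python
import numpy as np
import cv2

def flow_confidence(flow, cur, prev, sigma_f=0.5, sigma_i=10.0):
    # C(r), Eq. (8): penalise divergent flow and large warped-intensity error
    div = (np.gradient(flow[..., 0], axis=1)
           + np.gradient(flow[..., 1], axis=0))     # divergence of F
    h, w = cur.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    warped = cv2.remap(prev, xs + flow[..., 0], ys + flow[..., 1],
                       cv2.INTER_LINEAR)            # T_{-1}(r + F(r))
    diff = cur.astype(np.float64) - warped.astype(np.float64)
    return (np.exp(-div ** 2 / (2 * sigma_f ** 2))
            * np.exp(-diff ** 2 / (2 * sigma_i ** 2)))
```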

3.2.4 Directionality

The directionality of texture exaggerates the shape of objects in an image and creates a more artistic effect. In order to exaggerate the directionality, [16] considered the flow of textures. They defined the I-shape as the neighborhood pixels belonging to the L-shape that lie along the flow direction. Using this I-shaped kernel, they find the candidate that matches the intensity of the pixels along the flow of the texture. Similarly, we choose a pixel whose intensity is similar to that of the pixels in the flow direction using an I-shaped kernel, so that the texture has a directional pattern along the gradient direction; this is represented by

$$ D_{I}(r,q) = ||\overline{R(r_{i})}-S(q)||^{2}, r_{i} \in I(r) $$
(9)

Here, I(x) denotes the pixels in an I-shaped kernel.
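A sketch of (9) under the same conventions; `flow_dir` is assumed to be a per-pixel unit tangent field (e.g. from the edge tangent flow [13]), and the sample length is an illustrative assumption.

```python
import numpy as np

def d_direction(result, reference, flow_dir, r, q, length=4):
    # D_I(r, q), Eq. (9): mean of the already-synthesised pixels along the
    # flow at r (the I-shaped kernel) versus the candidate's intensity S(q)
    x, y = r
    dx, dy = flow_dir[y, x]
    samples = [result[int(round(y - i * dy)), int(round(x - i * dx))]
               for i in range(1, length + 1)]
    return (float(np.mean(samples)) - float(reference[q[1], q[0]])) ** 2
```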

3.3 Adjustment of texture transfer style

3.3.1 Degree of texture

In artistic painting, brushes of various sizes are used. In the background region, strong textures generated by large brush strokes can be observed; in contrast, small brush strokes are used for the details of a significant object or a complicated region. Similarly, we control the degree of texture according to the saliency of the target image so that our algorithm creates a more artistic effect: we transfer a weak texture in salient regions and a strong texture in relatively less salient regions.

We estimate the saliency by using the gradient magnitude of the target image. If we calculate the gradient using a Sobel or Laplacian operator, the magnitude of the gradient varies within only a narrow range near the edges. In order to expand the range such that the gradient values vary to a greater degree, we employ a dilation operation. Although this dilation widens the range, the magnitude of the gradient does not fall off gradually toward the ends of the widened range. To solve this, we employ pyramid dilation, which accumulates the dilation results obtained with kernels of various sizes. To avoid the staircase effect, we smooth the pyramid dilation result using a Gaussian filter (Fig. 2). Figure 3 shows the results of the pyramid dilation proposed in this paper. We use this to obtain the saliency map of the target image.

Fig. 2 Change in gradient magnitude: (a) original gradient magnitude, (b) dilation, (c) pyramid dilation, (d) pyramid dilation + smoothing

Fig. 3 Pyramid dilation by accumulating several dilation results obtained with different sized kernels: (a) dilation with kernel sizes of 1, 4, 8, and 12 and (b) pyramid dilation result (saliency map)
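A minimal OpenCV sketch of the saliency computation described above: gradient magnitude, pyramid dilation, then Gaussian smoothing. The kernel sizes follow Fig. 3, while the normalization, the Sobel gradient, and the Gaussian width are illustrative assumptions.

```python
import numpy as np
import cv2

def saliency_map(gray, sizes=(1, 4, 8, 12)):
    # gradient magnitude of the target image (Sobel)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    mag /= mag.max() + 1e-8                      # normalise to [0, 1]
    acc = np.zeros_like(mag)
    for s in sizes:                              # accumulate dilations
        acc += cv2.dilate(mag, np.ones((s, s), np.uint8))
    acc /= len(sizes)
    return cv2.GaussianBlur(acc, (9, 9), 0)      # avoid the staircase effect
```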

In our texture transfer algorithm, the degree of texture depends on the size of the kernel. When a larger kernel is used, a stronger texture is observed, and vice versa. Figure 4 shows a comparison of different degrees of texture according to kernel size.

Fig. 4 Comparison of degrees of texture according to kernel size

We adjust the kernel size of the texture transfer algorithm using the saliency map so that we can control the degree of texture. The following equation shows the calculation of the kernel size according to the saliency.

$$ size = K_{min}+G(r)\cdot(K_{max} - K_{min}) $$
(10)

Here, G(r) denotes the value of the saliency map at r, and K_{min} and K_{max} denote the minimum and maximum sizes of the kernel, respectively.

3.3.2 Weight of directionality

In order to control the degree of directionality, the weight of the directionality term in (1) is used. If the weight is increased, the texture pattern is aligned along the flow direction, but it tends to differ from the pattern in the reference image. Conversely, if the weight is decreased, the texture pattern reflects the reference image well, but it does not follow the flow of the target image. In [25], the weight is determined by the user.

In this paper, we propose a method of assigning the weight automatically according to the saliency obtained above. In a salient region, we express the directionality by using a high weight; conversely, in a less salient region, we preserve the texture pattern of the reference image by using a low weight.

$$ w_{I} = w_{max}\cdot G(r) $$
(11)

Here, w_{max} is the maximum weight, defined by the user; in this paper, we use a value of 0.6.
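Equations (10) and (11) map the saliency map directly to per-pixel style parameters; a minimal sketch follows, in which the K_{min} and K_{max} defaults are illustrative assumptions while w_{max} = 0.6 follows the paper.

```python
import numpy as np

def style_params(G, k_min=3, k_max=13, w_max=0.6):
    # Eq. (10): per-pixel kernel size; Eq. (11): directional weight w_I
    size = np.rint(k_min + G * (k_max - k_min)).astype(int)
    w_i = w_max * G
    return size, w_i
```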

4 Experimental results

We conducted experiments using different example source images and examined the corresponding results. We compared our results with those of previous studies and found that our method considers and expresses the object shape well, whereas previous methods do not. In addition, we can control various stroke attributes using our pixel-based texture transfer algorithm. Processing takes about 9 s, depending on the image size and parameters, for a VGA image with a neighborhood size of 5–13 pixels. We used a Core 2 Duo 2.33 GHz CPU PC with 4 GB memory to obtain the results in this paper.

Figure 5 shows the different results obtained according to the reference image. We use painterly, pen and ink, and watercolor style images as the reference images. By transferring the texture of the reference image, we convert the style of the target image to that of the reference image.

Fig. 5 Results according to the reference image: (a) target image, (b–d) reference images, (e–g) result images

Figure 6 shows a comparison of the results when a fixed-size or adaptive kernel was applied. Figure 6a and b show the results when fixed-size kernels of size 5 (Fig. 6a) and 13 (Fig. 6b) were applied. When the larger kernel is used, the textureness is exaggerated, but the detail of the target image is lost. In contrast, when the smaller kernel is used, the detail of the target image is maintained well, but the textureness is degraded. In Fig. 6c, the result when an adaptive kernel was applied, the textureness is exaggerated in regions where the gradient value is low, and the detail is maintained in regions where the gradient value is high. Therefore, both the detail and the textureness are expressed well.

Fig. 6 The results with a fixed-size kernel (a and b) and an adaptive kernel (c)

Figure 7 shows the effect of directional weight control. In Fig. 7b, which is the result when a constant directional weight was applied, the directionality is expressed well. However, the patterns of the texture are confused in some regions (red circles in the figure). In Fig. 7c, which is the result when directional weight control was used, directionality is expressed well in high gradient regions, and textureness is maintained well in low gradient regions.

Fig. 7 The effect of directional weight control: (a) reference image; (b) the result without directional weight control; (c) the result with directional weight control

Figure 8 shows a comparison of the results of a previous study [25] and our study. We generated our results using the same reference image and target video as in [25]. As seen in the figure, in contrast to the results of [25], the directionality is well expressed in our results. Moreover, as in [25], the temporal coherency is well maintained.

Fig. 8 Comparison with previous work

Figure 9 shows the result videos obtained by composing with various reference images. Please see the experimental results in the accompanying videos. Our algorithm can generate various results according to the reference image, in a single framework.

Fig. 9 Result videos composed with various reference images

5 Conclusions and future work

In this paper, we proposed a directional texture transfer method for video targets. First, we improved the distance function by employing a directional term and a temporal coherency term. Using this function, we generated stylized videos with a directional effect based on the image gradient of the target video frames, while the temporal coherency of the transferred texture is well maintained. Next, we controlled the effect of the texture transfer: we controlled the textureness by using an adaptive kernel, and the directionality by using directional weight control. In a single framework, we were able to generate videos in various styles according to the reference image.

Our algorithm has some limitations. We employed optical flow [21] to estimate the motion between video frames; however, the motion estimation is not always accurate, so estimation errors can accumulate and generate incorrect texture patterns. It is therefore necessary to enhance the motion estimation to obtain better results. In addition, our algorithm is pixel-based, so it is easy to parallelize; we are therefore planning future studies on parallel processing using the GPU.