1 Introduction

In general, 3D video systems provide viewers with a sense of immersion and realism. Multi-view camera systems are widely used to produce 3D video content, and there has been extensive research on camera geometry and parameter alignment techniques.

With recent technical developments, the demand for 3D video systems has increased dramatically. 3D video technology is widely used in various fields, e.g., films, games, medical imaging, and TV broadcasting systems. It allows viewers to enjoy realistic stereoscopic images from a number of viewpoints and is expected to shape the next generation of the video content business; this prospect is supported by a considerable body of research. Since 2002, the European ATTEST (Advanced Three-dimensional Television System Technologies) project has conducted extensive research on 3D processing techniques [1]. Similarly, 3D image processing and holography techniques have been broadly studied [2]. Furthermore, Japan has developed free viewpoint TV systems based on ray-space techniques [3].

In addition, hybrid multi-view video systems are widely used to generate 3D video content. In general, a hybrid multi-view camera system consists of multiple color cameras and depth cameras, e.g., Kinect and ToF (Time-of-Flight) cameras [4]. A clock generator is used to synchronize the color and depth cameras, so that all cameras capture the scene at the same time.

Figure 1 describes the overall procedure for generating 3D video content with a hybrid multi-view camera system. First, the color and depth cameras are calibrated to extract the camera parameters. Then, color correction is performed to reduce color inconsistency among the color cameras. Subsequently, multi-view image rectification is conducted to compensate for geometric misalignment between the cameras [5].

Figure 1. Hybrid multi-view camera system.

Captured depth images require up-sampling and 3D warping because their resolution differs from that of the color cameras and because of the camera position differences, respectively [6]. The warped depth information then reduces the disparity search range for stereo matching. Although such a constrained disparity range improves the accuracy of stereo matching, synthesized images still contain noise and blurring artifacts that degrade the viewing quality [7].
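
As a rough illustration of how warped depth can constrain the disparity search, the following Python sketch converts a depth value into a predicted disparity under the standard rectified pinhole relation d = f·B/Z and restricts the search to a small window around it. The function name, the non-positive hole convention, and the margin of 3 pixels are illustrative assumptions, not part of the systems cited above.

def disparity_search_range(depth, focal_length, baseline, margin=3):
    # Minimal sketch: assumes rectified cameras (d = f * B / Z) and that
    # hole pixels carry a non-positive depth value.
    if depth <= 0:                      # hole pixel: fall back to a full search
        return None
    predicted = focal_length * baseline / depth
    return max(0, int(predicted - margin)), int(predicted + margin)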

Generally, stereo and multi-view camera systems are used to generate 3D video content. Multi-view video content provides realism and immersion without any assistive devices. Because multi-view content offers many viewpoint images, viewers can select their preferred viewpoint.

Recently, research on multi-viewpoint and free-viewpoint imaging techniques has been carried out by MPEG, which also develops 3D video coding techniques for broadcasting systems. Since multi-view video content involves large amounts of data, it causes a bandwidth overload problem. To cope with limited bandwidth and heavy data loads, effective coding technologies have been developed [8]. Virtual view synthesis is also used to address these issues.

MPEG developed the VSRS (View Synthesis Reference Software) to create generalized virtual viewpoint images [9]. Virtual viewpoint images are generated using depth information. Because the depth information does not exactly match the real-world coordinates, the synthesized image contains many hole regions, as shown in Fig. 2. In this paper, we propose an effective hole filling and noise removal method to enhance the synthesized image quality.

Figure 2. Hole appearance on left and right synthesized images.

2 3D Video System

Generally, a 2D video system does not require depth information when generating video content. However, a 3D video system uses depth information to provide realism and immersion in the 3D content. Depth images can be acquired by depth cameras such as ToF and Kinect cameras, or estimated via stereo matching from color images captured by multi-view or stereoscopic camera systems.

Two types of input data, color and depth images, are usually used to generate 3D video content. Camera geometric errors and parameter values affect the generated image, so image rectification and color correction have to be performed before generating the content.

The generated images are displayed differently depending on the type of screen device [10]. If the display device supports 3D video, viewers can experience realism and immersion. However, increasing the number of cameras in a multi-view system causes spatial and data overload problems. Virtual viewpoint synthesis is generally used to solve such problems. Virtual viewpoint images are generated in three steps: 3D warping, boundary noise removal, and hole filling.

3 Virtual View Synthesis

In VSRS, a stereo color image and its depth image are used as input data to generate virtual viewpoint images. Figure 3 shows the flow chart of the view synthesis procedure, which can be divided into two stages.

Figure 3. Virtual view synthesis algorithm.

In the warping stage, the depth image is 3D-warped to transform the image coordinates into 3D space. Because the depth camera coordinates and the virtual world coordinates are not identical, the warped depth image contains hole regions. VSRS applies a median filter to the warped depth image to remove these holes.
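
A minimal sketch of this filtering step is shown below; it assumes the warped depth map is a NumPy array in which hole pixels carry the value 0, and it replaces only those pixels with the local median. This is an illustration under those assumptions, not the VSRS implementation.

import numpy as np
from scipy.ndimage import median_filter

def fill_depth_holes(warped_depth, hole_value=0, ksize=3):
    # Median-filter the warped depth map and copy the result back
    # only at hole positions, leaving valid depth samples untouched.
    filtered = median_filter(warped_depth, size=ksize)
    filled = warped_depth.copy()
    hole_mask = (warped_depth == hole_value)
    filled[hole_mask] = filtered[hole_mask]
    return filled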

When the color image is 3D-warped (texture mapping), the filtered depth image is used as guide information. Due to the hole regions in the depth image, the warped color image also contains holes. These hole areas have a negative effect on view synthesis and therefore have to be removed completely. For the remaining hole regions, VSRS uses an inpainting method. The viewpoint mixing step then selects the relevant viewpoint image to create an accurate synthesized image.

3.1 Boundary Noise Handling

Conventional hole filling methods did not consider the boundary noise that occurs near object borders. This boundary noise affects the quality of synthesized images; as indicated in Fig. 4, noise artifacts appear in the synthesized images.

Figure 4. Noise artifacts in synthesized images.

Various kinds of boundary noise are generated during the color image 3D warping. Since boundary noise influences the hole filling result, it has to be eliminated from the warped color image before hole filling is performed.

In this paper, we propose a boundary noise removal method, as represented in Fig. 5. Since boundary noise occurs near the hole area, we compare the boundary noise region with its neighboring area. By converting the boundary noise area into a hole region, we can efficiently remove the noise. Figure 6 shows the proposed boundary noise removal method.

Figure 5. Hole expansion for boundary noise removal.

Figure 6. Boundary noise handling method.

To find the boundary noise region in the warped color image, we compute the mean value of the pixels neighboring the boundary noise. Since boundary noise pixels have values that differ from their neighbors, the proposed boundary handling method correctly extracts the boundary noise area.

$$ Avg_p = \sum_{i=0}^{3}\frac{C_K+T_i}{4}, \quad \left(T=-5,-4,\ldots,4,5\right), \quad \left(p=1,2,\ldots,8\right) $$
(1)
$$ Abd_i = \left|Avg\left(P_i\right)-Avg\left(P_{i+1}\right)\right|, \quad \left(i=1,2,\ldots,7\right) $$
(2)

In Eq. (1), the denominator represents the number of comparison pixels and C_K denotes the target boundary noise pixel. Computing the average value of every candidate is an essential step for converting the boundary noise area into a hole region. After all candidate averages are calculated, the absolute difference between each pair of neighboring averages is computed using Eq. (2). If the absolute difference is larger than a pre-determined threshold, C_K is changed into a hole pixel.
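
The following Python sketch illustrates Eqs. (1) and (2) on image rows: for each non-hole pixel adjacent to a hole, it forms candidate averages over groups of four comparison pixels drawn from the offset range -5…5 and marks the pixel as a hole when successive averages differ by more than a threshold. The grouping of offsets and the threshold value are assumptions made for illustration, not the exact settings of the proposed method.

import numpy as np

def expand_boundary_noise(channel, hole_mask, threshold=30.0):
    # channel: 2D array of one color channel; hole_mask: boolean array of holes.
    h, w = channel.shape
    offsets = list(range(-5, 6))                 # T = -5, -4, ..., 4, 5
    expanded = hole_mask.copy()
    for y in range(h):
        for x in range(1, w - 1):
            if hole_mask[y, x] or not (hole_mask[y, x - 1] or hole_mask[y, x + 1]):
                continue                         # only non-hole pixels touching a hole
            # Eq. (1): candidate averages Avg_p over groups of four comparison pixels
            avgs = []
            for p in range(len(offsets) - 3):    # p = 1 ... 8 in the paper
                xs = [min(max(x + offsets[p + i], 0), w - 1) for i in range(4)]
                avgs.append(np.mean([channel[y, xi] for xi in xs]))
            # Eq. (2): absolute differences between neighbouring averages
            diffs = [abs(avgs[i] - avgs[i + 1]) for i in range(len(avgs) - 1)]
            if max(diffs) > threshold:
                expanded[y, x] = True            # convert the noisy pixel into a hole
    return expanded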

As a result of the proposed boundary noise handling method, the original hole area is expanded as represented in Fig. 7. The boundary noise contour image and the original hole information are used as input data to obtain the expanded hole image. Using the proposed boundary handling method, we obtain a boundary-noise-free image before performing the hole filling.

Figure 7. Input and output data for boundary noise handling.

3.2 Conventional Hole Filling Methods

After the depth image 3D warping, VSRS uses a median filter to remove the remaining hole areas, as represented in Fig. 3. Based on the warped depth information, the color image is then 3D-warped. Since the warped depth information does not match the real-world coordinates exactly, the warped color images contain hole regions, as represented in Fig. 8.

Figure 8. Left and right image 3D warping results.

If the median filter is simply applied to the warped color image, the synthesized image exhibits noise and blurring artifacts due to the hole regions. Consequently, the hole regions have to be completely removed before view synthesis.

Many hole filling methods have been developed to remove the hole regions in warped color images [11]. Most effective hole filling methods use depth information and the pixels neighboring the hole. In conventional hole filling methods, the hole area H is filled with a constant color value C, determined by averaging the pixel values along the hole boundary. This relation is expressed in Eq. (3), where ∂H denotes the set of hole boundary pixels and I[m] the image value at pixel m.

$$ C=\frac{\sum_{\forall m\in \partial H} I\left[m\right]}{\sum_{\forall m\in \partial H} 1} $$
(3)

The conventional method [11] does not consider the hole filling direction or other conditions of the hole neighbor pixels, so the quality of its hole filling result is not significantly better than that of VSRS. Figure 9 illustrates the conventional hole filling procedure.

Figure 9. Hole filling procedure.

In the conventional hole filling method, the values of the hole neighbor pixels are used to fill the hole. As shown in Fig. 9, the hole area is gradually removed using neighbor pixel values from the outside towards the inside. Because this method does not fill the hole region effectively, the result image still contains blurring and noise artifacts.
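
A simplified sketch of this outside-to-inside averaging, in the spirit of Eq. (3), is given below. It assumes a single-channel image stored as a NumPy array and a boolean hole mask; it is an illustration under those assumptions, not the reference implementation of [11].

import numpy as np

def fill_holes_by_averaging(image, hole_mask):
    # Repeatedly replace each boundary hole pixel with the mean of its
    # known 8-neighbours, shrinking the hole from the outside inwards.
    img = image.astype(np.float64).copy()
    holes = hole_mask.copy()
    h, w = holes.shape
    while holes.any():
        progressed = False
        for y, x in zip(*np.nonzero(holes)):
            vals = []
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not holes[ny, nx]:
                        vals.append(img[ny, nx])
            if vals:                             # this hole pixel touches known pixels
                img[y, x] = np.mean(vals)
                holes[y, x] = False
                progressed = True
        if not progressed:                       # safety stop for isolated holes
            break
    return img.astype(image.dtype)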

An advanced hole filling method was proposed that uses background texture pixel values to eliminate noise artifacts [12]. Hole regions usually occur in the background, so they can be removed effectively using background pixel values. To locate the hole region, the hole contour line is extracted from the warped color image, as represented in Fig. 10.

Figure 10. Extracted hole contour lines from the warped image.

Based on the extracted hole region, the background region is determined using the depth information. As shown in Fig. 11, the background and foreground regions are distinguished by comparing their depth values.

Figure 11. Comparison of depth values.

For example, if the left depth value is 101 and the right depth value is 155, the left region is determined to be the background. This method can easily be incorporated into a standard hole filling procedure such as VSRS, and it improves the quality of the synthesized image compared to other conventional hole filling methods.
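
A minimal sketch of this decision is given below. It assumes the common MPEG depth-map convention in which larger values denote closer objects, so the side with the smaller depth value is labelled as background; the function name is hypothetical.

def background_side(depth_left, depth_right):
    # Larger depth value = closer to the camera (foreground),
    # so the smaller value marks the background side of the hole.
    return "left" if depth_left < depth_right else "right"

# Example from the text: depth 101 (left) vs. 155 (right) -> "left" is background.
assert background_side(101, 155) == "left"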

Although this conventional method considers background pixels for hole filling, the resulting image quality is not significantly better than that of the original hole filling method. Because the warped depth information does not correspond perfectly to the real-world coordinates, the background-based method still has limitations in terms of effective hole filling.

3.3 Proposed Hole Filling Method

Previous hole filling methods produce unsatisfactory results. To improve the quality of the hole-filled image, we propose a new hole filling method that considers the directionality of the hole neighbor pixels. Hole regions usually appear near object boundaries; if a hole is filled with background pixel values, the result is more plausible. Since holes do not occur inside objects, background labeling is crucial in the hole filling procedure.

The proposed hole filling method considers the directionality of the hole neighbor pixels. Since hole regions emerge near object areas, we fill each hole with an appropriate background pixel value. As shown in Fig. 8, the left and right warping results contain different hole regions depending on the viewpoint: holes occur on the side of the warped image opposite to the input viewpoint. Furthermore, occlusion regions also generate holes in the warped color images.

Based on the depth information, we can easily determine the background area adjacent to each hole region. In addition, we classify the hole appearance types as represented in Fig. 12. Depending on the location of the background region, the hole filling candidates on each side are applied adaptively. In general, the 7-candidate method can select a more suitable hole filling pixel value than the 5-candidate method.

Figure 12. Candidates after left and right image warping.

The hole filling order is determined by the location of the background area. As demonstrated in Fig. 13, the hole region has to be filled from the background region towards the object region.

Figure 13. Hole filling with background consideration.

Even though the proposed method considers the directionality of the background pixels, empty (hole) pixels could still be selected as hole filling candidates. We propose an adaptive hole scanning method to solve this problem.

The hole scanning direction is chosen differently for the left and right image warping results, as represented in Fig. 14. With this adaptive hole scanning method, empty pixels are never used as candidate pixels.

Figure 14. Adaptive hole scanning method. (a) Background on the left side of the hole. (b) Background on the right side of the hole.

Figure 14a shows the situation where the background is located on the left side of the hole, while in the case of Fig. 14b the hole is scanned from right to left. Since the right-side information is insufficient when warping the left image, the right side of the warped image contains hole regions; likewise, the right image warping result has hole regions on its left side.

Because the adaptive hole scanning method processes holes in different directions, it produces an improved result when combined with the multi-directional hole filling candidate method. With the proposed hole scanning method, only non-empty pixels are used as hole filling candidates.
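
The sketch below illustrates one possible form of this adaptive scanning: for the warped left image the hole pixels in each row are visited from right to left, and for the warped right image from left to right, so that filling proceeds from the background side of the hole. The generator interface and the warped_from flag are illustrative assumptions.

def hole_scan_order(hole_mask, warped_from):
    # hole_mask: 2D boolean NumPy array; warped_from: "left" or "right".
    h, w = hole_mask.shape
    cols = range(w - 1, -1, -1) if warped_from == "left" else range(w)
    for y in range(h):
        for x in cols:
            if hole_mask[y, x]:
                yield y, x          # hole pixels in the adaptive scanning order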

To determine the most suitable value among the hole filling candidates, we compare the hole pixel with the candidate pixels. However, hole pixels contain no value, so another pixel must be used in place of the empty pixel. Since the pixels neighboring a hole have values similar to the missing one, we use the pixel just above the hole as a substitutive pixel. Using this substitutive pixel value, we can accurately select the hole filling pixel value; we therefore compare the substitutive pixel with the candidate pixels, as represented in Fig. 15.

Figure 15. Comparison of hole and substitutive pixels. (a) Hole filling candidates for left image warping. (b) Hole filling candidates for right image warping.

When performing hole filling for the warped left image, as shown in Fig. 15a, we scan holes from right to left; in the case of Fig. 15b, holes are scanned from left to right. Among the hole filling candidates, we select the proper pixel value by comparing the substitutive pixel with each candidate. Equation (4) defines the cost for a hole filling pixel value.

$$ Cost=\frac{\sum_{x,y\in Neighbor}\left|NP\left(x,y\right)-CP\left(x,y\right)\right|}{NeighborNumber} $$
(4)
$$ Direction=\arg \min \left\{Cost\right\} $$
(5)

In Eq. (4), NeighborNumber denotes the number of hole filling candidate pixels, so the denominator changes depending on whether the 5- or 7-candidate method is used. NP represents a hole filling candidate pixel and CP the substitutive hole pixel value. The cost is computed iteratively until the last hole is removed.

The average of the differences between the candidate and substitutive pixels is taken as the final cost value. The candidate direction with the smallest cost is used for the hole filling pixel value, as indicated in Eq. (5). Consequently, the final hole filling pixel is the candidate with the smallest cost.
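
The following sketch illustrates the candidate selection of Eqs. (4) and (5): for every candidate direction, the mean absolute difference between the candidate pixels NP and the substitutive pixels CP is taken as the cost, and the direction with the smallest cost supplies the hole filling value. The dictionary-based interface and variable names are illustrative assumptions rather than the exact data layout of the proposed method.

def best_fill_direction(candidates, substitutes):
    # candidates:  direction label -> list of candidate pixel values NP
    # substitutes: direction label -> matching substitutive pixel values CP
    costs = {}
    for direction, np_vals in candidates.items():
        cp_vals = substitutes[direction]
        diffs = [abs(float(a) - float(b)) for a, b in zip(np_vals, cp_vals)]
        costs[direction] = sum(diffs) / len(diffs)        # Eq. (4)
    return min(costs, key=costs.get)                      # Eq. (5): arg min of the cost

For the 5-candidate method each dictionary would hold five entries per hole pixel, and seven for the 7-candidate method.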

Using the proposed hole filling method, we can improve the synthesized image quality. In particular, the adaptive hole scanning method and the multi-directional hole filling method together find a proper hole filling pixel value.

4 Experimental Results

To verify the performance of the multi-directional hole filling method, we conducted tests with six MPEG sequences: Kendo, Bee, Pantomime, Champagne, Balloons, and Soccer (http://www.jujii.nuee.nagoya-u.ac.jp/multiview-data). Conventional hole filling methods for view synthesis [11, 12] and the proposed methods are compared in terms of synthesized image quality.

The multi-directional hole filling method shows better objective results than the conventional hole filling methods. The experimental results are shown in Fig. 16, and Fig. 17 compares the proposed and conventional methods, particularly around object boundary areas.

Figure 16. View synthesis results using different hole filling methods.

Figure 17. Enlarged test results for comparison. (a) Hole area. (b) Original. (c) Vazquez et al. (d) Cheon et al. (e) 5-candidate. (f) 7-candidate.

Since the performance of view synthesis is usually measured by the peak signal-to-noise ratio (PSNR), we present PSNR comparison results in Table 1. VSRS version 3.5, the conventional methods, and the proposed methods are used for the performance comparison.

Table 1 Objective evaluation of hole filling methods in terms of PSNR.

The proposed 7-candidate method performs better than the other tools in most cases. However, in the Bee sequence test, the 5-candidate method is more effective than the 7-candidate method. Because the Bee sequence contains more similar texture areas near the hole regions than the other test sequences, similar pixel values are used when computing the cost.

Table 2 lists the average PSNR results of all hole filling methods. As shown in Table 2, the proposed 7-candidate method achieves a higher average PSNR than the other hole filling methods. The conventional methods [11, 12] use background pixel values and depth information for hole filling, so they fill hole regions more effectively than the VSRS method.

Table 2 Average PSNR results.

5 Conclusion

In this paper, we proposed an efficient hole filling method for virtual view synthesis. Boundary noise pixels near the hole region are first converted into hole pixels, expanding the hole region. Based on the expanded hole region, a multi-directional hole filling method is used to determine the correct hole filling pixel value. In addition, an adaptive hole scanning method ensures that only non-empty pixels are compared as hole filling candidates. A cost value is then computed for each candidate pixel, and the candidate with the minimum cost is selected, so the hole filling pixel is properly chosen from the background region. View synthesis experiments confirmed that the proposed method achieves a 1.03 dB average PSNR improvement over the conventional methods.