Abstract
Large holes are unavoidably generated in depth image based rendering (DIBR) using a single color image and its associated depth map. Such holes are mainly caused by disocclusion, which occurs around the sharp depth discontinuities in the depth map. We propose a divide-and-conquer hole-filling method which refines the background depth pixels around the sharp depth discontinuities to address the disocclusion problem. Firstly, the disocclusion region is detected according to the degree of depth discontinuity, and the target area is marked as a binary mask. Then, the depth pixels located in the target area are modified by a linear interpolation process, whose pixel values decrease from the foreground depth value to the background depth value. Finally, in order to remove the isolated depth pixels, median filtering is adopted to refine the depth map. In these ways, disocclusion regions in the synthesized view are divided into several small holes after DIBR, and are easily filled by image inpainting. Experimental results demonstrate that the proposed method can effectively improve the quality of the synthesized view subjectively and objectively.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
1 Introduction
Depth image based rendering (DIBR) is one of the most important technologies to synthesize different virtual views, using color images and their associated depth maps [3, 5, 15, 16, 31, 32]. However, some regions occluded by the foreground (FG) objects in the original views may become visible in the synthesized views, which are called disocclusion [2, 11, 12, 18, 28]. Figure 1 shows the occurrence of disocclusion in single-view rendering. Original camera represents the captured reference viewpoint, and virtual camera represents the virtual view synthesized by DIBR. The visible areas in the reference view are marked with vertical lines in red, while those visible areas in the virtual view are marked with horizontal lines in green. The disocclusion region, which is not visible in the reference viewpoint but visible in virtual image, is located at the boundary of the foreground and background as pointed out in Fig. 1. Disocclusion problem is particularly serious in single-view-plus-depth format. Therefore, the task of disocclusion-filling becomes more crucial in single-view rendering.
Several methods for solving this problem have been proposed, which can be categorized into two classes: (1) depth map preprocessing before rendering [10, 13, 14, 29, 30] and (2) image inpainting jointing depth and texture information after rendering [6, 17, 19, 21, 24]. In this paper, we focus on the former class, which mainly consists of depth filtering and depth boundary refinement. The main idea of depth filtering is to reduce the difference of depth values on the object boundary. Lee et al. [14] adopted adaptive smoothing filters which including an asymmetric smoothing filter and a horizontal smoothing filter for depth map preprocessing. The asymmetric smoothing filter was adopted to reduce the geometric distortions and the horizontal smoothing filter was adopted to reduce the time of computation and occurrence of holes. In this way, disocclusion region in the synthesized view becomes smaller. However, depth filtering usually introduces rubber-sheet artifacts and geometric distortions. Another method applied is the dilation of the FG objects of the depth map [13, 29, 30]. In [29], a dilation based method for view synthesis by DIBR was proposed. The depth pixels which located in the large depth discontinuities are refined with the neighboring foreground depth values, and thus improve the quality of the rendered virtual view. In [30], Xu et al. extended the method in [29] with a misalignment correction of the depth map for a better view synthesis result. Koppel et al. [13] proposed a depth preprocessing method which including the depth map dilation, adaptive cross-trilateral median filtering, and adaptive temporally consistent asymmetric smoothing filtering to reduce holes and distortions in the virtual view. Under the dilation based method, the FG objects in the depth map are slightly bigger than the corresponding objects in the textured image, but the disocclusion region is not reduced. When the big holes are filled through image inpainting, the diffusion region may be blurred. In addition, the global dilating may causes distortions, as the boundary disparity information is changed in the common area. The disocclusion is mainly located between the FG and the background (BG). Furthermore, the size of the disocclusion region is quite large. Such challenging conditions prevent the use of traditional image inpainting methods and make such holes difficult to be filled.
In this paper, in order to resolve the above problems, we propose a divide-and-conquer hole-filling method to effectively solve the disocclusion problem. Due to the fact that reducing the sharp depth discontinuities can narrow the disocclusion region, we propose to modify the depth pixels around the depth discontinuities by dilating associated with a linear interpolation process. In particular, to avoid the distortion on other FG objects, the interpolation is limited to the neighboring BG. Finally, median filtering is adopted to remove isolated depth pixels. Experimental results demonstrate that our approach is effective in recovering the large disocclusion region.
2 Proposed method
Figure 2 shows the block scheme of the proposed method, which includes disocclusion region detection, local object dilating, disocclusion handling, and median filtering. With the preparation of disocclusion region detection and local object dilating, the divide-and-conquer strategy is mainly reflected in the disocclusion handling section.
For the right synthesized view, disocclusion occurs on the depth map with high to low sharp depth transition areas, and vice versa for the left synthesized view as described in [14]. In this paper, we take the rendering of the right virtual view for example. The same process can be applied on the rendering of the left virtual view. The depth difference between two horizontal adjacent pixels can be expressed as:
where \(d\left ({x,y} \right )\) represents the depth pixel value at position \(\left ({x,y} \right )\), and \({d_{f}}\left ({x,y} \right )\) represents the horizontal depth difference between the depth pixels at positions \(\left ({x,y} \right )\) and \(\left ({x\text {{ -} }1,y} \right )\). If \({d_{f}}\left ({x,y} \right )\) is larger than the specified threshold T 0, there is a large depth discontinuity at position \(\left ({x,y} \right )\).
Based on the above concept, the disocclusion regions are detected according to the degree of depth discontinuity and are labeled as b(x,y)=1 as follows.
Then the target region for refinement is restricted to the neighboring BG and is marked as a binary mask as:
where x 0 represents the x-coordinate of a pixel where \(b\left ({{x_{0}},y} \right ) = 1.\) The condition d(x 0 + k,y)−d(x 0,y)<T 0 is used to avoid that other FG objects be marked by the binary mask.
Figure 3 shows the relationship between the virtual view and the original view. Assume that \(d\left ({x - 1,y} \right )\) is 30 and \(d\left ({x ,y} \right )\) is 28, the depth difference between \(d\left ({x - 1,y} \right )\) and \(d\left ({x ,y} \right )\) is equal to 2 (as shown in Fig. 3, the adjacent pixels are marked in red and blue). After rendering, the hole region is appeared in the virtual view (the black pixels between the red pixel and blue pixel). There are two hole pixels in the virtual view between the adjacent pixels in original view which are marked in red and blue, respectively. It is worthwhile to mention that the pixel value of depth map in Middlebury is equals to the value of disparity, and the parallel rendering is adopted in our paper.
A more generalize description is shown as follows. As the similar with [11], assume that the x-coordinates of two adjacent pixels in the original view are x-1 and x, the x-coordinates of the two pixels become \(x - 1 - d\left ({x - 1,y} \right )\) and \(x - d\left ({x,y} \right )\) after rendering. Thus, the width of the disocclusion region can be formulated as follows.
For \({\left ({{x_{0}},y} \right )}\), the width of the disocclusion region (which is the information loss area in the synthesized view) is equal to \({{d_{f}}\left ({{x_{0}},y} \right )}\). In our proposed method, the neighboring background information with width of \({{d_{f}}\left ({{x_{0}},y} \right )}\), is used to fill the disocclusion region. Thus, the maximum value of k is set as \({2 * {d_{f}}\left ({{x_{0}},y} \right )-1}\). In other words, the horizontal size of the refined region in the depth map is \({2 * {d_{f}}\left ({{x_{0}},y} \right )}\).
An example of the disocclusion region detection is shown in Fig. 4. Figure 4a and b show the original left color image and its associated depth map. Figures 4c and d demonstrate that half of the binary mask (Fig. 4c) can accurately mark out the disocclusion region (the black areas in Fig. 4d). It is worthwhile to point out that the region marked by the red rectangle in Fig. 4c does not match the hole in Fig. 4d. This is because other FG object is present in this region. In this way, the FG depth pixels are preserved, and the distortion on the FG object after rendering is avoided. Figures 4e and f show the image inpainting result accompanied by annoying artifact distortion.
The principle of the proposed divide-and-conquer disocclusion handling approach is shown in Fig. 5. Figure 5a shows the positional relationship in the horizontal direction between color and depth discontinuity. After rendering, the generated disocclusion around the depth discontinuity is shown in Fig. 5b. The size of the disocclusion region is determined by the value of \({{d_{f}}\left ({x,y} \right )}\), and the quantitative relationship between them can be found in [29]. In Fig. 5c, the disocclusion region is filled with the neighbor pixels. As the surrounding pixels of the disocclusion region contain both FG and BG pixels, annoying artifact will appear. To avoid this annoying artifact, the FG objects of the depth map, which neighbors with the sharp depth discontinuity, are firstly dilated as shown in Fig. 5d (marked by purple arrow). Thus, the depth boundary will be slightly wider than the corresponding color boundary. In such a way, the problem of annoying artifact is solved. A structure element n*n large is used for dilation in this paper.
The marked target region except for the dilation area in the depth map, is refined to reduce the sharp depth discontinuities through a linear interpolation process, whose pixel values decrease from the FG depth value to the BG depth value as shown in Fig. 5d (marked by red arrow). The linear interpolation process can be described as:
where d * represents the refined pixels in the depth map. Specifically, as we limited the refined areas, the minimum value of the linear interpolation will not reach the BG depth value when the horizontal distance between the boundaries of two FG objects is less than \({2 * {d_{f}}\left ({{x_{0}},y} \right )}\).
Finally, in order to get a better result of DIBR, a two-dimensional median filtering algorithm [6] is used to remove isolated depth pixels. In this way, the error pixels on the synthesized viewpoints can be reduced.
3 Experimental results
In order to evaluate the effectiveness of the proposed method, the color images and corresponding depth maps of Venus, Middl1, Rocks, Lampshade1, and Flowerpots, provided by Middlebury database [8, 22, 23] are used in our experiments. All the depth images provided by Middlebury database have the same resolution with the corresponding color images. The original images are shown in Fig. 6. T 0 is set as 5, and n is set as 3. The depth map of Venus after being refined by the above algorithms is shown in Fig. 7a. Since the disparity of the target region decreases linearly, after rendering by the preprocessed depth map, the large disocclusion on the virtual image is divided into several small uniform holes as shown in Fig. 7b. The small holes are easily filled by various inpainting methods [1, 20, 26]. In this paper, we show the result with Telea image inpainting [26]. To further improve the quality of the synthesized images of the proposed method, the pixels which are located in the regions of small holes are processed by a horizontal median filter [9] after image inpainting. The 1*5 smoothing window for the horizontal median filter was adopted in our experiment. To further show the advantages of the divide and conquer method with linear interpolation, the intermediate results of the proposed method are shown in Fig. 8. It can be seen from the figure, after rendering with the preprocessed depth map, the large disocclusion on the virtual image is divided into small holes, which are easily filled by image inpainting.
For comparison subjectively, results of different methods are also shown in Figs. 9, 10 and 11. The virtual view images are synthesized with the original depth maps (Figs. 9–11a), the preprocessed depth maps by Lee’s method [14] (Figs. 9–11b), Xu’s method [30] (Figs. 9–11c), and the proposed divide-and-conquer hole-filling method (Figs. 9–11d), and the groundtruth(Figs. 9–11e), respectively. The top row shows the results of dealing with the disocclusion by the above methods, and the bottom row shows some parts of the virtual images which are labeled with red rectangles. It can be seen that the proposed method can fill the disocclusion regions more effectively than others. Meanwhile, the visual effect of the virtual image is improved remarkably. Compared to Xu’s method [30], the annoying artifact and diffusion effect are reduced and even removed by our proposed method.
To compare with the ground-truth provided by Middlebury database objectively, PSNR values are computed between the virtual images and the corresponding ground-truth. The results of PSNR for Middlebury database are shown in Table 1. From the table, the virtual images rendered with the proposed method achieve higher PSNR values compared to the images rendered without preprocessing, Lee’s method [14], and Xu’s method [30]. For example, with the proposed method, we can achieve 2.55 dB gain for Flowerpots as well as 0.17 dB gain for Venus, compared with Xu’s method [30]. The reasons why the proposed method is effective than other methods are as follows: 1) The important foreground information has less geometrical distortion compared with the smoothing based methods. 2) It is reasonable to fill holes using adjacent background texture, since the missing information of the disocclusion region in the synthesized view, which is occluded by the foreground object, is background information. 3) The disocclusion is divided into several small holes by the proposed divided and conquer strategy. Compared with big holes, small holes are easily to be filled, since there are background pixels which are adjacent with the small holes in the horizontal direction and the missing pixels on the small holes are similar with their horizontal neighbors.
It should be noted that the PSNR gain of smoothing based methods sometimes are lower than the results without preprocessing, as smoothing based methods cause geometrical distortion in texture. These methods are generally used for improving the visual effect of the synthesized views or boosting the computational speed. The dilation based method with a sophisticated in-painting method or texture synthesis method can reduce more errors and obtain a higher PSNR gain. It is mainly because the important foreground information has less geometrical distortion compared with the smoothing based methods, and the missing textures located in the disocclusion region are similar with the adjacent background textures. In addition to inheriting the advantages of the dilation based method, the proposed method divides the disocclusion into several small holes and thus makes holes to be filled easily and accurately.
To further evaluate the performance of the proposed method for real word cases, four test sequences, Balloons, Door Flowers, Lovebird1, and Exit [4, 7, 25, 27], are used in our experiments. The original images are shown in Fig. 12. The resolution for Balloons, Door Flowers, and Lovebird1 is 1024*768, and the resolution for Exit is 640*480. The 1st and 3rd viewpoints of Balloons, the 8th and 9th viewpoints of Door Flowers, the 6th and 7th viewpoints of Lovebird1, and the 1st and 2nd viewpoints of Exit are adopted. The PSNR values of the four test sequences are shown in Table 2, and the subjective comparison of different methods are shown in Figs. 13 and 14. From these results, it can be observed that the proposed method can improve the quality of the synthesized view objectively and subjectively.
In order to investigate the additional computation complexity of the proposed method, we performed experiments under the simulation environments of Intel(R) Core(TM) i5-4590 CPU @3.30GHz with 8.0GB memory and 64 bit Windows 7 operating system. The comparisons of running time are shown in the Table 3. It can be observed that the running time of the proposed algorithm is less than Lee’s method, and a little more than Xu’s method, which is acceptable.
4 Conclusion
In this paper, we have presented a novel and effective divide-and-conquer method for handling disocclusion of the synthesized image in single-view rendering. Firstly, a binary mask is used to mark the disocclusion region. Then, the depth pixels located in the disocclusion region are modified by a linear interpolation process. Finally, a median filtering is adopted to remove the isolated depth pixels. With the proposed method, disocclusion regions in the synthesized virtual view are divided into several small holes after DIBR, and are easily filled by image inpainting. Experimental results demonstrate that the proposed method can effectively improve the visual effect of the synthesized view. Furthermore, the proposed method gives higher PSNR values of rendered virtual images compared to previous methods.
References
Bertalmio M, Sapiro G, Caselles V, Ballester C: Image inpainting. Proceedings ACM International Conference on Computer Graphics and Interactive Techniques. New Orleans, America; 2000:417–424.
Chaurasia G, Duchene S, Sorkine-hornung O, Drettakis G: Depth synthesis and local warps for plausible image-based navigation. ACM Trans Graph 2013, 30 (3):1–12.
Chen K-H, Chen C-H, Chang C-H, Liu J-Y, Su C-L: A shape-adaptive low-complexity technique for 3D free-viewpoint visual applications. Circ Syst Signal Process 2015, 34 (2):579–604.
Common test conditions of 3DV core experiments, Document E1100, JCT3V, Vienna, AT, 2013.
Fehn C: Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV. SPIE 2004, 5291:93–104.
Feng Y-M., Li D-X, Luo K, Zhang M: Asymmetric bidirectional view synthesis for free viewpoint and three-dimensional video. IEEE Trans Consum Electron 2009, 55 (4):2349–2355.
Feldmann I, Muller M, Zilly F, Tanger R, Mueller K, Smolic A, Kauff P, Wiegand T: HHI Test Material for 3D Video, ISO, Archamps, France, ISO/IEC JTC1/SC29/WG11, Document M15413, 2008.
Hirschmller H, Scharstein D: Evaluation of cost functions for stereo matching. Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. Minneapolis, America; 2007:1–8.
Huang T, Yang G, Tang G: A fast two-dimensional median filtering algorithm. IEEE Trans Acoust Speech Signal Process 1979, 27 (1):13C18.
Jung S-W: modified model of the just noticeable depth difference and its application to depth sensation enhancement. IEEE Trans Image Process 2013, 22 (10):3892–3903.
Kellnhofer P, Ritschel T, Myszkowski K, Seidel H-P: Optimizing disparity for motion in depth. Comput Graph Forum 2013, 32 (4):143–152.
Kim HG, Ro YM: Multi-view stereoscopic video hole filling considering spatio-temporal consistency and binocular symmetry for synthesized 3D video. IEEE Transactions on Circuits and Systems for Video Technology; 2016. 10.1109/TCSVT.2016.2515360.
Koppel M, Makhlouf MB, Mller M, Ndjiki-Nya P: Temporally consistent adaptive depth map preprocessing for view synthesis. Proceedings of the IEEE International Conference on Visual Communications and Image Processing. Kuching, Malaysia; 2013:1–6.
Lee P-J: Effendi, Nongeometric distortion smoothing approach for depth map preprocessing. IEEE Trans Multimed 2011, 13 (2):246–254.
Lei J, zhang C, Fang Y, Gu Z, Ling N, Hou C: Depth Sensation Enhancement for Multiple Virtual View Rendering. IEEE Trans Multimed 2015, 17 (4):457–469.
Loghman M, Kim J: Segmentation-based view synthesis for multi-view video plus depth. Multimed Tools Appl 2015, 74 (5):1611–1625.
Lu S, Hanca J, Munteanu A, Schelkens P: Depth-based view synthesis using pixel-level image inpainting. Proceedings of the IEEE International Conference on Digital Signal Processing. Fira, Greece; 2013:1–6.
McMillan L, Bishop G: Plenoptic modeling: an image-based rendering system. Proceedings ACM Conference On Special Interest Group on Computer Graphics and Interactive Techniques. New York, USA; 1995:39–46.
Oh K-J, Yea S, Vetro A, Ho Y-S: Virtual view synthesis method and self-evaluation metrics for free viewpoint television and 3D video. Int J Imaging Syst Technol 2010, 20 (4):378– 390.
Po L-M, Zhang S, Xu X, Zhu Y: A new multidirectional extrapolation hole-filling method for depth-image-based rendering. Proceedings of the IEEE International Conference on Image Processing. Brussels, Belgium; 2011:2637–2640.
Reel S, Cheung G, Wong P, Dooley LS: Joint texture-depth pixel inpainting of disocclusion holes in virtual view synthesis. Proceedings of the IEEE Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. Kaohsiung, Taiwan; 2013:1–7.
Scharstein D, Pal C: Learning conditional random fields for stereo. Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. Minneapolis, America; 2007:1–8.
Scharstein D, Szeliski R: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int J Comput Vis 2002, 47 (1-3):7–42.
Solh M, AlRegib G: Hierarchical hole-filling for depth-based view synthesis in FTV and 3D video. IEEE J Sel Topics Signal Process 2012, 6 (5):495–504.
Tanimoto M, Fujii T, Fukushima N: 1D Parallel Test Sequences for MPEG-FTV,ISO, Archamps, France, ISO/IEC JTC1/SC29/WG11, Document M15378, 2008.
Telea A: An image inpainting technique based on the fast marching method. J Graph Tools 2004, 9 (1):25–36.
Vetro A, McGuire M, Matusik W, Behrens A, Lee J, Pfister H: Mitsubishi Electric Research Labs(USA), Multiview video test sequences from MERL, ISO/IEC JTC1/SC29/WG11, m12077, Busan, Korea, 2005.
Wang L, Hou C, Lei J, Yan W: View generation with DIBR for 3D display system. Multimed Tools Appl 2015, 74 (21):9529–9545.
Xu X, Po L-M, Cheung K-W, Ng K-H, Wong K-M, Ting C-W: A foreground biased depth map refinement method for DIBR view synthesis. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto, Japan; 2012:805–808.
Xu X, Po L-M, Cheung K-W, Ng K-H, Wong K-M, Ting C-W: Depth map misalignment correction and dilation for DIBR view synthesis. Signal Process Image Commun 2013, 28 (9):1023– C1045.
Yao C, Tillo T, Zhao Y, Xiao J, Bai H, Lin C: Depth map driven hole filling algorithm exploiting temporal correlation information. IEEE Trans Broadcast 2014, 60 (2):394–404.
Zhao Y, Zhu C, Chen Z, Tian D, Yu L: Boundary artifact reduction in view synthesis of 3D video: from perspective of texture-depth alignment. IEEE Trans Broadcast 2011, 57 (2):510–522.
Acknowledgments
This research was partially supported by the Natural Science Foundation of China (Nos. 61271324, 61520106002, 61471262, 91320201, and 61471260).
Author information
Authors and Affiliations
Corresponding author
Additional information
Open Access
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Lei, J., Zhang, C., Wu, M. et al. A divide-and-conquer hole-filling method for handling disocclusion in single-view rendering. Multimed Tools Appl 76, 7661–7676 (2017). https://doi.org/10.1007/s11042-016-3413-3
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-016-3413-3