Abstract
We present a system for the removal of objects from videos. As input, the system only needs a user to draw a few strokes on the first frame, roughly delimiting the objects to be removed. To the best of our knowledge, this is the first system allowing the semi-automatic removal of objects from videos with complex backgrounds. The key steps of our system are the following: after initialization, segmentation masks are first refined and then automatically propagated through the video. Missing regions are then synthesized using video inpainting techniques. Our system can deal with multiple, possibly crossing objects, with complex motions, and with dynamic textures. This results in a computational tool that can alleviate tedious manual operations for editing high-quality videos.
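The stages named in the abstract (initial strokes, mask refinement, mask propagation, inpainting) can be illustrated with a toy sketch. This is not the authors' method. Every function name here is hypothetical, and each stage is replaced by a deliberately simple stand-in: refinement is a one-pixel dilation, propagation assumes the per-frame object translation is already known, and inpainting takes the temporal median of the frames in which a pixel is visible, which only makes sense for a static background.

```python
from statistics import median


def dilate(mask, height, width, radius=1):
    """Stand-in for mask refinement: grow a set of (row, col) pixels by `radius`."""
    grown = set()
    for r, c in mask:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    grown.add((rr, cc))
    return grown


def propagate(mask, shifts, height, width):
    """Stand-in for mask propagation: translate the first-frame mask by a
    known (d_row, d_col) offset for each frame (an assumption of this sketch)."""
    return [
        {(r + dr, c + dc) for r, c in mask
         if 0 <= r + dr < height and 0 <= c + dc < width}
        for dr, dc in shifts
    ]


def inpaint(video, masks):
    """Stand-in for video inpainting: fill each masked pixel with the temporal
    median of the same pixel in frames where it is not masked."""
    result = [[row[:] for row in frame] for frame in video]
    for t, mask in enumerate(masks):
        for r, c in mask:
            visible = [video[s][r][c] for s in range(len(video))
                       if (r, c) not in masks[s]]
            if visible:  # pixels never visible are left unchanged
                result[t][r][c] = median(visible)
    return result


def remove_object(video, strokes, shifts):
    """Chain the three toy stages; `video` is a list of frames (lists of rows)."""
    height, width = len(video[0]), len(video[0][0])
    first_mask = dilate(set(strokes), height, width)
    masks = propagate(first_mask, shifts, height, width)
    return inpaint(video, masks), masks


if __name__ == "__main__":
    # Three 1x8 frames: background value 10, object value 99 moving right.
    frames = [[[10] * 8] for _ in range(3)]
    for t, col in enumerate((1, 4, 7)):
        frames[t][0][col] = 99
    cleaned, _ = remove_object(frames, [(0, 1)], [(0, 0), (0, 3), (0, 6)])
    print(cleaned)
```

The sketch only shows how the stages feed into one another. Handling crossing objects, complex motions, and dynamic textures, as described in the abstract, requires far more than these stand-ins.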
Acknowledgements
We gratefully acknowledge the support of NVIDIA, which donated a Titan Xp GPU used for this research. This work was funded by the French Research Agency (ANR) under Grant No. ANR-14-CE27-001 (MIRIAM).
Author information
Thuc Trinh Le is a Ph.D. candidate in computer science and applied mathematics at the LTCI Lab of Telecom ParisTech, Paris-Saclay University, France. His research is devoted to the development of machine learning techniques to address advanced problems in video editing, video segmentation, and video reconstruction.
Andrés Almansa has been a CNRS Research Director at Université Paris Descartes (France) since 2016. He received his M.Sc. and Ph.D. degrees from ENS Cachan (1999, 2002), and M.Sc. and Engineering degrees from Universidad de la República (1995, 1998). He has previously worked at Telecom ParisTech, ENS Cachan (France), Universitat Pompeu Fabra (Spain), and Universidad de la República (Uruguay). His current research interests include image restoration and analysis, subpixel stereovision and applications to earth observation, high quality digital photography, and film editing and restoration.
Yann Gousseau received his engineering degree from the École Centrale de Paris, France, in 1995, and Ph.D. degree in applied mathematics from the University of Paris-Dauphine in 2000. He is currently a professor at Telecom ParisTech. His research interests include mathematical modeling of natural images and textures, stochastic geometry, computational photography, computer vision, and image and video processing.
Simon Masnou is a professor in mathematics at Claude-Bernard Lyon 1 University (France) and head of the Camille Jordan Institute. His research interests include image processing, shape optimization, calculus of variations, and geometric measure theory.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
About this article
Cite this article
Le, T.T., Almansa, A., Gousseau, Y. et al. Object removal from complex videos using a few annotations. Comp. Visual Media 5, 267–291 (2019). https://doi.org/10.1007/s41095-019-0145-0
DOI: https://doi.org/10.1007/s41095-019-0145-0