Abstract
Inspired by the success of WaveNet in multi-speaker speech synthesis, we propose a novel neural network based on causal convolutions for multi-subject motion modeling and generation. The network captures the intrinsic characteristics of the motion of different subjects, such as the influence of skeleton scale variation on motion style. Moreover, after fine-tuning on a small motion dataset for a novel skeleton not included in the training dataset, the network synthesizes high-quality motions with a personalized style for that skeleton. Experimental results demonstrate that our network models the intrinsic characteristics of motion well and can be applied to various motion modeling and synthesis tasks.
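To illustrate the core mechanism the abstract refers to, the sketch below shows how a stack of dilated causal convolutions, as in WaveNet, processes a motion sequence: each output frame depends only on past frames, and the receptive field grows exponentially with depth. This is a minimal illustration in plain Python, not the authors' implementation; the kernel weights and sequence values are arbitrary examples.

```python
def receptive_field(kernel_size, num_layers):
    """Frames of past context visible to one output frame when layer i
    uses dilation 2**i, as in the standard WaveNet dilation schedule."""
    return 1 + (kernel_size - 1) * sum(2 ** i for i in range(num_layers))

def causal_conv(x, w, dilation):
    """Dilated causal 1-D convolution of sequence x with kernel w.
    The output at time t uses only x[t], x[t-d], x[t-2d], ...;
    positions before the start of the sequence are zero-padded."""
    k = len(w)
    return [sum(w[j] * (x[t - j * dilation] if t - j * dilation >= 0 else 0.0)
                for j in range(k))
            for t in range(len(x))]

# A 6-layer stack with kernel size 2 sees 64 past frames per output frame.
print(receptive_field(2, 6))  # -> 64

# A causal moving average: each output mixes the current and previous frame.
y = causal_conv([1.0, 2.0, 3.0, 4.0], [0.5, 0.5], dilation=1)
print(y)  # -> [0.5, 1.5, 2.5, 3.5]
```

In the full model, each frame would be a vector of joint rotations rather than a scalar, and the convolutions would be learned, but the causality and dilation structure are the same.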
Acknowledgements
We thank the anonymous reviewers for their constructive comments. Weiwei Xu is partially supported by the National Natural Science Foundation of China (No. 61732016).
Ethics declarations
The authors have no competing interests relevant to the content of this article to declare. The author Hujun Bao is an Associate Editor of this journal.
Additional information
Shuaiying Hou is currently a Ph.D. student at the State Key Lab of CAD&CG, Zhejiang University. He received his B.S. degree in software engineering from Northwestern Polytechnical University in 2017. His research interests include computer animation and computer graphics.
Congyi Wang received his Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, in 2017. Since 2018, he has been a research scientist at Xmov, a startup company aiming at AI-powered virtual production. His research interests include computer animation, computer graphics, computer vision, and speech signal processing.
Wenlin Zhuang received his M.S. degree from the School of Automation, Southeast University, in 2021. His research interests include human pose estimation and 3D human animation.
Yu Chen received his master's degree in computer science from Zhejiang University in 2016. His research interests include computer animation, computer graphics, computer vision, and speech signal processing.
Yangang Wang is currently an associate professor in the School of Automation at Southeast University. He received his Ph.D. degree from the Department of Automation at Tsinghua University in 2014. His research interests include motion capture and animation, 3D reconstruction, and image processing.
Hujun Bao is a Cheung Kong professor in the School of Computer Science and Technology at Zhejiang University, and the director of the State Key Laboratory of CAD&CG. He received his B.S. and Ph.D. degrees in applied mathematics from Zhejiang University in 1987 and 1993, respectively. His research interests include geometry computing, vision computing, real-time rendering, and virtual reality.
Jinxiang Chai received his Ph.D. degree in computer science from Carnegie Mellon University. He is currently an associate professor in the Department of Computer Science and Engineering at Texas A&M University. His primary research is in the area of computer graphics and vision with broad applications in other disciplines such as virtual and augmented reality, robotics, human computer interaction, and biomechanics. He received an NSF CAREER award for his work on theory and practice of Bayesian motion synthesis.
Weiwei Xu is currently a professor at the State Key Lab of CAD&CG at Zhejiang University. He was a Qianjiang Professor at Hangzhou Normal University and a researcher in the Internet Graphics Group at Microsoft Research Asia from 2005 to 2012, and a post-doctoral researcher at Ritsumeikan University in Japan for over one year. He received his Ph.D. degree in computer graphics from Zhejiang University, and his B.S. and master's degrees in computer science from Hohai University in 1996 and 1999, respectively.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hou, S., Wang, C., Zhuang, W. et al. A causal convolutional neural network for multi-subject motion modeling and generation. Comp. Visual Media 10, 45–59 (2024). https://doi.org/10.1007/s41095-022-0307-3