Abstract
Inspired by the success of WaveNet in multi-speaker speech synthesis, we propose a novel neural network based on causal convolutions for multi-subject motion modeling and generation. The network captures the intrinsic characteristics of the motion of different subjects, such as the influence of skeleton scale variation on motion style. Moreover, after fine-tuning on a small motion dataset for a novel skeleton not included in the training dataset, the network synthesizes high-quality motions with a personalized style for that skeleton. Experimental results demonstrate that our network models the intrinsic characteristics of motion well and can be applied to various motion modeling and synthesis tasks.
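To illustrate the core mechanism the abstract refers to, the sketch below shows how a stack of dilated causal convolutions, as in WaveNet, processes a motion sequence: each output frame depends only on past frames, and the receptive field grows exponentially with depth. This is a minimal illustration in plain Python, not the authors' implementation; the kernel weights and sequence values are arbitrary examples.

```python
def receptive_field(kernel_size, num_layers):
    """Frames of past context visible to one output frame when layer i
    uses dilation 2**i, as in the standard WaveNet dilation schedule."""
    return 1 + (kernel_size - 1) * sum(2 ** i for i in range(num_layers))

def causal_conv(x, w, dilation):
    """Dilated causal 1-D convolution of sequence x with kernel w.
    The output at time t uses only x[t], x[t-d], x[t-2d], ...;
    positions before the start of the sequence are zero-padded."""
    k = len(w)
    return [sum(w[j] * (x[t - j * dilation] if t - j * dilation >= 0 else 0.0)
                for j in range(k))
            for t in range(len(x))]

# A 6-layer stack with kernel size 2 sees 64 past frames per output frame.
print(receptive_field(2, 6))  # -> 64

# A causal moving average: each output mixes the current and previous frame.
y = causal_conv([1.0, 2.0, 3.0, 4.0], [0.5, 0.5], dilation=1)
print(y)  # -> [0.5, 1.5, 2.5, 3.5]
```

In the full model, each frame would be a vector of joint rotations rather than a scalar, and the convolutions would be learned, but the causality and dilation structure are the same.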
Acknowledgements
We thank the anonymous reviewers for their constructive comments. Weiwei Xu is partially supported by the National Natural Science Foundation of China (No. 61732016).
Ethics declarations
The authors have no competing interests relevant to the content of this article to declare. The author Hujun Bao is an Associate Editor of this journal.
Additional information
Shuaiying Hou is currently a Ph.D. student at the State Key Lab of CAD&CG, Zhejiang University. He received his B.S. degree in software engineering from Northwestern Polytechnical University in 2017. His research interests include computer animation and computer graphics.
Congyi Wang received his Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, in 2017. Since 2018, he has been a research scientist at Xmov, a startup company aiming at AI-powered virtual production. His research interests include computer animation, computer graphics, computer vision, and speech signal processing.
Wenlin Zhuang received his M.S. degree from the School of Automation, Southeast University, in 2021. His research interests include human pose estimation and 3D human animation.
Yu Chen received his master's degree in computer science from Zhejiang University in 2016. His research interests include computer animation, computer graphics, computer vision, and speech signal processing.
Yangang Wang is currently an associate professor in the School of Automation at Southeast University. He received his Ph.D. degree from the Department of Automation at Tsinghua University in 2014. His research interests include motion capture and animation, 3D reconstruction, and image processing.
Hujun Bao is a Cheung Kong professor in the School of Computer Science and Technology at Zhejiang University, and the director of the State Key Laboratory of CAD&CG. He received his B.S. and Ph.D. degrees in applied mathematics from Zhejiang University in 1987 and 1993, respectively. His research interests include geometry computing, vision computing, real-time rendering, and virtual reality.
Jinxiang Chai received his Ph.D. degree in computer science from Carnegie Mellon University. He is currently an associate professor in the Department of Computer Science and Engineering at Texas A&M University. His primary research is in the area of computer graphics and vision with broad applications in other disciplines such as virtual and augmented reality, robotics, human computer interaction, and biomechanics. He received an NSF CAREER award for his work on theory and practice of Bayesian motion synthesis.
Weiwei Xu is currently a professor at the State Key Lab of CAD&CG at Zhejiang University. He was a Qianjiang Professor at Hangzhou Normal University and a researcher in the Internet Graphics Group at Microsoft Research Asia from 2005 to 2012, and a post-doctoral researcher at Ritsumeikan University in Japan for over one year. He received his Ph.D. degree in computer graphics from Zhejiang University, and his B.S. and master's degrees in computer science from Hohai University in 1996 and 1999, respectively.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hou, S., Wang, C., Zhuang, W. et al. A causal convolutional neural network for multi-subject motion modeling and generation. Comp. Visual Media 10, 45–59 (2024). https://doi.org/10.1007/s41095-022-0307-3