Abstract
This paper presents the results of our participation in the ninth eNTERFACE workshop on multimodal user interfaces. Our goal for this workshop was to bring technologies currently used in speech recognition and synthesis to a new level, making them the core of a new HMM-based mapping system. We investigated the idea of statistical mapping, and more precisely how to use Gaussian Mixture Models and Hidden Markov Models both for realtime, reactive generation of new trajectories from input labels and for realtime regression in a continuous-to-continuous use case. As a result, we have developed several proofs of concept: an incremental speech synthesiser, software for exploring stylistic spaces of gait and facial motion in realtime, a reactive audiovisual laughter synthesiser, and a prototype demonstrating realtime reconstruction of lower-body gait motion strictly from upper-body motion while preserving its stylistic properties. This project was also the opportunity to formalise HMM-based mapping, integrate several of these innovations into the Mage library, and explore the development of a realtime gesture recognition tool.
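The continuous-to-continuous regression mentioned above can be sketched with a minimal example of GMM-based mapping: given a joint Gaussian mixture over a source feature x and a target feature y, the mapped output is the minimum mean-square-error estimate E[y | x], a responsibility-weighted sum of per-component conditional means. The mixture parameters below are toy values for illustration, not the models or internals of the Mage system.

```python
import numpy as np

# Toy joint GMM over (x, y): hypothetical stand-in parameters,
# not taken from any model trained in the paper.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0],
                  [4.0, 2.0]])                    # [mu_x, mu_y] per component
covs = np.array([[[1.0, 0.8], [0.8, 1.0]],
                 [[1.0, -0.6], [-0.6, 1.0]]])     # joint 2x2 covariances

def gmm_map(x):
    """MMSE mapping y = E[y | x] under the joint GMM."""
    # responsibilities p(m | x), from the marginal Gaussians over x
    var_x = covs[:, 0, 0]
    lik = weights * np.exp(-0.5 * (x - means[:, 0]) ** 2 / var_x) \
        / np.sqrt(2.0 * np.pi * var_x)
    resp = lik / lik.sum()
    # per-component conditional means: mu_y + cov_yx / var_x * (x - mu_x)
    cond = means[:, 1] + covs[:, 1, 0] / var_x * (x - means[:, 0])
    return float(resp @ cond)

print(gmm_map(4.0))   # dominated by the second component near x = 4
```

Frame-by-frame evaluation of such a closed-form estimator is cheap, which is what makes this family of mappings attractive for the realtime, reactive use cases targeted in the project; trajectory-level variants additionally smooth the output over time.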
© 2014 IFIP International Federation for Information Processing
Cite this paper
d’Alessandro, N. et al. (2014). Reactive Statistical Mapping: Towards the Sketching of Performative Control with Data. In: Rybarczyk, Y., Cardoso, T., Rosas, J., Camarinha-Matos, L.M. (eds) Innovative and Creative Developments in Multimodal Interaction Systems. eNTERFACE 2013. IFIP Advances in Information and Communication Technology, vol 425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55143-7_2
DOI: https://doi.org/10.1007/978-3-642-55143-7_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55142-0
Online ISBN: 978-3-642-55143-7