Abstract
This paper presents the results of our participation in the ninth eNTERFACE workshop on multimodal user interfaces. Our goal for this workshop was to bring technologies currently used in speech recognition and synthesis to a new level, making them the core of a new HMM-based mapping system. We investigated the idea of statistical mapping, and more precisely how to use Gaussian Mixture Models and Hidden Markov Models both for realtime, reactive generation of new trajectories from input labels and for realtime regression in a continuous-to-continuous use case. As a result, we have developed several proofs of concept: an incremental speech synthesiser, software for exploring stylistic spaces of gait and facial motion in realtime, a reactive audiovisual laughter synthesiser, and a prototype demonstrating realtime reconstruction of lower-body gait motion strictly from upper-body motion while preserving its stylistic properties. This project was also the opportunity to formalise HMM-based mapping, integrate several of these innovations into the Mage library, and explore the development of a realtime gesture recognition tool.
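The continuous-to-continuous regression mentioned above can be sketched with a minimal example of GMM-based mapping: given a joint Gaussian mixture over a source feature x and a target feature y, the mapped output is the minimum mean-square-error estimate E[y | x], a responsibility-weighted sum of per-component conditional means. The mixture parameters below are toy values for illustration, not the models or internals of the Mage system.

```python
import numpy as np

# Toy joint GMM over (x, y): hypothetical stand-in parameters,
# not taken from any model trained in the paper.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0],
                  [4.0, 2.0]])                    # [mu_x, mu_y] per component
covs = np.array([[[1.0, 0.8], [0.8, 1.0]],
                 [[1.0, -0.6], [-0.6, 1.0]]])     # joint 2x2 covariances

def gmm_map(x):
    """MMSE mapping y = E[y | x] under the joint GMM."""
    # responsibilities p(m | x), from the marginal Gaussians over x
    var_x = covs[:, 0, 0]
    lik = weights * np.exp(-0.5 * (x - means[:, 0]) ** 2 / var_x) \
        / np.sqrt(2.0 * np.pi * var_x)
    resp = lik / lik.sum()
    # per-component conditional means: mu_y + cov_yx / var_x * (x - mu_x)
    cond = means[:, 1] + covs[:, 1, 0] / var_x * (x - means[:, 0])
    return float(resp @ cond)

print(gmm_map(4.0))   # dominated by the second component near x = 4
```

Frame-by-frame evaluation of such a closed-form estimator is cheap, which is what makes this family of mappings attractive for the realtime, reactive use cases targeted in the project; trajectory-level variants additionally smooth the output over time.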
© 2014 IFIP International Federation for Information Processing
Cite this paper
d’Alessandro, N. et al. (2014). Reactive Statistical Mapping: Towards the Sketching of Performative Control with Data. In: Rybarczyk, Y., Cardoso, T., Rosas, J., Camarinha-Matos, L.M. (eds) Innovative and Creative Developments in Multimodal Interaction Systems. eNTERFACE 2013. IFIP Advances in Information and Communication Technology, vol 425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55143-7_2
DOI: https://doi.org/10.1007/978-3-642-55143-7_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55142-0
Online ISBN: 978-3-642-55143-7