Abstract
Speech enhancement cannot, and should not, be examined solely through the lens of time-frequency analysis. Approaching the problem from different perspectives, or incorporating knowledge from other fields, expands the options available when developing a speech enhancement system. Using multiple microphones at different locations makes it possible to develop more sophisticated source separation and dereverberation technologies, which enable man-made systems to extract a speech signal of interest in a noisy environment with competing speech and/or noise sources. Human beings and many other creatures accomplish this with little effort, a phenomenon referred to as the cocktail party effect. However, separating and dereverberating speech signals in reverberant environments is a very difficult problem, and state-of-the-art algorithms remain unsatisfactory. The challenge lies in the coexistence, in the observed microphone signals, of spatial interference from competing sources and temporal echoes due to room reverberation. Focusing only on optimizing the signal-to-interference ratio is inadequate for most speech processing systems, in which source separation and speech dereverberation are two fully integrated problems. In this chapter, we study these two problems in a unified framework. We show that spatial interference and temporal reverberation can be decoupled: a SIMO (single-input multiple-output) system driven by the speech signal of interest is extracted from the overall MIMO (multiple-input multiple-output) system. This interference-free SIMO system is then dereverberated using the MINT (multiple-input/output inverse theorem). The two-stage procedure leads to a novel sequential source separation and speech dereverberation algorithm based on blind multichannel identification. Simulations with impulse responses measured in the varechoic chamber at Bell Labs verify the proposed algorithm.
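The second stage described above, MINT-based dereverberation of a SIMO system, amounts to finding equalizer filters g_i such that the sum of the convolutions h_i * g_i equals a unit impulse, which is exactly solvable when the channel polynomials share no common zeros. The following is a minimal NumPy sketch of that least-squares formulation; the function names, channel coefficients, and filter lengths are illustrative assumptions, not the chapter's actual implementation.

```python
import numpy as np

def conv_matrix(h, Lg):
    """Build the Sylvester (convolution) matrix H so that H @ g == np.convolve(h, g)."""
    Lh = len(h)
    H = np.zeros((Lh + Lg - 1, Lg))
    for j in range(Lg):
        H[j:j + Lh, j] = h
    return H

def mint_inverse(channels, Lg):
    """Solve sum_i h_i * g_i = delta in the least-squares sense (MINT-style inversion)."""
    H = np.hstack([conv_matrix(h, Lg) for h in channels])
    d = np.zeros(H.shape[0])
    d[0] = 1.0  # target response: a unit impulse (perfect dereverberation)
    g = np.linalg.lstsq(H, d, rcond=None)[0]
    return np.split(g, len(channels))

# Toy 2-channel SIMO example with short, coprime channel impulse responses
h1 = np.array([1.0, 0.5, 0.25])
h2 = np.array([1.0, -0.4, 0.1])
g1, g2 = mint_inverse([h1, h2], Lg=2)

# The equalized overall response should be (close to) a unit impulse
eq = np.convolve(h1, g1) + np.convolve(h2, g2)
```

Note that with two channels of length L, filters of length L-1 make the stacked system square, so the inversion is exact whenever the channels are coprime; with a single channel, by contrast, an FIR inverse of a non-minimum-phase room response does not exist, which is the point of MINT.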
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Huang, Y., Benesty, J., Chen, J. (2005). Separation and Dereverberation of Speech Signals with Multiple Microphones. In: Speech Enhancement. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-27489-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24039-6
Online ISBN: 978-3-540-27489-6
eBook Packages: Engineering (R0)