Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System

Anguera, Xavier; Wooters, Chuck; Peskin, Barbara; Aguiló, Mateu

doi:10.1007/11677482_34


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3869)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction


Abstract

In this paper we describe the ICSI-SRI entry in the Rich Transcription 2005 Spring Meeting Recognition Evaluation. The current system is based on the ICSI-SRI clustering system for Broadcast News (BN), with extra modules to process the different meetings tasks in which we participated. Our base system uses agglomerative clustering with a modified Bayesian Information Criterion (BIC) measure to determine when to stop merging clusters and to decide which pairs of clusters to merge. This approach does not require any pre-trained models, thus increasing robustness and simplifying the port from BN to the meetings domain. For the meetings domain, we have added several features to our baseline clustering system, including a “purification” module that tries to keep the clusters acoustically homogeneous throughout the clustering process, and a delay&sum beamforming algorithm which enhances signal quality for the multiple distant microphones (MDM) sub-task. In post-evaluation work we further improved the delay&sum algorithm, experimented with a new speech/non-speech detector and proposed a new system for the lecture room environment.
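The delay&sum idea mentioned in the abstract can be illustrated with a minimal sketch: estimate each distant microphone's delay relative to a reference channel, shift the channels into alignment, and average them so that uncorrelated noise partially cancels. The abstract does not say how delays are estimated, so the plain time-domain cross-correlation used here is an assumption, and the function names, the `max_lag` parameter, and the choice of channel 0 as reference are all hypothetical. This is not the authors' implementation.

```python
import numpy as np


def estimate_delay(ref, sig, max_lag):
    """Return the lag (in samples) by which `sig` trails `ref`.

    Assumption: plain cross-correlation over lags in [-max_lag, max_lag];
    requires max_lag >= 1 and signals longer than 2 * max_lag.
    """
    lags = np.arange(-max_lag, max_lag + 1)
    core = slice(max_lag, -max_lag)  # ignore samples that np.roll wraps around
    scores = [np.dot(ref[core], np.roll(sig, -lag)[core]) for lag in lags]
    return int(lags[int(np.argmax(scores))])


def delay_and_sum(channels, ref_index=0, max_lag=50):
    """Align every channel to the reference channel and average them.

    Returns the enhanced signal and the estimated per-channel delays.
    Note: np.roll shifts circularly, so the first and last few samples of
    the output mix in wrapped-around data; a real system would pad or trim.
    """
    channels = np.asarray(channels, dtype=float)
    ref = channels[ref_index]
    delays = [estimate_delay(ref, ch, max_lag) for ch in channels]
    aligned = [np.roll(ch, -d) for ch, d in zip(channels, delays)]
    # Averaging (a normalized sum) keeps the output on the input scale.
    return np.mean(aligned, axis=0), delays
```

Averaging N aligned channels leaves the common speech component intact while reducing the variance of independent noise by roughly a factor of N, which is why the technique suits the multiple distant microphones (MDM) condition.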



Author information

Authors and Affiliations

International Computer Science Institute, Berkeley, CA, 94704, USA
Xavier Anguera, Chuck Wooters, Barbara Peskin & Mateu Aguiló
Technical University of Catalonia, Barcelona, Spain
Xavier Anguera & Mateu Aguiló


Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, Scotland
Steve Renals
IDIAP Research Institute, Martigny, Switzerland
Samy Bengio


About this paper

Cite this paper

Anguera, X., Wooters, C., Peskin, B., Aguiló, M. (2006). Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System. In: Renals, S., Bengio, S. (eds) Machine Learning for Multimodal Interaction. MLMI 2005. Lecture Notes in Computer Science, vol 3869. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677482_34


DOI: https://doi.org/10.1007/11677482_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32549-9
Online ISBN: 978-3-540-32550-5
eBook Packages: Computer Science, Computer Science (R0)
