Using spatial audio cues from speech excitation for meeting speech segmentation

Cited by: 0

Authors:

Cheng, Eva [1]

Burnett, Ian [1]

Ritz, Christian [1]

Affiliations:

[1] Univ Wollongong, Whisper Labs, Sch Elect Comp & Telecommun Engn, Wollongong, NSW 2522, Australia

Source:

2006 8TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-4 | 2006

DOI:

Not available

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Multiparty meetings generally involve stationary participants. Participant location information can thus be used to segment the recorded meeting speech into each speaker's 'turn' for meeting 'browsing'. To represent speaker location information from speech, previous research showed that the most reliable time delay estimates are extracted from the Hilbert envelope of the Linear Prediction residual signal. The authors' past work has proposed the use of spatial audio cues to represent speaker location information. This paper proposes extracting spatial audio cues from the Hilbert envelope of the speech residual for indicating changing speaker location for meeting speech segmentation. Experiments conducted on recordings of a real acoustic environment show that spatial cues from the Hilbert envelope are more consistent across frequency subbands and can clearly distinguish between spatially distributed speakers, compared to spatial cues estimated from the recorded speech or residual signal.
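
The processing chain the abstract describes (Linear Prediction residual, then Hilbert envelope, then spatial cues between channels) can be sketched roughly as below. This is a minimal illustration and not the authors' implementation: the function names, the LP order of 12, the ±32-sample lag search, and the choice of a level difference and a time delay as the two cues are all assumptions, and the cues are computed over the full band rather than per frequency subband.

```python
import numpy as np

def lp_residual(x, order=12):
    """LP residual via the autocorrelation method (order is an assumed value)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    # Normal equations R a = r[1:], with slight diagonal loading for stability
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Prediction x_hat[t] = sum_k a[k] * x[t-1-k]; residual is x - x_hat
    pred = np.convolve(x, np.concatenate(([0.0], a)))[:n]
    return x - pred

def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))

def spatial_cues(ch1, ch2, max_lag=32):
    """Return (level difference in dB, delay of ch2 relative to ch1 in samples).

    Both cues are taken from the Hilbert envelope of each channel's LP residual.
    """
    e1 = hilbert_envelope(lp_residual(ch1))
    e2 = hilbert_envelope(lp_residual(ch2))
    eps = 1e-12
    level_db = 10.0 * np.log10((np.sum(e1**2) + eps) / (np.sum(e2**2) + eps))
    # Cross-correlate the mean-removed envelopes over a bounded lag range
    e1 = e1 - e1.mean()
    e2 = e2 - e2.mean()
    n = len(e1)
    lags = range(-max_lag, max_lag + 1)
    xc = [
        np.dot(e1[max(0, -l) : n - max(0, l)], e2[max(0, l) : n - max(0, -l)])
        for l in lags
    ]
    return level_db, lags[int(np.argmax(xc))]
```

In a segmentation setting, these two values would be computed frame by frame for a microphone pair, and a sustained change in them would be taken as a hint that the active speaker location has changed.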

Pages: 3067 / +

Page count: 2
