Using spatial audio cues from speech excitation for meeting speech segmentation

Cited by: 0
Authors
Cheng, Eva [1]
Burnett, Ian [1]
Ritz, Christian [1]
Affiliations
[1] Univ Wollongong, Whisper Labs, Sch Elect Comp & Telecommun Engn, Wollongong, NSW 2522, Australia
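Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract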
Multiparty meetings generally involve stationary participants. Participant location information can thus be used to segment the recorded meeting speech into each speaker's 'turn' for meeting 'browsing'. To represent speaker location information from speech, previous research showed that the most reliable time delay estimates are extracted from the Hilbert envelope of the Linear Prediction residual signal. The authors' past work has proposed the use of spatial audio cues to represent speaker location information. This paper proposes extracting spatial audio cues from the Hilbert envelope of the speech residual for indicating changing speaker location for meeting speech segmentation. Experiments conducted on recordings of a real acoustic environment show that spatial cues from the Hilbert envelope are more consistent across frequency subbands and can clearly distinguish between spatially distributed speakers, compared to spatial cues estimated from the recorded speech or residual signal.
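The abstract names a processing chain (Linear Prediction residual, Hilbert envelope, spatial cues) but this record gives no implementation details. The following is a minimal sketch of that chain under stated assumptions, not the authors' method: the LP order of 10, the autocorrelation method, the use of a single full-band frame rather than frequency subbands, and the choice of an inter-channel level difference and an inter-channel time delay as the two spatial cues are all illustrative assumptions. All function names are hypothetical.

```python
import numpy as np


def lp_residual(frame, order=10):
    """LP residual of one frame via the autocorrelation method (order is assumed)."""
    x = frame * np.hamming(len(frame))
    # Autocorrelation at lags 0..order
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    # Toeplitz normal equations R a = r[1:], lightly regularized for stability
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + (1e-9 * r[0] + 1e-12) * np.eye(order)
    a = np.linalg.solve(R, r[1 : order + 1])
    # Inverse filter A(z) = 1 - sum_k a_k z^-k applied to the unwindowed frame
    return np.convolve(frame, np.concatenate(([1.0], -a)))[: len(frame)]


def hilbert_envelope(x):
    """Magnitude of the analytic signal, built with the FFT."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def level_difference_db(a, b, eps=1e-12):
    """Energy ratio of channel a to channel b, in dB."""
    return 10.0 * np.log10((np.sum(a**2) + eps) / (np.sum(b**2) + eps))


def time_delay_samples(a, b):
    """Lag in samples at which b best matches a (positive when b lags a)."""
    xc = np.correlate(b - b.mean(), a - a.mean(), mode="full")
    return int(np.argmax(xc)) - (len(a) - 1)


def spatial_cues(left, right, order=10):
    """Level difference (dB) and time delay (samples) between the
    Hilbert envelopes of the two channels' LP residuals."""
    env_l = hilbert_envelope(lp_residual(left, order))
    env_r = hilbert_envelope(lp_residual(right, order))
    return level_difference_db(env_l, env_r), time_delay_samples(env_l, env_r)
```

Calling `spatial_cues(left, right)` on successive frames of a two-channel recording yields one cue pair per frame. A sustained change in those cues would then suggest a change in the active speaker's location, which is the segmentation idea the abstract describes.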
Pages: 3067 / +
Page count: 2