Microphone array driven speech recognition: Influence of localization on the word error rate

Cited by: 0
Authors
Wölfel, M [1]
Nickel, K [1]
McDonough, J [1]
Affiliations
[1] Univ Karlsruhe, Inst Theoret Informat, D-76131 Karlsruhe, Germany
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Interest within the automatic speech recognition (ASR) research community has recently focused on the recognition of speech captured with one or more microphones located in the far field, rather than mounted on a headset and positioned next to the speaker's mouth. Far-field ASR is a natural application for beamforming techniques using an array of microphones. A prerequisite for applying such techniques, however, is a reliable means of speaker localization. In this work, we compare the accuracy of source localization systems based on audio features only, video features only, and a combination of audio and video features, using speech data collected during seminars held by actual speakers. We also investigate the influence of source localization accuracy on the word error rate (WER) of a far-field ASR system, comparing the WERs obtained with position estimates from several automatic source localizers with those obtained from true speaker positions. Our results reveal that accurate speaker localization is crucial for minimizing the error rate of a far-field ASR system.
Pages: 320-331
Page count: 12