An Improvement in Audio-Visual Voice Activity Detection for Automatic Speech Recognition

Cited by: 0
Authors
Yoshida, Takami [1]
Nakadai, Kazuhiro [1,2]
Okuno, Hiroshi G. [3]
Affiliations
[1] Tokyo Inst Technol, Grad Sch Informat Sci & Engn, Tokyo 152, Japan
[2] Honda Res Inst Japan Co Ltd, Saitama, Japan
[3] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
Keywords
Audio-Visual integration; Voice Activity Detection; Speech Recognition;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Noise-robust Automatic Speech Recognition (ASR) is essential for robots, which are expected to communicate with humans in daily environments. In such environments, Voice Activity Detection (VAD) strongly affects the performance of ASR, because there are many acoustic and visual noises. In this paper, we improve Audio-Visual VAD for our two-layered audio-visual integration framework for ASR by using hangover processing based on erosion and dilation. We implemented the proposed method in our audio-visual speech recognition system for a robot. Empirical results show the effectiveness of our proposed method in terms of VAD.
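
The abstract names hangover processing based on erosion and dilation but gives no details. The following is only a minimal sketch of how such post-processing could look on a sequence of binary per-frame VAD decisions. It is not the authors' implementation: the function names, the window radii, the truncated-window edge handling, and the order of operations (erosion first, then dilation) are all assumptions made for illustration.

```python
from typing import List


def dilate(frames: List[int], radius: int) -> List[int]:
    """Binary 1-D dilation: a frame becomes 1 if any frame within `radius` is 1."""
    n = len(frames)
    return [
        1 if any(frames[max(0, i - radius):min(n, i + radius + 1)]) else 0
        for i in range(n)
    ]


def erode(frames: List[int], radius: int) -> List[int]:
    """Binary 1-D erosion: a frame stays 1 only if every frame within `radius` is 1.

    Windows are truncated at the sequence edges (an assumption of this sketch).
    """
    n = len(frames)
    return [
        1 if all(frames[max(0, i - radius):min(n, i + radius + 1)]) else 0
        for i in range(n)
    ]


def hangover(frames: List[int], erode_radius: int = 1, dilate_radius: int = 2) -> List[int]:
    """Hypothetical hangover step: erode to drop short spurious detections,
    then dilate to extend the surviving speech segments at both ends."""
    return dilate(erode(frames, erode_radius), dilate_radius)


# One isolated false detection (index 1) and one real segment (indices 5-8).
raw = [0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
print(hangover(raw))  # [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

With these assumed radii, the isolated detection is removed and the remaining segment is padded by one frame on each side, which is the kind of smoothing a hangover scheme is meant to provide before the segment is passed to ASR.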
Pages: 51 / +
Number of pages: 3