Fuzzy Neural Network with Audio-Visual Data for Voice Activity Detection in Noisy Environments

Cited by: 0
Authors
Wu, Gin-Der [1]
Zhu, Zhen-Wei [1]
Affiliations
[1] Natl Chi Nan Univ, Dept Elect Engn, Nantou, Taiwan
Keywords
voice activity detection; speech boundary; fuzzy neural network; skin color segmentation; audio; visual; SYSTEM; RECOGNITION; FEATURES; ENTROPY;
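DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract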
Voice activity detection is a fundamental problem in speech processing that has been studied for decades. Determining the speech boundary in noisy environments remains challenging, however, because the corrupted speech is uncertain. To handle noisy data, this study adopts a fuzzy neural network (FNN) to process the uncertainty. Furthermore, human speech perception is bimodal: we lip-read in noisy environments to improve intelligibility. This idea inspires us to incorporate visual information into the voice activity detection system. Using skin color segmentation, faces and mouths are located in images. By analyzing the geometric shapes, the lip contour feature of the speaker is extracted. The proposed fuzzy neural network then considers both audio and visual information. Compared with other voice activity detection methods, the proposed method is more robust at low signal-to-noise ratios (SNR).
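The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The Cb/Cr skin thresholds are a commonly used fixed range, not values taken from the paper. The rule base, the Gaussian membership width, and the names `is_skin`, `skin_mask`, and `fuzzy_speech_score` are hypothetical. Both input features are assumed to be normalized to [0, 1], and no network training is shown.

```python
import math

def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to (Cb, Cr) with the full-range BT.601 (JPEG) formulas."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def is_skin(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Classify one pixel as skin with fixed Cb/Cr bounds (assumed thresholds)."""
    cb, cr = rgb_to_cbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]

def skin_mask(image):
    """Return a boolean mask for an image given as rows of (r, g, b) tuples."""
    return [[is_skin(*pixel) for pixel in row] for row in image]

def gaussian(x, center, width):
    """Gaussian membership function of the kind often used in fuzzy neural networks."""
    return math.exp(-(((x - center) / width) ** 2))

# Hypothetical rule base: (audio center, lip center, consequent speech score).
# In a trained FNN these parameters would be learned; here they are hand-picked.
RULES = [
    (0.2, 0.2, 0.0),  # weak audio cue, closed lips   -> non-speech
    (0.8, 0.2, 0.6),  # strong audio cue, closed lips -> possibly noise
    (0.2, 0.8, 0.7),  # weak audio cue, moving lips   -> likely speech in noise
    (0.8, 0.8, 1.0),  # both cues strong              -> speech
]

def fuzzy_speech_score(audio_feat, lip_feat, width=0.3):
    """Fuse an audio feature and a lip feature into a speech score in [0, 1].

    Each rule fires with the product of its two memberships; the output is
    the firing-strength-weighted average of the rule consequents.
    """
    strengths = [
        gaussian(audio_feat, a, width) * gaussian(lip_feat, l, width)
        for a, l, _ in RULES
    ]
    total = sum(strengths)
    if total == 0.0:  # all memberships underflowed; treat as non-speech
        return 0.0
    weighted = sum(s * c for s, (_, _, c) in zip(strengths, RULES))
    return weighted / total

if __name__ == "__main__":
    frame = [[(200, 150, 120), (0, 0, 255)]]  # one skin-like and one blue pixel
    print(skin_mask(frame))
    print(fuzzy_speech_score(0.2, 0.8))  # weak audio cue, visible lip motion
```

A frame could then be labeled as speech when the score exceeds a chosen threshold such as 0.5. In this sketch the visual rule lets lip motion raise the score when the audio cue is weak, which is the intuition the abstract gives for robustness at low SNR.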
Pages: 141-145
Page count: 5
Related papers
50 in total
  • [1] A Robust Audio-visual Speech Recognition Using Audio-visual Voice Activity Detection
    Tamura, Satoshi
    Ishikawa, Masato
    Hashiba, Takashi
    Takeuchi, Shin'ichi
    Hayamizu, Satoru
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2702 - +
  • [2] Audio-Visual Speech Recognition in Noisy Audio Environments
    Palecek, Karel
    Chaloupka, Josef
    2013 36TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2013, : 484 - 487
  • [3] VOICE ACTIVITY DETECTION USING AUDIO-VISUAL INFORMATION
    Petsatodis, Theodoros
    Pnevmatikakis, Aristodemos
    Boukis, Christos
    2009 16TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING, VOLS 1 AND 2, 2009, : 216 - +
  • [4] Active Audio-Visual Integration for Voice Activity Detection based on a Causal Bayesian Network
    Yoshida, Takami
    Nakadai, Kazuhiro
    2012 12TH IEEE-RAS INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS (HUMANOIDS), 2012, : 370 - 375
  • [5] Audio-Visual Voice Activity Detection Using Diffusion Maps
    Dov, David
    Talmon, Ronen
    Cohen, Israel
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (04) : 732 - 745
  • [6] Audio-Visual Voice Activity Detection Using Diffusion Maps
    Department of Electrical Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel
    IEEE Trans. Audio Speech Lang. Process., 4 (732-745):
  • [7] Adaptive Weighting Parameter in Audio-Visual Voice Activity Detection
    Buchbinder, Matar
    Buchris, Yaakov
    Cohen, Israel
    2016 IEEE INTERNATIONAL CONFERENCE ON THE SCIENCE OF ELECTRICAL ENGINEERING (ICSEE), 2016,
  • [8] Voice activity detection for driver using audio-visual integration
    Ninomiya, Yoshiki
    Ban, Yoshihide
    Maeno, Toshiki
    Negi, Daisuke
    Miyajima, Chiyomi
    Mori, Kensaku
    Kitasaka, Takayuki
    Suenaga, Yasuhito
    Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers, 2008, 62 (03): : 435 - 441
  • [9] An Improvement in Audio-Visual Voice Activity Detection for Automatic Speech Recognition
    Yoshida, Takami
    Nakadai, Kazuhiro
    Okuno, Hiroshi G.
    TRENDS IN APPLIED INTELLIGENT SYSTEMS, PT I, PROCEEDINGS, 2010, 6096 : 51 - +
  • [10] A deep architecture for audio-visual voice activity detection in the presence of transients
    Ariav, Ido
    Dov, David
    Cohen, Israel
    SIGNAL PROCESSING, 2018, 142 : 69 - 74