Fuzzy Neural Network with Audio-Visual Data for Voice Activity Detection in Noisy Environments

Cited by: 0

Authors:

Wu, Gin-Der [1]

Zhu, Zhen-Wei [1]

Affiliations:

[1] Natl Chi Nan Univ, Dept Elect Engn, Nantou, Taiwan

Source:

2018 INTERNATIONAL CONFERENCE ON INTELLIGENT AUTONOMOUS SYSTEMS (ICOIAS) | 2018

Keywords:

voice activity detection; speech boundary; fuzzy neural network; skin color segmentation; audio; visual; SYSTEM; RECOGNITION; FEATURES; ENTROPY;

DOI:

Not available

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Voice activity detection is a fundamental problem in speech processing that has been studied for decades. However, determining the speech boundary in noisy environments remains a major challenge because the corrupted speech is uncertain. To handle noisy data, this study adopts a fuzzy neural network (FNN) to process the uncertainty. Furthermore, human speech perception is bimodal: we lip-read in noisy environments to improve intelligibility. This idea inspires us to incorporate visual information into the voice activity detection system. Based on skin color segmentation, faces and mouths can be found in images. By analyzing the geometric shapes, the lip contour feature of the speaker can be extracted. The proposed fuzzy neural network then considers not only audio but also visual information. Compared with other voice activity detection methods, the proposed method is more robust under low signal-to-noise ratio (SNR) conditions.
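The skin color segmentation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which color space or thresholds the authors use, so everything below is an assumption: the RGB-to-CbCr conversion follows the full-range BT.601 (JFIF) formulas, the chroma ranges are a commonly cited choice for skin detection rather than values from the paper, and the names `rgb_to_cbcr`, `is_skin`, and `skin_mask` are hypothetical.

```python
# Assumed chroma ranges for skin pixels (a commonly cited choice,
# NOT taken from the paper).
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to Cb and Cr chroma (full-range BT.601 / JFIF)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(r, g, b):
    """Return True if the pixel's chroma falls inside the assumed skin ranges."""
    cb, cr = rgb_to_cbcr(r, g, b)
    return (CB_RANGE[0] <= cb <= CB_RANGE[1]
            and CR_RANGE[0] <= cr <= CR_RANGE[1])


def skin_mask(image):
    """Build a boolean mask for an image given as rows of (r, g, b) tuples."""
    return [[is_skin(*pixel) for pixel in row] for row in image]


if __name__ == "__main__":
    # Tiny two-pixel "image": one skin-like tone and one pure blue pixel.
    demo = [[(224, 172, 105), (0, 0, 255)]]
    print(skin_mask(demo))
```

In a full pipeline, connected regions of such a mask would be the candidates for the face and mouth search. Thresholding chroma only, and ignoring luma, is a common way to reduce sensitivity to lighting, but it is one design choice among several.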

Pages: 141-145

Page count: 5
