Statistical Model-Based Voice Activity Detection Using Spatial Cues and Log Energy for Dual-Channel Noisy Speech Recognition

Times cited: 0

Authors:

Park, Ji Hun [1]

Shin, Min Hwa [2]

Kim, Hong Kook [1]

Affiliations:

[1] Gwangju Inst Sci & Technol, Sch Informat & Commun, Kwangju 500712, South Korea

[2] Multimedia IP Res Ctr, Korea Elect Technol Inst, Seongnam 463816, South Korea

Source:

COMMUNICATION AND NETWORKING, PT II | 2010 / Vol. 120

Funding:

National Research Foundation of Singapore

Keywords:

Voice activity detection (VAD); end-point detection; dual-channel speech; speech recognition; spatial cues

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, a voice activity detection (VAD) method for dual-channel noisy speech recognition is proposed on the basis of statistical models constructed from spatial cues and log energy. In particular, the spatial cues consist of the interaural time differences (ITDs) and interaural level differences (ILDs) of dual-channel speech signals, and the statistical models for speech presence and absence are based on Gaussian kernel density estimates. To evaluate the proposed VAD method, speech recognition is performed using only the speech segments detected by the proposed VAD method. Its performance is then compared with that of two conventional methods: a signal-to-noise ratio (SNR) variance-based method and a phase vector-based method. The experiments show that the proposed VAD method outperforms both conventional methods, providing relative word error rate reductions of 19.5% and 12.2% over the SNR variance-based and phase vector-based methods, respectively.
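The abstract only summarizes the method, so the following is a minimal Python sketch of the general idea under stated assumptions: per-frame spatial cues (ITD, ILD) plus log energy, scored against Gaussian kernel density models of speech presence and absence. Assumptions not taken from the source: the ITD is the full-band cross-correlation peak lag in samples, the ILD is the broadband channel energy ratio in dB, the log energy is the mean power of both channels, the kernel is a product Gaussian with fixed hand-picked bandwidths, and the decision is a log-likelihood ratio test against a threshold. The names `frame_features`, `GaussianKDE`, and `is_speech` are hypothetical. This is not the authors' implementation, which may compute the cues per frequency band and build the models differently.

```python
# Hypothetical sketch: full-band ITD/ILD, fixed KDE bandwidths, and an LLR
# threshold are illustrative assumptions, not the paper's exact method.
import math
import random
from typing import Sequence, Tuple

Features = Tuple[float, float, float]  # (ITD in samples, ILD in dB, log energy)


def frame_features(left: Sequence[float], right: Sequence[float], max_lag: int) -> Features:
    """Compute simple full-band spatial cues and log energy for one stereo frame."""
    n = len(left)
    eps = 1e-12  # avoids log(0) on silent frames
    # ITD: lag maximizing sum(left[i] * right[i + lag]); a positive lag means
    # the right channel arrives later than the left channel.
    best_lag, best_corr = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        corr = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    # ILD: energy ratio between the two channels, in dB.
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    ild_db = 10.0 * math.log10((e_left + eps) / (e_right + eps))
    # Log energy: natural log of the mean power over both channels.
    log_energy = math.log((e_left + e_right) / (2 * n) + eps)
    return float(best_lag), ild_db, log_energy


class GaussianKDE:
    """Kernel density estimate with a diagonal (product) Gaussian kernel."""

    def __init__(self, samples: Sequence[Sequence[float]], bandwidths: Sequence[float]):
        self.samples = [tuple(s) for s in samples]
        self.h = tuple(bandwidths)
        # Log of the product of the 1-D Gaussian normalizers, one per dimension.
        self.log_norm = -sum(math.log(h * math.sqrt(2.0 * math.pi)) for h in self.h)

    def log_pdf(self, x: Sequence[float]) -> float:
        # Log of the average kernel value, computed with log-sum-exp for stability.
        terms = [
            self.log_norm
            - 0.5 * sum(((xi - si) / hi) ** 2 for xi, si, hi in zip(x, s, self.h))
            for s in self.samples
        ]
        m = max(terms)
        return m + math.log(sum(math.exp(t - m) for t in terms)) - math.log(len(terms))


def is_speech(x: Features, speech: GaussianKDE, noise: GaussianKDE,
              threshold: float = 0.0) -> bool:
    """Declare speech when the log-likelihood ratio exceeds the threshold."""
    return speech.log_pdf(x) - noise.log_pdf(x) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    left = [rng.gauss(0.0, 1.0) for _ in range(256)]
    right = [0.0, 0.0] + left[:-2]  # right channel delayed by 2 samples
    print(frame_features(left, right, max_lag=5))
```

In practice, the two `GaussianKDE` models would be built from feature vectors of frames labeled as speech and as non-speech, and the threshold tuned on held-out data; both steps are assumptions of this sketch.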


Pages: 172 / +

Number of pages: 2
