USE OF PITCH CONTINUITY FOR ROBUST SPEECH ACTIVITY DETECTION

Authors
Shao, Yiwen [1 ,2 ]
Lin, Qiguang [1 ]
Affiliations
[1] Baihu Technol Co Ltd, Guangzhou, Guangdong, Peoples R China
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Keywords
autocorrelation function; speech activity detection; pitch continuity; pitch detection
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Speech activity detection (SAD) is an important component of many speech processing applications and has been studied extensively in recent years. Pitch continuity, a significant characteristic of speech, has nevertheless not been used successfully in existing SAD methods. In this work, we propose a novel way to integrate pitch continuity with pitch-related features. The idea is implemented on top of the Combo-SAD approach: we examine three consecutive frames and assume, because of pitch continuity, that all three have the same pitch as the center frame. The corresponding feature values are recomputed at the adjusted pitch location and then used in the final expression. The new combo feature is evaluated under various types of additive noise at different signal-to-noise ratios (SNRs). The results show that the new feature improves SAD performance, with up to a 39.3% relative improvement in miss rate over Combo-SAD. We also introduce a novel variant of the underlying autocorrelation function and illustrate how it can improve the accuracy of pitch detection.
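The three-frame idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the Combo-SAD feature set, the "final expression", or the new autocorrelation variant, so the sketch assumes a standard energy-normalized autocorrelation as the only pitch-related feature and assumes a plain average over the three frames. The function names, the 60 to 400 Hz pitch search range, and the frame layout are all hypothetical.

```python
import numpy as np


def normalized_acf(frame, min_lag, max_lag):
    """Energy-normalized autocorrelation of one frame for lags min_lag..max_lag.

    Assumption: a textbook normalized ACF, not the variant proposed in the paper.
    """
    frame = frame - np.mean(frame)
    energy = np.dot(frame, frame)
    lags = np.arange(min_lag, max_lag + 1)
    if energy <= 0.0:
        return lags, np.zeros(len(lags))
    acf = np.array([np.dot(frame[:-lag], frame[lag:]) for lag in lags]) / energy
    return lags, acf


def continuity_feature(frames, sample_rate, f_min=60.0, f_max=400.0):
    """Score each interior frame using its neighbours at the center frame's pitch lag.

    frames: 2-D array, one frame per row (frame length must exceed the max lag).
    Returns (scores, pitch_lags); the first and last frames keep a score of 0.
    """
    frames = np.asarray(frames, dtype=float)
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    lags = np.arange(min_lag, max_lag + 1)
    acfs = np.array([normalized_acf(f, min_lag, max_lag)[1] for f in frames])

    scores = np.zeros(len(frames))
    pitch_lags = np.zeros(len(frames), dtype=int)
    for t in range(1, len(frames) - 1):
        k = int(np.argmax(acfs[t]))          # pitch lag index of the center frame
        pitch_lags[t] = lags[k]
        # Re-read the ACF of both neighbours at the center frame's lag and
        # combine the three values (assumption: simple mean).
        scores[t] = acfs[t - 1:t + 2, k].mean()
    return scores, pitch_lags
```

For a steady periodic signal, all three frames have a strong autocorrelation peak at the same lag, so the combined score stays high. For noise, a spurious peak in the center frame is unlikely to recur at the same lag in its neighbours, so the score drops.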
Pages: 5534-5538
Number of pages: 5