Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering

Authors
Sarkar, Eklavya [1, 2]
Prasad, RaviShankar [1]
Doss, Mathew Magimai [1]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
Source
INTERSPEECH 2022 | 2022
Funding
Swiss National Science Foundation
Keywords
Voice activity detection; zero-frequency filtering; speech analysis; signal processing; noise
DOI
10.21437/Interspeech.2022-10535
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Voice activity detection (VAD) is an important pre-processing step for speech technology applications. The task consists of deriving the boundaries of segments of an audio signal that contain voicing information. In recent years, it has been shown that voice source and vocal tract system information can be extracted using zero-frequency filtering (ZFF) without making any explicit model assumptions about the speech signal. This paper investigates the potential of zero-frequency filtering for jointly modeling voice source and vocal tract system information, and proposes two approaches for VAD. The first approach demarcates voiced regions using a composite signal composed of different zero-frequency filtered signals. The second approach feeds the composite signal as input to the rVAD algorithm. These approaches are compared with other supervised and unsupervised VAD methods in the literature, and are evaluated on the Aurora2 database across a range of SNRs (20 to -5 dB). Our studies show that the proposed ZFF-based methods perform comparably to state-of-the-art VAD methods and are more invariant to added degradation and different channel characteristics.
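For readers unfamiliar with zero-frequency filtering, the following is a minimal sketch of the generic ZFF operation as it is commonly described in the literature: difference the signal, pass it through two cascaded zero-frequency resonators, then remove the resulting polynomial trend by repeated local-mean subtraction. It is not the composite signal or the VAD decision logic proposed in this paper. The function names, the 10 ms window, and the three trend-removal passes are illustrative assumptions.

```python
import numpy as np


def zero_frequency_filter(signal, sample_rate, window_ms=10.0, trend_passes=3):
    """Generic ZFF sketch (assumed parameters, not the paper's settings)."""
    x = np.asarray(signal, dtype=np.float64)

    # Difference the signal to remove any constant (DC) offset.
    x = np.diff(x, prepend=x[0])

    # Each zero-frequency resonator, y[n] = 2*y[n-1] - y[n-2] + x[n],
    # is a double cumulative sum; two cascaded resonators give four sums.
    y = x
    for _ in range(4):
        y = np.cumsum(y)

    # The integrations leave a growing polynomial trend. Remove it by
    # subtracting a centered moving average, repeated a few times.
    half = max(1, int(round(window_ms * 1e-3 * sample_rate / 2)))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(trend_passes):
        # mode="same" zero-pads, so samples near both ends are unreliable.
        y = y - np.convolve(y, kernel, mode="same")
    return y


def rising_zero_crossings(y):
    """Indices where y goes from negative to non-negative."""
    y = np.asarray(y)
    return np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1


if __name__ == "__main__":
    sr = 8000
    excitation = np.zeros(sr)
    excitation[::80] = 1.0  # synthetic 100 Hz impulse train
    zff = zero_frequency_filter(excitation, sr)
    interior = zff[1000:7001]  # skip edge-affected samples
    print(len(rising_zero_crossings(interior)), "rising crossings in 0.75 s")
```

In this sketch the trend-removal window is on the order of one pitch period of the synthetic excitation; for real speech the window is usually tied to the average pitch period, which would have to be estimated separately.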
Pages: 4626-4630
Page count: 5