Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering

Times cited: 0

Authors:

Sarkar, Eklavya [1, 2]

Prasad, RaviShankar [1]

Doss, Mathew Magimai [1]

Affiliations:

[1] Idiap Res Inst, Martigny, Switzerland

[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland

Source:

INTERSPEECH 2022 | 2022

Funding:

Swiss National Science Foundation

Keywords:

Voice activity detection; zero-frequency filtering; speech analysis; signal processing; noise

DOI:

10.21437/Interspeech.2022-10535

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Voice activity detection (VAD) is an important pre-processing step for speech technology applications. The task consists of deriving the boundaries of audio segments that contain voicing information. In recent years, it has been shown that voice source and vocal tract system information can be extracted using zero-frequency filtering (ZFF) without making any explicit model assumptions about the speech signal. This paper investigates the potential of zero-frequency filtering for jointly modeling voice source and vocal tract system information, and proposes two approaches for VAD. The first approach demarcates voiced regions using a composite signal composed of different zero-frequency filtered signals. The second approach feeds the composite signal as input to the rVAD algorithm. These approaches are compared with other supervised and unsupervised VAD methods in the literature, and are evaluated on the Aurora2 database across a range of SNRs (20 to −5 dB). Our studies show that the proposed ZFF-based methods perform comparably to state-of-the-art VAD methods and are more invariant to added degradation and different channel characteristics.
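The abstract relies on zero-frequency filtering without spelling out the operation. As a rough orientation, the following is a minimal pure-Python sketch of the textbook ZFF recipe: difference the signal, pass it through a cascade of two ideal zero-frequency resonators (each with a double pole at z = 1), then remove the resulting polynomial trend by repeatedly subtracting a local mean. It is not the authors' code and does not reproduce their composite signal or either VAD approach; the function name `zero_frequency_filter`, the 10 ms trend-removal window, and the three removal passes are assumptions chosen for illustration.

```python
import math


def zero_frequency_filter(signal, fs, win_ms=10.0, passes=3):
    """Zero-frequency filtering (ZFF) of a 1-D signal sampled at fs Hz.

    Sketch of the textbook ZFF recipe; win_ms and passes are assumed
    defaults, not values taken from the paper.
    """
    if not signal:
        return []
    # 1) Difference the signal to remove any DC offset (s[-1] taken as 0).
    x = [signal[0]] + [signal[n] - signal[n - 1] for n in range(1, len(signal))]

    # 2) Cascade of two ideal zero-frequency resonators, each with a
    #    double pole at z = 1: y[n] = x[n] + 2*y[n-1] - y[n-2].
    y = x
    for _ in range(2):
        out, prev1, prev2 = [], 0.0, 0.0
        for v in y:
            cur = v + 2.0 * prev1 - prev2
            out.append(cur)
            prev1, prev2 = cur, prev1
        y = out

    # 3) The resonators make the output grow polynomially; remove the trend
    #    by repeatedly subtracting a local mean over a window of about one
    #    pitch period (windows are truncated at the signal edges).
    half = max(1, int(round(win_ms * 1e-3 * fs / 2)))
    n_total = len(y)
    for _ in range(passes):
        prefix = [0.0]
        for v in y:
            prefix.append(prefix[-1] + v)
        detrended = []
        for n in range(n_total):
            lo, hi = max(0, n - half), min(n_total, n + half + 1)
            detrended.append(y[n] - (prefix[hi] - prefix[lo]) / (hi - lo))
        y = detrended
    return y


if __name__ == "__main__":
    fs = 8000  # Aurora2 audio is sampled at 8 kHz
    # Synthetic input: 0.25 s silence, 0.5 s of a 100 Hz tone, 0.25 s silence.
    sig = ([0.0] * 2000
           + [math.sin(2 * math.pi * 100 * n / fs) for n in range(4000)]
           + [0.0] * 2000)
    zff = zero_frequency_filter(sig, fs)
    silent = sum(v * v for v in zff[:1600])
    voiced = sum(v * v for v in zff[3000:5000])
    print(f"energy in silence: {silent:.3g}, in tone: {voiced:.3g}")
```

Samples within a few trend-removal windows of either end of the input are unreliable, because the local-mean windows are truncated there; in practice such edges are usually discarded or padded. The resonator output also grows rapidly with signal length, so long recordings may need to be processed in blocks.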

Pages: 4626–4630

Page count: 5