A voice activity detection algorithm in spectro-temporal domain using sparse representation

Authors
Mohadese Eshaghi
Farbod Razzazi
Alireza Behrad
Affiliations
[1] Islamic Azad University, Department of Electrical and Computer Engineering
[2] Shahed University, Electrical and Electronic Engineering Department
Keywords
Speech processing; Voice activity detector; VAD; Spectro-temporal domain representation; Sparse representation
Abstract
This paper describes a new algorithm for voice activity detection (VAD) based on sparse representation in the spectro-temporal domain. The audio classification algorithm uses multi-scale spectro-temporal modulation features extracted with an auditory cortex model. The key concept in sparse representation is that any speech fragment can be represented as a linear combination of a small number of exemplar speech tokens. The algorithm first transforms the speech into the spectro-temporal domain, decomposing it into auditory-based features at multiple scales of temporal and spectral resolution. Next, each frame is divided into several sub-cubes in the new domain. Finally, the algorithm detects speech in the signal using the sparse representation of the frames' sub-cubes in this domain. Simulation results illustrate the effectiveness of the new VAD algorithm. The results reveal that the achieved performance is 90.11% and 91.75% at −5 dB SNR in white and car noise, respectively, outperforming most state-of-the-art VAD algorithms.
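The pipeline in the abstract (split each frame into sub-cubes, sparse-code each sub-cube over exemplars, then decide speech or non-speech) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: random arrays stand in for the auditory-cortex features, the cube is split along its last axis, orthogonal matching pursuit (OMP) is used as the sparse solver, and the decision is a majority vote over per-class reconstruction residuals. All function and variable names are hypothetical.

```python
import numpy as np

def omp(D, x, k):
    """Greedy orthogonal matching pursuit: approximate x with at most k columns of D."""
    residual, support, coef = x.copy(), [], np.zeros(D.shape[1])
    for _ in range(k):
        idx = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if idx in support:
            break
        support.append(idx)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)  # refit on chosen atoms
        residual = x - D[:, support] @ sol
    coef[support] = sol
    return coef

def class_residual(D, x, coef, mask):
    """Reconstruction error when only the atoms selected by mask are kept."""
    return np.linalg.norm(x - D[:, mask] @ coef[mask])

def split_into_subcubes(frame, n_sub):
    """Split a 3-D feature cube into n_sub flattened blocks along its last axis (assumed layout)."""
    return [block.ravel() for block in np.array_split(frame, n_sub, axis=-1)]

def is_speech(frame, dicts, labels, n_sub=4, k=3):
    """Majority vote over sub-cubes; labels is True for speech atoms, False for noise atoms."""
    votes = 0
    for x, D in zip(split_into_subcubes(frame, n_sub), dicts):
        coef = omp(D, x, k)
        votes += int(class_residual(D, x, coef, labels) < class_residual(D, x, coef, ~labels))
    return bool(votes > n_sub / 2)

# Synthetic stand-in data: one random unit-norm dictionary per sub-cube position.
rng = np.random.default_rng(0)
shape, n_sub, n_atoms = (4, 4, 16), 4, 8           # 8 speech + 8 noise atoms per dictionary
dim = shape[0] * shape[1] * shape[2] // n_sub      # length of one flattened sub-cube
labels = np.arange(2 * n_atoms) < n_atoms          # first half of the atoms = speech
dicts = []
for _ in range(n_sub):
    D = rng.standard_normal((dim, 2 * n_atoms))
    dicts.append(D / np.linalg.norm(D, axis=0))

def make_frame(atom_ids):
    """Build a cube whose sub-cubes are each a combination of two chosen atoms."""
    blocks = [(D[:, atom_ids] @ np.array([1.0, 0.5])).reshape(shape[0], shape[1], -1)
              for D in dicts]
    return np.concatenate(blocks, axis=-1)

speech_frame = make_frame([0, 2])                      # built from speech atoms only
noise_frame = make_frame([n_atoms, n_atoms + 2])       # built from noise atoms only
print(is_speech(speech_frame, dicts, labels), is_speech(noise_frame, dicts, labels))
```

In a real system the dictionaries would be filled with exemplar sub-cubes taken from labelled speech and noise recordings rather than random vectors, and the sparsity level `k` and the number of sub-cubes would need tuning.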
Pages: 1791-1803
Page count: 12
Source
International Journal of Machine Learning and Cybernetics, 2019, 10 (07): 1791-1803