Sixty Years of Frequency-Domain Monaural Speech Enhancement: From Traditional to Deep Learning Methods

Cited by: 26
Authors
Zheng, Chengshi [1, 2, 4, 5]
Zhang, Huiyong [1, 2]
Liu, Wenzhe [1, 2]
Luo, Xiaoxue [1, 2]
Li, Andong [1, 2]
Li, Xiaodong [1, 2]
Moore, Brian C. J. [3]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Noise & Vibrat Res, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Univ Cambridge, Dept Psychol, Cambridge Hearing Grp, Cambridge, England
[4] Chinese Acad Sci, Inst Acoust, Key Lab Noise & Vibrat Res, Beijing 100190, Peoples R China
[5] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Source
TRENDS IN HEARING | 2023 / Vol. 27
Keywords
speech enhancement; speech dereverberation; multistage learning; noise estimation; deep complex network; GENERALIZED SPECTRAL SUBTRACTION; NOISE-REDUCTION ALGORITHM; RECURRENT NEURAL-NETWORKS; SQUARE ERROR ESTIMATION; HEARING-AID DELAYS; STATISTICAL-MODEL; SOURCE SEPARATION; MMSE ESTIMATOR; MUSICAL NOISE; PHASE;
DOI
10.1177/23312165231209913
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Frequency-domain monaural speech enhancement has been extensively studied for over 60 years, and a great number of methods have been proposed and applied to many devices. In the last decade, monaural speech enhancement has made tremendous progress with the advent and development of deep learning, and performance using such methods has been greatly improved relative to traditional methods. This survey paper first provides a comprehensive overview of traditional and deep-learning methods for monaural speech enhancement in the frequency domain. The fundamental assumptions of each approach are then summarized and analyzed to clarify their limitations and advantages. A comprehensive evaluation of some typical methods was conducted using the WSJ + Deep Noise Suppression (DNS) challenge and Voice Bank + DEMAND datasets to give an intuitive and unified comparison. The benefits of monaural speech enhancement methods using objective metrics relevant for normal-hearing and hearing-impaired listeners were evaluated. The objective test results showed that compression of the input features was important for simulated normal-hearing listeners but not for simulated hearing-impaired listeners. Potential future research and development topics in monaural speech enhancement are suggested.
Pages: 52
Related papers
50 items in total
  • [1] A Time-domain Monaural Speech Enhancement with Feedback Learning
    Li, Andong
    Zheng, Chengshi
    Cheng, Linjuan
    Peng, Renhua
    Li, Xiaodong
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 769 - 774
  • [2] FREQUENCY-DOMAIN ADAPTIVE POSTFILTERING FOR ENHANCEMENT OF NOISY SPEECH
    WANG, FM
    KABAL, P
    RAMACHANDRAN, RP
    OSHAUGHNESSY, D
    SPEECH COMMUNICATION, 1993, 12 (01) : 41 - 56
  • [3] Speech feature enhancement based on frequency-domain ICA
    Lü, Zhao
    Wu, Xiao-Pei
    Li, Mi
    Zhendong yu Chongji/Journal of Vibration and Shock, 2011, 30 (02): : 238 - 242
  • [4] Speech Enhancement: Traditional and Deep Learning Techniques
    Gaddamedi, Satya Prasad
    Patel, Anuj
    Chandra, Sabyasachi
    Bharati, Puja
    Ghosh, Nirmalya
    Das Mandal, Shyamal Kumar
    PROCEEDINGS OF 27TH INTERNATIONAL SYMPOSIUM ON FRONTIERS OF RESEARCH IN SPEECH AND MUSIC, FRSM 2023, 2024, 1455 : 75 - 86
  • [5] Depression Diagnosis Modeling With Advanced Computational Methods: Frequency-Domain eMVAR and Deep Learning
    Uyulan, Caglar
    de la Salle, Sara
    Erguzel, Turker T.
    Lynn, Emma
    Blier, Pierre
    Knott, Verner
    Adamson, Maheen M.
    Zelka, Mehmet
    Tarhan, Nevzat
    CLINICAL EEG AND NEUROSCIENCE, 2022, 53 (01) : 24 - 36
  • [6] Frequency-domain beamformers using conjugate gradient techniques for speech enhancement
    Zhao, Shengkui
    Jones, Douglas L.
    Khoo, Suiyang
    Man, Zhihong
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2014, 136 (03): : 1160 - 1175
  • [7] Frequency-domain beamformers using conjugate gradient techniques for speech enhancement
    Zhao, Shengkui (shengkui.zhao@adsc.com.sg), 1600, Acoustical Society of America (136):
  • [8] A Comparative Study of Time and Frequency Domain Approaches to Deep Learning based Speech Enhancement
    Nossier, Soha A.
    Wall, Julie
    Moniri, Mansour
    Glackin, Cornelius
    Cannings, Nigel
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [9] Acoustic Modeling from Frequency-Domain Representations of Speech
    Ghahremani, Pegah
    Hadian, Hossein
    Lv, Hang
    Povey, Daniel
    Khudanpur, Sanjeev
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1596 - 1600
  • [10] IMPROVING ROBUSTNESS OF DEEP LEARNING BASED MONAURAL SPEECH ENHANCEMENT AGAINST PROCESSING ARTIFACTS
    Tan, Ke
    Wang, DeLiang
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6914 - 6918