Multi-target ensemble learning based speech enhancement with temporal-spectral structured target

Times cited: 3
Authors
Wang, Wenbo [1]
Guo, Weiwei [2,3,4]
Liu, Houguang [1]
Yang, Jianhua [1]
Liu, Songyong [1]
Affiliations
[1] China Univ Min & Technol, Sch Mechatron Engn, Xuzhou 221116, Peoples R China
[2] Chinese Peoples Liberat Army Gen Hosp, Coll Otolaryngol Head & Neck Surg, Beijing 100853, Peoples R China
[3] Natl Clin Res Ctr Otolaryngol Dis, Beijing 100853, Peoples R China
[4] Minist Educ, Key Lab Hearing Sci, Beijing 100853, Peoples R China
Keywords
Speech enhancement; Temporal-spectral structured target; Multi-target ensemble learning; Sparse nonnegative matrix factorization; RECURRENT NEURAL-NETWORKS; TRAINING TARGETS; NOISE; SEPARATION; FEATURES; QUALITY; BINARY; INTELLIGIBILITY; RECOGNITION; ALGORITHM;
DOI
10.1016/j.apacoust.2023.109268
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Recently, deep neural network (DNN)-based speech enhancement has shown considerable success, with mapping-based and masking-based approaches being the two most commonly used methods. However, these methods do not consider the spectral structure of the signal. In this paper, a novel structured multi-target ensemble learning (SMTEL) framework is proposed, which uses the temporal-spectral structures of the targets to improve speech quality and intelligibility. First, the basis matrices of clean speech, noise, and the ideal ratio mask (IRM), which contain the basic structures of the signal, are captured by sparse nonnegative matrix factorization. Second, the basis matrices are co-trained with a multi-target DNN that estimates the activation matrices instead of estimating the targets directly. Then, a jointly trained single-layer perceptron is proposed to integrate the two targets and further improve speech quality and intelligibility. The sequential floating forward selection method is used to systematically analyze the impact of the integrated targets on enhancement performance, as well as the effect of the target weights on the results. Finally, the proposed method is combined with progressive learning to further improve enhancement performance. Systematic experiments on the UW/NU corpus show that the proposed method achieves the best enhancement at low network cost and complexity, especially in seen nonstationary noise environments. In restaurant noise, compared with the target integration method that does not use structured targets and with the long short-term memory (LSTM) masking method, the proposed method improves speech quality by 25.6% and 29.2% and speech intelligibility by 35.5% and 15.8%, respectively. (c) 2023 Elsevier Ltd. All rights reserved.
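The first step described in the abstract, factoring a target such as the IRM into a basis matrix and an activation matrix with sparse NMF, can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a generic sparse-NMF variant (Euclidean cost, L1 penalty on the activations, multiplicative updates, heuristic unit-norm basis columns), the common IRM definition with exponent 0.5, and random toy spectrograms. The function names `ideal_ratio_mask` and `sparse_nmf`, the rank, and the sparsity weight are hypothetical. In SMTEL the activations are predicted by the DNN; here a fixed-basis NMF update stands in for that network only to show how a fixed basis `W` turns activations `H` back into a target via `W @ H`.

```python
import numpy as np

# Hypothetical helpers: a generic sparse-NMF variant (assumption),
# not the exact algorithm used in the paper.

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Common IRM definition: (S^2 / (S^2 + N^2)) ** beta per T-F bin."""
    s_pow = speech_mag ** 2
    n_pow = noise_mag ** 2
    return (s_pow / (s_pow + n_pow + 1e-12)) ** beta


def sparse_nmf(V, rank, sparsity=0.1, n_iter=200, seed=0, W=None):
    """Approximate nonnegative V (freq x frames) as W @ H.

    Multiplicative updates for 0.5 * ||V - W H||_F^2 + sparsity * sum(H).
    If W is given it is held fixed (rank is then ignored) and only the
    activations H are estimated.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], rank)) + eps
        W /= np.linalg.norm(W, axis=0, keepdims=True)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Activation update; the constant `sparsity` in the denominator is
        # the gradient of the L1 penalty and shrinks H toward zero.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
        if learn_W:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
            # Heuristic: renormalize basis columns to unit L2 norm and
            # rescale H so that the product W @ H is unchanged.
            norms = np.linalg.norm(W, axis=0, keepdims=True) + eps
            W /= norms
            H *= norms.T
    return W, H


# Toy magnitudes standing in for clean-speech and noise spectrograms.
rng = np.random.default_rng(1)
speech = rng.random((64, 100))
noise = rng.random((64, 100))
mask = ideal_ratio_mask(speech, noise)

# Learn an IRM basis, then re-estimate activations with that basis fixed.
W_irm, _ = sparse_nmf(mask, rank=16)
_, H_new = sparse_nmf(mask, rank=16, W=W_irm)
mask_hat = W_irm @ H_new
print("relative error:", np.linalg.norm(mask - mask_hat) / np.linalg.norm(mask))
```

Normalizing the basis columns matters because, without it, the L1 penalty could be sidestepped by scaling `W` up and `H` down; the renormalize-and-rescale step used here is a simple heuristic rather than a projected update with a convergence guarantee.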
Pages: 13
Related papers (50 in total)
  • [1] Multi-target Ensemble Learning for Monaural Speech Separation
    Zhang, Hui
    Zhang, Xueliang
    Gao, Guanglai
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1958 - 1962
  • [2] A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement
    Tu, Yan-Hui
    Du, Jun
    Gao, Tian
    Lee, Chin-Hui
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1608 - 1619
  • [3] PART-BASED MULTI-TARGET TRACKING WITH STRUCTURED LEARNING
    Zhu, Da-Yong
    Zhang, Xin-Li
    [J]. 2013 10TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2013, : 104 - 107
  • [4] Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning
    Abdullah, Salinna
    Zamani, Majid
    Demosthenous, Andreas
    [J]. IEEE OPEN JOURNAL OF CIRCUITS AND SYSTEMS, 2024, 5 : 141 - 152
  • [5] Constrained Energy Minimization for Hyperspectral Multi-target Detection Based on Ensemble Learning
    Wu, Qinggang
    Liu, Zhongchi
    [J]. ADVANCED DATA MINING AND APPLICATIONS, ADMA 2021, PT II, 2022, 13088 : 406 - 416
  • [6] A Speech Enhancement Neural Network Architecture with SNR-Progressive Multi-Target Learning for Robust Speech Recognition
    Zhou, Nan
    Du, Jun
    Tu, Yan-Hui
    Gao, Tian
    Lee, Chin-Hui
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 873 - 877
  • [7] A Novel Approach based on Spectral-Temporal Information Fusion for Multi-Target Detection
    Zhang, Guoliang
    Yang, Chunling
    Zhang, Yan
    Jiao, Yang
    [J]. PROCEEDINGS OF THE 2016 IEEE 11TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA), 2016, : 1661 - 1665
  • [8] Online Discriminative Structured Output SVM Learning for Multi-Target Tracking
    Xu, Yingkun
    Qin, Lei
    Li, Guorong
    Huang, Qingming
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2014, 21 (02) : 190 - 194
  • [9] Incorporating Multi-Target in Multi-Stage Speech Enhancement Model for Better Generalization
    Zhang, Lu
    Wang, Mingjiang
    Li, Andong
    Zhang, Zehua
    Zhuang, Xuyi
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 553 - 558
  • [10] PROGRESSIVE MULTI-TARGET NETWORK BASED SPEECH ENHANCEMENT WITH SNR-PRESELECTION FOR ROBUST SPEAKER DIARIZATION
    Sun, Lei
    Du, Jun
    Zhang, Xueyang
    Gao, Tian
    Fang, Xin
    Lee, Chin-Hui
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7099 - 7103