Two-Stage Learning and Fusion Network With Noise Aware for Time-Domain Monaural Speech Enhancement

Cited by: 5
Authors
Xiang, Xiaoxiao [1 ,2 ,3 ]
Zhang, Xiaojuan [1 ,2 ]
Chen, Haozhe [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Key Lab Electromagnet Radiat & Sensing Technol, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Convolution; Decoding; Logic gates; Speech enhancement; Training; Noise measurement; Signal to noise ratio; noise aware; gated linear unit; two-stage network; dilated dense block; RECURRENT NEURAL-NETWORK; ATTENTION;
DOI
10.1109/LSP.2021.3105925
CLC classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline codes
0808 ; 0809 ;
Abstract
The neural network has become a new and powerful paradigm in speech enhancement, triggering a surge of research. However, most existing methods directly predict speech and ignore the noise information. In this paper, we propose a two-stage learning and fusion network with noise awareness for time-domain monaural speech enhancement, which can be regarded as a progressive learning process. More specifically, in the first network, speech and noise are estimated simultaneously. The estimated speech and the signal obtained by subtracting the estimated noise from the noisy speech are stacked with the noisy speech as input to obtain further refined speech in the second network. Both networks are mainly built on encoders and decoders with skip connections. To better control the information flow in the network, we introduce the gated linear unit in the encoder and the decoder, which can also help model complex interactions. Dilated dense blocks are added after each layer of the encoder and decoder to improve model efficiency and enlarge the receptive field. Our experiments confirm that the proposed two-stage learning network with noise awareness achieves better performance than several advanced systems under various conditions.
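The stage-2 input construction described in the abstract (stacking the stage-1 speech estimate, the noise-subtracted signal, and the noisy waveform) and the gated linear unit can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: all shapes, weights, sample rate, and the stand-in stage-1 estimates are assumptions.

```python
import numpy as np

def glu(x, w_lin, w_gate):
    """Gated linear unit: a linear path modulated by a sigmoid gate,
    letting the network control how much information flows through."""
    gate = 1.0 / (1.0 + np.exp(-(x @ w_gate)))
    return (x @ w_lin) * gate

rng = np.random.default_rng(0)

# Toy GLU on a (frames, features) block -- shapes are illustrative.
x = rng.standard_normal((5, 4)).astype(np.float32)
y = glu(x, rng.standard_normal((4, 4)), rng.standard_normal((4, 4)))
assert y.shape == (5, 4)

# Stage-2 input: stack the stage-1 speech estimate, the signal obtained by
# subtracting the stage-1 noise estimate from the noisy waveform, and the
# noisy waveform itself, along a channel axis.
T = 16000  # 1 s at an assumed 16 kHz sample rate
noisy = rng.standard_normal(T).astype(np.float32)
est_speech = 0.8 * noisy  # stand-in for the stage-1 speech estimate
est_noise = 0.2 * noisy   # stand-in for the stage-1 noise estimate
stage2_input = np.stack([est_speech, noisy - est_noise, noisy], axis=0)
assert stage2_input.shape == (3, T)
```

The three-channel stack gives the second network both direct (estimated speech) and indirect (noise-subtracted) views of the clean signal alongside the raw observation, which is what makes the second stage a refinement step rather than an independent estimator.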
Pages: 1754 - 1758
Page count: 5
Related Papers
50 records
  • [1] A Time-domain Monaural Speech Enhancement with Feedback Learning
    Li, Andong
    Zheng, Chengshi
    Cheng, Linjuan
    Peng, Renhua
    Li, Xiaodong
    [J]. 2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 769 - 774
  • [2] On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement
    Kolbaek, Morten
    Tan, Zheng-Hua
    Jensen, Soren Holdt
    Jensen, Jesper
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 825 - 838
  • [3] Group Multi-Scale Convolutional Network for Monaural Speech Enhancement in Time-domain
    Yu, Juntao
    Jiang, Ting
    Yu, Jiacheng
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 646 - 650
  • [4] TSTNN: TWO-STAGE TRANSFORMER BASED NEURAL NETWORK FOR SPEECH ENHANCEMENT IN THE TIME DOMAIN
    Wang, Kai
    He, Bengbeng
    Zhu, Wei-Ping
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7098 - 7102
  • [5] Convolutional fusion network for monaural speech enhancement
    Xian, Yang
    Sun, Yang
    Wang, Wenwu
    Naqvi, Syed Mohsen
    [J]. NEURAL NETWORKS, 2021, 143 : 97 - 107
  • [6] A two-stage frequency-time dilated dense network for speech enhancement
    Huang, Xiangdong
    Chen, Honghong
    Lu, Wei
    [J]. APPLIED ACOUSTICS, 2022, 201
  • [7] Two-Stage Multi-Target Joint Learning for Monaural Speech Separation
    Nie, Shuai
    Liang, Shan
    Xue, Wei
    Zhang, Xueliang
    Liu, Wenju
    Dong, Like
    Yang, Hong
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1503 - 1507
  • [8] PAN: PHONEME-AWARE NETWORK FOR MONAURAL SPEECH ENHANCEMENT
    Du, Zhihao
    Lei, Ming
    Han, Jiqing
    Zhang, Shiliang
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6634 - 6638
  • [9] A Two-Stage Phase-Aware Approach for Monaural Multi-Talker Speech Separation
    Yin, Lu
    Li, Junfeng
    Yan, Yonghong
    Akagi, Masato
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (07): : 1732 - 1743
  • [10] A NOISE PREDICTION AND TIME-DOMAIN SUBTRACTION APPROACH TO DEEP NEURAL NETWORK BASED SPEECH ENHANCEMENT
    Odelowo, Babafemi O.
    Anderson, David V.
    [J]. 2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2017, : 372 - 377