Multi-resolution Stacking for Speech Separation Based on Boosted DNN

Authors
Zhang, Xiao-Lei [1]
Wang, DeLiang
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Keywords
boosted deep neural networks; contextual information; multi-resolution stacking; speech separation; ENHANCEMENT;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Recent progress in speech separation shows that supervised methods based on deep neural networks (DNNs) can improve performance in difficult noise conditions and generalize well to unseen noise scenarios. However, existing approaches do not sufficiently exploit contextual information. In this paper, we focus on exploiting contextual information using DNNs. The proposed method has two parts: a multi-resolution stacking (MRS) framework and a boosted DNN (bDNN) classifier. The MRS framework trains a stack of classifier ensembles, where each classifier in an ensemble concatenates the raw acoustic feature with the outputs of the ensemble below it to form a new feature, and different classifiers in an ensemble work with different window lengths. The bDNN classifier first generates multiple base predictions for a frame from a given window that is centered on the frame and contains multiple neighboring frames, and then aggregates the base predictions into the final prediction. Our experimental comparison with DNN-based speech separation in difficult noise scenarios demonstrates the effectiveness of the proposed method in terms of both prediction accuracy and objective speech intelligibility.
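The aggregation step described for the bDNN classifier can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how base predictions are combined, so plain averaging is an assumption here, as are the function name `aggregate_window_predictions`, the `half_width` parameter, and the layout of `window_preds` (one list of per-frame predictions for each window center).

```python
def aggregate_window_predictions(window_preds, half_width):
    """Average all base predictions that refer to the same frame.

    Assumed layout (hypothetical, not from the paper): window_preds[c][k]
    is the prediction that the window centered on frame c makes for
    frame c - half_width + k, with k in 0 .. 2 * half_width.
    """
    num_frames = len(window_preds)
    sums = [0.0] * num_frames
    counts = [0] * num_frames
    for center, preds in enumerate(window_preds):
        for k, p in enumerate(preds):
            frame = center - half_width + k
            # Windows near the edges reach outside the utterance; skip those slots.
            if 0 <= frame < num_frames:
                sums[frame] += p
                counts[frame] += 1
    # Every frame is covered by at least its own centered window, so counts >= 1.
    return [s / c for s, c in zip(sums, counts)]


# Three frames, windows of length 3 (half_width = 1).
base_predictions = [
    [0.0, 1.0, 0.5],
    [0.5, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
final_predictions = aggregate_window_predictions(base_predictions, half_width=1)
```

Each frame in the interior receives one base prediction from every window that contains it, so a wider window gives more base predictions to aggregate per frame.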
Pages: 1745-1749
Page count: 5