Speech Enhancement Based on Discrete Wavelet Packet Transform and Itakura-Saito Nonnegative Matrix Factorisation

Cited by: 3
Authors
Liu, Houguang [1]
Wang, Wenbo [1]
Xue, Lin [1]
Yang, Jianhua [1]
Wang, Zhihua [1]
Hua, Chunli [1]
Affiliations
[1] China Univ Min & Technol, Sch Mechatron Engn, Xuzhou 221116, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
speech enhancement; discrete wavelet packet transform; nonnegative matrix factorisation; Itakura-Saito divergence; NOISE; ALGORITHMS; SEPARATION; QUALITY; NMF;
DOI
10.24425/aoa.2020.134072
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Nonnegative matrix factorisation (NMF) is one of the most popular machine learning tools for speech enhancement (SE). However, two problems reduce the performance of traditional NMF-based SE algorithms. The first is the overlap-and-add operation used in short-time Fourier transform (STFT) based signal reconstruction; the second is the Euclidean distance commonly used as the objective function. Both can cause distortion in the SE process. To overcome these shortcomings, we propose a novel joint SE framework that combines the discrete wavelet packet transform (DWPT) with Itakura-Saito nonnegative matrix factorisation (ISNMF). In this approach, the speech signal is first split into a series of subband signals using the DWPT. ISNMF is then used to enhance each subband signal. Finally, the inverse DWPT (IDWT) is used to reconstruct the speech from the enhanced subband signals. The experimental results show that the proposed joint framework improves speech enhancement performance and outperforms traditional NMF methods in the unseen-noise case.
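As a rough illustration of the ISNMF step named in the abstract, the sketch below factorises a positive matrix with the widely used multiplicative updates for the Itakura-Saito divergence. It is not the authors' implementation: the function names `is_divergence` and `is_nmf`, the rank, the iteration count, and the random test matrix are all assumptions made for this example, and the abstract does not say how the nonnegative matrix is built from each DWPT subband. The DWPT split and inverse DWPT are not shown.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence D_IS(V | V_hat), summed over all entries."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate a strictly positive matrix V by W @ H (hypothetical helper).

    Uses the heuristic multiplicative updates for the Itakura-Saito
    divergence; W and H stay nonnegative because every update multiplies
    by a nonnegative ratio.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat) + eps)
        V_hat = W @ H + eps
        W *= ((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T + eps)
    return W, H


if __name__ == "__main__":
    # Synthetic positive matrix standing in for one subband's
    # nonnegative representation (an assumption of this sketch).
    rng = np.random.default_rng(1)
    V = rng.random((12, 3)) @ rng.random((3, 20)) + 1e-3
    W, H = is_nmf(V, rank=3)
    print("IS divergence after fitting:", is_divergence(V, W @ H))
```

One reason the Itakura-Saito divergence is often preferred over the Euclidean distance for audio is that it is scale invariant: multiplying both `V` and its approximation by the same positive constant leaves the divergence unchanged, so low-energy components are not ignored in favour of high-energy ones.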
Pages: 565-572
Page count: 8
Related papers
50 in total
  • [1] MASK ESTIMATE THROUGH ITAKURA-SAITO NONNEGATIVE RPCA FOR SPEECH ENHANCEMENT
    Min, Gang
    Zhang, Xiongwei
    Zou, Xia
    Sun, Meng
    [J]. 2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016
  • [2] ITAKURA-SAITO NONNEGATIVE MATRIX FACTORIZATION WITH GROUP SPARSITY
    Lefevre, Augustin
    Bach, Francis
    Fevotte, Cedric
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 21 - 24
  • [3] ONLINE ALGORITHMS FOR NONNEGATIVE MATRIX FACTORIZATION WITH THE ITAKURA-SAITO DIVERGENCE
    Lefevre, Augustin
    Bach, Francis
    Fevotte, Cedric
    [J]. 2011 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2011, : 313 - 316
  • [4] EFFICIENT ALGORITHMS FOR MULTICHANNEL EXTENSIONS OF ITAKURA-SAITO NONNEGATIVE MATRIX FACTORIZATION
    Sawada, Hiroshi
    Kameoka, Hirokazu
    Araki, Shoko
    Ueda, Naonori
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 261 - 264
  • [5] Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis
    Fevotte, Cedric
    Bertin, Nancy
    Durrieu, Jean-Louis
    [J]. NEURAL COMPUTATION, 2009, 21 (03) : 793 - 830
  • [6] Expectation-Maximization Algorithms for Itakura-Saito Nonnegative Matrix Factorization
    Magron, Paul
    Virtanen, Tuomas
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 856 - 860
  • [7] MAJORIZATION-MINIMIZATION ALGORITHM FOR SMOOTH ITAKURA-SAITO NONNEGATIVE MATRIX FACTORIZATION
    Fevotte, Cedric
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 1980 - 1983
  • [8] Speech information hiding method based on Itakura-Saito measure and psychoacoustic model
    Qi, Yin-Cheng
    Wang, Hui
    Yuan, Jin-Sha
    [J]. PROCEEDINGS OF 2008 IEEE INTERNATIONAL CONFERENCE ON NETWORKING, SENSING AND CONTROL, VOLS 1 AND 2, 2008, : 1739 - 1742
  • [9] A Risk-Estimation-Based Comparison of Mean Square Error and Itakura-Saito Distortion Measures for Speech Enhancement
    Muraka, Nagarjuna Reddy
    Seelamantula, Chandra Sekhar
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 356 - 359
  • [10] Combined discrete wavelet transform and wavelet packet decomposition for speech enhancement
    Wang, Zhen-li
    Yang, Jie
    Zhang, Xiong-wei
    [J]. 2006 8TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-4, 2006, : 1107 - +