Convolutive Transfer Function-Based Multichannel Nonnegative Matrix Factorization for Overdetermined Blind Source Separation

Cited by: 13
Authors
Wang, Taihui [1, 2]
Yang, Feiran [3]
Yang, Jun [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Noise & Vibrat Res, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Chinese Acad Sci, Inst Acoust, State Key Lab Acoust, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Covariance matrices; Time-domain analysis; Microphones; Blind source separation; Time-frequency analysis; Speech processing; Narrowband; convolutive transfer function; nonnegative matrix factorization; spatial covariance matrix; SPATIAL COVARIANCE MODEL; SPEECH DEREVERBERATION; PERMUTATION ALIGNMENT; DOMAIN; MIXTURES; IDENTIFICATION;
DOI
10.1109/TASLP.2022.3145304
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Most multichannel blind source separation (BSS) approaches rely on a spatial model to encode the transfer functions from sources to microphones and a source model to encode the source power spectral density. The rank-1 spatial model has been widely exploited in independent component analysis (ICA), independent vector analysis (IVA), and independent low-rank matrix analysis (ILRMA). The full-rank spatial model is also considered in many BSS approaches, such as full-rank spatial covariance matrix analysis (FCA), multichannel nonnegative matrix factorization (MNMF), and FastMNMF, which can improve the separation performance in the case of long reverberation times. This paper proposes a new MNMF framework based on the convolutive transfer function (CTF) for overdetermined BSS. The time-domain convolutive mixture model is approximated by a frequency-wise convolutive mixture model instead of the widely adopted frequency-wise instantaneous mixture model. The iterative projection algorithm is adopted to estimate the demixing matrix, and the multiplicative update rule is employed to estimate nonnegative matrix factorization (NMF) parameters. Finally, the source image is reconstructed using a multichannel Wiener filter. The advantages of the proposed method are twofold. First, the CTF approximation enables us to use a short window to represent long impulse responses. Second, the full-rank spatial model can be derived based on the CTF approximation and slowly time-variant source variances, and close relationships between the proposed method and ILRMA, FCA, MNMF and FastMNMF are revealed. Extensive experiments show that the proposed algorithm achieves a higher separation performance than ILRMA and FastMNMF in reverberant environments.
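As a rough illustration of the frequency-wise convolutive mixture model mentioned in the abstract, the sketch below mixes source STFT coefficients with a short filter along the frame axis in each frequency bin. With a single tap it reduces to the frequency-wise instantaneous mixture. The function name `ctf_mix`, the array layout, and the zero-padding at the start are assumptions made for this example; it is not the paper's algorithm and does not cover the demixing, NMF, or Wiener filtering steps.

```python
import numpy as np


def ctf_mix(S, A):
    """Frequency-wise convolutive (CTF-style) mixture, illustrative only.

    S: source STFT coefficients, shape (F, T, N)
    A: per-frequency filters, shape (F, L, M, N), with L taps over frames
    Returns X of shape (F, T, M) with X[f, t] = sum_l A[f, l] @ S[f, t - l].
    Frames before the start of the signal are treated as zero (assumption).
    """
    F, T, N = S.shape
    _, L, M, _ = A.shape
    X = np.zeros((F, T, M), dtype=complex)
    for l in range(L):
        # Tap l contributes the sources delayed by l frames.
        X[:, l:, :] += np.einsum("fmn,ftn->ftm", A[:, l], S[:, : T - l, :])
    return X


# Hypothetical overdetermined setup: M = 4 microphones, N = 2 sources.
rng = np.random.default_rng(0)
S = rng.standard_normal((3, 5, 2)) + 1j * rng.standard_normal((3, 5, 2))
A = rng.standard_normal((3, 2, 4, 2)) + 1j * rng.standard_normal((3, 2, 4, 2))
X = ctf_mix(S, A)
```

Passing only the first tap, `ctf_mix(S, A[:, :1])`, gives the instantaneous narrowband mixture for comparison.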
Pages: 802-815
Page count: 14
Related papers
50 in total
  • [1] Convolutive transfer function-based independent component analysis for overdetermined blind source separation
    Wang, Taihui
    Yang, Feiran
    Li, Nan
    Zhang, Chen
    Yang, Jun
    [J]. 2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022, : 22 - 26
  • [2] Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation
    Ozerov, Alexey
    Fevotte, Cedric
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (03): : 550 - 563
  • [3] MULTICHANNEL NONNEGATIVE MATRIX FACTORIZATION IN CONVOLUTIVE MIXTURES. WITH APPLICATION TO BLIND AUDIO SOURCE SEPARATION.
    Ozerov, Alexey
    Fevotte, Cedric
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3137 - +
  • [4] SPARSENESS-BASED MULTICHANNEL NONNEGATIVE MATRIX FACTORIZATION FOR BLIND SOURCE SEPARATION
    Higuchi, Takuya
    Yoshioka, Takuya
    Nakatani, Tomohiro
    [J]. 2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016,
  • [5] FLOW-BASED FAST MULTICHANNEL NONNEGATIVE MATRIX FACTORIZATION FOR BLIND SOURCE SEPARATION
    Nugraha, Aditya Arie
    Sekiguchi, Kouhei
    Fontaine, Mathieu
    Bando, Yoshiaki
    Yoshii, Kazuyoshi
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 501 - 505
  • [6] STUDENT'S T MULTICHANNEL NONNEGATIVE MATRIX FACTORIZATION FOR BLIND SOURCE SEPARATION
    Kitamura, Koichi
    Bando, Yoshiaki
    Itoyama, Katsutoshi
    Yoshii, Kazuyoshi
    [J]. 2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016,
  • [7] Minimum-Volume Multichannel Nonnegative Matrix Factorization for Blind Audio Source Separation
    Wang, Jianyu
    Guan, Shanzheng
    Liu, Shupei
    Zhang, Xiao-Lei
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3089 - 3103
  • [8] AUTOREGRESSIVE FAST MULTICHANNEL NONNEGATIVE MATRIX FACTORIZATION FOR JOINT BLIND SOURCE SEPARATION AND DEREVERBERATION
    Sekiguchi, Kouhei
    Bando, Yoshiaki
    Nugraha, Aditya Arie
    Fontaine, Mathieu
    Yoshii, Kazuyoshi
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 511 - 515
  • [9] Generalized Fast Multichannel Nonnegative Matrix Factorization Based on Gaussian Scale Mixtures for Blind Source Separation
    Fontaine, Mathieu
    Sekiguchi, Kouhei
    Nugraha, Aditya Arie
    Bando, Yoshiaki
    Yoshii, Kazuyoshi
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1734 - 1748
  • [10] Underdetermined blind source separation using normalized spatial covariance matrix and multichannel nonnegative matrix factorization
    Oh, Son-hook
    Kim, Jung-Han
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2020, 39 (02): : 120 - 130