Multichannel Audio Source Separation with Independent Deeply Learned Matrix Analysis Using Product of Source Models

Authors
Hasumi, Takuya [1]
Nakamura, Tomohiko [1]
Takamune, Norihiro [1]
Saruwatari, Hiroshi [1]
Kitamura, Daichi [2]
Takahashi, Yu [3]
Kondo, Kazunobu [3]
Affiliations
[1] Univ Tokyo, Tokyo, Japan
[2] Kagawa Coll, Natl Inst Technol, Takamatsu, Kagawa, Japan
[3] Yamaha Corp, Shizuoka, Japan
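Keywords
ICA
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract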
Independent deeply learned matrix analysis (IDLMA) is a state-of-the-art multichannel audio source separation method that estimates source power with deep neural networks (DNNs). The DNN-based power estimation works well for sounds whose timbres are similar to those in the DNN training data. However, the sounds to which IDLMA is applied do not always have such timbres, and this timbral mismatch degrades the performance of IDLMA. To tackle this problem, we focus on the blind source separation counterpart of IDLMA, independent low-rank matrix analysis. It uses nonnegative matrix factorization (NMF) as the source model, which can capture source spectral components that appear only in the target mixture by using the low-rank structure of the source spectrogram as a clue. We thus extend the DNN-based source model to encompass the NMF-based source model on the basis of the product-of-experts concept; we call the result the product of source models (PoSM). For the proposed PoSM-based IDLMA, we derive a computationally efficient parameter estimation algorithm based on an optimization principle called the majorization-minimization algorithm. Experimental evaluations show the effectiveness of the proposed method.
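The abstract gives no equations, so the following is only an illustrative sketch of the NMF source model it mentions. It is not the PoSM algorithm of the paper. It fits a low-rank model to a nonnegative power spectrogram with the well-known multiplicative updates for the Itakura-Saito divergence, which are themselves derived by majorization-minimization. The function names, the toy data, and the choice of the Itakura-Saito divergence are assumptions made for this example.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence summed over all time-frequency bins."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, n_bases, n_iter=50, seed=0, eps=1e-12):
    """Fit V ~= W @ H using MM-derived multiplicative updates.

    Illustrative sketch only (hypothetical helper, not from the paper).
    V: nonnegative power spectrogram, shape (n_freq, n_frames).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    # Strictly positive initial values keep every update well defined.
    W = rng.uniform(0.1, 1.0, size=(n_freq, n_bases))
    H = rng.uniform(0.1, 1.0, size=(n_bases, n_frames))
    costs = []
    for _ in range(n_iter):
        # Update the spectral bases W with the activations H fixed.
        V_hat = W @ H + eps
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        # Update the activations H with the bases W fixed.
        V_hat = W @ H + eps
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
        costs.append(is_divergence(V, W @ H + eps))
    return W, H, costs


# Toy rank-3 "spectrogram" (synthetic data, assumed for illustration).
rng = np.random.default_rng(1)
V = rng.uniform(0.1, 1.0, (64, 3)) @ rng.uniform(0.1, 1.0, (3, 100)) + 1e-6
W, H, costs = is_nmf(V, n_bases=3)
```

The square-root exponent in each update is what the majorization-minimization derivation yields for this divergence, and it is the reason the cost recorded in `costs` does not increase from one iteration to the next.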
Pages: 1226-1233
Number of pages: 8