Alpha-Stable Autoregressive Fast Multichannel Nonnegative Matrix Factorization for Joint Speech Enhancement and Dereverberation

Cited by: 0
Authors
Fontaine, Mathieu [1]
Sekiguchi, Kouhei [1]
Nugraha, Aditya Arie [1]
Bando, Yoshiaki [1, 2]
Yoshii, Kazuyoshi [1, 3]
Affiliations
[1] RIKEN, AIP, Tokyo, Japan
[2] Natl Inst Adv Ind Sci & Technol, Tokyo, Japan
[3] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
Source
Keywords
speech enhancement; dereverberation; automatic speech recognition; alpha-stable model; joint diagonalization
DOI
10.21437/Interspeech.2021-742
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
This paper proposes alpha-stable autoregressive fast multichannel nonnegative matrix factorization (alpha-AR-FastMNMF), a robust joint blind speech enhancement and dereverberation method for improved automatic speech recognition in a realistic adverse environment. The state-of-the-art versatile blind source separation method called FastMNMF that assumes the short-time Fourier transform (STFT) coefficients of a direct sound to follow a circular complex Gaussian distribution with jointly-diagonalizable full-rank spatial covariance matrices was extended to AR-FastMNMF with an autoregressive reverberation model. Instead of the light-tailed Gaussian distribution, we use the heavy-tailed alpha-stable distribution, which also has the reproductive property useful for the additive source modeling, to better deal with the large dynamic range of the direct sound. The experimental results demonstrate that the proposed alpha-AR-FastMNMF works well as a front-end of an automatic speech recognition system. It outperforms alpha-AR-ILRMA, which is a special case of alpha-AR-FastMNMF, and their Gaussian counterparts, i.e., AR-FastMNMF and AR-ILRMA, in terms of the speech signal quality metrics and word error rate.
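The "reproductive property" mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the paper works with complex multichannel STFT coefficients, whereas this example assumes real scalar symmetric alpha-stable (SαS) variables with characteristic function exp(-(scale·|t|)^alpha). The names `combined_scale`, `sample_sas`, and `tail_fraction` are hypothetical helpers introduced only for illustration, and the sampler uses the standard Chambers-Mallows-Stuck method.

```python
import math
import random


def combined_scale(alpha, scales):
    """Scale of a sum of independent SaS variables that share the same alpha.

    Reproductive (stability) property: the sum is again SaS with
    scale = (sum_i scale_i ** alpha) ** (1 / alpha).
    """
    return sum(s ** alpha for s in scales) ** (1.0 / alpha)


def sample_sas(alpha, scale, rng):
    """Draw one real SaS sample with the Chambers-Mallows-Stuck method.

    Assumes 0 < alpha <= 2. With alpha = 2 this is Gaussian with
    variance 2 * scale ** 2; with alpha = 1 it is Cauchy.
    """
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        return scale * math.tan(u)
    x = (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
         * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return scale * x


def tail_fraction(alpha, threshold, n, rng):
    """Fraction of n unit-scale SaS samples whose magnitude exceeds threshold."""
    hits = sum(1 for _ in range(n) if abs(sample_sas(alpha, 1.0, rng)) > threshold)
    return hits / n


if __name__ == "__main__":
    rng = random.Random(0)
    # Two additive "sources" with scales 3 and 4: the mixture stays in the family.
    print("combined scale (alpha=2):", combined_scale(2.0, [3.0, 4.0]))
    print("combined scale (alpha=1.5):", combined_scale(1.5, [3.0, 4.0]))
    # Heavy-tailed alpha < 2 produces far more large-magnitude samples.
    print("tail fraction (alpha=1.5):", tail_fraction(1.5, 6.0, 20000, rng))
    print("tail fraction (alpha=2.0):", tail_fraction(2.0, 6.0, 20000, rng))
```

For alpha = 2 the scale rule reduces to the familiar Gaussian rule of adding variances, while alpha < 2 keeps the same additive closure and allows occasional very large values, which is the motivation the abstract gives for modeling a large dynamic range.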
Pages: 661-665
Number of pages: 5
Related papers
46 in total
  • [1] Unsupervised Robust Speech Enhancement Based on Alpha-Stable Fast Multichannel Nonnegative Matrix Factorization
    Fontaine, Mathieu
    Sekiguchi, Kouhei
    Nugraha, Aditya Arie
    Yoshii, Kazuyoshi
    [J]. INTERSPEECH 2020, 2020, : 4541 - 4545
  • [2] AUTOREGRESSIVE FAST MULTICHANNEL NONNEGATIVE MATRIX FACTORIZATION FOR JOINT BLIND SOURCE SEPARATION AND DEREVERBERATION
    Sekiguchi, Kouhei
    Bando, Yoshiaki
    Nugraha, Aditya Arie
    Fontaine, Mathieu
    Yoshii, Kazuyoshi
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 511 - 515
  • [3] Alpha-Stable Matrix Factorization
    Simsekli, Umut
    Liutkus, Antoine
    Cemgil, Ali Taylan
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2015, 22 (12) : 2289 - 2293
  • [4] LINEAR DEMIXED DOMAIN MULTICHANNEL NONNEGATIVE MATRIX FACTORIZATION FOR SPEECH ENHANCEMENT
    Taniguchi, Toru
    Masuda, Taro
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 476 - 480
  • [5] A Bayesian approach to convolutive nonnegative matrix factorization for blind speech dereverberation
    Ibarrola, Francisco J.
    Di Persia, Leandro E.
    Spies, Ruben D.
    [J]. SIGNAL PROCESSING, 2018, 151 : 89 - 98
  • [6] On the use of convolutive nonnegative matrix factorization with mixed penalization for blind speech dereverberation
    Ibarrola, Francisco
    Di Persia, Leandro
    Spies, Ruben
    [J]. 2017 XLIII LATIN AMERICAN COMPUTER CONFERENCE (CLEI), 2017
  • [7] SPEECH ENHANCEMENT USING SEGMENTAL NONNEGATIVE MATRIX FACTORIZATION
    Fan, Hao-Teng
    Hung, Jeih-weih
    Lu, Xugang
    Wang, Syu-Siang
    Tsao, Yu
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [8] Wavelet Speech Enhancement Based on Nonnegative Matrix Factorization
    Wang, Syu-Siang
    Chern, Alan
    Tsao, Yu
    Hung, Jeih-weih
    Lu, Xugang
    Lai, Ying-Hui
    Su, Borching
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2016, 23 (08) : 1101 - 1105
  • [9] On microphone arrangement for multichannel speech enhancement based on nonnegative matrix factorization in time-channel domain
    Murase, Yoshikazu
    Chiba, Hironobu
    Ono, Nobutaka
    Miyabe, Shigeki
    Yamada, Takeshi
    Makino, Shoji
    [J]. 2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014
  • [10] SPEECH ENHANCEMENT WITH VARIATIONAL AUTOENCODERS AND ALPHA-STABLE DISTRIBUTIONS
    Leglaive, Simon
    Simsekli, Umut
    Liutkus, Antoine
    Girin, Laurent
    Horaud, Radu
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 541 - 545