Alpha-Stable Autoregressive Fast Multichannel Nonnegative Matrix Factorization for Joint Speech Enhancement and Dereverberation

Times cited: 0

Authors:

Fontaine, Mathieu [1]

Sekiguchi, Kouhei [1]

Nugraha, Aditya Arie [1]

Bando, Yoshiaki [1, 2]

Yoshii, Kazuyoshi [1, 3]

Affiliations:

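[1] RIKEN Center for Advanced Intelligence Project (AIP), Tokyo, Japan

[2] National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan

[3] Kyoto University, Graduate School of Informatics, Kyoto, Japan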

Source:

INTERSPEECH 2021 | 2021

Keywords:

speech enhancement; dereverberation; automatic speech recognition; alpha-stable model; joint diagonalization

DOI:

10.21437/Interspeech.2021-742

Chinese Library Classification (CLC) numbers:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

This paper proposes alpha-stable autoregressive fast multichannel nonnegative matrix factorization (alpha-AR-FastMNMF), a robust method for joint blind speech enhancement and dereverberation that aims to improve automatic speech recognition in realistic adverse environments. FastMNMF is a state-of-the-art versatile blind source separation method. It assumes that the short-time Fourier transform (STFT) coefficients of the direct sound follow a circular complex Gaussian distribution with jointly diagonalizable full-rank spatial covariance matrices, and it has been extended to AR-FastMNMF by adding an autoregressive reverberation model. The direct sound has a large dynamic range. To handle it better, we replace the light-tailed Gaussian distribution with the heavy-tailed alpha-stable distribution. This distribution also has the reproductive property (a sum of independent alpha-stable variables with the same characteristic exponent is again alpha-stable), which is useful for additive source modeling. Experimental results show that alpha-AR-FastMNMF works well as the front end of an automatic speech recognition system. It outperforms alpha-AR-ILRMA, which is a special case of alpha-AR-FastMNMF. It also outperforms the Gaussian counterparts of both methods, AR-FastMNMF and AR-ILRMA. These gains hold on both speech signal quality metrics and word error rate.
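The abstract relies on two properties of the alpha-stable distribution: heavy tails and the reproductive property. Both can be checked numerically with the sketch below. It is a scalar, real-valued illustration only, not the paper's multichannel complex-valued model. It assumes the Chambers-Mallows-Stuck sampler for symmetric alpha-stable (SaS) variables with characteristic function exp(-|t|^alpha). The helper names `sample_sas` and `empirical_cf` are hypothetical.

```python
import numpy as np


def sample_sas(alpha, size, scale=1.0, rng=None):
    """Draw symmetric alpha-stable (SaS) samples with the Chambers-Mallows-Stuck method.

    Assumed parameterization: E[exp(i t X)] = exp(-|scale * t| ** alpha), 0 < alpha <= 2.
    alpha = 2 gives a Gaussian with variance 2 * scale**2; alpha = 1 gives a Cauchy law.
    """
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    rng = np.random.default_rng() if rng is None else rng
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform random angle
    w = rng.exponential(1.0, size)                 # unit-mean exponential variable
    x = (np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
         * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))
    return scale * x


def empirical_cf(x, t):
    """Empirical characteristic function at t (imaginary part vanishes for symmetric x)."""
    return np.mean(np.cos(t * x))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha, n = 1.5, 200_000
    # Reproductive property: the sum of two independent standard SaS variables
    # should again be SaS, with scale 2 ** (1 / alpha), i.e. CF exp(-2 |t| ** alpha).
    s = sample_sas(alpha, n, rng=rng) + sample_sas(alpha, n, rng=rng)
    for t in (0.25, 0.5, 1.0):
        print(f"t={t}: empirical {empirical_cf(s, t):.4f}, "
              f"theory {np.exp(-2 * t ** alpha):.4f}")
    # Heavy tails: large excursions are far more frequent than in the Gaussian case.
    heavy = sample_sas(alpha, n, rng=rng)
    gauss = sample_sas(2.0, n, rng=rng)
    print("P(|x| > 10):", np.mean(np.abs(heavy) > 10), "vs", np.mean(np.abs(gauss) > 10))
```

With alpha = 1.5, a few thousandths of the samples exceed 10 in magnitude. The Gaussian case (alpha = 2) essentially never does. This gap is the "large dynamic range" behavior that motivates the heavy-tailed model.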

Pages: 661-665

Number of pages: 5
