Feature enhancement of reverberant speech by distribution matching and non-negative matrix factorization

Authors
Sami Keronen
Heikki Kallasjoki
Kalle J. Palomäki
Guy J. Brown
Jort F. Gemmeke
Affiliations
[1] Aalto University, Department of Signal Processing and Acoustics
[2] University of Sheffield, Department of Computer Science
[3] Audience, Inc.
Keywords
Speech dereverberation; Feature enhancement; Non-negative matrix factorization; Distribution matching
DOI
Not available
Abstract
This paper describes a novel two-stage dereverberation feature enhancement method for noise-robust automatic speech recognition. In the first stage, an estimate of the dereverberated speech is generated by matching the distribution of the observed reverberant speech to that of clean speech, in a decorrelated transformation domain that has a long temporal context in order to address the effects of reverberation. The second stage uses this dereverberated signal as an initial estimate within a non-negative matrix factorization framework, which jointly estimates a sparse representation of the clean speech signal and an estimate of the convolutional distortion. The proposed feature enhancement method, when used in conjunction with automatic speech recognizer back-end processing, is shown to improve the recognition performance compared to three other state-of-the-art techniques.
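The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a one-dimensional feature stream, uses plain quantile mapping in place of matching in a decorrelated long-context domain, and uses a standard multiplicative-update NMF with a fixed dictionary in place of the joint estimation of clean speech and convolutional distortion. The function names `match_distribution` and `nmf_activations`, and all parameters, are hypothetical.

```python
import numpy as np


def match_distribution(observed, clean_reference):
    """Quantile mapping for 1-D arrays (illustrative stand-in for stage one).

    Each observed value is replaced by the clean-reference value that sits
    at the same empirical quantile.
    """
    observed = np.asarray(observed, dtype=float)
    # Rank of every sample, scaled into the open interval (0, 1).
    ranks = np.argsort(np.argsort(observed))
    quantiles = (ranks + 0.5) / observed.size
    # Read the clean distribution at those quantiles.
    return np.quantile(clean_reference, quantiles)


def nmf_activations(V, W, n_iter=200, sparsity=0.0, eps=1e-12, seed=0):
    """Estimate non-negative H with V ~= W @ H for a fixed dictionary W.

    Multiplicative updates for the Euclidean cost; `sparsity` is an
    optional L1 penalty weight on H (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
    return H
```

In a pipeline of this shape, the output of the first function would serve as the starting estimate for the second stage; here the two functions are kept independent so that each can be inspected on its own.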
Related papers
  • [1] Feature enhancement of reverberant speech by distribution matching and non-negative matrix factorization
    Keronen, Sami
    Kallasjoki, Heikki
    Palomaki, Kalle J.
    Brown, Guy J.
    Gemmeke, Jort F.
    [J]. EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2015,
  • [2] BASIS COMPENSATION IN NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR SPEECH ENHANCEMENT
    Chung, Hanwook
    Plourde, Eric
    Champagne, Benoit
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2249 - 2253
  • [3] Non-negative Tensor Factorization for Speech Enhancement
    He, Liang
    Zhang, Weiqiang
    Shi, Mengnan
    [J]. PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, 2016, 127
  • [4] Feature Weighted Non-Negative Matrix Factorization
    Chen, Mulin
    Gong, Maoguo
    Li, Xuelong
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (02) : 1093 - 1105
  • [5] NON-NEGATIVE MATRIX FACTORIZATION AS NOISE-ROBUST FEATURE EXTRACTOR FOR SPEECH RECOGNITION
    Schuller, Bjoern
    Weninger, Felix
    Woellmer, Martin
    Sun, Yang
    Rigoll, Gerhard
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4562 - 4565
  • [6] Regularized non-negative matrix factorization with Gaussian mixtures and masking model for speech enhancement
    Chung, Hanwook
    Plourde, Eric
    Champagne, Benoit
    [J]. SPEECH COMMUNICATION, 2017, 87 : 18 - 30
  • [7] Speech Enhancement Using Sparse Convolutive Non-negative Matrix Factorization with Basis Adaptation
    Carlin, Michael A.
    Malyska, Nicolas
    Quatieri, Thomas F.
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 582 - 585
  • [8] Non-negative Matrix Factorization with Linear Constraints for Single-Channel Speech Enhancement
    Lyubimov, Nikolay
    Kotov, Mikhail
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 446 - 450
  • [9] Non-negative Matrix Factorization Speech Enhancement Method Based on Constraints of Temporal Continuity
    Zou, Qiang
    Sun, Chengli
    Yuan, Conglin
    Sun, Yifan
    [J]. PROCEEDINGS OF 2019 IEEE 3RD INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2019), 2019, : 542 - 546
  • [10] A DNN-HMM Approach to Non-negative Matrix Factorization Based Speech Enhancement
    Wang, Ziteng
    Li, Xu
    Wang, Xiaofei
    Fu, Qiang
    Yan, Yonghong
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3763 - 3767