DEEP MIXTURE DENSITY NETWORK FOR STATISTICAL MODEL-BASED FEATURE ENHANCEMENT

Authors
Kinoshita, Keisuke [1]
Delcroix, Marc [1]
Ogawa, Atsunori [1]
Higuchi, Takuya [1]
Nakatani, Tomohiro [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Tokyo, Japan
Source
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017
Keywords
Mixture density network; model-based feature enhancement; conditional density; VECTOR TAYLOR-SERIES; SPEECH RECOGNITION;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We propose a novel framework designed to extend conventional deep neural network (DNN)-based feature enhancement approaches. In general, the conventional DNN-based feature enhancement framework aims to map an input noisy observation to clean speech or a binary/soft mask in a deterministic way, assuming that there is a one-to-one mapping between the input and the output without any uncertainty. However, when we consider the general feature enhancement problem to be an ill-posed inverse problem in which the mapping cannot be uniquely determined given an input signal, the assumption in the conventional approaches is not theoretically correct and potentially limits the performance of DNN-based feature enhancement. To overcome this problem, this paper proposes utilizing a mixture density network (MDN), which is a neural network that maps an input feature to a set of Gaussian mixture model (GMM) parameters representing the distribution of a target variable. By estimating the distribution of the clean speech feature with the MDN, we are able to explicitly consider the uncertainty in the parameter estimation. We then further utilize the estimated GMM to obtain a refined clean speech estimate in the framework of statistical model-based feature enhancement. In this paper, after detailing the proposed framework and the MDN, we show mathematically and experimentally how the MDN appropriately models the uncertainty information. We also show that the proposed method can outperform a conventional DNN-based feature enhancement method.
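The MDN idea described in the abstract, a network whose outputs are read as GMM parameters for the target variable, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the layout of the raw output vector, and the use of diagonal covariances are all assumptions made for illustration. The last function only summarizes the predicted distribution by its mean and per-dimension variance as one simple view of the modeled uncertainty; it does not reproduce the statistical model-based refinement step of the paper.

```python
import numpy as np

def mdn_params(raw, n_mix, dim):
    """Map a raw network output vector to GMM parameters.

    Assumed (hypothetical) layout of `raw`:
    [n_mix weight logits | n_mix*dim means | n_mix*dim log std-devs].
    """
    logits = raw[:n_mix]
    means = raw[n_mix:n_mix + n_mix * dim].reshape(n_mix, dim)
    log_sigma = raw[n_mix + n_mix * dim:].reshape(n_mix, dim)
    # Softmax keeps the mixture weights positive and summing to one.
    w = np.exp(logits - logits.max())
    weights = w / w.sum()
    # Exponential keeps the standard deviations positive.
    sigmas = np.exp(log_sigma)
    return weights, means, sigmas

def gmm_neg_log_likelihood(x, weights, means, sigmas):
    """Negative log-likelihood of target x under a diagonal-covariance GMM.

    This is the usual MDN training criterion for one sample.
    """
    log_comp = -0.5 * np.sum(
        ((x - means) / sigmas) ** 2
        + 2.0 * np.log(sigmas)
        + np.log(2.0 * np.pi),
        axis=1,
    )
    a = np.log(weights) + log_comp
    m = a.max()  # log-sum-exp for numerical stability
    return -(m + np.log(np.sum(np.exp(a - m))))

def gmm_mean_and_variance(weights, means, sigmas):
    """Mean and per-dimension variance of the predicted mixture."""
    mean = weights @ means
    second_moment = weights @ (sigmas ** 2 + means ** 2)
    return mean, second_moment - mean ** 2

# Two components in one dimension, centred at -1 and +1 with unit variance.
raw = np.array([0.0, 0.0, -1.0, 1.0, 0.0, 0.0])
weights, means, sigmas = mdn_params(raw, n_mix=2, dim=1)
mean, var = gmm_mean_and_variance(weights, means, sigmas)
```

In this toy case the two components disagree about the target, so the mixture variance is larger than either component's variance. A deterministic one-to-one mapping cannot express that kind of ambiguity.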
Pages: 251 - 255
Number of pages: 5
Related papers
50 in total
  • [1] IMPROVING STATISTICAL MODEL-BASED SPEECH ENHANCEMENT WITH DEEP NEURAL NETWORKS
    Borgstrom, Bengt J.
    Brandstein, Michael S.
    Dunn, Robert B.
    2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 471 - 475
  • [2] Gaussian mixture model-based contrast enhancement
    Abdoli, Mohsen
    Sarikhani, Hossein
    Ghanbari, Mohammad
    Brault, Patrice
    IET IMAGE PROCESSING, 2015, 9 (07) : 569 - 577
  • [3] Model-Based Feature Enhancement for Reverberant Speech Recognition
    Krueger, Alexander
    Haeb-Umbach, Reinhold
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (07) : 1692 - 1707
  • [4] Model-based feature enhancement for noisy speech recognition
    Couvreur, C
    Van hamme, H
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1719 - 1722
  • [5] Gaussian mixture model-based target feature extraction and visualization
    Ma, Ji
    Chen, Jinjin
    Chen, Liye
    Zhou, Xingjian
    Qin, Xujia
    Tang, Ying
    Sun, Guodao
    Chen, Jiazhou
    JOURNAL OF VISUALIZATION, 2021, 24 (03) : 545 - 563
  • [6] Gaussian mixture model-based target feature extraction and visualization
    Ji Ma
    Jinjin Chen
    Liye Chen
    Xingjian Zhou
    Xujia Qin
    Ying Tang
    Guodao Sun
    Jiazhou Chen
    Journal of Visualization, 2021, 24 : 545 - 563
  • [7] Deep Photo: Model-Based Photograph Enhancement and Viewing
    Kopf, Johannes
    Neubert, Boris
    Chen, Billy
    Cohen, Michael
    Cohen-Or, Daniel
    Deussen, Oliver
    Uyttendaele, Matt
    Lischinski, Dani
    ACM TRANSACTIONS ON GRAPHICS, 2008, 27 (05):
  • [8] Gaussian model-based statistical matching for image enhancement and segmentation
    Zheng, Yufeng
    VISUAL INFORMATION PROCESSING XVII, 2008, 6978
  • [9] Finite mixture models and model-based clustering
    Melnykov, Volodymyr
    Maitra, Ranjan
    STATISTICS SURVEYS, 2010, 4 : 80 - 116
  • [10] Model-Based Deep Network for Single Image Deraining
    Li, Pengyue
    Tian, Jiandong
    Tang, Yandong
    Wang, Guolin
    Wu, Chengdong
    IEEE ACCESS, 2020, 8 : 14036 - 14047