Feature Vector Normalization with Combined Standard and Throat Microphones for Robust ASR

Cited by: 0
Authors
Buera, Luis [1]
Miguel, Antonio [1]
Saz, Oscar [1]
Ortega, Alfonso [1]
Lleida, Eduardo [1]
Affiliations
[1] Univ Zaragoza, Commun Technol Grp GTC, E-50009 Zaragoza, Spain
Keywords
Throat microphone; robust speech recognition; feature vector normalization
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose an on-line unsupervised compensation technique for robust speech recognition that combines standard and throat microphone feature vectors. The solution, called Multi-Environment Model-based Linear Normalization with Throat microphone information (MEMLINT), is an extension of the MEM-LIN formulation. The standard microphone noisy space and the throat microphone space are modelled as GMMs, and a set of linear transformations, one associated with each pair of Gaussians (one from each GMM), is learnt from stereo training data. In addition, to compensate for some kinds of degradation that MEMLINT does not consider, we propose jointly using an on-line unsupervised acoustic model adaptation method based on rotation transformations over an expanded HMM-state space (augMented stAte space acousTic dEcoder, MATE). Experiments on a database recorded by the authors show that the proposed approach significantly outperforms the single microphone approach.
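The idea of attaching one transformation to each pair of Gaussians can be illustrated with a loose sketch. It is not the paper's formulation: it assumes scalar features, GMMs whose parameters are already known, and transformations reduced to a single bias per Gaussian pair, estimated as a posterior-weighted average of the noisy-minus-clean difference over stereo data. All function names and the (weight, mean, variance) tuple layout are hypothetical.

```python
import math


def gaussian_pdf(v, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(v, gmm):
    """Posterior probability of each Gaussian given v.

    gmm is a list of (weight, mean, variance) tuples (assumed layout).
    """
    likes = [w * gaussian_pdf(v, m, s2) for (w, m, s2) in gmm]
    total = sum(likes)
    if total == 0.0:  # numerical underflow: fall back to a uniform posterior
        return [1.0 / len(gmm)] * len(gmm)
    return [like / total for like in likes]


def learn_biases(clean, noisy, throat, noisy_gmm, throat_gmm):
    """Estimate one bias per (noisy Gaussian, throat Gaussian) pair.

    Assumption: the bias is the weighted mean of (noisy - clean), with
    weights p(i | noisy) * p(j | throat), over time-aligned stereo frames.
    """
    num = [[0.0] * len(throat_gmm) for _ in noisy_gmm]
    den = [[0.0] * len(throat_gmm) for _ in noisy_gmm]
    for x, y, z in zip(clean, noisy, throat):
        p_y = posteriors(y, noisy_gmm)
        p_z = posteriors(z, throat_gmm)
        for i, pi in enumerate(p_y):
            for j, pj in enumerate(p_z):
                weight = pi * pj
                num[i][j] += weight * (y - x)
                den[i][j] += weight
    return [
        [n / d if d > 0.0 else 0.0 for n, d in zip(num_row, den_row)]
        for num_row, den_row in zip(num, den)
    ]


def normalize(y, z, noisy_gmm, throat_gmm, biases):
    """Compensate a noisy frame y using the throat frame z."""
    p_y = posteriors(y, noisy_gmm)
    p_z = posteriors(z, throat_gmm)
    correction = sum(
        pi * pj * biases[i][j]
        for i, pi in enumerate(p_y)
        for j, pj in enumerate(p_z)
    )
    return y - correction


if __name__ == "__main__":
    # Toy stereo data (made up for illustration only).
    clean = [0.0, 1.0, 4.0, 5.0]
    noisy = [x + 2.0 for x in clean]
    throat = [0.5 * x for x in clean]
    noisy_gmm = [(0.5, 2.5, 1.0), (0.5, 6.5, 1.0)]
    throat_gmm = [(0.5, 0.25, 1.0), (0.5, 2.25, 1.0)]
    biases = learn_biases(clean, noisy, throat, noisy_gmm, throat_gmm)
    print(normalize(5.0, 1.0, noisy_gmm, throat_gmm, biases))
```

In this sketch the throat microphone signal only sharpens the choice of which bias to apply; the compensation itself is applied to the standard microphone feature.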
Pages: 1289-1292
Number of pages: 4