Improving Grapheme-based ASR by Probabilistic Lexical Modeling Approach

Authors
Rasipuram, Ramya [1, 2]
Magimai-Doss, Mathew [1]
Affiliations
[1] Idiap Res Inst, CH-1920 Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
Keywords
Automatic speech recognition; Hidden Markov model; Lexical modeling; Graphemes; Phonemes; Posterior features; Kullback-Leibler divergence based HMM
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
There is growing interest in using graphemes as subword units, especially in the context of rapid development of hidden Markov model (HMM) based automatic speech recognition (ASR) systems, as this eliminates the need to build a phoneme pronunciation lexicon. However, directly modeling the relationship between acoustic feature observations and grapheme states is not always trivial; it usually depends on the grapheme-to-phoneme relationship within the language. This paper builds upon our recent interpretation of Kullback-Leibler divergence based HMM (KL-HMM) as a probabilistic lexical modeling approach to propose a novel grapheme-based ASR approach. First, a set of acoustic units is derived by modeling context-dependent graphemes in the framework of a conventional HMM/Gaussian mixture model (HMM/GMM) system. Then, the probabilistic relationship between the derived acoustic units and the lexical units representing graphemes is modeled in the framework of KL-HMM. Through experimental studies on English, where the grapheme-to-phoneme relationship is irregular, we show that the proposed grapheme-based ASR approach (without using any phoneme information) can achieve performance comparable to that of a standard phoneme-based ASR approach.
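To make the second stage concrete, the following is a minimal sketch of the kind of local score a KL-HMM uses, under stated assumptions: each lexical (grapheme) state holds a categorical distribution over the acoustic units, each frame is represented by a posterior vector over the same units, and the two are compared with the Kullback-Leibler divergence. The three acoustic units, the state names, and all probability values are hypothetical and chosen only for illustration; the direction of the divergence and the training of the state distributions in the paper may differ from what is shown here.

```python
import math


def kl_divergence(y, z, eps=1e-12):
    """Return KL(y || z) for two categorical distributions of equal length.

    eps guards against log(0); terms with y_d == 0 contribute nothing.
    """
    if len(y) != len(z):
        raise ValueError("distributions must have the same dimension")
    return sum(
        y_d * math.log((y_d + eps) / (z_d + eps))
        for y_d, z_d in zip(y, z)
        if y_d > 0
    )


def best_state(frame_posterior, state_distributions):
    """Pick the lexical state whose distribution is closest to the frame."""
    return min(
        state_distributions,
        key=lambda name: kl_divergence(state_distributions[name], frame_posterior),
    )


# Hypothetical grapheme states, each a distribution over 3 acoustic units.
grapheme_states = {
    "a": [0.7, 0.2, 0.1],
    "b": [0.1, 0.1, 0.8],
}

# Hypothetical acoustic-unit posterior for one frame of speech.
posterior_frame = [0.6, 0.3, 0.1]

closest = best_state(posterior_frame, grapheme_states)
```

In a full recognizer this per-frame score would replace the usual emission log-likelihood inside Viterbi decoding; the sketch only shows the per-frame comparison.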
Pages: 505-509
Number of pages: 5