Acoustic Data-Driven Subword Units Obtained through Segment Embedding and Clustering for Spontaneous Speech Recognition

Cited by: 4
Authors
Bang, Jeong-Uk [1]
Kim, Sang-Hun [2]
Kwon, Oh-Wook [1]
Affiliations
[1] Chungbuk Natl Univ, Sch Elect Engn, 1 Chungdae Ro, Cheongju 28644, Chungbuk, South Korea
[2] ETRI, Artificial Intelligence Res Lab, 218 Gajeong Ro, Daejeon 34129, South Korea
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 06
Keywords
acoustic subword unit; phoneme set; spontaneous speech recognition; BROADCAST DATA; MODEL;
DOI
10.3390/app10062079
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
We propose a method to extend a phoneme set by using a large amount of broadcast data to improve the performance of Korean spontaneous speech recognition. In the proposed method, we first extract variable-length phoneme-level segments from broadcast data and then convert them into fixed-length embedding vectors based on a long short-term memory architecture. We use decision tree-based clustering to find acoustically similar embedding vectors and then build new acoustic subword units by gathering the clustered vectors. To update the lexicon of a speech recognizer, we build a lookup table between the tri-phone units and the units derived from the decision tree. Finally, we obtain the proposed lexicon by updating the original phoneme-based lexicon with reference to the lookup table. To verify the performance of the proposed unit, we compare it with previous units obtained by using the segment-based k-means clustering method or the frame-based decision-tree clustering method. The proposed unit is shown to perform better than the previous units in both spontaneous and read Korean speech recognition tasks.
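The lexicon-update step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the lookup table is stored or how word boundaries are handled, so the `(left, center, right)` key layout, the `sil` boundary symbol, the fallback to the original phoneme, and the unit names `k_1` and `k_2` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed key layout: (left context, center phoneme, right context).
Triphone = Tuple[str, str, str]


def update_pronunciation(phones: List[str],
                         lookup: Dict[Triphone, str],
                         boundary: str = "sil") -> List[str]:
    """Replace each phoneme with the unit listed for its tri-phone context."""
    # Pad with a boundary symbol so the first and last phonemes have context
    # (an assumption; the abstract does not describe boundary handling).
    padded = [boundary] + phones + [boundary]
    updated = []
    for i in range(1, len(padded) - 1):
        key = (padded[i - 1], padded[i], padded[i + 1])
        # Keep the original phoneme when the context is absent from the table.
        updated.append(lookup.get(key, padded[i]))
    return updated


def update_lexicon(lexicon: Dict[str, List[str]],
                   lookup: Dict[Triphone, str]) -> Dict[str, List[str]]:
    """Apply the tri-phone lookup to every pronunciation in the lexicon."""
    return {word: update_pronunciation(phones, lookup)
            for word, phones in lexicon.items()}


# Hypothetical toy data: /k/ is split into two context-dependent units.
lexicon = {"ka": ["k", "a"], "ak": ["a", "k"]}
lookup = {("sil", "k", "a"): "k_1", ("a", "k", "sil"): "k_2"}
new_lexicon = update_lexicon(lexicon, lookup)
print(new_lexicon)
```

In this toy case the word-initial and word-final /k/ receive different units, while /a/ stays unchanged because its contexts do not appear in the table.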
Pages: 14
Related papers
50 in total
  • [1] Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition
    Zhou, Wei
    Zeineldeen, Mohammad
    Zheng, Zuoyun
    Schlueter, Ralf
    Ney, Hermann
    [J]. INTERSPEECH 2021, 2021, : 2886 - 2890
  • [2] Extending an Acoustic Data-Driven Phone Set for Spontaneous Speech Recognition
    Bang, Jeong-Uk
    Choi, Mu-Yeol
    Kim, Sang-Hun
    Kwon, Oh-Wook
    [J]. INTERSPEECH 2019, 2019, : 4405 - 4409
  • [3] ACOUSTIC MODELING OF SUBWORD UNITS FOR LARGE VOCABULARY SPEAKER INDEPENDENT SPEECH RECOGNITION
    LEE, CH
    RABINER, LR
    PIERACCINI, R
    WILPON, JG
    [J]. SPEECH AND NATURAL LANGUAGE, 1989, : 280 - 291
  • [4] ACOUSTIC DATA-DRIVEN PRONUNCIATION LEXICON FOR LARGE VOCABULARY SPEECH RECOGNITION
    Lu, Liang
    Ghoshal, Arnab
    Renals, Steve
    [J]. 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 374 - 379
  • [5] Using Syllables as Acoustic Units for Spontaneous Speech Recognition
    Hejtmanek, Jan
    [J]. TEXT, SPEECH AND DIALOGUE, 2010, 6231 : 299 - 305
  • [6] Combined optimisation of baseforms and model parameters in speech recognition based on acoustic subword units
    Holter, T
    Svendsen, T
    [J]. 1997 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, PROCEEDINGS, 1997, : 199 - 206
  • [7] LATENT PERCEPTUAL MAPPING WITH DATA-DRIVEN VARIABLE-LENGTH ACOUSTIC UNITS FOR TEMPLATE-BASED SPEECH RECOGNITION
    Sundaram, Shiva
    Bellegarda, Jerome R.
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4125 - 4128
  • [8] USING DATA-DRIVEN SUBWORD UNITS IN LANGUAGE MODEL OF HIGHLY INFLECTIVE SLOVENIAN LANGUAGE
    Maucec, Mirjam Sepesy
    Rotovnik, Tomaz
    Kacic, Zdravko
    Brest, Janez
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2009, 23 (02) : 287 - 312
  • [9] Unsupervised Data-Driven Feature Vector Normalization With Acoustic Model Adaptation for Robust Speech Recognition
    Buera, Luis
    Miguel, Antonio
    Saz, Oscar
    Ortega, Alfonso
    Lleida, Eduardo
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (02): : 296 - 309
  • [10] Scalable algorithms for unsupervised clustering of acoustic data for speech recognition
    Rath, Shakti P.
    [J]. COMPUTER SPEECH AND LANGUAGE, 2017, 46 : 233 - 248