THE CU-MFEC CORPUS FOR THAI AND ENGLISH SPELLING SPEECH RECOGNITION

Authors
Kertkeidkachorn, Natthawut [1]
Chanjaradwichai, Supadaech [1]
Suri, Teera [1]
Likitsupin, Krerksak [1]
Vorapatratorn, Surapol [1]
Hirankan, Pawanrat [1]
Limpanadusadee, Worasa [1]
Chuetanapinyo, Supakit [1]
Pitakpawatkul, Kitanan [1]
Puangsri, Natnarong [1]
Tangsirirat, Nathacha [1]
Trakulsuk, Konlawachara [1]
Punyabukkana, Proadpran [1]
Suchato, Atiwong [1]
Affiliation
[1] Chulalongkorn Univ, Fac Engn, Dept Comp Engn, Spoken Language Syst Res Grp, Bangkok, Thailand
Keywords
Speech corpus; Thai spelling corpus; Automatic speech recognition
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The performance of any Automatic Speech Recognition (ASR) system depends largely on its speech corpus, and even more so for recognizers designed for specific tasks. Naturally, an ASR system for spelling recognition performs better when it is trained on a spelling speech corpus rather than a generic one. Although several Thai speech corpora are available, Thai spelling speech corpora are still lacking. This paper reports the experience gained from constructing CU-MFEC, a Thai spelling speech corpus designed for form filling and other applications of a similar nature. The CU-MFEC corpus was recorded from 100 speakers and comprises 58 hours and 10 minutes of speech. It consists of four sets: alphabets with short pauses, continuous free spelling, sentences, and numbers and commands. We evaluated the corpus by using it in speech recognition tasks and obtained an accuracy of 79.37% on the spelling task and 54.92% on the connected spelling task.
Pages: 18-23
Page count: 6