Constructing a Phonetic Transcribed Text Corpus for Southern Thai Dialect Speech Recognition

Cited by: 0
Authors
Aunkaew, Sittichok [1]
Karnjanadecha, Montri [1]
Wutiwiwatchai, Chai [2]
Affiliations
[1] Prince Songkla Univ, Fac Engn, Dept Comp Engn, Hat Yai, Songkhla, Thailand
[2] NECTEC, Speech & Audio Technol Lab, Klongluang, Pathumthani, Thailand
Keywords
Southern Thai dialect; speech recognition; word segmentation; pronunciation dictionary; phonetic transcription
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
This paper presents progress in developing a Southern Thai dialect speech corpus for building automatic speech recognition. The current Southern Thai dialect pronunciation dictionary contains more than 15,000 words collected from a Southern Thai dialect dictionary, academic theses, and online articles. In this study, a hybrid technique is proposed to construct phonetic transcriptions of Southern Thai dialect text for building a Southern Thai Dialect Continuous Speech Recognition corpus. The system achieves 97.35% word accuracy on 6,500 Southern Thai dialect sentences. The complete resources are freely available for research use to encourage speech technology research on the Southern Thai dialect.
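Because Thai script is written without spaces between words, producing phonetic transcriptions from raw text typically involves word segmentation followed by pronunciation-dictionary lookup. The abstract does not describe the details of the proposed hybrid technique, so the following is only a minimal Python sketch of one common dictionary-driven baseline (greedy longest matching), not the authors' method. Assumptions: the lexicon entries are hypothetical toy placeholders rather than real Southern Thai words or pronunciations, and the names `LEXICON`, `longest_match_segment`, and `transcribe` are illustrative. Out-of-vocabulary tokens are flagged rather than guessed.

```python
"""Sketch: dictionary-driven phonetic transcription of unsegmented text.

Hypothetical illustration only; the lexicon entries below are toy
placeholders, not real Southern Thai words or pronunciations.
"""
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word -> space-separated phoneme string.
LEXICON: Dict[str, str] = {
    "ab": "a b",
    "abc": "a b c",
    "cd": "c d",
    "d": "d",
    "e": "e",
}


def longest_match_segment(text: str, lexicon: Dict[str, str]) -> List[str]:
    """Greedy left-to-right longest matching against the lexicon.

    A character that starts no lexicon word is emitted as a single-character
    token so that segmentation always makes progress and terminates.
    """
    max_len = max(len(w) for w in lexicon)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No lexicon word starts here: emit one character as unknown.
            tokens.append(text[i])
            i += 1
    return tokens


def transcribe(
    text: str, lexicon: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (word, phonemes) pairs plus the out-of-vocabulary tokens.

    OOV tokens get the placeholder "<unk>" and are collected separately,
    e.g. for manual transcription or a grapheme-to-phoneme model (assumed).
    """
    pairs: List[Tuple[str, str]] = []
    oov: List[str] = []
    for tok in longest_match_segment(text, lexicon):
        if tok in lexicon:
            pairs.append((tok, lexicon[tok]))
        else:
            pairs.append((tok, "<unk>"))
            oov.append(tok)
    return pairs, oov


if __name__ == "__main__":
    pairs, oov = transcribe("abcdxe", LEXICON)
    for word, phones in pairs:
        print(f"{word}\t{phones}")
    print("OOV:", oov)
```

Greedy longest matching is simple but can mis-segment ambiguous strings (on the toy input it prefers `abc` + `d` over `ab` + `cd`); maximal-matching or statistical segmenters are common alternatives when such ambiguity matters.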
Pages: 69-73
Number of pages: 5