Multi-lingual Transformer Training for Khmer Automatic Speech Recognition

Authors
Soky, Kak [1, 4, 5]
Li, Sheng [2]
Kawahara, Tatsuya [3]
Seng, Sopheap [1]
Affiliations
[1] Natl Inst Posts Telecoms & ICT NIPTICT, Phnom Penh, Cambodia
[2] Natl Inst Informat & Commun Technol NICT, Kyoto, Japan
[3] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto, Japan
[4] NIPTICT, Phnom Penh, Cambodia
[5] Minist Educ Youth & Sports MoEYS, Phnom Penh, Cambodia
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Currently, there are three challenges in constructing reliable ASR systems for the Khmer language: (1) a lack of language resources (text and speech corpora) in digital form, (2) a writing system without explicit word boundaries, and (3) a pronunciation model that is not well studied. In this paper, to avoid the extensive work of selecting proper acoustic units (e.g., phones, syllables) and preparing frame-level labels that the traditional DNN-HMM framework requires, we directly use words or characters as labels with a state-of-the-art transformer-based end-to-end model. Moreover, we use a multi-lingual training framework to tackle the low-resource data problem. All experiments are performed on the Basic Expressions Travel Corpus (BTEC) datasets. The experiments show that the proposed multi-lingual transformer-based end-to-end model achieves a significant improvement over the DNN-HMM baseline model.
Pages: 1893-1896
Number of pages: 4