A Novel Approach for Vietnamese Speech Recognition Using Conformer

Cited by: 0
Authors
Tuan, Nguyen Van Anh [1]
Hoa, Nguyen Thi Thanh [1]
Dat, Nguyen Thanh [1]
Tuan, Pham Minh [1]
Truong, Dao Duy [1]
Phuc, Dang Thi [1]
Affiliations
[1] Ind Univ Ho Chi Minh City, Fac Informat Technol, Ho Chi Minh City, Vietnam
Keywords
Deep learning; CTC; Joint CTC/Attention; Conformer; Vietnamese speech recognition
DOI
10.1007/978-981-19-8069-5_53
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Speech recognition has been studied for a long time, but very little work has applied deep learning to Vietnamese speech recognition. In this paper, we address Vietnamese speech recognition with two deep learning frameworks, CTC and joint CTC/attention, each combined with a Conformer encoder. The models were trained on more than 115 h of speech from the VLSP and Vivos datasets and reached moderate accuracy. Compared with the other models, the Conformer trained with CTC achieved good results, with a word error rate (WER) of 20%. Training on a large dataset gives encouraging results and provides a basis for further improving the model and its accuracy.
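For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how word error rate is commonly computed: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions and are not taken from the paper, whose exact scoring setup (text normalization, tokenization) is not given in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from the paper); words are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word out of five reference words.
print(word_error_rate("xin chào các bạn nhé", "xin chào cá bạn nhé"))  # 0.2
```

Under this definition, a WER of 20% corresponds to roughly one word error for every five reference words.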
Pages: 723-730
Number of pages: 8