A Novel Approach for Vietnamese Speech Recognition Using Conformer

Cited by: 0

Authors:

Tuan, Nguyen Van Anh [1]

Hoa, Nguyen Thi Thanh [1]

Dat, Nguyen Thanh [1]

Tuan, Pham Minh [1]

Truong, Dao Duy [1]

Phuc, Dang Thi [1]

Affiliations:

[1] Ind Univ Ho Chi Minh City, Fac Informat Technol, Ho Chi Minh City, Vietnam

Source:

FUTURE DATA AND SECURITY ENGINEERING. BIG DATA, SECURITY AND PRIVACY, SMART CITY AND INDUSTRY 4.0 APPLICATIONS, FDSE 2022 | 2022 / Vol. 1688

Keywords:

Deep learning; CTC; Joint CTC/Attention; Conformer; Vietnamese speech recognition

DOI:

10.1007/978-981-19-8069-5_53

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Speech recognition has been studied for a long time, but very little research applies deep learning to Vietnamese speech recognition. In this paper, we address Vietnamese speech recognition with deep learning frameworks, namely CTC and Joint CTC/Attention, combined with the Conformer encoder architecture. Using over 115 hours of training data from VLSP and Vivos, the experiments achieved moderate accuracy. Compared with the other models, the Conformer model trained with CTC achieved good results, with a word error rate (WER) of 20%. Training on large-scale data gives remarkable results and provides the basis for further improving the model and increasing its accuracy.
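
The abstract reports accuracy as a word error rate (WER). As a rough illustration of how that metric is usually computed, the sketch below counts word-level substitutions, deletions, and insertions with a standard edit-distance table and divides by the reference length. The function name `word_error_rate` and the example sentences are hypothetical, and the record does not say which scoring tool the authors used, so this is only a minimal sketch of the common definition.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace, which suits
    Vietnamese text written as space-separated syllables.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of five reference words is dropped.
print(word_error_rate("tôi đi học ở trường", "tôi đi học trường"))
```

Under this definition, a WER of 20% corresponds to roughly one word error for every five reference words.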

Pages: 723-730

Page count: 8

50 items in total

[21] A robust method for the Vietnamese handwritten and speech recognition
Quan, VH
Trung, PN
Nguyen, DHH
[J]. 16TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL III, PROCEEDINGS, 2002, : 732 - 735
[22] Confidence Score Based Conformer Speaker Adaptation for Speech Recognition
Deng, Jiajun
Xie, Xurong
Wang, Tianzi
Cui, Mingyu
Xue, Boyang
Jin, Zengrui
Geng, Mengzhe
Li, Guinan
Liu, Xunying
Meng, Helen
[J]. INTERSPEECH 2022, 2022, : 2623 - 2627
[23] Efficient conformer-based speech recognition with linear attention
Li, Shengqiang
Xu, Menglong
Zhang, Xiao-Lei
[J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 448 - 453
[24] A NOVEL APPROACH USING MODULATION FEATURES FOR MULTIPHONE-BASED SPEECH RECOGNITION
Clark, Pascal
Sell, Gregory
Atlas, Les
[J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5264 - 5267
[25] HMM/MLP Speech Recognition System Using a Novel Data Clustering Approach
Lazli, Lilia
Boukadoum, Mounir
Mohamed, Otmane Ait
[J]. 2017 IEEE 30TH CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2017,
[26] Audio-Visual Efficient Conformer for Robust Speech Recognition
Burchi, Maxime
Timofte, Radu
[J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2257 - 2266
[27] SE-Conformer: Time-Domain Speech Enhancement using Conformer
Kim, Eesung
Seo, Hyeji
[J]. INTERSPEECH 2021, 2021, : 2736 - 2740
[28] Recovering Capitalization for Automatic Speech Recognition of Vietnamese using Transformer and Chunk Merging
Hien Nguyen Thi Thu
Binh Nguyen Thai
Hung Nguyen Vu Bao
Truong Do Quoc
Mai Luong Chi
Huyen Nguyen Thi Minh
[J]. PROCEEDINGS OF 2019 11TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE 2019), 2019, : 430 - 434
[29] A Hybrid Approach to Vietnamese Word Segmentation using Part of Speech tags
Dang Duc Pham
Giang Binh Tran
Son Bao Pham
[J]. INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE 2009), 2009, : 154 - 161
[30] Speech emotion recognition using a fuzzy approach
Ton-That, An H.
Cao, Nhan T.
[J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 36 (02) : 1587 - 1597