Transformer-Based Joint Learning Approach for Text Normalization in Vietnamese Automatic Speech Recognition Systems

Cited by: 0

Authors:

Viet The Bui [1]

Tho Chi Luong [2]

Oanh Thi Tran [3]

Affiliations:

[1] Singapore Management Univ, Sch Comp & Informat Syst, Singapore, Singapore

[2] FPT Univ, FPT Technol Res Inst, Hanoi, Vietnam

[3] Vietnam Natl Univ Hanoi, Int Sch, Hanoi, Vietnam

Source:

CYBERNETICS AND SYSTEMS | 2024 / Vol. 55 / Issue 07

Keywords:

ASR; named entity recognition; post-processing; punctuator; text normalization; transformer-based joint learning models;

DOI:

10.1080/01969722.2022.2145654

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812;

Abstract:

In this article, we investigate the task of normalizing transcribed texts in Vietnamese Automatic Speech Recognition (ASR) systems in order to improve readability for users and the performance of downstream tasks. This task usually consists of two main sub-tasks: predicting and inserting punctuation (e.g., period, comma); and detecting named entities (e.g., numbers, person names) and converting them from their spoken forms to the appropriate written forms. To achieve these goals, we introduce a complete corpus of 87,700 sentences and investigate conditional joint learning approaches which globally optimize the two sub-tasks simultaneously. The experimental results are promising. Overall, the proposed architecture outperformed the conventional architecture, which trains individual models on the two sub-tasks separately. The joint models are further improved when integrated with the surrounding contexts (SCs). Specifically, using the best model, we obtained F1 scores of 81.13% on the first sub-task and 94.41% on the second sub-task.

Pages: 1614-1630

Number of pages: 17
