Transformer-Based Joint Learning Approach for Text Normalization in Vietnamese Automatic Speech Recognition Systems

Authors
Viet The Bui [1]
Tho Chi Luong [2]
Oanh Thi Tran [3]
Affiliations
[1] Singapore Management Univ, Sch Comp & Informat Syst, Singapore, Singapore
[2] FPT Univ, FPT Technol Res Inst, Hanoi, Vietnam
[3] Vietnam Natl Univ Hanoi, Int Sch, Hanoi, Vietnam
Keywords
ASR; named entity recognition; post-processing; punctuator; text normalization; transformer-based joint learning models
DOI
10.1080/01969722.2022.2145654
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this article, we investigate the task of normalizing transcribed texts in Vietnamese Automatic Speech Recognition (ASR) systems in order to improve readability for users and the performance of downstream tasks. This task usually consists of two main sub-tasks: predicting and inserting punctuation (e.g., period, comma); and detecting named entities (e.g., numbers, person names) and converting them from their spoken forms to the appropriate written forms. To achieve these goals, we introduce a complete corpus of 87,700 sentences and investigate conditional joint learning approaches that globally optimize the two sub-tasks simultaneously. The experimental results are promising. Overall, the proposed architecture outperformed the conventional architecture, which trains individual models on the two sub-tasks separately. The joint models are further improved when integrated with the surrounding contexts (SCs). Specifically, using the best model, we obtained F1 scores of 81.13% for the first sub-task and 94.41% for the second sub-task.
Pages: 1614-1630
Number of pages: 17