Transformer-Based Joint Learning Approach for Text Normalization in Vietnamese Automatic Speech Recognition Systems

Times Cited: 0
|
Authors
Viet The Bui [1 ]
Tho Chi Luong [2 ]
Oanh Thi Tran [3 ]
Affiliations
[1] Singapore Management Univ, Sch Comp & Informat Syst, Singapore, Singapore
[2] FPT Univ, FPT Technol Res Inst, Hanoi, Vietnam
[3] Vietnam Natl Univ Hanoi, Int Sch, Hanoi, Vietnam
Keywords
ASR; named entity recognition; post-processing; punctuator; text normalization; transformer-based joint learning models;
DOI
10.1080/01969722.2022.2145654
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
In this article, we investigate the task of normalizing transcribed texts in Vietnamese Automatic Speech Recognition (ASR) systems in order to improve readability for users and the performance of downstream tasks. This task usually consists of two main sub-tasks: predicting and inserting punctuation (i.e., periods and commas), and detecting named entities (e.g., numbers and person names) and standardizing them from their spoken forms to the appropriate written forms. To achieve these goals, we introduce a complete corpus comprising 87,700 sentences and investigate conditional joint learning approaches that globally optimize the two sub-tasks simultaneously. The experimental results are quite promising. Overall, the proposed architecture outperformed the conventional architecture, which trains individual models on the two sub-tasks separately. The joint models are further improved when integrated with surrounding contexts (SCs). Specifically, the best model obtained F1 scores of 81.13% on the first sub-task and 94.41% on the second sub-task.
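To make the joint-learning architecture described in the abstract more concrete, the sketch below shows one plausible way to build such a model: a shared transformer encoder feeding two token-classification heads, one predicting punctuation labels and one predicting named-entity labels, trained with a combined loss. This is a minimal illustration, not the authors' implementation; the encoder name (xlm-roberta-base), the label sets, and the equal loss weighting are assumptions made here for the example.

```python
# Minimal sketch (not the authors' code) of a transformer-based joint model
# for ASR text normalization: a shared encoder with two token-level heads.
# Encoder name, label sets, and equal loss weights are illustrative assumptions.
import torch.nn as nn
from transformers import AutoModel

PUNCT_LABELS = ["O", "PERIOD", "COMMA"]                     # sub-task 1: punctuation
ENTITY_LABELS = ["O", "B-NUM", "I-NUM", "B-PER", "I-PER"]   # sub-task 2: entities


class JointPunctuationEntityModel(nn.Module):
    def __init__(self, encoder_name: str = "xlm-roberta-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.punct_head = nn.Linear(hidden, len(PUNCT_LABELS))
        self.entity_head = nn.Linear(hidden, len(ENTITY_LABELS))
        self.loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(self, input_ids, attention_mask,
                punct_labels=None, entity_labels=None):
        # Shared contextual representations: (batch, seq_len, hidden)
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        punct_logits = self.punct_head(hidden_states)
        entity_logits = self.entity_head(hidden_states)

        loss = None
        if punct_labels is not None and entity_labels is not None:
            # Joint objective: both sub-task losses are summed, so gradients
            # from the two tasks update the shared encoder together.
            loss = (
                self.loss_fn(punct_logits.view(-1, len(PUNCT_LABELS)),
                             punct_labels.view(-1))
                + self.loss_fn(entity_logits.view(-1, len(ENTITY_LABELS)),
                               entity_labels.view(-1))
            )
        return {"loss": loss,
                "punct_logits": punct_logits,
                "entity_logits": entity_logits}
```

At inference time, the predicted punctuation labels would be re-inserted into the transcript and the detected entity spans rewritten from spoken to written form (e.g., a sequence of number words becomes digits), covering the two halves of the normalization task described in the abstract.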
Pages: 1614-1630
Number of pages: 17
Related Papers
50 records in total
  • [1] TEXT NORMALIZATION FOR AUTOMATIC SPEECH RECOGNITION SYSTEMS
    Vasile, Alin-Florentin
    Boros, Tiberiu
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE 'LINGUISTIC RESOURCES AND TOOLS FOR PROCESSING THE ROMANIAN LANGUAGE', 2016, : 121 - 128
  • [2] Transformer-Based Turkish Automatic Speech Recognition
    Tasar, Davut Emre
    Koruyan, Kutan
    Cilgin, Cihan
    ACTA INFOLOGICA, 2024, 8 (01) : 1 - 10
  • [3] Transformer-based Automatic Speech Recognition of Simultaneous Interpretation with Auxiliary Input of Source Language Text
    Taniguchi, Shuta
    Kato, Tsuneo
    Tamura, Akihiro
    Yasuda, Keiji
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 1857 - 1861
  • [4] A transformer-based approach for Arabic offline handwritten text recognition
    Momeni, Saleh
    Babaali, Bagher
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (04) : 3053 - 3062
  • [5] A Transformer-Based End-to-End Automatic Speech Recognition Algorithm
    Dong, Fang
    Qian, Yiyang
    Wang, Tianlei
    Liu, Peng
    Cao, Jiuwen
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1592 - 1596
  • [6] Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition
    Zhao, Chendong
    Wang, Jianzong
    Wei, Wenqi
    Qu, Xiaoyang
    Wang, Haoqian
    Xiao, Jing
    2022 IEEE 9TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2022, : 173 - 180
  • [7] A transformer-based approach for Arabic offline handwritten text recognition
    Saleh Momeni
    Bagher BabaAli
    Signal, Image and Video Processing, 2024, 18 : 3053 - 3062
  • [8] A transformer-based network for speech recognition
    Tang L.
    International Journal of Speech Technology, 2023, 26 (02) : 531 - 539
  • [9] Transformer-Based Automatic Speech Recognition with Auxiliary Input of Source Language Text Toward Transcribing Simultaneous Interpretation
    Taniguchi, Shuta
    Kato, Tsuneo
    Tamura, Akihiro
    Yasuda, Keiji
    INTERSPEECH 2022, 2022, : 2813 - 2817
  • [10] FOUR-IN-ONE: A JOINT APPROACH TO INVERSE TEXT NORMALIZATION, PUNCTUATION, CAPITALIZATION, AND DISFLUENCY FOR AUTOMATIC SPEECH RECOGNITION
    Tan, Sharman
    Behre, Piyush
    Kibre, Nick
    Alphonso, Issac
    Chang, Shuangyu
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 677 - 684