Recovering Capitalization for Automatic Speech Recognition of Vietnamese using Transformer and Chunk Merging

Authors
Hien Nguyen Thi Thu [1]
Binh Nguyen Thai [2]
Hung Nguyen Vu Bao [2]
Truong Do Quoc [3]
Mai Luong Chi [4]
Huyen Nguyen Thi Minh [5]
Affiliations
[1] Thai Nguyen Univ Educ, Dept Math, Thai Nguyen, Vietnam
[2] Vietnam Artificial Intelligence Syst, Res Dept, Hanoi, Vietnam
[3] Vietnam Artificial Intelligence Syst, Hanoi, Vietnam
[4] Univ Sci & Technol Hanoi, ICT Dept, Hanoi, Vietnam
[5] VNU Univ Sci, Dept Math Mech & Informat, Hanoi, Vietnam
Keywords
Automatic Speech Recognition; Capitalization
DOI
10.1109/kse.2019.8919342
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In the last few years, Automatic Speech Recognition (ASR) systems for Vietnamese have been used in various applications with exceptional results. Nevertheless, ASR output still has limitations, such as missing punctuation and capitalization and non-standardized numeric data. These shortcomings make it difficult for readers to understand the context efficiently and for Natural Language Processing (NLP) tasks to perform well. Capitalization is one of the most critical factors for improving human readability, parsing, and Named Entity Recognition (NER). Additionally, compared to English, Vietnamese ASR output has its own features, such as lisp words, local words, compound words, and homophones. In this paper, we propose a method to recover capitalization for long-speech ASR transcriptions of Vietnamese using Transformer models and chunk merging. Furthermore, we perform decoding in parallel while improving the prediction accuracy.
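The abstract names chunk merging and parallel decoding but gives no details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a long transcript is split into overlapping chunks, each chunk is decoded independently (here in a thread pool), and the overlap is resolved by keeping the first half from the earlier chunk and the rest from the later chunk. The chunk size, the overlap, the merge rule, the function names, and the dictionary-based `toy_capitalizer` (a stand-in for a Transformer model) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a trained model: capitalizes a few known name tokens.
HYPOTHETICAL_NAMES = {"hà", "nội", "việt", "nam"}


def toy_capitalizer(chunk):
    """Return the chunk with known name tokens capitalized (placeholder model)."""
    return [t.capitalize() if t in HYPOTHETICAL_NAMES else t for t in chunk]


def split_into_chunks(tokens, size, overlap):
    """Return (start, chunk) pairs; consecutive chunks share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks, start = [], 0
    while True:
        chunks.append((start, tokens[start:start + size]))
        if start + size >= len(tokens):
            break
        start += step
    return chunks


def merge_chunks(labeled, total_len, overlap):
    """Merge per-chunk predictions into one sequence.

    In each overlapping region, the first half is kept from the earlier chunk
    and the remainder is taken from the later chunk (an assumed merge rule).
    """
    merged = [None] * total_len
    for idx, (start, preds) in enumerate(labeled):
        skip = 0 if idx == 0 else overlap // 2
        for offset in range(skip, len(preds)):
            merged[start + offset] = preds[offset]
    return merged


def recover_capitalization(tokens, model=toy_capitalizer, size=6, overlap=2):
    """Decode all chunks in parallel, then merge them back in order."""
    chunks = split_into_chunks(tokens, size, overlap)
    with ThreadPoolExecutor() as pool:
        preds = list(pool.map(model, [chunk for _, chunk in chunks]))
    labeled = [(start, p) for (start, _), p in zip(chunks, preds)]
    return merge_chunks(labeled, len(tokens), overlap)


if __name__ == "__main__":
    text = "tôi sống ở hà nội và làm việc tại việt nam từ năm ngoái"
    print(" ".join(recover_capitalization(text.split())))
```

Because every chunk is decoded independently, the per-chunk calls can run concurrently. The overlap gives tokens near a chunk boundary some surrounding context in at least one chunk, which is the usual motivation for merging rather than simply concatenating chunk outputs.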
Pages: 430-434
Number of pages: 5