Recovering Capitalization for Automatic Speech Recognition of Vietnamese using Transformer and Chunk Merging

Cited by: 0

Authors:

Hien Nguyen Thi Thu [1]

Binh Nguyen Thai [2]

Hung Nguyen Vu Bao [2]

Truong Do Quoc [3]

Mai Luong Chi [4]

Huyen Nguyen Thi Minh [5]
]}

Affiliations:

[1] Thai Nguyen Univ Educ, Dept Math, Thai Nguyen, Vietnam

[2] Vietnam Artificial Intelligence Syst, Res Dept, Hanoi, Vietnam

[3] Vietnam Artificial Intelligence Syst, Hanoi, Vietnam

[4] Univ Sci & Technol Hanoi, ICT Dept, Hanoi, Vietnam

[5] VNU Univ Sci, Dept Math Mech & Informat, Hanoi, Vietnam

Source:

PROCEEDINGS OF 2019 11TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE 2019) | 2019

Keywords:

Automatic Speech Recognition; Capitalization

DOI:

10.1109/kse.2019.8919342

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In the last few years, Automatic Speech Recognition (ASR) systems for Vietnamese have been used in various applications with exceptional results. Nevertheless, ASR output still has limitations, such as missing punctuation, missing capitalization, and non-standardized numeric data. These shortcomings make it harder for readers to understand the context and degrade the performance of downstream Natural Language Processing (NLP) tasks. Capitalization is one of the most critical factors for human readability, parsing, and Named Entity Recognition (NER). Additionally, Vietnamese ASR output has characteristics that differ from English, such as lisped words, local words, compound words, and homophones. In this paper, we propose a method to recover capitalization in long-speech ASR transcripts of Vietnamese using Transformer models and chunk merging. Furthermore, our method performs decoding in parallel while also improving prediction accuracy.
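The abstract does not say how chunk merging works. As a rough illustration only, the sketch below assumes one common scheme: a long transcript is split into overlapping fixed-size windows, each window is tagged independently (which is what makes parallel decoding possible), and for every token the tag from the window in which it sits farthest from an edge (i.e. with the most context) is kept. The tag set ("U"/"L"), window sizes, function names, and the toy lexicon tagger standing in for the Transformer model are all hypothetical; this is not the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical tag set: "U" = capitalize first letter, "L" = leave as-is.
Tagger = Callable[[List[str]], List[str]]


def split_into_chunks(tokens: List[str], chunk_size: int,
                      overlap: int) -> List[Tuple[int, List[str]]]:
    """Split tokens into overlapping windows; return (start_offset, window) pairs."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start, tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):  # last window reaches the end
            break
        start += step
    return chunks


def merge_chunk_tags(n_tokens: int,
                     chunk_results: List[Tuple[int, List[str]]]) -> List[str]:
    """For each token, keep the tag from the window where it is farthest from an edge."""
    best_tag = [""] * n_tokens
    best_margin = [-1] * n_tokens
    for start, tags in chunk_results:
        for i, tag in enumerate(tags):
            margin = min(i, len(tags) - 1 - i)  # distance to nearest window edge
            pos = start + i
            if margin > best_margin[pos]:
                best_margin[pos] = margin
                best_tag[pos] = tag
    return best_tag


def recover_capitalization(tokens: List[str], tagger: Tagger,
                           chunk_size: int = 8, overlap: int = 4) -> List[str]:
    chunks = split_into_chunks(tokens, chunk_size, overlap)
    # Windows are independent, so they can be tagged concurrently;
    # Executor.map returns results in input order.
    with ThreadPoolExecutor() as pool:
        tag_lists = list(pool.map(lambda c: tagger(c[1]), chunks))
    results = [(start, tags) for (start, _), tags in zip(chunks, tag_lists)]
    tags = merge_chunk_tags(len(tokens), results)
    return [t[:1].upper() + t[1:] if tag == "U" else t
            for t, tag in zip(tokens, tags)]


# Toy stand-in for a trained model: looks words up in a hypothetical lexicon.
PROPER_NOUNS = {"hà", "nội", "việt", "nam"}


def toy_tagger(window: List[str]) -> List[str]:
    return ["U" if w in PROPER_NOUNS else "L" for w in window]


words = "tôi sống ở hà nội việt nam".split()
print(" ".join(recover_capitalization(words, toy_tagger, chunk_size=4, overlap=2)))
# → tôi sống ở Hà Nội Việt Nam
```

The toy tagger ignores context, so every window agrees and merging has no visible effect here; with a real contextual model, the edge-distance rule prefers predictions made with context on both sides. In practice the windows would more likely be batched through the model than dispatched to Python threads; the executor only illustrates that the windows are independent.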

Pages: 430-434

Number of pages: 5