STREAMING, FAST AND ACCURATE ON-DEVICE INVERSE TEXT NORMALIZATION FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 3

Authors:

Gaur, Yashesh [1]
Kibre, Nick [1]
Xue, Jian [1]
Shu, Kangyuan [1]
Wang, Yuhui [1]
Alphanso, Issac [1]
Li, Jinyu [1]
Gong, Yifan [1]

Affiliations:

[1] Microsoft Corp, Redmond, WA 98052 USA

Source:

2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022

Keywords:

Inverse Text Normalization; Automatic Speech Recognition; on-device; streaming

DOI:

10.1109/SLT54892.2023.10022543

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Automatic Speech Recognition (ASR) systems typically produce output in lexical form, whereas humans prefer output in written form. To bridge this gap, ASR systems usually employ Inverse Text Normalization (ITN). Previous work has used Weighted Finite State Transducers (WFSTs) for ITN. WFSTs are well suited to this task, but their size and run-time costs can make deployment in embedded applications challenging. In this paper, we describe the development of an on-device ITN system that is streaming, lightweight, and accurate. At the core of our system is a streaming transformer tagger that tags lexical tokens from ASR. The tag indicates which ITN category, if any, should be applied. We then apply an ITN-category-specific WFST, only to the tagged text, to reliably perform the ITN conversion. We show that the proposed ITN solution performs on par with strong baselines while being significantly smaller in size and retaining customization capabilities.
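The two-stage flow described in the abstract (tag each lexical token, then rewrite only the tagged spans with a category-specific converter) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `DIGITS` tag name, the lookup-based `toy_tagger` standing in for the streaming transformer tagger, and the dictionary-based `rewrite_digits` standing in for a category-specific WFST. It is not the authors' implementation and does not model streaming.

```python
from typing import Callable, Dict, List

# Hypothetical category and vocabulary; the abstract does not list the real ITN categories.
DIGIT_WORDS: Dict[str, str] = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def toy_tagger(tokens: List[str]) -> List[str]:
    """Stand-in for the transformer tagger: emits one tag per lexical token."""
    return ["DIGITS" if tok in DIGIT_WORDS else "O" for tok in tokens]

def rewrite_digits(span: List[str]) -> str:
    """Stand-in for a category-specific WFST: spoken digits to a digit string."""
    return "".join(DIGIT_WORDS[tok] for tok in span)

# One rewriter per ITN category; untagged text ("O") is never rewritten.
REWRITERS: Dict[str, Callable[[List[str]], str]] = {"DIGITS": rewrite_digits}

def apply_itn(tokens: List[str], tags: List[str]) -> str:
    """Rewrite each maximal run of identically tagged tokens; copy the rest."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag not in REWRITERS:
            out.append(tokens[i])  # untagged token passes through unchanged
            i += 1
            continue
        j = i
        while j < len(tokens) and tags[j] == tag:
            j += 1  # extend the span while the tag stays the same
        out.append(REWRITERS[tag](tokens[i:j]))
        i = j
    return " ".join(out)

if __name__ == "__main__":
    lexical = "call me at five five five one two one two".split()
    print(apply_itn(lexical, toy_tagger(lexical)))
```

Restricting each rewriter to its tagged span is what lets every converter stay small and category-specific in this sketch, since it never has to accept arbitrary untagged text.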


Pages: 237-244

Page count: 8
