STREAMING, FAST AND ACCURATE ON-DEVICE INVERSE TEXT NORMALIZATION FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 3
Authors
Gaur, Yashesh [1 ]
Kibre, Nick [1 ]
Xue, Jian [1 ]
Shu, Kangyuan [1 ]
Wang, Yuhui [1 ]
Alphanso, Issac [1 ]
Li, Jinyu [1 ]
Gong, Yifan [1 ]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords
Inverse Text Normalization; Automatic Speech Recognition; on-device; streaming;
DOI
10.1109/SLT54892.2023.10022543
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic Speech Recognition (ASR) systems typically yield output in lexical form. However, humans prefer written-form output. To bridge this gap, ASR systems usually employ Inverse Text Normalization (ITN). In previous works, Weighted Finite State Transducers (WFSTs) have been employed to do ITN. WFSTs are well suited to this task, but their size and run-time costs can make deployment in embedded applications challenging. In this paper, we describe the development of an on-device ITN system that is streaming, lightweight, and accurate. At the core of our system is a streaming transformer tagger that tags lexical tokens from ASR. The tag indicates which ITN category, if any, should be applied. Following that, we apply an ITN-category-specific WFST, only on the tagged text, to reliably perform the ITN conversion. We show that the proposed ITN solution performs on par with strong baselines, while being significantly smaller in size and retaining customization capabilities.
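The two-stage flow described in the abstract (tag the lexical tokens, then convert only the tagged spans with a category-specific converter) can be illustrated with a minimal Python sketch. This is not the authors' implementation: a word-list lookup stands in for the streaming transformer tagger, a plain function stands in for the category-specific WFST, and the names `tag_tokens`, `convert_cardinal`, `CONVERTERS`, and `inverse_normalize`, as well as the single `CARDINAL` category, are assumptions made for illustration only.

```python
from typing import Callable, Dict, List

# Hypothetical vocabulary for one assumed ITN category (cardinal numbers below 100).
NUM_WORDS: Dict[str, int] = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40,
    "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def tag_tokens(tokens: List[str]) -> List[str]:
    """Stand-in for the tagger: label each token CARDINAL or O (no ITN)."""
    return ["CARDINAL" if t in NUM_WORDS else "O" for t in tokens]


def convert_cardinal(words: List[str]) -> str:
    """Stand-in for a category-specific WFST; handles numbers below 100 only."""
    out: List[str] = []
    i = 0
    while i < len(words):
        value = NUM_WORDS[words[i]]
        # Merge a tens word with a following unit word, e.g. "twenty three".
        if value >= 20 and i + 1 < len(words) and 0 < NUM_WORDS[words[i + 1]] < 10:
            value += NUM_WORDS[words[i + 1]]
            i += 1
        out.append(str(value))
        i += 1
    return " ".join(out)


# One converter per assumed category; untagged text never reaches a converter.
CONVERTERS: Dict[str, Callable[[List[str]], str]] = {"CARDINAL": convert_cardinal}


def inverse_normalize(lexical: str) -> str:
    """Tag tokens, then rewrite only the contiguous spans that carry a category tag."""
    tokens = lexical.split()
    tags = tag_tokens(tokens)
    output: List[str] = []
    i = 0
    while i < len(tokens):
        j = i
        while j < len(tokens) and tags[j] == tags[i]:
            j += 1
        span = tokens[i:j]
        if tags[i] in CONVERTERS:
            output.append(CONVERTERS[tags[i]](span))
        else:
            output.extend(span)
        i = j
    return " ".join(output)


print(inverse_normalize("i have twenty three apples and five pears"))
```

Restricting each converter to its tagged span is what lets the per-category rules stay small in this sketch; text tagged `O` passes through unchanged.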
Pages: 237 - 244
Page count: 8
Related papers
50 in total
  • [31] Fast and Accurate Capitalization and Punctuation for Automatic Speech Recognition Using Transformer and Chunk Merging
    Binh Nguyen
    Vu Bao Hung Nguyen
    Hien Nguyen
    Pham Ngoc Phuong
    The-Loc Nguyen
    Quoc Truong Do
    Luong Chi Mai
    [J]. 2019 22ND CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2019, : 29 - 33
  • [32] Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems
    Tyagi, Shubhi
    Bonafonte, Antonio
    Lorenzo-Trueba, Jaime
    Latorre, Javier
    [J]. 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, NAACL-HLT 2021, 2021, : 72 - 79
  • [33] Enabling On-Device Learning with Deep Spiking Neural Networks for Speech Recognition
    Soures, N. M.
    Kudithipudi, D.
    Jacobs-Gedrim, R. B.
    Agarwal, S.
    Marinella, M.
    [J]. SILICON COMPATIBLE MATERIALS, PROCESSES, AND TECHNOLOGIES FOR ADVANCED INTEGRATED CIRCUITS AND EMERGING APPLICATIONS 8, 2018, 85 (06): : 127 - 137
  • [34] LimitAccess: on-device TinyML based robust speech recognition and age classification
    Maayah M.
    Abunada A.
    Al-Janahi K.
    Ahmed M.E.
    Qadir J.
    [J]. Discover Artificial Intelligence, 3 (1):
  • [35] Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition
    Ramsay, David B.
    Kilgour, Kevin
    Roblek, Dominik
    Sharifi, Matthew
    [J]. INTERSPEECH 2019, 2019, : 3456 - 3459
  • [36] Channel normalization techniques for automatic speech recognition over the telephone
    de Veth, J
    Boves, L
    [J]. SPEECH COMMUNICATION, 1998, 25 (1-3) : 149 - 164
  • [37] Joint streaming model for backchannel prediction and automatic speech recognition
    Choi, Yong-Seok
    Bang, Jeong-Uk
    Kim, Seung Hi
    [J]. ETRI JOURNAL, 2024, 46 (01) : 118 - 126
  • [38] Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition
    Ni, Junrui
    Wang, Liming
    Gao, Heting
    Qian, Kaizhi
    Zhang, Yang
    Chang, Shiyu
    Hasegawa-Johnson, Mark
    [J]. INTERSPEECH 2022, 2022, : 461 - 465
  • [39] Neural Inverse Text Normalization with Numerical Recognition for Low Resource Scenarios
    Than Anh Phan
    Ngoc Dung Nguyen
    Huong Le Thanh
    Khac-Hoai Nam Bui
    [J]. INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2022, PT I, 2022, 13757 : 582 - 594
  • [40] Automatic Personality Recognition from Reading Text Speech
    Fallahnezhad, Mohsen
    Vali, Mansour
    Khalili, Mehdi
    [J]. 2017 25TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2017, : 18 - 23