STREAMING, FAST AND ACCURATE ON-DEVICE INVERSE TEXT NORMALIZATION FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 3
Authors
Gaur, Yashesh [1 ]
Kibre, Nick [1 ]
Xue, Jian [1 ]
Shu, Kangyuan [1 ]
Wang, Yuhui [1 ]
Alphonso, Issac [1 ]
Li, Jinyu [1 ]
Gong, Yifan [1 ]
Affiliation
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords
Inverse Text Normalization; Automatic Speech Recognition; on-device; streaming
DOI
10.1109/SLT54892.2023.10022543
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Automatic Speech Recognition (ASR) systems typically yield output in lexical form. However, humans prefer written-form output. To bridge this gap, ASR systems usually employ Inverse Text Normalization (ITN). In previous work, Weighted Finite State Transducers (WFSTs) have been employed to perform ITN. WFSTs are well suited to this task, but their size and run-time costs can make deployment in embedded applications challenging. In this paper, we describe the development of an on-device ITN system that is streaming, lightweight, and accurate. At the core of our system is a streaming transformer tagger that tags lexical tokens from ASR. The tag indicates which ITN category, if any, should be applied. We then apply an ITN-category-specific WFST to the tagged text only, to reliably perform the ITN conversion. We show that the proposed ITN solution performs on par with strong baselines, while being significantly smaller in size and retaining customization capabilities.
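The tag-then-convert flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a dictionary lookup stands in for the streaming transformer tagger, a plain function stands in for a category-specific WFST, and the `CARDINAL` tag, `toy_tagger`, `convert_cardinal`, and `apply_itn` are hypothetical names. It processes a whole utterance at once, so it does not model streaming.

```python
from typing import Callable, Dict, List

# Hypothetical vocabulary for a single toy category; the paper's real
# ITN categories are not listed in the abstract.
NUMBER_WORDS: Dict[str, int] = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
}


def toy_tagger(tokens: List[str]) -> List[str]:
    """Stand-in for the transformer tagger: one tag per lexical token.

    'O' means no ITN category applies to the token.
    """
    return ["CARDINAL" if t in NUMBER_WORDS else "O" for t in tokens]


def convert_cardinal(span: List[str]) -> str:
    """Stand-in for a category-specific WFST.

    Only handles simple tens-plus-units phrases such as 'twenty three'.
    """
    return str(sum(NUMBER_WORDS[t] for t in span))


# One converter per tag, mirroring the idea of one WFST per ITN category.
CONVERTERS: Dict[str, Callable[[List[str]], str]] = {
    "CARDINAL": convert_cardinal,
}


def apply_itn(tokens: List[str]) -> str:
    """Tag the tokens, then run a converter only on the tagged spans."""
    tags = toy_tagger(tokens)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag == "O":
            # Untagged tokens pass through unchanged.
            out.append(tokens[i])
            i += 1
            continue
        # Collect the maximal run of tokens that share this tag.
        j = i
        while j < len(tokens) and tags[j] == tag:
            j += 1
        out.append(CONVERTERS[tag](tokens[i:j]))
        i = j
    return " ".join(out)


print(apply_itn("i have twenty three apples".split()))
```

In this sketch the converter never sees untagged text, which mirrors the abstract's point that the WFST is applied only to the tagged portion of the ASR output.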
Pages: 237-244
Number of pages: 8