TRANSFORMER IN ACTION: A COMPARATIVE STUDY OF TRANSFORMER-BASED ACOUSTIC MODELS FOR LARGE SCALE SPEECH RECOGNITION APPLICATIONS

Citations: 5
Authors
Wang, Yongqiang [1 ]
Shi, Yangyang [1 ]
Zhang, Frank [1 ]
Wu, Chunyang [1 ]
Chan, Julian [1 ]
Yeh, Ching-Feng [1 ]
Xiao, Alex [1 ]
Affiliations
[1] Facebook AI, Menlo Pk, CA 94025 USA
Keywords
speech recognition; acoustic modeling; transformer; recurrent neural networks; neural networks
DOI
10.1109/ICASSP39728.2021.9414087
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Transformer-based acoustic models have shown promising results very recently. In this paper, we summarize the application of the Transformer and its streamable variant, the Emformer-based acoustic model [1], to large-scale speech recognition applications. We compare Transformer-based acoustic models with their LSTM counterparts on industrial-scale tasks. Specifically, we compare Emformer with latency-controlled BLSTM (LCBLSTM) on medium-latency tasks and with LSTM on low-latency tasks. On a low-latency voice assistant task, Emformer achieves 24% to 26% relative word error rate reductions (WERRs). For medium-latency scenarios, compared with an LCBLSTM of similar model size and latency, Emformer achieves significant WERRs across four languages on video captioning datasets, with a 2-3x reduction in inference real-time factor.
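The relative word error rate reduction (WERR) quoted in the abstract is the standard relative-improvement metric over a baseline's WER. A minimal sketch of how it is computed (the WER values below are hypothetical, not taken from the paper):

```python
def relative_werr(wer_baseline: float, wer_new: float) -> float:
    """Relative word error rate reduction (WERR), in percent:
    100 * (baseline WER - new WER) / baseline WER."""
    return 100.0 * (wer_baseline - wer_new) / wer_baseline

# Hypothetical example: a baseline LSTM at 10.0% WER vs. Emformer at 7.5% WER
# corresponds to a 25% relative WERR.
print(relative_werr(10.0, 7.5))  # → 25.0
```

Under this definition, a "24% to 26% relative WERR" means the new model removes roughly a quarter of the baseline's word errors, regardless of the absolute WER level.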
Pages: 6778-6782
Page count: 5