TRANSFORMER IN ACTION: A COMPARATIVE STUDY OF TRANSFORMER-BASED ACOUSTIC MODELS FOR LARGE SCALE SPEECH RECOGNITION APPLICATIONS

Cited: 5
Authors
Wang, Yongqiang [1 ]
Shi, Yangyang [1 ]
Zhang, Frank [1 ]
Wu, Chunyang [1 ]
Chan, Julian [1 ]
Yeh, Ching-Feng [1 ]
Xiao, Alex [1 ]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025 USA
Keywords
speech recognition; acoustic modeling; transformer; recurrent neural networks; neural networks
DOI
10.1109/ICASSP39728.2021.9414087
CLC Number
O42 [Acoustics]
Subject Classification Numbers
070206; 082403
Abstract
Transformer-based acoustic models have recently shown promising results. In this paper, we summarize the application of the transformer and its streamable variant, the Emformer-based acoustic model [1], to large-scale speech recognition applications. We compare transformer-based acoustic models with their LSTM counterparts on industrial-scale tasks. Specifically, we compare Emformer with latency-controlled BLSTM (LCBLSTM) on medium-latency tasks and with LSTM on low-latency tasks. On a low-latency voice assistant task, Emformer achieves 24% to 26% relative word error rate reductions (WERRs). For medium-latency scenarios, compared with an LCBLSTM of similar model size and latency, Emformer achieves significant WERRs across four languages on video captioning datasets, together with a 2-3x reduction in inference real-time factor.
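The abstract quotes two metrics, relative word error rate reduction (WERR) and inference real-time factor (RTF). The short Python sketch below illustrates how these metrics are conventionally computed; it is not taken from the paper, and all numeric values (the WER pair and the decoding times) are hypothetical placeholders chosen only to mirror the magnitudes quoted above.

# Minimal sketch (not from the paper): conventional definitions of the
# relative metrics quoted in the abstract. All numbers are hypothetical.

def relative_werr(baseline_wer: float, new_wer: float) -> float:
    """Relative word error rate reduction (WERR), in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time divided by audio duration."""
    return processing_seconds / audio_seconds

# Hypothetical example: a baseline at 10.0% WER vs. a new model at 7.5% WER.
print(f"WERR: {relative_werr(10.0, 7.5):.1f}%")             # -> 25.0%

# Hypothetical example: one model decodes 100 s of audio in 30 s,
# another in 12 s, giving roughly a 2.5x RTF reduction.
baseline_rtf = rtf(30.0, 100.0)
new_rtf = rtf(12.0, 100.0)
print(f"RTF reduction: {baseline_rtf / new_rtf:.1f}x")      # -> 2.5x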
Pages: 6778-6782
Page count: 5
Related Papers
50 in total (first 10 shown)
  • [1] Wang, Yongqiang; Mohamed, Abdelrahman; Le, Duc; Liu, Chunxi; Xiao, Alex; Mahadeokar, Jay; Huang, Hongzhao; Tjandra, Andros; Zhang, Xiaohui; Zhang, Frank; Fuegen, Christian; Zweig, Geoffrey; Seltzer, Michael L. Transformer-Based Acoustic Modeling for Hybrid Speech Recognition. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020: 6874-6878.
  • [2] Lee, Mun-Hak; Lee, Sang-Eon; Seong, Ju-Seok; Chang, Joon-Hyuk; Kwon, Haeyoung; Park, Chanhee. Regularizing Transformer-based Acoustic Models by Penalizing Attention Weights for Robust Speech Recognition. INTERSPEECH 2022, 2022: 56-60.
  • [3] Tang, L. A transformer-based network for speech recognition. International Journal of Speech Technology, 2023, 26(2): 531-539.
  • [4] Lu, Xingyu; Hu, Jianguo; Li, Shenhao; Ding, Yanyu. RM-Transformer: A Transformer-based Model for Mandarin Speech Recognition. 2022 IEEE 2nd International Conference on Computer Communication and Artificial Intelligence (CCAI 2022), 2022: 194-198.
  • [5] Tasar, Davut Emre; Koruyan, Kutan; Cilgin, Cihan. Transformer-Based Turkish Automatic Speech Recognition. Acta Infologica, 2024, 8(1): 1-10.
  • [6] Ganesh, Prakhar; Chen, Yao; Lou, Xin; Khan, Mohammad Ali; Yang, Yin; Sajjad, Hassan; Nakov, Preslav; Chen, Deming; Winslett, Marianne. Compressing Large-Scale Transformer-Based Models: A Case Study on BERT. Transactions of the Association for Computational Linguistics, 2021, 9: 1061-1080.
  • [7] Wu, Chunyang; Xiu, Zhiping; Shi, Yangyang; Kalinli, Ozlem; Fuegen, Christian; Koehler, Thilo; He, Qing. Transformer-based Acoustic Modeling for Streaming Speech Synthesis. INTERSPEECH 2021, 2021: 146-150.
  • [8] Alwajih, Fakhraddin; Badr, Eman; Abdou, Sherif. Transformer-based Models for Arabic Online Handwriting Recognition. International Journal of Advanced Computer Science and Applications, 2022, 13(5): 898-905.
  • [9] Lehecka, Jan; Svec, Jan; Psutka, Josef V.; Ircing, Pavel. Transformer-based Speech Recognition Models for Oral History Archives in English, German, and Czech. INTERSPEECH 2023, 2023: 201-205.
  • [10] Tanaka, Tomohiro; Masumura, Ryo; Ihori, Mana; Takashima, Akihiko; Moriya, Takafumi; Ashihara, Takanori; Orihashi, Shota; Makishima, Naoki. Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition. INTERSPEECH 2021, 2021: 4059-4063.