LVCSR with Transformer Language Models

Cited by: 7
Authors
Beck, Eugen [1]
Schlueter, Ralf [1]
Ney, Hermann [1]
Affiliations
[1] Rhein Westfal TH Aachen, Comp Sci Dept, Human Language Technol & Pattern Recognit, D-52074 Aachen, Germany
Source
Interspeech 2020
Funding
European Research Council
Keywords
speech recognition; decoding; Transformer language model
DOI
10.21437/Interspeech.2020-1164
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Neural network language models (LMs) based on self-attention have recently outperformed the previous state of the art, LSTM LMs. Transformer LMs today are often used as a post-processing step in lattice or n-best list rescoring. In this work the main focus is on using them in one-pass recognition. We show that by a simple reduction of redundant computations in batched self-attention we can obtain a 15% reduction in overall RTF on a well-tuned system. We also show that through proper initialization the layer normalization inside the residual blocks can be removed, yielding a further increase in forwarding speed. This is done under the constraint of staying close to state-of-the-art in terms of word-error rate (5.4% on LibriSpeech test-other) and achieving a real-time factor of around 1. Last but not least we also present an approach to speed up classic push-forward rescoring by mixing it with n-best list rescoring to better utilize the inherent parallelizability of Transformer language models, cutting the time needed for rescoring in half.
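The abstract refers to n-best list rescoring, in which a language model re-ranks a short list of first-pass hypotheses. As a minimal sketch of that general idea only (not the authors' mixed push-forward method), the Python below scores every hypothesis in one batched call and re-ranks by a combined score. The names `rescore_nbest`, `toy_lm_logprob_batch`, and `lm_scale` are hypothetical, and the toy scorer merely stands in for a real Transformer LM.

```python
from typing import Callable, List, Sequence, Tuple

# A hypothesis is (word sequence, first-pass log score). Illustrative type only.
Hypothesis = Tuple[List[str], float]


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    lm_logprob_batch: Callable[[List[List[str]]], List[float]],
    lm_scale: float = 1.0,
) -> List[Hypothesis]:
    """Re-rank an n-best list, best hypothesis first.

    All hypotheses are passed to the LM in a single call, which is where a
    Transformer LM could score them in parallel as one batch.
    """
    sentences = [words for words, _ in nbest]
    lm_scores = lm_logprob_batch(sentences)
    combined = [
        (words, first_pass + lm_scale * lm_score)
        for (words, first_pass), lm_score in zip(nbest, lm_scores)
    ]
    return sorted(combined, key=lambda hyp: hyp[1], reverse=True)


def toy_lm_logprob_batch(sentences: List[List[str]]) -> List[float]:
    """Hypothetical stand-in for a real LM: a fixed penalty of -1.0 per word."""
    return [-1.0 * len(words) for words in sentences]


if __name__ == "__main__":
    nbest = [(["a", "b", "c"], -1.0), (["a", "b"], -1.5)]
    for words, score in rescore_nbest(nbest, toy_lm_logprob_batch, lm_scale=1.0):
        print(" ".join(words), score)
```

With `lm_scale=0.0` the first-pass ranking is kept unchanged; a larger scale gives the language model more influence over the final ranking.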
Pages: 1798-1802
Page count: 5