ACOUSTIC-TO-WORD RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS

Cited by: 0
Authors
Palaskar, Shruti [1 ]
Metze, Florian [1 ]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Keywords
end-to-end speech recognition; encoder-decoder; acoustic-to-word; speech embeddings
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Acoustic-to-Word recognition provides a straightforward solution to end-to-end speech recognition without needing external decoding, language model re-scoring, or a lexicon. While character-based models offer a natural solution to the out-of-vocabulary problem, word models can be simpler to decode and may also be able to directly recognize semantically meaningful units. We present effective methods to train Sequence-to-Sequence models for direct word-level recognition (and character-level recognition) and show an absolute improvement of 4.4-5.0% in Word Error Rate on the Switchboard corpus compared to prior work. In addition to these promising results, word-based models are more interpretable than character models, which have to be composed into words using a separate decoding step. We analyze the encoder hidden states and the attention behavior, and show that location-aware attention naturally represents words as a single speech-word-vector, despite spanning multiple frames in the input. Finally, we show that the Acoustic-to-Word model also learns to segment speech into words, with a mean standard deviation of 3 frames compared with human-annotated forced alignments for the Switchboard corpus.
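The abstract describes an attention-based encoder-decoder that maps acoustic frames directly to word targets using location-aware attention. Below is a minimal, illustrative PyTorch sketch of such an acoustic-to-word sequence-to-sequence model; the layer sizes, vocabulary size, feature dimensions, and class names (LocationAwareAttention, Seq2SeqA2W) are assumptions chosen for illustration, not the authors' exact configuration.

```python
# Minimal sketch (assumed hyper-parameters, not the paper's exact architecture) of an
# acoustic-to-word seq2seq model: BiLSTM encoder over filterbank frames, LSTM decoder
# with location-aware attention, and a softmax output over a word vocabulary.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocationAwareAttention(nn.Module):
    """Additive attention that also conditions on the previous attention
    weights through a 1-D convolution (location awareness)."""

    def __init__(self, enc_dim, dec_dim, attn_dim, conv_channels=10, kernel_size=21):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.W_loc = nn.Linear(conv_channels, attn_dim, bias=False)
        self.loc_conv = nn.Conv1d(1, conv_channels, kernel_size, padding=kernel_size // 2)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_out, dec_state, prev_attn):
        # enc_out: (B, T, enc_dim), dec_state: (B, dec_dim), prev_attn: (B, T)
        loc_feat = self.loc_conv(prev_attn.unsqueeze(1)).transpose(1, 2)   # (B, T, C)
        scores = self.v(torch.tanh(
            self.W_enc(enc_out) + self.W_dec(dec_state).unsqueeze(1) + self.W_loc(loc_feat)
        )).squeeze(-1)                                                     # (B, T)
        attn = F.softmax(scores, dim=-1)
        context = torch.bmm(attn.unsqueeze(1), enc_out).squeeze(1)         # (B, enc_dim)
        return context, attn


class Seq2SeqA2W(nn.Module):
    def __init__(self, n_mels=40, enc_dim=256, dec_dim=256, vocab_size=10000):
        super().__init__()
        self.encoder = nn.LSTM(n_mels, enc_dim // 2, num_layers=3,
                               bidirectional=True, batch_first=True)
        self.embed = nn.Embedding(vocab_size, dec_dim)
        self.decoder = nn.LSTMCell(dec_dim + enc_dim, dec_dim)
        self.attention = LocationAwareAttention(enc_dim, dec_dim, attn_dim=128)
        self.output = nn.Linear(dec_dim + enc_dim, vocab_size)

    def forward(self, feats, targets):
        # feats: (B, T, n_mels) acoustic frames; targets: (B, U) word ids (teacher forcing).
        enc_out, _ = self.encoder(feats)
        B, T, _ = enc_out.shape
        h = enc_out.new_zeros(B, self.decoder.hidden_size)
        c = enc_out.new_zeros(B, self.decoder.hidden_size)
        attn = enc_out.new_zeros(B, T)
        context = enc_out.new_zeros(B, enc_out.size(-1))
        logits = []
        for u in range(targets.size(1)):
            emb = self.embed(targets[:, u])
            h, c = self.decoder(torch.cat([emb, context], dim=-1), (h, c))
            context, attn = self.attention(enc_out, h, attn)
            logits.append(self.output(torch.cat([h, context], dim=-1)))
        return torch.stack(logits, dim=1)  # (B, U, vocab_size)


if __name__ == "__main__":
    model = Seq2SeqA2W()
    feats = torch.randn(2, 300, 40)            # two utterances, 300 frames of 40-dim fbank
    words = torch.randint(0, 10000, (2, 12))   # word-id targets
    print(model(feats, words).shape)           # torch.Size([2, 12, 10000])
```

In a sketch like this, the per-step attention weights over encoder frames are what the paper analyzes: for word targets they tend to concentrate on a short frame span per output word, which is the basis of the reported word segmentation behavior.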
Pages: 397-404
Page count: 8
Related Papers
50 items total (entries [41]-[50] shown)
  • [41] Zhang, Jing-Xuan; Ling, Zhen-Hua; Dai, Li-Rong. Forward Attention in Sequence-to-Sequence Acoustic Modeling for Speech Synthesis. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018: 4789-4793.
  • [42] Panchbhai, Anand; Soru, Tommaso; Marx, Edgard. Exploring Sequence-to-Sequence Models for SPARQL Pattern Composition. Knowledge Graphs and Semantic Web, KGSWC 2020, 2020, 1232: 158-165.
  • [43] Shi, Tian; Keneshloo, Yaser; Ramakrishnan, Naren; Reddy, Chandan K. Neural Abstractive Text Summarization with Sequence-to-Sequence Models. ACM/IMS Transactions on Data Science, 2021, 2(1).
  • [44] Parry, Andrew; Froebe, Maik; MacAvaney, Sean; Potthast, Martin; Hagen, Matthias. Analyzing Adversarial Attacks on Sequence-to-Sequence Relevance Models. Advances in Information Retrieval, ECIR 2024, Pt. II, 2024, 14609: 286-302.
  • [45] Huangfu, Yourui; Wang, Jian; Li, Rong; Xu, Chen; Wang, Xianbin; Zhang, Huazi; Wang, Jun. Predicting the Mumble of Wireless Channel with Sequence-to-Sequence Models. 2019 IEEE 30th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), 2019: 1043-1049.
  • [46] Li, Qiujia; Zhang, Chao; Woodland, Philip C. Integrating Source-Channel and Attention-Based Sequence-to-Sequence Models for Speech Recognition. 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019), 2019: 39-46.
  • [47] Thai-Son Nguyen; Ngoc-Quan Pham; Stueker, Sebastian; Waibel, Alex. High Performance Sequence-to-Sequence Model for Streaming Speech Recognition. INTERSPEECH 2020, 2020: 2147-2151.
  • [48] Doostmohammadi, Ehsan; Bokaei, Mohammad Hadi; Sameti, Hossein. Persian Keyphrase Generation Using Sequence-to-Sequence Models. 2019 27th Iranian Conference on Electrical Engineering (ICEE 2019), 2019: 2010-2015.
  • [49] El-Kishky, Ahmed; Fu, Xingyu; Addawood, Aseel; Sobh, Nahil; Voss, Clare; Han, Jiawei. Constrained Sequence-to-Sequence Semitic Root Extraction for Enriching Word Embeddings. Fourth Arabic Natural Language Processing Workshop (WANLP 2019), 2019: 88-96.
  • [50] Baro, Arnau; Badal, Carles; Fornes, Alicia. Handwritten Historical Music Recognition by Sequence-to-Sequence with Attention Mechanism. 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR 2020), 2020: 205-210.