END-TO-END SPOKEN LANGUAGE UNDERSTANDING USING TRANSFORMER NETWORKS AND SELF-SUPERVISED PRE-TRAINED FEATURES

Cited by: 1
Authors
Morais, Edmilson [1]
Kuo, Hong-Kwang J. [1]
Thomas, Samuel [1]
Tuske, Zoltan [1]
Kingsbury, Brian [1]
Affiliations
[1] IBM Res AI, Yorktown Hts, NY 10598 USA
Keywords
Spoken language understanding; transformer networks; self-supervised pre-training; end-to-end systems
DOI
10.1109/ICASSP39728.2021.9414522
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in natural language processing (NLP); however, their merits in spoken language understanding (SLU) still require further investigation. In this paper, we introduce a modular end-to-end (E2E) SLU architecture based on transformer networks that allows the use of self-supervised pre-trained acoustic features, pre-trained model initialization, and multi-task training. We perform several SLU experiments on the ATIS dataset, predicting intent and entity labels/values, to investigate how pre-trained model initialization and multi-task training interact with either traditional filterbank features or self-supervised pre-trained acoustic features. Results show not only that self-supervised pre-trained acoustic features outperform filterbank features in almost all experiments, but also that, when combined with multi-task training, these features nearly eliminate the need for pre-trained model initialization.
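To make the setup described in the abstract concrete, below is a minimal sketch of this kind of modular E2E SLU model: a transformer encoder over acoustic features with separate heads for utterance-level intent classification and frame-level entity tagging, trained with a joint multi-task loss. It is written in PyTorch; all module names, dimensions, label counts, and the loss weighting are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of a modular E2E SLU model with multi-task heads.
# Sizes, head design, and the 0.5 loss weight are illustrative assumptions.
import torch
import torch.nn as nn

class E2ESLUModel(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, n_heads=4, n_layers=6,
                 n_intents=26, n_entity_tags=120):
        super().__init__()
        # Project input acoustic features into the model dimension.
        # 80-dim filterbank here; a self-supervised front end (e.g. a
        # wav2vec-style encoder) would simply change feat_dim.
        self.input_proj = nn.Linear(feat_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Multi-task heads: one intent per utterance, one tag per frame.
        self.intent_head = nn.Linear(d_model, n_intents)
        self.entity_head = nn.Linear(d_model, n_entity_tags)

    def forward(self, feats):
        # feats: (batch, time, feat_dim)
        h = self.encoder(self.input_proj(feats))
        intent_logits = self.intent_head(h.mean(dim=1))  # pooled over time
        entity_logits = self.entity_head(h)              # per-frame logits
        return intent_logits, entity_logits

# Joint multi-task training step on a dummy batch.
model = E2ESLUModel()
feats = torch.randn(2, 300, 80)
intent_logits, entity_logits = model(feats)
intent_loss = nn.functional.cross_entropy(
    intent_logits, torch.randint(0, 26, (2,)))
entity_loss = nn.functional.cross_entropy(
    entity_logits.transpose(1, 2),            # (batch, classes, time)
    torch.randint(0, 120, (2, 300)))
loss = intent_loss + 0.5 * entity_loss        # weighted multi-task objective
loss.backward()
```

Swapping filterbank input for self-supervised pre-trained features leaves the rest of the model unchanged; only the front end and `feat_dim` differ, which reflects the modularity the abstract emphasizes.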
Pages: 7483-7487 (5 pages)