END-TO-END SPOKEN LANGUAGE UNDERSTANDING USING TRANSFORMER NETWORKS AND SELF-SUPERVISED PRE-TRAINED FEATURES

Cited by: 1
Authors
Morais, Edmilson [1 ]
Kuo, Hong-Kwang J. [1 ]
Thomas, Samuel [1 ]
Tuske, Zoltan [1 ]
Kingsbury, Brian [1 ]
Affiliations
[1] IBM Res AI, Yorktown Hts, NY 10598 USA
Keywords
Spoken language understanding; transformer networks; self-supervised pre-training; end-to-end systems;
DOI
10.1109/ICASSP39728.2021.9414522
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of spoken language understanding (SLU) still need further investigation. In this paper we introduce a modular End-to-End (E2E) SLU architecture based on transformer networks which allows the use of self-supervised pre-trained acoustic features, pre-trained model initialization, and multi-task training. Several SLU experiments for predicting intent and entity labels/values using the ATIS dataset are performed. These experiments investigate the interaction of pre-trained model initialization and multi-task training with either traditional filterbank or self-supervised pre-trained acoustic features. Results show not only that self-supervised pre-trained acoustic features outperform filterbank features in almost all the experiments, but also that when these features are used in combination with multi-task training, they almost eliminate the need for pre-trained model initialization.
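A minimal, hypothetical PyTorch sketch of the kind of architecture the abstract describes: a transformer encoder over pre-extracted acoustic features (filterbank or self-supervised, e.g. wav2vec-style vectors) with jointly trained intent and entity-tag heads for multi-task learning. The class name, layer sizes, label counts, and mean-pooling choice below are illustrative assumptions, not details taken from the paper.

# Hypothetical sketch (not the authors' code): transformer encoder over
# acoustic feature sequences with two jointly trained heads, illustrating
# the multi-task E2E SLU setup (intent + entity prediction).
import torch
import torch.nn as nn


class E2ESLUTransformer(nn.Module):
    def __init__(self, feat_dim=768, d_model=256, n_heads=4, n_layers=6,
                 n_intents=26, n_entity_tags=120):   # placeholder label counts
        super().__init__()
        # Project acoustic features (filterbank or self-supervised) to d_model.
        self.input_proj = nn.Linear(feat_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Utterance-level intent head and frame-level entity-tag head.
        self.intent_head = nn.Linear(d_model, n_intents)
        self.entity_head = nn.Linear(d_model, n_entity_tags)

    def forward(self, feats, pad_mask=None):
        # feats: (batch, frames, feat_dim); pad_mask: True at padded frames.
        h = self.encoder(self.input_proj(feats), src_key_padding_mask=pad_mask)
        intent_logits = self.intent_head(h.mean(dim=1))  # pooled utterance rep.
        entity_logits = self.entity_head(h)              # per-frame tag scores
        return intent_logits, entity_logits


# Multi-task objective sketched as a plain sum of the two cross-entropies,
# using random dummy features and targets.
model = E2ESLUTransformer()
feats = torch.randn(2, 300, 768)
intent_tgt = torch.randint(0, 26, (2,))
entity_tgt = torch.randint(0, 120, (2, 300))
intent_logits, entity_logits = model(feats)
loss = nn.functional.cross_entropy(intent_logits, intent_tgt) \
     + nn.functional.cross_entropy(entity_logits.transpose(1, 2), entity_tgt)
loss.backward()

The loss weighting, pre-trained model initialization, and the feature-extraction pipeline reported in the paper are not reproduced here; this only illustrates the overall multi-task structure.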
Pages: 7483 - 7487
Number of pages: 5