Low resource end-to-end spoken language understanding with capsule networks

Cited by: 10
Authors
Poncelet, Jakob [1]
Renkens, Vincent [1]
Van hamme, Hugo [1]
Affiliation
[1] Katholieke Univ Leuven, Dept Elect Engn ESAT PSI, Kasteelpk Arenberg 10, Bus 2441, B-3001 Leuven, Belgium
Source
Computer Speech and Language
Keywords
Spoken language understanding; End-to-end; Intent recognition; Capsule networks; Multitask learning
DOI
10.1016/j.csl.2020.101142
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Designing a Spoken Language Understanding (SLU) system for command-and-control applications is challenging. Both Automatic Speech Recognition and Natural Language Understanding are language and application dependent to a great extent. Even with a lot of design effort, users often still have to know what to say to the system for it to do what they want. We propose to use an end-to-end SLU system that maps speech directly to semantics and that can be trained by the user through demonstrations. The user can teach the system a new command by uttering the command and subsequently demonstrating its meaning through an alternative interface. The system will learn the mapping from the spoken command to the task. The dependency on the user also allows different languages and non-standard or impaired speech as valid inputs. Teaching the system requires effort from the user, so it is crucial that the system learns quickly. In this paper we propose to use capsule networks for this task, which are believed to be data efficient. We discuss two architectures for using capsule networks. We analyse their performance and compare them with two baseline systems: one based on Non-negative Matrix Factorisation (NMF), which has been successful for this task, and one based on an encoder-decoder approach. We show that in most cases the capsule network performs better than the baseline systems. Furthermore, we demonstrate the versatility of the architecture by inferring speaker identity and the user's word choice through multitask learning. (C) 2020 Elsevier Ltd. All rights reserved.
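The abstract only sketches the approach, so the following is a minimal, hypothetical PyTorch illustration of the general idea it describes: a recurrent encoder over acoustic features feeds a capsule output layer with dynamic routing, the length of each output capsule scores one semantic label, and an auxiliary speaker head illustrates the multitask extension. All names (CapsuleSLU, speaker_head, squash), layer sizes, the pooling, and the routing scheme are assumptions for illustration and are not the authors' actual architectures.

```python
# Minimal sketch (assumed architecture, not the paper's exact model):
# BiGRU encoder over acoustic frames -> capsule layer with dynamic routing
# -> one output capsule per semantic label, plus an auxiliary speaker head.
import torch
import torch.nn as nn
import torch.nn.functional as F


def squash(s, dim=-1, eps=1e-8):
    """Capsule non-linearity: keeps the vector's direction, maps its norm into [0, 1)."""
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)


class CapsuleSLU(nn.Module):
    """Encode speech features, route to one capsule per semantic label,
    and (multitask) predict the speaker from the same utterance encoding."""

    def __init__(self, feat_dim=40, hidden=128, n_primary=32, prim_dim=8,
                 n_labels=10, label_dim=16, n_speakers=5, routing_iters=3):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.primary = nn.Linear(2 * hidden, n_primary * prim_dim)
        self.speaker_head = nn.Linear(2 * hidden, n_speakers)   # auxiliary multitask head
        self.n_primary, self.prim_dim = n_primary, prim_dim
        self.routing_iters = routing_iters
        # One transformation matrix per (primary capsule, label capsule) pair.
        self.W = nn.Parameter(0.01 * torch.randn(n_primary, n_labels, prim_dim, label_dim))

    def forward(self, x):
        # x: (batch, time, feat_dim), e.g. log mel filterbank frames.
        h, _ = self.encoder(x)
        pooled = h.mean(dim=1)                                   # simple temporal pooling
        u = squash(self.primary(pooled).view(-1, self.n_primary, self.prim_dim))
        # Each primary capsule predicts every label capsule.
        u_hat = torch.einsum('bip,ijpq->bijq', u, self.W)        # (batch, prim, label, label_dim)
        b = torch.zeros(u.size(0), self.n_primary, u_hat.size(2), device=x.device)
        for _ in range(self.routing_iters):                      # dynamic routing by agreement
            c = F.softmax(b, dim=-1)                             # routing weights per primary capsule
            v = squash((c.unsqueeze(-1) * u_hat).sum(dim=1))     # (batch, label, label_dim)
            b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)
        label_probs = v.norm(dim=-1)                             # capsule length = label presence
        speaker_logits = self.speaker_head(pooled)
        return label_probs, speaker_logits


# Toy usage: 2 utterances of 100 frames with 40-dimensional features.
label_probs, speaker_logits = CapsuleSLU()(torch.randn(2, 100, 40))
print(label_probs.shape, speaker_logits.shape)  # torch.Size([2, 10]) torch.Size([2, 5])
```

Reading the length of each output capsule as a label score follows the standard capsule-network recipe; the paper itself compares two capsule architectures against NMF and encoder-decoder baselines, none of which are reproduced here.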
Pages: 21
Related papers (50 records in total)
• [1] Bhosale, Swapnil; Sheikh, Imran; Dumpala, Sri Harsha; Kopparapu, Sunil Kumar. End-to-End Spoken Language Understanding: Bootstrapping in Low Resource Scenarios. INTERSPEECH 2019, 2019: 1188-1192.
• [2] Serdyuk, Dmitriy; Wang, Yongqiang; Fuegen, Christian; Kumar, Anuj; Liu, Baiyang; Bengio, Yoshua. Towards End-to-End Spoken Language Understanding. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018: 5754-5758.
• [3] Renkens, Vincent; Van Hamme, Hugo. Capsule Networks for Low Resource Spoken Language Understanding. 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Vols 1-6: Speech Research for Emerging Markets in Multilingual Societies, 2018: 601-605.
• [4] Dinarelli, Marco; Naguib, Marco; Portet, Francois. Toward Low-Cost End-to-End Spoken Language Understanding. INTERSPEECH 2022, 2022: 2728-2732.
• [5] Desot, Thierry; Portet, Francois; Vacher, Michel. End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting. Computer Speech and Language, 2022, 75.
• [6] McKenna, Joseph P.; Choudhary, Samridhi; Saxon, Michael; Strimel, Grant P.; Mouchtaris, Athanasios. Semantic Complexity in End-to-End Spoken Language Understanding. INTERSPEECH 2020, 2020: 4273-4277.
• [7] Potdar, Nihal; Avila, Anderson R.; Xing, Chao; Wang, Dong; Cao, Yiran; Chen, Xiao. A Streaming End-to-End Framework for Spoken Language Understanding. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI 2021), 2021: 3906-3914.
• [8] Wang, Minghan; Li, Yinglu; Guo, Jiaxin; Qiao, Xiaosong; Li, Zongyao; Shang, Hengchao; Wei, Daimeng; Tao, Shimin; Zhang, Min; Yang, Hao. WhiSLU: End-to-End Spoken Language Understanding with Whisper. INTERSPEECH 2023, 2023: 770-774.
• [9] Arora, Siddhant; Dalmia, Siddharth; Chang, Xuankai; Yan, Brian; Black, Alan; Watanabe, Shinji. Two-Pass Low Latency End-to-End Spoken Language Understanding. INTERSPEECH 2022, 2022: 3478-3482.
• [10] Avila, Anderson R.; Bibi, Khalil; Yang, Ruiheng; Li, Xinlin; Xing, Chao; Chen, Xiao. Low-bit Shift Network for End-to-End Spoken Language Understanding. INTERSPEECH 2022, 2022: 2698-2702.