END-TO-END ARCHITECTURES FOR ASR-FREE SPOKEN LANGUAGE UNDERSTANDING

Cited by: 0
Authors
Palogiannidi, Elisavet [1]
Gkinis, Ioannis [1]
Mastrapas, George [1]
Mizera, Petr [1]
Stafylakis, Themos [1]
Affiliations
[1] Omilia Conversational Intelligence, Athens, Greece
Keywords
spoken language understanding; end-to-end models; recurrent neural networks; intent classification
DOI
10.1109/icassp40776.2020.9054314
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Spoken Language Understanding (SLU) is the problem of extracting meaning from speech utterances. It is typically addressed as a two-step problem: an Automatic Speech Recognition (ASR) model converts speech into text, and a Natural Language Understanding (NLU) model then extracts meaning from the decoded text. Recently, end-to-end approaches have emerged that aim to unify ASR and NLU into a single deep neural SLU architecture, trained using combinations of ASR-level and NLU-level recognition units. In this paper, we explore a set of recurrent architectures for intent classification, tailored to the recently introduced Fluent Speech Commands (FSC) dataset, where intents are formed as combinations of three slots (action, object, and location). We show that by combining deep recurrent architectures with standard data augmentation, state-of-the-art results can be attained without using ASR-level targets or pre-trained ASR models. We also investigate the generalizability of the model to new wordings, and we show that it can perform reasonably well on wordings unseen during training.
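The abstract describes intents as combinations of three slots (action, object, and location). The following is a minimal illustrative sketch of that representation, not the authors' code. The `Intent` type, the `intent_accuracy` helper, the slot values, and the scoring convention (an intent counts as correct only when all three slots match) are assumptions introduced here for illustration.

```python
from typing import NamedTuple, Sequence


class Intent(NamedTuple):
    """Hypothetical container for one intent: a combination of three slots."""
    action: str
    obj: str
    location: str


def intent_accuracy(predicted: Sequence[Intent], reference: Sequence[Intent]) -> float:
    """Fraction of utterances whose three slots all match the reference.

    Assumed scoring convention: a single wrong slot makes the whole intent wrong.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    # NamedTuple equality compares all three slots at once.
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Illustrative slot values only; they are not taken from the paper.
reference = [
    Intent("activate", "lights", "kitchen"),
    Intent("increase", "volume", "none"),
]
predicted = [
    Intent("activate", "lights", "bedroom"),  # location slot is wrong
    Intent("increase", "volume", "none"),     # all three slots match
]

accuracy = intent_accuracy(predicted, reference)
```

Treating the intent as a tuple keeps the comparison strict: the first prediction above gets two of three slots right but still counts as an error under this assumed convention.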
Pages: 7974-7978
Number of pages: 5