Semantic Complexity in End-to-End Spoken Language Understanding

Cited by: 7
Authors
McKenna, Joseph P. [1]
Choudhary, Samridhi [1]
Saxon, Michael [1]
Strimel, Grant P. [1]
Mouchtaris, Athanasios [1]
Affiliation
[1] Amazon, Alexa Machine Learning, Seattle, WA 98109 USA
Source
INTERSPEECH 2020
Keywords
spoken language understanding; semantic complexity; speech-to-interpretation; NETWORKS;
DOI
10.21437/Interspeech.2020-2929
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
End-to-end spoken language understanding (SLU) models are a class of model architectures that predict semantics directly from speech. Because of their input and output types, we refer to them as speech-to-interpretation (STI) models. Previous works have successfully applied STI models to targeted use cases, such as recognizing home automation commands; however, no study has yet addressed how these models generalize to broader use cases. In this work, we analyze the relationship between the performance of STI models and the difficulty of the use case to which they are applied. We introduce empirical measures of dataset semantic complexity to quantify the difficulty of the SLU tasks. We show that near-perfect performance metrics for STI models reported in the literature were obtained with datasets that have low semantic complexity values. We perform experiments where we vary the semantic complexity of a large, proprietary dataset and show that STI model performance correlates with our semantic complexity measures, such that performance increases as complexity values decrease. Our results show that it is important to contextualize an STI model's performance with the complexity values of its training dataset to reveal the scope of its applicability.
Pages: 4273-4277
Page count: 5