Integrating Dialog History into End-to-End Spoken Language Understanding Systems

被引:3
|
作者
Ganhotra, Jatin [1 ]
Thomas, Samuel [1 ]
Kuo, Hong-Kwang J. [1 ]
Joshi, Sachindra [1 ]
Saon, George [1 ]
Tuske, Zoltan [1 ]
Kingsbury, Brian [1 ]
机构
[1] IBM Res AI, Yorktown Hts, NY 10598 USA
来源
关键词
speech recognition; human-computer interaction; spoken language understanding; end-to-end systems;
D O I
10.21437/Interspeech.2021-1460
中图分类号
R36 [病理学]; R76 [耳鼻咽喉科学];
学科分类号
100104 ; 100213 ;
摘要
End-to-end spoken language understanding (SLU) systems that process human-human or human-computer interactions are often context independent and process each turn of a conversation independently. Spoken conversations on the other hand, are very much context dependent, and dialog history contains useful information that can improve the processing of each conversational turn. In this paper, we investigate the importance of dialog history and how it can be effectively integrated into end-to-end SLU systems. While processing a spoken utterance, our proposed RNN transducer (RNN-T) based SLU model has access to its dialog history in the form of decoded transcripts and SLU labels of previous turns. We encode the dialog history as BERT embeddings, and use them as an additional input to the SLU model along with the speech features for the current utterance. We evaluate our approach on a recently released spoken dialog data set, the HARPERVALLEYBANK corpus. We observe significant improvements: 8% for dialog action and 30% for caller intent recognition tasks, in comparison to a competitive context independent end-to-end baseline system.
引用
收藏
页码:1254 / 1258
页数:5
相关论文
共 50 条
  • [1] TOWARDS END-TO-END INTEGRATION OF DIALOG HISTORY FOR IMPROVED SPOKEN LANGUAGE UNDERSTANDING
    Sunder, Vishal
    Thomas, Samuel
    Kuo, Hong-Kwang J.
    Ganhotra, Jatin
    Kingsbury, Brian
    Fosler-Lussier, Eric
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7497 - 7501
  • [2] TOWARDS END-TO-END SPOKEN LANGUAGE UNDERSTANDING
    Serdyuk, Dmitriy
    Wang, Yongqiang
    Fuegen, Christian
    Kumar, Anuj
    Liu, Baiyang
    Bengio, Yoshua
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5754 - 5758
  • [3] DIALOGUE HISTORY INTEGRATION INTO END-TO-END SIGNAL-TO-CONCEPT SPOKEN LANGUAGE UNDERSTANDING SYSTEMS
    Tomashenko, Natalia
    Raymond, Christian
    Caubriere, Antoine
    De Mori, Renato
    Esteve, Yannick
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8509 - 8513
  • [4] Semantic Complexity in End-to-End Spoken Language Understanding
    McKenna, Joseph P.
    Choudhary, Samridhi
    Saxon, Michael
    Strimel, Grant P.
    Mouchtaris, Athanasios
    [J]. INTERSPEECH 2020, 2020, : 4273 - 4277
  • [5] A Streaming End-to-End Framework For Spoken Language Understanding
    Potdar, Nihal
    Avila, Anderson R.
    Xing, Chao
    Wang, Dong
    Cao, Yiran
    Chen, Xiao
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 3906 - 3914
  • [6] WhiSLU: End-to-End Spoken Language Understanding with Whisper
    Wang, Minghan
    Li, Yinglu
    Guo, Jiaxin
    Qiao, Xiaosong
    Li, Zongyao
    Shang, Hengchao
    Wei, Daimeng
    Tao, Shimin
    Zhang, Min
    Yang, Hao
    [J]. INTERSPEECH 2023, 2023, : 770 - 774
  • [7] End-to-End Spoken Language Understanding for Generalized Voice Assistants
    Saxon, Michael
    Choudhary, Samridhi
    McKenna, Joseph P.
    Mouchtaris, Athanasios
    [J]. INTERSPEECH 2021, 2021, : 4738 - 4742
  • [8] End-to-End Spoken Language Understanding Without Full Transcripts
    Kuo, Hong-Kwang J.
    Tuske, Zoltan
    Thomas, Samuel
    Huang, Yinghui
    Audhkhasi, Kartik
    Kingsbury, Brian
    Kurata, Gakuto
    Kons, Zvi
    Hoory, Ron
    Lastras, Luis
    [J]. INTERSPEECH 2020, 2020, : 906 - 910
  • [9] Privacy-Preserving End-to-End Spoken Language Understanding
    Wang, Yinggui
    Huang, Wei
    Yang, Le
    [J]. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 5224 - 5232
  • [10] ERROR ANALYSIS APPLIED TO END-TO-END SPOKEN LANGUAGE UNDERSTANDING
    Caubriere, Antoine
    Ghannay, Sahar
    Tomashenko, Natalia
    De Mori, Renato
    Laurent, Antoine
    Morin, Emmanuel
    Esteve, Yannick
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8514 - 8518