Integrating Dialog History into End-to-End Spoken Language Understanding Systems

Cited by: 3

Authors:

Ganhotra, Jatin [1]

Thomas, Samuel [1]

Kuo, Hong-Kwang J. [1]

Joshi, Sachindra [1]

Saon, George [1]

Tuske, Zoltan [1]

Kingsbury, Brian [1]

Affiliation:

[1] IBM Research AI, Yorktown Heights, NY 10598, USA

Source:

INTERSPEECH 2021 | 2021

Keywords:

speech recognition; human-computer interaction; spoken language understanding; end-to-end systems

DOI:

10.21437/Interspeech.2021-1460

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification code:

100104 ; 100213 ;

Abstract:

End-to-end spoken language understanding (SLU) systems that process human-human or human-computer interactions are often context independent and process each turn of a conversation independently. Spoken conversations, on the other hand, are very much context dependent, and dialog history contains useful information that can improve the processing of each conversational turn. In this paper, we investigate the importance of dialog history and how it can be effectively integrated into end-to-end SLU systems. While processing a spoken utterance, our proposed RNN transducer (RNN-T) based SLU model has access to its dialog history in the form of decoded transcripts and SLU labels of previous turns. We encode the dialog history as BERT embeddings, and use them as an additional input to the SLU model along with the speech features for the current utterance. We evaluate our approach on a recently released spoken dialog data set, the HARPERVALLEYBANK corpus. We observe significant improvements: 8% for dialog action and 30% for caller intent recognition tasks, in comparison to a competitive context independent end-to-end baseline system.
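
The data flow described in the abstract (previous turns' transcripts and SLU labels encoded as a vector, then supplied alongside the current utterance's speech features) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how the embedding is combined with the speech features, so per-frame concatenation is assumed; `toy_embed` is a hypothetical stand-in and is not BERT; the function names, the `[SEP]` joining format, and the example labels are invented.

```python
from typing import List, Sequence, Tuple

# One earlier turn: (decoded transcript, SLU label). Hypothetical layout.
Turn = Tuple[str, str]


def history_to_text(turns: Sequence[Turn], max_turns: int = 3) -> str:
    """Flatten the most recent turns into one string for a text encoder."""
    if max_turns <= 0:
        return ""
    recent = turns[-max_turns:]
    return " [SEP] ".join(f"{label} : {text}" for text, label in recent)


def toy_embed(text: str, dim: int = 4) -> List[float]:
    """Stand-in for a sentence embedding (NOT BERT): folds character
    codes into `dim` buckets so the sketch runs without a model."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def append_history(frames: Sequence[Sequence[float]],
                   history_vec: Sequence[float]) -> List[List[float]]:
    """Concatenate the same history vector onto every speech frame
    (assumed combination scheme, not confirmed by the abstract)."""
    return [list(frame) + list(history_vec) for frame in frames]


# Made-up dialog history and speech features for the current utterance.
history = [
    ("i want to check my balance", "check_balance"),
    ("sure what is your name", "data_request"),
]
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

context = toy_embed(history_to_text(history))
augmented = append_history(frames, context)
```

In a real system the stand-in encoder would be replaced by an actual BERT model, and the augmented frames would be passed to the SLU model in place of the plain speech features.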

Pages: 1254-1258

Page count: 5

50 records in total

[1] TOWARDS END-TO-END INTEGRATION OF DIALOG HISTORY FOR IMPROVED SPOKEN LANGUAGE UNDERSTANDING
Sunder, Vishal
Thomas, Samuel
Kuo, Hong-Kwang J.
Ganhotra, Jatin
Kingsbury, Brian
Fosler-Lussier, Eric
[J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7497 - 7501
[2] TOWARDS END-TO-END SPOKEN LANGUAGE UNDERSTANDING
Serdyuk, Dmitriy
Wang, Yongqiang
Fuegen, Christian
Kumar, Anuj
Liu, Baiyang
Bengio, Yoshua
[J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5754 - 5758
[3] DIALOGUE HISTORY INTEGRATION INTO END-TO-END SIGNAL-TO-CONCEPT SPOKEN LANGUAGE UNDERSTANDING SYSTEMS
Tomashenko, Natalia
Raymond, Christian
Caubriere, Antoine
De Mori, Renato
Esteve, Yannick
[J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8509 - 8513
[4] Semantic Complexity in End-to-End Spoken Language Understanding
McKenna, Joseph P.
Choudhary, Samridhi
Saxon, Michael
Strimel, Grant P.
Mouchtaris, Athanasios
[J]. INTERSPEECH 2020, 2020, : 4273 - 4277
[5] A Streaming End-to-End Framework For Spoken Language Understanding
Potdar, Nihal
Avila, Anderson R.
Xing, Chao
Wang, Dong
Cao, Yiran
Chen, Xiao
[J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 3906 - 3914
[6] WhiSLU: End-to-End Spoken Language Understanding with Whisper
Wang, Minghan
Li, Yinglu
Guo, Jiaxin
Qiao, Xiaosong
Li, Zongyao
Shang, Hengchao
Wei, Daimeng
Tao, Shimin
Zhang, Min
Yang, Hao
[J]. INTERSPEECH 2023, 2023, : 770 - 774
[7] End-to-End Spoken Language Understanding for Generalized Voice Assistants
Saxon, Michael
Choudhary, Samridhi
McKenna, Joseph P.
Mouchtaris, Athanasios
[J]. INTERSPEECH 2021, 2021, : 4738 - 4742
[8] End-to-End Spoken Language Understanding Without Full Transcripts
Kuo, Hong-Kwang J.
Tuske, Zoltan
Thomas, Samuel
Huang, Yinghui
Audhkhasi, Kartik
Kingsbury, Brian
Kurata, Gakuto
Kons, Zvi
Hoory, Ron
Lastras, Luis
[J]. INTERSPEECH 2020, 2020, : 906 - 910
[9] Privacy-Preserving End-to-End Spoken Language Understanding
Wang, Yinggui
Huang, Wei
Yang, Le
[J]. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 5224 - 5232
[10] ERROR ANALYSIS APPLIED TO END-TO-END SPOKEN LANGUAGE UNDERSTANDING
Caubriere, Antoine
Ghannay, Sahar
Tomashenko, Natalia
De Mori, Renato
Laurent, Antoine
Morin, Emmanuel
Esteve, Yannick
[J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8514 - 8518