Hierarchical Fusion for Online Multimodal Dialog Act Classification

Cited by: 0

Authors:

Miah, Md Messal Monem [1]

Pyarelal, Adarsh [2]

Huang, Ruihong [1]

Affiliations:

[1] Texas A&M Univ, College Stn, TX 77843 USA

[2] Univ Arizona, Tucson, AZ 85721 USA

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023 | 2023

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a framework for online multimodal dialog act (DA) classification based on raw audio and ASR-generated transcriptions of current and past utterances. Existing multimodal DA classification approaches are limited by ineffective audio modeling and late-stage fusion. We showcase significant improvements in multimodal DA classification by integrating modalities at a more granular level and incorporating recent advancements in large language and audio models for audio feature extraction. We further investigate the effectiveness of self-attention and cross-attention mechanisms in modeling utterances and dialogs for DA classification. We achieve a substantial increase of 3 percentage points in the F1 score relative to current state-of-the-art models on two prominent DA classification datasets, MRDA and EMOTyDA.


Pages: 7532-7545

Page count: 14
