Hierarchical Fusion for Online Multimodal Dialog Act Classification

Cited by: 0

Authors:

Miah, Md Messal Monem ^{[1]}

Pyarelal, Adarsh ^{[2]}

Huang, Ruihong ^{[1]}

Affiliations:

[1] Texas A&M Univ, College Stn, TX 77843 USA

[2] Univ Arizona, Tucson, AZ 85721 USA

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023 | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a framework for online multimodal dialog act (DA) classification based on raw audio and ASR-generated transcriptions of current and past utterances. Existing multimodal DA classification approaches are limited by ineffective audio modeling and late-stage fusion. We showcase significant improvements in multimodal DA classification by integrating modalities at a more granular level and incorporating recent advancements in large language and audio models for audio feature extraction. We further investigate the effectiveness of self-attention and cross-attention mechanisms in modeling utterances and dialogs for DA classification. We achieve a substantial increase of 3 percentage points in the F1 score relative to current state-of-the-art models on two prominent DA classification datasets, MRDA and EMOTyDA.


Pages: 7532-7545

Page count: 14
