Hierarchical Fusion for Online Multimodal Dialog Act Classification

Cited by: 0

Authors:

Miah, Md Messal Monem [1]

Pyarelal, Adarsh [2]

Huang, Ruihong [1]

Affiliations:

[1] Texas A&M Univ, College Stn, TX 77843 USA

[2] Univ Arizona, Tucson, AZ 85721 USA

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023 | 2023

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a framework for online multimodal dialog act (DA) classification based on raw audio and ASR-generated transcriptions of current and past utterances. Existing multimodal DA classification approaches are limited by ineffective audio modeling and late-stage fusion. We showcase significant improvements in multimodal DA classification by integrating modalities at a more granular level and incorporating recent advancements in large language and audio models for audio feature extraction. We further investigate the effectiveness of self-attention and cross-attention mechanisms in modeling utterances and dialogs for DA classification. We achieve a substantial increase of 3 percentage points in the F1 score relative to current state-of-the-art models on two prominent DA classification datasets, MRDA and EMOTyDA.
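
As an illustration of what fusing modalities below the utterance level can look like, here is a minimal sketch of cross-attention in which text token features attend over audio frame features. It is not the authors' architecture: the feature shapes, the random inputs, the absence of learned projections, and the names `cross_attention`, `text_tokens`, and `audio_frames` are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Scaled dot-product attention: each query row attends over keys_values rows.

    Hypothetical helper; a real model would add learned Q/K/V projections.
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (n_queries, n_kv)
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ keys_values, weights

rng = np.random.default_rng(0)
text_tokens = rng.normal(size=(6, 16))    # assumed: 6 token embeddings, dim 16
audio_frames = rng.normal(size=(20, 16))  # assumed: 20 audio frame features, dim 16

# Token-level fusion: every text token gets its own audio context vector.
audio_context, weights = cross_attention(text_tokens, audio_frames)
fused_tokens = np.concatenate([text_tokens, audio_context], axis=-1)  # (6, 32)

# Pool the fused tokens into a single utterance representation.
utterance_vector = fused_tokens.mean(axis=0)                          # (32,)
```

A late-fusion baseline would instead pool each modality separately and combine the two pooled vectors only once, so no individual token could attend to specific audio frames.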


Pages: 7532-7545

Page count: 14
