Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis

Citations: 0

Authors:

Sun, Zhongkai [1]

Sarma, Prathusha K. [2]

Sethares, William A. [1]

Liang, Yingyu [1]

Affiliations:

[1] Univ Wisconsin, Madison, WI USA

[2] Curai, Palo Alto, CA 94306 USA

Source:

THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2020 / Vol. 34

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multimodal language analysis often considers relationships between features based on text and those based on acoustical and visual properties. Text features typically outperform non-text features in sentiment analysis or emotion recognition tasks in part because the text features are derived from advanced language models or word embeddings trained on massive data sources while audio and video features are human-engineered and comparatively underdeveloped. Given that the text, audio, and video are describing the same utterance in different ways, we hypothesize that the multimodal sentiment analysis and emotion recognition can be improved by learning (hidden) correlations between features extracted from the outer product of text and audio (we call this text-based audio) and analogous text-based video. This paper proposes a novel model, the Interaction Canonical Correlation Network (ICCN), to learn such multimodal embeddings. ICCN learns correlations between all three modes via deep canonical correlation analysis (DCCA) and the proposed embeddings are then tested on several benchmark datasets and against other state-of-the-art multimodal embedding algorithms. Empirical results and ablation studies confirm the effectiveness of ICCN in capturing useful information from all three views.
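
The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes NumPy, random stand-in features, and hypothetical function names (`outer_product_features`, `canonical_correlations`). It forms the per-utterance outer product of text with audio and with video features, flattens each product as a stand-in for the learned feature extractor, and measures linear canonical correlations between the two resulting views. DCCA would instead maximize such correlations over the outputs of neural networks.

```python
import numpy as np


def outer_product_features(text, other):
    """Per-utterance outer product, flattened: (n, d_t), (n, d_o) -> (n, d_t * d_o).

    Flattening is an assumption made here for illustration; the abstract only
    says that features are extracted from the outer product.
    """
    return np.einsum("ni,nj->nij", text, other).reshape(text.shape[0], -1)


def canonical_correlations(x, y, reg=1e-4):
    """Linear canonical correlations between two views (rows are samples).

    `reg` is a small ridge term (an assumption) that keeps the covariance
    matrices invertible.
    """
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    sxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    syy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    sxy = xc.T @ yc / (n - 1)

    def inv_sqrt(m):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, v = np.linalg.eigh(m)
        return v @ np.diag(w ** -0.5) @ v.T

    # Singular values of the whitened cross-covariance are the canonical correlations.
    t = inv_sqrt(sxx) @ sxy @ inv_sqrt(syy)
    return np.linalg.svd(t, compute_uv=False)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d_text, d_audio, d_video = 200, 4, 3, 5   # hypothetical sizes
    text = rng.standard_normal((n, d_text))      # stand-in text features
    audio = rng.standard_normal((n, d_audio))    # stand-in audio features
    video = rng.standard_normal((n, d_video))    # stand-in video features

    text_audio = outer_product_features(text, audio)   # "text-based audio"
    text_video = outer_product_features(text, video)   # "text-based video"

    rho = canonical_correlations(text_audio, text_video)
    print("sum of canonical correlations:", rho.sum())
```

In a deep variant, the sum of canonical correlations printed at the end would serve as the quantity to maximize with respect to the network parameters that produce the two views.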

Pages: 8992-8999

Page count: 8

50 items in total

[1] Multimodal deep learning for dementia classification using text and audio
Lin, Kaiying
Washington, Peter Y.
SCIENTIFIC REPORTS, 2024, 14 (01):
[2] MULTIVIEW LEARNING VIA DEEP DISCRIMINATIVE CANONICAL CORRELATION ANALYSIS
Elmadany, Nour El Din
He, Yifeng
Guan, Ling
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016 : 2409 - 2413
[3] Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis
Tang, Qingming
Wang, Weiran
Livescu, Karen
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017 : 1656 - 1660
[4] UNSUPERVISED LEARNING OF ACOUSTIC FEATURES VIA DEEP CANONICAL CORRELATION ANALYSIS
Wang, Weiran
Arora, Raman
Livescu, Karen
Bilmes, Jeff A.
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015 : 4590 - 4594
[5] Multimodal Learning for Human Action Recognition Via Bimodal/Multimodal Hybrid Centroid Canonical Correlation Analysis
Elmadany, Nour El Din
He, Yifeng
Guan, Ling
IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (05) : 1317 - 1331
[6] Multimodal Analysis for Deep Video Understanding with Video Language Transformer
Zhang, Beibei
Fang, Yaqun
Ren, Tongwei
Wu, Gangshan
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022 : 7165 - 7169
[7] Multimodal Biomarkers for Cancer Treatment Outcome Prediction by Use of Deep Learning and Canonical Correlation Analysis
Saad, M.
He, S.
Thorstad, W.
Gay, H.
Wu, X.
Zhao, Y.
Ruan, S.
Wang, X.
Li, H.
MEDICAL PHYSICS, 2020, 47 (06) : E356 - E356
[8] AENet: Learning Deep Audio Features for Video Analysis
Takahashi, Naoya
Gygli, Michael
Van Gool, Luc
IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (03) : 513 - 524
[9] Discriminative Learning for Alzheimer's Disease Diagnosis via Canonical Correlation Analysis and Multimodal Fusion
Lei, Baiying
Chen, Siping
Ni, Dong
Wang, Tianfu
FRONTIERS IN AGING NEUROSCIENCE, 2016, 8
[10] VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Akbari, Hassan
Yuan, Liangzhe
Qian, Rui
Chuang, Wei-Hong
Chang, Shih-Fu
Cui, Yin
Gong, Boqing
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,