Multi-modal embeddings using multi-task learning for emotion recognition

Cited by: 8
Authors
Khare, Aparna [1]
Parthasarathy, Srinivas [1]
Sundaram, Shiva [1]
Affiliations
[1] Amazon.com, Sunnyvale, CA 94089, USA
Source
INTERSPEECH 2020
Keywords
general embeddings; multi-modal; emotion recognition;
DOI
10.21437/Interspeech.2020-1827
CLC classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification codes
100104; 100213;
Abstract
General embeddings such as word2vec, GloVe and ELMo have shown considerable success in natural language tasks. These embeddings are typically extracted from models trained on general tasks such as skip-gram prediction and natural language generation. In this paper, we extend this work from natural language understanding to multi-modal architectures that use audio, visual and textual information for machine learning tasks. The embeddings in our network are extracted using the encoder of a transformer model trained with multi-task learning. We use person identification and automatic speech recognition as the tasks in our embedding generation framework. We tune and evaluate the embeddings on the downstream task of emotion recognition and demonstrate that, on the CMU-MOSEI dataset, the embeddings can be used to improve over previous state-of-the-art results.
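The abstract sketches the training recipe only at a high level: a transformer encoder over fused audio, visual and textual features is trained jointly on person identification and automatic speech recognition, and its outputs are then reused as general embeddings for emotion recognition. The PyTorch sketch below illustrates such a multi-task setup under stated assumptions; the feature dimensions, the mean-pooled utterance embedding, the CTC-based ASR head, and the loss weighting are illustrative choices, not details taken from the paper.

```python
# Minimal sketch of a multi-task embedding model: a shared transformer
# encoder with a speaker-identification head and an ASR head. All sizes,
# the mean-pooling, and the CTC formulation are assumptions for
# illustration, not the paper's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskEmbedder(nn.Module):
    def __init__(self, feat_dim=512, d_model=256, n_speakers=1000, vocab=5000):
        super().__init__()
        # Project fused audio/visual/text features to the model dimension.
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.spk_head = nn.Linear(d_model, n_speakers)  # utterance-level task
        self.asr_head = nn.Linear(d_model, vocab)       # frame-level task

    def forward(self, x):
        # x: (batch, time, feat_dim) fused multi-modal features.
        h = self.encoder(self.proj(x))        # (batch, time, d_model)
        emb = h.mean(dim=1)                   # pooled "general embedding"
        return emb, self.spk_head(emb), self.asr_head(h)


# One joint training step on dummy data.
model = MultiTaskEmbedder()
x = torch.randn(2, 100, 512)                 # 2 utterances, 100 frames
speakers = torch.randint(0, 1000, (2,))      # speaker labels
tokens = torch.randint(1, 5000, (2, 20))     # transcript token ids (0 = blank)

emb, spk_logits, asr_logits = model(x)
spk_loss = F.cross_entropy(spk_logits, speakers)
# CTC expects (time, batch, vocab) log-probabilities.
ctc_loss = F.ctc_loss(asr_logits.log_softmax(-1).transpose(0, 1), tokens,
                      input_lengths=torch.full((2,), 100),
                      target_lengths=torch.full((2,), 20))
loss = spk_loss + 0.5 * ctc_loss             # illustrative task weighting
loss.backward()
```

After joint training, the pooled encoder output (emb above) is the embedding that would be frozen or fine-tuned for the downstream emotion-recognition task.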
Pages: 384-388
Page count: 5
Related papers
50 records in total (first 10 shown)
  • [1] Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis
    Akhtar, Md Shad
    Chauhan, Dushyant Singh
    Ghosal, Deepanway
    Poria, Soujanya
    Ekbal, Asif
    Bhattacharyya, Pushpak
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 370 - 379
  • [2] Multi-Task and Multi-Modal Learning for RGB Dynamic Gesture Recognition
    Fan, Dinghao
    Lu, Hengjie
    Xu, Shugong
    Cao, Shan
    IEEE SENSORS JOURNAL, 2021, 21 (23) : 27026 - 27036
  • [3] A Multi-modal Sentiment Recognition Method Based on Multi-task Learning
    Lin, Zijie
    Long, Yunfei
    Du, Jiachen
    Xu, Ruifeng
    Beijing Daxue Xuebao (Ziran Kexue Ban)/Acta Scientiarum Naturalium Universitatis Pekinensis, 2021, 57 (01): 7 - 15
  • [4] Multi-Modal Multi-Task Deep Learning for Speaker and Emotion Recognition of TV-Series Data
    Novitasari, Sashi
    Do, Quoc Truong
    Sakti, Sakriani
    Lestari, Dessi
    Nakamura, Satoshi
    2018 ORIENTAL COCOSDA - INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2018, : 37 - 42
  • [5] Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings
    Yi, Jiangyan
    Tao, Jianhua
    Fu, Ruibo
    Wang, Tao
    Zhang, Chu Yuan
    Wang, Chenglong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 2963 - 2973
  • [6] Driver multi-task emotion recognition network based on multi-modal facial video analysis
    Xiang, Guoliang
    Yao, Song
    Wu, Xianhui
    Deng, Hanwen
    Wang, Guojie
    Liu, Yu
    Li, Fan
    Peng, Yong
    PATTERN RECOGNITION, 2025, 161
  • [7] Multi-modal Sentiment and Emotion Joint Analysis with a Deep Attentive Multi-task Learning Model
    Zhang, Yazhou
    Rong, Lu
    Li, Xiang
    Chen, Rui
    ADVANCES IN INFORMATION RETRIEVAL, PT I, 2022, 13185 : 518 - 532
  • [8] MultiNet: Multi-Modal Multi-Task Learning for Autonomous Driving
    Chowdhuri, Sauhaarda
    Pankaj, Tushar
    Zipser, Karl
    2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 1496 - 1504
  • [9] Twitter Demographic Classification Using Deep Multi-modal Multi-task Learning
    Vijayaraghavan, Prashanth
    Vosoughi, Soroush
    Roy, Deb
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 2, 2017, : 478 - 483
  • [10] Multi-modal microblog classification via multi-task learning
    Zhao, Sicheng
    Yao, Hongxun
    Zhao, Sendong
    Jiang, Xuesong
    Jiang, Xiaolei
    Multimedia Tools and Applications, 2016, 75 : 8921 - 8938