Multi-modal embeddings using multi-task learning for emotion recognition

Cited by: 8
Authors
Khare, Aparna [1]
Parthasarathy, Srinivas [1]
Sundaram, Shiva [1]
Affiliations
[1] Amazon.com, Sunnyvale, CA 94089, USA
Source
INTERSPEECH 2020
Keywords
general embeddings; multi-modal; emotion recognition;
DOI
10.21437/Interspeech.2020-1827
CLC classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification codes
100104; 100213;
Abstract
General embeddings such as word2vec, GloVe and ELMo have shown considerable success in natural language tasks. These embeddings are typically extracted from models trained on general tasks such as skip-gram prediction and natural language generation. In this paper, we extend this work from natural language understanding to multi-modal architectures that use audio, visual and textual information for machine learning tasks. The embeddings in our network are extracted using the encoder of a transformer model trained with multi-task learning. We use person identification and automatic speech recognition as the tasks in our embedding generation framework. We tune and evaluate the embeddings on the downstream task of emotion recognition and demonstrate that, on the CMU-MOSEI dataset, the embeddings can be used to improve over previous state-of-the-art results.
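The abstract sketches the training recipe only at a high level: a transformer encoder over fused audio, visual and textual features is trained jointly on person identification and automatic speech recognition, and its outputs are then reused as general embeddings for emotion recognition. The PyTorch sketch below illustrates such a multi-task setup under stated assumptions; the feature dimensions, the mean-pooled utterance embedding, the CTC-based ASR head, and the loss weighting are illustrative choices, not details taken from the paper.

```python
# Minimal sketch of a multi-task embedding model: a shared transformer
# encoder with a speaker-identification head and an ASR head. All sizes,
# the mean-pooling, and the CTC formulation are assumptions for
# illustration, not the paper's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskEmbedder(nn.Module):
    def __init__(self, feat_dim=512, d_model=256, n_speakers=1000, vocab=5000):
        super().__init__()
        # Project fused audio/visual/text features to the model dimension.
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.spk_head = nn.Linear(d_model, n_speakers)  # utterance-level task
        self.asr_head = nn.Linear(d_model, vocab)       # frame-level task

    def forward(self, x):
        # x: (batch, time, feat_dim) fused multi-modal features.
        h = self.encoder(self.proj(x))        # (batch, time, d_model)
        emb = h.mean(dim=1)                   # pooled "general embedding"
        return emb, self.spk_head(emb), self.asr_head(h)


# One joint training step on dummy data.
model = MultiTaskEmbedder()
x = torch.randn(2, 100, 512)                 # 2 utterances, 100 frames
speakers = torch.randint(0, 1000, (2,))      # speaker labels
tokens = torch.randint(1, 5000, (2, 20))     # transcript token ids (0 = blank)

emb, spk_logits, asr_logits = model(x)
spk_loss = F.cross_entropy(spk_logits, speakers)
# CTC expects (time, batch, vocab) log-probabilities.
ctc_loss = F.ctc_loss(asr_logits.log_softmax(-1).transpose(0, 1), tokens,
                      input_lengths=torch.full((2,), 100),
                      target_lengths=torch.full((2,), 20))
loss = spk_loss + 0.5 * ctc_loss             # illustrative task weighting
loss.backward()
```

After joint training, the pooled encoder output (emb above) is the embedding that would be frozen or fine-tuned for the downstream emotion-recognition task.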
Pages: 384-388
Page count: 5
Related papers
50 records in total (first 10 shown)
  • [1] Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis
    Akhtar, Md Shad
    Chauhan, Dushyant Singh
    Ghosal, Deepanway
    Poria, Soujanya
    Ekbal, Asif
    Bhattacharyya, Pushpak
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 370 - 379
  • [2] Multi-Task and Multi-Modal Learning for RGB Dynamic Gesture Recognition
    Fan, Dinghao
    Lu, Hengjie
    Xu, Shugong
    Cao, Shan
    IEEE SENSORS JOURNAL, 2021, 21 (23) : 27026 - 27036
  • [3] A Multi-modal Sentiment Recognition Method Based on Multi-task Learning
    Lin, Zijie
    Long, Yunfei
    Du, Jiachen
    Xu, Ruifeng
    Beijing Daxue Xuebao (Ziran Kexue Ban)/Acta Scientiarum Naturalium Universitatis Pekinensis, 2021, 57 (01): 7 - 15
  • [4] Multi-Modal Multi-Task Deep Learning for Speaker and Emotion Recognition of TV-Series Data
    Novitasari, Sashi
    Do, Quoc Truong
    Sakti, Sakriani
    Lestari, Dessi
    Nakamura, Satoshi
    2018 ORIENTAL COCOSDA - INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2018, : 37 - 42
  • [5] Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings
    Yi, Jiangyan
    Tao, Jianhua
    Fu, Ruibo
    Wang, Tao
    Zhang, Chu Yuan
    Wang, Chenglong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 2963 - 2973
  • [6] Driver multi-task emotion recognition network based on multi-modal facial video analysis
    Xiang, Guoliang
    Yao, Song
    Wu, Xianhui
    Deng, Hanwen
    Wang, Guojie
    Liu, Yu
    Li, Fan
    Peng, Yong
    PATTERN RECOGNITION, 2025, 161
  • [7] Multi-modal Sentiment and Emotion Joint Analysis with a Deep Attentive Multi-task Learning Model
    Zhang, Yazhou
    Rong, Lu
    Li, Xiang
    Chen, Rui
    ADVANCES IN INFORMATION RETRIEVAL, PT I, 2022, 13185 : 518 - 532
  • [8] MultiNet: Multi-Modal Multi-Task Learning for Autonomous Driving
    Chowdhuri, Sauhaarda
    Pankaj, Tushar
    Zipser, Karl
    2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 1496 - 1504
  • [9] Twitter Demographic Classification Using Deep Multi-modal Multi-task Learning
    Vijayaraghavan, Prashanth
    Vosoughi, Soroush
    Roy, Deb
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 2, 2017, : 478 - 483
  • [10] Multi-modal microblog classification via multi-task learning
    Zhao, Sicheng
    Yao, Hongxun
    Zhao, Sendong
    Jiang, Xuesong
    Jiang, Xiaolei
    Multimedia Tools and Applications, 2016, 75 : 8921 - 8938