Multi-modal embeddings using multi-task learning for emotion recognition

Cited by: 8
Authors
Khare, Aparna [1 ]
Parthasarathy, Srinivas [1 ]
Sundaram, Shiva [1 ]
Affiliations
[1] Amazon.com, Sunnyvale, CA 94089, USA
Source
INTERSPEECH 2020
Keywords
general embeddings; multi-modal; emotion recognition;
DOI
10.21437/Interspeech.2020-1827
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline Codes
100104; 100213;
Abstract
General embeddings such as word2vec, GloVe, and ELMo have shown considerable success in natural language tasks. These embeddings are typically extracted from models built for general tasks, such as skip-gram models and natural language generation. In this paper, we extend this line of work from natural language understanding to multi-modal architectures that use audio, visual, and textual information for machine learning tasks. The embeddings in our network are extracted using the encoder of a transformer model trained with multi-task learning; we use person identification and automatic speech recognition as the tasks in our embedding generation framework. We tune and evaluate the embeddings on the downstream task of emotion recognition and demonstrate that, on the CMU-MOSEI dataset, the embeddings can be used to improve over previous state-of-the-art results.
Pages
384-388 (5 pages)
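To make the abstract's core idea concrete, the following is a minimal, hypothetical sketch of multi-task training with a shared encoder: two toy classification heads stand in for the paper's person-identification and ASR tasks, and the shared encoder's output serves as a general embedding for a downstream task such as emotion recognition. The single-layer tanh "encoder", all layer sizes, and the class names are illustrative assumptions, not the paper's transformer architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class SharedEncoderMTL:
    """Toy shared encoder with two task-specific heads, trained jointly."""

    def __init__(self, d_in, d_emb, n_task1, n_task2, lr=0.1):
        self.W_enc = rng.normal(0, 0.1, (d_in, d_emb))   # shared encoder
        self.W1 = rng.normal(0, 0.1, (d_emb, n_task1))   # "person ID" head
        self.W2 = rng.normal(0, 0.1, (d_emb, n_task2))   # "ASR" head
        self.lr = lr

    def embed(self, x):
        # The shared embedding reused by the downstream task.
        return np.tanh(x @ self.W_enc)

    def step(self, x, y1, y2):
        """One gradient step on the summed cross-entropy of both heads."""
        h = self.embed(x)
        p1, p2 = softmax(h @ self.W1), softmax(h @ self.W2)
        n = len(x)
        loss = -(np.log(p1[np.arange(n), y1]).mean()
                 + np.log(p2[np.arange(n), y2]).mean())
        # Gradients of cross-entropy w.r.t. logits: (p - onehot) / n.
        g1 = p1.copy(); g1[np.arange(n), y1] -= 1; g1 /= n
        g2 = p2.copy(); g2[np.arange(n), y2] -= 1; g2 /= n
        # Both task losses backpropagate into the shared encoder.
        gh = g1 @ self.W1.T + g2 @ self.W2.T
        gx = gh * (1 - h ** 2)                 # tanh derivative
        self.W1 -= self.lr * (h.T @ g1)
        self.W2 -= self.lr * (h.T @ g2)
        self.W_enc -= self.lr * (x.T @ gx)
        return loss

# Toy data: 64 samples of a 16-dim fused multi-modal feature vector,
# with labels for each of the two auxiliary tasks.
X = rng.normal(size=(64, 16))
y1 = rng.integers(0, 4, 64)    # stand-in person-ID labels
y2 = rng.integers(0, 6, 64)    # stand-in ASR-token labels

model = SharedEncoderMTL(d_in=16, d_emb=8, n_task1=4, n_task2=6)
losses = [model.step(X, y1, y2) for _ in range(200)]
print("joint loss decreased:", losses[-1] < losses[0])
print("embedding shape:", model.embed(X).shape)
```

After joint training, `model.embed(X)` would be frozen or fine-tuned as the input representation for the downstream emotion-recognition classifier; in the paper this role is played by the transformer encoder's outputs.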
Related Papers
50 items in total
  • [31] Yuan, Zhaoquan; Peng, Xiao; Wu, Xiao; Xu, Changsheng. Hierarchical Multi-Task Learning for Diagram Question Answering with Multi-Modal Transformer. Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 1313-1321.
  • [32] Du, Lei; Liu, Fang; Liu, Kefei; Yao, Xiaohui; Risacher, Shannon L.; Han, Junwei; Guo, Lei; Saykin, Andrew J.; Shen, Li. A Dirty Multi-task Learning Method for Multi-modal Brain Imaging Genetics. Medical Image Computing and Computer Assisted Intervention (MICCAI 2019), Pt IV, 2019, 11767: 447-455.
  • [33] De Silva, L. C.; Miyasato, T.; Nakatsu, R. Facial Emotion Recognition Using Multi-modal Information. Proceedings of the 1997 International Conference on Information, Communications and Signal Processing (ICICS 1997), 1997: 397-401.
  • [34] Zhang, Yi; Chen, Mingyuan; Shen, Jundong; Wang, Chongjun. Tailor Versatile Multi-Modal Learning for Multi-Label Emotion Recognition. Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2022), 2022: 9100-9108.
  • [35] Abu Shaqra, F.; Duwairi, R.; Al-Ayyoub, M. A Multi-modal Deep Learning System for Arabic Emotion Recognition. International Journal of Speech Technology, 2023, 26(1): 123-139.
  • [36] Kuga, Ryohei; Kanezaki, Asako; Samejima, Masaki; Sugano, Yusuke; Matsushita, Yasuyuki. Multi-task Learning Using Multi-modal Encoder-Decoder Networks with Shared Skip Connections. 2017 IEEE International Conference on Computer Vision Workshops (ICCVW 2017), 2017: 403-411.
  • [37] Hsu, Jia-Hao; Wu, Chung-Hsien; Wei, Yu-Hung. Speech Emotion Recognition Using Decomposed Speech via Multi-task Learning. Interspeech 2023, 2023: 4553-4557.
  • [38] Ghosh, Sreyan; Tyagi, Utkarsh; Ramaneswaran, S.; Srivastava, Harshvardhan; Manocha, Dinesh. MMER: Multimodal Multi-task Learning for Speech Emotion Recognition. Interspeech 2023, 2023: 1209-1213.
  • [39] Raj, Subham; Mondal, Prabir; Chakder, Daipayan; Saha, Sriparna; Onoe, Naoyuki. A Multi-modal Multi-task Based Approach for Movie Recommendation. 2023 International Joint Conference on Neural Networks (IJCNN), 2023.
  • [40] Ide, Tsuyoshi; Phan, Dzung T.; Kalagnanam, Jayant. Multi-task Multi-modal Models for Collective Anomaly Detection. 2017 17th IEEE International Conference on Data Mining (ICDM), 2017: 177-186.