Efficient Low-rank Multimodal Fusion with Modality-Specific Factors

Cited by: 0
Authors
Liu, Zhun [1 ]
Shen, Ying [1 ]
Lakshminarasimhan, Varun Bharadhwaj [1 ]
Liang, Paul Pu [1 ]
Zadeh, Amir [1 ]
Morency, Louis-Philippe [1 ]
Affiliation
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA);
Keywords
DOI
None available
Chinese Library Classification (CLC)
TP39 [Computer Applications];
Discipline Code
081203; 0835;
Abstract
Multimodal research is an emerging field of artificial intelligence, and one of its main research problems is multimodal fusion. Multimodal fusion is the process of integrating multiple unimodal representations into one compact multimodal representation. Previous research in this field has exploited the expressiveness of tensors for multimodal representation. However, these methods often suffer from an exponential increase in dimensionality and computational complexity introduced by transforming the inputs into a tensor. In this paper, we propose the Low-rank Multimodal Fusion method, which performs multimodal fusion using low-rank tensors to improve efficiency. We evaluate our model on three different tasks: multimodal sentiment analysis, speaker trait analysis, and emotion recognition. Our model achieves competitive results on all these tasks while drastically reducing computational complexity. Additional experiments also show that our model performs robustly across a wide range of low-rank settings, and is indeed much more efficient in both training and inference than other methods that utilize tensor representations.
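The abstract describes fusing unimodal representations through modality-specific low-rank factors instead of materializing the full outer-product tensor. The sketch below illustrates that general idea in NumPy; the variable names, shapes, and the exact way the factors are combined are assumptions made for illustration, not the paper's released implementation.

# Minimal sketch of low-rank multimodal fusion (all names/shapes are assumptions).
import numpy as np

rank = 4                                           # number of low-rank factors (assumed)
dims = {"audio": 16, "video": 32, "text": 64}      # unimodal feature sizes (assumed)
out_dim = 8                                        # size of the fused representation

rng = np.random.default_rng(0)

# One set of modality-specific factors per modality, shape (rank, d_m + 1, out_dim);
# the extra "+1" input row corresponds to appending a constant 1 to each feature vector.
factors = {m: rng.normal(size=(rank, d + 1, out_dim)) for m, d in dims.items()}

def low_rank_fusion(features):
    """Fuse unimodal vectors without forming the full outer-product tensor."""
    fused = np.ones((rank, out_dim))
    for m, z in features.items():
        z1 = np.append(z, 1.0)                          # append 1 to retain unimodal terms
        proj = np.einsum("d,rdo->ro", z1, factors[m])   # project with each of the r factors
        fused *= proj                                   # elementwise product across modalities
    return fused.sum(axis=0)                            # sum over the rank dimension

h = low_rank_fusion({m: rng.normal(size=d) for m, d in dims.items()})
print(h.shape)   # (8,) -- compact multimodal representation

The cost of this computation grows linearly with the number of modalities and with the rank, which is the efficiency argument the abstract makes against building the full tensor explicitly.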
Pages: 2247-2256
Number of pages: 10
Related papers
50 records in total
  • [21] Poster Abstract: Multimodal Emotion Recognition by extracting common and modality-specific information
    Zhang, Wei
    Gu, Weixi
    Ma, Fei
    Ni, Shiguang
    Zhang, Lin
    Huang, Shao-Lun
    SENSYS'18: PROCEEDINGS OF THE 16TH CONFERENCE ON EMBEDDED NETWORKED SENSOR SYSTEMS, 2018, : 396 - 397
  • [22] Latent Low-rank Graph Learning for Multimodal Clustering
    Zhong, Guo
    Pun, Chi-Man
    2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021, : 492 - 503
  • [23] Cross modality fusion for modality-specific lung tumor segmentation in PET-CT images
    Zhang, Xu
    Zhang, Bin
    Deng, Shengming
    Meng, Qingquan
    Chen, Xinjian
    Xiang, Dehui
    PHYSICS IN MEDICINE AND BIOLOGY, 2022, 67 (22):
  • [25] Orienting and maintenance of spatial attention in audition and vision: multimodal and modality-specific brain activations
    Salmi, Juha
    Rinne, Teemu
    Degerman, Alexander
    Salonen, Oili
    Alho, Kimmo
    BRAIN STRUCTURE & FUNCTION, 2007, 212 (02): : 181 - 194
  • [26] Task-Dependent Recruitment of Modality-Specific and Multimodal Regions during Conceptual Processing
    Kuhnke, Philipp
    Kiefer, Markus
    Hartwigsen, Gesa
    CEREBRAL CORTEX, 2020, 30 (07) : 3938 - 3959
  • [27] CROSS: EFFICIENT LOW-RANK TENSOR COMPLETION
    Zhang, Anru
    ANNALS OF STATISTICS, 2019, 47 (02): : 936 - 964
  • [28] Low-rank tensor fusion and self-supervised multi-task multimodal sentiment analysis
    Miao, Xinmeng
    Zhang, Xuguang
    Zhang, Haoran
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (23) : 63291 - 63308
  • [29] EFFICIENT LEARNING OF DICTIONARIES WITH LOW-RANK ATOMS
    Ravishankar, Saiprasad
    Moore, Brian E.
    Nadakuditi, Raj Rao
    Fessler, Jeffrey A.
    2016 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP), 2016, : 222 - 226
  • [30] Multi-focus image fusion based on latent low-rank representation combining low-rank representation
    Chen M.
    Zhong Y.
    Li Z.-D.
    Jilin Daxue Xuebao (Gongxueban)/Journal of Jilin University (Engineering and Technology Edition), 2020, 50 (01): : 297 - 305