Learning Fused Representations for Large-Scale Multimodal Classification

Cited by: 3
Authors
Nawaz, Shah [1]
Calefati, Alessandro [1]
Janjua, Muhammad Kamran [2]
Anwaar, Muhammad Umer [3]
Gallo, Ignazio [1]
Affiliations
[1] Univ Insubria, Dept Theoret & Appl Sci, I-21100 Varese, Italy
[2] Natl Univ Sci & Technol, Sch Elect Engn & Comp Sci, Islamabad 44000, Pakistan
[3] Tech Univ Munich, D-80333 Munich, Germany
Keywords
Image and text fusion; multimodal data fusion; text encoding
DOI
10.1109/LSENS.2018.2880790
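Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract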
Multimodal strategies combine different input sources into a joint representation that provides richer information than any single modality alone. In this article, we present a novel multimodal approach that fuses an image with its encoded text description to obtain an information-enriched image. The approach casts encoded text, obtained from Word2Vec word embeddings, into a visual embedding that is concatenated with the image. We employ standard convolutional neural networks to learn representations of the information-enriched images. Finally, we compare our approach with the unimodal approaches and their combination on three large-scale multimodal datasets. Our findings indicate that the joint representation of encoded text and image in feature space improves multimodal classification performance and aids interpretability.
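The fusion step described in the abstract can be illustrated with a rough sketch. The abstract does not give the exact encoding, so everything below is an assumption made for illustration: `toy_embeddings` is a hypothetical stand-in for a trained Word2Vec model, the word vectors are rescaled to 8-bit pixel values, and the resulting strip is stacked below the image. It is not the authors' implementation.

```python
import numpy as np


def toy_embeddings(vocab, dim, seed=0):
    """Hypothetical stand-in for Word2Vec: one random vector per word."""
    rng = np.random.default_rng(seed)
    return {word: rng.uniform(-1.0, 1.0, size=dim) for word in vocab}


def encode_text_as_pixels(tokens, embeddings, width):
    """Turn word vectors into an RGB strip that is `width` pixels wide.

    Assumption: vector components lie roughly in [-1, 1]; values outside
    that range are clipped after rescaling to [0, 255].
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros((0, width, 3), dtype=np.uint8)
    flat = np.concatenate(vectors)
    scaled = np.clip((flat + 1.0) / 2.0 * 255.0, 0, 255).astype(np.uint8)
    row_len = width * 3                      # values per pixel row (RGB)
    n_rows = -(-scaled.size // row_len)      # ceiling division
    padded = np.zeros(n_rows * row_len, dtype=np.uint8)
    padded[: scaled.size] = scaled           # zero-pad the last row
    return padded.reshape(n_rows, width, 3)


def fuse(image, text_strip):
    """Stack the encoded text strip below the image (assumed layout)."""
    return np.concatenate([image, text_strip], axis=0)


# Usage: a blank 32x32 RGB image fused with a three-word description.
image = np.zeros((32, 32, 3), dtype=np.uint8)
tokens = ["red", "leather", "shoe"]
embeddings = toy_embeddings(tokens, dim=12)
enriched = fuse(image, encode_text_as_pixels(tokens, embeddings, width=32))
```

The enriched array can then be passed to any standard image CNN, since it is still an ordinary height by width by 3 image.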
Pages: 4