Learning Fused Representations for Large-Scale Multimodal Classification

Cited by: 3
Authors
Nawaz, Shah [1]
Calefati, Alessandro [1]
Janjua, Muhammad Kamran [2]
Anwaar, Muhammad Umer [3]
Gallo, Ignazio [1]
Affiliations
[1] Univ Insubria, Dept Theoret & Appl Sci, I-21100 Varese, Italy
[2] Natl Univ Sci & Technol, Sch Elect Engn & Comp Sci, Islamabad 44000, Pakistan
[3] Tech Univ Munich, D-80333 Munich, Germany
Keywords
Image and text fusion; multimodal data fusion; text encoding
DOI
10.1109/LSENS.2018.2880790
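Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract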
Multimodal strategies combine different input sources into a joint representation that provides enhanced information from the unimodal strategy. In this article, we present a novel multimodal approach that fuses image and encoded text description to obtain an information-enriched image. This approach casts encoded text obtained from Word2Vec word embedding into visual embedding to be concatenated with the image. We employ standard convolutional neural networks to learn representations of information-enriched images. Finally, we compare our approach with the unimodal approach and their combination on three large-scale multimodal datasets. Our findings indicate that the joint representation of encoded text and image in feature space improves the multimodal classification performance aiding the interpretability.
Pages: 4
Related papers
50 in total
  • [31] Learning General Audio Representations With Large-Scale Training of Patchout Audio Transformers
    Koutini, Khaled
    Masoudian, Shahed
    Schmid, Florian
    Eghbal-zadeh, Hamid
    Schlueter, Jan
    Widmer, Gerhard
    [J]. HEAR: HOLISTIC EVALUATION OF AUDIO REPRESENTATIONS, VOL 166, 2021, 166 : 65 - 88
  • [32] Learning-Dependent Evolution of Spatial Representations in Large-Scale Virtual Environments
    Starrett, Michael J.
    Stokes, Jared D.
    Huffman, Derek J.
    Ferrer, Emilio
    Ekstrom, Arne D.
    [J]. JOURNAL OF EXPERIMENTAL PSYCHOLOGY-LEARNING MEMORY AND COGNITION, 2019, 45 (03) : 497 - 514
  • [33] MultiSubs: A Large-scale Multimodal and Multilingual Dataset
    Wang, Josiah
    Figueiredo, Josiel
    Specia, Lucia
    [J]. LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6776 - 6785
  • [34] Large-Scale Multimodal Movie Dialogue Corpus
    Yasuhara, Ryu
    Inoue, Masashi
    Suga, Ikuya
    Kosaka, Tetsuo
    [J]. ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016, : 414 - 415
  • [35] Deep Multi-Task Learning for Large-Scale Image Classification
    Kuang, Zhenzhong
    Li, Zongmin
    Zhao, Tianyi
    Fan, Jianping
    [J]. 2017 IEEE THIRD INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM 2017), 2017, : 310 - 317
  • [36] Joint Hierarchical Category Structure Learning and Large-Scale Image Classification
    Qu, Yanyun
    Lin, Li
    Shen, Fumin
    Lu, Chang
    Wu, Yang
    Xie, Yuan
    Tao, Dacheng
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2017, 26 (09) : 4331 - 4346
  • [37] Extreme Learning Machine for Large-Scale Graph Classification Based on MapReduce
    Wang, Zhanghui
    Zhao, Yuhai
    Wang, Guoren
    [J]. PROCEEDINGS OF ELM-2015, VOL 1: THEORY, ALGORITHMS AND APPLICATIONS (I), 2016, 6 : 93 - 105
  • [38] Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing
    Sun, Chong
    Rampalli, Narasimhan
    Yang, Frank
    Doan, Anhai
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 7 (13) : 1529 - 1540
  • [39] Large-scale Landsat image classification based on deep learning methods
    Zhao, Xuemei
    Gao, Lianru
    Chen, Zhengchao
    Zhang, Bing
    Liao, Wenzhi
    [J]. APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2019, 8
  • [40] An Active Learning Based LDA Algorithm for Large-Scale Data Classification
    [J]. Yu, Xu (yuxu0532@163.com), 1600, Science and Engineering Research Support Society (09):