Efficient Large-Scale Multi-Modal Classification

Cited by: 0
Authors
Kiela, Douwe [1]
Grave, Edouard [1]
Joulin, Armand [1]
Mikolov, Tomas [1]
Affiliation
[1] Facebook AI Research, Menlo Park, CA 94025 USA
Keywords
RECOGNITION
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While the incipient internet was largely text-based, the modern digital world is becoming increasingly multi-modal. Here, we examine multi-modal classification where one modality is discrete, e.g., text, and the other is continuous, e.g., visual representations transferred from a convolutional neural network. In particular, we focus on scenarios where we have to be able to classify large quantities of data quickly. We investigate various methods for performing multi-modal fusion and analyze their trade-offs in terms of classification accuracy and computational efficiency. Our findings indicate that the inclusion of continuous information improves performance over text-only on a range of multi-modal classification tasks, even with simple fusion methods. In addition, we experiment with discretizing the continuous features in order to speed up and simplify the fusion process even further. Our results show that fusion with discretized features outperforms text-only classification, at a fraction of the computational cost of full multi-modal fusion, with the additional benefit of improved interpretability.
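The idea of fusing discretized visual features with text can be illustrated with a small sketch. The abstract does not say how the continuous features are discretized, so the block-wise nearest-centroid quantization below is an assumption chosen for illustration, not necessarily the authors' method. The names `discretize`, `fuse`, `CENTROIDS`, and the `__img_` token prefix are all hypothetical, and the centroids are hard-coded rather than learned.

```python
# Illustrative sketch only: turn a continuous feature vector into discrete
# "visual tokens" and append them to the text tokens, so that one
# bag-of-tokens text classifier can consume both modalities.
# ASSUMPTION: block-wise nearest-centroid quantization; the abstract does
# not specify the discretization scheme.

def nearest_centroid(block, centroids):
    """Return the index of the centroid closest to `block` (squared Euclidean)."""
    def sq_dist(centroid):
        return sum((x - c) ** 2 for x, c in zip(block, centroid))
    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k]))

def discretize(features, centroids_per_block):
    """Split `features` into equal-sized blocks and map each block to a token."""
    n_blocks = len(centroids_per_block)
    size = len(features) // n_blocks
    tokens = []
    for b, centroids in enumerate(centroids_per_block):
        block = features[b * size:(b + 1) * size]
        tokens.append(f"__img_{b}_{nearest_centroid(block, centroids)}")
    return tokens

def fuse(text_tokens, image_tokens):
    """Fusion by concatenating the two token sequences."""
    return list(text_tokens) + list(image_tokens)

# Hypothetical codebook: two blocks, two 2-d centroids per block.
CENTROIDS = [
    [[0.0, 1.0], [1.0, 0.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

image_tokens = discretize([0.1, 0.9, 0.8, 0.2], CENTROIDS)
print(fuse(["red", "shoes"], image_tokens))
# prints ['red', 'shoes', '__img_0_0', '__img_1_1']
```

Once the image is expressed as tokens, fusion costs no more than adding a few extra words to the input. Each token can also be traced back to a centroid, which is one way to read the interpretability benefit mentioned in the abstract.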
Pages: 5198-5204
Page count: 7