Efficient Large-Scale Multi-Modal Classification

Cited by: 0
Authors
Kiela, Douwe [1]
Grave, Edouard [1]
Joulin, Armand [1]
Mikolov, Tomas [1]
Affiliations
[1] Facebook AI Res, Menlo Pk, CA 94025 USA
Keywords
RECOGNITION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While the incipient internet was largely text-based, the modern digital world is becoming increasingly multi-modal. Here, we examine multi-modal classification where one modality is discrete, e.g. text, and the other is continuous, e.g. visual representations transferred from a convolutional neural network. In particular, we focus on scenarios where we have to be able to classify large quantities of data quickly. We investigate various methods for performing multi-modal fusion and analyze their trade-offs in terms of classification accuracy and computational efficiency. Our findings indicate that the inclusion of continuous information improves performance over text-only classification on a range of multi-modal classification tasks, even with simple fusion methods. In addition, we experiment with discretizing the continuous features in order to speed up and simplify the fusion process even further. Our results show that fusion with discretized features outperforms text-only classification, at a fraction of the computational cost of full multi-modal fusion, with the additional benefit of improved interpretability.
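The abstract does not say how the continuous features are discretized, so the following is only an illustrative sketch under stated assumptions. It assumes a product-quantization-style scheme: each visual feature vector is split into sub-vectors, each sub-vector is mapped to its nearest centroid in a small k-means codebook, and the resulting centroid indices are emitted as extra tokens alongside the text. The function names (`fit_codebooks`, `to_tokens`, `fuse`) and the `__img_<s>_<k>` token format are hypothetical and chosen for this example only.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda k: _sq_dist(vec, centroids[k]))


def fit_codebooks(features, n_subvectors, n_centroids, n_iters=10, seed=0):
    """Fit one small k-means codebook per sub-vector (assumed scheme)."""
    rng = random.Random(seed)
    dim = len(features[0])
    if dim % n_subvectors != 0:
        raise ValueError("feature dimension must be divisible by n_subvectors")
    sub_dim = dim // n_subvectors
    codebooks = []
    for s in range(n_subvectors):
        block = [f[s * sub_dim:(s + 1) * sub_dim] for f in features]
        # Initialise centroids from randomly chosen data points.
        centroids = [list(v) for v in rng.sample(block, n_centroids)]
        for _ in range(n_iters):
            groups = [[] for _ in range(n_centroids)]
            for vec in block:
                groups[_nearest(vec, centroids)].append(vec)
            for k, members in enumerate(groups):
                if members:  # keep the old centroid if its cluster is empty
                    centroids[k] = [sum(col) / len(members) for col in zip(*members)]
        codebooks.append(centroids)
    return codebooks


def to_tokens(feature, codebooks):
    """Map one continuous feature vector to one discrete token per sub-vector."""
    sub_dim = len(codebooks[0][0])
    return [
        f"__img_{s}_{_nearest(feature[s * sub_dim:(s + 1) * sub_dim], centroids)}"
        for s, centroids in enumerate(codebooks)
    ]


def fuse(text, feature, codebooks):
    """Fuse modalities by appending the image tokens to the text tokens."""
    return text.split() + to_tokens(feature, codebooks)


# Toy data standing in for CNN features (random, for illustration only).
_rng = random.Random(1)
toy_features = [[_rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
toy_codebooks = fit_codebooks(toy_features, n_subvectors=4, n_centroids=4)
print(fuse("red running shoes", toy_features[0], toy_codebooks))
```

Under this assumption, the fused token sequence can be passed to any bag-of-words text classifier unchanged, which is one way discretized fusion could avoid the cost of a separate continuous fusion layer.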
Pages: 5198-5204
Page count: 7
Related papers
50 in total
  • [1] Effective Classification for Multi-modal Behavioral Authentication on Large-Scale Data
    Yamaguchi, Shuji
    Gomi, Hidehito
    Kobayashi, Ryosuke
    Tran Phuong Thao
    Irvan, Mhd
    Yamaguchi, Rie Shigetomi
    [J]. 2020 15TH ASIA JOINT CONFERENCE ON INFORMATION SECURITY (ASIAJCIS 2020), 2020, : 101 - 109
  • [2] Effective Classification for Multi-modal Behavioral Authentication on Large-Scale Data
    Yamaguchi, Shuji
    Gomi, Hidehito
    Kobayashi, Ryosuke
    Yamaguchi, Rie Shigetomi
    [J]. JOURNAL OF INTERNET TECHNOLOGY, 2021, 22 (05): : 1171 - 1183
  • [3] Towards Good Practices for Multi-modal Fusion in Large-Scale Video Classification
    Liu, Jinlai
    Yuan, Zehuan
    Wang, Changhu
    [J]. COMPUTER VISION - ECCV 2018 WORKSHOPS, PT IV, 2019, 11132 : 287 - 296
  • [4] Large-scale Multi-modal Search and QA at Alibaba
    Jin, Rong
    [J]. PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 8 - 8
  • [5] MMpedia: A Large-Scale Multi-modal Knowledge Graph
    Wu, Yinan
    Wu, Xiaowei
    Li, Junwen
    Zhang, Yue
    Wang, Haofen
    Du, Wen
    He, Zhidong
    Liu, Jingping
    Ruan, Tong
    [J]. SEMANTIC WEB, ISWC 2023, PT II, 2023, 14266 : 18 - 37
  • [6] Exploring a large-scale multi-modal transportation recommendation system
    Liu, Yang
    Lyu, Cheng
    Liu, Zhiyuan
    Cao, Jinde
    [J]. TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2021, 126
  • [7] Richpedia: A Large-Scale, Comprehensive Multi-Modal Knowledge Graph
    Wang, Meng
    Wang, Haofen
    Qi, Guilin
    Zheng, Qiushuo
    [J]. BIG DATA RESEARCH, 2020, 22
  • [8] Operational planning of a large-scale multi-modal transportation system
    Jansen, B
    Swinkels, PCJ
    Teeuwen, GJA
    de Fluiter, BV
    Fleuren, HA
    [J]. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2004, 156 (01) : 41 - 53
  • [9] Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation
    Niu, Yulei
    Lu, Zhiwu
    Wen, Ji-Rong
    Xiang, Tao
    Chang, Shih-Fu
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (04) : 1720 - 1731
  • [10] WenLan: Efficient Large-Scale Multi-Modal Pre-Training on Real World Data
    Song, Ruihua
    [J]. MMPT '21: PROCEEDINGS OF THE 2021 WORKSHOP ON MULTI-MODAL PRE-TRAINING FOR MULTIMEDIA UNDERSTANDING, 2021, : 3 - 3