Efficient Large-Scale Multi-Modal Classification

Authors:

Kiela, Douwe [1]

Grave, Edouard [1]

Joulin, Armand [1]

Mikolov, Tomas [1]

Affiliations:

[1] Facebook AI Research, Menlo Park, CA 94025, USA

Source:

Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence | 2018

Keywords:

RECOGNITION;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

While the incipient internet was largely text-based, the modern digital world is becoming increasingly multi-modal. Here, we examine multi-modal classification where one modality is discrete, e.g. text, and the other is continuous, e.g. visual representations transferred from a convolutional neural network. In particular, we focus on scenarios where we have to be able to classify large quantities of data quickly. We investigate various methods for performing multi-modal fusion and analyze their trade-offs in terms of classification accuracy and computational efficiency. Our findings indicate that the inclusion of continuous information improves performance over text-only on a range of multi-modal classification tasks, even with simple fusion methods. In addition, we experiment with discretizing the continuous features in order to speed up and simplify the fusion process even further. Our results show that fusion with discretized features outperforms text-only classification, at a fraction of the computational cost of full multi-modal fusion, with the additional benefit of improved interpretability.
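The idea of fusing discretized continuous features with text can be illustrated with a minimal sketch. The abstract does not state which discretization scheme is used, so this example assumes one plausible approach: the continuous vector is split into equal-width sub-vectors, each sub-vector is mapped to its nearest centroid in a small codebook, and the resulting indices are emitted as extra tokens appended to the text. The function names, the `__img_` token prefix, and the hand-written codebooks are all hypothetical; in practice the codebooks would typically be learned from data, for example with k-means.

```python
from typing import List, Sequence


def nearest_centroid(sub: Sequence[float],
                     centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to `sub` (squared Euclidean)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda k: sq_dist(sub, centroids[k]))


def discretize(features: Sequence[float],
               codebooks: Sequence[Sequence[Sequence[float]]]) -> List[str]:
    """Map a continuous vector to one discrete token per sub-vector.

    Assumption: len(features) is divisible by the number of codebooks.
    """
    width = len(features) // len(codebooks)
    tokens = []
    for i, centroids in enumerate(codebooks):
        sub = features[i * width:(i + 1) * width]
        # Hypothetical token format: block index plus centroid index.
        tokens.append(f"__img_{i}_{nearest_centroid(sub, centroids)}")
    return tokens


def fuse(text_tokens: Sequence[str],
         features: Sequence[float],
         codebooks: Sequence[Sequence[Sequence[float]]]) -> List[str]:
    """Fusion by concatenation: text tokens followed by discretized tokens."""
    return list(text_tokens) + discretize(features, codebooks)


# Toy codebooks (illustrative values only): two blocks, two centroids each.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
features = [0.9, 0.8, 0.1, 0.7]  # stand-in for a CNN feature vector
fused = fuse(["red", "running", "shoes"], features, codebooks)
print(fused)
```

Once the continuous modality is expressed as tokens, any text-only classifier can consume the fused sequence unchanged, which is one way to read the abstract's claim about lower cost. The tokens can also be inspected directly, which relates to the interpretability benefit mentioned above.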

Pages: 5198-5204

Page count: 7
