On the Effectiveness of Images in Multi-modal Text Classification: An Annotation Study

Cited by: 0
Authors
Ma, Chunpeng [1 ]
Shen, Aili [2 ]
Yoshikawa, Hiyori [2 ]
Iwakura, Tomoya [2 ]
Beck, Daniel [3 ]
Baldwin, Timothy [3 ,4 ]
Affiliations
[1] Fujitsu Ltd, 4-1-1 Kamikodanaka, Kawasaki, Kanagawa 2118588, Japan
[2] Amazon, Sydney, NSW, Australia
[3] Univ Melbourne, Melbourne, Vic, Australia
[4] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
Keywords
Datasets; neural networks; natural language processing; text classification; multi-modality
DOI
10.1145/3565572
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Combining input modalities beyond text is a key challenge for natural language processing. Previous work has been inconclusive as to the true utility of images as a supplementary information source for text classification tasks, motivating this large-scale human study of labelling performance given text only, images only, or both text and images. To this end, we create a new dataset accompanied by a novel annotation method, Japanese Entity Labeling with Dynamic Annotation, to deepen our understanding of the effectiveness of images for multi-modal text classification. Through careful comparative analysis of human performance and the performance of state-of-the-art multi-modal text classification models, we gain valuable insights into the differences between human and model performance, and the conditions under which images are beneficial for text classification.
Pages: 19