On the Effectiveness of Images in Multi-modal Text Classification: An Annotation Study

Cited by: 0
Authors
Ma, Chunpeng [1]
Shen, Aili [2]
Yoshikawa, Hiyori [2]
Iwakura, Tomoya [2]
Beck, Daniel [3]
Baldwin, Timothy [3, 4]
Affiliations
[1] Fujitsu Ltd, 4-1-1 Kamikodanaka, Kawasaki, Kanagawa 2118588, Japan
[2] Amazon, Sydney, NSW, Australia
[3] Univ Melbourne, Melbourne, Vic, Australia
[4] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
Keywords
Datasets; neural networks; natural language processing; text classification; multi-modality
DOI
10.1145/3565572
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Combining different input modalities beyond text is a key challenge for natural language processing. Previous work has been inconclusive as to the true utility of images as a supplementary information source for text classification tasks, motivating this large-scale human study of labelling performance given text only, images only, or both text and images. To this end, we create a new dataset accompanied by a novel annotation method, Japanese Entity Labeling with Dynamic Annotation, to deepen our understanding of the effectiveness of images for multi-modal text classification. By performing careful comparative analysis of human performance and the performance of state-of-the-art multi-modal text classification models, we gain valuable insights into differences between human and model performance, and the conditions under which images are beneficial for text classification.
Pages: 19