Understanding, Categorizing and Predicting Semantic Image-Text Relations

Cited by: 15
Authors
Otto, Christian [1]
Springstein, Matthias [1]
Anand, Avishek [2]
Ewerth, Ralph [3]
Affiliations
[1] Leibniz Informat Ctr Sci & Technol TIB, Hannover, Germany
[2] Leibniz Univ Hannover, L3S Res Ctr, Hannover, Germany
[3] Leibniz Univ Hannover, L3S Res Ctr, Leibniz Informat Ctr Sci & Technol TIB, Hannover, Germany
Keywords
Image-text class; multimodality; data augmentation; semantic gap
DOI
10.1145/3323873.3325049
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Two modalities are often used to convey information in a complementary and beneficial manner, e.g., in online news, videos, educational resources, or scientific publications. The automatic understanding of semantic correlations between text and associated images as well as their interplay has a great potential for enhanced multimodal web search and recommender systems. However, automatic understanding of multimodal information is still an unsolved research problem. Recent approaches such as image captioning focus on precisely describing visual content and translating it to text, but typically address neither semantic interpretations nor the specific role or purpose of an image-text constellation. In this paper, we go beyond previous work and investigate, inspired by research in visual communication, useful semantic image-text relations for multimodal information retrieval. We derive a categorization of eight semantic image-text classes (e.g., "illustration" or "anchorage") and show how they can systematically be characterized by a set of three metrics: cross-modal mutual information, semantic correlation, and the status relation of image and text. Furthermore, we present a deep learning system to predict these classes by utilizing multimodal embeddings. To obtain a sufficiently large amount of training data, we have automatically collected and augmented data from a variety of datasets and web resources, which enables future research on this topic. Experimental results on a demanding test set demonstrate the feasibility of the approach.
Pages: 168 - 176
Page count: 9
Related papers
50 in total
  • [31] DIAL: Dense Image-Text ALignment for Weakly Supervised Semantic Segmentation
    Jang, Soojin
    Yun, Jungmin
    Kwon, Junehyoung
    Lee, Eunju
    Kim, Youngbin
    COMPUTER VISION - ECCV 2024, PT LXIX, 2025, 15127 : 248 - 266
  • [32] Cross-Modal Attention With Semantic Consistence for Image-Text Matching
    Xu, Xing
    Wang, Tan
    Yang, Yang
    Zuo, Lin
    Shen, Fumin
    Shen, Heng Tao
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (12) : 5412 - 5425
  • [33] Image-text interaction graph neural network for image-text sentiment analysis
    Liao, Wenxiong
    Zeng, Bi
    Liu, Jianqi
    Wei, Pengfei
    Fang, Jiongkun
    APPLIED INTELLIGENCE, 2022, 52 (10) : 11184 - 11198
  • [34] Commonsense-Guided Semantic and Relational Consistencies for Image-Text Retrieval
    Li, Wenhui
    Yang, Song
    Li, Qiang
    Li, Xuanya
    Liu, An-An
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1867 - 1880
  • [35] A method for image-text matching based on semantic filtering and adaptive adjustment
    Jin, Ran
    Hou, Tengda
    Jin, Tao
    Yuan, Jie
    Du, Chenjie
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2024, 2024 (01)
  • [36] Progressive semantic aggregation and structured cognitive enhancement for image-text matching
    Li, Mingyong
    Gao, Yihua
    Zhao, Honggang
    Li, Ruiheng
    Chen, Junyu
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 274
  • [37] Cross-modal Semantic Interference Suppression for image-text matching
    Yao, Tao
    Peng, Shouyong
    Sun, Yujuan
    Sheng, Guorui
    Fu, Haiyan
    Kong, Xiangwei
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [39] Local Alignment with Global Semantic Consistence Network for Image-Text Matching
    Li, Pengwei
    Wu, Shihua
    Lian, Zhichao
    2022 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2022, : 652 - 657
  • [40] USER: Unified Semantic Enhancement With Momentum Contrast for Image-Text Retrieval
    Zhang, Yan
    Ji, Zhong
    Wang, Di
    Pang, Yanwei
    Li, Xuelong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 595 - 609