Understanding, Categorizing and Predicting Semantic Image-Text Relations

Cited by: 15
Authors
Otto, Christian [1]
Springstein, Matthias [1]
Anand, Avishek [2]
Ewerth, Ralph [3]
Affiliations
[1] Leibniz Informat Ctr Sci & Technol TIB, Hannover, Germany
[2] Leibniz Univ Hannover, L3S Res Ctr, Hannover, Germany
[3] Leibniz Univ Hannover, L3S Res Ctr, Leibniz Informat Ctr Sci & Technol TIB, Hannover, Germany
Keywords
Image-text class; multimodality; data augmentation; semantic gap
DOI
10.1145/3323873.3325049
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Two modalities are often used to convey information in a complementary and beneficial manner, e.g., in online news, videos, educational resources, or scientific publications. The automatic understanding of semantic correlations between text and associated images as well as their interplay has a great potential for enhanced multimodal web search and recommender systems. However, automatic understanding of multimodal information is still an unsolved research problem. Recent approaches such as image captioning focus on precisely describing visual content and translating it to text, but typically address neither semantic interpretations nor the specific role or purpose of an image-text constellation. In this paper, we go beyond previous work and investigate, inspired by research in visual communication, useful semantic image-text relations for multimodal information retrieval. We derive a categorization of eight semantic image-text classes (e.g., "illustration" or "anchorage") and show how they can systematically be characterized by a set of three metrics: cross-modal mutual information, semantic correlation, and the status relation of image and text. Furthermore, we present a deep learning system to predict these classes by utilizing multimodal embeddings. To obtain a sufficiently large amount of training data, we have automatically collected and augmented data from a variety of datasets and web resources, which enables future research on this topic. Experimental results on a demanding test set demonstrate the feasibility of the approach.
Pages: 168-176
Number of pages: 9