Multimodal Logical Inference System for Visual-Textual Entailment

Cited by: 0
Authors
Suzuki, Riko [1]
Yanaka, Hitomi [1, 2]
Yoshikawa, Masashi [3]
Mineshima, Koji [1]
Bekki, Daisuke [1]
Affiliations
[1] Ochanomizu Univ, Tokyo, Japan
[2] RIKEN Ctr Adv Intelligence Project, Tokyo, Japan
[3] Nara Inst Sci & Technol, Nara, Japan
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A large amount of research about multimodal inference across text and vision has been recently developed to obtain visually grounded word and sentence representations. In this paper, we use logic-based representations as unified meaning representations for texts and images and present an unsupervised multimodal logical inference system that can effectively prove entailment relations between them. We show that by combining semantic parsing and theorem proving, the system can handle semantically complex sentences for visual-textual inference.
Pages: 386-392
Number of pages: 7