Multi-Level Counterfactual Contrast for Visual Commonsense Reasoning

Times cited: 13
Authors
Zhang, Xi [1, 2]
Zhang, Feifei [1]
Xu, Changsheng [1, 2, 3]
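Affiliations
[1] Chinese Academy of Sciences, Institute of Automation, National Laboratory of Pattern Recognition, Beijing, People's Republic of China
[2] University of Chinese Academy of Sciences, School of Artificial Intelligence, Beijing, People's Republic of China
[3] Peng Cheng Laboratory, Shenzhen, People's Republic of China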
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
VCR; contrastive learning; counterfactual thinking
DOI
10.1145/3474085.3475328
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Given a question about an image, a Visual Commonsense Reasoning (VCR) model must provide not only a correct answer but also a rationale that justifies the answer. The task is challenging because it requires understanding diverse visual content, comprehending abstract language, and reasoning about complicated inter-modality relationships. To address these challenges, previous methods either resort to holistic attention mechanisms or adopt transformer-based models with pre-training; however, these approaches fall short of comprehensive understanding and usually incur a heavy computational burden. In this paper, we propose a novel multi-level counterfactual contrastive learning network for VCR that jointly models the hierarchical visual contents and the inter-modality relationships between the visual and linguistic domains. The proposed method has several merits. First, through instance-level, image-level, and semantic-level contrastive learning, our model extracts discriminative features and achieves comprehensive understanding of both the image and the linguistic expressions. Second, by taking advantage of counterfactual thinking, we generate informative factual and counterfactual samples for contrastive learning, which strengthens the perception ability of our model. Third, an auxiliary contrast module is incorporated to directly optimize answer prediction in VCR, which further facilitates representation learning. Extensive experiments on the VCR dataset demonstrate that our approach performs favorably against state-of-the-art methods.
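The abstract describes contrasting factual samples against counterfactual ones but gives no formula. As a rough illustration only, the sketch below assumes a generic InfoNCE-style contrastive loss, a common choice for contrastive learning but not necessarily the objective used in this paper: an anchor embedding is pulled toward one factual (positive) sample and pushed away from counterfactual (negative) samples. The function names `cosine` and `info_nce`, the use of cosine similarity, and the default temperature of 0.1 are hypothetical choices made for this example.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(
    anchor: Sequence[float],
    factual: Sequence[float],
    counterfactuals: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Hypothetical InfoNCE-style loss (an assumption, not the paper's exact loss).

    The factual sample is treated as the single positive and every
    counterfactual sample as a negative. Returns -log softmax(positive).
    """
    # Logit 0 is the positive pair; the rest are negatives.
    logits = [cosine(anchor, factual) / temperature]
    logits += [cosine(anchor, c) / temperature for c in counterfactuals]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    # Factual sample close to the anchor: loss should be small.
    print(info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]]))
    # Factual sample orthogonal to the anchor: loss should be much larger.
    print(info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]]))
```

With `anchor = [1.0, 0.0]`, a factual sample `[0.9, 0.1]` that lies close to the anchor gives a loss near zero, while an orthogonal factual sample gives a much larger loss. The loss is never negative, because the log-sum-exp over all logits is at least the positive logit.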
Pages: 1793-1802
Number of pages: 10