Enhancing human-like multimodal reasoning: a new challenging dataset and comprehensive framework

Cited by: 0
Authors
Wei, Jingxuan [1, 3]
Tan, Cheng [2]
Gao, Zhangyang [2]
Sun, Linzhuang [1, 3]
Li, Siyuan [2]
Yu, Bihui [1, 3]
Guo, Ruifeng [1, 3]
Li, Stan Z. [2]
Affiliations
[1] Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Liaoning, China
[2] AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China
[3] University of Chinese Academy of Sciences, Liaoning, China
Keywords
Contrastive Learning
DOI
10.1007/s00521-024-10310-2
Abstract
Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence, especially when tackling complex tasks. While the chain-of-thought (CoT) technique has gained considerable attention, the existing ScienceQA dataset, primarily focused on multimodal scientific questions and explanations from elementary and high school textbooks, is limited in evaluating models across a broader spectrum of open-domain questions. To address this gap, we introduce the COCO Multi-Modal Reasoning (COCO-MMR) dataset, a comprehensive collection of open-ended questions, rationales, and answers derived from the COCO dataset. Unlike previous datasets that rely on multiple-choice questions, our dataset uses open-ended questions to more effectively challenge and assess CoT models’ reasoning capabilities. Through comprehensive evaluations and detailed analyses, we demonstrate that our multihop cross-modal attention and sentence-level contrastive learning modules, designed to simulate human thought processes, significantly enhance model comprehension abilities. Experiments validate the proposed dataset and techniques and show their potential to advance multimodal reasoning. The data and code are available at https://github.com/weijingxuan/COCO-MMR.
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
Pages: 20849-20861
Page count: 12