Enhancing human-like multimodal reasoning: a new challenging dataset and comprehensive framework

Times cited: 0

Authors:

Wei, Jingxuan [1,3]

Tan, Cheng [2]

Gao, Zhangyang [2]

Sun, Linzhuang [1,3]

Li, Siyuan [2]

Yu, Bihui [1,3]

Guo, Ruifeng [1,3]

Li, Stan Z. [2]

Affiliations:

[1] Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Liaoning, China

[2] AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China

[3] University of Chinese Academy of Sciences, Liaoning, China

Source:

Neural Computing and Applications | 2024 / Vol. 36 / No. 33

Keywords:

Contrastive Learning

DOI:

10.1007/s00521-024-10310-2

Abstract:

Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence, especially when tackling complex tasks. Although the chain-of-thought (CoT) technique has gained considerable attention, the existing ScienceQA dataset, which focuses primarily on multimodal scientific questions and explanations from elementary and high school textbooks, is limited in its ability to provide a comprehensive evaluation across a broader spectrum of open-domain questions. To address this gap, we introduce the COCO Multi-Modal Reasoning (COCO-MMR) dataset, a comprehensive collection of open-ended questions, rationales, and answers derived from the COCO dataset. Unlike previous datasets that rely on multiple-choice questions, our dataset uses open-ended questions to more effectively challenge and assess the reasoning capabilities of CoT models. Through comprehensive evaluations and detailed analyses, we demonstrate that our multi-hop cross-modal attention and sentence-level contrastive learning modules, designed to simulate human thought processes, significantly enhance model comprehension abilities. Experiments validate the proposed dataset and techniques and show their potential to advance multimodal reasoning. The data and code are available at https://github.com/weijingxuan/COCO-MMR. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.

Pages: 20849-20861

Page count: 12
