Flexible Visual Grounding

Cited by: 0

Authors:

Kim, Yongmin [1]

Chu, Chenhui [1]

Kurohashi, Sadao [1]

Affiliations:

[1] Kyoto Univ, Kyoto, Japan

Source:

PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): STUDENT RESEARCH WORKSHOP | 2022

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Existing visual grounding datasets are artificially constructed so that every query regarding an entity can be grounded to a corresponding image region, i.e., is answerable. However, in real-world multimedia data such as news articles and social media, many entities in the text cannot be grounded to the image, i.e., are unanswerable, because the text does not necessarily describe the accompanying image directly. A robust visual grounding model should be able to deal flexibly with both answerable and unanswerable visual grounding. To study this flexible visual grounding problem, we construct a pseudo dataset and a social media dataset that include both answerable and unanswerable queries. To handle unanswerable visual grounding, we propose a novel method that adds a pseudo image region corresponding to a query that cannot be grounded. The model is then trained to ground to ground-truth regions for answerable queries and to pseudo regions for unanswerable queries. In our experiments, we show that our model can flexibly process both answerable and unanswerable queries with high accuracy on our datasets.
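The pseudo-region idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that a model has already produced one matching score per candidate region plus one score for an extra pseudo region, and that grounding is a softmax choice over all of them. The function names (`ground_with_pseudo_region`, `target_index`, `cross_entropy`) are hypothetical.

```python
import math


def ground_with_pseudo_region(region_scores, pseudo_score):
    """Pick a region for a query, with one extra pseudo region appended.

    Assumption: scores are arbitrary real-valued query-region matching
    scores. A returned index equal to len(region_scores) means the query
    was grounded to the pseudo region, i.e. treated as unanswerable.
    """
    scores = list(region_scores) + [pseudo_score]
    top = max(scores)
    # Numerically stable softmax over real regions plus the pseudo region.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


def target_index(num_regions, gold_region):
    """Training target: the gold region, or the pseudo slot if there is none."""
    return num_regions if gold_region is None else gold_region


def cross_entropy(probs, target):
    """Negative log-likelihood of the target slot."""
    return -math.log(probs[target])


if __name__ == "__main__":
    idx, probs = ground_with_pseudo_region([2.0, 0.5, -1.0], pseudo_score=0.0)
    print("predicted slot:", idx)
    print("loss if unanswerable:", cross_entropy(probs, target_index(3, None)))
```

With this framing, answerable and unanswerable queries share a single classification objective: the only difference is which slot is used as the target.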

Pages: 285-299

Number of pages: 15
