Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene

Cited by: 5
Authors
Wu, Qi [1 ]
Wu, Cheng-Ju [1 ]
Zhu, Yixin [1 ]
Joo, Jungseock [1 ]
Affiliations
[1] UCLA, Los Angeles, CA 90024 USA
Funding
National Science Foundation (US)
Keywords
RECOGNITION; ROBOT;
DOI
10.1109/IROS51168.2021.9636208
Chinese Library Classification (CLC)
TP [Automation and computer technology]
Discipline Code
0812
Abstract
Human-robot collaboration is an essential research topic in artificial intelligence (AI), enabling researchers to devise cognitive AI systems and affording users an intuitive means of interacting with robots. Of note, communication plays a central role. To date, prior studies of embodied agent navigation have demonstrated communication only through natural-language instructions. Nevertheless, many other forms of communication remain unexplored. In fact, human communication originated in gestures and is often delivered through multimodal cues, e.g., "go there" accompanied by a pointing gesture. To bridge this gap and fill in the missing dimension of communication in embodied agent navigation, we propose investigating the effects of using gestures, rather than verbal cues, as the communicative interface. Specifically, we develop a VR-based 3D simulation environment, named Gesture-based THOR (Ges-THOR), built on the AI2-THOR platform. In this virtual environment, a human player is placed in the same virtual scene and shepherds the artificial agent using only gestures. The agent is tasked with solving the navigation problem guided by natural gestures with unknown semantics; we do not use any predefined gestures, owing to the diversity and versatility of human gestures. We argue that learning the semantics of natural gestures and learning the navigation task are mutually beneficial: learn to communicate and communicate to learn. In a series of experiments, we demonstrate that human gesture cues, even without predefined semantics, improve object-goal navigation for an embodied agent, outperforming various state-of-the-art methods.
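The abstract describes an agent that must learn the meaning of free-form human gestures jointly with an object-goal navigation policy. Below is a minimal, hypothetical Python (PyTorch) sketch of how such a gesture-conditioned policy could be structured: an actor-critic that fuses an egocentric RGB frame with a sequence of gesture keypoints. All module names, input shapes, and dimensions here are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn

class GestureNavPolicy(nn.Module):
    # Hypothetical actor-critic that fuses an egocentric RGB frame with a
    # gesture given as T frames of 3D skeleton keypoints; the gesture's
    # semantics are learned end-to-end rather than predefined.
    def __init__(self, num_actions=6, keypoint_dim=51, hidden=256):
        super().__init__()
        self.visual = nn.Sequential(          # encodes a 3x128x128 frame
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(hidden), nn.ReLU(),
        )
        self.gesture = nn.GRU(keypoint_dim, hidden, batch_first=True)
        self.policy = nn.Linear(2 * hidden, num_actions)  # action logits
        self.value = nn.Linear(2 * hidden, 1)             # state value

    def forward(self, rgb, keypoints):
        v = self.visual(rgb)                  # (B, hidden)
        _, h = self.gesture(keypoints)        # final hidden state (1, B, hidden)
        fused = torch.cat([v, h.squeeze(0)], dim=-1)
        return self.policy(fused), self.value(fused)

# Usage: a batch of 2 agents, 30-frame gestures of 17 keypoints x 3 coords.
net = GestureNavPolicy()
logits, value = net(torch.randn(2, 3, 128, 128), torch.randn(2, 30, 51))
action = torch.distributions.Categorical(logits=logits).sample()

Concatenating the two encodings keeps the sketch simple; the paper's actual fusion scheme, reward design, and training regime are described in the full text.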
Pages: 4095-4102
Page count: 8
Related Papers
5 items in total
  • [1] Scene Graph Contrastive Learning for Embodied Navigation
    Singh, Kunal Pratap
    Salvador, Jordi
    Weihs, Luca
    Kembhavi, Aniruddha
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 10850-10860
  • [2] Learning to generate pointing gestures in situated embodied conversational agents
    Deichler, Anna
    Wang, Siyang
    Alexanderson, Simon
    Beskow, Jonas
FRONTIERS IN ROBOTICS AND AI, 2023, 10
  • [3] Learning communicative actions of conflicting human agents
    Galitsky, Boris A.
    Kuznetsov, Sergei O.
JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2008, 20(4): 277-317
  • [4] Learning Cross Dimension Scene Representation for Interactive Navigation Agents in Obstacle-Cluttered Environments
    Sang, Hongrui
    Jiang, Rong
    Li, Xin
    Wang, Zhipeng
    Zhou, Yanmin
    He, Bin
IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9(7): 6264-6271