Automating Gaze Target Annotation in Human-Robot Interaction

Cited by: 0
Authors
Cheng, Linlin [1 ]
Hindriks, Koen V. [1 ]
Belopolsky, Artem V. [2 ]
Affiliations
[1] Vrije Univ Amsterdam, Fac Sci, Comp Sci, Amsterdam, Netherlands
[2] Vrije Univ Amsterdam, Dept Human Movement Sci, Amsterdam, Netherlands
Source
2024 33rd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2024), 2024
DOI
10.1109/RO-MAN60168.2024.10731455
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Identifying gaze targets in videos of human-robot interaction is useful for measuring engagement. In practice, this requires manually annotating which object, from a fixed set, a participant is looking at throughout a video, which is very time-consuming. To address this issue, we propose an annotation pipeline that automates this effort. In this work, we focus on videos in which the objects being looked at do not move. As input for the proposed pipeline, we therefore only need to annotate object bounding boxes in the first frame of each video. An additional benefit of manually annotating these frames is that we can also draw bounding boxes for objects that lie outside the frame, which enables estimating gaze targets in videos where not all objects are visible. A second issue that we address is that the models used to automate the pipeline annotate individual video frames, whereas in practice manual annotation is done at the event level for video segments rather than for single frames. We therefore also introduce and investigate several variants of algorithms for aggregating frame-level annotations into event-level annotations, which form the last step of our annotation pipeline. We compare two versions of our pipeline: one that uses a state-of-the-art gaze estimation model (GEM) and one that uses a state-of-the-art target detection model (TDM). Our results show that both versions successfully automate the annotation, but the GEM pipeline performs slightly (approximately 10%) better on videos where not all objects are visible. Analysis of our aggregation algorithm furthermore shows that manual video segmentation is unnecessary, because segmentation with a fixed time interval yields very similar results. We conclude that the proposed pipeline can be used to automate almost all of the annotation effort.
Pages: 991 - 998
Number of pages: 8
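
The abstract describes the final pipeline step as aggregating frame-level gaze-target labels into event-level annotations, and reports that segmenting with a fixed time interval works about as well as manual segmentation. The sketch below only illustrates that general idea under assumed details; the function name aggregate_to_events, the parameter interval_s, and the majority-vote rule are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter

def aggregate_to_events(frame_labels, fps, interval_s=1.0):
    """Aggregate per-frame gaze-target labels into event-level annotations.

    frame_labels : one target label per video frame (e.g. "robot", "tablet").
    fps          : video frame rate.
    interval_s   : fixed segment length in seconds (the paper reports that a
                   fixed interval performs on par with manual segmentation).

    Returns a list of (start_time_s, end_time_s, majority_label) tuples.
    """
    window = max(1, int(round(fps * interval_s)))
    events = []
    for start in range(0, len(frame_labels), window):
        segment = frame_labels[start:start + window]
        label = Counter(segment).most_common(1)[0][0]  # majority vote per segment
        events.append((start / fps, (start + len(segment)) / fps, label))
    return events


# Example: 2 s of 5-fps frame labels aggregated into two 1 s events.
if __name__ == "__main__":
    labels = ["robot"] * 4 + ["tablet"] * 4 + ["robot"] * 2
    print(aggregate_to_events(labels, fps=5, interval_s=1.0))
```

Adjacent segments that receive the same majority label could then be merged into single events; the abstract does not specify which of the several aggregation variants the paper evaluates, so this is one plausible reading.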