OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction

Cited by: 0
Authors:
Fang, Yini [1 ]
Yu, Jingling [1 ]
Zhang, Haozheng [2 ]
van der Lans, Ralf [1 ]
Shi, Bertram [1 ]
Institutions:
[1] Hong Kong Univ Sci & Technol, Clear Water Bay, Hong Kong, Peoples R China
[2] Univ Durham, Durham, England
Keywords: VISUAL-ATTENTION
DOI: 10.1007/978-3-031-73001-6_21
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract:
Visual search is important in our daily lives. The efficient allocation of visual attention is critical to completing visual search tasks effectively. Prior research has predominantly modelled the spatial allocation of visual attention in images at the pixel level, e.g. using a saliency map. However, emerging evidence shows that visual attention is guided by objects rather than pixel intensities. This paper introduces the Object-level Attention Transformer (OAT), which predicts human scanpaths as they search for a target object within a cluttered scene of distractors. OAT uses an encoder-decoder architecture. The encoder captures information about the positions and appearances of the objects within an image, as well as about the target. The decoder predicts the gaze scanpath as a sequence of object fixations by integrating output features from both the encoder and decoder. We also propose a new positional encoding that better reflects spatial relationships between objects. We evaluated OAT on the Amazon book cover dataset and on a new visual search dataset that we collected. OAT's predicted gaze scanpaths align more closely with human gaze patterns than those predicted by algorithms based on spatial attention, on both established metrics and a novel behaviour-based metric. Our results demonstrate the generalization ability of OAT: it accurately predicts human scanpaths for unseen layouts and target objects. The code is available at: https://github.com/HKUST-NISL/oat_eccv24.
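The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: encoding object positions (here with a hypothetical 2-D sinusoidal scheme standing in for the paper's proposed positional encoding) and predicting a scanpath as a sequence of object-level fixations (here a greedy target-similarity stand-in for OAT's transformer decoder). All function names and parameters are illustrative assumptions, not the authors' code.

```python
import numpy as np

def object_positional_encoding(centers, d_model=16):
    """Hypothetical 2-D sinusoidal encoding of object centre coordinates.
    The paper proposes an encoding reflecting spatial relationships between
    objects; this sinusoidal variant is only an illustration of the idea."""
    centers = np.asarray(centers, dtype=float)  # shape (n_objects, 2)
    n = centers.shape[0]
    pe = np.zeros((n, d_model))
    half = d_model // 2                          # first half: x, second half: y
    freqs = 1.0 / (100.0 ** (np.arange(half // 2) / (half // 2)))
    for axis in range(2):
        pos = centers[:, axis:axis + 1] * freqs  # (n, half // 2)
        pe[:, axis * half:axis * half + half:2] = np.sin(pos)      # even dims
        pe[:, axis * half + 1:axis * half + half:2] = np.cos(pos)  # odd dims
    return pe

def predict_scanpath(obj_feats, target_feat, max_len=4):
    """Greedy object-level scanpath: at each step, fixate the unvisited object
    whose appearance features are most similar to the target's, stopping once
    the target itself is fixated. A toy stand-in for OAT's decoder."""
    feats = np.asarray(obj_feats, dtype=float)   # (n_objects, d)
    target = np.asarray(target_feat, dtype=float)
    scores = feats @ target                      # similarity to the target
    path, visited = [], set()
    for _ in range(min(max_len, len(feats))):
        masked = np.where([i in visited for i in range(len(feats))],
                          -np.inf, scores)       # never refixate an object
        nxt = int(np.argmax(masked))
        path.append(nxt)
        visited.add(nxt)
        if np.allclose(feats[nxt], target):      # target found: search ends
            break
    return path

# Toy scene: four objects; the target matches object 2 exactly, while
# object 0 is a visually similar distractor that attracts the first fixation.
feats = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.2, 0.8]]
print(predict_scanpath(feats, [0.9, 0.1]))  # → [0, 2]
```

The greedy rule above ignores the positional encoding when scoring; OAT instead fuses position and appearance inside a transformer, which lets spatial layout (e.g. fixating nearby objects first) shape the predicted sequence.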
Pages: 366-382 (17 pages)