OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction

Authors
Fang, Yini [1]
Yu, Jingling [1]
Zhang, Haozheng [2]
van der Lans, Ralf [1]
Shi, Bertram [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Clear Water Bay, Hong Kong, Peoples R China
[2] Univ Durham, Durham, England
Source
Keywords
VISUAL-ATTENTION
DOI
10.1007/978-3-031-73001-6_21
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual search is important in our daily life. The efficient allocation of visual attention is critical to effectively complete visual search tasks. Prior research has predominantly modelled the spatial allocation of visual attention in images at the pixel level, e.g. using a saliency map. However, emerging evidence shows that visual attention is guided by objects rather than pixel intensities. This paper introduces the Object-level Attention Transformer (OAT), which predicts the gaze scanpaths of humans as they search for a target object within a cluttered scene of distractors. OAT uses an encoder-decoder architecture. The encoder captures information about the position and appearance of the objects within an image and about the target. The decoder predicts the gaze scanpath as a sequence of object fixations by integrating output features from both the encoder and decoder. We also propose a new positional encoding that better reflects spatial relationships between objects. We evaluated OAT on the Amazon book cover dataset and a new dataset for visual search that we collected. On both established metrics and a novel behavioural-based metric, OAT's predicted gaze scanpaths align more closely with human gaze patterns than predictions by algorithms based on spatial attention. Our results demonstrate the generalization ability of OAT, as it accurately predicts human scanpaths for unseen layouts and target objects. The code is available at: https://github.com/HKUST-NISL/oat_eccv24.
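The abstract treats a scanpath as a sequence of object fixations but does not name the metrics used for comparison. As an illustration only, the sketch below compares a predicted and a human scanpath, each written as a list of object indices, using string edit distance, a measure commonly applied to scanpaths. The function names and the example indices are hypothetical, and this is not necessarily one of the metrics used in the paper.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of object indices."""
    # prev[j] holds the distance between the prefix of `a` processed
    # so far and the first j elements of `b`.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def scanpath_similarity(pred, human):
    """Similarity in [0, 1]: 1 minus the length-normalized edit distance."""
    longest = max(len(pred), len(human))
    if longest == 0:
        return 1.0  # two empty scanpaths are treated as identical
    return 1.0 - edit_distance(pred, human) / longest


# Hypothetical example: indices identify objects in the scene.
predicted = [3, 7, 2, 5]
observed = [3, 2, 5]
print(scanpath_similarity(predicted, observed))
```

Working on object indices rather than pixel coordinates means two fixations count as a match whenever they land on the same object, whatever their exact positions within it.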
Pages: 366-382
Page count: 17