SpFormer: Spatio-Temporal Modeling for Scanpaths with Transformer

Authors
Zhong, Wenqi [1]
Yu, Linzhi [1]
Xia, Chen [1]
Han, Junwei [1]
Zhang, Dingwen [1]
Affiliations
[1] Northwestern Polytechnical University, School of Automation, Xi'an, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
VISUAL WORKING-MEMORY; EYE-MOVEMENTS; PREDICTION; TASK
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Saccadic scanpaths, a data representation of human visual behavior, have received broad interest in multiple domains. A scanpath is a complex eye-tracking data modality that comprises sequences of fixation positions and fixation durations, coupled with image information. However, previous methods usually suffer from spatial misalignment of fixation features and from the loss of critical temporal information (including temporal correlation and fixation duration). In this study, we propose a Transformer-based scanpath model, SpFormer, to alleviate these problems. First, we propose a fixation-centric paradigm to extract spatially aligned fixation features and tokenize the scanpaths. Then, motivated by the visual working memory mechanism, we design a local meta attention to reduce the semantic redundancy of fixations and guide the model to focus on the meta scanpath. Finally, we progressively integrate the duration information and fuse it with the fixation features, addressing the problem that location information becomes ambiguous as the number of Transformer blocks increases. We conduct extensive experiments on four databases under three tasks. SpFormer establishes new state-of-the-art results in distinct settings, verifying its flexibility and versatility in practical applications. The code is available at https://github.com/wenqizhong/SpFormer.
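As a rough illustration of what tokenizing a scanpath around its fixations could look like, the sketch below crops one image patch centered on each fixation and appends the fixation duration to the flattened patch. This is not the authors' implementation: the function name `tokenize_scanpath`, the patch size, the use of raw pixels instead of learned features, and the simple concatenation of duration are all assumptions made for illustration. The abstract does not specify these details.

```python
import numpy as np


def tokenize_scanpath(image, fixations, durations, patch=32):
    """Hypothetical helper: one token per fixation.

    image:     (H, W, C) array
    fixations: sequence of (x, y) pixel coordinates
    durations: sequence of fixation durations (e.g. in seconds)
    Returns an array of shape (num_fixations, patch * patch * C + 1).
    """
    h, w = image.shape[:2]
    half = patch // 2
    # Reflect-pad so that a patch near the border still has the
    # fixation at its centre instead of being shifted inward.
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="reflect")

    tokens = []
    for x, y in fixations:
        # Clamp fixations that fall slightly outside the image.
        x = int(np.clip(round(x), 0, w - 1))
        y = int(np.clip(round(y), 0, h - 1))
        # In padded coordinates the fixation sits at (y + half, x + half),
        # so the patch centred on it starts at (y, x).
        crop = padded[y:y + patch, x:x + patch]
        tokens.append(crop.reshape(-1))

    tokens = np.stack(tokens).astype(np.float32)
    dur = np.asarray(durations, dtype=np.float32)[:, None]
    # Assumption: duration is simply concatenated as one extra feature.
    return np.concatenate([tokens, dur], axis=1)
```

Because every patch is centered on its fixation, the spatial content of each token is aligned with the gaze position, which is the property the abstract attributes to the fixation-centric paradigm. A real model would replace the raw pixels with features from an image encoder.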
Pages: 7605-7613
Number of pages: 9