Learning to Predict Sequences of Human Visual Fixations

Cited by: 41
Authors
Jiang, Ming [1 ]
Boix, Xavier [1 ,2 ,3 ]
Roig, Gemma [2 ,3 ]
Xu, Juan [1 ]
Van Gool, Luc [2 ]
Zhao, Qi [1 ]
Affiliations
[1] National University of Singapore, Department of Electrical & Computer Engineering, Singapore 117583, Singapore
[2] ETH Zurich, Computer Vision Laboratory, CH-8092 Zurich, Switzerland
[3] MIT, Istituto Italiano di Tecnologia, Center for Brains, Minds and Machines, Laboratory for Computational and Statistical Learning, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
Scanpath prediction; visual saliency prediction; saliency detection; eye movements; attention; framework; scene; video
DOI
10.1109/TNNLS.2015.2496306
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Most state-of-the-art visual attention models estimate the probability distribution of eye fixations over image locations, the so-called saliency map. These models, however, do not predict the temporal sequence of eye fixations, which may be valuable both for better predicting human eye fixations and for understanding the role of different cues during visual exploration. In this paper, we present a method for predicting the sequence of human eye fixations, learned from recorded human eye-tracking data. We use least-squares policy iteration (LSPI) to learn a visual exploration policy that mimics the recorded eye-fixation examples. The model uses a different set of parameters for each stage of visual exploration, capturing how the importance of the cues changes along the scanpath. In a series of experiments, we demonstrate the effectiveness of using LSPI to combine multiple cues at different stages of the scanpath. The learned parameters suggest that low-level and high-level (semantic) cues are similarly important at the first fixation of the scanpath, and that the contribution of high-level cues keeps increasing during visual exploration. Results show that our approach achieves state-of-the-art performance on two challenging data sets: 1) the OSIE data set and 2) the MIT data set.
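As a rough illustration of the approach the abstract describes, the sketch below implements generic LSPI with a linear Q-function over per-location cue features, where the policy picks the next fixation greedily. Everything here is an assumption for illustration: the cue dimensionality, the synthetic scanpath data, the reward of 1 for matching a recorded fixation, and the single stage-independent weight vector (the paper itself learns a separate parameter set per scanpath stage). This is not the authors' implementation.

```python
# Minimal LSPI sketch for learning fixation-selection weights over visual cues.
# Hypothetical setup: each candidate fixation location is described by a small
# cue vector, and the reward encourages agreement with recorded human fixations.
import numpy as np

rng = np.random.default_rng(0)

N_CUES = 3          # e.g., low-level saliency, object-level, semantic cues (assumed)
N_LOCATIONS = 64    # candidate fixation locations per image (assumed)
GAMMA = 0.9         # discount along the scanpath (assumed)

def lstdq(transitions, w, gamma=GAMMA, reg=1e-3):
    """One LSTD-Q step: fit Q(s, a) = w . phi(s, a) under the greedy policy
    induced by the current weights w. `reg` adds ridge regularization."""
    k = len(w)
    A = reg * np.eye(k)
    b = np.zeros(k)
    for phi_sa, reward, phis_next in transitions:
        # Greedy next action (fixation) under the current policy.
        phi_next = phis_next[np.argmax(phis_next @ w)]
        A += np.outer(phi_sa, phi_sa - gamma * phi_next)
        b += reward * phi_sa
    return np.linalg.solve(A, b)

def lspi(transitions, n_iters=20, tol=1e-6):
    """Iterate policy evaluation (LSTD-Q) and greedy improvement to convergence."""
    w = np.zeros(N_CUES)
    for _ in range(n_iters):
        w_new = lstdq(transitions, w)
        if np.linalg.norm(w_new - w) < tol:
            break
        w = w_new
    return w

# Synthetic "recorded scanpath" data: each transition stores the cue vector of
# the fixated location, a reward of 1 for matching the human fixation, and the
# cue vectors of all candidate next locations.
true_w = np.array([0.5, 0.3, 0.2])  # hypothetical ground-truth cue weights
transitions = []
for _ in range(500):
    cues = rng.random((N_LOCATIONS, N_CUES))
    a = np.argmax(cues @ true_w)           # location the simulated human fixates
    next_cues = rng.random((N_LOCATIONS, N_CUES))
    transitions.append((cues[a], 1.0, next_cues))

w = lspi(transitions)
print("learned cue weights (normalized):", w / w.sum())
```

To mirror the paper's stage-dependent parameters, one would keep a separate weight vector per fixation index (or per group of fixations) and run the same fit on the transitions belonging to each stage, which is what allows the relative importance of low-level and semantic cues to change along the scanpath.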
Pages: 1241 - 1252 (12 pages)
Related Papers (50 total)
  • [1] Fang, Yuming; Lei, Jianjun; Li, Jia; Xu, Long; Lin, Weisi; Le Callet, Patrick. Learning visual saliency from human fixations for stereoscopic images. NEUROCOMPUTING, 2017, 266: 284-292.
  • [2] Han, Junwei; Zhang, Dingwen; Wen, Shifeng; Guo, Lei; Liu, Tianming; Li, Xuelong. Two-Stage Learning to Predict Human Eye Fixations via SDAEs. IEEE TRANSACTIONS ON CYBERNETICS, 2016, 46(02): 487-498.
  • [3] Xia, Chen; Qi, Fei; Shi, Guangming. An Iterative Representation Learning Framework to Predict the Sequence of Eye Fixations. 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017: 1530-1535.
  • [4] Rosenberger, PB. Concurrent Schedule Control of Human Visual Target Fixations. JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR, 1973, 20(03): 411-416.
  • [5] Kuemmerer, Matthias; Bethge, Matthias. Predicting Visual Fixations. ANNUAL REVIEW OF VISION SCIENCE, 2023, 9: 269-291.
  • [6] Liu, Nian; Han, Junwei; Liu, Tianming; Li, Xuelong. Learning to Predict Eye Fixations via Multiresolution Convolutional Neural Networks. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29(02): 392-404.
  • [7] Hacisalihzade, SS; Stark, LW; Allen, JS. Visual Perception and Sequences of Eye-Movement Fixations: A Stochastic Modeling Approach. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1992, 22(03): 474-481.
  • [8] Underwood, G; Chapman, P; Brocklehurst, N; Underwood, J; Crundall, D. Visual attention while driving: sequences of eye fixations made by experienced and novice drivers. ERGONOMICS, 2003, 46(06): 629-646.
  • [9] Narayanaswamy, Manjula; Zhao, Yafan; Fung, Wai Keung; Fough, Nazila. A Low-complexity Wavelet-based Visual Saliency Model to Predict Fixations. 2020 27TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS (ICECS), 2020.
  • [10] Asher, M. F.; Gilchrist, I. D.; Tolhurst, D. J. FVDP: A visual difference model of human fixations in natural scene search. PERCEPTION, 2014, 43(01): 161-162.