Learning to Predict Sequences of Human Visual Fixations

Cited by: 41
Authors
Jiang, Ming [1 ]
Boix, Xavier [1 ,2 ,3 ]
Roig, Gemma [2 ,3 ]
Xu, Juan [1 ]
Van Gool, Luc [2 ]
Zhao, Qi [1 ]
Affiliations
[1] National University of Singapore, Department of Electrical & Computer Engineering, Singapore 117583, Singapore
[2] ETH Zurich, Computer Vision Laboratory, CH-8092 Zurich, Switzerland
[3] MIT, Istituto Italiano di Tecnologia, Center for Brains, Minds and Machines, Laboratory for Computational and Statistical Learning, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
Scanpath prediction; visual saliency prediction; saliency detection; eye movements; attention; framework; scene; video
DOI
10.1109/TNNLS.2015.2496306
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Most state-of-the-art visual attention models estimate the probability distribution of eye fixations over image locations, the so-called saliency map. Yet these models do not predict the temporal sequence of eye fixations, which may be valuable both for predicting human eye fixations more accurately and for understanding the role of different cues during visual exploration. In this paper, we present a method for predicting the sequence of human eye fixations, learned from recorded human eye-tracking data. We use least-squares policy iteration (LSPI) to learn a visual exploration policy that mimics the recorded eye-fixation examples. The model uses a different set of parameters for each stage of visual exploration, capturing the importance of the cues as the scanpath unfolds. In a series of experiments, we demonstrate the effectiveness of using LSPI to combine multiple cues at different stages of the scanpath. The learned parameters suggest that low-level and high-level (semantic) cues are similarly important at the first fixation of the scanpath, and that the contribution of high-level cues keeps increasing during visual exploration. Results show that our approach achieves state-of-the-art performance on two challenging data sets: 1) the OSIE data set and 2) the MIT data set.
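As a rough illustration of the learning machinery the abstract names (and not the authors' implementation), the sketch below shows generic least-squares policy iteration with a linear Q-function, Q(s, a) = w . phi(s, a), fit from recorded (state, fixation, reward, next state) transitions. The feature map phi, the candidate-fixation action set, the reward, and all function names are assumptions for illustration only.

    import numpy as np

    def lstdq(transitions, phi, actions, w, gamma=0.9):
        # One LSTDQ solve: fit Q for the policy that is greedy w.r.t. w,
        # i.e. solve A w' = b with A = sum f (f - gamma f')^T, b = sum r f.
        k = w.shape[0]
        A = np.zeros((k, k))
        b = np.zeros(k)
        for s, a, r, s_next in transitions:
            f = phi(s, a)
            # Greedy next fixation under the current weights.
            a_next = max(actions, key=lambda a2: phi(s_next, a2) @ w)
            A += np.outer(f, f - gamma * phi(s_next, a_next))
            b += r * f
        return np.linalg.solve(A + 1e-6 * np.eye(k), b)  # small ridge term for stability

    def lspi(transitions, phi, actions, k, gamma=0.9, tol=1e-4, max_iter=50):
        # Alternate policy evaluation (LSTDQ) and greedy improvement
        # until the weight vector stops changing.
        w = np.zeros(k)
        for _ in range(max_iter):
            w_new = lstdq(transitions, phi, actions, w, gamma)
            if np.linalg.norm(w_new - w) < tol:
                break
            w = w_new
        return w

At test time, a scanpath would be generated by repeatedly choosing the candidate location that maximizes w . phi(s, a) from the current state; the paper's stage-dependent cue weighting could be mirrored by running the solver separately for each fixation index, yielding one weight vector per stage.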
Pages: 1241-1252
Page count: 12