DeepVS: A Deep Learning Based Video Saliency Prediction Approach

Cited by: 100
Authors
Jiang, Lai [1 ]
Xu, Mai [1 ]
Liu, Tie [1 ]
Qiao, Minglang [1 ]
Wang, Zulin [1 ]
Affiliations
[1] Beihang University, Beijing, People's Republic of China
Source
Keywords
Saliency prediction; Convolutional LSTM; Eye-tracking database; Detection model
DOI
10.1007/978-3-030-01264-9_37
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel deep learning based video saliency prediction method, named DeepVS. Specifically, we establish a large-scale eye-tracking database of videos (LEDOV), which includes 32 subjects' fixations on 538 videos. We find from LEDOV that human attention is more likely to be attracted by objects, particularly moving objects or the moving parts of objects. Hence, an object-to-motion convolutional neural network (OM-CNN), composed of objectness and motion subnets, is developed to predict the intra-frame saliency for DeepVS. In OM-CNN, a cross-net mask and hierarchical feature normalization are proposed to combine the spatial features of the objectness subnet with the temporal features of the motion subnet. We further find from our database that human attention is temporally correlated, with a smooth saliency transition across video frames. We thus propose a saliency-structured convolutional long short-term memory (SS-ConvLSTM) network, which takes the features extracted by OM-CNN as its input. Consequently, the inter-frame saliency maps of a video can be generated, accounting for both the structured output with center-bias and the cross-frame transitions of human attention maps. Finally, the experimental results show that DeepVS advances the state of the art in video saliency prediction.
Pages: 625-642
Page count: 18
Related papers
50 in total
  • [1] Revisiting Video Saliency Prediction in the Deep Learning Era
    Wang, Wenguan
    Shen, Jianbing
    Xie, Jianwen
    Cheng, Ming-Ming
    Ling, Haibin
    Borji, Ali
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (01) : 220 - 237
  • [2] Deep Saliency Features for Video Saliency Prediction
    Azaza, Aymen
    Douik, Ali
    [J]. 2018 INTERNATIONAL CONFERENCE ON ADVANCED SYSTEMS AND ELECTRICAL TECHNOLOGIES (IC_ASET), 2017, : 335 - 339
  • [3] Transfer Learning with Deep Networks for Saliency Prediction in Natural Video
    Chaabouni, Souad
    Benois-Pineau, Jenny
    Ben Amar, Chokri
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 1604 - 1608
  • [4] Visual Saliency Prediction Based on Deep Learning
    Ghariba, Bashir
    Shehata, Mohamed S.
    McGuire, Peter
    [J]. INFORMATION, 2019, 10 (08)
  • [5] DeepVS: A Deep Learning Approach For RF-based Vital Signs Sensing
    Xie, Zongxing
    Wang, Hanrui
    Han, Song
    Schoenfeld, Elinor
    Ye, Fan
    [J]. 13TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, BCB 2022, 2022,
  • [6] Prediction of visual saliency in video with deep CNNs
    Chaabouni, Souad
    Benois-Pineau, Jenny
    Hadar, Ofer
    [J]. APPLICATIONS OF DIGITAL IMAGE PROCESSING XXXIX, 2016, 9971
  • [7] A Manifold Learning based Video Prediction approach for Deep Motion Transfer
    Cai, Yuliang
    Mohan, Sumit
    Niranjan, Adithya
    Jain, Nilesh
    Cloninger, Alex
    Das, Srinjoy
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 4214 - 4221
  • [8] Learn to Look Around: Deep Reinforcement Learning Agent for Video Saliency Prediction
    Tao, Yiran
    Hu, Yaosi
    Chen, Zhenzhong
    [J]. 2021 INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2021,
  • [9] DeepVS2.0: A Saliency-Structured Deep Learning Method for Predicting Dynamic Visual Attention
    Jiang, Lai
    Xu, Mai
    Wang, Zulin
    Sigal, Leonid
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2021, 129 (01) : 203 - 224
  • [10] DeepVS2.0: A Saliency-Structured Deep Learning Method for Predicting Dynamic Visual Attention
    Lai Jiang
    Mai Xu
    Zulin Wang
    Leonid Sigal
    [J]. International Journal of Computer Vision, 2021, 129 : 203 - 224