DeepVS: A Deep Learning Based Video Saliency Prediction Approach

Cited by: 100
Authors
Jiang, Lai [1]; Xu, Mai [1]; Liu, Tie [1]; Qiao, Minglang [1]; Wang, Zulin [1]
Affiliation
[1] Beihang Univ, Beijing, Peoples R China
Keywords
Saliency prediction; Convolutional LSTM; Eye-tracking database; DETECTION MODEL
DOI
10.1007/978-3-030-01264-9_37
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel deep-learning-based video saliency prediction method, named DeepVS. Specifically, we establish a large-scale eye-tracking database of videos (LEDOV), which includes 32 subjects' fixations on 538 videos. We find from LEDOV that human attention is more likely to be attracted by objects, particularly moving objects or the moving parts of objects. Hence, an object-to-motion convolutional neural network (OM-CNN), composed of an objectness subnet and a motion subnet, is developed to predict the intra-frame saliency for DeepVS. In OM-CNN, a cross-net mask and hierarchical feature normalization are proposed to combine the spatial features of the objectness subnet with the temporal features of the motion subnet. We further find from our database that human attention is temporally correlated, with smooth saliency transitions across video frames. We thus propose a saliency-structured convolutional long short-term memory (SS-ConvLSTM) network, which takes the features extracted by OM-CNN as input. Consequently, the inter-frame saliency maps of a video can be generated, accounting for both the center-biased structured output and the cross-frame transitions of human attention maps. Finally, the experimental results show that DeepVS advances the state-of-the-art in video saliency prediction.
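As a rough illustration of the recurrent stage described in the abstract, the sketch below shows how per-frame spatial features can flow through a convolutional LSTM so that each frame's saliency map depends on preceding frames. This is a hypothetical simplification, not the authors' SS-ConvLSTM: it uses 1x1 kernels (so each "convolution" reduces to a per-pixel linear map), random weights, and random feature maps standing in for OM-CNN outputs; the saliency-structured output and center-bias mechanisms of the paper are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ConvLSTMCell:
    """Toy ConvLSTM cell with 1x1 kernels (per-pixel LSTM).

    A hypothetical simplification of the SS-ConvLSTM idea: frame
    features pass through gated recurrence, so the hidden state
    carries attention information smoothly across frames.
    """
    def __init__(self, channels, rng):
        # With 1x1 kernels, each gate is a linear map over channels
        # applied independently at every spatial location.
        self.w = {g: rng.standard_normal((2 * channels, channels)) * 0.1
                  for g in ("i", "f", "o", "c")}

    def step(self, x, h, c):
        # x, h, c: (H, W, C) feature maps for one frame
        z = np.concatenate([x, h], axis=-1)   # (H, W, 2C)
        i = sigmoid(z @ self.w["i"])          # input gate
        f = sigmoid(z @ self.w["f"])          # forget gate
        o = sigmoid(z @ self.w["o"])          # output gate
        g = np.tanh(z @ self.w["c"])          # candidate cell state
        c = f * c + i * g                     # temporal smoothing across frames
        h = o * np.tanh(c)
        return h, c

rng = np.random.default_rng(0)
H, W, C = 8, 8, 4
cell = ConvLSTMCell(channels=C, rng=rng)
h = np.zeros((H, W, C))
c = np.zeros((H, W, C))
for _ in range(5):                        # five "frames" of stand-in features
    x = rng.standard_normal((H, W, C))    # would be OM-CNN features in DeepVS
    h, c = cell.step(x, h, c)
saliency = sigmoid(h.mean(axis=-1))       # collapse channels to a saliency map
print(saliency.shape)                     # one (H, W) map per frame
```

Because the cell state `c` is only partially overwritten at each step, consecutive saliency maps change gradually, mirroring the smooth cross-frame attention transitions reported in the paper.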
Pages: 625 - 642 (18 pages)