STI-Net: Spatiotemporal integration network for video saliency detection

Cited by: 14
Authors
Zhou, Xiaofei [1 ]
Cao, Weipeng [2 ]
Gao, Hanxiao [1 ]
Ming, Zhong [2 ]
Zhang, Jiyong [1 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310018, Peoples R China
[2] Guangdong Lab Artificial Intelligence & Digital Ec, Shenzhen 518107, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Spatiotemporal saliency; Feature aggregation; Saliency prediction; Saliency fusion; OBJECT DETECTION; FUSION; SEGMENTATION; ATTENTION; FEATURES;
DOI
10.1016/j.ins.2023.01.106
CLC classification number
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
Image saliency detection, to which much effort has been devoted in recent years, has advanced significantly. In contrast, the community has paid little attention to video saliency detection. In particular, existing video saliency models often fail on videos with challenging scenarios such as fast motion, dynamic backgrounds, and nonrigid deformation. Moreover, directly applying image saliency models, which ignore temporal information, to video saliency detection is inappropriate. To alleviate these issues, this study proposes a novel end-to-end spatiotemporal integration network (STI-Net) for detecting salient objects in videos. Specifically, our method consists of three key steps: feature aggregation, saliency prediction, and saliency fusion, which are applied sequentially to generate spatiotemporal deep feature maps, coarse saliency predictions, and the final saliency map. The key advantage of our model lies in the comprehensive exploration of spatial and temporal information across the entire network: the two features interact with each other in the feature aggregation step, are used to construct boundary cues in the saliency prediction step, and also serve as the original information in the saliency fusion step. As a result, the generated spatiotemporal deep feature maps can precisely and completely characterize the salient objects, and the coarse saliency predictions have well-defined boundaries, effectively improving the quality of the final saliency map. Furthermore, "shortcut connections" are introduced into our model to make the proposed network easy to train and to obtain accurate results even when the network is deep. Extensive experimental results on two publicly available, challenging video datasets demonstrate the effectiveness of the proposed model, which achieves performance comparable to state-of-the-art saliency models.
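The three-step flow described in the abstract (feature aggregation with spatial–temporal interaction, boundary-aware coarse prediction, and fusion with shortcut connections) can be sketched at a very high level as below. This is a toy illustration only: the function names, the elementwise interaction, the gradient-based boundary cue, and the fixed fusion weights are all assumptions for exposition, not the paper's actual network layers.

```python
import numpy as np

def feature_aggregation(spatial, temporal):
    """Toy stand-in for feature aggregation: let spatial and temporal
    features interact (here, an elementwise product) and stack them
    into spatiotemporal feature maps. Illustrative only."""
    interaction = spatial * temporal  # assumed form of the interaction
    return np.stack([spatial, temporal, interaction], axis=0)

def saliency_prediction(st_features, spatial, temporal):
    """Toy stand-in for saliency prediction: a coarse map from the
    aggregated features plus a boundary cue built from the spatial and
    temporal features (here, a simple gradient magnitude)."""
    coarse = st_features.mean(axis=0)
    gy, gx = np.gradient(spatial + temporal)
    boundary = np.hypot(gy, gx)  # assumed boundary cue
    return coarse + boundary

def saliency_fusion(coarse, spatial, temporal):
    """Toy stand-in for saliency fusion: refine the coarse prediction
    while reusing the original spatial/temporal information, with a
    shortcut (residual) connection adding the input back in."""
    refined = 0.5 * coarse + 0.25 * spatial + 0.25 * temporal
    fused = refined + coarse                # shortcut connection
    return 1.0 / (1.0 + np.exp(-fused))    # sigmoid-like output head

# End-to-end toy run on random "feature maps" for a single frame.
rng = np.random.default_rng(0)
spatial = rng.random((8, 8))
temporal = rng.random((8, 8))
st = feature_aggregation(spatial, temporal)
coarse = saliency_prediction(st, spatial, temporal)
saliency = saliency_fusion(coarse, spatial, temporal)
print(saliency.shape)  # (8, 8) saliency map with values in (0, 1)
```

The shortcut connection here mirrors the abstract's point that residual-style additions keep a deep network trainable: the fusion step does not have to relearn the coarse prediction, only a refinement of it.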
Pages: 134-147
Number of pages: 14
Related papers
50 records
  • [31] Video smoke detection based on deep saliency network
    Xu, Gao
    Zhang, Yongming
    Zhang, Qixing
    Lin, Gaohua
    Wang, Zhong
    Jia, Yang
    Wang, Jinjun
    FIRE SAFETY JOURNAL, 2019, 105 : 277 - 285
  • [32] Multi-Scale Spatiotemporal Feature Fusion Network for Video Saliency Prediction
    Zhang, Yunzuo
    Zhang, Tian
    Wu, Cunyu
    Tao, Ran
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 4183 - 4193
  • [33] TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection
    Min, Kyle
    Corso, Jason J.
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 2394 - 2403
  • [34] Video saliency detection via bagging-based prediction and spatiotemporal propagation
    Zhou, Xiaofei
    Liu, Zhi
    Li, Kai
    Sun, Guangling
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2018, 51 : 131 - 143
  • [35] Video Saliency Detection via Graph Clustering With Motion Energy and Spatiotemporal Objectness
    Xu, Mingzhu
    Liu, Bing
    Fu, Ping
    Li, Junbao
    Hu, Yu Hen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (11) : 2790 - 2805
  • [36] Compressed domain video saliency detection using global and local spatiotemporal features
    Lee, Se-Ho
    Kang, Je-Won
    Kim, Chang-Su
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2016, 35 : 169 - 183
  • [37] Spatiotemporal Saliency Detection Based on Maximum Consistency Superpixels Merging for Video Analysis
    Zhang, Jianhua
    Chen, Jingbo
    Wang, Qichao
    Chen, Shengyong
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2020, 16 (01) : 606 - 614
  • [38] Stereoscopic video saliency detection based on spatiotemporal correlation and depth confidence optimization
    Zhang, Ping
    Liu, Jingwen
    Wang, Xiaoyang
    Pu, Tian
    Fei, Chun
    Guo, Zhengkui
    NEUROCOMPUTING, 2020, 377 : 256 - 268
  • [39] A local spatiotemporal optimization framework for video saliency detection using region covariance
    Tian C.
    Jiang Q.
    Wu Z.
    Liu T.
    Hu L.
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2016, 38 (07): : 1586 - 1593
  • [40] Novelty-based Spatiotemporal Saliency Detection for Prediction of Gaze in Egocentric Video
    Polatsek, Patrik
    Benesova, Wanda
    Paletta, Lucas
    Perko, Roland
    IEEE SIGNAL PROCESSING LETTERS, 2016, 23 (03) : 394 - 398