STI-Net: Spatiotemporal integration network for video saliency detection

Cited by: 14
Authors
Zhou, Xiaofei [1 ]
Cao, Weipeng [2 ]
Gao, Hanxiao [1 ]
Ming, Zhong [2 ]
Zhang, Jiyong [1 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310018, Peoples R China
[2] Guangdong Lab Artificial Intelligence & Digital Ec, Shenzhen 518107, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Spatiotemporal saliency; Feature aggregation; Saliency prediction; Saliency fusion; OBJECT DETECTION; FUSION; SEGMENTATION; ATTENTION; FEATURES;
DOI
10.1016/j.ins.2023.01.106
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Image saliency detection has received much attention in recent years and has advanced significantly. In contrast, the community has paid comparatively little attention to video saliency detection. In particular, existing video saliency models are prone to failure in videos with challenging scenarios such as fast motion, dynamic backgrounds, and non-rigid deformation. Furthermore, applying image saliency models directly to video saliency detection, which ignores temporal information, is inappropriate. To alleviate these issues, this study proposes a novel end-to-end spatiotemporal integration network (STI-Net) for detecting salient objects in videos. Specifically, our method consists of three key steps: feature aggregation, saliency prediction, and saliency fusion, which are applied sequentially to generate spatiotemporal deep feature maps, coarse saliency predictions, and the final saliency map. The key advantage of our model lies in the comprehensive exploitation of spatial and temporal information throughout the network: the two kinds of features interact with each other in the feature aggregation step, are used to construct boundary cues in the saliency prediction step, and serve as the original information in the saliency fusion step. As a result, the generated spatiotemporal deep feature maps characterize the salient objects precisely and completely, and the coarse saliency predictions have well-defined boundaries, effectively improving the quality of the final saliency map. Furthermore, "shortcut connections" are introduced to make the proposed network easy to train and to obtain accurate results even when the network is deep. Extensive experiments on two publicly available and challenging video datasets demonstrate the effectiveness of the proposed model, which achieves performance comparable to state-of-the-art saliency models.
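The three-step pipeline described in the abstract (feature aggregation, saliency prediction, saliency fusion with shortcut-style reuse of the original spatial and temporal information) can be sketched roughly as follows. This is a minimal NumPy illustration of the data flow only: the element-wise interaction, sigmoid readout, and residual-average fusion are illustrative assumptions, not the paper's actual layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def aggregate_features(spatial, temporal):
    # Step 1 -- feature aggregation: let spatial and temporal features
    # interact (here via an element-wise product plus sums; the paper's
    # actual interaction mechanism is more elaborate -- assumption).
    return spatial * temporal + spatial + temporal

def predict_saliency(features):
    # Step 2 -- saliency prediction: collapse the channel axis into a
    # coarse per-pixel saliency score in (0, 1).
    return sigmoid(features.mean(axis=0))

def fuse_saliency(coarse, spatial, temporal):
    # Step 3 -- saliency fusion: refine the coarse prediction with the
    # original spatial/temporal information, in the spirit of a
    # "shortcut connection" (residual average; illustrative choice).
    shortcut = sigmoid(spatial.mean(axis=0) + temporal.mean(axis=0))
    return 0.5 * (coarse + shortcut)

# Toy features: C=8 channels over a 16 x 16 frame.
rng = np.random.default_rng(0)
spatial = rng.standard_normal((8, 16, 16))
temporal = rng.standard_normal((8, 16, 16))

feats = aggregate_features(spatial, temporal)
coarse = predict_saliency(feats)
final = fuse_saliency(coarse, spatial, temporal)
print(final.shape)  # (16, 16)
```

The final map has the same spatial resolution as the input frame, with every pixel's score bounded in (0, 1), mirroring how the coarse predictions and the fused saliency map share one spatial grid.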
Pages: 134 - 147 (14 pages)
Related Papers (50 records)
  • [1] Triplet Spatiotemporal Aggregation Network for Video Saliency Detection
    Tan, Zhenshan
    Chen, Cheng
    Gu, Xiaodong
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2393 - 2398
  • [2] A spatiotemporal model for video saliency detection
    Kalboussi, Rahma
    Abdellaoui, Mehrez
    Douik, Ali
    2016 SECOND INTERNATIONAL IMAGE PROCESSING, APPLICATIONS AND SYSTEMS (IPAS), 2016,
  • [3] Video Saliency Detection Using Spatiotemporal Cues
    Chen, Yu
    Xiao, Jing
    Hu, Liuyi
    Chen, Dan
    Wang, Zhongyuan
    Li, Dengshi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (09): 2201 - 2208
  • [4] End-to-End Video Saliency Detection via a Deep Contextual Spatiotemporal Network
    Wei, Lina
    Zhao, Shanshan
    Bourahla, Omar Farouk
    Li, Xi
    Wu, Fei
    Zhuang, Yueting
    Han, Junwei
    Xu, Mingliang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (04) : 1691 - 1702
  • [5] Multi-Scale Spatiotemporal Conv-LSTM Network for Video Saliency Detection
    Tang, Yi
    Zou, Wenbin
    Jin, Zhi
    Li, Xia
    ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 362 - 369
  • [6] VIDEO SALIENCY DETECTION BASED ON SPATIOTEMPORAL FEATURE LEARNING
    Lee, Se-Ho
    Kim, Jin-Hwan
    Choi, Kwang Pyo
    Sim, Jae-Young
    Kim, Chang-Su
    2014 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2014, : 1120 - 1124
  • [7] SPATIOTEMPORAL UTILIZATION OF DEEP FEATURES FOR VIDEO SALIENCY DETECTION
    Le, Trung-Nghia
    Sugimoto, Akihiro
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2017,
  • [8] A New Method for Spatiotemporal Textual Saliency Detection in Video
    Shan, Susu
    Xu, Hailiang
    Su, Feng
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 3240 - 3245
  • [9] Spatiotemporal Saliency Detection based Video Quality Assessment
    Jia, Changcheng
    Lu, Wen
    He, Lihuo
    He, Ran
    8TH INTERNATIONAL CONFERENCE ON INTERNET MULTIMEDIA COMPUTING AND SERVICE (ICIMCS2016), 2016, : 340 - 343
  • [10] DS-Net: Dynamic spatiotemporal network for video salient object detection
    Liu, Jing
    Wang, Jiaxiang
    Wang, Weikang
    Su, Yuting
    DIGITAL SIGNAL PROCESSING, 2022, 130