Motion-Aware Memory Network for Fast Video Salient Object Detection

Cited by: 3
Authors
Zhao, Xing [1 ]
Liang, Haoran [1 ]
Li, Peipei [2 ]
Sun, Guodao [1 ]
Zhao, Dongdong [1 ]
Liang, Ronghua [1 ]
He, Xiaofei [1 ]
Affiliations
[1] Zhejiang Univ Technol, Coll Comp Sci & Technol, Hangzhou 310023, Peoples R China
[2] Zhejiang Univ Technol, Coll Mech Engn, Hangzhou 310023, Peoples R China
Keywords
Video salient object detection; salient object detection; memory network; feature fusion; optimization
DOI
10.1109/TIP.2023.3348659
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Previous methods based on 3D CNNs, ConvLSTM, or optical flow have achieved great success in video salient object detection (VSOD). However, these methods still suffer from high computational costs or poor quality of the generated saliency maps. To address this, we design a space-time memory (STM)-based network that employs a standard encoder-decoder architecture. During the encoding stage, we extract high-level temporal features from the current frame and its adjacent frames, which is more efficient and practical than methods reliant on optical flow. During the decoding stage, we introduce an effective fusion strategy for both spatial and temporal branches. The semantic information of the high-level features is used to refine the object details in the low-level features. Subsequently, spatiotemporal features are derived step by step to reconstruct the saliency maps. Moreover, inspired by the boundary supervision prevalent in image salient object detection (ISOD), we design a motion-aware loss that predicts object boundary motion, and simultaneously perform multitask learning for VSOD and object motion prediction. This further enhances the model's capability to accurately extract spatiotemporal features while maintaining object integrity. Extensive experiments demonstrate the effectiveness of our method, which achieves state-of-the-art metrics on several datasets. Our proposed model does not require optical flow or additional preprocessing, and reaches an inference speed of nearly 100 FPS.
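To make the multitask objective described above more concrete, the sketch below shows one plausible way to combine a saliency loss with a motion-aware loss supervised by object-boundary motion between adjacent frames. It is a minimal illustration, not the authors' implementation: the class and function names, the fixed weighting, and the derivation of the boundary-motion pseudo ground truth via a morphological gradient of the masks are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def boundary_map(mask: torch.Tensor, kernel_size: int = 3) -> torch.Tensor:
    """Approximate object boundaries of a binary mask (N, 1, H, W) using a
    morphological gradient: dilation minus erosion, both via max pooling."""
    pad = kernel_size // 2
    dilated = F.max_pool2d(mask, kernel_size, stride=1, padding=pad)
    eroded = -F.max_pool2d(-mask, kernel_size, stride=1, padding=pad)
    return (dilated - eroded).clamp(0.0, 1.0)


class MotionAwareMultitaskLoss(nn.Module):
    """Hypothetical joint objective: saliency prediction for the current frame
    plus an auxiliary boundary-motion prediction task on adjacent frames."""

    def __init__(self, motion_weight: float = 0.5):
        super().__init__()
        self.motion_weight = motion_weight  # assumed weighting, not from the paper
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, sal_logits, motion_logits, gt_cur, gt_prev):
        # Main task: binary cross-entropy on the predicted saliency map.
        sal_loss = self.bce(sal_logits, gt_cur)
        # Auxiliary task: the boundary-motion pseudo ground truth is taken here
        # as the absolute difference of the boundary maps of the current and
        # previous ground-truth masks (an illustrative assumption).
        gt_motion = (boundary_map(gt_cur) - boundary_map(gt_prev)).abs().clamp(0.0, 1.0)
        motion_loss = self.bce(motion_logits, gt_motion)
        return sal_loss + self.motion_weight * motion_loss


if __name__ == "__main__":
    loss_fn = MotionAwareMultitaskLoss(motion_weight=0.5)
    sal_logits = torch.randn(2, 1, 64, 64)     # predicted saliency (logits)
    motion_logits = torch.randn(2, 1, 64, 64)  # predicted boundary motion (logits)
    gt_cur = (torch.rand(2, 1, 64, 64) > 0.5).float()
    gt_prev = (torch.rand(2, 1, 64, 64) > 0.5).float()
    print(loss_fn(sal_logits, motion_logits, gt_cur, gt_prev))
```

In practice, the motion branch would share the encoder with the saliency branch so that the auxiliary supervision shapes the shared spatiotemporal features; the weighting between the two terms is a tunable hyperparameter.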
Pages: 709-721
Number of pages: 13