SPATIAL-TEMPORAL FEATURE AGGREGATION NETWORK FOR VIDEO OBJECT DETECTION

Authors
Chen, Zhu [1]
Li, Weihai [1]
Fei, Chi [1]
Liu, Bin [1]
Yu, Nenghai [1]
Affiliations
[1] Univ Sci & Technol China, Chinese Acad Sci, Sch Informat Sci & Technol, Key Lab Electromagnet Space Informat, Hefei, Anhui, Peoples R China
Keywords
Video Object Detection; Feature Aggregation; Pixel-Level; Instance-Level
DOI
10.1109/icassp40776.2020.9054080
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Video object detection is a challenging problem in computer vision. In this paper, we propose a novel spatial-temporal feature aggregation network to address it. Specifically, we present an instance-level feature aggregation module that complements traditional pixel-level feature aggregation; within it, a new movement estimation module learns instance movements across frames. Graph Convolutional Networks (GCNs) are then applied to capture temporal relations among instances across frames and to perform instance-level feature aggregation. Finally, we combine pixel-level and instance-level features with learnable soft weights to exploit their complementary information. Our framework is simple to implement and supports end-to-end training, and extensive experiments show that it achieves state-of-the-art performance on the ImageNet VID dataset.
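The abstract does not give the exact formulation of the graph-convolution step or of the soft-weight fusion, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes a single row-normalized graph-convolution layer over instance features and a fusion that uses two scalar weights obtained by a softmax over learnable logits. The names `gcn_aggregate`, `soft_fuse`, `adjacency`, `weight`, and `logits` are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def gcn_aggregate(inst_feats, adjacency, weight):
    """One assumed graph-convolution step over instances.

    inst_feats: (N, D) features of N instances gathered across frames.
    adjacency:  (N, N) non-negative relation strengths between instances.
    weight:     (D, D_out) projection matrix (learned in a real model).
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])      # add self-loops
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)   # row-normalize
    return np.maximum(a_norm @ inst_feats @ weight, 0)  # ReLU activation


def soft_fuse(pixel_feat, instance_feat, logits):
    """Blend two feature arrays with softmax-normalized scalar weights.

    logits: array of shape (2,), assumed to be learnable parameters.
    """
    w = softmax(logits)
    return w[0] * pixel_feat + w[1] * instance_feat


# Toy usage: three instances with no cross-instance relations, identity
# projection, then an equal-weight fusion of the two feature streams.
inst_feats = np.eye(3)
aggregated = gcn_aggregate(inst_feats, np.zeros((3, 3)), np.eye(3))
fused = soft_fuse(np.ones(4), np.zeros(4), np.array([0.0, 0.0]))
```

With equal logits the two streams contribute equally; during training the logits would be updated by gradient descent so the network can favor whichever stream is more informative.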
Pages: 1858-1862
Page count: 5