End-to-End Video Instance Segmentation via Spatial-Temporal Graph Neural Networks

Cited by: 8
Authors
Wang, Tao [1]
Xu, Ning [2]
Chen, Kean [1]
Lin, Weiyao [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Adobe Res, San Jose, CA USA
Funding
National Natural Science Foundation of China
DOI
10.1109/ICCV48922.2021.01062
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video instance segmentation is a challenging task that extends image instance segmentation to the video domain. Existing methods either rely only on single-frame information for the detection and segmentation subproblems or handle tracking as a separate post-processing step, which limits their ability to leverage and share useful spatial-temporal information across all the subproblems. In this paper, we propose a novel graph-neural-network (GNN) based method to address this limitation. Specifically, graph nodes representing instance features are used for detection and segmentation, while graph edges representing instance relations are used for tracking. Both inter-frame and intra-frame information is propagated and shared via graph updates, and all the subproblems (i.e., detection, segmentation and tracking) are jointly optimized in a unified framework. Our method improves substantially over existing methods on the YoutubeVIS validation dataset, achieving 36.5% AP with a ResNet-50 backbone while operating at 22 FPS.
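The abstract's division of labor (nodes carry instance features, edges carry instance relations used for tracking) can be illustrated with a toy message-passing step between two frames. This is only a minimal sketch under stated assumptions, not the authors' architecture: the function names (`cosine`, `softmax`, `graph_update`), the cosine-similarity edge score, the softmax-weighted aggregation, and the 0.5 mixing factor are all hypothetical choices made for illustration, and both frames are assumed to contain at least one instance.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_update(prev_nodes, cur_nodes):
    """One illustrative inter-frame graph update (hypothetical, not the paper's).

    prev_nodes, cur_nodes: lists of instance feature vectors (graph nodes).
    Returns (updated_cur_nodes, edges, matches).
    """
    # Edges: one relation score per (current instance, previous instance) pair.
    edges = [[cosine(c, p) for p in prev_nodes] for c in cur_nodes]
    updated, matches = [], []
    for c, row in zip(cur_nodes, edges):
        weights = softmax(row)
        # Message: edge-weighted average of previous-frame node features.
        msg = [sum(w * p[k] for w, p in zip(weights, prev_nodes))
               for k in range(len(c))]
        # Node update: mix the node's own feature with the incoming message.
        updated.append([0.5 * (a + b) for a, b in zip(c, msg)])
        # Tracking: link to the previous instance with the strongest edge.
        matches.append(max(range(len(row)), key=row.__getitem__))
    return updated, edges, matches


# Two instances per frame; their order is swapped between frames.
prev_nodes = [[1.0, 0.0], [0.0, 1.0]]
cur_nodes = [[0.1, 0.9], [0.9, 0.2]]
updated, edges, matches = graph_update(prev_nodes, cur_nodes)
print(matches)  # [1, 0]
```

In this toy setting the updated node features would feed the detection and segmentation heads, while the edge scores alone decide the identity links across frames. A learned model would replace the fixed similarity and mixing rules with trainable functions.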
Pages: 10777-10786
Number of pages: 10
Related papers
50 in total
  • [1] End-to-End Video Instance Segmentation with Transformers
    Wang, Yuqing
    Xu, Zhaoliang
    Wang, Xinlong
    Shen, Chunhua
    Cheng, Baoshan
    Shen, Hao
    Xia, Huaxia
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 8737 - 8746
  • [2] End-to-End Video Object Detection with Spatial-Temporal Transformers
    He, Lu
    Zhou, Qianyu
    Li, Xiangtai
    Niu, Li
    Cheng, Guangliang
    Li, Xiao
    Liu, Wenxuan
    Tong, Yunhai
    Ma, Lizhuang
    Zhang, Liqing
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1507 - 1516
  • [3] TransVOD: End-to-End Video Object Detection With Spatial-Temporal Transformers
    Zhou, Qianyu
    Li, Xiangtai
    He, Lu
    Yang, Yibo
    Cheng, Guangliang
    Tong, Yunhai
    Ma, Lizhuang
    Tao, Dacheng
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 7853 - 7869
  • [4] LEARNING-BASED END-TO-END VIDEO COMPRESSION WITH SPATIAL-TEMPORAL ADAPTATION
    Zhang, Zhaobin
    Li, Yue
    Zhang, Kai
    Zhang, Li
    He, Yuwen
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2821 - 2825
  • [5] Building an End-to-End Spatial-Temporal Convolutional Network for Video Super-Resolution
    Guo, Jun
    Chao, Hongyang
    [J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4053 - 4060
  • [6] Spatial-Temporal Graph Boosting Networks: Enhancing Spatial-Temporal Graph Neural Networks via Gradient Boosting
    Fan, Yujie
    Yeh, Chin-Chia Michael
    Chen, Huiyuan
    Zheng, Yan
    Wang, Liang
    Wang, Junpeng
    Dai, Xin
    Zhuang, Zhongfang
    Zhang, Wei
    [J]. PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 504 - 513
  • [7] End-to-End Video Gaze Estimation via Capturing Head-Face-Eye Spatial-Temporal Interaction Context
    Guan, Yiran
    Chen, Zhuoguang
    Zeng, Wenzheng
    Cao, Zhiguo
    Xiao, Yang
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1687 - 1691
  • [8] Study and Generalization on an End-to-End Spatial-temporal Driving Model
    Yao, Tingting
    Chen, Xin
    Yuan, Sheng
    Wang, Huaying
    Guo, Lili
    Tian, Bin
    Ai, Yunfeng
    [J]. 2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019, : 4755 - 4760
  • [9] Spatial-temporal transformer for end-to-end sign language recognition
    Cui, Zhenchao
    Zhang, Wenbo
    Li, Zhaoxin
    Wang, Zhaoqi
    [J]. COMPLEX & INTELLIGENT SYSTEMS, 2023, 9 (04) : 4645 - 4656
  • [10] Leukocyte Segmentation via End-to-End Learning of Deep Convolutional Neural Networks
    Lu, Yan
    Fan, Haoyi
    Li, Zuoyong
    [J]. INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING: VISUAL DATA ENGINEERING, PT I, 2019, 11935 : 191 - 200