See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks

Cited by: 418
Authors
Lu, Xiankai [1 ]
Wang, Wenguan [1 ]
Ma, Chao [2 ]
Shen, Jianbing [1 ]
Shao, Ling [1 ]
Porikli, Fatih [3 ]
Affiliations
[1] Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates
[2] Shanghai Jiao Tong University, AI Institute, MoE Key Lab of Artificial Intelligence, Shanghai, China
[3] Australian National University, Canberra, ACT, Australia
Source
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) | 2019
Funding
Australian Research Council
DOI
10.1109/CVPR.2019.00374
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Code
081104; 0812; 0835; 1405
Abstract
We introduce a novel network, the CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view. We emphasize the importance of the inherent correlation among video frames and incorporate a global co-attention mechanism to further improve upon state-of-the-art deep-learning-based solutions, which primarily focus on learning discriminative foreground representations over appearance and motion in short-term temporal segments. The co-attention layers in our network provide efficient and effective stages for capturing global correlations and scene context by jointly computing and appending co-attention responses into a joint feature space. We train COSNet with pairs of video frames, which naturally augments the training data and allows increased learning capacity. During the segmentation stage, the co-attention model encodes useful information by processing multiple reference frames together, which is leveraged to better infer the frequently reappearing and salient foreground objects. We propose a unified and end-to-end trainable framework in which different co-attention variants can be derived for mining the rich context within videos. Our extensive experiments over three large benchmarks demonstrate that COSNet outperforms the current alternatives by a large margin.
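The abstract describes a co-attention layer that relates every spatial location of one frame to every location of another and appends the resulting responses to the original features. Below is a minimal PyTorch sketch of such a vanilla co-attention block between two frame feature maps; the class name `CoAttention`, the bilinear affinity parameterization, the 1x1-conv fusion step, and all shapes are illustrative assumptions rather than the authors' exact implementation.

```python
# A minimal sketch of a vanilla co-attention block between two frame feature
# maps, in the spirit of COSNet. Layer names, shapes, and the fusion step are
# assumptions for illustration, not the paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Learnable weight of the bilinear affinity S = F_a^T W F_b (assumed form).
        self.weight = nn.Linear(channels, channels, bias=False)
        # Fuse the attended summaries with the original features (assumed design).
        self.fuse_a = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.fuse_b = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        n, c, h, w = feat_a.shape
        # Flatten spatial dimensions: (N, C, H*W).
        fa = feat_a.view(n, c, h * w)
        fb = feat_b.view(n, c, h * w)
        # Affinity between every location pair of the two frames: (N, HW_a, HW_b).
        affinity = torch.bmm(self.weight(fa.transpose(1, 2)), fb)
        # Normalize in each direction to obtain the two co-attention maps.
        attn_b = F.softmax(affinity, dim=2)                    # frame B attended per A-location
        attn_a = F.softmax(affinity.transpose(1, 2), dim=2)    # frame A attended per B-location
        # Attention-weighted summaries of the other frame.
        za = torch.bmm(fb, attn_b.transpose(1, 2)).view(n, c, h, w)
        zb = torch.bmm(fa, attn_a.transpose(1, 2)).view(n, c, h, w)
        # Append the co-attention responses to the original features and fuse.
        out_a = self.fuse_a(torch.cat([za, feat_a], dim=1))
        out_b = self.fuse_b(torch.cat([zb, feat_b], dim=1))
        return out_a, out_b


if __name__ == "__main__":
    # Toy usage: two feature maps from a shared (Siamese) backbone.
    coatt = CoAttention(channels=512)
    fa, fb = torch.randn(1, 512, 30, 30), torch.randn(1, 512, 30, 30)
    oa, ob = coatt(fa, fb)
    print(oa.shape, ob.shape)  # torch.Size([1, 512, 30, 30]) each
```

Because each location of one frame can attend to every location of the other, this kind of block can pick up foreground objects that reappear across distant frames, which is the global, holistic cue the abstract emphasizes.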
Pages: 3618-3627
Page count: 10