See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks

Cited by: 418
Authors
Lu, Xiankai [1 ]
Wang, Wenguan [1 ]
Ma, Chao [2 ]
Shen, Jianbing [1 ]
Shao, Ling [1 ]
Porikli, Fatih [3 ]
Affiliations
[1] Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates
[2] Shanghai Jiao Tong University, AI Institute, MoE Key Lab of Artificial Intelligence, Shanghai, China
[3] Australian National University, Canberra, ACT, Australia
Source
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) | 2019
Funding
Australian Research Council
DOI
10.1109/CVPR.2019.00374
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Code
081104; 0812; 0835; 1405
Abstract
We introduce a novel network, the CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view. We emphasize the importance of the inherent correlation among video frames and incorporate a global co-attention mechanism to further improve upon state-of-the-art deep-learning-based solutions, which primarily focus on learning discriminative foreground representations over appearance and motion in short-term temporal segments. The co-attention layers in our network provide efficient and effective stages for capturing global correlations and scene context by jointly computing and appending co-attention responses into a joint feature space. We train COSNet with pairs of video frames, which naturally augments the training data and allows increased learning capacity. During the segmentation stage, the co-attention model encodes useful information by processing multiple reference frames together, which is leveraged to better infer the frequently reappearing and salient foreground objects. We propose a unified and end-to-end trainable framework in which different co-attention variants can be derived for mining the rich context within videos. Our extensive experiments over three large benchmarks demonstrate that COSNet outperforms the current alternatives by a large margin.
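The abstract describes a co-attention layer that relates every spatial location of one frame to every location of another and appends the resulting responses to the original features. Below is a minimal PyTorch sketch of such a vanilla co-attention block between two frame feature maps; the class name `CoAttention`, the bilinear affinity parameterization, the 1x1-conv fusion step, and all shapes are illustrative assumptions rather than the authors' exact implementation.

```python
# A minimal sketch of a vanilla co-attention block between two frame feature
# maps, in the spirit of COSNet. Layer names, shapes, and the fusion step are
# assumptions for illustration, not the paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Learnable weight of the bilinear affinity S = F_a^T W F_b (assumed form).
        self.weight = nn.Linear(channels, channels, bias=False)
        # Fuse the attended summaries with the original features (assumed design).
        self.fuse_a = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.fuse_b = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        n, c, h, w = feat_a.shape
        # Flatten spatial dimensions: (N, C, H*W).
        fa = feat_a.view(n, c, h * w)
        fb = feat_b.view(n, c, h * w)
        # Affinity between every location pair of the two frames: (N, HW_a, HW_b).
        affinity = torch.bmm(self.weight(fa.transpose(1, 2)), fb)
        # Normalize in each direction to obtain the two co-attention maps.
        attn_b = F.softmax(affinity, dim=2)                    # frame B attended per A-location
        attn_a = F.softmax(affinity.transpose(1, 2), dim=2)    # frame A attended per B-location
        # Attention-weighted summaries of the other frame.
        za = torch.bmm(fb, attn_b.transpose(1, 2)).view(n, c, h, w)
        zb = torch.bmm(fa, attn_a.transpose(1, 2)).view(n, c, h, w)
        # Append the co-attention responses to the original features and fuse.
        out_a = self.fuse_a(torch.cat([za, feat_a], dim=1))
        out_b = self.fuse_b(torch.cat([zb, feat_b], dim=1))
        return out_a, out_b


if __name__ == "__main__":
    # Toy usage: two feature maps from a shared (Siamese) backbone.
    coatt = CoAttention(channels=512)
    fa, fb = torch.randn(1, 512, 30, 30), torch.randn(1, 512, 30, 30)
    oa, ob = coatt(fa, fb)
    print(oa.shape, ob.shape)  # torch.Size([1, 512, 30, 30]) each
```

Because each location of one frame can attend to every location of the other, this kind of block can pick up foreground objects that reappear across distant frames, which is the global, holistic cue the abstract emphasizes.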
Pages: 3618-3627
Page count: 10