See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks

Times cited: 418

Authors:

Lu, Xiankai [1]

Wang, Wenguan [1]

Ma, Chao [2]

Shen, Jianbing [1]

Shao, Ling [1]

Porikli, Fatih [3]

Affiliations:

[1] Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates

[2] Shanghai Jiao Tong University, AI Institute, MoE Key Lab of Artificial Intelligence, Shanghai, People's Republic of China

[3] Australian National University, Canberra, ACT, Australia

Source:

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019

Funding:

Australian Research Council

DOI:

10.1109/CVPR.2019.00374

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We introduce a novel network, called CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view. We emphasize the importance of the inherent correlation among video frames and incorporate a global co-attention mechanism to further improve state-of-the-art deep-learning-based solutions, which primarily focus on learning discriminative foreground representations over appearance and motion in short-term temporal segments. The co-attention layers in our network provide efficient and effective stages for capturing global correlations and scene context by jointly computing co-attention responses and appending them to a joint feature space. We train COSNet with pairs of video frames, which naturally augments the training data and allows increased learning capacity. During the segmentation stage, the co-attention model encodes useful information by processing multiple reference frames together, which it leverages to better infer the frequently reappearing, salient foreground objects. We propose a unified, end-to-end trainable framework from which different co-attention variants can be derived for mining the rich context within videos. Our extensive experiments on three large benchmarks show that COSNet outperforms current alternatives by a large margin.
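The abstract describes the co-attention layer only at a high level. The following pure-Python sketch illustrates one plausible reading of it under stated assumptions; it is not the authors' implementation. It assumes a bilinear affinity S = V_b^T W V_a between the flattened C x N feature maps of two frames, a column-wise softmax to turn affinities into attention weights, and, for the multi-reference inference step, a simple average of the responses from several reference frames before they are appended (channel-wise) to the target frame's own features. The function names `co_attention` and `segment_features` and the averaging rule are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # row-major: Matrix[row][col]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_columns(s: Matrix) -> Matrix:
    """Softmax over each column, so every column sums to 1."""
    rows, cols = len(s), len(s[0])
    out = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        m = max(s[i][j] for i in range(rows))  # subtract max for numerical stability
        exps = [math.exp(s[i][j] - m) for i in range(rows)]
        total = sum(exps)
        for i in range(rows):
            out[i][j] = exps[i] / total
    return out


def co_attention(va: Matrix, vb: Matrix, w: Matrix) -> Tuple[Matrix, Matrix]:
    """Co-attention between two frames (assumed bilinear form).

    va, vb: C x N feature maps of frames a and b (N = W*H flattened positions).
    w:      C x C weight of the assumed bilinear affinity.
    Returns (za, zb): for every position of one frame, an attention-weighted
    summary of the other frame's features (C x N_a and C x N_b).
    """
    s = matmul(matmul(transpose(vb), w), va)        # N_b x N_a, s[i][j] = vb_i^T W va_j
    za = matmul(vb, softmax_columns(s))             # C x N_a: frame b summarised for each a position
    zb = matmul(va, softmax_columns(transpose(s)))  # C x N_b: frame a summarised for each b position
    return za, zb


def segment_features(va: Matrix, refs: List[Matrix], w: Matrix) -> Matrix:
    """Hypothetical inference step: average the target frame's co-attention
    responses over several reference frames (an assumed aggregation rule),
    then append them to the target's own features along the channel axis."""
    responses = [co_attention(va, vb, w)[0] for vb in refs]
    c, n = len(va), len(va[0])
    mean_z = [[sum(z[i][j] for z in responses) / len(responses) for j in range(n)]
              for i in range(c)]
    return mean_z + va  # 2C x N_a: rows of mean_z followed by rows of va


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]           # C = 2
    frame_a = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # C = 2, N_a = 3
    frame_b = [[2.0, 2.0], [3.0, 3.0]]            # N_b = 2, every position identical
    za, zb = co_attention(frame_a, frame_b, identity)
    print(za)  # [[2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
```

Each column of `za` is a convex combination of frame b's feature columns, so when every position of frame b carries the same feature, `za` simply repeats that feature; this makes a quick sanity check on the normalisation. In a trained network, `w` would be learned and the appended features would feed a segmentation head.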


Pages: 3618-3627

Page count: 10
