See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks

Cited by: 418

Authors:

Lu, Xiankai [1]

Wang, Wenguan [1]

Ma, Chao [2]

Shen, Jianbing [1]

Shao, Ling [1]

Porikli, Fatih [3]

Affiliations:

[1] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates

[2] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China

[3] Australian Natl Univ, Canberra, ACT, Australia

Source:

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019

Funding:

Australian Research Council;

DOI:

10.1109/CVPR.2019.00374

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We introduce a novel network, called CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view. We emphasize the importance of inherent correlation among video frames and incorporate a global co-attention mechanism to improve further the state-of-the-art deep learning based solutions that primarily focus on learning discriminative foreground representations over appearance and motion in short-term temporal segments. The co-attention layers in our network provide efficient and competent stages for capturing global correlations and scene context by jointly computing and appending co-attention responses into a joint feature space. We train COSNet with pairs of video frames, which naturally augments training data and allows increased learning capacity. During the segmentation stage, the co-attention model encodes useful information by processing multiple reference frames together, which is leveraged to infer the frequently reappearing and salient foreground objects better. We propose a unified and end-to-end trainable framework where different co-attention variants can be derived for mining the rich context within videos. Our extensive experiments over three large benchmarks manifest that COSNet outperforms the current alternatives by a large margin.
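The co-attention step described in the abstract (computing correlations between two frames and appending the attended responses to each frame's own features) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the bilinear weight matrix `W`, the row-wise softmax normalization, and the toy feature values are all assumptions made for illustration, and the features are plain Python lists rather than the output of a Siamese backbone.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [dot(w_row, v) for w_row in W]


def attend(scores, values):
    """For each row of scores, return the softmax-weighted average of values."""
    dim = len(values[0])
    out = []
    for row in scores:
        weights = softmax(row)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out


def co_attention(feat_a, feat_b, W):
    """Hypothetical co-attention between two frames.

    feat_a, feat_b: lists of per-location feature vectors (one per frame).
    W: assumed learnable bilinear weight matrix (dim x dim).
    Returns each frame's features with the co-attention summary of the
    other frame concatenated in front.
    """
    proj_b = [matvec(W, v) for v in feat_b]
    # affinity[i][j] = feat_a[i]^T W feat_b[j]
    affinity = [[dot(a, pb) for pb in proj_b] for a in feat_a]
    affinity_t = [list(col) for col in zip(*affinity)]

    summary_for_a = attend(affinity, feat_b)    # frame b as seen from frame a
    summary_for_b = attend(affinity_t, feat_a)  # frame a as seen from frame b

    # List concatenation: [summary, original feature] per location.
    joint_a = [s + f for s, f in zip(summary_for_a, feat_a)]
    joint_b = [s + f for s, f in zip(summary_for_b, feat_b)]
    return joint_a, joint_b


# Toy example: frame a has 3 locations, frame b has 2, feature dim is 2.
feat_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feat_b = [[0.5, 0.5], [2.0, 0.0]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity, chosen only for the example
joint_a, joint_b = co_attention(feat_a, feat_b, W)
```

Each location ends up with a vector twice the original feature dimension: the first half summarizes the other frame, the second half is the location's own feature. In a trained network `W` would be learned and the joint features would feed a segmentation head; neither is shown here.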

Pages: 3618-3627

Page count: 10

35 items in total

[1] Zero-Shot Video Object Segmentation With Co-Attention Siamese Networks
Lu, Xiankai
Wang, Wenguan
Shen, Jianbing
Crandall, David
Luo, Jiebo
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (04) : 2228 - 2242
[2] Co-attention CNNs for Unsupervised Object Co-segmentation
Hsu, Kuang-Jui
Lin, Yen-Yu
Chuang, Yung-Yu
PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018 : 748 - 756
[3] COMatchNet: Co-Attention Matching Network for Video Object Segmentation
Huang, Lufei
Sun, Fengming
Yuan, Xia
PATTERN RECOGNITION, ACPR 2021, PT I, 2022, 13188 : 271 - 284
[4] Co-attention Propagation Network for Zero-Shot Video Object Segmentation
Pei, Gensheng
Yao, Yazhou
Shen, Fumin
Huang, Dan
Huang, Xingguo
Shen, Heng-Tao
arXiv, 2023,
[5] Hierarchical Co-Attention Propagation Network for Zero-Shot Video Object Segmentation
Pei, Gensheng
Yao, Yazhou
Shen, Fumin
Huang, Dan
Huang, Xingguo
Shen, Heng-Tao
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 2348 - 2359
[6] Fast Video Object Segmentation Based on Siamese Networks
Fu L.-H.
Zhao Y.
Sun X.-W.
Lu Z.-S.
Wang D.
Yang H.-X.
Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2020, 48 (04) : 625 - 630
[7] Learning Motion-Appearance Co-Attention for Zero-Shot Video Object Segmentation
Yang, Shu
Zhang, Lu
Qi, Jinqing
Lu, Huchuan
Wang, Shuo
Zhang, Xiaoxing
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021 : 1544 - 1553
[8] Guided Slot Attention for Unsupervised Video Object Segmentation
Lee, Minhyeok
Cho, Suhwan
Lee, Dogyoon
Park, Chaewon
Lee, Jungho
Lee, Sangyoun
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024 : 3807 - 3816
[9] Asymmetric Attention Fusion for Unsupervised Video Object Segmentation
Jiang, Hongfan
Wu, Xiaojun
Xu, Tianyang
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VI, 2024, 14430 : 170 - 182
[10] Joint Attention Mechanism for Unsupervised Video Object Segmentation
Yao, Rui
Xu, Xin
Zhou, Yong
Zhao, Jiaqi
Fang, Liang
PATTERN RECOGNITION AND COMPUTER VISION, PT I, 2021, 13019 : 154 - 165