From Semantic Categories to Fixations: A Novel Weakly-supervised Visual-auditory Saliency Detection Approach

Cited by: 36

Authors:

Wang, Guotao [1]

Chen, Chenglizhao [2]

Fan, Dengping [4]

Hao, Aimin [1,3,6]

Qin, Hong [5]

Affiliations:

[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing, Peoples R China

[2] Qingdao Univ, Coll Comp Sci & Technol, Qingdao, Peoples R China

[3] Chinese Acad Med Sci, Res Unit Virtual Human & Virtual Surg, Beijing, Peoples R China

[4] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates

[5] SUNY Stony Brook, Stony Brook, NY 11794 USA

[6] Pengcheng Lab, Shenzhen, Peoples R China

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021

Funding:

National Natural Science Foundation of China; US National Science Foundation; National Key R&D Program of China;

Keywords:

ATTENTION;

DOI:

10.1109/CVPR46437.2021.01487

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Thanks to rapid advances in deep learning techniques and the wide availability of large-scale training sets, the performance of video saliency detection models has been improving steadily and significantly. However, deep learning based visual-audio fixation prediction is still in its infancy. At present, only a few visual-audio sequences have been furnished with real fixations recorded in a real visual-audio environment. Hence, it would be neither efficient nor necessary to re-collect real fixations under the same visual-audio circumstances. To address this problem, this paper advocates a novel weakly-supervised approach to alleviate the demand for large-scale training sets in visual-audio model training. Using only video category tags, we propose selective class activation mapping (SCAM), which follows a coarse-to-fine strategy to select the most discriminative regions in the spatial-temporal-audio circumstance. Moreover, these regions exhibit high consistency with real human-eye fixations and can subsequently be employed as pseudo ground truths (GTs) to train a new spatial-temporal-audio (STA) network. Without resorting to any real fixations, the performance of our STA network is comparable to that of fully supervised ones.
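The abstract builds on class activation mapping, where a classifier trained only on category tags is used to localize discriminative regions. As a rough illustration only, the sketch below computes a classic class activation map and thresholds it into a pseudo ground-truth mask. It is not the authors' SCAM or STA network: the function names, the array shapes, and the `keep_ratio` threshold are all assumptions made for this example.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Classic CAM (not SCAM): weighted sum of feature channels for one class.

    features:   (C, H, W) activations of the last convolutional layer (assumed shape)
    fc_weights: (num_classes, C) weights of the linear layer after global average pooling
    """
    # Contract the channel axis: (C,) x (C, H, W) -> (H, W)
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    cam = np.maximum(cam, 0.0)  # keep only positive evidence for the class
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # normalize to [0, 1] when possible

def pseudo_fixation_mask(cam, keep_ratio=0.5):
    """Binary mask of locations whose response is at least keep_ratio of the peak.

    keep_ratio is a hypothetical threshold chosen for illustration.
    """
    peak = cam.max()
    if peak <= 0:
        return np.zeros_like(cam, dtype=np.float32)  # no evidence, empty mask
    return (cam >= keep_ratio * peak).astype(np.float32)
```

In a weakly-supervised setting, masks like this would stand in for recorded fixations when training a second network. The paper's coarse-to-fine selection across spatial, temporal, and audio sources is not reproduced here.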


Pages: 15114-15123

Page count: 10
