From Semantic Categories to Fixations: A Novel Weakly-supervised Visual-auditory Saliency Detection Approach

Cited by: 36

Authors:

Wang, Guotao ^{[1]}

Chen, Chenglizhao ^{[2]}

Fan, Dengping ^{[4]}

Hao, Aimin ^{[1,3,6]}

Qin, Hong ^{[5]}

Affiliations:

[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing, Peoples R China

[2] Qingdao Univ, Coll Comp Sci & Technol, Qingdao, Peoples R China

[3] Chinese Acad Med Sci, Res Unit Virtual Human & Virtual Surg, Beijing, Peoples R China

[4] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates

[5] SUNY Stony Brook, Stony Brook, NY 11794 USA

[6] Pengcheng Lab, Shenzhen, Peoples R China

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021

Funding:

National Natural Science Foundation of China; U.S. National Science Foundation; National Key R&D Program of China;

Keywords:

ATTENTION;

DOI:

10.1109/CVPR46437.2021.01487

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Thanks to the rapid advances in deep learning techniques and the wide availability of large-scale training sets, the performance of video saliency detection models has been improving steadily and significantly. However, deep-learning-based visual-audio fixation prediction is still in its infancy. At present, only a few visual-audio sequences have been furnished with real fixations recorded in a real visual-audio environment. Hence, it would be neither efficient nor necessary to re-collect real fixations under the same visual-audio circumstances. To address this problem, this paper advocates a novel weakly-supervised approach to alleviate the demand for large-scale training sets in visual-audio model training. Using only video category tags, we propose selective class activation mapping (SCAM), which follows a coarse-to-fine strategy to select the most discriminative regions in the spatial-temporal-audio circumstance. Moreover, these regions exhibit high consistency with real human-eye fixations and can subsequently be employed as pseudo ground truths (GTs) to train a new spatial-temporal-audio (STA) network. Without resorting to any real fixations, the performance of our STA network is comparable to that of fully supervised ones.
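As a rough illustration of the idea of turning category-level supervision into fixation-like maps, the following minimal sketch computes a plain class activation map (CAM) and thresholds it into a pseudo ground-truth mask. It is not the paper's SCAM: the coarse-to-fine selection and the temporal and audio branches are omitted, and the function names, array shapes, and the `keep_ratio` threshold are assumptions made for illustration only.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Plain CAM for one class (illustrative, not the paper's SCAM).

    features:   (C, H, W) feature maps from the last convolutional layer.
    fc_weights: (K, C) weights of a linear classifier applied after
                global average pooling (assumed architecture).
    class_idx:  index of the category tag used as weak supervision.
    """
    # Weighted sum of the channels using the chosen class's weights -> (H, W).
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    # Keep positive evidence only, then scale to [0, 1].
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def pseudo_fixation_mask(cam, keep_ratio=0.5):
    """Binarize a normalized CAM; keep_ratio is a hypothetical threshold."""
    return (cam >= keep_ratio).astype(np.float32)


# Toy example: two channels, two classes, a 2x2 spatial grid.
features = np.array([[[1.0, 0.0], [0.0, 0.0]],
                     [[0.0, 0.0], [0.0, 2.0]]])
fc_weights = np.array([[1.0, 0.0],
                       [0.0, 1.0]])
cam_class1 = class_activation_map(features, fc_weights, class_idx=1)
mask_class1 = pseudo_fixation_mask(cam_class1)
```

In this toy case the map for class 1 peaks at the bottom-right cell, which is the only location where the second channel is active; a mask of this kind is the sort of region that could stand in for recorded fixations during training.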


Pages: 15114-15123

Page count: 10
