From Semantic Categories to Fixations: A Novel Weakly-supervised Visual-auditory Saliency Detection Approach

Cited by: 36
Authors
Wang, Guotao [1]
Chen, Chenglizhao [2]
Fan, Dengping [4]
Hao, Aimin [1, 3, 6]
Qin, Hong [5]
Affiliations
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing, Peoples R China
[2] Qingdao Univ, Coll Comp Sci & Technol, Qingdao, Peoples R China
[3] Chinese Acad Med Sci, Res Unit Virtual Human & Virtual Surg, Beijing, Peoples R China
[4] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates
[5] SUNY Stony Brook, Stony Brook, NY 11794 USA
[6] Pengcheng Lab, Shenzhen, Peoples R China
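Funding
National Natural Science Foundation of China; US National Science Foundation; National Key Research and Development Program of China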
Keywords
ATTENTION;
DOI
10.1109/CVPR46437.2021.01487
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Thanks to rapid advances in deep learning techniques and the wide availability of large-scale training sets, the performance of video saliency detection models has been improving steadily and significantly. However, deep learning based visual-audio fixation prediction is still in its infancy. At present, only a few visual-audio sequences have been furnished with real fixations recorded in a real visual-audio environment. Hence, it would be neither efficient nor necessary to re-collect real fixations under the same visual-audio circumstances. To address the problem, this paper advocates a novel weakly-supervised approach to alleviate the demand for large-scale training sets in visual-audio model training. Using only video category tags, we propose selective class activation mapping (SCAM), which follows a coarse-to-fine strategy to select the most discriminative regions in the spatial-temporal-audio circumstance. Moreover, these regions exhibit high consistency with real human-eye fixations and can subsequently be employed as pseudo ground truths (GTs) to train a new spatial-temporal-audio (STA) network. Without resorting to any real fixations, the performance of our STA network is comparable to that of fully supervised ones.
Pages: 15114-15123
Page count: 10