From Semantic Categories to Fixations: A Novel Weakly-supervised Visual-auditory Saliency Detection Approach

Cited by: 36
Authors
Wang, Guotao [1]
Chen, Chenglizhao [2]
Fan, Dengping [4]
Hao, Aimin [1, 3, 6]
Qin, Hong [5]
Affiliations
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing, Peoples R China
[2] Qingdao Univ, Coll Comp Sci & Technol, Qingdao, Peoples R China
[3] Chinese Acad Med Sci, Res Unit Virtual Human & Virtual Surg, Beijing, Peoples R China
[4] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates
[5] SUNY Stony Brook, Stony Brook, NY 11794 USA
[6] Pengcheng Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China; US National Science Foundation; National Key R&D Program of China;
Keywords
ATTENTION;
DOI
10.1109/CVPR46437.2021.01487
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Thanks to rapid advances in deep learning techniques and the wide availability of large-scale training sets, the performance of video saliency detection models has been improving steadily and significantly. However, deep-learning-based visual-audio fixation prediction is still in its infancy. At present, only a few visual-audio sequences have been furnished with real fixations recorded in a real visual-audio environment. Hence, it would be neither efficient nor necessary to re-collect real fixations under the same visual-audio circumstance. To address the problem, this paper advocates a novel weakly-supervised approach to alleviate the demand for large-scale training sets in visual-audio model training. Using only video category tags, we propose selective class activation mapping (SCAM), which follows a coarse-to-fine strategy to select the most discriminative regions in the spatial-temporal-audio circumstance. Moreover, these regions exhibit high consistency with real human-eye fixations and can subsequently be employed as pseudo ground truths (GTs) to train a new spatial-temporal-audio (STA) network. Without resorting to any real fixation, the performance of our STA network is comparable to that of fully supervised ones.
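The abstract does not spell out how SCAM works, so the following is only a rough sketch of the general idea it builds on: ordinary class activation mapping (CAM), where feature channels are weighted by the classifier weights of the tagged category, followed by a purely hypothetical selection step that keeps the map from whichever branch is most confident about that category. The function names, the confidence-based selection rule, and the toy inputs are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Map2D = List[List[float]]


def class_activation_map(features: Sequence[Map2D],
                         class_weights: Sequence[float]) -> Map2D:
    """Plain CAM: weighted sum of feature channels, ReLU, then scale to [0, 1].

    features:      one 2-D activation map per channel (all the same size)
    class_weights: one classifier weight per channel for the tagged category
    """
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for channel, weight in zip(features, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * channel[y][x]
    # Keep only positive evidence for the category.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


def select_most_confident(cams: Sequence[Map2D],
                          confidences: Sequence[float]) -> Tuple[int, Map2D]:
    """Hypothetical selection rule (an assumption, not the paper's SCAM):
    keep the map from the branch with the highest confidence for the tag."""
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    return best, cams[best]


if __name__ == "__main__":
    # Toy 2-channel, 2x2 feature maps; values are made up.
    features = [
        [[0.0, 0.0], [0.0, 2.0]],
        [[1.0, 0.0], [0.0, 0.0]],
    ]
    spatial_cam = class_activation_map(features, [1.0, 0.0])
    audio_cam = class_activation_map(features, [0.0, 1.0])
    branch, pseudo_gt = select_most_confident([spatial_cam, audio_cam], [0.9, 0.4])
    print(branch, pseudo_gt)
```

In a weakly-supervised setup of the kind the abstract describes, a map selected this way would stand in for recorded fixations as the training target; the actual coarse-to-fine procedure is described in the paper itself.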
Pages: 15114-15123
Page count: 10