Visual saliency prediction using multi-scale attention gated network

Cited by: 0
Authors
Yubao Sun
Mengyang Zhao
Kai Hu
Shaojing Fan
Affiliations
[1] Nanjing University of Information Science and Technology, The Jiangsu Key Laboratory of Big Data Analysis Technology (B-DAT Laboratory), Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology
[2] National University of Singapore
Source
Multimedia Systems | 2022 / Vol. 28
Keywords
Saliency prediction; Multi-scale attention; Gating fusion
DOI: not available
Abstract
Predicting human visual attention can not only increase our understanding of the underlying biological mechanisms, but also bring new insights for other computer vision-related tasks such as autonomous driving and human–computer interaction. Current deep learning-based methods often place emphasis on high-level semantic features for prediction. However, high-level semantic features lack fine-scale spatial information. Ideally, a saliency prediction model should include both spatial and semantic features. In this paper, we propose a multi-scale attention gated network (referred to as MSAGNet) to fuse semantic features with different spatial resolutions for visual saliency prediction. Specifically, we adopt the high-resolution net (HRNet) as the backbone to extract multi-scale semantic features. A multi-scale attention gating module is designed to adaptively fuse these multi-scale features in a hierarchical way. Different from the conventional approach of concatenating features from multiple layers or multi-scale inputs, this module calculates a spatial attention map from the high-level semantic feature and then fuses it with the low-level spatial feature through a gating operation. Through this hierarchical gating fusion, the final saliency prediction is achieved at the finest scale. Extensive experimental analyses on three benchmark datasets demonstrate the superior performance of the proposed method.
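The gating operation described in the abstract (a spatial attention map computed from the high-level semantic feature, upsampled and used to reweight the low-level spatial feature) can be sketched roughly as follows. This is an illustrative NumPy sketch only: the function name `attention_gated_fusion`, the 1×1 projection weights `w_attn`, the nearest-neighbour upsampling, and all tensor shapes are assumptions, not the paper's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gated_fusion(high_feat, low_feat, w_attn):
    """Sketch of one attention-gating fusion step.

    high_feat: (C_h, H, W)   high-level semantic feature (coarse scale)
    low_feat:  (C_l, sH, sW) low-level spatial feature (finer scale)
    w_attn:    (C_h,)        assumed 1x1-conv weights projecting channels
                             to a single-channel attention map
    """
    # Spatial attention map from the high-level feature: channel-wise
    # projection followed by a sigmoid, giving values in (0, 1).
    attn = sigmoid(np.tensordot(w_attn, high_feat, axes=([0], [0])))  # (H, W)

    # Upsample the attention map to the low-level resolution
    # (nearest-neighbour, assuming an integer scale factor).
    scale = low_feat.shape[1] // attn.shape[0]
    attn_up = np.kron(attn, np.ones((scale, scale)))  # (sH, sW)

    # Gating: reweight the low-level spatial feature by the attention map.
    return low_feat * attn_up[None, :, :]
```

Applied hierarchically from the coarsest to the finest HRNet branch, each fused output would serve as the low-level input to the next step, so the final prediction emerges at the finest scale.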
Pages: 131–139 (8 pages)
Related papers (50 records)
  • [41] Multi-scale coupled attention for visual object detection
    Li, Fei
    Yan, Hongping
    Shi, Linsu
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [42] Multi-scale Spectrum Visual Saliency Perception via Hypercomplex DCT
    Xiao, Limei
    Li, Ce
    Hu, Zhijia
    Pan, Zhengrong
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2016, PT II, 2016, 9772 : 645 - 655
  • [43] Attention Prediction in Egocentric Video Using Motion and Visual Saliency
    Yamada, Kentaro
    Sugano, Yusuke
    Okabe, Takahiro
    Sato, Yoichi
    Sugimoto, Akihiro
    Hiraki, Kazuo
    ADVANCES IN IMAGE AND VIDEO TECHNOLOGY, PT I, 2011, 7087 : 277 - +
  • [44] Multi-scale fusion visual attention network for facial micro-expression recognition
    Pan, Hang
    Yang, Hongling
    Xie, Lun
    Wang, Zhiliang
    FRONTIERS IN NEUROSCIENCE, 2023, 17
  • [45] Multi-scale network with shared cross-attention for audio–visual correlation learning
    Jiwei Zhang
    Yi Yu
    Suhua Tang
    Wei Li
    Jianming Wu
    Neural Computing and Applications, 2023, 35 : 20173 - 20187
  • [46] Stereoscopic Visual Discomfort Prediction Using Multi-scale DCT Features
    Zhou, Yang
    Yu, Wanli
    Li, Zhu
    Yin, Haibing
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 184 - 191
  • [47] Language conditioned multi-scale visual attention networks for visual grounding
    Yao, Haibo
    Wang, Lipeng
    Cai, Chengtao
    Wang, Wei
    Zhang, Zhi
    Shang, Xiaobing
    IMAGE AND VISION COMPUTING, 2024, 150
  • [48] Attention-Based Multi-Scale Prediction Network for Time-Series Data
    Junjie Li
    Lin Zhu
    Yong Zhang
    Da Guo
    Xingwen Xia
    China Communications, 2022, 19 (05) : 286 - 301
  • [49] Tool Wear Prediction Based on a Multi-Scale Convolutional Neural Network with Attention Fusion
    Huang, Qingqing
    Wu, Di
    Huang, Hao
    Zhang, Yan
    Han, Yan
    INFORMATION, 2022, 13 (10)
  • [50] MCDAN: A Multi-Scale Context-Enhanced Dynamic Attention Network for Diffusion Prediction
    Wang, Xiaowen
    Wang, Lanjun
    Su, Yuting
    Zhang, Yongdong
    Liu, An-An
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 7850 - 7862