Exploring Duality in Visual Question-Driven Top-Down Saliency

Cited by: 9
Authors
He, Shengfeng [1]
Han, Chu [2]
Han, Guoqiang [1]
Qin, Jing [3]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[2] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[3] Hong Kong Polytech Univ, Dept Nursing, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Visualization; Feature extraction; Training; Pipelines; Learning systems; Knowledge discovery; Dual learning; saliency; visual question answering (VQA); visual question generation (VQG);
DOI
10.1109/TNNLS.2019.2933439
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Top-down, goal-driven visual saliency strongly influences how the human visual system performs visual tasks. Text generation tasks such as visual question answering (VQA) and visual question generation (VQG) are intrinsically connected to top-down saliency, which is usually incorporated into both VQA and VQG pipelines in an unsupervised manner. However, it has been shown that the regions humans choose to look at when answering questions differ greatly from those selected by unsupervised attention models. In this brief, we explore the intrinsic relationship between top-down saliency and text generation, and examine whether an accurate saliency response benefits text generation. To this end, we propose a dual supervised network with dynamic parameter prediction. Dual supervision explicitly exploits the probabilistic correlation between the primal task (top-down saliency detection) and the dual task (text generation), while dynamic parameter prediction encodes the given text (i.e., a question or an answer) into the fully convolutional network. Extensive experiments show that the proposed top-down saliency method achieves the best correlation with human attention among various baselines. In addition, the proposed model can be guided by either questions or answers and output the counterpart. Furthermore, we show that incorporating human-like, question-driven visual saliency improves the performance of both answer generation and question generation.
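The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every name in it is hypothetical. Dynamic parameter prediction is modeled as a linear map (assumed `W_proj`, `b_proj`) from a text embedding to the weights of a single 1x1 convolution over fully convolutional network features. Dual supervision is modeled with the generic dual supervised learning regularizer, which penalizes violations of the identity log P(x) + log P(y|x) = log P(y) + log P(x|y). Here we assume x is the text and y is the saliency map; the marginal log-probabilities and the weight `lam` are assumed inputs, and the paper's exact formulation may differ.

```python
import numpy as np


def predict_dynamic_params(text_emb, W_proj, b_proj):
    """Hypothetical dynamic parameter predictor (assumption, not the paper's).

    Maps a text embedding (question or answer) of shape (d,) to the C weights
    and single bias of a 1x1 convolution. W_proj has shape (C + 1, d) and
    b_proj has shape (C + 1,).
    """
    params = W_proj @ text_emb + b_proj
    return params[:-1], params[-1]


def text_conditioned_saliency(features, text_emb, W_proj, b_proj):
    """Apply the text-predicted 1x1 conv to FCN features of shape (C, H, W).

    Returns an (H, W) saliency map squashed to (0, 1) with a sigmoid.
    """
    weights, bias = predict_dynamic_params(text_emb, W_proj, b_proj)
    # Contract the channel axis: sum_c weights[c] * features[c, h, w]
    logits = np.tensordot(weights, features, axes=([0], [0])) + bias
    return 1.0 / (1.0 + np.exp(-logits))


def duality_gap(log_p_x, log_p_y_given_x, log_p_y, log_p_x_given_y):
    """Squared violation of log P(x) + log P(y|x) = log P(y) + log P(x|y)."""
    return (log_p_x + log_p_y_given_x - log_p_y - log_p_x_given_y) ** 2


def dual_supervised_loss(nll_primal, nll_dual, log_p_x, log_p_y, lam=0.01):
    """Sum of both task losses plus a weighted duality regularizer.

    Assumed roles: nll_primal = -log P(y|x) for the primal task
    (text -> saliency) and nll_dual = -log P(x|y) for the dual task
    (saliency -> text); log_p_x and log_p_y are marginal log-probabilities
    from assumed pretrained marginal models.
    """
    gap = duality_gap(log_p_x, -nll_primal, log_p_y, -nll_dual)
    return nll_primal + nll_dual + lam * gap


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W, d = 8, 4, 4, 16
    features = rng.standard_normal((C, H, W))
    question_emb = rng.standard_normal(d)
    W_proj = 0.1 * rng.standard_normal((C + 1, d))
    b_proj = np.zeros(C + 1)
    sal = text_conditioned_saliency(features, question_emb, W_proj, b_proj)
    print(sal.shape)  # (4, 4)
    print(dual_supervised_loss(2.0, 1.0, -1.0, -1.0, lam=0.5))  # 3.5
```

Predicting only one 1x1 layer keeps the sketch small; a real network could instead predict parameters for several layers. The duality term couples the two tasks: it is zero only when both factorizations of the joint probability agree.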
Pages: 2672-2679
Number of pages: 8
Related papers
  • [1] Top-down saliency detection driven by visual classification
    Murabito, Francesca
    Spampinato, Concetto
    Palazzo, Simone
    Giordano, Daniela
    Pogorelov, Konstantin
    Riegler, Michael
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2018, 172 : 67 - 76
  • [2] Top-down Visual Saliency Guided by Captions
    Ramanishka, Vasili
    Das, Abir
    Zhang, Jianming
    Saenko, Kate
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 3135 - 3144
  • [3] SalChartQA: Question-driven Saliency on Information Visualisations
    Wang, Yao
    Wang, Weitian
    Abdelhafez, Abdullah
    Elfares, Mayar
    Hu, Zhiming
    Bace, Mihai
    Bulling, Andreas
    [J]. PROCEEDINGS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI 2024), 2024,
  • [4] Layout-Driven Top-Down Saliency Detection for Webpage
    Li, Xixi
    Liu, Di
    Zhang, Kao
    Chen, Zhenzhong
    [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT II, 2018, 10736 : 438 - 446
  • [5] Cascading Top-Down Attention for Visual Question Answering
    Tian, Weidong
    Zhou, Rencai
    Zhao, Zhongqiu
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [6] Top-Down Visual Saliency via Joint CRF and Dictionary Learning
    Yang, Jimei
    Yang, Ming-Hsuan
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (03) : 576 - 588
  • [7] Top-Down Visual Saliency via Joint CRF and Dictionary Learning
    Yang, Jimei
    Yang, Ming-Hsuan
    [J]. 2012 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2012, : 2296 - 2303
  • [8] Multistep Question-Driven Visual Question Answering for Remote Sensing
    Zhang, Meimei
    Chen, Fang
    Li, Bin
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [9] Exemplar-Driven Top-Down Saliency Detection via Deep Association
    He, Shengfeng
    Lau, Rynson W. H.
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 5723 - 5732
  • [10] Combining Top-down and Bottom-up Visual Saliency for Firearms Localization
    Ardizzone, Edoardo
    Gallea, Roberto
    La Cascia, Marco
    Mazzola, Giuseppe
    [J]. 2014 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MULTIMEDIA APPLICATIONS (SIGMAP), 2014, : 25 - 32