Salient region detection in the task of visual question answering

Authors
Favorskaya, Margarita [1]
Andreev, Vladimir [1]
Popov, Aleksei [1]
Affiliations
[1] Reshetnev Siberian State Univ Sci & Technol, 31 Krasnoyarsky Rabochy Ave, Krasnoyarsk 660037, Russia
DOI
10.1088/1757-899X/450/5/052017
Chinese Library Classification
T [Industrial Technology];
Discipline classification code
08;
Abstract
Salient region detection in Visual Question Answering (VQA) is an attempt to simulate the human ability to perceive a scene quickly by selectively looking at image fragments instead of processing the whole scene. The conventional approach relies on neural networks. However, Convolutional Neural Networks (CNNs) have many disadvantages compared with traditional methods for salient region detection. We modified the basic salient region detection algorithm for the VQA task by selecting image fragments that have a high probability of being included in a questionnaire. The experiments were conducted on images from the MS-COCO dataset and provided good segmentation results.
Pages: 6