DCF-VQA: COUNTERFACTUAL STRUCTURE BASED ON MULTI-FEATURE ENHANCEMENT

Authors
Yang, Guan [1, 2]
Ji, Cheng [1, 2]
Liu, Xiaoming [1, 2]
Zhang, Ziming [1, 2]
Wang, Chen [1, 2]
Affiliations
[1] Zhongyuan Univ Technol, Sch Comp Sci, 41 Zhongyuan Middle Rd, Zhengzhou 450007, Henan, Peoples R China
[2] Zhongyuan Univ Technol, Henan Key Lab Publ Opin Intelligent Anal, 41 Zhongyuan Middle Rd, Zhengzhou 450007, Henan, Peoples R China
Keywords
visual question answering; multi-feature enhancement; counterfactual; discrete cosine transform
DOI
10.61822/amcs-2024-0032
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Visual question answering (VQA) is a pivotal topic at the intersection of computer vision and natural language processing. This paper addresses the challenges of linguistic bias and bias fusion within invalid regions encountered in existing VQA models due to insufficient representation of multi-modal features. To overcome these issues, we propose a multi-feature enhancement scheme. This scheme involves the fusion of one or more features with the original ones, incorporating discrete cosine transform (DCT) features into the counterfactual reasoning framework. This approach harnesses fine-grained information and spatial relationships within images and questions, enabling a more refined understanding of the indirect relationship between images and questions. Consequently, it effectively mitigates linguistic bias and bias fusion within invalid regions in the model. Extensive experiments are conducted on multiple datasets, including VQA2 and VQA-CP2, employing various baseline models and fusion techniques, resulting in promising and robust performance.
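The abstract does not say how the DCT features are computed or how they are fused with the original ones, so the following is only an illustrative sketch under stated assumptions: it applies an orthonormal 1-D DCT-II to a feature vector and treats "fusion" as plain concatenation of the original features with the leading DCT coefficients. The names `dct_ii`, `enhance`, and `keep` are hypothetical and do not come from the paper.

```python
import math
from typing import List


def dct_ii(x: List[float]) -> List[float]:
    """Orthonormal 1-D DCT-II of a feature vector (illustrative choice)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        # Project the input onto the k-th cosine basis function.
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        # Orthonormal scaling, so the transform preserves vector energy.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


def enhance(features: List[float], keep: int) -> List[float]:
    """Assumed fusion: original features followed by the first `keep` DCT coefficients."""
    if not 0 <= keep <= len(features):
        raise ValueError("keep must be between 0 and len(features)")
    return list(features) + dct_ii(features)[:keep]


if __name__ == "__main__":
    region_feature = [0.5, -1.0, 2.0, 3.0]  # toy values, not real model features
    print(enhance(region_feature, keep=2))
```

Concatenation is used here only because it is the simplest fusion to show; the paper evaluates several fusion techniques, and this sketch makes no claim about which of them it uses.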
Pages: 453-466
Number of pages: 14