DHHG-TAC: Fusion of Dynamic Heterogeneous Hypergraphs and Transformer Attention Mechanism for Visual Question Answering Tasks

Cited by: 1
Authors
Liu, Xuetao [1 ]
Dong, Ruiliang [1 ]
Yang, Hongyan [1 ]
Affiliations
[1] Beijing Univ Technol, Sch Informat Sci & Technol, Beijing 100021, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Feature extraction; Visualization; Attention mechanisms; Imaging; Question answering (information retrieval); Informatics; Context modeling; Vectors; Semantics; Combined attention; hypergraph neural networks (HGNNs); visual question answering (VQA);
DOI
10.1109/TII.2024.3453919
CLC classification number
TP [Automation technology; computer technology];
Discipline classification code
0812;
Abstract
Amidst the burgeoning advancements in deep learning, traditional neural networks have demonstrated significant achievements in unimodal tasks such as image recognition. However, handling multimodal data, especially in visual question answering (VQA) tasks, presents challenges in processing the complex structural relationships among modalities. To address this issue, this article introduces a dynamic heterogeneous hypergraph neural network (HGNN) model that utilizes a Transformer-based combined attention mechanism and designs a hypergraph representation imaging network to enhance model inference without increasing the parameter count. Initially, image scenes and textual questions are converted into pairs of hypergraphs with preliminary weights, which facilitate the capture of complex structural relationships through the HGNN. The hypergraph representation imaging network further aids the HGNN in learning and understanding the scene image modalities. Subsequently, a Transformer-based combined attention mechanism is employed to adapt to the distinct characteristics of each modality and their intermodal interactions. This integration of multiple attention mechanisms helps identify critical structural information within the answer regions. Dynamic updates to the hyperedge weights of the hypergraph pairs, guided by the attention weights, enable the model to progressively assimilate more relevant information. Experiments on two public VQA datasets attest to the model's superior performance. Furthermore, this article envisions future advancements in model optimization and feature information extraction, extending the potential of HGNNs in multimodal fusion technology.
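To make the abstract's core mechanism concrete, the following is a minimal, illustrative sketch of a generic hypergraph convolution combined with attention-guided hyperedge reweighting. It assumes a PyTorch formulation; the class and function names, the propagation rule (in the style of standard HGNN layers), and the momentum-based weight-update rule are assumptions for illustration, not the authors' released DHHG-TAC implementation.

```python
# Illustrative sketch only: a generic hypergraph convolution with hyperedge
# weights updated from external (e.g., Transformer) attention scores.
# All names and the update rule are hypothetical, not the DHHG-TAC code.
import torch
import torch.nn as nn


class HypergraphConv(nn.Module):
    """One hypergraph convolution layer:
    X' = Dv^-1/2 · H · W · De^-1 · H^T · Dv^-1/2 · X · Theta."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, incidence, edge_weight):
        # x:           (num_nodes, in_dim) node features
        # incidence:   (num_nodes, num_edges) binary incidence matrix H
        # edge_weight: (num_edges,) current hyperedge weights w
        w = torch.diag(edge_weight)
        dv = torch.clamp((incidence * edge_weight).sum(dim=1), min=1e-6)  # weighted node degrees
        de = torch.clamp(incidence.sum(dim=0), min=1e-6)                  # hyperedge degrees
        dv_inv_sqrt = torch.diag(dv.pow(-0.5))
        de_inv = torch.diag(de.pow(-1.0))
        # Normalized propagation over the hypergraph structure
        prop = dv_inv_sqrt @ incidence @ w @ de_inv @ incidence.t() @ dv_inv_sqrt
        return torch.relu(prop @ self.theta(x))


def update_edge_weights(edge_weight, attention, momentum=0.9):
    """Blend old hyperedge weights with attention scores (hypothetical rule)."""
    attention = attention / (attention.sum() + 1e-6)  # normalize attention over hyperedges
    return momentum * edge_weight + (1.0 - momentum) * attention


if __name__ == "__main__":
    num_nodes, num_edges, dim = 6, 4, 16
    x = torch.randn(num_nodes, dim)                    # node features (e.g., region/word embeddings)
    h = (torch.rand(num_nodes, num_edges) > 0.5).float()  # random incidence matrix for the demo
    w = torch.ones(num_edges)                          # preliminary hyperedge weights
    layer = HypergraphConv(dim, dim)
    out = layer(x, h, w)                               # message passing over hyperedges
    attn = torch.rand(num_edges)                       # stand-in for Transformer attention scores
    w = update_edge_weights(w, attn)                   # dynamic hyperedge reweighting
    print(out.shape, w)
```

In this sketch, the attention scores simply stand in for whatever per-hyperedge relevance signal the combined attention mechanism produces; iterating the convolution and the weight update mirrors, at a high level, the "dynamic" aspect described in the abstract.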
Pages: 545-553
Number of pages: 9