Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering

Cited by: 8
|
Authors
Hu, Xinyue [1 ]
Gu, Lin [2 ,3 ]
An, Qiyuan [1 ]
Zhang, Mengliang [1 ]
Liu, Liangchen [4 ]
Kobayashi, Kazuma [5 ]
Harada, Tatsuya [2 ,3 ]
Summers, Ronald M. [4 ]
Zhu, Yingying [1 ]
Affiliations
[1] Univ Texas Arlington, Arlington, TX 76019 USA
[2] RIKEN, Tokyo, Japan
[3] Univ Tokyo, Tokyo, Japan
[4] NIH, Clin Ctr, Bethesda, MD 20892 USA
[5] Natl Canc Ctr, Res Inst, Tokyo, Japan
Source
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023
Funding
Japan Society for the Promotion of Science (JSPS); U.S. National Institutes of Health (NIH);
Keywords
visual question answering; medical imaging; datasets;
DOI
10.1145/3580305.3599819
Chinese Library Classification (CLC)
TP [Automation technology; computer technology];
Discipline classification code
0812;
Abstract
To advance automated medical vision-language modeling, we propose a novel chest X-ray Difference Visual Question Answering (VQA) task. Given a pair of main and reference images, the task is to answer questions about the diseases present and, more importantly, the differences between the two images. This mirrors the radiologist's diagnostic practice of comparing the current image with a reference before concluding the report. We collect a new dataset, MIMIC-Diff-VQA, comprising 700,703 QA pairs over 164,324 pairs of main and reference images. Compared to existing medical VQA datasets, our questions are tailored to the Assessment-Diagnosis-Intervention-Evaluation treatment procedure used by clinical professionals. We also propose a novel expert knowledge-aware graph representation learning model as a baseline for this task. It leverages expert knowledge, including anatomical structure priors and semantic and spatial knowledge, to construct a multi-relationship graph that represents the differences between the two images for the image difference VQA task. The dataset and code can be found at https://github.com/Holipori/MIMIC-Diff-VQA. We believe this work will further push forward the medical vision-language model.
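To make the multi-relationship graph described in the abstract concrete, the Python sketch below shows one way such a graph over anatomical regions of a main/reference image pair could be assembled. This is only an illustrative toy under assumed conventions, not the authors' released implementation (see the GitHub repository for that): the region names, bounding boxes, feature sizes, and IoU threshold are hypothetical, and only the spatial and anatomical relations are built explicitly, with the semantic relation noted in a comment.

from dataclasses import dataclass, field
from itertools import combinations
import numpy as np


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)


@dataclass
class RegionNode:
    name: str          # anatomical structure label, e.g. "left lung" (hypothetical label set)
    image: str         # "main" or "reference"
    bbox: tuple        # (x1, y1, x2, y2) in pixel coordinates
    feat: np.ndarray   # visual feature vector pooled over the region


@dataclass
class DiffGraph:
    nodes: list
    edges: list = field(default_factory=list)   # (src_idx, dst_idx, relation)

    def add_spatial_edges(self, iou_thresh=0.1):
        # Spatial relation: connect regions of the SAME image whose boxes overlap,
        # so message passing can use local context around each finding.
        for i, j in combinations(range(len(self.nodes)), 2):
            a, b = self.nodes[i], self.nodes[j]
            if a.image == b.image and iou(a.bbox, b.bbox) > iou_thresh:
                self.edges.append((i, j, "spatial"))

    def add_anatomical_edges(self):
        # Anatomical relation: align the same named structure ACROSS the main and
        # reference images so their features can be contrasted for difference questions.
        for i, j in combinations(range(len(self.nodes)), 2):
            a, b = self.nodes[i], self.nodes[j]
            if a.image != b.image and a.name == b.name:
                self.edges.append((i, j, "anatomical"))

    # A semantic relation (e.g. linking regions whose findings are related in an
    # external disease knowledge graph) would be added analogously.


# Toy usage with made-up boxes and random 8-dimensional features.
rng = np.random.default_rng(0)
nodes = [
    RegionNode("left lung", "main",      (10, 10, 100, 200), rng.normal(size=8)),
    RegionNode("heart",     "main",      (60, 60, 160, 180), rng.normal(size=8)),
    RegionNode("left lung", "reference", (12, 11, 102, 198), rng.normal(size=8)),
]
g = DiffGraph(nodes=nodes)
g.add_spatial_edges()
g.add_anatomical_edges()
print(g.edges)   # [(0, 1, 'spatial'), (0, 2, 'anatomical')]

In the actual model, such a graph would be consumed by a relation-aware graph learning module together with the question; the sketch only demonstrates how anatomical, spatial, and (analogously) semantic edges can encode the comparison between the two images.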
Pages: 4156-4165
Number of pages: 10