Unifying Text, Tables, and Images for Multimodal Question Answering

Cited by: 0

Authors:

Luo, Haohao [1]

Shen, Ying [1]

Deng, Yang [2]

Affiliations:

[1] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Guangzhou, Peoples R China

[2] Natl Univ Singapore, Singapore, Singapore

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023) | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multimodal question answering (MMQA), which aims to derive the answer from multiple knowledge modalities (e.g., text, tables, and images), has received increasing attention due to its broad applications. Current approaches to MMQA often rely on single-modal or bimodal QA models, which limits their ability to effectively integrate information across all modalities and leverage the power of pretrained language models. To address these limitations, we propose a novel framework called UniMMQA, which unifies three different input modalities into a text-to-text format by employing position-enhanced table linearization and diversified image captioning techniques. Additionally, we enhance cross-modal reasoning by incorporating a multimodal rationale generator, which produces textual descriptions of cross-modal relations for adaptation into the text-to-text generation process. Experimental results on three MMQA benchmark datasets show the superiority of UniMMQA in both supervised and unsupervised settings.
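The abstract does not spell out the linearization format, so the following is only a minimal sketch of what a position-tagged table linearization and a unified text-to-text input could look like. The function names (`linearize_table`, `build_input`), the `[row,col]` cell tags, and the `question:` / `passage:` / `table:` / `image:` prefixes are all hypothetical choices made for illustration, not the format used by UniMMQA. Image captions are assumed to have been produced beforehand by some captioning model.

```python
from typing import List, Sequence


def linearize_table(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Flatten a table into one string, tagging each cell with its (row, column) index.

    The "[r,c]" tag format is an assumption made for this sketch.
    """
    parts: List[str] = ["col : " + " | ".join(header)]
    for r, row in enumerate(rows, start=1):
        cells = [f"[{r},{c}] {value}" for c, value in enumerate(row, start=1)]
        parts.append(f"row {r} : " + " | ".join(cells))
    return " ".join(parts)


def build_input(
    question: str,
    passages: Sequence[str],
    header: Sequence[str],
    rows: Sequence[Sequence[str]],
    captions: Sequence[str],
) -> str:
    """Concatenate all three modalities into a single text sequence.

    The segment prefixes are hypothetical; any consistent scheme would do.
    """
    segments: List[str] = [f"question: {question}"]
    segments += [f"passage: {p}" for p in passages]
    segments.append("table: " + linearize_table(header, rows))
    segments += [f"image: {c}" for c in captions]
    return " ".join(segments)


if __name__ == "__main__":
    text = build_input(
        question="Which country is Paris in?",
        passages=["Paris is known for the Eiffel Tower."],
        header=["City", "Country"],
        rows=[["Paris", "France"]],
        captions=["a tall iron tower under a blue sky"],
    )
    print(text)
```

A string built this way could be passed to any text-to-text model as its source sequence; the position tags give the model an explicit handle on where each cell sits in the original table.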


Pages: 9355-9367

Page count: 13
