Unifying Text, Tables, and Images for Multimodal Question Answering

Cited by: 0
Authors
Luo, Haohao [1]
Shen, Ying [1]
Deng, Yang [2]
Affiliations
[1] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Guangzhou, Peoples R China
[2] Natl Univ Singapore, Singapore, Singapore
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Multimodal question answering (MMQA), which aims to derive the answer from multiple knowledge modalities (e.g., text, tables, and images), has received increasing attention due to its broad applications. Current approaches to MMQA often rely on single-modal or bimodal QA models, which limits their ability to effectively integrate information across all modalities and leverage the power of pretrained language models. To address these limitations, we propose a novel framework called UniMMQA, which unifies three different input modalities into a text-to-text format by employing position-enhanced table linearization and diversified image captioning techniques. Additionally, we enhance cross-modal reasoning by incorporating a multimodal rationale generator, which produces textual descriptions of cross-modal relations for adaptation into the text-to-text generation process. Experimental results on three MMQA benchmark datasets show the superiority of UniMMQA in both supervised and unsupervised settings.
Pages: 9355-9367
Page count: 13
Related papers
50 in total
  • [21] Unifying the Video and Question Attentions for Open-Ended Video Question Answering
    Xue, Hongyang
    Zhao, Zhou
    Cai, Deng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2017, 26 (12) : 5656 - 5666
  • [22] QAlayout: Question Answering Layout Based on Multimodal Attention for Visual Question Answering on Corporate Document
    Mahamoud, Ibrahim Souleiman
    Coustaty, Mickael
    Joseph, Aurelie
    d'Andecy, Vincent Poulain
    Ogier, Jean-Marc
    DOCUMENT ANALYSIS SYSTEMS, DAS 2022, 2022, 13237 : 659 - 673
  • [23] Visual Question Answering on 360° Images
    Chou, Shih-Han
    Chao, Wei-Lun
    Lai, Wei-Sheng
    Sun, Min
    Yang, Ming-Hsuan
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1596 - 1605
  • [24] A text mining approach for definition question answering
    Denicia-Carral, Claudia
    Montes-y-Gomez, Manuel
    Villasenor-Pineda, Luis
    Garcia Hernandez, Rene
    ADVANCES IN NATURAL LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4139 : 76 - 86
  • [25] Definitional Question Answering Using Text Triplets
    Kumar, Chandan
    Anirudh, Ch Ram
    Murthy, Kavi Narayana
    DATA ENGINEERING AND COMMUNICATION TECHNOLOGY, ICDECT-2K19, 2020, 1079 : 119 - 130
  • [26] IBQAst: A Question Answering System for Text Transcriptions
    Pardino, Maria
    Gomez, Jose M.
    Llorens, Hector
    Munoz-Terol, Rafael
    Navarro-Colorado, Borja
    Saquete, Estela
    Martinez-Barco, Patricio
    Moreda, Paloma
    Palomar, Manuel
    EVALUATING SYSTEMS FOR MULTILINGUAL AND MULTIMODAL INFORMATION ACCESS, 2009, 5706 : 488 - 491
  • [27] Interpretable Question Answering on Knowledge Bases and Text
    Sydorova, Alona
    Poerner, Nina
    Roth, Benjamin
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 4943 - 4951
  • [28] ViOCRVQA: novel benchmark dataset and VisionReader for visual question answering by understanding Vietnamese text in images
    Pham, Huy Quang
    Nguyen, Thang Kien-Bao
    Nguyen, Quan Van
    Tran, Dan Quang
    Nguyen, Nghia Hieu
    Nguyen, Kiet Van
    Nguyen, Ngan Luu-Thuy
    MULTIMEDIA SYSTEMS, 2025, 31 (02)
  • [29] Question Answering with Texts and Tables Through Deep Reinforcement Learning
    Jose, Marcos M.
    Cacao, Flavio N.
    Ribeiro, Maria F.
    Cheang, Rafael M.
    Pirozelli, Paulo
    Cozman, Fabio G.
    INTELLIGENT SYSTEMS, BRACIS 2024, PT II, 2025, 15413 : 339 - 353
  • [30] TEMPTABQA: Temporal Question Answering for Semi-Structured Tables
    Gupta, Vivek
    Kandoi, Pranshu
    Vora, Mahek Bhavesh
    Zhang, Shuo
    He, Yujie
    Reinanda, Ridho
    Srikumar, Vivek
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 2431 - 2453