Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study

Cited by: 1
Authors
Sui, Yuan [1 ,4 ]
Zhou, Mengyu [2 ]
Zhou, Mingjie [3 ,4 ]
Han, Shi [2 ]
Zhang, Dongmei [2 ]
Affiliations
[1] Natl Univ Singapore, Singapore, Singapore
[2] Microsoft, Beijing, Peoples R China
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] Microsoft Res Asia, Beijing, Peoples R China
Keywords
large language models; semi-structured data; structural understanding capabilities; benchmark;
DOI
10.1145/3616855.3635752
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large language models (LLMs) are becoming attractive as few-shot reasoners for solving Natural Language (NL)-related tasks. However, there is still much to learn about how well LLMs understand structured data, such as tables. Although tables can be serialized as input to LLMs, there is a lack of comprehensive studies examining whether LLMs can truly comprehend such data. In this paper, we try to understand this by designing a benchmark to evaluate the structural understanding capabilities (SUC) of LLMs. The benchmark we create includes seven tasks, each with its own unique challenges, e.g., cell lookup, row retrieval, and size detection. We perform a series of evaluations on GPT-3.5 and GPT-4 and find that performance varies depending on several input choices, including table input format, content order, role prompting, and partition marks. Drawing on the insights gained from the benchmark evaluations, we propose self-augmentation for effective structural prompting, such as critical value / range identification using the internal knowledge of LLMs. When combined with carefully chosen input choices, these structural prompting methods lead to promising improvements in LLM performance on a variety of tabular tasks, e.g., TabFact (↑2.31%), HybridQA (↑2.13%), SQA (↑2.72%), Feverous (↑0.84%), and ToTTo (↑5.68%). We believe that our open-source benchmark and proposed prompting methods can serve as a simple yet generic selection for future research.
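To make the two ideas described in the abstract concrete, the sketch below illustrates, under stated assumptions, how a table might be serialized with "|" partition marks and a role prompt, and how self-augmentation can be realized as a two-step prompt in which the LLM is first asked to identify critical values / ranges and its output is then fed back into the downstream task prompt. The helper names (serialize_table, self_augmented_prompt) and the call_llm stub are hypothetical illustrations, not the authors' released code.

# Illustrative sketch, not the authors' implementation: (1) serialize a table
# with "|" partition marks and a role prompt; (2) build a two-step
# self-augmented prompt (first extract structural hints, then answer).
from typing import Callable, List

def serialize_table(header: List[str], rows: List[List[str]]) -> str:
    """Serialize a table in a markup-like format using '|' partition marks."""
    lines = ["| " + " | ".join(header) + " |"]
    lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in rows]
    return "\n".join(lines)

ROLE_PROMPT = "You are a helpful assistant that reads tables carefully."

def self_augmented_prompt(table_text: str,
                          question: str,
                          call_llm: Callable[[str], str]) -> str:
    """Build a task prompt enriched with LLM-generated structural hints."""
    # Step 1: ask the model to surface critical values / ranges in the table.
    hint_request = (
        f"{ROLE_PROMPT}\n\nTable:\n{table_text}\n\n"
        "Identify the critical values and value ranges in this table."
    )
    hints = call_llm(hint_request)
    # Step 2: feed the hints back in as extra context for the actual task.
    return (
        f"{ROLE_PROMPT}\n\nTable:\n{table_text}\n\n"
        f"Structural hints:\n{hints}\n\nQuestion: {question}\nAnswer:"
    )

if __name__ == "__main__":
    table = serialize_table(["Year", "Revenue"],
                            [["2022", "1.2M"], ["2023", "1.9M"]])
    # A stub LLM so the sketch runs without any API access.
    stub = lambda prompt: "Revenue ranges from 1.2M to 1.9M over 2022-2023."
    print(self_augmented_prompt(table, "Which year had higher revenue?", stub))

In this reading, the first call plays the role of the self-augmentation step and the second the downstream tabular task; replacing the stub with a real GPT-3.5 or GPT-4 call would follow the same two-step flow the abstract describes.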
Pages: 645-654
Page count: 10