ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models

Authors
Feuer, Benjamin [1]
Liu, Yurong [1]
Hegde, Chinmay [1]
Freire, Juliana [1]
Affiliation
[1] NYU, New York, NY 10016 USA
Source
Proceedings of the VLDB Endowment | 2024 / Vol. 17 / No. 9
DOI
10.14778/3665844.3665857
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Existing deep-learning approaches to semantic column type annotation (CTA) have important shortcomings: they rely on semantic types that are fixed at training time, require a large number of training samples per type, incur high run-time inference costs, and can degrade in performance when evaluated on novel datasets, even when the types remain constant. Large language models have exhibited strong zero-shot classification performance on a wide range of tasks, and in this paper we explore their use for CTA. We introduce ArcheType, a simple, practical method for context sampling, prompt serialization, model querying, and label remapping, which enables large language models to solve CTA problems in a fully zero-shot manner. We ablate each component of our method separately and establish that improvements to context sampling and label remapping provide the most consistent gains. ArcheType sets a new state of the art on zero-shot CTA benchmarks (including three new domain-specific benchmarks, which we release along with this paper), and when used in conjunction with classical CTA techniques, it outperforms a SOTA DoDuo model on the fine-tuned SOTAB benchmark.
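The abstract names four stages: context sampling, prompt serialization, model querying, and label remapping. The following is a minimal, hypothetical Python sketch of how such a zero-shot CTA pipeline could be wired together; it is not the authors' implementation. The sampling rule (up to k distinct non-empty values), the prompt template, and the remapping strategy (exact match, then containment, then `difflib` string similarity) are all assumptions, and `query_model` is a placeholder for any LLM call.

```python
import difflib
import random
from typing import Callable, List, Sequence


def sample_context(column: Sequence[str], k: int = 5, seed: int = 0) -> List[str]:
    """Hypothetical context sampler: up to k distinct, non-empty cell values."""
    distinct = sorted({v.strip() for v in column if v and v.strip()})
    if len(distinct) <= k:
        return distinct
    return random.Random(seed).sample(distinct, k)


def serialize_prompt(values: List[str], labels: Sequence[str]) -> str:
    """Hypothetical prompt template listing sampled values and candidate types."""
    return (
        "Column values: " + ", ".join(values) + "\n"
        "Choose exactly one type from: " + ", ".join(labels) + "\n"
        "Answer:"
    )


def remap_label(answer: str, labels: Sequence[str]) -> str:
    """Map free-form model output onto the closed label set (assumed strategy)."""
    norm = answer.strip().lower()
    lowered = {lab.lower(): lab for lab in labels}
    # 1. Exact (case-insensitive) match.
    if norm in lowered:
        return lowered[norm]
    # 2. A label mentioned inside the answer; try longer labels first.
    for low in sorted(lowered, key=len, reverse=True):
        if low in norm:
            return lowered[low]
    # 3. Fall back to the most similar label by difflib's ratio.
    best = difflib.get_close_matches(norm, list(lowered), n=1, cutoff=0.0)
    return lowered[best[0]]


def annotate_column(column: Sequence[str], labels: Sequence[str],
                    query_model: Callable[[str], str], k: int = 5) -> str:
    """Run the four stages for one column and return a label from `labels`."""
    values = sample_context(column, k)        # context sampling
    prompt = serialize_prompt(values, labels)  # prompt serialization
    answer = query_model(prompt)               # model querying
    return remap_label(answer, labels)         # label remapping


if __name__ == "__main__":
    labels = ["Country", "City", "Date"]
    fake_llm = lambda prompt: "The answer is city."  # stand-in for a real LLM
    print(annotate_column(["Paris", "Berlin", "", "Paris"], labels, fake_llm))
```

The remapping step is what keeps the output inside the label set even when the model answers in free text; the specific fallback order here is only one plausible choice.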
Pages: 2279-2292
Page count: 14