Annotating Columns with Pre-trained Language Models

Cited by: 19
Authors
Suhara, Yoshihiko [1 ]
Li, Jinfeng [1 ]
Li, Yuliang [1 ]
Zhang, Dan [1 ]
Demiralp, Cagatay [2 ]
Chen, Chen [1 ]
Tan, Wang-Chiew [3 ]
Affiliations
[1] Megagon Labs, Mountain View, CA 94041, USA
[2] Sigma Computing, San Francisco, CA, USA
[3] Meta AI, Menlo Park, CA, USA
Keywords
table understanding; language models; multi-task learning; tables
DOI
10.1145/3514221.3517906
CLC Classification Number
TP [automation technology; computer technology]
Subject Classification Code
0812
Abstract
Inferring meta information about tables, such as column headers or relationships between columns, is an active research topic in data management, as many tables are missing some of this information. In this paper, we study the problem of annotating table columns (i.e., predicting column types and the relationships between columns) using only information from the table itself. We develop a multi-task learning framework (called DODUO) based on pre-trained language models, which takes the entire table as input and predicts column types/relations using a single model. Experimental results show that DODUO establishes new state-of-the-art performance on two benchmarks for the column type prediction and column relation prediction tasks, with up to 4.0% and 11.9% improvements, respectively. We also report that DODUO can outperform the previous state of the art with a minimal number of tokens, only 8 tokens per column. We release a toolbox and confirm the effectiveness of DODUO on a real-world data science problem through a case study.
Pages: 1493-1503
Number of pages: 11
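
Note: The abstract describes a single pre-trained language model that takes the entire serialized table as input and predicts both column types and inter-column relations through jointly trained heads. Below is a minimal, illustrative sketch of such a single-model, multi-task setup in Python, assuming a BERT encoder and tokenizer from the Hugging Face transformers library. The class name, the subject-column pairing scheme, and the label-set sizes (78 types, 121 relations) are assumptions made for illustration; this is not the released DODUO toolbox.

import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

class ColumnAnnotator(nn.Module):
    # One shared encoder with two task heads: column types and column-pair relations.
    def __init__(self, num_types, num_relations):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.encoder.config.hidden_size
        self.type_head = nn.Linear(hidden, num_types)         # one label per column
        self.rel_head = nn.Linear(2 * hidden, num_relations)  # one label per column pair

    def forward(self, input_ids, attention_mask, cls_positions):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # One contextual embedding per column: the hidden state at its [CLS] marker.
        col_vecs = out.last_hidden_state[0, cls_positions]    # (num_cols, hidden)
        type_logits = self.type_head(col_vecs)
        # Pair the first (subject) column with every other column for relations.
        pairs = torch.cat([col_vecs[:1].expand_as(col_vecs[1:]), col_vecs[1:]], dim=-1)
        return type_logits, self.rel_head(pairs)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Serialize the whole table: each column's cell values are concatenated,
# and every column is prefixed with its own [CLS] marker.
columns = [["Tokyo", "Paris", "Lima"], ["Japan", "France", "Peru"]]
text = " ".join("[CLS] " + " ".join(col) for col in columns)
enc = tokenizer(text, return_tensors="pt", add_special_tokens=False)
cls_positions = (enc.input_ids[0] == tokenizer.cls_token_id).nonzero().squeeze(-1)

model = ColumnAnnotator(num_types=78, num_relations=121)  # assumed label-set sizes
type_logits, rel_logits = model(enc.input_ids, enc.attention_mask, cls_positions)
print(type_logits.shape, rel_logits.shape)  # torch.Size([2, 78]) torch.Size([1, 121])

Sharing one encoder between the two heads is the multi-task element the abstract refers to: both annotation tasks are learned from the same serialized-table representation, so a single model serves both.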