CORAL: Collaborative Automatic Labeling System based on Large Language Models

Cited by: 0

Authors:

Zhu, Zhen [1]
Wang, Yibo [1]
Yang, Shouqing [1]
Long, Lin [1]
Wu, Runze [2]
Tang, Xiu [1]
Zhao, Junbo [1]
Wang, Haobo [1]

Affiliations:

[1] Zhejiang University, Hangzhou, People's Republic of China

[2] NetEase Fuxi AI Lab, Hangzhou, People's Republic of China

Source:

Proceedings of the VLDB Endowment | 2024 / Vol. 17 / No. 12


DOI:

10.14778/3685800.3685885

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

In the era of big data, data annotation is integral to numerous applications. However, it is widely acknowledged as a laborious and time-consuming process that significantly impedes the scalability and efficiency of data-driven applications. To reduce this human cost, we demonstrate CORAL, a collaborative automatic labeling system driven by large language models (LLMs) that achieves high-quality annotation with minimal human effort. First, CORAL employs an LLM to automatically annotate large datasets, generating coarse-grained labels. Next, a weakly supervised learning module trains small language models (SLMs) with noisy-label learning techniques to distill accurate labels from the LLM's annotations. This module also enables statistical analysis of model outputs to identify potentially erroneous labels, reducing the human cost of error detection. Furthermore, CORAL supports iterative refinement of both the LLMs and the SLMs using manually corrected labels, ensuring continual improvement in annotation quality and model performance. Finally, a visual interface lets users monitor the annotation process and analyze the results.
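The abstract states that statistical analysis of SLM outputs is used to flag potentially erroneous LLM labels, but it does not name the statistic. As an illustration only, the sketch below swaps in a confident-learning-style filter: a per-class threshold equal to the average self-confidence of that class, and a flag whenever the class the SLM confidently predicts differs from the LLM's label. This is not CORAL's published method. It assumes `probs` holds out-of-sample class probabilities from the trained SLM and `labels` holds the LLM-assigned labels; the function names `per_class_thresholds` and `flag_suspicious` are hypothetical.

```python
from typing import List, Sequence


def per_class_thresholds(
    probs: Sequence[Sequence[float]], labels: Sequence[int], num_classes: int
) -> List[float]:
    """Per-class threshold t_j: mean predicted probability of class j over the
    examples whose (assumed) LLM-assigned label is j, i.e. average self-confidence."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for p, y in zip(probs, labels):
        sums[y] += p[y]
        counts[y] += 1
    # A class with no labeled examples gets an infinite threshold,
    # so no example can be confidently assigned to it.
    return [s / c if c else float("inf") for s, c in zip(sums, counts)]


def flag_suspicious(
    probs: Sequence[Sequence[float]], labels: Sequence[int], num_classes: int
) -> List[int]:
    """Return indices whose LLM label disagrees with the class the SLM is
    confidently predicting, ordered lowest self-confidence first."""
    t = per_class_thresholds(probs, labels, num_classes)
    flagged = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        # Classes whose predicted probability clears that class's threshold.
        confident = [j for j in range(num_classes) if p[j] >= t[j]]
        if not confident:
            continue  # the SLM is not confident about any class; leave it alone
        guess = max(confident, key=lambda j: p[j])
        if guess != y:
            flagged.append(i)
    # Hypothetical review policy: least self-confident labels go to humans first.
    flagged.sort(key=lambda i: probs[i][labels[i]])
    return flagged


if __name__ == "__main__":
    # Toy data: LLM labels and out-of-sample SLM probabilities for 2 classes.
    llm_labels = [0, 0, 0, 1, 1, 1, 1]
    slm_probs = [
        [0.9, 0.1], [0.8, 0.2], [0.15, 0.85],
        [0.1, 0.9], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3],
    ]
    print(flag_suspicious(slm_probs, llm_labels, num_classes=2))  # [2, 6]
```

In a human-in-the-loop setup like the one the abstract describes, the flagged indices would be the candidates queued for manual correction, and the corrected labels could then be reused to refine the LLM and retrain the SLM. The out-of-sample assumption matters: probabilities taken from an SLM scored on its own training examples tend to be overconfident and can hide exactly the errors this filter is meant to surface.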


Pages: 4401-4404

Page count: 4
