CORAL: Collaborative Automatic Labeling System based on Large Language Models

Cited by: 0

Authors:

Zhu, Zhen [1]

Wang, Yibo [1]

Yang, Shouqing [1]

Long, Lin [1]

Wu, Runze [2]

Tang, Xiu [1]

Zhao, Junbo [1]

Wang, Haobo [1]

Affiliations:

[1] Zhejiang University, Hangzhou, People's Republic of China

[2] NetEase Fuxi AI Lab, Hangzhou, People's Republic of China

Source:

Proceedings of the VLDB Endowment | 2024 / Volume 17 / Issue 12

DOI:

10.14778/3685800.3685885

Chinese Library Classification:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

In the era of big data, data annotation is integral to numerous applications. However, it is widely acknowledged as a laborious and time-consuming process that significantly impedes the scalability and efficiency of data-driven applications. To reduce the human cost, we demonstrate CORAL, a collaborative automatic labeling system driven by large language models (LLMs) that achieves high-quality annotation with minimal human effort. First, CORAL employs an LLM to automatically annotate large datasets, generating coarse-grained labels. Next, a weakly supervised learning module trains small language models (SLMs) with noisy-label learning techniques to distill accurate labels from the LLM's annotations. The module also supports statistical analysis of model outcomes to identify potentially erroneous labels, reducing the human cost of error detection. Furthermore, CORAL supports iterative refinement by LLMs and SLMs using manually corrected labels, ensuring continual improvement in annotation quality and model performance. A visual interface enables monitoring of the annotation process and analysis of results.
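The error-detection step mentioned in the abstract can be illustrated with a minimal sketch. This is not CORAL's actual method, which the abstract does not specify; it assumes one simple heuristic, namely flagging a sample for human review when the trained SLM assigns low probability to the label the LLM produced. The function name `flag_suspect_labels`, the `threshold` parameter, and the toy data are all hypothetical.

```python
from typing import Dict, List, Sequence


def flag_suspect_labels(
    llm_labels: Sequence[str],
    slm_probs: Sequence[Dict[str, float]],
    threshold: float = 0.5,
) -> List[int]:
    """Return indices of samples whose LLM label the SLM finds unlikely.

    Assumed heuristic (not taken from the paper): a label is suspect when
    the SLM's predicted probability for it falls below `threshold`.
    """
    suspects = []
    for index, (label, probs) in enumerate(zip(llm_labels, slm_probs)):
        # A class missing from the SLM's output is treated as probability 0.
        if probs.get(label, 0.0) < threshold:
            suspects.append(index)
    return suspects


# Toy data: coarse labels from the LLM and class probabilities from the SLM.
llm_labels = ["pos", "neg", "pos", "neg"]
slm_probs = [
    {"pos": 0.92, "neg": 0.08},
    {"pos": 0.81, "neg": 0.19},  # SLM disagrees with the LLM's "neg"
    {"pos": 0.55, "neg": 0.45},
    {"pos": 0.10, "neg": 0.90},
]

print(flag_suspect_labels(llm_labels, slm_probs))  # [1]
```

Only the flagged indices would need manual review under this heuristic; raising `threshold` sends more borderline samples to annotators at a higher review cost.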

Pages: 4401-4404

Page count: 4
