CORAL: Collaborative Automatic Labeling System based on Large Language Models

Times cited: 0
Authors
Zhu, Zhen [1]
Wang, Yibo [1]
Yang, Shouqing [1]
Long, Lin [1]
Wu, Runze [2]
Tang, Xiu [1]
Zhao, Junbo [1]
Wang, Haobo [1]
Affiliations
[1] Zhejiang University, Hangzhou, People's Republic of China
[2] NetEase Fuxi AI Lab, Hangzhou, People's Republic of China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2024 / Vol. 17 / No. 12
DOI
10.14778/3685800.3685885
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the era of big data, data annotation is integral to numerous applications. However, it is widely acknowledged as a laborious and time-consuming process that significantly impedes the scalability and efficiency of data-driven applications. To reduce this human cost, we demonstrate CORAL, a collaborative automatic labeling system driven by large language models (LLMs) that achieves high-quality annotation with minimal human effort. First, CORAL employs an LLM to automatically annotate large datasets, generating coarse-grained labels. Next, a weakly-supervised learning module trains small language models (SLMs) with noisy-label learning techniques to distill accurate labels from the LLM's annotations. It also enables statistical analysis of model outputs to identify potentially erroneous labels, reducing the human cost of error detection. Furthermore, CORAL supports iterative refinement by the LLM and SLMs using manually corrected labels, ensuring continual improvement in annotation quality and model performance. Finally, a visual interface enables monitoring of the annotation process and analysis of the results.
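The abstract does not say how CORAL's statistical analysis flags erroneous labels. Purely as an illustration, the Python sketch below assumes one common heuristic from noisy-label learning: average the small model's predicted class probabilities over several training checkpoints, then flag a sample when the averaged confidence in the LLM-assigned label falls below a threshold or when the model's preferred class disagrees with that label. Flagged samples are returned most-suspicious first, so a human reviewer could correct them in priority order. The function name `flag_suspicious_labels`, the `threshold` parameter, and the checkpoint-averaging scheme are hypothetical and are not CORAL's actual API or algorithm.

```python
from statistics import fmean


def flag_suspicious_labels(llm_labels, prob_history, threshold=0.5):
    """Hypothetical helper: rank LLM-assigned labels that look unreliable.

    llm_labels:   list[int], the coarse label the LLM gave each sample.
    prob_history: list over checkpoints (e.g. epochs); each entry holds one
                  class-probability vector per sample, predicted by the SLM.
    threshold:    assumed cut-off on the averaged confidence in the LLM label.
    """
    scored = []
    for i, label in enumerate(llm_labels):
        n_classes = len(prob_history[0][i])
        # Average the SLM's probability for each class across all checkpoints.
        avg = [fmean(ckpt[i][c] for ckpt in prob_history) for c in range(n_classes)]
        # Class the SLM prefers after averaging.
        predicted = max(range(n_classes), key=avg.__getitem__)
        # Flag low confidence in the LLM label, or outright disagreement.
        if avg[label] < threshold or predicted != label:
            scored.append((avg[label], i))
    # Lowest confidence in the LLM label first, i.e. most suspicious first.
    return [i for _, i in sorted(scored)]


# Toy example: 4 samples, 2 classes, 2 checkpoints.
llm_labels = [0, 1, 1, 0]
prob_history = [
    [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]],  # checkpoint 1
    [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]],  # checkpoint 2
]
print(flag_suspicious_labels(llm_labels, prob_history))  # → [3, 2]
```

Averaging over checkpoints rather than trusting a single final model is a design choice meant to damp the effect of the SLM memorizing noisy labels late in training; other signals, such as per-sample loss, would fit the same ranking interface.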
Pages: 4401-4404
Page count: 4