CORAL: Collaborative Automatic Labeling System based on Large Language Models

Authors
Zhu, Zhen [1]
Wang, Yibo [1]
Yang, Shouqing [1]
Long, Lin [1]
Wu, Runze [2]
Tang, Xiu [1]
Zhao, Junbo [1]
Wang, Haobo [1]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] NetEase Fuxi AI Lab, Hangzhou, Peoples R China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2024 / Vol. 17 / No. 12
DOI
10.14778/3685800.3685885
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the era of big data, data annotation is integral to numerous applications. However, it is widely acknowledged to be laborious and time-consuming, which significantly limits the scalability and efficiency of data-driven applications. To reduce this human cost, we demonstrate CORAL, a collaborative automatic labeling system driven by large language models (LLMs) that achieves high-quality annotation with minimal human effort. First, CORAL uses an LLM to automatically annotate large datasets, producing coarse-grained labels. Next, a weakly supervised learning module trains small language models (SLMs) with noisy-label learning techniques to distill accurate labels from the LLM's annotations. CORAL also enables statistical analysis of model outputs to identify potentially erroneous labels, reducing the human cost of error detection. Furthermore, CORAL supports iterative refinement by the LLMs and SLMs using manually corrected labels, ensuring continual improvement in annotation quality and model performance. A visual interface supports monitoring of the annotation process and analysis of the results.
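The abstract says that statistics over the SLM's outputs are used to flag potentially erroneous LLM labels, but it does not name the statistic. As an illustration only, the sketch below swaps in one common noisy-label heuristic: average, across training epochs, the probability the SLM assigns to the LLM's label, and flag samples whose average falls below a threshold, least confident first. The function name `flag_suspicious_labels`, the `prob_history` layout, and the default threshold of 0.5 are hypothetical assumptions, not CORAL's actual method or API.

```python
from __future__ import annotations

from statistics import mean
from typing import Sequence


def flag_suspicious_labels(
    llm_labels: Sequence[int],
    prob_history: Sequence[Sequence[Sequence[float]]],
    threshold: float = 0.5,
) -> list[int]:
    """Return indices of samples whose LLM label looks unreliable.

    Hypothetical layout: prob_history[e][i][c] is the SLM's predicted
    probability of class c for sample i after training epoch e.
    """
    if not prob_history:
        raise ValueError("prob_history must contain at least one epoch")
    scored = []
    for i, label in enumerate(llm_labels):
        # Average confidence the SLM places on the LLM-assigned label.
        avg_conf = mean(epoch_probs[i][label] for epoch_probs in prob_history)
        if avg_conf < threshold:
            scored.append((avg_conf, i))
    # Least confident first, so reviewers see the most suspicious items first.
    return [i for _, i in sorted(scored)]


# Toy example: 4 samples, 2 classes, 2 recorded epochs.
labels = [0, 1, 1, 0]
history = [
    [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]],
    [[0.95, 0.05], [0.1, 0.9], [0.8, 0.2], [0.3, 0.7]],
]
print(flag_suspicious_labels(labels, history))  # [2, 3]
```

In a CORAL-style workflow, the flagged indices would be the candidates routed to human reviewers, and their corrected labels would then feed the next round of SLM training; that wiring is likewise an assumption made for illustration.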
Pages: 4401-4404
Page count: 4