CORAL: Collaborative Automatic Labeling System based on Large Language Models

Citations: 0
Authors
Zhu, Zhen [1]
Wang, Yibo [1]
Yang, Shouqing [1]
Long, Lin [1]
Wu, Runze [2]
Tang, Xiu [1]
Zhao, Junbo [1]
Wang, Haobo [1]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] NetEase Fuxi AI Lab, Hangzhou, Peoples R China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2024, Vol. 17, No. 12
Keywords
DOI
10.14778/3685800.3685885
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In the era of big data, data annotation is integral to numerous applications. However, it is widely acknowledged to be laborious and time-consuming, which significantly impedes the scalability and efficiency of data-driven applications. To reduce the human cost, we demonstrate CORAL, a collaborative automatic labeling system driven by large language models (LLMs), which achieves high-quality annotation with minimal human effort. First, CORAL employs an LLM to automatically annotate vast datasets, generating coarse-grained labels. Next, a weakly supervised learning module trains small language models (SLMs) using noisy-label learning techniques to distill accurate labels from the LLM's annotations. The module also supports statistical analysis of model outcomes to identify potentially erroneous labels, reducing the human cost of error detection. Furthermore, CORAL supports iterative refinement of the LLMs and SLMs using manually corrected labels, thereby ensuring continual improvement in annotation quality and model performance. A visual interface enables monitoring of the annotation process and analysis of the results.
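The error-detection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistic CORAL uses, so the rule below is an assumption: a sample is flagged when the small model assigns low probability to the label the LLM gave it. The function name `flag_suspect_labels`, the `threshold` parameter, and the toy data are all hypothetical.

```python
from typing import Dict, List, Sequence


def flag_suspect_labels(
    llm_labels: Sequence[str],
    slm_probs: Sequence[Dict[str, float]],
    threshold: float = 0.5,
) -> List[int]:
    """Return indices whose LLM label gets low probability from the small model.

    Assumed heuristic for illustration only, not CORAL's documented method.
    """
    if len(llm_labels) != len(slm_probs):
        raise ValueError("llm_labels and slm_probs must have the same length")
    suspects = []
    for i, (label, probs) in enumerate(zip(llm_labels, slm_probs)):
        # A label the small model never predicts counts as probability 0.
        if probs.get(label, 0.0) < threshold:
            suspects.append(i)
    return suspects


# Toy data: three samples labeled by the LLM, with per-class
# probabilities from a hypothetical trained small model.
llm_labels = ["pos", "neg", "pos"]
slm_probs = [
    {"pos": 0.9, "neg": 0.1},
    {"pos": 0.8, "neg": 0.2},
    {"pos": 0.6, "neg": 0.4},
]
print(flag_suspect_labels(llm_labels, slm_probs))  # prints [1]
```

Only the flagged indices would need human review under this heuristic, which is one way the human cost of error detection could be reduced; the corrected labels could then feed the next refinement round.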
Pages: 4401-4404
Page count: 4