Kern: A Labeling Environment for Large-Scale, High-Quality Training Data

Cited by: 0
Authors
Hoetter, Johannes [1]
Wenck, Henrik [1]
Feuerpfeil, Moritz [1]
Witzke, Simon [1]
Affiliations
[1] Kern AI, Gerhart Hauptmann Allee 71, D-15732 Eichwalde, Germany
Keywords
Data labeling; Data management; Supervised learning
DOI
10.1007/978-3-031-08473-7_46
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The lack of large-scale, high-quality training data is a significant bottleneck in supervised learning. We introduce kern, a labeling environment powered by weak supervision, active transfer learning, and confident learning, which machine learning experts and subject matter experts use to create training data and find manual labeling errors. We explain the current workflow, give a system overview, and showcase the benefits of our system in an intent classification experiment, where we reduce the labeling error rate of a given dataset by an absolute 4.9% while improving the F1 score of a baseline classifier by a total of 9.7%.
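The abstract names confident learning as one of the techniques used to find manual labeling errors, but does not describe how it is applied. As a rough illustration of the general idea only (not kern's implementation), the sketch below flags an example when a classifier is confidently predicting a class other than the given label. The function names, the per-class mean threshold, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def class_thresholds(labels, probs):
    """Per-class threshold: mean predicted probability of class k
    over the examples whose given label is k (an assumed, common choice)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y, p in zip(labels, probs):
        sums[y] += p[y]
        counts[y] += 1
    return {k: sums[k] / counts[k] for k in counts}


def find_label_issues(labels, probs):
    """Return indices of examples whose confident class differs from the given label."""
    thresholds = class_thresholds(labels, probs)
    issues = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        # Classes the model predicts at or above that class's threshold.
        confident = [k for k in thresholds if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class; skip
        guess = max(confident, key=lambda k: p[k])
        if guess != y:
            issues.append(i)
    return issues


# Hypothetical toy data: given labels and predicted class probabilities.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.20, 0.80],  # labeled 0, but the model is confident in class 1
    [0.10, 0.90],
    [0.30, 0.70],
    [0.85, 0.15],  # labeled 1, but the model is confident in class 0
]
print(find_label_issues(toy_labels, toy_probs))
```

In practice the probabilities would come from out-of-sample predictions (for example, cross-validation), and the flagged indices would be handed back to a human for review rather than relabeled automatically.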
Pages: 502-507
Number of pages: 6