CoLAL: Co-learning Active Learning for Text Classification

Cited by: 0

Authors:

Le, Linh [1]

Zhao, Genghong [2]

Zhang, Xia [3]

Zuccon, Guido [1]

Demartini, Gianluca [1]

Affiliations:

[1] Univ Queensland, St Lucia, Qld, Australia

[2] Neusoft Res Intelligent Healthcare Technol Co Ltd, Shenyang, Peoples R China

[3] Neusoft Corp, Shenyang, Peoples R China

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12 | 2024

Funding:

Swiss National Science Foundation

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In machine learning, learning effectively from limited data has become increasingly crucial. Active Learning (AL) algorithms play a significant role here by enhancing model performance. We introduce a novel AL algorithm, termed Co-learning (CoLAL), designed to select the most diverse and representative samples within a training dataset. This approach utilizes noisy labels and predictions made by the primary model on unlabeled data. By leveraging a probabilistic graphical model, we combine two multi-class classifiers into a binary one. This classifier determines whether the main and the peer models agree on a prediction. If they do, the unlabeled sample is assumed to be easy to classify and is thus not beneficial for increasing the target model's performance. We prioritize data that represents the unlabeled set without overlapping decision boundaries. The discrepancies between these boundaries can be estimated by the probability that the two models produce the same prediction. Through theoretical analysis and experimental validation, we show that integrating noisy labels into the peer model effectively identifies the target model's potential inaccuracies. We evaluated CoLAL on seven benchmark datasets: four text datasets (AGNews, DBPedia, PubMed, SST-2), compared against text-based state-of-the-art (SOTA) baselines, and three image datasets (CIFAR100, MNIST, OpenML-155), compared against computer vision SOTA baselines. The results show that CoLAL significantly outperforms existing SOTA in text-based AL and is competitive with SOTA image-based AL techniques.
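The agreement idea in the abstract can be illustrated with a minimal sketch. It is not the authors' probabilistic graphical model: it assumes the main and peer predictions are independent given the input, so the probability that both models output the same class is the sum over classes of the product of their class probabilities. It also covers only the agreement signal, not the diversity or representativeness criteria. The function names `agreement_probability` and `select_for_labeling` and the toy probabilities are hypothetical.

```python
from typing import List, Sequence


def agreement_probability(p_main: Sequence[float], p_peer: Sequence[float]) -> float:
    """Probability that two classifiers predict the same class.

    Assumption (not from the abstract): the two predictions are
    independent given the input, so P(agree) = sum_k p_main[k] * p_peer[k].
    """
    if len(p_main) != len(p_peer):
        raise ValueError("both distributions must cover the same classes")
    return sum(a * b for a, b in zip(p_main, p_peer))


def select_for_labeling(
    main_probs: Sequence[Sequence[float]],
    peer_probs: Sequence[Sequence[float]],
    budget: int,
) -> List[int]:
    """Return indices of the `budget` unlabeled samples with the lowest
    agreement probability, i.e. those where the models most likely disagree."""
    scores = [agreement_probability(m, p) for m, p in zip(main_probs, peer_probs)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranked[:budget]


# Toy class-probability outputs for three unlabeled samples, two classes.
main_probs = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
peer_probs = [[0.8, 0.2], [0.4, 0.6], [0.9, 0.1]]

selected = select_for_labeling(main_probs, peer_probs, budget=1)
print(selected)
```

In this toy example the third sample is ranked first because the two models place most of their probability mass on different classes, which matches the abstract's premise that samples the models agree on are assumed easy and are deprioritized.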


Pages: 13337-13345

Page count: 9
