Approximating Learning Curves for Imbalanced Big Data with Limited Labels

Cited by: 0

Authors:

Richter, Aaron N. [1]

Khoshgoftaar, Taghi M. [1]

Affiliations:

[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA

Source:

2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019) | 2019

Keywords:

learning curve; semi-supervised learning; limited labels; big data; class imbalance

DOI:

10.1109/ICTAI.2019.00041

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Labeling data for supervised learning can be an expensive task, especially when large amounts of data are required to build an adequate classifier. For most problems, there exists a point of diminishing returns on a learning curve where adding more data only marginally increases model performance. It would be beneficial to approximate this point for scenarios where there is a large amount of data available but only a small amount of labeled data. Then, time and resources can be spent wisely to label the sample that is required for acceptable model performance. In this study, we explore learning curve approximation methods on a big imbalanced dataset from the bioinformatics domain. We evaluate a curve fitting method developed on small data using an inverse power law model, and propose a new semi-supervised method to take advantage of the large amount of unlabeled data. We find that the traditional curve fitting method is not effective for large sample sizes, while the semi-supervised method more accurately identifies the point of diminishing returns.
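The inverse power law fit mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common form score(n) = a - b * n^(-c), fits it by a grid search over c with closed-form least squares for a and b, and treats the point of diminishing returns as the first sample size at which doubling the data gains less than a chosen threshold. The functional form, the function names, the threshold, and the synthetic numbers are all illustrative assumptions, not details taken from the paper, and the abstract itself reports that this kind of fit is not effective for large sample sizes.

```python
def fit_inverse_power_law(sizes, scores, c_grid=None):
    """Fit score(n) ~ a - b * n**(-c).

    For a fixed exponent c the model is linear in a and b, so c is chosen
    by grid search and (a, b) by ordinary least squares. Assumed form only.
    """
    if c_grid is None:
        c_grid = [i / 100 for i in range(1, 201)]  # c from 0.01 to 2.00
    m = len(sizes)
    y_mean = sum(scores) / m
    best = None
    for c in c_grid:
        z = [-(n ** -c) for n in sizes]  # regressor: score = a + b * z
        z_mean = sum(z) / m
        szz = sum((zi - z_mean) ** 2 for zi in z)
        if szz == 0:
            continue
        b = sum((zi - z_mean) * (yi - y_mean)
                for zi, yi in zip(z, scores)) / szz
        a = y_mean - b * z_mean
        sse = sum((a + b * zi - yi) ** 2 for zi, yi in zip(z, scores))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c


def predict(n, a, b, c):
    """Predicted score at sample size n under the fitted curve."""
    return a - b * n ** (-c)


def diminishing_returns_point(a, b, c, min_gain=0.001, start=100, max_n=10**9):
    """First n (searched by doubling from `start`) at which doubling the
    sample improves the predicted score by less than `min_gain`.
    Returns None if no such n is found below max_n."""
    n = start
    while n < max_n:
        if predict(2 * n, a, b, c) - predict(n, a, b, c) < min_gain:
            return n
        n *= 2
    return None


if __name__ == "__main__":
    # Synthetic, made-up learning-curve points (not results from the paper).
    sizes = [100, 200, 400, 800, 1600]
    scores = [0.9 - 1.5 * n ** -0.5 for n in sizes]
    a, b, c = fit_inverse_power_law(sizes, scores)
    print("fitted parameters:", a, b, c)
    print("estimated point of diminishing returns:",
          diminishing_returns_point(a, b, c))
```

The curve is fitted on a few small labeled samples and then extrapolated, which is why the estimate is sensitive to the fitted exponent c when the target sample size is far beyond the observed range.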

Pages: 237-242

Page count: 6

