Approximating Learning Curves for Imbalanced Big Data with Limited Labels

Cited by: 0
Authors
Richter, Aaron N. [1 ]
Khoshgoftaar, Taghi M. [1 ]
Affiliation
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Keywords
learning curve; semi-supervised learning; limited labels; big data; class imbalance
DOI
10.1109/ICTAI.2019.00041
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Labeling data for supervised learning can be an expensive task, especially when large amounts of data are required to build an adequate classifier. For most problems, there exists a point of diminishing returns on a learning curve where adding more data only marginally increases model performance. It would be beneficial to approximate this point for scenarios where there is a large amount of data available but only a small amount of labeled data. Then, time and resources can be spent wisely to label the sample that is required for acceptable model performance. In this study, we explore learning curve approximation methods on a big imbalanced dataset from the bioinformatics domain. We evaluate a curve fitting method developed on small data using an inverse power law model, and propose a new semi-supervised method to take advantage of the large amount of unlabeled data. We find that the traditional curve fitting method is not effective for large sample sizes, while the semi-supervised method more accurately identifies the point of diminishing returns.
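The curve-fitting idea in the abstract can be illustrated with a minimal sketch. It assumes the common three-parameter inverse power law score(n) = a - b * n^(-c), where a is the asymptotic performance and b, c > 0 control how quickly the curve flattens. The fitting procedure (grid search over a, then log-linear least squares for b and c), the function names `fit_inverse_power_law` and `diminishing_returns_point`, the `min_gain` threshold, and the synthetic data are all hypothetical illustrations, not the authors' exact method. The sketch treats the "point of diminishing returns" as the smallest sample size n at which the marginal gain per additional labeled sample, b * c * n^(-c-1), falls below `min_gain`.

```python
import math
from typing import List, Tuple


def fit_inverse_power_law(
    sizes: List[float], scores: List[float], upper: float = 1.0, steps: int = 2000
) -> Tuple[float, float, float]:
    """Fit score(n) = a - b * n**(-c) to observed learning-curve points.

    Hypothetical fitting scheme: for each candidate asymptote a on a grid between
    max(scores) and `upper`, the model is linear in log space,
    log(a - score) = log(b) - c * log(n), so b and c follow from ordinary least
    squares. The candidate with the smallest squared error in score space wins.
    """
    y_max = max(scores)
    if upper <= y_max:
        raise ValueError("upper must exceed the largest observed score")
    t = [math.log(n) for n in sizes]
    t_mean = sum(t) / len(t)
    stt = sum((ti - t_mean) ** 2 for ti in t)
    step = (upper - y_max) / steps
    best = None
    for k in range(1, steps + 1):
        a = y_max + k * step  # always strictly above every observed score
        z = [math.log(a - y) for y in scores]
        z_mean = sum(z) / len(z)
        slope = sum((ti - t_mean) * (zi - z_mean) for ti, zi in zip(t, z)) / stt
        c = -slope  # log(a - y) decreases with log(n) at rate c
        b = math.exp(z_mean - slope * t_mean)  # intercept is log(b)
        sse = sum((a - b * n ** (-c) - y) ** 2 for n, y in zip(sizes, scores))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1], best[2], best[3]


def diminishing_returns_point(b: float, c: float, min_gain: float) -> float:
    """Smallest n where d(score)/dn = b * c * n**(-c - 1) drops below min_gain.

    Solving b * c * n**(-c - 1) = min_gain gives n = (b * c / min_gain) ** (1 / (c + 1)).
    """
    return (b * c / min_gain) ** (1.0 / (c + 1.0))


# Usage on a synthetic, noise-free curve (illustrative values only).
sizes = [100, 200, 400, 800, 1600, 3200]
scores = [0.9 - 0.5 * n ** -0.5 for n in sizes]
a, b, c = fit_inverse_power_law(sizes, scores)
n_star = diminishing_returns_point(b, c, min_gain=1e-5)
print(f"a={a:.3f} b={b:.3f} c={c:.3f} n*={n_star:.0f}")
```

Extrapolating a curve fitted on small samples to much larger n is exactly where the abstract reports this style of method breaking down on big data, so in practice the threshold and the fitted parameters should be treated as rough estimates.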
Pages: 237-242
Number of pages: 6
Related papers (50 in total)
  • [31] Learning Classifiers for Target Domain with Limited or No Labels
    Zhu, Pengkai
    Wang, Hanxiao
    Saligrama, Venkatesh
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [32] Large Scale Sentiment Learning with Limited Labels
    Iosifidis, Vasileios
    Ntoutsi, Eirini
    [J]. KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017, : 1823 - 1832
  • [33] Learning Graphs for Knowledge Transfer with Limited Labels
    Ghosh, Pallabi
    Saini, Nirat
    Davis, Larry S.
    Shrivastava, Abhinav
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 11146 - 11156
  • [34] Limited Gradient Descent: Learning With Noisy Labels
    Sun, Yi
    Tian, Yan
    Xu, Yiping
    Li, Jianxiang
    [J]. IEEE ACCESS, 2019, 7 : 168296 - 168306
  • [35] Learning in Imbalanced Relational Data
    Ghanem, Amal S.
    Venkatesh, Svetha
    West, Geoff
    [J]. 19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 436 - 439
  • [36] Learning from Imbalanced Data
    He, Haibo
    Garcia, Edwardo A.
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (09) : 1263 - 1284
  • [37] SAR Image Classification Using Contrastive Learning and Pseudo-Labels With Limited Data
    Wang, Chenchen
    Gu, Hong
    Su, Weimin
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [38] A reliable adaptive prototype-based learning for evolving data streams with limited labels
    Din, Salah Ud
    Ullah, Aman
    Mawuli, Cobbinah B.
    Yang, Qinli
    Shao, Junming
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (01)
  • [39] Data reduction techniques for highly imbalanced medicare Big Data
    Hancock, John T.
    Wang, Huanjing
    Khoshgoftaar, Taghi M.
    Liang, Qianxin
    [J]. JOURNAL OF BIG DATA, 2024, 11 (01)