A Cost-sensitive Active Learning for Imbalance Data with Uncertainty and Diversity Combination

Cited by: 5
Authors
Dong, Huailong [1]
Zhu, Bowen [1]
Zhang, Jing [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, 200 Xiaolingwei St, Nanjing 210094, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Active Learning; Imbalanced Learning; Cost-Sensitive Learning
DOI
10.1145/3383972.3384002
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The class distributions of real-world classification datasets are usually imbalanced because many applications, such as network intrusion detection, tumor classification, and financial risk identification, are inherently imbalanced: positive examples are rare. When such data are labeled to create training sets for supervised learning, too many examples belonging to the majority class are labeled. This dramatically increases the labeling cost and is usually unnecessary, because balanced datasets are more suitable for inducing good learners. To deal with this problem, this paper proposes a novel cost-sensitive active learning algorithm that combines uncertainty and diversity measures to select training examples from an unlabeled sample pool. We use the proportions of the majority and the minority classes among all examples in the training dataset as the weights of the majority class and the minority class, respectively. With these class weights, minority examples receive more emphasis when learning models are built. Experimental results show that our proposed method can significantly reduce the labeling cost while improving the performance of learning models.
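As a rough illustration of the ideas in the abstract, the following Python sketch derives class weights from class proportions and scores pool examples by mixing an uncertainty term with a diversity term. It is not the authors' implementation. The function names, the entropy-based uncertainty, the distance-to-nearest-labeled-example diversity, the linear mixing parameter alpha, and the choice to give each class the proportion of the other class (so that the rare class is emphasized) are all assumptions made here for illustration; the record itself does not specify these details.

```python
import numpy as np


def class_weights(y):
    """Binary class weights from class proportions.

    Assumption: each class is weighted by the proportion of the OTHER
    class, so the rarer class gets the larger weight.
    """
    y = np.asarray(y)
    p_pos = float(np.mean(y == 1))  # share of positive (label 1) examples
    return {1: 1.0 - p_pos, 0: p_pos}


def uncertainty(p_pos):
    """Binary entropy (in bits) of the predicted positive probability."""
    p = np.clip(np.asarray(p_pos, dtype=float), 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))


def diversity(X_pool, X_labeled):
    """Euclidean distance from each pool point to its nearest labeled point."""
    diff = X_pool[:, None, :] - X_labeled[None, :, :]
    return np.linalg.norm(diff, axis=2).min(axis=1)


def select_query(p_pos, X_pool, X_labeled, alpha=0.5):
    """Index of the pool example with the highest combined score.

    alpha (hypothetical parameter) trades uncertainty against diversity.
    """
    u = uncertainty(p_pos)
    d = diversity(np.asarray(X_pool, float), np.asarray(X_labeled, float))
    if d.max() > 0:
        d = d / d.max()  # rescale diversity to [0, 1] like the entropy
    return int(np.argmax(alpha * u + (1 - alpha) * d))
```

In a full active learning loop, the selected example would be sent to an annotator, added to the labeled set, and the classifier retrained with the class weights before the next query. Any probabilistic classifier could supply the p_pos values.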
Pages: 218-224
Page count: 7