Prototyping: Sample Selection for Imbalanced Data

Author
Schwalb, Edward
Affiliation
Keywords
Supervised Learning; Imbalanced Data; Sample Selection
DOI
10.1109/CSCI54926.2021.00109
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In the context of supervised learning, we address the task of identifying a subset of the example instances that maximizes predictive performance. In contrast to sampling, we do not generate new instances, because we do not know how to label them reliably. We propose a simple and effective method whose complexity is linear in the size of the source data and logarithmic in the number of examples selected. We demonstrate empirically that very significant improvements are achievable on skewed data across a wide range of model types and data sets. In particular, we observe that the fraction achieving peak performance is proportional to the square root of the reciprocal of the skewness.
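The scaling relation stated at the end of the abstract can be illustrated with a minimal sketch. This is not the paper's method: the helper name `peak_fraction`, the proportionality constant `c`, and the reading of "skewness" as a positive imbalance measure (for example, the majority-to-minority ratio) are all assumptions, since the abstract does not define them.

```python
import math


def peak_fraction(skewness: float, c: float = 1.0) -> float:
    """Hypothetical helper: fraction proportional to sqrt(1 / skewness).

    `c` is an assumed proportionality constant (not given in the abstract).
    The result is capped at 1.0 because a fraction of the data cannot exceed 1.
    """
    if skewness <= 0:
        raise ValueError("skewness must be positive")
    return min(1.0, c * math.sqrt(1.0 / skewness))


if __name__ == "__main__":
    # Under this relation, quadrupling the skewness halves the fraction.
    for s in (4, 16, 64):
        print(s, peak_fraction(s))
```

Under these assumptions, only the relative scaling is meaningful: the absolute value depends on `c`, which would have to be estimated empirically for a given model type and data set.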
Pages: 221-227
Page count: 7