Active Sampling for Class Probability Estimation and Ranking

Authors
Maytal Saar-Tsechansky
Foster Provost
Affiliations
[1] Department of Management Science and Information Systems, Red McCombs School of Business, The University of Texas at Austin
[2] Department of Information Operations & Management Sciences, Leonard N. Stern School of Business, New York University
Source
Machine Learning, 2004, Vol. 54, No. 2
Keywords
active learning; cost-sensitive learning; class probability estimation; ranking; supervised learning; decision trees; uncertainty sampling; selective sampling
Abstract
In many cost-sensitive environments, decision makers use class probability estimates to evaluate the expected utility of a set of alternatives. Supervised learning can be used to build class probability estimates; however, obtaining training data with class labels is often very costly. Active learning acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on the examples needed for learning. We outline the critical features of an active learner and present BOOTSTRAP-LV, a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and uses weighted sampling to account for a potential example's informative value for the rest of the input space. We show empirically, across a wide variety of domains, that the method reduces the number of data items that must be obtained and labeled. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV uses fewer examples to reach a given estimation accuracy and provide insights into the behavior of the algorithms. Finally, we experiment with another new active sampling algorithm that draws on both UNCERTAINTY SAMPLING and BOOTSTRAP-LV and show that it is significantly more competitive with BOOTSTRAP-LV than UNCERTAINTY SAMPLING is. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
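The two ingredients the abstract names, variance in probability estimates and weighted sampling, can be illustrated with a minimal sketch. This is not the authors' BOOTSTRAP-LV implementation: the abstract does not give the scoring formula, so the sketch assumes that several models (for example, trained on bootstrap resamples of the labeled data) each supply a class probability estimate for every unlabeled candidate, that a candidate's score is simply the variance of those estimates, and that candidates are drawn for labeling with probability proportional to that score. The function names `variance_scores` and `weighted_sample` are hypothetical.

```python
import random
from statistics import pvariance


def variance_scores(estimates):
    """Score each candidate by the variance of its probability estimates.

    estimates[m][i] is the probability that model m assigns to the
    positive class for unlabeled candidate i (assumed input layout).
    """
    n_candidates = len(estimates[0])
    return [
        pvariance([model[i] for model in estimates])
        for i in range(n_candidates)
    ]


def weighted_sample(scores, k, rng):
    """Draw up to k distinct candidate indices without replacement.

    Each draw picks an index with probability proportional to its score,
    so high-variance candidates are favored but not chosen
    deterministically. If every remaining score is zero, the draw falls
    back to a uniform choice (an assumption of this sketch).
    """
    remaining = list(range(len(scores)))
    weights = [scores[i] for i in remaining]
    chosen = []
    for _ in range(min(k, len(remaining))):
        if sum(weights) <= 0:
            pos = rng.randrange(len(remaining))
        else:
            pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pos))
        weights.pop(pos)
    return chosen


# Three hypothetical models, three unlabeled candidates.
estimates = [
    [0.1, 0.5, 0.9],
    [0.1, 0.7, 0.9],
    [0.1, 0.3, 0.9],
]
scores = variance_scores(estimates)
batch = weighted_sample(scores, k=1, rng=random.Random(0))
```

In this toy input the models agree on the first and third candidates and disagree only on the second, so the second is the only one with a positive sampling weight. Sampling in proportion to the score, rather than always taking the top-scoring candidates, is one way to keep the selected batch spread over the input space.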
Pages: 153-178