Active Sampling for Class Probability Estimation and Ranking

Cited by: 0

Authors:

Maytal Saar-Tsechansky

Foster Provost

Affiliations:

[1] Red McCombs School of Business, Department of Management Science and Information Systems

[2] The University of Texas at Austin, Department of Information Operations & Management Sciences

[3] Leonard N. Stern School of Business

[4] New York University

Source:

Machine Learning | 2004 / Vol. 54

Keywords:

active learning; cost-sensitive learning; class probability estimation; ranking; supervised learning; decision trees; uncertainty sampling; selective sampling

DOI:

Not available

Abstract:

In many cost-sensitive environments class probability estimates are used by decision makers to evaluate the expected utility from a set of alternatives. Supervised learning can be used to build class probability estimates; however, it is often very costly to obtain training data with class labels. Active learning acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on examples needed for learning. We outline the critical features of an active learner and present a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and uses weighted sampling to account for a potential example's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV uses fewer examples to exhibit a certain estimation accuracy and provide insights into the behavior of the algorithms. Finally, we experiment with another new active sampling algorithm drawing from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV and show that it is significantly more competitive with BOOTSTRAP-LV than UNCERTAINTY SAMPLING is. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
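
The two ingredients the abstract names for BOOTSTRAP-LV, variance in probability estimates and weighted sampling, can be illustrated with a rough Python sketch. This is not the authors' implementation and it omits details of the published algorithm. Everything specific here is an assumption made for illustration: one-dimensional features, a k-nearest-neighbour probability estimator standing in for whatever model is actually used (the keywords mention decision trees), and the hypothetical function names `knn_probability`, `local_variances`, and `weighted_sample`.

```python
import random
from statistics import pvariance


def knn_probability(train, x, k=3):
    """Estimate P(class = 1 | x) as the share of positives among the
    k labeled points nearest to x (assumed stand-in estimator)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def local_variances(labeled, pool, n_bootstrap=10, rng=None):
    """For each unlabeled example, return the variance of its class
    probability estimate across models built on bootstrap samples."""
    rng = rng or random.Random(0)
    estimates = [[] for _ in pool]
    for _ in range(n_bootstrap):
        # Bootstrap sample: draw with replacement from the labeled set.
        sample = [rng.choice(labeled) for _ in labeled]
        for i, x in enumerate(pool):
            estimates[i].append(knn_probability(sample, x))
    return [pvariance(e) for e in estimates]


def weighted_sample(pool, weights, batch_size, rng):
    """Draw examples without replacement, with probability proportional
    to weight, rather than simply taking the top-scoring examples."""
    items = list(pool)
    w = [weight + 1e-9 for weight in weights]  # keep every weight positive
    chosen = []
    for _ in range(min(batch_size, len(items))):
        pick = rng.choices(range(len(items)), weights=w, k=1)[0]
        chosen.append(items.pop(pick))
        w.pop(pick)
    return chosen


if __name__ == "__main__":
    rng = random.Random(42)
    # (feature, label) pairs; labels are mixed near the middle of the range.
    labeled = [(0.1, 0), (0.2, 0), (0.4, 1), (0.45, 0),
               (0.55, 1), (0.6, 0), (0.8, 1), (0.9, 1)]
    pool = [0.05, 0.3, 0.5, 0.7, 0.95]
    variances = local_variances(labeled, pool, n_bootstrap=20, rng=rng)
    print(weighted_sample(pool, variances, batch_size=2, rng=rng))
```

In this sketch, examples whose estimates vary most across bootstrap models are the most likely to be sent for labeling, but sampling (instead of a strict top-k cut) still gives lower-variance regions of the input space some chance of being chosen.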

Pages: 153-178

Page count: 25
