Scalable Kernel Density Classification via Threshold-Based Pruning

Cited by: 18
Authors
Gan, Edward [1]
Bailis, Peter [1]
Affiliation
[1] Stanford InfoLab, Stanford, CA 94305 USA
Keywords
NONPARAMETRIC-ESTIMATION; SELECTION;
DOI
10.1145/3035918.3064035
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Density estimation forms a critical component of many analytics tasks, including outlier detection, visualization, and statistical testing. These tasks often seek to classify data into high- and low-density regions of a probability distribution. Kernel Density Estimation (KDE) is a powerful technique for computing these densities, offering excellent statistical accuracy but quadratic total runtime. In this paper, we introduce a simple technique for speeding up the use of a KDE to classify points by their density (density classification). Our technique, thresholded kernel density classification (tKDC), applies threshold-based pruning to spatial index traversal to achieve asymptotic speedups over naive KDE while maintaining accuracy guarantees. Instead of computing each point's exact density for use in classification, tKDC iteratively computes density bounds and short-circuits the density computation as soon as the bounds lie entirely above or entirely below the target classification threshold. On a wide range of dataset sizes and dimensions, tKDC demonstrates empirical speedups of up to 1000x over alternatives.
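The bound-and-short-circuit idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a Gaussian kernel, and a flat list of bounding boxes (the hypothetical `build_blocks`) stands in for the spatial index, since the abstract does not specify one. The names `kernel`, `classify`, and `naive_density`, and the parameters `h`, `threshold`, and `block_size`, are illustrative choices as well.

```python
import numpy as np

def kernel(sq_dist, h, d):
    # Gaussian kernel with bandwidth h in d dimensions, given squared distances.
    return np.exp(-sq_dist / (2.0 * h * h)) / ((2.0 * np.pi * h * h) ** (d / 2.0))

def build_blocks(data, block_size=32):
    # Assumed stand-in for a spatial index: sort on the first coordinate,
    # cut into blocks, and keep each block's bounding box.
    order = np.argsort(data[:, 0])
    blocks = []
    for start in range(0, len(data), block_size):
        pts = data[order[start:start + block_size]]
        blocks.append((pts, pts.min(axis=0), pts.max(axis=0)))
    return blocks

def classify(x, blocks, n, h, threshold):
    # Returns (is_high_density, number_of_blocks_evaluated_exactly).
    d = x.shape[0]
    lows, highs = [], []
    for pts, lo, hi in blocks:
        # Per-axis gap to the nearest and farthest point of the bounding box.
        nearest = np.maximum(np.maximum(lo - x, x - hi), 0.0)
        farthest = np.maximum(np.abs(x - lo), np.abs(x - hi))
        # The kernel decreases with distance, so these bound the block's share.
        highs.append(len(pts) * kernel(np.sum(nearest ** 2), h, d) / n)
        lows.append(len(pts) * kernel(np.sum(farthest ** 2), h, d) / n)
    lows, highs = np.array(lows), np.array(highs)
    lower, upper = lows.sum(), highs.sum()
    refined = 0
    for b in np.argsort(lows - highs):  # loosest bounds first
        if lower > threshold:           # bounds entirely above: stop early
            return True, refined
        if upper < threshold:           # bounds entirely below: stop early
            return False, refined
        pts = blocks[b][0]
        exact = kernel(np.sum((pts - x) ** 2, axis=1), h, d).sum() / n
        lower += exact - lows[b]        # swap the block's bounds for its exact sum
        upper += exact - highs[b]
        refined += 1
    return bool(lower >= threshold), refined

def naive_density(x, data, h):
    # Reference: exact KDE value at x, touching every data point.
    return kernel(np.sum((data - x) ** 2, axis=1), h, data.shape[1]).mean()
```

A query far from the data, or deep inside a dense region, is typically classified after evaluating few or no blocks exactly, while queries whose density is close to the threshold require more refinement. A tree-structured index would tighten the bounds hierarchically; the flat block list here only keeps the sketch short.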
Pages: 945-959
Page count: 15