Estimating the support of a high-dimensional distribution

Cited by: 3596
Authors
Schölkopf, B
Platt, JC
Shawe-Taylor, J
Smola, AJ
Williamson, RC
Affiliations
[1] Microsoft Res Ltd, Cambridge CB2 3NH, England
[2] Microsoft Res, Redmond, WA 98052 USA
[3] Univ London Royal Holloway & Bedford New Coll, Egham TW20 0EX, Surrey, England
[4] Australian Natl Univ, Dept Engn, Canberra, ACT 0200, Australia
DOI
10.1162/089976601750264965
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.
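The abstract gives no formulas, so the following is only a minimal sketch of the idea it describes, under stated assumptions. It assumes the usual dual of the one-class problem (minimize 1/2 Σ α_i α_j k(x_i, x_j) subject to 0 ≤ α_i ≤ 1/(νl) and Σ α_i = 1), a Gaussian kernel, and a simple "most violating pair" selection rule, which is not necessarily the pair heuristic the authors use. The names `rbf`, `fit_one_class`, and `decision`, and the parameters `nu`, `gamma`, and `tol`, are hypothetical choices for illustration.

```python
import math
import random


def rbf(x, y, gamma):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def fit_one_class(X, nu=0.2, gamma=0.5, tol=1e-6, max_iter=50_000):
    """Solve min 1/2 a^T K a  s.t. 0 <= a_i <= 1/(nu*n), sum(a) = 1 (0 < nu < 1)."""
    n = len(X)
    C = 1.0 / (nu * n)                      # assumed upper bound on each coefficient
    K = [[rbf(xi, xj, gamma) for xj in X] for xi in X]
    alpha = [1.0 / n] * n                   # feasible start: sums to 1, below C
    out = [sum(alpha[j] * K[i][j] for j in range(n)) for i in range(n)]
    eps = 1e-12

    def violating_pair():
        # i: largest output among coefficients that can still shrink;
        # j: smallest output among coefficients that can still grow.
        i = max((k for k in range(n) if alpha[k] > eps), key=out.__getitem__)
        j = min((k for k in range(n) if alpha[k] < C - eps), key=out.__getitem__)
        return i, j

    for _ in range(max_iter):
        i, j = violating_pair()
        gap = out[i] - out[j]
        if gap < tol:                       # optimality conditions hold up to tol
            break
        eta = max(K[i][i] + K[j][j] - 2.0 * K[i][j], 1e-12)
        # Move mass t from alpha[i] to alpha[j]; the sum of alpha is unchanged.
        t = min(gap / eta, alpha[i], C - alpha[j])
        alpha[i] -= t
        alpha[j] += t
        for k in range(n):                  # keep the cached outputs in sync
            out[k] += t * (K[k][j] - K[k][i])

    i, j = violating_pair()
    rho = 0.5 * (out[i] + out[j])           # offset taken between the two extremes
    return alpha, rho


def decision(X, alpha, rho, gamma, x):
    """f(x) = sum_i alpha_i k(x_i, x) - rho; positive inside the estimated region."""
    return sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X) if a > 0.0) - rho


# Toy data (an assumption for illustration): 40 points from a 2-D standard normal.
rng = random.Random(0)
X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(40)]
alpha, rho = fit_one_class(X, nu=0.2, gamma=0.5)
far_score = decision(X, alpha, rho, 0.5, (10.0, 10.0))   # point far from all data
n_support = sum(a > 1e-9 for a in alpha)
n_outliers = sum(decision(X, alpha, rho, 0.5, x) < -1e-3 for x in X)
```

In this sketch, only training points with nonzero coefficients contribute to `decision`, which mirrors the "potentially small subset of the training data" mentioned in the abstract. The parameter `nu` stands in for the a priori specified value: because the coefficients sum to 1 and each is capped at 1/(νl), at least a fraction `nu` of the points must carry nonzero weight, and at most a fraction `nu` can sit at the cap.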
Pages: 1443-1471
Page count: 29