Statistical outlier detection using direct density ratio estimation

Cited by: 103
Authors
Hido, Shohei [1 ,2 ]
Tsuboi, Yuta [1 ]
Kashima, Hisashi [1 ]
Sugiyama, Masashi [3 ,4 ]
Kanamori, Takafumi [5 ]
Affiliations
[1] IBM Res Tokyo, Kanagawa, Japan
[2] Kyoto Univ, Grad Sch Informat, Dept Syst Sci, Kyoto, Japan
[3] Tokyo Inst Technol, Dept Comp Sci, Grad Sch Informat Sci & Engn, Tokyo 152, Japan
[4] Japan Sci & Technol Agcy, PRESTO, Kawaguchi, Saitama, Japan
[5] Nagoya Univ, Grad Sch Informat Sci, Dept Comp Sci & Math Informat, Nagoya, Aichi 4648601, Japan
Keywords
Outlier detection; Density ratio; Importance; Unconstrained least-squares importance fitting (uLSIF); COVARIATE SHIFT; LEAST-SQUARES; SUPPORT; ALGORITHM;
DOI
10.1007/s10115-010-0283-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a new statistical approach to the problem of inlier-based outlier detection, i.e., finding outliers in the test set based on the training set consisting only of inliers. Our key idea is to use the ratio of training and test data densities as an outlier score. This approach is expected to have better performance even in high-dimensional problems since methods for directly estimating the density ratio without going through density estimation are available. Among various density ratio estimation methods, we employ the method called unconstrained least-squares importance fitting (uLSIF) since it is equipped with natural cross-validation procedures, allowing us to objectively optimize the value of tuning parameters such as the regularization parameter and the kernel width. Furthermore, uLSIF offers a closed-form solution as well as a closed-form formula for the leave-one-out error, so it is computationally very efficient and is scalable to massive datasets. Simulations with benchmark and real-world datasets illustrate the usefulness of the proposed approach.
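The scoring idea in the abstract can be illustrated with a minimal sketch, assuming a Gaussian kernel model for the density ratio and the standard regularized least-squares closed form associated with uLSIF. The function names (`gaussian_kernel`, `ulsif_scores`), the default values of `sigma` and `lam`, and the random choice of kernel centers are illustrative assumptions, not details taken from this record. The cross-validation step that the abstract describes for tuning the kernel width and regularization parameter is omitted here for brevity.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma):
    """Gaussian kernel matrix between the rows of x and the rows of centers."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def ulsif_scores(x_train, x_test, sigma=1.0, lam=0.1, n_centers=100, seed=0):
    """Estimate the ratio p_train(x) / p_test(x) at each test point.

    Assumption: sigma and lam are fixed by hand here; the paper tunes
    them by cross-validation. Low scores suggest outliers.
    """
    rng = np.random.default_rng(seed)
    n_centers = min(n_centers, len(x_train))
    # Kernel centers are drawn from the inlier-only training set.
    idx = rng.choice(len(x_train), size=n_centers, replace=False)
    centers = x_train[idx]

    phi_train = gaussian_kernel(x_train, centers, sigma)
    phi_test = gaussian_kernel(x_test, centers, sigma)

    # Second moment of the basis functions over the test (denominator) samples.
    H = phi_test.T @ phi_test / len(x_test)
    # Mean of the basis functions over the training (numerator) samples.
    h = phi_train.mean(axis=0)

    # Closed-form regularized least-squares solution, then clip negatives to zero.
    alpha = np.linalg.solve(H + lam * np.eye(n_centers), h)
    alpha = np.maximum(alpha, 0.0)

    return phi_test @ alpha
```

Test points that fall where the training density is low relative to the test density receive a ratio near zero, so ranking test points by ascending score gives an outlier ordering. Because the fit reduces to one linear solve, repeating it over a grid of `sigma` and `lam` values stays cheap, which is the property the abstract highlights.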
Pages: 309-336
Page count: 28
Related papers
50 items in total
  • [31] Study on Statistical Outlier Detection and Labelling
    Paweł D. Domański
    [J]. Machine Intelligence Research, 2020, 17 (06) : 788 - 811
  • [32] Study on Statistical Outlier Detection and Labelling
    Paweł D. Domański
    [J]. International Journal of Automation and Computing, 2020, 17 : 788 - 811
  • [33] Study on Statistical Outlier Detection and Labelling
    Domanski, Pawel D.
    [J]. INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING, 2020, 17 (06) : 788 - 811
  • [34] STATISTICAL MODELLING FOR ENHANCED OUTLIER DETECTION
    Piotto, Nicola
    Cordara, Giovanni
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2014, : 4280 - 4284
  • [35] Outlier detection and robust covariance estimation using mathematical programming
    Tri-Dzung Nguyen
    Roy E. Welsch
    [J]. Advances in Data Analysis and Classification, 2010, 4 : 301 - 334
  • [36] Estimation of the Number of Endmembers Using Robust Outlier Detection Method
    Andreou, Charoula
    Karathanassi, Vassilia
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2014, 7 (01) : 247 - 256
  • [37] Using the joint estimation outlier detection method for quality control
    Wright, CM
    Booth, DE
    Hu, MY
    [J]. DECISION SCIENCES INSTITUTE, 1997 ANNUAL MEETING, PROCEEDINGS, VOLS 1-3, 1997, : 991 - 993
  • [38] Multidimensional outlier detection and robust estimation using Sn covariance
    Kunjunni, Sajana O.
    Abraham, Sajesh T.
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2022, 51 (07) : 3912 - 3922
  • [39] Outlier detection and robust covariance estimation using mathematical programming
    Nguyen, Tri-Dzung
    Welsch, Roy E.
    [J]. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2010, 4 (04) : 301 - 334
  • [40] Adversarial Density Ratio Estimation for Change Point Detection
    Shreyas, S.
    Comar, Prakash Mandayam
    Kaveri, Sivaramakrishnan
    [J]. PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 4254 - 4258