Statistical outlier detection using direct density ratio estimation

Cited by: 103
Authors
Hido, Shohei [1 ,2 ]
Tsuboi, Yuta [1 ]
Kashima, Hisashi [1 ]
Sugiyama, Masashi [3 ,4 ]
Kanamori, Takafumi [5 ]
Affiliations
[1] IBM Res Tokyo, Kanagawa, Japan
[2] Kyoto Univ, Grad Sch Informat, Dept Syst Sci, Kyoto, Japan
[3] Tokyo Inst Technol, Dept Comp Sci, Grad Sch Informat Sci & Engn, Tokyo 152, Japan
[4] Japan Sci & Technol Agcy, PRESTO, Kawaguchi, Saitama, Japan
[5] Nagoya Univ, Grad Sch Informat Sci, Dept Comp Sci & Math Informat, Nagoya, Aichi 4648601, Japan
Keywords
Outlier detection; Density ratio; Importance; Unconstrained least-squares importance fitting (uLSIF); COVARIATE SHIFT; LEAST-SQUARES; SUPPORT; ALGORITHM;
DOI
10.1007/s10115-010-0283-2
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a new statistical approach to the problem of inlier-based outlier detection, i.e., finding outliers in the test set based on the training set consisting only of inliers. Our key idea is to use the ratio of training and test data densities as an outlier score. This approach is expected to have better performance even in high-dimensional problems since methods for directly estimating the density ratio without going through density estimation are available. Among various density ratio estimation methods, we employ the method called unconstrained least-squares importance fitting (uLSIF) since it is equipped with natural cross-validation procedures, allowing us to objectively optimize the value of tuning parameters such as the regularization parameter and the kernel width. Furthermore, uLSIF offers a closed-form solution as well as a closed-form formula for the leave-one-out error, so it is computationally very efficient and is scalable to massive datasets. Simulations with benchmark and real-world datasets illustrate the usefulness of the proposed approach.
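The closed-form step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian kernel model with centers drawn from the training samples, and it uses a fixed kernel width `sigma` and regularization parameter `lam` instead of the cross-validation the abstract describes. The names `gaussian_kernel`, `ulsif_scores`, and `n_centers` are hypothetical.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma):
    """Gaussian kernel matrix between the rows of x and the rows of centers."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def ulsif_scores(x_train, x_test, sigma=1.0, lam=0.1, n_centers=100, seed=0):
    """Estimate w(x) = p_train(x) / p_test(x) at each test point.

    Assumption: sigma and lam are fixed by hand here; the paper tunes
    them by cross-validation, which this sketch omits.
    """
    rng = np.random.default_rng(seed)
    n_centers = min(n_centers, len(x_train))
    # Assumption: kernel centers are a random subset of the training inliers.
    centers = x_train[rng.choice(len(x_train), size=n_centers, replace=False)]

    phi_train = gaussian_kernel(x_train, centers, sigma)
    phi_test = gaussian_kernel(x_test, centers, sigma)

    # Regularized least-squares fit of the ratio model, solved in closed form.
    H = phi_test.T @ phi_test / len(x_test)
    h = phi_train.mean(axis=0)
    alpha = np.linalg.solve(H + lam * np.eye(n_centers), h)
    alpha = np.maximum(alpha, 0.0)  # clip negative coefficients to zero

    # A small estimated ratio suggests the test point is an outlier.
    return phi_test @ alpha


# Toy data: inliers from a standard normal, plus one distant test point.
data_rng = np.random.default_rng(42)
x_train = data_rng.normal(0.0, 1.0, size=(200, 2))
x_test = np.vstack([data_rng.normal(0.0, 1.0, size=(50, 2)),
                    np.array([[10.0, 10.0]])])
scores = ulsif_scores(x_train, x_test)
```

In this toy setup the last test point lies far from every kernel center, so its estimated ratio is close to zero, while test points drawn from the training distribution receive larger scores.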
Pages: 309 - 336
Number of pages: 28
Related papers
50 in total
  • [1] Statistical outlier detection using direct density ratio estimation
    Shohei Hido
    Yuta Tsuboi
    Hisashi Kashima
    Masashi Sugiyama
    Takafumi Kanamori
    [J]. Knowledge and Information Systems, 2011, 26 : 309 - 336
  • [2] Direct Density Ratio Estimation with Convolutional Neural Networks with Application in Outlier Detection
    Nam, Hyunha
    Sugiyama, Masashi
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2015, E98D (05) : 1073 - 1079
  • [3] Inlier-based Outlier Detection via Direct Density Ratio Estimation
    Hido, Shohei
    Tsuboi, Yuta
    Kashima, Hisashi
    Sugiyama, Masashi
    Kanamori, Takafumi
    [J]. ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2008 : 223 - 232
  • [4] Online Direct Density-Ratio Estimation Applied to Inlier-Based Outlier Detection
    du Plessis, Marthinus Christoffel
    Shiino, Hiroaki
    Sugiyama, Masashi
    [J]. NEURAL COMPUTATION, 2015, 27 (09) : 1899 - 1914
  • [5] DENSITY ESTIMATION APPLICATIONS FOR OUTLIER DETECTION
    TARTER, ME
    [J]. COMPUTER PROGRAMS IN BIOMEDICINE, 1979, 10 (01) : 55 - 60
  • [6] Unsupervised Recycled FPGA Detection Based on Direct Density Ratio Estimation
    Isaka, Yuya
    Ahmed, Foisal
    Shintani, Michihiro
    Inoue, Michiko
    [J]. 2021 IEEE 27TH INTERNATIONAL SYMPOSIUM ON ON-LINE TESTING AND ROBUST SYSTEM DESIGN (IOLTS), 2021,
  • [7] Salient object detection based on direct density-ratio estimation
    Yamanaka, Masao
    Matsugu, Masakazu
    Sugiyama, Masashi
    [J]. IPSJ Online Transactions, 2013, 6 (2013) : 96 - 103
  • [8] Efficient Multistream Classification using Direct Density Ratio Estimation
    Haque, Ahsanul
    Chandra, Swarup
    Khan, Latifur
    Hamlen, Kevin
    Aggarwal, Charu
    [J]. 2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), 2017 : 155 - 158
  • [9] Nonparametric direct density ratio estimation using beta kernel
    Igarashi, Gaku
    [J]. STATISTICS, 2020, 54 (02) : 257 - 280
  • [10] Proposal of Online Outlier Detection in Sensor Data Using Kernel Density Estimation
    Haque, Md Atiqul
    Mineno, Hiroshi
    [J]. 2017 6TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS (IIAI-AAI), 2017 : 1051 - 1052