A novel algorithm for outlier removal based on density

Authors
Wang Y. [1]
Affiliation
[1] Beijing Key Laboratory of Transportation Engineering, Beijing University of Technology
Keywords
Density estimation; Expectation; Outlier detection; Outlier removal; Travel speed
DOI
10.3724/SP.J.1004.2010.00343
Abstract
Owing to the limitations of current data-collection techniques and facilities, and to various sources of interference, collected data are often distorted and noisy, which directly affects the results of subsequent data analysis. Conventional approaches to outlier removal either assume that the data follow a known distribution or handle only data drawn from a single distribution, which reduces the credibility of the processed data. This paper proposes a novel outlier-removal method based on density estimation and applies it to real-world traffic data. Compared with the conventional approach, the experimental results indicate that the proposed algorithm detects and removes outliers effectively for data that may follow different unknown distributions, and that the processed data retain the original, significant characteristics of the system. Copyright © 2010 Acta Automatica Sinica. All rights reserved.
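The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of density-based outlier removal, not the author's method. It assumes a one-dimensional Gaussian kernel density estimate and a relative density cutoff; the function names, the bandwidth, the threshold, and the sample travel speeds are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kde_1d(samples, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each sample point."""
    x = np.asarray(samples, dtype=float)
    # Pairwise scaled differences between every pair of samples.
    diffs = (x[:, None] - x[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (x.size * bandwidth)


def remove_low_density(samples, bandwidth, rel_threshold):
    """Split samples into (kept, removed) by estimated local density.

    A point is removed when its density falls below rel_threshold times
    the highest density observed in the sample. Both parameters are
    illustrative assumptions, not values taken from the paper.
    """
    x = np.asarray(samples, dtype=float)
    density = gaussian_kde_1d(x, bandwidth)
    keep = density >= rel_threshold * density.max()
    return x[keep], x[~keep]


# Hypothetical travel speeds (km/h) drawn from two regimes, for example
# congested and free-flow traffic, plus one isolated reading.
speeds = [22.0, 23.5, 24.0, 25.0, 26.5, 58.0, 59.0, 60.5, 61.0, 62.0, 140.0]
kept, removed = remove_low_density(speeds, bandwidth=3.0, rel_threshold=0.5)
print("kept:", kept)
print("removed:", removed)
```

Because the cutoff is based on local density and not on distance from a single global mean, both speed regimes in the hypothetical sample are retained and only the isolated reading is discarded. This is the kind of behavior the abstract claims for data that may follow different unknown distributions.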
Pages: 343-346
Page count: 3