Parallel algorithms for distance-based and density-based outliers

Times cited: 20
Authors
Lozano, E [1]
Acuña, E [1]
Affiliation
[1] Univ Puerto Rico, Dept Math, Mayaguez, PR 00680 USA
DOI
10.1109/ICDM.2005.116
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism. Outlier detection has many applications, such as data cleaning, fraud detection and network intrusion detection. The presence of outliers can indicate individuals or groups whose behavior differs markedly from that of most individuals in the dataset. In this paper we design two parallel algorithms. The first finds distance-based outliers using nested loops together with randomization and a pruning rule. The second detects density-based local outliers. Both algorithms use data parallelism, and we show that both reach near-linear speedup. The algorithms are tested on four real-world datasets from the UCI Machine Learning Database Repository.
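The abstract describes the distance-based algorithm only at a high level (nested loops, randomization, a pruning rule, data parallelism). The sketch below is one possible reading, not the authors' implementation. It assumes the outlier score is the distance to the k-th nearest neighbour, and that the pruning rule is the cutoff rule popularized by Bay and Schwabacher's randomized nested-loop method: a candidate is dropped as soon as its running k-th-neighbour distance falls below the weakest of the top-n scores found so far. The function names, the parameters `k`, `n`, `workers` and `seed`, and the strided split of candidates across workers are all hypothetical. Threads keep the example self-contained. Under CPython they will not give real speedup, so a process pool would be needed for that.

```python
import heapq
import math
import random
from concurrent.futures import ThreadPoolExecutor


def kth_nn_distance(i, data, k, cutoff):
    """Hypothetical helper: distance from data[i] to its k-th nearest neighbour,
    or None if the running estimate drops below `cutoff` (pruned)."""
    heap = []  # max-heap via negated distances: the k smallest distances seen so far
    for j, y in enumerate(data):
        if j == i:
            continue  # a point is not its own neighbour
        d = math.dist(data[i], y)
        if len(heap) < k:
            heapq.heappush(heap, -d)
        elif d < -heap[0]:
            heapq.heapreplace(heap, -d)
        # The running k-th NN distance can only shrink as more points are scanned,
        # so once it is below the cutoff the final score will be too.
        if len(heap) == k and -heap[0] < cutoff:
            return None
    return -heap[0]


def block_top_outliers(block, data, k, n):
    """Hypothetical helper: nested-loop search over one worker's block of candidates."""
    top = []  # min-heap of (score, index); top[0] is the weakest outlier kept so far
    for i in block:
        cutoff = top[0][0] if len(top) == n else 0.0  # no pruning until n are known
        score = kth_nn_distance(i, data, k, cutoff)
        if score is None:
            continue
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
    return top


def parallel_distance_outliers(points, k=3, n=5, workers=4, seed=0):
    """Top-n outliers as (original_index, score), highest score first.
    Assumes k < len(points)."""
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # randomization: scan the data in random order
    data = [points[i] for i in order]
    # Data parallelism: each worker gets a strided block of candidates and
    # scans the full, read-only dataset; local top-n lists are merged at the end.
    blocks = [range(w, len(data), workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        local_tops = list(pool.map(lambda b: block_top_outliers(b, data, k, n), blocks))
    merged = [pair for top in local_tops for pair in top]
    return [(order[i], score) for score, i in heapq.nlargest(n, merged)]
```

Merging the local lists is exact under this scheme. Any point in the global top n is also in the top n of its own block, and each worker's cutoff never exceeds its block's true n-th score, so pruning never discards a genuine top-n candidate. For example, `parallel_distance_outliers([(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (-8, 3)], k=2, n=2)` ranks the points at indices 5 and 6 as the two outliers.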
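The abstract does not name the density-based score. The sketch below assumes it is the Local Outlier Factor (LOF) of Breunig et al. and shows one plausible data-parallel layout. Each point's k-distance neighbourhood, local reachability density (lrd) and LOF score can be computed independently. A barrier separates the phases, because lrd needs the neighbours' k-distances and LOF needs the neighbours' lrd. The function names, the parameters `k` and `workers`, and the three-phase split are hypothetical. The code also assumes distinct points, so every reachability sum is positive.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def neighbourhood(i, data, k):
    """Hypothetical helper: k-distance of point i and its k-distance
    neighbourhood (ties at the k-distance are included, as in LOF)."""
    dists = sorted((math.dist(data[i], y), j) for j, y in enumerate(data) if j != i)
    k_dist = dists[k - 1][0]
    return k_dist, [j for d, j in dists if d <= k_dist]


def parallel_lof(points, k=3, workers=4):
    """LOF score per point (assumes distinct points and k < len(points))."""
    data = list(points)
    idx = range(len(data))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: each point's k-distance and neighbourhood, computed independently.
        nbhd = list(pool.map(lambda i: neighbourhood(i, data, k), idx))
        k_dist = [kd for kd, _ in nbhd]

        # Phase 2: local reachability density, using reach-dist(p, o) =
        # max(k-distance(o), d(p, o)); list() acts as the barrier before phase 3.
        def lrd(i):
            nbrs = nbhd[i][1]
            reach = sum(max(k_dist[o], math.dist(data[i], data[o])) for o in nbrs)
            return len(nbrs) / reach  # reach > 0 because points are distinct

        lrds = list(pool.map(lrd, idx))

        # Phase 3: LOF = mean ratio of the neighbours' densities to the point's own.
        def lof(i):
            nbrs = nbhd[i][1]
            return sum(lrds[o] for o in nbrs) / (len(nbrs) * lrds[i])

        return list(pool.map(lof, idx))
```

Scores near 1 indicate a point about as dense as its neighbours, and scores well above 1 flag local outliers. On the seven-point example used above with `k=2`, the points at indices 5 and 6 receive the two largest LOF values.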
Pages: 729-732
Number of pages: 4