Neighborhood relevant outlier detection approach based on information entropy

Cited by: 4
Authors
Yu, Qingying [1, 2]
Luo, Yonglong [1, 2]
Chen, Chuanming [2]
Bian, Weixin [2]
Affiliations
[1] Anhui Normal Univ, Sch Territorial Resources & Tourism, 189 South Rd Jiuhua Rd, Wuhu 241003, Anhui, Peoples R China
[2] Anhui Normal Univ, Sch Math & Comp Sci, Wuhu, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Outlier detection; information entropy; attribute weights; pruning; k-nearest neighborhood relevant outlier factor (kNNROF); DISTANCE-BASED OUTLIERS;
DOI
10.3233/IDA-150301
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Outlier detection is an active research problem in data mining and machine learning. This paper proposes an information-entropy-based k-nearest neighborhood relevant outlier factor (kNNROF) algorithm that combines Shannon information theory with a triangle pruning strategy. The algorithm accounts for data points whose k-nearest neighbors are distributed on the edge of the range within the designated radius. In particular, the neighborhood influence on each point is considered in order to address the problem of information concealment and submergence. Information entropy is used to calculate weights that distinguish the importance of each attribute. Then, based on the attribute weights, the improved pruning strategy removes some inliers to obtain the outlier candidate dataset, which reduces the computational complexity of the subsequent procedures. Finally, using the weighted distance between the objects in the candidate dataset and those in the original dataset, the algorithm calculates the dissimilarity between each object and its k-nearest neighbors. The r data points with the highest dissimilarity are regarded as the outliers. Experimental results show that, compared
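The pipeline described in the abstract (entropy-derived attribute weights, a weighted distance, a k-nearest-neighbor dissimilarity, and a top-r selection) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the histogram-based entropy estimate, the choice to make each weight proportional to its attribute's entropy, the mean weighted distance as the dissimilarity, and the function names `entropy_weights`, `knn_dissimilarity`, and `top_r_outliers`. The triangle pruning step and the neighborhood-relevance factor are omitted, so this is not the kNNROF algorithm itself.

```python
import numpy as np


def entropy_weights(X, bins=10):
    """Assumed weighting: Shannon entropy of each attribute's histogram,
    normalized so the weights sum to 1 (not the paper's exact formula)."""
    n, d = X.shape
    H = np.empty(d)
    for j in range(d):
        counts, _ = np.histogram(X[:, j], bins=bins)
        p = counts[counts > 0] / n          # empirical bin probabilities
        H[j] = -np.sum(p * np.log2(p))      # Shannon entropy in bits
    total = H.sum()
    if total == 0:                          # all attributes constant
        return np.full(d, 1.0 / d)
    return H / total


def knn_dissimilarity(X, w, k):
    """Mean weighted Euclidean distance from each point to its k nearest
    neighbors (an assumed stand-in for the paper's dissimilarity)."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt(np.sum(w * diff ** 2, axis=2))
    np.fill_diagonal(D, np.inf)             # a point is not its own neighbor
    nearest = np.sort(D, axis=1)[:, :k]
    return nearest.mean(axis=1)


def top_r_outliers(X, k=3, r=1, bins=10):
    """Indices of the r points with the highest dissimilarity, highest first."""
    w = entropy_weights(X, bins=bins)
    scores = knn_dissimilarity(X, w, k)
    return np.argsort(scores)[::-1][:r]


if __name__ == "__main__":
    # A tight 5 x 5 grid of inliers plus one distant point at index 25.
    grid = [[(i % 5) * 0.1, (i // 5) * 0.1] for i in range(25)]
    X = np.array(grid + [[10.0, 10.0]])
    print(top_r_outliers(X, k=3, r=1))
```

This sketch computes all pairwise distances, which costs O(n^2) time and memory. The pruning strategy mentioned in the abstract is aimed at reducing that cost by shrinking the candidate set before the dissimilarity step.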
Pages: 1247-1265
Number of pages: 19