Outlier detection for set-valued data based on rough set theory and granular computing

Cited by: 6
Authors
Lin, Hai [1]
Li, Zhaowen [2]
Affiliations
[1] Guangxi Univ, Coll Math & Informat Sci, Nanning, Guangxi, Peoples R China
[2] Yulin Normal Univ, Key Lab Complex Syst Optimizat & Big Data Proc, Dept Guangxi Educ, Yulin, Guangxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
RST; GrC; SVIS; outlier detection; outlier factor; INFORMATION GRANULATION; ATTRIBUTE REDUCTION; FUZZY; ALGORITHMS;
DOI
10.1080/03081079.2022.2132491
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Outlier detection is widely used in industrial practice, for example in public security and fraud detection. Outlier detection methods have been proposed from various perspectives and against different backgrounds. However, most of these methods consider categorical or numerical data. There is little research on outlier detection for set-valued data, although a set-valued information system (SVIS) is a proper way of tackling the problem of missing values in data sets. This paper investigates outlier detection for set-valued data based on rough set theory (RST) and granular computing (GrC). First, the similarity between two information values in an SVIS is introduced, and a variable parameter to control the similarity is given. Then, tolerance relations on the object set are defined, and based on these tolerance relations, theta-lower and theta-upper approximations in an SVIS are put forward. Next, the outlier factor in an SVIS is presented and applied to various data sets. Finally, an outlier detection method for set-valued data based on RST and GrC is proposed, and the corresponding algorithms are designed. In numerical experiments on data sets from UCI, the designed algorithm is compared with six other detection algorithms. The experimental results show that the designed algorithm is arguably the best choice in the context of an SVIS. For a comprehensive comparison, we use two criteria, the AUC value and the F1 measure, to show the superiority of the designed algorithm.
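The pipeline outlined in the abstract (similarity between set values, a threshold theta, tolerance classes, lower and upper approximations, an outlier score) can be illustrated with a small Python sketch. The abstract does not state the paper's actual definitions, so everything concrete here is an assumption: Jaccard similarity stands in for the similarity between information values, objects are treated as tolerant when the similarity is at least theta on every attribute, and `outlier_score` is a hypothetical stand-in that is not the paper's outlier factor. The toy table and all function names are illustrative only.

```python
def jaccard(a, b):
    """Similarity between two set values (assumed measure, not the paper's)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tolerance_class(table, x, attrs, theta):
    """Objects whose values are at least theta-similar to x on every attribute."""
    return {
        y for y in table
        if all(jaccard(table[x][a], table[y][a]) >= theta for a in attrs)
    }


def approximations(table, target, attrs, theta):
    """Theta-lower and theta-upper approximations of a target set of objects."""
    lower = {x for x in table if tolerance_class(table, x, attrs, theta) <= target}
    upper = {x for x in table if tolerance_class(table, x, attrs, theta) & target}
    return lower, upper


def outlier_score(table, x, attrs, theta):
    """Hypothetical score: objects with small tolerance classes score higher."""
    return 1 - len(tolerance_class(table, x, attrs, theta)) / len(table)


# Toy set-valued information system: every attribute value is a set.
table = {
    "x1": {"lang": {"en", "fr"}, "skill": {"py"}},
    "x2": {"lang": {"en", "fr"}, "skill": {"py", "go"}},
    "x3": {"lang": {"en"}, "skill": {"py"}},
    "x4": {"lang": {"zh"}, "skill": {"rust"}},
}
attrs = ["lang", "skill"]
theta = 0.5

lower, upper = approximations(table, {"x1", "x4"}, attrs, theta)
scores = {x: outlier_score(table, x, attrs, theta) for x in table}
```

In this toy table, x4 shares no values with the other objects, so its tolerance class contains only itself and it receives the highest score under the assumed scoring rule. Raising theta shrinks the tolerance classes, which is one way to picture the role of the variable similarity parameter.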
Pages: 385-413
Number of pages: 29