Similarity distance based approach for outlier detection by matrix calculation

Cited by: 0
Authors
Ye, Ou [1]
Li, Zhanli [1]
Affiliations
[1] Xi'an University of Science and Technology, Xi'an, China
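Funding
National Natural Science Foundation of China
Keywords
Statistics - Matrix algebra - Data mining - Calculations - Data handling
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract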
Purpose: String outliers in client information need to be detected and cleaned. Many current outlier detection algorithms consider only the semantics of the data and ignore its structure, which makes it difficult to ensure detection accuracy. To address this issue, this paper proposes an outlier detection method based on similarity distance.
Methodology: We formulated a similarity calculation model for string data that combines semantic and structural factors. Following outlier detection theory in data cleansing, one-dimensional string data were projected into a two-dimensional space, where string outliers were detected using a new similarity measurement mechanism.
Findings: We first obtained the word frequency of the string data by matrix calculation, then computed the semantic similarity and the structure similarity from the word frequency. After mapping the string data from one-dimensional to two-dimensional space, we identified the outliers using the similarity distance.
Originality: We studied string outlier detection in data cleansing. First, we formulated the similarity calculation model by considering both the semantic factor and the structure factor. Second, by constructing a similarity cell into which the string data are projected, we carried out the similarity distance measurement within that cell.
Practical value: The method can be used to clean outlier string data in an enterprise's client information, helping to ensure the data quality of that information and to reduce data maintenance costs. Extensive simulation experiments were conducted to demonstrate the feasibility and rationality of the method. The results showed that it improves the accuracy of string outlier detection. © Ou Ye, Zhanli Li, 2016.
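The abstract names the pipeline stages (word-frequency matrix, semantic and structure similarity, projection to two dimensions, distance-based detection) but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' method. Every concrete choice here is hypothetical: cosine similarity over token counts stands in for the semantic similarity, a character-class pattern compared with `difflib.SequenceMatcher` stands in for the structure similarity, the "similarity distance" is taken as the Euclidean distance from the point (1, 1), and the threshold and sample records are made up for illustration.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def term_frequency_matrix(strings):
    """Build a count matrix: one row per string, one column per token."""
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in strings]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    column = {tok: j for j, tok in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in strings]
    for i, toks in enumerate(tokenized):
        for tok, count in Counter(toks).items():
            matrix[i][column[tok]] = count
    return matrix


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def shape(s):
    """Reduce a string to a structural pattern, e.g. '12 Main' -> '9 a'."""
    classes = ["a" if c.isalpha() else "9" if c.isdigit() else c for c in s]
    collapsed = [c for i, c in enumerate(classes) if i == 0 or c != classes[i - 1]]
    return "".join(collapsed)


def similarity_points(strings):
    """Map each string to (mean semantic, mean structure) similarity.

    Assumption: each string is compared against all the others, so at
    least two strings are required.
    """
    tf = term_frequency_matrix(strings)
    shapes = [shape(s) for s in strings]
    points = []
    for i in range(len(strings)):
        others = [j for j in range(len(strings)) if j != i]
        semantic = sum(cosine(tf[i], tf[j]) for j in others) / len(others)
        structure = sum(
            SequenceMatcher(None, shapes[i], shapes[j]).ratio() for j in others
        ) / len(others)
        points.append((semantic, structure))
    return points


def flag_outliers(strings, threshold=0.8):
    """Return indices whose 2-D point lies farther than `threshold` from (1, 1)."""
    return [
        i
        for i, (sem, struct) in enumerate(similarity_points(strings))
        if math.hypot(1.0 - sem, 1.0 - struct) > threshold
    ]


# Hypothetical client-address records; the last one is free text.
SAMPLE = [
    "12 Main Street Springfield",
    "48 Main Street Springfield",
    "7 Oak Street Springfield",
    "93 Oak Street Springfield",
    "call back later ###",
]

print(flag_outliers(SAMPLE))
```

On this toy sample, the free-text record shares no tokens with the address records and has a different character pattern, so its point sits farthest from (1, 1). Using both coordinates reflects the abstract's argument that semantics alone is not enough: a record can share vocabulary with its neighbours and still be structurally malformed.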
Pages: 99-105