Outliers Detection for Pareto Distributed Data

Cited by: 2
Authors
Safari, M. A. Mohd [1]
Masseran, N. [1]
Ibrahim, K. [1]
Affiliation
[1] Universiti Kebangsaan Malaysia, Faculty of Science and Technology, School of Mathematical Sciences, Bangi 43600, Selangor, Malaysia
Keywords
Boxplot; Tail
DOI
10.1063/1.5028034
Chinese Library Classification
O59 [Applied Physics]
Abstract
This study examines the presence of outliers in the upper tail of the Malaysian income distribution under the assumption that the data follow a Pareto model. Three types of boxplots are considered for this purpose: the standard boxplot, the adjusted boxplot and the generalized boxplot. Their performance is assessed in a simulation study in which data were generated from the Pareto distribution P(1, α) with α = 2, 3, 4, and then contaminated by replacing a proportion ε (3%, 5% or 10%) of randomly selected observations. The generalized boxplot was found to give higher power than the standard and adjusted boxplots. It was therefore used to determine the presence of outliers in the upper tail of the income distribution, with the threshold for Pareto tail modelling determined by Van Kerm's formula. Among the household incomes exceeding the threshold, the generalized boxplot detected 0.4%, 0.4%, 0.9% and 1.2% outliers for 2007, 2009, 2012 and 2014, respectively.
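The simulation design described in the abstract can be sketched for the standard boxplot alone. This is a minimal Python sketch under stated assumptions, not the authors' code. P(1, α) is taken to be the Type I Pareto distribution with scale 1 and shape α, sampled by inverse transform, and outliers are flagged above Tukey's upper fence Q3 + 1.5·IQR. Because the abstract does not say what the replaced observations were replaced with, the hypothetical helper `contaminate` substitutes draws from a heavier-tailed Pareto P(1, alpha_out), where `alpha_out = 0.5` is an arbitrary illustrative choice. "Power" is assumed here to mean the fraction of contaminated observations that are flagged. The helper names `sample_pareto`, `boxplot_fences`, `contaminate` and `estimate_power` are hypothetical, and the adjusted and generalized boxplots are not implemented.

```python
import random
import statistics


def sample_pareto(n, alpha, xm=1.0, rng=random):
    """Draw n values from the Type I Pareto P(xm, alpha) by inverse transform.

    F(x) = 1 - (xm / x) ** alpha for x >= xm, so X = xm * U ** (-1 / alpha).
    Using 1 - rng.random() keeps U in (0, 1] and avoids division by zero.
    """
    return [xm * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def boxplot_fences(data, k=1.5):
    """Tukey's standard boxplot fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def contaminate(data, eps, alpha_out, rng=random):
    """HYPOTHETICAL scheme (not stated in the source): replace a proportion
    eps of randomly chosen observations with draws from a heavier-tailed
    Pareto P(1, alpha_out). Returns the contaminated copy and the replaced
    indices."""
    n = len(data)
    m = round(eps * n)
    idx = rng.sample(range(n), m)
    out = list(data)
    for i, value in zip(idx, sample_pareto(m, alpha_out, rng=rng)):
        out[i] = value
    return out, set(idx)


def estimate_power(alpha, eps, n=200, reps=200, alpha_out=0.5, seed=0):
    """Fraction of contaminated observations above the upper fence, pooled
    over `reps` simulated samples (one ASSUMED definition of power)."""
    rng = random.Random(seed)
    flagged = total = 0
    for _ in range(reps):
        clean = sample_pareto(n, alpha, rng=rng)
        data, replaced = contaminate(clean, eps, alpha_out, rng=rng)
        _, upper = boxplot_fences(data)  # only the upper tail is of interest
        flagged += sum(1 for i in replaced if data[i] > upper)
        total += len(replaced)
    return flagged / total


if __name__ == "__main__":
    for alpha in (2, 3, 4):
        for eps in (0.03, 0.05, 0.10):
            p = estimate_power(alpha, eps)
            print(f"alpha={alpha} eps={eps:.2f} power={p:.3f}")
```

Running the script prints one power estimate per (α, ε) combination. The values depend entirely on the assumed contamination scheme and sample size, so they illustrate the mechanics of the simulation rather than reproduce the study's results.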
Pages: 6