Outliers Detection for Pareto Distributed Data

Cited by: 2
Authors
Safari, M. A. Mohd [1]
Masseran, N. [1]
Ibrahim, K. [1]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Sci & Technol, Sch Math Sci, Bangi 43600, Selangor, Malaysia
Keywords
BOXPLOT; TAIL
DOI
10.1063/1.5028034
Chinese Library Classification (CLC) number
O59 [Applied Physics]
Abstract
This study examines the presence of outliers in the upper tail of the Malaysian income distribution, under the assumption that the data follow a Pareto model. Three types of boxplot are considered for this purpose: the standard boxplot, the adjusted boxplot and the generalized boxplot. Their performance is assessed in a simulation study in which data were simulated from the Pareto distribution P(1, alpha) with alpha = 2, 3, 4 and then contaminated by replacing a proportion epsilon (3%, 5%, 10%) of randomly selected observations. The generalized boxplot is found to give higher power than the standard and adjusted boxplots. The generalized boxplot was therefore used to determine the presence of outliers in the upper tail of the income distribution, while the threshold for Pareto tail modelling was determined using Van Kerm's formula. The results show that the generalized boxplot detected 0.4%, 0.4%, 0.9% and 1.2% outliers in the household income data exceeding the threshold for the years 2007, 2009, 2012 and 2014, respectively.
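The simulation design described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and covers only the standard (Tukey) boxplot; the adjusted and generalized boxplots are not implemented here. The abstract does not say what the contaminated observations are replaced with, so the scheme below (Pareto draws shifted by a constant `shift`) is purely an assumption, as are the function names, the sample size and the number of replications.

```python
import random
import statistics


def simulate_contaminated_pareto(n, alpha, epsilon, shift, rng):
    """Draw n values from Pareto(scale=1, shape=alpha), then replace a
    proportion epsilon of randomly selected values with contaminants."""
    data = [rng.paretovariate(alpha) for _ in range(n)]
    k = round(epsilon * n)
    contaminated_idx = set(rng.sample(range(n), k))
    for i in contaminated_idx:
        # Assumption (not from the paper): a contaminant is a Pareto draw
        # pushed further into the upper tail by a constant shift.
        data[i] = shift + rng.paretovariate(alpha)
    return data, contaminated_idx


def standard_boxplot_upper_fence(data):
    """Upper fence of the standard boxplot: Q3 + 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 + 1.5 * (q3 - q1)


def detection_power(n, alpha, epsilon, shift, replications, seed=0):
    """Average share of contaminated points lying above the upper fence."""
    rng = random.Random(seed)
    shares = []
    for _ in range(replications):
        data, idx = simulate_contaminated_pareto(n, alpha, epsilon, shift, rng)
        fence = standard_boxplot_upper_fence(data)
        flagged = sum(1 for i in idx if data[i] > fence)
        shares.append(flagged / len(idx))
    return statistics.mean(shares)


if __name__ == "__main__":
    # Grid of shape parameters and contamination levels from the abstract.
    for alpha in (2, 3, 4):
        for epsilon in (0.03, 0.05, 0.10):
            power = detection_power(500, alpha, epsilon, shift=50.0,
                                    replications=200)
            print(f"alpha={alpha} epsilon={epsilon:.2f} power={power:.3f}")
```

A smaller `shift` places the contaminants closer to the bulk of the data and makes detection harder. Comparing boxplot variants as the study does would mean swapping in a different fence function while keeping the rest of the loop unchanged.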
Pages: 6
Related papers
50 in total
  • [31] An Efficient Algorithm for the Detection of Outliers in Mislabeled Omics Data
    Sun, Hongwei
    Wang, Jiu
    Zhang, Zhongwen
    Hu, Naibao
    Wang, Tong
    COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE, 2021, 2021 (2021)
  • [32] Robust detection of multiple outliers in grouped multivariate data
    Caroni, Chrys
    Billor, Nedret
    JOURNAL OF APPLIED STATISTICS, 2007, 34 (10) : 1241 - 1250
  • [33] Automation of cleaning and ensembles for outliers detection in questionnaire data
    Uher, Vojtech
    Drazdilova, Pavla
    Platos, Jan
    Badura, Petr
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 206
  • [34] A supervised approach for detection of outliers in healthcare claims data
    Jyothi, P Naga
    Lakshmi, D Rajya
    Rama Rao, K.V.S.N.
    Eastern Macedonia and Thrace Institute of Technology, (13) : 204 - 213
  • [35] DETECTION OF OUTLIERS IN BIVARIATE TIME-SERIES DATA
    KHATTREE, R
    NAIK, DN
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 1987, 16 (12) : 3701 - 3714
  • [36] An FCR Approach Towards Detection of Outliers for Medical Data
    Iqbal, Sidra
    Ajmeri, Hafiz Bahloul
    Bibi, Sumaira
    Wahid, Abdul
    2020 IEEE 17TH INTERNATIONAL CONFERENCE ON SMART COMMUNITIES: IMPROVING QUALITY OF LIFE USING ICT, IOT AND AI (IEEEHONET 2020), 2020, : 224 - 230
  • [37] Automation of cleaning and ensembles for outliers detection in questionnaire data
    Uher, Vojtěch
    Dráždilová, Pavla
    Platoš, Jan
    Badura, Petr
    Expert Systems with Applications, 2022, 206
  • [38] A Semisupervised Approach to the Detection and Characterization of Outliers in Categorical Data
    Ienco, Dino
    Pensa, Ruggero G.
    Meo, Rosa
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (05) : 1017 - 1029
  • [39] Detection of outliers in spatial data by using local difference
    Zhang, SY
    Zhu, ZY
    PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON INTELLIGENT MECHATRONICS AND AUTOMATION, 2004, : 400 - 405
  • [40] Detection of outliers in longitudinal count data via overdispersion
    Gumedze, Freedom N.
    Chatora, Tinashe D.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 79 : 192 - 202