Outliers Detection for Pareto Distributed Data

Cited by: 2
Authors
Safari, M. A. Mohd [1]
Masseran, N. [1]
Ibrahim, K. [1]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Sci & Technol, Sch Math Sci, Bangi 43600, Selangor, Malaysia
Keywords
BOXPLOT; TAIL;
DOI
10.1063/1.5028034
Chinese Library Classification (CLC) number
O59 [Applied Physics];
Abstract
This study examines the presence of outliers in the upper tail of the Malaysian income distribution under the assumption that the data follow a Pareto model. Three types of boxplot are considered for this purpose: the standard boxplot, the adjusted boxplot and the generalized boxplot. Their performance is assessed in a simulation study in which data are simulated from the Pareto distribution P(1, alpha) with alpha = 2, 3, 4 and then contaminated by replacing a proportion epsilon (3%, 5%, 10%) of randomly selected observations. The generalized boxplot is found to give higher power than the standard and adjusted boxplots. The generalized boxplot was therefore used to determine the presence of outliers in the upper tail of the income distribution, while the threshold for Pareto tail modelling was determined using Van Kerm's formula. The results show that the generalized boxplot detected 0.4%, 0.4%, 0.9% and 1.2% outliers in the household income data exceeding the threshold for the years 2007, 2009, 2012 and 2014, respectively.
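
The simulation design described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say how the contaminating values were generated, so the `factor` parameter below (replacement values are Pareto draws multiplied by a constant) is purely an assumption, as are all function names. Only the standard (Tukey) boxplot upper fence is shown; the adjusted and generalized boxplots are not implemented here.

```python
import random
import statistics


def simulate_contaminated_pareto(n, alpha, eps, factor=100.0, seed=0):
    """Draw n values from Pareto(scale=1, shape=alpha), then replace a
    proportion eps of randomly selected positions with contaminants."""
    rng = random.Random(seed)
    data = [rng.paretovariate(alpha) for _ in range(n)]
    k = round(eps * n)
    idx = rng.sample(range(n), k)
    for i in idx:
        # ASSUMPTION: contaminants are Pareto draws scaled by `factor`;
        # the abstract does not specify the contamination mechanism.
        data[i] = factor * rng.paretovariate(alpha)
    return data, set(idx)


def tukey_upper_fence(data, k=1.5):
    """Upper fence of the standard boxplot: Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def detection_rate(data, contaminated, fence):
    """Share of contaminated positions flagged as upper-tail outliers."""
    flagged = {i for i, x in enumerate(data) if x > fence}
    return len(flagged & contaminated) / len(contaminated)


if __name__ == "__main__":
    for alpha in (2, 3, 4):
        for eps in (0.03, 0.05, 0.10):
            data, idx = simulate_contaminated_pareto(1000, alpha, eps)
            fence = tukey_upper_fence(data)
            rate = detection_rate(data, idx, fence)
            print(f"alpha={alpha} eps={eps:.2f} fence={fence:.2f} rate={rate:.2f}")
```

A full power comparison would also count clean observations flagged by each rule, since a heavy-tailed Pareto sample routinely exceeds the standard fence even without contamination.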
Pages: 6
Related papers
50 in total
  • [1] ON THE DETECTION OF MULTIVARIATE DATA OUTLIERS AND REGRESSION OUTLIERS
    LAZRAQ, A
    CLEROUX, R
    DATA ANALYSIS, LEARNING SYMBOLIC AND NUMERIC KNOWLEDGE, 1989, : 133 - 140
  • [2] Detection of distributed targets in Pareto clutter
    Nouar, Nabila
    Farrouki, Atef
    2017 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING - BOUMERDES (ICEE-B), 2017,
  • [3] DETECTION OF OUTLIERS IN FAMILIAL DATA
    BHANDARY, M
    BANSAL, NK
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 1993, 22 (09) : 2669 - 2685
  • [4] Are your data really Pareto distributed?
    Cirillo, Pasquale
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2013, 392 (23) : 5947 - 5962
  • [5] On detecting outliers in the Pareto distribution
    Nooghabi, Mehdi Jabbari
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2019, 89 (08) : 1466 - 1481
  • [6] Detection of outliers in meteorological observation data
    Takahashi, Gendai
    Suzuki, Tomomichi
    Kawamura, Hironobu
    Journal of Quality, 2011, 18 (05) : 393 - 405
  • [7] Visualizing Big Data Outliers through Distributed Aggregation
    Wilkinson, Leland
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2018, 24 (01) : 256 - 266
  • [8] Distributed Strategies for Mining Outliers in Large Data Sets
    Angiulli, Fabrizio
    Basta, Stefano
    Lodi, Stefano
    Sartori, Claudio
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (07) : 1520 - 1532
  • [9] On entropy of a Pareto distribution in the presence of outliers
    Nooghabi, M. Jabbari
    Nooghabi, E. Khaleghpanah
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2016, 45 (17) : 5234 - 5250
  • [10] Coherent multilook detection for targets in Pareto distributed clutter
    Weinberg, G. V.
    ELECTRONICS LETTERS, 2011, 47 (14) : 822 - U60