Typicality Distribution Function - A New Density-based Data Analytics Tool

Cited by: 0
Authors
Angelov, Plamen [1, 2]
Affiliations
[1] Univ Lancaster, Sch Comp & Commun, Data Sci Grp, Lancaster LA1 4WA, England
[2] Carlos III Univ, Chair Excellence, Madrid, Spain
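Keywords
TEDA; typicality; eccentricity; data density; pdf; non-parametric data distributions
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract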
In this paper, a new density-based, non-frequentistic data analytics tool called the typicality distribution function (TDF) is proposed. It is a further development of the recently introduced typicality- and eccentricity-based data analytics (TEDA) framework. The newly introduced TDF and its standardized form offer an effective alternative to the widely used probability distribution function (pdf) while remaining free from the restrictive assumptions that the latter makes and requires. In particular, it offers an exact solution for any number of non-coinciding data samples (except a single point). By comparison, the well-developed and widely used traditional probability theory and related statistical learning approaches require (theoretically) an infinitely large number of data samples/observations, although in practice this requirement is often ignored. Furthermore, TDF does not require the user to pre-select or assume a particular distribution (e.g. Gaussian or other) or a mixture of such distributions, or to pre-define the number of distributions in a mixture. In addition, it does not require the individual data items to be independent. At the same time, the link with traditional statistical approaches such as the well-known "3σ" analysis, the Chebyshev inequality, etc. leads to the interesting conclusion that the same type of analysis can be made automatically using TDF, without the restrictive prior assumptions listed above to which these traditional approaches are tied. TDF can provide valuable information for the analysis of extreme processes and for fault detection and identification, where the number of observations of extreme events or faults is usually disproportionately small. The newly proposed TDF offers a non-parametric, closed-form analytical (quadratic) description extracted exactly from the real data realizations, in contrast to the usual practice where such distributions are pre-assumed or approximated.
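The abstract does not state the TDF formulas. As a minimal sketch, assuming the eccentricity definition used in earlier TEDA work with squared Euclidean distance as the proximity measure, the Python code below computes the eccentricity ξ(x_j) = 2π(x_j) / Σ_l π(x_l), where π(x_j) is the sum of squared distances from x_j to all k samples, and the typicality τ = 1 − ξ. Both are exact for any finite set of at least two non-coinciding samples. It also applies a Chebyshev-like test on the normalized eccentricity ζ = ξ/2: under these assumptions, ζ > (m² + 1)/(2k) is equivalent to ‖x − μ‖ > mσ, so m = 3 reproduces the "3σ" rule without assuming any distribution. The function names and the sample data are hypothetical, and the code illustrates eccentricity and typicality rather than the TDF construction itself.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two samples of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def eccentricity(data: List[Sequence[float]]) -> List[float]:
    """Eccentricity xi(x_j) = 2 * pi(x_j) / sum_l pi(x_l), where pi(x_j) is the
    cumulative squared Euclidean distance from x_j to all k samples.
    The returned values sum to 2."""
    pi = [sum(sq_dist(x, y) for y in data) for x in data]
    total = sum(pi)
    if total == 0:
        # a single point (or only coinciding points) has no spread
        raise ValueError("need at least two non-coinciding samples")
    return [2.0 * p / total for p in pi]


def typicality(data: List[Sequence[float]]) -> List[float]:
    """Typicality tau = 1 - xi; the returned values sum to k - 2."""
    return [1.0 - e for e in eccentricity(data)]


def is_anomaly(zeta: float, k: int, m: float = 3.0) -> bool:
    """Chebyshev-like test on the normalized eccentricity zeta = xi / 2:
    zeta > (m**2 + 1) / (2k) is equivalent to ||x - mu|| > m * sigma."""
    return zeta > (m ** 2 + 1) / (2 * k)


if __name__ == "__main__":
    # hypothetical univariate sample with one far-away value
    values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11] * 2 + [30]
    data = [[float(v)] for v in values]
    xi = eccentricity(data)
    print([j for j, e in enumerate(xi) if is_anomaly(e / 2, len(data))])  # → [20]
```

The pairwise O(k²) form is used here only because it mirrors the definition. With squared Euclidean distance it equals the closed form ξ = 1/k + ‖x − μ‖²/(kσ²), where μ is the sample mean and σ² the mean squared distance to μ.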
For example, so-called particle filters are also a non-parametric approximation of the traditional statistics; however, they suffer from computational complexity and introduce a large number of dummy data. In addition, for several types of proximity/similarity measures (such as Euclidean, Mahalanobis and cosine), TDF can be calculated recursively, and thus very efficiently, which makes it suitable for real-time and online algorithms. Moreover, a very simple example illustrates that traditional probability theory and related statistical approaches can, in some cases, lead to paradoxically incorrect results and/or require hard prior assumptions to be made. In contrast, the newly proposed TDF can offer a logically meaningful result and an intuitive interpretation automatically and exactly, without any prior assumptions. Finally, a few simple univariate examples are provided, the process of inference is discussed, and future steps in the development of TDF and TEDA are outlined. Since TDF is a new fundamental theoretical innovation, the areas of application of TDF and TEDA can span anomaly detection, clustering, classification, prediction, control and regression to (Kalman-like) filters. Practical applications can be even wider, so it is difficult to list all of them.
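The recursive calculation mentioned above can be sketched for the Euclidean case. Assuming the same TEDA eccentricity definition and its closed form ξ_k(x) = 1/k + ‖x − μ_k‖²/(kσ_k²), only k, the mean μ_k and the variance σ_k² need to be stored, updated per sample as μ_k = ((k − 1)/k)μ_{k−1} + x_k/k and σ_k² = ((k − 1)/k)σ_{k−1}² + ‖x_k − μ_k‖²/(k − 1). The class name `RecursiveEccentricity` and the sample stream are hypothetical; Mahalanobis and cosine variants would need different update rules and are not shown.

```python
from typing import List, Optional, Sequence


class RecursiveEccentricity:
    """Hypothetical online estimator: stores only k, the mean mu_k and the
    variance var_k (mean squared Euclidean distance to mu_k)."""

    def __init__(self) -> None:
        self.k = 0
        self.mu: List[float] = []
        self.var = 0.0

    def update(self, x: Sequence[float]) -> Optional[float]:
        """Add sample x_k and return its eccentricity xi_k(x_k), or None while
        the variance is zero (e.g. after the first sample)."""
        self.k += 1
        k = self.k
        if k == 1:
            self.mu = [float(c) for c in x]
            return None
        # mu_k = ((k - 1) / k) * mu_{k-1} + x_k / k
        self.mu = [(k - 1) / k * m + c / k for m, c in zip(self.mu, x)]
        d2 = sum((c - m) ** 2 for c, m in zip(x, self.mu))
        # var_k = ((k - 1) / k) * var_{k-1} + ||x_k - mu_k||^2 / (k - 1)
        self.var = (k - 1) / k * self.var + d2 / (k - 1)
        if self.var == 0.0:
            return None
        # closed form of the eccentricity for squared Euclidean distance
        return 1.0 / k + d2 / (k * self.var)


if __name__ == "__main__":
    est = RecursiveEccentricity()
    for v in [10, 11, 9, 10, 12, 10, 11, 9, 10, 11] * 2 + [30]:
        xi = est.update([float(v)])
        # Chebyshev-like "3 sigma" condition on zeta = xi / 2
        if xi is not None and xi / 2 > (3 ** 2 + 1) / (2 * est.k):
            print("anomaly:", v, "at k =", est.k)
```

Each update costs time proportional to the data dimension and does not grow with k, which is what makes this style of computation usable in real-time and online settings. The mean and variance match the batch values up to floating-point rounding.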
Pages: 8