A Generalized Methodology for Data Analysis

Cited by: 47
Authors
Angelov, Plamen P. [1]
Gu, Xiaowei [1]
Principe, Jose C. [2]
Affiliations
[1] Univ Lancaster, Sch Comp & Commun, Lancaster LA1 4WA, England
[2] Univ Florida, Computat NeuroEngn Lab, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
Keywords
Data mining and analysis; machine learning; pattern recognition; probability; statistics; association; centrality
DOI
10.1109/TCYB.2017.2753880
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Based on a critical analysis of data analytics and its foundations, we propose a functional approach to estimate data ensemble properties, which is based entirely on the empirical observations of discrete data samples and the relative proximity of these points in the data space and hence named empirical data analysis (EDA). The ensemble functions include the nonparametric square centrality (a measure of closeness used in graph theory) and typicality (an empirically derived quantity which resembles probability). A distinctive feature of the proposed new functional approach to data analysis is that it does not assume randomness or determinism of the empirically observed data, nor independence. The typicality is derived from the discrete data directly in contrast to the traditional approach, where a continuous probability density function is assumed a priori. The typicality is expressed in a closed analytical form that can be calculated recursively and, thus, is computationally very efficient. The proposed nonparametric estimators of the ensemble properties of the data can also be interpreted as a discrete form of the information potential (known from the information theoretic learning theory as well as the Parzen windows). Therefore, EDA is very suitable for the current move to a data-rich environment, where the understanding of the underlying phenomena behind the available vast amounts of data is often not clear. We also present an extension of EDA for inference. The areas of applications of the new methodology of the EDA are wide because it concerns the very foundation of data analysis. Preliminary tests show its good performance in comparison to traditional techniques.
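The abstract gives no formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's definitions. It assumes Euclidean distance, takes the sum of squared distances from a sample to all samples as its cumulative proximity, and normalizes the inverse of that sum so the values are non-negative and add up to 1 (a probability-like quantity). The names `cumulative_sq_distance`, `typicality_like`, and `RunningProximity` are hypothetical. The recursive part relies on a standard identity for squared Euclidean distance: the sum over all K samples equals K times (squared distance to the mean, plus the mean squared norm, minus the squared norm of the mean), so only two running statistics need to be stored.

```python
from typing import List, Sequence


def cumulative_sq_distance(points: Sequence[Sequence[float]]) -> List[float]:
    """Sum of squared Euclidean distances from each point to all points (O(K^2))."""
    return [
        sum(sum((a - b) ** 2 for a, b in zip(p, q)) for q in points)
        for p in points
    ]


def typicality_like(points: Sequence[Sequence[float]]) -> List[float]:
    """Normalized inverse cumulative proximity (assumed form, sums to 1).

    Requires at least two distinct points, otherwise a sum is zero.
    """
    inverse = [1.0 / q for q in cumulative_sq_distance(points)]
    total = sum(inverse)
    return [v / total for v in inverse]


class RunningProximity:
    """Tracks the running mean and mean squared norm of the samples seen so far,
    so the cumulative squared distance can be evaluated without storing them."""

    def __init__(self, dim: int) -> None:
        self.k = 0
        self.mean = [0.0] * dim
        self.mean_sq_norm = 0.0

    def update(self, x: Sequence[float]) -> None:
        """Fold one new sample into the running statistics."""
        self.k += 1
        w = 1.0 / self.k
        self.mean = [m + w * (xi - m) for m, xi in zip(self.mean, x)]
        self.mean_sq_norm += w * (sum(xi * xi for xi in x) - self.mean_sq_norm)

    def cumulative_sq_distance(self, x: Sequence[float]) -> float:
        """K * (||x - mean||^2 + mean squared norm - ||mean||^2)."""
        dist_to_mean = sum((xi - m) ** 2 for xi, m in zip(x, self.mean))
        mean_norm = sum(m * m for m in self.mean)
        return self.k * (dist_to_mean + self.mean_sq_norm - mean_norm)


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    print(typicality_like(data))
```

In this sketch a sample far from the rest of the data receives the smallest value, and the running version reproduces the batch sums while storing only the mean and the mean squared norm.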
Pages: 2981-2993
Page count: 13