Principal component analysis for interval data

Cited by: 16
Authors
Billard, L. [1]
Le-Rademacher, J. [2]
Affiliations
[1] Univ Georgia, Dept Stat, Athens, GA 30602 USA
[2] Med Coll Wisconsin, Wauwatosa, WI USA
Keywords
PCA; intervals; visualization;
DOI
10.1002/wics.1231
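Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes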
020208 ; 070103 ; 0714 ;
Abstract
Principal component analysis for classical data is a method used frequently to reduce the effective dimension underlying a data set from p random variables to s << p linear functions of those p random variables and their observed values. With contemporary large data sets, it is often the case that the data are aggregated in some meaningful scientific way such that the resulting data are symbolic data (such as lists, intervals, histograms, and the like), although symbolic data can and do occur naturally and in smaller data sets. Since symbolic data have internal variations along with the familiar (between observations) variation of classical data, direct application of classical methods to symbolic data will ignore much of the information contained in the data. Our focus is to describe and illustrate principal component methodology for interval data. The significance of symbolic data in general and of this article in particular is illustrated by its applicability for our analysis of three key 21st century challenges: networks, security data, and translational medicine. It is relatively easy to visualize the applicability to security data and translational medicine, though less easy to visualize its applicability to networks. Since an interval is typically denoted by (a, b), in a network interval, we let a be a pair of nodes and b be their edge with characteristics c and d, respectively. If this representation of a network interval is valid, then we can more easily visualize its applicability to networks also. © 2012 Wiley Periodicals, Inc.
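To make the idea of reducing p interval-valued variables to s components concrete, here is a minimal Python sketch of a simple midpoint-based approach. It is an illustrative assumption, not the methodology the article develops: it runs classical PCA on the interval midpoints and then projects each observation's hyperrectangle onto the components, so the within-interval variation influences only the width of the projected intervals and not the loadings. The function name `interval_pca_midpoints` and the toy data are hypothetical.

```python
import numpy as np


def interval_pca_midpoints(lower, upper, s):
    """Midpoint-based PCA sketch for interval data (illustrative only).

    lower, upper: (n, p) arrays holding the interval endpoints a and b.
    s: number of principal components to keep (s <= p).
    Returns the (p, s) loadings and the (n, s) lower/upper component bounds.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    # Classical PCA on the interval midpoints.
    centers = (lower + upper) / 2.0
    mean = centers.mean(axis=0)
    cov = np.cov(centers - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:s]       # indices of the s largest
    loadings = eigvecs[:, order]                # shape (p, s)

    # Project each hyperrectangle: per variable, take whichever endpoint
    # gives the smaller (or larger) contribution to each component.
    a = (lower - mean)[:, :, None] * loadings[None, :, :]
    b = (upper - mean)[:, :, None] * loadings[None, :, :]
    pc_lower = np.minimum(a, b).sum(axis=1)
    pc_upper = np.maximum(a, b).sum(axis=1)
    return loadings, pc_lower, pc_upper


# Hypothetical toy data: 4 observations, 3 interval-valued variables.
lower = np.array([[1.0, 2.0, 0.5],
                  [2.0, 1.0, 1.5],
                  [3.0, 4.0, 2.0],
                  [4.0, 3.5, 3.0]])
upper = lower + np.array([[0.5, 1.0, 0.2],
                          [1.0, 0.5, 0.4],
                          [0.3, 0.8, 1.0],
                          [0.6, 0.2, 0.5]])

loadings, pc_lower, pc_upper = interval_pca_midpoints(lower, upper, s=2)
```

Each row of `pc_lower` and `pc_upper` bounds the corresponding observation on the retained components. Because the covariance is computed from midpoints alone, this sketch discards the internal variation that the abstract says symbolic methods are meant to retain.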
Pages: 535-540
Number of pages: 6
Related papers
50 in total
  • [31] CIPCA: Complete-Information-based Principal Component Analysis for interval-valued data
    Wang, Huiwen
    Guan, Rong
    Wu, Junjie
    NEUROCOMPUTING, 2012, 86 : 158 - 169
  • [32] Fault Detection and Isolation of Spacecraft Thrusters using an Extended Principal Component Analysis to Interval Data
    Gueddi, Imen
    Nasri, Othman
    Benothman, Kamal
    Dague, Philippe
    INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS, 2017, 15 (02) : 776 - 789
  • [33] An incremental principal component analysis for chunk data
    Ozawa, Seiichi
    Pang, Shaoning
    Kasabov, Nikola
    2006 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-5, 2006, : 2278 - +
  • [34] Penalized Principal Component Analysis of Microarray Data
    Nikulin, Vladimir
    McLachlan, Geoffrey J.
    COMPUTATIONAL INTELLIGENCE METHODS FOR BIOINFORMATICS AND BIOSTATISTICS, 2010, 6160 : 82 - 96
  • [35] Principal component analysis of binary genomics data
    Song, Yipeng
    Westerhuis, Johan A.
    Aben, Nanne
    Michaut, Magali
    Wessels, Lodewyk F. A.
    Smilde, Age K.
    BRIEFINGS IN BIOINFORMATICS, 2019, 20 (01) : 317 - 329
  • [36] Principal Component Analysis on Spatial Data: An Overview
    Demsar, Urska
    Harris, Paul
    Brunsdon, Chris
    Fotheringham, A. Stewart
    McLoone, Sean
    ANNALS OF THE ASSOCIATION OF AMERICAN GEOGRAPHERS, 2013, 103 (01) : 106 - 128
  • [37] Robust principal component analysis for functional data
    Peña, D
    Prieto, J
    TEST, 1999, 8 (01) : 56 - 60
  • [38] Data evaluation in chromatography by principal component analysis
    Cserhati, T.
    BIOMEDICAL CHROMATOGRAPHY, 2010, 24 (01) : 20 - 28
  • [39] Quantum data compression by principal component analysis
    Yu, Chao-Hua
    Gao, Fei
    Lin, Song
    Wang, Jingbo
    QUANTUM INFORMATION PROCESSING, 2019, 18 (08)
  • [40] Principal component analysis for compositional data vectors
    Huiwen Wang
    Liying Shangguan
    Rong Guan
    Lynne Billard
    Computational Statistics, 2015, 30 : 1079 - 1096