Principal component analysis for interval data

Cited by: 16
Authors
Billard, L. [1]
Le-Rademacher, J. [2]
Affiliations
[1] Univ Georgia, Dept Stat, Athens, GA 30602 USA
[2] Med Coll Wisconsin, Wauwatosa, WI USA
Source
WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS | 2012 / Vol. 4 / No. 6
Keywords
PCA; intervals; visualization;
DOI
10.1002/wics.1231
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
Principal component analysis for classical data is a method used frequently to reduce the effective dimension underlying a data set from p random variables to s ≪ p linear functions of those p random variables and their observed values. With contemporary large data sets, the data are often aggregated in some scientifically meaningful way, so that the resulting data are symbolic data (such as lists, intervals, histograms, and the like); symbolic data can and do also occur naturally, including in smaller data sets. Since symbolic data have internal variation in addition to the familiar between-observation variation of classical data, applying classical methods directly to symbolic data ignores much of the information the data contain. Our focus is to describe and illustrate principal component methodology for interval data. The significance of symbolic data in general, and of this article in particular, is illustrated by its applicability to our analysis of three key 21st-century challenges: networks, security data, and translational medicine. Its applicability to security data and translational medicine is relatively easy to visualize; its applicability to networks is less so. Since an interval is typically denoted by (a, b), in a network interval we let a be a pair of nodes and b be their edge, with characteristics c and d, respectively. If this representation of a network interval is valid, then its applicability to networks also becomes easier to visualize. © 2012 Wiley Periodicals, Inc.
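The abstract describes PCA as reducing p variables to s ≪ p linear combinations, and notes that treating intervals as single points discards their internal variation. As a rough illustration only, the sketch below uses a simple baseline often called the centers method in the symbolic data literature: PCA is run on the interval midpoints, and then each observation's full hyperrectangle is projected onto the components with interval arithmetic, so every score is itself an interval. This is not necessarily the methodology developed in this article. All function names and the toy data are hypothetical, and eigenpairs are found by power iteration with deflation, which assumes distinct eigenvalues.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower, upper) bounds of one interval-valued entry
Matrix = List[List[float]]


def interval_centers(data: Sequence[Sequence[Interval]]) -> Matrix:
    """Replace each interval [a, b] by its midpoint (a + b) / 2."""
    return [[(a + b) / 2.0 for a, b in row] for row in data]


def covariance(x: Matrix) -> Tuple[List[float], Matrix]:
    """Column means and sample covariance matrix (divisor n - 1)."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for row in x:
        d = [row[j] - means[j] for j in range(p)]
        for j in range(p):
            for k in range(p):
                cov[j][k] += d[j] * d[k] / (n - 1)
    return means, cov


def leading_eigenpair(a: Matrix, iters: int = 10_000, tol: float = 1e-12) -> Tuple[float, List[float]]:
    """Power iteration for the largest eigenpair of a symmetric PSD matrix."""
    p = len(a)
    # Non-uniform start vector, so it is unlikely to be orthogonal to the target.
    v = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # matrix maps v to zero: report eigenvalue 0
            return 0.0, v
        w = [x / norm for x in w]
        converged = max(abs(wi - vi) for wi, vi in zip(w, v)) < tol
        v = w
        if converged:
            break
    av = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum(vi * avi for vi, avi in zip(v, av)), v  # Rayleigh quotient


def centers_pca(data: Sequence[Sequence[Interval]], s: int):
    """First s components of the midpoint covariance, plus interval-valued scores."""
    means, cov = covariance(interval_centers(data))
    p = len(cov)
    a = [row[:] for row in cov]
    eigvals: List[float] = []
    loadings: List[List[float]] = []
    for _ in range(s):
        lam, v = leading_eigenpair(a)
        eigvals.append(lam)
        loadings.append(v)
        for i in range(p):  # deflation: remove the component just found
            for j in range(p):
                a[i][j] -= lam * v[i] * v[j]
    scores = []
    for row in data:
        row_scores = []
        for w in loadings:
            shift = sum(wj * mj for wj, mj in zip(w, means))
            # Range of w . (x - mean) as x varies over this row's hyperrectangle.
            lo = sum(min(wj * a_j, wj * b_j) for wj, (a_j, b_j) in zip(w, row)) - shift
            hi = sum(max(wj * a_j, wj * b_j) for wj, (a_j, b_j) in zip(w, row)) - shift
            row_scores.append((lo, hi))
        scores.append(row_scores)
    return eigvals, loadings, scores


if __name__ == "__main__":
    # Hypothetical toy data: 3 observations, p = 2 interval-valued variables.
    data = [
        [(1.0, 3.0), (2.0, 4.0)],
        [(3.0, 5.0), (3.0, 7.0)],
        [(5.0, 9.0), (6.0, 8.0)],
    ]
    eigvals, loadings, scores = centers_pca(data, s=1)
    print("eigenvalue:", eigvals[0])
    print("loadings:", loadings[0])
    for i, row_scores in enumerate(scores, start=1):
        print(f"observation {i}: PC1 score interval {row_scores[0]}")
```

The width of each score interval equals the sum of |loading| times interval width, which is how this baseline carries the within-observation variation into the component space. Power iteration is used only to keep the sketch dependency-free; a library eigensolver for symmetric matrices would be the usual choice in practice.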
Pages: 535-540
Number of pages: 6
Related papers
50 in total
  • [21] PRINCIPAL COMPONENT ANALYSIS OF EPIDEMIOLOGICAL DATA
    OSAKI, J
    ISHII, F
    IWAMOTO, S
    SHINBO, S
    BIOMETRICS, 1982, 38 (04) : 1101 - 1101
  • [22] Synthetic Data by Principal Component Analysis
    Sano, Natsuki
    20TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2020), 2020: 101 - 105
  • [23] PRINCIPAL COMPONENT ANALYSIS OF COMPOSITIONAL DATA
    AITCHISON, J
    BIOMETRIKA, 1983, 70 (01) : 57 - 65
  • [24] PRINCIPAL COMPONENT ANALYSIS OF PRODUCTION DATA
    WILLIAMS, JH
    RADIO AND ELECTRONIC ENGINEER, 1974, 44 (09): 473 - 480
  • [25] Principal component analysis of hydrologic data
    Rao, AR
    Burke, TT
    INTEGRATED APPROACH TO ENVIRONMENTAL DATA MANAGEMENT SYSTEMS, 1997, 31 : 275 - 290
  • [26] Principal component analysis of genetic data
    David Reich
    Alkes L Price
    Nick Patterson
    Nature Genetics, 2008, 40 : 491 - 492
  • [27] Data Analysis Using Principal Component Analysis
    Sehgal, Shrub
    Singh, Harpreet
    Agarwal, Mohit
    Bhasker, V.
    Shantanu
    2014 INTERNATIONAL CONFERENCE ON MEDICAL IMAGING, M-HEALTH & EMERGING COMMUNICATION SYSTEMS (MEDCOM), 2015: 45 - 48
  • [28] Fault Detection and Isolation of spacecraft thrusters using an extended principal component analysis to interval data
    Imen Gueddi
    Othman Nasri
    Kamal Benothman
    Philippe Dague
    International Journal of Control, Automation and Systems, 2017, 15 : 776 - 789
  • [29] Principal component analysis for interval-valued symbolic data (ID: 5-036)
    Li Wenhua
    Wang Chunfeng
    Guo Junpeng
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING AND ENGINEERING MANAGEMENT, VOLS 1-5: INDUSTRIAL ENGINEERING AND MANAGEMENT INNOVATION IN NEW-ERA, 2006: 1815 - 1818
  • [30] Kernel Principal Component Analysis Improvement based on Data-Reduction via Class Interval
    Kaib, Mohammed Tahar Habib
    Kouadri, Abdelmalek
    Harkat, Mohamed Faouzi
    Bensmail, Abderazak
    Mansouri, Majdi
    Nounou, Mohamed
    IFAC PAPERSONLINE, 2024, 58 (04): 390 - 395