Principal component analysis for interval data

Cited by: 16

Authors:

Billard, L. [1]

Le-Rademacher, J. [2]

Affiliations:

[1] Univ Georgia, Dept Stat, Athens, GA 30602 USA

[2] Med Coll Wisconsin, Wauwatosa, WI USA

Source:

WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS | 2012 / Vol. 4 / Issue 06

Keywords:

PCA; intervals; visualization;

DOI:

10.1002/wics.1231

Chinese Library Classification (CLC) number:

O21 [Probability theory and mathematical statistics]; C8 [Statistics];

Discipline classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

Principal component analysis for classical data is a method frequently used to reduce the effective dimension underlying a data set from p random variables to s << p linear functions of those p random variables and their observed values. With contemporary large data sets, the data are often aggregated in some scientifically meaningful way such that the resulting data are symbolic data (such as lists, intervals, histograms, and the like), though symbolic data can and do occur naturally and in smaller data sets. Since symbolic data have internal variations along with the familiar (between observations) variation of classical data, direct application of classical methods to symbolic data will ignore much of the information contained in the data. Our focus is to describe and illustrate principal component methodology for interval data. The significance of symbolic data in general and of this article in particular is illustrated by its applicability for our analysis of three key 21st century challenges: networks, security data, and translational medicine. It is relatively easy to visualize the applicability to security data and translational medicine, though less easy to visualize its applicability to networks. Since an interval is typically denoted by (a, b), in a network interval, we let a be a pair of nodes and b be their edge with characteristics c and d, respectively. If this representation of a network interval is valid, then we can more easily visualize its applicability to networks also. © 2012 Wiley Periodicals, Inc.
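
The abstract does not spell out an algorithm, so the following is only a rough illustration of the idea of reducing p interval-valued variables to s linear functions. It uses one simple approach, often called the centers method: run classical PCA on the interval midpoints, then push each interval through the resulting weights to get interval-valued scores. The function name `centers_pca`, the toy data, and the choice of this method are assumptions made for illustration and are not necessarily what the article describes. Note that this approach uses only the midpoints when estimating the components, which is an instance of the information loss the abstract warns about.

```python
import numpy as np

def centers_pca(lower, upper, n_components):
    """Hypothetical helper: PCA on interval midpoints (centers method sketch).

    lower, upper: arrays of shape (n, p) holding interval endpoints.
    Returns the p x s weight matrix and the lower/upper bounds of the
    interval-valued principal component scores, each of shape (n, s).
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    # Classical PCA on the midpoints of the intervals.
    centers = (lower + upper) / 2.0
    mean = centers.mean(axis=0)
    cov = np.cov(centers - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # eigh returns eigenvalues in ascending order; keep the s largest.
    order = np.argsort(eigvals)[::-1][:n_components]
    weights = eigvecs[:, order]

    # A linear function of a hyperrectangle attains its extremes at vertices,
    # so each bound is the sum of per-variable minima (or maxima).
    lo_terms = (lower - mean)[:, :, None] * weights[None, :, :]
    hi_terms = (upper - mean)[:, :, None] * weights[None, :, :]
    score_lower = np.minimum(lo_terms, hi_terms).sum(axis=1)
    score_upper = np.maximum(lo_terms, hi_terms).sum(axis=1)
    return weights, score_lower, score_upper

# Toy data (made up for illustration): n = 4 observations, p = 3 variables.
lower = np.array([[1.0, 10.0, 0.5],
                  [2.0, 12.0, 0.7],
                  [1.5,  9.0, 0.2],
                  [3.0, 15.0, 1.0]])
upper = lower + np.array([[0.5, 2.0, 0.1],
                          [1.0, 1.0, 0.3],
                          [0.2, 3.0, 0.2],
                          [0.8, 2.5, 0.4]])

weights, score_lower, score_upper = centers_pca(lower, upper, n_components=2)
print(weights.shape, score_lower.shape, score_upper.shape)
```

Each observation then appears as a rectangle in the plane of the first two components, which is one way to visualize interval data in reduced dimension.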

Pages: 535-540

Number of pages: 6

50 items in total

[41] Principal Component Analysis of symmetric fuzzy data
Giordani, P
Kiers, HAL
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2004, 45 (03) : 519 - 548
[42] PRINCIPAL COMPONENT ANALYSIS FOR MULTIVARIATE FAMILIAL DATA
KONISHI, S
RAO, CR
BIOMETRIKA, 1992, 79 (03) : 631 - 641
[43] A nonlinear principal component analysis on image data
Saegusa, R
Sakano, H
Hashimoto, S
MACHINE LEARNING FOR SIGNAL PROCESSING XIV, 2004, : 589 - 598
[44] Weighting of geophysical data in Principal Component Analysis
Chung, C
Nigam, S
JOURNAL OF GEOPHYSICAL RESEARCH-ATMOSPHERES, 1999, 104 (D14) : 16925 - 16928
[45] Principal component analysis for compositional data vectors
Wang, Huiwen
Shangguan, Liying
Guan, Rong
Billard, Lynne
COMPUTATIONAL STATISTICS, 2015, 30 (04) : 1079 - 1096
[46] Fuzzy principal component analysis for fuzzy data
Yabuuchi, Y
Watada, J
Nakamori, Y
PROCEEDINGS OF THE SIXTH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS I - III, 1997, : 1127 - 1132
[47] Principal Component Analysis with Noisy and/or Missing Data
Bailey, Stephen
PUBLICATIONS OF THE ASTRONOMICAL SOCIETY OF THE PACIFIC, 2012, 124 (919) : 1015 - 1023
[48] Principal component analysis for Hilbertian functional data
Kim, Dongwoo
Lee, Young Kyung
Park, Byeong U.
COMMUNICATIONS FOR STATISTICAL APPLICATIONS AND METHODS, 2020, 27 (01) : 149 - 161
[49] IMPROVED PRINCIPAL COMPONENT ANALYSIS OF NOISY DATA
FAY, MJ
PROCTOR, A
HOFFMANN, DP
HERCULES, DM
ANALYTICAL CHEMISTRY, 1991, 63 (11) : 1058 - 1063
[50] A Nonlinear principal component analysis of image data
Saegusa, R
Sakano, H
Hashimoto, S
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (10) : 2242 - 2248
