A principal component analysis of interval data based on center and log-radius

Cited by: 0
Authors
Zhao Q. [1, 2]
Wang H. [1, 3]
Wang S. [1, 2]
Affiliations
[1] School of Economics and Management, Beihang University, Beijing
[2] Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operations, Beijing
[3] Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing
Funding
National Natural Science Foundation of China
Keywords
Center and log-radius; Covariance matrix; Dimension reduction; Interval data; Principal Component Analysis (PCA)
DOI
10.13700/j.bh.1001-5965.2020.0227
Abstract
To study dimension reduction and visualization of multivariate interval data, each interval is represented as a two-dimensional array consisting of its center and its log-radius. Algebraic operations on interval data are defined for this representation, and a new Principal Component Analysis (PCA) method for interval data is proposed on that basis. Taking the logarithm of the interval radius guarantees that the ranges of the resulting interval principal components are non-negative. The new method is simple to compute and has low complexity. Furthermore, the relative positions of the sample points change as little as possible between the original space and the reduced space. Once the dimension of the variables in the high-dimensional space has been reduced, various classical statistical analysis methods can be applied. In addition, the sample points of the original high-dimensional space can be depicted in a low-dimensional space, which makes it possible to visualize multivariate interval data. The results of a simulation experiment verify the effectiveness of the proposed method. © 2021, Editorial Board of JBUAA. All rights reserved.
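The center and log-radius representation described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the paper's interval algebra or covariance definition, so the component-wise linear combination in `combine` and the fixed `weights` vector below are assumptions made for illustration (in an actual PCA the weights would be an eigenvector of the covariance matrix); all function names are hypothetical. The sketch only shows why the radius of a projected interval stays positive: it is recovered by exponentiating a log-radius.

```python
import math


def to_center_log_radius(lower, upper):
    """Map an interval [lower, upper] to (center, log-radius).

    Requires upper > lower, because the log of a zero radius is undefined.
    """
    if upper <= lower:
        raise ValueError("log-radius needs a strictly positive radius")
    center = (lower + upper) / 2.0
    log_radius = math.log((upper - lower) / 2.0)
    return center, log_radius


def to_interval(center, log_radius):
    """Map (center, log-radius) back to [lower, upper]."""
    radius = math.exp(log_radius)  # exp(...) is always > 0
    return center - radius, center + radius


def combine(sample, weights):
    """Assumed rule: apply the same linear combination to the centers
    and to the log-radii of one sample's variables."""
    center = sum(w * c for w, (c, _) in zip(weights, sample))
    log_radius = sum(w * l for w, (_, l) in zip(weights, sample))
    return center, log_radius


# One sample described by three interval-valued variables (made-up data).
sample_intervals = [(1.0, 3.0), (10.0, 14.0), (-2.0, -1.0)]
sample = [to_center_log_radius(lo, hi) for lo, hi in sample_intervals]

# Hypothetical unit-length loading vector; note the negative weight.
weights = [0.6, -0.8, 0.0]

pc_center, pc_log_radius = combine(sample, weights)
pc_lower, pc_upper = to_interval(pc_center, pc_log_radius)
print(pc_lower, pc_upper)
```

Even with a negative weight, the projected interval has `pc_upper > pc_lower`, which would not be guaranteed if the raw radii were combined linearly instead of their logarithms.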
Pages: 1414 - 1421
Number of pages: 7
Related papers
18 in total
  • [1] WOLD S, ESBENSEN K, GELADI P., Principal component analysis[J], Chemometrics and Intelligent Laboratory Systems, 2, 1-3, pp. 37-52, (1987)
  • [2] REN R E, WANG H W., Multivariate statistical data analysis: Theory, method and examples, pp. 92-95, (1997)
  • [3] SPETSIERIS P G, MA Y, DHAWAN V, et al., Differential diagnosis of parkinsonian syndromes using PCA-based functional imaging features[J], NeuroImage, 45, 4, pp. 1241-1252, (2009)
  • [4] HU Y, WANG H W., A new data mining method based on huge data and its application, Journal of Beijing University of Aeronautics and Astronautics, 17, 2, pp. 40-44, (2002)
  • [5] DIDAY E., Thinking by classes in data science: The symbolic data analysis paradigm: Symbolic data analysis[J], Wiley Interdisciplinary Reviews: Computational Statistics, 8, 5, pp. 172-205, (2016)
  • [6] ZHANG Y, WANG Y, WANG H W., Evaluating of academic journals in management of key academic journal fund: An application of simplified principal component analysis based on interval data, Journal of Management Sciences in China, 13, 7, pp. 92-98, (2010)
  • [7] CAZES P, CHOUAKRIA A, DIDAY E, et al., Extension de l'analyse en composantes principales à des données de type intervalle (Extending principal component analysis to interval data)[J], Revue de Statistique Appliquée, 3, pp. 5-24, (1997)
  • [8] DIDAY E, BOCK H H., Analysis of symbolic data: Exploratory methods for extracting statistical information from complex data[J], Journal of Classification, 18, 2, pp. 291-294, (2000)
  • [9] WANG H W, LI Y, GUAN R., A comparison study of two methods for principal component analysis of interval data, Journal of Beijing University of Aeronautics and Astronautics, 24, 4, pp. 86-89, (2010)
  • [10] CHOUAKRIA A, DIDAY E, CAZES P., Vertices principal components analysis with an improved factorial representation, Proceedings of the 6th Conference of the International Federation of Classification Societies (IFCS-98), pp. 397-402, (1998)