Principal component analysis and clustering on manifolds

Cited by: 3
Authors
Mardia, Kanti V. [1, 2]
Wiechers, Henrik [3]
Eltzner, Benjamin [4]
Huckemann, Stephan F. [3]
Affiliations
[1] Univ Leeds, Sch Math, Dept Stat, Leeds LS2 9JT, W Yorkshire, England
[2] Univ Oxford, Dept Stat, Oxford OX1 3LB, England
[3] Georgia Augusta Univ, Felix Bernstein Inst Math Stat Biosci, D-37077 Gottingen, Germany
[4] Max Planck Inst Biophys Chem, D-37077 Gottingen, Germany
Keywords
Adaptive linkage clustering; Circular mode hunting; Dimension reduction; Multivariate wrapped normal; SARS-CoV-2 geometry; Stratified spheres; Torus PCA; CHALLENGES; STATISTICS; INFERENCE; SIZER
DOI
10.1016/j.jmva.2021.104862
Chinese Library Classification (CLC) number
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Subject classification code
020208; 070103; 0714
Abstract
Big data, high-dimensional data, sparse data, large-scale data, and imaging data are all becoming new frontiers of statistics. Changing technologies have created this flood of data and have left scientists with a real hunger for new modeling strategies and methods of data analysis. In many cases the data are not Euclidean; in molecular biology, for example, the data sit on manifolds. Even on a simple non-Euclidean manifold such as the circle, summarizing angles by their arithmetic average does not make sense, so more care is needed. Non-Euclidean settings thus raise many major challenges, both mathematical and statistical. This paper focuses on PCA and clustering methods for some manifolds; PCA and clustering are, of course, among the core topics of multivariate analysis. From a practical point of view, we deal mainly with two key manifolds, namely spheres and tori. It is well known that dimension reduction on non-Euclidean manifolds with PCA-like methods has long been a challenging task, but there have been some recent breakthroughs. One is the idea of nested spheres; another is to effectively transform a torus into a sphere and subsequently apply nested-spheres PCA. We also provide a new clustering method for multivariate analysis with a property that is fundamental for molecular biology: it penalizes wrong assignments so as to avoid chemically no-go areas. We give various examples to illustrate these methods, including an important one dealing with COVID-19 data.
Pages: 21