Using visual statistical inference to better understand random class separations in high dimension, low sample size data

Cited by: 9
Authors
Chowdhury, Niladri Roy [1]
Cook, Dianne [1]
Hofmann, Heike [1]
Majumder, Mahbubul [2]
Lee, Eun-Kyung [3]
Toth, Amy L. [4]
Affiliations
[1] Iowa State Univ, Dept Stat, Ames, IA 50011 USA
[2] Univ Nebraska, Dept Math, Omaha, NE 68182 USA
[3] Ewha Womans Univ, Dept Stat, Seoul, South Korea
[4] Iowa State Univ, Dept Ecol Evolut & Organismal Biol, Ames, IA USA
Funding
National Science Foundation (USA)
Keywords
Statistical graphics; Lineup; Visualization; Projection pursuit; Data mining; Feature selection; Behavior; Classification
DOI
10.1007/s00180-014-0534-x
Chinese Library Classification (CLC) number
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Statistical graphics play an important role in exploratory data analysis, model checking and diagnosis. With high-dimensional data, this often means plotting low-dimensional projections; in classification tasks, for example, projection pursuit is used to find low-dimensional projections that reveal differences between labelled groups. In many contemporary data sets the number of observations is small relative to the number of variables, which is known as a high dimension, low sample size (HDLSS) problem. This paper explores the use of visual inference for understanding low-dimensional pictures of HDLSS data. Visual inference helps to quantify the significance of findings made from graphics. This approach may help to broaden the understanding of issues related to HDLSS data in the data analysis community. Methods are illustrated using data from a published paper, which erroneously found real separation in microarray data, and with a simulation study conducted using Amazon's Mechanical Turk.
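The core issue in the abstract, that HDLSS data can show class separation even when none exists, can be illustrated with a minimal sketch. It is not the paper's method: a minimum-norm least-squares direction stands in for projection pursuit, and the function names `random_separation_demo` and `lineup_p_value` are hypothetical. The p-value uses a simple binomial model that assumes K independent observers each viewing a lineup of m plots, which is one common way to quantify a lineup result.

```python
import math
import numpy as np


def random_separation_demo(n=20, p=200, seed=0):
    """Project pure-noise HDLSS data so that random labels look separated.

    Assumption: a least-squares direction is used as a stand-in for a
    projection pursuit index; the data carry no real class structure.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))            # n observations, p >> n noise variables
    y = np.where(np.arange(n) < n // 2, 1.0, -1.0)
    rng.shuffle(y)                         # labels are unrelated to X
    # With p > n the system X w = y has exact solutions; lstsq returns
    # the minimum-norm one, so the projection reproduces the labels.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    proj = X @ w                           # 1-D projection of each observation
    return X, y, proj


def lineup_p_value(x, K, m=20):
    """P(at least x of K observers pick the data plot by chance).

    Assumption: each observer independently picks one of m plots,
    so the number of correct picks under the null is Binomial(K, 1/m).
    """
    p0 = 1.0 / m
    return sum(
        math.comb(K, k) * p0**k * (1 - p0) ** (K - k)
        for k in range(x, K + 1)
    )


if __name__ == "__main__":
    _, y, proj = random_separation_demo()
    print("class +1 projections all positive:", bool((proj[y == 1] > 0).all()))
    print("class -1 projections all negative:", bool((proj[y == -1] < 0).all()))
    print("p-value, 3 of 10 observers, m = 20:", lineup_p_value(3, 10))
```

In this sketch the two random groups separate perfectly along the fitted direction, which is why a plot of the projection alone cannot be taken as evidence of real structure. Comparing it against plots of permuted-label data in a lineup gives a reference for what chance separation looks like.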
Pages: 293-316
Page count: 24
Related papers
50 in total
  • [1] Using visual statistical inference to better understand random class separations in high dimension, low sample size data
    Niladri Roy Chowdhury
    Dianne Cook
    Heike Hofmann
    Mahbubul Majumder
    Eun-Kyung Lee
    Amy L. Toth
    [J]. Computational Statistics, 2015, 30 : 293 - 316
  • [2] Statistical Significance of Clustering for High-Dimension, Low-Sample Size Data
    Liu, Yufeng
    Hayes, David Neil
    Nobel, Andrew
    Marron, J. S.
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2008, 103 (483) : 1281 - 1293
  • [3] On Perfect Clustering of High Dimension, Low Sample Size Data
    Sarkar, Soham
    Ghosh, Anil K.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (09) : 2257 - 2272
  • [4] Geometric representation of high dimension, low sample size data
    Hall, P
    Marron, JS
    Neeman, A
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2005, 67 : 427 - 444
  • [5] Clustering high dimension, low sample size data using the maximal data piling distance
    Ahn, Jeongyoun
    Lee, Myung Hee
    Yoon, Young Joo
    [J]. STATISTICA SINICA, 2012, 22 (02) : 443 - 464
  • [6] High-dimension, low-sample size perspectives in constrained statistical inference: The SARSCoV RNA genome in illustration
    Sen, Pranab K.
    Tsai, Ming-Tien
    Jou, Yuh-Shan
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2007, 102 (478) : 686 - 694
  • [7] Multiclass Classification on High Dimension and Low Sample Size Data Using Genetic Programming
    Wei, Tingyang
    Liu, Wei-Li
    Zhong, Jinghui
    Gong, Yue-Jiao
    [J]. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2022, 10 (02) : 704 - 718
  • [8] Random forest kernel for high-dimension low sample size classification
    Lucca Portes Cavalheiro
    Simon Bernard
    Jean Paul Barddal
    Laurent Heutte
    [J]. Statistics and Computing, 2024, 34
  • [9] Random forest kernel for high-dimension low sample size classification
    Cavalheiro, Lucca Portes
    Bernard, Simon
    Barddal, Jean Paul
    Heutte, Laurent
    [J]. STATISTICS AND COMPUTING, 2024, 34 (01)
  • [10] Classification for high-dimension low-sample size data
    Shen, Liran
    Er, Meng Joo
    Yin, Qingbo
    [J]. PATTERN RECOGNITION, 2022, 130