A t-SNE Based Classification Approach to Compositional Microbiome Data

Cited by: 25
Authors
Xu, Xueli [1]
Xie, Zhongming [2]
Yang, Zhenyu [1]
Li, Dongfang [3]
Xu, Ximing [1, 4]
Affiliations
[1] Nankai Univ, Sch Stat & Data Sci, Tianjin, Peoples R China
[2] Nankai Univ, Sch Math Sci, Tianjin, Peoples R China
[3] Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan, Peoples R China
[4] Key Lab Med Data Anal & Stat Res Tianjin, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
microbiome data; dimension reduction; t-SNE; Aitchison distance; classification; gut microbiome
DOI
10.3389/fgene.2020.620143
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
As a data-driven dimensionality reduction and visualization tool, t-distributed stochastic neighbor embedding (t-SNE) has been applied successfully in a variety of fields, and in recent years it has also attracted increasing attention for classification and regression analysis. This study presents a t-SNE-based classification approach for compositional microbiome data, which makes it possible to build classifiers and classify new samples in the reduced-dimensional space produced by t-SNE. To account for the compositionality of microbiome data, the Aitchison distance was used to compute the conditional probabilities in t-SNE. To classify a new sample, we obtained its low-dimensional features as the weighted mean of the embedded vectors of its nearest neighbors in the training set. With these low-dimensional features as input, three commonly used machine learning algorithms were considered for the classification tasks: logistic regression (LR), support vector machine (SVM), and decision tree (DT). The proposed approach was applied to two disease-associated microbiome datasets and achieved better classification performance than classifiers built in the original high-dimensional space. The results also showed that using the Aitchison distance in t-SNE improved classification accuracy on both datasets. In conclusion, we have developed a t-SNE-based classification approach that is suitable for compositional microbiome data and may also serve as a baseline for more complex classification models.
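The abstract combines three computational pieces: the Aitchison distance, t-SNE conditional probabilities computed from that distance, and an out-of-sample rule that places a new sample at the weighted mean of its nearest training neighbors' embeddings. The plain-Python sketch below illustrates them under stated assumptions; it is not the authors' implementation. The pseudocount of 0.5 used to handle zero counts, the inverse-distance neighbor weights, the perplexity values, and the function names (`clr`, `aitchison`, `conditional_probs`, `affinity_matrix`, `embed_new`) are all hypothetical choices. The gradient-descent step that produces the training embedding is omitted, so `Y_train` here is a set of made-up coordinates standing in for the output of a separate t-SNE optimization.

```python
import math


def clr(x, pseudocount=0.5):
    """Centered log-ratio transform of one composition.
    The pseudocount for zero counts is an assumption, not taken from the paper."""
    logs = [math.log(v + pseudocount) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison(x, y, pseudocount=0.5):
    """Aitchison distance: Euclidean distance between CLR-transformed compositions."""
    cx, cy = clr(x, pseudocount), clr(y, pseudocount)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))


def conditional_probs(dists_i, perplexity=5.0, tol=1e-5, max_iter=100):
    """p_{j|i} for one point i, given its distances to every other point.
    beta = 1 / (2 * sigma_i**2) is found by bisection so that the entropy of
    p_{.|i} matches log(perplexity), as in standard t-SNE."""
    sq = [d * d for d in dists_i]
    shift = min(sq)  # subtracting the smallest value avoids underflow; p is unchanged
    target = math.log(perplexity)
    lo, hi, beta = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        w = [math.exp(-beta * (s - shift)) for s in sq]
        total = sum(w)
        p = [v / total for v in w]
        entropy = -sum(q * math.log(q) for q in p if q > 0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:  # too flat: increase beta (narrower Gaussian)
            lo = beta
            beta = beta * 2 if hi == math.inf else (beta + hi) / 2
        else:  # too peaked: decrease beta (wider Gaussian)
            hi = beta
            beta = (beta + lo) / 2
    return p


def affinity_matrix(X, perplexity=5.0):
    """Symmetric t-SNE affinities P_ij = (p_{j|i} + p_{i|j}) / (2n),
    computed with the Aitchison distance instead of the Euclidean distance."""
    n = len(X)
    D = [[aitchison(X[i], X[j]) for j in range(n)] for i in range(n)]
    cond = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        p = conditional_probs([D[i][j] for j in others], perplexity)
        for j, pj in zip(others, p):
            cond[i][j] = pj
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def embed_new(x_new, X_train, Y_train, k=3, eps=1e-12):
    """Low-dimensional features of a new sample: weighted mean of the embeddings
    of its k nearest training samples under the Aitchison distance.
    Inverse-distance weights are a hypothetical choice."""
    nearest = sorted((aitchison(x_new, x), idx) for idx, x in enumerate(X_train))[:k]
    weights = [1.0 / (dist + eps) for dist, _ in nearest]
    total = sum(weights)
    dim = len(Y_train[0])
    return [
        sum(w * Y_train[idx][c] for w, (_, idx) in zip(weights, nearest)) / total
        for c in range(dim)
    ]


if __name__ == "__main__":
    # Hypothetical toy data: 6 samples x 4 taxa (counts) and made-up 2-D coordinates
    # standing in for a trained t-SNE embedding, which is not computed here.
    X_train = [[10, 20, 30, 40], [12, 18, 33, 37], [40, 30, 20, 10],
               [38, 33, 19, 10], [25, 25, 25, 25], [5, 50, 5, 40]]
    Y_train = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [2.5, 2.5], [-3.0, 4.0]]
    P = affinity_matrix(X_train, perplexity=2.0)
    print(round(sum(map(sum, P)), 6))
    print(embed_new([11, 19, 31, 39], X_train, Y_train, k=3))
```

In this sketch only the distance inside the Gaussian kernel differs from standard t-SNE; the per-point bandwidth is still calibrated to a user-chosen perplexity. The features returned for training and new samples would then be passed to a classifier such as LR, SVM, or DT (for example from scikit-learn), a step not shown here.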
Pages: 10