Data Segmentation via t-SNE, DBSCAN, and Random Forest

Cited by: 2
Authors
DeLise, Timothy [1]
Affiliations
[1] Univ Montreal, Montreal, PQ, Canada
Source
Keywords
Clustering; t-SNE; Random forest; DBSCAN; Segmentation; Data visualization; Unsupervised learning;
DOI
10.1007/978-3-030-80126-7_11
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This research proposes a data segmentation algorithm that combines t-SNE, DBSCAN, and a Random Forest classifier to form an end-to-end pipeline that separates data into natural clusters and produces a characteristic profile of each cluster based on the most important features. Out-of-sample cluster labels can be inferred, and the technique generalizes well on real data sets. We describe the algorithm and provide case studies using the Iris and MNIST data sets, as well as real social media site data from Instagram. This is a proof of concept and sets the stage for further in-depth theoretical analysis.
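The pipeline named in the abstract can be sketched roughly as follows, using scikit-learn and the Iris data set. This is an illustration under stated assumptions, not the author's implementation: the record does not give the paper's parameters or preprocessing, so the function name `segment`, the standardization step, and the values of `eps`, `min_samples`, `perplexity`, and `n_estimators` are all hypothetical choices made here.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def segment(X, feature_names, eps=3.0, min_samples=5, seed=0):
    """Hypothetical sketch: t-SNE embedding -> DBSCAN clusters -> Random Forest.

    All default parameter values are assumptions, not taken from the paper.
    """
    # Assumption: standardize features before embedding.
    X_scaled = StandardScaler().fit_transform(X)

    # Step 1: embed the data into two dimensions with t-SNE.
    embedding = TSNE(
        n_components=2, perplexity=30, random_state=seed
    ).fit_transform(X_scaled)

    # Step 2: cluster the embedding with DBSCAN; label -1 marks noise points.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embedding)

    # Step 3: train a Random Forest on the ORIGINAL features, using the
    # DBSCAN labels as targets and leaving noise points out of training.
    mask = labels != -1
    if not mask.any():
        raise ValueError("DBSCAN marked every point as noise; adjust eps.")
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X[mask], labels[mask])

    # Rank features by importance as a starting point for cluster profiles.
    ranking = sorted(
        zip(feature_names, forest.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return labels, forest, ranking


iris = load_iris()
labels, forest, ranking = segment(iris.data, iris.feature_names)

print("cluster labels found:", sorted(set(labels.tolist())))
print("features by importance:", ranking)

# Out-of-sample inference: the forest assigns a cluster label to any point
# given in the original feature space, without re-running t-SNE.
new_point = np.array([[5.0, 3.4, 1.5, 0.2]])
print("predicted cluster:", forest.predict(new_point))
```

Training the classifier on the original features rather than on the embedding is what makes out-of-sample labeling possible in this sketch, since t-SNE itself provides no transform for new points.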
Pages: 139-151
Page count: 13