A topological data analysis based classifier

Cited by: 1
Authors
Kindelan, Rolando [1, 3]
Frias, Jose [4]
Cerda, Mauricio [2]
Hitschfeld, Nancy [1]
Affiliations
[1] Univ Chile, Fac Math & Phys Sci, Comp Sci Dept, 851 Beauchef Ave, Santiago 8370456, Metropolitan Re, Chile
[2] Univ Chile, Inst Biomed Sci, Fac Med, Ctr Med Informat & Telemed, Integrat Biol Program, 1027 Independencia Ave, Santiago, Metropolitan Re, Chile
[3] Univ Oriente, Med & Biophys Ctr, Patricio Lumumba S-N, Santiago De Cuba, Cuba
[4] Ctr Res Math, Jalisco S-N, Guanajuato 63023, Guanajuato, Mexico
Keywords
Topological data analysis; Persistent homology; Simplicial complex; Supervised learning; Classification; Machine learning; PATTERN-RECOGNITION; PERSISTENCE; EFFICIENT; BEHAVIOR; LAYER;
DOI
10.1007/s11634-023-00548-4
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
Topological Data Analysis (TDA) is an emerging field that aims to discover a dataset's underlying topological information. TDA tools have commonly been used to create filters and topological descriptors that improve Machine Learning (ML) methods. This paper proposes a different TDA pipeline that classifies balanced and imbalanced multi-class datasets without additional ML methods and without a data-resampling preprocessing stage. The proposed TDA-based classifier (TDABC) builds a filtered simplicial complex on the dataset to represent high-order data relationships. Assuming that the filtration contains a meaningful sub-complex that approximates the data topology, we apply Persistent Homology (PH) to guide the selection of that sub-complex based on the detected topological features. We use the link and star operators of each unlabeled point to obtain neighborhoods of different sizes and dimensions, through which labels propagate from labeled to unlabeled points. The labeling function depends on the entire history of the filtered simplicial complex, which is encoded in the persistence diagrams at various dimensions. To validate the method, we select eight datasets with different dimensions, degrees of class overlap, and imbalanced samples per class. TDABC outperforms all baseline methods when classifying multi-class imbalanced data with high imbalance ratios and data with overlapping classes. Also, on average, the proposed method was better than K Nearest Neighbors (KNN) and weighted KNN and behaved competitively with the Support Vector Machine and Random Forest baseline classifiers on balanced datasets.
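The label-propagation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' TDABC implementation: it assumes a Vietoris-Rips complex at a hand-picked scale `eps` in place of the sub-complex that the paper selects with persistent homology, and it replaces the paper's labeling function with a simple vote over the link of the unlabeled point. All function names (`rips_complex`, `star`, `link`, `classify`) and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import dist


def rips_complex(points, eps, max_dim=2):
    """Simplices (frozensets of vertex indices) of the Vietoris-Rips
    complex at scale eps, up to dimension max_dim. Brute force, so only
    suitable for toy inputs."""
    n = len(points)
    simplices = [frozenset([i]) for i in range(n)]
    for k in range(2, max_dim + 2):  # k vertices span a (k-1)-simplex
        for combo in combinations(range(n), k):
            # A simplex is included when all pairwise distances are <= eps.
            if all(dist(points[i], points[j]) <= eps
                   for i, j in combinations(combo, 2)):
                simplices.append(frozenset(combo))
    return simplices


def star(simplices, v):
    """All simplices that contain vertex v."""
    return [s for s in simplices if v in s]


def link(simplices, v):
    """Faces of the star of v that do not contain v themselves."""
    return {s - {v} for s in star(simplices, v) if len(s) > 1}


def classify(points, labels, query, eps, max_dim=2):
    """Assumed voting rule: every labeled vertex casts one vote per link
    simplex it belongs to, so vertices that share higher-dimensional
    simplices with the query point weigh more."""
    votes = Counter()
    for tau in link(rips_complex(points, eps, max_dim), query):
        for u in tau:
            if labels[u] is not None:
                votes[labels[u]] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy data: three points of class "a", two of class "b", one unlabeled.
points = [(0, 0), (1, 0), (0, 1), (3, 0), (4, 0), (0.5, 0.5)]
labels = ["a", "a", "a", "b", "b", None]
predicted = classify(points, labels, query=5, eps=1.5)
```

In this sketch the neighborhood of a point grows with the dimension of the simplices it belongs to, which is what distinguishes the approach from a plain distance-based vote such as KNN. When the scale is too small for the query point to share any simplex with a labeled point, `classify` returns `None` rather than guessing.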
Pages: 493-538
Page count: 46