Gene expression data classification using topology and machine learning models

Cited by: 2
Authors
Dey, Tamal K. [1]
Mandal, Sayan [2]
Mukherjee, Soham [1]
Affiliations
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
Topological data analysis; Gene expression; Persistent cycles; Neural network
DOI
10.1186/s12859-022-04704-z
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Interpreting high-throughput gene expression data continues to require mathematical tools that recognize the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful at extracting robust features in several applications involving high-dimensional data. In this work, we use some recent developments in TDA to curate gene expression data. Our work differs from its predecessors in two respects. (1) Traditional TDA pipelines use topological signatures called barcodes to augment the feature vectors used for classification. In contrast, this work uses TDA to curate relevant features and obtain somewhat better representatives of the data. These representatives of the entire dataset facilitate a better understanding of the phenotype labels. (2) Most earlier works use barcodes obtained from topological summaries as fingerprints for the data. Although barcodes are stable signatures, there is no direct mapping between the data and the barcodes.

Results: The topology-relevant curated data that we obtain improves both shallow-learning and deep-learning-based supervised classification. We further show that the representative cycles we compute tend, without supervision, to align with the phenotype labels. This work thus shows that topological signatures can capture gene expression levels and classify cohorts accordingly.

Conclusions: In this work, we generate representative persistent cycles to interpret gene expression data. These cycles allow us to directly identify genes involved in similar processes.
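Point (2) can be made concrete with a small example. A barcode records only the scales at which topological features appear and disappear. It does not record which samples produce those features.

The sketch below is a minimal illustration under stated assumptions and is not the authors' pipeline. It computes the 0-dimensional barcode of a Vietoris-Rips filtration over sample expression profiles using union-find, which is equivalent to single-linkage clustering. It assumes Euclidean distance between samples. The function names `euclidean` and `h0_barcode` and the toy expression matrix are hypothetical. The authors' representative persistent cycles are 1-dimensional features tied back to specific samples, and this sketch does not compute them.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def h0_barcode(points):
    """0-dimensional barcode of the Vietoris-Rips filtration of `points`.

    Every sample is born at scale 0. When the shortest remaining edge joins two
    different connected components, one of them dies at that edge length.
    Since all births are 0, which component dies is arbitrary. The last
    surviving component never dies (death = infinity).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges in order of increasing length.
    edges = sorted(
        (euclidean(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # merge the two components
            bars.append((0.0, r))    # one component dies at scale r
    if n:
        bars.append((0.0, math.inf))  # the final component persists forever
    return bars


# Hypothetical toy data: 4 samples x 3 genes (not from the paper).
samples = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [10.0, 0.0, 0.0],
    [10.0, 2.0, 0.0],
]
print(h0_barcode(samples))
```

On this hypothetical matrix the output is `[(0.0, 1.0), (0.0, 2.0), (0.0, 9.0), (0.0, inf)]`. That is three finite bars and one infinite bar, and none of them says which samples merged. Storing the merging edge `(i, j)` alongside each bar would be a first step toward the data-to-signature mapping that barcodes alone lack.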
Pages: 21
Related papers (50 in total)
  • [1] Gene expression data classification using topology and machine learning models
    Dey, Tamal K.
    Mandal, Sayan
    Mukherjee, Soham
    [J]. BMC Bioinformatics, 22
  • [2] Cancer Classification of Gene Expression Data using Machine Learning Models
    De Guia, Joseph M.
    Devaraj, Madhavi
    Vea, Larry A.
    [J]. 2018 IEEE 10th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), 2018
  • [3] Tumor Classification Using Gene Expression and Machine Learning Models
    Tuncal, Kubra
    Ozkan, Cagri
    [J]. 10th International Conference on Theory and Application of Soft Computing, Computing with Words and Perceptions - ICSCCW-2019, 2020, 1095: 662-667
  • [4] Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review
    Alharbi, Fadi
    Vakanski, Aleksandar
    [J]. Bioengineering-Basel, 2023, 10 (02)
  • [5] Enhancing Gene Expression Classification Through Explainable Machine Learning Models
    Do, Thanh-Nghi
    [J]. SN Computer Science, 5 (5)
  • [6] Comparative Study of Disease Classification Using Multiple Machine Learning Models Based on Landmark and Non-Landmark Gene Expression Data
    Huang, Xiaoqin
    Sun, Jian
    Srinivasan, Satish Mahadevan
    Sangwan, Raghvinder S.
    [J]. Big Data, IoT, and AI for a Smarter Future, 2021, 185: 264-273
  • [7] Classification of Firewall Log Data Using Multiclass Machine Learning Models
    Aljabri, Malak
    Alahmadi, Amal A.
    Mohammad, Rami Mustafa A.
    Aboulnour, Menna
    Alomari, Dorieh M.
    Almotiri, Sultan H.
    [J]. Electronics, 2022, 11 (12)
  • [8] Statistical characterization and classification of colon microarray gene expression data using multiple machine learning paradigms
    Maniruzzaman, Md
    Rahman, Md Jahanur
    Ahammed, Benojir
    Abedin, Md Menhazul
    Suri, Harman S.
    Biswas, Mainak
    El-Baz, Ayman
    Bangeas, Petros
    Tsoulfas, Georgios
    Suri, Jasjit S.
    [J]. Computer Methods and Programs in Biomedicine, 2019, 176: 173-193
  • [9] Dissimilarity based ensemble of extreme learning machine for gene expression data classification
    Lu, Hui-juan
    An, Chun-lin
    Zheng, En-hui
    Lu, Yi
    [J]. Neurocomputing, 2014, 128: 22-30
  • [10] New ensemble machine learning method for classification and prediction on gene expression data
    Wang, Ching Wei
    [J]. 2006 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vols 1-15, 2006: 60-63