Ranked MSD: A New Feature Ranking and Feature Selection Approach for Biomarker Identification

Cited by: 1
Authors
Verma, Ghanshyam [1, 2]
Jha, Alokkumar [1, 2]
Rebholz-Schuhmann, Dietrich [3]
Madden, Michael G. [1, 2]
Affiliations
[1] Natl Univ Ireland Galway, Insight Ctr Data Analyt, Galway, Ireland
[2] Natl Univ Ireland Galway, Sch Comp Sci, Galway, Ireland
[3] Univ Cologne, ZB Med Informat Ctr Life Sci, Cologne, Germany
Funding
Science Foundation Ireland
Keywords
Machine learning; Respiratory viral infection; Feature ranking; Feature selection; Classification; Explainable AI; Support vector machines; Signature
DOI
10.1007/978-3-030-29726-8_10
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In the era of big data, when huge amounts of data are continuously being generated, it is common for the number of samples to be much smaller than the number of features (variables) per sample. This situation often arises in biomedical domains, where there may be relatively few patients compared to the amount of data per patient. For example, gene expression data typically has between 10,000 and 60,000 features per sample. A separate issue arises from the "right to explanation" found in the European General Data Protection Regulation (GDPR), which may prevent the use of black-box models in applications where explainability is required. In such situations, there is a need for robust algorithms that can identify the relevant features in experimental data by discarding irrelevant ones, yielding a simpler subset that facilitates explanation. To address these needs, we have developed a new algorithm for feature ranking and feature selection, named Ranked MSD. We have tested our proposed approach on two real-world gene expression data sets, both of which relate to respiratory viral infections. The Ranked MSD feature selection algorithm reduces the feature set from 12,023 genes (features) to 65 genes on the first data set and from 20,737 genes to 31 genes on the second, in both cases without any significant loss in disease prediction accuracy. In an alternative configuration, the algorithm identifies a small subset of features that gives better accuracy than the full feature set. The algorithm can also identify important biomarkers (genes), together with an importance score for each with respect to a particular disease; the identified top-ranked biomarkers can play a vital role in drug discovery and precision medicine.
Pages: 147-167
Page count: 21
Related papers
50 results in total
  • [1] A new approach to feature selection
    Scherf, M
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 1997, 1211 : 181 - 184
  • [2] Wrapper for ranking feature selection
    Ruiz, R
    Aguilar-Ruiz, JS
    Riquelme, JC
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2004, PROCEEDINGS, 2004, 3177 : 384 - 389
  • [3] ENSEMBLE FEATURE SELECTION APPROACH BASED ON FEATURE RANKING FOR RICE SEED IMAGES CLASSIFICATION
    Dzi Lam Tran Tuan
    Surinwarangkoon, Thongchai
    Meethongjan, Kittikhun
    Vinh Truong Hoang
    [J]. ADVANCES IN ELECTRICAL AND ELECTRONIC ENGINEERING, 2020, 18 (03) : 198 - 206
  • [4] A Stratified Feature Ranking Method for Supervised Feature Selection
    Chen, Renjie
    Chen, Xiaojun
    Yuan, Guowen
    Sun, Wenya
    Wu, Qingyao
    [J]. THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 8059 - 8060
  • [5] Variable Ranking Feature Selection for the Identification of Nucleosome Related Sequences
    Lo Bosco, Giosue
    Rizzo, Riccardo
    Fiannaca, Antonino
    La Rosa, Massimo
    Urso, Alfonso
    [J]. NEW TRENDS IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2018, 2018, 909 : 314 - 324
  • [6] A New Approach for Automated Feature Selection
    Gocht, Andreas
    Lehmann, Christoph
    Schoene, Robert
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 4915 - 4920
  • [7] A new graph feature selection approach
    Akhiat, Yassine
    Asnaoui, Youssef
    Chahhou, Mohamed
    Zinedine, Ahmed
    [J]. 2020 6TH IEEE CONGRESS ON INFORMATION SCIENCE AND TECHNOLOGY (IEEE CIST'20), 2020, : 156 - 161
  • [8] A new approach to feature subset selection
    Liu, DZ
    Feng, ZJ
    Wang, XZ
    [J]. PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 1822 - 1825
  • [9] An Adaptive Multiple Feature Subset Method for Feature Ranking and Selection
    Chang, Fu
    Chen, Jen-Cheng
    [J]. INTERNATIONAL CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI 2010), 2010, : 255 - 262
  • [10] Feature subset selection and feature ranking for multivariate time series
    Yoon, H
    Yang, KY
    Shahabi, C
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (09) : 1186 - 1198