Subsumption reduces dataset dimensionality without decreasing performance of a machine learning classifier

Times cited: 2
Authors
Wunsch, Donald C., III [1]
Hier, Daniel B. [1]
Affiliation
[1] Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Rolla, MO 65401 USA
Keywords
HUMAN PHENOTYPE ONTOLOGY
DOI
10.1109/EMBC46164.2021.9629897
Chinese Library Classification
R318 [Biomedical engineering]
Subject classification code
0831
Abstract
When features in a high-dimensional dataset are organized hierarchically, there is an inherent opportunity to reduce dimensionality. Because more specific concepts are subsumed by more general concepts, subsumption can be applied successively to reduce dimensionality. We tested whether subsumption could reduce the dimensionality of a disease dataset without impairing classification accuracy. We started with a dataset of 168 neurological patients, 14 diagnoses, and 293 unique features. We applied subsumption repeatedly to create eight successively smaller datasets, ranging from 293 dimensions in the largest to 11 dimensions in the smallest. We tested an MLP classifier on all eight datasets. Precision, recall, accuracy, and validation declined only at the lowest dimensionality. Our preliminary results suggest that when features in a high-dimensional dataset are derived from a hierarchical ontology, subsumption is a viable strategy for reducing dimensionality.
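The idea of applying subsumption successively can be illustrated with a minimal Python sketch. It is not the authors' procedure: the toy ontology (`PARENT`), the patient feature sets (`PATIENTS`), and the rule "replace every term by its parent in one step" are all hypothetical assumptions chosen only to show how repeated roll-ups shrink the feature vocabulary. The keywords suggest the real features come from the Human Phenotype Ontology, whose hierarchy is far larger.

```python
from typing import Dict, List, Set

# HYPOTHETICAL toy ontology: child term -> parent term. Root terms have no entry.
PARENT: Dict[str, str] = {
    "ankle clonus": "hyperreflexia",
    "hyperreflexia": "abnormal reflex",
    "hyporeflexia": "abnormal reflex",
    "abnormal reflex": "abnormal nervous system physiology",
    "resting tremor": "tremor",
    "intention tremor": "tremor",
    "tremor": "involuntary movements",
    "involuntary movements": "abnormal nervous system physiology",
}

# HYPOTHETICAL patients, each described by a set of ontology terms.
PATIENTS: List[Set[str]] = [
    {"ankle clonus", "resting tremor"},
    {"hyporeflexia", "intention tremor"},
    {"hyperreflexia"},
]


def vocabulary(patients: List[Set[str]]) -> List[str]:
    """Sorted list of distinct terms used across all patients (the dimensions)."""
    return sorted(set().union(*patients))


def subsume_once(features: Set[str], parent: Dict[str, str]) -> Set[str]:
    """One subsumption step: each term is replaced by its parent; roots are kept."""
    return {parent.get(term, term) for term in features}


def successive_datasets(patients: List[Set[str]],
                        parent: Dict[str, str]) -> List[List[Set[str]]]:
    """Apply subsumption repeatedly until the vocabulary stops changing.

    Because the ontology is acyclic, this terminates once only root terms remain.
    """
    datasets = [patients]
    while True:
        current = datasets[-1]
        nxt = [subsume_once(p, parent) for p in current]
        if vocabulary(nxt) == vocabulary(current):
            break
        datasets.append(nxt)
    return datasets


def to_binary_matrix(patients: List[Set[str]]):
    """One-hot encode patients over the vocabulary (one column per term)."""
    vocab = vocabulary(patients)
    return vocab, [[1 if term in p else 0 for term in vocab] for p in patients]


if __name__ == "__main__":
    for level, data in enumerate(successive_datasets(PATIENTS, PARENT)):
        print(level, len(vocabulary(data)), vocabulary(data))
```

On this toy input the dimensionality goes 5, 3, 3, 1; it never increases because several children can collapse onto one parent. In a setup like the one described, each resulting binary matrix would then be passed to a classifier (the paper used an MLP), which this sketch does not include.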
Pages: 1618-1621
Number of pages: 4