Simplifying the Utilization of Machine Learning Techniques for Bioinformatics

Cited by: 9
Authors
Dittman, David J. [1]
Khoshgoftaar, Taghi M. [1]
Wald, Randall [1]
Napolitano, Amri [1]
Affiliations
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Keywords
Bioinformatics; Feature Selection; Classification; Gene Expression
DOI
10.1109/ICMLA.2013.155
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The domain of bioinformatics faces a number of challenges, such as handling datasets that exhibit extreme levels of high dimensionality (a large number of features per sample) and datasets that are particularly difficult to work with. These datasets contain many pieces of data (features) that are irrelevant or redundant to the problem being studied, which makes analysis quite difficult. However, techniques from the domains of machine learning and data mining are well suited to combating these difficulties. Techniques like feature selection (choosing an optimal subset of features for subsequent analysis by removing irrelevant or redundant features) and classifiers (used to build inductive models in order to classify unknown instances) can assist researchers in working with such difficult datasets. Unfortunately, many practitioners of bioinformatics do not have the machine learning knowledge to choose the correct techniques in order to achieve good classification results. If the choices could be simplified or predetermined, then it would be easier to apply the techniques. This study is a comprehensive analysis of machine learning techniques on twenty-five bioinformatics datasets using six classifiers and twenty-four feature rankers. We analyzed the factors at each of four feature subset sizes, chosen for being large enough to be effective in creating inductive models but small enough to be of use for further research. Our results show that Random Forest with 100 trees is the top-performing classifier and that the choice of feature ranker is of little importance as long as feature selection occurs. Statistical analysis confirms our results. With these parameters chosen in advance, machine learning techniques become more accessible to bioinformatics practitioners.
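The feature-ranking step described in the abstract can be illustrated with a minimal sketch. It assumes a signal-to-noise ranker and a tiny made-up dataset; the record does not name the twenty-four rankers or the subset sizes, so the function names `signal_to_noise` and `top_k_features`, the scoring formula, the data, and `k=1` are all illustrative assumptions rather than the authors' method.

```python
from statistics import mean, pstdev


def signal_to_noise(values, labels):
    """Score one feature as |mean_pos - mean_neg| / (std_pos + std_neg).

    Assumed ranker for illustration only; the study's rankers are not
    listed in this record.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    gap = abs(mean(pos) - mean(neg))
    spread = pstdev(pos) + pstdev(neg)
    if spread == 0:
        # No within-class variation: perfectly separating or useless.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def top_k_features(rows, labels, k):
    """Rank every feature column and return the indices of the k best."""
    n_features = len(rows[0])
    scores = [
        signal_to_noise([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 6 samples, 3 features, binary class labels.
# Only feature 0 differs clearly between the two classes.
rows = [
    [5.1, 0.2, 9.0],
    [4.9, 0.9, 9.2],
    [5.0, 0.4, 8.8],
    [1.0, 0.3, 9.1],
    [1.2, 0.8, 8.9],
    [0.9, 0.5, 9.0],
]
labels = [1, 1, 1, 0, 0, 0]

selected = top_k_features(rows, labels, k=1)
# Keep only the selected columns; a classifier would be trained on this.
reduced = [[row[j] for j in selected] for row in rows]
```

The reduced matrix would then be passed to a classifier; the abstract reports Random Forest with 100 trees as the strongest of the six classifiers tested, and that step is left out here to keep the sketch dependency-free.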
Pages: 396-403
Page count: 8