Simplifying the Utilization of Machine Learning Techniques for Bioinformatics

Cited by: 9
Authors
Dittman, David J. [1]
Khoshgoftaar, Taghi M. [1]
Wald, Randall [1]
Napolitano, Amri [1]
Affiliation
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Keywords
Bioinformatics; Feature Selection; Classification; Gene Expression
DOI
10.1109/ICMLA.2013.155
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The domain of bioinformatics faces a number of challenges, such as handling datasets which exhibit extreme levels of high dimensionality (a large number of features per sample) and datasets which are particularly difficult to work with. These datasets contain many pieces of data (features) which are irrelevant or redundant to the problem being studied, which makes analysis quite difficult. However, techniques from the domains of machine learning and data mining are well suited to combating these difficulties. Techniques like feature selection (choosing an optimal subset of features for subsequent analysis by removing irrelevant or redundant features) and classifiers (used to build inductive models in order to classify unknown instances) can assist researchers in working with such difficult datasets. Unfortunately, many practitioners of bioinformatics do not have the machine learning knowledge to choose the correct techniques in order to achieve good classification results. If the choices could be simplified or predetermined, then it would be easier to apply the techniques. This study is a comprehensive analysis of machine learning techniques on twenty-five bioinformatics datasets using six classifiers and twenty-four feature rankers. We analyzed the factors at each of four feature subset sizes, chosen for being large enough to be effective in creating inductive models but small enough to be of use for further research. Our results show that Random Forest with 100 trees is the top performing classifier and that the choice of feature ranker is of little importance as long as feature selection occurs. Statistical analysis confirms our results. Fixing these choices in advance makes machine learning techniques more accessible to bioinformatics practitioners.
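The feature-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the signal-to-noise ranker, the toy data generator, the subset size of 10, and all function names below are assumptions chosen for illustration, and the abstract does not say which twenty-four rankers or which subset sizes the study used.

```python
import random
import statistics


def signal_to_noise(values, labels):
    """Score one feature as |mean_pos - mean_neg| / (sd_pos + sd_neg)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = statistics.stdev(pos) + statistics.stdev(neg)
    if spread == 0:
        return 0.0
    return abs(statistics.mean(pos) - statistics.mean(neg)) / spread


def rank_features(samples, labels):
    """Return feature indices ordered from highest to lowest score."""
    n_features = len(samples[0])
    scores = [
        signal_to_noise([row[j] for row in samples], labels)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def select_top_k(samples, ranking, k):
    """Keep only the k best-ranked features of every sample."""
    keep = ranking[:k]
    return [[row[j] for j in keep] for row in samples]


def make_toy_data(n_samples=40, n_features=200, informative=(3, 17), seed=0):
    """Hypothetical data: few samples, many features, two of them informative."""
    rng = random.Random(seed)
    samples, labels = [], []
    for i in range(n_samples):
        y = i % 2  # alternate the two classes
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 5.0 * y  # shift the informative features for class 1
        samples.append(row)
        labels.append(y)
    return samples, labels


samples, labels = make_toy_data()
ranking = rank_features(samples, labels)
reduced = select_top_k(samples, ranking, k=10)
print("top-ranked feature indices:", ranking[:10])
```

The reduced matrix could then be passed to a classifier. One way to match the configuration the abstract reports as best would be scikit-learn's `RandomForestClassifier(n_estimators=100)`, although the abstract does not state which software the study itself used.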
Pages: 396-403
Page count: 8