Simplifying the Utilization of Machine Learning Techniques for Bioinformatics

Cited by: 9
Authors
Dittman, David J. [1]
Khoshgoftaar, Taghi M. [1]
Wald, Randall [1]
Napolitano, Amri [1]
Affiliations
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Keywords
Bioinformatics; Feature Selection; Classification; Gene Expression
DOI
10.1109/ICMLA.2013.155
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The domain of bioinformatics has a number of challenges, such as handling datasets which exhibit extreme levels of high dimensionality (a large number of features per sample) and datasets which are particularly difficult to work with. These datasets contain many features which are irrelevant or redundant to the problem being studied, which makes analysis quite difficult. However, techniques from the domains of machine learning and data mining are well suited to combating these difficulties. Techniques like feature selection (choosing an optimal subset of features for subsequent analysis by removing irrelevant or redundant features) and classifiers (used to build inductive models in order to classify unknown instances) can assist researchers in working with such difficult datasets. Unfortunately, many practitioners of bioinformatics do not have the machine learning knowledge to choose the correct techniques in order to achieve good classification results. If the choices could be simplified or predetermined, then it would be easier to apply the techniques. This study is a comprehensive analysis of machine learning techniques on twenty-five bioinformatics datasets using six classifiers and twenty-four feature rankers. We analyzed these factors at each of four feature subset sizes, chosen for being large enough to be effective in creating inductive models but small enough to be of use for further research. Our results show that Random Forest with 100 trees is the top-performing classifier and that the choice of feature ranker is of little importance as long as feature selection occurs. Statistical analysis confirms our results. Choosing these parameters makes machine learning techniques more accessible to bioinformatics practitioners.
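As a rough illustration of the feature-ranking step the abstract describes, the sketch below scores each feature, sorts the scores, and keeps the top k features. It is not one of the twenty-four rankers used in the study. The signal-to-noise score, the function names `rank_features` and `select_top_k`, and the toy data are all assumptions made for this example. The classifier stage (for instance, a Random Forest with 100 trees) is left out.

```python
from statistics import mean, pstdev

def rank_features(samples, labels):
    """Return feature indices sorted best-first by a two-class signal-to-noise score.

    The score |mean_pos - mean_neg| / (std_pos + std_neg) is an illustrative
    choice (an assumption), not a ranker taken from the study.
    """
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, y in zip(samples, labels) if y == 1]
        neg = [row[j] for row, y in zip(samples, labels) if y == 0]
        spread = pstdev(pos) + pstdev(neg)
        # The small constant avoids division by zero for constant features.
        scores.append(abs(mean(pos) - mean(neg)) / (spread + 1e-12))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

def select_top_k(samples, ranking, k):
    """Keep only the k highest-ranked features of every sample."""
    keep = ranking[:k]
    return [[row[j] for j in keep] for row in samples]

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise,
# feature 2 is only weakly informative.
samples = [
    [5.1, 0.2, 3.3],
    [4.9, 0.9, 3.1],
    [1.0, 0.3, 3.2],
    [1.2, 0.8, 3.4],
]
labels = [1, 1, 0, 0]

ranking = rank_features(samples, labels)
reduced = select_top_k(samples, ranking, k=1)
```

In a full pipeline, the reduced feature matrix would then be passed to a classifier. The subset size `k` corresponds to the feature subset sizes the abstract mentions, although the specific values used in the study are not given here.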
Pages: 396-403
Page count: 8