Simplifying the Utilization of Machine Learning Techniques for Bioinformatics

Cited by: 9
Authors
Dittman, David J. [1]
Khoshgoftaar, Taghi M. [1]
Wald, Randall [1]
Napolitano, Amri [1]
Affiliations
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Keywords
Bioinformatics; Feature Selection; Classification; Gene Expression
DOI
10.1109/ICMLA.2013.155
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The domain of bioinformatics faces a number of challenges, such as handling datasets that exhibit extreme levels of high dimensionality (a large number of features per sample) and datasets that are particularly difficult to work with. These datasets contain many pieces of data (features) that are irrelevant or redundant to the problem being studied, which makes analysis quite difficult. However, techniques from the domains of machine learning and data mining are well suited to combating these difficulties. Techniques like feature selection (choosing an optimal subset of features for subsequent analysis by removing irrelevant or redundant features) and classifiers (used to build inductive models in order to classify unknown instances) can assist researchers in working with such difficult datasets. Unfortunately, many practitioners of bioinformatics do not have the machine learning knowledge to choose the correct techniques in order to achieve good classification results. If the choices could be simplified or predetermined, it would be easier to apply the techniques. This study is a comprehensive analysis of machine learning techniques on twenty-five bioinformatics datasets using six classifiers and twenty-four feature rankers. We analyzed the factors at each of four feature subset sizes, chosen for being large enough to be effective in creating inductive models but small enough to be of use for further research. Our results show that Random Forest with 100 trees is the top-performing classifier and that the choice of feature ranker is of little importance as long as feature selection occurs. Statistical analysis confirms our results. By choosing these parameters, machine learning techniques become more accessible to bioinformatics practitioners.
Pages: 396-403
Page count: 8
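The feature-ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the twenty-four rankers, so the signal-to-noise score used here is an assumption chosen only for illustration, as are the function names rank_features and select_top_k. The sketch assumes a binary problem with labels 0 and 1 and at least one sample in each class.

```python
from statistics import mean, pstdev


def rank_features(X, y):
    """Return feature indices ordered from highest to lowest score.

    Score (an illustrative choice, not taken from the paper):
    |mean_pos - mean_neg| / (std_pos + std_neg), computed per feature.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        # A small constant avoids division by zero for constant features.
        spread = pstdev(pos) + pstdev(neg) + 1e-12
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def select_top_k(X, ranking, k):
    """Keep only the k best-ranked feature columns, in ranking order."""
    keep = ranking[:k]
    return [[row[j] for j in keep] for row in X]
```

The reduced matrix returned by select_top_k would then be passed to a classifier; the abstract reports Random Forest with 100 trees as the top performer, which is not implemented in this sketch.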