MIDGET: Detecting differential gene expression on microarray data

Cited by: 2
Authors
Angelescu, Radu [1]
Dobrescu, Radu [1]
Affiliations
[1] Univ Politehn Bucuresti, Fac Automat Control & Comp Sci, Dept Automat Control & Ind Informat, Splaiul Independentei 313,Sect 6, Bucharest 060042, Romania
Keywords
Differentially expressed genes; Deep neural networks; Gradient boost; Metrics; evaluation;
DOI
10.1016/j.cmpb.2021.106418
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Background and Objective: Detecting differentially expressed genes is an important step in genome-wide analysis and expression profiling. A wide array of algorithms based on statistical approaches is used in today's research. Even though the current algorithms work, they sometimes mispredict. There is no framework available for measuring the quality of current algorithms. Newer machine learning methods (such as gradient boosting and deep neural networks) have not been used to solve this problem. The Gene-Bench open-source Python package addresses these issues by providing an evaluation and data handling system for algorithms that detect differentially expressed genes on microarray data. We also provide MIDGET, a new group of algorithms based on state-of-the-art machine learning approaches.
Methods: The Gene-Bench package provides data collected from real experiments, consisting of 73 transcription-factor perturbation experiments with validation data from ChIP-seq experiments and 129 drug perturbation experiments; synthetic data generated with our own method; and three evaluation metrics (Kolmogorov, F1 and AUC/ROC). Besides the data and metrics, Gene-Bench also contains well-known algorithms and a new method to identify differentially expressed genes, called MIDGET (Machine learning Identification Differential Gene Expression Tool), which uses big-data and machine learning methods to identify differentially expressed genes. The two new groups of machine learning algorithms provided in our package use extreme gradient boosting and deep neural networks to achieve their results.
Results: The Gene-Bench package is highly flexible, allows fast prototyping and evaluation of new and old algorithms, and provides multiple new machine learning algorithms (called MIDGET) that perform better on all evaluation metrics than all the other tested alternatives. Everything provided in Gene-Bench is algorithm independent, and the user can also use algorithms implemented in the R language even though the package is written in Python.
Conclusions: The Gene-Bench package fills a gap in evaluating and benchmarking differential gene detection algorithms. It also provides machine learning methods that perform detection with higher accuracy in all tested metrics. It is available at https://github.com/raduangelescu/GeneBench/ and can be directly installed from the Python Package Index using pip install genebench. (c) 2021 Elsevier B.V. All rights reserved.
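To make two of the metrics named in the abstract concrete, the following is a minimal sketch of how F1 and ROC AUC can be computed for a differential-expression detector whose output is a per-gene score checked against a validated gene set. It does not use or reproduce the Gene-Bench API: the function names `f1_score` and `roc_auc`, the gene labels, and the scores are all hypothetical and chosen only for illustration.

```python
# Illustrative only: these helpers are NOT part of Gene-Bench; names and data are hypothetical.

def f1_score(predicted, validated):
    """F1 of a predicted gene set against a validated (ground-truth) gene set."""
    predicted, validated = set(predicted), set(validated)
    true_positives = len(predicted & validated)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(validated)
    return 2 * precision * recall / (precision + recall)


def roc_auc(scores, validated):
    """ROC AUC via pairwise comparison: the probability that a validated gene
    receives a higher score than a non-validated one (ties count as 0.5)."""
    positives = [s for gene, s in scores.items() if gene in validated]
    negatives = [s for gene, s in scores.items() if gene not in validated]
    if not positives or not negatives:
        raise ValueError("need at least one validated and one non-validated gene")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical detector output: higher score = more likely differentially expressed.
scores = {"G1": 0.9, "G2": 0.8, "G3": 0.4, "G4": 0.3, "G5": 0.1}
validated = {"G1", "G3"}  # hypothetical validation set

# Call the two top-scoring genes "differentially expressed" for the F1 computation.
top_genes = sorted(scores, key=scores.get, reverse=True)[:2]
f1 = f1_score(top_genes, validated)
auc = roc_auc(scores, validated)
print(f"F1 = {f1:.3f}, AUC = {auc:.3f}")
```

F1 depends on where the score cutoff is placed (here, the top two genes), while AUC evaluates the whole ranking, which is one reason a benchmark may report both.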
Pages: 7