MIDGET: Detecting differential gene expression on microarray data

Cited by: 2
Authors
Angelescu, Radu [1]
Dobrescu, Radu [1]
Affiliations
[1] Univ Politehn Bucuresti, Fac Automat Control & Comp Sci, Dept Automat Control & Ind Informat, Splaiul Independentei 313, Sect 6, Bucharest 060042, Romania
Keywords
Differentially expressed genes; Deep neural networks; Gradient boost; Metrics; evaluation;
DOI
10.1016/j.cmpb.2021.106418
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Background and Objective: Detecting differentially expressed genes is an important step in genome-wide analysis and expression profiling. A wide array of algorithms based on statistical approaches is used in today's research. Although the current algorithms work, they sometimes mispredict, and no framework is available for measuring their quality. Newer machine learning methods (such as gradient boosting and deep neural networks) have not previously been applied to this problem. The Gene-Bench open-source Python package addresses these issues by providing an evaluation and data-handling system for algorithms that detect differentially expressed genes on microarray data. We also provide MIDGET, a new group of algorithms based on state-of-the-art machine learning approaches.
Methods: The Gene-Bench package provides three things: data collected from real experiments, consisting of 73 transcription-factor perturbation experiments with validation data from ChIP-seq experiments and 129 drug perturbation experiments; synthetic data generated with our own method; and three evaluation metrics (Kolmogorov, F1 and AUC/ROC). Besides the data and metrics, Gene-Bench also contains well-known algorithms and a new method for identifying differentially expressed genes, called MIDGET (Machine learning Identification Differential Gene Expression Tool), which uses big data and machine learning methods. The two new groups of machine learning algorithms provided in our package use extreme gradient boosting and deep neural networks, respectively.
Results: The Gene-Bench package is highly flexible, allows fast prototyping and evaluation of new and existing algorithms, and provides multiple new machine learning algorithms (called MIDGET) that perform better on all evaluation metrics than all the other tested alternatives. Everything provided in Gene-Bench is algorithm independent, and although the package is written in Python, the user can also use algorithms implemented in the R language.
Conclusions: The Gene-Bench package fills a gap in evaluating and benchmarking algorithms for detecting differential gene expression. It also provides machine learning methods that perform detection with higher accuracy on all tested metrics. It is available at https://github.com/raduangelescu/GeneBench/ and can be installed directly from the Python Package Index using pip install genebench. © 2021 Elsevier B.V. All rights reserved.
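The abstract names three evaluation metrics (Kolmogorov, F1 and AUC/ROC) without defining them. The sketch below illustrates one plausible reading of each for a method that assigns every gene a score, given a validated set of truly differentially expressed genes. It does not use or reproduce the Gene-Bench API: the function names `f1_at_k`, `roc_auc` and `ks_statistic`, the top-k cut-off, and the choice of a one-sample Kolmogorov-Smirnov statistic against a uniform rank distribution are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def _ranked_genes(scores: Dict[str, float]) -> List[str]:
    """Genes sorted from highest to lowest score (hypothetical helper)."""
    return sorted(scores, key=lambda gene: scores[gene], reverse=True)


def f1_at_k(scores: Dict[str, float], truth: Set[str], k: int) -> float:
    """F1 when the k highest-scoring genes are called differentially expressed."""
    predicted = set(_ranked_genes(scores)[:k])
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


def roc_auc(scores: Dict[str, float], truth: Set[str]) -> float:
    """Area under the ROC curve, computed as the probability that a true gene
    outscores a non-true gene (ties count as one half)."""
    positives = [s for gene, s in scores.items() if gene in truth]
    negatives = [s for gene, s in scores.items() if gene not in truth]
    if not positives or not negatives:
        raise ValueError("need at least one true and one non-true gene")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def ks_statistic(scores: Dict[str, float], truth: Set[str]) -> float:
    """One-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDF of the true genes' normalised ranks and the uniform CDF.
    Assumed interpretation; a larger value means the true genes are further
    from being spread uniformly through the ranking."""
    ranked = _ranked_genes(scores)
    total = len(ranked)
    positions = sorted(
        (index + 1) / total for index, gene in enumerate(ranked) if gene in truth
    )
    if not positions:
        raise ValueError("no true genes found among the scored genes")
    count = len(positions)
    gap = 0.0
    for j, x in enumerate(positions):
        # Compare the step function just after and just before each jump.
        gap = max(gap, (j + 1) / count - x, x - j / count)
    return gap


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_scores = {"g1": 0.9, "g2": 0.8, "g3": 0.3, "g4": 0.1}
    toy_truth = {"g1", "g2"}
    print(f1_at_k(toy_scores, toy_truth, k=2))
    print(roc_auc(toy_scores, toy_truth))
    print(ks_statistic(toy_scores, toy_truth))
```

The pairwise AUC computation is quadratic in the number of genes, which is fine for a toy example; for genome-scale rankings a rank-based formula or an established library routine would be the more practical choice.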
Pages: 7