MIDGET:Detecting differential gene expression on microarray data

Cited by: 2
Authors
Angelescu, Radu [1]
Dobrescu, Radu [1]
Affiliation
[1] Univ Politehn Bucuresti, Fac Automat Control & Comp Sci, Dept Automat Control & Ind Informat, Splaiul Independentei 313,Sect 6, Bucharest 060042, Romania
Keywords
Differentially expressed genes; Deep neural networks; Gradient boost; Metrics; Evaluation
DOI
10.1016/j.cmpb.2021.106418
CLC number (Chinese Library Classification)
TP39 [Computer applications]
Discipline code
081203; 0835
Abstract
Background and Objective: Detecting differentially expressed genes is an important step in genome-wide analysis and expression profiling. Current research relies on a wide array of algorithms based on statistical approaches. Although these algorithms work, they sometimes mispredict, and no framework is available for measuring their quality. Newer machine learning methods (such as gradient boosting and deep neural networks) had not been used to solve this problem. The open-source Gene-Bench Python package addresses these issues by providing an evaluation and data-handling system for algorithms that detect differentially expressed genes in microarray data. We also provide MIDGET, a new group of algorithms based on state-of-the-art machine learning approaches.
Methods: The Gene-Bench package provides data collected from real experiments, consisting of 73 transcription-factor perturbation experiments with validation data from ChIP-seq experiments and 129 drug perturbation experiments; synthetic data generated with our own method; and three evaluation metrics (Kolmogorov, F1 and AUC/ROC). Besides the data and metrics, Gene-Bench also contains well-known algorithms and a new method for identifying differentially expressed genes, called MIDGET (Machine learning Identification Differential Gene Expression Tool), which uses big-data and machine learning methods. The two new groups of machine learning algorithms provided in the package are based on extreme gradient boosting and deep neural networks.
Results: The Gene-Bench package is highly flexible, allows fast prototyping and evaluation of new and existing algorithms, and provides multiple new machine learning algorithms (MIDGET) that outperform all other tested alternatives on every evaluation metric. Everything provided in Gene-Bench is algorithm-independent, and although the package is written in Python, users can also run algorithms implemented in R.
Conclusions: The Gene-Bench package fills a gap in evaluating and benchmarking differential gene detection algorithms. It also provides machine learning methods that detect differentially expressed genes with higher accuracy on all tested metrics. It is available at https://github.com/raduangelescu/GeneBench/ and can be installed directly from the Python Package Index with pip install genebench.
(c) 2021 Elsevier B.V. All rights reserved.
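The three evaluation metrics named in the Methods paragraph (Kolmogorov, F1 and AUC/ROC) can be illustrated with a minimal Python sketch. This is not Gene-Bench's API. The function names (f1_at_k, roc_auc, ks_statistic), the top-k cutoff used for F1, and the reading of "Kolmogorov" as a two-sample Kolmogorov-Smirnov statistic between the scores of validated and non-validated genes are all assumptions made for illustration. Each detector is assumed to output one score per gene, with higher scores meaning more likely to be differentially expressed. The validated genes (for example, ChIP-seq targets) serve as ground truth.

```python
"""Illustrative evaluation metrics for a differential-expression detector.

Hypothetical helpers, NOT the Gene-Bench API. Each detector is assumed to
return one score per gene (higher = more likely differentially expressed);
`truth` is the set of validated genes (e.g. ChIP-seq targets).
"""
from typing import Dict, List, Set, Tuple


def _split(scores: Dict[str, float], truth: Set[str]) -> Tuple[List[float], List[float]]:
    """Separate the scores of validated (positive) and other (negative) genes."""
    pos = [s for gene, s in scores.items() if gene in truth]
    neg = [s for gene, s in scores.items() if gene not in truth]
    if not pos or not neg:
        raise ValueError("need at least one validated and one non-validated gene")
    return pos, neg


def f1_at_k(scores: Dict[str, float], truth: Set[str], k: int) -> float:
    """F1 when the k highest-scoring genes are called differentially expressed.

    The top-k cutoff is an assumption; tied scores keep dictionary order.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    predicted = set(ranked[:k])
    true_pos = len(predicted & truth)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(truth.intersection(scores))
    return 2 * precision * recall / (precision + recall)


def roc_auc(scores: Dict[str, float], truth: Set[str]) -> float:
    """Area under the ROC curve in its Mann-Whitney form.

    Probability that a random validated gene outscores a random other gene,
    with ties counted as one half.
    """
    pos, neg = _split(scores, truth)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ks_statistic(scores: Dict[str, float], truth: Set[str]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D (assumed meaning of "Kolmogorov").

    D is the largest gap between the fractions of positive and negative genes
    scoring at or above any observed score threshold.
    """
    pos, neg = _split(scores, truth)
    best = 0.0
    for t in sorted(set(pos) | set(neg), reverse=True):
        frac_pos = sum(s >= t for s in pos) / len(pos)
        frac_neg = sum(s >= t for s in neg) / len(neg)
        best = max(best, abs(frac_pos - frac_neg))
    return best


if __name__ == "__main__":
    # Toy data (hypothetical): five genes, two of them validated.
    scores = {"g1": 0.9, "g2": 0.8, "g3": 0.4, "g4": 0.3, "g5": 0.1}
    truth = {"g1", "g3"}
    print(round(f1_at_k(scores, truth, k=2), 4))  # 0.5
    print(round(roc_auc(scores, truth), 4))       # 0.8333
    print(round(ks_statistic(scores, truth), 4))  # 0.6667
```

The pairwise AUC loop and the per-threshold KS loop are quadratic in the number of genes, which keeps the sketch short. For genome-scale gene lists, a rank-based implementation such as scikit-learn's roc_auc_score or SciPy's ks_2samp would be the practical choice.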
Pages: 7