Methodology to identify a gene expression signature by merging microarray datasets

Cited by: 1
Authors
Fajarda, Olga [1]
Almeida, João Rafael [2]
Duarte-Pereira, Sara [3, 4]
Silva, Raquel M. [5]
Oliveira, José Luís [1]
Affiliations
[1] Univ Aveiro, DETI, IEETA, LASI, Aveiro, Portugal
[2] Univ A Coruna, Dept Computat, La Coruna, Spain
[3] Univ Aveiro, Dept Med Sci, Aveiro, Portugal
[4] Univ Aveiro, iBiMED Inst Biomed, Aveiro, Portugal
[5] Univ Catolica Portuguesa, Fac Dent Med FMD, Ctr Interdisciplinary Res Hlth CIIS, Viseu, Portugal
Keywords
Microarray data; Gene expression signature; Random forest; LSVM; Neural network; Heart failure; Autism spectrum disorder; Powerful approach; Normalization; Prediction; Discovery
DOI
10.1016/j.compbiomed.2023.106867
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
A vast number of microarray datasets have been produced to identify differentially expressed genes and gene expression signatures. A better understanding of the underlying biological processes can aid the diagnosis and prognosis of diseases, as well as the assessment of therapeutic response to drugs. However, most available datasets contain a small number of samples, which limits their statistical, predictive and generalization power. One way to overcome this problem is to merge several microarray datasets into a single dataset, which is typically a challenging task. Gene expression signatures are usually determined with statistical methods or supervised machine learning algorithms. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets such as microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and supervised machine learning algorithms to select the gene expression signature. The methodology was validated in two distinct research applications: one using heart failure microarray datasets and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature of 117 genes with a classification accuracy of approximately 98%. For the second, we obtained a gene expression signature of 79 genes with a classification accuracy of approximately 82%. The methodology was implemented in R and is available under the MIT licence at https://github.com/bioinformatics-ua/MicroGES.
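The abstract describes a two-stage procedure: statistical tests yield several candidate sets of differentially expressed genes, and supervised classifiers then select the signature. The Python sketch below illustrates that idea on synthetic data only. It is not the MicroGES code (which is written in R), and every name in it (`merge_datasets`, `welch_t`, `loo_accuracy`, `select_signature`, `make_toy_dataset`) is hypothetical. Assumptions: datasets are merged on their shared genes after per-dataset z-scoring, candidate sets come from Welch t-statistic cutoffs rather than p-value thresholds, and a leave-one-out nearest-centroid classifier stands in for the random forest, LSVM and neural network named in the keywords.

```python
# Hypothetical sketch, NOT the MicroGES implementation.
# Assumptions: per-dataset z-scoring as the merge step, |Welch t| cutoffs
# instead of p-value thresholds, and a nearest-centroid classifier standing
# in for random forest / LSVM / neural network.
import math
import random
from statistics import mean, stdev


def merge_datasets(datasets):
    """Merge {gene: [expression per sample]} dicts on their shared genes.

    Each gene is z-scored within its own dataset so that dataset-level
    offsets (a crude batch effect) do not dominate the merged matrix.
    """
    shared = set.intersection(*(set(d) for d in datasets))
    merged = {}
    for gene in sorted(shared):
        row = []
        for d in datasets:
            vals = d[gene]
            mu, sd = mean(vals), stdev(vals)
            row.extend((v - mu) / sd if sd > 0 else 0.0 for v in vals)
        merged[gene] = row
    return merged


def welch_t(a, b):
    """Welch's t statistic comparing group a with group b."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in sorted(set(y)):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(col) for col in zip(*rows)]
        pred = min(centroids, key=lambda c: math.dist(X[i], centroids[c]))
        correct += pred == y[i]
    return correct / len(X)


def select_signature(merged, labels, cutoffs):
    """Return (accuracy, genes) for the best |t| cutoff; ties favor fewer genes."""
    case = [i for i, lab in enumerate(labels) if lab == 1]
    ctrl = [i for i, lab in enumerate(labels) if lab == 0]
    t = {g: welch_t([v[i] for i in case], [v[i] for i in ctrl])
         for g, v in merged.items()}
    best = (-1.0, [])
    for cutoff in cutoffs:
        # One candidate DEG set per cutoff.
        genes = [g for g in merged if abs(t[g]) >= cutoff]
        if not genes:
            continue
        X = [[merged[g][s] for g in genes] for s in range(len(labels))]
        acc = loo_accuracy(X, labels)
        if acc > best[0] or (acc == best[0] and len(genes) < len(best[1])):
            best = (acc, genes)
    return best


def make_toy_dataset(rng, offset, extra_gene=None):
    """Synthetic data: 5 cases then 5 controls; g0-g3 are up in cases."""
    data = {}
    for k in range(20):
        shift = 4.0 if k < 4 else 0.0
        data[f"g{k}"] = ([offset + shift + rng.gauss(0, 0.5) for _ in range(5)]
                         + [offset + rng.gauss(0, 0.5) for _ in range(5)])
    if extra_gene:
        data[extra_gene] = [rng.gauss(0, 1) for _ in range(10)]
    return data


# Usage on two synthetic "studies"; the second has a +5 batch offset
# and one gene the first study lacks.
rng = random.Random(0)
study_a = make_toy_dataset(rng, offset=0.0)
study_b = make_toy_dataset(rng, offset=5.0, extra_gene="only_in_b")
merged = merge_datasets([study_a, study_b])
labels = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 5
acc, signature = select_signature(merged, labels, cutoffs=[1.0, 2.0, 4.0])
```

Ties in accuracy favor the smaller gene set. Because the cutoff is chosen with the same samples used to score it, this accuracy is optimistic; a real analysis would confirm the chosen signature on held-out data.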
Pages: 12
Related papers
  • [1] Merging microarray studies to identify a common gene expression signature to several structural heart diseases
    Fajarda, Olga
    Duarte-Pereira, Sara
    Silva, Raquel M.
    Oliveira, José Luís
    [J]. BIODATA MINING, 2020, 13 (01)
  • [2] A NEW APPROACH FOR MERGING GENE EXPRESSION DATASETS
    Roubaud, Marie-Christine
    Torresani, Bruno
    [J]. 2011 IEEE STATISTICAL SIGNAL PROCESSING WORKSHOP (SSP), 2011, : 129 - 132
  • [3] USING META-ANALYSIS OF PUBLICLY AVAILABLE GENOMIC EXPRESSION DATASETS TO IDENTIFY A GENE EXPRESSION SIGNATURE OF METASTASIS IN OSTEOSARCOMA
    Hawkins, Marla A.
    Yu, Alexander
    Hilsenbeck, Susan G.
    Guerra, Rudy
    Lau, Ching
    Man, Chris T.
    [J]. PEDIATRIC BLOOD & CANCER, 2009, 52 (06) : 738 - 738
  • [4] Gene expression profiling of corona virus microarray datasets to identify crucial targets in COVID-19 patients
    Ramesh, Priyanka
    Veerappapillai, Shanthi
    Karuppasamy, Ramanathan
    [J]. GENE REPORTS, 2021, 22
  • [5] Gene Expression Signature in Endemic Osteoarthritis by Microarray Analysis
    Wang, Xi
    Ning, Yujie
    Zhang, Feng
    Yu, Fangfang
    Tan, Wuhong
    Lei, Yanxia
    Wu, Cuiyan
    Zheng, Jingjing
    Wang, Sen
    Yu, Hanjie
    Li, Zheng
    Lammi, Mikko J.
    Guo, Xiong
    [J]. INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2015, 16 (05) : 11465 - 11481
  • [6] Use of microarray technology to identify differential gene expression in Down syndrome placentae: validation of methodology.
    Dar, P
    Ferreira, JC
    Livne, KC
    Segal, J
    Khabele, D
    Gross, BL
    Gross, SJ
    [J]. AMERICAN JOURNAL OF HUMAN GENETICS, 2003, 73 (05) : 591 - 591
  • [7] Methods for evaluating gene expression from Affymetrix microarray datasets
    Jiang, Ning
    Leach, Lindsey J.
    Hu, Xiaohua
    Potokina, Elena
    Jia, Tianye
    Druka, Arnis
    Waugh, Robbie
    Kearsey, Michael J.
    Luo, Zewei W.
    [J]. BMC BIOINFORMATICS, 2008, 9 (1)
  • [8] Consensus gene regulatory networks: combining multiple microarray gene expression datasets
    Peeling, Emma
    Tucker, Allan
    [J]. COMPLIFE 2007: 3RD INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL LIFE SCIENCE, 2007, 940 : 38 - +