Leukemia multiclass assessment and classification from Microarray and RNA-seq technologies integration at gene expression level

Times cited: 25
Authors
Castillo, Daniel [1, 4]
Manuel Galvez, Juan [1]
Herrera, Luis J. [1]
Rojas, Fernando [1]
Valenzuela, Olga [2]
Caba, Octavio [3]
Prados, Jose [3]
Rojas, Ignacio [1]
Affiliations
[1] Univ Granada, Dept Comp Architecture & Comp Technol, Granada, Spain
[2] Univ Granada, Dept Appl Math, Granada, Spain
[3] Univ Granada, Inst Biopathol & Regenerat Med IBIMER, Ctr Biomed Res CIBM, Granada, Spain
[4] Univ Granada, Res Ctr Informat & Commun Technol CITIC, C Periodista Rafael Gomez Montero 2,D1-7 Off, E-18071 Granada, Spain
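Source
PLOS ONE | 2019 / Vol. 14 / Issue 2
Keywords
ACUTE MYELOID-LEUKEMIA; CHRONIC LYMPHOCYTIC-LEUKEMIA; READ ALIGNMENT; PROTEIN; EVOLUTION; AML; DIFFERENTIATION; PROLIFERATION; PROGRESSION; OMICS
DOI
10.1371/journal.pone.0212127
Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract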
In recent years, a significant increase in the number of available biological experiments has taken place due to the widespread use of massive sequencing data. Furthermore, continuous developments in machine learning and high-performance computing are enabling faster and more efficient analysis and processing of this type of data. However, biological information about a given disease is normally scattered, because experiments have been carried out over the years around the world using different sequencing technologies and platforms from different manufacturers. Thus, it is nowadays of paramount importance to integrate biologically related data correctly in order to achieve genuine benefits from them. For this purpose, this work presents an integration of multiple Microarray and RNA-seq platforms, which has led to the design of a multiclass study by collecting samples from the four main types of leukemia, quantified at the gene expression level. Subsequently, in order to find a set of differentially expressed genes with the highest discernment capability among the different types of leukemia, an innovative parameter referred to as coverage is presented here. This parameter assesses the number of different pathologies that a given gene is able to discern. It has been evaluated together with other widely known parameters by means of an ANOVA statistical test, which corroborated its filtering power when the identified genes are subjected to a machine learning process at the multiclass level. The optimal tuning of the evaluated gene-extraction parameters by means of this statistical test led to the selection of 42 highly relevant expressed genes. Using the minimum-Redundancy Maximum-Relevance (mRMR) feature selection algorithm, these genes were reordered and assessed with four different classification techniques. Outstanding results were achieved by considering only the first ten genes of the ranking.
Finally, the specific literature on this last subset of genes was consulted, revealing that practically all of them are associated with biological processes related to leukemia. In light of these results, this study underlines the relevance of considering a new parameter that facilitates the identification of highly valid expressed genes for simultaneously discerning multiple types of leukemia.
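The abstract names mRMR as the algorithm used to reorder the 42 selected genes but gives no implementation details. The following is a minimal, hypothetical sketch of a greedy mRMR ranking with the mutual-information-difference (MID) criterion, assuming each gene profile is discretized into three levels at the mean plus or minus 0.5 standard deviations. The function names, the discretization rule, and the `alpha` value are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter
from math import log
from statistics import mean, pstdev


def discretize(values, alpha=0.5):
    """Map one gene's expression profile to three levels (-1, 0, 1).

    Thresholds at mean +/- alpha * std are an assumed discretization rule.
    """
    mu, sd = mean(values), pstdev(values)
    lo, hi = mu - alpha * sd, mu + alpha * sd
    return [-1 if v < lo else (1 if v > hi else 0) for v in values]


def mutual_information(x, y):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # I(X;Y) = sum p(a,b) * log(p(a,b) / (p(a) * p(b))), written with counts
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mid_score(name, disc, relevance, selected):
    """MID criterion: relevance I(f; c) minus mean redundancy I(f; s) over selected s."""
    if not selected:
        return relevance[name]
    redundancy = sum(mutual_information(disc[name], disc[s]) for s in selected)
    return relevance[name] - redundancy / len(selected)


def mrmr_rank(features, labels, k):
    """Greedily rank up to k features (dict: gene name -> expression values)."""
    disc = {name: discretize(vals) for name, vals in features.items()}
    relevance = {name: mutual_information(v, labels) for name, v in disc.items()}
    selected, remaining = [], set(disc)
    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetically first name wins)
        best = max(sorted(remaining), key=lambda g: mid_score(g, disc, relevance, selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Calling `mrmr_rank(expression_by_gene, class_labels, k=10)` on a hypothetical dictionary of gene profiles would return a top-ten ranking analogous to the subset described above. MID subtracts the mean redundancy from the relevance; the MIQ variant divides by it instead, and the abstract does not state which variant or discretization was used.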
Pages: 25