A multi-objective optimization approach for the identification of cancer biomarkers from RNA-seq data

Cited by: 10
Authors
Coleto-Alcudia, Veredas [1]
Vega-Rodriguez, Miguel A. [1]
Affiliations
[1] Univ Extremadura, Dept Comp & Commun Technol, Campus Univ S-N, Caceres 10003, Spain
Keywords
Multi-objective optimization; Evolutionary computation; Support vector machine; Cancer; Biomarker; RNA-seq; Feature selection; Gene expression; Multiclass
DOI
10.1016/j.eswa.2021.116480
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Identifying biomarkers is essential for the diagnosis and prognosis of certain diseases, such as cancer. The purpose of gene selection is to find the minimum number of genes that can classify a sample (e.g. as normal or tumour) with high accuracy. The selected genes can then be studied as potential cancer biomarkers. In this article, a new two-step method for gene selection is proposed. The first step filters the most relevant genes of a gene expression dataset by combining three feature selection methods. Since gene selection is a two-objective problem (minimizing the number of selected genes while maximizing the classification accuracy), the second step is performed as a multi-objective optimization, using an Artificial Bee Colony based on Dominance (ABCD) algorithm. The ABCD algorithm internally uses a support vector machine (SVM) classifier. The method has been tested on five RNA-seq cancer datasets, and its results are compared with those of five other methods proposed in the scientific literature. Finally, to check whether the genes selected by the proposed method could be studied as biomarkers, the relation between the selected genes and the corresponding cancer type is analysed. It can be concluded that the proposed method is effective in gene selection for the identification of cancer biomarkers from RNA-seq data.
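The two-objective formulation in the abstract (fewer selected genes, higher classification accuracy) can be illustrated with a generic Pareto-dominance check. The sketch below is not the authors' ABCD algorithm; it only shows how dominance between candidate gene subsets might be decided. The names `dominates` and `pareto_front` and the candidate values are hypothetical, chosen for illustration.

```python
from typing import List, Tuple

# A candidate solution, summarized as (number_of_selected_genes, accuracy).
# Assumption: fewer genes is better, higher accuracy is better.
Candidate = Tuple[int, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical values, not results from the paper.
candidates = [(5, 0.90), (8, 0.95), (8, 0.93), (12, 0.95), (3, 0.80)]
front = pareto_front(candidates)
```

Here `(8, 0.93)` and `(12, 0.95)` are both dominated by `(8, 0.95)`, so they drop out, while the remaining three candidates represent different trade-offs between subset size and accuracy.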
Pages: 10