A novel feature selection for RNA-seq analysis

Cited by: 10
Authors
Han, Henry [1]
Affiliations
[1] Fordham Univ, Dept Comp & Informat Sci, Lincoln Ctr, New York, NY 10023 USA
Keywords
RNA-seq; Feature selection; Differential expression analysis; DIFFERENTIAL EXPRESSION; GENE; CANCER;
DOI
10.1016/j.compbiolchem.2017.10.010
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
RNA-seq data challenge existing omics data analytics because of their volume and complexity. Although quite a few computational models have been proposed from different standpoints to conduct differential expression (D.E.) analysis, almost none of them provides rigorous feature selection for high-dimensional RNA-seq count data. Instead, most or even all genes are included in differential calls regardless of whether they make a real contribution to data variation. This inevitably affects the robustness of D.E. analysis and increases false positive ratios. In this study, we present a novel feature selection method, nonnegative singular value approximation (NSVA), which enhances RNA-seq differential expression analysis by taking advantage of the non-negativity of RNA-seq count data. As a variance-based feature selection method, it selects genes according to their contribution to the first singular value direction of the input data in a data-driven way. It demonstrates robustness to depth bias and gene length bias in feature selection in comparison with five peer methods. Combined with state-of-the-art RNA-seq differential expression analysis, it enhances differential expression analysis by lowering the false discovery rates caused by these biases. Furthermore, we demonstrate the effectiveness of the proposed feature selection by proposing a data-driven differential expression analysis, NSVA-seq, and by conducting network marker discovery. (C) 2017 Elsevier Ltd. All rights reserved.
Pages: 245-257
Page count: 13
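
The abstract describes selecting genes by their contribution to the first singular value direction of the count data. The following is only a minimal sketch of that general idea using a plain SVD in NumPy; it is not the paper's NSVA algorithm, whose details are not given in this record. The function names `first_direction_scores` and `select_top_genes`, the genes-by-samples layout, and the use of squared loadings as the "contribution" are all assumptions made for illustration.

```python
import numpy as np


def first_direction_scores(counts):
    """Score each gene by its squared loading on the first left singular vector.

    Assumption: `counts` is a nonnegative genes x samples matrix.
    This is an illustrative stand-in, not the published NSVA method.
    """
    counts = np.asarray(counts, dtype=float)
    if counts.ndim != 2:
        raise ValueError("counts must be a 2-D genes x samples matrix")
    if (counts < 0).any():
        raise ValueError("counts must be nonnegative")
    # Thin SVD; column 0 of u is the first left singular vector.
    u, _, _ = np.linalg.svd(counts, full_matrices=False)
    # The singular vector has unit norm, so squared loadings sum to 1.
    return u[:, 0] ** 2


def select_top_genes(counts, k):
    """Return the row indices of the k highest-scoring genes, best first."""
    scores = first_direction_scores(counts)
    order = np.argsort(-scores, kind="stable")
    return order[:k]
```

With a toy matrix of four genes and three samples, `select_top_genes(counts, 2)` returns the two rows that dominate the first singular direction. Real RNA-seq pipelines would normally normalize for sequencing depth before any such scoring; that step is left out here.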