Identifying Differentially Expressed Genes in RNA Sequencing Data with Small Labelled Samples

Authors
Guo Y. [1]
Xiao Y. [1]
Li L. [1]
Affiliation
[1] Xi'an Jiaotong University, School of Mathematics and Statistics, Xi'an 710049, China
Keywords
Auxiliary sample; Biological system modeling; Biology; Cancer; Differentially expressed genes; Gene expression; Sequential analysis; Small sample problem; Sociology; Statistics; Two-sample independent test; Wilcoxon-Mann-Whitney test;
DOI
10.1109/TCBB.2024.3382147
Abstract
RNA-seq, including bulk RNA-seq and single-cell RNA-seq, is a next-generation sequencing-based RNA profiling method that measures gene expression patterns at high resolution, and it has become an essential tool for analysing differential gene expression at the whole-transcriptome level. Identifying differentially expressed genes is a key problem in many biological studies, such as disease genetics. Two-sample location tests are widely used in case-control studies to identify significantly differential genes. However, because labelled data are costly to collect, many studies face the small sample problem: only a small amount of labelled data is available, and traditional methods often lose power. To address this issue, we propose a novel rank-based nonparametric test, the WMW-A test, which builds on the Wilcoxon-Mann-Whitney (WMW) test by introducing a three-sample statistic through an additional auxiliary (A) sample that is either given or generated in the form of unlabelled data. By combining the case, control and auxiliary samples, we construct a three-sample WMW-A statistic based on the gap between the average ranks of the case and control samples within the combined sample. Extensive simulation experiments and real applications on different gene expression datasets, including one bulk RNA-seq dataset and two single-cell RNA-seq datasets, show that the WMW-A test could significantly improve the test power for the two-sample problem with small sample sizes, using either available or generated auxiliary data. Applications to two real small SARS-CoV-2 datasets further show the improvement of the WMW-A test for identifying differentially expressed genes with small labelled samples.
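The rank-gap idea described in the abstract can be illustrated with a minimal sketch. It pools the case, control and auxiliary values for one gene, ranks the pooled values, and takes the difference between the mean rank of the case sample and the mean rank of the control sample. The function names (`average_ranks`, `rank_gap_statistic`, `permutation_p_value`) are hypothetical and not taken from the paper. The abstract gives neither the exact normalisation of the WMW-A statistic nor its null distribution, so the p-value below comes from a generic label-permutation test over the labelled samples, which is an assumption swapped in for illustration only.

```python
import random


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def rank_gap_statistic(case, control, auxiliary):
    """Mean rank of case minus mean rank of control, ranked within the pooled sample."""
    pooled = list(case) + list(control) + list(auxiliary)
    ranks = average_ranks(pooled)
    n_case, n_control = len(case), len(control)
    mean_case = sum(ranks[:n_case]) / n_case
    mean_control = sum(ranks[n_case:n_case + n_control]) / n_control
    return mean_case - mean_control


def permutation_p_value(case, control, auxiliary, n_perm=2000, seed=0):
    """Two-sided p-value from shuffling case/control labels (assumed null, not the paper's)."""
    rng = random.Random(seed)
    observed = abs(rank_gap_statistic(case, control, auxiliary))
    labelled = list(case) + list(control)
    n_case = len(case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labelled)  # the auxiliary sample stays fixed; only labels move
        stat = rank_gap_statistic(labelled[:n_case], labelled[n_case:], auxiliary)
        if abs(stat) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy expression values for a single gene (made up for illustration).
    case = [5.0, 6.0, 7.0]
    control = [1.0, 2.0, 3.0]
    auxiliary = [4.0, 4.5]
    print(rank_gap_statistic(case, control, auxiliary))
    print(permutation_p_value(case, control, auxiliary))
```

In a differential expression setting, a function like this would be applied gene by gene, followed by a multiple-testing correction; the choice of correction is left open here because the abstract does not specify one.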
Pages: 1 / 12
Number of pages: 11