Identifying Differentially Expressed Genes in RNA Sequencing Data with Small Labelled Samples

Cited by: 0
Authors
Guo Y. [1]
Xiao Y. [1]
Li L. [1]
Affiliations
[1] Xi'an Jiaotong University, School of Mathematics and Statistics, Xi'an, 710049, China
Keywords
Auxiliary sample; Biological system modeling; Biology; Cancer; Differentially expressed genes; Gene expression; Sequential analysis; Small sample problem; Sociology; Statistics; Two-sample independent test; Wilcoxon-Mann-Whitney test
DOI
10.1109/TCBB.2024.3382147
Abstract
RNA-seq, including bulk RNA-seq and single-cell RNA-seq, is a next-generation sequencing-based RNA profiling method capable of measuring gene expression patterns at high resolution, and it has gradually become an essential tool for analysing differential gene expression at the whole-transcriptome level. Identifying differentially expressed genes is a key problem in many biological studies, such as disease genetics. Two-sample location tests are widely used in case-control studies to identify significantly differential genes. However, because labelled data are costly to collect, many studies face the small sample problem: only a small amount of labelled data is available, and traditional methods often lose power in this setting. To address this issue, we propose a novel rank-based nonparametric test called the WMW-A test. It builds on the Wilcoxon-Mann-Whitney (WMW) test and introduces a three-sample statistic through an additional auxiliary (A) sample, which is either given or generated in the form of unlabelled data. By combining the case, control and auxiliary samples, we construct a three-sample WMW-A statistic based on the gap between the average ranks of the case and control samples in the combined sample. Extensive simulation experiments and real applications on different gene expression datasets, including one bulk RNA-seq dataset and two single-cell RNA-seq datasets, show that the WMW-A test can significantly improve test power for two-sample problems with small sample sizes, using either available or generated auxiliary data. Applications to two real small SARS-CoV-2 datasets further show the improvement that the WMW-A test brings to identifying differentially expressed genes with small labelled samples. © IEEE
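The rank-gap idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the statistic's normalisation or its null distribution, so the sketch simply takes the difference of mean ranks in the pooled sample and, as an assumption, calibrates it by permuting the case/control labels while the auxiliary sample stays fixed. The function names `average_ranks`, `rank_gap_statistic` and `permutation_p_value` are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    values = np.asarray(values, dtype=float)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)          # last rank occupied by each distinct value
    starts = ends - counts + 1        # first rank occupied by each distinct value
    return ((starts + ends) / 2.0)[inverse]


def rank_gap_statistic(case, control, auxiliary):
    """Mean rank of case minus mean rank of control, ranked in the pooled sample.

    Assumption: an unnormalised gap; the paper's statistic may be scaled differently.
    """
    case = np.asarray(case, dtype=float)
    control = np.asarray(control, dtype=float)
    auxiliary = np.asarray(auxiliary, dtype=float)
    ranks = average_ranks(np.concatenate([case, control, auxiliary]))
    n_case, n_control = len(case), len(control)
    return ranks[:n_case].mean() - ranks[n_case:n_case + n_control].mean()


def permutation_p_value(case, control, auxiliary, n_perm=2000, seed=0):
    """Two-sided p-value from shuffling case/control labels (assumed calibration)."""
    rng = np.random.default_rng(seed)
    case = np.asarray(case, dtype=float)
    control = np.asarray(control, dtype=float)
    observed = abs(rank_gap_statistic(case, control, auxiliary))
    labelled = np.concatenate([case, control])
    n_case = len(case)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(labelled)
        stat = rank_gap_statistic(shuffled[:n_case], shuffled[n_case:], auxiliary)
        if abs(stat) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy expression values for a single gene (illustrative numbers only).
    case = [8.1, 7.9, 9.4]
    control = [5.2, 6.0, 5.7]
    auxiliary = [5.5, 8.8, 6.1, 7.2, 9.0, 5.9]
    print(rank_gap_statistic(case, control, auxiliary))
    print(permutation_p_value(case, control, auxiliary))
```

In a per-gene analysis this function would be applied to each gene in turn, followed by a multiple-testing correction; that step is left out of the sketch.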
Pages: 1 / 12
Page count: 11