Approximate distance correlation for selecting highly interrelated genes across datasets

Cited by: 3
Authors
Shen, Qunlun [1, 2]
Zhang, Shihua [1, 2, 3, 4]
Affiliations
[1] Chinese Acad Sci, Acad Math & Syst Sci, RCSDS, CEMS,NCMIS, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Math Sci, Beijing, Peoples R China
[3] Chinese Acad Sci, Ctr Excellence Anim Evolut & Genet, Kunming, Yunnan, Peoples R China
[4] Chinese Acad Sci, Univ Chinese Acad Sci, Hangzhou Inst Adv Study, Key Lab Syst Biol, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
CELL RNA-SEQ; EXPRESSION; PREDICTION; DISCOVERY; CANCER; ATLAS;
DOI
10.1371/journal.pcbi.1009548
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
With the rapid accumulation of biological omics datasets, decoding the relationships between genes across datasets has become an important problem. Previous studies have focused on identifying differentially expressed genes across datasets, but such methods are poorly suited to detecting interrelated genes. Moreover, existing correlation-based algorithms can only measure the relationship between genes within a single dataset, or between two multi-modal datasets derived from the same samples. It remains unclear how to quantify the strength of association of the same gene across two biological datasets with different samples. To this end, we propose Approximate Distance Correlation (ADC) to select statistically significant interrelated genes across two different biological datasets. ADC first obtains the k most correlated genes for each target gene as its approximate observations, and then calculates the distance correlation (DC) for the target gene across the two datasets. ADC repeats this process for all genes and then applies the Benjamini-Hochberg adjustment to control the false discovery rate. We demonstrate the effectiveness of ADC on simulated data and in four real applications that select highly interrelated genes across two datasets. The four applications comprise: 21 cancer RNA-seq datasets from different tissues; six single-cell RNA-seq (scRNA-seq) datasets of mouse hematopoietic cells covering six cell types along the hematopoietic cell lineage; five scRNA-seq datasets of pancreatic islet cells generated with five different technologies; and coupled single-cell ATAC-seq (scATAC-seq) and scRNA-seq data of peripheral blood mononuclear cells (PBMC). Extensive results demonstrate that ADC is a powerful tool for uncovering interrelated genes with strong biological implications and that it scales to large datasets. Moreover, the number of such genes can serve as a metric of the similarity between two datasets, which can characterize the relative differences among diverse cell types and technologies.
Author summary
The number and size of biological datasets (e.g., single-cell RNA-seq datasets) have grown rapidly in recent years. Mining the relationships of genes across datasets is becoming an important problem. Computational tools for identifying differentially expressed genes have been studied comprehensively, but interrelated genes across datasets have largely been neglected. Detecting highly interrelated genes across datasets is difficult because the datasets contain different samples and may contain different numbers of samples. To solve this problem, we present a new algorithm that identifies interrelated genes across datasets based on distance correlation. The proposed algorithm is very efficient and works well across different technologies, namely RNA-seq, single-cell RNA-seq and single-cell ATAC-seq. We also found that the number of such highly interrelated genes can serve as a metric of the similarity between two datasets, which can characterize the relative differences among diverse cell types and technologies.
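The three steps named in the abstract (pick the k most correlated genes as approximate observations, compute the distance correlation across the two datasets, adjust with Benjamini-Hochberg) can be illustrated with a minimal NumPy sketch. It is one possible reading of the abstract, not the authors' implementation. Several details are assumptions made only for illustration: the function names, ranking neighbors by the mean absolute Pearson correlation over both datasets, treating each neighbor gene's expression profile as one paired observation, and using a permutation test for the p-value.

```python
import numpy as np

def _centered_distances(a):
    """Double-centered Euclidean distance matrix between the rows of a."""
    a = np.asarray(a, dtype=float)
    sq = np.sum(a * a, axis=1)
    d = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * (a @ a.T), 0.0))
    np.fill_diagonal(d, 0.0)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation between paired rows of x (n, p) and y (n, q)."""
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = max((a * b).mean(), 0.0)
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0.0 else float(np.sqrt(dcov2 / denom))

def _abs_corr_with(x, g):
    """|Pearson r| between gene g and every gene (rows of x are genes)."""
    z = x - x.mean(axis=1, keepdims=True)
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    return np.abs(z @ z[g])

def top_k_neighbors(x1, x2, g, k):
    """Assumed rule: rank genes by mean |r| with g over both datasets."""
    score = (_abs_corr_with(x1, g) + _abs_corr_with(x2, g)) / 2.0
    score[g] = -np.inf  # exclude the target gene itself
    return np.argsort(score)[::-1][:k]

def adc_gene(x1, x2, g, k=20, n_perm=200, rng=None):
    """Distance correlation of gene g across x1 (genes, n1) and x2 (genes, n2).

    The k neighbor genes act as paired observations; the p-value comes from
    permuting the pairing (an assumed testing scheme).
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = top_k_neighbors(x1, x2, g, k)
    a, b = x1[idx], x2[idx]
    observed = distance_correlation(a, b)
    null = [distance_correlation(a, b[rng.permutation(len(idx))])
            for _ in range(n_perm)]
    p_value = (1 + sum(v >= observed for v in null)) / (1 + n_perm)
    return observed, p_value

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted
```

In this sketch, calling `adc_gene` for every gene and passing the resulting p-values to `benjamini_hochberg` yields adjusted p-values that can be thresholded to select interrelated genes. Because the neighbor genes serve as the paired observations, the two datasets may have different numbers of samples.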
Pages: 18