HDMC: a novel deep learning-based framework for removing batch effects in single-cell RNA-seq data

Cited by: 7
Authors
Wang, Xiao [1, 2]
Wang, Jia [1, 2]
Zhang, Han [3]
Huang, Shenwei [1, 2]
Yin, Yanbin [4]
Affiliations
[1] Nankai Univ, Coll Comp Sci, Tianjin 300350, Peoples R China
[2] Nankai Univ, Tianjin Key Lab Network & Data Secur Technol, Tianjin 300350, Peoples R China
[3] Nankai Univ, Coll Artificial Intelligence, Tianjin 300350, Peoples R China
[4] Univ Nebraska, Dept Food Sci & Technol, Lincoln, NE 68588 USA
Funding
National Natural Science Foundation of China
DOI
10.1093/bioinformatics/btab821
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: With the development of single-cell RNA sequencing (scRNA-seq) techniques, an increasing number of large-scale gene expression datasets have become available. However, to analyze datasets produced by different experiments, batch effects among the datasets must be taken into account. Although several methods for removing batch effects in scRNA-seq data have recently been published, two problems remain challenging and are not completely solved: (i) how to reduce the distribution differences between batches more accurately; and (ii) how to align samples from different batches so as to recover the cell type clusters.
Results: We propose a novel deep learning approach, a hierarchical distribution-matching framework assisted by contrastive learning, to address these two problems. First, we design a hierarchical framework for distribution matching based on a deep autoencoder. This framework employs an adversarial training strategy to match the global distributions of different batches, which provides an improved foundation for further matching the local distributions with a maximum mean discrepancy-based loss. For local matching, we divide the cells in each batch into clusters and develop a contrastive learning mechanism that simultaneously aligns similar cluster pairs and keeps noisy pairs apart. This makes it possible to obtain clusters in which all cells are of the same type (true positives) and to avoid clusters containing cells of different types (false positives). We demonstrate the effectiveness of our method on both simulated and real datasets. The results show that the method significantly outperforms state-of-the-art methods and is able to prevent overcorrection.
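The abstract names a maximum mean discrepancy (MMD)-based loss for matching distributions across batches. The following is a minimal illustrative sketch of a squared-MMD estimate between two sets of cell embeddings. It is not the HDMC implementation: the Gaussian (RBF) kernel, the fixed bandwidth, the biased estimator, the function names `rbf_kernel` and `mmd2`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    # Pairwise squared Euclidean distances via the expansion |a-b|^2 = |a|^2 + |b|^2 - 2ab.
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Synthetic "embeddings" standing in for two batches (hypothetical data).
rng = np.random.default_rng(0)
batch_a = rng.normal(0.0, 1.0, size=(200, 8))
batch_b_shifted = rng.normal(1.5, 1.0, size=(200, 8))  # simulated batch effect
batch_b_aligned = rng.normal(0.0, 1.0, size=(200, 8))  # same distribution as batch_a

mmd_shifted = mmd2(batch_a, batch_b_shifted, bandwidth=2.0)
mmd_aligned = mmd2(batch_a, batch_b_aligned, bandwidth=2.0)
print(f"MMD^2 with batch shift:    {mmd_shifted:.4f}")
print(f"MMD^2 without batch shift: {mmd_aligned:.4f}")
```

In a training loop, a quantity of this kind would be minimized with respect to the encoder parameters so that embeddings of matched cell groups from different batches become harder to tell apart. How the paper selects the groups and the kernel is not stated in the abstract.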
Pages: 1295-1303
Page count: 9