Chord: an ensemble machine learning algorithm to identify doublets in single-cell RNA sequencing data

Cited by: 6
Authors
Xiong, Ke-Xu [1, 2]
Zhou, Han-Lin [2, 3, 4, 5, 6, 7]
Lin, Cong [2, 5, 6, 8]
Yin, Jian-Hua [2, 5, 6, 8]
Kristiansen, Karsten [2, 7]
Yang, Huan-Ming [2, 9]
Li, Gui-Bo [2, 3, 4, 5, 6, 8]
Affiliations
[1] Univ Chinese Acad Sci, Coll Life Sci, Beijing 100049, Peoples R China
[2] BGI Shenzhen, Shenzhen 518083, Peoples R China
[3] Zhengzhou Univ, BGI Coll, Zhengzhou, Peoples R China
[4] Zhengzhou Univ, Henan Inst Med & Pharmaceut Sci, Zhengzhou, Peoples R China
[5] BGI Shenzhen, BGI Henan, Xinxiang 453000, Henan, Peoples R China
[6] BGI Shenzhen, Shenzhen Key Lab Genom, Guangdong Prov Key Lab Human Dis Genom, Shenzhen 518083, Peoples R China
[7] Univ Copenhagen, Dept Biol, Lab Genom & Mol Biomed, DK-2100 Copenhagen, Denmark
[8] BGI Shenzhen, Shenzhen Key Lab Single Cell Omics, Shenzhen 518083, Peoples R China
[9] James D Watson Inst Genome Sci, Hangzhou 310008, Peoples R China
Keywords
SEQ
DOI
10.1038/s42003-022-03476-9
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
To address the unmet need for guidance in choosing a suitable doublet detection method, an ensemble machine learning algorithm called Chord was developed; it integrates multiple methods and achieves higher accuracy and stability across different scRNA-seq datasets.
High-throughput single-cell RNA sequencing (scRNA-seq) is a popular method, but it is accompanied by doublets, which disturb downstream analysis. Several computational approaches have been developed to detect doublets. However, most of these methods perform satisfactorily on some datasets but lack stability on others, so it is difficult to regard any single method as a gold standard applicable to all scenarios. Choosing the most appropriate software is therefore a difficult and time-consuming task for researchers. To address these issues, we propose Chord, which implements a machine learning algorithm that integrates multiple doublet detection methods. Chord had higher accuracy and stability than the individual approaches on different datasets containing real and synthetic data. Moreover, Chord was designed with a modular architecture port, which makes it flexible and adaptable to the incorporation of new tools. Chord is a general solution to the doublet detection problem.
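The abstract describes Chord as integrating the outputs of several doublet detection methods through a machine learning model. The following is a minimal sketch of that general idea (stacking per-method doublet scores with a logistic regression), not Chord's actual implementation. The three detectors, their noise levels, the simulated scores, the balanced doublet rate, and all function names are hypothetical; the abstract does not state which tools or which model Chord uses.

```python
import math
import random


def simulate_scores(n_cells, noise_levels, doublet_rate=0.5, seed=0):
    """Simulate doublet scores from hypothetical detectors (one column each).

    Assumption: each detector scores doublets around 0.7 and singlets around
    0.3, with its own Gaussian noise level. Real detectors behave differently.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_cells):
        is_doublet = 1 if rng.random() < doublet_rate else 0
        center = 0.7 if is_doublet else 0.3
        # Clip every score to [0, 1] so all detectors share one scale.
        row = [min(1.0, max(0.0, center + rng.gauss(0.0, s))) for s in noise_levels]
        rows.append(row)
        labels.append(is_doublet)
    return rows, labels


def predict_proba(weights, bias, row):
    """Ensemble doublet probability for one cell's vector of method scores."""
    z = bias + sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))


def fit_stacker(rows, labels, lr=1.0, epochs=300):
    """Fit logistic-regression weights over the per-method scores
    with plain batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, y in zip(rows, labels):
            err = predict_proba(weights, bias, row) - y
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def accuracy(weights, bias, rows, labels, threshold=0.5):
    """Fraction of cells whose thresholded ensemble call matches the label."""
    hits = sum(
        (predict_proba(weights, bias, r) >= threshold) == bool(y)
        for r, y in zip(rows, labels)
    )
    return hits / len(rows)


if __name__ == "__main__":
    rows, labels = simulate_scores(600, noise_levels=[0.15, 0.20, 0.25])
    weights, bias = fit_stacker(rows, labels)
    print("learned weights:", [round(w, 2) for w in weights])
    print("training accuracy:", round(accuracy(weights, bias, rows, labels), 3))
```

Because the combiner only sees a vector of scores per cell, adding another detector in this sketch amounts to adding one more column, which is one plausible reading of the flexibility the abstract attributes to a modular design.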
Pages: 11