Chord: an ensemble machine learning algorithm to identify doublets in single-cell RNA sequencing data

Cited by: 6
Authors
Xiong, Ke-Xu [1 ,2 ]
Zhou, Han-Lin [2 ,3 ,4 ,5 ,6 ,7 ]
Lin, Cong [2 ,5 ,6 ,8 ]
Yin, Jian-Hua [2 ,5 ,6 ,8 ]
Kristiansen, Karsten [2 ,7 ]
Yang, Huan-Ming [2 ,9 ]
Li, Gui-Bo [2 ,3 ,4 ,5 ,6 ,8 ]
Affiliations
[1] Univ Chinese Acad Sci, Coll Life Sci, Beijing 100049, Peoples R China
[2] BGI Shenzhen, Shenzhen 518083, Peoples R China
[3] Zhengzhou Univ, BGI Coll, Zhengzhou, Peoples R China
[4] Zhengzhou Univ, Henan Inst Med & Pharmaceut Sci, Zhengzhou, Peoples R China
[5] BGI Shenzhen, BGI Henan, Xinxiang 453000, Henan, Peoples R China
[6] BGI Shenzhen, Shenzhen Key Lab Genom, Guangdong Prov Key Lab Human Dis Genom, Shenzhen 518083, Peoples R China
[7] Univ Copenhagen, Dept Biol, Lab Genom & Mol Biomed, DK-2100 Copenhagen, Denmark
[8] BGI Shenzhen, Shenzhen Key Lab Single Cell Omics, Shenzhen 518083, Peoples R China
[9] James D Watson Inst Genome Sci, Hangzhou 310008, Peoples R China
Keywords
SEQ;
DOI
10.1038/s42003-022-03476-9
Chinese Library Classification
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
To address the unmet need for guidance in choosing a suitable doublet detection method, an ensemble machine learning algorithm called Chord was developed; it integrates multiple methods and achieves higher accuracy and stability across different scRNA-seq datasets. High-throughput single-cell RNA sequencing (scRNA-seq) is widely used, but its data contain doublets that disturb downstream analysis. Several computational approaches have been developed to detect doublets. However, most of these methods perform well on some datasets but lack stability on others, so no single method can be regarded as a gold standard applicable to all scenarios. Choosing the most appropriate software is therefore difficult and time-consuming for researchers. Here we propose Chord, which implements a machine learning algorithm that integrates multiple doublet detection methods to address these issues. Chord had higher accuracy and stability than the individual approaches on different datasets containing real and synthetic data. Moreover, Chord was designed with a modular architecture port that is flexible and can incorporate new tools. Chord is a general solution to the doublet detection problem.
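The idea of integrating several doublet detectors into one learned score can be illustrated with a minimal sketch. This is not Chord's implementation: the abstract names neither the component methods nor the learner, so the sketch substitutes a plain logistic regression and three hypothetical detectors whose scores are simulated. It also assumes that labelled training cells are available (for example, artificially constructed doublets), which the abstract does not state. All function names here are invented for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ensemble(scores, labels, lr=1.0, epochs=1000):
    """Fit logistic-regression weights over per-method doublet scores.

    scores: one row per cell, one column per detection method.
    labels: 1 for a known doublet, 0 for a singlet (assumed available).
    Logistic regression is a stand-in learner, not the published one.
    """
    n_methods = len(scores[0])
    n_cells = len(scores)
    weights = [0.0] * n_methods
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_methods
        grad_b = 0.0
        for row, y in zip(scores, labels):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            err = p - y
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        # Full-batch gradient descent step on the mean log-loss.
        weights = [w - lr * g / n_cells for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_cells
    return weights, bias


def predict(scores, weights, bias):
    # Combined doublet probability for each cell.
    return [
        sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
        for row in scores
    ]


def make_toy_data(n_cells=400, seed=0):
    """Simulate scores from three hypothetical detectors of varying noise."""
    rng = random.Random(seed)
    scores, labels = [], []
    for _ in range(n_cells):
        y = 1 if rng.random() < 0.3 else 0
        row = [
            min(1.0, max(0.0, 0.3 + 0.4 * y + rng.gauss(0.0, sd)))
            for sd in (0.15, 0.25, 0.35)
        ]
        scores.append(row)
        labels.append(y)
    return scores, labels


if __name__ == "__main__":
    scores, labels = make_toy_data()
    weights, bias = fit_ensemble(scores, labels)
    probs = predict(scores, weights, bias)
    print("learned weights:", [round(w, 2) for w in weights])
    print("first five combined scores:", [round(p, 3) for p in probs[:5]])
```

Because each detector contributes only a column of scores, adding a new tool in this sketch means adding a column and refitting, which is one way to read the modular design the abstract describes.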
Pages: 11