iDeLUCS: a deep learning interactive tool for alignment-free clustering of DNA sequences

Cited by: 3
Authors
Arias, Pablo Millan [1]
Hill, Kathleen A. [2]
Kari, Lila [1]
Affiliations
[1] Univ Waterloo, Cheriton Sch Comp Sci, 200 Univ Ave West, Waterloo, ON N2L 3G1, Canada
[2] Univ Western Ontario, Dept Biol, London, ON N6A 5B7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
DOI
10.1093/bioinformatics/btad508
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
We present an interactive Deep Learning-based software tool for Unsupervised Clustering of DNA Sequences (iDeLUCS), that detects genomic signatures and uses them to cluster DNA sequences, without the need for sequence alignment or taxonomic identifiers. iDeLUCS is scalable and user-friendly: its graphical user interface, with support for hardware acceleration, allows the practitioner to fine-tune the different hyper-parameters involved in the training process without requiring extensive knowledge of deep learning. The performance of iDeLUCS was evaluated on a diverse set of datasets: several real genomic datasets from organisms in kingdoms Animalia, Protista, Fungi, Bacteria, and Archaea, three datasets of viral genomes, a dataset of simulated metagenomic reads from microbial genomes, and multiple datasets of synthetic DNA sequences. The performance of iDeLUCS was compared to that of two classical clustering algorithms (k-means++ and GMM) and two clustering algorithms specialized in DNA sequences (MeShClust v3.0 and DeLUCS), using both intrinsic cluster evaluation metrics and external evaluation metrics. In terms of unsupervised clustering accuracy, iDeLUCS outperforms the two classical algorithms by an average of ~20%, and the two specialized algorithms by an average of ~12%, on the datasets of real DNA sequences analyzed. Overall, our results indicate that iDeLUCS is a robust clustering method suitable for the clustering of large and diverse datasets of unlabeled DNA sequences.
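The abstract does not define "genomic signature" or "unsupervised clustering accuracy", so the following Python sketch illustrates one common reading of each and is not the iDeLUCS implementation. It assumes, as an illustration only, that a signature can be approximated by a normalized k-mer frequency vector, and that clustering accuracy is the fraction of sequences whose cluster maps to their true label under the best one-to-one cluster-to-label assignment. The function names `kmer_signature` and `clustering_accuracy` are hypothetical, and the brute-force search over assignments stands in for the Hungarian algorithm normally used when there are many clusters.

```python
from collections import Counter
from itertools import permutations, product


def kmer_signature(seq, k=3):
    """Normalized frequency of every k-mer over the alphabet ACGT.

    Windows containing other symbols (e.g. N) are ignored.
    Illustrative stand-in for a genomic signature (assumption).
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def clustering_accuracy(true_labels, cluster_ids):
    """Best fraction of correct assignments over all one-to-one
    mappings from cluster ids to true labels (brute force)."""
    if len(true_labels) != len(cluster_ids) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = list(dict.fromkeys(true_labels))    # unique, order preserved
    clusters = list(dict.fromkeys(cluster_ids))
    # Pad with None so surplus clusters can stay unmatched.
    targets = labels + [None] * max(0, len(clusters) - len(labels))
    best = 0
    for perm in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for c, t in zip(cluster_ids, true_labels))
        best = max(best, hits)
    return best / len(true_labels)


if __name__ == "__main__":
    sig = kmer_signature("ACGTACGT", k=2)
    print(len(sig), round(sum(sig), 6))
    print(clustering_accuracy(["a", "a", "b", "b"], [0, 0, 0, 1]))
```

Because cluster ids are arbitrary, the accuracy is computed only after searching for the best matching between clusters and labels; the exhaustive search here is practical only for a handful of clusters.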
Pages: 5