Continuous Embeddings of DNA Sequencing Reads and Application to Metagenomics

Cited by: 21
Authors
Menegaux, Romain [1, 2]
Vert, Jean-Philippe [1, 2, 3, 4]
Affiliations
[1] PSL Res Univ, MINES ParisTech, CBIO Ctr Computat Biol, Paris, France
[2] PSL Res Univ, Inst Curie, INSERM, U900, Paris, France
[3] PSL Res Univ, CNRS, Dept Math & Applicat, Ecole Normale Super, Paris, France
[4] Google Brain, 8 Rue Londres, F-75009 Paris, France
Keywords
metagenomics; sequencing; classification; embedding
DOI
10.1089/cmb.2018.0174
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
We propose a new model for fast classification of DNA sequences output by next-generation sequencing machines. The model, which we call fastDNA, embeds DNA sequences in a vector space by learning continuous low-dimensional representations of the k-mers they contain. We show on metagenomics benchmarks that it outperforms the state-of-the-art methods in terms of accuracy and scalability.
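The idea of embedding a read through its k-mers can be illustrated with a minimal sketch. The abstract does not specify how k-mer vectors are combined or classified, so the averaging step, the linear scoring layer, the randomly initialised (untrained) parameters, and all names below (`kmer_index`, `read_kmers`, `embed_read`, `class_scores`, `K`, `DIM`, `N_CLASSES`) are assumptions made for illustration, not the authors' implementation.

```python
import random

ALPHABET = "ACGT"


def kmer_index(kmer):
    """Map a k-mer over A/C/G/T to an integer in [0, 4**k)."""
    idx = 0
    for base in kmer:
        idx = idx * 4 + ALPHABET.index(base)
    return idx


def read_kmers(read, k):
    """Indices of all overlapping k-mers; k-mers with non-ACGT symbols are skipped."""
    indices = []
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if all(base in ALPHABET for base in kmer):
            indices.append(kmer_index(kmer))
    return indices


def embed_read(read, k, table):
    """Assumed combination rule: average the embedding vectors of the read's k-mers."""
    indices = read_kmers(read, k)
    dim = len(table[0])
    if not indices:
        return [0.0] * dim
    return [sum(table[i][d] for i in indices) / len(indices) for d in range(dim)]


def class_scores(embedding, weights):
    """Assumed classifier: one linear score per class (no softmax, no training)."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weights]


# Hypothetical toy sizes; a trained model would learn `table` and `weights` from labelled reads.
rng = random.Random(0)
K, DIM, N_CLASSES = 3, 4, 2
table = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(4 ** K)]
weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(N_CLASSES)]

vec = embed_read("ACGTACGT", K, table)
scores = class_scores(vec, weights)
```

In this sketch the embedding table has one row per possible k-mer (4^k rows), so a read of any length maps to a fixed-size vector of `DIM` values, which is what makes the representation continuous and low-dimensional.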
Pages: 509-518
Number of pages: 10