Continuous Embeddings of DNA Sequencing Reads and Application to Metagenomics

Cited by: 21
Authors
Menegaux, Romain [1, 2]
Vert, Jean-Philippe [1, 2, 3, 4]
Affiliations
[1] PSL Res Univ, MINES ParisTech, CBIO Ctr Computat Biol, Paris, France
[2] PSL Res Univ, Inst Curie, INSERM, U900, Paris, France
[3] PSL Res Univ, CNRS, Dept Math & Applicat, Ecole Normale Super, Paris, France
[4] Google Brain, 8 Rue Londres, F-75009 Paris, France
Keywords
metagenomics; sequencing; classification; embedding
DOI
10.1089/cmb.2018.0174
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
We propose a new model for fast classification of DNA sequences output by next-generation sequencing machines. The model, which we call fastDNA, embeds DNA sequences in a vector space by learning continuous low-dimensional representations of the k-mers they contain. We show on metagenomics benchmarks that it outperforms state-of-the-art methods in both accuracy and scalability.
Pages: 509-518
Number of pages: 10
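
The idea stated in the abstract, representing a read through continuous vectors attached to its k-mers, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how k-mer vectors are combined or how classes are scored, so averaging the vectors and applying a linear softmax classifier are assumptions made here for illustration. The function names (`kmer_indices`, `embed_read`, `class_probabilities`), the sizes, and the random untrained weights are all hypothetical.

```python
import math
import random

# Assumed encoding: each base maps to a digit, so a k-mer is a base-4 number.
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_indices(read, k):
    """Return the integer index in [0, 4**k) of every k-mer in `read`.

    k-mers containing a character other than A, C, G, T are skipped.
    """
    indices = []
    for start in range(len(read) - k + 1):
        index = 0
        for base in read[start:start + k]:
            if base not in BASE_INDEX:
                index = None
                break
            index = index * 4 + BASE_INDEX[base]
        if index is not None:
            indices.append(index)
    return indices


def embed_read(read, kmer_embeddings, k):
    """Embed a read as the mean of its k-mer vectors (an assumed pooling rule)."""
    dim = len(kmer_embeddings[0])
    indices = kmer_indices(read, k)
    if not indices:
        return [0.0] * dim
    return [
        sum(kmer_embeddings[i][d] for i in indices) / len(indices)
        for d in range(dim)
    ]


def class_probabilities(read_vector, class_weights):
    """Score the read vector with a linear layer followed by a softmax."""
    scores = [sum(w * x for w, x in zip(row, read_vector)) for row in class_weights]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sizes and untrained random parameters, for illustration only.
random.seed(0)
K, DIM, N_CLASSES = 4, 8, 3
kmer_embeddings = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(4 ** K)]
class_weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_CLASSES)]

read_vector = embed_read("ACGTTGCANACGT", kmer_embeddings, K)
probabilities = class_probabilities(read_vector, class_weights)
print(len(read_vector), probabilities)
```

In a trained model the embedding table and the class weights would be learned jointly from labeled reads. The sketch only shows the forward pass: a read of any length maps to a fixed-size vector, which is what makes classification cost independent of the reference database size.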