Continuous Embeddings of DNA Sequencing Reads and Application to Metagenomics

Cited by: 21
Authors
Menegaux, Romain [1 ,2 ]
Vert, Jean-Philippe [1 ,2 ,3 ,4 ]
Affiliations
[1] PSL Res Univ, MINES ParisTech, CBIO Ctr Computat Biol, Paris, France
[2] PSL Res Univ, Inst Curie, INSERM, U900, Paris, France
[3] PSL Res Univ, CNRS, Dept Math & Applicat, Ecole Normale Super, Paris, France
[4] Google Brain, 8 Rue Londres, F-75009 Paris, France
Keywords
metagenomics; sequencing; classification; embedding
DOI
10.1089/cmb.2018.0174
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
We propose a new model for fast classification of DNA sequences output by next-generation sequencing machines. The model, which we call fastDNA, embeds DNA sequences in a vector space by learning continuous low-dimensional representations of the k-mers they contain. We show on metagenomics benchmarks that it outperforms state-of-the-art methods in terms of accuracy and scalability.
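The embedding idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how k-mer vectors are combined or which classifier is used, so the averaging step and the linear softmax layer below are assumptions, as are all names and sizes (`K`, `DIM`, `N_CLASSES`, `embed_read`, `class_probabilities`). The vectors are randomly initialised here; in a trained model they would be learned from labelled reads.

```python
import math
import random
from itertools import product

K = 3          # k-mer length (illustrative value, not taken from the paper)
DIM = 4        # embedding dimension (illustrative value)
N_CLASSES = 2  # number of hypothetical taxa

# Map every possible k-mer over {A, C, G, T} to a row index.
KMER_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}

rng = random.Random(0)
# One DIM-dimensional vector per k-mer (random here; learned in a real model).
EMBED = [[rng.gauss(0, 0.1) for _ in range(DIM)] for _ in KMER_INDEX]
# Assumed linear classifier: one DIM-dimensional weight vector per class.
W = [[rng.gauss(0, 0.1) for _ in range(DIM)] for _ in range(N_CLASSES)]


def kmers(read, k=K):
    """Return the overlapping k-mers of a read, skipping any with non-ACGT bases."""
    candidates = (read[i:i + k] for i in range(len(read) - k + 1))
    return [km for km in candidates if km in KMER_INDEX]


def embed_read(read):
    """Embed a read as the mean of its k-mer vectors (zero vector if none is valid)."""
    found = kmers(read)
    if not found:
        return [0.0] * DIM
    total = [0.0] * DIM
    for km in found:
        row = EMBED[KMER_INDEX[km]]
        for d in range(DIM):
            total[d] += row[d]
    return [v / len(found) for v in total]


def class_probabilities(read):
    """Score the read embedding against each class and apply a softmax."""
    x = embed_read(read)
    scores = [sum(w * v for w, v in zip(W[c], x)) for c in range(N_CLASSES)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]
```

Calling `class_probabilities("ACGTACGT")` returns one probability per hypothetical class. Because the table has one row per k-mer, its size grows as 4^k, which is the main cost of choosing a larger k in a sketch like this.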
Pages: 509 - 518
Page count: 10
Related papers
50 in total
  • [21] NPBSS: a new PacBio sequencing simulator for generating the continuous long reads with an empirical model
    Wei, Ze-Gang
    Zhang, Shao-Wu
    BMC BIOINFORMATICS, 2018, 19
  • [22] REMOVAL OF HOST DNA ENHANCES METAGENOMICS SEQUENCING SENSITIVITY OF THE MICROBIOTA IN TISSUE BIOPSIES
    Cheng, Wing Yin
    Liu, Weixin
    Chu, Eagle S.
    Wong, Sunny H.
    Sung, Joseph J.
    Yu, Jun
    GASTROENTEROLOGY, 2021, 160 (06) : S732 - S733
  • [23] The integration of sequencing and bioinformatics in metagenomics
    Firouz Abbasian
    Robin Lockington
    Mallavarapu Megharaj
    Ravi Naidu
    Reviews in Environmental Science and Bio/Technology, 2015, 14 : 357 - 383
  • [24] The integration of sequencing and bioinformatics in metagenomics
    Abbasian, Firouz
    Lockington, Robin
    Megharaj, Mallavarapu
    Naidu, Ravi
    REVIEWS IN ENVIRONMENTAL SCIENCE AND BIO-TECHNOLOGY, 2015, 14 (03) : 357 - 383
  • [25] Metagenomics uBiome: participative sequencing
    Unknown
    BIOFUTUR, 2013, (342) : 15 - 15
  • [26] Food authentication from shotgun sequencing reads with an application on high protein powders
    Haiminen, Niina
    Edlund, Stefan
    Chambliss, David
    Kunitomi, Mark
    Weimer, Bart C.
    Ganesan, Balasubramanian
    Baker, Robert
    Markwell, Peter
    Davis, Matthew
    Huang, B. Carol
    Kong, Nguyet
    Prill, Robert J.
    Marlowe, Carl H.
    Quintanar, Andre
    Pierre, Sophie
    Dubois, Geraud
    Kaufman, James H.
    Parida, Laxmi
    Beck, Kristen L.
    NPJ SCIENCE OF FOOD, 2019, 3 (01)
  • [27] Food authentication from shotgun sequencing reads with an application on high protein powders
    Niina Haiminen
    Stefan Edlund
    David Chambliss
    Mark Kunitomi
    Bart C. Weimer
    Balasubramanian Ganesan
    Robert Baker
    Peter Markwell
    Matthew Davis
    B. Carol Huang
    Nguyet Kong
    Robert J. Prill
    Carl H. Marlowe
    André Quintanar
    Sophie Pierre
    Geraud Dubois
    James H. Kaufman
    Laxmi Parida
    Kristen L. Beck
    npj Science of Food, 3
  • [28] Encephalitis diagnosis using metagenomics: application of next generation sequencing for undiagnosed cases
    Brown, Julianne R.
    Bharucha, Tehmina
    Breuer, Judith
    JOURNAL OF INFECTION, 2018, 76 (03) : 225 - 240
  • [29] Aiming off the target: recycling target capture sequencing reads for investigating repetitive DNA
    Costa, Lucas
    Marques, Andre
    Buddenhagen, Chris
    Thomas, William Wayt
    Huettel, Bruno
    Schubert, Veit
    Dodsworth, Steven
    Houben, Andreas
    Souza, Gustavo
    Pedrosa-Harand, Andrea
    ANNALS OF BOTANY, 2021, 128 (07) : 835 - 848
  • [30] Mapping short DNA sequencing reads and calling variants using mapping quality scores
    Li, Heng
    Ruan, Jue
    Durbin, Richard
    GENOME RESEARCH, 2008, 18 (11) : 1851 - 1858