Continuous Embeddings of DNA Sequencing Reads and Application to Metagenomics

Cited by: 21
Authors
Menegaux, Romain [1, 2]
Vert, Jean-Philippe [1, 2, 3, 4]
Affiliations
[1] PSL Res Univ, MINES ParisTech, CBIO Ctr Computat Biol, Paris, France
[2] PSL Res Univ, Inst Curie, INSERM, U900, Paris, France
[3] PSL Res Univ, CNRS, Dept Math & Applicat, Ecole Normale Super, Paris, France
[4] Google Brain, 8 Rue Londres, F-75009 Paris, France
Keywords
metagenomics; sequencing; classification; embedding
DOI
10.1089/cmb.2018.0174
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
We propose a new model for fast classification of DNA sequences output by next-generation sequencing machines. The model, which we call fastDNA, embeds DNA sequences in a vector space by learning continuous low-dimensional representations of the k-mers it contains. We show on metagenomics benchmarks that it outperforms the state-of-the-art methods in terms of accuracy and scalability.
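The abstract describes fastDNA only at a high level. The following is a minimal sketch of the idea, assuming a fastText-style design in which a read's vector is the mean of its k-mer embeddings and a linear softmax layer scores each taxon, with both trained jointly by SGD on the cross-entropy loss. The class name `KmerEmbeddingClassifier`, the averaging step, the update rule, and all parameter values are hypothetical illustrations, not the authors' implementation.

```python
import itertools
import math
import random

ALPHABET = "ACGT"


def kmers(read: str, k: int) -> list[str]:
    """All overlapping k-mers of a read: a read of length L yields L - k + 1 of them."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


class KmerEmbeddingClassifier:
    """Hypothetical fastText-style sketch (not fastDNA itself): each k-mer has a learned
    vector, a read is the mean of its k-mer vectors, and a linear layer plus softmax
    scores each taxon."""

    def __init__(self, k: int, dim: int, n_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.k = k
        # One dim-dimensional vector per possible k-mer over {A, C, G, T} (4**k rows).
        self.emb = {"".join(p): [rng.gauss(0.0, 0.1) for _ in range(dim)]
                    for p in itertools.product(ALPHABET, repeat=k)}
        # Linear classifier: one weight row per class (taxon).
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_classes)]

    def _valid_kmers(self, read: str) -> list[str]:
        # Skip k-mers containing ambiguous bases such as 'N'.
        kms = [km for km in kmers(read, self.k) if km in self.emb]
        if not kms:
            raise ValueError("read has no valid k-mers")
        return kms

    def embed(self, read: str) -> list[float]:
        """Read embedding = average of its k-mer embeddings (repeats counted)."""
        vecs = [self.emb[km] for km in self._valid_kmers(read)]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def predict_proba(self, read: str) -> list[float]:
        x = self.embed(read)
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.W]
        top = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def sgd_step(self, read: str, label: int, lr: float) -> float:
        """One SGD step on the cross-entropy loss; updates W and k-mer vectors jointly."""
        kms = self._valid_kmers(read)
        x = self.embed(read)
        p = self.predict_proba(read)
        loss = -math.log(p[label])
        # Gradient of the loss w.r.t. each class score: p_c - 1[c == label].
        g = [pc - (1.0 if c == label else 0.0) for c, pc in enumerate(p)]
        # Gradient w.r.t. the read embedding, computed before W is modified.
        grad_x = [sum(g[c] * self.W[c][j] for c in range(len(g))) for j in range(len(x))]
        for c, gc in enumerate(g):
            self.W[c] = [w - lr * gc * xj for w, xj in zip(self.W[c], x)]
        # x is a mean over n k-mer occurrences, so each occurrence receives grad_x / n.
        n = len(kms)
        for km in kms:
            self.emb[km] = [e - lr * gx / n for e, gx in zip(self.emb[km], grad_x)]
        return loss
```

For example, `KmerEmbeddingClassifier(k=4, dim=8, n_classes=2)` allocates a table of 4**4 = 256 k-mer vectors. Because that dense table grows as 4^k, this sketch is only practical for small k; it is meant to show the embed-then-classify structure, not to reproduce the scalability reported in the abstract.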
Pages: 509-518
Page count: 10
Related papers
  • Comparison of DNA and RNA sequencing of total nucleic acids from human cervix for metagenomics
    Arroyo Mühr, Laila Sara
    Dillner, Joakim
    Ure, Agustin Enrique
    Sundström, Karin
    Hultin, Emilie
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • Circular consensus sequencing with long reads
    Tang, Lei
    NATURE METHODS, 2019, 16 (10) : 958 - 958
  • Sequencing facility and DNA source associated patterns of virus-mappable reads in whole-genome sequencing data
    Chen, Xun
    Li, Dawei
    GENOMICS, 2021, 113 (01) : 1189 - 1198
  • Haplotype Estimation Using Sequencing Reads
    Delaneau, Olivier
    Howie, Bryan
    Cox, Anthony J.
    Zagury, Jean-Francois
    Marchini, Jonathan
    AMERICAN JOURNAL OF HUMAN GENETICS, 2013, 93 (04) : 687 - 696
  • MetaBCC-LR: metagenomics binning by coverage and composition for long reads
    Wickramarachchi, Anuradha
    Mallawaarachchi, Vijini
    Rajan, Vaibhav
    Lin, Yu
    BIOINFORMATICS, 2020, 36 : 3 - 11
  • Optimization of alignment-based methods for taxonomic binning of metagenomics reads
    Jaillard, Magali
    Tournoud, Maud
    Meynier, Faustine
    Veyrieras, Jean-Baptiste
    BIOINFORMATICS, 2016, 32 (12) : 1779 - 1787
  • MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads
    Petersen, Thomas Nordahl
    Lukjancenko, Oksana
    Thomsen, Martin Christen Frolund
    Sperotto, Maria Maddalena
    Lund, Ole
    Aarestrup, Frank Moller
    Sicheritz-Ponten, Thomas
    PLOS ONE, 2017, 12 (05)
  • Metagenomics Binning of Long Reads Using Read-Overlap Graphs
    Wickramarachchi, Anuradha
    Lin, Yu
    COMPARATIVE GENOMICS (RECOMB-CG 2022), 2022, 13234 : 260 - 278