Continuous Embeddings of DNA Sequencing Reads and Application to Metagenomics

Cited by: 21
Authors
Menegaux, Romain [1, 2]
Vert, Jean-Philippe [1, 2, 3, 4]
Affiliations
[1] PSL Res Univ, MINES ParisTech, CBIO Ctr Computat Biol, Paris, France
[2] PSL Res Univ, Inst Curie, INSERM, U900, Paris, France
[3] PSL Res Univ, CNRS, Dept Math & Applicat, Ecole Normale Super, Paris, France
[4] Google Brain, 8 Rue Londres, F-75009 Paris, France
Keywords
metagenomics; sequencing; classification; embedding
DOI
10.1089/cmb.2018.0174
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
We propose a new model for fast classification of DNA sequences output by next-generation sequencing machines. The model, which we call fastDNA, embeds DNA sequences in a vector space by learning continuous low-dimensional representations of the k-mers they contain. We show on metagenomics benchmarks that it outperforms state-of-the-art methods in terms of accuracy and scalability.
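The idea of embedding a read through its k-mers can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that k-mer representations are learned, so averaging the k-mer vectors and scoring with a linear softmax layer are assumptions made here for illustration, the function names (`kmer_index`, `kmer_indices`, `embed_read`, `class_scores`) are hypothetical, and the embedding table is filled with random values rather than learned ones.

```python
import numpy as np

ALPHABET = "ACGT"

def kmer_index(kmer):
    """Map a k-mer over ACGT to an integer in [0, 4**k) (base-4 encoding)."""
    idx = 0
    for base in kmer:
        idx = idx * 4 + ALPHABET.index(base)
    return idx

def kmer_indices(read, k):
    """Indices of all overlapping k-mers in a read.

    k-mers containing a character outside ACGT (e.g. N) are skipped.
    """
    out = []
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if all(base in ALPHABET for base in kmer):
            out.append(kmer_index(kmer))
    return out

def embed_read(read, k, table):
    """Assumed pooling: the read vector is the mean of its k-mer vectors."""
    idx = kmer_indices(read, k)
    if not idx:
        return np.zeros(table.shape[1])
    return table[idx].mean(axis=0)

def class_scores(embedding, weights):
    """Assumed classifier: linear layer followed by a softmax."""
    logits = weights @ embedding
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

# Toy setup: random (untrained) parameters, purely illustrative sizes.
rng = np.random.default_rng(0)
k, dim, n_classes = 4, 8, 3
table = rng.normal(size=(4 ** k, dim))       # one row per possible k-mer
weights = rng.normal(size=(n_classes, dim))  # one row per class

read = "ACGTTGCAAC"
vec = embed_read(read, k, table)
probs = class_scores(vec, weights)
```

In a trained model the table and the classifier weights would be fitted jointly on labeled reads; here they only show how data flows from a read to a fixed-size vector and then to class probabilities.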
Pages: 509-518
Number of pages: 10