COSINE: non-seeding method for mapping long noisy sequences

Cited by: 3
Authors
Afshar, Pegah Tootoonchi [1]
Wong, Wing Hung [2, 3]
Affiliations
[1] Stanford Univ, Sch Engn, Dept Elect Engn, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Biomed Data Sci, Stanford, CA 94305 USA
Funding
National Institutes of Health (USA)
Keywords
FAST FOURIER-TRANSFORM; GENERATION;
DOI
10.1093/nar/gkx511
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Third-generation sequencing (TGS) technologies are highly promising, but their long, noisy reads are difficult to align using existing algorithms. Here, we present COSINE, a conceptually new method designed specifically for aligning long reads with a high error rate. COSINE computes the context similarity of two stretches of nucleobases from the similarity between the distributions of their short k-mers (k = 3-4) along the sequences. Results on simulated and real data show that COSINE achieves high sensitivity and specificity across a wide range of read accuracies. When the error rate is high, COSINE can offer substantial advantages over existing alignment methods.
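
To illustrate the general idea of comparing two sequences through their short k-mer distributions, here is a minimal Python sketch. It is not the COSINE algorithm from the paper: the function names, the use of a single global k-mer frequency vector per sequence, and the choice of cosine similarity as the measure are all assumptions made for illustration. In particular, it ignores how the k-mer distribution varies along the sequence, which the abstract indicates the actual method takes into account.

```python
from collections import Counter
from itertools import product
import math


def kmer_distribution(seq, k=3):
    """Normalized frequency vector over all 4**k DNA k-mers (hypothetical helper)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k in the sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows made purely of A/C/G/T contribute to the total.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def kmer_similarity(a, b, k=3):
    """Similarity of two sequences based only on their k-mer composition (assumed measure)."""
    return cosine_similarity(kmer_distribution(a, k), kmer_distribution(b, k))
```

Because a single substitution or indel changes at most k of the overlapping k-mers, a composition-based score of this kind degrades gradually as errors accumulate, whereas exact seed matching can fail outright once errors are dense enough that no long exact match survives.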
Pages: 13