Efficient Approximate Subsequence Matching Using Hybrid Signatures

Cited by: 1
Authors
Qiu, Tao [1]
Yang, Xiaochun [1]
Wang, Bin [1]
Han, Yutong [1]
Wang, Siyao [1]
Affiliation
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110819, Liaoning, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Read mapping; Approximate subsequence matching; Hybrid signatures; Read alignment
DOI
10.1007/978-3-319-91452-7_39
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we focus on the problem of approximate subsequence matching, also called the read mapping problem in genomics: finding the subsequences of a reference genome that are similar to a query (a DNA subsequence) under a user-specified similarity threshold k. Here a subsequence means a substring, that is, a run of consecutive characters. Existing methods first extract subsequences from the query to generate signatures, then use the signatures to produce candidate positions, and finally verify the candidate positions to obtain the true mapping positions. However, these methods have two main issues: (1) they produce many candidate positions; and (2) they generate large numbers of signatures, many of which are redundant. To address both issues, we propose a novel filtering technique, called hybrid signatures, which achieves a better balance between the filtering ability of the signatures and the overhead of producing candidate positions. Accordingly, we devise an adaptive algorithm that produces candidate positions using hybrid signatures. Finally, experimental results on real-world genomic sequences show that our method outperforms state-of-the-art methods in query efficiency.
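The generic filter-and-verify pipeline that the abstract attributes to existing methods (generate signatures, produce candidate positions, verify) can be sketched as follows. This is a minimal illustration, not the paper's hybrid-signature method: it assumes the similarity measure is edit distance, uses the classic pigeonhole partition (k+1 non-overlapping pieces of the query) as signatures, and verifies with Sellers' dynamic program. All function names are hypothetical.

```python
def split_into_seeds(query, k):
    """Split query into k+1 non-overlapping pieces (pigeonhole signatures).

    If query matches a text substring within edit distance k, at least one
    piece must occur exactly in that substring.
    """
    n_parts = k + 1
    base, extra = divmod(len(query), n_parts)
    seeds, start = [], 0
    for i in range(n_parts):
        length = base + (1 if i < extra else 0)
        seeds.append((start, query[start:start + length]))
        start += length
    return seeds


def find_all(text, pattern):
    """Yield every start position of an exact occurrence of pattern in text."""
    pos = text.find(pattern)
    while pos != -1:
        yield pos
        pos = text.find(pattern, pos + 1)


def match_ends(query, text, k):
    """Sellers' DP: end offsets (exclusive) in text where some substring
    ending there is within edit distance k of query."""
    prev = [0] * (len(text) + 1)  # row 0: a match may start anywhere at no cost
    for i, qc in enumerate(query, 1):
        curr = [i] + [0] * len(text)
        for j, tc in enumerate(text, 1):
            curr[j] = min(prev[j - 1] + (qc != tc),  # match or substitution
                          prev[j] + 1,               # query char unmatched
                          curr[j - 1] + 1)           # text char unmatched
        prev = curr
    return [j for j, d in enumerate(prev) if d <= k]


def map_read(query, reference, k):
    """Return sorted end positions (exclusive) in reference of substrings
    within edit distance k of query."""
    m = len(query)
    windows = set()
    # Filtering step: each exact seed hit proposes one candidate window.
    for offset, seed in split_into_seeds(query, k):
        if not seed:
            continue
        for pos in find_all(reference, seed):
            lo = max(0, pos - offset - k)
            hi = min(len(reference), pos - offset + m + k)
            windows.add((lo, hi))
    # Verification step: run the DP only inside the candidate windows.
    ends = set()
    for lo, hi in windows:
        ends.update(lo + j for j in match_ends(query, reference[lo:hi], k))
    return sorted(ends)
```

In this sketch the two issues named in the abstract are easy to see: short seeds occur often in the reference and so produce many candidate windows, while the number of seeds grows with k. A linear scan with `str.find` stands in for the index over the reference that a practical read mapper would use.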
Pages: 600-609
Page count: 10