Efficient Approximate Subsequence Matching Using Hybrid Signatures

Cited by: 1
Authors
Qiu, Tao [1]
Yang, Xiaochun [1]
Wang, Bin [1]
Han, Yutong [1]
Wang, Siyao [1]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110819, Liaoning, Peoples R China
Source
DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2018, PT I | 2018 / Vol. 10827
Funding
National Natural Science Foundation of China
Keywords
Read mapping; Approximate subsequence matching; Hybrid signatures; READ ALIGNMENT;
DOI
10.1007/978-3-319-91452-7_39
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we focus on the problem of approximate subsequence matching, also called the read mapping problem in genomics: finding the subsequences of a reference genome that are similar to a query (a DNA subsequence) under a user-specified similarity threshold k. Here a subsequence means a substring, that is, a run of consecutive characters. Existing methods first extract subsequences from the query to generate signatures, then use those signatures to produce candidate positions, and finally verify the candidate positions to obtain the true mapping positions. However, these methods have two main issues: (1) they produce many candidate positions; and (2) they generate large numbers of signatures, many of which are redundant. To address both issues, we propose a novel filtering technique, called hybrid signatures, which achieves a better balance between the filtering ability of the signatures and the overhead of producing candidate positions. Accordingly, we devise an adaptive algorithm that produces candidate positions using hybrid signatures. Finally, experimental results on real-world genomic sequences show that our method outperforms state-of-the-art methods in query efficiency.
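The signature, candidate, and verification pipeline described in the abstract can be illustrated with a minimal sketch. This is not the paper's hybrid-signature method: it assumes the classic pigeonhole filter (splitting the query into k + 1 exact-match pieces) as the signature scheme and edit distance as the similarity measure, neither of which the abstract specifies. All function names (`split_into_seeds`, `find_all`, `best_window_distance`, `map_read`) are hypothetical.

```python
def split_into_seeds(query, k):
    """Split the query into k + 1 non-overlapping pieces.

    Pigeonhole argument: k edits can touch at most k pieces, so any
    occurrence within edit distance k contains one piece exactly.
    """
    n_parts = k + 1
    base, extra = divmod(len(query), n_parts)
    seeds, start = [], 0
    for i in range(n_parts):
        length = base + (1 if i < extra else 0)
        seeds.append((start, query[start:start + length]))
        start += length
    return seeds


def find_all(text, pattern):
    """Yield every (possibly overlapping) exact occurrence of pattern."""
    pos = text.find(pattern)
    while pos != -1:
        yield pos
        pos = text.find(pattern, pos + 1)


def best_window_distance(query, window):
    """Smallest edit distance between query and any substring of window."""
    prev = [0] * (len(window) + 1)  # a match may start anywhere for free
    for i, qc in enumerate(query, 1):
        curr = [i]
        for j, wc in enumerate(window, 1):
            curr.append(min(prev[j] + 1,                 # skip a query char
                            curr[j - 1] + 1,             # skip a window char
                            prev[j - 1] + (qc != wc)))   # match or substitute
        prev = curr
    return min(prev)  # a match may end anywhere for free


def map_read(reference, query, k):
    """Return start positions of reference windows that hold a match."""
    m = len(query)
    if m < k + 1:
        raise ValueError("query too short to split into k + 1 seeds")
    # Filtering step: each exact seed hit proposes one candidate window.
    candidates = set()
    for offset, seed in split_into_seeds(query, k):
        for pos in find_all(reference, seed):
            candidates.add(max(0, pos - offset - k))
    # Verification step: keep only windows with a match within k edits.
    hits = []
    for start in sorted(candidates):
        window = reference[start:start + m + 2 * k]
        if best_window_distance(query, window) <= k:
            hits.append(start)
    return hits
```

In this sketch every seed hit triggers a full verification, which is the cost the abstract attributes to existing methods: weak signatures yield many candidate positions. A real read mapper would also replace the linear `str.find` scan with a prebuilt index over the reference.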
Pages: 600-609
Page count: 10