Efficient Approximate Subsequence Matching Using Hybrid Signatures

Cited by: 1
Authors
Qiu, Tao [1]
Yang, Xiaochun [1]
Wang, Bin [1]
Han, Yutong [1]
Wang, Siyao [1]
Affiliation
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110819, Liaoning, Peoples R China
Source
DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2018, PT I | 2018 / Vol. 10827
Funding
National Natural Science Foundation of China
Keywords
Read mapping; Approximate subsequence matching; Hybrid signatures; READ ALIGNMENT;
DOI
10.1007/978-3-319-91452-7_39
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we focus on the problem of approximate subsequence matching, also called the read mapping problem in genomics: given a query (a DNA sequence), find the subsequences of a reference genome that are similar to it under a user-specified similarity threshold k. Here a subsequence means a substring, that is, a run of consecutive characters. Existing methods first extract subsequences from the query to generate signatures, then use these signatures to produce candidate positions, and finally verify the candidate positions to obtain the true mapping positions. However, these methods have two main issues: (1) they produce many candidate positions; and (2) they generate large numbers of signatures, many of which are redundant. To address both issues, we propose a novel filtering technique, called hybrid signatures, which achieves a better balance between the filtering ability of the signatures and the overhead of producing candidate positions. Accordingly, we devise an adaptive algorithm that produces candidate positions using hybrid signatures. Finally, experimental results on real-world genomic sequences show that our method outperforms state-of-the-art methods in query efficiency.
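The generic filter-and-verify pipeline described in the abstract (signatures, then candidate positions, then verification) can be illustrated with a minimal Python sketch. This is not the paper's hybrid-signature method: it uses the classic pigeonhole filter instead, assumes the similarity threshold k is an edit-distance bound, and all function names (`split_into_seeds`, `find_occurrences`, `match_ends`, `approximate_matches`) are hypothetical names chosen for illustration.

```python
def split_into_seeds(query, k):
    """Split the query into k + 1 non-overlapping pieces (the "signatures").

    Pigeonhole argument: k edits can touch at most k pieces, so any match
    within edit distance k contains at least one piece unchanged.
    Returns a list of (offset_in_query, piece) pairs.
    """
    m = len(query)
    if m < k + 1:
        raise ValueError("query must have at least k + 1 characters")
    bounds = [i * m // (k + 1) for i in range(k + 2)]
    return [(bounds[i], query[bounds[i]:bounds[i + 1]]) for i in range(k + 1)]


def find_occurrences(text, piece):
    """Yield every start position of an exact occurrence of piece in text."""
    pos = text.find(piece)
    while pos != -1:
        yield pos
        pos = text.find(piece, pos + 1)


def match_ends(query, text, k):
    """Verification by dynamic programming with a free start in the text.

    Returns every end position j (exclusive) such that some substring of
    text ending at j is within edit distance k of query.
    """
    m = len(query)
    prev = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, c in enumerate(text, start=1):
        cur = [0] * (m + 1)  # row 0 stays 0: a match may start anywhere
        for i in range(1, m + 1):
            cur[i] = min(
                prev[i - 1] + (query[i - 1] != c),  # match or substitution
                prev[i] + 1,                        # extra character in text
                cur[i - 1] + 1,                     # extra character in query
            )
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


def approximate_matches(query, reference, k):
    """Filter with exact seed hits, then verify only the candidate windows."""
    m, n = len(query), len(reference)
    ends = set()
    seen_windows = set()
    for offset, piece in split_into_seeds(query, k):
        for pos in find_occurrences(reference, piece):
            # A match containing this seed hit must lie inside this window.
            lo = max(0, pos - offset - k)
            hi = min(n, pos - offset + m + k)
            if (lo, hi) in seen_windows:
                continue
            seen_windows.add((lo, hi))
            ends.update(lo + j for j in match_ends(query, reference[lo:hi], k))
    return sorted(ends)
```

For example, `approximate_matches("ACGTTA", "ACGTACGTTAGC", 0)` returns `[10]`, the end of the single exact occurrence. In this sketch, short seeds occur often and yield many candidate windows, while generating more or longer signatures costs more to produce; that trade-off between filtering ability and candidate-generation overhead is the one the abstract says hybrid signatures aim to balance.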
Pages: 600 - 609
Page count: 10
Related papers
50 in total
  • [11] An efficient algorithm for attribute-based subsequence matching
    Qu, Jun-Feng
    Yuan, Lei
    Huang, Yannong
    Wu, Zhao
    INFORMATION SCIENCES, 2016, 334 : 323 - 337
  • [12] Efficient subsequence matching over large video databases
    Xiangmin Zhou
    Xiaofang Zhou
    Lei Chen
    Athman Bouguettaya
    The VLDB Journal, 2012, 21 : 489 - 508
  • [13] An efficient subsequence matching method based on index interpolation
    Koh, HG
    Loh, WK
    Kim, SW
    INNOVATIONS IN APPLIED ARTIFICIAL INTELLIGENCE, 2005, 3533 : 480 - 489
  • [14] Online signature verification based on signatures turning angle representation using longest common subsequence matching
    Barkoula, K.
    Economou, G.
    Fotopoulos, S.
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2013, 16 (03) : 261 - 272
  • [15] Efficient subsequence matching over large video databases
    Zhou, Xiangmin
    Zhou, Xiaofang
    Chen, Lei
    Bouguettaya, Athman
    VLDB JOURNAL, 2012, 21 (04) : 489 - 508
  • [16] Online signature verification based on signatures turning angle representation using longest common subsequence matching
    K. Barkoula
    G. Economou
    S. Fotopoulos
    International Journal on Document Analysis and Recognition (IJDAR), 2013, 16 : 261 - 272
  • [17] Indexing of sequences of sets for efficient exact and similar subsequence matching
    Andrzejewski, W
    Morzy, T
    Morzy, M
    COMPUTER AND INFORMATION SCIENCES - ISCIS 2005, PROCEEDINGS, 2005, 3733 : 864 - 873
  • [18] Efficient approximate and dynamic matching of patterns using a labeling paradigm
    Sahinalp, SC
    Vishkin, U
    37TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 1996 : 320 - 328
  • [19] Efficient subsequence matching for sequences databases under time warping
    Wong, TSF
    Wong, MH
    SEVENTH INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM, PROCEEDINGS, 2003 : 139 - 148
  • [20] The Inherent Time Complexity and An Efficient Algorithm for Subsequence Matching Problem
    Chao, Zemin
    Gao, Hong
    An, Yinan
    Li, Jianzhong
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2022, 15 (07) : 1453 - 1465