Efficient Approximate Subsequence Matching Using Hybrid Signatures

Times cited: 1
Authors
Qiu, Tao [1]
Yang, Xiaochun [1]
Wang, Bin [1]
Han, Yutong [1]
Wang, Siyao [1]
Affiliation
[1] Northeastern University, School of Computer Science and Engineering, Shenyang 110819, Liaoning, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Read mapping; Approximate subsequence matching; Hybrid signatures; Read alignment
DOI
10.1007/978-3-319-91452-7_39
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we focus on the problem of approximate subsequence matching, also called the read mapping problem in genomics: given a query (a DNA sequence) and a user-specified similarity threshold k, find the subsequences of a reference genome that are similar to the query. (Here a subsequence means a substring, i.e., a run of consecutive characters.) Existing methods first extract subsequences from the query to generate signatures, then use these signatures to produce candidate positions, and finally verify the candidates to obtain the true mapping positions. However, these methods suffer from two main issues: (1) they produce many candidate positions; and (2) they generate large numbers of signatures, many of which are redundant. To address both issues, we propose a novel filtering technique, called hybrid signatures, which can achieve a better balance between the filtering ability of signatures and the overhead of producing candidate positions. Accordingly, we devise an adaptive algorithm that produces candidate positions using hybrid signatures. Experimental results on real-world genomic sequences show that our method outperforms state-of-the-art methods in query efficiency.
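The abstract describes the generic filter-and-verify pipeline that existing read mappers follow: extract signatures from the query, look them up to produce candidate positions, then verify each candidate. The sketch below illustrates only that baseline, under stated assumptions: k is treated as an edit-distance threshold, the signatures are pigeonhole signatures (k+1 disjoint, equal-length segments of the query), and candidates come from a plain q-gram inverted index. It does not implement the paper's hybrid signatures or its adaptive candidate-generation algorithm, whose details are not given here; all function names are hypothetical.

```python
from collections import defaultdict

# Illustrative baseline only (assumption): pigeonhole signatures + q-gram index
# + edit-distance verification. NOT the paper's hybrid-signature method.


def build_index(reference: str, q: int) -> dict[str, list[int]]:
    """Inverted index: every length-q substring of the reference -> start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - q + 1):
        index[reference[i:i + q]].append(i)
    return index


def extract_signatures(query: str, k: int) -> list[tuple[int, str]]:
    """Split the query into k+1 disjoint segments of length len(query)//(k+1).

    Each edit can break at most one segment, so if the query matches some
    reference substring with at most k edits, at least one segment occurs there
    unchanged (pigeonhole principle).
    """
    q = len(query) // (k + 1)
    return [(j * q, query[j * q:(j + 1) * q]) for j in range(k + 1)]


def min_substring_edit_distance(query: str, window: str) -> int:
    """Smallest edit distance between the query and any substring of the window
    (semi-global DP: skipping leading/trailing window characters is free)."""
    prev = [0] * (len(window) + 1)  # empty query prefix matches anywhere at cost 0
    for i in range(1, len(query) + 1):
        cur = [i] + [0] * len(window)
        for j in range(1, len(window) + 1):
            cur[j] = min(prev[j] + 1,                                      # skip query[i-1]
                         cur[j - 1] + 1,                                   # skip window[j-1]
                         prev[j - 1] + (query[i - 1] != window[j - 1]))    # match or substitute
        prev = cur
    return min(prev)


def map_read(reference: str, query: str, k: int) -> list[int]:
    """Return verified candidate diagonals (reference position - query offset).

    With indels, the true match start may differ from the diagonal by up to k.
    """
    q = len(query) // (k + 1)
    if q == 0:
        raise ValueError("query must be longer than k")
    index = build_index(reference, q)
    # Steps 1-2: signatures -> candidate diagonals, deduplicated by the set.
    candidates = {pos - offset
                  for offset, sig in extract_signatures(query, k)
                  for pos in index.get(sig, [])}
    # Step 3: verify each candidate in a window padded by k on both sides,
    # which is wide enough to contain any match with at most k edits.
    m, n = len(query), len(reference)
    verified = []
    for d in sorted(candidates):
        window = reference[max(0, d - k):min(n, d + m + k)]
        if min_substring_edit_distance(query, window) <= k:
            verified.append(d)
    return verified
```

For example, with reference "CCCCACGTTGCACCCC", the query "ACGATGCA" maps at diagonal 4 when k = 1 (one substitution against "ACGTTGCA"), but yields no match when k = 0. Even in this toy setting the two issues the abstract names are visible: every occurrence of every signature becomes a candidate, and raising k shortens the segments, which makes them less selective.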
Pages: 600-609
Number of pages: 10
Related papers
50 in total
  • [1] Efficient subsequence matching using the Longest Common Subsequence with a Dual Match index
    Han, Tae Sik
    Ko, Seung-Kyu
    Kang, Jaewoo
    [J]. MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS, 2007, 4571 : 585 - +
  • [2] APPROXIMATE HOUGH TRANSFORM FOR VIDEO SUBSEQUENCE MATCHING
    Chiu, Chih-Yi
    Liou, Yu-Cyuan
    Tsai, Tsung-Han
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), 2014,
  • [3] Hobbes3: Dynamic Generation of Variable-Length Signatures for Efficient Approximate Subsequence Mappings
    Kim, Jongik
    Li, Chen
    Xie, Xiaohui
    [J]. 2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016, : 169 - 180
  • [4] Quantizing time series for efficient subsequence matching
    Vega-Lopez, Ines F.
    Moon, Bongki
    [J]. PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON DATABASES AND APPLICATIONS, 2006, : 209 - +
  • [5] Efficient time-series subsequence matching using duality in constructing windows
    Moon, YS
    Whang, KY
    Loh, WK
    [J]. INFORMATION SYSTEMS, 2001, 26 (04) : 279 - 293
  • [6] Using multiple indexes for efficient subsequence matching in time-series databases
    Lim, Seung-Hwan
    Park, Heejin
    Kim, Sang-Wook
    [J]. INFORMATION SCIENCES, 2007, 177 (24) : 5691 - 5706
  • [7] Using multiple indexes for efficient subsequence matching in time-series databases
    Lim, Seung-Hwan
    Park, Hee-Jin
    Kim, Sang-Wook
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2006, 3882 : 65 - 79
  • [8] Radius-aware approximate blank node matching using signatures
    Lantzaki, Christina
    Papadakos, Panagiotis
    Analyti, Anastasia
    Tzitzikas, Yannis
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2017, 50 (02) : 505 - 542
  • [9] A efficient subsequence matching algorithm of number trend sequence
    Chen, Dangyang
    Jia, Suling
    Wang, Huiwen
    [J]. ICIM 2006: PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON INDUSTRIAL MANAGEMENT, 2006, : 668 - 674