Embedding-based subsequence matching with gaps-range-tolerances: a Query-By-Humming application

Cited by: 5
Authors
Kotsifakos, Alexios [1]
Karlsson, Isak [2]
Papapetrou, Panagiotis [2]
Athitsos, Vassilis [1]
Gunopulos, Dimitrios [3]
Affiliations
[1] Univ Texas Arlington, Dept Comp Sci & Engineering, Arlington, TX 76019 USA
[2] Stockholm Univ, Dept Comp & Syst Sci, S-10691 Stockholm, Sweden
[3] Natl & Kapodistrian Univ Athens, Dept Informat & Telecommun, Athens 11528, Greece
Source
VLDB JOURNAL | 2015 / Volume 24 / Issue 04
Funding
National Science Foundation (US);
Keywords
Subsequence matching; Query-By-Humming; Indexing; Embeddings; SEARCH;
DOI
10.1007/s00778-015-0387-0
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
We present a subsequence matching framework that allows for gaps in both query and target sequences, employs variable matching tolerance efficiently tuned for each query and target sequence, and constrains the maximum matching range. Using this framework, a dynamic programming method is proposed, called SMBGT, that, given a short query sequence Q and a large database, identifies in quadratic time the subsequence of the database that best matches Q. SMBGT is highly applicable to music retrieval. However, in Query-By-Humming applications, runtime is critical. Hence, we propose a novel embedding-based approach, called ISMBGT, for speeding up search under SMBGT. Using a set of reference sequences, ISMBGT maps both Q and each position of each database sequence into vectors. The database vectors closest to the query vector are identified, and SMBGT is then applied between Q and the subsequences that correspond to those database vectors. The key novelties of ISMBGT are that it does not require training, it is query sensitive, and it exploits the flexibility of SMBGT. We present an extensive experimental evaluation using synthetic and hummed queries on a large music database. Our findings show that ISMBGT can achieve speedups of up to an order of magnitude against brute-force search and over an order of magnitude against cDTW, while maintaining a retrieval accuracy very close to that of brute-force search.
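The filter-and-refine idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' ISMBGT implementation: the function names (`window_distance`, `embed`, `filter_and_refine`), the fixed-length sliding windows, and the use of a plain sum of absolute differences as a stand-in for SMBGT are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def window_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in distance (NOT SMBGT): sum of absolute differences."""
    return float(sum(abs(x - y) for x, y in zip(a, b)))


def embed(seq: Sequence[float], references: List[Sequence[float]]) -> List[float]:
    """Map a sequence to the vector of its distances to each reference sequence."""
    return [window_distance(seq, ref) for ref in references]


def filter_and_refine(
    query: Sequence[float],
    database: Sequence[float],
    references: List[Sequence[float]],
    k: int,
) -> Tuple[int, float]:
    """Return (start, distance) of the best window among the k filtered candidates.

    Assumption: every reference has the same length as the query, and
    candidates are fixed-length windows of the database.
    """
    m = len(query)
    if len(database) < m or k < 1:
        raise ValueError("database must be at least as long as the query and k >= 1")

    query_vec = embed(query, references)

    # Filter step: rank every window by its distance to the query in embedding space.
    ranked = sorted(
        (window_distance(query_vec, embed(database[s:s + m], references)), s)
        for s in range(len(database) - m + 1)
    )

    # Refine step: run the (expensive) real distance only on the top-k candidates.
    best_dist, best_start = min(
        (window_distance(query, database[s:s + m]), s) for _, s in ranked[:k]
    )
    return best_start, best_dist
```

In this sketch, `k` controls the trade-off the abstract refers to: a larger `k` means more refine-step comparisons and a lower chance of missing the true best match, while a smaller `k` is faster but relies more heavily on the embedding preserving proximity.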
Pages: 519 - 536
Page count: 18
Related papers
3 items in total
  • [1] Embedding-based subsequence matching with gaps–range–tolerances: a Query-By-Humming application
    Alexios Kotsifakos
    Isak Karlsson
    Panagiotis Papapetrou
    Vassilis Athitsos
    Dimitrios Gunopulos
    [J]. The VLDB Journal, 2015, 24 : 519 - 536
  • [2] A Subsequence Matching with Gaps-Range-Tolerances Framework: A Query-By-Humming Application
    Kotsifakos, Alexios
    Papapetrou, Panagiotis
    Hollmen, Jaakko
    Gunopulos, Dimitrios
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2011, 4 (11): 761 - 771
  • [3] Embedding-Based Subsequence Matching in Time-Series Databases
    Papapetrou, Panagiotis
    Athitsos, Vassilis
    Potamias, Michalis
    Kollios, George
    Gunopulos, Dimitrios
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 2011, 36 (03):