Continuous Similarity Search for Dynamic Text Streams

Authors
Tsuchida, Yuma [1]
Kubo, Kohei [1]
Koga, Hisashi [1]
Affiliation
[1] Univ Electrocommun, Tokyo, 182-8585, Japan
Keywords
data stream; similarity search; text sets; inverted index; pruning
DOI
10.1587/transinf.2022EDP7229
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Similarity search for data streams has attracted much attention for information recommendation. In this context, recent leading works regard the latest W items in a data stream as an evolving set and reduce similarity search for data streams to set similarity search. Whereas those works consider standard sets composed of items, this paper studies similarity search for text streams and treats evolving sets whose elements are texts. Specifically, we formulate a new continuous range search problem named the CTS problem (Continuous similarity search for Text Sets). The task in the CTS problem is to find all the text streams in the database whose similarity to the query becomes larger than a threshold ε. It abstracts a scenario in which a user-based recommendation system searches for similar users on social networking services. The CTS problem is important because it allows both the query and the database to change dynamically. We develop a fast pruning-based algorithm for the CTS problem. Moreover, we discuss how to speed it up with an inverted index.
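The reduction mentioned in the abstract (treating the latest W items of each stream as an evolving set and running a range search against a threshold) can be illustrated with a minimal brute-force sketch. This is not the paper's pruning-based algorithm and uses no inverted index. It assumes Jaccard similarity over plain items, which the abstract does not specify, and the names `SlidingWindowSet`, `jaccard`, and `range_search` are hypothetical.

```python
from collections import deque


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure); 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class SlidingWindowSet:
    """Keeps the latest w items of a stream and exposes them as a set."""

    def __init__(self, w):
        # A bounded deque discards the oldest item once w items are held.
        self.items = deque(maxlen=w)

    def push(self, item):
        self.items.append(item)

    def as_set(self):
        return set(self.items)


def range_search(query, database, threshold):
    """Return ids of streams whose similarity to the query exceeds threshold.

    Brute force: every stream is re-evaluated on each call.
    """
    q = query.as_set()
    return sorted(
        sid for sid, win in database.items()
        if jaccard(q, win.as_set()) > threshold
    )


# Hypothetical data: one query stream and two database streams, W = 3.
query = SlidingWindowSet(3)
for x in ["a", "b", "c"]:
    query.push(x)

database = {"s1": SlidingWindowSet(3), "s2": SlidingWindowSet(3)}
for x in ["a", "b", "d"]:
    database["s1"].push(x)
for x in ["x", "y", "z"]:
    database["s2"].push(x)

before = range_search(query, database, 0.4)

# Stream s2 evolves: two new items arrive and the two oldest are dropped.
database["s2"].push("a")
database["s2"].push("b")
after = range_search(query, database, 0.4)
```

Because both the query and the database windows can change at any time step, the search has to be repeated continuously; avoiding a full re-evaluation of every stream is what pruning and an inverted index are meant to address.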
Pages: 2026-2035
Number of pages: 10