Continuous Similarity Search for Dynamic Text Streams

Times cited: 0
Authors
Tsuchida, Yuma [1]
Kubo, Kohei [1]
Koga, Hisashi [1]
Affiliations
[1] Univ Electrocommun, Tokyo, 182-8585, Japan
Keywords
data stream; similarity search; text sets; inverted index; pruning
DOI
10.1587/transinf.2022EDP7229
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Similarity search for data streams has attracted much attention for information recommendation. In this context, recent leading works regard the latest W items in a data stream as an evolving set and reduce similarity search for data streams to set similarity search. Whereas they consider standard sets composed of items, this paper uniquely studies similarity search for text streams and treats evolving sets whose elements are texts. Specifically, we formulate a new continuous range search problem named the CTS problem (Continuous similarity search for Text Sets). The task of the CTS problem is to find all the text streams in the database whose similarity to the query becomes larger than a threshold ε. It abstracts a scenario in which a user-based recommendation system searches for similar users on social networking services. The CTS problem is important because it allows both the query and the database to change dynamically. We develop a fast pruning-based algorithm for the CTS problem. Moreover, we discuss how to speed it up with the inverted index.
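The problem statement in the abstract can be illustrated with a naive baseline. This is only a sketch under stated assumptions, not the pruning-based algorithm of the paper: the abstract does not say how the similarity between two text sets is measured, so the mean best-match Jaccard used here is an assumption, and the names `TextStream`, `set_similarity`, and `range_search` are hypothetical. Each stream keeps its latest W texts, and every stream is re-scored against the query from scratch.

```python
from collections import deque


def jaccard(a, b):
    """Jaccard similarity between two word sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class TextStream:
    """Keeps the latest W texts of a stream as an evolving text set."""

    def __init__(self, window_size):
        self.window = deque(maxlen=window_size)

    def push(self, text):
        # Appending to a full deque drops the oldest text automatically.
        self.window.append(frozenset(text.lower().split()))


def set_similarity(query, other):
    """ASSUMED measure: mean best-match Jaccard from query texts to other."""
    if not query.window or not other.window:
        return 0.0
    total = sum(max(jaccard(q, t) for t in other.window) for q in query.window)
    return total / len(query.window)


def range_search(query, database, epsilon):
    """Return ids of streams whose similarity to the query exceeds epsilon."""
    return sorted(
        sid for sid, stream in database.items()
        if set_similarity(query, stream) > epsilon
    )
```

Because both the query and the database streams change as texts arrive, this baseline has to rerun `range_search` after every `push`. Avoiding that repeated full comparison is the kind of cost that pruning and an inverted index are meant to reduce.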
Pages: 2026 - 2035
Page count: 10
Related papers
50 in total
  • [21] Financial news mining: Monitoring continuous streams of text
    Ingvaldsen, Jon Espen
    Gulla, Jon Atle
    Laegreid, Tarjei
    Sandal, Paul Christian
    2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 321 - +
  • [22] Dynamic Similarity Search on Integer Sketches
    Kanda, Shunsuke
    Tabei, Yasuo
    20TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2020), 2020, : 242 - 251
  • [23] Effective similarity search methods for large video data streams
    Lee, SL
    Chun, SJ
    Lee, JH
    COMPUTATIONAL SCIENCE - ICCS 2003, PT IV, PROCEEDINGS, 2003, 2660 : 1030 - 1039
  • [24] Similarity Search on Semantic Trajectories Using Text Processing
    de Almeida, Damiao Ribeiro
    Baptista, Claudio de Souza
    de Andrade, Fabio Gomes
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2022, 11 (07)
  • [25] On effective conceptual indexing and similarity search in text data
    Aggarwal, CC
    Yu, PS
    2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2001, : 3 - 10
  • [26] Continuous trajectory similarity search with result diversification
    Yu, Xiaofeng
    Zhu, Shunzhi
    Ren, Yongjun
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2023, 143 : 392 - 400
  • [27] TextFlow: A Text Similarity Measure based on Continuous Sequences
    Mrabet, Yassine
    Kilicoglu, Halil
    Demner-Fushman, Dina
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 763 - 772
  • [28] Automating the search for a patent's prior art with a full text similarity search
    Helmers, Lea
    Horn, Franziska
    Biegler, Franziska
    Oppermann, Tim
    Mueller, Klaus-Robert
    PLOS ONE, 2019, 14 (03)
  • [29] Continuous Subgraph Pattern Search over Graph Streams
    Wang, Changliang
    Chen, Lei
    ICDE: 2009 IEEE 25TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2009, : 393 - 404
  • [30] STREAMIT: Dynamic Visualization and Interactive Exploration of Text Streams
    Alsakran, Jamal
    Chen, Yang
    Zhao, Ye
    Yang, Jing
    Luo, Dongning
    IEEE PACIFIC VISUALIZATION SYMPOSIUM 2011, 2011, : 131 - 138