Batch Text Similarity Search with MapReduce

Cited by: 0
Authors
Li, Rui [1 ,2 ,3 ]
Ju, Li [4 ]
Peng, Zhuo [1 ]
Yu, Zhiwei [5 ]
Wang, Chaokun [1 ,2 ,3 ]
Affiliations
[1] Tsinghua Univ, Sch Software, Beijing 100084, Peoples R China
[2] Tsinghua Natl Lab Informat Sci & Technol, Beijing, Peoples R China
[3] Ministry Educ, Key Lab Informat Syst Secur, Beijing, Peoples R China
[4] Henan Coll Finance & Taxat, Dept Informat Engn, Zhengzhou 450002, Peoples R China
[5] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
MapReduce; Batch Text Similarity Search;
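DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract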
Batch text similarity search aims to find texts similar to each query in a batch of text queries submitted by users. It is widely used in real-world applications such as plagiarism checking, and it is attracting growing attention as the volume of text on the web increases. Existing approaches, such as Fuzzy Join, support neither varying similarity thresholds nor online batch text similarity search. This paper proposes a two-stage algorithm based on inverted index structures that effectively addresses the batch text similarity search problem. Experimental results on real datasets demonstrate the efficiency and scalability of the method.
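The abstract does not say what the two stages are, so the following is only a generic illustration of inverted-index-based batch similarity search, not the authors' algorithm. It assumes that one stage generates candidates from an inverted index and the other verifies them with Jaccard similarity over whitespace tokens; it runs on a single machine rather than on MapReduce, and all function names are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Split text into a set of lowercase whitespace tokens (an assumed tokenizer)."""
    return set(text.lower().split())


def build_inverted_index(corpus):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def batch_search(queries, corpus, index, threshold):
    """Return, for each query id, the (doc_id, similarity) pairs at or above threshold."""
    results = {}
    for query_id, query_text in queries.items():
        query_tokens = tokenize(query_text)

        # Assumed stage 1: candidates are documents sharing at least one token.
        candidates = set()
        for token in query_tokens:
            candidates |= index.get(token, set())

        # Assumed stage 2: verify each candidate against the threshold.
        matches = []
        for doc_id in sorted(candidates):
            score = jaccard(query_tokens, tokenize(corpus[doc_id]))
            if score >= threshold:
                matches.append((doc_id, score))
        results[query_id] = matches
    return results
```

Because the threshold is passed at query time in this sketch, the same index can serve batches with different thresholds without being rebuilt.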
Pages: 412 / +
Page count: 2
Related papers
50 in total
  • [41] Data-Intensive Text Processing with MapReduce
    Xu, Peng
    COMPUTATIONAL LINGUISTICS, 2011, 37 (03) : 635 - 637
  • [42] Summingbird: A Framework for Integrating Batch and Online MapReduce Computations
    Boykin, Oscar
    Ritchie, Sam
    O'Connell, Ian
    Lin, Jimmy
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 7 (13) : 1441 - 1451
  • [43] DHDSearch: A Framework for Batch Time Series Searching on MapReduce
    Li, Zhongsheng
    Li, Qiuhong
    Wang, Wei
    Wang, Yang
    Liu, Yimin
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 2019, 11448 : 567 - 570
  • [44] Text mining: identification of similarity of text documents using hybrid similarity model
    K. M. Shiva Prasad
    Iran Journal of Computer Science, 2023, 6 (2) : 123 - 135
  • [45] An efficient MapReduce algorithm for similarity join in metric spaces
    Wen Liu
    Yanming Shen
    Peng Wang
    The Journal of Supercomputing, 2016, 72 : 1179 - 1200
  • [46] Metric Similarity Joins Using MapReduce (Extended abstract)
    Chen, Gang
    Yang, Keyu
    Chen, Lu
    Gao, Yunjun
    Zheng, Baihua
    Chen, Chun
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018 : 1787 - 1788
  • [47] A Scalable Similarity Join Algorithm Based on MapReduce and LSH
    Sébastien Rivault
    Mostafa Bamha
    Sébastien Limet
    Sophie Robert
    International Journal of Parallel Programming, 2022, 50 : 360 - 380
  • [48] Similarity-based Change Detection for RDF in MapReduce
    Lee, Taewhi
    Im, Dong-Hyuk
    Won, Jongho
    PROMOTING BUSINESS ANALYTICS AND QUANTITATIVE MANAGEMENT OF TECHNOLOGY: 4TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT (ITQM 2016), 2016, 91 : 789 - 797
  • [49] Sentiment analysis using semantic similarity and Hadoop MapReduce
    Youness Madani
    Mohammed Erritali
    Jamaa Bengourram
    Knowledge and Information Systems, 2019, 59 : 413 - 436
  • [50] MapReduce-based Similarity Measurement for Business Processes
    Gao, Juntao
    Wang, Xueshan
    Wang, Yongan
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2016, 16 (03) : 95 - 99