Batch Text Similarity Search with MapReduce

Authors
Li, Rui [1 ,2 ,3 ]
Ju, Li [4 ]
Peng, Zhuo [1 ]
Yu, Zhiwei [5 ]
Wang, Chaokun [1 ,2 ,3 ]
Affiliations
[1] Tsinghua Univ, Sch Software, Beijing 100084, Peoples R China
[2] Tsinghua Natl Lab Informat Sci & Technol, Beijing, Peoples R China
[3] Ministry Educ, Key Lab Informat Syst Secur, Beijing, Peoples R China
[4] Henan Coll Finance & Taxat, Dept Informat Engn, Zhengzhou 450002, Peoples R China
[5] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
MapReduce; Batch Text Similarity Search;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Batch text similarity search finds, for each text in a user-submitted batch of queries, the texts that are similar to it. It is widely used in practice, for example in plagiarism checking, and has attracted growing attention as the volume of text on the web increases. Existing approaches, such as Fuzzy Join, support neither varying similarity thresholds nor online batch text similarity search. This paper proposes a two-stage algorithm, based on inverted index structures, that effectively solves the batch text similarity search problem. Experimental results on real datasets demonstrate the efficiency and scalability of the method.
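The abstract does not describe the two stages in detail, so the following is only a generic, single-machine sketch of batch similarity search over an inverted index, not the authors' algorithm. It assumes whitespace tokenization, Jaccard similarity, and a candidate-generation step followed by a verification step; all function names (`build_inverted_index`, `batch_search`, and so on) are hypothetical. In a MapReduce setting the index construction and the per-query work would presumably be distributed across mappers and reducers, which this sketch does not attempt.

```python
from collections import defaultdict


def tokenize(text):
    """Assumed tokenization: lowercase, split on whitespace, deduplicate."""
    return set(text.lower().split())


def build_inverted_index(corpus):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def batch_search(corpus, index, queries, threshold):
    """Answer a batch of queries against a prebuilt index."""
    results = {}
    for query_id, query_text in queries.items():
        query_tokens = tokenize(query_text)
        # Step 1: collect candidates that share at least one token.
        candidates = set()
        for token in query_tokens:
            candidates |= index.get(token, set())
        # Step 2: verify each candidate against the threshold.
        matches = []
        for doc_id in candidates:
            score = jaccard(query_tokens, tokenize(corpus[doc_id]))
            if score >= threshold:
                matches.append((doc_id, score))
        # Highest score first; ties broken by document id.
        results[query_id] = sorted(matches, key=lambda m: (-m[1], m[0]))
    return results


# Toy data, invented for illustration only.
corpus = {
    "d1": "mapreduce makes batch processing simple",
    "d2": "batch text similarity search with mapreduce",
    "d3": "inverted index structures for text search",
}
index = build_inverted_index(corpus)
queries = {"q1": "text similarity search with mapreduce", "q2": "graph joins"}
results = batch_search(corpus, index, queries, threshold=0.5)
```

Because the threshold is applied only in the verification step, the same index can be reused with a different threshold, for example `batch_search(corpus, index, queries, threshold=0.2)`, without rebuilding anything.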
Pages: 412 / +
Page count: 2