Batch Text Similarity Search with MapReduce

Authors
Li, Rui [1, 2, 3]
Ju, Li [4]
Peng, Zhuo [1]
Yu, Zhiwei [5]
Wang, Chaokun [1, 2, 3]
Affiliations
[1] Tsinghua Univ, Sch Software, Beijing 100084, Peoples R China
[2] Tsinghua Natl Lab Informat Sci & Technol, Beijing, Peoples R China
[3] Ministry Educ, Key Lab Informat Syst Secur, Beijing, Peoples R China
[4] Henan Coll Finance & Taxat, Dept Informat Engn, Zhengzhou 450002, Peoples R China
[5] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
MapReduce; Batch Text Similarity Search
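DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract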
Batch text similarity search aims to find texts similar to each query in a user's batch of text queries. It is widely used in real-world applications such as plagiarism checking, and it is attracting growing attention as the volume of text on the web increases. Existing approaches, such as Fuzzy Join, support neither varying similarity thresholds nor online batch text similarity search. This paper proposes a two-stage algorithm that effectively addresses batch text similarity search using inverted index structures. Experimental results on real datasets demonstrate the efficiency and scalability of the method.
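The abstract does not spell out the two stages, so the following is only an illustrative sketch of how inverted-index-based batch similarity search is commonly organized, not the authors' algorithm. It assumes whitespace tokenization, Jaccard similarity, a first stage that collects candidates from the index, and a second stage that verifies them against a threshold. All function names are hypothetical, and the code runs in a single process rather than on MapReduce.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase whitespace tokenization into a set of words (an assumption)."""
    return set(text.lower().split())


def build_inverted_index(corpus):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def batch_search(corpus, index, queries, threshold):
    """Answer a batch of queries against a prebuilt inverted index."""
    doc_tokens = {doc_id: tokenize(text) for doc_id, text in corpus.items()}
    results = {}
    for query_id, query in queries.items():
        q_tokens = tokenize(query)
        # Stage 1 (assumed): gather documents sharing at least one token.
        candidates = set()
        for token in q_tokens:
            candidates |= index.get(token, set())
        # Stage 2 (assumed): verify each candidate against the threshold.
        matches = []
        for doc_id in candidates:
            score = jaccard(q_tokens, doc_tokens[doc_id])
            if score >= threshold:
                matches.append((doc_id, score))
        # Highest score first; ties broken by document id.
        results[query_id] = sorted(matches, key=lambda m: (-m[1], m[0]))
    return results
```

Because the threshold is applied only in the verification stage of this sketch, the same index can be reused with different thresholds, for example `batch_search(corpus, index, queries, 0.5)` followed by `batch_search(corpus, index, queries, 0.7)`, without rebuilding it.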
Pages: 412 / +
Page count: 2