Detecting Text Similarity Using MapReduce Framework

Authors
Birjali, Marouane [1]
Beni-Hssane, Abderrahim [1]
Erritali, Mohammed [2]
Madani, Youness [2]
Affiliations
[1] Univ Chouaib Doukkali, Fac Sci, Dept Comp Sci, LAROSERI Lab, El Jadida, Morocco
[2] Univ Sultan Moulay Slimane, Fac Sci & Technol, Dept Comp Sci, TIAD Lab, Beni Mellal, Morocco
Keywords
Hadoop cluster; Document similarity; MapReduce programming model; Similarity measure
DOI
10.1007/978-3-319-46568-5_39
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Evaluating the similarity between textual documents is an active research topic in many domains. Large corpora contain many documents, and most of them must be checked for similarity as part of validation. In this paper, we propose a new MapReduce algorithm for measuring document similarity. We survey the state of the art in approaches for computing the similarity between documents in order to choose the approach used in our MapReduce algorithm. We then show how the similarity between terms is used to assess the similarity between documents. Simulation results on the Hadoop framework show that our MapReduce algorithm outperforms classical ones in terms of running time.
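To illustrate how term-level information can feed a document-level score in a map/reduce style, here is a minimal in-memory sketch. It is not the authors' algorithm, which this record does not describe; it uses the common inverted-index pattern for pairwise cosine similarity instead. The function names (`map_index`, `reduce_pairs`, `shuffle`, `pairwise_cosine`), the whitespace tokenizer, and the raw term-frequency weights are all assumptions made for illustration, and the shuffle step is simulated with a dictionary rather than run on Hadoop.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def map_index(doc_id, text):
    """Map phase 1: emit (term, (doc_id, length-normalized term frequency))."""
    counts = Counter(text.lower().split())  # assumed tokenizer: whitespace split
    norm = math.sqrt(sum(c * c for c in counts.values()))
    for term, c in counts.items():
        yield term, (doc_id, c / norm)


def reduce_pairs(term, postings):
    """Reduce phase 1: for one term, emit a partial dot product per document pair."""
    for (d1, w1), (d2, w2) in combinations(sorted(postings), 2):
        yield (d1, d2), w1 * w2


def shuffle(pairs):
    """Simulated shuffle: group values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def pairwise_cosine(docs):
    """Chain the two phases; the final reduce sums partial products per document pair."""
    postings = shuffle(kv for d, t in docs.items() for kv in map_index(d, t))
    partials = shuffle(
        kv for term, plist in postings.items() for kv in reduce_pairs(term, plist)
    )
    return {pair: sum(values) for pair, values in partials.items()}


if __name__ == "__main__":
    docs = {
        "a": "hadoop mapreduce similarity",
        "b": "hadoop mapreduce similarity",
        "c": "wavelet transform",
    }
    print(pairwise_cosine(docs))
```

Document pairs that share no term never appear in the output, which is the main reason this pattern avoids comparing every document with every other one.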
Pages: 383-389
Page count: 7