XML Structural Similarity Search Using MapReduce

Authors
Yuan, Peisen [1 ,2 ]
Sha, Chaofeng [1 ,2 ]
Wang, Xiaoling [3 ]
Yang, Bin [1 ,2 ]
Zhou, Aoying [2 ,3 ]
Yang, Su [1 ,2 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[3] East China Normal Univ, Shanghai Key Lab Trustworthy Comp, Software Engn Inst, Shanghai, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
XML is a de facto standard for web data exchange and information representation. Efficiently managing large volumes of XML data challenges conventional techniques. As a way to cope with large-scale data, the MapReduce computing framework has recently attracted growing attention in the database community. This paper proposes an efficient and scalable framework for XML structural similarity search on a large cluster with MapReduce. First, sub-structures of each XML structure are extracted in parallel from a large XML corpus stored across the cluster. Then, Min-Hashing and locality-sensitive hashing techniques are developed on the distributed, parallel computing framework for efficient structural similarity search. An empirical study on a cluster with large real-world datasets demonstrates the effectiveness and efficiency of the approach.
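The pipeline outlined in the abstract (extract sub-structures, Min-Hash them, bucket the signatures with locality-sensitive hashing) can be illustrated with a small single-process sketch. It is not the authors' implementation. The abstract does not say which sub-structures are extracted, so parent/child tag pairs are used here as a stand-in, and `NUM_HASHES`, `BANDS`, `map_doc`, and `candidate_pairs` are hypothetical names chosen for illustration. A dictionary of buckets stands in for the MapReduce shuffle step.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 24  # signature length (assumed value)
BANDS = 12       # LSH bands (assumed); rows per band = NUM_HASHES // BANDS


def substructures(xml_text):
    """Return the root marker plus all parent/child tag pairs of a document.

    Tag pairs are an assumption; text content is ignored so that only
    structure contributes to similarity.
    """
    root = ET.fromstring(xml_text)
    pairs = {f"/{root.tag}"}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node:
            pairs.add(f"{node.tag}/{child.tag}")
            stack.append(child)
    return pairs


def _hash(seed, item):
    # Seeded 64-bit hash built from MD5 so results are stable across runs.
    digest = hashlib.md5(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(items):
    """Min-Hash signature: the minimum hash value under each seeded hash."""
    return [min(_hash(seed, it) for it in items) for seed in range(NUM_HASHES)]


def map_doc(doc_id, xml_text):
    """Map step: emit one ((band index, band values), doc_id) pair per band."""
    sig = minhash(substructures(xml_text))
    rows = NUM_HASHES // BANDS
    for b in range(BANDS):
        yield (b, tuple(sig[b * rows:(b + 1) * rows])), doc_id


def candidate_pairs(docs):
    """Group map output by key, then pair up documents sharing a bucket."""
    buckets = defaultdict(set)  # stands in for the shuffle step
    for doc_id, xml_text in docs.items():
        for key, value in map_doc(doc_id, xml_text):
            buckets[key].add(value)
    pairs = set()
    for ids in buckets.values():  # reduce step: one pass per bucket
        pairs.update(combinations(sorted(ids), 2))
    return pairs


docs = {
    "a": "<r><x><y>1</y></x></r>",
    "b": "<r><x><y>2</y></x></r>",  # same structure as "a", different text
    "c": "<p><q/></p>",
}
print(candidate_pairs(docs))
```

Documents whose sub-structure sets have a high Jaccard similarity are likely to agree on at least one band and so land in a shared bucket; the candidate pairs would normally be verified with an exact similarity computation afterwards. On a real cluster, `map_doc` and the per-bucket loop would correspond to the mapper and reducer of a MapReduce job.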
Pages: 169 / +
Page count: 3
Related papers
  • [1] Using structural similarity for clustering XML documents
    Aitelhadj, Ali
    Boughanem, Mohand
    Mezghiche, Mohamed
    Souam, Fatiha
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 32 (01) : 109 - 139
  • [2] Using structural similarity for clustering XML documents
    Ali Aïtelhadj
    Mohand Boughanem
    Mohamed Mezghiche
    Fatiha Souam
    [J]. Knowledge and Information Systems, 2012, 32 : 109 - 139
  • [3] Batch Text Similarity Search with MapReduce
    Li, Rui
    Ju, Li
    Peng, Zhuo
    Yu, Zhiwei
    Wang, Chaokun
    [J]. WEB TECHNOLOGIES AND APPLICATIONS, 2011, 6612 : 412 - +
  • [4] MapReduce Implementation of XML Keyword Search Algorithm
    Zhang, Yong
    Li, Quanlin
    Liu, Bo
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON SMART CITY/SOCIALCOM/SUSTAINCOM (SMARTCITY), 2015, : 721 - 728
  • [5] GRAMS3: An Efficient Framework for XML Structural Similarity Search
    Yuan, Peisen
    Wang, Xiaoling
    Sha, Chaofeng
    Gao, Ming
    Zhou, Aoying
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 2010, 6193 : 422 - +
  • [6] MapReduce Implementation of an Improved XML Keyword Search Algorithm
    Zhang, Yong
    Cai, Jing
    Li, Quanlin
    [J]. COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2018, 33 (02) : 125 - 135
  • [7] eHSim: An Efficient Hybrid Similarity Search with MapReduce
    Trong Nhan Phan
    Kung, Josef
    Tran Khanh Dang
    [J]. IEEE 30TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS IEEE AINA 2016, 2016, : 422 - 429
  • [8] Approximate top-k structural similarity search over XML documents
    Xie, T
    Sha, CF
    Wang, XL
    Zhou, AY
    [J]. FRONTIERS OF WWW RESEARCH AND DEVELOPMENT - APWEB 2006, PROCEEDINGS, 2006, 3841 : 319 - 330
  • [9] XML Duplicate Detection Using MapReduce
    Yu, Shoujian
    He, Shan
    [J]. ASIA-PACIFIC MANAGEMENT AND ENGINEERING CONFERENCE (APME 2014), 2014, : 1399 - 1406
  • [10] Proximity search of XML data using ontology and XPath edit similarity
    Amagasa, Toshiyuki
    Wen, Lianzi
    Kitagawa, Hiroyuki
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2007, 4653 : 298 - +