XML Structural Similarity Search Using MapReduce

Cited by: 0
Authors
Yuan, Peisen [1 ,2 ]
Sha, Chaofeng [1 ,2 ]
Wang, Xiaoling [3 ]
Yang, Bin [1 ,2 ]
Zhou, Aoying [2 ,3 ]
Yang, Su [1 ,2 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[3] East China Normal Univ, Shanghai Key Lab Trustworthy Comp, Software Engn Inst, Shanghai, Peoples R China
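Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract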
XML is a de facto standard for web data exchange and information representation. Managing large volumes of XML data efficiently poses challenges for conventional techniques. To cope with large-scale data, the MapReduce computing framework has recently attracted growing attention in the database community as an efficient solution. In this paper, an efficient and scalable framework is proposed for XML structural similarity search on a large cluster with MapReduce. First, sub-structures of the XML structure are extracted in parallel from a large XML corpus located on a large cluster. Then, Min-Hashing and locality-sensitive hashing techniques are developed on the distributed and parallel computing framework for efficient structural similarity search processing. An empirical study on the cluster with large real datasets demonstrates the effectiveness and efficiency of our approach.
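As a rough illustration of the pipeline the abstract outlines, the sketch below runs Min-Hashing and LSH banding over XML sub-structures in a single process. It is not the authors' implementation: root-to-node tag paths stand in for the paper's sub-structures, MD5-based seeded hashing stands in for its hash family, and the names `tag_paths`, `minhash_signature`, `map_phase`, `reduce_phase`, and the `bands`/`rows` parameters are all hypothetical. The map and reduce steps are plain Python functions that only mimic the shape of a MapReduce job.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths (assumed stand-in for sub-structures)."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def _hash(seed, item):
    """Seeded 64-bit hash built from MD5 (an assumption, chosen for determinism)."""
    digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=32):
    """Min-hash signature: the minimum hash value of the set under each seeded hash."""
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def map_phase(doc_id, xml_text, bands=8, rows=4):
    """Emit ((band index, band slice), doc_id) pairs, one per LSH band."""
    sig = minhash_signature(tag_paths(xml_text), bands * rows)
    for b in range(bands):
        yield (b, tuple(sig[b * rows:(b + 1) * rows])), doc_id


def reduce_phase(pairs):
    """Group documents by band key and return candidate pairs sharing a bucket."""
    buckets = defaultdict(set)
    for key, doc_id in pairs:
        buckets[key].add(doc_id)
    candidates = set()
    for ids in buckets.values():
        ordered = sorted(ids)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                candidates.add((ordered[i], ordered[j]))
    return candidates
```

To try it, feed the output of `map_phase` for every document into `reduce_phase`; documents with the same set of tag paths produce identical signatures and therefore always land in a shared bucket, while the `bands` and `rows` values trade recall against the number of false candidates.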
Pages: 169 / +
Page count: 3
Related papers
50 in total
  • [41] Efficient Keyword Search on Graphs using MapReduce
    Hao, Yifan
    Cao, Huiping
    Qi, Yan
    Hu, Chuan
    Brahma, Sukumar
    Han, Jingyu
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 2871 - 2873
  • [42] An implementation of XML documents search system based on similarity in structure and semantics
    Park, U
    Seo, Y
    INTERNATIONAL WORKSHOP ON CHALLENGES IN WEB INFORMATION RETRIEVAL AND INTEGRATION, PROCEEDINGS, 2005, : 97 - 102
  • [43] Similarity search for office XML documents based on style and structure data
    Watanabe, Yousuke
    Kamigaito, Hidetaka
    Yokota, Haruo
    INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS, 2013, 9 (02) : 100 - 116
  • [44] Estimation of Structural Similarity of XML Document Based on Frequency and Path
    Ren Xueli
    Dai Yubiao
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON EDUCATION, MANAGEMENT, COMPUTER AND SOCIETY, 2016, 37 : 272 - 275
  • [45] A kernel method for measuring structural similarity between XML documents
    Jeong, Buhwan
    Lee, Daewon
    Cho, Hyunbo
    Kulvatunyou, Boonserm
    NEW TRENDS IN APPLIED ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2007, 4570 : 572 - +
  • [46] Structural descriptors, similarity search, bioisosteric search, and virtual screening
    Liang, Guyan
    Morize, Isabelle
    Laoui, Abdelazize
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2006, 232 : 40 - 40
  • [47] Parallel Prime Number Labeling of Large XML Data Using MapReduce
    Ahn, Jinhyun
    Im, Dong-Hyuk
    Lee, Taewhi
    Kim, Hong-Gee
    2016 6TH INTERNATIONAL CONFERENCE ON IT CONVERGENCE AND SECURITY (ICITCS 2016), 2016, : 176 - 177
  • [48] Efficient Querying Distributed Big-XML Data using MapReduce
    Song Kunfang
    Hongwei Lu
    INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2016, 8 (03) : 70 - 79
  • [49] An Efficient XML Label based on MapReduce
    Wei, Bowen
    Song, Kunfang
    Jiang, Minghua
    2016 WORLD AUTOMATION CONGRESS (WAC), 2016,
  • [50] 3D Similarity Search Using a Weighted Structural Histogram Representation
    Lu, Tong
    Gao, Rongjun
    Wang, Tuantuan
    Yang, Yubin
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING-PCM 2010, PT I, 2010, 6297 : 348 - 356