Integrated method for distributed processing of large XML data

Authors
Rongxin Chen
Guorong Cai
Jie Chen
Yuling Hong
Affiliations
[1] Jimei University, Computer Engineering College
[2] Digital Fujian Big Data Modeling and Intelligent Computing Institute
Source
Cluster Computing | 2024 / Vol. 27
Keywords
Large XML data; XML parsing; XPath evaluation; Distributed processing; Integrated method
Abstract
Traditional standalone computing struggles to process large XML data because of its limited scalability, so distributed processing on cluster systems becomes an inevitable choice. Current distributed XML processing methods generally rely on existing distributed computing frameworks designed for general-purpose data. Given the semi-structured nature of XML and the complexity of its queries, these frameworks suffer from complex configuration, inflexible working mechanisms, and difficult performance optimization. In addition, distributed XML queries have a low level of automation and lack effective integration with distributed XML parsing and indexing. In this paper we propose an integrated method for distributed processing of large XML data, called the dXML method. Our method supports distributed parsing of arbitrary XML fragments and distributed index creation, and adopts efficient navigational XPath evaluation based on a relation index. Through a distributed XPath evaluation approach based on filter-upon-pre-evaluate, our method achieves data locality and reduces network traffic during the distributed evaluation of complex XPath predicates. dXML integrates the distributed processing of XML parsing, index creation, and XPath queries into a one-stop XML processing solution that supports automatic distributed processing of large XML data, with lightweight configuration and a flexible working mechanism. Experimental evaluation verifies the effectiveness of dXML, and comparative results show that dXML achieves better distributed query performance than typical existing navigational and Twig distributed processing methods.
Pages: 1375-1399
Number of pages: 24