Geographically distributed data management to support large-scale data analysis

Authors
Tamer Z. Emara
Thanh Trinh
Joshua Zhexue Huang
Affiliations
[1] Damietta University, Faculty of Computers and Artificial Intelligence
[2] Phenikaa University, Faculty of Computer Science
[3] A&A Green Phoenix Group JSC, Phenikaa Research and Technology Institute (PRATI)
[4] Shenzhen University, National Engineering Laboratory for Big Data System Computing Technology
[5] Shenzhen University, Big Data Institute, College of Computer Science and Software Engineering
Source: Scientific Reports, 2023, 13(1)
Abstract
Many companies now store their data in multiple data centers with replication, for several reasons. Spreading data across data centers keeps response times low for customers and employees who are geographically dispersed, and it protects the data from loss if a single data center suffers a disaster. However, the volume of data is growing rapidly, which creates challenges for storage, analysis, and other processing tasks. In this paper, we propose and design a geographically distributed data management framework for managing massive data stored across geo-distributed data centers. The goal of the framework is to make efficient use of the distributed data blocks in various data analysis tasks. Its architecture consists of a grid of geo-distributed data centers connected to a data controller (DCtrl), which organizes and manages the block replicas across the data centers. Each data center runs the BDMS system, which stores a big data file as a set of random sample data blocks, each of which is a random sample of the entire file. DCtrl then distributes these data blocks to multiple data centers with replication. To analyze a big data file distributed under the framework, we can work at any single data center and randomly select a sample of the data blocks that have been replicated there from other data centers. We use simulation results to demonstrate the performance of the proposed framework for big data analysis across geo-distributed data centers.
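The abstract describes three steps: BDMS turns a file into random sample data blocks, DCtrl replicates those blocks across data centers, and an analysis draws a random sample of blocks at a single data center. The sketch below chains the three steps on a toy file under stated assumptions. The function names, the shuffle-then-deal block construction, and the uniform random placement policy are all hypothetical illustrations, not the paper's BDMS or DCtrl implementation.

```python
import random
from statistics import mean


def make_random_sample_blocks(records, num_blocks, seed=0):
    """Hypothetical stand-in for BDMS block creation (an assumption, not the
    paper's procedure): shuffle the whole file uniformly at random, then deal
    the records round-robin into num_blocks blocks. Each block is therefore a
    simple random sample, without replacement, of the whole file."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    # shuffled[i::num_blocks] takes every num_blocks-th record starting at i.
    return [shuffled[i::num_blocks] for i in range(num_blocks)]


def distribute_blocks(num_blocks, data_centers, replication, seed=0):
    """Hypothetical DCtrl placement policy (assumed): store each block on
    `replication` distinct data centers chosen uniformly at random.
    Returns a dict mapping data center name -> set of block ids it holds."""
    if replication > len(data_centers):
        raise ValueError("replication factor exceeds the number of data centers")
    rng = random.Random(seed)
    placement = {dc: set() for dc in data_centers}
    for block_id in range(num_blocks):
        for dc in rng.sample(data_centers, replication):
            placement[dc].add(block_id)
    return placement


def estimate_mean_at(dc, placement, blocks, sample_size, seed=0):
    """Estimate the file-wide mean using only the block replicas held at one
    data center: draw `sample_size` local block ids at random (no cross-center
    transfer) and average every record in the chosen blocks."""
    local = sorted(placement[dc])
    if sample_size > len(local):
        raise ValueError(f"{dc} holds only {len(local)} blocks")
    rng = random.Random(seed)
    chosen = rng.sample(local, sample_size)
    return mean([x for b in chosen for x in blocks[b]])


if __name__ == "__main__":
    records = range(10_000)                 # toy "big data file": the integers 0..9999
    blocks = make_random_sample_blocks(records, num_blocks=100, seed=1)
    dcs = ["dc-A", "dc-B", "dc-C", "dc-D"]  # hypothetical data center names
    placement = distribute_blocks(len(blocks), dcs, replication=2, seed=2)
    est = estimate_mean_at("dc-A", placement, blocks, sample_size=5, seed=3)
    print(f"true mean = {mean(records):.1f}, estimate at dc-A = {est:.1f}")
```

In this toy, replica placement is independent of the records inside a block, and each block is itself a random sample. As a result, the blocks that one data center happens to hold can stand in for the whole file when estimating a simple statistic such as the mean. The toy does not show whether this carries over to the placement strategies the paper actually evaluates.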
Related papers
  • [1] Geographically distributed data management to support large-scale data analysis
    Emara, Tamer Z.; Trinh, Thanh; Huang, Joshua Zhexue
    Scientific Reports, 2023, 13(1)
  • [2] A distributed data management system to support large-scale data analysis
    Emara, Tamer Z.; Huang, Joshua Zhexue
    Journal of Systems and Software, 2019, 148: 105-115
  • [3] Distributed Data Strategies to Support Large-Scale Data Analysis Across Geo-Distributed Data Centers
    Emara, Tamer Z.; Huang, Joshua Zhexue
    IEEE Access, 2020, 8: 178526-178538
  • [4] Watchdog - a workflow management system for the distributed analysis of large-scale experimental data
    Kluge, Michael; Friedel, Caroline C.
    BMC Bioinformatics, 2018, 19
  • [5] Large-scale similarity data management with distributed Metric Index
    Novak, David; Batko, Michal; Zezula, Pavel
    Information Processing & Management, 2012, 48(5): 855-872
  • [6] A novel data distribution management scheme to support synchronization in large-scale distributed virtual environments
    Boukerche, A.; McGraw, N. J.; Araujo, R. B.
    Proceedings of the 2005 IEEE International Conference on Virtual Environments, Human-Computer Interfaces and Measurement Systems, 2005: 67-72
  • [7] Computational solutions to large-scale data management and analysis
    Schadt, Eric E.; Linderman, Michael D.; Sorenson, Jon; Lee, Lawrence; Nolan, Garry P.
    Nature Reviews Genetics, 2010, 11(9): 647-657
  • [8] Memory-based Data Management for Large-scale Distributed Rendering
    Zheng, Ran; Jia, Jinli; Jin, Hai; Lv, Xinqiao; Yang, Shuai
    2016 IEEE 13th International Conference on e-Business Engineering (ICEBE), 2016: 123-128