Cost-aware load balancing for multilingual record linkage using MapReduce

Cited by: 4
Authors
Medhat, Doaa [1]
Yousef, Ahmed H. [1,2]
Salama, Cherif [1,3]
Affiliations
[1] Ain Shams Univ, Cairo, Egypt
[2] Nile Univ, Informat Technol & Comp Sci Sch, Giza, Egypt
[3] Amer Univ Cairo, Comp Sci & Engn Dept, Cairo, Egypt
Keywords
Multilingual; Record linkage; Load balancing; Data matching; Big data; MapReduce
DOI
10.1016/j.asej.2019.08.009
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
The volume of data gathered and processed is increasing every day. Record linkage is one of the most complex data-intensive tasks: it accurately matches records from different data sources that describe the same entity, such as a person, especially when they do not share a common identifier. As more resources become available in more than one language, new methods are required that are capable of matching records expressed in more than one language. In this paper, we present a scalable, cost-aware load balancing technique over MapReduce that links records from different multilingual data sources accurately and efficiently by redistributing the multilingual matching tasks across the available machines based on their cost. We evaluate our approach on a Hadoop cluster on cloud infrastructure against state-of-the-art blocking-based load balancing techniques; our approach outperforms them in terms of execution time and scalability. (C) 2019 The Authors. Published by Elsevier B.V. on behalf of Faculty of Engineering, Ain Shams University.
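The abstract does not specify how tasks are redistributed by cost, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each matching task has a precomputed numeric cost estimate and uses a standard greedy heuristic (largest task first, assigned to the currently least-loaded worker). The names `pair_count` and `assign_tasks`, and the idea of deriving cost from the number of record pairs in a block, are illustrative assumptions.

```python
import heapq


def pair_count(block_size):
    """Number of pairwise comparisons inside a block of the given size."""
    return block_size * (block_size - 1) // 2


def assign_tasks(task_costs, num_workers):
    """Greedily assign tasks to workers so that total cost per worker is balanced.

    task_costs: dict mapping a task id to its estimated cost (assumed given).
    Returns (assignment, loads), both keyed by worker index.
    """
    # Min-heap of (current load, worker index); the least-loaded worker is on top.
    heap = [(0.0, worker) for worker in range(num_workers)]
    heapq.heapify(heap)
    assignment = {worker: [] for worker in range(num_workers)}

    # Place the most expensive tasks first; ties are broken by task id.
    ordered = sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0]))
    for task_id, cost in ordered:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(task_id)
        heapq.heappush(heap, (load + cost, worker))

    loads = {w: sum(task_costs[t] for t in tasks) for w, tasks in assignment.items()}
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical task costs, e.g. comparison counts scaled by a per-language weight.
    costs = {"t1": 10, "t2": 7, "t3": 5, "t4": 4, "t5": 2}
    assignment, loads = assign_tasks(costs, num_workers=2)
    print(assignment)
    print(loads)
```

In a MapReduce setting, an assignment of this kind would typically be turned into a lookup used when routing map output keys to reducers; how the paper actually estimates costs and performs the redistribution is described in the full text, not in this record.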
Pages: 419-433
Page count: 15