Cost-aware load balancing for multilingual record linkage using MapReduce

Cited by: 4
Authors
Medhat, Doaa [1]
Yousef, Ahmed H. [1, 2]
Salama, Cherif [1, 3]
Affiliations
[1] Ain Shams Univ, Cairo, Egypt
[2] Nile Univ, Informat Technol & Comp Sci Sch, Giza, Egypt
[3] Amer Univ Cairo, Comp Sci & Engn Dept, Cairo, Egypt
Keywords
Multilingual; Record linkage; Load balancing; Data matching; Big data; MapReduce
DOI
10.1016/j.asej.2019.08.009
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
The volume of data gathered and processed is increasing every day. Record linkage is one of the most complex data-intensive tasks; it is used to accurately match records from different data sources that contain information about the same entity, such as a person, especially when they do not share a common identifier. As more resources in more than one language become available, new methods are required that are capable of matching records expressed in more than one language. In this paper, we present a scalable, cost-aware load balancing technique over MapReduce that is capable of linking records from different multilingual data sources accurately and efficiently by redistributing the multilingual matching tasks across the available machines based on their cost. We evaluate our approach on a Hadoop cluster on cloud infrastructure against state-of-the-art blocking-based load balancing techniques; our approach outperforms the other approaches in terms of execution time and scalability. (C) 2019 The Authors. Published by Elsevier B.V. on behalf of Faculty of Engineering, Ain Shams University.
Pages: 419-433
Page count: 15
Related papers
50 in total
  • [1] Scalable Load Balancing for MapReduce-based Record Linkage
    Yan, Wei
    Xue, Yuan
    Malin, Bradley
    [J]. 2013 IEEE 32ND INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2013,
  • [2] A game-theoretic approach for cost-aware load balancing in distributed systems
    Kishor, Avadh
    Niyogi, Rajdeep
    Veeravalli, Bharadwaj
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 109 : 29 - 44
  • [3] Cost-aware service brokering and performance sentient load balancing algorithms in the cloud
    Naha, Ranesh Kumar
    Othman, Mohamed
    [J]. JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2016, 75 : 47 - 57
  • [4] Deadline-Aware Load Balancing for MapReduce
    Lai, Zhao-Rong
    Chang, Che-Wei
    Liu, Xue
    Kuo, Tei-Wei
    Hsiu, Pi-Cheng
    [J]. 2014 IEEE 20TH INTERNATIONAL CONFERENCE ON EMBEDDED AND REAL-TIME COMPUTING SYSTEMS AND APPLICATIONS (RTCSA), 2014,
  • [5] Cost-Aware Ant Colony Optimization Based Model for Load Balancing in Cloud Computing
    Alagarsamy, Malini
    Sundarji, Ajitha
    Arunachalapandi, Aparna
    Kalyanasundaram, Keerthanaa
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2021, 18 (05) : 719 - 729
  • [6] A Novel Cost-Aware Load Balancing Algorithm for Road Side Units in Internet of Vehicles
    Thapa, Shivank
    Sahoo, Swagat Ranjan
    Patra, Moumita
    Gupta, Arobinda
    [J]. 2022 18TH INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE MANAGEMENT (CNSM 2022): INTELLIGENT MANAGEMENT OF DISRUPTIVE NETWORK TECHNOLOGIES AND SERVICES, 2022,
  • [7] On Datacenter-Network-Aware Load Balancing in MapReduce
    Le, Yanfang
    Wang, Feng
    Liu, Jiangchuan
    Ergun, Funda
    [J]. 2015 IEEE 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, 2015 : 485 - 492
  • [8] Distributed Cost-Aware Fault-Tolerant Load Balancing in Geo-Distributed Data Centers
    Tripathi, Rakesh
    Sivaraman, Vignesh
    Tamarapalli, Venkatesh
    [J]. IEEE TRANSACTIONS ON GREEN COMMUNICATIONS AND NETWORKING, 2022, 6 (01) : 472 - 483
  • [9] CAT: A Cost-Aware Translator for SQL-query workflow to MapReduce jobflow
    Song, Aibo
    Wu, Zhiang
    Ma, Xu
    Luo, Junzhou
    [J]. DATA & KNOWLEDGE ENGINEERING, 2016, 102 : 42 - 56
  • [10] On Cost-Aware Monitoring for Self-Adaptive Load Sharing
    Breitgand, David
    Cohen, Rami
    Nahir, Amir
    Raz, Danny
    [J]. IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2010, 28 (01) : 70 - 83