Robust and Scalable Entity Alignment in Big Data

Cited by: 2
Authors
Flamino, James [1]
Abriola, Christopher [2]
Zimmerman, Benjamin [2]
Li, Zhongheng [2]
Douglas, Joel [2]
Affiliations
[1] Rensselaer Polytech Inst, Dept Phys Appl Phys & Astrophy, Troy, NY 12180 USA
[2] Syst & Technol Res, Woburn, MA USA
Keywords
Graph alignment; clustering; MapReduce; CLUSTERING-ALGORITHM;
DOI
10.1109/BigData50022.2020.9378273
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Entity alignment has always had significant uses within a multitude of diverse scientific fields. In particular, the concept of matching entities across networks has grown in significance in the world of social science as communicative networks such as social media have expanded in scale and popularity. With the advent of big data, there is a growing need to provide analysis on graphs of massive scale. However, with millions of nodes and billions of edges, the idea of alignment between a myriad of graphs of similar scale using features extracted from potentially sparse or incomplete datasets becomes daunting. In this paper we will propose a solution to the issue of large-scale alignments in the form of a multi-step pipeline. Within this pipeline we introduce scalable feature extraction for robust temporal attributes, accompanied by novel and efficient clustering algorithms in order to find groupings of similar nodes across graphs. The features and their clusters are fed into a versatile alignment stage that accurately identifies partner nodes among millions of possible matches. Our results show that the pipeline can process large data sets, achieving efficient runtimes within the memory constraints.
Pages: 2526-2533
Page count: 8
Related papers
  • [21] Scalable Management of Compressed Semantic Big Data
    Fernandez, Javier D.
    Martinez-Prieto, Miguel A.
    Arias, Mario
    ERCIM NEWS, 2012, (89): 29-30
  • [22] Customizable and Scalable Fuzzy Join for Big Data
    Chen, Zhimin
    Wang, Yue
    Narasayya, Vivek
    Chaudhuri, Surajit
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2019, 12 (12): 2106-2117
  • [23] Exploring Big Data with Scalable Soft Clustering
    Hall, Lawrence O.
    SYNERGIES OF SOFT COMPUTING AND STATISTICS FOR INTELLIGENT DATA ANALYSIS, 2013, 190: 11-15
  • [24] Scalable Progressive Analytics on Big Data in the Cloud
    Chandramouli, Badrish
    Goldstein, Jonathan
    Quamar, Abdul
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (14): 1726-1737
  • [25] Scalable Maximal Discernibility Discretization for Big Data
    Czolombitko, Michal
    Stepaniuk, Jaroslaw
    ROUGH SETS, 2017, 10313: 644-654
  • [26] Scalable Clustering Algorithms for Big Data: A Review
    Mahdi, Mahmoud A.
    Hosny, Khalid M.
    Elhenawy, Ibrahim
    IEEE ACCESS, 2021, 9: 80015-80027
  • [27] Tailored Graph Embeddings for Entity Alignment on Historical Data
    Baas, Jurian
    Dastani, Mehdi
    Feelders, Ad
    22ND INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES (IIWAS2020), 2020: 125-133
  • [28] Scalable system scheduling for HPC and big data
    Reuther, Albert
    Byun, Chansup
    Arcand, William
    Bestor, David
    Bergeron, Bill
    Hubbell, Matthew
    Jones, Michael
    Michaleas, Peter
    Prout, Andrew
    Rosa, Antonio
    Kepner, Jeremy
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2018, 111: 76-92
  • [29] Special Issue on Scalable Computing for Big Data
    Yang, Laurence T.
    Chen, Jinjun
    BIG DATA RESEARCH, 2014, 1 (01): 2-3
  • [30] Scalable biclustering - the future of big data exploration?
    Orzechowski, Patryk
    Boryczko, Krzysztof
    Moore, Jason H.
    GIGASCIENCE, 2019, 8 (07)