Robust and Scalable Entity Alignment in Big Data

Cited by: 2

Authors:

Flamino, James [1]

Abriola, Christopher [2]

Zimmerman, Benjamin [2]

Li, Zhongheng [2]

Douglas, Joel [2]

Affiliations:

[1] Rensselaer Polytech Inst, Dept Phys Appl Phys & Astrophy, Troy, NY 12180 USA

[2] Syst & Technol Res, Woburn, MA USA

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2020

Keywords:

Graph alignment; clustering; MapReduce; CLUSTERING-ALGORITHM;

DOI:

10.1109/BigData50022.2020.9378273

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Entity alignment has always had significant uses within a multitude of diverse scientific fields. In particular, the concept of matching entities across networks has grown in significance in the world of social science as communicative networks such as social media have expanded in scale and popularity. With the advent of big data, there is a growing need to provide analysis on graphs of massive scale. However, with millions of nodes and billions of edges, the idea of alignment between a myriad of graphs of similar scale using features extracted from potentially sparse or incomplete datasets becomes daunting. In this paper we will propose a solution to the issue of large-scale alignments in the form of a multi-step pipeline. Within this pipeline we introduce scalable feature extraction for robust temporal attributes, accompanied by novel and efficient clustering algorithms in order to find groupings of similar nodes across graphs. The features and their clusters are fed into a versatile alignment stage that accurately identifies partner nodes among millions of possible matches. Our results show that the pipeline can process large data sets, achieving efficient runtimes within the memory constraints.


Pages: 2526-2533

Page count: 8

