Large-Scale Network Embedding in Apache Spark

Cited by: 10
Author
Lin, Wenqing [1]
Affiliation
[1] Tencent, Interactive Entertainment Group, Shenzhen, Guangdong, People's Republic of China
Keywords
network embedding; distributed computing; graph partitioning
DOI
10.1145/3447548.3467136
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Network embedding has been widely used in social recommendation and network analysis, for example in recommendation systems and anomaly detection on graphs. However, most previous approaches cannot handle large graphs efficiently because (i) computation on graphs is often costly and (ii) the graph or the intermediate vector results can be prohibitively large, making them difficult to process on a single machine. In this paper, we propose an efficient and effective distributed algorithm for network embedding on large graphs using Apache Spark. The algorithm recursively partitions a graph into several small subgraphs to capture the internal and external structural information of nodes, and then computes the network embedding for each subgraph in parallel. Finally, by aggregating the outputs on all subgraphs, we obtain the embeddings of nodes at linear cost. We demonstrate in various experiments that the proposed approach handles graphs with billions of edges within a few hours and is at least 4 times faster than state-of-the-art approaches. It also achieves improvements of up to 4.25% and 4.27% on link prediction and node classification tasks, respectively. We also deploy the proposed algorithms in two Tencent online games for friend recommendation and item recommendation, where they improve on competing methods by up to 91.11% in running time and up to 12.80% in the corresponding evaluation metrics.
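The partition, embed, and aggregate pattern described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the halving partitioner, the two-number "embedding" (edges inside versus outside a node's partition), and the thread pool standing in for Spark are all assumptions made for illustration, and every function name here is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def recursive_partition(nodes, max_size):
    """Split a node list in half until each part has at most max_size nodes.

    Assumption: a real system would use a structure-aware graph partitioner.
    """
    if len(nodes) <= max_size:
        return [nodes]
    mid = len(nodes) // 2
    return (recursive_partition(nodes[:mid], max_size)
            + recursive_partition(nodes[mid:], max_size))


def embed_subgraph(part, adj):
    """Toy stand-in for an embedding: (internal edges, external edges) per node."""
    members = set(part)
    result = {}
    for u in part:
        internal = sum(1 for v in adj[u] if v in members)
        external = len(adj[u]) - internal
        result[u] = (internal, external)
    return result


def embed_graph(adj, max_size=2):
    """Partition the graph, embed each part in parallel, then merge the outputs."""
    parts = recursive_partition(sorted(adj), max_size)
    # Assumption: a thread pool replaces Spark's distributed map over partitions.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda p: embed_subgraph(p, adj), parts))
    embeddings = {}
    for partial in partials:  # single pass over all outputs, linear in node count
        embeddings.update(partial)
    return embeddings


# Path graph 0 - 1 - 2 - 3, given as an adjacency list.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
emb = embed_graph(adj, max_size=2)
```

Each node appears in exactly one partition, so the merge step touches every node once, which is the sense in which aggregation is linear in this sketch.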
Pages: 3271-3279
Number of pages: 9