Large-Scale Network Embedding in Apache Spark

Cited by: 10
Authors
Lin, Wenqing [1]
Affiliations
[1] Tencent, Interact Entertainment Grp, Shenzhen, Guangdong, Peoples R China
Keywords
network embedding; distributed computing; graph partitioning
DOI
10.1145/3447548.3467136
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Network embedding has been widely used in social recommendation and network analysis, for example in recommendation systems and graph-based anomaly detection. However, most previous approaches cannot handle large graphs efficiently because (i) computation on graphs is often costly and (ii) the graph itself, or the intermediate vector results, can be prohibitively large, making it difficult to process on a single machine. In this paper, we propose an efficient and effective distributed algorithm for network embedding on large graphs using Apache Spark. The algorithm recursively partitions a graph into several small subgraphs to capture the internal and external structural information of nodes, and then computes the network embedding for each subgraph in parallel. Finally, by aggregating the outputs on all subgraphs, we obtain the node embeddings at linear cost. Experiments show that the proposed approach handles graphs with billions of edges within a few hours and is at least 4 times faster than state-of-the-art approaches. It also achieves improvements of up to 4.25% and 4.27% on link prediction and node classification tasks, respectively. Finally, we deploy the proposed algorithms in two Tencent online games for friend recommendation and item recommendation, where they outperform competing methods by up to 91.11% in running time and up to 12.80% in the corresponding evaluation metrics.
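The partition, embed-in-parallel, and aggregate pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm and does not use Spark: the halving partitioner, the toy two-number "embedding" (internal degree, external degree), and the function names `recursive_partition`, `embed_part`, and `embed_graph` are all hypothetical, chosen only to show the overall shape of the computation.

```python
from concurrent.futures import ThreadPoolExecutor


def recursive_partition(nodes, max_size):
    """Split a node list in half until every part has at most max_size nodes.

    Assumption: a naive halving split stands in for a real graph partitioner.
    """
    if len(nodes) <= max_size:
        return [nodes]
    mid = len(nodes) // 2
    return (recursive_partition(nodes[:mid], max_size)
            + recursive_partition(nodes[mid:], max_size))


def embed_part(part, adjacency):
    """Toy per-subgraph embedding: (internal degree, external degree).

    Internal edges stay inside the part; external edges leave it.
    """
    members = set(part)
    result = {}
    for node in part:
        internal = sum(1 for nb in adjacency[node] if nb in members)
        result[node] = (internal, len(adjacency[node]) - internal)
    return result


def embed_graph(edges, max_size=2):
    """Partition the graph, embed each part in parallel, then merge the results."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    parts = recursive_partition(sorted(adjacency), max_size)

    # Each part is processed independently, so the work can run in parallel.
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(lambda p: embed_part(p, adjacency), parts))

    # Aggregation touches each node once, so it is linear in the node count.
    embeddings = {}
    for chunk in partial:
        embeddings.update(chunk)
    return embeddings


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
embeddings = embed_graph(edges)
```

In a Spark setting, the thread pool would presumably be replaced by a distributed map over partitions, and the toy degree features by a learned embedding; both substitutions are outside what this record states.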
Pages: 3271-3279
Number of pages: 9
Related papers
50 in total
  • [31] A Parallel Fast Fourier Transform Algorithm for Large-Scale Signal Data Using Apache Spark in Cloud
    Yang, Cheng
    Bao, Weidong
    Zhu, Xiaomin
    Wang, Ji
    Xiao, Wenhua
    [J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2018, PT III, 2018, 11336 : 293 - 310
  • [32] Large-Scale Learning with AdaGrad on Spark
    Hadgu, Asmelash Teka
    Nigam, Aastha
    Diaz-Aviles, Ernesto
    [J]. PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 2828 - 2830
  • [33] Large-Scale Heterogeneous Feature Embedding
    Huang, Xiao
    Song, Qingquan
    Yang, Fan
    Hu, Xia
    [J]. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 3878 - 3885
  • [34] A spark-based method for identifying large-scale network burst traffic
    Sun, Yu-Lu
    Yun, Ben-Sheng
    Qian, Ya-Guan
    Feng, Jun
    [J]. Journal of Computers (Taiwan), 2021, 32 (04) : 123 - 136
  • [35] A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark
    Wang, Yong
    Ke, Wenlong
    Tao, Xiaoling
    [J]. INFORMATION, 2016, 7 (01)
  • [36] Embedding Virtual Network Functions with Backup for Reliable Large-scale Edge Computing
    Zhang, Yuntong
    Zhao, Zhiwei
    Shu, Chang
    Min, Geyong
    Wang, Zhe
    [J]. 2018 5TH IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND CLOUD COMPUTING (IEEE CSCLOUD 2018) / 2018 4TH IEEE INTERNATIONAL CONFERENCE ON EDGE COMPUTING AND SCALABLE CLOUD (IEEE EDGECOM 2018), 2018, : 190 - 195
  • [37] A Divide-and-Conquer Evolutionary Algorithm for Large-Scale Virtual Network Embedding
    Song, An
    Chen, Wei-Neng
    Gong, Yue-Jiao
    Luo, Xiaonan
    Zhang, Jun
    [J]. IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2020, 24 (03) : 566 - 580
  • [38] Accelerating Large-Scale Genomic Analysis with Spark
    Li, Xueqi
    Tan, Guangming
    Zhang, Chunming
    Li, Xu
    Zhang, Zhonghai
    Sun, Ninghui
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2016, : 747 - 751
  • [39] Large-scale geographically weighted regression on Spark
    Hung Tien Tran
    Hiep Tuan Nguyen
    Viet-Trung Tran
    [J]. 2016 EIGHTH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE), 2016, : 127 - 132
  • [40] Large-Scale Human Action Recognition with Spark
    Wang, Hanli
    Zheng, Xiaobin
    Xiao, Bo
    [J]. 2015 IEEE 17TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2015,