Accelerating Large-Scale Heterogeneous Interaction Graph Embedding Learning via Importance Sampling

Cited by: 9
Authors
Ji, Yugang [1 ]
Yin, Mingyang [2 ]
Yang, Hongxia [2 ]
Zhou, Jingren [2 ]
Zheng, Vincent W. [3 ]
Shi, Chuan [1 ]
Fang, Yuan [4 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Xitucheng Rd, Beijing, Peoples R China
[2] Alibaba Grp, Wengyi Rd, Hangzhou, Peoples R China
[3] Adv Digital Sci Ctr, Create Tower, Singapore, Singapore
[4] Singapore Management Univ, Victoria St, Singapore, Singapore
Funding
National Natural Science Foundation of China; National Research Foundation of Singapore;
Keywords
Heterogeneous interaction graphs; large-scale graphs; type-dependent sampler; type-fusion sampler; importance sampling; MODEL;
DOI
10.1145/3418684
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
In real-world problems, heterogeneous entities are often related to each other through multiple interactions, forming a Heterogeneous Interaction Graph (HIG). When modeling HIGs for fundamental tasks, graph neural networks offer an attractive approach that can exploit the heterogeneity and rich semantic information by aggregating and propagating information from different types of neighborhoods. However, learning on such complex graphs, often with millions or billions of nodes, edges, and various attributes, can incur prohibitive time and memory costs. In this article, we accelerate representation learning on large-scale HIGs by adopting importance sampling of heterogeneous neighborhoods in a batch-wise manner, which naturally fits most batch-based optimizations. Unlike traditional homogeneous strategies that neglect the semantic types of nodes and edges, we devise both type-dependent and type-fusion samplers to handle the rich heterogeneous semantics within HIGs: the former samples neighborhoods of each type separately, while the latter samples jointly from the candidates of all types. Furthermore, to overcome the imbalance between the down-sampled and the original information, we propose heterogeneous estimators, namely a self-normalized estimator and an adaptive estimator, to improve the robustness of our sampling strategies. Finally, we evaluate our models on node classification and link prediction over five real-world datasets. The empirical results demonstrate that our approach performs significantly better than state-of-the-art alternatives, and reduces the number of edges in computation by up to 93%, the memory cost by up to 92%, and the time cost by up to 86%.
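The abstract only sketches the approach at a high level. For concreteness, the snippet below gives a minimal, hypothetical illustration of the two sampling strategies it mentions (a type-dependent sampler that draws a fixed budget per neighbor type, and a type-fusion sampler that draws jointly from the pooled candidates of all types), together with a self-normalized importance-weighted aggregation. All function names, signatures, and the NumPy-based implementation are assumptions made for exposition, not the authors' implementation.

```python
import numpy as np

def type_dependent_sample(neighbors_by_type, importance_by_type, k_per_type, rng):
    """Sketch of a type-dependent sampler: draw up to k_per_type neighbors
    independently within each node/edge type, proportional to importance."""
    sampled = {}
    for t, nbrs in neighbors_by_type.items():
        p = np.asarray(importance_by_type[t], dtype=float)
        p = p / p.sum()  # per-type proposal distribution
        k = min(k_per_type, len(nbrs))
        idx = rng.choice(len(nbrs), size=k, replace=False, p=p)
        sampled[t] = [(nbrs[i], p[i]) for i in idx]  # keep proposal prob of each draw
    return sampled

def type_fusion_sample(neighbors_by_type, importance_by_type, k_total, rng):
    """Sketch of a type-fusion sampler: draw k_total neighbors jointly
    from the pooled candidates of all types under a single proposal."""
    pool, weights = [], []
    for t, nbrs in neighbors_by_type.items():
        pool.extend((t, n) for n in nbrs)
        weights.extend(importance_by_type[t])
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()  # joint proposal over all candidate neighbors
    k = min(k_total, len(pool))
    idx = rng.choice(len(pool), size=k, replace=False, p=p)
    return [pool[i] for i in idx], p[idx]

def self_normalized_aggregate(features, proposal_probs):
    """Self-normalized importance-weighted mean of sampled neighbor features,
    keeping the aggregation on a scale comparable to a full-neighborhood average."""
    w = 1.0 / np.asarray(proposal_probs, dtype=float)  # inverse-proposal weights
    w = w / w.sum()                                    # self-normalization
    return (w[:, None] * np.asarray(features, dtype=float)).sum(axis=0)
```

Under the same assumptions, a typical call would pass a node's typed neighbor lists and importance scores together with rng = np.random.default_rng() to type_fusion_sample, then feed the returned proposal probabilities and the sampled neighbors' feature matrix into self_normalized_aggregate; the adaptive estimator mentioned in the abstract is not sketched here.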
Pages: 23
Related Articles
50 records in total
  • [1] Unsupervised Embedding Learning for Large-Scale Heterogeneous Networks Based on Metapath Graph Sampling. Zhong, Hongwei; Wang, Mingyang; Zhang, Xinyue. ENTROPY, 2023, 25(02).
  • [2] Large-Scale Embedding Learning in Heterogeneous Event Data. Gui, Huan; Liu, Jialu; Tao, Fangbo; Jiang, Meng; Norick, Brandon; Han, Jiawei. 2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2016: 907-912.
  • [3] Large-Scale Heterogeneous Feature Embedding. Huang, Xiao; Song, Qingquan; Yang, Fan; Hu, Xia. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019: 3878-3885.
  • [4] Embedding Compression with Hashing for Efficient Representation Learning in Large-Scale Graph. Yeh, Chin-Chia Michael; Gu, Mengting; Zheng, Yan; Chen, Huiyuan; Ebrahimi, Javid; Zhuang, Zhongfang; Wang, Junpeng; Wang, Liang; Zhang, Wei. PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022: 4391-4401.
  • [5] A General Embedding Framework for Heterogeneous Information Learning in Large-Scale Networks. Huang, Xiao; Li, Jundong; Zou, Na; Hu, Xia. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2018, 12(06).
  • [6] Task-Oriented Genetic Activation for Large-Scale Complex Heterogeneous Graph Embedding. Jiang, Zhuoren; Gao, Zheng; Lan, Jinjiong; Yang, Hongxia; Lu, Yao; Liu, Xiaozhong. WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), 2020: 1581-1591.
  • [7] Large-scale Entity Alignment via Knowledge Graph Merging, Partitioning and Embedding. Xin, Kexuan; Sun, Zequn; Hua, Wen; Hu, Wei; Qu, Jianfeng; Zhou, Xiaofang. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022: 2240-2249.
  • [8] Accelerating large-scale graph analytics with FPGA and HMC. Khoram, Soroosh; Zhang, Jialiang; Strange, Maxwell; Li, Jing. 2017 IEEE 25TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2017), 2017: 82.
  • [9] Large-scale Graph Representation Learning. Leskovec, Jure. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017: 4.
  • [10] TGE: Machine Learning Based Task Graph Embedding for Large-Scale Topology Mapping. Choi, Jong Youl; Logan, Jeremy; Wolf, Matthew; Ostrouchov, George; Kurc, Tahsin; Liu, Qing; Podhorszki, Norbert; Klasky, Scott; Romanus, Melissa; Sun, Qian; Parashar, Manish; Churchill, Randy Michael; Chang, C. S. 2017 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2017: 587-591.