Accelerating Large-Scale Heterogeneous Interaction Graph Embedding Learning via Importance Sampling

Cited by: 9
Authors
Ji, Yugang [1 ]
Yin, Mingyang [2 ]
Yang, Hongxia [2 ]
Zhou, Jingren [2 ]
Zheng, Vincent W. [3 ]
Shi, Chuan [1 ]
Fang, Yuan [4 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Xitucheng Rd, Beijing, Peoples R China
[2] Alibaba Grp, Wengyi Rd, Hangzhou, Peoples R China
[3] Adv Digital Sci Ctr, Create Tower, Singapore, Singapore
[4] Singapore Management Univ, Victoria St, Singapore, Singapore
Funding
National Natural Science Foundation of China; National Research Foundation of Singapore;
Keywords
Heterogeneous interaction graphs; large-scale graphs; type-dependent sampler; type-fusion sampler; importance sampling; MODEL;
DOI
10.1145/3418684
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
In real-world problems, heterogeneous entities are often related to each other through multiple interactions, forming a Heterogeneous Interaction Graph (HIG). When modeling HIGs for fundamental tasks, graph neural networks offer an attractive approach that can exploit the heterogeneity and rich semantic information by aggregating and propagating information from different types of neighborhoods. However, learning on such complex graphs, often with millions or billions of nodes, edges, and various attributes, can incur prohibitive time and memory costs. In this article, we accelerate representation learning on large-scale HIGs by adopting importance sampling of heterogeneous neighborhoods in a batch-wise manner, which naturally fits most batch-based optimizations. Unlike traditional homogeneous strategies that neglect the semantic types of nodes and edges, we devise both type-dependent and type-fusion samplers to handle the rich heterogeneous semantics within HIGs: the former samples neighborhoods of each type separately, while the latter samples jointly from the candidates of all types. Furthermore, to overcome the imbalance between the down-sampled and the original information, we propose heterogeneous estimators, namely a self-normalized estimator and an adaptive estimator, to improve the robustness of our sampling strategies. Finally, we evaluate our models on node classification and link prediction over five real-world datasets. The empirical results demonstrate that our approach performs significantly better than state-of-the-art alternatives, and reduces the number of edges in computation by up to 93%, the memory cost by up to 92%, and the time cost by up to 86%.
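The abstract only sketches the approach at a high level. For concreteness, the snippet below gives a minimal, hypothetical illustration of the two sampling strategies it mentions (a type-dependent sampler that draws a fixed budget per neighbor type, and a type-fusion sampler that draws jointly from the pooled candidates of all types), together with a self-normalized importance-weighted aggregation. All function names, signatures, and the NumPy-based implementation are assumptions made for exposition, not the authors' implementation.

```python
import numpy as np

def type_dependent_sample(neighbors_by_type, importance_by_type, k_per_type, rng):
    """Sketch of a type-dependent sampler: draw up to k_per_type neighbors
    independently within each node/edge type, proportional to importance."""
    sampled = {}
    for t, nbrs in neighbors_by_type.items():
        p = np.asarray(importance_by_type[t], dtype=float)
        p = p / p.sum()  # per-type proposal distribution
        k = min(k_per_type, len(nbrs))
        idx = rng.choice(len(nbrs), size=k, replace=False, p=p)
        sampled[t] = [(nbrs[i], p[i]) for i in idx]  # keep proposal prob of each draw
    return sampled

def type_fusion_sample(neighbors_by_type, importance_by_type, k_total, rng):
    """Sketch of a type-fusion sampler: draw k_total neighbors jointly
    from the pooled candidates of all types under a single proposal."""
    pool, weights = [], []
    for t, nbrs in neighbors_by_type.items():
        pool.extend((t, n) for n in nbrs)
        weights.extend(importance_by_type[t])
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()  # joint proposal over all candidate neighbors
    k = min(k_total, len(pool))
    idx = rng.choice(len(pool), size=k, replace=False, p=p)
    return [pool[i] for i in idx], p[idx]

def self_normalized_aggregate(features, proposal_probs):
    """Self-normalized importance-weighted mean of sampled neighbor features,
    keeping the aggregation on a scale comparable to a full-neighborhood average."""
    w = 1.0 / np.asarray(proposal_probs, dtype=float)  # inverse-proposal weights
    w = w / w.sum()                                    # self-normalization
    return (w[:, None] * np.asarray(features, dtype=float)).sum(axis=0)
```

Under the same assumptions, a typical call would pass a node's typed neighbor lists and importance scores together with rng = np.random.default_rng() to type_fusion_sample, then feed the returned proposal probabilities and the sampled neighbors' feature matrix into self_normalized_aggregate; the adaptive estimator mentioned in the abstract is not sketched here.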
Pages: 23
Related Articles
50 records in total
  • [1] Unsupervised Embedding Learning for Large-Scale Heterogeneous Networks Based on Metapath Graph Sampling. Zhong, Hongwei; Wang, Mingyang; Zhang, Xinyue. ENTROPY, 2023, 25(02).
  • [2] Large-Scale Embedding Learning in Heterogeneous Event Data. Gui, Huan; Liu, Jialu; Tao, Fangbo; Jiang, Meng; Norick, Brandon; Han, Jiawei. 2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2016: 907-912.
  • [3] Large-Scale Heterogeneous Feature Embedding. Huang, Xiao; Song, Qingquan; Yang, Fan; Hu, Xia. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019: 3878-3885.
  • [4] Embedding Compression with Hashing for Efficient Representation Learning in Large-Scale Graph. Yeh, Chin-Chia Michael; Gu, Mengting; Zheng, Yan; Chen, Huiyuan; Ebrahimi, Javid; Zhuang, Zhongfang; Wang, Junpeng; Wang, Liang; Zhang, Wei. PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022: 4391-4401.
  • [5] A General Embedding Framework for Heterogeneous Information Learning in Large-Scale Networks. Huang, Xiao; Li, Jundong; Zou, Na; Hu, Xia. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2018, 12(06).
  • [6] Task-Oriented Genetic Activation for Large-Scale Complex Heterogeneous Graph Embedding. Jiang, Zhuoren; Gao, Zheng; Lan, Jinjiong; Yang, Hongxia; Lu, Yao; Liu, Xiaozhong. WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), 2020: 1581-1591.
  • [7] Large-scale Entity Alignment via Knowledge Graph Merging, Partitioning and Embedding. Xin, Kexuan; Sun, Zequn; Hua, Wen; Hu, Wei; Qu, Jianfeng; Zhou, Xiaofang. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022: 2240-2249.
  • [8] Accelerating large-scale graph analytics with FPGA and HMC. Khoram, Soroosh; Zhang, Jialiang; Strange, Maxwell; Li, Jing. 2017 IEEE 25TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2017), 2017: 82.
  • [9] Large-scale Graph Representation Learning. Leskovec, Jure. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017: 4.
  • [10] TGE: Machine Learning Based Task Graph Embedding for Large-Scale Topology Mapping. Choi, Jong Youl; Logan, Jeremy; Wolf, Matthew; Ostrouchov, George; Kurc, Tahsin; Liu, Qing; Podhorszki, Norbert; Klasky, Scott; Romanus, Melissa; Sun, Qian; Parashar, Manish; Churchill, Randy Michael; Chang, C. S. 2017 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2017: 587-591.