DeHIN: A Decentralized Framework for Embedding Large-Scale Heterogeneous Information Networks

Cited by: 3

Authors:

Imran, Mubashir [1]

Yin, Hongzhi [1]

Chen, Tong [1]

Huang, Zi [1]

Zheng, Kai [2]

Affiliations:

[1] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld 4072, Australia

[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 610056, Sichuan, Peoples R China

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2023 / Vol. 35 / No. 4

Funding:

National Natural Science Foundation of China; Australian Research Council;

Keywords:

Heterogeneous networks; Task analysis; Parallel processing; Data models; Pipelines; Computational modeling; Training; Decentralized network embedding; heterogeneous networks; link prediction; node classification;

DOI:

10.1109/TKDE.2022.3141951

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Modeling heterogeneity by extracting and exploiting high-order information from heterogeneous information networks (HINs) has attracted immense research attention in recent years. Such heterogeneous network embedding (HNE) methods effectively harness the heterogeneity of small-scale HINs. In the real world, however, HINs grow exponentially as new nodes and new types of links are continuously introduced, turning them into billion-scale networks. Learning node embeddings on such HINs creates a performance bottleneck for existing HNE methods, which are commonly centralized, i.e., the complete data and the model both reside on a single machine. To address large-scale HNE tasks with strong efficiency and effectiveness guarantees, we present the Decentralized Embedding Framework for Heterogeneous Information Networks (DeHIN). DeHIN builds a distributed parallel pipeline that uses hypergraphs to bring parallelism into the HNE task. Its context-preserving partition mechanism formulates a large HIN as a hypergraph whose hyperedges connect semantically similar nodes. The framework then adopts a decentralized strategy that partitions HINs efficiently through a tree-like pipeline. Each resulting subnetwork is assigned to a distributed worker, which employs the deep information maximization theorem to learn node embeddings locally from the partition it receives. We further devise a novel embedding alignment scheme that precisely projects the independently learned node embeddings of all subnetworks onto a common vector space, enabling downstream tasks such as link prediction and node classification. Our experimental results show that DeHIN significantly improves the efficiency and accuracy of existing HNE models and outperforms large-scale graph embedding frameworks by scaling efficiently to large HINs.
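The abstract describes casting a large HIN as a hypergraph and partitioning it so that each worker receives a subnetwork with its nodes' context intact. The Python sketch below illustrates that general idea only, under simplifying assumptions that are not taken from the paper: each hyperedge is simply a node plus its direct neighbours (DeHIN's construction of hyperedges over semantically similar nodes is not specified in this record), and whole hyperedges are assigned greedily to `k` workers rather than through DeHIN's tree-like pipeline. The helpers `build_hyperedges`, `partition`, and `anchor_nodes` are hypothetical names, and treating nodes replicated across partitions as anchors for a later alignment step is likewise an illustrative assumption, not the paper's alignment scheme.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def build_hyperedges(edges: Iterable[Edge]) -> Dict[Node, FrozenSet[Node]]:
    """Hypothetical simplification: one hyperedge per node, holding the node
    and its direct neighbours (its local context). Node/link types are ignored."""
    nbrs: Dict[Node, Set[Node]] = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return {n: frozenset({n} | ns) for n, ns in nbrs.items()}


def partition(hyperedges: Dict[Node, FrozenSet[Node]], k: int) -> List[Set[Node]]:
    """Greedy flat assignment (an assumption, not DeHIN's tree-like pipeline).

    Each whole hyperedge, largest first, goes to the partition that is smallest
    after adding it. Keeping hyperedges whole preserves every node's context,
    so some nodes end up replicated across partitions."""
    parts: List[Set[Node]] = [set() for _ in range(k)]
    order = sorted(hyperedges.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))
    for _, he in order:
        # Smallest resulting partition wins; min() breaks ties by lowest index.
        target = min(range(k), key=lambda i: len(parts[i] | he))
        parts[target] |= he
    return parts


def anchor_nodes(parts: List[Set[Node]]) -> Set[Node]:
    """Nodes present in more than one partition (candidate alignment anchors)."""
    counts: Dict[Node, int] = defaultdict(int)
    for p in parts:
        for n in p:
            counts[n] += 1
    return {n for n, c in counts.items() if c > 1}


if __name__ == "__main__":
    # Toy author-paper-venue network (illustrative data only).
    edges = [("a1", "p1"), ("a1", "p2"), ("a2", "p2"), ("a2", "p3"),
             ("p1", "v1"), ("p2", "v1"), ("p3", "v2")]
    hes = build_hyperedges(edges)
    parts = partition(hes, k=2)
    for i, p in enumerate(parts):
        print(i, sorted(p))
    print("anchors:", sorted(anchor_nodes(parts)))
```

On this toy network the two partitions share `a1`, `p2`, and `v1`. Keeping hyperedges whole trades some node replication for context preservation; in a real system the embeddings learned independently on each partition would still have to be mapped into one space, which DeHIN handles with its own alignment scheme.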


Pages: 3645-3657

Number of pages: 13
