Top-k Similarity Join in Heterogeneous Information Networks

Cited by: 44

Authors:

Xiong, Yun [1]

Zhu, Yangyong [1]

Yu, Philip S. [2]

Affiliations:

[1] Fudan Univ, Shanghai Key Lab Data Sci, Shanghai 200433, Peoples R China

[2] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2015 / Volume 27 / Issue 06

Funding:

National Science Foundation (USA); Shanghai Rising-Star Program;

Keywords:

Heterogeneous network; graph; similarity join; ATRIAL-FIBRILLATION; ALGORITHMS; SEARCH;

DOI:

10.1109/TKDE.2014.2373385

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. In particular, none of the existing research on similarity join takes the different semantic meanings behind paths into consideration, and almost all of it completely ignores the heterogeneity and diversity of HINs. In this paper, we propose a path-based similarity join (PS-join) method to return the top-k similar pairs of objects based on any user-specified join path in a heterogeneous information network. We study how to prune expensive similarity computation by introducing bucket-pruning-based locality-sensitive hashing (BPLSH) indexing. Compared with the existing link-based similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.
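The abstract does not give the similarity measure or the details of the BPLSH index, so the following is only a brute-force sketch of what a top-k similarity join along a user-specified path could look like. It is not the paper's PS-join algorithm. The PathSim-style normalization, the function names `path_counts` and `top_k_path_join`, and the toy author/paper/venue network are all assumptions made for illustration, and no pruning or hashing is attempted.

```python
import heapq
from itertools import combinations


def path_counts(adjacency_chain, source):
    """Count path instances from `source` along a chain of adjacency maps."""
    frontier = {source: 1}
    for adjacency in adjacency_chain:
        following = {}
        for node, count in frontier.items():
            for neighbor in adjacency.get(node, ()):
                following[neighbor] = following.get(neighbor, 0) + count
        frontier = following
    return frontier


def sparse_dot(u, v):
    """Dot product of two sparse count vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u
    return sum(count * v.get(node, 0) for node, count in u.items())


def top_k_path_join(objects, half_path, k):
    """Brute-force top-k join over a symmetric path (half_path + its reverse).

    Assumed score (PathSim-style): 2 * <a, b> / (<a, a> + <b, b>).
    """
    vectors = {obj: path_counts(half_path, obj) for obj in objects}
    self_dot = {obj: sparse_dot(vec, vec) for obj, vec in vectors.items()}
    scored = []
    for a, b in combinations(objects, 2):
        denominator = self_dot[a] + self_dot[b]
        if denominator == 0:
            continue  # neither object has any path instance
        score = 2 * sparse_dot(vectors[a], vectors[b]) / denominator
        scored.append((score, a, b))
    return heapq.nlargest(k, scored)


# Hypothetical toy network: authors -> papers -> venues.
author_paper = {
    "alice": ["p1", "p2"],
    "bob": ["p2", "p3", "p6"],
    "carol": ["p4"],
}
paper_venue = {
    "p1": ["KDD"], "p2": ["KDD"], "p3": ["ICDE"],
    "p4": ["ICDE"], "p6": ["ICDE"],
}

# Join path Author-Paper-Venue-Paper-Author, given as its first half.
result = top_k_path_join(["alice", "bob", "carol"], [author_paper, paper_venue], k=2)
for score, a, b in result:
    print(f"{a} ~ {b}: {score:.3f}")
```

Swapping the half path (for example, using only `[author_paper]` for an Author-Paper-Author join) changes which pairs count as similar, which is the sense in which a path-based join can express different similarity semantics. Scoring every pair is quadratic in the number of objects; that is the cost the abstract says the BPLSH index is meant to prune.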


Pages: 1710-1723

Page count: 14
