Top-k Similarity Join in Heterogeneous Information Networks

Cited by: 44
Authors
Xiong, Yun [1 ]
Zhu, Yangyong [1 ]
Yu, Philip S. [2 ]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Data Sci, Shanghai 200433, Peoples R China
[2] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Funding
National Science Foundation (USA); Shanghai Rising-Star Program;
Keywords
Heterogeneous network; graph; similarity join; ATRIAL-FIBRILLATION; ALGORITHMS; SEARCH;
DOI
10.1109/TKDE.2014.2373385
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It has drawn attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. In particular, none of the existing research on similarity join takes the different semantic meanings behind paths into consideration, and almost all of it completely ignores the heterogeneity and diversity of HINs. In this paper, we propose a path-based similarity join (PS-join) method that returns the top-k similar pairs of objects based on any user-specified join path in a heterogeneous information network. We study how to prune expensive similarity computations by introducing bucket-pruning-based locality-sensitive hashing (BPLSH) indexing. Compared with the existing link-based similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.
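To make the idea of a top-k similarity join along a join path concrete, here is a minimal brute-force sketch. It is not the paper's PS-join or its BPLSH index: the abstract does not state the similarity measure, so cosine similarity over path-count vectors is an assumption, and the toy network, the author -> paper -> venue join path, and all function names are hypothetical.

```python
from collections import defaultdict
from heapq import nlargest
from itertools import combinations
from math import sqrt

# Hypothetical toy bibliographic network (all names are made up).
AUTHOR_PAPERS = {
    "alice": ["p1", "p2"],
    "bob": ["p3", "p4", "p6"],
    "carol": ["p5"],
}
PAPER_VENUE = {
    "p1": "KDD", "p2": "KDD", "p3": "KDD",
    "p4": "KDD", "p5": "ICDE", "p6": "ICDE",
}


def path_count_vectors(author_papers, paper_venue):
    """For each author, count path instances author -> paper -> venue."""
    vectors = {}
    for author, papers in author_papers.items():
        counts = defaultdict(int)
        for paper in papers:
            counts[paper_venue[paper]] += 1
        vectors[author] = dict(counts)
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors (assumed measure)."""
    dot = sum(weight * v.get(key, 0) for key, weight in u.items())
    norm_u = sqrt(sum(weight * weight for weight in u.values()))
    norm_v = sqrt(sum(weight * weight for weight in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_join(vectors, k):
    """Score every unordered pair exhaustively and keep the k best."""
    scored = (
        (cosine(vectors[a], vectors[b]), a, b)
        for a, b in combinations(sorted(vectors), 2)
    )
    return nlargest(k, scored)


vectors = path_count_vectors(AUTHOR_PAPERS, PAPER_VENUE)
result = top_k_join(vectors, 2)
for score, a, b in result:
    print(f"{a} - {b}: {score:.3f}")
```

Scoring every pair costs time quadratic in the number of objects, which is the expense that an index such as the BPLSH scheme mentioned in the abstract is meant to prune. Choosing a different join path (for example author -> paper -> author) would change the vectors and therefore the similarity semantics.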
Pages: 1710-1723
Number of pages: 14
Related papers
50 entries in total
  • [21] Efficient top-k similarity join processing over multi-valued objects
    Wenjie Zhang
    Liming Zhan
    Ying Zhang
    Muhammad Aamir Cheema
    Xuemin Lin
    [J]. World Wide Web, 2014, 17 : 285 - 309
  • [22] Top-K Interesting Subgraph Discovery in Information Networks
    Gupta, Manish
    Gao, Jing
    Yan, Xifeng
    Cam, Hasan
    Han, Jiawei
    [J]. 2014 IEEE 30TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2014, : 820 - 831
  • [23] Optimal Join Algorithms Meet Top-k
    Tziavelis, Nikolaos
    Gatterbauer, Wolfgang
    Riedewald, Mirek
    [J]. SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, : 2659 - 2665
  • [24] Top-k String Similarity Joins
    Qi, Shuyao
    Bouros, Panagiotis
    Mamoulis, Nikos
    [J]. PROCEEDINGS OF THE 32ND INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, SSDBM 2020, 2020,
  • [25] On Top-k Structural Similarity Search
    Lee, Pei
    Lakshmanan, Laks V. S.
    Yu, Jeffrey Xu
    [J]. 2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 774 - 785
  • [26] Top-k Set Similarity Joins
    Xiao, Chuan
    Wang, Wei
    Lin, Xuemin
    Shang, Haichuan
    [J]. ICDE: 2009 IEEE 25TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2009, : 916 - +
  • [27] Fast top-k similarity search in large dynamic attributed networks
    Meng, Zaiqiao
    Shen, Hong
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2019, 56 (06)
  • [28] Supporting top-k join queries in relational databases
    Ilyas, IF
    Aref, WG
    Elmagarmid, AK
    [J]. VLDB JOURNAL, 2004, 13 (03): : 207 - 221
  • [29] A Distributed Approach for Top-k Star Queries on Massive Information Networks
    Jin, Jiahui
    Khemmarat, Samamon
    Gao, Lixin
    Luo, Junzhou
    [J]. 2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2014, : 9 - 16
  • [30] Scalable top-k query on information networks with hierarchical inheritance relations
    Wu, Fubao
    Gao, Lixin
    [J]. DISTRIBUTED AND PARALLEL DATABASES, 2024, 42 (01) : 1 - 30