Top-k Similarity Join in Heterogeneous Information Networks

Cited by: 44
Authors
Xiong, Yun [1]
Zhu, Yangyong [1]
Yu, Philip S. [2]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Data Sci, Shanghai 200433, Peoples R China
[2] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Funding
US National Science Foundation; Shanghai Rising-Star Program;
Keywords
Heterogeneous network; graph; similarity join; ATRIAL-FIBRILLATION; ALGORITHMS; SEARCH;
DOI
10.1109/TKDE.2014.2373385
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required by many problems, and it is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. In particular, none of the existing research on similarity join takes the different semantic meanings behind paths into consideration, and almost all of it completely ignores the heterogeneity and diversity of HINs. In this paper, we propose a path-based similarity join (PS-join) method that returns the top-k similar pairs of objects based on any user-specified join path in a heterogeneous information network. We study how to prune expensive similarity computations by introducing bucket-pruning-based locality-sensitive hashing (BPLSH) indexing. Compared with the existing link-based similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.
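To make the idea of a top-k join along a user-specified path concrete, the following is a minimal brute-force sketch and not the paper's PS-join algorithm. The abstract does not state the similarity measure, so a PathSim-style normalized path count is assumed here as a stand-in, and no BPLSH pruning is attempted. The names `path_counts`, `top_k_join`, `writes`, and `authors` are hypothetical and used only for illustration.

```python
import heapq
from itertools import combinations


def path_counts(adj_steps, start_nodes):
    """For each start node, count path instances to every end node
    reached by following the adjacency maps in adj_steps in order."""
    counts = {}
    for s in start_nodes:
        frontier = {s: 1}
        for adj in adj_steps:
            nxt = {}
            for node, c in frontier.items():
                for nb in adj.get(node, ()):
                    nxt[nb] = nxt.get(nb, 0) + c
            frontier = nxt
        counts[s] = frontier
    return counts


def top_k_join(half_path, nodes, k):
    """Brute-force top-k pairs under an assumed PathSim-style score on the
    symmetric join path formed by half_path followed by its reverse."""
    reach = path_counts(half_path, nodes)

    def dot(a, b):
        # Number of symmetric path instances between two start nodes.
        if len(a) > len(b):
            a, b = b, a
        return sum(c * b.get(n, 0) for n, c in a.items())

    self_count = {x: dot(reach[x], reach[x]) for x in nodes}
    scored = []
    for x, y in combinations(nodes, 2):
        denom = self_count[x] + self_count[y]
        if denom == 0:
            continue  # neither node has any path instance
        scored.append((2 * dot(reach[x], reach[y]) / denom, x, y))
    # Every pair is scored; a real system would prune before this step.
    return heapq.nlargest(k, scored)


# Toy author-paper network (illustrative data, not from the paper).
writes = {"a1": ["p1", "p2"], "a2": ["p1", "p2"], "a3": ["p2", "p3"]}
authors = ["a1", "a2", "a3"]
result = top_k_join([writes], authors, 1)
```

Here the join path is author-paper-author, so two authors are similar when they share papers. Choosing a different sequence of adjacency maps (for example author-paper-venue) would change the semantics of the join, which is the kind of path-dependent meaning the abstract refers to. The quadratic pair loop is exactly the cost that an index such as the BPLSH scheme named in the abstract is meant to avoid.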
Pages: 1710 - 1723
Page count: 14
Related papers
50 in total
  • [1] Semantic Enhanced Top-k Similarity Search on Heterogeneous Information Networks
    Yu, Minghe
    Zhang, Yun
    Zhang, Tiancheng
    Yu, Ge
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2020), PT III, 2020, 12114 : 104 - 119
  • [2] Top-k Tree Similarity Join
    Wang, Jianhua
    Yang, Jianye
    Zhang, Wenjie
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 1939 - 1948
  • [3] Fast top-k similarity join for SimRank
    Li, Ruiqi
    Zhao, Xiang
    Shang, Haichuan
    Chen, Yifan
    Xiao, Weidong
    [J]. INFORMATION SCIENCES, 2017, 381 : 1 - 19
  • [4] Top-k Spatio-Textual Similarity Join
    Hu, Huiqi
    Li, Guoliang
    Bao, Zhifeng
    Feng, Jianhua
    Wu, Yongwei
    Gong, Zhiguo
    Xu, Yaoqiang
    [J]. 2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016, : 1576 - 1577
  • [5] Top-k Spatio-Textual Similarity Join
    Hu, Huiqi
    Li, Guoliang
    Bao, Zhifeng
    Feng, Jianhua
    Wu, Yongwei
    Gong, Zhiguo
    Xu, Yaoqiang
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (02) : 551 - 565
  • [6] SETJoin: a novel top-k similarity join algorithm
    Wang, Hongya
    Yang, Lihong
    Xiao, Yingyuan
    [J]. SOFT COMPUTING, 2020, 24 : 14577 - 14592
  • [7] Top-k similarity search in heterogeneous information networks with x-star network schema
    Zhang, Mingxi
    Hu, Hao
    He, Zhenying
    Wang, Wei
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (02) : 699 - 712
  • [8] PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks
    Sun, Yizhou
    Han, Jiawei
    Yan, Xifeng
    Yu, Philip S.
    Wu, Tianyi
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2011, 4 (11) : 992 - 1003
  • [9] Parallel Top-K Similarity Join Algorithms Using MapReduce
    Kim, Younghoon
    Shim, Kyuseok
    [J]. 2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 510 - 521
  • [10] Efficient Top-K SimRank-based Similarity Join
    Tao, Wenbo
    [J]. SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014, : 1603 - 1604