Top-k Similarity Join in Heterogeneous Information Networks

Cited by: 44
Authors
Xiong, Yun [1]
Zhu, Yangyong [1]
Yu, Philip S. [2]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Data Sci, Shanghai 200433, Peoples R China
[2] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Funding
US National Science Foundation; Shanghai Rising-Star Program
Keywords
Heterogeneous network; graph; similarity join; ATRIAL-FIBRILLATION; ALGORITHMS; SEARCH
DOI
10.1109/TKDE.2014.2373385
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required by many problems, and it is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. In particular, none of the existing research on similarity join takes into consideration the different semantic meanings behind paths, and almost all of it completely ignores the heterogeneity and diversity of HINs. In this paper, we propose a path-based similarity join (PS-join) method that returns the top-k similar pairs of objects based on any user-specified join path in a heterogeneous information network. We study how to prune expensive similarity computation by introducing bucket pruning based locality sensitive hashing (BPLSH) indexing. Compared with the existing link-based similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.
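The abstract describes PS-join only at a high level. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm: it treats a join path as a sequence of node types (here author-paper-venue), represents each object of the path's start type by the number of path instances reaching each object of the end type, scores pairs by cosine similarity (a stand-in for the paper's path-based measure), and keeps the top k pairs with a heap. For pruning it uses plain random-hyperplane LSH over several hash tables, scoring only pairs that share a bucket; this is a generic substitute for BPLSH, whose bucket-pruning rules are not reproduced. All function names, the relation encoding, and the toy data are hypothetical.

```python
import heapq
import math
import random
from collections import defaultdict
from itertools import combinations


def path_count_vectors(relations, path):
    """Hypothetical helper: map each object of type path[0] to a sparse vector
    counting the path instances that reach each object of type path[-1].

    relations[(src_type, dst_type)] is a dict {src_id: [dst_id, ...]}.
    """
    vectors = {}
    for obj in relations[(path[0], path[1])]:
        counts = {obj: 1}
        # Propagate path-instance counts one relation (edge type) at a time.
        for src_t, dst_t in zip(path, path[1:]):
            adj = relations[(src_t, dst_t)]
            nxt = defaultdict(int)
            for node, c in counts.items():
                for nb in adj.get(node, []):
                    nxt[nb] += c
            counts = nxt
        vectors[obj] = dict(counts)
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (stand-in similarity measure)."""
    dot = sum(c * v.get(key, 0) for key, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def lsh_candidates(vectors, n_bits=8, n_tables=4, seed=0):
    """Generic random-hyperplane LSH (not BPLSH): return pairs that share a
    bucket in at least one of n_tables hash tables. Assumes sortable object ids."""
    rng = random.Random(seed)
    keys = sorted({key for vec in vectors.values() for key in vec})
    candidates = set()
    for _ in range(n_tables):
        # Each hyperplane is a random Gaussian weight per feature (end-type object).
        planes = [{key: rng.gauss(0.0, 1.0) for key in keys} for _ in range(n_bits)]
        buckets = defaultdict(list)
        for obj, vec in vectors.items():
            # Signature bit = which side of the hyperplane the vector falls on.
            sig = tuple(sum(c * p[key] for key, c in vec.items()) >= 0 for p in planes)
            buckets[sig].append(obj)
        for members in buckets.values():
            candidates.update(combinations(sorted(members), 2))
    return candidates


def top_k_join(vectors, k, candidates=None):
    """Return the k highest-scoring (score, a, b) pairs; all pairs if no candidates given."""
    if candidates is None:
        candidates = combinations(sorted(vectors), 2)
    scored = ((cosine(vectors[a], vectors[b]), a, b) for a, b in candidates)
    return heapq.nlargest(k, scored)


# Toy HIN with the join path author -> paper -> venue.
relations = {
    ("author", "paper"): {"ann": ["p1", "p2"], "bob": ["p3"], "cat": ["p4"], "dan": ["p5"]},
    ("paper", "venue"): {"p1": ["KDD"], "p2": ["ICDE"], "p3": ["KDD"],
                         "p4": ["ICDE"], "p5": ["VLDB"]},
}
vecs = path_count_vectors(relations, ["author", "paper", "venue"])
print(top_k_join(vecs, 2))                        # exact top-2
print(top_k_join(vecs, 2, lsh_candidates(vecs)))  # approximate, LSH-pruned
```

With these toy data, `ann` shares one venue with `bob` and one with `cat`, so both pairs score 1/sqrt(2) (about 0.707) and head the exact top-2 list. The LSH variant is approximate and may miss true pairs; raising `n_tables` admits more candidate pairs and so tends to raise recall, while raising `n_bits` makes buckets more selective.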
Pages: 1710-1723
Number of pages: 14
Related papers
50 in total
  • [31] Scalable top-k query on information networks with hierarchical inheritance relations
    Wu, Fubao
    Gao, Lixin
    [J]. Distributed and Parallel Databases, 2024, 42 : 1 - 30
  • [32] Exploratory product search using top-k join queries
    Gkorgkas, Orestis
    Vlachou, Akrivi
    Doulkeridis, Christos
    Norvag, Kjetil
    [J]. INFORMATION SYSTEMS, 2017, 64 : 75 - 92
  • [33] Using semantic links to support top-K join queries in peer-to-peer networks
    Liu, Jie
    Feng, Liang
    Zhuge, Hai
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2007, 19 (15) : 2031 - 2046
  • [34] Holistic Top-k Simple Shortest Path Join in Graphs
    Gao, Jun
    Yu, Jeffrey Xu
    Qiu, Huida
    Jiang, Xiao
    Wang, Tengjiao
    Yang, Dongqing
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (04) : 665 - 677
  • [35] Distributed Top-K Join Queries Optimizing for RDF Datasets
    Gu, Jinguang
    Dong, Hao
    Liu, Zhao
    Xu, Fangfang
    [J]. INTERNATIONAL JOURNAL OF WEB SERVICES RESEARCH, 2017, 14 (03) : 67 - 83
  • [36] TJJE: An efficient algorithm for top-k join on massive data
    Han, Xixian
    Li, Jianzhong
    Wang, Jinbao
    Yang, Donghua
    [J]. INFORMATION SCIENCES, 2013, 222 : 362 - 383
  • [37] Approximate Top-k Inner Product Join with a Proximity Graph
    Nakama, Hayato
    Amagata, Daichi
    Hara, Takahiro
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021 : 4468 - 4471
  • [38] Top-k Spatio-textual Similarity Search
    Liu, Sitong
    Chu, Yaping
    Hu, Huiqi
    Feng, Jianhua
    Zhu, Xuan
    [J]. WEB-AGE INFORMATION MANAGEMENT, WAIM 2014, 2014, 8485 : 602 - 614
  • [39] Indexing and progressive top-k similarity retrieval of trajectories
    Pliakis, Nikolaos
    Tiakas, Eleftherios
    Manolopoulos, Yannis
    [J]. World Wide Web, 2021, 24 : 51 - 83
  • [40] On Perspective-Aware Top-k Similarity Search in Multi-relational Networks
    Zhang, Yinglong
    Li, Cuiping
    Chen, Hong
    Sheng, Likun
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2014, PT II, 2014, 8422 : 171 - 187