I/O-efficient algorithms for top-k nearest keyword search in massive graphs

Cited by: 4
Authors
Zhu, Qiankun [1]
Cheng, Hong [1]
Huang, Xin [2]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
[2] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
Source
VLDB JOURNAL | 2017 / Vol. 26 / Issue 04
Keywords
I/O-efficient algorithms; Nearest keywords search; Top-k; Massive graphs;
DOI
10.1007/s00778-017-0464-7
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Networks emerging nowadays usually have labels or textual content on the nodes. We model such a network as an undirected graph G, in which each node is attached with zero or more keywords and each edge is assigned a length. On such networks, a novel and useful query is the top-k nearest keyword (k-NK) search. Given a query node q in G and a keyword lambda, a k-NK query searches for the k nodes that contain lambda and are nearest to q. The k-NK problem has been studied recently in the literature, but most existing solutions assume that the graph and the constructed index fit entirely in memory. As a result, they cannot be applied directly to the very large-scale networks commonly found in practice, which do not fit in memory. In this work, we design an I/O-efficient solution, which uses a compact disk index to answer a k-NK query with constant I/Os. The key to an accurate k-NK result is a precise shortest distance estimation in a graph. In our solution, we follow our previous work Qiao et al. (PVLDB 6: 901-912, 2013), which uses the shortest path tree as an approximate representation of a graph and uses the tree distance between two nodes as an accurate estimation of the shortest distance between them on a graph. With such a representation, the original k-NK query on a graph can be reduced to answering the query on a set of trees and then assembling the results obtained from the trees. We exploit a compact tree-based index and study how to lay out the index on disk. We design a novel technique which decomposes the index tree into paths and subtrees and stores them on disk. Our theoretical analysis shows that the disk-based index is small in size and supports constant query I/Os. An extensive experimental study on massive trees and graphs with billions of edges and keywords verifies our theoretical findings and demonstrates the superiority of our method over the state-of-the-art methods in the literature.
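To make the k-NK query definition in the abstract concrete, the following is a minimal in-memory sketch that answers a query by running Dijkstra's algorithm from q and collecting the first k settled nodes that carry the keyword. It is not the disk-based index of the paper, and it is exactly the kind of fit-in-memory baseline the abstract says does not scale. The names `undirected` and `k_nk_baseline`, the adjacency-list layout, and the choice to let q itself count as a match are all assumptions made for illustration.

```python
import heapq
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical layout: node -> list of (neighbor, edge length).
Graph = Dict[str, List[Tuple[str, float]]]


def undirected(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build an undirected adjacency list from (u, v, length) triples."""
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def k_nk_baseline(
    graph: Graph,
    keywords: Dict[str, Set[str]],
    q: str,
    lam: str,
    k: int,
) -> List[Tuple[float, str]]:
    """Return up to k (distance, node) pairs, nearest first, whose node has lam.

    Assumption: q itself is reported if it carries the keyword.
    """
    dist = {q: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, q)]
    settled: Set[str] = set()
    result: List[Tuple[float, str]] = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        # Nodes are settled in nondecreasing distance, so matches arrive sorted.
        if lam in keywords.get(u, set()):
            result.append((d, u))
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result


if __name__ == "__main__":
    g = undirected([("a", "b", 1), ("a", "c", 4), ("b", "c", 2),
                    ("c", "d", 1), ("b", "e", 7)])
    kw = {"b": {"cafe"}, "c": {"park"}, "d": {"cafe"}, "e": {"cafe"}}
    print(k_nk_baseline(g, kw, "a", "cafe", 2))
```

The search stops as soon as k matching nodes are settled, but in the worst case it still touches the whole graph, which is the cost a precomputed index is meant to avoid.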
Pages: 563-583
Page count: 21
Related papers
50 in total
  • [1] I/O-efficient algorithms for top-k nearest keyword search in massive graphs
    Qiankun Zhu
    Hong Cheng
    Xin Huang
    [J]. The VLDB Journal, 2017, 26 : 563 - 583
  • [2] Top-K Nearest Keyword Search on Large Graphs
    Qiao, Miao
    Qin, Lu
    Cheng, Hong
    Yu, Jeffrey Xu
    Tian, Wentao
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (10): : 901 - 912
  • [3] A Distributed Index for Efficient Parallel Top-k Keyword Search on Massive Graphs
    Zhong, Ming
    Liu, Mengchi
    [J]. PROCEEDINGS OF THE TWELFTH INTERNATIONAL WORKSHOP ON WEB INFORMATION AND DATA MANAGEMENT, 2012, : 27 - 32
  • [4] Efficient Top-k Keyword Search in Graphs with Polynomial Delay
    Kargar, Mehdi
    An, Aijun
    [J]. 2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 1269 - 1272
  • [5] Privacy-Preserving Top-k Nearest Keyword Search on Outsourced Graphs
    Teng, Yiping
    Cheng, Xiang
    Su, Sen
    Bi, Rong
    [J]. 2016 IEEE TRUSTCOM/BIGDATASE/ISPA, 2016, : 815 - 822
  • [6] Exact Top-k Nearest Keyword Search in Large Networks
    Jiang, Minhao
    Fu, Ada Wai-Chee
    Wong, Raymond Chi-Wing
    [J]. SIGMOD'15: PROCEEDINGS OF THE 2015 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2015, : 393 - 404
  • [7] Top-k Nearest Keyword Search in Public Transportation Networks
    Huang, Wuwei
    Dai, Genan
    Ge, Youming
    Liu, Yubao
    [J]. 2019 15TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG 2019), 2019, : 67 - 74
  • [8] Efficient Top-k Keyword Search on XML Streams
    Li, Lingli
    Wang, Hongzhi
    Li, Jianzhong
    Luo, Jizhou
    [J]. PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE FOR YOUNG COMPUTER SCIENTISTS, VOLS 1-5, 2008, : 1041 - 1046
  • [9] Top-k Keyword Search Over Graphs Based On Backward Search
    Zeng, Jia-Hui
    Huang, Jiu-Ming
    Yang, Shu-Qiang
    [J]. 4TH ANNUAL INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND APPLICATIONS (ITA 2017), 2017, 12
  • [10] I/O-efficient algorithms for sparse graphs
    Toma, L
    Zeh, N
    [J]. ALGORITHMS FOR MEMORY HIERARCHIES: ADVANCED LECTURES, 2003, 2625 : 85 - 109