I/O-efficient algorithms for top-k nearest keyword search in massive graphs

Cited by: 4
Authors
Zhu, Qiankun [1]
Cheng, Hong [1]
Huang, Xin [2]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
[2] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
Source
VLDB JOURNAL | 2017 / Vol. 26 / No. 4
Keywords
I/O-efficient algorithms; Nearest keywords search; Top-k; Massive graphs;
DOI
10.1007/s00778-017-0464-7
CLC classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Many networks today carry labels or textual content on their nodes. We model such a network as an undirected graph G in which each node carries zero or more keywords and each edge is assigned a length. On such networks, a novel and useful query is top-k nearest keyword (k-NK) search: given a query node q in G and a keyword λ, a k-NK query finds the k nodes that contain λ and are nearest to q. The k-NK problem has been studied recently in the literature, but most existing solutions assume that both the graph and the constructed index fit entirely in memory. As a result, they cannot be applied directly to the very large networks that are common in practice and do not fit in memory. In this work, we design an I/O-efficient solution that uses a compact disk-based index to answer a k-NK query with a constant number of I/Os. The key to an accurate k-NK result is a precise estimate of the shortest distance between nodes in the graph. Following our previous work, Qiao et al. (PVLDB 6: 901-912, 2013), we use shortest path trees as an approximate representation of the graph and use the tree distance between two nodes as an accurate estimate of their shortest distance in the graph. With this representation, a k-NK query on the graph reduces to answering the query on a set of trees and then assembling the results obtained from those trees. We exploit a compact tree-based index and study how to lay it out on disk. We design a novel technique that decomposes the index tree into paths and subtrees and stores them on disk. Our theoretical analysis shows that the disk-based index is small and supports a constant number of I/Os per query. An extensive experimental study on massive trees and graphs with billions of edges and keywords verifies our theoretical findings and demonstrates the superiority of our method over state-of-the-art methods in the literature.
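The reduction from a graph query to a set of tree queries can be illustrated with a small in-memory sketch in Python. It assumes a connected graph with non-negative edge lengths, builds one shortest path tree per chosen root with Dijkstra's algorithm, and estimates the distance between two nodes u and v as their tree distance d_T(u) + d_T(v) − 2·d_T(lca(u, v)). As an assumption about how the per-tree results are assembled, it takes the minimum estimate over all trees before picking the k nearest nodes that contain λ. The helper names (`build_graph`, `shortest_path_tree`, `tree_distance`, `knk_query`) and the arbitrary choice of roots are hypothetical; the sketch scans every keyword node and does not reproduce the paper's compact index, path/subtree decomposition, or disk layout.

```python
import heapq
from collections import defaultdict

def build_graph(edges):
    """Undirected weighted graph as an adjacency list: node -> [(neighbor, length)]."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj

def shortest_path_tree(adj, root):
    """Dijkstra from root; returns (dist, parent) describing a shortest path tree."""
    dist, parent = {root: 0.0}, {root: None}
    heap, done = [(0.0, root)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in adj[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, parent

def tree_distance(dist, parent, u, v):
    """Path length between u and v inside one tree, via their lowest common ancestor."""
    ancestors, x = set(), u
    while x is not None:  # collect u and all of its ancestors up to the root
        ancestors.add(x)
        x = parent[x]
    y = v
    while y not in ancestors:  # climb from v until we hit an ancestor of u
        y = parent[y]
    return dist[u] + dist[v] - 2 * dist[y]

def knk_query(trees, keywords, q, lam, k):
    """Hypothetical k-NK: k nodes containing lam with the smallest min-over-trees estimate."""
    candidates = [v for v, kws in keywords.items() if lam in kws]
    scored = [(min(tree_distance(d, p, q, v) for d, p in trees), v) for v in candidates]
    return heapq.nsmallest(k, scored)

edges = [(0, 1, 1), (1, 2, 2), (0, 3, 4), (3, 4, 1), (2, 4, 1), (4, 5, 3)]
keywords = {1: {"bank"}, 2: {"cafe"}, 3: {"cafe"}, 5: {"cafe"}}
adj = build_graph(edges)
trees = [shortest_path_tree(adj, r) for r in (3, 5)]  # roots chosen arbitrarily
print(knk_query(trees, keywords, q=0, lam="cafe", k=2))  # [(3.0, 2), (4.0, 3)]
```

Because a tree distance can never be shorter than the true graph distance, each estimate is an upper bound. With only the tree rooted at node 3, the same query returns [(4.0, 3), (6.0, 2)] and misranks node 2, whose true distance from q is 3; adding the tree rooted at node 5 recovers the exact top-2 in this toy graph.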
Pages: 563-583
Number of pages: 21
Related papers
50 in total
  • [41] Efficient Top-k Search for PageRank
    Fujiwara, Yasuhiro
    Nakatsuji, Makoto
    Shiokawa, Hiroaki
    Mishima, Takeshi
    Onizuka, Makoto
    [J]. Transactions of the Japanese Society for Artificial Intelligence, 2015, 30 (02) : 473 - 478
  • [42] Top-k Reliability Search on Uncertain Graphs
    Zhu, Rong
    Zou, Zhaonian
    Li, Jianzhong
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2015, : 659 - 668
  • [43] Diversified Top-k Keyword Query Interpretation on Knowledge Graphs
    Wang, Ying
    Zhong, Ming
    Zhu, Yuanyuan
    Li, Xuhui
    Qian, Tieyun
    [J]. WEB AND BIG DATA, APWEB-WAIM 2017, PT I, 2017, 10366 : 541 - 555
  • [44] Finding top-k r-cliques for keyword search from graphs in polynomial delay
    Kargar, Mehdi
    An, Aijun
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2015, 43 (02) : 249 - 280
  • [45] Multiway Simple Cycle Separators and I/O-Efficient Algorithms for Planar Graphs
    Arge, Lars
    van Walderveen, Freek
    Zeh, Norbert
    [J]. PROCEEDINGS OF THE TWENTY-FOURTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS (SODA 2013), 2013, : 901 - 918
  • [46] Social-Aware Top-k Spatial Keyword Search
    Wu, Dingming
    Li, Yafei
    Choi, Byron
    Xu, Jianliang
    [J]. 2014 IEEE 15TH INTERNATIONAL CONFERENCE ON MOBILE DATA MANAGEMENT (MDM), VOL 1, 2014, : 235 - 244
  • [47] Continuous top-k spatial–keyword search on dynamic objects
    Dong, Yuyang
    Xiao, Chuan
    Chen, Hanxiong
    Yu, Jeffrey Xu
    Takeoka, Kunihiro
    Oyamada, Masafumi
    Kitagawa, Hiroyuki
    [J]. The VLDB Journal, 2021, 30 : 141 - 161
  • [48] Efficient Top-k Retrieval on Massive Data
    Han, Xixian
    Li, Jianzhong
    Gao, Hong
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (10) : 2687 - 2699
  • [49] Scalable continual top-k keyword search in relational databases
    Xu, Yanwei
    Guan, Jihong
    Li, Fengrong
    Zhou, Shuigeng
    [J]. DATA & KNOWLEDGE ENGINEERING, 2013, 86 : 206 - 223
  • [50] Top-k aggregation keyword search over relational databases
    Lin, Z.
    [J]. Science Press (51)