Efficient top-k similarity join processing over multi-valued objects

Cited by: 0
Authors
Wenjie Zhang
Liming Zhan
Ying Zhang
Muhammad Aamir Cheema
Xuemin Lin
Affiliations
[1] University of New South Wales, School of Computer Science & Engineering
Source
World Wide Web | 2014 / Vol. 17
Keywords
Query processing; Joins; Multi-valued objects;
DOI
Not available
Abstract
Top-k similarity joins have been extensively studied and are used in a wide spectrum of applications such as information retrieval, decision making, spatial data analysis and data mining. Given two sets of objects $\mathcal U$ and $\mathcal V$, a top-k similarity join returns the k most similar pairs of objects from $\mathcal U \times \mathcal V$. In the conventional model of top-k similarity join processing, an object is usually regarded as a point in a multi-dimensional space and similarity is measured by a simple distance metric such as Euclidean distance. However, in many applications an object may be described by multiple values (instances), and the conventional model is not applicable since it does not address the distributions of object instances. In this paper, we study top-k similarity joins over multi-valued objects. We apply two types of quantile-based distance measures, ϕ-quantile distance and ϕ-quantile group-base distance, to explore the relative instance distribution among the multiple instances of objects. Efficient and effective techniques to process top-k similarity joins over multi-valued objects are developed following a filtering-refinement framework. Novel distance-, statistic- and weight-based pruning techniques are proposed.
Comprehensive experiments on both real and synthetic datasets demonstrate the efficiency and effectiveness of our techniques.
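To make the problem statement concrete, the following is a minimal brute-force sketch of a top-k join over multi-valued objects. It is not the filtering-refinement method of the paper, and the abstract does not define the ϕ-quantile distance; the definition used here (the smallest pairwise instance distance that covers at least a ϕ fraction of all instance pairs, with uniform instance weights) is an assumption made for illustration. The names `quantile_distance` and `top_k_join` are hypothetical.

```python
import heapq
import math
from itertools import product


def quantile_distance(a, b, phi):
    """Assumed phi-quantile distance between two multi-valued objects.

    `a` and `b` are lists of instances (points). With uniform weights, this
    returns the smallest pairwise Euclidean distance d such that at least a
    phi fraction of all instance pairs lie within d.
    """
    dists = sorted(math.dist(p, q) for p, q in product(a, b))
    idx = max(0, math.ceil(phi * len(dists)) - 1)
    return dists[idx]


def top_k_join(U, V, k, phi=0.5):
    """Brute-force baseline: score every pair in U x V, keep the k smallest.

    Returns a list of (distance, index_in_U, index_in_V) tuples in
    ascending order of distance.
    """
    scored = (
        (quantile_distance(u, v, phi), i, j)
        for (i, u), (j, v) in product(enumerate(U), enumerate(V))
    )
    return heapq.nsmallest(k, scored)


# Two small sets of multi-valued objects, each object having two 2-D instances.
U = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
V = [[(0, 0), (1, 0)], [(10, 10), (11, 10)]]
result = top_k_join(U, V, k=2, phi=0.5)
```

This baseline evaluates every object pair and every instance pair, which is the cost that pruning techniques such as those named in the abstract aim to avoid.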
Pages: 285-309
Page count: 24
Related papers
50 in total
  • [1] Efficient top-k similarity join processing over multi-valued objects
    Zhang, Wenjie
    Zhan, Liming
    Zhang, Ying
    Cheema, Muhammad Aamir
    Lin, Xuemin
    [J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2014, 17 (03): 285-309
  • [2] Top-k Tree Similarity Join
    Wang, Jianhua
    Yang, Jianye
    Zhang, Wenjie
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021: 1939-1948
  • [3] Efficient Top-k Join Processing over Encrypted Data in a Cloud Environment
    Kim, Jong Wook
    [J]. KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2016, 10 (10): 5153-5170
  • [4] Efficient Top-K SimRank-based Similarity Join
    Tao, Wenbo
    [J]. SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014: 1603-1604
  • [5] Efficient Top-K SimRank-based Similarity Join
    Tao, Wenbo
    Yu, Minghe
    Li, Guoliang
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 8 (03): 317-328
  • [6] Processing Top-k Join Queries
    Wu, Minji
    Berti-Equille, Laure
    Marian, Amelie
    Procopiuc, Cecilia M.
    Srivastava, Divesh
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2010, 3 (01): 860-870
  • [7] Fast top-k similarity join for SimRank
    Li, Ruiqi
    Zhao, Xiang
    Shang, Haichuan
    Chen, Yifan
    Xiao, Weidong
    [J]. INFORMATION SCIENCES, 2017, 381: 1-19
  • [8] Efficient Top-k Similarity Join of Massive Time Series Using MapReduce
    Chen, Dehua
    Shen, Changgan
    Li, Yue
    Le, Jiajin
    Rong, Chunming
    [J]. JOURNAL OF INTERNET TECHNOLOGY, 2014, 15 (06): 1025-1032
  • [9] Top-k Similarity Join in Heterogeneous Information Networks
    Xiong, Yun
    Zhu, Yangyong
    Yu, Philip S.
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (06): 1710-1723
  • [10] Top-k Spatio-Textual Similarity Join
    Hu, Huiqi
    Li, Guoliang
    Bao, Zhifeng
    Feng, Jianhua
    Wu, Yongwei
    Gong, Zhiguo
    Xu, Yaoqiang
    [J]. 2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016: 1576-1577