Comparing top-k XML lists

Cited by: 2
Authors
Varadarajan, Ramakrishna [1]
Farfan, Fernando [2]
Hristidis, Vagelis [3]
Affiliations
[1] Hewlett Packard Corp, Billerica, MA 01821 USA
[2] Univ Michigan, Dept Comp Sci & Engn, Ann Arbor, MI 48109 USA
[3] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA USA
Funding
National Science Foundation (USA)
Keywords
Total mapping; Partial mapping; Similarity distance; Position distance; Tree edit distance; Documents
DOI
10.1016/j.is.2013.01.002
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Systems that produce ranked lists of results are abundant. For instance, Web search engines return ranked lists of Web pages. There has been work on distance measures for list permutations, such as Kendall tau and Spearman's footrule, as well as extensions to handle top-k lists, which are more common in practice. In addition to ranking whole objects (e.g., Web pages), an increasing number of systems provide keyword search on XML or other semistructured data and produce ranked lists of XML sub-trees. Unfortunately, previous distance measures are not suitable for ranked lists of sub-trees, since they do not account for the possible overlap between the returned sub-trees. That is, two sub-trees differing by a single node would be considered separate objects. In this paper, we present the first distance measures for ranked lists of sub-trees, and show under what conditions these measures are metrics. Furthermore, we present algorithms to efficiently compute these distance measures. Finally, we evaluate and compare the proposed measures on real data using three popular XML keyword proximity search systems. © 2013 Elsevier Ltd. All rights reserved.
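As background for the two classical measures the abstract names, the following is a minimal Python sketch of Spearman's footrule and Kendall tau for two full permutations of the same items. It is not the paper's method: the function names and the sample item labels are assumptions made for illustration, and the sketch covers neither the top-k extensions nor the sub-tree overlap that the paper addresses.

```python
from itertools import combinations


def spearman_footrule(a, b):
    """Sum of absolute position differences between two permutations of the same items."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(a))


def kendall_tau(a, b):
    """Number of item pairs that the two permutations place in opposite order."""
    pos_b = {item: i for i, item in enumerate(b)}
    # combinations(a, 2) yields (x, y) with x ranked before y in a;
    # the pair is discordant when b ranks y before x.
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


# Hypothetical ranked lists of whole objects (for example, Web pages).
list_a = ["p1", "p2", "p3", "p4"]
list_b = ["p2", "p1", "p4", "p3"]

footrule = spearman_footrule(list_a, list_b)
tau = kendall_tau(list_a, list_b)
print(footrule, tau)
```

Both functions compare items by exact identity, so two XML sub-trees that differ by a single node would be treated as unrelated objects, which is the limitation the abstract points out.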
Pages: 820-834
Page count: 15