Crowdsourced Top-k Algorithms: An Experimental Evaluation

Times cited: 31
Authors
Zhang, Xiaohang [1]
Li, Guoliang [1]
Feng, Jianhua [1]
Affiliation
[1] Tsinghua Univ, Dept Comp Sci, Tsinghua Natl Lab Informat Sci & Technol TNList, Beijing, Peoples R China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2016 / Vol. 9 / No. 8
DOI
10.14778/2921558.2921559
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Crowdsourced top-k computation has attracted significant attention recently, thanks to emerging crowdsourcing platforms such as Amazon Mechanical Turk and CrowdFlower. Crowdsourced top-k algorithms ask the crowd to compare objects and infer the top-k objects from the crowdsourced comparison results. The crowd may return incorrect answers, but traditional top-k algorithms cannot tolerate such errors. To address this problem, the database and machine-learning communities have independently studied the crowdsourced top-k problem: the database community has proposed heuristic-based solutions, while the machine-learning community has proposed learning-based methods (e.g., maximum likelihood estimation). However, these two types of techniques have not been compared systematically under the same experimental framework, so it is difficult for a practitioner to decide which algorithm to adopt. Furthermore, the experimental evaluations in existing studies have several weaknesses: some methods assume the crowd returns high-quality results, and some algorithms are tested only in simulated experiments. To alleviate these limitations, in this paper we present a comprehensive comparison of crowdsourced top-k algorithms. Using various synthetic and real datasets, we evaluate each algorithm in terms of result quality and efficiency on real crowdsourcing platforms. We reveal the characteristics of different techniques and provide guidelines on selecting appropriate algorithms for various scenarios.
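The abstract contrasts heuristic-based and learning-based ways of inferring the top-k objects from noisy crowd comparisons. As a rough illustration only, the Python sketch below implements one simple representative of each family; neither is claimed to be an algorithm from the paper or from its evaluation. It assumes each crowd answer is recorded as a (winner, loser) pair, and the function names `topk_by_wins` and `topk_by_bradley_terry`, the smoothing parameter `prior`, and the sample `votes` are hypothetical. The heuristic ranks objects by the number of comparisons they won; the learning-based version fits Bradley-Terry scores by maximum likelihood (on smoothed counts) using Hunter's MM iteration.

```python
from collections import Counter, defaultdict


def topk_by_wins(votes, k):
    """Heuristic top-k: rank objects by how many crowd comparisons they won.

    `votes` is a list of (winner, loser) pairs, one per crowd answer (assumed format).
    """
    wins = Counter()
    objects = set()
    for winner, loser in votes:
        if winner == loser:
            continue  # ignore degenerate self-comparisons
        wins[winner] += 1
        objects.update((winner, loser))
    # Most wins first; ties broken by object id so the output is deterministic.
    ranked = sorted(objects, key=lambda o: (-wins[o], o))
    return ranked[:k]


def topk_by_bradley_terry(votes, k, iters=200, prior=0.5):
    """Learning-based top-k: fit Bradley-Terry scores by maximum likelihood.

    Uses Hunter's MM (minorize-maximize) update
        w_i <- W_i / sum_j n_ij / (w_i + w_j)
    where W_i counts i's wins and n_ij counts comparisons between i and j.
    `prior` (a hypothetical smoothing choice) adds that many virtual wins in
    each direction for every compared pair, so every score stays positive.
    """
    wins = Counter()
    n = defaultdict(Counter)  # n[i][j]: number of i-vs-j comparisons (symmetric)
    for winner, loser in votes:
        if winner == loser:
            continue
        wins[winner] += 1
        n[winner][loser] += 1
        n[loser][winner] += 1
    objects = sorted(n)
    # Smoothed win totals: each distinct opponent contributes `prior` virtual wins.
    W = {i: wins[i] + prior * len(n[i]) for i in objects}
    w = {i: 1.0 for i in objects}
    for _ in range(iters):
        new_w = {}
        for i in objects:
            # Each pair's smoothed count is the real count plus 2 * prior virtual games.
            denom = sum((c + 2 * prior) / (w[i] + w[j]) for j, c in n[i].items())
            new_w[i] = W[i] / denom
        total = sum(new_w.values())
        w = {i: v / total for i, v in new_w.items()}  # fix the arbitrary scale
    ranked = sorted(objects, key=lambda o: (-w[o], o))
    return ranked[:k]


# Hypothetical crowd answers, each recorded as (winner, loser).
votes = [
    ("a", "b"), ("a", "b"), ("a", "b"), ("b", "a"),
    ("a", "c"), ("a", "c"), ("b", "c"), ("b", "c"), ("c", "b"),
    ("a", "d"), ("b", "d"), ("c", "d"),
]
print(topk_by_wins(votes, 2))  # -> ['a', 'b']
print(topk_by_bradley_terry(votes, 2))
```

On this small example both methods agree. They can disagree when objects face opponents of very different strength, because win counting treats every win equally while the Bradley-Terry fit credits a win according to the opponent's estimated score.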
Pages: 612-623
Number of pages: 12
Related papers (50 total)
  • [41] Evaluating Top-k Algorithms with Various Sources of Data and User Preferences
    Eckhardt, Alan
    Hornicak, Erik
    Vojtas, Peter
    [J]. FLEXIBLE QUERY ANSWERING SYSTEMS, 2011, 7022 : 258 - 269
  • [42] Optimal algorithms for selecting top-k combinations of attributes: theory and applications
    Lin, Chunbin
    Lu, Jiaheng
    Wei, Zhewei
    Wang, Jianguo
    Xiao, Xiaokui
    [J]. VLDB JOURNAL, 2018, 27 (01): 27 - 52
  • [43] Top-k overlapping densest subgraphs: approximation algorithms and computational complexity
    Dondi, Riccardo
    Hosseinzadeh, Mohammad Mehdi
    Mauri, Giancarlo
    Zoppis, Italo
    [J]. JOURNAL OF COMBINATORIAL OPTIMIZATION, 2021, 41 (01) : 80 - 104
  • [46] Efficient Algorithms for Skyline Top-K Keyword Queries on XML Streams
    Li, Lingli
    Wang, Hongzhi
    Li, Jianzhong
    Gao, Hong
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2009, 5463 : 283 - 287
  • [48] Efficient Top-k Query Processing Algorithms in Highly Distributed Environments
    Fang, Qiming
    Yang, Guangwen
    [J]. JOURNAL OF COMPUTERS, 2014, 9 (09) : 2000 - 2006
  • [49] Anytime measures for top-k algorithms on exact and fuzzy data sets
    Arai, Benjamin
    Das, Gautam
    Gunopulos, Dimitrios
    Koudas, Nick
    [J]. VLDB JOURNAL, 2009, 18 (02): 407 - 427
  • [50] Top-K Oracle: A New Way to Present Top-K Tuples for Uncertain Data
    Song, Chunyao
    Li, Zheng
    Ge, Tingjian
    [J]. 2013 IEEE 29TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2013: 146 - 157