Human-supervised clustering of multidimensional data using crowdsourcing

Cited by: 1
Authors
Butyaev, Alexander [1]
Drogaris, Chrisostomos [1]
Tremblay-Savard, Olivier [2]
Waldispuehl, Jerome [1]
Affiliations
[1] McGill Univ, Sch Comp Sci, Montreal, PQ, Canada
[2] Univ Manitoba, Dept Comp Sci, Winnipeg, Manitoba, Canada
Source
ROYAL SOCIETY OPEN SCIENCE | 2022 / Vol. 9 / Issue 05
Funding
Canadian Institutes of Health Research;
Keywords
data clustering; human-computing; crowdsourcing; games; MECHANICAL TURK;
DOI
10.1098/rsos.211189
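Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code
07; 0710; 09;
Abstract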
Clustering is a central task in many data analysis applications. However, there is no universally accepted metric to decide the occurrence of clusters. Ultimately, we have to resort to a consensus between experts. The problem is amplified with high-dimensional datasets where classical distances become uninformative and the ability of humans to fully apprehend the distribution of the data is challenged. In this paper, we design a mobile human-computing game as a tool to query human perception for the multidimensional data clustering problem. We propose two clustering algorithms that partially or entirely rely on aggregated human answers and report the results of two experiments conducted on synthetic and real-world datasets. We show that our methods perform on par with or better than the most popular automated clustering algorithms. Our results suggest that hybrid systems leveraging annotations of partial datasets collected through crowdsourcing platforms can be an efficient strategy to capture the collective wisdom for solving abstract computational problems.
Pages: 15
Related papers
50 in total
  • [31] Visualization of high-dimensional data using an association of multidimensional scaling to clustering
    Naud, A
    2004 IEEE CONFERENCE ON CYBERNETICS AND INTELLIGENT SYSTEMS, VOLS 1 AND 2, 2004, : 252 - 255
  • [32] Supervised clustering of high-dimensional data using regularized mixture modeling
    Chang, Wennan
    Wan, Changlin
    Zang, Yong
    Zhang, Chi
    Cao, Sha
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (04)
  • [33] Data-Oriented Maintenance of Clinical Pathway using Clustering and Multidimensional Scaling
    Tsumoto, Shusaku
    Hirano, Shoji
    Iwata, Haruko
    PROCEEDINGS 2012 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2012, : 2596 - 2600
  • [34] An Approach for Classification of Network Traffic on Semi-Supervised Data using Clustering Techniques
    Shukla, Dheeraj Basant
    Chandel, Gajendra Singh
    2013 4TH NIRMA UNIVERSITY INTERNATIONAL CONFERENCE ON ENGINEERING (NUICONE 2013), 2013,
  • [35] On clustering biological data using unsupervised and semi-supervised message passing
    Geng, HM
    Deng, XT
    Bastola, M
    Ali, H
    BIBE 2005: 5TH IEEE SYMPOSIUM ON BIOINFORMATICS AND BIOENGINEERING, 2005, : 294 - 298
  • [36] A Simulation-Based Framework for Generating Alerts for Human-Supervised Multi-Robot Teams in Challenging Environments
    Al-Hussaini, Sarah
    Gregory, Jason M.
    Dhanaraj, Neel
    Gupta, Satyandra K.
    2021 IEEE INTERNATIONAL SYMPOSIUM ON SAFETY, SECURITY, AND RESCUE ROBOTICS (SSRR), 2021, : 168 - 175
  • [37] Visualization, clustering and classification of multidimensional astronomical data
    Staiano, A
    Ciaramella, A
    De Vinco, L
    Donalek, C
    Longo, G
    Raiconi, G
    Tagliaferri, R
    Amato, R
    Del Mondo, C
    Mangano, G
    Miele, G
    CAMP 2005: SEVENTH INTERNATIONAL WORKSHOP ON COMPUTER ARCHITECTURE FOR MACHINE PERCEPTION, PROCEEDINGS, 2005, : 141 - 146
  • [38] Simultaneous Registration and Clustering for Multidimensional Functional Data
    Zeng, Pengcheng
    Shi, Jian Qing
    Kim, Won-Seok
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2019, 28 (04) : 943 - 953
  • [39] Multidimensional visualization and clustering of historical process data
    Thornhill, Nina F.
    Melbo, Hallgeir
    Wiik, Jan
    INDUSTRIAL & ENGINEERING CHEMISTRY RESEARCH, 2006, 45 (17) : 5971 - 5985
  • [40] Combining supervised and unsupervised learning for data clustering
    Paolo Corsini
    Beatrice Lazzerini
    Francesco Marcelloni
    Neural Computing & Applications, 2006, 15 : 289 - 297