Semi-Supervised Consensus Clustering: Reducing Human Effort

Cited by: 1
Authors
Vogel, Tobias [1]
Naumann, Felix [1]
Affiliation
[1] Hasso Plattner Institute, Potsdam, Germany
DOI
10.1109/ICDMW.2014.97
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Machine-based clustering yields fuzzy results. For example, when detecting duplicates in a dataset, different tools might produce different clusterings. Eventually, a decision must be made about which records belong to the same cluster, i.e., are duplicates. Such a definitive result is called a consensus clustering; it can be created by comparing the clustering attempts against each other and having human experts resolve only the disagreements. Yet different consensus clusterings are possible, depending on which disagreements are presented to the human expert, and they may require different numbers of manual inspections. We present a set of strategies for selecting the smallest set of manual inspections needed to arrive at a consensus clustering and evaluate their efficiency on real-world and synthetic datasets.
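The abstract does not spell out the selection strategies themselves. As a rough illustration of the general idea only (not the authors' method), the following minimal Python sketch merges records that every input clustering places together, records pairs that every clustering separates as non-duplicates, and consults a human expert only for disputed pairs whose answer cannot already be inferred transitively from earlier decisions. The function names, the `ask_expert` callback standing in for the human, and the vote-count ordering of disputed pairs are all hypothetical assumptions.

```python
from itertools import combinations


class UnionFind:
    """Disjoint-set forest tracking which records are known duplicates."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def consensus_clustering(records, clusterings, ask_expert):
    """Hypothetical sketch: combine several clusterings into one consensus.

    clusterings: list of dicts mapping record -> cluster label.
    ask_expert: hypothetical callable (a, b) -> bool standing in for the
    human expert; True means a and b are duplicates.
    Returns (clusters as sorted lists, number of expert questions asked).
    """
    uf = UnionFind(records)
    cannot_link = []  # pairs known to lie in different clusters
    disputed = []     # (votes_for_same, a, b) for pairs the tools disagree on

    for a, b in combinations(records, 2):
        votes = sum(c[a] == c[b] for c in clusterings)
        if votes == len(clusterings):
            uf.union(a, b)              # unanimous: duplicates
        elif votes == 0:
            cannot_link.append((a, b))  # unanimous: not duplicates
        else:
            disputed.append((votes, a, b))

    def known_different(a, b):
        # a and b are separated if some cannot-link pair joins their components.
        roots = {uf.find(a), uf.find(b)}
        return any({uf.find(x), uf.find(y)} == roots for x, y in cannot_link)

    # Hypothetical ordering heuristic: pairs with the most "duplicate" votes first.
    disputed.sort(key=lambda t: -t[0])
    questions = 0
    for _, a, b in disputed:
        if uf.find(a) == uf.find(b) or known_different(a, b):
            continue  # answer already follows by transitivity
        questions += 1
        if ask_expert(a, b):
            uf.union(a, b)
        else:
            cannot_link.append((a, b))

    clusters = {}
    for r in records:
        clusters.setdefault(uf.find(r), []).append(r)
    return sorted(sorted(c) for c in clusters.values()), questions


if __name__ == "__main__":
    records = ["a", "b", "c", "d"]
    tool1 = {"a": 1, "b": 1, "c": 1, "d": 2}
    tool2 = {"a": 1, "b": 1, "c": 2, "d": 2}
    truth = {"a": 0, "b": 0, "c": 0, "d": 1}  # what the expert would answer
    clusters, questions = consensus_clustering(
        records, [tool1, tool2], lambda x, y: truth[x] == truth[y])
    print(clusters, questions)  # prints [['a', 'b', 'c'], ['d']] 1
```

In the usage example the two tools disagree on three pairs, but only one question is needed: once the expert confirms that `a` and `c` are duplicates, `(b, c)` follows from the unanimous `(a, b)`, and `(c, d)` follows from the unanimous non-duplicate `(a, d)`. The order in which disputed pairs are asked changes how many such inferences become available, which is the lever the abstract's strategies target.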
Pages: 1095-1104
Page count: 10
Related papers
(50 in total)
  • [1] Semi-supervised consensus clustering based on closed patterns
    Yang, Tianshu
    Pasquier, Nicolas
    Precioso, Frederic
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 235
  • [2] Semi-Supervised Consensus Clustering for ECG Pathology Classification
    Aidos, Helena
    Lourenco, Andre
    Batista, Diana
    Bulo, Samuel Rota
    Fred, Ana
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT III, 2015, 9286 : 150 - 164
  • [3] Semi-supervised consensus clustering for gene expression data analysis
    Wang, Yunli
    Pan, Youlian
    [J]. BIODATA MINING, 2014, 7
  • [4] Semi-Supervised Clustering Ensemble Based on Cluster Consensus Selection
    Liu, Yanxi
    Al-Khafaji, Ali Hussein Demin
    [J]. CYBERNETICS AND SYSTEMS, 2022
  • [5] Semi-supervised Consensus Clustering Based on Frequent Closed Itemsets
    Yang, Tianshu
    Pasquier, Nicolas
    Hom, Antoine
    Dolle, Laurent
    Precioso, Frederic
    [J]. CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020: 3341 - 3344
  • [6] Semi-supervised consensus clustering for gene expression data analysis
    Wang, Yunli
    Pan, Youlian
    [J]. BIODATA MINING, 7
  • [7] Semi-supervised clustering methods
    Bair, Eric
    [J]. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2013, 5 (05) : 349 - 361
  • [8] SEMI-SUPERVISED SPECTRAL CLUSTERING
    Mai, Xiaoyi
    Couillet, Romain
    [J]. 2018 CONFERENCE RECORD OF 52ND ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2018: 2012 - 2016
  • [9] A review on semi-supervised clustering
    Cai, Jianghui
    Hao, Jing
    Yang, Haifeng
    Zhao, Xujun
    Yang, Yuqing
    [J]. INFORMATION SCIENCES, 2023, 632 : 164 - 200
  • [10] Multiview Semi-Supervised Learning with Consensus
    Li, Guangxia
    Chang, Kuiyu
    Hoi, Steven C. H.
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (11) : 2040 - 2051