Clustering with missing values: No imputation required

Author
Wagstaff, K [1]
Affiliation
[1] CALTECH, Jet Prop Lab, Pasadena, CA 91125 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as being just as reliable as the observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.
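The two common approaches the abstract contrasts, imputation and marginalization, can be illustrated with a minimal Python sketch. It is not the KSC algorithm and does not encode soft constraints; it only shows the two baselines. The function names `mean_impute` and `marginalized_distance`, the use of `None` for a missing value, and the choice of column-mean imputation are all assumptions made for illustration.

```python
import math


def mean_impute(rows):
    """Imputation baseline (assumed variant: column means).

    Replaces each missing value (None) with the mean of the observed
    values in the same column. Assumes every column has at least one
    observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


def marginalized_distance(a, b):
    """Marginalization baseline: Euclidean distance computed only over
    the features that are observed in both items."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("no features observed in both items")
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


# Toy data: the second item is missing its second feature.
rows = [[1.0, 2.0], [3.0, None], [5.0, 6.0]]

imputed = mean_impute(rows)
d_marginal = marginalized_distance(rows[0], rows[1])
d_imputed = math.dist(imputed[0], imputed[1])
```

In this toy case the imputed value takes part in the distance exactly as an observed value would, which is the reliability concern the abstract raises, while the marginalized distance simply ignores the partially observed feature.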
Pages: 649-658
Page count: 10