Clustering Data with the Presence of Missing Values by Ensemble Approach

Cited by: 0

Authors:

Pattanodom, Mullika [1]

Iam-On, Natthakan [1]

Boongoen, Tossapon [2]

Affiliations:

[1] Mae Fah Luang Univ, Sch Informat Technol, Chiang Rai, Thailand

[2] Navaminda Kasattriyadhiraj Royal Air Force Acad, Dept Math & Comp Sci, Bangkok, Thailand

Source:

2016 SECOND ASIAN CONFERENCE ON DEFENCE TECHNOLOGY (ACDT) | 2016

Keywords:

data clustering; missing value; cluster ensemble; random imputation

DOI:

Not available

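Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809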

Abstract:

The problem of missing values arises as one of the major difficulties in data mining and its downstream applications. In fact, most of the analytical techniques established in this field have been developed to handle a complete data set. Imputing, or filling in, missing values is generally regarded as a data preprocessing task, for which several methods have been introduced. These include a collection of statistical alternatives such as average and zero imputation, as well as learning-led models like nearest neighbors and regression. As for cluster analysis, various clustering algorithms, even k-means, the most well-known, are hardly designed to handle such a problem. This is also the case with cluster ensembles, where an improved decision is generated from multiple results of clustering complete data. The paper presents a new framework that allows clustering incomplete data without the usual preprocessing step. Intuitively, different versions of the original data can be created by filling in the unknown values with arbitrary ones. This random selection is simple and efficient, and it promotes diversity within the ensemble, and hence its quality. In particular, a binary cluster-association matrix (BA) is adopted to summarize the ensemble information, from which k-means is exploited to derive the final clustering. The proposed model is evaluated against a number of benchmark imputation methods over different datasets obtained from the UCI repository. Based on the evaluation metric of cluster accuracy (CA), the findings suggest that a more accurate outcome is usually observed with the new framework. This motivates an application of the proposed approach to problems specific to the Thai armed forces, such as the identification of attacks, which is presently in the spotlight for cyber security.
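The pipeline outlined in the abstract (random imputation, an ensemble of base clusterings, a binary cluster-association matrix, then k-means on that matrix) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the use of `None` to mark a missing value, the choice to draw random fill-ins uniformly between each column's observed minimum and maximum, and the use of k-means for the base clusterings are all assumptions made for illustration; the abstract only says that unknown values are filled with "arbitrary ones".

```python
import random


def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centers[c])))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means on a list of equal-length numeric rows; returns labels."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for it in range(iters):
        new_labels = [nearest(p, centers) for p in points]
        if it > 0 and new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def random_impute(data, rng):
    """Fill each None with a value drawn uniformly from that column's observed range.

    Assumption: every column has at least one observed value.
    """
    bounds = []
    for col in zip(*data):
        observed = [v for v in col if v is not None]
        bounds.append((min(observed), max(observed)))
    return [[rng.uniform(*bounds[j]) if v is None else v for j, v in enumerate(row)]
            for row in data]


def binary_association(partitions, k):
    """One row per sample, one 0/1 column per cluster of every base clustering."""
    n = len(partitions[0])
    return [[1.0 if labels[i] == c else 0.0 for labels in partitions for c in range(k)]
            for i in range(n)]


def ensemble_cluster(data, k, members=10, seed=0):
    """Cluster incomplete data: impute randomly, cluster each copy, combine via BA."""
    rng = random.Random(seed)
    partitions = [kmeans(random_impute(data, rng), k, rng) for _ in range(members)]
    ba = binary_association(partitions, k)
    return kmeans(ba, k, rng), ba


# Hypothetical toy data: two loose groups, with two missing entries.
example = [[1.0, 2.0], [1.2, None], [0.9, 2.1],
           [8.0, 9.0], [None, 9.2], [8.1, 8.8]]
labels, ba = ensemble_cluster(example, k=2, members=5, seed=1)
print(labels)
```

Each row of the BA matrix has exactly one 1 per base clustering, so samples that are repeatedly grouped together across the randomly imputed copies end up with similar rows, and the final k-means groups them. In practice a library implementation of k-means would normally replace the hand-written one used here to keep the sketch dependency-free.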


Pages: 151-156

Page count: 6

