Similarity-based data reduction techniques

Cited by: 0
Authors
Guo, G [1]
Wang, H
Bell, D
Affiliations
[1] Univ Ulster, Sch Comp & Math, Coleraine BT37 0QB, Londonderry, North Ireland
[2] Univ Bradford, Dept Comp, Bradford BD7 1DP, W Yorkshire, England
[3] Queens Univ Belfast, Sch Comp Sci, Belfast BT7 1NN, Antrim, North Ireland
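Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract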
The k-nearest neighbours (kNN) is a simple but effective method for classification. Its major drawbacks are (1) low efficiency, and (2) dependency on the selection of a "good value" for k. In this paper, we propose a novel similarity-based data reduction method (SBModel) together with three variants aimed at overcoming these shortcomings. Our method constructs a similarity-based model for the data, which replaces the data to serve as the basis of classification. The value of k is automatically determined, is varied in terms of local data distribution, and is optimal in terms of classification accuracy. The construction of the model significantly reduces the amount of data needed for classification, thus making classification faster. Experiments conducted on some public data sets show that SBModel and its variants compare well with C5.0, kNN, wkNN, and other data reduction methods in both efficiency and effectiveness.
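For orientation, the two drawbacks named in the abstract can be seen in a minimal sketch of plain kNN. This is the baseline only, not the authors' SBModel, whose construction the abstract does not detail. The function name `knn_predict`, the toy points, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs (hypothetical toy format).
    """
    # Drawback (1): every prediction scans and sorts the full training set.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
train = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"),
    ((0.45, 0.45), "b"),  # a lone "b" point sitting near the "a" cluster
]
query = (0.4, 0.4)

# Drawback (2): the predicted label depends on the chosen k.
print(knn_predict(train, query, 1))  # b
print(knn_predict(train, query, 3))  # a
```

With k = 1 the single closest point decides the label, while k = 3 lets the two next-nearest "a" points outvote it. That sensitivity is what the abstract means by the need for a "good value" of k.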
Pages: 211 - 232
Page count: 22
Related papers
50 in total
  • [21] Similarity-Based Fast Analysis of Data Center Networks
    Narayana, Shruti Yadav
    Shriver, Emily
    O'Neal, Kenneth
    Yildirim, Nuriye
    Begaliyeva, Khamida
    Ogras, Umit Y.
    IEEE DESIGN & TEST, 2023, 40 (06) : 100 - 111
  • [22] Data integration by fuzzy similarity-based hierarchical clustering
    Ciaramella, Angelo
    Nardone, Davide
    Staiano, Antonino
    BMC BIOINFORMATICS, 2020, 21 (Suppl 10)
  • [23] Similarity-based unification
    Formato, Ferrante
    Gerla, Giangiacomo
    Sessa, Maria I.
    Fundamenta Informaticae, 2000, 41 (04) : 393 - 414
  • [24] On Similarity-Based Unfolding
    Moreno, Gines
    Penabad, Jaime
    Antonio Riaza, Jose
    SCALABLE UNCERTAINTY MANAGEMENT (SUM 2017), 2017, 10564 : 420 - 426
  • [25] Derivation digraphs for dependencies in ordinal and similarity-based data
    Urbanova, Lucie
    Vychodil, Vilem
    INFORMATION SCIENCES, 2014, 268 : 381 - 396
  • [26] Data integration by fuzzy similarity-based hierarchical clustering
    Angelo Ciaramella
    Davide Nardone
    Antonino Staiano
    BMC Bioinformatics, 21
  • [27] A similarity-based data warehousing environment for medical images
    Teixeira, Jefferson William
    Annibal, Luana Peixoto
    Felipe, Joaquim Cezar
    Ciferri, Ricardo Rodrigues
    de Aguiar Ciferri, Cristina Dutra
    COMPUTERS IN BIOLOGY AND MEDICINE, 2015, 66 : 190 - 208
  • [28] Similarity-based second chance autoencoders for textual data
    Goudarzvand, Saria
    Gharibi, Gharib
    Lee, Yugyung
    APPLIED INTELLIGENCE, 2022, 52 (11) : 12330 - 12346
  • [29] PySEF: A Python library for similarity-based dimensionality reduction
    Passalis, Nikolaos
    Tefas, Anastasios
    KNOWLEDGE-BASED SYSTEMS, 2018, 152 : 186 - 187
  • [30] Similarity-based Fisherfaces
    Delgado-Gomez, David
    Fagertun, Jens
    Ersboll, Bjarne
    Sukno, Federico M.
    Frangi, Alejandro F.
    PATTERN RECOGNITION LETTERS, 2009, 30 (12) : 1110 - 1116