Similarity-based data reduction techniques

Cited by: 0
Authors
Guo, G [1]
Wang, H
Bell, D
Affiliations
[1] Univ Ulster, Sch Comp & Math, Coleraine BT37 0QB, Londonderry, North Ireland
[2] Univ Bradford, Dept Comp, Bradford BD7 1DP, W Yorkshire, England
[3] Queens Univ Belfast, Sch Comp Sci, Belfast BT7 1NN, Antrim, North Ireland
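Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract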
The k-nearest neighbours (kNN) is a simple but effective method for classification. Its major drawbacks are (1) low efficiency, and (2) dependency on the selection of a "good value" for k. In this paper, we propose a novel similarity-based data reduction method (SBModel) together with three variants aimed at overcoming these shortcomings. Our method constructs a similarity-based model for the data, which replaces the data to serve as the basis of classification. The value of k is automatically determined, is varied in terms of local data distribution, and is optimal in terms of classification accuracy. The construction of the model significantly reduces the amount of data needed for classification, thus making classification faster. Experiments conducted on some public data sets show that SBModel and its variants compare well with C5.0, kNN, wkNN, and other data reduction methods in both efficiency and effectiveness.
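The idea in the abstract, replacing the training data with a small set of representatives that each summarise a local neighbourhood, can be illustrated with a minimal sketch. This is not the authors' SBModel algorithm: it is a simplified greedy cover written under stated assumptions (Euclidean distance as the similarity measure, and the hypothetical helper names `build_model` and `classify`). Each representative stores a centre, a radius, a class label, and the number of points it covers; that count varies with the local data distribution, loosely mirroring the automatically determined k described above.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_model(points, labels):
    """Greedily cover the data with single-class neighbourhoods.

    Returns a list of representatives (centre, radius, label, count).
    Assumption: this is an illustrative reduction, not the published SBModel.
    """
    n = len(points)
    uncovered = set(range(n))
    model = []
    while uncovered:
        best_centre, best_members = None, set()
        for i in sorted(uncovered):
            # Distance to the nearest point of a different class bounds
            # the largest neighbourhood around i that stays single-class.
            enemy = min(
                (euclidean(points[i], points[j])
                 for j in range(n) if labels[j] != labels[i]),
                default=math.inf,
            )
            members = {i} | {
                j for j in uncovered
                if labels[j] == labels[i]
                and euclidean(points[i], points[j]) < enemy
            }
            if len(members) > len(best_members):
                best_centre, best_members = i, members
        radius = max(euclidean(points[best_centre], points[j])
                     for j in best_members)
        model.append((points[best_centre], radius,
                      labels[best_centre], len(best_members)))
        uncovered -= best_members
    return model


def classify(model, query):
    """Label a query using only the representatives, not the raw data."""
    covering = [r for r in model if euclidean(query, r[0]) <= r[1]]
    if covering:
        # Prefer the representative that summarises the most points.
        return max(covering, key=lambda r: r[3])[2]
    # Otherwise fall back to the representative with the nearest boundary.
    return min(model, key=lambda r: euclidean(query, r[0]) - r[1])[2]


# Toy data: two well-separated clusters of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
model = build_model(points, labels)
```

On this toy data the eight points reduce to two representatives, so classifying a query compares it against two entries instead of eight. The speed-up reported in the abstract comes from the same kind of reduction, although the selection criteria in the paper may differ from this sketch.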
Pages: 211-232
Page count: 22