Similarity-based data reduction techniques

Times cited: 0
Authors
Guo, G [1]
Wang, H
Bell, D
Affiliations
[1] Univ Ulster, Sch Comp & Math, Coleraine BT37 0QB, Londonderry, North Ireland
[2] Univ Bradford, Dept Comp, Bradford BD7 1DP, W Yorkshire, England
[3] Queens Univ Belfast, Sch Comp Sci, Belfast BT7 1NN, Antrim, North Ireland
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The k-nearest neighbours (kNN) method is a simple but effective approach to classification. Its major drawbacks are (1) low efficiency, since every query must be compared against the whole training set, and (2) dependence on the selection of a "good value" for k. In this paper, we propose a novel similarity-based data reduction method (SBModel), together with three variants, aimed at overcoming these shortcomings. Our method constructs a similarity-based model of the data, which replaces the data as the basis for classification. The value of k is determined automatically, varies with the local data distribution, and is optimal in terms of classification accuracy. Constructing the model significantly reduces the amount of data needed for classification, which makes classification faster. Experiments on several public data sets show that SBModel and its variants compare well with C5.0, kNN, wkNN, and other data reduction methods in both efficiency and effectiveness.
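The two kNN drawbacks named above, and the general idea of replacing the data with a smaller model, can be illustrated with a short sketch. It is not the authors' SBModel algorithm, whose details the abstract does not give. `knn_predict` is an ordinary kNN classifier. `build_cover_model` and `model_predict` are hypothetical names for an assumed greedy covering reduction: each kept representative covers a ball of same-class examples, and the number of examples it stands for is only loosely analogous to a locally chosen k.

```python
import math
from collections import Counter

# Each training example is a (feature_vector, class_label) pair.


def knn_predict(train, query, k):
    """Plain kNN: compares the query with every stored example, so each
    prediction costs O(n) distance computations, and k is fixed by the user."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def build_cover_model(train):
    """Hypothetical greedy reduction (not the authors' SBModel): repeatedly keep
    the example whose same-class neighbourhood covers the most still-uncovered
    examples. Each kept representative is stored as (centre, label, radius, count)."""
    uncovered = set(range(len(train)))
    model = []
    while uncovered:
        best = None
        for i in uncovered:
            centre, label = train[i]
            ordered = sorted(range(len(train)),
                             key=lambda j: math.dist(train[j][0], centre))
            members, radius = [i], 0.0  # the centre always covers itself
            for j in ordered:
                if j == i:
                    continue
                if train[j][1] != label:
                    break  # grow the ball until the first different-class example
                members.append(j)
                radius = math.dist(train[j][0], centre)
            gain = len(uncovered.intersection(members))
            if best is None or gain > best[0]:
                best = (gain, i, radius, members)
        _, i, radius, members = best
        model.append((train[i][0], train[i][1], radius, len(members)))
        uncovered.difference_update(members)
    return model


def model_predict(model, query):
    """Classify with the reduced model instead of the raw training data."""
    covering = [r for r in model if math.dist(query, r[0]) <= r[2]]
    if covering:
        # Query lies inside at least one ball: use the ball covering most examples.
        return max(covering, key=lambda r: r[3])[1]
    # Otherwise use the ball whose boundary is closest to the query.
    return min(model, key=lambda r: math.dist(query, r[0]) - r[2])[1]


# Toy usage: two well-separated classes of three points each.
train = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
         ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b")]
model = build_cover_model(train)
print(len(model), "representatives replace", len(train), "examples")
print(knn_predict(train, (0.5, 0.5), k=3), model_predict(model, (3.0, 3.0)))
```

On this toy set the model keeps two representatives in place of six examples, so each later query needs only two distance computations. The construction shown is deliberately naive (every round re-sorts the data for each uncovered candidate) and is meant only to show how a model can stand in for the data.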
Pages: 211-232
Number of pages: 22
Related papers (50 in total)
  • [31] Relational similarity-based model of data part 2: dependencies in data
    Belohlavek, Radim
    Vychodil, Vilem
    INTERNATIONAL JOURNAL OF GENERAL SYSTEMS, 2018, 47 (01) : 1 - 50
  • [32] A similarity-based automatic data recommendation approach for geographic models
    Zhu, Yunqiang
    Zhu, A-Xing
    Feng, Min
    Song, Jia
    Zhao, Hongwei
    Yang, Jie
    Zhang, Qiuyi
    Sun, Kai
    Zhang, Jinqu
    Yao, Ling
    INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE, 2017, 31 (07) : 1403 - 1424
  • [34] Unsupervised Similarity-based Sensor Selection for Time Series Data
    Almarri, Badar
    Rajasekaran, Sanguthevar
    Huang, Chun-Hsi
    2019 IEEE 10TH ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2019, : 395 - 400
  • [35] Similarity-based attribute reduction in rough set theory: a clustering perspective
    Jia, Xiuyi
    Rao, Ya
    Shang, Lin
    Li, Tongjun
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2020, 11 : 1047 - 1060
  • [36] Random Similarity-Based Entropy/Alpha Classification of PolSAR Data
    Li, Dong
    Zhang, Yunhua
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2017, 10 (12) : 5712 - 5723
  • [37] Secure similarity-based cloud data deduplication in Ubiquitous city
    Liu, Jinfeng
    Wang, Jianfeng
    Tao, Xiaoling
    Shen, Jian
    PERVASIVE AND MOBILE COMPUTING, 2017, 41 : 231 - 242
  • [38] A Similarity-Based Disease Diagnosis System for Medical Big Data
    Yuan, Youwei
    Chen, Weixin
    Yan, Lamei
    Huang, Binbin
    Li, Jianyuan
    JOURNAL OF MEDICAL IMAGING AND HEALTH INFORMATICS, 2017, 7 (02) : 364 - 370
  • [39] Similarity-Based Analytics for Trajectory Data: Theory, Algorithms and Applications
    Zheng, Kai
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2014, PT II, 2014, 8422 : 549 - 550
  • [40] Visually exploring movement data via similarity-based analysis
    Pelekis, Nikos
    Andrienko, Gennady
    Andrienko, Natalia
    Kopanakis, Ioannis
    Marketos, Gerasimos
    Theodoridis, Yannis
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2012, 38 (02) : 343 - 391