Lowest probability mass neighbour algorithms: relaxing the metric constraint in distance-based neighbourhood algorithms

Authors
Kai Ming Ting
Ye Zhu
Mark Carman
Yue Zhu
Takashi Washio
Zhi-Hua Zhou
Affiliations
[1] Federation University, School of Engineering and Information Technology
[2] Deakin University, School of Information Technology
[3] Monash University, Faculty of Information Technology
[4] Nanjing University, National Key Laboratory for Novel Software Technology
[5] Osaka University, The Institute of Scientific and Industrial Research
Source
Machine Learning | 2019 | Volume 108
Keywords
Nearest neighbour; Distance metric; Lowest probability mass neighbour; Mass-based dissimilarity; Classification; Clustering
DOI
Not available
Abstract
The use of distance metrics such as the Euclidean or Manhattan distance for nearest neighbour algorithms allows for interpretation as a geometric model, and it has been widely assumed that the metric axioms are a necessary condition for many data mining tasks. We show that this assumption can in fact be an impediment to producing effective models. We propose to use mass-based dissimilarity, which employs estimates of the probability mass to measure dissimilarity, to replace the distance metric. This substitution effectively converts nearest neighbour (NN) algorithms into lowest probability mass neighbour (LMN) algorithms. Both types of algorithms employ exactly the same algorithmic procedures, except for the substitution of the dissimilarity measure. We show that LMN algorithms overcome key shortcomings of NN algorithms in classification and clustering tasks. Unlike existing generalised data-independent metrics (e.g., quasi-metric, meta-metric, semi-metric, peri-metric) and data-dependent metrics, the proposed mass-based dissimilarity is unique because its self-dissimilarity is data-dependent and non-constant.
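The substitution described in the abstract can be illustrated with a minimal sketch, under loud assumptions. The search procedure below is identical for both variants and only the dissimilarity function changes. The mass-based measure here is a simplified 1-D stand-in (the fraction of data points lying in the closed interval between two values), not the estimator used in the paper, and it does not try to reproduce the paper's self-dissimilarity property. All names (`distance_1d`, `make_mass_dissimilarity`, `nearest`) and the toy data are hypothetical.

```python
from typing import Callable, Sequence

Dissimilarity = Callable[[float, float], float]

# Toy 1-D data (made up for illustration): a dense run of points, then one isolated point.
data = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 3.5]


def distance_1d(x: float, y: float) -> float:
    """Ordinary metric: absolute difference, independent of the data."""
    return abs(x - y)


def make_mass_dissimilarity(points: Sequence[float]) -> Dissimilarity:
    """Build a simplified, data-dependent dissimilarity (an assumption, not the paper's estimator).

    The dissimilarity of x and y is the fraction of `points` that fall in the
    closed interval between x and y, i.e. an empirical probability mass.
    """
    n = len(points)

    def mass(x: float, y: float) -> float:
        lo, hi = min(x, y), max(x, y)
        return sum(1 for v in points if lo <= v <= hi) / n

    return mass


def nearest(query: float, candidates: Sequence[float], dissimilarity: Dissimilarity) -> float:
    """Return the candidate with the smallest dissimilarity to the query.

    The same procedure serves NN and LMN; only `dissimilarity` differs.
    """
    return min(candidates, key=lambda c: dissimilarity(query, c))


mass_1d = make_mass_dissimilarity(data)
candidates = [1.0, 3.5]

nn_choice = nearest(2.0, candidates, distance_1d)  # neighbour under the metric
lmn_choice = nearest(2.0, candidates, mass_1d)     # neighbour under the mass-based stand-in
```

In this toy setting the two variants disagree for the query 2.0. By distance, 1.0 is closer (1.0 versus 1.5). By the mass stand-in, 3.5 is the neighbour, because the interval from 2.0 to 3.5 covers 2 of the 7 points while the interval from 1.0 to 2.0 covers 6 of the 7.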
Pages: 331-376
Page count: 45