Lowest probability mass neighbour algorithms: relaxing the metric constraint in distance-based neighbourhood algorithms

Cited by: 0
Authors
Kai Ming Ting
Ye Zhu
Mark Carman
Yue Zhu
Takashi Washio
Zhi-Hua Zhou
Affiliations
[1] Federation University, School of Engineering and Information Technology
[2] Deakin University, School of Information Technology
[3] Monash University, Faculty of Information Technology
[4] Nanjing University, National Key Laboratory for Novel Software Technology
[5] Osaka University, The Institute of Scientific and Industrial Research
Source
Machine Learning | 2019 / Vol. 108, No. 2
Keywords
Nearest neighbour; Distance metric; Lowest probability mass neighbour; Mass-based dissimilarity; Classification; Clustering
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
The use of distance metrics such as the Euclidean or Manhattan distance for nearest neighbour algorithms allows for interpretation as a geometric model, and it has been widely assumed that the metric axioms are a necessary condition for many data mining tasks. We show that this assumption can in fact be an impediment to producing effective models. We propose to use mass-based dissimilarity, which employs estimates of the probability mass to measure dissimilarity, to replace the distance metric. This substitution effectively converts nearest neighbour (NN) algorithms into lowest probability mass neighbour (LMN) algorithms. Both types of algorithms employ exactly the same algorithmic procedures, except for the substitution of the dissimilarity measure. We show that LMN algorithms overcome key shortcomings of NN algorithms in classification and clustering tasks. Unlike existing generalised data independent metrics (e.g., quasi-metric, meta-metric, semi-metric, peri-metric) and data dependent metrics, the proposed mass-based dissimilarity is unique because its self-dissimilarity is data dependent and non-constant.
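As a rough illustration of the substitution described in the abstract, the sketch below keeps one neighbour-ranking procedure and swaps only the dissimilarity function. The mass estimate used here (the fraction of data points inside the closed 1-D interval spanning the two points) is a simplifying assumption made for this example and is not the estimator defined in the paper; the names `interval_mass`, `abs_distance` and `k_neighbours` are likewise hypothetical.

```python
def abs_distance(x, y, data):
    """Ordinary 1-D distance; the data set is ignored (data independent)."""
    return abs(x - y)


def interval_mass(x, y, data):
    """Assumed stand-in for a mass-based dissimilarity: the fraction of
    points in `data` lying in the closed interval spanning x and y."""
    lo, hi = min(x, y), max(x, y)
    return sum(lo <= p <= hi for p in data) / len(data)


def k_neighbours(query, data, k, dissim):
    """Same procedure for both measures: rank the data by dissimilarity
    to the query and keep the k lowest-ranked points."""
    return sorted(data, key=lambda p: dissim(query, p, data))[:k]


# Toy data: a dense group near 0 and two sparse points further out.
data = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 2.25, 5.0]
query = 1.0

nn = k_neighbours(query, data, 2, abs_distance)    # both from the dense group
lmn = k_neighbours(query, data, 2, interval_mass)  # includes the sparse point 2.25

# Self-dissimilarity under the assumed measure depends on the data:
dup = [0.0, 0.0, 1.0]
self_at_0 = interval_mass(0.0, 0.0, dup)  # two of three points coincide with 0.0
self_at_1 = interval_mass(1.0, 1.0, dup)  # one of three points coincides with 1.0

print(nn, lmn, self_at_0, self_at_1)
```

In this toy setting the two-neighbour set changes when the measure is swapped, because reaching further into the dense group covers more mass than reaching the sparse point on the other side. The last two values show a self-dissimilarity that varies with the data rather than being a constant, which is the property the abstract highlights.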
Pages: 331-376
Page count: 45