Multiplicative distance: a method to alleviate distance instability for high-dimensional data

Cited by: 0
Authors
Jafar Mansouri
Morteza Khademi
Affiliation
[1] Ferdowsi University of Mashhad, Department of Electrical Engineering
Source
Keywords
Distance instability; High-dimensional data; Minkowski and fractional norms; Multiplicative and additive distances
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Recent work has shown that, under a broad set of conditions, commonly used distance functions become unstable in high-dimensional data spaces; that is, as dimensionality increases, the distance from a given query point to its farthest data point approaches the distance to its nearest data point. In particular, instability has been shown to occur when the dimensions are independently distributed and normalized to have zero mean and unit variance. This paper shows that the normalization condition is not necessary, although all appropriate moments must be finite. Furthermore, a new distance function, the multiplicative distance, is introduced. It is proved theoretically that this function is stable for data with independent dimensions (with identical or nonidentical distributions). In contrast to the usual distance functions, which are based on the summation of distances over all dimensions (distance components), the multiplicative distance is based on the multiplication of distance components. Experimental results show that the multiplicative distance is stable for high-dimensional data with independent and correlated dimensions, and that it outperforms the norm distances on such data.
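The instability described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes data drawn independently and uniformly from [0, 1] in each dimension, and it measures the relative contrast (D_max - D_min) / D_min of a Minkowski (additive) distance from a random query point. The function names `minkowski` and `relative_contrast` and all parameter values are hypothetical choices made for illustration. The multiplicative distance itself is not implemented, because its formula is not stated in the abstract.

```python
import random


def minkowski(x, y, p):
    """Minkowski (additive) distance: p-th root of the summed component distances."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def relative_contrast(dim, n_points=200, p=2.0, seed=0):
    """Return (D_max - D_min) / D_min for one random query point.

    Assumption (not from the paper): every coordinate is drawn
    independently and uniformly from [0, 1].
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        minkowski([rng.random() for _ in range(dim)], query, p)
        for _ in range(n_points)
    ]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast is expected to shrink as the dimensionality grows.
    for dim in (2, 10, 100, 1000):
        print(dim, relative_contrast(dim))
```

Under these assumptions the printed contrast should fall toward zero as `dim` increases, which is the sense in which the farthest and nearest distances converge. Changing `p` to a value below 1 gives a fractional norm of the kind listed in the keywords.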
Pages: 783-805
Page count: 22
Related papers
50 in total
  • [31] Asymptotic properties of the misclassification rates for Euclidean Distance Discriminant rule in high-dimensional data
    Watanabe, Hiroki
    Hyodo, Masashi
    Seo, Takashi
    Pavlenko, Tatjana
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2015, 140 : 234 - 244
  • [32] Instability results for Euclidean distance, nearest neighbor search on high dimensional Gaussian data
    Giannella, Chris R.
    [J]. INFORMATION PROCESSING LETTERS, 2021, 169
  • [33] EIGENVALUE DISTRIBUTION OF A HIGH-DIMENSIONAL DISTANCE COVARIANCE MATRIX WITH APPLICATION
    Li, Weiming
    Wang, Qinwen
    Yao, Jianfeng
    [J]. STATISTICA SINICA, 2023, 33 (01) : 149 - 168
  • [34] An ISVM Algorithm Based on High-Dimensional Distance and Forgetting Characteristics
    Xie, Wenhao
    Li, Jinfeng
    Li, Juanni
    Wang, Xiaoyan
    [J]. SCIENTIFIC PROGRAMMING, 2022, 2022
  • [35] A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data
    Chen, Yewang
    Tang, Shengyu
    Bouguila, Nizar
    Wang, Cheng
    Du, Jixiang
    Li, HaiLin
    [J]. PATTERN RECOGNITION, 2018, 83 : 375 - 387
  • [36] Out-of-Distribution Detection in High-Dimensional Data Using Mahalanobis Distance - Critical Analysis
    Maciejewski, Henryk
    Walkowiak, Tomasz
    Szyc, Kamil
    [J]. COMPUTATIONAL SCIENCE - ICCS 2022, PT I, 2022, : 262 - 275
  • [37] Weighted Distance Functions Improve Analysis of High-Dimensional Data: Application to Molecular Dynamics Simulations
    Bloechliger, Nicolas
    Caflisch, Amedeo
    Vitalis, Andreas
    [J]. JOURNAL OF CHEMICAL THEORY AND COMPUTATION, 2015, 11 (11) : 5481 - 5492
  • [38] Distance assessment and analysis of high-dimensional samples using variational autoencoders
    Inacio, Marco
    Izbicki, Rafael
    Gyires-Toth, Balint
    [J]. INFORMATION SCIENCES, 2021, 557 (557) : 407 - 420
  • [39] Dimensionality reduction for similarity search with the Euclidean distance in high-dimensional applications
    Seungdo Jeong
    Sang-Wook Kim
    Byung-Uk Choi
    [J]. Multimedia Tools and Applications, 2009, 42 : 251 - 271
  • [40] An encoding-based dual distance tree high-dimensional index
    Yi Zhuang
    YueTing Zhuang
    Fei Wu
    [J]. Science in China Series F: Information Sciences, 2008, 51 : 1401 - 1414