Mahalanobis distance informed by clustering

Cited by: 6
Authors
Lahav, Almog [1]
Talmon, Ronen [1]
Kluger, Yuval [2]
Affiliations
[1] Technion Israel Inst Technol, IL-32000 Haifa, Israel
[2] Yale Univ, Sch Med, Dept Pathol, New Haven, CT 06520 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
metric learning; geometric analysis; manifold learning; intrinsic modelling; biclustering; gene expression; dimensionality reduction; geometry
DOI
10.1093/imaiai/iay011
Chinese Library Classification
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
A fundamental question in data analysis, machine learning and signal processing is how to compare data points. The choice of distance metric is especially challenging for high-dimensional data sets, where the question of whether a distance is meaningful becomes more pressing (e.g. the Euclidean distance between images). In this paper, we propose to exploit a property of high-dimensional data that is usually ignored: the structure stemming from the relationships between the coordinates. Specifically, we show that organizing similar coordinates into clusters can be exploited to construct the Mahalanobis distance between samples. When the observable samples are generated by a nonlinear transformation of hidden variables, the Mahalanobis distance allows the recovery of the Euclidean distances in the hidden space. We illustrate the advantage of our approach on a synthetic example in which discovering clusters of correlated coordinates improves the estimation of the principal directions of the samples. We also applied our method to real gene expression data for lung adenocarcinomas (lung cancer). Using the proposed metric, we found a partition of subjects into risk groups with good separation between their Kaplan-Meier survival plots.
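The claim that the Mahalanobis distance recovers Euclidean distances in the hidden space can be checked numerically in a simplified setting. The sketch below is not the method of the paper: it assumes a linear map (the abstract treats nonlinear transformations), hidden variables with identity covariance, and a covariance that is known exactly rather than estimated from clusters of coordinates. The matrix `A`, the dimensions and the helper `mahalanobis` are hypothetical choices made for illustration.

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Mahalanobis distance between x and y for a given covariance matrix."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # The pseudo-inverse handles a rank-deficient covariance, which arises
    # when there are more observed coordinates than hidden variables.
    cov_pinv = np.linalg.pinv(cov, rcond=1e-10)
    return float(np.sqrt(diff @ cov_pinv @ diff))

rng = np.random.default_rng(0)

# Assumption: a linear map from 3 hidden variables to 10 observed coordinates.
A = rng.normal(size=(10, 3))

# Two hidden points and their observed counterparts.
z1, z2 = rng.normal(size=3), rng.normal(size=3)
x1, x2 = A @ z1, A @ z2

# Covariance of x = A z when the hidden variables have identity covariance.
cov = A @ A.T

d_hidden = float(np.linalg.norm(z1 - z2))   # Euclidean distance in the hidden space
d_mahal = mahalanobis(x1, x2, cov)          # Mahalanobis distance in the observed space
d_euclid = float(np.linalg.norm(x1 - x2))   # Euclidean distance in the observed space

print(f"hidden Euclidean:   {d_hidden:.6f}")
print(f"observed Mahalanobis: {d_mahal:.6f}")
print(f"observed Euclidean: {d_euclid:.6f}")
```

Under these assumptions the Mahalanobis distance between the observed points matches the Euclidean distance between the hidden points up to floating-point error, while the plain Euclidean distance in the observed space generally does not.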
Pages: 377-406
Number of pages: 30