Similarity Learning for High-Dimensional Sparse Data

Cited by: 0
Authors
Liu, Kuan [1]
Bellet, Aurelien [2]
Sha, Fei [1]
Affiliations
[1] Univ Southern Calif, Los Angeles, CA 90007 USA
[2] Telecom ParisTech, Paris, France
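Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract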
A good measure of similarity between data points is crucial to many tasks in machine learning. Similarity and metric learning methods learn such measures automatically from data, but they do not scale well with respect to the dimensionality of the data. In this paper, we propose a method that can efficiently learn a similarity measure from high-dimensional sparse data. The core idea is to parameterize the similarity measure as a convex combination of rank-one matrices with specific sparsity structures. The parameters are then optimized with an approximate Frank-Wolfe procedure to maximally satisfy relative similarity constraints on the training data. Our algorithm greedily incorporates one pair of features at a time into the similarity measure, providing an efficient way to control the number of active features and thus reduce overfitting. It enjoys appealing convergence guarantees, and its time and memory complexity depend on the sparsity of the data rather than on the dimension of the feature space. Our experiments on real-world high-dimensional datasets demonstrate its potential for classification, dimensionality reduction and data exploration.
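The idea described in the abstract can be illustrated with a minimal sketch. Several details below are assumptions made for illustration and are not stated in this record: the similarity is taken to be bilinear, S_M(x, y) = x^T M y; each rank-one basis matrix is taken to have the form lam * (e_i + s e_j)(e_i + s e_j)^T with s = +1 or -1, so that it touches only one pair of features; the loss is a squared hinge over triplets (anchor, similar, dissimilar); and the linear minimization step searches all feature pairs exhaustively rather than approximately. The function names (`similarity`, `loss_and_grad`, `best_basis`, `frank_wolfe`) and the toy data are hypothetical, and the code uses dense lists, so it does not reproduce the sparsity-dependent complexity the abstract claims.

```python
from itertools import combinations

def similarity(M, x, y):
    """Bilinear similarity S_M(x, y) = x^T M y (assumed form)."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def loss_and_grad(M, X, triplets):
    """Average squared hinge loss over triplets (a, p, n) asking for
    S(x_a, x_p) >= S(x_a, x_n) + 1, plus its gradient with respect to M."""
    d = len(M)
    grad = [[0.0] * d for _ in range(d)]
    loss = 0.0
    for a, p, n in triplets:
        diff = [X[p][j] - X[n][j] for j in range(d)]
        slack = max(0.0, 1.0 - similarity(M, X[a], diff))
        loss += slack ** 2
        for i in range(d):
            for j in range(d):
                # d(slack^2)/dM = -2 * slack * x_a (x_p - x_n)^T
                grad[i][j] -= 2.0 * slack * X[a][i] * diff[j]
    T = len(triplets)
    return loss / T, [[g / T for g in row] for row in grad]

def best_basis(grad, lam):
    """Return (score, i, j, s) minimizing <grad, B> over the assumed basis
    B = lam * (e_i + s e_j)(e_i + s e_j)^T with i < j and s in {+1, -1}."""
    best = None
    for i, j in combinations(range(len(grad)), 2):
        for s in (1.0, -1.0):
            score = lam * (grad[i][i] + grad[j][j] + s * (grad[i][j] + grad[j][i]))
            if best is None or score < best[0]:
                best = (score, i, j, s)
    return best

def frank_wolfe(X, triplets, lam=1.0, n_iters=20):
    """Frank-Wolfe over the convex hull of the basis matrices.
    Each iteration mixes in one basis element, i.e. one pair of features."""
    d = len(X[0])
    M = [[0.0] * d for _ in range(d)]
    history = []
    for k in range(n_iters):
        loss, grad = loss_and_grad(M, X, triplets)
        history.append(loss)
        _, i, j, s = best_basis(grad, lam)
        gamma = 2.0 / (k + 2.0)  # gamma = 1 at k = 0, so M becomes a basis element
        M = [[(1.0 - gamma) * v for v in row] for row in M]
        # Adding gamma * B changes only four entries of M.
        M[i][i] += gamma * lam
        M[j][j] += gamma * lam
        M[i][j] += gamma * lam * s
        M[j][i] += gamma * lam * s
    return M, history

# Toy data (hypothetical): points 0 and 1 are alike, points 2 and 3 are alike.
X = [[1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.9, 0.0],
     [0.0, 1.0, 0.0, 1.0], [0.0, 0.8, 0.0, 1.0]]
triplets = [(0, 1, 2), (1, 0, 3), (2, 3, 0), (3, 2, 1)]
M, history = frank_wolfe(X, triplets, lam=1.0, n_iters=20)
```

Because every iterate is a convex combination of basis matrices, M stays symmetric with trace 2 * lam in this sketch, and stopping after k iterations bounds the number of active features by 2k, which is one way to read the abstract's remark about controlling the number of active features.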
Pages: 653-662
Page count: 10