Similarity Learning for High-Dimensional Sparse Data

Times cited: 0

Authors:

Liu, Kuan [1]

Bellet, Aurelien [2]

Sha, Fei [1]

Affiliations:

[1] Univ Southern Calif, Los Angeles, CA 90007 USA

[2] Telecom ParisTech, Paris, France

Source:

ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 38 | 2015 / Vol. 38

DOI:

Not available

Chinese Library Classification (CLC):

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A good measure of similarity between data points is crucial to many tasks in machine learning. Similarity and metric learning methods learn such measures automatically from data, but they do not scale well with respect to the dimensionality of the data. In this paper, we propose a method that can efficiently learn a similarity measure from high-dimensional sparse data. The core idea is to parameterize the similarity measure as a convex combination of rank-one matrices with specific sparsity structures. The parameters are then optimized with an approximate Frank-Wolfe procedure to maximally satisfy relative similarity constraints on the training data. Our algorithm greedily incorporates one pair of features at a time into the similarity measure, providing an efficient way to control the number of active features and thus reduce overfitting. It enjoys appealing convergence guarantees, and its time and memory complexity depend on the sparsity of the data instead of the dimension of the feature space. Our experiments on real-world high-dimensional datasets demonstrate its potential for classification, dimensionality reduction and data exploration.
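The optimization idea summarized in the abstract can be illustrated with a small Python/NumPy sketch. Beyond what the abstract states, it assumes a bilinear similarity S_M(x, x') = x^T M x', a rank-one basis made of the matrices λ(e_a + e_b)(e_a + e_b)^T and λ(e_a - e_b)(e_a - e_b)^T for feature pairs a < b, relative constraints given as triplets (i, j, k) meaning "x_i should be more similar to x_j than to x_k", a smoothed hinge loss on each triplet, and an "approximate" step that simply computes the gradient on a random subset of triplets. The function names (`frank_wolfe_similarity`, `triplet_gradient`, `smoothed_hinge_derivative`) are hypothetical, and the paper's actual loss, step-size rule and approximation scheme may differ. For readability the sketch uses dense d x d matrices, so unlike the method described above its cost grows with the feature dimension rather than with the data sparsity.

```python
import numpy as np


def smoothed_hinge_derivative(z):
    """Derivative of a smoothed hinge loss (assumed): 0 for z <= 0, z on (0, 1), 1 for z >= 1."""
    return np.clip(z, 0.0, 1.0)


def triplet_gradient(M, X, triplets):
    """Gradient w.r.t. M of mean_t l(1 - x_i^T M (x_j - x_k)) over triplets (i, j, k)."""
    anchors = X[triplets[:, 0]]                      # x_i, shape (T, d)
    diffs = X[triplets[:, 1]] - X[triplets[:, 2]]    # x_j - x_k, shape (T, d)
    margins = np.einsum("td,de,te->t", anchors, M, diffs)
    w = smoothed_hinge_derivative(1.0 - margins)     # per-triplet loss slope
    # d/dM [x^T M v] = x v^T; the margin enters the loss with a minus sign.
    return -(anchors * w[:, None]).T @ diffs / len(triplets)


def frank_wolfe_similarity(X, triplets, lam=1.0, n_iter=50, batch_size=None, seed=0):
    """Hypothetical sketch: Frank-Wolfe over the convex hull of the assumed basis
    {lam (e_a + e_b)(e_a + e_b)^T, lam (e_a - e_b)(e_a - e_b)^T : a < b}."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    rows, cols = np.triu_indices(d, k=1)             # all feature pairs a < b
    n_pairs = len(rows)
    M = np.zeros((d, d))                             # overwritten at step 0 (gamma = 1)
    for it in range(n_iter):
        # "Approximate" step (assumption): gradient on a random subset of triplets.
        if batch_size is not None and batch_size < len(triplets):
            idx = rng.choice(len(triplets), size=batch_size, replace=False)
            batch = triplets[idx]
        else:
            batch = triplets
        G = triplet_gradient(M, X, batch)
        # Linear minimization oracle: <lam u u^T, G> for u = e_a +/- e_b.
        base = G[rows, rows] + G[cols, cols]
        cross = G[rows, cols] + G[cols, rows]
        scores = lam * np.concatenate([base + cross, base - cross])
        best = int(np.argmin(scores))
        a, b = rows[best % n_pairs], cols[best % n_pairs]
        u = np.zeros(d)
        u[a], u[b] = 1.0, (1.0 if best < n_pairs else -1.0)
        B = lam * np.outer(u, u)                     # selected rank-one basis matrix
        gamma = 2.0 / (it + 2.0)                     # standard Frank-Wolfe step size
        M = (1.0 - gamma) * M + gamma * B
    return M


# Toy usage on random sparse data with two classes.
rng = np.random.default_rng(1)
N, D, LAM, N_ITER = 20, 30, 1.0, 5
X = rng.random((N, D)) * (rng.random((N, D)) < 0.2)  # roughly 80% zeros
labels = np.arange(N) % 2
triplets = np.array([(i, j, k)
                     for i in range(N) for j in range(N) for k in range(N)
                     if i != j and labels[i] == labels[j] and labels[i] != labels[k]])
M = frank_wolfe_similarity(X, triplets, lam=LAM, n_iter=N_ITER, batch_size=200)
print("active features:", np.count_nonzero(np.abs(M).sum(axis=1)))
```

Each step adds at most one new pair of features, so after T steps at most 2T rows of M are nonzero; in this sketch, that is how the greedy scheme keeps the number of active features under control. Because the first step uses γ = 1, the zero initialization is immediately replaced by a single basis matrix, and every later iterate stays in the convex hull of the basis (symmetric, positive semidefinite, trace 2λ).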


Pages: 653-662

Number of pages: 10
