Localized Centering: Reducing Hubness in Large-Sample Data

Cited by: 0
Authors
Hara, Kazuo [1 ]
Suzuki, Ikumi [1 ]
Shimbo, Masashi [2 ]
Kobayashi, Kei [3 ]
Fukumizu, Kenji [3 ]
Radovanovic, Milos [4 ]
Affiliations
[1] Natl Inst Genet, Mishima, Shizuoka, Japan
[2] Nara Inst Sci & Technol, Nara, Japan
[3] Inst Stat Math, Tachikawa, Tokyo, Japan
[4] Univ Novi Sad, Novi Sad, Serbia
Keywords
HUBS;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hubness has recently been identified as a problematic phenomenon occurring in high-dimensional spaces. In this paper, we address a different type of hubness that occurs when the number of samples is large. We investigate the difference between hubness in high-dimensional data and hubness in large-sample data. One finding is that centering, which is known to reduce the former, does not work for the latter. We then propose a new hub-reduction method, called localized centering. It is an extension of centering, yet it works effectively for both types of hubness. Using real-world datasets consisting of a large number of documents, we demonstrate that the proposed method improves the accuracy of k-nearest neighbor classification.
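The ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the paper, and several choices are assumptions: inner-product similarity; hubness measured as the skewness of the k-occurrence counts N_k (a common convention); and, in particular, the form of `localized_centered_similarity`, which subtracts each candidate's similarity to the centroid of its own `kappa` nearest neighbors. The abstract does not give the formula, so that function is one plausible reading of "localized" centering, and all function and parameter names are hypothetical.

```python
import numpy as np

def knn_indices(sim, k):
    """For each row (query), indices of the k most similar other points."""
    s = sim.astype(float)              # astype returns a copy
    np.fill_diagonal(s, -np.inf)       # exclude self-matches
    return np.argsort(-s, axis=1)[:, :k]

def k_occurrence(sim, k):
    """N_k: how often each point appears in other points' k-NN lists."""
    nn = knn_indices(sim, k)
    return np.bincount(nn.ravel(), minlength=sim.shape[0])

def skewness(counts):
    """Sample skewness of N_k; large positive values suggest hubs."""
    c = counts - counts.mean()
    return (c ** 3).mean() / (c ** 2).mean() ** 1.5

def centered_similarity(X):
    """Global centering: inner products after subtracting the data mean."""
    Xc = X - X.mean(axis=0)
    return Xc @ Xc.T

def localized_centered_similarity(X, kappa):
    """ASSUMED form: sim(q, x) = <q, x> - <x, local centroid of x>."""
    sim = X @ X.T
    nn = knn_indices(sim, kappa)
    local_centroids = X[nn].mean(axis=1)                 # shape (n, d)
    affinity = np.einsum("ij,ij->i", X, local_centroids)
    # Rows are queries, columns are candidates: penalize each candidate x.
    return sim - affinity[None, :]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 50)) + 1.0   # synthetic data, off-origin mean
    k = 10
    print("raw      :", skewness(k_occurrence(X @ X.T, k)))
    print("centered :", skewness(k_occurrence(centered_similarity(X), k)))
    print("localized:", skewness(k_occurrence(
        localized_centered_similarity(X, kappa=20), k)))
```

The penalty is applied per candidate (per column) rather than per query, because a term that is constant for a given query would not change that query's neighbor ranking.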
Pages: 2645-2651
Number of pages: 7