Asymptotic Evaluation of Distance Measure on High Dimensional Vector Spaces in Text Mining

Cited by: 0
Authors
Goto, Masayuki [1]
Ishida, Takashi [2]
Suzuki, Makoto [3]
Hirasawa, Shigeichi [2]
Affiliations
[1] Musashi Inst Technol, Fac Environm & Informat Studies, Tsuzuki Ku, Kanagawa 2240015, Japan
[2] Waseda Univ, Sch Creat Sci & Engn, Tokyo 1690015, Japan
[3] Shonan Inst Technol, Fac Engn, Fujisawa, Kanagawa 2518511, Japan
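Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract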
This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. In text mining, several heuristics are applied in practical analysis because of their experimental effectiveness in many case studies. A theoretical explanation of the performance of text mining techniques is required, and such an explanation should give a much clearer understanding of them. In this paper, the performance of the distance measures used to classify documents is analyzed from the new viewpoint of asymptotic analysis. We also discuss the asymptotic performance of the IDF measure used in the information retrieval field.
Pages: 439 / +
Page count: 3
Related papers
50 in total
  • [1] Statistical evaluation of measure and distance on document classification problems in text mining
    Goto, Masayuki
    Ishida, Takashi
    Hirasawa, Shigeichi
    [J]. 2007 CIT: 7TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY, PROCEEDINGS, 2007, : 674 - +
  • [2] A Fuzzy Indiscernibility Based Measure of Distance between Semantic Spaces Towards Automatic Evaluation of Free Text Answers
    Chakraborty, Udit Kr.
    Roy, Samir
    Choudhury, Sankhayan
    [J]. INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2017, 25 (06) : 987 - 1004
  • [3] Distance Functions in Some Class of Infinite Dimensional Vector Spaces
    Anne, Bator
    Briec, Walter
    [J]. JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2024, 201 (02) : 899 - 931
  • [4] Effectiveness of the Euclidean distance in high dimensional spaces
    Xia, Shuyin
    Xiong, Zhongyang
    Luo, Yueguo
    Xu, Wei
    Zhang, Guanghua
    [J]. OPTIK, 2015, 126 (24) : 5614 - 5619
  • [5] Mining interlacing manifolds in high dimensional spaces
    Ban, Tao
    Zhang, Changshui
    Abe, Shigeo
    Takahashi, Takeshi
    Kadobayashi, Youki
    [J]. Proceedings of the ACM Symposium on Applied Computing, 2011, : 942 - 949
  • [6] Mining Projected Clusters in High-Dimensional Spaces
    Bouguessa, Mohamed
    Wang, Shengrui
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (04) : 507 - 522
  • [7] ASYMPTOTIC DISTRIBUTIONS OF HIGH-DIMENSIONAL DISTANCE CORRELATION INFERENCE
    Gao, Lan
    Fan, Yingying
    Lv, Jinchi
    Shao, Qi-Man
    [J]. ANNALS OF STATISTICS, 2021, 49 (04) : 1999 - 2020
  • [8] Earth Mover Distance over High-Dimensional Spaces
    Andoni, Alexandr
    Indyk, Piotr
    Krauthgamer, Robert
    [J]. PROCEEDINGS OF THE NINETEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2008, : 343 - +
  • [9] Vector-distance and neighborhood development for high dimensional data
    Ling, Ping
    Rong, Xiangsheng
    You, Xiangyang
    Xu, Ming
    [J]. Journal of Software, 2012, 7 (12) : 2832 - 2839
  • [10] Asymptotic distribution of the maximum interpoint distance for high-dimensional data
    Tang, Ping
    Lu, Rongrong
    Xie, Junshan
    [J]. STATISTICS & PROBABILITY LETTERS, 2022, 190