Statistical evaluation of measure and distance on document classification problems in text mining

Cited by: 3
Authors
Goto, Masayuki [1 ]
Ishida, Takashi [2 ]
Hirasawa, Shigeichi [2 ]
Affiliations
[1] Musashi Inst Technol, Fac Environm & Informat Studies, Tsuzuki Ku, Yokohama, Kanagawa 2240015, Japan
[2] Waseda Univ, Sch Creat Sci & Engn, Shinjyuku Ku, Tokyo 1698555, Japan
DOI
10.1109/CIT.2007.171
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. Formulating the text mining problem as a statistical hypothesis test makes some interesting properties visible. In text mining, several heuristics are applied in practical analysis because they have proved experimentally effective in many case studies. A theoretical explanation of the performance of text mining techniques is therefore required, and this approach gives a clear picture. Distance measures in the word vector space are used to classify documents. This paper also analyzes the performance of distance measures from the new viewpoint of asymptotic analysis.
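The abstract refers to classifying documents by a distance measure in a word vector space. The following is a minimal sketch of that general idea only, not the paper's method: it assumes relative word frequencies as the vector representation, Euclidean distance as the measure, and a nearest-centroid decision rule. The toy documents and the names `word_vector`, `centroid`, and `classify` are all hypothetical.

```python
import math
from collections import Counter

def word_vector(text, vocabulary):
    # Relative frequency of each vocabulary word in the text (an assumed representation).
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) or 1  # avoid division by zero
    return [counts[w] / total for w in vocabulary]

def euclidean(u, v):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def centroid(vectors):
    # Component-wise mean of a list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify(text, centroids, vocabulary):
    # Assign the text to the class whose centroid is nearest.
    vec = word_vector(text, vocabulary)
    return min(centroids, key=lambda label: euclidean(vec, centroids[label]))

# Hypothetical toy training data, for illustration only.
training = {
    "sports": ["the team won the match", "players scored in the match"],
    "finance": ["the bank raised interest rates", "stock market prices fell"],
}
vocabulary = sorted({w for docs in training.values() for d in docs for w in d.split()})
centroids = {
    label: centroid([word_vector(d, vocabulary) for d in docs])
    for label, docs in training.items()
}

print(classify("the team scored", centroids, vocabulary))
print(classify("interest rates fell", centroids, vocabulary))
```

Swapping `euclidean` for another measure, such as cosine distance, changes only the comparison step. Comparing the performance of such measures is the kind of question the abstract says the paper analyzes asymptotically.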
Pages: 674 / +
Page count: 3