A Local Latent Semantic Analysis-based Kernel for Document Similarities

Cited by: 0
Authors
Aseervatham, Sujeevan [1]
Affiliation
[1] Univ Paris 13, CNRS, LIPN, UMR 7030, F-93430 Villetaneuse, France
DOI
10.1109/IJCNN.2008.4633792
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The document similarity measure is a key element of textual data processing and largely determines the performance of a processing system. For the past decade, kernels have been used as similarity functions within inner-product-based algorithms such as the SVM for NLP problems, especially text categorization. In this paper, we present a semantic space constructed from latent concepts. The concepts are extracted using Latent Semantic Analysis (LSA). To take into account the specificity of each document category, we use local LSA to define the global semantic space. Furthermore, we propose a weighted semantic kernel for the global space. Experimental results on text categorization tasks show that this kernel performs better than global LSA kernels, especially for small LSA dimensions.
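The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that each local LSA is a truncated SVD of the term-document matrix restricted to one category, that the global space is the concatenation of the local concept bases, and that the kernel is a weighted, normalized inner product in that space. The function names, the toy matrix, and the uniform default weights are hypothetical; the record does not state the paper's actual weighting scheme.

```python
import numpy as np

def local_lsa_bases(X, labels, k):
    """Return one (terms x k) basis of left singular vectors per category.

    X is a terms x documents matrix; labels[j] is the category of column j.
    Assumption: "local LSA" means a truncated SVD on each category's documents.
    """
    bases = {}
    for c in sorted(set(labels)):
        cols = [j for j, lab in enumerate(labels) if lab == c]
        U, _, _ = np.linalg.svd(X[:, cols], full_matrices=False)
        bases[c] = U[:, :k]
    return bases

def global_projection(bases):
    """Stack the local concept bases into one (terms x total_concepts) matrix."""
    return np.hstack([bases[c] for c in sorted(bases)])

def semantic_kernel(d1, d2, P, weights=None):
    """Weighted, normalized inner product of two documents in concept space.

    Assumption: weights are non-negative, one per concept; uniform by default.
    """
    if weights is None:
        weights = np.ones(P.shape[1])
    z1, z2 = P.T @ d1, P.T @ d2
    denom = np.sqrt((weights * z1 * z1).sum() * (weights * z2 * z2).sum())
    return float((weights * z1 * z2).sum() / denom) if denom > 0 else 0.0

# Toy data (hypothetical): 5 terms, 4 documents, two categories.
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 1, 2],
              [0, 0, 2, 1],
              [1, 1, 1, 1]], dtype=float)
labels = ["a", "a", "b", "b"]

P = global_projection(local_lsa_bases(X, labels, k=1))
same_class = semantic_kernel(X[:, 0], X[:, 1], P)
cross_class = semantic_kernel(X[:, 0], X[:, 2], P)
print(same_class, cross_class)
```

In this toy setting, documents from the same category score higher than documents from different categories, which is the behavior a category-aware semantic space is meant to encourage. Keeping the weights non-negative keeps the function a valid kernel.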
Pages: 214-219
Number of pages: 6
Related papers
50 in total
  • [21] A comparison of latent semantic analysis and correspondence analysis of document-term matrices
    Qi, Qianqian
    Hessen, David J.
    Deoskar, Tejaswini
    van der Heijden, Peter G. M.
    NATURAL LANGUAGE ENGINEERING, 2024, 30 (04) : 722 - 752
  • [22] Local and Global Latent Semantic Analysis for Text Categorization
    Ghanem, Khadoudja
    INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH, 2014, 4 (03) : 1 - 13
  • [23] Latent Semantic Analysis Boosted Convolutional Neural Networks for Document Classification
    Gultepe, Eren
    Kamkarhaghighi, Mehran
    Makrehchi, Masoud
    2018 5TH INTERNATIONAL CONFERENCE ON BEHAVIORAL, ECONOMIC, AND SOCIO-CULTURAL COMPUTING (BESC), 2018, : 93 - 98
  • [24] Web document classification based on rough set latent semantic indexing
    He, Ming
    Feng, Boqin
    Fu, Xianghua
    Jisuanji Gongcheng/Computer Engineering, 2004, 30 (13):
  • [25] Research on multi-document summarization based on latent semantic indexing
    Qin, Bing
    Liu, Ting
    Zhang, Yu
    Li, Sheng
    Journal of Harbin Institute of Technology (New Series), 2005, 12 (01) : 91 - 94
  • [26] A Latent Semantic Indexing-based approach to multilingual document clustering
    Wei, Chih-Ping
    Yang, Christopher C.
    Lin, Chia-Min
    DECISION SUPPORT SYSTEMS, 2008, 45 (03) : 606 - 620
  • [27] Linear Discriminant Analysis-based Random Features for Kernel Machines
    Liu, Xueyi
    Zhao, Min
    2018 3RD INTERNATIONAL CONFERENCE ON MECHANICAL, CONTROL AND COMPUTER ENGINEERING (ICMCCE), 2018, : 312 - 315
  • [28] Local Latent Semantic Analysis Based on Support Vector Machine for Imbalanced Text Categorization
    Wan, Yuan
    Tong, Hengqing
    Deng, Yanfang
    APPLIED INFORMATICS AND COMMUNICATION, PT III, 2011, 226 : 321 - 329
  • [29] Local Latent Semantic Analysis Based on Support Vector Machine for Imbalanced Text Categorization
    Wan, Yuan
    Tong, Hengqing
    Deng, Yanfang
    2010 THE 3RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND INDUSTRIAL APPLICATION (PACIIA2010), VOL III, 2010, : 168 - 171
  • [30] Supervised latent semantic indexing for document categorization
    Sun, JT
    Chen, Z
    Zeng, HJ
    Lu, YC
    Shi, CY
    Ma, WY
    FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 535 - 538