Hierarchy-regularized latent semantic indexing

Times cited: 0
Authors
Huang, Y [1]
Yu, K [1]
Schubert, M [1]
Yu, SP [1]
Tresp, V [1]
Kriegel, HP [1]
Affiliation
[1] Univ Munich, Inst Comp Sci, D-80539 Munich, Germany
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Organizing textual documents into a hierarchical taxonomy is common practice in knowledge management. Besides the textual features, the hierarchical structure of the directories reflects additional, important knowledge annotated by experts, and it is generally desirable to incorporate this information into text mining processes. In this paper we propose hierarchy-regularized latent semantic indexing, which encodes the hierarchy into a similarity graph of documents and then formulates an optimization problem that maps each document into a low-dimensional vector space. The new feature space preserves the intrinsic structure of the original taxonomy and thus provides a meaningful basis for various learning tasks such as visualization and classification. Our approach uses information about class proximity and class specificity, and naturally copes with multi-labeled documents. Our empirical studies show very encouraging results on two real-world data sets: the new Reuters (RCV1) benchmark and the Swissprot protein database.
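The abstract describes the method only at a high level. The Python sketch below is a rough illustration, not the authors' formulation, and all of its names are hypothetical. It assumes the taxonomy is given as a child-to-parent dictionary. It scores class proximity by the depth of the lowest common ancestor relative to the deeper class, so deeper shared ancestors (more specific agreement) count for more. It handles multi-labeled documents by taking the best-matching label pair. In place of the paper's optimization problem, it uses a simpler stand-in: it embeds documents with the top-k eigenvectors of X Xᵀ + λS, which blends the LSI document kernel X Xᵀ with the hierarchy graph S.

```python
import numpy as np


def root_path(label, parent):
    """Return the path from the taxonomy root down to `label`.

    `parent` is a hypothetical child -> parent dict; roots have no entry.
    """
    path = [label]
    while label in parent:
        label = parent[label]
        path.append(label)
    return path[::-1]  # root first


def class_similarity(a, b, parent):
    """Hypothetical class-proximity score: length of the shared root path
    (the lowest common ancestor's depth) divided by the deeper class's depth.
    Identical classes score 1.0; classes sharing deeper ancestors score higher."""
    pa, pb = root_path(a, parent), root_path(b, parent)
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return shared / max(len(pa), len(pb))


def hierarchy_graph(doc_labels, parent):
    """Document similarity graph S built only from the taxonomy.

    Each document may carry several labels; S[i, j] takes the best-matching
    label pair, so multi-labeled documents need no special handling."""
    n = len(doc_labels)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = max(class_similarity(a, b, parent)
                          for a in doc_labels[i] for b in doc_labels[j])
    return S


def regularized_lsi(X, S, k, lam=1.0):
    """Embed documents with the top-k eigenvectors of X X^T + lam * S.

    X is an (n_docs, n_terms) matrix. With lam = 0 the result matches plain
    LSI document coordinates U_k Sigma_k up to sign (or a rotation among tied
    singular values)."""
    K = X @ X.T + lam * S
    vals, vecs = np.linalg.eigh(K)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]      # indices of the k largest
    # S need not be positive semidefinite, so clip before taking square roots.
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))


# Toy example: a two-level taxonomy R -> {A, B}, A -> {A1, A2}.
parent = {"A": "R", "B": "R", "A1": "A", "A2": "A"}
docs = [{"A1"}, {"A2"}, {"B"}, {"A1", "B"}]  # last document is multi-labeled
X = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]], dtype=float)  # toy term counts
S = hierarchy_graph(docs, parent)
V = regularized_lsi(X, S, k=2, lam=2.0)
print(V.shape)  # (4, 2)
```

In this toy taxonomy, documents labeled A1 and A2 get a hierarchy similarity of 2/3, while A1 and B get only 1/3. Raising `lam` therefore pulls documents from nearby branches closer together in the embedding. The actual objective and weighting used by the authors should be taken from the full paper.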
Pages: 178-185
Number of pages: 8
Related papers
  • [1] Regularized Latent Semantic Indexing
    Wang, Quan
    Xu, Jun
    Li, Hang
    Craswell, Nick
    [J]. PROCEEDINGS OF THE 34TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR'11), 2011: 685-694
  • [2] Regularized Latent Semantic Indexing: A New Approach to Large-Scale Topic Modeling
    Wang, Quan
    Xu, Jun
    Li, Hang
    Craswell, Nick
    [J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2013, 31 (01)
  • [3] Probabilistic latent semantic indexing
    Hofmann, T
    [J]. SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 1999: 50-57
  • [4] INDEXING BY LATENT SEMANTIC ANALYSIS
    DEERWESTER, S
    DUMAIS, ST
    FURNAS, GW
    LANDAUER, TK
    HARSHMAN, R
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1990, 41 (06): 391-407
  • [5] Automated Chinese Essay Scoring From Topic Perspective Using Regularized Latent Semantic Indexing
    Hao, Shudong
    Xu, Yanyan
    Peng, Hengli
    Su, Kaile
    Ke, Dengfeng
    [J]. 2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014: 3092-3097
  • [6] Latent semantic indexing: A probabilistic analysis
    Papadimitriou, CH
    Raghavan, P
    Tamaki, H
    Vempala, S
    [J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2000, 61 (02): 217-235
  • [7] On updating problems in latent semantic indexing
    Zha, HY
    Simon, HD
    [J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1999, 21 (02): 782-791
  • [8] A probabilistic model for Latent Semantic Indexing
    Ding, CHQ
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2005, 56 (06): 597-608
  • [9] Text segmentation by latent semantic indexing
    Ishioka, T
    [J]. NEW DEVELOPMENTS IN PSYCHOMETRICS, 2003: 689-696
  • [10] Matrix Factorization in Latent Semantic Indexing
    Ng, Wei Shean
    Tang, Wen Kai Adrian
    [J]. 2ND SEA-STEM INTERNATIONAL CONFERENCE 2021, 2021: 136-139