Fast extraction of semantic features from a latent semantic indexed text corpus

Cited by: 2
Authors
Kabán, A [1]
Girolami, MA [1]
Affiliations
[1] Aalto Univ, Lab Comp & Informat Sci, FIN-02015 Espoo, Finland
Keywords
latent semantic indexing; probabilistic latent semantic analysis; projection pursuit; semantic feature extraction; text analysis;
DOI
10.1023/A:1013801028884
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a projection-based symmetrical factorisation method for extracting semantic features from collections of text documents stored in a Latent Semantic space. Preliminary experimental results demonstrate that this yields a representation comparable to that provided by a novel probabilistic approach, which reconsiders the entire indexing problem of text documents and works directly in the original high-dimensional vector-space representation of text. The projection index employed is derived here from the a priori constraints on the problem. The principal advantage of this approach is computational efficiency, which is obtained by exploiting Latent Semantic Indexing as a preprocessing stage. Simulation results on subsets of the 20-Newsgroups text corpus in various settings are provided.
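The abstract names Latent Semantic Indexing as the preprocessing stage but gives no implementation details. As a rough illustration of that stage only, the following sketch projects a term-by-document matrix onto its top-k left singular vectors with NumPy. The function name `lsi_project`, the toy matrix, and the choice k = 2 are assumptions made for this example; the sketch does not implement the paper's projection index or its symmetrical factorisation.

```python
import numpy as np


def lsi_project(term_doc, k):
    """Project documents into a k-dimensional latent semantic space.

    term_doc: (n_terms, n_docs) term-by-document count matrix.
    Returns (doc_coords, term_basis), where doc_coords has shape (k, n_docs).
    This is a generic truncated-SVD sketch, not the paper's own procedure.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    term_basis = U[:, :k]                  # top-k left singular vectors
    doc_coords = term_basis.T @ term_doc   # equals diag(s[:k]) @ Vt[:k]
    return doc_coords, term_basis


# Hypothetical toy corpus: 5 terms x 4 documents with two loose topics.
X = np.array([
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 2, 2],
    [1, 0, 0, 1],
], dtype=float)

docs_k, basis = lsi_project(X, k=2)
print(docs_k.shape)  # (2, 4)
```

Any later feature-extraction step would then operate on the k-by-n matrix `docs_k` rather than on the full term space, which is where a computational saving of the kind the abstract describes would plausibly come from.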
Pages: 31-34
Number of pages: 4