Large-scale information retrieval with latent semantic indexing

Cited by: 83
Authors
Letsche, TA
Berry, MW
Affiliation
[1] Department of Computer Science, University of Tennessee, Knoxville
Funding
National Science Foundation (USA)
DOI
10.1016/S0020-0255(97)00044-3
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
As the amount of electronic information increases, traditional lexical (or Boolean) information retrieval techniques will become less useful. Large, heterogeneous collections will be difficult to search since the sheer volume of unranked documents returned in response to a query will overwhelm the user. Vector-space approaches to information retrieval, on the other hand, allow the user to search for concepts rather than specific words, and rank the results of the search according to their relative similarity to the query. One vector-space approach, Latent Semantic Indexing (LSI), has achieved up to 30% better retrieval performance than lexical searching techniques by employing a reduced-rank model of the term-document space. However, the original implementation of LSI lacked the execution efficiency required to make LSI useful for large data sets. A new implementation of LSI, LSI++, seeks to make LSI efficient, extensible, portable, and maintainable. The LSI++ Application Programming Interface (API) allows applications to immediately use LSI without knowing the implementation details of the underlying system. LSI++ supports both serial and distributed searching of large data sets, providing the same programming interface regardless of the implementation actually executing. In addition, a World Wide Web interface was created to allow simple, intuitive searching of document collections using LSI++. Timing results indicate that the serial implementation of LSI++ searches up to six times faster than the original implementation of LSI, while the parallel implementation searches nearly 180 times faster on large document collections. (C) Elsevier Science Inc. 1997.
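The reduced-rank, vector-space idea the abstract describes can be illustrated with a minimal sketch. This is not LSI++ or its API, which the abstract does not detail. It assumes a dense toy term-by-document matrix, a truncated SVD computed with NumPy, and one common convention for folding a query into the reduced space (scaling by the inverse singular values). The function names `build_lsi` and `rank_documents` and the five-term vocabulary are hypothetical, chosen only for illustration.

```python
import numpy as np


def build_lsi(term_doc, k):
    """Return a rank-k SVD model of a term-by-document matrix.

    Illustrative helper (hypothetical name), not part of LSI++.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest factors: term vectors, singular values, doc vectors.
    return U[:, :k], s[:k], Vt[:k, :].T


def rank_documents(query_vec, U_k, s_k, V_k):
    """Fold a query into the k-dimensional space and rank documents.

    Returns document indices sorted by decreasing cosine similarity,
    together with the similarity scores.
    """
    q_hat = (query_vec @ U_k) / s_k  # q^T U_k S_k^{-1}
    denom = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat) + 1e-12
    sims = (V_k @ q_hat) / denom
    return np.argsort(-sims), sims


# Toy collection (assumed data). Rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 1],  # petal
], dtype=float)

U_k, s_k, V_k = build_lsi(A, k=2)

# Query containing only the word "car".
query = np.zeros(len(terms))
query[terms.index("car")] = 1.0

order, sims = rank_documents(query, U_k, s_k, V_k)
print("ranking:", order.tolist())
print("similarities:", np.round(sims, 3).tolist())
```

In this toy collection a purely lexical match on "car" would return only document 0. In the rank-2 space, document 1 (which contains "automobile" and "engine" but never "car") should score about as high as document 0, while the two flower documents should score near zero. That is the concept-level matching the abstract attributes to vector-space methods.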
Pages: 105-137
Page count: 33