A scalable and efficient probabilistic information retrieval and text mining system

Authors
Stensmo, M [1]
Affiliations
[1] IBM Corp, Almaden Res Ctr, San Jose, CA 95120 USA
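Source
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes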
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A system for probabilistic information retrieval and text mining that is both scalable and efficient is presented. Separate feature extraction or stop-word lists are not needed, since the system can remove unneeded parameters dynamically based on a local mutual information measure. This is shown to be as effective as using a global measure. A novel way of storing system parameters eliminates the need for a ranking step during information retrieval from queries. Probability models over word contexts provide a method to suggest related words that can be added to a query. Test results are presented on a categorization task, and screen shots from a live system are shown to demonstrate its capabilities.
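The abstract does not define the local mutual information measure, so the following is only a minimal sketch under an assumed reading: each (word, class) parameter is scored by its own term of the mutual information sum, p(w, c) log(p(w, c) / (p(w) p(c))), and parameters at or below a threshold are dropped. The function names `local_mi_scores` and `prune`, the threshold, and the toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def local_mi_scores(docs):
    """Score each (word, label) pair by its single term of the mutual
    information sum: p(w, c) * log(p(w, c) / (p(w) * p(c))).

    docs is a list of (label, tokens) pairs. This scoring rule is an
    assumption made for illustration, not the paper's stated definition.
    """
    pair_counts = Counter()
    word_counts = Counter()
    label_counts = Counter()
    total = 0
    for label, tokens in docs:
        for tok in tokens:
            pair_counts[(tok, label)] += 1
            word_counts[tok] += 1
            label_counts[label] += 1
            total += 1

    scores = {}
    for (word, label), n in pair_counts.items():
        p_joint = n / total
        p_word = word_counts[word] / total
        p_label = label_counts[label] / total
        scores[(word, label)] = p_joint * math.log(p_joint / (p_word * p_label))
    return scores


def prune(scores, threshold=0.0):
    """Keep only parameters whose local score exceeds the threshold."""
    return {pair: s for pair, s in scores.items() if s > threshold}


# Hypothetical toy corpus: "the" is spread evenly over both classes.
docs = [
    ("sports", ["the", "ball", "the", "goal"]),
    ("finance", ["the", "bank", "the", "loan"]),
]
kept = prune(local_mi_scores(docs))
print(sorted(kept))
```

In this toy corpus, "the" is independent of the class, so its local score is zero and it is pruned without any stop-word list, while class-specific words such as "ball" and "bank" are kept.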
Pages: 643-648
Number of pages: 6