A MAPREDUCE BASED DISTRIBUTED LSI FOR SCALABLE INFORMATION RETRIEVAL

Cited by: 0
Authors
Liu, Yang [1]
Li, Maozhen [2,3]
Khan, Mukhtaj [2]
Qi, Man [4]
Affiliations
[1] Sichuan Univ, Sch Elect Engn & Informat, Chengdu, Peoples R China
[2] Brunel Univ, Sch Engn & Design, Uxbridge UB8 3PH, Middx, England
[3] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Shanghai, Peoples R China
[4] Canterbury Christ Church Univ, Dept Comp, Canterbury CT1 1QU, Kent, England
Keywords
Information retrieval; latent semantic indexing; MapReduce; load balancing; genetic algorithms
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Latent Semantic Indexing (LSI) has been widely used in information retrieval owing to its effectiveness in addressing the problems of polysemy and synonymy. However, LSI is computationally intensive because of the complexity of the singular value decomposition and filtering operations involved. This paper presents MR-LSI, a MapReduce-based distributed LSI algorithm for scalable information retrieval. The performance of MR-LSI is first evaluated in a small-scale experimental cluster environment and subsequently in large-scale simulation environments. By partitioning the dataset into smaller subsets and optimizing the partitioned subsets across a cluster of computing nodes, the overhead of the MR-LSI algorithm is reduced significantly while maintaining a high level of accuracy in retrieving documents of user interest. A genetic-algorithm-based load balancing scheme is designed to optimize the performance of MR-LSI in heterogeneous computing environments in which the computing nodes have varied resources.
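For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal single-machine sketch of standard LSI (truncated SVD plus query folding), assuming NumPy is available. It is not the MR-LSI algorithm from the paper: the partitioning, MapReduce distribution, and load balancing are not shown, and the function names, toy vocabulary, and toy matrix are hypothetical illustrations.

```python
import numpy as np


def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix (rows: terms, columns: documents)."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular triplets; rows of docs_k are document coordinates.
    return u[:, :k], s[:k], vt[:k, :].T


def lsi_query(query_vec, u_k, s_k):
    """Fold a term-space query into the latent space: q_k = q^T U_k S_k^{-1}."""
    return (query_vec @ u_k) / s_k


def rank_documents(query_vec, u_k, s_k, docs_k):
    """Rank documents by cosine similarity to the query in the latent space."""
    q = lsi_query(query_vec, u_k, s_k)
    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q) + 1e-12
    sims = (docs_k @ q) / norms
    return np.argsort(-sims), sims


# Hypothetical toy corpus. Terms: car, automobile, engine, flower, petal.
# Columns are documents d0..d3; d1 never contains the word "car".
term_doc = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 1],  # petal
], dtype=float)

u_k, s_k, docs_k = lsi_fit(term_doc, k=2)
query = np.array([1, 0, 0, 0, 0], dtype=float)  # the single term "car"
order, sims = rank_documents(query, u_k, s_k, docs_k)
```

In this toy example the query "car" scores highly against d1, which contains only "automobile" and "engine", because both vehicle documents load on the same latent dimension. This is the synonymy effect the abstract mentions. The full SVD call is also the step whose cost grows quickly with collection size, which is the cost the abstract attributes to LSI.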
Pages: 259-280
Page count: 22
Related papers
50 in total
  • [21] A distributed information retrieval manner based on the statistic information for ubiquitous services
    Tsuchiya, Takeshi
    Sawano, Hiroaki
    Lihan, Marc
    Yoshinaga, Hirokazu
    Koyanagi, Keiichi
    Progress in Informatics, 2009, (06): : 63 - 77
  • [22] A Scalable XSLT Processing Framework based on MapReduce
    Li, Ren
    Luo, Jianhua
    Yang, Dan
    Hu, Haibo
    Chen, Ling
    JOURNAL OF COMPUTERS, 2013, 8 (09) : 2175 - 2181
  • [23] Scalable Recommender System based on MapReduce Framework
    Rohit
    Singh, Anil Kumar
    2017 IEEE INTERNATIONAL CONFERENCE ON POWER, CONTROL, SIGNALS AND INSTRUMENTATION ENGINEERING (ICPCSI), 2017, : 2892 - 2895
  • [24] Amadeus: A scalable HMM-based audio information retrieval system
    Battle, E
    Masip, J
    Guaus, E
    ISCCSP : 2004 FIRST INTERNATIONAL SYMPOSIUM ON CONTROL, COMMUNICATIONS AND SIGNAL PROCESSING, 2004, : 731 - 734
  • [25] Methodologies for distributed information retrieval
    de Kretser, O
    Moffat, A
    Shimmin, T
    Zobel, J
    18TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, PROCEEDINGS, 1998, : 66 - 73
  • [26] COMPARISON OF VSM, GVSM, AND LSI IN INFORMATION RETRIEVAL FOR INDONESIAN TEXT
    Pardede, Jasman
    Husada, Milda Gustiana
    JURNAL TEKNOLOGI, 2016, 78 (5-6): : 51 - 56
  • [27] Distributed Video Transcoding Based on MapReduce
    Song, Chenwei
    Shen, Wenfeng
    Sun, Lianqiang
    Lei, Zhou
    Xu, Weimin
    2014 IEEE/ACIS 13TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS), 2014, : 303 - 308
  • [28] Metadata harvesting for content-based distributed information retrieval
    Simeoni, Fabio
    Yakici, Murat
    Neely, Steve
    Crestani, Fabio
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2008, 59 (01): : 12 - 24
  • [29] Z39.50-based distributed information retrieval in WWW
    Ding, F.
    Ma, F.Y.
    2001, Shanghai Computer Society (27):
  • [30] ON A MODEL OF DISTRIBUTED INFORMATION-RETRIEVAL SYSTEMS BASED ON THESAURI
    MAZUR, Z
    INFORMATION PROCESSING & MANAGEMENT, 1984, 20 (04) : 499 - 505