A MAPREDUCE BASED DISTRIBUTED LSI FOR SCALABLE INFORMATION RETRIEVAL

Cited by: 0
Authors
Liu, Yang [1]
Li, Maozhen [2, 3]
Khan, Mukhtaj [2]
Qi, Man [4]
Affiliations
[1] Sichuan Univ, Sch Elect Engn & Informat, Chengdu, Peoples R China
[2] Brunel Univ, Sch Engn & Design, Uxbridge UB8 3PH, Middx, England
[3] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Shanghai, Peoples R China
[4] Canterbury Christ Church Univ, Dept Comp, Canterbury CT1 1QU, Kent, England
Keywords
Information retrieval; latent semantic indexing; MapReduce; load balancing; genetic algorithms
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Latent Semantic Indexing (LSI) is widely used in information retrieval because it handles polysemy and synonymy effectively. However, LSI is computationally intensive, owing to the complexity of the singular value decomposition and filtering operations it involves. This paper presents MR-LSI, a MapReduce-based distributed LSI algorithm for scalable information retrieval. MR-LSI is evaluated first on a small-scale experimental cluster and then in large-scale simulation environments. By partitioning the dataset into smaller subsets and optimizing the partitioned subsets across a cluster of computing nodes, MR-LSI significantly reduces overhead while maintaining a high level of accuracy in retrieving documents of user interest. A genetic-algorithm-based load balancing scheme is designed to optimize the performance of MR-LSI in heterogeneous computing environments, where computing nodes have varied resources.
Pages: 259-280
Number of pages: 22
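The abstract mentions a genetic-algorithm-based load balancing scheme for clusters whose nodes have varied resources, but this record does not describe how it works. The following is only a minimal illustrative sketch of the general idea, not the authors' scheme: it assumes each data partition has a known size, each node has a known relative speed, and the goal is to minimize the finish time of the slowest node. The names `makespan` and `ga_balance`, the encoding, and all parameter values are hypothetical.

```python
import random


def makespan(assignment, sizes, speeds):
    """Finish time of the slowest node: each node's assigned work divided by its speed."""
    load = [0.0] * len(speeds)
    for part, node in enumerate(assignment):
        load[node] += sizes[part]
    return max(work / speed for work, speed in zip(load, speeds))


def ga_balance(sizes, speeds, pop_size=30, generations=200,
               mutation_rate=0.1, seed=0):
    """Assumed GA: assignment[i] is the node that processes partition i.

    Requires at least 2 partitions and pop_size >= 4.
    """
    rng = random.Random(seed)
    n_parts, n_nodes = len(sizes), len(speeds)

    def fitness(assignment):
        return makespan(assignment, sizes, speeds)

    # Start from a round-robin assignment plus random candidates.
    round_robin = [i % n_nodes for i in range(n_parts)]
    population = [round_robin] + [
        [rng.randrange(n_nodes) for _ in range(n_parts)]
        for _ in range(pop_size - 1)
    ]

    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_parts)          # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation: occasionally move a partition to a random node.
            child = [rng.randrange(n_nodes) if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = survivors + children

    return min(population, key=fitness)


if __name__ == "__main__":
    sizes = [8, 3, 5, 9, 2, 7, 4, 6]     # hypothetical partition sizes
    speeds = [1.0, 2.0, 4.0]             # hypothetical relative node speeds
    best = ga_balance(sizes, speeds)
    print(best, makespan(best, sizes, speeds))
```

Because the round-robin assignment is part of the initial population and the best candidates are always carried over, the returned assignment is never worse than round-robin under this cost model. A realistic scheme would also need to account for data transfer and scheduling overheads, which this sketch ignores.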