A MAPREDUCE BASED DISTRIBUTED LSI FOR SCALABLE INFORMATION RETRIEVAL

Cited by: 0
Authors
Liu, Yang [1]
Li, Maozhen [2, 3]
Khan, Mukhtaj [2]
Qi, Man [4]
Affiliations
[1] Sichuan Univ, Sch Elect Engn & Informat, Chengdu, Peoples R China
[2] Brunel Univ, Sch Engn & Design, Uxbridge UB8 3PH, Middx, England
[3] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Shanghai, Peoples R China
[4] Canterbury Christ Church Univ, Dept Comp, Canterbury CT1 1QU, Kent, England
Keywords
Information retrieval; latent semantic indexing; MapReduce; load balancing; genetic algorithms
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Latent Semantic Indexing (LSI) has been widely used in information retrieval because it handles polysemy and synonymy effectively. However, LSI is computationally intensive because of the complexity of the singular value decomposition and filtering operations it involves. This paper presents MR-LSI, a MapReduce-based distributed LSI algorithm for scalable information retrieval. The performance of MR-LSI is first evaluated in a small-scale experimental cluster environment and subsequently in large-scale simulation environments. By partitioning the dataset into smaller subsets and optimizing the partitioned subsets across a cluster of computing nodes, MR-LSI significantly reduces its overhead while maintaining a high level of accuracy in retrieving documents of user interest. A genetic-algorithm-based load balancing scheme is designed to optimize the performance of MR-LSI in heterogeneous computing environments, in which the computing nodes have varied resources.
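The load-balancing idea in the abstract can be illustrated with a small sketch. This is not the authors' MR-LSI scheme, whose details are not given in this record. It is a generic genetic algorithm, written under stated assumptions, that assigns data partitions to nodes of different speeds so that the slowest node finishes as early as possible. The names `makespan` and `ga_balance`, the partition sizes, the node speeds, and all GA parameters are hypothetical and chosen only for illustration.

```python
import random


def makespan(assignment, sizes, speeds):
    """Finish time of the slowest node (assumed cost model: load / speed)."""
    load = [0.0] * len(speeds)
    for part, node in enumerate(assignment):
        load[node] += sizes[part]
    return max(l / s for l, s in zip(load, speeds))


def ga_balance(sizes, speeds, pop_size=30, generations=200,
               mutation_rate=0.1, seed=0):
    """Search for a partition-to-node assignment with a low makespan.

    Requires at least two partitions and pop_size >= 4.
    """
    rng = random.Random(seed)
    n_parts, n_nodes = len(sizes), len(speeds)

    def fitness(assignment):
        return makespan(assignment, sizes, speeds)

    # Seed the population with a round-robin assignment plus random ones.
    round_robin = [i % n_nodes for i in range(n_parts)]
    population = [round_robin] + [
        [rng.randrange(n_nodes) for _ in range(n_parts)]
        for _ in range(pop_size - 1)
    ]

    for _ in range(generations):
        population.sort(key=fitness)
        # Elitism: the better half survives unchanged, so the best
        # solution found so far is never lost.
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n_parts)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n_parts):             # per-gene mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_nodes)
            children.append(child)
        population = elite + children

    return min(population, key=fitness)


# Hypothetical workload: eight partitions, three nodes of unequal speed.
sizes = [40, 35, 30, 25, 20, 20, 15, 15]
speeds = [1.0, 2.0, 4.0]
best = ga_balance(sizes, speeds)
print(best, makespan(best, sizes, speeds))
```

Because the round-robin assignment is part of the initial population and the elite is carried over unchanged, the returned assignment is never worse than round-robin under this cost model. A real deployment would replace the `load / speed` model with measured per-node processing times.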
Pages: 259-280
Number of pages: 22
Related papers
50 in total
  • [31] A Web Services-Based Distributed Information Retrieval Model
    Meng, Jian
    Yan, Zhao
    Li, Ji
    2008 4TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-31, 2008, : 12306 - 12309
  • [32] Distributed information retrieval based on hierarchical semantic overlay network
    Liu, F
    Ma, FY
    Li, ML
    Huang, LP
    GRID AND COOPERATIVE COMPUTING GCC 2004, PROCEEDINGS, 2004, 3251 : 657 - 664
  • [33] A scalable distributed information management system
    Yalagandula, P
    Dahlin, M
    ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2004, 34 (04) : 379 - 390
  • [34] Applied-information Technology with Distributed Text Feature Extraction Method Based on MapReduce
    Chen, Lu
    Zhang, Tao
    Ma, Yuanyuan
    Zhou, Cheng
    ADVANCED DEVELOPMENT OF ENGINEERING SCIENCE IV, 2014, 1046 : 444 - 448
  • [35] Scalable image retrieval from distributed images database
    Tillo, Tammum
    Grangetto, Marco
    Olmo, Gabriella
    2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 1789 - 1792
  • [36] Scalable distributed genetic algorithm for data ordering problem with inversion using mapreduce
    Logofatu, Doina
    Stamate, Daniel
    IFIP Advances in Information and Communication Technology, 2014, 436 : 325 - 334
  • [37] A Scalable Similarity Join Algorithm Based on MapReduce and LSH
    Sébastien Rivault
    Mostafa Bamha
    Sébastien Limet
    Sophie Robert
    International Journal of Parallel Programming, 2022, 50 : 360 - 380
  • [38] A Scalable Real-Time Agent-Based Information Retrieval Engine
    Al-Akashi, Falah
    Inkpen, Diana
    INTERNATIONAL JOURNAL OF SOFTWARE INNOVATION, 2022, 10 (01)
  • [39] The Scalable Distributed Two-layer Content Based Image Retrieval Data Store
    Deniziak, Stanislaw
    Michno, Tomasz
    Krechowicz, Adam
    PROCEEDINGS OF THE 2015 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2015, 5 : 827 - 832
  • [40] Load Balancing in MapReduce Based on Scalable Cardinality Estimates
    Gufler, Benjamin
    Augsten, Nikolaus
    Reiser, Angelika
    Kemper, Alfons
    2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 522 - 533