A MAPREDUCE BASED DISTRIBUTED LSI FOR SCALABLE INFORMATION RETRIEVAL

Cited by: 0
Authors
Liu, Yang [1]
Li, Maozhen [2, 3]
Khan, Mukhtaj [2]
Qi, Man [4]
Affiliations
[1] Sichuan Univ, Sch Elect Engn & Informat, Chengdu, Peoples R China
[2] Brunel Univ, Sch Engn & Design, Uxbridge UB8 3PH, Middx, England
[3] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Shanghai, Peoples R China
[4] Canterbury Christ Church Univ, Dept Comp, Canterbury CT1 1QU, Kent, England
Keywords
Information retrieval; latent semantic indexing; MapReduce; load balancing; genetic algorithms
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Latent Semantic Indexing (LSI) has been widely used in information retrieval because of its effectiveness in addressing the problems of polysemy and synonymy. However, LSI is computationally intensive owing to the complexity of the singular value decomposition (SVD) and filtering operations it involves. This paper presents MR-LSI, a MapReduce-based distributed LSI algorithm for scalable information retrieval. The performance of MR-LSI is first evaluated in a small-scale experimental cluster and subsequently in large-scale simulation environments. By partitioning the dataset into smaller subsets and optimizing how these subsets are processed across a cluster of computing nodes, the overhead of the MR-LSI algorithm is reduced significantly while a high level of accuracy in retrieving documents of user interest is maintained. A genetic-algorithm-based load-balancing scheme is also designed to optimize the performance of MR-LSI in heterogeneous computing environments, in which the computing nodes have varying resources.
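The abstract combines two ideas: LSI, which projects a term-document matrix into a low-rank latent space using a truncated SVD, and partitioning the collection so that each subset can be processed independently on a separate node. The Python/NumPy sketch below illustrates that combination under stated assumptions; it is not the authors' MR-LSI implementation. The map and reduce phases are simulated with ordinary function calls rather than Hadoop, the function names `lsi_map` and `lsi_reduce` are hypothetical, raw term frequencies are used without any weighting, and the round-robin split in the example is an assumption (the abstract does not say how partitions are formed). Each map call computes a rank-k SVD of its own partition, A ≈ U_k S_k V_k^T, folds the query in as q_hat = S_k^-1 U_k^T q, and scores documents by cosine similarity against the rows of V_k; the reduce call merges the per-partition lists into a global top-n.

```python
import numpy as np


def build_term_doc_matrix(docs, vocab):
    """Raw term-frequency matrix A (terms x documents); no TF-IDF weighting."""
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc.lower().split():
            if term in index:
                A[index[term], j] += 1.0
    return A


def query_vector(query, vocab):
    """Term-frequency vector of a query over the same vocabulary."""
    return build_term_doc_matrix([query], vocab)[:, 0]


def lsi_map(partition, query, vocab, k):
    """Hypothetical map task: LSI over one partition of (doc_id, text) pairs.

    Approximates the partition's term-document matrix by a rank-k truncated
    SVD, folds the query into the latent space as q_hat = S_k^-1 U_k^T q,
    and emits (doc_id, cosine score) for every document in the partition.
    """
    ids = [doc_id for doc_id, _ in partition]
    A = build_term_doc_matrix([text for _, text in partition], vocab)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-10)))  # drop (numerically) zero singular values
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    q_hat = (U_k.T @ query_vector(query, vocab)) / s_k
    docs_hat = Vt_k.T  # row j = document j folded into the same latent space
    out = []
    for doc_id, d in zip(ids, docs_hat):
        denom = np.linalg.norm(d) * np.linalg.norm(q_hat)
        score = float(d @ q_hat / denom) if denom > 0 else 0.0
        out.append((doc_id, score))
    return out


def lsi_reduce(mapped, top_n):
    """Hypothetical reduce task: merge per-partition scores, keep global top_n."""
    merged = [pair for part in mapped for pair in part]
    return sorted(merged, key=lambda p: p[1], reverse=True)[:top_n]


if __name__ == "__main__":
    docs = [
        (0, "car engine repair"),
        (1, "automobile engine maintenance"),
        (2, "fresh fruit salad recipe"),
        (3, "banana and apple fruit"),
    ]
    vocab = sorted({t for _, text in docs for t in text.lower().split()})
    # Assumed round-robin split of the collection over two "nodes".
    partitions = [docs[0::2], docs[1::2]]
    mapped = [lsi_map(p, "car engine", vocab, k=2) for p in partitions]
    print(lsi_reduce(mapped, top_n=2))
```

Because each partition gets its own latent space, cosine scores from different partitions are only approximately comparable, and the sketch simply merges them as if they were; it also leaves out the genetic-algorithm load balancing the abstract describes for heterogeneous nodes.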
Pages: 259 - 280
Number of pages: 22
Related papers
50 in total
  • [41] SEMI: A Scalable Entity Matching System Based on MapReduce
    Chao, Pingfu
    Li, Yuming
    Gao, Zhu
    Fang, Junhua
    He, Xiaofeng
    Zhang, Rong
    DATABASES THEORY AND APPLICATIONS, 2015, 9093 : 328 - 332
  • [42] A Scalable Similarity Join Algorithm Based on MapReduce and LSH
    Rivault, Sebastien
    Bamha, Mostafa
    Limet, Sebastien
    Robert, Sophie
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2022, 50 (3-4) : 360 - 380
  • [43] Scalable Multi-agent Simulation Based on MapReduce
    Ahlbrecht, Tobias
    Dix, Juergen
    Fiekas, Niklas
    MULTI-AGENT SYSTEMS AND AGREEMENT TECHNOLOGIES, EUMAS 2016, 2017, 10207 : 364 - 371
  • [44] Soft approaches to distributed information retrieval
    Pasi, G. (gabriella.pasi@itc.cnr.it)
    Elsevier Inc., 1600, (34) : 2 - 3
  • [45] Semantic Information Retrieval in a Distributed Environment
    Iqbal, Ahmad Ali
    Ott, Maximilan
    Seneviratne, Aruna
    2009 6TH IEEE CONSUMER COMMUNICATIONS AND NETWORKING CONFERENCE, VOLS 1 AND 2, 2009, : 786 - +
  • [46] Soft approaches to distributed information retrieval
    Bordogna, G
    Pasi, G
    Yager, RR
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2003, 34 (2-3) : 105 - 120
  • [47] DISTRIBUTED INFORMATION RETRIEVAL: DEVELOPMENTS AND STRATEGIES
    Ghansah, Benjamin
    Wu, Shengli
    INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH IN AFRICA, 2015, 16 (110-144) : 110 - 144
  • [48] The Collaborative relevance in the distributed information retrieval
    Enaanai, Adil
    Doukkali, Aziz Sdigui
    Saif, Ichrak
    Moutachaouik, Hicham
    Hain, Mustapha
    2016 IEEE/ACS 13TH INTERNATIONAL CONFERENCE OF COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2016,
  • [49] An application framework for distributed information retrieval
    Simeoni, Fabio
    Azzopardi, Leif
    Crestani, Fabio
    DIGITAL LIBRARIES: ACHIEVEMENTS, CHALLENGES AND OPPORTUNITIES, PROCEEDINGS, 2006, 4312 : 192 - +
  • [50] An Islamic Distributed Information Retrieval Approach
    Al-akashi, Falah Hassan Ali
    INTERNATIONAL JOURNAL OF SOFTWARE SCIENCE AND COMPUTATIONAL INTELLIGENCE-IJSSCI, 2020, 12 (03) : 38 - 54