Efficient Information Retrieval using Lucene, LIndex and HIndex in Hadoop

Cited by: 0
Authors
Mathew, Anita Brigit [1]
Pattnaik, Priyabrat [1]
Kumar, S. D. Madhu [1]
Affiliation
[1] NIT Calicut, Dept CSE, Calicut, Kerala, India
Keywords
Hadoop; MapReduce; Complete-text indexing; Lucene; LIndex; HIndex
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The growth of unstructured and partially structured data in biological networks, social media, geographical information and other web-based applications presents an open challenge to the cloud database community. Hence, approaches to exhaustive Big Data analysis that integrate structured and unstructured data processing have become increasingly critical. MapReduce has recently emerged as a popular framework for extensive data analytics. Powerful indexing techniques would allow users to significantly speed up query processing in MapReduce jobs. Currently, there are a number of indexing techniques, such as Hadoop++, HAIL, LIAH and Adaptive Indexing, but none of them provides an optimized technique for text-based selection operations. This paper proposes two indexing approaches in HDFS, namely LIndex and HIndex. These approaches are found to perform selection operations better than the existing Lucene index approach. A fast retrieval technique is suggested in the MapReduce framework with the new LIndex and HIndex approaches. LIndex provides a complete-text index and informs the Hadoop execution engine to scan only those data blocks that contain the terms of interest. LIndex also enhances throughput (minimizes response time) and overcomes some drawbacks such as upfront cost and long idle time for index creation. LIndex performed better than Lucene but still fell short in response and computation time. Hence a new index named HIndex is suggested. This scheme is found to perform better than LIndex in response and computation time.
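The block-pruning idea described in the abstract (scan only the data blocks that contain the terms of interest) can be illustrated with a minimal sketch. This is not the authors' LIndex or HIndex implementation, and it does not use the Hadoop or Lucene APIs. It assumes a plain in-memory inverted index from terms to block identifiers; the names `build_block_index`, `blocks_to_scan` and the sample block ids are hypothetical.

```python
from collections import defaultdict


def build_block_index(blocks):
    """Map each lowercased term to the set of block ids that contain it.

    `blocks` is a dict of block id -> text (an assumption for illustration;
    real HDFS blocks would be read from the file system).
    """
    index = defaultdict(set)
    for block_id, text in blocks.items():
        for term in text.lower().split():
            index[term].add(block_id)
    return index


def blocks_to_scan(index, query_terms):
    """Return the ids of blocks that contain every query term."""
    postings = [index.get(term.lower(), set()) for term in query_terms]
    if not postings:
        return set()
    # Only blocks present in all posting sets need to be scanned.
    return set.intersection(*postings)


# Hypothetical sample data standing in for three data blocks.
blocks = {
    "blk_0": "hadoop mapreduce job scheduling",
    "blk_1": "lucene full text index",
    "blk_2": "hadoop lucene index integration",
}
index = build_block_index(blocks)
selected = blocks_to_scan(index, ["hadoop", "index"])
print(sorted(selected))
```

In this sketch a selection query on "hadoop" and "index" touches a single block instead of all three, which is the kind of saving the abstract attributes to scanning only relevant blocks.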
Pages: 333-340
Page count: 8