Design of a vertical search engine for synchrotron data: a big data approach using Hadoop ecosystem

Authors
Ali Khaleghi
Kamran Mahmoudi
Sonia Mozaffari
Affiliations
[1] Imam Khomeini International University,
Source
SN Applied Sciences | 2019 / Vol. 1
Keywords
Synchrotron; Search Engine; Information retrieval; Big data; Hadoop; Solr; Nutch;
Abstract
A synchrotron, as an experimental physics facility, enables multidisciplinary research and collaboration between scientists in fields such as physics and chemistry. During the construction and operation of such a facility, valuable data about the design of the facility, its instruments, and the experiments conducted are published and stored. Researchers spend a long time sifting through results from general-purpose search engines to find the scientific information they need, so a domain-specific search engine can help them find it with greater precision. It also makes it possible to use the crawled data to create a knowledge base and to generate the datasets researchers require. Several other vertical search engines have been designed for scientific data, for example medical information. In this paper we propose the design of such a search engine on top of the Apache Hadoop framework. Using the Hadoop ecosystem provides necessary features such as scalability, fault tolerance, and availability. It also abstracts away the complexity of search engine design by using open-source tools as building blocks, among them Apache Nutch for crawling and Apache Solr for indexing and query processing. In our initial results, obtained by implementing the proposed method in single-node mode, an index of over a hundred thousand pages was created with an average fetch interval of 30 days, comprising 28 segments and approximately 570 MB in size. Performance factors such as usage of available bandwidth and system load were logged using Linux's sysstat package.
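To illustrate the query-processing block mentioned in the abstract, the following minimal sketch builds a request URL for Solr's standard `/select` handler. It is not the authors' implementation. The host, the core name `synchrotron`, and the field names `content`, `title`, and `url` are assumptions chosen for illustration; the abstract does not describe the actual schema. No network request is sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_solr_query_url(base_url, core, terms, rows=10):
    """Build a Solr /select URL for a keyword query (nothing is sent)."""
    params = {
        # Search the (assumed) full-text field for the given terms.
        "q": f"content:({terms})",
        # Return only the (assumed) title and url fields of each hit.
        "fl": "title,url",
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


# Hypothetical host and core name, for illustration only.
url = build_solr_query_url("http://localhost:8983", "synchrotron",
                           "beamline undulator")
print(url)

# Decode the query string again to confirm the parameters round-trip.
print(parse_qs(urlsplit(url).query)["q"][0])
```

In a running deployment, a URL of this shape would be sent to the Solr instance that holds the index produced from the Nutch crawl, and the JSON response would contain the matching documents.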
Related papers
50 in total
  • [1] Design of a vertical search engine for synchrotron data: a big data approach using Hadoop ecosystem
    Khaleghi, Ali
    Mahmoudi, Kamran
    Mozaffari, Sonia
    [J]. SN APPLIED SCIENCES, 2019, 1 (12)
  • [2] Anomaly Detection for Big Log Data Using a Hadoop Ecosystem
    Son, Siwoon
    Gil, Myeong-Seon
    Moon, Yang-Sae
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2017, : 377 - 380
  • [3] Design and Implementation of Vertical Search Engine Based on Hadoop
    Cheng Lin
    Ma Yajie
    [J]. PROCEEDINGS 2016 EIGHTH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION ICMTMA 2016, 2016, : 199 - 205
  • [4] Design and Implementation of Search Engine Based on Big Data
    Zhang Zhifeng
    Han Susu
    [J]. AGRO FOOD INDUSTRY HI-TECH, 2017, 28 (01): : 1355 - 1359
  • [5] IoT Big Data provenance scheme using blockchain on Hadoop ecosystem
    Pajooh, Houshyar Honar
    Rashid, Mohammed A.
    Alam, Fakhrul
    Demidenko, Serge
    [J]. JOURNAL OF BIG DATA, 2021, 8 (01)
  • [6] IoT Big Data provenance scheme using blockchain on Hadoop ecosystem
    Houshyar Honar Pajooh
    Mohammed A. Rashid
    Fakhrul Alam
    Serge Demidenko
    [J]. Journal of Big Data, 8
  • [7] Parallelization of Vertical Search Engine using Hadoop and MapReduce
    Pasari, Rajat
    Chaudhari, Vaibhav
    Borkar, Atharva
    Joshi, Amit
    [J]. INTERNATIONAL CONFERENCE ON ADVANCES IN INFORMATION COMMUNICATION TECHNOLOGY & COMPUTING, 2016, 2016,
  • [9] Big Data Management Performance Evaluation in Hadoop Ecosystem
    Liu, Qing
    Fu, Yinjin
    Ni, Guiqiang
    Mei, Jianmin
    [J]. 2017 3RD INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING AND COMMUNICATIONS (BIGCOM), 2017, : 413 - 421
  • [10] Prediction of Diseases Using Hadoop in Big Data - A Modified Approach
    Jayalatchumy, D.
    Thambidurai, P.
    [J]. ARTIFICIAL INTELLIGENCE TRENDS IN INTELLIGENT SYSTEMS, CSOC2017, VOL 1, 2017, 573 : 229 - 238