Design of a vertical search engine for synchrotron data: a big data approach using Hadoop ecosystem

Authors
Ali Khaleghi
Kamran Mahmoudi
Sonia Mozaffari
Affiliations
[1] Imam Khomeini International University,
Source
SN Applied Sciences | 2019 / Vol. 1
Keywords
Synchrotron; Search Engine; Information retrieval; Big data; Hadoop; Solr; Nutch;
DOI
Not available
Abstract
A synchrotron, as an experimental physics facility, offers the opportunity for multidisciplinary research and collaboration among scientists in fields such as physics and chemistry. During the construction and operation of such a facility, valuable data on the design of the facility, its instruments, and the experiments conducted there are published and stored. Researchers spend a long time sifting through results from general-purpose search engines to find the scientific information they need; a domain-specific search engine can help them find it with greater precision. It also makes it possible to use the crawled data to build a knowledge base and to generate the datasets that researchers require. Several other vertical search engines have been designed for scientific data search, for example for medical information. In this paper we propose the design of such a search engine on top of the Apache Hadoop framework. Using the Hadoop ecosystem provides necessary features such as scalability, fault tolerance, and availability. It also abstracts away the complexity of search engine design by using open-source tools as building blocks, among them Apache Nutch for crawling and Apache Solr for indexing and query processing. In our initial results, obtained by implementing the proposed method in single-node mode, an index of over a hundred thousand pages was created with an average fetch interval of 30 days, comprising 28 segments and approximately 570 MB in size. Performance factors such as bandwidth usage and system load were logged using the Linux sysstat package.