Processing RDF Using Hadoop

Cited by: 0
Authors
Ali, Mehreen [1]
Bharat, K. Sriram [1]
Ranichandra, C. [1]
Affiliations
[1] VIT Univ, Vellore, Tamil Nadu, India
Keywords
Semantic Web; Distributed Computing; Map-Reduce Programming; SPARQL; Graph Data; Performance Evaluation;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The basic idea of the Semantic Web is to extend the existing human-readable web by encoding some of the semantics of resources in a machine-understandable form. Various formats and technologies make this possible. They include the Resource Description Framework (RDF); data interchange formats such as RDF/XML, N3, and N-Triples; and schema and ontology languages such as RDF Schema (RDFS) and the Web Ontology Language (OWL), all of which help describe the concepts, terms, and associations in a particular knowledge domain. Frameworks for semantic web technologies already exist, but they have limitations when handling large RDF graphs. Storing and efficiently querying a large number of RDF triples is therefore a challenging and important problem. We propose a framework built on Hadoop that stores and retrieves massive numbers of RDF triples by taking advantage of the cloud computing paradigm. Hadoop supports reliable, scalable, efficient, and cost-effective distributed computing through very simple Java interfaces. Hadoop includes a distributed file system, HDFS, which is used to store the RDF data. The Hadoop MapReduce framework is used to answer queries. A MapReduce job divides the input data set into independent units that are processed in parallel by the map tasks; the outputs of the map tasks then serve as inputs to the reduce tasks. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks. What distinguishes our approach is its efficient, automatic distribution of data and work across machines, which in turn exploits the underlying parallelism of the CPU cores. Results confirm that, in contrast to various traditional approaches, the proposed framework offers multiple efficiencies and benefits, including on-demand processing, operational scalability, competence, cost efficiency, and local access to very large data sets.
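The map and reduce phases described in the abstract can be illustrated with a small in-process sketch. This is not the authors' framework and does not run on Hadoop: it only imitates the split, map, shuffle, and reduce steps in plain Python for a single triple pattern of the form `?s <predicate> ?o`. The sample triples, the `example.org` IRIs, and all function names (`parse_triple`, `map_task`, `shuffle`, `reduce_task`, `run_query`) are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample data in a simplified N-Triples form (not from the paper).
TRIPLES = [
    '<http://example.org/alice> <http://example.org/knows> <http://example.org/bob> .',
    '<http://example.org/alice> <http://example.org/knows> <http://example.org/carol> .',
    '<http://example.org/bob> <http://example.org/knows> <http://example.org/carol> .',
    '<http://example.org/bob> <http://example.org/name> "Bob" .',
]


def parse_triple(line):
    """Split a simple N-Triples line into (subject, predicate, object).

    Assumes subject and predicate contain no whitespace; this is not a
    full N-Triples parser.
    """
    body = line.strip()
    if body.endswith('.'):
        body = body[:-1].strip()
    subject, predicate, obj = body.split(None, 2)
    return subject, predicate, obj


def map_task(split, predicate):
    """Map: emit (subject, object) for every triple in this split that matches."""
    for line in split:
        s, p, o = parse_triple(line)
        if p == predicate:
            yield s, o


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce: collect all objects found for one subject."""
    return key, sorted(values)


def run_query(lines, predicate, num_splits=2):
    """Divide the input into independent splits, map each one, then reduce."""
    splits = [lines[i::num_splits] for i in range(num_splits)]
    mapped = chain.from_iterable(map_task(s, predicate) for s in splits)
    grouped = shuffle(mapped)
    return dict(reduce_task(k, v) for k, v in grouped.items())


result = run_query(TRIPLES, '<http://example.org/knows>')
for subject, objects in sorted(result.items()):
    print(subject, objects)
```

In a real Hadoop job the splits would be HDFS blocks and each `map_task` call would run on a separate node; here the splits are processed sequentially, so the sketch shows only the data flow, not the parallelism or fault tolerance.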
Pages: 385-394
Number of pages: 10