Efficient Handling of Heterogeneous File Formats in HDFS

Cited by: 0
Authors
Prashant, More Vaishali [1]
Raut, Suhas D. [1]
Affiliation
[1] NK Orchid Coll Engn & Tech, Dept Comp Sci & Engn, Solapur, Maharashtra, India
Keywords
Big Data; Hadoop; HDFS
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The amount of data in our industry and the world is exploding. Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. In an organization, multiple types of documents are collected from different sources: documents that need to be accessible immediately, documents that need to be accessed within a few seconds or minutes, and documents that are accessed infrequently. While these types of documents play different roles within an organization, each is valuable. These different types of documents require different kinds of storage solutions. To handle such heterogeneous file formats, we use Hadoop. In Hadoop, storage of different documents is provided by HDFS (Hadoop Distributed File System). In educational organizations, document categorization is also one of the most important tasks. The availability of documents and the need to assign a category to each document motivated this project.
Pages: 6