Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata

Cited by: 10
Authors
Ismail, Mahmoud [1]
Gebremeskel, Ermias [2]
Kakantousis, Theofilos [2]
Berthou, Gautier [2]
Dowling, Jim [1,2]
Affiliations
[1] KTH Royal Inst Technol, Stockholm, Sweden
[2] RISE SICS, Kista, Sweden
Funding
EU Horizon 2020
DOI
10.1109/ICDCS.2017.41
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Hadoop is a popular system for storing, managing, and processing large volumes of data, but its internal support for metadata is bare-bones: metadata is a bottleneck, so keeping less of it means more scalability. The result is a scalable platform with rudimentary access control that is neither user- nor developer-friendly. Also, metadata services that are built on Hadoop, such as SQL-on-Hadoop, access control, data provenance, and data governance, are necessarily implemented as eventually consistent services, resulting in increased development effort and more brittle software. In this paper, we present a new project-based multi-tenancy model for Hadoop, built on a new distribution of Hadoop that provides a distributed database backend for the metadata layer of the Hadoop Distributed Filesystem (HDFS). We extend Hadoop's metadata model to introduce projects, datasets, and project-users as new core concepts that enable a user-friendly, UI-driven Hadoop experience. As our metadata service is backed by a transactional database, developers can easily extend metadata by adding new tables and ensure the strong consistency of extended metadata using both transactions and foreign keys.
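The last point of the abstract, extending metadata with new tables kept consistent through transactions and foreign keys, can be illustrated with a minimal sketch. It is not taken from the paper: SQLite stands in for the distributed database backend, and the table names (`inodes`, `dataset_tags`) and helper functions are hypothetical, not the actual Hopsworks or HDFS schema.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a base table and an extension table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Hypothetical stand-in for the file system's core metadata.
        CREATE TABLE inodes (
            id   INTEGER PRIMARY KEY,
            path TEXT NOT NULL UNIQUE
        );
        -- Hypothetical extended metadata added by a developer.
        CREATE TABLE dataset_tags (
            inode_id INTEGER NOT NULL
                     REFERENCES inodes(id) ON DELETE CASCADE,
            tag      TEXT NOT NULL,
            PRIMARY KEY (inode_id, tag)
        );
        """
    )
    return conn


def add_file_with_tag(conn, path, tag):
    """Insert a file and its tag in one transaction: both rows or neither."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute("INSERT INTO inodes (path) VALUES (?)", (path,))
        inode_id = cur.lastrowid
        conn.execute(
            "INSERT INTO dataset_tags (inode_id, tag) VALUES (?, ?)",
            (inode_id, tag),
        )
    return inode_id


def delete_file(conn, inode_id):
    """Delete a file; the foreign key cascade removes its tags as well."""
    with conn:
        conn.execute("DELETE FROM inodes WHERE id = ?", (inode_id,))


def count_rows(conn, table):
    """Count rows in one of the two demo tables."""
    if table not in ("inodes", "dataset_tags"):
        raise ValueError("unknown table")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

In this sketch, deleting a row from `inodes` removes the dependent `dataset_tags` rows, and a failure partway through `add_file_with_tag` leaves neither row behind, so the extended metadata cannot drift out of step with the base metadata.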
Pages: 2525-2528
Page count: 4