Data Management Challenges for Deep Learning

Cited by: 46
Authors
Raj, Aiswarya [1]
Bosch, Jan [1]
Olsson, Helena Holmstrom [2]
Arpteg, Anders [3]
Brinne, Bjorn [3]
Affiliations
[1] Chalmers Univ Technol, Dept Comp Sci & Engn, Gothenburg, Sweden
[2] Malmo Univ, Dept Comp Sci & Media Technol, Malmo, Sweden
[3] Peltarion AB, Stockholm, Sweden
Keywords
Deep learning; Data Management; Machine learning; Artificial intelligence; Deep Neural Networks;
DOI
10.1109/SEAA.2019.00030
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Deep learning is one of the most exciting and fast-growing techniques in Artificial Intelligence. The unique capacity of deep learning models to automatically learn patterns from data differentiates them from other machine learning techniques. Deep learning is responsible for a significant number of recent breakthroughs in AI. However, deep learning models are highly dependent on the underlying data. Consequently, the consistency, accuracy, and completeness of data are essential for a deep learning model. Thus, data management principles and practices need to be adopted throughout the development process of deep learning models. The objective of this study is to identify and categorise data management challenges faced by practitioners in different stages of end-to-end development. In this paper, a case study approach is employed to explore the data management issues faced by practitioners across various domains when they use real-world data for training and deploying deep learning models. Our case study is intended to provide valuable insights to the deep learning community as well as to data scientists, and to guide discussion and future research in applied deep learning with real-world data.
Pages: 140 - 147
Page count: 8
Related papers
50 in total
  • [1] Data management for production quality deep learning models: Challenges and solutions
    Munappy, Aiswarya Raj
    Bosch, Jan
    Olsson, Helena Holmstrom
    Arpteg, Anders
    Brinne, Bjoern
    [J]. JOURNAL OF SYSTEMS AND SOFTWARE, 2022, 191
  • [2] Big Data Deep Learning: Challenges and Perspectives
    Chen, Xue-Wen
    Lin, Xiaotong
    [J]. IEEE ACCESS, 2014, 2 : 514 - 525
  • [3] Data Collection and Quality Challenges for Deep Learning
    Whang, Steven Euijong
    Lee, Jae-Gil
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 13 (12) : 3429 - 3432
  • [4] Data Management Challenges in Production Machine Learning
    Polyzotis, Neoklis
    Roy, Sudip
    Whang, Steven Euijong
    Zinkevich, Martin
    [J]. SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2017 : 1723 - 1726
  • [5] Deep learning applications and challenges in big data analytics
    Najafabadi M.M.
    Villanustre F.
    Khoshgoftaar T.M.
    Seliya N.
    Wald R.
    Muharemagic E.
    [J]. Journal of Big Data, 2 (1)
  • [6] Data Management in Machine Learning: Challenges, Techniques, and Systems
    Kumar, Arun
    Boehm, Matthias
    Yang, Jun
    [J]. SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2017 : 1717 - 1722
  • [7] Towards Unified Data and Lifecycle Management for Deep Learning
    Miao, Hui
    Li, Ang
    Davis, Larry S.
    Deshpande, Amol
    [J]. 2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), 2017 : 571 - 582
  • [8] Scaling deep learning data management with Cassandra DB
    Versaci, Francesco
    Busonera, Giovanni
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021 : 5301 - 5310
  • [9] DSL Approach to Deep Learning Lifecycle Data Management
    Celms, Edgars
    Barzdins, Janis
    Kalnins, Audris
    Barzdins, Paulis
    Sprogis, Arturs
    Grasmanis, Mikus
    Rikacovs, Sergejs
    [J]. BALTIC JOURNAL OF MODERN COMPUTING, 2020, 8 (04) : 597 - 617
  • [10] Deep learning for prognostics and health management: State of the art, challenges, and opportunities
    Rezaeianjouybari, Behnoush
    Shang, Yi
    [J]. MEASUREMENT, 2020, 163