Open Source Software Tools for Data Management and Deep Model Training Automation

Authors
Tirasoglu, Umut [1]
Turker, Abdussamet [1]
Ekici, Adnan [1]
Yigit, Hayri [1]
Bolukbasi, Yusuf Enes [1]
Akgun, Toygar [2]
Affiliations
[1] ORDULU Technol Corp, Artificial Intelligence Grp, Ankara, Turkiye
[2] TOBB Univ Econ & Technol, Dept Comp Engn, Ankara, Turkiye
Keywords
dataset management; training automation; deep model; augmentation
DOI
10.1109/ASE56229.2023.00014
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Designing and optimizing deep models require managing large datasets and conducting carefully controlled experiments that depend on large sets of hyper-parameters and problem-dependent software and data configurations. These experiments are executed by training the model under observation with varying configurations. Since a typical training run can take days even on proven acceleration hardware such as Graphics Processing Units (GPUs), properly managing training data, avoiding human error in configuration preparation, and securing the repeatability of the experiments are of utmost importance. In this paper, we present two open source software tools that aim to achieve these goals: a Dataset Manager tool (DatumAid) and a Training Automation Manager tool (OrchesTrain). DatumAid integrates with the Computer Vision Annotation Tool (CVAT) to facilitate the management of annotated datasets. It extends CVAT with functionality that lets users filter labeled data, manipulate datasets, and export datasets for training. The tool adopts a simple code structure while giving users flexibility through configuration files. OrchesTrain aims to automate the model training process by facilitating rapid preparation and training of models in the desired style for the intended tasks. Users can integrate models written with the PyTorch library into the system and use the full capabilities of OrchesTrain. It supports the Wandb, MLflow, and TensorBoard loggers, used either simultaneously or separately. To ensure reproducibility of the conducted experiments, all configurations and code are saved to the selected logger in an appropriate structure within a YAML file, along with the serialized model files. Both software tools are publicly available on GitHub.
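The reproducibility idea in the abstract (storing the run configuration and code alongside the results) can be illustrated with a minimal sketch. This is not OrchesTrain's implementation: the function name `snapshot_run`, the file name `run_snapshot.yaml`, and the choice to record SHA-256 digests of the code files rather than the files themselves are all assumptions made for illustration. It uses only the Python standard library and relies on the fact that JSON scalars are valid YAML.

```python
import hashlib
import json
from pathlib import Path


def snapshot_run(config: dict, code_files: list, out_dir: Path) -> Path:
    """Write a run's config and code digests to a small YAML file.

    Hypothetical helper for illustration only; not part of OrchesTrain.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = ["config:"]
    for key in sorted(config):
        # json.dumps produces quoting that is also valid YAML.
        lines.append(f"  {json.dumps(str(key))}: {json.dumps(config[key])}")
    lines.append("code_sha256:")
    for path in sorted(Path(p) for p in code_files):
        # A digest pins the exact code version used for this run.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"  {json.dumps(path.name)}: {json.dumps(digest)}")
    snapshot = out_dir / "run_snapshot.yaml"
    snapshot.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return snapshot
```

A file written this way could then be uploaded as an artifact to whichever logger is in use, so that a later run can be compared against the exact configuration and code version recorded here.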
Pages: 1814-1818
Page count: 5