PERFGEN: A Synthesis and Evaluation Framework for Performance Data using Generative AI

Authors
Banday, Banooqa H. [1]
Islam, Tanzima Z. [1]
Marathe, Aniruddha [2]
Affiliations
[1] Texas State Univ, San Marcos, TX 78666 USA
[2] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA
Keywords
Large Language Model; Generative Modeling; Evaluation; Scientific Data
DOI
10.1109/COMPSAC61105.2024.00035
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Collecting data in High-Performance Computing (HPC) is laborious: application scientists must execute the application many times with different configurations. Because performance modeling and root cause analysis are essential first phases of performance optimization, the data collection phase prolongs the optimization process. Motivated by this observation, we investigate the feasibility of leveraging recent advances in generative Artificial Intelligence (AI) to synthesize performance samples. However, generating synthetic performance data introduces an additional hurdle: the absence of ground truth against which to assess the quality of the synthetic data. This work takes a step toward bridging this gap by proposing a framework, PERFGEN, for generating performance data and evaluating its quality using a novel metric called Dissimilarity. Our experiments with three performance and five machine learning datasets (including three classification and two regression datasets) confirm that the proposed Dissimilarity metric correlates with model accuracy better than three state-of-the-art metrics (SD quality, Kullback-Leibler (KL) divergence, and TabSyndex), demonstrating that Dissimilarity strongly correlates with the quality of generated scientific data. We evaluate quality by measuring how well the generated data enables a downstream Machine Learning (ML) task to generalize. Since performance data is a special case of scientific data (typically stored in tabular format and consisting of numerical, categorical, and ordinal features), our methodologies and metrics apply to scientific data from other domains as well.
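For orientation, one of the baseline metrics named in the abstract, KL divergence, can be estimated for a single numerical feature as in the minimal sketch below. This is not the paper's Dissimilarity metric, which the abstract does not define, and it is not PERFGEN code. The function name `kl_divergence`, the histogram binning, and the smoothing constant `eps` are all assumptions made for illustration.

```python
import numpy as np

def kl_divergence(real, synthetic, bins=10, eps=1e-9):
    """Estimate KL(real || synthetic) for one numerical feature.

    Hypothetical helper: both samples are binned on a shared grid,
    and eps smooths empty bins so the logarithm stays finite.
    """
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)

    # Shared bin edges spanning both samples, so the histograms are comparable.
    lo = min(real.min(), synthetic.min())
    hi = max(real.max(), synthetic.max())
    edges = np.linspace(lo, hi, bins + 1)

    p, _ = np.histogram(real, bins=edges)
    q, _ = np.histogram(synthetic, bins=edges)

    # Smooth and normalize counts into probability vectors.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()

    return float(np.sum(p * np.log(p / q)))
```

A value near zero indicates that the synthetic feature's histogram closely matches the real one; larger values indicate a growing mismatch. A per-feature score like this says nothing by itself about downstream model accuracy, which is the comparison the abstract reports.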
Pages: 188-197
Number of pages: 10