PERFGEN: A Synthesis and Evaluation Framework for Performance Data using Generative AI

Cited by: 0
Authors
Banday, Banooqa H. [1 ]
Islam, Tanzima Z. [1 ]
Marathe, Aniruddha [2 ]
Affiliations
[1] Texas State Univ, San Marcos, TX 78666 USA
[2] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA
Keywords
Large Language Model; Generative Modeling; Evaluation; Scientific Data;
DOI
10.1109/COMPSAC61105.2024.00035
CLC Classification Code
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Collecting performance data in High-Performance Computing (HPC) is a laborious task that requires application scientists to execute an application many times with different configurations. Because performance modeling and root-cause analysis are essential initial phases of performance optimization, this data-collection phase prolongs the overall optimization process. Motivated by this observation, we investigate the feasibility of leveraging recent advances in generative Artificial Intelligence (AI) to synthesize performance samples. However, generating synthetic performance data introduces an additional hurdle: the absence of ground truth against which to assess the quality of the synthetic data. This work takes a step toward bridging that gap by proposing PERFGEN, a framework for generating performance data and evaluating its quality using a novel metric called Dissimilarity. We evaluate quality by measuring how well the generated data enables a downstream Machine Learning (ML) task to generalize. Our experiments with three performance datasets and five machine learning datasets (three classification and two regression) confirm that Dissimilarity correlates with model accuracy better than three state-of-the-art metrics, namely SD quality, Kullback-Leibler (KL) divergence, and TabSyndex, demonstrating that the Dissimilarity metric strongly correlates with the quality of generated scientific data. Since performance data is a special case of scientific data, typically stored in tabular format and consisting of numerical, categorical, and ordinal features, our methodologies and metrics apply to scientific data from other domains as well.
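The record contains no code, but the evaluation idea the abstract describes, judging synthetic samples by how well a model trained on them generalizes to a real hold-out set, can be sketched roughly as follows. The scikit-learn estimator, the function name downstream_generalization, and the toy arrays are illustrative assumptions, not part of PERFGEN.

# Minimal sketch of a "train on synthetic, test on real" style of evaluation
# as described above; all names and the toy arrays are hypothetical stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def downstream_generalization(real_X, real_y, synth_X, synth_y, seed=0):
    # Hold out part of the real data as the common test set.
    X_tr, X_te, y_tr, y_te = train_test_split(
        real_X, real_y, test_size=0.3, random_state=seed)
    # Baseline: train and test on real data only.
    baseline = RandomForestRegressor(random_state=seed).fit(X_tr, y_tr)
    # Candidate: train on synthetic data, test on the same real hold-out.
    synth_model = RandomForestRegressor(random_state=seed).fit(synth_X, synth_y)
    return (r2_score(y_te, baseline.predict(X_te)),
            r2_score(y_te, synth_model.predict(X_te)))

# Toy usage: random configurations standing in for performance samples.
rng = np.random.default_rng(0)
w = np.array([2.0, -1.0, 0.5, 3.0])
real_X = rng.uniform(size=(500, 4))
real_y = real_X @ w + rng.normal(0.0, 0.1, 500)
synth_X = rng.uniform(size=(500, 4))
synth_y = synth_X @ w + rng.normal(0.0, 0.3, 500)
print(downstream_generalization(real_X, real_y, synth_X, synth_y))

A small gap between the two scores would suggest the synthetic samples preserve the structure the downstream task relies on; the Dissimilarity metric itself is defined in the paper and is not reproduced here.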
Pages: 188-197
Number of pages: 10
Related papers
50 records in total
  • [41] Student Learning Performance Evaluation: Mitigating the Challenges of Generative AI Chatbot Misuse in Student Assessments
    Tang, Chun Meng
    Chaw, Lee Yen
    PROCEEDINGS OF THE 23RD EUROPEAN CONFERENCE ON E-LEARNING, ECEL 2024, 2024, 23/1 : 357 - 364
  • [42] Towards data-efficient mechanical design of bicontinuous composites using generative AI
    Masrouri, Milad
    Qin, Zhao
    THEORETICAL AND APPLIED MECHANICS LETTERS, 2024, 14 (01)
  • [44] Unit Test Generation using Generative AI : A Comparative Performance Analysis of Autogeneration Tools
    Bhatia, Shreya
    Gandhi, Tarushi
    Kumar, Dhruv
    Jalote, Pankaj
    2024 INTERNATIONAL WORKSHOP ON LARGE LANGUAGE MODELS FOR CODE, LLM4CODE 2024, 2024, : 54 - 61
  • [45] Performance Modeling of Data Storage Systems Using Generative Models
    Al-Maeeni, Abdalaziz R.
    Temirkhanov, Aziz
    Ryzhikov, Artem
    Hushchyn, Mikhail
    IEEE ACCESS, 2025, 13 : 49643 - 49658
  • [46] How Generative AI Liberates Data to Streamline Decisions
    Schern, Jason
    Hart's E and P, 2024, 99 (01): 58 - 59
  • [47] AI Pro: Data Processing Framework for AI Models
    Frost, Richie
    Paul, Debjyoti
    Li, Feifei
    2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019, : 1980 - 1983
  • [48] ICF target optimization using generative AI
    Ben Tayeb, M.
    Tikhonchuk, V.
    Feugeas, J.-L.
    PHYSICS OF PLASMAS, 2024, 31 (10)
  • [49] Legal implications of using generative AI in the media
    Bayer, Judit
    INFORMATION & COMMUNICATIONS TECHNOLOGY LAW, 2024, 33 (03) : 310 - 329
  • [50] Benchmarking Generative AI Performance Requires a Holistic Approach
    Dholakia, Ajay
    Ellison, David
    Hodak, Miro
    Dutta, Debojyoti
    Binnig, Carsten
    PERFORMANCE EVALUATION AND BENCHMARKING, TPCTC 2023, 2024, 14247 : 34 - 43