PERFGEN: A Synthesis and Evaluation Framework for Performance Data using Generative AI

Times cited: 0

Authors:

Banday, Banooqa H. [1]

Islam, Tanzima Z. [1]

Marathe, Aniruddha [2]

Affiliations:

[1] Texas State Univ, San Marcos, TX 78666 USA

[2] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA

Source:

2024 IEEE 48TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC 2024 | 2024

Keywords:

Large Language Model; Generative Modeling; Evaluation; Scientific Data;

DOI:

10.1109/COMPSAC61105.2024.00035

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Collecting performance data in High-Performance Computing (HPC) is laborious: application scientists must execute the application many times under different configurations. Because performance modeling and root cause analysis are the initial phases of performance optimization and both depend on this data, the collection phase prolongs the whole optimization process. Motivated by this observation, we investigate the feasibility of leveraging recent advances in generative Artificial Intelligence (AI) to synthesize performance samples. Generating synthetic performance data, however, introduces an additional hurdle: there is no ground truth against which to assess the quality of the synthetic data. This work takes a step toward bridging this gap by proposing PERFGEN, a framework for generating performance data and evaluating its quality with a novel metric called Dissimilarity. We evaluate quality by measuring how well the generated data enables a downstream Machine Learning (ML) task to generalize. Our experiments with three performance datasets and five machine learning datasets (three classification and two regression) confirm that Dissimilarity correlates with model accuracy better than three state-of-the-art metrics, namely SD quality, Kullback-Leibler (KL) divergence, and TabSyndex, indicating that Dissimilarity strongly correlates with the quality of generated scientific data. Since performance data is a special case of scientific data (typically stored in tabular format and consisting of numerical, categorical, and ordinal features), our methodologies and metrics apply to scientific data from other domains as well.
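The abstract names Kullback-Leibler (KL) divergence as one of the baseline metrics. As a rough illustration only, the sketch below compares one numeric column of a real dataset and a synthetic dataset by binning both into shared equal-width histograms and computing D_KL(P || Q) = sum_i p_i log(p_i / q_i). The bin count, the equal-width binning, the `eps` smoothing, and the helper names `histogram`, `kl_divergence`, and `column_kl` are assumptions made for this example. It is not the paper's implementation and does not implement the proposed Dissimilarity metric, which this record does not define.

```python
import math
from typing import Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> list[float]:
    """Normalized bin frequencies of `values` over bins defined by `edges`.

    Bin i covers [edges[i], edges[i+1]); the last bin also includes its right edge.
    """
    if not values:
        raise ValueError("values must be non-empty")
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            is_last = i == len(edges) - 2
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-10) -> float:
    """Discrete KL divergence D_KL(P || Q) in nats.

    `eps` is added to each q_i so that an empty synthetic bin does not cause a
    division by zero (an assumed smoothing choice, not taken from the paper).
    Bins where p_i == 0 contribute nothing, by the convention 0 * log 0 = 0.
    """
    return sum(pi * math.log(pi / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def column_kl(real: Sequence[float], synthetic: Sequence[float], bins: int = 10) -> float:
    """Hypothetical helper: KL divergence of one real column vs. its synthetic counterpart."""
    lo = min(min(real), min(synthetic))
    hi = max(max(real), max(synthetic))
    if hi == lo:
        return 0.0  # both columns are the same constant, so the distributions match
    width = (hi - lo) / bins
    # Shared equal-width bin edges so P and Q are defined over the same support.
    edges = [lo + i * width for i in range(bins)] + [hi]
    return kl_divergence(histogram(real, edges), histogram(synthetic, edges))
```

For example, `column_kl([0, 0, 0, 1], [0, 1, 1, 1], bins=2)` yields bin frequencies P = [0.75, 0.25] and Q = [0.25, 0.75], so the result is 0.5 ln 3, about 0.549. KL divergence is asymmetric, so swapping the real and synthetic arguments can change the value in general.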


Pages: 188-197

Number of pages: 10

Similar records: 50 in total

[21] Data-driven Learning Meets Generative AI: Introducing the Framework of Metacognitive Resource Use
Mizumoto, Atsushi
APPLIED CORPUS LINGUISTICS, 2023, 3 (03):
[22] The MADE Framework: Best Practices for Creating Effective Experimental Stimuli Using Generative AI
van Berlo, Zeph M. C.
Campbell, Colin
Voorveld, Hilde A. M.
JOURNAL OF ADVERTISING, 2024, 53 (05) : 732 - 753
[23] Grading Generative AI-based Assignments Using a 3R Framework
Chan, Henry C. B.
2023 IEEE INTERNATIONAL CONFERENCE ON TEACHING, ASSESSMENT AND LEARNING FOR ENGINEERING, TALE, 2023, : 128 - 132
[24] The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation
Gwon, Yong Nam
Kim, Jae Heon
Chung, Hyun Soo
Jung, Eun Jee
Chun, Joey
Lee, Serin
Shim, Sung Ryul
JMIR MEDICAL INFORMATICS, 2024, 12
[26] A Tutorial on Teaching Data Analytics with Generative AI
Bray, Robert L.
INFORMS JOURNAL ON APPLIED ANALYTICS, 2025,
[27] Generative AI to Generate Test Data Generators
Baudry, Benoit
Etemadi, Khashayar
Fang, Sen
Gamage, Yogya
Liu, Yi
Liu, Yuxin
Monperrus, Martin
Ron, Javier
Silva, Andre
Tiwari, Deepika
IEEE SOFTWARE, 2024, 41 (06) : 55 - 64
[28] Synthesizing Training Data for Intelligent Weed Control Systems Using Generative AI
Modak, Sourav
Stein, Anthony
ARCHITECTURE OF COMPUTING SYSTEMS, ARCS 2024, 2024, 14842 : 112 - 126
[29] Constructing Dreams using Generative AI
Ali, Safinah
Ravi, Prerna
Williams, Randi
DiPaola, Daniella
Breazeal, Cynthia
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23268 - 23275
[30] Evaluation of the Effectiveness of Prompts and Generative AI Responses
Bandi, Ajay
Zeng, Ruida
COMPUTER APPLICATIONS IN INDUSTRY AND ENGINEERING, CAINE 2024, 2025, 2242 : 56 - 69
