A qualitative assessment of using ChatGPT as large language model for scientific workflow development

Cited by: 1

Authors:

Saenger, Mario [1]

De Mecquenem, Ninon [1]

Lewinska, Katarzyna Ewa [2, 3]

Bountris, Vasilis [1]

Lehmann, Fabian [1]

Leser, Ulf [1]

Kosch, Thomas [1]

Affiliations:

[1] Humboldt Univ, Dept Comp Sci, D-10099 Berlin, Germany

[2] Humboldt Univ, Dept Geog, D-10099 Berlin, Germany

[3] Univ Wisconsin Madison, Dept Forest & Wildlife Ecol, Madison, WI 53706 USA

Source:

GIGASCIENCE | 2024 / Volume 13

Keywords:

large language models; scientific workflows; user support; ChatGPT; end-user development; generation; alignment; future

DOI:

10.1093/gigascience/giae030

Chinese Library Classification (CLC) number:

Q [Biological Sciences]

Discipline classification codes:

07; 0710; 09

Abstract:

Background: Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets, as they offer reproducibility, dependability, and scalability of analyses by automatic parallelization on large compute clusters. However, implementing workflows is difficult due to the involvement of many black-box tools and the deep infrastructure stack necessary for their execution. Simultaneously, user-supporting tools are rare, and the number of available examples is much lower than in classical programming languages.

Results: To address these challenges, we investigate the efficiency of large language models (LLMs), specifically ChatGPT, to support users when dealing with scientific workflows. We performed 3 user studies in 2 scientific domains to evaluate ChatGPT for comprehending, adapting, and extending workflows. Our results indicate that LLMs efficiently interpret workflows but achieve lower performance for exchanging components or purposeful workflow extensions. We characterize their limitations in these challenging scenarios and suggest future research directions.

Conclusions: Our results show a high accuracy for comprehending and explaining scientific workflows while achieving a reduced performance for modifying and extending workflow descriptions. These findings clearly illustrate the need for further research in this area.

Pages: 19
