A qualitative assessment of using ChatGPT as large language model for scientific workflow development

Cited by: 1

Authors:

Saenger, Mario [1]

De Mecquenem, Ninon [1]

Lewinska, Katarzyna Ewa [2,3]

Bountris, Vasilis [1]

Lehmann, Fabian [1]

Leser, Ulf [1]

Kosch, Thomas [1]

Affiliations:

[1] Humboldt Univ, Dept Comp Sci, D-10099 Berlin, Germany

[2] Humboldt Univ, Dept Geog, D-10099 Berlin, Germany

[3] Univ Wisconsin Madison, Dept Forest & Wildlife Ecol, Madison, WI 53706 USA

Source:

GIGASCIENCE | 2024 / Vol. 13

Keywords:

large language models; scientific workflows; user support; ChatGPT; END-USER DEVELOPMENT; GENERATION; ALIGNMENT; FUTURE;

DOI:

10.1093/gigascience/giae030

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Background: Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets, as they offer reproducibility, dependability, and scalability of analyses by automatic parallelization on large compute clusters. However, implementing workflows is difficult due to the involvement of many black-box tools and the deep infrastructure stack necessary for their execution. Simultaneously, user-supporting tools are rare, and the number of available examples is much lower than in classical programming languages.

Results: To address these challenges, we investigate the efficiency of large language models (LLMs), specifically ChatGPT, to support users when dealing with scientific workflows. We performed 3 user studies in 2 scientific domains to evaluate ChatGPT for comprehending, adapting, and extending workflows. Our results indicate that LLMs efficiently interpret workflows but achieve lower performance for exchanging components or purposeful workflow extensions. We characterize their limitations in these challenging scenarios and suggest future research directions.

Conclusions: Our results show a high accuracy for comprehending and explaining scientific workflows while achieving a reduced performance for modifying and extending workflow descriptions. These findings clearly illustrate the need for further research in this area.

Pages: 19
