Research on Dataset Generation in the Development of Large Language Models for Digital Textbooks

Authors
Lee, Youngho [1]
Affiliation
[1] Daegu Natl Univ Educ, Comp Educ, Daegu, South Korea
Funding
National Research Foundation of Singapore;
Keywords
LLMs; Prompt Design; Self-Instruct; Data Generation; Digital Textbook;
DOI
10.1109/RAA/59955.2023.10601206
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Numerous institutions have recently been developing large language models (LLMs), which are driving major changes across society, the economy, and education. In education, the use of LLMs is expanding, including to provide personalized learning experiences. However, the LLMs currently being developed are general-purpose models rather than models specialized for a specific subject or textbook, which may limit their usefulness to teachers and learners. This study therefore aims to fine-tune an open-source LLM on a textbook-specific dataset, which must first be developed. Human-generated datasets are expensive to produce and tied to a single subject. We therefore propose a method for developing a textbook dataset by applying the self-instruct technique, and expect that this approach can produce a textbook-specific dataset at low cost.
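The abstract names the self-instruct technique without describing its steps. As a rough illustration only, and not the author's implementation, the general idea can be sketched as a loop: sample a few existing instructions, prompt a model for a new one about a textbook passage, and keep the result only if it is not a near-duplicate of something already in the pool. Every name below (`self_instruct`, `build_prompt`, `is_novel`, `make_stub_generator`) is hypothetical, the model is replaced by a canned stub, and `difflib.SequenceMatcher` is used as a stand-in for the ROUGE-L similarity filter described in the original self-instruct work.

```python
import random
from difflib import SequenceMatcher


def is_novel(candidate, pool, threshold=0.7):
    """Return True if candidate is not too similar to any instruction in pool.

    Assumption: character-level similarity stands in for ROUGE-L here.
    """
    return all(
        SequenceMatcher(None, candidate.lower(), existing.lower()).ratio() < threshold
        for existing in pool
    )


def build_prompt(examples, passage):
    """Assemble a few-shot prompt from sampled instructions and a textbook passage."""
    shots = "\n".join(f"{i + 1}. {text}" for i, text in enumerate(examples))
    return (
        "Write one new study task about the passage.\n"
        f"Passage: {passage}\n"
        f"Example tasks:\n{shots}\n"
        f"{len(examples) + 1}."
    )


def self_instruct(seed_tasks, passages, generate, rounds=10, shots=2, seed=0):
    """Grow a pool of instructions by bootstrapping from a few seed tasks.

    `generate` is a hypothetical callable (prompt -> str) standing in for an LLM.
    """
    rng = random.Random(seed)
    pool = list(seed_tasks)
    for _ in range(rounds):
        examples = rng.sample(pool, min(shots, len(pool)))
        passage = rng.choice(passages)
        candidate = generate(build_prompt(examples, passage)).strip()
        # Keep only non-empty candidates that are not near-duplicates.
        if candidate and is_novel(candidate, pool):
            pool.append(candidate)
    return pool


def make_stub_generator(outputs):
    """Return a fake LLM that replays canned outputs in order (demonstration only)."""
    it = iter(outputs)
    return lambda prompt: next(it, "")
```

In practice the stub would be replaced by a call to whichever model is being used, and the accepted instructions would still need answers generated and checked before they could serve as fine-tuning data.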
Pages: 297-300
Number of pages: 4