Research on Dataset Generation in the Development of Large Language Models for Digital Textbooks

Authors
Lee, Youngho [1]
Affiliation
[1] Daegu Natl Univ Educ, Comp Educ, Daegu, South Korea
Funding
National Research Foundation, Singapore;
Keywords
LLMs; Prompt Design; Self-Instruct; Data Generation; Digital Textbook;
DOI
10.1109/RAAI59955.2023.10601206
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Numerous institutions have recently been developing Large Language Models (LLMs), which are bringing revolutionary changes to many fields, including society, the economy, and education. In education, the use of LLMs is expanding, notably to provide personalized learning experiences. However, the LLMs currently being developed are general-purpose models rather than models specialized for a specific subject or textbook, which may limit their usefulness for teachers and learners. This study therefore fine-tunes an open-source LLM on a textbook-specific dataset, which must first be developed. Human-generated datasets have the disadvantages of being expensive and subject-specific. We therefore propose a method of developing a textbook dataset by applying the self-instruct technique. This approach is expected to allow a textbook-specific dataset to be developed at low cost.
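The abstract does not describe the generation pipeline in detail, so the following is only a minimal sketch of the general self-instruct pattern it refers to: sample a few instructions from a pool, prompt a model for a new one grounded in a textbook passage, and keep the result only if it is not a near-duplicate. Every name here (`self_instruct`, `build_prompt`, `is_novel`, `make_stub_generator`), the prompt wording, and the 0.7 threshold are assumptions for illustration. The `generate` callable is a hypothetical stand-in for a real LLM call, and `difflib` string similarity is swapped in for whatever overlap metric the study may actually use.

```python
import random
from difflib import SequenceMatcher
from typing import Callable, List


def is_novel(candidate: str, pool: List[str], threshold: float = 0.7) -> bool:
    """True if the candidate is below the similarity threshold for every pooled item."""
    return all(
        SequenceMatcher(None, candidate.lower(), existing.lower()).ratio() < threshold
        for existing in pool
    )


def build_prompt(passage: str, examples: List[str]) -> str:
    """Assemble a few-shot prompt from a textbook passage and sampled instructions."""
    shots = "\n".join(f"- {example}" for example in examples)
    return (
        f"Textbook passage:\n{passage}\n\n"
        f"Example instructions:\n{shots}\n\n"
        "Write one new instruction about the passage:"
    )


def self_instruct(
    passage: str,
    seeds: List[str],
    generate: Callable[[str], str],  # hypothetical LLM call: prompt -> text
    rounds: int = 10,
    shots: int = 2,
    rng_seed: int = 0,
) -> List[str]:
    """Grow an instruction pool from seed instructions, skipping near-duplicates."""
    rng = random.Random(rng_seed)
    pool = list(seeds)
    for _ in range(rounds):
        examples = rng.sample(pool, min(shots, len(pool)))
        candidate = generate(build_prompt(passage, examples)).strip()
        if candidate and is_novel(candidate, pool):
            pool.append(candidate)
    return pool


def make_stub_generator(outputs: List[str]) -> Callable[[str], str]:
    """Return a fake generator that replays canned outputs (assumption: no real model)."""
    remaining = iter(outputs)

    def generate(prompt: str) -> str:
        # Ignores the prompt; returns "" once the canned outputs run out.
        return next(remaining, "")

    return generate


if __name__ == "__main__":
    seeds = ["Explain what a variable is.", "List two uses of a loop."]
    canned = [
        "Explain what a variable is.",  # exact duplicate of a seed, filtered out
        "Describe how an if statement chooses a branch.",
        "Give an example of a function with one parameter.",
    ]
    pool = self_instruct(
        "A variable stores a value that a program can read and change.",
        seeds,
        make_stub_generator(canned),
        rounds=3,
    )
    for instruction in pool:
        print(instruction)
```

In a real pipeline, `generate` would wrap a call to the chosen model, and each accepted instruction would still need a generated answer and a quality check before it could be used for fine-tuning.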
Pages: 297-300
Page count: 4