How to Protect Copyright Data in Optimization of Large Language Models?

Cited: 0
Authors
Chu, Timothy [1]
Song, Zhao [2]
Yang, Chiwun [3]
Affiliations
[1] Google, Mountain View, CA 94043, USA
[2] Adobe Research, San Jose, CA, USA
[3] Sun Yat-sen University, Guangzhou, People's Republic of China
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) and generative AI have played a transformative role in computing research and applications. Controversy has arisen over whether these models output copyrighted data, which can occur when the data they are trained on is itself copyrighted. LLMs are built on the transformer neural network architecture, which in turn relies on a mathematical operation called attention that uses the softmax function. In this paper, we observe that large language model training and optimization can be viewed as a softmax regression problem. We then establish a method for efficiently performing softmax regression in a way that prevents the regression function from generating copyrighted data. This yields a theoretical method for training large language models while avoiding the generation of copyrighted data.
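To make the central claim concrete, the softmax regression problem referenced in the abstract is typically formulated as follows in this line of work; this is a sketch under assumed notation (A, b, x, and 1_n are not defined in the abstract), and the paper's exact objective and copyright-protection mechanism may differ. Given a fixed matrix A in R^{n x d} and a target vector b in R^n, one solves

    \min_{x \in \mathbb{R}^d} \; \big\| \langle \exp(Ax), \mathbf{1}_n \rangle^{-1} \exp(Ax) - b \big\|_2

where exp(.) is applied entrywise and \mathbf{1}_n is the all-ones vector, so that \langle \exp(Ax), \mathbf{1}_n \rangle^{-1} \exp(Ax) is precisely softmax(Ax). Fitting x to targets b mirrors how attention weights are fit during transformer training; a copyright-protection method can then act directly on this objective, for instance by constraining or penalizing solutions whose softmax outputs come too close to protected targets, though the specific mechanism is the paper's contribution.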
Pages: 17871–17879
Number of pages: 9