Accelerating Sparse Autoencoder Training via Layer-Wise Transfer Learning in Large Language Models

Cited by: 0
Authors
Ghilardi, Davide [1]
Belotti, Federico [1]
Molinari, Marco [2, 4]
Lim, Jaehyuk [2, 3]
Affiliations
[1] University of Milan-Bicocca, Italy
[2] LSE.AI
[3] University of Pennsylvania, United States
[4] London School of Economics, United Kingdom
Keywords
Compendex;
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Computational linguistics
Pages: 530-550
Related papers
50 entries in total
  • [41] What do end-to-end speech models learn about speaker, language and channel information? A layer-wise and neuron-level analysis
    Chowdhury, Shammur Absar
    Durrani, Nadir
    Ali, Ahmed
    COMPUTER SPEECH AND LANGUAGE, 2023, 83
  • [42] Chinese Diabetes Question Classification Using Large Language Models and Transfer Learning
    Ge, Chengze
    Ling, Hongshun
    Quan, Fuliang
    Zeng, Jianping
    HEALTH INFORMATION PROCESSING: EVALUATION TRACK PAPERS, CHIP 2023, 2024, 2080 : 205 - 213
  • [43] Predicting polymerization reactions via transfer learning using chemical language models
    Ferrari, Brenda S.
    Manica, Matteo
    Giro, Ronaldo
    Laino, Teodoro
    Steiner, Mathias B.
    NPJ COMPUTATIONAL MATERIALS, 2024, 10 (01)
  • [44] Learning from Mistakes via Cooperative Study Assistant for Large Language Models
    Wang, Danqing
    Li, Lei
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023 : 10667 - 10685
  • [45] Regularized Continual Learning for Large-Scale Language Models via Probing
    Song, Xingshen
    Ren, Tianxiang
    Deng, Jinsheng
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT III, NLPCC 2024, 2025, 15361 : 29 - 41
  • [46] Teamwork Conflict Management Training and Conflict Resolution Practice via Large Language Models
    Aggrawal, Sakhi
    Magana, Alejandra J.
    FUTURE INTERNET, 2024, 16 (05)
  • [47] Co-training Improves Prompt-based Learning for Large Language Models
    Lang, Hunter
    Agrawal, Monica
    Kim, Yoon
    Sontag, David
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [48] Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning
    Thil, Lucas-Andrei
    Popa, Mirela
    Spanakis, Gerasimos
    39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, 2024 : 866 - 874
  • [49] OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization
    Guo, Cong
    Tang, Jiaming
    Hu, Weiming
    Leng, Jingwen
    Zhang, Chen
    Yang, Fan
    Liu, Yunxin
    Guo, Minyi
    Zhu, Yuhao
    PROCEEDINGS OF THE 2023 THE 50TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, ISCA 2023, 2023 : 33 - 47
  • [50] TLRec: A Transfer Learning Framework to Enhance Large Language Models for Sequential Recommendation Tasks
    Lin, Jiaye
    Peng, Shuang
    Zhang, Zhong
    Zhao, Peilin
    PROCEEDINGS OF THE EIGHTEENTH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2024, 2024 : 1119 - 1124