Accelerating Sparse Autoencoder Training via Layer-Wise Transfer Learning in Large Language Models

Cited by: 0
Authors
Ghilardi, Davide [1]
Belotti, Federico [1]
Molinari, Marco [2, 4]
Lim, Jaehyuk [2, 3]
Affiliations
[1] University of Milan-Bicocca, Italy
[2] LSE.AI
[3] University of Pennsylvania, United States
[4] London School of Economics, United Kingdom
Keywords
Compendex;
DOI
Not available
CLC number
Subject classification number
Abstract
Computational linguistics
Pages: 530-550
Related papers
50 results in total
  • [31] PreAdapter: Sparse Adaptive Parameter-efficient Transfer Learning for Language Models
    Mao, Chenyang
    Jin, Xiaoxiao
    Yue, Dengfeng
    Leng, Tuo
    2024 7TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA, ICAIBD 2024, 2024: 218-225
  • [32] Accelerating Matrix-Vector Multiplications of Large Language Models via Efficient Encoding
    Tao, Yongjin
    Sun, Wendi
    Chen, Song
    Kang, Yi
    2024 IEEE 17th International Conference on Solid-State and Integrated Circuit Technology, ICSICT 2024, 2024
  • [33] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism
    Yuan, Tailing
    Liu, Yuliang
    Ye, Xucheng
    Zhang, Shenglong
    Tan, Jianchao
    Chen, Bin
    Song, Chengru
    Zhang, Di
    PROCEEDINGS OF THE 2024 USENIX ANNUAL TECHNICAL CONFERENCE, ATC 2024, 2024: 545-561
  • [34] Investigation of Layer-Wise Speech Representations in Self-Supervised Learning Models: A Cross-Lingual Study in Detecting Depression
    Maji, Bubai
    Guha, Rajlakshmi
    Routray, Aurobinda
    Nasreen, Shazia
    Majumdar, Debabrata
    INTERSPEECH 2024, 2024: 3020-3024
  • [35] SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models
    Thangarasa, Vithursan
    Gupta, Abhay
    Marshall, William
    Li, Tianda
    Leong, Kevin
    DeCoste, Dennis
    Lie, Sean
    Saxena, Shreyas
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216: 2134-2146
  • [36] Layer-Wise Learning Rate Optimization for Task-Dependent Fine-Tuning of Pre-Trained Models: An Evolutionary Approach
    Bu, Chenyang
    Liu, Yuxin
    Huang, Manzong
    Shao, Jianxuan
    Ji, Shengwei
    Luo, Wenjian
    Wu, Xindong
    ACM Transactions on Evolutionary Learning and Optimization, 2024, 4(4)
  • [37] ACCURATE PREDICTION OF PROCESS-INDUCED DEFORMATIONS IN COMPOSITES USING LAYER-WISE MODELS AND THEORY-GUIDED PROBABILISTIC MACHINE LEARNING
    Schoenholz, Caleb
    Zappino, Enrico
    Petrolo, Marco
    Zobeiry, Navid
    PROCEEDINGS OF ASME 2024 AEROSPACE STRUCTURES, STRUCTURAL DYNAMICS, AND MATERIALS CONFERENCE, SSDM2024, 2024
  • [38] Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models
    Deng, Yinlin
    Xia, Chunqiu Steven
    Peng, Haoran
    Yang, Chenyuan
    Zhan, Lingming
    PROCEEDINGS OF THE 32ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2023, 2023: 423-435
  • [39] Overcoming language barriers via machine translation with sparse Mixture-of-Experts fusion of large language models
    Zhu, Shaolin
    Jian, Dong
    Xiong, Deyi
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62(3)
  • [40] Dynamic Susceptibility and Structural Heterogeneity of Large Reverse Micellar Water: An Examination of the Core-Shell Model via Probing the Layer-wise Features
    Baksi, Atanu
    Ghorai, Pradip Kr
    Biswas, Ranjit
    JOURNAL OF PHYSICAL CHEMISTRY B, 2020, 124(14): 2848-2863