Obfuscated Activations Bypass LLM Latent-Space Defenses

Cited by: 0
Authors
Bailey, Luke [1]
Serrano, Alex [2]
Sheshadri, Abhay [3]
Seleznyov, Mikhail [4]
Taylor, Jordan [5]
Jenner, Erik [6]
Hilton, Jacob [7]
Casper, Stephen [8]
Guestrin, Carlos [1, 9]
Emmons, Scott [6]
Affiliations
[1] Stanford University, United States
[2] Polytechnic University of Catalonia, Spain
[3] Georgia Institute of Technology, United States
[4] Skoltech, Russia
[5] University of Queensland, Australia
[6] UC Berkeley, United States
[7] Alignment Research Center, United States
[8] MIT CSAIL, United States
[9] Chan Zuckerberg Biohub, United States
Source
Keywords
Compilation and indexing terms; Copyright 2025 Elsevier Inc;
DOI
Not available
CLC classification number
Subject classification number
Abstract
Data obfuscation