Tree of Attacks: Jailbreaking Black-Box LLMs Automatically

Cited by: 0
Authors
Mehrotra, Anay [1]
Zampetakis, Manolis [2]
Kassianik, Paul [3]
Nelson, Blaine [3]
Anderson, Hyrum [3]
Singer, Yaron [3]
Karbasi, Amin [4]
Affiliations
[1] Yale University, Robust Intelligence, United States
[2] Yale University, United States
[3] Robust Intelligence, United States
[4] Yale University, Google Research, United States
Source
arXiv | 2023
Keywords
Compendex;
DOI
Not available
Abstract
Iterative methods
Related papers
50 in total
  • [21] Practical Black-Box Attacks against Machine Learning
    Papernot, Nicolas
    McDaniel, Patrick
    Goodfellow, Ian
    Jha, Somesh
    Celik, Z. Berkay
    Swami, Ananthram
    PROCEEDINGS OF THE 2017 ACM ASIA CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY (ASIA CCS'17), 2017, : 506 - 519
  • [22] Impossibility of Black-Box Simulation Against Leakage Attacks
    Ostrovsky, Rafail
    Persiano, Giuseppe
    Visconti, Ivan
    ADVANCES IN CRYPTOLOGY, PT II, 2015, 9216 : 130 - 149
  • [23] Reverse Attack: Black-box Attacks on Collaborative Recommendation
    Zhang, Yihe
    Yuan, Xu
    Li, Jin
    Lou, Jiadong
    Chen, Li
    Tzeng, Nian-Feng
    CCS '21: PROCEEDINGS OF THE 2021 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, 2021, : 51 - 68
  • [24] Curls & Whey: Boosting Black-Box Adversarial Attacks
    Shi, Yucheng
    Wang, Siyu
    Han, Yahong
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6512 - 6520
  • [25] Boundary Defense Against Black-box Adversarial Attacks
    Aithal, Manjushree B.
    Li, Xiaohua
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 2349 - 2356
  • [26] Knowledge-enhanced Black-box Attacks for Recommendations
    Chen, Jingfan
    Fan, Wenqi
    Zhu, Guanghui
    Zhao, Xiangyu
    Yuan, Chunfeng
    Li, Qing
    Huang, Yihua
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 108 - 117
  • [27] Black-box Adversarial Attacks with Limited Queries and Information
    Ilyas, Andrew
    Engstrom, Logan
    Athalye, Anish
    Lin, Jessy
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [28] Black-box adversarial attacks by manipulating image attributes
    Wei, Xingxing
    Guo, Ying
    Li, Bo
    Information Sciences, 2021, 550 : 285 - 296
  • [29] White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks
    Gil, Yotam
    Chai, Yoav
    Gorodissky, Or
    Berant, Jonathan
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 1373 - 1379
  • [30] Monte Carlo Tree Descent for Black-Box Optimization
    Zhai, Yaoguang
    Gao, Sicun
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,