A Fast and Secure Transformer Inference Scheme with Secure Multi-Party Computation

Cited by: 0
Authors
Liu W. [1 ]
Guan Y. [1 ]
Huo J. [1 ]
Ding Y. [1 ]
Guo H. [1 ,2 ]
Li B. [2 ]
Affiliations
[1] School of Cyber Science and Technology, Beihang University, Beijing
[2] State Key Laboratory of Complex and Critical Software Environment (Beihang University), Beijing
Funding
National Natural Science Foundation of China
Keywords
knowledge distillation; secure inference; secure multi-party computation (MPC); secure processing unit (SPU); Transformer
DOI
10.7544/issn1000-1239.202330966
Abstract
Transformer has been widely used in many fields, such as natural language processing and computer vision, with outstanding performance. However, users' data is exposed to the Transformer model provider during inference. With growing public attention to data privacy, this leakage problem has motivated research on secure Transformer inference, and implementing secure Transformer inference with secure multi-party computation (MPC) is a current hot topic. Because non-linear functions are pervasive in Transformer, implementing its secure inference with MPC is difficult and incurs enormous computation and communication costs. We focus on Softmax attention, the bottleneck in secure Transformer inference, and propose two MPC-friendly attention mechanisms, Softmax freeDiv Attention and 2Quad freeDiv Attention. By replacing the Softmax attention in Transformer with the proposed MPC-friendly attention mechanisms, combined with the replacement of the GeLU activation function and knowledge distillation, we propose an MPC-friendly Transformer conversion framework, which converts a Transformer model into an MPC-friendly one so as to improve the performance of subsequent secure Transformer inference. Based on the proposed framework, we perform secure Bert-Base inference on SST-2 in the LAN setting, using the privacy-preserving computation protocols provided by the secure processing unit (SPU). The result shows that the secure inference achieves a 2.26× speedup while maintaining accuracy on par with the non-approximated model. © 2024 Science Press. All rights reserved.
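The abstract names but does not define the two proposed mechanisms. For intuition, below is a minimal plaintext sketch in the style of the well-known 2Quad Softmax approximation, which replaces exp(x) with the quadratic (x + c)^2 so that attention weights can be computed over secret shares using only additions and multiplications; the "2Quad freeDiv Attention" name appears to build on this idea. The function name, the constant c = 5.0, and the explicit normalizing division are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def quad_attention(q, k, v, c=5.0):
    """Plaintext sketch of a 2Quad-style attention approximation.

    Softmax's exp(x) is replaced by the quadratic (x + c)**2, which is
    MPC-friendly (additions and multiplications only). The constant c
    and the normalizing division below are illustrative assumptions.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # scaled dot-product scores
    weights = (scores + c) ** 2                      # quadratic surrogate for exp
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise normalization
    return weights @ v

# Toy usage: one attention head over a length-4 sequence with dimension 8.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = quad_attention(q, k, v)
print(out.shape)  # (4, 8)
```

In an actual MPC deployment the same computation would run over secret shares via SPU's protocols, and the normalizing division is exactly the cost that the paper's freeDiv variants aim to eliminate.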
Pages: 1218-1229
Page count: 11