Cost-effective Deployment of BERT Models in a Serverless Environment

Cited by: 0
Authors
Benesova, Katarina [1]
Svec, Andrej [1]
Suppa, Marek [1]
Affiliation
[1] Slido, New York, NY 10036 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this study we demonstrate the viability of deploying BERT-style models to serverless environments in a production setting. Since the freely available pre-trained models are too large to be deployed in this way, we utilize knowledge distillation and fine-tune the models on proprietary datasets for two real-world tasks: sentiment analysis and semantic textual similarity. As a result, we obtain models that are tuned for a specific domain and deployable in serverless environments. The subsequent performance analysis shows that this solution results in latency levels acceptable for production use and that it is also a cost-effective approach for small-to-medium size deployments of BERT models, all without any infrastructure overhead.
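The abstract does not state which distillation objective the authors used. As a rough illustration of the general technique only, the sketch below computes a commonly used soft-target distillation loss: a temperature-scaled KL divergence between teacher and student output distributions, blended with ordinary cross-entropy on the true label. The function names, the `temperature` and `alpha` defaults, and the example logits are all assumptions chosen for illustration, not values from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend of soft-target KL loss and hard-label cross-entropy.

    temperature and alpha are hypothetical defaults, not taken from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(p_teacher, p_student) if t > 0)
    # The T^2 factor keeps soft-target gradients on a scale comparable
    # to the hard-label term.
    soft = kl * temperature ** 2
    # Standard cross-entropy of the student against the true label (T = 1).
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [2.0, 0.5]  # illustrative logits for a two-class task
    print(distillation_loss([2.0, 0.5], teacher, true_label=0))
    print(distillation_loss([0.5, 2.0], teacher, true_label=0))
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the weighted hard-label term remains. A student that disagrees with the teacher is penalized by both terms.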
Pages: 187-195
Page count: 9