Applying SoftTriple Loss for Supervised Language Model Fine Tuning

Cited by: 1
Authors
Sosnowski, Witold [1 ]
Wroblewska, Anna [1 ]
Gawrysiak, Piotr [2 ]
Affiliations
[1] Warsaw Univ Technol, Fac Math & Informat Sci, Warsaw, Poland
[2] Warsaw Univ Technol, Fac Elect & Informat Technol, Warsaw, Poland
Keywords
SIMILARITY
DOI
10.15439/2022F185
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce a new loss function, TripleEntropy, based on cross entropy and SoftTriple loss, to improve classification performance when fine-tuning general-knowledge pre-trained language models. This loss function improves the strong RoBERTa baseline fine-tuned with cross-entropy loss by about 0.02-2.29 percentage points. Thorough tests on popular datasets using our loss function indicate a steady gain. The fewer samples in the training dataset, the higher the gain: for small-sized datasets it is about 0.71 percentage points, for medium-sized 0.86 percentage points, for large 0.20 percentage points, and for extra-large 0.04 percentage points.
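The abstract describes TripleEntropy as a combination of standard cross entropy and SoftTriple loss. Below is a minimal PyTorch sketch of that idea, assuming a weighted sum of the two terms on classifier logits and encoder embeddings; the class names, the number of centers per class, the mixing weight beta, and the remaining hyperparameter values are illustrative assumptions and not taken from the paper, and the center-regularization term of the original SoftTriple formulation is omitted for brevity.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SoftTripleLoss(nn.Module):
        """SoftTriple loss (Qian et al., 2019): each class is represented by k
        learnable centers; similarity to a class is a softmax-weighted mix of
        the similarities to that class's centers."""
        def __init__(self, dim, n_classes, k=10, la=20.0, gamma=0.1, margin=0.01):
            super().__init__()
            self.n_classes, self.k = n_classes, k
            self.la, self.gamma, self.margin = la, gamma, margin
            # one weight vector per (class, center) pair, laid out class-major
            self.centers = nn.Parameter(torch.randn(dim, n_classes * k))

        def forward(self, embeddings, labels):
            # cosine similarity between normalized embeddings and centers
            centers = F.normalize(self.centers, dim=0)
            sim = F.normalize(embeddings, dim=1) @ centers        # (B, C*k)
            sim = sim.view(-1, self.n_classes, self.k)            # (B, C, k)
            # soft assignment of each example to the centers of every class
            weights = F.softmax(sim / self.gamma, dim=2)
            class_sim = (weights * sim).sum(dim=2)                # (B, C)
            # subtract the margin only from the ground-truth class
            margin = torch.zeros_like(class_sim)
            margin.scatter_(1, labels.unsqueeze(1), self.margin)
            return F.cross_entropy(self.la * (class_sim - margin), labels)

    class TripleEntropyLoss(nn.Module):
        """Hypothetical weighted sum of cross entropy on classifier logits and
        SoftTriple loss on encoder embeddings."""
        def __init__(self, dim, n_classes, beta=0.2):
            super().__init__()
            self.beta = beta
            self.soft_triple = SoftTripleLoss(dim, n_classes)

        def forward(self, logits, embeddings, labels):
            ce = F.cross_entropy(logits, labels)
            st = self.soft_triple(embeddings, labels)
            return (1 - self.beta) * ce + self.beta * st

In a RoBERTa fine-tuning loop, logits would typically come from the classification head and embeddings from the pooled sentence representation, with both terms backpropagated jointly; this pairing is an assumption about how such a combined loss would be wired, not a description of the authors' exact setup.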
Pages: 141-147
Page count: 7
Related Papers
50 records in total
  • [1] Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model
    Zhang, Hengyuan
    Wu, Yanru
    Li, Dawei
    Yang, Sak
    Zhao, Rui
    Jiang, Yong
    Tan, Fei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 7467 - 7509
  • [2] Fine-Tuning Language Models For Semi-Supervised Text Mining
    Chen, Xinyu
    Beaver, Ian
    Freeman, Cynthia
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 3608 - 3617
  • [3] Fine Tuning Large Language Model for Secure Code Generation
    Li, Junjie
    Sangalay, Aseem
    Cheng, Cheng
    Tian, Yuan
    Yang, Jinqiu
    PROCEEDINGS 2024 IEEE/ACM FIRST INTERNATIONAL CONFERENCE ON AI FOUNDATION MODELS AND SOFTWARE ENGINEERING, FORGE 2024, 2024, : 86 - 90
  • [4] Knowledge Graph Fusion for Language Model Fine-Tuning
    Bhana, Nimesh
    van Zyl, Terence L.
    2022 9TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE, ISCMI, 2022, : 167 - 172
  • [5] Comprehensive Review of Large Language Model Fine-Tuning
    Zhang, Qintong
    Wang, Yuchao
    Wang, Hexi
    Wang, Junxin
    Chen, Hai
    Computer Engineering and Applications, 2024, 60 (17) : 17 - 33
  • [6] Patent classification by fine-tuning BERT language model
    Lee, Jieh-Sheng
    Hsiang, Jieh
    WORLD PATENT INFORMATION, 2020, 61
  • [7] Fine tuning the large language pegasus model for dialogue summarization
    Vinay Sarthak
    Preeti Rishiwal
    Mano Yadav
    Sushil Yadav
    Ashutosh Gangwar
Shankdhar
    International Journal of Information Technology, 2025, 17 (2) : 1165 - 1177
  • [8] Universal Language Model Fine-tuning for Text Classification
    Howard, Jeremy
    Ruder, Sebastian
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 328 - 339
  • [9] COUNTERPART STRATEGIES - FINE TUNING LANGUAGE WITH LANGUAGE
    KUCER, SB
    RHODES, LK
    READING TEACHER, 1986, 40 (02): : 186 - 193
  • [10] Selecting Informative Contexts Improves Language Model Fine-tuning
    Antonello, Richard
    Beckage, Nicole M.
    Turek, Javier S.
    Huth, Alexander G.
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 1072 - 1085