Fine-Tuning of Distil-BERT for Continual Learning in Text Classification: An Experimental Analysis

Cited by: 0
Authors
Shah, Sahar [1 ]
Manzoni, Sara Lucia [1 ]
Zaman, Farooq [2 ]
Es Sabery, Fatima [3 ]
Epifania, Francesco [4 ]
Zoppis, Italo Francesco [1 ]
Affiliations
[1] Univ Milano Bicocca, Dept Informat Syst & Commun, Milan, Italy
[2] Informat Technol Univ, Dept Comp Sci, Lahore, Pakistan
[3] Hassan II Univ, Lab Econ & Logist Performance, Fac Law Econ & Social Sci Mohammedia, Casablanca, Morocco
[4] Social Things srl, Milan, Italy
Source
IEEE ACCESS, 2024, Vol. 12
Keywords
Continual learning; natural language processing; text classification; fine-tuning; Distil-BERT
DOI
10.1109/ACCESS.2024.3435537
CLC classification: TP [automation technology; computer technology]
Subject classification code: 0812
Abstract
Continual learning (CL) with bidirectional encoder representations from transformers (BERT) and its distilled variant, Distil-BERT, has shown remarkable performance in various natural language processing (NLP) tasks such as text classification (TC). However, degrading factors such as catastrophic forgetting (CF), limited accuracy, and task-dependent architectures have undermined its suitability for complex and intelligent tasks. This article proposes an approach to address the challenges of CL in TC tasks. The objectives are to enable the model to learn continuously without forgetting previously acquired knowledge and to avoid CF. To this end, a task-independent model architecture is introduced that allows multiple tasks to be trained on the same model, thereby improving overall performance in CL scenarios. The framework incorporates two auxiliary tasks, next sentence prediction and task identifier prediction, to capture both task-generic and task-specific contextual information. The Distil-BERT model, enhanced with two linear layers, projects the output representation into a task-generic space and a task-specific space. The proposed methodology is evaluated on a diverse set of TC tasks: Yahoo, Yelp, Amazon, DB-Pedia, and AG-News. The experimental results demonstrate strong performance across multiple tasks in terms of F1 score, model accuracy, evaluation loss, learning rate, and training loss. On the Yahoo task, the proposed model achieved an F1 score of 96.84%, an accuracy of 95.85%, an evaluation loss of 0.06, and a learning rate of 0.00003144. On the Yelp task, it achieved an F1 score of 96.66%, an accuracy of 97.66%, an evaluation loss of 0.06, and likewise minimized training loss with a learning rate of 0.00003189. On the Amazon task, the F1 score was 95.82%, the accuracy 97.83%, the evaluation loss 0.06, and training loss was effectively minimized with a learning rate of 0.00003144. On the DB-Pedia task, the model achieved an F1 score of 96.20%, an accuracy of 95.21%, and an evaluation loss of 0.08 with a learning rate of 0.0001972, while training loss decreased rapidly owing to the limited number of epochs and instances. On the AG-News task, the model obtained an F1 score of 94.78%, an accuracy of 92.76%, and an evaluation loss of 0.06 with a learning rate of 0.0001511. These results highlight the strong performance of the model across TC tasks, with a gradual reduction in training loss over time indicating effective learning and retention of knowledge.
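The abstract describes a Distil-BERT backbone extended with two linear layers that split the output representation into a task-generic space and a task-specific space, plus an auxiliary task-identifier prediction head. The sketch below shows one plausible way to wire such a model with the Hugging Face transformers library; the projection dimension, head names, use of the first-token representation, and the concatenation-based classifier are assumptions for illustration, not the authors' exact implementation.

```python
# Hypothetical sketch of the task-independent architecture described in the abstract:
# a Distil-BERT backbone with two linear projections (task-generic and task-specific
# spaces) and an auxiliary task-identifier head. Sizes and head names are assumptions.
import torch
import torch.nn as nn
from transformers import DistilBertModel, DistilBertTokenizerFast


class TaskIndependentDistilBert(nn.Module):
    def __init__(self, num_labels: int, num_tasks: int, proj_dim: int = 128):
        super().__init__()
        self.backbone = DistilBertModel.from_pretrained("distilbert-base-uncased")
        hidden = self.backbone.config.dim  # 768 for distilbert-base
        # Two linear layers split the sentence representation into
        # a task-generic space and a task-specific space (assumed dimensions).
        self.generic_proj = nn.Linear(hidden, proj_dim)
        self.specific_proj = nn.Linear(hidden, proj_dim)
        # Classification head over the concatenation of both spaces (assumption).
        self.classifier = nn.Linear(2 * proj_dim, num_labels)
        # Auxiliary head: task-identifier prediction from the generic space.
        self.task_id_head = nn.Linear(proj_dim, num_tasks)

    def forward(self, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # first-token representation
        generic = torch.relu(self.generic_proj(cls))
        specific = torch.relu(self.specific_proj(cls))
        class_logits = self.classifier(torch.cat([generic, specific], dim=-1))
        task_logits = self.task_id_head(generic)
        return class_logits, task_logits


# Minimal usage example on a single sentence.
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
batch = tokenizer(["A sample news headline."], return_tensors="pt", padding=True)
model = TaskIndependentDistilBert(num_labels=4, num_tasks=5)
class_logits, task_logits = model(batch["input_ids"], batch["attention_mask"])
print(class_logits.shape, task_logits.shape)  # torch.Size([1, 4]) torch.Size([1, 5])
```

In a continual-learning setting, the classification loss would typically be combined with the auxiliary task-identifier loss during fine-tuning on each new task; the exact loss weighting used by the authors is not given in this record.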
Pages: 104964-104982
Page count: 19