An ensemble model for classifying idioms and literal texts using BERT and RoBERTa

Cited by: 55
Authors
Briskilal, J. [1 ]
Subalalitha, C. N. [1 ]
Affiliation
[1] SRM Inst Sci & Technol, Chengalpattu, Tamil Nadu, India
Keywords
BERT; RoBERTa; Ensemble model; Idiom; Literal classification;
DOI
10.1016/j.ipm.2021.102756
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
An idiom is a common phrase that means something other than its literal meaning. Detecting idioms automatically is a serious challenge in natural language processing (NLP) applications such as information retrieval (IR), machine translation, and chatbots, and it plays an important role in all of them. Text classification, also known as text labeling or categorization, is a fundamental NLP task that assigns text to structured categories. This paper treats idiom identification as a text classification task. Pre-trained deep learning models have been used for several text classification tasks, but models like BERT and RoBERTa have not been used exclusively for idiom and literal classification. We propose a predictive ensemble model to classify idioms and literals using BERT and RoBERTa, fine-tuned with the TroFi dataset. The model is tested with a newly created in-house dataset of idioms and literal expressions, numbering 1470 in all and annotated by domain experts. Our model outperforms the baseline models in terms of the metrics considered, such as F-score and accuracy, with a 2% improvement in accuracy.
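The abstract does not say how the BERT and RoBERTa predictions are combined. As a minimal sketch only, the snippet below assumes soft voting (averaging class probabilities), which is one common ensembling choice and not necessarily the authors' method. The label order, function names, and probability values are hypothetical placeholders standing in for the outputs of two fine-tuned classifiers.

```python
from typing import List, Sequence

# Hypothetical label order; the paper's actual label encoding is not given.
LABELS = ("literal", "idiom")


def soft_vote(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by several models."""
    n_models = len(prob_sets)
    n_classes = len(prob_sets[0])
    return [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]


def predict(prob_sets: Sequence[Sequence[float]]) -> str:
    """Return the label with the highest averaged probability."""
    avg = soft_vote(prob_sets)
    best = max(range(len(avg)), key=avg.__getitem__)
    return LABELS[best]


# Made-up probabilities for one sentence (assumption, not data from the paper).
bert_probs = [0.40, 0.60]
roberta_probs = [0.20, 0.80]
print(predict([bert_probs, roberta_probs]))
```

Averaging probabilities lets a confident model outweigh an uncertain one; majority voting over hard labels would be an alternative but cannot break ties between only two models.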
Pages: 9
Related papers
50 in total
  • [21] RECOGNIZING EMOTIONS FROM TEXTS USING A BERT-BASED APPROACH
    Adoma, Acheampong Francisca
    Henry, Nunoo-Mensah
    Chen, Wenyu
    Andre, Niyongabo Rubungo
    2020 17TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2020, : 62 - 66
  • [22] Semantic Textual Similarity in Japanese Clinical Domain Texts Using BERT
    Mutinda, Faith Wavinya
    Yada, Shuntaro
    Wakamiya, Shoko
    Aramaki, Eiji
    METHODS OF INFORMATION IN MEDICINE, 2021, 60 : E56 - E64
  • [23] A deep ensemble network model for classifying and predicting breast cancer
    Subramanian, Arul Antran Vijay
    Venugopal, Jothi Prakash
    COMPUTATIONAL INTELLIGENCE, 2023, 39 (02) : 258 - 282
  • [24] An Ensemble Model for Stance Detection in Social Media Texts
    Sherif, Sara S.
    Shawky, Doaa M.
    Fayed, Hatem A.
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2023, 22 (02) : 737 - 775
  • [25] Classifying unlabeled short texts using a fuzzy declarative approach
    Francisco P. Romero
    Pascual Julián-Iranzo
    Andrés Soto
    Mateus Ferreira-Satler
    Juan Gallardo-Casero
    Language Resources and Evaluation, 2013, 47 : 151 - 178
  • [26] Using Language Models for Classifying the Party Affiliation of Political Texts
    Tu My Doan
    Kille, Benjamin
    Gulla, Jon Atle
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022), 2022, 13286 : 382 - 393
  • [27] Classifying unlabeled short texts using a fuzzy declarative approach
    Romero, Francisco P.
    Julian-Iranzo, Pascual
    Soto, Andres
    Ferreira-Satler, Mateus
    Gallardo-Casero, Juan
    LANGUAGE RESOURCES AND EVALUATION, 2013, 47 (01) : 151 - 178
  • [28] Research on the Classification of New Energy Industry Policy Texts Based on BERT Model
    Li, Qian
    Xiao, Zezhong
    Zhao, Yanyun
    SUSTAINABILITY, 2023, 15 (14)
  • [29] Alexithymic traits predict the speed of classifying non-literal statements using nonverbal cues
    Jakobson, Lorna S.
    Pearson, Pauline M.
    COGNITION & EMOTION, 2021, 35 (03) : 569 - 575
  • [30] Similarity Matching for Patent Documents Using Ensemble BERT-Related Model and Novel Text Processing Method
    Yu, Liqiang
    Liu, Bo
    Lin, Qunwei
    Zhao, Xinyu
    Che, Chang
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2024, 15 (03) : 446 - 450