Simple Data Transformations for Mitigating the Syntactic Similarity to Improve Sentence Embeddings at Supervised Contrastive Learning

Cited by: 0
Authors
Kim, Minji [1 ]
Cho, Whanhee [2 ,3 ]
Kim, Soohyeong [1 ]
Choi, Yong Suk [1 ,2 ]
Affiliations
[1] Hanyang Univ, Dept Artificial Intelligence, Seoul 04763, South Korea
[2] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
[3] Univ Utah, Sch Comp, Salt Lake City, UT 84112 USA
Funding
National Research Foundation, Singapore;
Keywords
contrastive learning; sentence embedding; syntactic transformation;
DOI
10.1002/aisy.202300717
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Contrastive learning of sentence representations has achieved great improvements in several natural language processing tasks. However, a supervised contrastive learning model trained on a natural language inference (NLI) dataset remains insufficient for capturing sentence semantics, since it is prone to making predictions based on heuristics. Herein, using ParsEVAL and a word-overlap metric, it is shown that sentence pairs in the NLI dataset have strong syntactic similarity, and a framework is proposed to compensate for this problem in two ways: 1) simple syntactic transformations are applied to the hypothesis, and 2) the objective is extended to the SupCon loss to leverage the resulting sentence variants. The method is evaluated on semantic textual similarity (STS) tasks and transfer tasks. The proposed methods improve the performance of the BERT-based baseline on the STS Benchmark and SICK Relatedness by 1.48% and 2.2%, respectively. Furthermore, the model achieves 82.65% on the HANS benchmark dataset, to the best of our knowledge a state-of-the-art performance, demonstrating that the approach is effective at grasping semantics without relying on heuristics in the NLI dataset under supervised contrastive learning. The code is available at . (c) 2024 WILEY-VCH GmbH
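As a concrete illustration of the two components named in the abstract, the Python (PyTorch) sketch below pairs a generic SupCon-style objective (Khosla et al., 2020) over sentence embeddings with a toy syntactic transformation. This is a minimal sketch under stated assumptions, not the authors' released implementation: naive_transform is a hypothetical stand-in for the paper's transformations, and supcon_loss, its temperature default, and the label convention are illustrative choices rather than values taken from the paper.

import torch
import torch.nn.functional as F


def naive_transform(hypothesis: str) -> str:
    # Hypothetical stand-in for the paper's syntactic transformations:
    # swap two comma-separated clauses to change surface syntax while
    # roughly preserving meaning.
    parts = hypothesis.split(", ", 1)
    return f"{parts[1]}, {parts[0]}" if len(parts) == 2 else hypothesis


def supcon_loss(embeddings: torch.Tensor,
                labels: torch.Tensor,
                temperature: float = 0.05) -> torch.Tensor:
    # embeddings: (N, d) sentence vectors; labels: (N,) group ids.
    # Rows sharing a label are treated as positives of one another.
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                      # (N, N) similarities
    n = z.size(0)
    diag = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(diag, float("-inf"))         # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~diag
    pos_counts = pos.sum(dim=1).clamp(min=1)
    # Zero out non-positive entries (this also removes the -inf diagonal).
    per_anchor = -log_prob.masked_fill(~pos, 0.0).sum(dim=1) / pos_counts
    return per_anchor[pos.any(dim=1)].mean()           # anchors with positives

Under this reading, a premise, its entailed hypothesis, and the hypothesis's transformed variants would share one label and be pulled together, while contradiction hypotheses carry different labels and act as in-batch negatives.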
Pages: 10
Related Papers
50 items in total
  • [31] Contrastive learning for unsupervised sentence embeddings using negative samples with diminished semantics
    Yu, Zhiyi; Li, Hong; Feng, Jialin
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (04): 5428-5445
  • [33] A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings
    Tan, Haochen; Shao, Wei; Wu, Han; Yang, Ke; Song, Linqi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022: 246-256
  • [34] Strengthening Sentence Similarity Identification Through OpenAI Embeddings and Deep Learning
    Korade, Nilesh B.; Salunke, Mahendra B.; Bhosle, Amol A.; Kumbharkar, Prashant B.; Asalkar, Gayatri G.; Khedkar, Rutuja G.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (04): 821-829
  • [35] Similarity contrastive estimation for image and video soft contrastive self-supervised learning
    Denize, Julien; Rabarisoa, Jaonary; Orcesi, Astrid; Hérault, Romain
    MACHINE VISION AND APPLICATIONS, 2023, 34 (06)
  • [36] Understanding and Mitigating Human-Labelling Errors in Supervised Contrastive Learning
    Long, Zijun; Zhuang, Lipeng; Killick, George; McCreadie, Richard; Aragon-Camarasa, Gerardo; Henderson, Paul
    COMPUTER VISION - ECCV 2024, PT LIV, 2025, 15112: 435-454
  • [38] Improving Contrastive Learning of Sentence Embeddings with Case-Augmented Positives and Retrieved Negatives
    Wang, Wei; Ge, Liangzhu; Zhang, Jingqiao; Yang, Cheng
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022: 2159-2165
  • [39] SimCSE++: Improving Contrastive Learning for Sentence Embeddings from Two Perspectives
    Xu, Jiahao; Shao, Wei; Chen, Lihui; Liu, Lemao
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023: 12028-12040
  • [40] Semi-Supervised Contrastive Learning With Similarity Co-Calibration
    Zhang, Yuhang; Zhang, Xiaopeng; Li, Jie; Qiu, Robert C.; Xu, Haohang; Tian, Qi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25: 1749-1759