Simple Data Transformations for Mitigating the Syntactic Similarity to Improve Sentence Embeddings at Supervised Contrastive Learning

Cited: 0
Authors
Kim, Minji [1]
Cho, Whanhee [2,3]
Kim, Soohyeong [1]
Choi, Yong Suk [1,2]
Affiliations
[1] Hanyang Univ, Dept Artificial Intelligence, Seoul 04763, South Korea
[2] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
[3] Univ Utah, Sch Comp, Salt Lake City, UT 84112 USA
Funding
National Research Foundation, Singapore;
Keywords
contrastive learning; sentence embedding; syntactic transformation;
DOI
10.1002/aisy.202300717
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Contrastive learning of sentence representations has achieved great improvements on several natural language processing tasks. However, a supervised contrastive learning model trained on the natural language inference (NLI) dataset is insufficient for elucidating sentence semantics, since it is prone to making predictions based on heuristics. Herein, using ParsEVAL and a word-overlap metric, it is shown that sentence pairs in the NLI dataset exhibit strong syntactic similarity, and a framework is proposed to compensate for this problem in two ways: 1) applying simple syntactic transformations to the hypothesis, and 2) extending the objective to the SupCon loss to leverage the resulting sentence variants. The method is evaluated on semantic textual similarity (STS) tasks and transfer tasks. The proposed methods improve the performance of the BERT-based baseline on the STS Benchmark and SICK Relatedness by 1.48% and 2.2%, respectively. Furthermore, the model achieves 82.65% on the HANS benchmark dataset, which is, to the best of the authors' knowledge, state-of-the-art performance, demonstrating that the approach is effective at grasping semantics without relying on heuristics in the NLI dataset under supervised contrastive learning. The code is available at . © 2024 WILEY-VCH GmbH
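As a concrete illustration of the training objective described in the abstract, the following minimal PyTorch sketch implements a SupCon-style loss (Khosla et al., 2020) in which a premise, its entailed hypothesis, and syntactically transformed variants of that hypothesis are treated as one positive group. This is an illustrative assumption, not the authors' released implementation; the function name supcon_loss, the grouping scheme, and the temperature value are hypothetical.

import torch
import torch.nn.functional as F

def supcon_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                temperature: float = 0.05) -> torch.Tensor:
    """SupCon loss over a batch of sentence embeddings.

    embeddings: (N, d) encoder outputs; labels: (N,) group ids, where a
    premise, its hypothesis, and any transformed hypothesis variants share
    the same id and therefore act as mutual positives.
    """
    z = F.normalize(embeddings, dim=1)            # cosine geometry
    sim = z @ z.t() / temperature                 # (N, N) scaled similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask

    # Log-softmax over all other examples; self-similarity is excluded
    # from the denominator by masking it to -inf before logsumexp.
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Average log-probability of each anchor's positives, then mean over
    # anchors that have at least one positive in the batch.
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts)
    return loss[pos_mask.any(dim=1)].mean()

# Toy usage: indices 0-2 are a premise, its hypothesis, and a transformed
# hypothesis variant (one group); indices 3-5 form a second group.
emb = torch.randn(6, 32)
labels = torch.tensor([0, 0, 0, 1, 1, 1])
print(supcon_loss(emb, labels).item())

Treating transformed hypotheses as extra positives is what motivates moving from a single-positive InfoNCE objective to SupCon, which averages over an arbitrary number of positives per anchor.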
Pages: 10
Related Papers
50 records in total
  • [1] SimCSE: Simple Contrastive Learning of Sentence Embeddings
    Gao, Tianyu
    Yao, Xingcheng
    Chen, Danqi
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 6894 - 6910
  • [2] Attention-Driven Dropout: A Simple Method to Improve Self-supervised Contrastive Sentence Embeddings
    Stermann, Fabian
    Chalkidis, Ilias
Vahidi, Amirhossein
    Bischl, Bernd
    Rezaei, Mina
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, PT I, ECML PKDD 2024, 2024, 14941 : 89 - 106
  • [3] Composition-contrastive Learning for Sentence Embeddings
    Chanchani, Sachin
    Huang, Ruihong
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 15836 - 15848
  • [4] Contrastive Learning of Sentence Embeddings from Scratch
    Zhang, Junlei
    Lan, Zhenzhong
    He, Junxian
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 3916 - 3932
  • [5] DistillCSE: Distilled Contrastive Learning for Sentence Embeddings
    Xu, Jiahao
    Shao, Wei
    Chen, Lihui
    Liu, Lemao
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 8153 - 8165
  • [6] MCSE: Multimodal Contrastive Learning of Sentence Embeddings
    Zhang, Miaoran
    Mosbach, Marius
    Adelani, David Ifeoluwa
    Hedderich, Michael A.
    Klakow, Dietrich
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 5959 - 5969
  • [7] Stable Contrastive Learning for Self-Supervised Sentence Embeddings With Pseudo-Siamese Mutual Learning
    Xie, Yutao
    Wu, Qiyu
    Chen, Wei
    Wang, Tengjiao
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 3046 - 3059
  • [8] AdCSE: An Adversarial Method for Contrastive Learning of Sentence Embeddings
    Li, Renhao
    Duan, Lei
    Xie, Guicai
    Xiao, Shan
    Jiang, Weipeng
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2022, PT III, 2022, : 165 - 180
  • [9] Pairwise Supervised Contrastive Learning of Sentence Representations
    Zhang, Dejiao
    Li, Shang-Wen
    Xiao, Wei
    Zhu, Henghui
    Nallapati, Ramesh
    Arnold, Andrew O.
    Xiang, Bing
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 5786 - 5798
  • [10] HiCL: Hierarchical Contrastive Learning of Unsupervised Sentence Embeddings
    Wu, Zhuofeng
    Xiao, Chaowei
    Vydiswaran, V. G. Vinod
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 2461 - 2476