Simple Data Transformations for Mitigating the Syntactic Similarity to Improve Sentence Embeddings at Supervised Contrastive Learning

Cited: 0
Authors
Kim, Minji [1 ]
Cho, Whanhee [2 ,3 ]
Kim, Soohyeong [1 ]
Choi, Yong Suk [1 ,2 ]
Affiliations
[1] Hanyang Univ, Dept Artificial Intelligence, Seoul 04763, South Korea
[2] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
[3] Univ Utah, Sch Comp, Salt Lake City, UT 84112 USA
Funding
National Research Foundation of Singapore;
Keywords
contrastive learning; sentence embedding; syntactic transformation;
DOI
10.1002/aisy.202300717
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Contrastive learning of sentence representations has achieved great improvements in several natural language processing tasks. However, a supervised contrastive learning model trained on the natural language inference (NLI) dataset is insufficient to elucidate the semantics of sentences, since it is prone to make predictions based on heuristics. Herein, using ParsEVAL and a word overlap metric, it is shown that sentence pairs in the NLI dataset have strong syntactic similarity, and a framework is proposed to compensate for this problem in two aspects: 1) apply simple syntactic transformations to the hypothesis and 2) expand the objective to the SupCon loss to leverage variants of sentences. The method is evaluated on semantic textual similarity (STS) tasks and transfer tasks. The proposed methods improve the performance of the BERT-based baseline on the STS Benchmark and SICK Relatedness by 1.48% and 2.2%, respectively. Furthermore, the model achieves 82.65% on the HANS benchmark dataset, which, to the best of our knowledge, is state-of-the-art performance, demonstrating that the approach is effective in grasping semantics without relying on heuristics in the NLI dataset under supervised contrastive learning. The code is available at .
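As a rough illustration of the objective the abstract refers to, below is a minimal sketch of a SupCon-style loss over sentence embeddings, assuming PyTorch. The function name, the temperature value, and the label-grouping convention are illustrative assumptions, not details taken from the paper.

    # Minimal sketch of a supervised contrastive (SupCon) objective over
    # sentence embeddings, assuming PyTorch. Names and the temperature are
    # illustrative assumptions, not details from the paper.
    import torch
    import torch.nn.functional as F

    def supcon_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                    temperature: float = 0.05) -> torch.Tensor:
        # embeddings: (N, d); rows sharing a label (e.g., a premise, its
        # hypothesis, and syntactic variants of the hypothesis) are positives.
        z = F.normalize(embeddings, dim=1)            # unit norm -> cosine similarity
        sim = z @ z.T / temperature                   # (N, N) scaled similarities

        n = z.size(0)
        self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
        pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

        # Log-softmax over all examples except the anchor itself.
        sim = sim.masked_fill(self_mask, float("-inf"))
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

        # Mean log-probability of each anchor's positives, then mean over anchors.
        pos_count = pos_mask.sum(dim=1).clamp(min=1)
        per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_count)
        return per_anchor[pos_mask.any(dim=1)].mean() # skip anchors without positives

In the setting the abstract describes, syntactic variants of a hypothesis would enter the batch as extra rows sharing the anchor's label, so each anchor can have multiple positives; supporting several positives per anchor is what distinguishes the SupCon objective from single-positive contrastive losses.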
Pages: 10
Related Papers
50 items in total
  • [21] Chen, Qian; Wang, Wen; Zhang, Qinglin; Zheng, Siqi; Deng, Chong; Yu, Hai; Liu, Jiaqing; Ma, Yukun; Zhang, Chong. Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings. 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, 2023: 5868-5875.
  • [22] Denize, Julien; Rabarisoa, Jaonary; Orcesi, Astrid; Herault, Romain; Canu, Stephane. Similarity Contrastive Estimation for Self-Supervised Soft Contrastive Learning. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023: 2705-2715.
  • [23] Liu, Xiaodong; Gong, Wenyin; Li, Yuxin; Li, Yanchi; Li, Xiang. A Study of Contrastive Learning Algorithms for Sentence Representation Based on Simple Data Augmentation. Applied Sciences-Basel, 2023, 13(18).
  • [24] Xu, Yenan; Xu, Wanru; Miao, Zhenjiang. Counterfactual contrastive learning for weakly supervised temporal sentence grounding. Neurocomputing, 2025, 624.
  • [25] Wang, Qian; Zhang, Weiqi; Lei, Tianyi; Peng, Dezhong. Grouped Contrastive Learning of Self-Supervised Sentence Representation. Applied Sciences-Basel, 2023, 13(17).
  • [26] Patrick, Mandela; Asano, Yuki M.; Kuznetsova, Polina; Fong, Ruth; Henriques, Joao F.; Zweig, Geoffrey; Vedaldi, Andrea. On Compositions of Transformations in Contrastive Self-Supervised Learning. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 9557-9567.
  • [27] Xia, Fengliang; Wu, Guixing; Zhao, Guochao; Li, Xiangyu. SimCGE: Simple Contrastive Learning of Graph Embeddings for Cross-Version Binary Code Similarity Detection. Information and Communications Security, ICICS 2022, 2022, 13407: 458-471.
  • [28] Asl, Javad Rafiei; Blanco, Eduardo; Takabi, Daniel. RobustEmbed: Robust Sentence Embeddings Using Self-Supervised Contrastive Pre-Training. Findings of the Association for Computational Linguistics - EMNLP 2023, 2023: 4587-4603.
  • [29] Kalanadhabhatta, Manasa; Santana, Adrelys Mateo; Ganesan, Deepak; Rahman, Tauhidur; Grabell, Adam. Extracting Multimodal Embeddings via Supervised Contrastive Learning for Psychological Screening. 2022 10th International Conference on Affective Computing and Intelligent Interaction (ACII), 2022.
  • [30] Klein, Tassilo; Nabi, Moin. miCSE: Mutual Information Contrastive Learning for Low-shot Sentence Embeddings. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023, Vol 1, 2023: 6159-6177.