Simple Data Transformations for Mitigating the Syntactic Similarity to Improve Sentence Embeddings at Supervised Contrastive Learning

Cited: 0
Authors
Kim, Minji [1]
Cho, Whanhee [2,3]
Kim, Soohyeong [1]
Choi, Yong Suk [1,2]
Affiliations
[1] Hanyang Univ, Dept Artificial Intelligence, Seoul 04763, South Korea
[2] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
[3] Univ Utah, Sch Comp, Salt Lake City, UT 84112 USA
Funding
National Research Foundation, Singapore;
Keywords
contrastive learning; sentence embedding; syntactic transformation;
DOI
10.1002/aisy.202300717
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Contrastive learning of sentence representations has achieved great improvements on several natural language processing tasks. However, a supervised contrastive learning model trained on the natural language inference (NLI) dataset is insufficient for elucidating sentence semantics, since it is prone to making predictions based on heuristics. Herein, using ParsEVAL and a word-overlap metric, it is shown that sentence pairs in the NLI dataset exhibit strong syntactic similarity, and a framework is proposed to compensate for this problem in two ways: 1) applying simple syntactic transformations to the hypothesis, and 2) extending the objective to the SupCon loss to leverage the resulting sentence variants. The method is evaluated on semantic textual similarity (STS) tasks and transfer tasks. The proposed methods improve the performance of the BERT-based baseline on the STS Benchmark and SICK Relatedness by 1.48% and 2.2%, respectively. Furthermore, the model achieves 82.65% on the HANS benchmark dataset, which is, to the best of the authors' knowledge, state-of-the-art performance, demonstrating that the approach is effective at grasping semantics without relying on heuristics in the NLI dataset under supervised contrastive learning. The code is available at . © 2024 WILEY-VCH GmbH
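As a concrete illustration of the training objective described in the abstract, the following minimal PyTorch sketch implements a SupCon-style loss (Khosla et al., 2020) in which a premise, its entailed hypothesis, and syntactically transformed variants of that hypothesis are treated as one positive group. This is an illustrative assumption, not the authors' released implementation; the function name supcon_loss, the grouping scheme, and the temperature value are hypothetical.

import torch
import torch.nn.functional as F

def supcon_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                temperature: float = 0.05) -> torch.Tensor:
    """SupCon loss over a batch of sentence embeddings.

    embeddings: (N, d) encoder outputs; labels: (N,) group ids, where a
    premise, its hypothesis, and any transformed hypothesis variants share
    the same id and therefore act as mutual positives.
    """
    z = F.normalize(embeddings, dim=1)            # cosine geometry
    sim = z @ z.t() / temperature                 # (N, N) scaled similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask

    # Log-softmax over all other examples; self-similarity is excluded
    # from the denominator by masking it to -inf before logsumexp.
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Average log-probability of each anchor's positives, then mean over
    # anchors that have at least one positive in the batch.
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts)
    return loss[pos_mask.any(dim=1)].mean()

# Toy usage: indices 0-2 are a premise, its hypothesis, and a transformed
# hypothesis variant (one group); indices 3-5 form a second group.
emb = torch.randn(6, 32)
labels = torch.tensor([0, 0, 0, 1, 1, 1])
print(supcon_loss(emb, labels).item())

Treating transformed hypotheses as extra positives is what motivates moving from a single-positive InfoNCE objective to SupCon, which averages over an arbitrary number of positives per anchor.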
Pages: 10
Related Papers
50 records in total
  • [1] SimCSE: Simple Contrastive Learning of Sentence Embeddings
    Gao, Tianyu
    Yao, Xingcheng
    Chen, Danqi
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 6894 - 6910
  • [2] Attention-Driven Dropout: A Simple Method to Improve Self-supervised Contrastive Sentence Embeddings
    Stermann, Fabian
    Chalkidis, Ilias
Vahidi, Amirhossein
    Bischl, Bernd
    Rezaei, Mina
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, PT I, ECML PKDD 2024, 2024, 14941 : 89 - 106
  • [3] Composition-contrastive Learning for Sentence Embeddings
    Chanchani, Sachin
    Huang, Ruihong
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 15836 - 15848
  • [4] Contrastive Learning of Sentence Embeddings from Scratch
    Zhang, Junlei
    Lan, Zhenzhong
    He, Junxian
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 3916 - 3932
  • [5] DistillCSE: Distilled Contrastive Learning for Sentence Embeddings
    Xu, Jiahao
    Shao, Wei
    Chen, Lihui
    Liu, Lemao
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 8153 - 8165
  • [6] MCSE: Multimodal Contrastive Learning of Sentence Embeddings
    Zhang, Miaoran
    Mosbach, Marius
    Adelani, David Ifeoluwa
    Hedderich, Michael A.
    Klakow, Dietrich
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 5959 - 5969
  • [7] Stable Contrastive Learning for Self-Supervised Sentence Embeddings With Pseudo-Siamese Mutual Learning
    Xie, Yutao
    Wu, Qiyu
    Chen, Wei
    Wang, Tengjiao
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 3046 - 3059
  • [8] AdCSE: An Adversarial Method for Contrastive Learning of Sentence Embeddings
    Li, Renhao
    Duan, Lei
    Xie, Guicai
    Xiao, Shan
    Jiang, Weipeng
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2022, PT III, 2022, : 165 - 180
  • [9] Pairwise Supervised Contrastive Learning of Sentence Representations
    Zhang, Dejiao
    Li, Shang-Wen
    Xiao, Wei
    Zhu, Henghui
    Nallapati, Ramesh
    Arnold, Andrew O.
    Xiang, Bing
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 5786 - 5798
  • [10] HiCL: Hierarchical Contrastive Learning of Unsupervised Sentence Embeddings
    Wu, Zhuofeng
    Xiao, Chaowei
    Vydiswaran, V. G. Vinod
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 2461 - 2476