Simple Data Transformations for Mitigating the Syntactic Similarity to Improve Sentence Embeddings at Supervised Contrastive Learning

Cited: 0
Authors
Kim, Minji [1 ]
Cho, Whanhee [2 ,3 ]
Kim, Soohyeong [1 ]
Choi, Yong Suk [1 ,2 ]
Affiliations
[1] Hanyang Univ, Dept Artificial Intelligence, Seoul 04763, South Korea
[2] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
[3] Univ Utah, Sch Comp, Salt Lake City, UT 84112 USA
Funding
National Research Foundation of Singapore;
Keywords
contrastive learning; sentence embedding; syntactic transformation;
DOI
10.1002/aisy.202300717
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Contrastive learning of sentence representations has achieved great improvements in several natural language processing tasks. However, a supervised contrastive learning model trained on the natural language inference (NLI) dataset is insufficient to elucidate the semantics of sentences, since it is prone to make predictions based on heuristics. Herein, using ParsEVAL and a word overlap metric, it is shown that sentence pairs in the NLI dataset have strong syntactic similarity, and a framework is proposed to compensate for this problem in two aspects: 1) apply simple syntactic transformations to the hypothesis and 2) expand the objective to the SupCon loss to leverage variants of sentences. The method is evaluated on semantic textual similarity (STS) tasks and transfer tasks. The proposed methods improve the performance of the BERT-based baseline on the STS Benchmark and SICK Relatedness by 1.48% and 2.2%, respectively. Furthermore, the model achieves 82.65% on the HANS benchmark dataset, which, to the best of our knowledge, is state-of-the-art performance, demonstrating that the approach is effective in grasping semantics without relying on heuristics in the NLI dataset under supervised contrastive learning. The code is available at .
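As a rough illustration of the objective the abstract refers to, below is a minimal sketch of a SupCon-style loss over sentence embeddings, assuming PyTorch. The function name, the temperature value, and the label-grouping convention are illustrative assumptions, not details taken from the paper.

    # Minimal sketch of a supervised contrastive (SupCon) objective over
    # sentence embeddings, assuming PyTorch. Names and the temperature are
    # illustrative assumptions, not details from the paper.
    import torch
    import torch.nn.functional as F

    def supcon_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                    temperature: float = 0.05) -> torch.Tensor:
        # embeddings: (N, d); rows sharing a label (e.g., a premise, its
        # hypothesis, and syntactic variants of the hypothesis) are positives.
        z = F.normalize(embeddings, dim=1)            # unit norm -> cosine similarity
        sim = z @ z.T / temperature                   # (N, N) scaled similarities

        n = z.size(0)
        self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
        pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

        # Log-softmax over all examples except the anchor itself.
        sim = sim.masked_fill(self_mask, float("-inf"))
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

        # Mean log-probability of each anchor's positives, then mean over anchors.
        pos_count = pos_mask.sum(dim=1).clamp(min=1)
        per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_count)
        return per_anchor[pos_mask.any(dim=1)].mean() # skip anchors without positives

In the setting the abstract describes, syntactic variants of a hypothesis would enter the batch as extra rows sharing the anchor's label, so each anchor can have multiple positives; supporting several positives per anchor is what distinguishes the SupCon objective from single-positive contrastive losses.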
Pages: 10
Related Papers
50 items in total
  • [21] Chen, Qian; Wang, Wen; Zhang, Qinglin; Zheng, Siqi; Deng, Chong; Yu, Hai; Liu, Jiaqing; Ma, Yukun; Zhang, Chong. Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings. 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, 2023: 5868-5875.
  • [22] Denize, Julien; Rabarisoa, Jaonary; Orcesi, Astrid; Herault, Romain; Canu, Stephane. Similarity Contrastive Estimation for Self-Supervised Soft Contrastive Learning. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023: 2705-2715.
  • [23] Liu, Xiaodong; Gong, Wenyin; Li, Yuxin; Li, Yanchi; Li, Xiang. A Study of Contrastive Learning Algorithms for Sentence Representation Based on Simple Data Augmentation. Applied Sciences-Basel, 2023, 13(18).
  • [24] Xu, Yenan; Xu, Wanru; Miao, Zhenjiang. Counterfactual contrastive learning for weakly supervised temporal sentence grounding. Neurocomputing, 2025, 624.
  • [25] Wang, Qian; Zhang, Weiqi; Lei, Tianyi; Peng, Dezhong. Grouped Contrastive Learning of Self-Supervised Sentence Representation. Applied Sciences-Basel, 2023, 13(17).
  • [26] Patrick, Mandela; Asano, Yuki M.; Kuznetsova, Polina; Fong, Ruth; Henriques, Joao F.; Zweig, Geoffrey; Vedaldi, Andrea. On Compositions of Transformations in Contrastive Self-Supervised Learning. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 9557-9567.
  • [27] Xia, Fengliang; Wu, Guixing; Zhao, Guochao; Li, Xiangyu. SimCGE: Simple Contrastive Learning of Graph Embeddings for Cross-Version Binary Code Similarity Detection. Information and Communications Security, ICICS 2022, 2022, 13407: 458-471.
  • [28] Asl, Javad Rafiei; Blanco, Eduardo; Takabi, Daniel. RobustEmbed: Robust Sentence Embeddings Using Self-Supervised Contrastive Pre-Training. Findings of the Association for Computational Linguistics - EMNLP 2023, 2023: 4587-4603.
  • [29] Kalanadhabhatta, Manasa; Santana, Adrelys Mateo; Ganesan, Deepak; Rahman, Tauhidur; Grabell, Adam. Extracting Multimodal Embeddings via Supervised Contrastive Learning for Psychological Screening. 2022 10th International Conference on Affective Computing and Intelligent Interaction (ACII), 2022.
  • [30] Klein, Tassilo; Nabi, Moin. miCSE: Mutual Information Contrastive Learning for Low-shot Sentence Embeddings. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023, Vol 1, 2023: 6159-6177.