Simple Data Transformations for Mitigating the Syntactic Similarity to Improve Sentence Embeddings at Supervised Contrastive Learning

Cited by: 0
Authors
Kim, Minji [1 ]
Cho, Whanhee [2 ,3 ]
Kim, Soohyeong [1 ]
Choi, Yong Suk [1 ,2 ]
Affiliations
[1] Hanyang Univ, Dept Artificial Intelligence, Seoul 04763, South Korea
[2] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
[3] Univ Utah, Sch Comp, Salt Lake City, UT 84112 USA
Funding
National Research Foundation, Singapore;
Keywords
contrastive learning; sentence embedding; syntactic transformation;
DOI
10.1002/aisy.202300717
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Contrastive learning of sentence representations has achieved great improvements in several natural language processing tasks. However, a supervised contrastive learning model trained on a natural language inference (NLI) dataset remains insufficient for capturing sentence semantics, since it is prone to making predictions based on heuristics. Herein, using ParsEVAL and a word-overlap metric, it is shown that sentence pairs in the NLI dataset have strong syntactic similarity, and a framework is proposed to compensate for this problem in two ways: 1) simple syntactic transformations are applied to the hypothesis, and 2) the objective is extended to the SupCon loss to leverage the resulting sentence variants. The method is evaluated on semantic textual similarity (STS) tasks and transfer tasks. The proposed methods improve the performance of the BERT-based baseline on the STS Benchmark and SICK Relatedness by 1.48% and 2.2%, respectively. Furthermore, the model achieves 82.65% on the HANS benchmark dataset, to the best of our knowledge a state-of-the-art performance, demonstrating that the approach is effective at grasping semantics without relying on heuristics in the NLI dataset under supervised contrastive learning. The code is available at . (c) 2024 WILEY-VCH GmbH
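As a concrete illustration of the two components named in the abstract, the Python (PyTorch) sketch below pairs a generic SupCon-style objective (Khosla et al., 2020) over sentence embeddings with a toy syntactic transformation. This is a minimal sketch under stated assumptions, not the authors' released implementation: naive_transform is a hypothetical stand-in for the paper's transformations, and supcon_loss, its temperature default, and the label convention are illustrative choices rather than values taken from the paper.

import torch
import torch.nn.functional as F


def naive_transform(hypothesis: str) -> str:
    # Hypothetical stand-in for the paper's syntactic transformations:
    # swap two comma-separated clauses to change surface syntax while
    # roughly preserving meaning.
    parts = hypothesis.split(", ", 1)
    return f"{parts[1]}, {parts[0]}" if len(parts) == 2 else hypothesis


def supcon_loss(embeddings: torch.Tensor,
                labels: torch.Tensor,
                temperature: float = 0.05) -> torch.Tensor:
    # embeddings: (N, d) sentence vectors; labels: (N,) group ids.
    # Rows sharing a label are treated as positives of one another.
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                      # (N, N) similarities
    n = z.size(0)
    diag = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(diag, float("-inf"))         # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~diag
    pos_counts = pos.sum(dim=1).clamp(min=1)
    # Zero out non-positive entries (this also removes the -inf diagonal).
    per_anchor = -log_prob.masked_fill(~pos, 0.0).sum(dim=1) / pos_counts
    return per_anchor[pos.any(dim=1)].mean()           # anchors with positives

Under this reading, a premise, its entailed hypothesis, and the hypothesis's transformed variants would share one label and be pulled together, while contradiction hypotheses carry different labels and act as in-batch negatives.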
Pages: 10
Related Papers
50 items in total
  • [31] Contrastive learning for unsupervised sentence embeddings using negative samples with diminished semantics
    Yu, Zhiyi; Li, Hong; Feng, Jialin
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (04): 5428-5445
  • [33] A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings
    Tan, Haochen; Shao, Wei; Wu, Han; Yang, Ke; Song, Linqi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022: 246-256
  • [34] Strengthening Sentence Similarity Identification Through OpenAI Embeddings and Deep Learning
    Korade, Nilesh B.; Salunke, Mahendra B.; Bhosle, Amol A.; Kumbharkar, Prashant B.; Asalkar, Gayatri G.; Khedkar, Rutuja G.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (04): 821-829
  • [35] Similarity contrastive estimation for image and video soft contrastive self-supervised learning
    Denize, Julien; Rabarisoa, Jaonary; Orcesi, Astrid; Hérault, Romain
    MACHINE VISION AND APPLICATIONS, 2023, 34 (06)
  • [36] Understanding and Mitigating Human-Labelling Errors in Supervised Contrastive Learning
    Long, Zijun; Zhuang, Lipeng; Killick, George; McCreadie, Richard; Aragon-Camarasa, Gerardo; Henderson, Paul
    COMPUTER VISION - ECCV 2024, PT LIV, 2025, 15112: 435-454
  • [38] Improving Contrastive Learning of Sentence Embeddings with Case-Augmented Positives and Retrieved Negatives
    Wang, Wei; Ge, Liangzhu; Zhang, Jingqiao; Yang, Cheng
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022: 2159-2165
  • [39] SimCSE++: Improving Contrastive Learning for Sentence Embeddings from Two Perspectives
    Xu, Jiahao; Shao, Wei; Chen, Lihui; Liu, Lemao
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023: 12028-12040
  • [40] Semi-Supervised Contrastive Learning With Similarity Co-Calibration
    Zhang, Yuhang; Zhang, Xiaopeng; Li, Jie; Qiu, Robert C.; Xu, Haohang; Tian, Qi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25: 1749-1759