Prefix Data Augmentation for Contrastive Learning of Unsupervised Sentence Embedding

Cited by: 0
Authors
Wang, Chunchun [1]
Lv, Shu [1]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Math Sci, Chengdu 611731, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, No. 7
Keywords
contrastive learning; sentence embedding; prefix data augmentation
DOI
10.3390/app14072880
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
This paper presents prefix data augmentation (Prd), a method for improving unsupervised contrastive learning of sentence embeddings. The resulting framework, PrdSimCSE, uses Prd to create both positive and negative sample pairs: adding a positive prefix or a negative prefix to a sentence yields the pairs on which contrastive learning is based, and the resulting model outperforms the baseline unsupervised SimCSE. PrdSimCSE is placed within a probabilistic framework that expands the semantic similarity event space and generates better negative samples, which contributes to more accurate semantic similarity estimates. The model is validated on standard semantic similarity tasks, where it improves on existing unsupervised models, with a gain of 1.08% on BERT-base. Detailed experiments examine the effectiveness of positive and negative prefixes as data augmentation and their impact on the learned model, and the paper discusses the broader implications of prefix data augmentation for unsupervised sentence embedding learning.
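As a rough illustration of the idea in the abstract, the sketch below builds a positive and a negative view of a sentence by adding prefixes, then scores toy embeddings with an InfoNCE-style loss of the kind used by SimCSE. Everything here is an assumption made for illustration: the prefix strings, the function names `make_prefix_views` and `info_nce`, the temperature value, and the hand-written 2-D vectors that stand in for encoder outputs. The abstract does not give the paper's actual prefixes or training objective.

```python
import math


def make_prefix_views(sentence, pos_prefix, neg_prefix):
    """Return (anchor, positive, negative) text views of one sentence.

    The prefixes are hypothetical placeholders, not the paper's templates.
    """
    positive = f"{pos_prefix} {sentence}"
    negative = f"{neg_prefix} {sentence}"
    return sentence, positive, negative


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.05):
    """InfoNCE-style loss: -log softmax of the positive among all candidates."""
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - pos_logit


# Hypothetical prefixes; a real system would encode these texts with BERT.
anchor_text, pos_text, neg_text = make_prefix_views(
    "A man is playing guitar.", "It is true that", "It is not true that"
)

# Toy 2-D embeddings standing in for encoder outputs (assumed values).
anchor_vec, pos_vec, neg_vec = [1.0, 0.0], [0.9, 0.1], [0.1, 0.9]
loss = info_nce(anchor_vec, pos_vec, [neg_vec])
```

With these toy vectors the positive view lies close to the anchor and the negative view far from it, so the loss is near zero. Swapping the two vectors makes the loss large, which is the signal a contrastive objective would use to pull positive views toward the anchor and push negative views away.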
Pages: 15