Disentangling Semantics and Syntax in Sentence Embeddings with Pre-trained Language Models

Cited by: 0
Authors:
Huang, James Y. [1 ]
Huang, Kuan-Hao [1 ]
Chang, Kai-Wei [1 ]
Affiliations:
[1] University of California, Los Angeles, Los Angeles, CA, USA
Keywords: (none listed)
DOI: Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Pre-trained language models have achieved huge success on a wide range of NLP tasks. However, contextual representations from pre-trained models contain entangled semantic and syntactic information, and therefore cannot be directly used to derive useful semantic sentence embeddings for some tasks. Paraphrase pairs offer an effective way of learning the distinction between semantics and syntax, as they naturally share semantics and often vary in syntax. In this work, we present ParaBART, a semantic sentence embedding model that learns to disentangle semantics and syntax in sentence embeddings obtained by pre-trained language models. ParaBART is trained to perform syntax-guided paraphrasing, based on a source sentence that shares semantics with the target paraphrase, and a parse tree that specifies the target syntax. In this way, ParaBART learns disentangled semantic and syntactic representations from their respective inputs with separate encoders. Experiments in English show that ParaBART outperforms state-of-the-art sentence embedding models on unsupervised semantic similarity tasks. Additionally, we show that our approach can effectively remove syntactic information from semantic sentence embeddings, leading to better robustness against syntactic variation on downstream semantic tasks.
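The abstract describes an architecture rather than code. As a rough illustration of the dual-encoder design it outlines, the sketch below pairs a semantic encoder over the source sentence with a separate syntactic encoder over a linearized target parse tree, then decodes the target paraphrase from both. Everything concrete here (module names, layer sizes, mean pooling, and the small randomly initialized Transformers standing in for pre-trained BART) is an assumption made for a self-contained example, not the authors' released implementation.

# Minimal sketch of the described dual-encoder paraphraser; NOT the
# authors' code. The paper builds on pre-trained BART; this example uses
# small randomly initialized Transformers so it runs standalone, and
# positional encodings are omitted for brevity.
import torch
import torch.nn as nn

class ParaphraseDisentangler(nn.Module):
    def __init__(self, vocab_size=1000, parse_vocab_size=100, d_model=128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.parse_emb = nn.Embedding(parse_vocab_size, d_model)
        # Separate encoders, so semantics and syntax come from distinct inputs.
        self.sem_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.syn_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def sentence_embedding(self, src_ids):
        # Pool the semantic encoder states into one sentence vector;
        # mean pooling here is an assumption made for this sketch.
        return self.sem_encoder(self.tok_emb(src_ids)).mean(dim=1)

    def forward(self, src_ids, parse_ids, tgt_ids):
        sem = self.sentence_embedding(src_ids).unsqueeze(1)  # (B, 1, D)
        syn = self.syn_encoder(self.parse_emb(parse_ids))    # (B, Tp, D)
        # The decoder cross-attends to the single semantic vector plus the
        # syntactic states, so source semantics pass through a bottleneck.
        memory = torch.cat([sem, syn], dim=1)
        t = tgt_ids.size(1)
        causal = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        dec = self.decoder(self.tok_emb(tgt_ids), memory, tgt_mask=causal)
        return self.lm_head(dec)  # next-token logits over the vocabulary

# Smoke test on random ids:
model = ParaphraseDisentangler()
src = torch.randint(0, 1000, (2, 12))    # source sentence token ids
parse = torch.randint(0, 100, (2, 20))   # linearized target parse ids
tgt = torch.randint(0, 1000, (2, 14))    # target paraphrase token ids
print(model(src, parse, tgt).shape)      # torch.Size([2, 14, 1000])

In this setup the pooled semantic vector is the only channel carrying source-sentence information to the decoder, while the parse encoder supplies target syntax directly; that division of labor is what, per the abstract, encourages syntax to be removed from the semantic embedding.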
Pages: 1372-1379
Page count: 8
Related Papers (50 in total; first 10 shown)
  • [1] Li, Bohan; Zhou, Hao; He, Junxian; Wang, Mingxuan; Yang, Yiming; Li, Lei. On the Sentence Embeddings from Pre-trained Language Models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020: 9119-9130.
  • [2] Mei, Yinan; Song, Shaoxu; Fang, Chenguang; Yang, Haifeng; Fang, Jingyun; Long, Jiang. Capturing Semantics for Imputation with Pre-trained Language Models. 2021 IEEE 37th International Conference on Data Engineering (ICDE 2021), 2021: 61-72.
  • [3] Ushio, Asahi; Camacho-Collados, Jose; Schockaert, Steven. Distilling Relation Embeddings from Pre-trained Language Models. 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), 2021: 9044-9062.
  • [4] Garcia-Silva, Andres; Berrio, Cristian; Gomez-Perez, Jose Manuel. An Empirical Study on Pre-trained Embeddings and Language Models for Bot Detection. 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), 2019: 148-155.
  • [5] Li, Huayang; Liu, Lemao; Huang, Guoping; Shi, Shuming. On the Branching Bias of Syntax Extracted from Pre-trained Language Models. Findings of the Association for Computational Linguistics: EMNLP 2020, 2020: 4473-4478.
  • [6] Nayyeri, Mojtaba; Wang, Zihao; Akter, Mst. Mahfuja; Alam, Mirza Mohtashim; Rony, Md Rashad Al Hasan; Lehmann, Jens; Staab, Steffen. Integrating Knowledge Graph Embeddings and Pre-trained Language Models in Hypercomplex Spaces. The Semantic Web (ISWC 2023), Part I, 2023, 14265: 388-407.
  • [7] Wang, Haifeng; Li, Jiwei; Wu, Hua; Hovy, Eduard; Sun, Yu. Pre-Trained Language Models and Their Applications. Engineering, 2023, 25: 51-65.
  • [8] Mars, Mourad. From Word Embeddings to Pre-Trained Language Models: A State-of-the-Art Walkthrough. Applied Sciences-Basel, 2022, 12 (17).
  • [9] Du, Jingfei; Ott, Myle; Li, Haoran; Zhou, Xing; Stoyanov, Veselin. General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference. Findings of the Association for Computational Linguistics: EMNLP 2020, 2020.
  • [10] Liu, Qiongqiong; Liu, Tianqiao; Zhao, Jiafu; Fang, Qiang; Ding, Wenbiao; Wu, Zhongqin; Xia, Feng; Tang, Jiliang; Liu, Zitao. Solving ESL Sentence Completion Questions via Pre-trained Neural Language Models. Artificial Intelligence in Education (AIED 2021), Part II, 2021, 12749: 256-261.