Unsupervised statistical text simplification using pre-trained language modeling for initialization

Cited by: 0
Authors
Jipeng Qiang
Feng Zhang
Yun Li
Yunhao Yuan
Yi Zhu
Xindong Wu
Affiliations
[1] Department of Computer Science, Yangzhou University
[2] Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education
[3] Mininglamp Academy of Sciences
Source
Frontiers of Computer Science, 2023, 17(01)
Keywords
text simplification; pre-trained language modeling; BERT; word embeddings
DOI
Not available
Abstract
Unsupervised text simplification has attracted much attention due to the scarcity of high-quality parallel text simplification corpora. Recently, an unsupervised statistical text simplification method built on a phrase-based machine translation system (UnsupPBMT) achieved good performance; it initializes its phrase tables with similar words obtained from word embedding models. Since word embeddings capture only the relatedness between words, the phrase tables in UnsupPBMT contain many dissimilar words. In this paper, we propose an unsupervised statistical text simplification method that uses the pre-trained language model BERT for initialization. Specifically, we treat BERT as a general linguistic knowledge base for predicting similar words. Experimental results show that our method outperforms state-of-the-art unsupervised text simplification methods on three benchmarks and even outperforms some supervised baselines.
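To illustrate the idea of using a masked language model to propose similar words for phrase-table initialization, the sketch below is a rough illustration only, not the authors' implementation: it masks a target word in its sentence and asks a BERT masked language model for in-context candidates. It assumes the HuggingFace transformers library, the bert-base-uncased checkpoint, and a hypothetical helper name candidate_words.

# Minimal sketch (not the paper's code): propose in-context substitution
# candidates for a target word with a BERT masked language model.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def candidate_words(sentence, target, top_k=10):
    # Replace the first occurrence of the target word with the [MASK] token.
    masked = sentence.replace(target, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    # Locate the [MASK] position in the tokenized input.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    # Return the top-k vocabulary tokens predicted at the masked position.
    top_ids = torch.topk(logits[0, mask_pos], top_k).indices.tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)

print(candidate_words("The commission disseminated the report.", "disseminated"))

In the method described by the abstract, candidates of this kind would only seed the phrase tables of the statistical simplification system; candidate ranking and the phrase-based machine translation training itself are outside the scope of this sketch.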
Related Papers
50 items in total
  • [1] Unsupervised statistical text simplification using pre-trained language modeling for initialization. Qiang, Jipeng; Zhang, Feng; Li, Yun; Yuan, Yunhao; Zhu, Yi; Wu, Xindong. FRONTIERS OF COMPUTER SCIENCE, 2023, 17(01).
  • [2] Extremely Low Resource Text Simplification with Pre-trained Transformer Language Model. Maruyama, Takumi; Yamamoto, Kazuhide. PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019: 53-58.
  • [3] Adapting Pre-trained Language Models to Low-Resource Text Simplification: The Path Matters. Garbacea, Cristina; Mei, Qiaozhu. CONFERENCE ON LIFELONG LEARNING AGENTS, 2022, VOL 199.
  • [4] Pre-Trained Language Models for Text Generation: A Survey. Li, Junyi; Tang, Tianyi; Zhao, Wayne Xin; Nie, Jian-Yun; Wen, Ji-Rong. ACM COMPUTING SURVEYS, 2024, 56(09).
  • [5] Unsupervised Statistical Text Simplification. Qiang, Jipeng; Wu, Xindong. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2021, 33(04): 1802-1806.
  • [7] Question Answering based Clinical Text Structuring Using Pre-trained Language Model. Qiu, Jiahui; Zhou, Yangming; Ma, Zhiyuan; Ruan, Tong; Liu, Jinlin; Sun, Jing. 2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019: 1596-1600.
  • [8] Modeling Second Language Acquisition with pre-trained neural language models. Palenzuela, Alvaro J. Jimenez; Frasincar, Flavius; Trusca, Maria Mihaela. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 207.
  • [9] RoBERTuito: a pre-trained language model for social media text in Spanish. Manuel Perez, Juan; Furman, Damian A.; Alonso Alemany, Laura; Luque, Franco. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022: 7235-7243.
  • [10] Non-Autoregressive Text Generation with Pre-trained Language Models. Su, Yixuan; Cai, Deng; Wang, Yan; Vandyke, David; Baker, Simon; Li, Piji; Collier, Nigel. 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021: 234-243.