Unsupervised Statistical Text Simplification

Cited by: 8
Authors
Qiang, Jipeng [1]
Wu, Xindong [2, 3]
Affiliations
[1] Yangzhou Univ, Dept Comp Sci, Yangzhou 225127, Jiangsu, Peoples R China
[2] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ, Hefei 10084, Anhui, Peoples R China
[3] Mininglamp Acad Sci, Mininglamp, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Encyclopedias; Electronic publishing; Internet; Benchmark testing; Standards; Mathematical model; Text simplification; machine translation; unsupervised;
DOI
10.1109/TKDE.2019.2947679
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most recent approaches to Text Simplification (TS) draw on insights from machine translation to learn simplification rewrites from a monolingual parallel corpus of complex and simple sentences, yet their effectiveness relies heavily on large amounts of parallel sentences. However, a serious problem has haunted TS for decades: parallel TS corpora are scarce or ill-suited to the learning task. In this paper, we focus on one especially useful and challenging problem: unsupervised TS without a single parallel sentence. To the best of our knowledge, we present the first unsupervised text simplification system based on a phrase-based machine translation system, which leverages a careful initialization of phrase tables and language models. On the widely used WikiLarge and WikiSmall benchmarks, our system obtains 39.08 and 25.12 SARI points, respectively, even outperforming some supervised baselines.
Pages: 1802-1806
Page count: 5
Related papers
50 in total
  • [31] Controllable Text Simplification with Explicit Paraphrasing
    Maddela, Mounica
    Alva-Manchego, Fernando
    Xu, Wei
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 3536 - 3553
  • [32] Spanish Text Simplification: An Exploratory Study
    Bott, Stefan
    Saggion, Horacio
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2011, (47): : 87 - 95
  • [33] Text Simplification Using Transformer and BERT
    Alissa, Sarah
    Wald, Mike
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (02): : 3479 - 3495
  • [34] BLEU is Not Suitable for the Evaluation of Text Simplification
    Sulem, Elior
    Abend, Omri
    Rappoport, Ari
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 738 - 744
  • [35] Exploring Neural Text Simplification Models
    Nisioi, Sergiu
    Stajner, Sanja
    Ponzetto, Simone Paolo
    Dinu, Liviu P.
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 2, 2017, : 85 - 91
  • [36] Explainable Prediction of Text Complexity: The Missing Preliminaries for Text Simplification
    Garbacea, Cristina
    Guo, Mengtian
    Carton, Samuel
    Mei, Qiaozhu
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 1086 - 1097
  • [37] Medical Text Simplification Using Reinforcement Learning (TESLEA): Deep Learning-Based Text Simplification Approach
    Phatak, Atharva
    Savage, David W.
    Ohle, Robert
    Smith, Jonathan
    Mago, Vijay
    JMIR MEDICAL INFORMATICS, 2022, 10 (11)
  • [38] GRS: Combining Generation and Revision in Unsupervised Sentence Simplification
    Dehghan, Mohammad
    Kumar, Dhruv
    Golab, Lukasz
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 949 - 960
  • [39] Unsupervised Matching of Data and Text
    Ahmadi, Naser
    Sand, Hansjorg
    Papotti, Paolo
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 1058 - 1070
  • [40] MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases
    Martin, Louis
    Fan, Angela
    de la Clergerie, Eric
    Bordes, Antoine
    Sagot, Benoit
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1651 - 1664