Data-Driven Sentence Simplification: Survey and Benchmark

Cited by: 2
Authors
Alva-Manchego, Fernando [1]
Scarton, Carolina [1]
Specia, Lucia [2]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield, S Yorkshire, England
[2] Imperial Coll London, Dept Comp, London, England
Keywords
TEXT SIMPLIFICATION; READABILITY; ALGORITHMS
DOI
10.1162/coli_a_00370
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sentence Simplification (SS) aims to modify a sentence in order to make it easier to read and understand. In order to do so, several rewriting transformations can be performed such as replacement, reordering, and splitting. Executing these transformations while keeping sentences grammatical, preserving their main idea, and generating simpler output, is a challenging and still far from solved problem. In this article, we survey research on SS, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm nowadays. We also include a benchmark of different approaches on common data sets so as to compare them and highlight their strengths and limitations. We expect that this survey will serve as a starting point for researchers interested in the task and help spark new ideas for future developments.
Pages: 135-187
Page count: 53