Splitting Complex Sentences for Natural Language Processing Applications: Building a Simplified Spanish Corpus

Cited by: 8
Author
Camacho Collados, Jose [1]
Affiliation
[1] Univ Autonoma Barcelona, Barcelona 08290, Spain
Keywords
text simplification; syntactic simplification; parallel corpus; Spanish; natural language processing
DOI
10.1016/j.sbspro.2013.10.670
Chinese Library Classification (CLC) number
H [Language, Writing]
Subject classification code
05
Abstract
This paper presents a new Spanish parallel corpus of original and syntactically simplified texts. The simplification consists mainly of opportunistically splitting a complex original sentence into several simple ones. The corpus is envisioned as a first step towards an automatic syntactic simplification system that could serve as a preprocessing tool for other Natural Language Processing tasks such as Text Summarization, Information Extraction, parsing, or Machine Translation. Human annotators evaluated the corpus for grammaticality and preservation of meaning. The results suggest that the meaning of the simplified sentences is almost identical to that of the original ones. (C) 2013 The Authors. Published by Elsevier Ltd.
Pages: 464-472
Number of pages: 9