Creation of a parallel corpora from comparable corpora for the simplification of medical texts in French

Authors
Cardon, Remi [1]
Grabar, Natalia [1]
Affiliation
[1] Univ Lille, UMR 8163 STL CNRS, F-59000 Lille, France
Source
TRAITEMENT AUTOMATIQUE DES LANGUES | 2020 / Vol. 61 / No. 02
Keywords
automatic simplification; medical texts; corpus with parallel sentences; resource building
DOI
Not available
Chinese Library Classification
H0 [Linguistics];
Discipline classification codes
030303 ; 0501 ; 050102 ;
Abstract
The purpose of automatic simplification is to create a version of a text that is easier to understand for a given target population. We aim at simplifying medical texts. Usually, the lexicon and rules required for simplification are acquired from parallel corpora. Since such corpora are not available for French, we propose methods for creating them from comparable corpora. Our method relies on a filtering step, whose purpose is to keep the best sentence candidates for alignment, and an alignment step treated as a categorization problem: the aim is to decide whether a pair of sentences is alignable or not. We exploit different types of features (mainly derived from the lexicon and corpora) and obtain an F-measure of up to 0.97 on balanced data.
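The two steps named in the abstract (filtering candidate pairs, then deciding whether a pair is alignable) and the F-measure used to score the decision can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the two lexical features, the `min_shared` filter, and the fixed `threshold` rule are hypothetical stand-ins, not the features or the classifier used by the authors.

```python
import re


def tokens(sentence):
    """Return the set of lowercase word tokens (accented letters included)."""
    return set(re.findall(r"\w+", sentence.lower()))


def features(technical, simple):
    """Two illustrative lexical features: Jaccard overlap and length ratio."""
    a, b = tokens(technical), tokens(simple)
    union = a | b
    jaccard = len(a & b) / len(union) if union else 0.0
    length_ratio = min(len(a), len(b)) / max(len(a), len(b)) if a and b else 0.0
    return jaccard, length_ratio


def keep_candidate(technical, simple, min_shared=2):
    """Filtering step (assumed rule): keep pairs sharing enough words."""
    return len(tokens(technical) & tokens(simple)) >= min_shared


def is_alignable(technical, simple, threshold=0.3):
    """Binary decision (assumed rule): a threshold stands in for a trained classifier."""
    jaccard, _ = features(technical, simple)
    return jaccard >= threshold


def f_measure(predicted, gold):
    """Balanced F-measure (F1) of boolean predictions against gold labels."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In practice the threshold rule would be replaced by a classifier trained on labeled sentence pairs, with `features` extended to whatever lexical and corpus-based descriptors are available.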
Pages: 15-39
Number of pages: 25