Creation of a parallel corpora from comparable corpora for the simplification of medical texts in French

Cited by: 0
Authors
Cardon, Remi [1]
Grabar, Natalia [1]
Affiliations
[1] Univ Lille, UMR 8163 STL CNRS, F-59000 Lille, France
Source
TRAITEMENT AUTOMATIQUE DES LANGUES | 2020 / Vol. 61 / Issue 02
Keywords
automatic simplification; medical texts; corpus with parallel sentences; resource building;
DOI
Not available
Chinese Library Classification number
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
The purpose of automatic simplification is to create a version of a text that is easier to understand for a given target population. We aim at simplifying medical texts. Usually, the lexicon and rules required for simplification are acquired from parallel corpora. Since such corpora are not available for French, we propose a method for creating them from comparable corpora. Our method relies on a filtering step, whose purpose is to keep the best sentence candidates for alignment, and an alignment step treated as a categorization problem: the aim is to decide whether a pair of sentences is alignable or not. We exploit different types of features (mainly derived from lexicons and corpora) and obtain an F-measure of up to 0.97 on balanced data.
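The two-step pipeline described in the abstract (filter candidate sentence pairs, then decide whether each pair is alignable) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the single lexical-overlap feature, the thresholds, and the invented English toy sentences standing in for French data. The record does not specify the authors' actual features or classifier, and this sketch does not reproduce them.

```python
import re


def tokens(sentence):
    """Return the set of lowercase word tokens (a crude tokenizer, for illustration)."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Lexical overlap between two sentences: |intersection| / |union| of token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def filter_candidates(technical, simple, min_overlap=0.1):
    """Filtering step (hypothetical): keep only pairs with some lexical overlap."""
    return [(t, s) for t in technical for s in simple
            if jaccard(t, s) >= min_overlap]


def is_alignable(pair, threshold=0.3):
    """Alignment step as binary categorization.

    A single thresholded feature stands in for a trained classifier here.
    """
    return jaccard(*pair) >= threshold


def f_measure(predicted, gold):
    """F-measure (harmonic mean of precision and recall) over boolean labels."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy data: one technical sentence, two simplified candidates.
technical = ["Myocardial infarction requires urgent treatment."]
simple = ["A heart attack requires urgent treatment.", "Drink water every day."]

candidates = filter_candidates(technical, simple)
decisions = [is_alignable(pair) for pair in candidates]
```

In a realistic setting the thresholded overlap would be replaced by a classifier trained on labeled alignable and non-alignable pairs, with `f_measure` computed on held-out data.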
Pages: 15-39
Number of pages: 25
Related papers
50 items in total
  • [21] Creation of Parallel Medical and Social Domains Corpora for the Machine Translation and Speech Synthesis Systems
    Suprunchuk, Mikita
    Yarash, Nastassia
    Hetsevich, Yuras
    Varanovich, Valery
    Gaidurau, Siarhey
    Zianouka, Yauheniya
    Sakava, Palina
    [J]. FORMALIZING NATURAL LANGUAGES: APPLICATIONS TO NATURAL LANGUAGE PROCESSING AND DIGITAL HUMANITIES, NOOJ 2022, 2022, 1758 : 139 - 150
  • [22] Terminology Extraction from Comparable Corpora for Latvian
    Gornostay, Tatiana
    Ramm, Anita
    Heid, Ulrich
    Morin, Emmanuel
    Harastani, Rima
    Planas, Emmanuel
    [J]. HUMAN LANGUAGE TECHNOLOGIES: THE BALTIC PERSPECTIVE, 2012, 247 : 66 - +
  • [23] Building comparable corpora from social networks
    Trabelsi, Maroua
    Hajjem, Malek
    Latiri, Chiraz
    [J]. LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014,
  • [24] Improved machine translation performance via parallel sentence extraction from comparable corpora
    Munteanu, DS
    Fraser, A
    Marcu, D
    [J]. HLT-NAACL 2004: HUMAN LANGUAGE TECHNOLOGY CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE MAIN CONFERENCE, 2004, : 265 - 272
  • [25] Structure of medical research articles in Polish and English comparable corpora
    Taczalska, A
    [J]. PALC'99: PRACTICAL APPLICATIONS IN LANGUAGE CORPORA, 2000, 1 : 567 - 580
  • [26] The use of English, Czech and French punctuation marks in reference, parallel and comparable web corpora: a question of methodology
    Nadvornikova, Olga
    [J]. LINGUISTICA PRAGENSIA, 2020, 30 (01) : 30 - 50
  • [27] Spoken and signed languages hand in hand: parallel and directly comparable corpora of French Belgian Sign Language (LSFB) and French
    Lepeut, Alysson
    Lombart, Clara
    Vandenitte, Sebastien
    Meurant, Laurence
    [J]. CORPORA, 2024, 19 (02) : 241 - 253
  • [28] A statistical view on bilingual lexicon extraction: From parallel corpora to non-parallel corpora
    Fung, P
    [J]. MACHINE TRANSLATION AND THE INFORMATION SOUP, 1998, 1529 : 1 - 17
  • [29] From questionnaires to parallel corpora in typology
    Dahl, Osten
    [J]. STUF-LANGUAGE TYPOLOGY AND UNIVERSALS, 2007, 60 (02) : 172 - 181
  • [30] Extracting translation equivalents from bilingual comparable corpora
    Kaji, H
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (02): : 313 - 323