Parallel Sentence Alignment from Biomedical Comparable Corpora

Authors
Cardon, Remi [1]
Grabar, Natalia [1]
Affiliation
[1] UMR CNRS 8163 STL, F-59000 Lille, France
Keywords
sentence alignment; text simplification; classification
DOI
10.3233/SHTI200183
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services management)]
Abstract
Parallel sentences provide semantically similar information that can vary along a given dimension, such as language or register. Parallel sentences with register variation (for example, from expert and non-expert documents) can be exploited for automatic text simplification. The aim of automatic text simplification is to make a given piece of information easier to access and understand. In the biomedical field, simplification may help patients understand medical and health texts. Yet no such resources are currently available. We propose to exploit comparable corpora that are distinguished by their registers (specialized and simplified versions) to detect and align parallel sentences. These corpora are in French and relate to the biomedical area. We treat this task as binary classification (alignment vs. non-alignment). Our results show that the method presented here can be used to automatically generate a corpus of parallel sentences from our comparable corpus.
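The abstract frames alignment as binary classification of candidate sentence pairs, but it does not state which features or which classifier the authors use. The sketch below only illustrates that framing under stated assumptions. The two features (token-set Jaccard overlap and token-length ratio), the hand-written logistic regression, and the French sentence pairs are all hypothetical choices made for illustration. They are not the paper's method or data.

```python
import math
import re
from typing import List, Sequence, Tuple

# Hypothetical toy data: invented (specialized, simplified) French sentence pairs,
# labelled 1 = aligned, 0 = not aligned. Not taken from the paper's corpus.
TRAIN_PAIRS: List[Tuple[str, str]] = [
    ("Le patient présente une hypertension artérielle sévère.",
     "Le patient a une tension artérielle très élevée."),
    ("Les effets indésirables sont rares.",
     "Les effets indésirables sont peu fréquents."),
    ("Une infection bactérienne est confirmée.",
     "Une infection par une bactérie est confirmée."),
    ("Le patient présente une hypertension artérielle sévère.",
     "Les effets indésirables sont peu fréquents."),
    ("Les effets indésirables sont rares.",
     "Une infection par une bactérie est confirmée."),
    ("Une infection bactérienne est confirmée.",
     "Le patient a une tension artérielle très élevée."),
]
TRAIN_LABELS: List[int] = [1, 1, 1, 0, 0, 0]


def tokenize(sentence: str) -> List[str]:
    # Lowercased word tokens; \w is Unicode-aware, so accented letters stay in-word.
    return re.findall(r"\w+", sentence.lower())


def pair_features(src: str, tgt: str) -> List[float]:
    """Hypothetical features: token-set Jaccard overlap and token-length ratio."""
    src_toks, tgt_toks = tokenize(src), tokenize(tgt)
    a, b = set(src_toks), set(tgt_toks)
    union = a | b
    jaccard = len(a & b) / len(union) if union else 0.0
    longest = max(len(src_toks), len(tgt_toks))
    length_ratio = min(len(src_toks), len(tgt_toks)) / longest if longest else 0.0
    return [jaccard, length_ratio]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    # Probability that the pair is aligned under a logistic model.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(pairs, labels, lr: float = 0.5, epochs: int = 2000):
    """Fit logistic regression with plain per-example gradient descent."""
    feats = [pair_features(s, t) for s, t in pairs]
    weights = [0.0] * len(feats[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            err = score(weights, bias, x) - y  # gradient of log-loss w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def is_aligned(weights, bias, src: str, tgt: str, threshold: float = 0.5) -> bool:
    # Binary decision: alignment (True) vs. non-alignment (False).
    return score(weights, bias, pair_features(src, tgt)) >= threshold


if __name__ == "__main__":
    w, b = train_logistic(TRAIN_PAIRS, TRAIN_LABELS)
    for (src, tgt), label in zip(TRAIN_PAIRS, TRAIN_LABELS):
        print(label, is_aligned(w, b, src, tgt), src, "|", tgt)
```

Running the script prints each toy pair with its gold label and the predicted decision. On real comparable corpora, one would presumably score many candidate pairs drawn from each specialized/simplified document pair and use richer features. The abstract does not specify those choices.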
Pages: 362-366
Number of pages: 5