Sentence alignment for monolingual comparable corpora

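Authors
Barzilay, R [1]
Elhadad, N [1]
Affiliation
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract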
We address the problem of sentence alignment for monolingual corpora, a phenomenon distinct from alignment in parallel corpora. Aligning large comparable corpora automatically would provide a valuable resource for learning of text-to-text rewriting rules. We incorporate context into the search for an optimal alignment in two complementary ways: learning rules for matching paragraphs using topic structure and further refining the matching through local alignment to find good sentence pairs. Evaluation shows that our alignment method outperforms state-of-the-art systems developed for the same task.
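The abstract only names the two stages (paragraph matching by topic structure, then local alignment within matched paragraphs), so the following is a minimal sketch of the second stage under stated assumptions, not the authors' method. It assumes two paragraphs have already been matched, scores sentence pairs with plain bag-of-words cosine similarity, and finds a monotone alignment by dynamic programming. The function names `cosine` and `align` and the `mismatch_penalty` value are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two sentences (assumed measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def align(src, tgt, mismatch_penalty=0.3):
    """Align sentences of two already-matched paragraphs, preserving order.

    score[i][j] is the best total for src[:i] and tgt[:j]. Pairing two
    sentences earns (similarity - mismatch_penalty); skipping a sentence
    on either side earns nothing. The penalty value is an assumption.
    """
    n, m = len(src), len(tgt)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = (
                score[i - 1][j - 1]
                + cosine(src[i - 1], tgt[j - 1])
                - mismatch_penalty
            )
            score[i][j] = max(pair, score[i - 1][j], score[i][j - 1])

    # Trace back from the final cell to recover the chosen sentence pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j]:
            i -= 1  # source sentence i-1 was skipped
        elif score[i][j] == score[i][j - 1]:
            j -= 1  # target sentence j-1 was skipped
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    return pairs[::-1]
```

Calling `align(src, tgt)` returns a list of `(source_index, target_index)` pairs. Because neighbouring sentences share one dynamic-programming table, the choice for each pair depends on its context rather than on its similarity score alone, which is the general idea the abstract points to.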
Pages: 25-32
Page count: 8