Sentence alignment for monolingual comparable corpora

Cited by: 0
Authors
Barzilay, R [1]
Elhadad, N [1]
Affiliations
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
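Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract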
We address the problem of sentence alignment for monolingual corpora, a phenomenon distinct from alignment in parallel corpora. Aligning large comparable corpora automatically would provide a valuable resource for learning of text-to-text rewriting rules. We incorporate context into the search for an optimal alignment in two complementary ways: learning rules for matching paragraphs using topic structure and further refining the matching through local alignment to find good sentence pairs. Evaluation shows that our alignment method outperforms state-of-the-art systems developed for the same task.
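The abstract mentions refining matched paragraphs through local alignment to find good sentence pairs. As a rough illustration only, the sketch below applies a generic Smith-Waterman-style local alignment to word-overlap cosine similarities. It is not the authors' implementation: the function names `cosine` and `local_align`, the `threshold` and `skip_penalty` values, and the bag-of-words similarity are all assumptions made for this example.

```python
from collections import Counter
import math


def cosine(a, b):
    """Bag-of-words cosine similarity between two sentences (assumed measure)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def local_align(src, tgt, threshold=0.5, skip_penalty=0.1):
    """Return (src_index, tgt_index) pairs on the best local alignment path.

    threshold and skip_penalty are hypothetical tuning values.
    """
    n, m = len(src), len(tgt)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0.0, None
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Similarity above the threshold rewards a match; below it penalizes.
            gain = cosine(src[i - 1], tgt[j - 1]) - threshold
            options = [
                (0.0, None),                                   # restart the alignment
                (score[i - 1][j - 1] + gain, (i - 1, j - 1)),  # pair the two sentences
                (score[i - 1][j] - skip_penalty, (i - 1, j)),  # skip a source sentence
                (score[i][j - 1] - skip_penalty, (i, j - 1)),  # skip a target sentence
            ]
            score[i][j], back[i][j] = max(options, key=lambda o: o[0])
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)
    # Trace back from the best-scoring cell, keeping only confident diagonal steps.
    pairs = []
    cell = best_cell
    while cell is not None and back[cell[0]][cell[1]] is not None:
        i, j = cell
        prev = back[i][j]
        if prev == (i - 1, j - 1) and cosine(src[i - 1], tgt[j - 1]) > threshold:
            pairs.append((i - 1, j - 1))
        cell = prev
    return pairs[::-1]


src = ["the drug lowers blood pressure",
       "patients should rest",
       "see a doctor if symptoms persist"]
tgt = ["unrelated intro sentence here",
       "the drug lowers blood pressure quickly",
       "see a doctor if symptoms persist"]
print(local_align(src, tgt))
```

Because the score can restart at zero, unrelated sentences at either end of a paragraph do not drag down the aligned region, which is the usual motivation for local rather than global alignment in comparable text.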
Pages: 25 - 32
Page count: 8
Related papers
50 in total
  • [21] Building subject-aligned comparable corpora and mining it for truly parallel sentence pairs
    Wolk, Krzysztof
    Marasek, Krzysztof
    INTERNATIONAL WORKSHOP ON INNOVATIONS IN INFORMATION AND COMMUNICATION SCIENCE AND TECHNOLOGY, IICST 2014, 2014, 18 : 126 - 132
  • [22] CHALLENGING THE MYTH OF MONOLINGUAL CORPORA
    Vessey, Rachelle
    APPLIED LINGUISTICS, 2019, 40 (05) : 864 - 866
  • [23] Mining monolingual and bilingual corpora
    Latiri, Chiraz
    Smaili, Kamel
    Lavecchia, Caroline
    Langlois, David
    INTELLIGENT DATA ANALYSIS, 2010, 14 (06) : 663 - 682
  • [24] Document Alignment for Generation of English-Punjabi Comparable Corpora from Wikipedia
    Goyal, Vishal
    Kumar, Ajit
    Lehal, Manpreet Singh
    INTERNATIONAL JOURNAL OF E-ADOPTION, 2020, 12 (01) : 42 - 51
  • [25] Sentence Alignment of Hungarian-English Parallel Corpora Using a Hybrid Algorithm
    Toth, Krisztina
    Farkas, Richard
    Kocsor, Andras
    ACTA CYBERNETICA, 2008, 18 (03) : 463 - 478
  • [26] Corpora as a correction tool for monolingual dictionaries
    Geyken, A
    LILI-ZEITSCHRIFT FUR LITERATURWISSENSCHAFT UND LINGUISTIK, 2004, 34 (136) : 72 - 100
  • [28] Bilingual term alignment from comparable corpora in English discharge summary and Chinese discharge summary
    Xu, Yan
    Chen, Luoxin
    Wei, Junsheng
    Ananiadou, Sophia
    Fan, Yubo
    Qian, Yi
    Chang, Eric I-Chao
    Tsujii, Junichi
    BMC BIOINFORMATICS, 2015, 16
  • [29] Combining Lexical Context with Pseudo-alignment for Bilingual Lexicon Extraction from Comparable Corpora
    Li, Bo
    Zhu, Qunyan
    He, Tingting
    Chen, Qianjun
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA, CCL 2014, 2014, 8801 : 223 - 233
  • [30] Can comparable corpora be compared?
    Lopez Arroyo, Belen
    IBERICA, 2020, (39) : 43 - 68