PKU Paraphrase Bank: A Sentence-Level Paraphrase Corpus for Chinese

Cited by: 3
Authors
Zhang, Bowei [1, 2, 4]
Sun, Weiwei [1, 2, 3]
Wan, Xiaojun [1, 2]
Guo, Zongming [1]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, Beijing, Peoples R China
[2] Peking Univ, MOE Key Lab Computat Linguist, Beijing, Peoples R China
[3] Peking Univ, Ctr Chinese Linguist, Beijing, Peoples R China
[4] Peking Univ, Ctr Data Sci, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Paraphrase; Paraphrase extraction; Sentence embedding; Sentence similarity
DOI
10.1007/978-3-030-32233-5_63
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
One of the main challenges in paraphrase research is the lack of large-scale, high-quality corpora, a problem that is particularly serious for languages other than English. In this paper, we present a simple and effective unsupervised learning model that automatically extracts high-quality sentence-level paraphrases from multiple Chinese translations of the same source texts. Applying this model, we obtain a large-scale paraphrase corpus containing 509,832 pairs of paraphrased sentences. The quality of the corpus is manually examined. The model is language-independent, so paraphrase corpora for other languages can be built in the same way.
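This record does not describe the model itself, so the following is only a rough illustration of the general idea of mining paraphrase pairs from two translations of the same source text. It is not the authors' method: character-bigram counts stand in for a sentence embedding, and the function names (`char_bigrams`, `cosine`, `extract_pairs`), the threshold value, and the example sentences are all assumptions made for this sketch.

```python
from collections import Counter
from math import sqrt


def char_bigrams(sentence):
    """Count overlapping character bigrams (a crude stand-in for a sentence embedding)."""
    chars = [c for c in sentence if not c.isspace()]
    return Counter(zip(chars, chars[1:]))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract_pairs(translation_a, translation_b, threshold=0.5):
    """For each sentence in translation A, keep its most similar sentence
    in translation B when the similarity reaches the (assumed) threshold."""
    vecs_b = [char_bigrams(s) for s in translation_b]
    pairs = []
    for sent_a in translation_a:
        vec_a = char_bigrams(sent_a)
        scored = [(cosine(vec_a, vec_b), sent_b)
                  for vec_b, sent_b in zip(vecs_b, translation_b)]
        if not scored:
            continue
        best_score, best_sent = max(scored, key=lambda item: item[0])
        # Skip identical sentences: they are copies, not paraphrases.
        if best_score >= threshold and best_sent != sent_a:
            pairs.append((sent_a, best_sent, best_score))
    return pairs


# Two hypothetical translations of the same two source sentences.
translation_a = ["他昨天去了北京。", "我喜欢读书。"]
translation_b = ["我很喜欢读书。", "他昨天到了北京。"]
pairs = extract_pairs(translation_a, translation_b)
```

A real system would replace the bigram counts with learned sentence embeddings and would restrict candidates to sentences aligned to the same source passage; the threshold trades corpus size against pair quality.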
Pages: 814-826
Number of pages: 13