N-Gram Based Paraphrase Generator from Large text Document

Cited by: 0
Authors
Gadag, Ashwini I. [1]
Sagar, B. M. [1]
Affiliations
[1] RVCE, Dept ISE, Bengaluru, Karnataka, India
Keywords
N-gram; candidate paraphrase; reference paraphrase; paraphrase generator
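DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject classification code
0808; 0809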
Abstract
This paper describes paraphrase generation based on an n-gram approach. N-grams, sequences of n adjacent words drawn from a text document, can be applied across a range of Natural Language Processing (NLP) applications. The candidate paraphrases are generated using a trigram approach. The reference paraphrases (keyphrases) are the set of relevant paraphrases, which act as a training data set for generating candidate paraphrases. Because the task of paraphrase generation is similar to machine translation, we use machine translation evaluation metrics. The R-precision evaluation metric is used to find the number of common words between candidate and reference paraphrases.
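The abstract does not give the authors' implementation, so the following is only a minimal sketch of the two ideas it names: extracting word trigrams from text, and scoring candidates against references. The function names (`word_ngrams`, `common_word_count`, `r_precision`), the whitespace tokenization, and the exact-match definition of R-precision (hits among the top R candidates, with R the number of references) are all assumptions made for illustration and may differ from the paper's method.

```python
from collections import Counter


def word_ngrams(text, n=3):
    """Return the word n-grams of `text` as tuples (trigrams by default).

    Assumption: tokens are lowercased and split on whitespace.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def common_word_count(candidate, reference):
    """Count the words shared by two phrases, respecting repeated words."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    return sum((cand & ref).values())


def r_precision(ranked_candidates, references):
    """Fraction of the top-R candidates that appear in the references.

    Assumption: R is the number of reference phrases and a hit requires
    an exact (case-insensitive) phrase match.
    """
    r = len(references)
    if r == 0:
        return 0.0
    reference_set = {ref.lower() for ref in references}
    hits = sum(1 for cand in ranked_candidates[:r] if cand.lower() in reference_set)
    return hits / r


if __name__ == "__main__":
    print(word_ngrams("the quick brown fox jumps"))
    print(common_word_count("quick brown fox", "the brown fox jumps"))
    print(r_precision(["quick brown fox", "lazy dog"],
                      ["quick brown fox", "brown fox jumps"]))
```

With these assumptions, a five-word sentence yields three trigrams, and a candidate list is scored only on its first R entries, so the ordering of candidates matters.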
Pages: 91-94
Number of pages: 4