Identifying Similar Sentences by Using N-Grams of Characters

Cited by: 2
Authors
Sultana, Saima [1]
Biskri, Ismail [1]
Affiliations
[1] Univ Quebec Trois Rivieres, Trois Rivieres, PQ G8Z 4M3, Canada
Keywords
Similar sentences; N-grams of characters
DOI
10.1007/978-3-319-92058-0_80
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Detecting similar sentences plays a major role in many fundamental text-processing applications, such as information retrieval, categorization, paraphrase detection, summarization, and translation. In this work, we present a novel method for detecting similar sentences. The method uses character n-grams as its basic units. It relies on neither an online dictionary nor a search engine, which makes it a simple and efficient way to measure the similarity between two sentences. It also uses no grammar rules or syntactic analysis, so the approach is language-independent. We analyze and compare a range of similarity measures with our methodology. The complexity of our method is O(N^2), which we consider favorable.
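The abstract does not state which n-gram length or similarity measure the authors use. As a rough illustration only, the following sketch compares two sentences by the Jaccard coefficient of their character trigram sets. The function names `char_ngrams` and `ngram_similarity`, the default n = 3, the lowercasing and whitespace normalization, and the choice of Jaccard are all assumptions made for this example, not details taken from the paper.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a sentence.

    Assumption for this sketch: the text is lowercased and runs of
    whitespace are collapsed to single spaces before slicing.
    """
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        # Too short to slice: treat the whole string as one unit.
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard coefficient of the two character n-gram sets, in [0, 1].

    Jaccard is an illustrative choice; the paper compares several
    similarity measures that the abstract does not list.
    """
    grams_a = char_ngrams(a, n)
    grams_b = char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty sentences are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    print(ngram_similarity("The cat sat", "the cat sat down"))
    print(ngram_similarity("The cat sat", "dogs bark loudly"))
```

Because the comparison works on raw character sequences, it needs no dictionary, tokenizer, or grammar, which is consistent with the language-independence the abstract claims.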
Pages: 833-843
Number of pages: 11
Related papers
50 in total
  • [21] Source code authorship attribution using n-grams
    Burrows, Steven
    Tahaghoghi, S.M.M.
    [J]. ADCS 2007 - Proceedings of the Twelfth Australasian Document Computing Symposium, 2007, : 32 - 39
  • [22] Texture Image Classification Using Pixel N-grams
    Kulkarni, Pradnya
    Stranieri, Andrew
    Ugon, Julien
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP), 2016, : 137 - 141
  • [23] Using n-grams for the Automated Clustering of Structural Models
    Babur, Onder
    Cleophas, Loek
    [J]. SOFSEM 2017: THEORY AND PRACTICE OF COMPUTER SCIENCE, 2017, 10139 : 510 - 524
  • [24] Comparing Medline citations using modified N-grams
    Nawab, Rao Muhammad Adeel
    Stevenson, Mark
    Clough, Paul
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2014, 21 (01) : 105 - 110
  • [25] Algorithms for correcting recognition results using N-grams
    Manzhikov T.V.
    Slavin O.A.
    Faradjev I.A.
    Janiszewski I.M.
    [J]. Pattern Recognition and Image Analysis, 2017, 27 (4) : 832 - 837
  • [26] Statistical Analysis of the Indus Script Using n-Grams
    Yadav, Nisha
    Joglekar, Hrishikesh
    Rao, Rajesh P. N.
    Vahia, Mayank N.
    Adhikari, Ronojoy
    Mahadevan, Iravatham
    [J]. PLOS ONE, 2010, 5 (03):
  • [27] Language Distance using Common N-Grams Approach
    Kosmajac, Dijana
    Keselj, Vlado
    [J]. 2020 19TH INTERNATIONAL SYMPOSIUM INFOTEH-JAHORINA (INFOTEH), 2020,
  • [28] Classification of Metamorphic Virus Using N-Grams Signatures
    Hamid, Isredza Rahmi A.
    Sani, Nur Sakinah Md
    Abdullah, Zubaile
    Foozy, Cik Feresa Mohd
    Kipli, Kuryati
    [J]. RECENT ADVANCES ON SOFT COMPUTING AND DATA MINING (SCDM 2020), 2020, 978 : 140 - 149
  • [29] Embedded malware detection using Markov n-grams
    Shafiq, M. Zubair
    Khayam, Syed Ali
    Farooq, Muddassar
    [J]. DETECTION OF INTRUSIONS AND MALWARE, AND VULNERABILITY ASSESSMENT, 2008, 5137 : 88 - +
  • [30] Authorship Identification of the Azerbaijani Texts Using n-grams
    Aida-zade, K. R.
    Talibov, S. Q.
    [J]. 2016 IEEE 10TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT), 2016, : 210 - 212