Identifying Similar Sentences by Using N-Grams of Characters

Cited by: 2
Authors
Sultana, Saima [1]
Biskri, Ismail [1]
Affiliations
[1] Univ Quebec Trois Rivieres, Trois Rivieres, PQ G8Z 4M3, Canada
Keywords
Similar sentences; N-grams of characters
DOI
10.1007/978-3-319-92058-0_80
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Detecting similar sentences plays a major role in many fundamental applications that read and analyze text, such as information retrieval, categorization, paraphrase detection, summarization, and translation. In this work, we present a novel method for detecting similar sentences. The method uses n-grams of characters as its basic units. It relies on neither an online dictionary nor a search engine, which makes it a simple and efficient way to handle the similarity between two sentences. In addition, it uses no grammar rules or syntactic analysis, which makes the approach language-independent. We analyze and compare a range of similarity measures with our methodology. The complexity of our method is O(N^2), which we regard as favorable.
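The abstract does not say which similarity measure the authors use, so the following is only a minimal sketch of the general idea of comparing two sentences through their character n-grams. The function names `char_ngrams` and `dice_similarity`, the choice of trigrams (n = 3), the lowercasing and whitespace normalization, and the use of the Dice coefficient are all assumptions made for illustration, not the paper's method.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a sentence.

    Assumption: the text is lowercased and runs of whitespace are
    collapsed to one space; the paper may preprocess differently.
    """
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def dice_similarity(sentence_a, sentence_b, n=3):
    """Dice coefficient over the two sets of character n-grams.

    Assumption: Dice is a stand-in for whichever measure the
    authors actually evaluate.
    """
    grams_a = char_ngrams(sentence_a, n)
    grams_b = char_ngrams(sentence_b, n)
    if not grams_a or not grams_b:
        # A sentence shorter than n characters has no n-grams.
        return 0.0
    shared = len(grams_a & grams_b)
    return 2 * shared / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    score = dice_similarity(
        "The cat sat on the mat.",
        "A cat was sitting on the mat.",
    )
    print(f"similarity = {score:.2f}")
```

Because the comparison works on raw characters, it needs no dictionary, tokenizer, or grammar, which is consistent with the language independence claimed in the abstract.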
Pages: 833-843
Number of pages: 11
Related papers
50 in total
  • [1] Identifying Metamorphic Virus Using n-grams And Hidden Markov Model
    Thunga, Shiva Prasad
    Neelisetti, Raghu Kisore
    [J]. 2015 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2015, : 2016 - 2022
  • [2] Identifying the Dominant Language of Web Page using Supervised N-grams
    Ng, Choon-Ching
    Liew, Siau-Chuin
    Hussin, Wan Muhammad Syahrir Wan
    Herawan, Tutut
    [J]. 2012 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE APPLICATIONS AND TECHNOLOGIES (ACSAT), 2012, : 344 - 348
  • [3] The Distribution of N-Grams
    Leo Egghe
    [J]. Scientometrics, 2000, 47 : 237 - 252
  • [4] Collocations and N-grams
    Freebury-Jones, Darren
    [J]. RENAISSANCE AND REFORMATION, 2021, 44 (04) : 210 - 216
  • [5] The distribution of N-grams
    Egghe, L
    [J]. SCIENTOMETRICS, 2000, 47 (02) : 237 - 252
  • [6] Text classification and multilinguism: Getting at words via N-grams of characters
    Biskri, I
    Delisle, S
    [J]. 6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL V, PROCEEDINGS: COMPUTER SCI I, 2002, : 110 - 115
  • [7] SPEECH RECOGNITION USING FUNCTION-WORD N-GRAMS AND CONTENT-WORD N-GRAMS
    ISOTANI, R
    MATSUNAGA, S
    SAGAYAMA, S
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1995, E78D (06) : 692 - 697
  • [8] Automatically identifying code features for software defect prediction: Using AST N-grams
    Shippey, Thomas
    Bowes, David
    Hall, Tracy
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2019, 106 : 142 - 160
  • [9] Plagiarism Detection Using Stopword n-grams
    Stamatatos, Efstathios
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2011, 62 (12) : 2512 - 2527
  • [10] Spam detection using character N-grams
    Kanaris, Ioannis
    Kanaris, Konstantinos
    Stamatatos, Efstathios
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 3955 : 95 - 104