Text Similarity Calculations Using Text and Syntactical Structures

Cited by: 0

Authors:

Elhadi, Mohamed T. [1]

Affiliation:

[1] Univ Zawia, Dept Comp Sci, Zawia, Libya

Source:

2012 7TH INTERNATIONAL CONFERENCE ON COMPUTING AND CONVERGENCE TECHNOLOGY (ICCCT2012) | 2012

Keywords:

Syntactical structures; document similarity; Longest Common Subsequence;

DOI:

Not available

Chinese Library Classification:

TP301 [Theory and methods];

Subject classification code:

081202 ;

Abstract:

This paper reports on experiments investigating the use of the syntactical structures of sentences as the basis for calculating the similarity between two text documents. The sentences of each document are converted into ordered sequences of Part of Speech (POS) tags, which are then fed to a Longest Common Subsequence (LCS) algorithm to determine the size and count of the LCSs found when comparing the documents sentence by sentence. In the first stage, the syntactical features of the text are used as a structural representation of the document's text; this representation also serves as a text reduction that improves the efficiency of the LCS comparison. In the second stage, documents that score well in the first stage, as measured by an accumulated score that is a function of the number of LCSs, are subjected to further comparison using the actual sentences (content words), sentence by sentence, to produce a final similarity measure based on the common words (accumulated over the whole file) and the total number of LCSs from the first stage. Experiments on two different corpora showed the utility of the proposed procedure for calculating similarities between written documents.
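The two-stage procedure described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the author's implementation: the abstract does not give the scoring function, so the `min_len` threshold, the best-match pairing of sentences, and the function names `stage_one_score` and `stage_two_common_words` are all assumptions. POS tags are supplied by hand here (Penn Treebank style) so that the example does not depend on any particular tagger.

```python
from typing import List, Sequence, Tuple


def lcs(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return one longest common subsequence of a and b (classic dynamic programming)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one subsequence.
    out: List[str] = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def stage_one_score(tags_a: List[List[str]], tags_b: List[List[str]],
                    min_len: int = 3) -> Tuple[int, int]:
    """Assumed stage-one score: for each sentence of document A, take the
    longest POS-tag LCS against any sentence of document B. Return the number
    of such LCSs with at least min_len tags and their total length."""
    count = total = 0
    for sent_a in tags_a:
        best = max((len(lcs(sent_a, sent_b)) for sent_b in tags_b), default=0)
        if best >= min_len:
            count += 1
            total += best
    return count, total


def stage_two_common_words(words_a: List[List[str]],
                           words_b: List[List[str]]) -> int:
    """Assumed stage-two measure: common words accumulated over the whole
    document, using the best word-level LCS for each sentence of A."""
    return sum(
        max((len(lcs(sent_a, sent_b)) for sent_b in words_b), default=0)
        for sent_a in words_a
    )


# Hand-tagged toy documents (hypothetical data, one sentence each).
doc_a_words = [["the", "cat", "chases", "the", "mouse"]]
doc_a_tags = [["DT", "NN", "VBZ", "DT", "NN"]]
doc_b_words = [["the", "black", "cat", "chases", "a", "mouse"]]
doc_b_tags = [["DT", "JJ", "NN", "VBZ", "DT", "NN"]]

lcs_count, lcs_total = stage_one_score(doc_a_tags, doc_b_tags)
common_words = stage_two_common_words(doc_a_words, doc_b_words)
```

Comparing tag sequences first keeps the alphabet small (a few dozen tags rather than a full vocabulary), which is one plausible reading of the "text reduction" the abstract mentions; only documents passing that filter would then need the word-level comparison.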


Pages: 715-719

Page count: 5


