Text Similarity Calculations Using Text and Syntactical Structures

Cited by: 0

Authors:

Elhadi, Mohamed T. [1]

Affiliation:

[1] Univ Zawia, Dept Comp Sci, Zawia, Libya

Source:

2012 7TH INTERNATIONAL CONFERENCE ON COMPUTING AND CONVERGENCE TECHNOLOGY (ICCCT2012) | 2012

Keywords:

Syntactical structures; document similarity; Longest Common Subsequence

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

This paper reports on experiments that investigate the use of the syntactical structure of sentences as the basis for calculating similarity between two text documents. The sentences of each document are converted into ordered sequences of part-of-speech (POS) tags, which are then fed to a Longest Common Subsequence (LCS) algorithm to determine the size and count of the LCSs found when the documents are compared sentence by sentence. In the first stage, the syntactical features of the text serve as a structural representation of the document; they also act as a form of text reduction that improves the efficiency of the LCS comparison. In the second stage, documents that score well in the first stage, as measured by an accumulated score that is a function of the number of LCSs, are subjected to further comparison using the actual sentences (content words), sentence by sentence, to produce a final similarity measure based on the common words (accumulated over the whole file) and the total number of LCSs from the first stage. Experiments on two different corpora show the utility of the proposed procedure for calculating similarity between written documents.
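The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the toy POS lexicon (TOY_LEXICON), the set of function-word tags (STOP_TAGS), the length-normalised scoring, the 0.5 threshold, and the equal weighting of the two stages are all assumptions made for illustration, since the abstract does not specify them. A real system would use a trained POS tagger in place of the lookup table.

```python
from typing import Dict, List, Sequence

# Hypothetical toy lexicon standing in for a real POS tagger (assumption).
TOY_LEXICON: Dict[str, str] = {
    "the": "DT", "a": "DT",
    "cat": "NN", "dog": "NN", "mat": "NN", "rug": "NN",
    "sat": "VBD", "slept": "VBD",
    "on": "IN", "under": "IN",
}
# Tags treated as function words when extracting content words (assumption).
STOP_TAGS = {"DT", "IN"}


def pos_tags(sentence: str) -> List[str]:
    """Map a whitespace-tokenised sentence to an ordered list of POS tags."""
    return [TOY_LEXICON.get(w, "UNK") for w in sentence.lower().split()]


def content_words(sentence: str) -> List[str]:
    """Keep only words whose tag is not a function-word tag."""
    return [w for w in sentence.lower().split()
            if TOY_LEXICON.get(w, "UNK") not in STOP_TAGS]


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def best_match(seq: Sequence[str], candidates: List[List[str]]) -> int:
    """Largest LCS length between seq and any candidate sequence."""
    return max((lcs_length(seq, c) for c in candidates), default=0)


def stage1_score(doc_a: List[str], doc_b: List[str]) -> float:
    """Stage 1: share of doc_a's POS tags covered by the best sentence-level LCS."""
    tags_a = [pos_tags(s) for s in doc_a]
    tags_b = [pos_tags(s) for s in doc_b]
    total = sum(len(t) for t in tags_a)
    if total == 0:
        return 0.0
    return sum(best_match(t, tags_b) for t in tags_a) / total


def stage2_score(doc_a: List[str], doc_b: List[str]) -> float:
    """Stage 2: same comparison on content words, accumulated over the whole document."""
    words_a = [content_words(s) for s in doc_a]
    words_b = [content_words(s) for s in doc_b]
    total = sum(len(w) for w in words_a)
    if total == 0:
        return 0.0
    return sum(best_match(w, words_b) for w in words_a) / total


def similarity(doc_a: List[str], doc_b: List[str], threshold: float = 0.5) -> float:
    """Run stage 2 only for pairs that pass stage 1, then combine both scores."""
    s1 = stage1_score(doc_a, doc_b)
    if s1 < threshold:
        return 0.0
    return 0.5 * (s1 + stage2_score(doc_a, doc_b))


if __name__ == "__main__":
    doc_a = ["the cat sat on the mat"]
    doc_b = ["the dog slept under the rug"]
    print(stage1_score(doc_a, doc_b), similarity(doc_a, doc_b))
```

In this sketch, two sentences with the same grammatical shape but different vocabulary pass the first stage and are then separated by the second, which is the filtering role the abstract assigns to the POS-based comparison.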


Pages: 715-719

Page count: 5

50 records in total

[1] Using similarity network analysis to improve text similarity calculations
Witschard, Daniel
Kucher, Kostiantyn
Jusufi, Ilir
Kerren, Andreas
Applied Network Science, 2025, 10 (01)
[2] Use of Text Syntactical Structures in Detection of Document Duplicates
Elhadi, Mohamed
Al-Tobi, Amjad
2008 THIRD INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION MANAGEMENT, VOLS 1 AND 2, 2008 : 531 - 536
[3] Text mining using the hierarchical syntactical structure of documents
Danger, R
Ruíz-Shulcloper, J
Berlanga, R
CURRENT TOPICS IN ARTIFICIAL INTELLIGENCE, 2004, 3040 : 556 - 565
[4] Text mining: identification of similarity of text documents using hybrid similarity model
K. M. Shiva Prasad
Iran Journal of Computer Science, 2023, 6 (2) : 123 - 135
[5] A SYNTACTICAL ANALYSIS OF AN AMUESHA (ARAWAK) TEXT
DUFF, M
INTERNATIONAL JOURNAL OF AMERICAN LINGUISTICS, 1957, 23 (03) : 171 - 178
[6] Analyzing statistical and syntactical English text for word prediction and text generation
Homeed, Taher S. K.
Al-A'ali, Mansoor
Information Technology Journal, 2007, 6 (07) : 954 - 965
[7] Interactive optimization of embedding-based text similarity calculations
Witschard, Daniel
Jusufi, Ilir
Martins, Rafael M.
Kucher, Kostiantyn
Kerren, Andreas
INFORMATION VISUALIZATION, 2022, 21 (04) : 335 - 353
[8] Text Similarity Analysis Using IR Lists
Metin, Senem Kumova
Kisla, Tarik
Karaoglan, Bahar
2013 21ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2013
[9] A new approach for text similarity using articles
Atlam, Elsayed
INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2008, 7 (01) : 23 - 34
[10] Assessing text semantic similarity using ontology
Liu, Hongzhe
Wang, Pengfei
1600, Academy Publisher (09): : 490 - 497
