Text Similarity Calculations Using Text and Syntactical Structures

Cited by: 0

Author:

Elhadi, Mohamed T. [1]

Affiliation:

[1] Univ Zawia, Dept Comp Sci, Zawia, Libya

Source:

2012 7TH INTERNATIONAL CONFERENCE ON COMPUTING AND CONVERGENCE TECHNOLOGY (ICCCT2012) | 2012

Keywords:

Syntactical structures; document similarity; Longest Common Subsequence

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Subject classification code:

081202

Abstract:

This paper reports on experiments performed to investigate the use of the syntactical structures of sentences as the basis for similarity calculation between two text documents. Sentences of the documents are converted into ordered sequences of Part of Speech (POS) tags, which are then fed to a Longest Common Subsequence (LCS) algorithm to determine the size and count of the LCSs found when comparing the documents sentence by sentence. In the first stage, the syntactical features of the text are used as a structural representation of the document's text. This representation also serves as a text reduction that improves the efficiency of the LCS comparison. In the second stage, documents that score well in the first stage, as measured by an accumulative score that is a function of the number of LCSs, are subjected to further comparison using the actual sentences (content words) in a sentence-by-sentence fashion. This produces a final measure of similarity based on the common words (accumulated for the whole file) and the total number of LCSs from the first stage. Experiments on two different corpora showed the utility of the proposed procedure in calculating similarities between written documents.
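The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the toy tag lexicon `TOY_LEXICON`, the `min_len` threshold, the choice to compare every sentence of one document with every sentence of the other, and the use of shared word types as the "common words" count are all assumptions made for illustration. A real system would use a trained POS tagger in place of the lookup table.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy lexicon standing in for a real POS tagger (assumption).
TOY_LEXICON = {
    "the": "DT", "a": "DT",
    "cat": "NN", "dog": "NN", "mat": "NN", "rug": "NN",
    "sat": "VBD", "slept": "VBD",
    "on": "IN",
}


def pos_tags(sentence: str) -> List[str]:
    """Map each word to a POS tag; unknown words get the tag 'UNK'."""
    return [TOY_LEXICON.get(word, "UNK") for word in sentence.lower().split()]


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def stage_one_score(doc_a: List[str], doc_b: List[str], min_len: int = 3) -> Tuple[int, int]:
    """Stage 1 (sketch): compare POS-tag sequences of all sentence pairs.

    Returns (number of pairs whose tag LCS has at least min_len tags,
    summed length of those LCSs). The threshold is an assumption.
    """
    count = 0
    total = 0
    for sent_a in doc_a:
        for sent_b in doc_b:
            n = lcs_length(pos_tags(sent_a), pos_tags(sent_b))
            if n >= min_len:
                count += 1
                total += n
    return count, total


def stage_two_common_words(doc_a: List[str], doc_b: List[str]) -> int:
    """Stage 2 (sketch): accumulate shared word types over all sentence pairs."""
    total = 0
    for sent_a in doc_a:
        for sent_b in doc_b:
            total += len(set(sent_a.lower().split()) & set(sent_b.lower().split()))
    return total


doc_a = ["The cat sat on the mat"]
doc_b = ["A dog slept on the rug"]
print(stage_one_score(doc_a, doc_b))         # structural match on POS tags
print(stage_two_common_words(doc_a, doc_b))  # lexical overlap on actual words
```

In this toy example the two sentences share the same tag sequence but few words, which shows why a structural first stage and a lexical second stage can give different signals. How the two numbers are combined into a final similarity measure is left open here, since the abstract does not give the formula.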

Pages: 715-719

Page count: 5
