Text Similarity Calculations Using Text and Syntactical Structures

Authors
Elhadi, Mohamed T. [1]
Affiliation
[1] Univ Zawia, Dept Comp Sci, Zawia, Libya
Keywords
Syntactical structures; document similarity; Longest Common Subsequence
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper reports on experiments performed to investigate the use of the syntactical structures of sentences as the basis for calculating similarity between two text documents. The sentences of each document are converted into ordered sequences of Part of Speech (POS) tags, which are then fed to a Longest Common Subsequence (LCS) algorithm to determine the size and count of the LCSs found when the documents are compared sentence by sentence. In the first stage, the syntactical features of the text are used as a structural representation of the document. This representation also serves as a text reduction that improves the efficiency of the LCS comparison. In the second stage, documents that score well in the first stage, as measured by an accumulated score that is a function of the number of LCSs, are subjected to further comparison using the actual sentences (content words), again sentence by sentence. This produces a final measure of similarity based on the common words (accumulated over the whole file) and the total number of LCSs from the first stage. Experiments on two different corpora showed the utility of the proposed procedure in calculating similarities between written documents.
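The two-stage procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the scoring function, the sentence alignment, or the tagger, so everything below is an assumption made for illustration. In particular, `TOY_LEXICON` is a hypothetical stand-in for a real POS tagger, sentences are paired by position, scores are LCS lengths normalized by the longer sentence, the stage-one cut-off `threshold` is arbitrary, and the final score is a plain average of the two stages.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical toy lexicon standing in for a real POS tagger (assumption).
TOY_LEXICON: Dict[str, str] = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN", "mat": "NOUN", "rug": "NOUN",
    "sat": "VERB", "slept": "VERB",
    "on": "ADP",
}


def words(sentence: str) -> List[str]:
    """Lowercase whitespace tokenization (a simplification)."""
    return sentence.lower().split()


def pos_tags(sentence: str) -> List[str]:
    """Map each word to a tag; unknown words get the placeholder tag 'X'."""
    return [TOY_LEXICON.get(w, "X") for w in words(sentence)]


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def mean_normalized_lcs(doc_a: List[str], doc_b: List[str],
                        view: Callable[[str], List[str]]) -> float:
    """Average LCS length / longer length over position-aligned sentence pairs."""
    pairs = list(zip(doc_a, doc_b))
    if not pairs:
        return 0.0
    total = 0.0
    for sent_a, sent_b in pairs:
        seq_a, seq_b = view(sent_a), view(sent_b)
        longest = max(len(seq_a), len(seq_b))
        total += lcs_length(seq_a, seq_b) / longest if longest else 0.0
    return total / len(pairs)


def structural_score(doc_a: List[str], doc_b: List[str]) -> float:
    """Stage 1: compare sentences through their POS tag sequences."""
    return mean_normalized_lcs(doc_a, doc_b, pos_tags)


def lexical_score(doc_a: List[str], doc_b: List[str]) -> float:
    """Stage 2: compare sentences through their actual words."""
    return mean_normalized_lcs(doc_a, doc_b, words)


def similarity(doc_a: List[str], doc_b: List[str], threshold: float = 0.5) -> float:
    """Run stage 2 only for pairs that pass stage 1; the combination is assumed."""
    s1 = structural_score(doc_a, doc_b)
    if s1 < threshold:
        return 0.0  # filtered out after the structural stage
    return 0.5 * (s1 + lexical_score(doc_a, doc_b))


if __name__ == "__main__":
    doc_a = ["the cat sat on the mat"]
    doc_b = ["a dog slept on the rug"]
    print(structural_score(doc_a, doc_b), lexical_score(doc_a, doc_b))
```

In this toy example the two sentences share an identical tag sequence but only two words in order ("on the"), so the structural stage rates the pair highly while the word stage lowers the final score. This is the kind of pair the second stage is meant to separate from true near-duplicates.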
Pages: 715-719
Number of pages: 5