An Improved Text Retrieval Algorithm Based on Suffix Tree Similarity Measure

Authors
Huang, Cheng-hui [1, 2]
Yin, Jian [1 ]
Han, Dong [2 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Informat Sci & Technol, Guangzhou 510275, Guangdong, Peoples R China
[2] Guangdong Univ Finance, Dept Comp Sci & Technol, Guangzhou 510520, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Retrieval algorithm; suffix tree; document model; similarity measure;
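DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract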
In information retrieval, popular methods consider the word frequency of the retrieval terms and of the text corpus. These methods ignore the word-sequence information shared between the retrieval terms and the text corpus, so good results are limited to a few special domains. This paper analyzes the word-sequence information and then computes the similarity between the query and the text documents of the corpus by applying a suffix tree similarity measure combined with the TF-IDF weighting method. Experimental results on the standard document benchmark corpus Reuters indicate that the new retrieval algorithm is effective. Compared with the traditional word-term-weight TF-IDF similarity measure in the same retrieval algorithm, the proposed method improves the average precision score by about 20%.
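The idea in the abstract, scoring a document by the word sequences it shares with the query rather than by single-word counts alone, can be illustrated with a minimal sketch. This is not the authors' algorithm: instead of building a suffix tree, it enumerates contiguous word sequences (the prefixes of every word-level suffix, capped at a hypothetical `max_len`), weights them with a smoothed TF-IDF, and compares vectors by cosine similarity. The function names, the smoothing formula, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def phrases(tokens, max_len):
    """Contiguous word sequences of 1..max_len words (prefixes of each suffix)."""
    return [tuple(tokens[i:i + n])
            for i in range(len(tokens))
            for n in range(1, max_len + 1)
            if i + n <= len(tokens)]


def vectorize(text, df, n_docs, max_len):
    """TF-IDF weights over phrases; the smoothed IDF is an assumption of this sketch."""
    tf = Counter(phrases(text.lower().split(), max_len))
    return {p: c * (math.log((1 + n_docs) / (1 + df[p])) + 1.0)
            for p, c in tf.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b[p] for p, w in a.items() if p in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, corpus, max_len=3):
    """Return one similarity score per corpus document, in corpus order."""
    # Document frequency: in how many documents each phrase occurs.
    df = Counter()
    for doc in corpus:
        df.update(set(phrases(doc.lower().split(), max_len)))
    n_docs = len(corpus)
    q_vec = vectorize(query, df, n_docs, max_len)
    return [cosine(q_vec, vectorize(doc, df, n_docs, max_len)) for doc in corpus]


corpus = ["the cat sat on the mat",
          "the mat sat on the cat",
          "dogs chase cats"]
bag_of_words = rank("cat sat on", corpus, max_len=1)
with_sequences = rank("cat sat on", corpus, max_len=3)
```

With `max_len=1` the first two documents contain the same words and receive the same score. With `max_len=3` the first document ranks higher because it also contains the sequences "cat sat" and "cat sat on", which is the kind of word-order signal that a frequency-only measure discards.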
Pages: 150 / +
Page count: 2
Related papers
  • [1] A Text Similarity Measure Based on Suffix Tree
    Huang, Chenghui
    Liu, Yan
    Xia, Shengzhong
    Yin, Jian
    [J]. INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2011, 14 (02): : 583 - 592
  • [2] Using Annotated Suffix Tree Similarity Measure for Text Summarisation
    Yakovlev, Maxim
    Chernyak, Ekaterina
    [J]. ANALYSIS OF LARGE AND COMPLEX DATA, 2016, : 103 - 112
  • [3] ANNOTATED SUFFIX TREE AS A WAY OF TEXT REPRESENTATION FOR INFORMATION RETRIEVAL IN TEXT COLLECTIONS
    Frolov, Dmitry S.
    [J]. BIZNES INFORMATIKA-BUSINESS INFORMATICS, 2015, 34 (04): : 63 - 70
  • [4] An Improved News Recommendation Algorithm Based on Text Similarity
    Gao, Yihang
    Zhao, Hui
    Zhou, Qian
    Qiu, Meikang
    Liu, Meiqin
    [J]. 2020 3RD INTERNATIONAL CONFERENCE ON SMART BLOCKCHAIN (SMARTBLOCK), 2020, : 132 - 136
  • [5] An Improved Text Similarity Calculation Algorithm Based On VSM
    Li, Lian
    Zhu, AiHong
    Su, Tao
    [J]. ADVANCED RESEARCH ON AUTOMATION, COMMUNICATION, ARCHITECTONICS AND MATERIALS, PTS 1 AND 2, 2011, 225-226 (1-2): : 1105 - 1108
  • [6] Improved Spectral Clustering Algorithm Based on Similarity Measure
    Yan, Jun
    Cheng, Debo
    Zong, Ming
    Deng, Zhenyun
    [J]. ADVANCED DATA MINING AND APPLICATIONS, ADMA 2014, 2014, 8933 : 641 - 654
  • [7] An Improved Similarity Measure for Text Clustering and Classification
    Reddy, G. Suresh
    Kanth, T. V. Rajini
    Rao, A. Ananda
    [J]. ADVANCED SCIENCE LETTERS, 2015, 21 (11) : 3583 - 3590
  • [8] An improved Similarity Measure For Chinese Text Clustering
    Zhang, Shaolei
    Wang, Zhong
    Huang, Wei
    [J]. 2016 2ND INTERNATIONAL CONFERENCE ON MECHANICAL, ELECTRONIC AND INFORMATION TECHNOLOGY ENGINEERING (ICMITE 2016), 2016, : 141 - 144
  • [9] Suffix Tree Based Approach for Chinese Information Retrieval
    Huang, Jin Hu
    Powers, David
    [J]. ISDA 2008: EIGHTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, VOL 3, PROCEEDINGS, 2008, : 393 - 397
  • [10] A New Suffix Tree Similarity Measure and Labeling for Web Search Results Clustering
    Kale, Archana
    Bharambe, Ujwala
    SashiKumar, M.
    [J]. 2009 SECOND INTERNATIONAL CONFERENCE ON EMERGING TRENDS IN ENGINEERING AND TECHNOLOGY (ICETET 2009), 2009, : 1148 - +