An Improved Text Retrieval Algorithm Based on Suffix Tree Similarity Measure

Cited by: 0

Authors:

Huang, Cheng-hui ^{[1,2]}

Yin, Jian ^{[1]}

Han, Dong ^{[2]}

Affiliations:

[1] Sun Yat Sen Univ, Sch Informat Sci & Technol, Guangzhou 510275, Guangdong, Peoples R China

[2] Guangdong Univ Finance, Dept Comp Sci & Technol, Guangzhou 510520, Peoples R China

Source:

INFORMATION COMPUTING AND APPLICATIONS, PT 2 | 2010 / Vol. 106

Funding:

National Natural Science Foundation of China;

Keywords:

Retrieval algorithm; suffix tree; document model; similarity measure;

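DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract: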

In information retrieval, popular methods consider only the word frequency of the retrieval terms and the text corpus. These methods ignore the word sequence information shared by the retrieval terms and the text corpus, so their good results are limited to some special domains. This paper analyzes the word sequence information and then computes the similarity between the query and the text documents of the corpus by applying a suffix tree similarity measure combined with the TF-IDF weighting method. Experimental results on the standard document benchmark corpus Reuters indicate that the new retrieval algorithm is effective. Compared with the traditional word-term-weight TF-IDF similarity measure in the same retrieval algorithm, the proposed method improves the average precision score by about 20%.
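The idea in the abstract, scoring documents on shared word sequences rather than on single-word frequencies, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that the nodes of a word-level suffix tree can be approximated by all word n-grams up to a fixed depth, and the function names (`phrases`, `tfidf_vectors`, `cosine`, `rank`), the `max_len` parameter, and the exact TF-IDF formula are hypothetical choices made only for this example.

```python
import math
from collections import Counter


def phrases(text, max_len=3):
    """Return every word n-gram of up to max_len words.

    Assumption: these n-grams stand in for the nodes of a word-level
    suffix tree truncated at depth max_len.
    """
    words = text.lower().split()
    out = []
    for i in range(len(words)):
        for j in range(i + 1, min(i + max_len, len(words)) + 1):
            out.append(tuple(words[i:j]))
    return out


def tfidf_vectors(texts, max_len=3):
    """Build one sparse TF-IDF vector (dict) per text, keyed by phrase."""
    counts = [Counter(phrases(t, max_len)) for t in texts]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # document frequency of each phrase
    n = len(texts)
    return [
        {p: (1 + math.log(tf)) * math.log(1 + n / df[p]) for p, tf in c.items()}
        for c in counts
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[p] for p, w in u.items() if p in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, docs, max_len=3):
    """Rank docs against query; returns (score, doc_index), best first.

    Simplification: the query is included when computing document
    frequencies.
    """
    vectors = tfidf_vectors(docs + [query], max_len)
    q = vectors[-1]
    return sorted(((cosine(q, v), i) for i, v in enumerate(vectors[:-1])),
                  reverse=True)


docs = ["the cat sat on the mat", "the mat sat on the cat", "dogs chase cats"]
for score, idx in rank("cat sat on the mat", docs):
    print(f"{score:.3f}  {docs[idx]}")
```

The first two documents contain exactly the same words, so with `max_len=1` (plain word-level TF-IDF) they receive the same score. With `max_len=3` the first document ranks higher because it shares longer word sequences with the query, which is the kind of sequence information the abstract says frequency-only methods ignore.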

Pages: 150 / +

Page count: 2
