A Text Similarity Measure Based on Suffix Tree

Cited by: 0

Authors:

Huang, Chenghui ^{[1,2]}

Liu, Yan ^{[3]}

Xia, Shengzhong ^{[4]}

Yin, Jian ^{[1]}

Affiliations:

[1] Sun Yat Sen Univ, Dept Comp Sci, Guangzhou 510275, Guangdong, Peoples R China

[2] Guangdong Univ Finance, Dept Comp Sci & Technol, Guangzhou 510520, Guangdong, Peoples R China

[3] Guangdong Univ Finance, Dept Appl Math, Guangzhou 510520, Guangdong, Peoples R China

[4] Guangdong AIB Coll, Guangzhou 510507, Guangdong, Peoples R China

Source:

INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL | 2011 / Vol. 14 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Similarity measure; Suffix tree; Document model; Text clustering;

DOI:

Not available

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Subject classification code:

08 ;

Abstract:

It is well known that most text clustering algorithms use the bag-of-words model, which represents a document as a vector. These methods ignore word sequence information, and their good clustering results are limited to certain special domains. This paper presents a new text similarity algorithm (STSM) that applies the TF-IDF method to weight the word sequences of a document modeled as a suffix tree. Experimental results on the standard document benchmark corpora Reuters and BBC show that the new text similarity measure is effective. Compared with the results of the state-of-the-art similarity measure, the proposed method brings an improvement of about 10% in average F-measure score.
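The idea summarized in the abstract, TF-IDF weights on word sequences rather than on single words, can be illustrated with a minimal sketch. This is not the authors' STSM implementation: it assumes a depth-limited word-level suffix trie (every phrase of up to `max_len` words is treated as one node), a smoothed IDF of `log(1 + N/df)`, and cosine similarity as the comparison. The function names and the `max_len` parameter are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def phrase_counts(words, max_len=3):
    """Count every word sequence of up to max_len words starting at each
    position. Each distinct sequence stands in for one node of a
    depth-limited word-level suffix trie (an assumption of this sketch)."""
    counts = Counter()
    for start in range(len(words)):
        for end in range(start + 1, min(start + max_len, len(words)) + 1):
            counts[tuple(words[start:end])] += 1
    return counts


def tfidf_vectors(docs, max_len=3):
    """Return one sparse {phrase: weight} vector per document."""
    per_doc = [phrase_counts(d.lower().split(), max_len) for d in docs]
    # Document frequency: in how many documents each phrase node occurs.
    df = Counter()
    for counts in per_doc:
        df.update(counts.keys())
    n = len(docs)
    # Assumed weighting: log-scaled TF times smoothed IDF log(1 + N/df).
    return [
        {p: (1 + math.log(tf)) * math.log(1 + n / df[p]) for p, tf in counts.items()}
        for counts in per_doc
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[p] for p, w in u.items() if p in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "stock prices fell sharply today",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # documents sharing long phrases
print(cosine(vecs[0], vecs[2]))  # documents sharing no phrases
```

Because shared multi-word phrases contribute their own weighted dimensions, two documents with the same words in the same order score higher than two documents that merely share a vocabulary, which is the word-order information a bag-of-words vector discards.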


Pages: 583-592

Page count: 10

