A Short Text Similarity Measure Based on Hidden Topics

Cited by: 0
Authors
Chen, Hong-chao [1, 2]
Guo, Xiao-hua [1]
Liu, Ling-qiang [1]
Zhu, Xin-hua [1, 2]
Affiliations
[1] Guangxi Normal Univ, Coll Comp Sci & IT, Guilin 541004, Peoples R China
[2] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
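Funding
National Natural Science Foundation of China
Keywords
Short text; Similarity measure; Topic model; KNN; Information retrieval
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract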
Similarity measurement plays an important role in the classification of short text. However, traditional text similarity measures fail to achieve high accuracy because of the sparse features of short text. In this paper, we propose a new method based on varying numbers of hidden topics, which are derived through well-known topic models such as Latent Dirichlet Allocation (LDA). We obtain the related topics and integrate them with the features of the short text in order to decrease sparseness and increase word co-occurrence. Numerous experiments were conducted on an open data set (the Wikipedia dataset), and the results demonstrate that the proposed method improves classification accuracy by 14.03% with the k-nearest neighbors (KNN) algorithm. This indicates that our method outperforms other state-of-the-art methods that do not utilize hidden topics, and validates that the method is effective.
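The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `WORD_TOPICS` table is a hypothetical stand-in for the output of a trained topic model such as LDA (its values are invented), and `featurize`, `cosine`, and `knn_classify` are names introduced here for illustration. The sketch only shows how adding topic pseudo-features to a sparse bag-of-words vector can give two short texts with no shared words a non-zero similarity, which a KNN classifier can then use.

```python
import math
from collections import Counter

# Hypothetical word-to-topic weights standing in for a trained topic model
# (e.g. LDA). The values are invented for illustration only.
WORD_TOPICS = {
    "football": {"__topic_sports__": 0.9},
    "soccer": {"__topic_sports__": 0.9},
    "match": {"__topic_sports__": 0.6},
    "goal": {"__topic_sports__": 0.7},
    "stock": {"__topic_finance__": 0.9},
    "market": {"__topic_finance__": 0.8},
    "shares": {"__topic_finance__": 0.9},
}


def featurize(text, use_topics=True, topic_weight=1.0):
    """Bag-of-words vector, optionally extended with topic pseudo-features."""
    tokens = text.lower().split()
    features = Counter(tokens)
    if use_topics:
        for token in tokens:
            for topic, weight in WORD_TOPICS.get(token, {}).items():
                features[topic] += topic_weight * weight
    return features


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, labeled_texts, k=1, use_topics=True):
    """Majority label among the k training texts most similar to the query."""
    q = featurize(query, use_topics)
    ranked = sorted(
        labeled_texts,
        key=lambda pair: cosine(q, featurize(pair[0], use_topics)),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train = [
    ("football match", "sports"),
    ("stock market", "finance"),
    ("shares market", "finance"),
]
print(cosine(featurize("football match", False), featurize("soccer goal", False)))
print(cosine(featurize("football match"), featurize("soccer goal")))
print(knn_classify("soccer goal", train, k=1))
```

In this toy setup "football match" and "soccer goal" share no words, so their plain bag-of-words similarity is zero; once both vectors carry the shared sports topic feature, the similarity becomes positive and the nearest neighbor of "soccer goal" is the sports example. The `topic_weight` parameter is an assumed knob for balancing word features against topic features and is not taken from the paper.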
Pages: 1101-1108
Number of pages: 8