A Short Text Similarity Measure Based on Hidden Topics

Cited by: 0
Authors
Chen, Hong-chao [1, 2]
Guo, Xiao-hua [1]
Liu, Ling-qiang [1]
Zhu, Xin-hua [1, 2]
Affiliations
[1] Guangxi Normal Univ, Coll Comp Sci & IT, Guilin 541004, Peoples R China
[2] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Short text; Similarity measure; Topic model; KNN; Information retrieval;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods];
Subject classification code
081202;
Abstract
Similarity measurement plays an important role in the classification of short text. However, traditional text similarity measures fail to achieve high accuracy because of the sparse features of short text. In this paper, we propose a new method based on different numbers of hidden topics, which are derived through well-known topic models such as Latent Dirichlet Allocation (LDA). We obtain the related topics and integrate them with the features of the short text in order to reduce sparseness and increase word co-occurrence. Extensive experiments were conducted on an open data set (the Wikipedia dataset), and the results show that the proposed method improves classification accuracy by 14.03% with the k-nearest neighbors (KNN) algorithm. This indicates that our method outperforms other state-of-the-art methods that do not use hidden topics, and validates its effectiveness.
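The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the topic distributions are hard-coded hypothetical values standing in for the output of a topic model such as LDA, and the `threshold` and `weight` parameters, the `__topic_k` pseudo-terms, and all function names are assumptions made for illustration. The sketch appends dominant topics to each short text's bag of words, compares texts by cosine similarity, and classifies a query by KNN majority vote.

```python
import math
from collections import Counter


def augment(tokens, topic_probs, threshold=0.2, weight=2.0):
    """Bag of words plus one pseudo-term per topic whose probability
    reaches the threshold (the weighting scheme is an assumption)."""
    vec = Counter(tokens)
    for k, p in enumerate(topic_probs):
        if p >= threshold:
            vec[f"__topic_{k}"] = weight * p
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(query, labeled, k=3):
    """Majority label among the k training vectors most similar to query."""
    ranked = sorted(labeled, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical topic distributions (topic 0 ~ technology, topic 1 ~ sport);
# in practice these would be inferred by a trained topic model.
doc_a = ["apple", "releases", "new", "phone"]
doc_b = ["samsung", "unveils", "tablet"]

# The two texts share no words, so the plain bag-of-words similarity is zero.
plain_sim = cosine(Counter(doc_a), Counter(doc_b))

# Adding shared hidden topics gives the two vectors a common dimension.
topic_sim = cosine(augment(doc_a, [0.9, 0.1]), augment(doc_b, [0.8, 0.2]))

train = [
    (augment(doc_a, [0.9, 0.1]), "tech"),
    (augment(["google", "updates", "search"], [0.85, 0.15]), "tech"),
    (augment(["team", "wins", "final"], [0.1, 0.9]), "sport"),
    (augment(["striker", "scores", "twice"], [0.05, 0.95]), "sport"),
]
predicted = knn_predict(augment(doc_b, [0.8, 0.2]), train, k=3)
print(plain_sim, topic_sim, predicted)
```

In this toy setting the two texts have no words in common, so only the topic pseudo-terms contribute to their similarity, which is the sparsity problem the abstract describes.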
Pages: 1101-1108
Page count: 8
Related papers
50 in total
  • [31] A DATA-DRIVEN TEXT SIMILARITY MEASURE BASED ON CLASSIFICATION ALGORITHMS
    Cho, Su Gon
    Kim, Seoung Bum
    INTERNATIONAL JOURNAL OF INDUSTRIAL ENGINEERING-THEORY APPLICATIONS AND PRACTICE, 2017, 24 (03): : 328 - 339
  • [32] Boolean logic algebra driven similarity measure for text based applications
    Abdalla, Hassan, I
    Amer, Ali A.
    PEERJ COMPUTER SCIENCE, 2021, 7
  • [33] A SHORT TEXT SIMILARITY CALCULATION METHOD BASED ON DEEP LEARNING
    Xu, Yong
    Peng, Yunke
    Wang, Hengna
    Wang, Xue'er
    UNIVERSITY POLITEHNICA OF BUCHAREST SCIENTIFIC BULLETIN SERIES C-ELECTRICAL ENGINEERING AND COMPUTER SCIENCE, 2024, 86 (01): : 91 - 104
  • [34] Measuring the short text similarity based on semantic and syntactic information
    Yang, Jiaqi
    Li, Yongjun
    Gao, Congjie
    Zhang, Yinyin
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2021, 114 : 169 - 180
  • [35] Short Text Similarity Calculation Based on Jaccard and Semantic Mixture
    Wu, Shushu
    Liu, Fang
    Zhang, Kai
    Communications in Computer and Information Science, 2021, 1363 CCIS : 37 - 45
  • [36] A SHORT TEXT SIMILARITY CALCULATION METHOD BASED ON DEEP LEARNING
    Xu, Yong
    Peng, Yunke
    Wang, Hengna
    Wang, Xue’Er
    UPB Scientific Bulletin, Series C: Electrical Engineering and Computer Science, 2024, 86 (01): : 91 - 104
  • [37] Boolean logic algebra driven similarity measure for text based applications
    Abdalla H.I.
    Amer A.A.
    PeerJ Computer Science, 2021, 7 : 1 - 34
  • [38] A data-driven text similarity measure based on classification algorithms
    Kim, Seoung Bum (sbkim1@korea.ac.kr), 1600, University of Cincinnati (24):
  • [39] A Chinese short text similarity algorithm based on semantic and syntax
    Liao, Zhi-Fang (zfliao@csu.edu.cn), 1600, Hunan University (43):
  • [40] Document Similarity for Texts of Varying Lengths via Hidden Topics
    Gong, Hongyu
    Sakakini, Tarek
    Bhat, Suma
    Xiong, Jinjun
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 2341 - 2351