Research on calculation method of text similarity based on smooth inverse frequency

Authors
Ye Y. [1]
Minmin Y. [1]
Jiming L. [1]
Affiliation
[1] Key Laboratory of E-commerce and Modern Logistics, Chongqing University of Posts and Telecommunications, Chongqing
Keywords
Part-of-speech; SIF; Word order similarity; Word2vec
DOI
10.19682/j.cnki.1005-8885.2020.1007
Abstract
To improve the accuracy of text similarity calculation, this paper presents PO-SIF (part of speech and word order-smooth inverse frequency), a text similarity function based on sentence vectors that optimizes the classical SIF calculation method in two respects: part of speech and word order. The classical SIF algorithm calculates sentence similarity from a sentence vector obtained through weighting and noise reduction. However, the choice of weighting and noise-reduction method affects both the efficiency and the accuracy of the similarity calculation. In the proposed PO-SIF, the weight parameters of the SIF sentence vector are first updated by a part-of-speech subtraction factor to identify the most crucial words. PO-SIF then calculates sentence vector similarity while taking word order into account, which overcomes the drawback of similarity analysis that is based mostly on word frequency. The experimental results validate that the proposed PO-SIF improves the accuracy of text similarity calculation. © 2020, Beijing University of Posts and Telecommunications. All rights reserved.
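As a rough illustration of the classical SIF step the abstract refers to (weighting, then noise reduction), the sketch below follows the commonly described formulation: each word vector is weighted by a / (a + p(w)), the weighted vectors are averaged, and the projection onto the first principal component of the sentence matrix is removed. The function names, the toy vocabulary, the word probabilities, and the value of `a` are all assumptions made for illustration. The part-of-speech subtraction factor and the word order similarity of PO-SIF are not specified in the abstract and are not reproduced here.

```python
import numpy as np

def sif_embeddings(sentences, word_vectors, word_prob, a=1e-3):
    """Classical SIF sketch: weighted average, then remove the first principal component."""
    dim = len(next(iter(word_vectors.values())))
    rows = []
    for tokens in sentences:
        known = [w for w in tokens if w in word_vectors]
        if not known:
            rows.append(np.zeros(dim))  # no known words: fall back to a zero vector
            continue
        # Assumption: a word with no frequency estimate gets p(w) = 0, i.e. weight 1.
        weights = np.array([a / (a + word_prob.get(w, 0.0)) for w in known])
        vectors = np.array([word_vectors[w] for w in known], dtype=float)
        rows.append(weights @ vectors / len(known))
    X = np.vstack(rows)
    # "Noise reduction": subtract each row's projection onto the top right singular vector.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u = vt[0]
    return X - np.outer(X @ u, u)

def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

# Toy data (hypothetical): random 4-dimensional "word vectors" and made-up unigram probabilities.
rng = np.random.default_rng(0)
vocab = ["the", "cat", "dog", "sat", "ran", "mat", "park"]
word_vectors = {w: rng.normal(size=4) for w in vocab}
word_prob = {"the": 0.05, "cat": 0.001, "dog": 0.001, "sat": 0.002,
             "ran": 0.002, "mat": 0.0005, "park": 0.0005}
sentences = [["the", "cat", "sat", "the", "mat"],
             ["the", "dog", "sat", "the", "mat"],
             ["the", "dog", "ran", "the", "park"]]

emb = sif_embeddings(sentences, word_vectors, word_prob)
sim = cosine(emb[0], emb[1])
print(f"SIF cosine similarity of sentences 0 and 1: {sim:.3f}")
```

Because the frequent word "the" receives a much smaller weight than the rarer content words, the sentence vectors are dominated by the content words. A method along the lines of PO-SIF would presumably adjust these weights further by part of speech and combine the resulting similarity with a word order term, but how it does so is not stated in the abstract.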
Pages: 56-64
Page count: 8