String Similarity Computing Based on Position And Cosine

Citations: 0
Authors
Cheng, Na [1]
Yu, Zhongqing [1, 2]
Wang, Kaixi [1, 2]
Affiliations
[1] Qingdao Univ, Coll Comp Sci & Technol, Qingdao, Shandong, Peoples R China
[2] Qingdao Univ, Coll Data Sci & Software Engn, Qingdao, Shandong, Peoples R China
Keywords
angle cosine; position encoding; approximately duplicate records; data cleaning; products select;
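DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract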
E-business platforms need to support product selection based on product features and cost performance, and data must also be cleaned during production and sales, so computing the similarity between products is important. This paper proposes a new method for computing string similarity: each string is segmented into words, the word positions are numbered, and the string is converted into a vector. The similarity between two strings is then computed as the cosine of the angle between their vectors. Experiments show that the method avoids the extreme (maximum or minimum) values that LCS and GST produce. The proposed method also improves the accuracy of similarity calculation.
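The abstract does not give the exact position encoding, so the following is only a minimal sketch of the general idea (segment, number positions, vectorize, take the cosine), not the authors' implementation. It assumes whitespace segmentation and assumes each vector component is the 1-based position of a word's first occurrence, or 0 when the word is absent; the function names `position_vector`, `cosine`, and `position_cosine_similarity` are hypothetical.

```python
import math


def position_vector(words, vocabulary):
    """Encode a word list as a vector over `vocabulary`.

    Assumption (not from the paper): each component is the 1-based
    position of the word's first occurrence, or 0 if the word is absent.
    """
    first_pos = {}
    for index, word in enumerate(words, start=1):
        first_pos.setdefault(word, index)  # keep the first occurrence only
    return [first_pos.get(word, 0) for word in vocabulary]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty string has no direction; treat as dissimilar
    return dot / (norm_u * norm_v)


def position_cosine_similarity(s, t):
    """Similarity of two strings via position-encoded word vectors."""
    words_s = s.lower().split()  # assumed segmentation: split on whitespace
    words_t = t.lower().split()
    vocabulary = sorted(set(words_s) | set(words_t))
    return cosine(position_vector(words_s, vocabulary),
                  position_vector(words_t, vocabulary))


if __name__ == "__main__":
    print(position_cosine_similarity("red cotton shirt", "red cotton shirt"))
    print(position_cosine_similarity("red cotton shirt", "shirt cotton red"))
    print(position_cosine_similarity("red cotton shirt", "blue denim jeans"))
```

Under this assumed encoding, identical strings score close to 1, strings with no shared words score 0, and strings with the same words in a different order score somewhere in between, because the position numbers differ.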
Pages: 256-261
Page count: 6
Related papers
50 in total
  • [31] Cosine Similarity Drift Detector
    Gonzalez Hidalgo, Juan Isidro
    Palomino Marino, Laura Maria
    Maior de Barros, Roberto Souto
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: TEXT AND TIME SERIES, PT IV, 2019, 11730 : 669 - 685
  • [32] Intelligent Question and Answering System Based on SAM and Cosine Similarity
    Song, Wan-li
    Chen, Wei-wei
    Zhang, Ming-zhu
    [J]. INTERNATIONAL CONFERENCE ON MATHEMATICS, MODELLING AND SIMULATION TECHNOLOGIES AND APPLICATIONS (MMSTA 2017), 2017, 215 : 708 - 713
  • [33] Identification System of ID Number Based on Cosine Similarity Algorithm
    Ren Zhijun
    Yang Liu
    Li Shixiong
    [J]. 2010 INTERNATIONAL CONFERENCE ON BIO-INSPIRED SYSTEMS AND SIGNAL PROCESSING (ICBSSP 2010), 2010, : 267 - 270
  • [34] Cosine similarity and the Borda rule
    Yoko Kawada
    [J]. Social Choice and Welfare, 2018, 51 : 1 - 11
  • [35] A Triangle Inequality for Cosine Similarity
    Schubert, Erich
    [J]. SIMILARITY SEARCH AND APPLICATIONS, SISAP 2021, 2021, 13058 : 32 - 44
  • [36] The Similarity Computing of Documents Based on VSM
    Guo, Qinglin
    [J]. NETWORK-BASED INFORMATION SYSTEMS, PROCEEDINGS, 2008, 5186 : 142 - 148
  • [37] Computing User Similarity by Combining SimRank++ and Cosine Similarities to Improve Collaborative Filtering
    Wang, Xiuli
    Xu, Zhuoming
    Xia, Xiutao
    Mao, Chengwang
    [J]. 2017 14TH WEB INFORMATION SYSTEMS AND APPLICATIONS CONFERENCE (WISA 2017), 2017, : 205 - 210
  • [38] An Efficient Similarity Join Algorithm with Cosine Similarity Predicate
    Lee, Dongjoo
    Park, Jaehui
    Shim, Junho
    Lee, Sang-goo
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, PT 2, 2010, 6262 : 422 - +
  • [39] TitleFinder: Extracting the Headline of News Web Pages based on Cosine Similarity and Overlap Scoring Similarity
    Mohammadzadeh, Hadi
    Gottron, Thomas
    Schweiggert, Franz
    Heyer, Gerhard
    [J]. PROCEEDINGS OF THE TWELFTH INTERNATIONAL WORKSHOP ON WEB INFORMATION AND DATA MANAGEMENT, 2012, : 65 - 71
  • [40] Comparing Accuracy of Cosine-based Similarity and Correlation-based Similarity Algorithms in Tourism Recommender Systems
    Bigdeli, Elnaz
    Bahmani, Zeinab
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON MANAGEMENT OF INNOVATION AND TECHNOLOGY, VOLS 1-3, 2008, : 469 - +