A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective

Cited by: 1
Authors
Verma, Vijay [1]
Aggarwal, Rajesh Kumar [1]
Affiliations
[1] National Institute of Technology, Computer Engineering Department, Kurukshetra 136119, Haryana, India
Keywords
Recommender systems; Collaborative filtering; Similarity measures; Simple matching coefficient; Jaccard index; Sorensen-Dice coefficient; Salton's cosine index; Overlap coefficient; E-commerce; MODEL
DOI
10.1007/s13278-020-00660-9
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The Jaccard index, originally proposed by Jaccard (Bull Soc Vaudoise Sci Nat 37:241-272, 1901), is a measure of the similarity (or dissimilarity) between two sample data objects. It is defined as the ratio of the intersection size to the union size of the two data samples, and it provides a simple and intuitive measure of similarity between them. This research examines measures that are akin to the Jaccard index and may be used for modelling affinity between users (or items) in collaborative recommendations. In particular, the simple matching coefficient (SMC), Sorensen-Dice coefficient (SDC), Salton's cosine index (SCI), and overlap coefficient (OLC) are compared with the Jaccard index and analysed from both theoretical and empirical perspectives. Since these measures capture only the structural similarity information (overlapping information) between the data samples, they are very useful in situations where only the associations between users and items are available, such as the browsing or buying behaviour of users on an e-commerce portal (i.e. unary rating data, a special case of ratings). Furthermore, a theoretical relation among these measures has been established. We have also derived an equivalent expression for each of these measures so that it can be applied directly to binary data samples (in data mining/machine learning terms). To compare and validate the effectiveness of these structural similarity measures, several experiments have been conducted using standardized benchmark datasets (MovieLens, FilmTrust, Epinions, Yahoo! Movies, and Yahoo! Music). The empirical results demonstrate that Salton's cosine index (SCI) provides better accuracy (in terms of MAE, RMSE, and precision) for large datasets, whereas the overlap coefficient (OLC) results in more accurate recommendations for small datasets.
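The abstract does not reproduce the formulas, so the following is only a minimal sketch of the five measures using their commonly cited set-based definitions, which may differ in notation from the expressions derived in the paper. The function names, the toy users `alice` and `bob`, the `catalogue` set, and the convention of returning 0.0 for empty sets are all assumptions made for illustration.

```python
from math import sqrt


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # assumed convention for empty sets


def sorensen_dice(a, b):
    """2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def salton_cosine(a, b):
    """|A ∩ B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def overlap_coefficient(a, b):
    """|A ∩ B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def simple_matching(a, b, universe):
    """(co-presences + co-absences) / |universe|.

    Unlike the other four measures, SMC also counts items that neither
    user touched, so it needs the full item catalogue. Assumes that both
    a and b are subsets of universe.
    """
    if not universe:
        return 0.0
    both = len(a & b)
    neither = len(universe - (a | b))
    return (both + neither) / len(universe)


# Hypothetical unary data: the items each user has interacted with.
catalogue = {"i1", "i2", "i3", "i4", "i5", "i6"}
alice = {"i1", "i2", "i3"}
bob = {"i2", "i3", "i4", "i5"}

if __name__ == "__main__":
    print("Jaccard:", round(jaccard(alice, bob), 4))
    print("SDC:    ", round(sorensen_dice(alice, bob), 4))
    print("SCI:    ", round(salton_cosine(alice, bob), 4))
    print("OLC:    ", round(overlap_coefficient(alice, bob), 4))
    print("SMC:    ", round(simple_matching(alice, bob, catalogue), 4))
```

Under these definitions the four overlap-only measures satisfy Jaccard ≤ SDC ≤ SCI ≤ OLC for any pair of non-empty sets, which follows from elementary inequalities on the set sizes. Whether this is the same theoretical relation established in the paper cannot be confirmed from the abstract alone.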
Pages: 16
Related papers
  • [1] A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective
    Vijay Verma
    Rajesh Kumar Aggarwal
    [J]. Social Network Analysis and Mining, 2020, 10
  • [2] Improving Jaccard Index for Measuring Similarity in Collaborative Filtering
    Lee, Soojung
    [J]. Information Science and Applications 2017, ICISA 2017, 2017, 424: 799-806
  • [3] Similarity Measures in Scientometric Research - The Jaccard Index versus Salton Cosine Formula
    Hamers, L
    Hemeryck, Y
    Herweyers, G
    Janssen, M
    Keters, H
    Rousseau, R
    Vanhoutte, A
    [J]. Information Processing & Management, 1989, 25 (03): 315-318
  • [4] New Similarity Measures Between Generalized Trapezoidal Fuzzy Numbers Using the Jaccard Index
    Hwang, Chao-Ming
    Yang, Miin-Shen
    [J]. International Journal of Uncertainty Fuzziness and Knowledge-Based Systems, 2014, 22 (06): 831-844
  • [5] A Comparative Analysis of Network-based Similarity Measures for Scientific Paper Recommendations
    Steinert, Laura
    Hoppe, H. Ulrich
    [J]. 2016 Third European Network Intelligence Conference (ENIC 2016), 2016: 17-24
  • [6] Unifying ontological similarity measures: A theoretical and empirical investigation
    Cross, Valerie
    Yu, Xinran
    Hu, Xueheng
    [J]. International Journal of Approximate Reasoning, 2013, 54 (07): 861-875
  • [7] New similarity measures of intuitionistic fuzzy sets based on the Jaccard index with its application to clustering
    Hwang, Chao-Ming
    Yang, Miin-Shen
    Hung, Wen-Liang
    [J]. International Journal of Intelligent Systems, 2018, 33 (08): 1672-1688
  • [8] From similarity perspective: a robust collaborative filtering approach for service recommendations
    Gao, Min
    Ling, Bin
    Yang, Linda
    Wen, Junhao
    Xiong, Qingyu
    Li, Shun
    [J]. Frontiers of Computer Science, 2019, 13 (02): 231-246
  • [9] A comparative analysis of trajectory similarity measures
    Tao, Yaguang
    Both, Alan
    Silveira, Rodrigo I.
    Buchin, Kevin
    Sijben, Stef
    Purves, Ross S.
    Laube, Patrick
    Peng, Dongliang
    Toohey, Kevin
    Duckham, Matt
    [J]. GIScience & Remote Sensing, 2021, 58 (05): 643-669