Comparison of text-based and linked-based metrics in terms of estimating the similarity of articles

Cited by: 1
Authors
Goltaji, Marzieh [1]
Abbaspour, Javad [2]
Jowkar, Abdolrasool [2]
Fakhrahmad, Seyed Mostafa [3]
Affiliations
[1] Shiraz Univ, Journal Evaluat Dept, Islamic World Sci & Technol Monitoring & Citat In, Shiraz, Iran
[2] Shiraz Univ, Dept Knowledge & Informat Sci, Shiraz, Iran
[3] Shiraz Univ, Dept Comp Sci & Engn & IT, Shiraz, Iran
Keywords
Linked-based metrics; scientific articles; similarity metrics; text-based metrics; SCIENTIFIC LITERATURE; RETRIEVAL EFFECTIVENESS; COMPLEX NETWORKS; CITATION; COCITATION; PAGERANK; RANKINGS; CONTEXT; BIAS
DOI
10.1177/09610006231165759
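CLC number (Chinese Library Classification)
G25 [Library science, librarianship]; G35 [Information science, information work]
Subject classification codes
1205; 120501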
Abstract
The aim of this study is to determine the power of text-based metrics (Cosine and Lucene similarity), linked-based metrics (co-citation, bibliographic coupling, Amsler, PageRank, and HITS), and their combination in estimating the similarity between articles. The experiments were conducted on a test collection of 26,262 articles from the PubMed Central Open Access Subset (PMC OAS) of CITREC, for which linked-based metric values were available and whose full text was available for calculating text-based metrics. Thirty articles were selected as primary articles, and the articles related to each of them were retrieved based on the MeSH similarity metric. The similarity of the retrieved documents was then extracted for both the text-based and the linked-based metrics. Next, the text-based, linked-based, and hybrid metrics were entered into the generalized regression model that estimates article similarity, in order to determine their power; finally, the performance of the models was compared based on the mean squared error and correlation. The results showed that, among the text-based metrics, the model retained both the Cosine and Lucene similarity metrics. Among the linked-based metrics, HITS (hub), HITS (authority), PageRank, and co-citation had the highest power, in that order, whereas bibliographic coupling and Amsler did not enter the model. Overall, the comparison of text-based, linked-based, and hybrid metrics showed that the linked-based model estimates the similarity between articles better than the text-based model, and that combining text-based and linked-based metrics brings little improvement in estimation power. Despite the importance and application of text-based and linked-based metrics for measuring the similarity of articles, no study was found that examines the power of these metrics, individually and in comparison with each other, in estimating the similarity of articles.
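The abstract names the pairwise metrics without defining them. The following is a minimal sketch of their textbook definitions on a small hypothetical citation graph: bibliographic coupling counts shared references, co-citation counts shared citing articles, and Amsler is taken here as the Jaccard overlap of each article's combined citing and cited neighbours (one common formulation, assumed rather than taken from the paper). Cosine similarity is computed on raw term-frequency vectors. The graph, texts, and function names are illustrative assumptions; the exact variants precomputed in CITREC may differ, and Lucene similarity, PageRank, and HITS are not reproduced.

```python
"""Minimal sketch of pairwise article-similarity metrics (illustrative only)."""
from collections import Counter
from math import sqrt

Graph = dict[str, set[str]]

# Hypothetical toy citation graph: article ID -> set of article IDs it cites.
CITES: Graph = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": set(),
    "D": set(),
    "E": set(),
    "F": {"A", "B"},
    "G": {"A"},
}


def references(graph: Graph, doc: str) -> set[str]:
    """Outgoing links: the articles that `doc` cites."""
    return graph.get(doc, set())


def citers(graph: Graph, doc: str) -> set[str]:
    """Incoming links: the articles that cite `doc`."""
    return {src for src, targets in graph.items() if doc in targets}


def bibliographic_coupling(graph: Graph, a: str, b: str) -> int:
    """Number of references that a and b have in common."""
    return len(references(graph, a) & references(graph, b))


def co_citation(graph: Graph, a: str, b: str) -> int:
    """Number of articles that cite both a and b."""
    return len(citers(graph, a) & citers(graph, b))


def amsler(graph: Graph, a: str, b: str) -> float:
    """Assumed Amsler variant: Jaccard overlap of citing-or-cited neighbourhoods."""
    na = references(graph, a) | citers(graph, a)
    nb = references(graph, b) | citers(graph, b)
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity of raw term-frequency vectors (lower-cased, whitespace-split)."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(bibliographic_coupling(CITES, "A", "B"))  # 2 (shared references C, D)
    print(co_citation(CITES, "A", "B"))  # 1 (only F cites both)
    print(amsler(CITES, "A", "B"))  # 0.6 (3 shared of 5 neighbours)
    print(cosine("citation analysis of articles", "text analysis of articles"))  # 0.75
```

On this toy graph, A and B share two references but only one citing article, which shows how coupling and co-citation can rank the same pair differently.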
Pages: 760-772
Number of pages: 13
Related papers (50 in total)
  • [1] Empirical comparison of text-based mobile apps similarity measurement techniques
    Al-Subaihin, Afnan
    Sarro, Federica
    Black, Sue
    Capra, Licia
    [J]. EMPIRICAL SOFTWARE ENGINEERING, 2019, 24 (06): 3290-3315
  • [2] Comparison of Text-Based and Feature-Based Semantic Similarity Between Android Apps
    Uddin, Md Kafil
    He, Qiang
    Han, Jun
    Chua, Caslon
    [J]. WEB INFORMATION SYSTEMS ENGINEERING, WISE 2020, PT I, 2020, 12342: 530-545
  • [3] Text-based Document Similarity Matching Using sdtext
    Shields, Clay
    [J]. PROCEEDINGS OF THE 49TH ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES (HICSS 2016), 2016: 5607-5616
  • [4] Evaluating text-based similarity measures for musical content
    Garay, A
    [J]. SECOND INTERNATIONAL CONFERENCE ON WEB DELIVERING OF MUSIC, PROCEEDINGS, 2002: 49-55
  • [5] Text-Based User-kNN: Measuring User Similarity Based on Text Reviews
    Terzi, Maria
    Rowe, Matthew
    Ferrario, Maria-Angela
    Whittle, Jon
    [J]. USER MODELING, ADAPTATION, AND PERSONALIZATION, UMAP 2014, 2014, 8538: 195-206
  • [6] Text-based informatics
    Valdes-Perez, RE
    [J]. SCIENTIST, 1998, 12 (14): 10
  • [7] Capturing Turn-by-Turn Lexical Similarity in Text-Based Communication
    Liebman, Noah
    Gergle, Darren
    [J]. ACM CONFERENCE ON COMPUTER-SUPPORTED COOPERATIVE WORK AND SOCIAL COMPUTING (CSCW 2016), 2016: 553-559
  • [8] LINGO-DL: a text-based approach for molecular similarity searching
    Abdo, Ammar
    Pupin, Maude
    [J]. JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2021, 35: 657-665
  • [9] Knowledge discovery through text-based similarity searches for astronomy literature
    Kerzendorf, Wolfgang E.
    [J]. JOURNAL OF ASTROPHYSICS AND ASTRONOMY, 2019, 40 (03)