Using proportional transportation distances for measuring document similarity

Cited by: 0
Authors
Wan, Xiaojun [1]
Yang, Jianwu [1]
Affiliation
[1] Peking Univ, Inst Comp Sci & Technol, Beijing 100871, Peoples R China
Source
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a novel document similarity measure based on the Proportional Transportation Distance (PTD). The measure improves on an earlier similarity measure based on optimal matching by allowing many-to-many matching between the subtopics of documents. Each document is first decomposed into a set of subtopics; the PTD then evaluates the similarity between the two documents' subtopic sets by solving a transportation problem. Experiments on TDT-3 data show that the measure is effective for measuring document similarity and highly robust, i.e., it depends much less on the underlying document decomposition algorithm than the optimal-matching-based measure does.
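
To make the transportation step concrete, here is a minimal sketch of a PTD computation between two weighted subtopic sets. It is an illustration under stated assumptions, not the authors' implementation: the function name `ptd`, the subtopic weights, and the ground-distance matrix `dist` are hypothetical, and the abstract does not say how subtopic weights or subtopic-to-subtopic distances are defined. The sketch follows the usual PTD formulation, in which both weight sets are rescaled to the same total mass before a balanced transportation problem is solved (here with a small successive-shortest-path min-cost flow).

```python
def ptd(weights_a, weights_b, dist, eps=1e-12):
    """Proportional Transportation Distance between two weighted sets.

    weights_a, weights_b: positive subtopic weights (hypothetical inputs).
    dist[i][j]: assumed ground distance between subtopic i of A and j of B.
    """
    inf = float("inf")
    total_a, total_b = sum(weights_a), sum(weights_b)
    # Rescale both sides to total mass 1 so the problem is balanced.
    supply = [w / total_a for w in weights_a]
    demand = [w / total_b for w in weights_b]
    m, n = len(supply), len(demand)

    # Nodes: 0 = source, 1..m = subtopics of A, m+1..m+n = subtopics of B.
    source, sink = 0, m + n + 1
    graph = [[] for _ in range(m + n + 2)]
    edges = []  # [target, residual capacity, cost]; edge k ^ 1 is its reverse

    def add_edge(u, v, cap, cost):
        graph[u].append(len(edges))
        edges.append([v, cap, cost])
        graph[v].append(len(edges))
        edges.append([u, 0.0, -cost])

    for i in range(m):
        add_edge(source, 1 + i, supply[i], 0.0)
        for j in range(n):
            # Any subtopic of A may send mass to any subtopic of B
            # (many-to-many matching).
            add_edge(1 + i, 1 + m + j, inf, dist[i][j])
    for j in range(n):
        add_edge(1 + m + j, sink, demand[j], 0.0)

    total_cost = 0.0
    while True:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        best = [inf] * len(graph)
        prev_edge = [-1] * len(graph)
        best[source] = 0.0
        for _ in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if best[u] == inf:
                    continue
                for k in graph[u]:
                    v, cap, cost = edges[k]
                    if cap > eps and best[u] + cost < best[v] - eps:
                        best[v] = best[u] + cost
                        prev_edge[v] = k
                        changed = True
            if not changed:
                break
        if prev_edge[sink] == -1:
            break  # no augmenting path left: all mass has been shipped

        # Find the bottleneck capacity along the path, then push that flow.
        push, v = inf, sink
        while v != source:
            k = prev_edge[v]
            push = min(push, edges[k][1])
            v = edges[k ^ 1][0]
        v = sink
        while v != source:
            k = prev_edge[v]
            edges[k][1] -= push
            edges[k ^ 1][1] += push
            v = edges[k ^ 1][0]
        total_cost += push * best[sink]
    return total_cost


# Toy example: two subtopics per document, 0/1 ground distances.
example = ptd([2.0, 1.0], [1.0, 1.0], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy example, one sixth of the normalized mass has to move between non-matching subtopics, so the result is about 1/6. Because both sides are normalized, multiplying every weight of one document by a constant leaves the distance unchanged. Turning the distance into a similarity score would need a further step that the abstract does not specify.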
Pages: 25-36
Page count: 12