Using proportional transportation distances for measuring document similarity

Cited by: 0
Authors
Wan, Xiaojun [1 ]
Yang, Jianwu [1 ]
Affiliation
[1] Peking Univ, Inst Comp Sci & Technol, Beijing 100871, Peoples R China
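Source
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract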
This paper proposes a novel document similarity measure based on the Proportional Transportation Distance (PTD). The measure improves on the earlier similarity measure based on optimal matching by allowing many-to-many matching between the subtopics of documents. After each document is decomposed into a set of subtopics, the PTD is used to evaluate the similarity between the two documents' subtopic sets by solving a transportation problem. Experiments on TDT-3 data demonstrate that the measure is effective for measuring document similarity and highly robust, i.e., it depends much less on the underlying document decomposition algorithm than the optimal-matching-based measure does.
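
As a rough illustration of the transportation step described in the abstract, the sketch below computes a PTD between two weighted subtopic sets. It follows the usual PTD formulation, in which the second set's weights are rescaled so that both sets carry the same total mass, and the minimum transport cost is divided by that total. The function name `ptd`, the subtopic weights, and the ground-distance matrix `dist` are assumptions made for this example; the abstract does not specify how subtopics are weighted or compared. The solver is a generic successive-shortest-path min-cost flow chosen for self-containment, not necessarily the method used in the paper.

```python
from fractions import Fraction


def ptd(weights_a, weights_b, dist):
    """Proportional Transportation Distance between two weighted sets.

    weights_a[i], weights_b[j]: non-negative subtopic weights (hypothetical).
    dist[i][j]: ground distance between subtopic i of A and subtopic j of B.
    """
    m, n = len(weights_a), len(weights_b)
    supply = [Fraction(w) for w in weights_a]
    total_a = sum(supply)
    total_b = sum(Fraction(w) for w in weights_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both sets need positive total weight")
    # Rescale B so that its total weight equals A's total weight.
    demand = [Fraction(w) * total_a / total_b for w in weights_b]

    num_nodes = m + n + 2
    source, sink = m + n, m + n + 1
    edges = []                       # [target, residual capacity, cost]
    graph = [[] for _ in range(num_nodes)]

    def add_edge(u, v, cap, cost):
        # Forward edge at an even index k, its reverse edge at k ^ 1.
        graph[u].append(len(edges))
        edges.append([v, cap, cost])
        graph[v].append(len(edges))
        edges.append([u, Fraction(0), -cost])

    for i in range(m):
        add_edge(source, i, supply[i], Fraction(0))
        for j in range(n):
            add_edge(i, m + j, total_a, Fraction(dist[i][j]))
    for j in range(n):
        add_edge(m + j, sink, demand[j], Fraction(0))

    total_cost = Fraction(0)
    while True:
        # Bellman-Ford: cheapest source-to-sink path in the residual graph.
        best = [None] * num_nodes
        via = [-1] * num_nodes
        best[source] = Fraction(0)
        for _ in range(num_nodes - 1):
            changed = False
            for u in range(num_nodes):
                if best[u] is None:
                    continue
                for k in graph[u]:
                    v, cap, cost = edges[k]
                    if cap > 0 and (best[v] is None or best[u] + cost < best[v]):
                        best[v] = best[u] + cost
                        via[v] = k
                        changed = True
            if not changed:
                break
        if best[sink] is None:
            break                    # all mass has been shipped
        # Find the bottleneck capacity along the path, then push flow.
        push, v = None, sink
        while v != source:
            k = via[v]
            push = edges[k][1] if push is None else min(push, edges[k][1])
            v = edges[k ^ 1][0]
        v = sink
        while v != source:
            k = via[v]
            edges[k][1] -= push
            edges[k ^ 1][1] += push
            v = edges[k ^ 1][0]
        total_cost += push * best[sink]

    return float(total_cost / total_a)


# Two subtopics per document; B's weights are rescaled from (1, 1) to (2, 2).
print(ptd([3, 1], [1, 1], [[0, 1], [1, 0]]))  # 0.25
```

In this toy case the first subtopic of A splits its mass across both subtopics of B, which is the many-to-many matching that a one-to-one optimal matching cannot express. Exact rational arithmetic (`Fraction`) is used so that the flow balances exactly; a production implementation would more likely call a linear-programming or network-flow library.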
Pages: 25-36
Number of pages: 12