Using proportional transportation distances for measuring document similarity

Cited by: 0
Authors
Wan, Xiaojun [1]
Yang, Jianwu [1]
Affiliation
[1] Peking Univ, Inst Comp Sci & Technol, Beijing 100871, Peoples R China
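Source
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract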
This paper proposes a novel document similarity measure based on the Proportional Transportation Distance (PTD). The measure improves on the earlier similarity measure based on optimal matching by allowing many-to-many matching between the subtopics of documents. Each document is first decomposed into a set of subtopics; the PTD is then used to evaluate the similarity between the two documents' subtopic sets by solving a transportation problem. Experiments on TDT-3 data demonstrate that the measure is effective for measuring document similarity and highly robust, i.e., unlike the optimal-matching-based measure, it does not depend heavily on the underlying document decomposition algorithm.
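The transportation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the commonly cited definition of the PTD, in which both weight sets are normalized to total mass 1 and the minimum-cost flow between them is computed. The function name `ptd`, the subtopic weights, and the ground-distance matrix `dist` are hypothetical inputs chosen for illustration; the record does not say how the paper derives them. The balanced transportation problem is solved here with a plain successive-shortest-path min-cost flow over exact fractions, which is adequate only for small subtopic sets.

```python
from fractions import Fraction


def ptd(weights_a, weights_b, dist):
    """Proportional Transportation Distance between two weighted sets.

    weights_a[i], weights_b[j]: non-negative subtopic weights (hypothetical).
    dist[i][j]: ground distance between subtopic i of A and subtopic j of B.
    """
    m, n = len(weights_a), len(weights_b)
    total_a = sum(Fraction(w) for w in weights_a)
    total_b = sum(Fraction(w) for w in weights_b)
    # Normalize both sides to total mass 1 (the "proportional" part).
    supply = [Fraction(w) / total_a for w in weights_a]
    demand = [Fraction(w) / total_b for w in weights_b]

    source, sink = m + n, m + n + 1
    graph = [[] for _ in range(m + n + 2)]  # node -> list of edge indices
    edges = []  # [target, residual capacity, cost]; edge k ^ 1 is the reverse of k

    def add_edge(u, v, cap, cost):
        graph[u].append(len(edges))
        edges.append([v, cap, cost])
        graph[v].append(len(edges))
        edges.append([u, Fraction(0), -cost])

    for i in range(m):
        add_edge(source, i, supply[i], Fraction(0))
        for j in range(n):
            # Capacity 1 is effectively unbounded: the total mass is 1.
            add_edge(i, m + j, Fraction(1), Fraction(dist[i][j]))
    for j in range(n):
        add_edge(m + j, sink, demand[j], Fraction(0))

    total_cost = Fraction(0)
    while True:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        best, via = {source: Fraction(0)}, {}
        for _ in range(m + n + 1):
            changed = False
            for u in list(best):
                for k in graph[u]:
                    v, cap, cost = edges[k]
                    if cap > 0 and (v not in best or best[u] + cost < best[v]):
                        best[v], via[v] = best[u] + cost, k
                        changed = True
            if not changed:
                break
        if sink not in best:
            break  # all mass has been shipped
        # Walk back from the sink to collect the path, then push the bottleneck.
        path, v = [], sink
        while v != source:
            path.append(via[v])
            v = edges[via[v] ^ 1][0]
        push = min(edges[k][1] for k in path)
        for k in path:
            edges[k][1] -= push
            edges[k ^ 1][1] += push
            total_cost += push * edges[k][2]
    return float(total_cost)


# One subtopic in document A, two equally weighted subtopics in document B:
# half of A's mass travels distance 0 and half travels distance 1.
print(ptd([1], [1, 1], [[0, 1]]))  # 0.5
```

Because the weights are normalized first, scaling all weights of either document by a constant leaves the result unchanged, and a single subtopic on one side can be matched to several on the other, which is the many-to-many behaviour the abstract contrasts with optimal matching.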
Pages: 25 - 36
Number of pages: 12