Extracting shared topics of multiple documents

Cited by: 0
Authors
Ji, X [1]
Zha, HY [1]
Affiliations
[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present a weighted-graph-based method that simultaneously compares the textual content of two or more documents and extracts their shared (sub)topics, if any. A set of documents is modelled as a set of pairwise weighted bipartite graphs. A generalized mutual reinforcement principle is applied to these graphs to calculate the saliency scores of the sentences in each document. Sentences with high saliency scores are selected, and together they convey the dominant shared topic. If the documents share more than one subtopic, a spectral min-max cut algorithm can be used to partition a derived sentence similarity graph into several subgraphs. If every document contributes some sentences (nodes) to a subgraph, then those sentences may convey a shared subtopic. The generalized mutual reinforcement principle is then applied to them to verify and extract the shared subtopic.
Pages: 100 - 110
Page count: 11
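
The mutual reinforcement step described in the abstract can be illustrated for the simplest case of two documents. The sketch below is not the authors' implementation. It assumes whitespace tokenization and cosine similarity between bag-of-words vectors as edge weights (the abstract specifies neither), and the function names `cosine` and `saliency` are hypothetical. It scores each sentence by how strongly it links to high-scoring sentences in the other document, iterating until the scores settle.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def saliency(doc_a, doc_b, iters=100):
    """Mutual reinforcement on one weighted bipartite graph.

    doc_a, doc_b: lists of sentences (strings).
    Returns (u, v): saliency scores for the sentences of doc_a and doc_b.
    """
    bags_a = [Counter(s.lower().split()) for s in doc_a]
    bags_b = [Counter(s.lower().split()) for s in doc_b]
    # Edge weights: assumed here to be cosine similarity between sentences.
    w = [[cosine(x, y) for y in bags_b] for x in bags_a]

    u = [1.0] * len(doc_a)
    v = [1.0] * len(doc_b)
    for _ in range(iters):
        # A sentence in A is salient if it links to salient sentences in B ...
        u = [sum(w[i][j] * v[j] for j in range(len(v))) for i in range(len(u))]
        # ... and a sentence in B is salient if it links to salient ones in A.
        v = [sum(w[i][j] * u[i] for i in range(len(u))) for j in range(len(v))]
        # Rescale to unit length so the scores stay bounded.
        nu = math.sqrt(sum(x * x for x in u)) or 1.0
        nv = math.sqrt(sum(x * x for x in v)) or 1.0
        u = [x / nu for x in u]
        v = [x / nv for x in v]
    return u, v
```

Calling `saliency(doc_a, doc_b)` with two lists of sentences returns one score per sentence; the top-scoring sentences in each list would be the candidates for the dominant shared topic. Extending this to more than two documents, and the spectral min-max cut step for multiple subtopics, are not covered by this sketch.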