Multi-Document Summarization by Information Distance

Cited by: 4
Authors
Long, Chong [1]
Huang, Minlie [1]
Zhu, Xiaoyan [1]
Li, Ming [2]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Tsinghua Natl Lab Informat Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing, Peoples R China
[2] Univ Waterloo, Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
Keywords
Data Mining; Text Mining; Kolmogorov Complexity; Information Distance;
DOI
10.1109/ICDM.2009.107
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Fast-changing knowledge on the Internet can be acquired more efficiently with the help of automatic document summarization and updating techniques. This paper describes a novel approach to multi-document update summarization. The best summary is defined as the one with the minimum information distance to the entire document set. The best update summary has the minimum conditional information distance to a document cluster, given that a prior document cluster has already been read. Experiments on the DUC 2007 and TAC 2008 datasets show that our method correlates closely with human summaries and outperforms other programs such as LexRank in many categories under the ROUGE evaluation criterion.
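The selection criterion in the abstract can be illustrated with a small sketch. Kolmogorov complexity is not computable, so this example swaps in zlib-compressed length as a rough proxy; that substitution is an assumption made here for illustration and is not taken from the paper, whose own estimator is not described in this record. The function names (`compressed_size`, `conditional_size`, `info_distance`, `best_summary`) and the exhaustive search over sentence combinations are likewise hypothetical.

```python
import zlib
from itertools import combinations


def compressed_size(text: str) -> int:
    """Bytes in the zlib-compressed text; a crude stand-in for K(text)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def conditional_size(x: str, given: str) -> int:
    """Approximate K(x | given) as C(given + x) - C(given)."""
    return compressed_size(given + x) - compressed_size(given)


def info_distance(x: str, y: str, prior: str = "") -> int:
    """Approximate max{K(x | y, prior), K(y | x, prior)}.

    With prior == "" this is the plain information distance; otherwise it
    is the conditional distance given text that has already been read.
    """
    return max(conditional_size(x, prior + y), conditional_size(y, prior + x))


def best_summary(sentences, documents, k, prior_documents=()):
    """Exhaustively pick the k candidate sentences closest to the documents.

    Assumption: candidates are whole sentences and the summary has exactly
    k of them; a real system would use a length budget and a faster search.
    """
    target = " ".join(documents)
    prior = " ".join(prior_documents)
    best = min(
        combinations(sentences, k),
        key=lambda combo: info_distance(" ".join(combo), target, prior),
    )
    return list(best)


if __name__ == "__main__":
    old_cluster = ["The storm reached the coast on Monday."]
    new_cluster = ["The storm reached the coast on Monday. "
                   "Thousands of homes lost power on Tuesday."]
    candidates = [
        "The storm reached the coast on Monday.",
        "Thousands of homes lost power on Tuesday.",
    ]
    # Plain summary of the new cluster, then an update summary that
    # conditions on the cluster assumed to have been read already.
    print(best_summary(candidates, new_cluster, 1))
    print(best_summary(candidates, new_cluster, 1, prior_documents=old_cluster))
```

Passing `prior_documents` switches from the plain distance to the conditional one, which mirrors the update-summarization setting described above. Compression-based estimates are noisy on short texts, so the sketch is only meant to show the shape of the objective.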
Pages: 866 / +
Page count: 2