The automatic construction of large-scale corpora for summarization research

Cited by: 35
Author
Marcu, D [1 ]
Affiliation
[1] Univ So Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
Keywords
DOI
10.1145/312624.312668
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Summarization research is notorious for its lack of adequate corpora: today, there exist only a few small collections of texts whose units have been manually annotated for textual importance. Given the cost and tediousness of the annotation process, it is very unlikely that we will ever manually annotate for textual importance sufficiently large corpora of texts. To circumvent this problem, we have developed an algorithm that constructs such corpora automatically. Our algorithm takes as input an (Abstract, Text) tuple and generates the corresponding Extract, i.e., the set of clauses (sentences) in the Text that were used to write the Abstract. The performance of the algorithm is shown to be close to that of humans by means of an empirical experiment. The experiment also suggests extraction strategies that could improve the performance of automatic summarization systems.
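The abstract gives only the algorithm's interface, an (Abstract, Text) pair in and an Extract out, and does not state the selection criterion. The following is a minimal sketch of one way such a mapping could work, assuming a greedy strategy that drops sentences from the Text for as long as doing so raises the bag-of-words cosine similarity to the Abstract. The criterion, the tokenizer, and the names `bag_of_words`, `cosine`, and `build_extract` are illustrative assumptions, not details taken from the paper.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a deliberately simple tokenizer (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_extract(abstract, sentences):
    """Greedily drop sentences while that raises similarity to the abstract.

    Hypothetical criterion: the source abstract does not specify one.
    """
    target = bag_of_words(abstract)
    kept = list(range(len(sentences)))

    def score(indices):
        joined = " ".join(sentences[i] for i in indices)
        return cosine(target, bag_of_words(joined))

    best = score(kept)
    while len(kept) > 1:
        # Score every single-sentence deletion and pick the best one.
        candidates = [(score([j for j in kept if j != i]), i) for i in kept]
        new_best, drop = max(candidates)
        if new_best <= best:
            break  # no deletion improves the match; stop
        best = new_best
        kept.remove(drop)
    return [sentences[i] for i in kept]
```

Called with an abstract and a list of the text's sentences, `build_extract` returns the surviving sentences in their original order. A clause-level variant would pass clauses instead of sentences.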
Pages: 137-144
Page count: 8