The automatic construction of large-scale corpora for summarization research

Cited by: 35
Authors
Marcu, D [1]
Affiliation
[1] Univ So Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
DOI
10.1145/312624.312668
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Summarization research is notorious for its lack of adequate corpora: today, there exist only a few small collections of texts whose units have been manually annotated for textual importance. Given the cost and tediousness of the annotation process, it is very unlikely that we will ever manually annotate for textual importance sufficiently large corpora of texts. To circumvent this problem, we have developed an algorithm that constructs such corpora automatically. Our algorithm takes as input an (Abstract, Text) tuple and generates the corresponding Extract, i.e., the set of clauses (sentences) in the Text that were used to write the Abstract. The performance of the algorithm is shown to be close to that of humans by means of an empirical experiment. The experiment also suggests extraction strategies that could improve the performance of automatic summarization systems.
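The (Abstract, Text) to Extract mapping described above can be illustrated with a minimal sketch. The abstract does not say how the algorithm selects clauses, so the greedy word-overlap selection below is an assumption made for illustration only and should not be read as the paper's method; `bag_of_words`, `cosine`, and `build_extract` are hypothetical names.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a deliberately crude tokenizer (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_extract(abstract, sentences):
    """Greedily add sentences while similarity to the abstract keeps rising.

    Illustrative stand-in only: not the selection rule used in the paper.
    """
    target = bag_of_words(abstract)
    chosen = []        # indices of the sentences selected so far
    best_score = 0.0
    while True:
        best_index = None
        for i in range(len(sentences)):
            if i in chosen:
                continue
            candidate = " ".join(sentences[j] for j in sorted(chosen + [i]))
            score = cosine(bag_of_words(candidate), target)
            if score > best_score:
                best_score, best_index = score, i
        if best_index is None:   # no remaining sentence improves the match
            break
        chosen.append(best_index)
    return [sentences[i] for i in sorted(chosen)]


abstract = "The algorithm builds summarization corpora automatically."
sentences = [
    "Annotating text by hand is slow.",
    "Our algorithm builds corpora automatically.",
    "Lunch was served at noon.",
    "The corpora support summarization research.",
]
extract = build_extract(abstract, sentences)
```

In this toy input the sketch keeps the second and fourth sentences, in document order, and drops the two that share no vocabulary with the abstract.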
Pages: 137 - 144
Number of pages: 8
Related papers
50 in total
  • [1] The Research on Automatic Construction Techniques of Large-scale Corpus for Chinese Text Categorization
    Hu, Yan
    Wu, Wei
    Miao, Miao
    IEEC 2009: FIRST INTERNATIONAL SYMPOSIUM ON INFORMATION ENGINEERING AND ELECTRONIC COMMERCE, PROCEEDINGS, 2009, : 640 - 645
  • [2] DACSA: A large-scale Dataset for Automatic summarization of Catalan and Spanish newspaper Articles
    Segarra, Encarna
    Ahuir, Vicent
    Hurtado, Lluis-F
    Angel Gonzalez, Jose
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 5931 - 5943
  • [3] International large-scale vehicle corpora for research on driver behavior on the road
    Department of Media Science, Graduate School of Information Science, Nagoya University, Nagoya 464-8603, Japan
    Unknown (5 entries)
    IEEE Trans. Intell. Transp. Syst., 4 (1609-1623):
  • [4] International Large-Scale Vehicle Corpora for Research on Driver Behavior on the Road
    Takeda, Kazuya
    Hansen, John H. L.
    Boyraz, Pinar
    Malta, Lucas
    Miyajima, Chiyomi
    Abut, Huseyin
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2011, 12 (04) : 1609 - 1623
  • [5] Evaluation challenges in large-scale document summarization
    Radev, DR
    Teufel, S
    Saggion, H
    Lam, W
    Blitzer, J
    Qi, H
    41ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 2003, : 375 - 382
  • [6] Research on the A* Algorithm for Automatic Guided Vehicles in Large-Scale Maps
    Chen, Yuandong
    Pang, Jinhao
    Gou, Yuchen
    Lin, Zhiming
    Zheng, Shaofeng
    Chen, Dewang
    Applied Sciences (Switzerland), 2024, 14 (22):
  • [7] RNSum: A Large-Scale Dataset for Automatic Release Note Generation via Commit Logs Summarization
    Kamezawa, Hisashi
    Nishida, Noriki
    Shimizu, Nobuyuki
    Miyazaki, Takashi
    Nakayama, Hideki
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 8718 - 8735
  • [8] Research on construction of unified database of large-scale power grid
    Li, Fang
    Chen, Yong
    Zhang, Songsu
    Hu, Tao
    Dianwang Jishu/Power System Technology, 2013, 37 (02) : 417 - 424
  • [9] Research on Design Management of Large-scale Tobacco Construction Project
    Shan, Mei-di
    PROGRESS IN INDUSTRIAL AND CIVIL ENGINEERING II, PTS 1-4, 2013, 405-408 : 3468 - 3472
  • [10] Research on Schedule in Large-Scale Steel Structures' Construction Processing
    Sun, Jiusheng
    Zhang, Qing
    Liu, Ming
    ADVANCED SCIENCE LETTERS, 2011, 4 (6-7) : 2405 - 2408