The CNN-Corpus: A Large Textual Corpus for Single-Document Extractive Summarization

Cited by: 7
Authors
Lins, Rafael Dueire [1 ,2 ]
Oliveira, Hilario [1 ]
Cabral, Luciano [1 ]
Batista, Jamilson [1 ]
Tenorio, Bruno [1 ]
Ferreira, Rafael [2 ]
Lima, Rinaldo [2 ]
Pereira e Silva, Gabriel de Franca [2 ]
Simske, Steven J. [3 ]
Affiliations
[1] Univ Fed Pernambuco, Recife, PE, Brazil
[2] Univ Fed Rural Pernambuco, Recife, PE, Brazil
[3] Colorado State Univ, Ft Collins, CO 80523 USA
Keywords
Single-document Summarization; Corpus; Extractive Summarization; Multi-language Summarization
DOI
10.1145/3342558.3345388
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper details the features of the CNN-corpus, a test corpus for single-document extractive text summarization of news articles, and the methodology adopted in its construction. The current version of the CNN-corpus encompasses 3,000 texts in English, each of which has an abstractive and an extractive summary. The corpus allows quantitative and qualitative assessments of extractive summarization strategies.
Pages: 10