An Arabic Multi-source News Corpus: Experimenting on Single-document Extractive Summarization

Cited by: 9
Authors
Chouigui, Amina [1]
Ben Khiroun, Oussama [1, 2]
Elayeb, Bilel [1, 3]
Affiliations
[1] Manouba Univ, RIADI Res Lab, ENSI, Manouba, Tunisia
[2] Univ Carthage, Fac Econ & Management Nabeul, Tunis, Tunisia
[3] Emirates Coll Technol, Abu Dhabi, U Arab Emirates
Keywords
Automatic text summarization; Arabic corpus; RSS crawler; TREC format; Language-independent summarizer;
DOI
10.1007/s13369-020-05258-z
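Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];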
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Automatic text summarization is considered an important task in various fields of natural language processing, such as information retrieval. It is the process of automatically generating a representation of a text. Text summarization can be a solution to the problem of information overload: given the large amount of information available on the Internet, presenting a document by its summary helps to get the most relevant result of a search. In this paper, we propose a new free structured Arabic corpus in the standard XML TREC format. ANT corpus v2.1 is collected using RSS feeds from different news sources. This corpus is useful for multiple text mining purposes such as generic text summarization, clustering or classification. We test this corpus on unsupervised single-document extractive summarization using statistical and graph-based language-independent summarizers such as LexRank, TextRank, Luhn and LSA. We investigate the sensitivity of the summarization process to the stemming and stop word removal steps. We evaluate the performance of these summarizers by comparing the extracted text fragments to the abstracts existing in ANT corpus v2.1 using the ROUGE and BLEU metrics. Experimental results show that the LexRank summarizer achieved the best ROUGE scores in the stop word removal scenario.
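As a rough illustration of the evaluation described in the abstract, the sketch below computes a simplified ROUGE-1 score (unigram overlap) between an extracted summary and a reference abstract, with optional stop word removal. It is not the official ROUGE toolkit and not the authors' setup: the function name `rouge_1`, the whitespace tokenization and the `stop_words` parameter are assumptions made here for illustration only.

```python
from collections import Counter


def rouge_1(candidate, reference, stop_words=frozenset()):
    """Simplified ROUGE-1: returns (recall, precision, F1) on whitespace tokens.

    Hypothetical helper for illustration; real evaluations normally use an
    established ROUGE implementation and language-aware tokenization.
    """
    # Tokenize on whitespace and optionally drop stop words.
    cand = [t for t in candidate.lower().split() if t not in stop_words]
    ref = [t for t in reference.lower().split() if t not in stop_words]
    if not cand or not ref:
        return 0.0, 0.0, 0.0

    # Clipped unigram overlap: each token counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((Counter(cand) & Counter(ref)).values())

    recall = overlap / len(ref)
    precision = overlap / len(cand)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return recall, precision, f1


if __name__ == "__main__":
    summary = "the cat sat on the mat"
    abstract = "the cat is on the mat"
    print(rouge_1(summary, abstract))
    print(rouge_1(summary, abstract, stop_words={"the", "on", "is"}))
```

Removing stop words changes which tokens contribute to the overlap, which is one reason scores can differ between preprocessing scenarios such as those compared in the abstract.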
Pages: 3925-3938
Page count: 14
Related papers
50 in total
  • [1] An Arabic Multi-source News Corpus: Experimenting on Single-document Extractive Summarization
    Amina Chouigui
    Oussama Ben Khiroun
    Bilel Elayeb
    [J]. Arabian Journal for Science and Engineering, 2021, 46 : 3925 - 3938
  • [2] The CNN-Corpus: A Large Textual Corpus for Single-Document Extractive Summarization
    Lins, Rafael Dueire
    Oliveira, Hilario
    Cabral, Luciano
    Batista, Jamilson
    Tenorio, Bruno
    Ferreira, Rafael
    Lima, Rinaldo
    Pereira e Silva, Gabriel de Franca
    Simske, Steven J.
    [J]. DOCENG'19: PROCEEDINGS OF THE ACM SYMPOSIUM ON DOCUMENT ENGINEERING 2019, 2019,
  • [3] Features in extractive supervised single-document summarization: case of Persian news
    Rezaei, Hosein
    Mirhosseini, Seyed Amid Moeinzadeh
    Shahgholian, Azar
    Saraee, Mohamad
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2024,
  • [4] Analyzing Preprocessing Settings for Urdu Single-document Extractive Summarization
    Humayoun, Muhammad
    Yu, Hwanjo
    [J]. LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 3686 - 3693
  • [5] Abstractive Multi-Document Summarization via Joint Learning with Single-Document Summarization
    Jin, Hanqi
    Wan, Xiaojun
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 2545 - 2554
  • [6] Extractive single-document summarization using adaptive binary constrained multi-objective differential evaluation
    Debnath, Dipanwita
    Das, Ranjita
    Pakray, Partha
    Laskar, Ruzina
    [J]. INNOVATIONS IN SYSTEMS AND SOFTWARE ENGINEERING, 2022,
  • [7] Extractive single-document summarization based on genetic operators and guided local search
    Mendoza, Martha
    Bonilla, Susana
    Noguera, Clara
    Cobos, Carlos
    Leon, Elizabeth
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (09) : 4158 - 4169
  • [8] Arabic Single-Document Text Summarization Using Particle Swarm Optimization Algorithm
    Al-Abdallah, Raed Z.
    Al-Taani, Ahmad T.
    [J]. ARABIC COMPUTATIONAL LINGUISTICS (ACLING 2017), 2017, 117 : 30 - 37
  • [9] Single-document and multi-document summarization techniques for email threads using sentence compression
    Zajic, David M.
    Dorr, Bonnie J.
    Lin, Jimmy
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2008, 44 (04) : 1600 - 1610
  • [10] Extractive Single-Document Summarization Based on Global-Best Harmony Search and a Greedy Local Optimizer
    Mendoza, Martha
    Cobos, Carlos
    Leon, Elizabeth
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE AND ITS APPLICATIONS, MICAI 2015, PT II, 2015, 9414 : 52 - 66