An Arabic Multi-source News Corpus: Experimenting on Single-document Extractive Summarization

Cited by: 9
Authors
Chouigui, Amina [1]
Ben Khiroun, Oussama [1, 2]
Elayeb, Bilel [1, 3]
Affiliations
[1] Manouba Univ, RIADI Res Lab, ENSI, Manouba, Tunisia
[2] Univ Carthage, Fac Econ & Management Nabeul, Tunis, Tunisia
[3] Emirates Coll Technol, Abu Dhabi, U Arab Emirates
Keywords
Automatic text summarization; Arabic corpus; RSS crawler; TREC format; Language-independent summarizer
DOI
10.1007/s13369-020-05258-z
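Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]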
Discipline classification code
07; 0710; 09
Abstract
Automatic text summarization is considered an important task in several areas of natural language processing, such as information retrieval. It is the process of automatically generating a condensed representation of a text. Text summarization can help address the problem of information overload: given the large amount of information available on the Internet, presenting a document by its summary helps users find the most relevant search results. In this paper, we propose a new, free, structured Arabic corpus in the standard XML TREC format. ANT corpus v2.1 is collected using RSS feeds from different news sources. The corpus is useful for multiple text mining purposes, such as generic text summarization, clustering, and classification. We test the corpus on unsupervised single-document extractive summarization using statistical and graph-based language-independent summarizers such as LexRank, TextRank, Luhn and LSA. We investigate the sensitivity of the summarization process to the stemming and stop-word removal steps. We evaluate the summarizers' performance by comparing the extracted text fragments to the abstracts in ANT corpus v2.1 using the ROUGE and BLEU metrics. Experimental results show that the LexRank summarizer achieves the best ROUGE scores in the stop-word removal scenario.
Pages: 3925-3938
Number of pages: 14