Two New Datasets for Italian-Language Abstractive Text Summarization

Cited by: 0
Authors
Landro, Nicola [1, 2, 5]
Gallo, Ignazio [1, 5]
La Grassa, Riccardo [3 ]
Federici, Edoardo [4 ]
Affiliations
[1] Univ Insubria, Dept Theoret & Appl Sci DISTA, Via JH Dunant 3, I-21100 Varese, Italy
[2] AIKnowYOU Srl, I-21029 Varese, Italy
[3] INAF Osservatorio Astron Padova, I-35122 Padua, Italy
[4] Digitiamo Srl, I-21100 Varese, Italy
[5] Univ Insubria, Dipartimento Sci Teor & Applicate DISTA, Via O Rossi 9, I-21100 Varese, Italy
Keywords
abstractive text summarization; datasets; deep learning
DOI
10.3390/info13050228
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Text summarization aims to produce a short summary containing the relevant parts of a given text. Because data for abstractive summarization is scarce in low-resource languages such as Italian, we propose two new original datasets, collected from two Italian news websites and consisting of multi-sentence summaries with their corresponding articles, together with a dataset obtained by machine translation of a Spanish summarization dataset. The two original datasets are currently the only ones available in Italian for this task. To evaluate their quality, we used them to train a T5-base model and an mBART model, obtaining good results with both. As a further comparison, we also trained the same models on automatically translated datasets and compared summaries generated directly in the training language with automatically translated summaries; this comparison demonstrated the superiority of the models obtained from the proposed datasets.
Pages: 12