Unsupervised extractive multi-document summarization method based on transfer learning from BERT multi-task fine-tuning

Cited by: 21
Authors
Lamsiyah, Salima [1 ]
El Mahdaouy, Abdelkader [3 ]
Ouatik, Said El Alaoui [1 ,2 ]
Espinasse, Bernard [4 ]
Affiliations
[1] Sidi Mohamed Ben Abdellah Univ, FSDM, Lab Informat Signals Automat & Cognitivism, BP 1796, Fez Atlas 30003, Morocco
[2] Ibn Tofail Univ, Natl Sch Appl Sci, Lab Engn Sci, Kenitra, Morocco
[3] Mohammed VI Polytech Univ UM6P, Sch Comp Sci UM6P CS, Ben Guerir, Morocco
[4] Univ Toulon & Var, Aix Marseille Univ, CNRS, LIS, UMR 7020, Toulon, France
Keywords
BERT fine-tuning; multi-document summarization; multi-task learning; sentence representation learning; transfer learning; sentence scoring techniques
DOI
10.1177/0165551521990616
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Text representation is a cornerstone that impacts the effectiveness of many text summarization methods. Transfer learning using pre-trained word embedding models has shown promising results. However, most of these representations do not consider the order of words or the semantic relationships between them, and thus they fail to capture the meaning of a full sentence. To overcome this issue, the current study proposes an unsupervised method for extractive multi-document summarization based on transfer learning from a BERT sentence embedding model. Moreover, to improve sentence representation learning, we fine-tune the BERT model on supervised intermediate tasks from the GLUE benchmark, using both single-task and multi-task fine-tuning. Experiments are performed on the standard DUC'2002-2004 datasets. The results show that our method significantly outperforms several baseline methods and performs comparably to, and sometimes better than, recent state-of-the-art deep learning-based methods. Furthermore, fine-tuning BERT with multi-task learning considerably improves performance.
Pages: 164-182
Page count: 19
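
To make the pipeline described in the abstract concrete, the following is a minimal sketch of one common unsupervised, centroid-based realization: embed every sentence with a BERT-style sentence encoder, score each sentence by cosine similarity to the centroid of the document cluster, and greedily extract the top sentences while filtering near-duplicates. The sentence-transformers library, the all-MiniLM-L6-v2 checkpoint, the centroid scoring rule, and the 0.95 redundancy threshold are illustrative assumptions standing in for the authors' GLUE-fine-tuned BERT model, not the published configuration.

import numpy as np
from sentence_transformers import SentenceTransformer

def summarize(sentences, budget=5, redundancy=0.95):
    """Pick up to `budget` sentences close to the cluster centroid,
    skipping candidates that nearly duplicate an earlier pick."""
    # Stand-in encoder; the paper instead fine-tunes BERT on GLUE tasks.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embs = model.encode(sentences, normalize_embeddings=True)  # (n, d), unit norm
    centroid = embs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    scores = embs @ centroid  # cosine similarity of each sentence to the centroid

    chosen = []
    for i in np.argsort(-scores):  # highest-scoring sentences first
        if any(embs[i] @ embs[j] > redundancy for j in chosen):
            continue  # near-duplicate of an already selected sentence
        chosen.append(i)
        if len(chosen) == budget:
            break
    return [sentences[i] for i in sorted(chosen)]  # restore document order

docs = [
    "BERT produces contextual representations of text.",
    "Contextual sentence embeddings capture word order and semantics.",
    "Static word vectors ignore the order of words in a sentence.",
    "Centroid-based methods score sentences against the cluster centroid.",
    "The weather in Fez was pleasant last spring.",
]
print("\n".join(summarize(docs, budget=2)))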