Cross-lingual extreme summarization of scholarly documents

被引:3
|
作者
Takeshita, Sotaro [1 ]
Green, Tommaso [1 ]
Friedrich, Niklas [1 ]
Eckert, Kai [2 ]
Ponzetto, Simone Paolo [1 ]
机构
[1] Univ Mannheim, Data & Web Sci Grp, Mannheim, Germany
[2] Mannheim Univ Appl Sci, Dept Comp Sci, Mannheim, Germany
关键词
Scholarly document processing; Summarization; Multilinguality;
D O I
10.1007/s00799-023-00373-2
中图分类号
G25 [图书馆学、图书馆事业]; G35 [情报学、情报工作];
学科分类号
1205 ; 120501 ;
摘要
The number of scientific publications nowadays is rapidly increasing, causing information overload for researchers and making it hard for scholars to keep up to date with current trends and lines of work. Recent work has tried to address this problem by developing methods for automated summarization in the scholarly domain, but concentrated so far only on monolingual settings, primarily English. In this paper, we consequently explore how state-of-the-art neural abstract summarization models based on a multilingual encoder-decoder architecture can be used to enable cross-lingual extreme summaries of scholarly texts. To this end, we compile a new abstractive cross-lingual summarization dataset for the scholarly domain in four different languages, which enables us to train and evaluate models that process English papers and generate summaries in German, Italian, Chinese and Japanese. We present our new X-SCITLDR dataset for multilingual summarization and thoroughly benchmark different models based on a state-of-the-art multilingual pre-trained model, including a two-stage pipeline approach that independently summarizes and translates, as well as a direct cross-lingual model. We additionally explore the benefits of intermediate-stage training using English monolingual summarization and machine translation as intermediate tasks and analyze performance in zero- and few-shot scenarios. Finally, we investigate how to make our approach more efficient on the basis of knowledge distillation methods, which make it possible to shrink the size of our models, so as to reduce the computational complexity of the summarization inference.
引用
收藏
页码:249 / 271
页数:23
相关论文
共 50 条
  • [1] X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents
    Takeshita, Sotaro
    Green, Tommaso
    Friedrich, Niklas
    Eckert, Kai
    Ponzetto, Simone Paolo
    [J]. 2022 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL), 2022,
  • [2] Cross-lingual timeline summarization
    Cagliero, Luca
    La Quatra, Moreno
    Garza, Paolo
    Baralis, Elena
    [J]. 2021 IEEE FOURTH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND KNOWLEDGE ENGINEERING (AIKE 2021), 2021, : 45 - 53
  • [3] A Survey on Cross-Lingual Summarization
    Wang, Jiaan
    Meng, Fandong
    Zheng, Duo
    Liang, Yunlong
    Li, Zhixu
    Qu, Jianfeng
    Zhou, Jie
    [J]. TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2022, 10 : 1304 - 1323
  • [4] NCLS: Neural Cross-Lingual Summarization
    Zhu, Junnan
    Wang, Qian
    Wang, Yining
    Zhou, Yu
    Zhang, Jiajun
    Wang, Shaonan
    Zong, Chengqing
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 3054 - 3064
  • [5] Review of Research on Cross-Lingual Summarization
    Zheng, Bofei
    Yun, Jing
    Liu, Limin
    Jiao, Lei
    Yuan, Jingshu
    [J]. Computer Engineering and Applications, 2023, 59 (13) : 49 - 60
  • [6] Cross-Lingual Speech-to-Text Summarization
    Pontes, Elvys Linhares
    Gonzalez-Gallardo, Carlos-Emiliano
    Torres-Moreno, Juan-Manuel
    Huet, Stephane
    [J]. MULTIMEDIA AND NETWORK INFORMATION SYSTEMS, 2019, 833 : 385 - 395
  • [7] Towards Unifying Multi-Lingual and Cross-Lingual Summarization
    Wang, Jiaan
    Meng, Fandong
    Zheng, Duo
    Liang, Yunlong
    Li, Zhixu
    Qu, Jianfeng
    Zhou, Jie
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 15127 - 15143
  • [8] A Robust Abstractive System for Cross-Lingual Summarization
    Ouyang, Jessica
    Song, Boya
    McKeown, Kathleen
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 2025 - 2031
  • [9] A Cross-Lingual Summarization method based on cross-lingual Fact-relationship Graph Generation
    Zhang, Yongbing
    Gao, Shengxiang
    Huang, Yuxin
    Tan, Kaiwen
    Yu, Zhengtao
    [J]. PATTERN RECOGNITION, 2024, 146
  • [10] Mixed-Lingual Pre-training for Cross-lingual Summarization
    Xu, Ruochen
    Zhu, Chenguang
    Shi, Yu
    Zeng, Michael
    Huang, Xuedong
    [J]. 1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020), 2020, : 536 - 541