MCLS: A Large-Scale Multimodal Cross-Lingual Summarization Dataset

Author
Shi, Xiaorui [1]
Affiliation
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Source
Keywords
Multimodal Summarization; Cross-lingual Summarization; Knowledge Distillation
DOI
10.1007/978-981-99-6207-5_17
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Multimodal summarization, which aims to generate summaries from multimodal inputs, e.g., text and visual features, has attracted much attention in the research community. However, previous studies focus only on monolingual multimodal summarization and neglect the needs of non-native readers who want to understand cross-lingual news in practical applications. This motivates us to present a new task, Multimodal Cross-Lingual Summarization for news (MCLS), which generates cross-lingual summaries from multi-source information. To this end, we present a large-scale multimodal cross-lingual summarization dataset consisting of 1.1 million article-summary pairs with 3.4 million images in 44 × 43 language pairs. To generate a summary in any language, we propose a unified framework that jointly trains the multimodal monolingual and cross-lingual summarization tasks, with a bi-directional knowledge distillation approach designed to transfer knowledge between the two tasks. Extensive experiments in many-to-many settings show the effectiveness of the proposed model.
Pages: 273-288
Number of pages: 16