Cross-lingual training of summarization systems using annotated corpora in a foreign language

Cited by: 0

Authors:

Marina Litvak

Mark Last

Affiliations:

[1] Sami Shamoon Academic College of Engineering

[2] Ben Gurion University of the Negev

Source:

Information Retrieval | 2013 / Volume 16, Issue 5

Keywords:

Multilingual summarization; Genetic Algorithm; Cross-lingual training

DOI:

Not available

Abstract:

The increasing trend of cross-border globalization and acculturation requires text summarization techniques to work equally well for multiple languages. However, only some of the automated summarization methods can be defined as “language-independent,” i.e., not based on any language-specific knowledge. Such methods can be used for multilingual summarization, defined in Mani (Automatic summarization. Natural language processing. John Benjamins Publishing Company, Amsterdam, 2001) as “processing several languages, with a summary in the same language as input”, but their performance is usually unsatisfactory due to the exclusion of language-specific knowledge. Moreover, supervised machine learning approaches need training corpora in multiple languages that are usually unavailable for rare languages, and their creation is a very expensive and labor-intensive process. In this article, we describe cross-lingual methods for training an extractive single-document text summarizer called MUSE (MUltilingual Sentence Extractor), a supervised approach based on the linear optimization of a rich set of sentence ranking measures using a Genetic Algorithm. We evaluated MUSE’s performance on documents in three different languages (English, Hebrew, and Arabic) using several training scenarios. The summarization quality was measured using ROUGE-1 and ROUGE-2 Recall metrics. The results of the extensive comparative analysis showed that the performance of MUSE was better than that of the best known multilingual approach (TextRank) in all three languages. Moreover, our experimental results suggest that using the same sentence ranking model across languages results in a reasonable summarization quality, while saving considerable annotation effort for the end-user. On the other hand, using parallel corpora generated by machine translation tools may improve the performance of a MUSE model trained on a foreign language. Comparative evaluation of an alternative optimization technique, Multiple Linear Regression, justifies the use of a Genetic Algorithm.
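
The approach outlined in the abstract (score each sentence by a weighted linear combination of ranking measures, then search for the weights with a Genetic Algorithm that maximizes ROUGE recall) can be illustrated with a toy sketch. This is not the MUSE implementation: the two feature columns, the single-document training set, the population size, the crossover and mutation operators, and all function names below are assumptions chosen for illustration only.

```python
import random
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Clipped unigram overlap divided by the number of reference unigrams."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / sum(ref.values())


def extract(sentences, features, weights, k=1):
    """Score sentences by a weighted sum of features; return the top k in document order."""
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in features]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return " ".join(sentences[i] for i in sorted(top))


def evolve_weights(sentences, features, gold, n_feats, pop=20, gens=30, seed=0):
    """Toy GA (assumed operators): keep the fitter half, refill by crossover plus mutation."""
    rng = random.Random(seed)

    def fitness(w):
        return rouge1_recall(extract(sentences, features, w), gold)

    population = [[rng.uniform(-1, 1) for _ in range(n_feats)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop // 2]           # elitist selection
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child[rng.randrange(n_feats)] += rng.gauss(0, 0.1)  # point mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Hypothetical toy data: two made-up feature values per sentence.
sentences = ["the cat sat on the mat", "stocks fell sharply on monday", "a dog barked"]
features = [[1.0, 0.2], [0.1, 0.9], [0.5, 0.1]]
gold = "stocks fell on monday"

best = evolve_weights(sentences, features, gold, n_feats=2)
summary = extract(sentences, features, best)
print(best, summary, rouge1_recall(summary, gold))
```

In a cross-lingual setting of the kind the abstract describes, the weight vector learned on an annotated corpus in one language would be reused unchanged to rank sentences in another, which only makes sense if the features themselves are language-independent.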

Pages: 629-656

Number of pages: 27

