Learning bilingual word embedding for automatic text summarization in low resource language

Cited by: 2
Authors
Wijayanti, Rini [1, 3]
Khodra, Masayu Leylia [1, 2]
Surendro, Kridanto [1]
Widyantoro, Dwi H. [1, 2]
Affiliations
[1] Inst Teknol Bandung, Sch Elect Engn & Informat, Bandung, Indonesia
[2] Univ Ctr Excellence Artificial Intelligence Vis, Inst Teknol Bandung, Nat Language Proc & Big Data Analyt U CoE AI VLB, Bandung, Indonesia
[3] Inst Teknol Bandung, Sch Elect Engn & Informat, Bandung 40132, Indonesia
Keywords
Bilingual word embedding; Cross-lingual transfer learning; Extractive summarization; Low-resource language
DOI
10.1016/j.jksuci.2023.03.015
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Studies in low-resource languages have become more challenging with the increasing volume of texts in today's digital era. In addition, the lack of labeled data and text processing libraries for a language further widens the research gap between high-resource and low-resource languages, such as English and Indonesian. This has led to the use of transfer learning, which applies pre-trained models to solve similar problems, even across languages, by using bilingual or cross-lingual word embeddings. This study therefore investigates two bilingual word embedding methods, VecMap and BiVec, for the Indonesian-English language pair and evaluates them on bilingual lexicon induction and text summarization tasks. The generated bilingual embeddings were compared with MUSE (Multilingual Unsupervised and Supervised Embeddings), an existing multilingual word embedding created with the generative adversarial network method. Furthermore, VecMap was improved by creating shared vocabulary spaces and mapping the unshared ones between languages. The results showed that the embedding produced by BiVec, a joint method, performed better in intrinsic evaluation, especially with CSLS (Cross-Domain Similarity Local Scaling) retrieval. Meanwhile, the improved VecMap outperformed the regular version by 16.6% without surpassing the BiVec evaluation score. These methods enabled model transfer between languages when applied to cross-lingual text summarization. Moreover, with only 10% of the target-language training dataset added, the ROUGE score exceeded that of classical text summarization. (c) 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access
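The CSLS retrieval mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard CSLS definition, CSLS(x, y) = 2 cos(x, y) - r_T(x) - r_S(y), where r_T(x) is the mean cosine similarity of a source vector to its k nearest target neighbours and r_S(y) is the corresponding mean for a target vector. The function name `csls_retrieve`, the parameter `k`, and the toy vectors are hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
import numpy as np


def csls_retrieve(src, tgt, k=10):
    """Return, for each source row, the index of the best target row under CSLS.

    src: (n_src, d) source-language embeddings already mapped into the shared space.
    tgt: (n_tgt, d) target-language embeddings.
    k:   neighbourhood size (hypothetical default; capped at the vocabulary size).
    """
    # Length-normalise so that dot products are cosine similarities.
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sim = src @ tgt.T  # shape (n_src, n_tgt)

    k_tgt = min(k, sim.shape[1])
    k_src = min(k, sim.shape[0])
    # Mean similarity of each source word to its k nearest target words.
    r_src = np.sort(sim, axis=1)[:, -k_tgt:].mean(axis=1)
    # Mean similarity of each target word to its k nearest source words.
    r_tgt = np.sort(sim, axis=0)[-k_src:, :].mean(axis=0)

    # Penalise "hub" words that are close to everything.
    scores = 2 * sim - r_src[:, None] - r_tgt[None, :]
    return scores.argmax(axis=1), scores


# Toy usage: three perfectly aligned word pairs (illustrative data only).
toy_id = np.eye(3)  # stand-in for Indonesian vectors
toy_en = np.eye(3)  # stand-in for English vectors
best, scores = csls_retrieve(toy_id, toy_en, k=1)
```

Compared with plain nearest-neighbour retrieval on cosine similarity, the two subtracted terms lower the score of target words that sit close to many source words, which is the usual motivation for preferring CSLS in bilingual lexicon induction.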
Pages: 224-235
Number of pages: 12