Learning bilingual word embedding for automatic text summarization in low resource language

Cited by: 2
Authors
Wijayanti, Rini [1 ,3 ]
Khodra, Masayu Leylia [1 ,2 ]
Surendro, Kridanto [1 ]
Widyantoro, Dwi H. [1 ,2 ]
Affiliations
[1] Inst Teknol Bandung, Sch Elect Engn & Informat, Bandung, Indonesia
[2] Univ Ctr Excellence Artificial Intelligence Vis, Inst Teknol Bandung, Nat Language Proc & Big Data Analyt U CoE AI VLB, Bandung, Indonesia
[3] Inst Teknol Bandung, Sch Elect Engn & Informat, Bandung 40132, Indonesia
Keywords
Bilingual word embedding; Cross-lingual transfer learning; Extractive summarization; Low-resource language
DOI
10.1016/j.jksuci.2023.03.015
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Studies in low-resource languages have become more challenging with the increasing volume of text in today's digital era. The lack of labeled data and text-processing libraries for a language further widens the research gap between high- and low-resource languages, such as English and Indonesian. This has motivated a transfer learning approach, which applies pre-trained models to similar problems, even across languages, by using bilingual or cross-lingual word embeddings. This study therefore investigates two bilingual word embedding methods, VecMap and BiVec, for the Indonesian-English language pair and evaluates them on bilingual lexicon induction and text summarization tasks. The generated bilingual embeddings were compared with MUSE (Multilingual Unsupervised and Supervised Embeddings), existing multilingual word embeddings created with a generative adversarial network method. Furthermore, VecMap was improved by creating shared vocabulary spaces and mapping the unshared vocabulary between languages. The results showed that the embeddings produced by the joint method, BiVec, performed better in intrinsic evaluation, especially with CSLS (Cross-Domain Similarity Local Scaling) retrieval. Meanwhile, the improved VecMap outperformed the regular version by 16.6% without surpassing the BiVec evaluation score. When applied to cross-lingual text summarization, these methods enabled model transfer between languages. Moreover, adding only 10% of the target-language training dataset yielded ROUGE scores that outperformed classical text summarization. (c) 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access
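The abstract reports that CSLS retrieval gave the best bilingual lexicon induction results. As an illustration only, the following is a minimal pure-Python sketch of the CSLS criterion introduced with MUSE, assuming the Indonesian and English embeddings have already been mapped into one shared space (for example by VecMap or MUSE). The function names, the neighbourhood size `k`, and the toy three-dimensional vectors are hypothetical and are not taken from the paper's code or data.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_topk_similarity(query: Vector, others: List[Vector], k: int) -> float:
    """Mean cosine similarity of `query` to its k nearest neighbours in `others`."""
    sims = sorted((cosine(query, o) for o in others), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)


def csls_translate(src: Dict[str, Vector], tgt: Dict[str, Vector],
                   k: int = 10) -> Dict[str, str]:
    """Map each source word to the target word with the highest CSLS score.

    CSLS(x, y) = 2*cos(x, y) - r_T(x) - r_S(y), where r_T(x) is the mean
    similarity of x to its k nearest target words and r_S(y) is the mean
    similarity of y to its k nearest source words. Assumes both dictionaries
    already live in a shared (mapped) embedding space.
    """
    src_vecs = list(src.values())
    tgt_vecs = list(tgt.values())
    # r_S(y): penalises "hub" target words that are close to many source words.
    r_s = {tw: mean_topk_similarity(tv, src_vecs, k) for tw, tv in tgt.items()}
    result: Dict[str, str] = {}
    for sw, sv in src.items():
        # r_T(x) is constant for a given source word; kept to mirror the formula.
        r_t = mean_topk_similarity(sv, tgt_vecs, k)
        result[sw] = max(tgt, key=lambda tw: 2 * cosine(sv, tgt[tw]) - r_t - r_s[tw])
    return result


if __name__ == "__main__":
    # Hypothetical toy vectors for illustration; not real embeddings.
    indonesian = {"kucing": [1.0, 0.1, 0.0], "anjing": [0.1, 1.0, 0.0],
                  "rumah": [0.0, 0.1, 1.0]}
    english = {"cat": [0.9, 0.2, 0.0], "dog": [0.2, 0.9, 0.1],
               "house": [0.0, 0.0, 1.0]}
    print(csls_translate(indonesian, english, k=2))
    # {'kucing': 'cat', 'anjing': 'dog', 'rumah': 'house'}
```

Compared with plain nearest-neighbour retrieval, the `r_S` term lowers the score of target words that sit close to many source words, which is the hubness problem CSLS is designed to reduce.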
Pages: 224-235
Number of pages: 12