Natural language generation from Universal Dependencies using data augmentation and pre-trained language models

Cited by: 0
Authors: Nguyen D.T. [1]; Tran T. [1]
Affiliation: [1] Saigon University, Ho Chi Minh City
Keywords: data augmentation; data-to-text generation; deep learning; fine-tuning; pre-trained language models; sequence-to-sequence models; Universal Dependencies
DOI: 10.1504/IJIIDS.2023.10053426
Abstract
Natural language generation (NLG) has in recent years focused on data-to-text tasks with various kinds of structured input. The generated text should convey the given information, be grammatically correct, and meet other quality criteria. In this research, we propose an approach that combines strong pre-trained language models with augmentation of the input data. The data studied in this work are Universal Dependencies (UD) structures; UD is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) designed for cross-lingual learning. We study English UD structures, which we modify in two ways. In the first group, the modification removes the word-order information and lemmatises the tokens. In the second group, the modification removes the functional words and surface-oriented morphological details. For both groups of modified structures, we apply the same approach to explore how the pre-trained sequence-to-sequence models text-to-text transfer transformer (T5) and BART perform on the training data. We augment the training data by creating several permutations of each input structure. The results show that our approach can generate good-quality English text and highlight the value of studying strategies for representing UD inputs. Copyright © 2023 Inderscience Enterprises Ltd.
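As a rough illustration of the pipeline the abstract describes, the sketch below (not the authors' released code; the linearisation format, the choice of functional POS tags, and all names are assumptions) parses a CoNLL-U sentence, builds the two modified input representations, and augments the data with permutations of each linearised structure.

```python
import random

# Universal POS tags treated here as functional words; this particular split
# is an assumption made for the sketch, not taken from the paper.
FUNCTION_POS = {"ADP", "AUX", "CCONJ", "DET", "PART", "PRON", "SCONJ", "PUNCT"}

def parse_conllu(block):
    """Parse one CoNLL-U sentence into a list of token dicts."""
    tokens = []
    for line in block.strip().splitlines():
        if line.startswith("#") or not line.strip():
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token and empty-node lines
        tokens.append({"form": cols[1], "lemma": cols[2], "upos": cols[3],
                       "feats": cols[5], "deprel": cols[7]})
    return tokens

def linearise_group1(tokens):
    """Group 1: lemmatised tokens with POS, features and relation; sorting
    the units is one simple way to discard word-order information."""
    units = [f"{t['lemma']}|{t['upos']}|{t['feats']}|{t['deprel']}" for t in tokens]
    return " ".join(sorted(units))

def linearise_group2(tokens):
    """Group 2: drop functional words and surface morphological details."""
    content = [t for t in tokens if t["upos"] not in FUNCTION_POS]
    return " ".join(f"{t['lemma']}|{t['deprel']}" for t in content)

def augment(linearised, n_extra=3, seed=0):
    """Augment by permuting the units of one linearised input structure."""
    units = linearised.split()
    rng = random.Random(seed)
    variants = {linearised}
    for _ in range(50 * n_extra):  # bounded attempts, in case few permutations exist
        if len(variants) > n_extra:
            break
        shuffled = units[:]
        rng.shuffle(shuffled)
        variants.add(" ".join(shuffled))
    return sorted(variants)

# Toy CoNLL-U sentence ("The dog chased a cat"), built with explicit tabs.
sample = "\n".join("\t".join(cols) for cols in [
    ("1", "The", "the", "DET", "DT", "Definite=Def", "2", "det", "_", "_"),
    ("2", "dog", "dog", "NOUN", "NN", "Number=Sing", "3", "nsubj", "_", "_"),
    ("3", "chased", "chase", "VERB", "VBD", "Tense=Past", "0", "root", "_", "_"),
    ("4", "a", "a", "DET", "DT", "Definite=Ind", "5", "det", "_", "_"),
    ("5", "cat", "cat", "NOUN", "NN", "Number=Sing", "3", "obj", "_", "_"),
])
toks = parse_conllu(sample)
print(linearise_group1(toks))  # alphabetically sorted lemma|UPOS|FEATS|deprel units
print(linearise_group2(toks))  # dog|nsubj chase|root cat|obj
print(augment(linearise_group2(toks)))
```

In the paper's setting, each linearised string and its permutations would then be paired with the reference sentence and used to fine-tune T5 or BART as a standard sequence-to-sequence model.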
Pages: 89-105 (16 pages)