Natural language generation from Universal Dependencies using data augmentation and pre-trained language models

Cited by: 0
Authors
Nguyen, Dang Tuan [1 ]
Tran, Trung [1 ]
Affiliations
[1] Saigon University, Ho Chi Minh City, 700000, Viet Nam
Keywords
Deep learning; Natural language processing systems
DOI
10.1504/IJIIDS.2023.10053426
Abstract
In recent years, natural language generation (NLG) has focused on data-to-text tasks with various forms of structured input. The generated text should convey the given information, be grammatically correct, and satisfy further quality criteria. In this research, we propose an approach that combines strong pre-trained language models with input data augmentation. The data studied in this work are Universal Dependencies (UD) structures; UD is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) designed for cross-lingual learning. We study English UD structures, which we modify in two groups. In the first group, the modification removes the word-order information and lemmatises the tokens. In the second group, the modification removes functional words and surface-oriented morphological details. With both groups of modified structures, we apply the same approach to explore how the pre-trained sequence-to-sequence models, text-to-text transfer transformer (T5) and BART, perform on the training data. We augment the training data by creating several permutations of each input structure. The results show that our approach can generate good-quality English text and highlight the value of studying strategies for representing UD inputs. Copyright © 2023 Inderscience Enterprises Ltd.
Pages: 89-105
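
The abstract outlines two concrete steps, linearising order-free UD structures and augmenting the training data with permutations of each input, that can be illustrated in code. The Python fragment below is a minimal sketch only: the exact linearisation format, separators, and number of permutations used by the authors are not given in the abstract, so all names and values are illustrative assumptions.

# Minimal sketch (illustrative assumptions): linearise an order-free UD input
# and create several permutations of it for data augmentation, as described
# in the abstract. The actual linearisation and separators are not specified there.
import random

def linearise(triples):
    # Join '<lemma> <deprel> <head-lemma>' triples into one source string.
    return " ; ".join(f"{t['lemma']} {t['deprel']} {t['head']}" for t in triples)

def permute_augment(triples, n_perm=3, seed=0):
    # Because word order has been removed, any permutation of the triples encodes
    # the same sentence; each permutation is paired with the same target sentence.
    rng = random.Random(seed)
    sources = []
    for _ in range(n_perm):
        shuffled = list(triples)
        rng.shuffle(shuffled)
        sources.append(linearise(shuffled))
    return sources

# Toy UD fragment for "The cat sleeps", already lemmatised and with the
# functional word "the" dropped (second modification group).
ud = [
    {"lemma": "cat", "deprel": "nsubj", "head": "sleep"},
    {"lemma": "sleep", "deprel": "root", "head": "root"},
]

for src in permute_augment(ud):
    print(src, "->", "The cat sleeps.")

Each (source, target) pair produced this way would then be used to fine-tune a pre-trained sequence-to-sequence model such as T5 or BART, as described in the abstract.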