Improving news headline text generation quality through frequent POS-Tag patterns analysis

Cited by: 3
Authors
Fatima, Noureen [1]
Daudpota, Sher Muhammad [1]
Kastrati, Zenun [2]
Imran, Ali Shariq [3]
Hassan, Saif [1]
Elmitwally, Nouh Sabri [4]
Affiliations
[1] Sukkur IBA Univ, Dept Comp Sci, Sukkur 65200, Pakistan
[2] Linnaeus Univ, Dept Informat, S-35195 Vaxjo, Sweden
[3] Norwegian Univ Sci & Technol NTNU, N-2815 Gjovik, Norway
[4] Birmingham City Univ, Sch Comp & Digital Technol, Birmingham B4 7XG, England
Keywords
POS tagging; Text generation; Low resource language; Generative pre-trained transformer; Attention mechanism; MODEL;
DOI
10.1016/j.engappai.2023.106718
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Writing original synthetic content is one of the human abilities that algorithms aspire to emulate. Sophisticated algorithms, especially those based on neural networks, have shown promising results in recent times. A watershed moment came with the introduction of the attention mechanism, which paved the way for transformers, a new architecture in natural language processing. Recent models such as GPT and BERT rely on transformers for synthetic text generation. Although GPT- and BERT-based models can generate creative text when properly trained on abundant data, the quality of the generated text suffers when only limited data is available. This is especially an issue for low-resource languages, where labeled data is still scarce. In such cases, the generated text often lacks proper sentence structure and is therefore unreadable. This study proposes a post-processing step that improves the quality of text generated by the GPT model. The step is based on an analysis of POS-tag patterns in the original text and accepts only those GPT-generated sentences that satisfy POS patterns learned from the data. We use the GPT model to generate English headlines from the Australian Broadcasting Corporation (ABC) news dataset. Furthermore, to assess the applicability of the model to low-resource languages, we also train the model on an Urdu news dataset to generate Urdu news headlines. The experiments on these datasets from high- and low-resource languages show that the proposed headline POS pattern extraction significantly improves the quality of the generated headlines. We evaluate performance through subjective evaluation as well as text generation quality metrics such as BLEU and ROUGE.
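The post-processing idea described in the abstract (learn frequent POS-tag patterns from real headlines, then keep only generated headlines whose pattern matches) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon `TOY_LEXICON`, the function names, the `min_count` threshold, and the example headlines are all hypothetical, and a real pipeline would presumably use a trained POS tagger for English or Urdu instead of a lookup table.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real POS tagger (assumption).
TOY_LEXICON = {
    "police": "NOUN", "council": "NOUN", "fire": "NOUN", "plan": "NOUN",
    "probe": "VERB", "backs": "VERB", "rejects": "VERB",
    "new": "ADJ", "local": "ADJ",
    "in": "ADP", "over": "ADP",
}

def pos_pattern(sentence, lexicon=TOY_LEXICON):
    """Map a whitespace-tokenized sentence to its tuple of POS tags.
    Words missing from the lexicon get the placeholder tag "X"."""
    return tuple(lexicon.get(tok, "X") for tok in sentence.lower().split())

def frequent_patterns(corpus, min_count=2):
    """Return the set of POS patterns seen at least min_count times."""
    counts = Counter(pos_pattern(h) for h in corpus)
    return {p for p, c in counts.items() if c >= min_count}

def filter_generated(candidates, allowed):
    """Keep only candidates whose POS pattern is in the allowed set."""
    return [c for c in candidates if pos_pattern(c) in allowed]

# Illustrative "original" headlines (made up for this sketch).
training = [
    "police probe fire",
    "council backs plan",
    "council rejects new plan",
    "police probe local fire",
]
allowed = frequent_patterns(training)

# Illustrative model outputs (made up for this sketch).
candidates = [
    "council probe plan",      # NOUN VERB NOUN: pattern was learned
    "fire plan in over",       # NOUN NOUN ADP ADP: pattern never seen
    "police backs new fire",   # NOUN VERB ADJ NOUN: pattern was learned
]
kept = filter_generated(candidates, allowed)
print(kept)
```

Note that the filter checks only the tag sequence, so a sentence with an acceptable structure but odd word choice still passes; the abstract describes the step as addressing sentence structure, which is what this sketch mirrors.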
Pages: 13