Text Simplification Using Transformer and BERT

Cited: 1
|
Authors
Alissa, Sarah [1 ]
Wald, Mike [2 ]
Affiliations
[1] Imam Abdulrahman Bin Faisal Univ, Coll Comp Sci & Informat Technol, Dammam, Saudi Arabia
[2] Univ Southampton, Sch Elect & Comp Sci, Southampton, England
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2023, Vol. 75, No. 2
Keywords
Text simplification; neural machine translation; transformer
DOI
10.32604/cmc.2023.033647
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
Reading and writing are the main ways users interact with web content. Text simplification tools help people with cognitive impairments, new language learners, and children, who may find complex web content difficult to understand. Text simplification is the process of converting complex text into more readable and understandable text. Recent approaches to text simplification adopt the machine translation paradigm, learning simplification rules from a parallel corpus of complex and simple sentences. In this paper, we propose two models based on the transformer, an encoder-decoder architecture that achieves state-of-the-art (SOTA) results in machine translation. The training process for our models consists of three steps: preprocessing the data with a subword tokenizer, training the model and optimizing it with the Adam optimizer, and using the trained model to decode the output. The first model uses the transformer alone; the second integrates Bidirectional Encoder Representations from Transformers (BERT) as the encoder to improve training time and results. The transformer-only model was evaluated with the Bilingual Evaluation Understudy (BLEU) score and achieved 53.78 on the WikiSmall dataset. The experiment on the second, BERT-integrated model shows that its validation loss decreased much faster than that of the model without BERT. However, its BLEU score was lower (44.54), which could be due to the size of the dataset: the model overfit and was unable to generalize well. In future work, the second model could therefore be evaluated on a larger dataset such as WikiLarge. In addition, further analysis of the models' results and the dataset was performed using different evaluation metrics to understand their performance.
Pages: 3479-3495
Page count: 17
Related Papers
50 records in total
  • [31] Simple and Effective Text Simplification Using Semantic and Neural Methods
    Sulem, Elior
    Abend, Omri
    Rappoport, Ari
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 162 - 173
  • [32] On the Transformer Growth for Progressive BERT Training
    Gu, Xiaotao
    Liu, Liyuan
    Yu, Hongkun
    Li, Jing
    Chen, Chen
    Han, Jiawei
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 5174 - 5180
  • [33] Text classification using improved bidirectional transformer
    Tezgider, Murat
    Yildiz, Beytullah
    Aydin, Galip
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (09):
  • [34] Simplification of Arabic text: A hybrid approach integrating machine translation and transformer-based lexical model
    Al-Thanyyan, Suha S.
    Azmi, Aqil M.
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2023, 35 (08)
  • [35] Text aware Emotional Text-to-speech with BERT
    Mukherjee, Arijit
    Bansal, Shubham
    Satpal, Sandeepkumar
    Mehta, Rupesh
    INTERSPEECH 2022, 2022, : 4601 - 4605
  • [36] Unsupervised Statistical Text Simplification
    Qiang, Jipeng
    Wu, Xindong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2021, 33 (04) : 1802 - 1806
  • [37] Investigating Text Simplification Evaluation
    Vasquez-Rodriguez, Laura
    Shardlow, Matthew
    Przybyla, Piotr
    Ananiadou, Sophia
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 876 - 882
  • [38] Text-to-Text Transfer Transformer Phrasing Model Using Enriched Text Input
    Rezackova, Marketa
    Matousek, Jindrich
    TEXT, SPEECH, AND DIALOGUE (TSD 2022), 2022, 13502 : 389 - 400
  • [39] Challenging Choices for Text Simplification
    Gasperin, Caroline
    Maziero, Erick
    Aluisio, Sandra M.
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROCEEDINGS, 2010, 6001 : 40 - 50
  • [40] On the Ethical Considerations of Text Simplification
    Gooding, Sian
    NINTH WORKSHOP ON SPEECH AND LANGUAGE PROCESSING FOR ASSISTIVE TECHNOLOGIES (SLPAT-2022), 2022, : 50 - 57