Text Simplification Using Transformer and BERT

Cited: 1
Authors
Alissa, Sarah [1 ]
Wald, Mike [2 ]
Affiliations
[1] Imam Abdulrahman Bin Faisal Univ, Coll Comp Sci & Informat Technol, Dammam, Saudi Arabia
[2] Univ Southampton, Sch Elect & Comp Sci, Southampton, England
Source
CMC-COMPUTERS MATERIALS & CONTINUA, 2023, Vol. 75, No. 2
Keywords
Text simplification; neural machine translation; transformer;
DOI
10.32604/cmc.2023.033647
Chinese Library Classification (CLC)
TP [automation technology, computer technology];
Subject Classification Code
0812;
Abstract
Reading and writing are the main ways of interacting with web content. Text simplification tools help people with cognitive impairments, new language learners, and children, who may find complex web content difficult to understand. Text simplification is the process of converting complex text into more readable and understandable text. Recent approaches to text simplification adopt the machine translation paradigm to learn simplification rules from a parallel corpus of complex and simple sentences. In this paper, we propose two models based on the transformer, an encoder-decoder architecture that achieves state-of-the-art (SOTA) results in machine translation. The training process for our models consists of three steps: preprocessing the data with a subword tokenizer, training the model with the Adam optimizer, and using the trained model to decode the output. The first model uses the transformer alone; the second integrates Bidirectional Encoder Representations from Transformers (BERT) as the encoder to improve training time and results. The transformer-only model, evaluated with the Bilingual Evaluation Understudy (BLEU) score, achieved 53.78 on the WikiSmall dataset. In the experiment on the BERT-integrated model, the validation loss decreased much faster than for the model without BERT; however, its BLEU score was lower (44.54), possibly because the dataset was too small, causing the model to overfit and generalize poorly. In future work, the second model could therefore be trained on a larger dataset such as WikiLarge. In addition, further analysis of the models' results and of the dataset was carried out using different evaluation metrics to understand their performance.
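The abstract's pipeline (subword tokenization, transformer training with Adam, decoding, and BLEU scoring) can be made concrete with a short sketch. The paper does not publish its implementation, so the following is a minimal illustration assuming the HuggingFace transformers library; the bert-base-uncased checkpoint, the learning rate, and the toy sentence pair are illustrative assumptions, not the authors' code.

import torch
from nltk.translate.bleu_score import corpus_bleu
from transformers import BertTokenizer, EncoderDecoderModel

# WordPiece is BERT's subword tokenizer, matching the paper's
# "preprocess with a subword tokenizer" step.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Second model: BERT as the encoder of an encoder-decoder transformer.
# (Initializing the decoder from BERT weights as well is an assumption.)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Toy complex/simple pair standing in for a WikiSmall training example.
complex_sent = "The legislation was subsequently ratified by the assembly."
simple_sent = "The assembly later approved the law."

inputs = tokenizer(complex_sent, return_tensors="pt")
labels = tokenizer(simple_sent, return_tensors="pt").input_ids

# One optimization step with Adam, as in the paper's training setup.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)  # lr is illustrative
model.train()
loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Decode a simplification and score it against the reference with BLEU.
model.eval()
with torch.no_grad():
    generated = model.generate(inputs.input_ids, max_length=32)
hypothesis = tokenizer.decode(generated[0], skip_special_tokens=True)
bleu = corpus_bleu([[simple_sent.lower().split()]], [hypothesis.split()])
print(f"hypothesis: {hypothesis}")
print(f"BLEU: {bleu:.4f}")

In the paper's evaluation, BLEU is computed over the full WikiSmall test set rather than a single pair; the same corpus_bleu call scales directly by passing lists of tokenized hypotheses and references.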
Pages: 3479-3495
Page count: 17
Related Papers
50 records in total
  • [1] Emotion recognition in Hindi text using multilingual BERT transformer
    Kumar, Tapesh
    Mahrishi, Mehul
    Sharma, Girish
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (27) : 42373 - 42394
  • [2] Robust Text-to-Cypher Using Combination of BERT, GraphSAGE, and Transformer (CoBGT) Model
    Tran, Quoc-Bao-Huy
    Waheed, Aagha Abdul
    Chung, Sun-Tae
    APPLIED SCIENCES-BASEL, 2024, 14 (17):
  • [3] Japanese abstractive text summarization using BERT
    Iwasaki, Yuuki
    Yamashita, Akihiro
    Konno, Yoko
    Matsubayashi, Katsushi
    2019 INTERNATIONAL CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), 2019,
  • [4] Turkish Medical Text Classification Using BERT
    Celikten, Azer
    Bulut, Hasan
    29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021,
  • [5] Text Augmentation Using BERT for Image Captioning
    Atliha, Viktar
    Sesok, Dmitrij
    APPLIED SCIENCES-BASEL, 2020, 10 (17):
  • [6] LSBert: Lexical Simplification Based on BERT
    Qiang, Jipeng
    Li, Yun
    Zhu, Yi
    Yuan, Yunhao
    Shi, Yang
    Wu, Xindong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3064 - 3076
  • [7] Text Simplification Using Dependency Parsing for Spanish
    Ballesteros, Miguel
    Bautista, Susana
    Gervas, Pablo
    KDIR 2010: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL, 2010, : 330 - 335
  • [8] Text Simplification Using Neural Machine Translation
    Wang, Tong
    Chen, Ping
    Rochford, John
    Qiang, Jipeng
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 4270 - 4271
  • [9] Medical Text Simplification Using Reinforcement Learning (TESLEA): Deep Learning-Based Text Simplification Approach
    Phatak, Atharva
    Savage, David W.
    Ohle, Robert
    Smith, Jonathan
    Mago, Vijay
    JMIR MEDICAL INFORMATICS, 2022, 10 (11)