Punctuation Restoration with Transformer Model on Social Media Data

Cited by: 0
Authors
Bakare, Adebayo Mustapha [1]
Anbananthen, Kalaiarasi Sonai Muthu [1]
Muthaiyah, Saravanan [2]
Krishnan, Jayakumar [1]
Kannan, Subarmaniam [1]
Affiliations
[1] Multimedia Univ, Fac Informat Sci & Technol, Melaka 75450, Malaysia
[2] Multimedia Univ, Fac Management, Cyberjaya 63100, Malaysia
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, No. 03
Keywords
punctuation restoration; transformers models; Bidirectional Encoder Representations from Transformers (BERT); long short-term memory (LSTM);
DOI
10.3390/app13031685
Chinese Library Classification: O6 [Chemistry]
Discipline code: 0703
Abstract
Sentiment analysis faces several key challenges. One major problem is determining the sentiment of complex sentences, paragraphs, and text documents. A paragraph with multiple parts may carry multiple sentiment values, and predicting a single overall sentiment value for such a paragraph does not produce all the information businesses and brands need. A paragraph with multiple sentences should therefore be separated into simple sentences, from which all the possible sentiments can be extracted effectively. To split a paragraph, however, that paragraph must be properly punctuated, and most social media texts are improperly punctuated, which makes sentence separation challenging. This study proposes a punctuation-restoration algorithm using the transformer model approach. We evaluated different Bidirectional Encoder Representations from Transformers (BERT) models as the transformer encoder, in addition to the neural network used for classification. Based on our evaluation, RoBERTa-Large with a bidirectional long short-term memory (LSTM) layer provided the best accuracy: 97% and 90% for restoring the punctuation on Amazon and Telekom data, respectively. Precision, recall, and F1-score were also used as evaluation criteria.
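The abstract frames punctuation restoration as token-level sequence labeling: an encoder (RoBERTa in the paper's best configuration) followed by a bidirectional LSTM assigns each word a punctuation label, and the punctuated text is rebuilt from the (word, label) pairs so that downstream sentiment analysis can split on sentence boundaries. A minimal sketch of the reconstruction step is shown below, with hand-written labels standing in for model predictions; the label set and the `restore` helper are illustrative assumptions, not taken from the paper.

```python
# Punctuation restoration as token-level sequence labeling: a model
# (RoBERTa + BiLSTM in the paper) predicts one label per word; the
# punctuated sentence is then rebuilt from the (word, label) pairs.
# The label set and helper below are illustrative assumptions.

LABEL_TO_MARK = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}

def restore(words, labels):
    """Attach the predicted punctuation mark to each word and
    capitalize the start of every resulting sentence."""
    out, start_of_sentence = [], True
    for word, label in zip(words, labels):
        token = word.capitalize() if start_of_sentence else word
        out.append(token + LABEL_TO_MARK[label])
        start_of_sentence = label in ("PERIOD", "QUESTION")
    return " ".join(out)

words = ["i", "love", "this", "phone", "but", "the", "battery", "drains", "fast"]
labels = ["O", "O", "O", "COMMA", "O", "O", "O", "O", "PERIOD"]
print(restore(words, labels))
# -> I love this phone, but the battery drains fast.
```

Once the sentence-final labels (PERIOD, QUESTION) are in place, the restored text can be split at those boundaries into the simple sentences that the sentiment step requires, which is the motivation the abstract gives for restoring punctuation in the first place.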
Pages: 13
Related Papers
50 records in total
  • [1] Pan, Ronghao; Garcia-Diaz, Jose Antonio; Vivancos-Vicente, Pedro Jose; Valencia-Garcia, Rafael. Evaluation of transformer-based models for punctuation and capitalization restoration in Catalan and Galician. PROCESAMIENTO DEL LENGUAJE NATURAL, 2023, (70): 27-38
  • [2] Abd-Elaziz, M. M.; El-Rashidy, Nora; Elfetouh, Ahmed Abou; El-Bakry, Hazem M. Position-context additive transformer-based model for classifying text data on social media. SCIENTIFIC REPORTS, 15 (1)
  • [3] Păiş, Vasile; Tufiş, Dan. Capitalization and punctuation restoration: a survey. ARTIFICIAL INTELLIGENCE REVIEW, 2022, 55 (03): 1681-1722
  • [4] Li, Zhenyu; Zou, Zongfeng. Punctuation and lexicon aid representation: A hybrid model for short text sentiment analysis on social media platform. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2024, 36 (03)
  • [5] Yu, Zhipeng; Ling, Tongtao; Gu, Fangqing; Sheng, Huangxu; Liu, Yi. A Pre-trained Model for Chinese Medical Record Punctuation Restoration. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VII, 2024, 14431: 101-112
  • [6] Polacek, Martin; Cerva, Petr; Zdansky, Jindrich; Weingartova, Lenka. Online Punctuation Restoration using ELECTRA Model for streaming ASR Systems. INTERSPEECH 2023, 2023: 446-450
  • [7] Al-Rabeeah, Abdullah Abdulabbas Nahi; Saeed, Faisal. Data Privacy Model for Social Media Platforms. 2017 6TH ICT INTERNATIONAL STUDENT PROJECT CONFERENCE (ICT-ISPC), 2017
  • [8] Abidin, Dodo Zaenal; Nurmaini, Siti; Malik, Reza Firsandaya; Jasmir; Rasywir, Errissya; Pratama, Yovi. A Model of Preprocessing For Social Media Data Extraction. 2019 INTERNATIONAL CONFERENCE ON INFORMATICS, MULTIMEDIA, CYBER AND INFORMATION SYSTEM (ICIMCIS), 2019: 67-72
  • [9] Wu, Kui; Wang, Xuancong; Zhou, Nina; Aw, AiTi; Li, Haizhou. Joint Chinese Word Segmentation and Punctuation Prediction Using Deep Recurrent Neural Network for Social Media Data. PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2015: 41-44