Punctuation Restoration with Transformer Model on Social Media Data

Cited by: 0
Authors
Bakare, Adebayo Mustapha [1]
Anbananthen, Kalaiarasi Sonai Muthu [1]
Muthaiyah, Saravanan [2]
Krishnan, Jayakumar [1]
Kannan, Subarmaniam [1]
Affiliations
[1] Multimedia Univ, Fac Informat Sci & Technol, Melaka 75450, Malaysia
[2] Multimedia Univ, Fac Management, Cyberjaya 63100, Malaysia
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 03
Keywords
punctuation restoration; transformers models; Bidirectional Encoder Representations from Transformers (BERT); long short-term memory (LSTM)
DOI
10.3390/app13031685
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Sentiment analysis faces several key challenges. One major problem is determining the sentiment of complex sentences, paragraphs, and text documents. A paragraph with multiple parts might carry multiple sentiment values, so predicting a single overall sentiment for it will not produce all the information that businesses and brands need. A paragraph with multiple sentences should therefore be separated into simple sentences, from which each sentiment can be extracted more effectively. Splitting a paragraph, however, requires that it be properly punctuated. Most social media texts are improperly punctuated, so separating the sentences may be challenging. This study proposes a punctuation-restoration algorithm using the transformer model approach. We evaluated different Bidirectional Encoder Representations from Transformers (BERT) models for our transformer encoding, in addition to the neural network used for evaluation. Based on our evaluation, RoBERTa-large with a bidirectional long short-term memory (LSTM) network provided the best accuracy, 97% and 90%, for restoring punctuation on the Amazon and Telekom data, respectively. Other evaluation criteria, such as precision, recall, and F1-score, are also used.
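The abstract treats punctuation restoration as a step before sentence splitting and reports accuracy, precision, recall, and F1-score. The sketch below is one possible illustration of those two ideas, not the authors' implementation: it assumes the restorer outputs one tag per token, and the tag names (`COMMA`, `PERIOD`, `QUESTION`, `O`) and the functions `restore_and_split`, `punctuation_scores`, and `accuracy` are hypothetical. The RoBERTa and LSTM model itself is not reproduced here.

```python
from collections import Counter

# Hypothetical tag set (an assumption, not taken from the paper): each token is
# tagged with the mark that follows it, and "O" means no punctuation follows.
MARKS = {"COMMA": ",", "PERIOD": ".", "QUESTION": "?"}
SENTENCE_END = {"PERIOD", "QUESTION"}


def restore_and_split(tokens, tags):
    """Attach predicted marks to tokens and split at sentence-ending marks."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    sentences, current = [], []
    for token, tag in zip(tokens, tags):
        current.append(token + MARKS.get(tag, ""))
        if tag in SENTENCE_END:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep trailing words that lack a closing mark
        sentences.append(" ".join(current))
    return sentences


def punctuation_scores(gold, pred, labels=tuple(MARKS)):
    """Return {label: (precision, recall, f1)} for token-level tags."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)
```

For example, `restore_and_split("great phone but the battery is poor".split(), ["O", "PERIOD", "O", "O", "O", "O", "PERIOD"])` yields two simple sentences that could then be scored for sentiment separately. Note that accuracy counts the frequent `O` tag, which is one reason per-mark precision, recall, and F1 are usually reported alongside it.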
Pages: 13