Punctuation Restoration with Transformer Model on Social Media Data

Authors
Bakare, Adebayo Mustapha [1]
Anbananthen, Kalaiarasi Sonai Muthu [1]
Muthaiyah, Saravanan [2]
Krishnan, Jayakumar [1]
Kannan, Subarmaniam [1]
Affiliations
[1] Multimedia Univ, Fac Informat Sci & Technol, Melaka 75450, Malaysia
[2] Multimedia Univ, Fac Management, Cyberjaya 63100, Malaysia
Source
APPLIED SCIENCES-BASEL | 2023, Volume 13, Issue 3
Keywords
punctuation restoration; transformer models; Bidirectional Encoder Representations from Transformers (BERT); long short-term memory (LSTM)
DOI
10.3390/app13031685
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Sentiment analysis faces several key challenges. One major problem is determining the sentiment of complex sentences, paragraphs, and text documents. A paragraph with multiple parts may carry multiple sentiment values, so predicting a single overall sentiment for it will not give businesses and brands all the information they need. A paragraph with multiple sentences should therefore be separated into simple sentences, from which the individual sentiments can be extracted more effectively. Splitting a paragraph, however, requires that it be properly punctuated. Most social media texts are improperly punctuated, so separating the sentences can be challenging. This study proposes a punctuation-restoration algorithm based on a transformer model. We evaluated different Bidirectional Encoder Representations from Transformers (BERT) models for the transformer encoding, in addition to the neural network used for evaluation. In our evaluation, RoBERTa-large combined with a bidirectional long short-term memory (LSTM) network gave the best accuracy: 97% on Amazon data and 90% on Telekom data. Precision, recall, and F1-score are also reported.
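Punctuation restoration is commonly framed as sequence labeling: each word receives a label naming the punctuation mark that follows it, and a model predicts those labels for unpunctuated text. The sketch below illustrates only the data side of that framing, under stated assumptions. The abstract does not list the punctuation marks the study restores, so the three-mark label set here is hypothetical, and the function names `to_training_pairs`, `restore`, and `split_sentences` are illustrative rather than taken from the paper. The prediction step (a transformer encoder followed by a bidirectional LSTM in the study) is left out; the example feeds the reference labels back in place of model output.

```python
import re

# Assumed label set: the abstract does not say which marks are restored.
PUNCT_TO_LABEL = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
LABEL_TO_PUNCT = {label: mark for mark, label in PUNCT_TO_LABEL.items()}
SENTENCE_END = {"PERIOD", "QUESTION"}


def to_training_pairs(text):
    """Turn punctuated text into (lowercased word, label) pairs."""
    pairs = []
    for token in text.split():
        word = token.rstrip(",.?")
        if not word:
            continue  # skip tokens made of punctuation only
        mark = token[len(word):][:1]  # first trailing mark, or "" if none
        pairs.append((word.lower(), PUNCT_TO_LABEL.get(mark, "O")))
    return pairs


def restore(words, labels):
    """Rebuild punctuated text from plain words and per-word labels."""
    out = []
    capitalize = True  # capitalize the first word and each sentence start
    for word, label in zip(words, labels):
        if capitalize:
            word = word[:1].upper() + word[1:]
        out.append(word + LABEL_TO_PUNCT.get(label, ""))
        capitalize = label in SENTENCE_END
    return " ".join(out)


def split_sentences(text):
    """Split restored text into sentences after '.' or '?'."""
    return [s for s in re.split(r"(?<=[.?])\s+", text.strip()) if s]


if __name__ == "__main__":
    review = "Great phone, fast delivery. Battery is weak."
    pairs = to_training_pairs(review)
    words = [w for w, _ in pairs]
    labels = [l for _, l in pairs]  # stand-in for model predictions
    restored = restore(words, labels)
    for sentence in split_sentences(restored):
        print(sentence)
```

Once the text is split this way, each sentence can be passed to a sentiment classifier on its own, which is the motivation the abstract gives for restoring punctuation first.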
Pages: 13