Punctuation Restoration with Transformer Model on Social Media Data

Cited by: 0
Authors
Bakare, Adebayo Mustapha [1]
Anbananthen, Kalaiarasi Sonai Muthu [1]
Muthaiyah, Saravanan [2]
Krishnan, Jayakumar [1]
Kannan, Subarmaniam [1]
Affiliations
[1] Multimedia Univ, Fac Informat Sci & Technol, Melaka 75450, Malaysia
[2] Multimedia Univ, Fac Management, Cyberjaya 63100, Malaysia
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 03
Keywords
punctuation restoration; transformer models; Bidirectional Encoder Representations from Transformers (BERT); long short-term memory (LSTM)
DOI
10.3390/app13031685
Chinese Library Classification number
O6 [Chemistry]
Subject classification code
0703
Abstract
Sentiment analysis faces several key challenges. One major problem is determining the sentiment of complex sentences, paragraphs, and text documents. A paragraph with multiple parts may carry multiple sentiment values, so predicting a single overall sentiment value for it does not capture all the information that businesses and brands need. A paragraph with multiple sentences should therefore be separated into simple sentences, from which all the possible sentiments can be extracted effectively. To split a paragraph, however, that paragraph must be properly punctuated. Most social media texts are improperly punctuated, so separating the sentences can be challenging. This study proposes a punctuation-restoration algorithm based on the transformer model. We evaluated different Bidirectional Encoder Representations from Transformers (BERT) models for our transformer encoding, in addition to the neural network used for evaluation. Based on our evaluation, RobertaLarge with the bidirectional long short-term memory (LSTM) provided the best accuracy, 97% and 90% for restoring punctuation on Amazon and Telekom data, respectively. Other evaluation criteria, such as precision, recall, and F1-score, were also used.
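The abstract frames punctuation restoration as a step that makes sentence splitting possible. The sketch below illustrates only that downstream step, assuming a per-token labelling scheme in which each label names the mark that follows the token. The label names, the `restore` and `split_sentences` helpers, and the example labels are all hypothetical and written by hand; in the study such labels would come from the transformer encoder with the bidirectional LSTM, which is not reproduced here.

```python
import re

# Hypothetical label set (an assumption, not taken from the paper):
# each label names the punctuation mark that follows its token.
PUNCT = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}

def restore(tokens, labels):
    """Attach the predicted mark to each token and capitalise sentence starts."""
    out = []
    sentence_start = True
    for tok, lab in zip(tokens, labels):
        word = tok[:1].upper() + tok[1:] if sentence_start else tok
        out.append(word + PUNCT[lab])
        # The next token opens a new sentence after a sentence-final mark.
        sentence_start = lab in ("PERIOD", "QUESTION")
    return " ".join(out)

def split_sentences(text):
    """Split on whitespace that follows a sentence-final mark."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text) if s]

# Unpunctuated, lower-cased input of the kind found in social media text.
tokens = "the screen is great the battery dies fast would you buy it again".split()
# Hand-written labels standing in for model predictions.
labels = ["O", "O", "O", "PERIOD",
          "O", "O", "O", "PERIOD",
          "O", "O", "O", "O", "QUESTION"]

restored = restore(tokens, labels)
sentences = split_sentences(restored)
```

Each element of `sentences` can then be passed to a sentiment classifier on its own, which is the use case the abstract motivates.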
Pages: 13
Related papers
50 in total
  • [21] A Generic Conceptual Data Model of Social Media Services
    Mohammadi, Nazila Gol
    Borchert, Angela
    Pampus, Julia
    Heisel, Maritta
    PROCEEDINGS OF THE 24TH EUROPEAN CONFERENCE ON PATTERN LANGUAGES OF PROGRAMS (EUROPLOP 2019), 2019
  • [22] A Data Quality Multidimensional Model for Social Media Analysis
    Aramburu, Maria Jose
    Berlanga, Rafael
    Lanza-Cruz, Indira
    BUSINESS & INFORMATION SYSTEMS ENGINEERING, 2023
  • [23] Visualizing Punctuation Restoration in Speech Transcripts with Prosograph
    Oktem, Alp
    Farrus, Mireia
    Bonafonte, Antonio
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1493 - 1494
  • [24] AraPunc: Arabic Punctuation Restoration Using Transformers
    Sakr, Abdelrahman
    Torki, Marwan
    2023 20TH ACS/IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, AICCSA, 2023
  • [25] Punctuation Restoration in Spoken Italian Transcripts with Transformers
    Miaschi, Alessio
    Ravelli, Andrea Amelio
    Dell'Orletta, Felice
    AIXIA 2021 - ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13196 : 245 - 260
  • [26] Incorporating External POS Tagger for Punctuation Restoration
    Shi, Ning
    Wang, Wei
    Wang, Boxin
    Li, Jinfeng
    Liu, Xiangyu
    Lin, Zhouhan
    INTERSPEECH 2021, 2021, : 1987 - 1991
  • [27] A Time-Aware Transformer Based Model for Suicide Ideation Detection on Social Media
    Sawhney, Ramit
    Joshi, Harshit
    Gandhi, Saumya
    Shah, Rajiv Ratn
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 7685 - 7697
  • [28] Restoration of User Videos Shared on Social Media
    Luo, Hongming
    Zhou, Fei
    Lam, Kin-Man
    Qiu, Guoping
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 2749 - 2757
  • [29] Research on Restoration of Murals Based on Diffusion Model and Transformer
    Wang, Yaoyao
    Xiao, Mansheng
    Hu, Yuqing
    Yan, Jin
    Zhu, Zeyu
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 80 (03): : 4433 - 4449
  • [30] SMDQM- Social Media Data Quality Assessment Model
    Reda, Oumaima
    Zellou, Ahmed
    2022 2ND INTERNATIONAL CONFERENCE ON INNOVATIVE RESEARCH IN APPLIED SCIENCE, ENGINEERING AND TECHNOLOGY (IRASET'2022), 2022, : 733 - 739