Punctuation Restoration with Transformer Model on Social Media Data

Cited by: 0
Authors
Bakare, Adebayo Mustapha [1]
Anbananthen, Kalaiarasi Sonai Muthu [1]
Muthaiyah, Saravanan [2]
Krishnan, Jayakumar [1]
Kannan, Subarmaniam [1]
Affiliations
[1] Multimedia Univ, Fac Informat Sci & Technol, Melaka 75450, Malaysia
[2] Multimedia Univ, Fac Management, Cyberjaya 63100, Malaysia
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, No. 03
Keywords
punctuation restoration; transformers models; Bidirectional Encoder Representations from Transformers (BERT); long short-term memory (LSTM);
DOI
10.3390/app13031685
Chinese Library Classification: O6 [Chemistry]
Discipline code: 0703
Abstract
Sentiment analysis faces several key challenges. One major problem is determining the sentiment of complex sentences, paragraphs, and text documents. A paragraph with multiple parts may carry multiple sentiment values, and predicting a single overall sentiment value for such a paragraph does not produce all the information businesses and brands need. A paragraph with multiple sentences should therefore be separated into simple sentences, from which all the possible sentiments can be extracted effectively. To split a paragraph, however, that paragraph must be properly punctuated, and most social media texts are improperly punctuated, which makes sentence separation challenging. This study proposes a punctuation-restoration algorithm using the transformer model approach. We evaluated different Bidirectional Encoder Representations from Transformers (BERT) models as the transformer encoder, in addition to the neural network used for classification. Based on our evaluation, RoBERTa-Large with a bidirectional long short-term memory (LSTM) layer provided the best accuracy: 97% and 90% for restoring the punctuation on Amazon and Telekom data, respectively. Precision, recall, and F1-score were also used as evaluation criteria.
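The abstract frames punctuation restoration as token-level sequence labeling: an encoder (RoBERTa in the paper's best configuration) followed by a bidirectional LSTM assigns each word a punctuation label, and the punctuated text is rebuilt from the (word, label) pairs so that downstream sentiment analysis can split on sentence boundaries. A minimal sketch of the reconstruction step is shown below, with hand-written labels standing in for model predictions; the label set and the `restore` helper are illustrative assumptions, not taken from the paper.

```python
# Punctuation restoration as token-level sequence labeling: a model
# (RoBERTa + BiLSTM in the paper) predicts one label per word; the
# punctuated sentence is then rebuilt from the (word, label) pairs.
# The label set and helper below are illustrative assumptions.

LABEL_TO_MARK = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}

def restore(words, labels):
    """Attach the predicted punctuation mark to each word and
    capitalize the start of every resulting sentence."""
    out, start_of_sentence = [], True
    for word, label in zip(words, labels):
        token = word.capitalize() if start_of_sentence else word
        out.append(token + LABEL_TO_MARK[label])
        start_of_sentence = label in ("PERIOD", "QUESTION")
    return " ".join(out)

words = ["i", "love", "this", "phone", "but", "the", "battery", "drains", "fast"]
labels = ["O", "O", "O", "COMMA", "O", "O", "O", "O", "PERIOD"]
print(restore(words, labels))
# -> I love this phone, but the battery drains fast.
```

Once the sentence-final labels (PERIOD, QUESTION) are in place, the restored text can be split at those boundaries into the simple sentences that the sentiment step requires, which is the motivation the abstract gives for restoring punctuation in the first place.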
Pages: 13
Related Papers
50 records in total
  • [1] Pan, Ronghao; Garcia-Diaz, Jose Antonio; Vivancos-Vicente, Pedro Jose; Valencia-Garcia, Rafael. Evaluation of transformer-based models for punctuation and capitalization restoration in Catalan and Galician. PROCESAMIENTO DEL LENGUAJE NATURAL, 2023, (70): 27-38
  • [2] Abd-Elaziz, M. M.; El-Rashidy, Nora; Elfetouh, Ahmed Abou; El-Bakry, Hazem M. Position-context additive transformer-based model for classifying text data on social media. SCIENTIFIC REPORTS, 15 (1)
  • [3] Păiş, Vasile; Tufiş, Dan. Capitalization and punctuation restoration: a survey. ARTIFICIAL INTELLIGENCE REVIEW, 2022, 55 (03): 1681-1722
  • [4] Li, Zhenyu; Zou, Zongfeng. Punctuation and lexicon aid representation: A hybrid model for short text sentiment analysis on social media platform. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2024, 36 (03)
  • [5] Yu, Zhipeng; Ling, Tongtao; Gu, Fangqing; Sheng, Huangxu; Liu, Yi. A Pre-trained Model for Chinese Medical Record Punctuation Restoration. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VII, 2024, 14431: 101-112
  • [6] Polacek, Martin; Cerva, Petr; Zdansky, Jindrich; Weingartova, Lenka. Online Punctuation Restoration using ELECTRA Model for streaming ASR Systems. INTERSPEECH 2023, 2023: 446-450
  • [7] Al-Rabeeah, Abdullah Abdulabbas Nahi; Saeed, Faisal. Data Privacy Model for Social Media Platforms. 2017 6TH ICT INTERNATIONAL STUDENT PROJECT CONFERENCE (ICT-ISPC), 2017
  • [8] Abidin, Dodo Zaenal; Nurmaini, Siti; Malik, Reza Firsandaya; Jasmir; Rasywir, Errissya; Pratama, Yovi. A Model of Preprocessing For Social Media Data Extraction. 2019 INTERNATIONAL CONFERENCE ON INFORMATICS, MULTIMEDIA, CYBER AND INFORMATION SYSTEM (ICIMCIS), 2019: 67-72
  • [9] Wu, Kui; Wang, Xuancong; Zhou, Nina; Aw, AiTi; Li, Haizhou. Joint Chinese Word Segmentation and Punctuation Prediction Using Deep Recurrent Neural Network for Social Media Data. PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2015: 41-44