A Natural Language Normalization Approach to Enhance Social Media Text Reasoning

Cited by: 0

Authors:

Long Hoang Nguyen [1]

Salopek, Andrew [1]

Zhao, Liang [2]

Jin, Fang [1]

Affiliations:

[1] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79409 USA

[2] George Mason Univ, Informat Sci & Technol, Fairfax, VA 22030 USA

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2017

Keywords:

Language Preprocessing; Information Retrieval; Sentiment Analysis; Social Media Reasoning

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Social media has become a popular data source for tracking and analyzing societal events. Targeted domains such as elections, civil unrest, and the spread of disease all require a natural language normalization tool that can accurately extract the information pertinent to them. Unstructured language, short messages, casual posting styles, and homonyms make it technically difficult and labor-intensive to remove the barriers that lead to inaccurate analysis. Because typos and symbolic representations of sentiment can lower the frequency with which a term appears, language preprocessing is critical to improving social media text reasoning. We propose a novel unsupervised preprocessing approach to enhance the quality of text understanding and illustrate it in one specific domain, flu shot reasoning. The approach relies on a database of synonyms and antonyms and an algorithm that transforms negative sentences into their affirmative forms. In this form, features and opinions are reflected accurately by transforming parts of speech: for instance, features are presented as nouns and opinions as verbs or adjectives. The algorithm also corrects misspelled words and normalizes them to increase their frequency of appearance. The effectiveness of our algorithm is evaluated on a dataset of tweets to answer why people are reluctant to take flu shots.
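
The pipeline outlined in the abstract (spelling correction, synonym normalization, and rewriting negated phrases into an affirmative form through antonyms) can be illustrated with a minimal sketch. This is not the authors' implementation, which this record does not describe in detail. The lexicons ANTONYMS, SYNONYMS, NEGATORS, and COMMON, and the function names correct and normalize, are hypothetical and exist only for this example; fuzzy matching with Python's difflib stands in for whatever correction method the paper actually uses.

```python
import difflib
import re

# Hypothetical toy lexicons (assumptions for illustration only).
ANTONYMS = {"safe": "unsafe", "effective": "ineffective", "trust": "distrust"}
SYNONYMS = {"vaccine": "shot", "jab": "shot"}
NEGATORS = {"not", "no", "never", "don't"}
COMMON = {"the", "flu", "is", "i", "do"}

# Known words used as targets for spelling correction (sorted for determinism).
VOCABULARY = sorted(
    set(ANTONYMS) | set(ANTONYMS.values())
    | set(SYNONYMS) | set(SYNONYMS.values())
    | NEGATORS | COMMON
)


def correct(token):
    """Map a token to its closest known word, or leave it unchanged."""
    if token in VOCABULARY:
        return token
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else token


def normalize(text):
    """Correct spelling, merge synonyms, and fold 'not X' into antonym(X)."""
    tokens = [correct(t) for t in re.findall(r"[a-z']+", text.lower())]
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    out = []
    pending = None  # position in `out` of a negator that is not yet resolved
    for tok in tokens:
        if tok in NEGATORS:
            pending = len(out)
            out.append(tok)
        elif pending is not None and tok in ANTONYMS:
            del out[pending]            # drop the negator ...
            out.append(ANTONYMS[tok])   # ... and emit the antonym instead
            pending = None
        else:
            out.append(tok)
    return " ".join(out)


print(normalize("The flu vacine is not safe"))
print(normalize("I do not trust the jab"))
```

In this sketch a negator is removed only when a later word has a known antonym, so unresolved negations are left intact rather than silently dropped. Mapping variants such as "vacine", "vaccine", and "jab" to a single token is what raises the term frequency that the abstract refers to.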

Pages: 2019 - 2026

Page count: 8

50 items in total

[1] A Modular Approach for Social Media Text Normalization
Rehan, Palak
Kumar, Mukesh
Singh, Sarbjeet
[J]. INFORMATION AND DECISION SCIENCES, 2018, 701 : 187 - 195
[2] Social media text normalization for Turkish
Eryigit, Gulsen
Torunoglu-Selamet, Dilara
[J]. NATURAL LANGUAGE ENGINEERING, 2017, 23 (06) : 835 - 875
[3] Lexical Normalization for Social Media Text
Han, Bo
Cook, Paul
Baldwin, Timothy
[J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2013, 4 (01)
[4] A customizable pipeline for social media text normalization
Sarker A.
[J]. Social Network Analysis and Mining, 2017, 7 (01)
[5] Neural Text Normalization for Turkish Social Media
Goker, Sinan
Can, Burcu
[J]. 2018 3RD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2018 : 161 - 166
[6] Roman to Gurmukhi Social Media Text Normalization
Kaur, Jagroop
Singh, Jaswinder
[J]. INTERNATIONAL JOURNAL OF INTELLIGENT COMPUTING AND CYBERNETICS, 2020, 13 (04) : 407 - 435
[7] Text Normalization in Code-Mixed Social Media Text
Dutta, Sukanya
Saha, Tista
Banerjee, Somnath
Naskar, Sudip Kumar
[J]. 2015 IEEE 2ND INTERNATIONAL CONFERENCE ON RECENT TRENDS IN INFORMATION SYSTEMS (RETIS), 2015 : 378 - 382
[8] UTILIZING SOCIAL MEDIA DATA THROUGH SIMILARITY-BASED TEXT NORMALIZATION FOR LVCSR LANGUAGE MODELING
Chotimongkol, Ananlada
Thangthai, Kwanchiva
Wutiwiwatchai, Chai
[J]. 2014 17TH ORIENTAL CHAPTER OF THE INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDIZATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (COCOSDA), 2014
[9] Natural Language Processing for Social Media
Vaillant, Pascal
[J]. TRAITEMENT AUTOMATIQUE DES LANGUES, 2019, 60 (02) : 97 - 100
[10] Natural Language Processing for Social Media
Farzindar, Atefeh
Inkpen, Diana
[J]. ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2016, 2016, 9673 : XVI - XVII