A Natural Language Normalization Approach to Enhance Social Media Text Reasoning

Times cited: 0
Authors
Nguyen, Long Hoang [1]
Salopek, Andrew [1]
Zhao, Liang [2]
Jin, Fang [1]
Affiliations
[1] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79409 USA
[2] George Mason Univ, Informat Sci & Technol, Fairfax, VA 22030 USA
Keywords
Language Preprocessing; Information Retrieval; Sentiment Analysis; Social Media Reasoning
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Social media has become a popular data source for tracking and analyzing societal events. Targeted domains such as elections, civil unrest, and disease spread all require a natural language normalization tool that can accurately extract the information pertinent to them. Because of unstructured language, short messages, casual posting styles, and homonyms, removing the obstacles that lead to inaccurate analysis is technically difficult and labor-intensive. Since typos and other symbolic representations of sentiment can lower the frequency with which a term appears, language preprocessing is critical to improving social media text reasoning. We propose a novel unsupervised preprocessing approach that enhances text understanding and illustrate it in one specific domain, flu shot reasoning. The approach relies on a database of synonyms and antonyms and on an algorithm that transforms negative sentences into their affirmative form. In this form, features and opinions are reflected accurately by transforming parts of speech: features are presented as nouns, and opinions as verbs or adjectives. The algorithm also corrects misspelled words and normalizes them to increase their frequency of appearance. We evaluate the effectiveness of the algorithm on a dataset of tweets to answer why people are reluctant to take flu shots.
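The abstract does not specify the synonym/antonym database or the exact rewriting rules, so the following is only a minimal sketch of the general idea under stated assumptions. The lexicons `ANTONYMS`, `SYNONYMS`, `NEGATORS`, and `VOCAB` are hypothetical toy stand-ins, spelling correction is approximated with the standard library's `difflib.get_close_matches` (a swapped-in technique, not necessarily the authors'), and no part-of-speech tagging is performed. It shows how collapsing elongated words, snapping misspellings to known terms, merging synonyms, and replacing "negator + word" with the word's antonym make differently written tweets count toward the same term.

```python
import re
from difflib import get_close_matches

# Hypothetical toy lexicons standing in for the paper's synonym/antonym database.
ANTONYMS = {
    "safe": "unsafe", "unsafe": "safe",
    "effective": "ineffective", "ineffective": "effective",
    "trust": "distrust", "distrust": "trust",
    "like": "dislike", "dislike": "like",
}
# Variant -> canonical term, so one concept is counted under one spelling.
SYNONYMS = {"jab": "shot", "vaccination": "vaccine", "influenza": "flu"}
# Hypothetical set of negation cues (apostrophe and no-apostrophe forms).
NEGATORS = {"not", "no", "never", "don't", "doesn't", "isn't", "dont", "doesnt", "isnt"}
# Known words that misspellings may be snapped to; sorted for deterministic matching.
VOCAB = sorted(set(ANTONYMS) | set(SYNONYMS) | set(SYNONYMS.values()) | {"sick"})


def collapse_elongation(token: str) -> str:
    """Shrink runs of 3+ identical characters to 2 (e.g. "saaafe" -> "saafe")."""
    return re.sub(r"(.)\1{2,}", r"\1\1", token)


def correct_spelling(token: str) -> str:
    """Snap an unknown token of 4+ letters to the closest vocabulary word, if similar enough."""
    if token in VOCAB or token in NEGATORS or len(token) < 4:
        return token
    match = get_close_matches(token, VOCAB, n=1, cutoff=0.8)
    return match[0] if match else token


def normalize(text: str) -> list[str]:
    """Lowercase, tokenize, fix spelling, merge synonyms, then rewrite negations as antonyms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    tokens = [SYNONYMS.get(t, t) for t in (correct_spelling(collapse_elongation(t)) for t in tokens)]
    out: list[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # A negator followed by a word with a known antonym becomes the antonym alone.
        if tok in NEGATORS and i + 1 < len(tokens) and tokens[i + 1] in ANTONYMS:
            out.append(ANTONYMS[tokens[i + 1]])
            i += 2
        else:
            out.append(tok)
            i += 1
    return out


if __name__ == "__main__":
    print(normalize("The flu jab is NOT saaafe"))     # ['the', 'flu', 'shot', 'is', 'unsafe']
    print(normalize("I don't trust the vaccinaton"))  # ['i', 'distrust', 'the', 'vaccine']
```

With this normalization, "not safe", "unsaaafe", and "unsafe" in different tweets are all counted as the single term `unsafe`, which is the frequency effect the abstract describes. A real system would need a much larger lexicon and would have to handle negation scope (e.g. "not very safe") more carefully than this one-token lookahead.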
Pages: 2019-2026
Number of pages: 8