Opinion mining from noisy text data

Authors
Lipika Dey
Sk. Mirajul Haque
Affiliation
[1] Tata Consultancy Services, Innovation Labs
Keywords
Noisy text; Context-dependent cleaning; Opinion mining; WordNet; Text analytics for market knowledge discovery
Abstract
The proliferation of the Internet has led not only to huge volumes of unstructured information in the form of web documents, but also to a large amount of text in the form of emails, blogs, feedback, and so on. The data generated from online communication is a potential gold mine for discovering knowledge, particularly for market researchers. Text analytics has matured and is being successfully employed to mine important information from unstructured text documents. The chief bottleneck in designing text mining systems for blogs arises from the fact that online communication text is often noisy. These texts are informally written and suffer from spelling mistakes, grammatical errors, improper punctuation, and erratic capitalization. This paper focuses on opinion extraction from noisy text data. It aims at extracting and consolidating customer opinions from blogs and feedback at multiple levels of granularity. We propose a framework in which these texts are first cleaned using domain knowledge and then subjected to mining. Ours is a semi-automated approach, in which the system aids knowledge assimilation for knowledge-base building and also performs the analytics. Domain experts ratify the knowledge base and also provide training samples from which the system automatically gathers more instances for ratification. The system identifies opinion expressions as phrases containing opinion words, opinionated features, and opinion modifiers. These expressions are categorized as positive or negative, with membership values varying from zero to one. Opinion expressions are identified and categorized using localized linguistic techniques. Opinions can be aggregated at any desired level of specificity, i.e., feature level or product level, user level or site level, etc. We have developed a system based on this approach, which provides the user with a platform to analyze opinion expressions crawled from a set of pre-defined blogs.
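
The pipeline described in the abstract (opinion words, opinionated features, modifiers, a membership value between zero and one, and aggregation at the feature level) might be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' method: the lexicons `OPINION_WORDS`, `MODIFIERS`, and `FEATURES`, their weights, and the multiplicative scoring rule are all hypothetical, and the noisy-text cleaning step is omitted.

```python
from collections import defaultdict

# Hypothetical lexicons and weights, invented for illustration only.
# In the paper, the knowledge base is built with help from domain experts.
OPINION_WORDS = {"good": 0.6, "great": 0.8, "poor": -0.6, "terrible": -0.9}
MODIFIERS = {"very": 1.3, "slightly": 0.5, "not": -1.0}
FEATURES = {"battery", "screen", "camera"}


def score_expression(tokens):
    """Return (feature, polarity, membership) for a tokenized phrase, or None."""
    feature = next((t for t in tokens if t in FEATURES), None)
    score = None
    factor = 1.0
    for tok in tokens:
        if tok in MODIFIERS:
            # Modifiers seen before the opinion word scale (or flip) its weight.
            factor *= MODIFIERS[tok]
        elif tok in OPINION_WORDS:
            score = OPINION_WORDS[tok] * factor
            break
    if feature is None or score is None:
        return None
    membership = min(abs(score), 1.0)  # clamp to the [0, 1] range
    polarity = "positive" if score > 0 else "negative"
    return feature, polarity, membership


def aggregate_by_feature(phrases):
    """Average signed membership values per feature (assumed aggregation rule)."""
    totals = defaultdict(list)
    for phrase in phrases:
        tokens = [t.strip(".,!?") for t in phrase.lower().split()]
        result = score_expression(tokens)
        if result is None:
            continue  # no known feature or opinion word in this phrase
        feature, polarity, membership = result
        totals[feature].append(membership if polarity == "positive" else -membership)
    return {f: sum(v) / len(v) for f, v in totals.items()}
```

Calling `aggregate_by_feature` on a list of short phrases returns one averaged signed score per feature; the same grouping idea could be keyed on product, user, or site instead of feature.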
Pages: 205-226
Page count: 21