Opinion mining from noisy text data

Cited by: 41
Authors
Dey, Lipika [1]
Haque, Sk. Mirajul [1]
Affiliation
[1] Tata Consultancy Serv, Innovat Labs, Udyog Vihar, Gurgaon, India
Keywords
Noisy text; Context-dependent cleaning; Opinion mining; WordNet; Text analytics for market knowledge discovery;
DOI
10.1007/s10032-009-0090-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The proliferation of the Internet has led not only to huge volumes of unstructured information in the form of web documents, but also to a large amount of text in the form of emails, blogs, feedback, etc. The data generated from online communication are potential gold mines for discovering knowledge, particularly for market researchers. Text analytics has matured and is being successfully employed to mine important information from unstructured text documents. The chief bottleneck in designing text mining systems for handling blogs arises from the fact that online communication text data are often noisy. These texts are informally written and suffer from spelling mistakes, grammatical errors, improper punctuation, and irregular capitalization. This paper focuses on opinion extraction from noisy text data. It aims to extract and consolidate customer opinions from blogs and feedback at multiple levels of granularity. We propose a framework in which these texts are first cleaned using domain knowledge and then subjected to mining. Ours is a semi-automated approach, in which the system aids knowledge assimilation for knowledge-base building and also performs the analytics. Domain experts ratify the knowledge base and also provide training samples from which the system automatically gathers more instances for ratification. The system identifies opinion expressions as phrases containing opinion words, opinionated features, and opinion modifiers. These expressions are categorized as positive or negative, with membership values varying from zero to one. Opinion expressions are identified and categorized using localized linguistic techniques. Opinions can be aggregated at any desired level of specificity, e.g., feature level or product level, user level or site level.
We have developed a system based on this approach, which provides the user with a platform to analyze opinion expressions crawled from a set of pre-defined blogs.
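The aggregation step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `OpinionExpression` record, the `aggregate` function, the sample data, and the choice of averaging signed membership values are hypothetical and are not taken from the paper, which does not state its aggregation formula in the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OpinionExpression:
    """Hypothetical record for one extracted opinion expression."""
    product: str
    feature: str
    polarity: str      # "positive" or "negative"
    membership: float  # strength of the polarity, in [0, 1]


def aggregate(expressions, level):
    """Average signed membership per distinct value of `level`.

    `level` names an attribute to group by ("feature" or "product").
    Averaging is an assumed rule, chosen only to keep the sketch simple.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for expr in expressions:
        key = getattr(expr, level)
        sign = 1.0 if expr.polarity == "positive" else -1.0
        totals[key] += sign * expr.membership
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Made-up sample data, not results from the paper.
sample = [
    OpinionExpression("PhoneX", "battery", "positive", 0.75),
    OpinionExpression("PhoneX", "battery", "negative", 0.25),
    OpinionExpression("PhoneX", "screen", "positive", 0.5),
]

by_feature = aggregate(sample, "feature")
by_product = aggregate(sample, "product")
```

Grouping by a different attribute (for example a user or site field, if one were added to the record) would give the other levels of specificity the abstract mentions.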
Pages: 205 - 226
Number of pages: 22
Related papers
50 in total
  • [31] Extensive Study of Text Based Methods for Opinion Mining
    Kulkarni, D. S.
    Rodd, S. F.
    [J]. PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON INVENTIVE SYSTEMS AND CONTROL (ICISC 2018), 2018 : 523 - 527
  • [32] A clustering technique for mining data from text tables
    Davulcu, H
    Mukherjee, S
    Ramakrishnan, IV
    [J]. PROCEEDINGS OF THE SECOND SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2002 : 315 - 332
  • [33] Dual Scaling in Data Mining from Text Databases
    Watada, Junzo
    Aoki, Keisuke
    Kawano, Masahiro
    Hitam, Muhammad Suzuri
    [J]. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2006, 10 (04) : 453 - 459
  • [34] Data mining algorithm for text data
    Chen, Yuquan
    Zhu, Xijun
    Lu, Ruzhan
    [J]. Shanghai Jiaotong Daxue Xuebao/Journal of Shanghai Jiaotong University, 2000, 34 (07) : 936 - 938
  • [35] Conceptual Notion for Opinion Mining from Upcoming Big Data
    Dembala, Rajeshwari
    Vagdevi, S.
    [J]. 2017 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER, AND OPTIMIZATION TECHNIQUES (ICEECCOT), 2017 : 688 - 692
  • [36] Opinion mining using ensemble text hidden Markov models for text classification
    Kang, Mangi
    Ahn, Jaelim
    Lee, Kichun
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2018, 94 : 218 - 227
  • [37] Data Mining and Opinion Mining: A Tool in Educational Context
    Penafiel, Myriam
    Vasquez, Stefanie
    Vasquez, Diego
    Zaldumbide, Juan
    Lujan-Mora, Sergio
    [J]. ICOMS 2018: 2018 INTERNATIONAL CONFERENCE ON MATHEMATICS AND STATISTICS, 2018 : 74 - 78
  • [38] EFFICIENT UNSUPERVISED MINING FROM NOISY CO-OCCURRENCE DATA
    Mamitsuka, Hiroshi
    [J]. NEW MATHEMATICS AND NATURAL COMPUTATION, 2005, 1 (01) : 173 - 193
  • [39] Dynamic classifier selection for effective mining from noisy data streams
    Zhu, XQ
    Wu, XD
    Yang, Y
    [J]. FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004 : 305 - 312
  • [40] Text-mining Similarity Approximation Operators for Opinion Mining in BI tools
    Kaplanski, Pawel
    Rizun, Nina
    Taranenko, Yurii
    Seganti, Alessandro
    [J]. PROCEEDINGS OF THE 11TH SCIENTIFIC CONFERENCE INTERNET IN THE INFORMATION SOCIETY 2016, 2016 : 121 - 140